/srv/irclogs.ubuntu.com/2016/05/16/#juju-dev.txt

mupBug #1582065 opened: Juju seems confused about a newly defined MAAS cloud <juju-core:New> <https://launchpad.net/bugs/1582065>03:46
fwereadeanastasiamac, the only thing that springs to mind is to stop the machine agent, remove all evidence of the failed charm, and let the machine agent come up and deploy it again06:44
fwereadeanastasiamac, (would have been better to do that first, but I can't immediately think of anything that the dying principal would break)06:46
fwereadeanastasiamac, (my working theory is that worker/deployer failed half way through and left garbage it subsequently interpreted as a successful deployment)06:49
anastasiamacfwereade: thnx.. i think it's on the state server... so I'm not sure what the implications would be..07:12
anastasiamaci mean the unit was deployed on the state server machine07:12
bradmfwereade: the only thing I can find is the agent.conf, so you're suggesting remove that?07:16
bradmfwereade: as in /var/lib/juju/agents/unit-<foo>/agent.conf, I couldn't find any other evidence of it on disk07:17
bradmfwereade: and it is deployed to a state server07:17
mupBug #1582105 opened: lxd provider doesn't honour memory constraints <juju-core:New> <https://launchpad.net/bugs/1582105>07:38
fwereadebradm, anastasiamac: also check for the init system conf07:41
fwereadebradm, anastasiamac: state server shouldn't matter, it's not super-elegant to stop it but it shouldn't do any harm07:42
bradmfwereade: no init system conf for it07:42
bradmfwereade: so you're saying move the /var/lib/juju/agents/unit-<foo> and restart the machine agent?07:43
anastasiamacfwereade: bradm: got to run - family, dinner, etc... but i'll read the scroll \o/07:43
fwereadebradm, I think so, just reading the deployer code to see what I might be missing07:43
bradmfwereade: it's odd, because this is the only subordinate of its type that failed07:44
fwereadebradm, also move the tools dir for that unit07:45
bradmfwereade: ah, the symlink?  can do07:46
fwereadebradm, (shouldn't *really* matter, but it's another thing the deployer is responsible for)07:46
fwereadebradm, what should happen when you start the machine agent is that it'll see unit-dying, nothing-deployer, and advance the unit to dead immediately07:46
fwereadebradm, ...wait a mo07:47
bradmfwereade: ugh, I've already done that07:47
fwereadebradm, ah no don't worry :)07:48
bradmnothing seems to have changed, though.07:48
bradmjujud is chewing away at cpu though07:48
fwereadebradm, anything from juju.worker.deployer in the logs?07:49
bradmfwereade: nope07:49
bradmfwereade: the last thing was from juju.worker.firewaller07:52
fwereadebradm, sorry, just to be clear on the syncing: from a fresh start of the state server, we see *nothing* in the juju.worker.deployer logs? but e.g. status shows units assigned to the machine?07:53
fwereadebradm, I would expect at least a "checking unit %q" message or two...07:54
bradmfwereade: I don't see anything about "checking unit" in the machine logs at all07:55
bradmfwereade: oh, no, I'm wrong07:55
fwereadebradm, phew :)07:55
fwereadebradm, at least some part of the universe makes sense ;p07:55
bradmfwereade: there's some from quite a number of hours ago, nothing for my problematic subordinate07:55
bradmfwereade: I'm getting quite a bit of stuff thrown out to juju debug-log though, seemingly all from units07:56
fwereadebradm, huh, I don't see how it could have messed up the deploy in the first place without at least noticing and checking it07:56
bradmfwereade: this jujud is seemingly quite flakey07:56
bradmI seem to get a lot of "write: broken pipe" in the logs07:57
fwereadebradm, what sort of things are happening? that doesn't feel *too* unexpected if we're bouncing state servers07:57
bradmfwereade: well, for a start a subordinate didn't deploy cleanly.. :)07:58
fwereadebradm, indeed :)07:58
bradmfwereade: this was landscape-client, and only on the state server did it fail07:58
bradmugh, I'm going to run out of day, will have to disappear for dinner soon07:59
fwereadebradm, can you paste any of the logs anywhere for me to take a look at?08:00
bradmthere's a bunch of juju.apiserver logs about "login for machine-x-lxc-y blocked because upgrade in progress"08:00
bradmwhich is odd, because it was deployed with 1.25.5 and hasn't been upgraded08:00
fwereadebradm, what that really means is the state server is still waking up and isn't yet certain that it's fully upgraded08:02
fwereadebradm, if it keeps saying that for a long time it's a problem, though08:02
bradmmacgreagoir: fancy seeing you here!08:04
macgreagoirbradm: :-D08:05
frobwaredimitern: ping08:28
dimiternfrobware: pong08:28
frobwaredimitern: do you have 15 mins before standup - would like to sync on the private-address issue08:29
dimiternfrobware: ok, in :45 ?08:30
frobwaredimitern: fine, thx08:31
hoenirdoes anyone have time to review my patch please?08:40
rogpeppe1dimitern, fwereade: so what's with the smiley face on juju.fail? i can still see critical bugs open...08:40
dimiternhoenir: well, beta7 is out, that's why master is unblocked08:41
dimiternhoenir: sorry :) that ^^ was for rogpeppe108:41
rogpeppe1dimitern: so master doesn't get blocked on critical bugs any more?08:41
rogpeppe1dimitern: woo!08:41
dimiternhoenir: I was about to say that most of us are less familiar with the windows support code and I'd suggest asking for a review from e.g. gsamfira or bogdanteleaga as they're most familiar with that code08:42
dimiternrogpeppe1: only for those with 'ci' and 'blocker' tags IIRC08:42
rogpeppe1dimitern: ah, cool08:43
* rogpeppe1 hits $$merge$$ and hopes08:43
dimiternfrobware: joining standup ho now08:43
rogpeppe1dimitern: i thought it was gonna be months until i was able to land my "high" importance bug fix :)08:43
dimiternrogpeppe1: merge while you can :D08:44
hoenirdimitern, bogdanteleaga already reviewed my code and he said it was ok, but he advised me also to seek some more review on irc.08:45
rogpeppe1dimitern: BTW, i just saw this comment in the ec2 provider: // TODO(dimitern): Both of these shouldn't be restricted for hosted models.08:54
rogpeppe1dimitern: i think that, for the time being at least, region should remain restricted08:55
dimiternrogpeppe1: why so?08:55
rogpeppe1dimitern: because we currently rely on the fact that models will be deployed to the same region08:55
rogpeppe1dimitern: and there's no mechanism in place to determine "default" region08:56
dimiternrogpeppe1: I see, well good that I haven't made 'region' unrestricted :)08:59
babbageclunkhoenir: I've updated my review.08:59
rogpeppe1dimitern: looks like vpc id is still restricted though08:59
dimiternrogpeppe1: 'vpc-id-force' is09:00
rogpeppe1dimitern: ah, what does that do?09:00
dimiternrogpeppe1: forces juju to use 'vpc-id' at bootstrap only09:00
rogpeppe1dimitern: sorry, i don't understand that09:01
babbageclunkWhy is it so hard to get from a PR to the originating branch in github? I always end up editing the URL.09:01
hoenirbabbageclunk, thanks again09:01
dimiternvoidspace, babbageclunk: standup?09:02
babbageclunkShouldn't the branches in "hoenirvili  wants to merge 1 commit into juju:master from hoenirvili:refactor-userdata-cloudconfig" be links?09:02
voidspacedimitern: omw09:02
babbageclunkOh yeah, sorry - too much ranting.09:02
fwereadehoenir, made a couple of comments09:05
fwereadehoenir, hmm, (I should probably look at the filetoconst one more closely if it wasn't just a move)09:05
TheMuemorning, folks09:07
hoenirbabbageclunk, fwereade, thanks again for the comments and reviewing my code, I will start working right now to fix the mistakes.09:08
fwereadehoenir, yw, ping me if anything is not clear09:17
hoenirfwereade, after the modification I will ping you for more advice ..09:20
fwereadehoenir, cheers09:22
voidspacedimitern: babbageclunk: frobware: http://reviews.vapour.ws/r/4838/09:23
voidspacedooferlad: http://reviews.vapour.ws/r/4838/09:24
dimiternvoidspace: that looks like it includes the spaces PR as well?09:26
voidspacedimitern: oh yes it does - sorry09:26
voidspacedimitern: it requires it, and that is still landing09:26
voidspacedimitern: https://github.com/juju/juju/pull/539409:26
voidspacedooferlad: that PR includes the "already reviewed but not yet landed" spaces PR - https://github.com/juju/juju/pull/539409:27
dimiternvoidspace: cheers09:27
babbageclunkvoidspace - You can use rbt to change the base commit for the review so it doesn't include the other changes.09:31
voidspacebabbageclunk: ah, ok09:31
voidspacehmm, if rbt worked for me09:32
voidspaceit's python 2 code and my default python is now 3 (xenial) - I *assume* that is the cause of "no module named datetime" anyway09:32
voidspacerbt is in a virtualenv I can blow it away and restart09:32
dimiternvoidspace: what happens with AddSpace when you try to add a space with yet-unknown name, but providerID matching existing space?09:35
voidspacedimitern: the same as before09:36
voidspacedimitern: how can the name be unknown?09:36
dimiternvoidspace: won't you get ErrAborted from runTransaction due to txn.DocMissing failed for the providerID ?09:36
voidspacedimitern: just looking09:36
dimiternvoidspace: I mean add space "foo" with providerID "42", then try adding space "bar" also with provID "42"09:36
voidspacedimitern: it fails09:37
voidspacedimitern: there's a test for that09:37
voidspacedimitern: it fails with ProviderId not unique09:37
dimiternvoidspace: hmm because of the refresh + check notfound yeah I see09:38
dimiternvoidspace: ok, just checking :)09:38
voidspacedimitern: it would have been nice to avoid that Refresh09:38
dimiternnice when we have tests that already cover this case - makes refactoring with impunity possible :D09:39
voidspaceyeah :-)09:39
fwereadevoidspace, dimitern: Refresh-inside-txn is unhelpful anyway, it means that any call that changes remote state might also change arbitrary local state09:39
voidspacethat test failed initially - I had to move the Refresh code09:39
dimiternvoidspace: you can - looking up the providerIDsC doc by the providerID + global key of the space09:39
voidspacefwereade: it's "internal local state"09:39
dimiterni.e. findid()09:39
voidspacefwereade: the space we're refreshing is one created inside AddSpace and not returned09:39
fwereadevoidspace, dimitern: ah cool09:40
dimiternfwereade: yeah, I think the refresh there is relatively safe, as it's adding a new doc09:40
voidspacewe're already in an error path at that point09:40
fwereadevoidspace, dimitern: I *would* say that the notion of an error path doesn't quite fit right with txns -- the loop should be (1) check everything in-memory, (2) try to apply change with matching asserts09:41
fwereadevoidspace, dimitern: so just always reading the space doc fresh might be cleanest?09:42
fwereade...I should look at the code before pontificating too much09:42
dimiternfwereade: that's not a bad idea09:42
voidspacefwereade: what we're checking for in the assert is that another space with a different name doesn't have the same provider id09:42
voidspaceso it isn't in memory09:42
voidspaceand when we get ErrAborted we need to know why it failed to return the right error (message)09:42
dimiternfwereade: however, in that particular case we have no loop (i.e. no buildTxn, but a single try-check-erraborted-and-fail-accordingly)09:43
voidspacethe refresh failing confirms that it failed due to a duplicate provider id and not because the name already existed09:43
voidspacethe refresh is unnecessary - we've already checked for the "already exists" case so it should be safe to *assume* that it was the other assert that failed09:44
voidspacewe used to *have to refresh* because the failure was silent, no assert failure, so we had to check the insert worked by refreshing09:44
dimiternvoidspace: hmm not quite though, we have 3 asserts - model alive, doc missing for spacesC, doc missing for providerIDsC09:44
voidspaceah yes09:44
voidspacethe refresh is now only in the error path though, not on the happy path09:45
dimiternuhm09:45
voidspaceso it's already better09:45
dimiternactually also 1 assert per subnet09:45
dimiternwhen subnetids is not empty09:45
fwereadevoidspace, and that's the trouble with assuming based on asserts :) it's way too easy to lose track of the assumptions09:45
voidspaceyes, but we check for that before the refresh as well09:45
voidspaceso we'll leave the refresh in place then09:45
voidspaceit's an unlikely (theoretically impossible) failure mode anyway...09:46
* dimitern brb09:46
* fwereade has seen the code, and, yeah, that Aborted branch is too long -- IMO it's crying out to be restructured to loop over: (1) check provider id not already used (2) check subnets all exist (3) shouldn't we check that we're not sneaking subnets away from other spaces?09:50
dimiternfwereade: that sounds better and more thorough09:51
fwereadethen you (1) check sanity with a bunch of cheap reads instead of firing off an expensive txn without any reason to believe it works and (2) just let the runner mechanism handle the details09:52
fwereadeif you've messed up you'll get an ErrExcessiveContention which usually actually means that your memory checks don't match your asserts09:52
dimiternvoidspace: how about fixing that ^^ in a follow-up?09:53
fwereadedimitern, voidspace: the other reason to favour that structure, btw, is so you can build up your asserts alongside your memory checks09:53
fwereadedimitern, voidspace: having to dupe the logic sucks almost unbearably hard anyway09:53
dimiternfwereade: in the attempt > 0 case in buildTxn you mean?09:54
fwereadedimitern, voidspace: having the memory version of the code far away from the assert version is a recipe for screwups09:54
fwereadedimitern, I don't think you need to check attempt, do you?09:54
dimiternfwereade: even to know when to re-eval your local state?09:55
fwereadedimitern, yeah, but I thought there wasn't any here?09:56
fwereadedimitern, voidspace: https://github.com/juju/juju/wiki/mgo-txn-example attempts to exemplify what I'm talking about09:56
dimiternfwereade: it's not there in AddSpace code, but reading back I see what you mean - nice example btw! thanks09:58
fwereadedimitern, cheers09:58
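The loop shape fwereade describes (check everything with cheap reads on every attempt, then fire the guarded txn, and treat repeated aborts as excessive contention) can be sketched without mgo at all. Everything below — fakeState, addSpace, the in-memory runTxn — is invented for illustration and is not juju's actual state code:

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errAborted             = errors.New("transaction aborted")
	errExcessiveContention = errors.New("state changing too quickly; try again soon")
)

// fakeState simulates the spaces collection: space name -> providerID.
type fakeState struct {
	spaces map[string]string
}

// addSpace re-reads state on every attempt, fails fast with a precise
// error from cheap checks, and only then tries the guarded write. If the
// write aborts, it loops and re-evaluates; if the memory checks never
// match the asserts, it gives up with errExcessiveContention.
func (st *fakeState) addSpace(name, providerID string) error {
	const maxAttempts = 3
	for attempt := 0; attempt < maxAttempts; attempt++ {
		// (1) Cheap checks, re-evaluated each attempt.
		if _, ok := st.spaces[name]; ok {
			return fmt.Errorf("space %q already exists", name)
		}
		for n, pid := range st.spaces {
			if pid == providerID {
				return fmt.Errorf("provider id %q already used by space %q", providerID, n)
			}
		}
		// (2) Guarded write; aborts if the asserts no longer hold.
		if err := st.runTxn(name, providerID); err != errAborted {
			return err // nil on success, or a real error
		}
	}
	return errExcessiveContention
}

// runTxn plays the role of mgo/txn's runner: it atomically re-checks the
// same conditions (the "asserts") and applies the insert only if they hold.
func (st *fakeState) runTxn(name, providerID string) error {
	if _, ok := st.spaces[name]; ok {
		return errAborted
	}
	for _, pid := range st.spaces {
		if pid == providerID {
			return errAborted
		}
	}
	st.spaces[name] = providerID
	return nil
}

func main() {
	st := &fakeState{spaces: map[string]string{"foo": "42"}}
	fmt.Println(st.addSpace("bar", "42")) // provider id clash, precise error
	fmt.Println(st.addSpace("bar", "43")) // succeeds
}
```

The point of the structure is that the cheap checks and the asserts cover the same conditions side by side, so when the txn aborts, re-running the checks tells you exactly why — no Refresh-inside-txn needed.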
dimiternfwereade: any particular preference between txn.Op{.., Insert: docType{}} vs txn.Op{.., Insert: &docType{}} ?10:02
dimiternI see the former is more common10:03
fwereadedimitern, I don't much care really -- I have a slight preference for passing values rather than pointers, to signal that modifications are not expected/intended10:03
fwereadedimitern, because I *have* seen people silently changing fields when given access10:04
dimiternfwereade: right, it also looks somewhat cleaner with a value10:04
fwereadedimitern, indeed10:04
fwereadedimitern, the only argument against I can think of is that it might take more memory -- and people *do* keep bringing this up -- but it kinda baffles me. maybe when your struct is 4k big or something? in the meantime we have gigs available, let's not fret ;)10:05
dimiternfwereade: yeah, I guess for storing binary blobs it makes more sense10:06
dimiternbut even then..10:06
fwereadedimitern, well, yeah, and if it's a slice it won't copy anyway10:06
dimiternfwereade: what really scares me are maps as docs actually (or subdocs)10:07
fwereadedimitern, what's the case you need to worry about... non-pointer receivers on methods backed by array types? I think that copies the whole thing10:07
fwereadedimitern, yeah, indeed10:07
dimiternvoidspace: reviewed10:08
fwereadedimitern, fixed a bug just the other day with internal maps leaking out uncopied10:08
dimiternfwereade: nasty :/10:08
fwereadedimitern, even then, I will worry about that when I write a type like that ;)10:08
fwereadedimitern, but anyway10:09
dimiternfwereade: yeah :)10:09
fwereadedimitern, I'm pretty sure our txn composers copy internally anyway10:09
dimiternfwereade: it seems they do10:14
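The hazard fwereade mentions (internal maps leaking out uncopied) is easy to reproduce: a Go map field is only a header, so every value copy of the struct still shares the same underlying data, and returning the field hands callers write access. A toy example with invented names:

```go
package main

import "fmt"

// config holds a map; copying the struct by value copies only the map
// header, so all copies share the same underlying data.
type config struct {
	attrs map[string]string
}

// Attrs leaks the internal map: callers can mutate c.attrs through it,
// even though the receiver is a value, not a pointer.
func (c config) Attrs() map[string]string {
	return c.attrs
}

// SafeAttrs returns a fresh copy, so callers cannot reach inside.
func (c config) SafeAttrs() map[string]string {
	out := make(map[string]string, len(c.attrs))
	for k, v := range c.attrs {
		out[k] = v
	}
	return out
}

func main() {
	c := config{attrs: map[string]string{"name": "controller"}}

	leaked := c.Attrs()
	leaked["name"] = "mutated"   // visible inside c!
	fmt.Println(c.attrs["name"]) // prints "mutated"

	safe := c.SafeAttrs()
	safe["name"] = "harmless"
	fmt.Println(c.attrs["name"]) // still "mutated", unaffected
}
```

This is why passing docs by value into txn.Op doesn't protect map- or slice-typed fields: only the top-level struct is copied.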
dimiternvoidspace, babbageclunk, dooferlad: a tiny review http://reviews.vapour.ws/r/4839/, please take a look10:57
dimiternthanks babbageclunk!11:04
babbageclunkdimitern: hmm - something interesting.11:23
dimiternbabbageclunk: yeah?11:23
babbageclunkdimitern: because the state tests use Reset to teardown and re-setup in tests...11:24
babbageclunkdimitern: and cleanups don't get removed in teardown...11:24
babbageclunkdimitern: I end up seeing repeated calls to state.Close()11:25
dimiternbabbageclunk: and panics I presume?11:25
babbageclunkdimitern: nope11:26
dimiternbabbageclunk: by "cleanups" do you mean the cleanups collection?11:26
voidspacedimitern: fwereade: thanks for the reviews by the way11:30
fwereadevoidspace, sorry it was only a glance, I think you and dimitern have it in hand11:30
babbageclunkdimitern: Yes, functions registered in suite.addCleanup11:30
voidspacefwereade: cool11:30
babbageclunkdimitern: I'm going to change CleanupSuite to set them to nil after executing them.11:31
dimiternbabbageclunk: ah, that's different - that's for bits changed by PatchValue etc.11:31
fwereadebabbageclunk, multiple Close()s should be fine/safe, but we should have got them all out of the way before we hit Reset, I think11:31
dimiternbabbageclunk: setting that to nil without un-patching everything sounds really bad11:33
babbageclunkdimitern: Executing the cleanup *is* unpatching everything.11:33
dimiternbabbageclunk: or you mean they're already un-patched, but the slice is never nil11:33
babbageclunkdimitern: yes, that11:33
dimiternbabbageclunk: right, then it sounds safe11:33
babbageclunkfwereade: What do you mean by "should have got them all out of the way before Reset"?11:34
babbageclunkfwereade: Aren't the closes being done by Reset (which calls TearDownTest)?11:35
fwereadebabbageclunk, hm, maybe I have the wrong context, code reference?11:36
dimiternbabbageclunk: if anything, it should be the other way around: TearDownTest is called by gocheck, which probably calls Reset11:36
fwereadebabbageclunk, I was thinking of the nuke-db-state Reset? which I think undercuts everything else11:36
fwereadebabbageclunk, the dummy provider will do it to you in the middle of your tests if you're not careful >_<11:37
babbageclunkfwereade: oh, no - this is a method on the allwatcher_internal_test that calls teardown and then setup - it's called in the middle of a test (because the test is running multiple scenarios).11:38
fwereadebabbageclunk, oh ffs :(11:38
fwereadebabbageclunk, don't suppose I can prevail upon you to just pull out the tests that so clearly want to be separate and deserve a test runner written for the job?11:38
babbageclunkfwereade: maybe? I'm not sure I get the second bit.11:40
fwereadebabbageclunk, insight of katco's, crudely paraphrased by me: table-based tests are rubbish because you're writing your own test runner and that's a waste of effort11:41
fwereadebabbageclunk, TBTs do have their place but it's an incredibly narrow simple one11:41
fwereadebabbageclunk, if it doesn't look like a table any more, it's probably too complex to be a good TBT IMO11:41
fwereadebabbageclunk, also, imagine how nice it will be when the tests just have their own names11:42
dimiternbabbageclunk: i.e. instead of that test calling TearDownTest inside, it should have a cleanSetup and cleanCleanup helpers that do not depend on calling SetUpTest or TearDownTest11:42
fwereadebabbageclunk, and failures get reported individually11:42
fwereadebabbageclunk, and you don't need to find your failure in the middle of 30 pages of successful logs11:42
babbageclunkfwereade: Yeah, that makes sense - I think the reason for this one is that the tests are parameterised into allModelWatcher and allwatcher flavours.11:43
babbageclunkfwereade: But I'll have a go.11:43
fwereadebabbageclunk, bleh, I see -- thanks11:44
fwereadedimitern, I'm more saying, use SetUpTest and TearDownTest as expected, and write 50 TestFooBarBaz methods that don't abuse their infrastructure11:45
dimiternfwereade: I know :) I was just trying to make it slightly easier for babbageclunk11:45
fwereadedimitern, that way you get to see which bits failed, and see their logs in isolation, and not have to worry about Assert cutting short a run11:45
fwereadeetc :)11:45
babbageclunkfwereade: But still, I think the point is a good one - I'll see if I can figure out a better way of making the tests split out.11:46
fwereadebabbageclunk, cheers11:47
dimiternbabbageclunk: might help using multiple suites to better separate concerns11:47
dimiternand minimize code duplication around setup/teardown11:47
dimiternbabbageclunk: beware though, if you're embedding a "allWatcherBaseSuite" into e.g. "allModelWatcherSuite", and both of those are registered in gocheck (gc.Suite(..)), you'll get run all tests for both suites11:49
fwereadebabbageclunk, I admit I'm not really aware of a good pattern for running the same suite against different fixtures -- everything I can immediately think of smells too much of trying to do inheritance in golang, which is pretty much always a terrible idea11:50
babbageclunkfwereade: Well, at the bottom level I can give each of the not-quite-test funcs a name and then multiply out the actual tests.11:52
babbageclunkfwereade: then it's just a matter of editor automation11:52
fwereadebabbageclunk, yeah, I think that's progress11:52
fwereadebabbageclunk, dimitern: also, yeah -- I am getting increasingly irritated at suites that are also fixtures11:54
dimiternfwereade: me too!11:54
dimiternfwereade: I've fixed a bunch of those recently, around the provisioner11:55
fwereadebabbageclunk, dimitern: have been making a point of `(*FooSuite)` receivers and explicit fixture setup and I think it's working out pretty well11:55
fwereadedimitern, nice11:55
babbageclunkfwereade, dimitern: Oh, cool - I've been mostly doing that for a bit in Python too (since JB pointed me at a blog post about it)11:58
babbageclunkHmm, that said though - allwatcher_internal_test.go is 3244 lines, so maybe I'll rewrite them all in a separate diff.11:59
katcoanastasiamac: ericsnow: perrito666: meeting time12:04
fwereadebabbageclunk, holy shit :(12:07
katcoanastasiamac: perrito666: hello?12:08
perrito666katco: ericsnow natefinch anastasiamac getting there, I sent an email about being late12:15
mupBug #1582214 opened: upgrade-juju output is confusing <juju-core:New> <https://launchpad.net/bugs/1582214>12:45
rogpeppe1i'm after a review of this change to the GCE provider if anyone wants to help out, thanks! http://reviews.vapour.ws/r/4840/12:45
=== rogpeppe1 is now known as rogpeppe
voidspacedimitern: PR updated12:56
dimiternvoidspace: thanks, will look shortly12:57
voidspacedimitern: cool13:00
babbageclunkdimitern: Ha ha, it turns out that clearing out the transaction system's collections and then asking it to run another set of operations in a transaction doesn't work very well.13:10
babbageclunkdimitern: This is obvious in retrospect.13:10
fwereadebabbageclunk, haha13:11
fwereadebabbageclunk, (do you know about SetBeforeHooks et al?)13:11
babbageclunkfwereade: no - what are those?13:11
fwereadebabbageclunk, repeatable race tests for txns13:12
fwereadebabbageclunk, hooks in just before executing the txn, so you can change state underneath it and check it aborts as expected13:12
fwereadebabbageclunk, (and just after, if you want, as well: there are a few tests that just change state under every txn and restore it before the SUT reconstructs it and checks that the runner gives up13:13
babbageclunkfwereade: oh, neat13:13
fwereadebabbageclunk, they're not quite in the right place any more because the underlying functionality moved out of state so it's not really a state responsibility any more, but still ;)13:14
fwereadebabbageclunk, they come in very handy ;)13:14
fwereadebabbageclunk, search in state for examples13:14
babbageclunkfwereade: thanks - I'll see if I can use them for this.13:15
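A rough sketch of what SetBeforeHooks-style helpers buy you: the test schedules a state mutation to run just before the txn attempt, so the "race" happens deterministically and the abort path can be exercised repeatably. The store type and hook plumbing below are invented for illustration, not juju's actual testing API:

```go
package main

import (
	"errors"
	"fmt"
)

var errAborted = errors.New("transaction aborted")

// store is a toy transactional store: runTxn applies key->set only if the
// assert (the expected current value) still holds.
type store struct {
	data map[string]string
	// beforeHooks run just before each txn attempt, mimicking what a
	// SetBeforeHooks helper does: they let a test mutate state at the
	// worst possible moment, on purpose.
	beforeHooks []func()
}

func (s *store) runTxn(key, expect, set string) error {
	if len(s.beforeHooks) > 0 {
		hook := s.beforeHooks[0]
		s.beforeHooks = s.beforeHooks[1:]
		hook() // change state underneath the txn
	}
	if s.data[key] != expect {
		return errAborted
	}
	s.data[key] = set
	return nil
}

func main() {
	s := &store{data: map[string]string{"life": "alive"}}
	// The "test" schedules a state change to land under the txn:
	s.beforeHooks = []func(){func() { s.data["life"] = "dying" }}

	err := s.runTxn("life", "alive", "dead")
	fmt.Println(err) // aborted: the hook raced us, deterministically
}
```

This is why such hooks are preferable to collection-level RemoveAll for simulating concurrent changes: the mutation goes through the same interface the code under test uses, at an exactly controlled point.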
* dimitern needs to step out for ~1h13:24
perrito666katco: I got kicked out of the call brt13:34
perrito666k there is something not ok with google calendar, can anyone give me the link? natefinch?13:35
natefinchhttps://plus.google.com/hangouts/_/canonical.com/tanzanite?authuser=113:35
natefinchperrito666: log out and back in to google?13:36
perrito666natefinch: tx, google decided that I should logout of every account13:41
babbageclunkfwereade, dimitern: If I get an ErrAborted from running a transaction, how can I work out what's causing it?14:13
rogpeppedooferlad: i see you're OCR today... fancy a review? :) http://reviews.vapour.ws/r/4840/14:13
rogpeppebabbageclunk: you can't14:13
dooferladrogpeppe: *click*14:13
babbageclunkrogpeppe: :(14:13
rogpeppebabbageclunk: the transaction might have been run by another client14:13
rogpeppebabbageclunk: that's the way that mgo/txn works14:14
rogpeppebabbageclunk: the usual approach is to delve back into the db and see what might've been the cause14:14
rogpeppebabbageclunk: and yes, it's pretty crap14:14
babbageclunkrogpeppe: Ah, so it doesn't necessarily mean that the changes didn't get made?14:14
rogpeppebabbageclunk: if you get ErrAborted from a transaction, no changes have been made14:15
rogpeppebabbageclunk: transactions work by first checking all assertions, and only if all of them pass, applying all changes14:15
babbageclunkrogpeppe: right - makes sense. Thanks.14:15
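rogpeppe's description in miniature: every assert is checked before any change is applied, so ErrAborted guarantees nothing was written. This is a drastic simplification of mgo/txn's real client-driven protocol; op and run are invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

var errAborted = errors.New("transaction aborted")

// op is a cut-down stand-in for txn.Op: an assert that must hold,
// and a change to apply.
type op struct {
	assert func(map[string]string) bool
	apply  func(map[string]string)
}

// run checks all assertions first, and only if every one passes does it
// apply all the changes — so an abort means no change was made at all.
func run(db map[string]string, ops []op) error {
	for _, o := range ops {
		if !o.assert(db) {
			return errAborted
		}
	}
	for _, o := range ops {
		o.apply(db)
	}
	return nil
}

func main() {
	db := map[string]string{"a": "1"}
	err := run(db, []op{
		{
			assert: func(d map[string]string) bool { return d["a"] == "1" },
			apply:  func(d map[string]string) { d["a"] = "2" },
		},
		{
			// Fails: there is no doc "b", so the whole txn aborts.
			assert: func(d map[string]string) bool { _, ok := d["b"]; return ok },
			apply:  func(d map[string]string) { d["b"] = "x" },
		},
	})
	fmt.Println(err, db["a"]) // aborted, and "a" is untouched
}
```

This also shows why an abort can't tell you *which* assert failed: by the time the caller sees ErrAborted, all it knows is that at least one condition didn't hold, hence the delve-back-into-the-db approach rogpeppe describes.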
rogpeppedooferlad: ta!14:15
rogpeppedooferlad: BTW if you see mhilton's review, ignore it. he isn't a qualified juju-core reviewer. and he was part-author of the changes.14:17
dooferladsure14:17
fwereadebabbageclunk, you can't -- that's why you have to loop, and check all the things you depend on every time14:18
babbageclunkfwereade: In this case I've just deleted all of the things from all of the collections (but not the txns now :). So there really shouldn't be any asserts failing in the transaction.14:20
rogpeppebabbageclunk: you haven't got any asserts in your txn?14:20
fwereadebabbageclunk, hmm, unless they're all txn.DocMissing I would expect any that imply the existence of a doc to fail14:21
babbageclunkrogpeppe: They're all DocMissing...14:21
fwereadebabbageclunk, but, how are you deleting everything?14:21
fwereadebabbageclunk, once you've used mgo/txn on a document it's basicaly off-limits for everything else14:21
fwereadebabbageclunk, SetBeforeHooks is better suited to changes you can make via the exported interface14:22
babbageclunkfwereade: Ok, that might be it - I'm deleting them using collection.RemoveAll, so outside a transaction.14:22
fwereadebabbageclunk, how are you getting that collection? .Writeable()?14:22
babbageclunkfwereade: weirdly, it works fine in mongo3.2 (for some value of works), just not in 2.414:23
fwereade/ Collection imperfectly insulates clients from the capacity to write to14:23
fwereade/ MongoDB. Query results can still be used to write; and the Writeable14:23
fwereade/ method exposes the underlying *mgo.Collection when absolutely required;14:23
fwereade/ but the general expectation in juju is that writes will occur only via14:23
fwereade/ mgo/txn, and any layer-skipping is done only in exceptional and well-14:23
fwereade/ supported circumstances.14:23
babbageclunkfwereade: Maybe some context is in order. I'm working on this: https://bugs.launchpad.net/juju-core/+bug/157329414:25
mupBug #1573294: state tests run 100x slower with mongodb3.2 <juju-release-support> <mongo3> <mongodb> <juju-core:Triaged> <https://launchpad.net/bugs/1573294>14:25
babbageclunkfwereade: I don't know whether it constitutes exceptional circumstances though.14:26
fwereadebabbageclunk, ahh, ok... hmm14:26
babbageclunkSo I'm trying out an approach where instead of deleting and recreating the DB I scrape all of the data out and make it look all clean and new.14:27
babbageclunkfwereade: But I definitely might be trying to do this at the wrong level.14:28
fwereadebabbageclunk, thinking14:29
fwereadebabbageclunk, so, are we confident it's the index-creation that's slowing us down?14:29
babbageclunkfwereade: not totally - it's big, but it's not the only big part from my instrumenting.14:30
babbageclunkfwereade: I'm partly doing this in the expectation that it'll still be too slow.14:31
babbageclunkfwereade: (I was expecting it to be easier than it has been. ;)14:31
dooferladbabbageclunk: have you just tried deleting all the collections rather than deleting everything in them, then recreating them without indexes?14:32
fwereadebabbageclunk, do we know that a second EnsureIndex runs faster than the first one?14:32
babbageclunkfwereade: Oh, hang on - do you mean the second creation of a specific index?14:33
babbageclunkdooferlad: No, not yet. I tried commenting out the index creation and running the tests and it was still hella slow.14:34
fwereadebabbageclunk, yeah14:35
fwereadebabbageclunk, ok, interesting14:35
babbageclunkfwereade: In mongo 2.4 creating the first index is much slower than any subsequent (like 100x).14:35
babbageclunkfwereade: In 3.2 I don't see the same - they're all about the same time.14:36
babbageclunkfwereade: but I haven't tried dropping and recreating the same one (which I guess would be closer to what the tests are doing).14:36
babbageclunkfwereade: (oh, just looked back - in 2.4 it's 10x, not 100x for subsequent index creation)14:37
fwereadebabbageclunk, ok, and if we don't use indexes at all we don't actually save any significant time anyway? or is "hella slow" notably better than before? ;)14:38
babbageclunkfwereade: Well, in the tests the indexes are probably having minimal effect (maybe even slightly negative).14:39
fwereadebabbageclunk, I was imagining that there was a heavy per-test index-creation cost but that the presence or absence of indexes wouldn't make much difference to the test cases themselves14:42
babbageclunkfwereade: I think my numbers were that database teardown was also a big part of the time, but the tests themselves were also still big.14:42
mupBug #1582264 opened: remove-machine fails with false "machine X is hosting containers" <ci> <lxd> <remove-machine> <status> <juju-core:Triaged> <https://launchpad.net/bugs/1582264>14:42
babbageclunkfwereade: Yeah - that's right.14:43
fwereadebabbageclunk, ok, cool, I'm catching up :)14:43
babbageclunkfwereade: (Sorry - I'd rerun the tests to get numbers, but my timing is in a stash that won't apply after the bulldozing I've been doing.)14:44
fwereadebabbageclunk, no worries, I know how it is14:44
fwereadebabbageclunk, possible derail: have you spotted any 5s-ish waits at all?14:45
babbageclunkfwereade: mmmmmmaybe? Trying to remember.14:46
fwereadebabbageclunk, file it for future reference -- 5s waits usually mean that the txn log watcher is chilling out on its own schedule, and something needs to goose it into activity with a StartSync14:47
babbageclunkfwereade: certainly things were going from 0.05s to ~5ish s, but I don't think any one block of time was ~5s.14:47
fwereadebabbageclunk, or, it it happens all the time, it means StartSync is broken14:47
babbageclunkfwereade: Ok, I'll keep an eye out for that.14:48
fwereadebabbageclunk, and it wouldn't be every test anyway, basically just watcher ones14:48
mattywfwereade, as you're around I have a quick question14:49
fwereademattyw, go on14:49
mattywfwereade, this function is far from ideal because of the panic https://github.com/juju/names/blob/master/unit.go#L2614:49
mattywI know there must be a non panicing version somewhere, but I don't see it14:50
mattywthere's someway you can do it by going around the houses14:50
fwereademattyw, it would make me super happy if you were to write useful versions of those funcs14:50
fwereadeotherwise you're stuck assuming that every client obviously always has a valid unit name because, uh...14:51
mattywfwereade, if I was to do it in pursuit of a potential panic in the apiserver would I earn some kind of prize?14:51
fwereademattyw, my admiration and respect?14:52
mattywfwereade, hmmmm, I'll take it14:52
fwereade<314:52
fwereadealso beer next time we're in the same city14:52
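The kind of non-panicking constructor fwereade is asking for might look like the sketch below. Note this is hypothetical, not the real names package API, and the regexp is only an approximation of juju's unit-name rules:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validUnit approximates juju's unit-name pattern: an application name
// followed by "/" and a unit number (no leading zeros).
var validUnit = regexp.MustCompile(`^[a-z][a-z0-9-]*[a-z0-9]/(0|[1-9][0-9]*)$`)

// parseUnitName returns an error instead of panicking on bad input, so an
// API server can reject a malformed name gracefully rather than crash.
// It returns the tag string form, with "/" replaced by "-".
func parseUnitName(name string) (string, error) {
	if !validUnit.MatchString(name) {
		return "", fmt.Errorf("%q is not a valid unit name", name)
	}
	return "unit-" + strings.Replace(name, "/", "-", -1), nil
}

func main() {
	for _, name := range []string{"wordpress/0", "not-a-unit"} {
		tag, err := parseUnitName(name)
		fmt.Println(tag, err)
	}
}
```

The panicking constructor then becomes a thin wrapper for callers that genuinely hold already-validated names, while everything handling external input uses the error-returning version.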
babbageclunkdooferlad: If I was dropping the collections I'd need to do something to redo the collections that need explicit creation, wouldn't I?14:54
dooferladbabbageclunk: yes14:56
babbageclunkdooferlad: Ok, looking at it that doesn't seem too fiddly.14:57
babbageclunkdooferlad: Do you think that'd avoid the horrible transaction mess I've gotten myself into?14:57
babbageclunkdooferlad: because I'd like that.14:58
dooferladbabbageclunk: yes - if you delete the txn collection you will be home and dry14:58
dooferlad(probably)14:58
babbageclunkdooferlad: Hmm - It was deleting the txns (although not the collection itself) that got me last time.14:59
babbageclunkdooferlad: Still, worth a go!14:59
dooferladbabbageclunk: ask fwereade, but there must be a way of deleting basically everything and starting again.15:00
dooferladbabbageclunk: see https://github.com/go-mgo/mgo/blob/v2/txn/mgo_test.go15:03
mupBug #1582268 opened: TestInstancesGathering fails <intermittent-failure> <test-failure> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1582268>15:04
babbageclunkdooferlad: Looks suspiciously like https://github.com/juju/testing/blob/master/mgo.go#L52215:05
babbageclunkdooferlad: I think it's that stuff that I'm trying to avoid doing.15:08
cheryljkatco, frobware - ping?15:17
frobwarecherylj: pong15:17
cheryljhey frobware, can I ask you and katco to help stay on top of CI bugs this week while we're all in vancouver?15:17
frobwarecherylj: yep, sure15:17
cheryljfrobware, katco - the latest run had a lot of bugs:  http://reports.vapour.ws/releases/397315:18
cheryljsinzui, abentley - could you guys help katco and frobware prioritize CI bugs?15:18
frobwarecherylj: who should I bug for the azure related failures?15:18
cheryljfrobware: i think for that one in particular, CI needs to clean up resource groups15:19
cheryljfrobware:  it is unclear if the expectation is that juju should be cleaning those up15:20
abentleycherylj: I am kind of in the middle of submitting bugs, but I am triaging them :-)15:20
cheryljabentley: thanks :)  please work with frobware and katco to get people assigned to blockers15:20
dooferladbabbageclunk: sorry tea + daughter break15:22
babbageclunkdooferlad: :) no worries15:23
dooferladbabbageclunk: the test there does a set up suite that starts the database then has a drop all, redial step.15:23
sinzuifrobware: CI exhausted its resources this weekend. Old and broken jujus are the likely cause. I cleaned up a few hours ago and am retesting15:23
frobwaresinzui: ack15:23
dooferladbabbageclunk: this is part of the txn testing15:24
babbageclunkdooferlad: Yup - ours does something similar, but the server startup is before the suite setup.15:24
dooferladbabbageclunk: oh, I thought we had one server per test15:25
=== frankban|afk is now known as frankban
babbageclunkdooferlad: no, the server is running the whole time - it's started here: https://github.com/juju/juju/blob/master/state/package_test.go#L1815:27
mupBug #1578834 changed: update-alternatives fails to switch between juju-1 and juju-2 <cpe-sa> <packaging> <juju-core:Invalid> <juju-release-tools:Triaged> <https://launchpad.net/bugs/1578834>15:34
dimiternfrobware: do you know if we're having the call with rick_h_ today?15:40
dooferladdimitern: he has cancelled15:44
dimiterndooferlad: ok, thanks15:44
dooferladdimitern: or, at least, he set his response to not attending for this week.15:45
dooferladdimitern: not that I got a message about it :-(15:45
dimiterndooferlad: yeah, I guess I should've looked for that first :)15:45
frobwaredimitern: no15:46
dimiternfrobware: thanks for following up on the private address question15:46
dooferladfrobware / dimitern: is there anything worthy of discussion before the end of day?15:46
dooferladfrobware / dimitern: in a hangout that is15:47
frobwaredooferlad: only to scan the CI failures to see if there's stuff we can take15:47
dooferladfrobware: probably better done tomorrow at start of day?15:50
mattywfwereade, you got time to talk about manifolds and charm upgrades?15:52
fwereademattyw, sure15:53
frobwaredimitern, voidspace, dooferlad, babbageclunk: PTAL @ http://reviews.vapour.ws/r/4844/15:55
dimiternfrobware: looking15:57
dimiternfrobware: LGTM16:07
frobwaredimitern: ty16:07
=== redir_ is now known as redir
frobwaredimitern: am I correct, ... we don't have any explicit tests for machine_linklayer devices?16:15
dimiternfrobware: in state?16:15
frobwareyep16:16
dimiternfrobware: there is complete coverage of that code16:16
hoenirv16:16
frobwaredimitern: through linklayer_devices_test?16:16
dimiternfrobware: there are white-box tests in _internal_ and black-box tests otherwise16:18
frobwaredimitern: ok16:18
dimiternfrobware: so linklayerdevices_internal_test.go and linklayerdevices_test.go16:19
dimiternfrobware: similarly for linklayerdevices_ipaddresses -16:19
frobwaredimitern: I was searching for the wrong pattern; was including machine16:20
dimiternfrobware: yeah, machine-related tests were split across multiple files now16:21
babbageclunkdooferlad: Hmm. Reusing the database actually does seem to bring the 3.2 performance into line with the original tests on 2.4. Except I can't run my updated tests against 2.4.17:05
voidspacedimitern: you still around?17:14
dimiternvoidspace: yeah17:14
voidspacedimitern: you got a minute or two spare to talk about linklayerdevices?17:15
dimiternvoidspace: sure - standup ho?17:15
voidspacedimitern: yep17:15
dimiternvoidspace: i'm in there now17:16
mupBug #1488245 opened: Recurring lxc issue: failed to retrieve the template to clone  <canonical-bootstack> <kanban-cross-team> <lxc> <juju-core:New> <https://launchpad.net/bugs/1488245>18:10
=== redir is now known as redir_afk
katconatefinch: hey any update on the bug? i think ian needs to give a demo sometime this week20:06
natefinchkatco: poking at it now... my wife got sick midday, so I had to take some time off, but actively working on it now and will spend time tonight on it as well, if I don't have it figured out soon.20:08
katconatefinch: k, lmk if you have to get pulled off again. ian needs a fix asap20:09
natefinchkatco: will do20:11
katconatefinch: (i.e. "today") :( lmk if you need help from anyone or another set of eyes or whatever20:12
natefinchkatco: lack of logging is hindering initial efforts at figuring it out, but that's easy enough to add in20:13
wallyworldnatefinch: katco: yeah, sorry about short turnaround time, needs to be done for a scheduled lightning talk20:26
wallyworldnatefinch: katco: and as you can imagine, given the audience here, it needs to work :-)20:27
natefinchwallyworld: np, able to repro, probably a charmstore issue, but trying to narrow it down to make sure right now.20:27
bdxmaas-juju-networking-peeps: hey, so initially a juju deployed maas node gets bridges created on its interfaces, but if I reboot the node, /etc/network/interfaces gets stomped, and juju created bridges are wiped20:33
bdx:-(20:34
bdxfiling a bug now20:34
bdxhttps://github.com/juju/juju/issues/540920:35
natefinchwallyworld, katco, ericsnow:  think I see the issue.  Store isn't returning an origin21:02
katconatefinch: how is our client handling that? why isn't it an error?21:03
natefinchkatco: trying to figure that out.  my guess is we're just retrying21:04
katconatefinch: good find, ty21:05
natefinchkatco: wrote a tiny CLI script frontend to our API client wrapper... was way faster than trying to iterate through bootstrap etc21:06
katconatefinch: very nice! love the quicker iteration :)21:06
natefinchericsnow, katco: http://pastebin.ubuntu.com/16468371/21:07
natefinchI gotta run for dinner, but I'll be back on later.  pretty sure the fix is just to default the origin to store21:08
katconatefinch: that would be nice if it was just a client side fix21:08
katcoericsnow: do you agree with that approach? and override if the resource comes from a --resource flag?21:09
natefinchkatco: this is just while downloading from the store itself21:09
katconatefinch: that seems sane. obviously couldn't originate from anywhere else21:10
natefinchkatco: in theory the store could just always return an origin of "store", but they're not, I guess on the theory that, duh it's from the store21:10
mupBug #1582408 opened: System id is listed during bootstrap and deploy of charms instead of system name <juju-core:New> <https://launchpad.net/bugs/1582408>21:11
ericsnowkatco: we *were* defaulting to store but that changed when we stopped working on the store/csclient code21:11
ericsnowkatco: so I'm on board with defaulting to that21:11
ericsnowkatco: gotta run for a couple hours (unexpectedly); will be back later21:46
katcoericsnow: k21:46
davecheneyso, all the managers are away at a sprint and master hasn't been blocked for days22:01
davecheneycoincidence ? unlikely :)22:01
katcodavecheney: lol i enjoy this theory22:02
davecheneyscience fiction > science fact22:16
natefinchkatco, ericsnow, wallyworld: btw, confirmed that fixes the bug.  Not really here, but I'll have a PR up in a couple hours-ish23:44
katconatefinch: awesome ty23:49

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!