/srv/irclogs.ubuntu.com/2016/05/16/#juju-dev.txt

mupBug #1582065 opened: Juju seems confused about a newly defined MAAS cloud <juju-core:New> <https://launchpad.net/bugs/1582065>03:46
fwereadeanastasiamac, the only thing that springs to mind is to stop the machine agent, remove all evidence of the failed charm, and let the machine agent come up and deploy it again06:44
fwereadeanastasiamac, (would have been better to do that first, but I can't immediately think of anything that the dying principal would break)06:46
fwereadeanastasiamac, (my working theory is that worker/deployer failed half way through and left garbage it subsequently interpreted as a successful deployment)06:49
anastasiamacfwereade: thnx.. i think it's on the state server... so I'm not sure what the implications would be..07:12
anastasiamaci mean the unit was deployed on the state server machine07:12
bradmfwereade: the only thing I can find is the agent.conf, so you're suggesting remove that?07:16
bradmfwereade: as in /var/lib/juju/agents/unit-<foo>/agent.conf, I couldn't find any other evidence of it on disk07:17
bradmfwereade: and it is deployed to a state server07:17
mupBug #1582105 opened: lxd provider doesn't honour memory constraints <juju-core:New> <https://launchpad.net/bugs/1582105>07:38
fwereadebradm, anastasiamac: also check for the init system conf07:41
fwereadebradm, anastasiamac: state server shouldn't matter, it's not super-elegant to stop it but it shouldn't do any harm07:42
bradmfwereade: no init system conf for it07:42
bradmfwereade: so you're saying move the /var/lib/juju/agents/unit-<foo> and restart the machine agent?07:43
anastasiamacfwereade: bradm: got to run - family, dinner, etc... but i'll read the scroll \o/07:43
fwereadebradm, I think so, just reading the deployer code to see what I might be missing07:43
bradmfwereade: it's odd, because this is the only subordinate of its type that failed07:44
fwereadebradm, also move the tools dir for that unit07:45
bradmfwereade: ah, the symlink?  can do07:46
fwereadebradm, (shouldn't *really* matter, but it's another thing the deployer is responsible for)07:46
fwereadebradm, what should happen when you start the machine agent is that it'll see unit-dying, nothing-deployer, and advance the unit to dead immediately07:46
fwereadebradm, ...wait a mo07:47
bradmfwereade: ugh, I've already done that07:47
fwereadebradm, ah no don't worry :)07:48
bradmnothing seems to have changed, though.07:48
bradmjujud is chewing away at cpu though07:48
fwereadebradm, anything from juju.worker.deployer in the logs?07:49
bradmfwereade: nope07:49
bradmfwereade: the last thing was from juju.worker.firewaller07:52
fwereadebradm, sorry, just to be clear on the syncing: from a fresh start of the state server, we see *nothing* in the juju.worker.deployer logs? but e.g. status shows units assigned to the machine?07:53
fwereadebradm, I would expect at least a "checking unit %q" message or two...07:54
bradmfwereade: I don't see anything about "checking unit" in the machine logs at all07:55
bradmfwereade: oh, no, I'm wrong07:55
fwereadebradm, phew :)07:55
fwereadebradm, at least some part of the universe makes sense ;p07:55
bradmfwereade: there's some from quite a number of hours ago, nothing for my problematic subordinate07:55
bradmfwereade: I'm getting quite a bit of stuff thrown out to juju debug-log though, seemingly all from units07:56
fwereadebradm, huh, I don't see how it could have messed up the deploy in the first place without at least noticing and checking it07:56
bradmfwereade: this jujud is seemingly quite flakey07:56
bradmI seem to get a lot of "write: broken pipe" in the logs07:57
fwereadebradm, what sort of things are happening? that doesn't feel *too* unexpected if we're bouncing state servers07:57
bradmfwereade: well, for a start a subordinate didn't deploy cleanly.. :)07:58
fwereadebradm, indeed :)07:58
bradmfwereade: this was landscape-client, and only on the state server did it fail07:58
bradmugh, I'm going to run out of day, will have to disappear for dinner soon07:59
fwereadebradm, can you paste any of the logs anywhere for me to take a look at?08:00
bradmthere's a bunch of juju.apiserver logs about "login for machine-x-lxc-y blocked because upgrade in progress"08:00
bradmwhich is odd, because it was deployed with 1.25.5 and hasn't been upgraded08:00
fwereadebradm, what that really means is the state server is still waking up and isn't yet certain that it's fully upgraded08:02
fwereadebradm, if it keeps saying that for a long time it's a problem, though08:02
bradmmacgreagoir: fancy seeing you here!08:04
macgreagoirbradm: :-D08:05
frobwaredimitern: ping08:28
dimiternfrobware: pong08:28
frobwaredimitern: do you have 15 mins before standup - would like to sync on the private-address issue08:29
dimiternfrobware: ok, in :45 ?08:30
frobwaredimitern: fine, thx08:31
hoenirdoes anyone have time to review my patch please?08:40
rogpeppe1dimitern, fwereade: so what's with the smiley face on juju.fail? i can still see critical bugs open...08:40
dimiternhoenir: well, beta7 is out, that's why master is unblocked08:41
dimiternhoenir: sorry :) that ^^ was for rogpeppe108:41
rogpeppe1dimitern: so master doesn't get blocked on critical bugs any more?08:41
rogpeppe1dimitern: woo!08:41
dimiternhoenir: I was about to say that most of us are less familiar with the windows support code and I'd suggest asking for a review from e.g. gsamfira or bogdanteleaga as they're most familiar with that code08:42
dimiternrogpeppe1: only for those with 'ci' and 'blocker' tags IIRC08:42
rogpeppe1dimitern: ah, cool08:43
* rogpeppe1 hits $$merge$$ and hopes08:43
dimiternfrobware: joining standup ho now08:43
rogpeppe1dimitern: i thought it was gonna be months until i was able to land my "high" importance bug fix :)08:43
dimiternrogpeppe1: merge while you can :D08:44
hoenirdimitern, bogdanteleaga already reviewed my code and he said it was ok, but he advised me also to seek some more review on irc.08:45
rogpeppe1dimitern: BTW, i just saw this comment in the ec2 provider: // TODO(dimitern): Both of these shouldn't be restricted for hosted models.08:54
rogpeppe1dimitern: i think that, for the time being at least, region should remain restricted08:55
dimiternrogpeppe1: why so?08:55
rogpeppe1dimitern: because we currently rely on the fact that models will be deployed to the same region08:55
rogpeppe1dimitern: and there's no mechanism in place to determine "default" region08:56
dimiternrogpeppe1: I see, well good that I haven't made 'region' unrestricted :)08:59
babbageclunkhoenir: I've updated my review.08:59
rogpeppe1dimitern: looks like vpc id is still restricted though08:59
dimiternrogpeppe1: 'vpc-id-force' is09:00
rogpeppe1dimitern: ah, what does that do?09:00
dimiternrogpeppe1: forces juju to use 'vpc-id' at bootstrap only09:00
rogpeppe1dimitern: sorry, i don't understand that09:01
babbageclunkWhy is it so hard to get from a PR to the originating branch in github? I always end up editing the URL.09:01
hoenirbabbageclunk, thanks again09:01
dimiternvoidspace, babbageclunk: standup?09:02
babbageclunkShouldn't the branches in "hoenirvili  wants to merge 1 commit into juju:master from hoenirvili:refactor-userdata-cloudconfig" be links?09:02
voidspacedimitern: omw09:02
babbageclunkOh yeah, sorry - too much ranting.09:02
fwereadehoenir, made a couple of comments09:05
fwereadehoenir, hmm, (I should probably look at the filetoconst one more closely if it wasn't just a move)09:05
TheMuemorning, folks09:07
hoenirbabbageclunk, fwereade, thanks again for the comments and reviewing my code, I will start working right now to fix the mistakes.09:08
fwereadehoenir, yw, ping me if anything is not clear09:17
hoenirfwereade, after the modification I will ping you for more advice ..09:20
fwereadehoenir, cheers09:22
voidspacedimitern: babbageclunk: frobware: http://reviews.vapour.ws/r/4838/09:23
voidspacedooferlad: http://reviews.vapour.ws/r/4838/09:24
dimiternvoidspace: that looks like it includes the spaces PR as well?09:26
voidspacedimitern: oh yes it does - sorry09:26
voidspacedimitern: it requires it, and that is still landing09:26
voidspacedimitern: https://github.com/juju/juju/pull/539409:26
voidspacedooferlad: that PR includes the "already reviewed but not yet landed" spaces PR - https://github.com/juju/juju/pull/539409:27
dimiternvoidspace: cheers09:27
babbageclunkvoidspace - You can use rbt to change the base commit for the review so it doesn't include the other changes.09:31
voidspacebabbageclunk: ah, ok09:31
voidspacehmm, if rbt worked for me09:32
voidspaceit's python 2 code and my default python is now 3 (xenial) - I *assume* that is the cause of "no module named datetime" anyway09:32
voidspacerbt is in a virtualenv I can blow it away and restart09:32
dimiternvoidspace: what happens with AddSpace when you try to add a space with yet-unknown name, but providerID matching existing space?09:35
voidspacedimitern: the same as before09:36
voidspacedimitern: how can the name be unknown?09:36
dimiternvoidspace: won't you get ErrAborted from runTransaction due to txn.DocMissing failed for the providerID ?09:36
voidspacedimitern: just looking09:36
dimiternvoidspace: I mean add space "foo" with providerID "42", then try adding space "bar" also with provID "42"09:36
voidspacedimitern: it fails09:37
voidspacedimitern: there's a test for that09:37
voidspacedimitern: it fails with ProviderId not unique09:37
dimiternvoidspace: hmm because of the refresh + check notfound yeah I see09:38
dimiternvoidspace: ok, just checking :)09:38
voidspacedimitern: it would have been nice to avoid that Refresh09:38
dimiternnice when we have tests that already cover this case - makes refactoring with impunity possible :D09:39
voidspaceyeah :-)09:39
fwereadevoidspace, dimitern: Refresh-inside-txn is unhelpful anyway, it means that any call that changes remote state might also change arbitrary local state09:39
voidspacethat test failed initially - I had to move the Refresh code09:39
dimiternvoidspace: you can - looking up the providerIDsC doc by the providerID + global key of the space09:39
voidspacefwereade: it's "internal local state"09:39
dimiterni.e. findid()09:39
voidspacefwereade: the space we're refreshing is one created inside AddSpace and not returned09:39
fwereadevoidspace, dimitern: ah cool09:40
dimiternfwereade: yeah, I think the refresh there is relatively safe, as it's adding a new doc09:40
voidspacewe're already in an error path at that point09:40
fwereadevoidspace, dimitern: I *would* say that the notion of an error path doesn't quite fit right with txns -- the loop should be (1) check everything in-memory, (2) try to apply change with matching asserts09:41
fwereadevoidspace, dimitern: so just always reading the space doc fresh might be cleanest?09:42
fwereade...I should look at the code before pontificating too much09:42
dimiternfwereade: that's not a bad idea09:42
voidspacefwereade: what we're checking for in the assert is that another space with a different name doesn't have the same provider id09:42
voidspaceso it isn't in memory09:42
voidspaceand when we get ErrAborted we need to know why it failed to return the right error (message)09:42
dimiternfwereade: however, in that particular case we have no loop (i.e. no buildTxn, but a single try-check-erraborted-and-fail-accordingly)09:43
voidspacethe refresh failing confirms that it failed due to a duplicate provider id and not because the name already existed09:43
voidspacethe refresh is unnecessary - we've already checked for the "already exists" case so it should be safe to *assume* that it was the other assert that failed09:44
voidspacewe used to *have to refresh* because the failure was silent, no assert failure, so we had to check the insert worked by refreshing09:44
dimiternvoidspace: hmm not quite though, we have 3 asserts - model alive, doc missing for spacesC, doc missing for providerIDsC09:44
voidspaceah yes09:44
voidspacethe refresh is now only in the error path though, not on the happy path09:45
dimiternuhm09:45
voidspaceso it's already better09:45
dimiternactually also 1 assert per subnet09:45
dimiternwhen subnetids is not empty09:45
fwereadevoidspace, and that's the trouble with assuming based on asserts :) it's way too easy to lose track of the assumptions09:45
voidspaceyes, but we check for that before the refresh as well09:45
voidspaceso we'll leave the refresh in place then09:45
voidspaceit's an unlikely (theoretically impossible) failure mode anyway...09:46
* dimitern brb09:46
* fwereade has seen the code, and, yeah, that Aborted branch is too long -- IMO it's crying out to be restructured to loop over: (1) check provider id not already used (2) check subnets all exist (3) shouldn't we check that we're not sneaking subnets away from other spaces?09:50
dimiternfwereade: that sounds better and more thorough09:51
fwereadethen you (1) check sanity with a bunch of cheap reads instead of firing off an expensive txn without any reason to believe it works and (2) just let the runner mechanism handle the details09:52
fwereadeif you've messed up you'll get an ErrExcessiveContention which usually actually means that your memory checks don't match your asserts09:52
dimiternvoidspace: how about fixing that ^^ in a follow-up?09:53
fwereadedimitern, voidspace: the other reason to favour that structure, btw, is so you can build up your asserts alongside your memory checks09:53
fwereadedimitern, voidspace: having to dupe the logic sucks almost unbearably hard anyway09:53
dimiternfwereade: in the attempt > 0 case in buildTxn you mean?09:54
fwereadedimitern, voidspace: having the memory version of the code far away from the assert version is a recipe for screwups09:54
fwereadedimitern, I don't think you need to check attempt, do you?09:54
dimiternfwereade: even to know when to re-eval your local state?09:55
fwereadedimitern, yeah, but I thought there wasn't any here?09:56
fwereadedimitern, voidspace: https://github.com/juju/juju/wiki/mgo-txn-example attempts to exemplify what I'm talking about09:56
dimiternfwereade: it's not there in AddSpace code, but reading back I see what you mean - nice example btw! thanks09:58
fwereadedimitern, cheers09:58
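The loop shape fwereade describes (check everything with cheap reads on every attempt, then fire the guarded txn, and treat repeated aborts as excessive contention) can be sketched without mgo at all. Everything below — fakeState, addSpace, the in-memory runTxn — is invented for illustration and is not juju's actual state code:

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errAborted             = errors.New("transaction aborted")
	errExcessiveContention = errors.New("state changing too quickly; try again soon")
)

// fakeState simulates the spaces collection: space name -> providerID.
type fakeState struct {
	spaces map[string]string
}

// addSpace re-reads state on every attempt, fails fast with a precise
// error from cheap checks, and only then tries the guarded write. If the
// write aborts, it loops and re-evaluates; if the memory checks never
// match the asserts, it gives up with errExcessiveContention.
func (st *fakeState) addSpace(name, providerID string) error {
	const maxAttempts = 3
	for attempt := 0; attempt < maxAttempts; attempt++ {
		// (1) Cheap checks, re-evaluated each attempt.
		if _, ok := st.spaces[name]; ok {
			return fmt.Errorf("space %q already exists", name)
		}
		for n, pid := range st.spaces {
			if pid == providerID {
				return fmt.Errorf("provider id %q already used by space %q", providerID, n)
			}
		}
		// (2) Guarded write; aborts if the asserts no longer hold.
		if err := st.runTxn(name, providerID); err != errAborted {
			return err // nil on success, or a real error
		}
	}
	return errExcessiveContention
}

// runTxn plays the role of mgo/txn's runner: it atomically re-checks the
// same conditions (the "asserts") and applies the insert only if they hold.
func (st *fakeState) runTxn(name, providerID string) error {
	if _, ok := st.spaces[name]; ok {
		return errAborted
	}
	for _, pid := range st.spaces {
		if pid == providerID {
			return errAborted
		}
	}
	st.spaces[name] = providerID
	return nil
}

func main() {
	st := &fakeState{spaces: map[string]string{"foo": "42"}}
	fmt.Println(st.addSpace("bar", "42")) // provider id clash, precise error
	fmt.Println(st.addSpace("bar", "43")) // succeeds
}
```

The point of the structure is that the cheap checks and the asserts cover the same conditions side by side, so when the txn aborts, re-running the checks tells you exactly why — no Refresh-inside-txn needed.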
dimiternfwereade: any particular preference between txn.Op{.., Insert: docType{}} vs txn.Op{.., Insert: &docType{}} ?10:02
dimiternI see the former is more common10:03
fwereadedimitern, I don't much care really -- I have a slight preference for passing values rather than pointers, to signal that modifications are not expected/intended10:03
fwereadedimitern, because I *have* seen people silently changing fields when given access10:04
dimiternfwereade: right, it also looks somewhat cleaner with a value10:04
fwereadedimitern, indeed10:04
fwereadedimitern, the only argument against I can think of is that it might take more memory -- and people *do* keep bringing this up -- but it kinda baffles me. maybe when your struct is 4k big or something? in the meantime we have gigs available, let's not fret ;)10:05
dimiternfwereade: yeah, I guess for storing binary blobs it makes more sense10:06
dimiternbut even then..10:06
fwereadedimitern, well, yeah, and if it's a slice it won't copy anyway10:06
dimiternfwereade: what really scares me are maps as docs actually (or subdocs)10:07
fwereadedimitern, what's the case you need to worry about... non-pointer receivers on methods backed by array types? I think that copies the whole thing10:07
fwereadedimitern, yeah, indeed10:07
dimiternvoidspace: reviewed10:08
fwereadedimitern, fixed a bug just the other day with internal maps leaking out uncopied10:08
dimiternfwereade: nasty :/10:08
fwereadedimitern, even then, I will worry about that when I write a type like that ;)10:08
fwereadedimitern, but anyway10:09
dimiternfwereade: yeah :)10:09
fwereadedimitern, I'm pretty sure our txn composers copy internally anyway10:09
dimiternfwereade: it seems they do10:14
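The hazard fwereade mentions (internal maps leaking out uncopied) is easy to reproduce: a Go map field is only a header, so every value copy of the struct still shares the same underlying data, and returning the field hands callers write access. A toy example with invented names:

```go
package main

import "fmt"

// config holds a map; copying the struct by value copies only the map
// header, so all copies share the same underlying data.
type config struct {
	attrs map[string]string
}

// Attrs leaks the internal map: callers can mutate c.attrs through it,
// even though the receiver is a value, not a pointer.
func (c config) Attrs() map[string]string {
	return c.attrs
}

// SafeAttrs returns a fresh copy, so callers cannot reach inside.
func (c config) SafeAttrs() map[string]string {
	out := make(map[string]string, len(c.attrs))
	for k, v := range c.attrs {
		out[k] = v
	}
	return out
}

func main() {
	c := config{attrs: map[string]string{"name": "controller"}}

	leaked := c.Attrs()
	leaked["name"] = "mutated"   // visible inside c!
	fmt.Println(c.attrs["name"]) // prints "mutated"

	safe := c.SafeAttrs()
	safe["name"] = "harmless"
	fmt.Println(c.attrs["name"]) // still "mutated", unaffected
}
```

This is why passing docs by value into txn.Op doesn't protect map- or slice-typed fields: only the top-level struct is copied.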
dimiternvoidspace, babbageclunk, dooferlad: a tiny review http://reviews.vapour.ws/r/4839/, please take a look10:57
dimiternthanks babbageclunk!11:04
babbageclunkdimitern: hmm - something interesting.11:23
dimiternbabbageclunk: yeah?11:23
babbageclunkdimitern: because the state tests use Reset to teardown and re-setup in tests...11:24
babbageclunkdimitern: and cleanups don't get removed in teardown...11:24
babbageclunkdimitern: I end up seeing repeated calls to state.Close()11:25
dimiternbabbageclunk: and panics I presume?11:25
babbageclunkdimitern: nope11:26
dimiternbabbageclunk: by "cleanups" do you mean the cleanups collection?11:26
voidspacedimitern: fwereade: thanks for the reviews by the way11:30
fwereadevoidspace, sorry it was only a glance, I think you and dimitern have it in hand11:30
babbageclunkdimitern: Yes, functions registered in suite.addCleanup11:30
voidspacefwereade: cool11:30
babbageclunkdimitern: I'm going to change CleanupSuite to set them to nil after executing them.11:31
dimiternbabbageclunk: ah, that's different - that's for bits changed by PatchValue etc.11:31
fwereadebabbageclunk, multiple Close()s should be fine/safe, but we should have got them all out of the way before we hit Reset, I think11:31
dimiternbabbageclunk: setting that to nil without un-patching everything sounds really bad11:33
babbageclunkdimitern: Executing the cleanup *is* unpatching everything.11:33
dimiternbabbageclunk: or you mean they're already un-patched, but the slice is never nil11:33
babbageclunkdimitern: yes, that11:33
dimiternbabbageclunk: right, then it sounds safe11:33
babbageclunkfwereade: What do you mean by "should have got them all out of the way before Reset"?11:34
babbageclunkfwereade: Aren't the closes being done by Reset (which calls TearDownTest)?11:35
fwereadebabbageclunk, hm, maybe I have the wrong context, code reference?11:36
dimiternbabbageclunk: if anything, it should be the other way around: TearDownTest is called by gocheck, which probably calls Reset11:36
fwereadebabbageclunk, I was thinking of the nuke-db-state Reset? which I think undercuts everything else11:36
fwereadebabbageclunk, the dummy provider will do it to you in the middle of your tests if you're not careful >_<11:37
babbageclunkfwereade: oh, no - this is a method on the allwatcher_internal_test that calls teardown and then setup - it's called in the middle of a test (because the test is running multiple scenarios).11:38
fwereadebabbageclunk, oh ffs :(11:38
fwereadebabbageclunk, don't suppose I can prevail upon you to just pull out the tests that so clearly want to be separate and deserve a test runner written for the job?11:38
babbageclunkfwereade: maybe? I'm not sure I get the second bit.11:40
fwereadebabbageclunk, insight of katco's, crudely paraphrased by me: table-based tests are rubbish because you're writing your own test runner and that's a waste of effort11:41
fwereadebabbageclunk, TBTs do have their place but it's an incredibly narrow simple one11:41
fwereadebabbageclunk, if it doesn't look like a table any more, it's probably too complex to be a good TBT IMO11:41
fwereadebabbageclunk, also, imagine how nice it will be when the tests just have their own names11:42
dimiternbabbageclunk: i.e. instead of that test calling TearDownTest inside, it should have a cleanSetup and cleanCleanup helpers that do not depend on calling SetUpTest or TearDownTest11:42
fwereadebabbageclunk, and failures get reported individually11:42
fwereadebabbageclunk, and you don't need to find your failure in the middle of 30 pages of successful logs11:42
babbageclunkfwereade: Yeah, that makes sense - I think the reason for this one is that the tests are parameterised into allModelWatcher and allwatcher flavours.11:43
babbageclunkfwereade: But I'll have a go.11:43
fwereadebabbageclunk, bleh, I see -- thanks11:44
fwereadedimitern, I'm more saying, use SetUpTest and TearDownTest as expected, and write 50 TestFooBarBaz methods that don't abuse their infrastructure11:45
dimiternfwereade: I know :) I was just trying to make it slightly easier for babbageclunk11:45
fwereadedimitern, that way you get to see which bits failed, and see their logs in isolation, and not have to worry about Assert cutting short a run11:45
fwereadeetc :)11:45
babbageclunkfwereade: But still, I think the point is a good one - I'll see if I can figure out a better way of making the tests split out.11:46
fwereadebabbageclunk, cheers11:47
dimiternbabbageclunk: might help using multiple suites to better separate concerns11:47
dimiternand minimize code duplication around setup/teardown11:47
dimiternbabbageclunk: beware though, if you're embedding a "allWatcherBaseSuite" into e.g. "allModelWatcherSuite", and both of those are registered in gocheck (gc.Suite(..)), you'll get run all tests for both suites11:49
fwereadebabbageclunk, I admit I'm not really aware of a good pattern for running the same suite against different fixtures -- everything I can immediately think of smells too much of trying to do inheritance in golang, which is pretty much always a terrible idea11:50
babbageclunkfwereade: Well, at the bottom level I can give each of the not-quite-test funcs a name and then multiply out the actual tests.11:52
babbageclunkfwereade: then it's just a matter of editor automation11:52
fwereadebabbageclunk, yeah, I think that's progress11:52
fwereadebabbageclunk, dimitern: also, yeah -- I am getting increasingly irritated at suites that are also fixtures11:54
dimiternfwereade: me too!11:54
dimiternfwereade: I've fixed a bunch of those recently, around the provisioner11:55
fwereadebabbageclunk, dimitern: have been making a point of `(*FooSuite)` receivers and explicit fixture setup and I think it's working out pretty well11:55
fwereadedimitern, nice11:55
babbageclunkfwereade, dimitern: Oh, cool - I've been mostly doing that for a bit in Python too (since JB pointed me at a blog post about it)11:58
babbageclunkHmm, that said though - allwatcher_internal_test.go is 3244 lines, so maybe I'll rewrite them all in a separate diff.11:59
katcoanastasiamac: ericsnow: perrito666: meeting time12:04
fwereadebabbageclunk, holy shit :(12:07
katcoanastasiamac: perrito666: hello?12:08
perrito666katco: ericsnow natefinch anastasiamac getting there, I sent an email about being late12:15
mupBug #1582214 opened: upgrade-juju output is confusing <juju-core:New> <https://launchpad.net/bugs/1582214>12:45
rogpeppe1i'm after a review of this change to the GCE provider if anyone wants to help out, thanks! http://reviews.vapour.ws/r/4840/12:45
=== rogpeppe1 is now known as rogpeppe
voidspacedimitern: PR updated12:56
dimiternvoidspace: thanks, will look shortly12:57
voidspacedimitern: cool13:00
babbageclunkdimitern: Ha ha, it turns out that clearing out the transaction system's collections and then asking it to run another set of operations in a transaction doesn't work very well.13:10
babbageclunkdimitern: This is obvious in retrospect.13:10
fwereadebabbageclunk, haha13:11
fwereadebabbageclunk, (do you know about SetBeforeHooks et al?)13:11
babbageclunkfwereade: no - what are those?13:11
fwereadebabbageclunk, repeatable race tests for txns13:12
fwereadebabbageclunk, hooks in just before executing the txn, so you can change state underneath it and check it aborts as expected13:12
fwereadebabbageclunk, (and just after, if you want, as well: there are a few tests that just change state under every txn and restore it before the SUT reconstructs it and checks that the runner gives up13:13
babbageclunkfwereade: oh, neat13:13
fwereadebabbageclunk, they're not quite in the right place any more because the underlying functionality moved out of state so it's not really a state responsibility any more, but still ;)13:14
fwereadebabbageclunk, they come in very handy ;)13:14
fwereadebabbageclunk, search in state for examples13:14
babbageclunkfwereade: thanks - I'll see if I can use them for this.13:15
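A rough sketch of what SetBeforeHooks-style helpers buy you: the test schedules a state mutation to run just before the txn attempt, so the "race" happens deterministically and the abort path can be exercised repeatably. The store type and hook plumbing below are invented for illustration, not juju's actual testing API:

```go
package main

import (
	"errors"
	"fmt"
)

var errAborted = errors.New("transaction aborted")

// store is a toy transactional store: runTxn applies key->set only if the
// assert (the expected current value) still holds.
type store struct {
	data map[string]string
	// beforeHooks run just before each txn attempt, mimicking what a
	// SetBeforeHooks helper does: they let a test mutate state at the
	// worst possible moment, on purpose.
	beforeHooks []func()
}

func (s *store) runTxn(key, expect, set string) error {
	if len(s.beforeHooks) > 0 {
		hook := s.beforeHooks[0]
		s.beforeHooks = s.beforeHooks[1:]
		hook() // change state underneath the txn
	}
	if s.data[key] != expect {
		return errAborted
	}
	s.data[key] = set
	return nil
}

func main() {
	s := &store{data: map[string]string{"life": "alive"}}
	// The "test" schedules a state change to land under the txn:
	s.beforeHooks = []func(){func() { s.data["life"] = "dying" }}

	err := s.runTxn("life", "alive", "dead")
	fmt.Println(err) // aborted: the hook raced us, deterministically
}
```

This is why such hooks are preferable to collection-level RemoveAll for simulating concurrent changes: the mutation goes through the same interface the code under test uses, at an exactly controlled point.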
* dimitern needs to step out for ~1h13:24
perrito666katco: I got kicked out of the call brt13:34
perrito666k there is something not ok with google calendar, can anyone give me the link? natefinch?13:35
natefinchhttps://plus.google.com/hangouts/_/canonical.com/tanzanite?authuser=113:35
natefinchperrito666: log out and back in to google?13:36
perrito666natefinch: tx, google decided that I should logout of every account13:41
babbageclunkfwereade, dimitern: If I get an ErrAborted from running a transaction, how can I work out what's causing it?14:13
rogpeppedooferlad: i see you're OCR today... fancy a review? :) http://reviews.vapour.ws/r/4840/14:13
rogpeppebabbageclunk: you can't14:13
dooferladrogpeppe: *click*14:13
babbageclunkrogpeppe: :(14:13
rogpeppebabbageclunk: the transaction might have been run by another client14:13
rogpeppebabbageclunk: that's the way that mgo/txn works14:14
rogpeppebabbageclunk: the usual approach is to delve back into the db and see what might've been the cause14:14
rogpeppebabbageclunk: and yes, it's pretty crap14:14
babbageclunkrogpeppe: Ah, so it doesn't necessarily mean that the changes didn't get made?14:14
rogpeppebabbageclunk: if you get ErrAborted from a transaction, no changes have been made14:15
rogpeppebabbageclunk: transactions work by first checking all assertions, and only if all of them pass, applying all changes14:15
babbageclunkrogpeppe: right - makes sense. Thanks.14:15
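rogpeppe's description in miniature: every assert is checked before any change is applied, so ErrAborted guarantees nothing was written. This is a drastic simplification of mgo/txn's real client-driven protocol; op and run are invented for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

var errAborted = errors.New("transaction aborted")

// op is a cut-down stand-in for txn.Op: an assert that must hold,
// and a change to apply.
type op struct {
	assert func(map[string]string) bool
	apply  func(map[string]string)
}

// run checks all assertions first, and only if every one passes does it
// apply all the changes — so an abort means no change was made at all.
func run(db map[string]string, ops []op) error {
	for _, o := range ops {
		if !o.assert(db) {
			return errAborted
		}
	}
	for _, o := range ops {
		o.apply(db)
	}
	return nil
}

func main() {
	db := map[string]string{"a": "1"}
	err := run(db, []op{
		{
			assert: func(d map[string]string) bool { return d["a"] == "1" },
			apply:  func(d map[string]string) { d["a"] = "2" },
		},
		{
			// Fails: there is no doc "b", so the whole txn aborts.
			assert: func(d map[string]string) bool { _, ok := d["b"]; return ok },
			apply:  func(d map[string]string) { d["b"] = "x" },
		},
	})
	fmt.Println(err, db["a"]) // aborted, and "a" is untouched
}
```

This also shows why an abort can't tell you *which* assert failed: by the time the caller sees ErrAborted, all it knows is that at least one condition didn't hold, hence the delve-back-into-the-db approach rogpeppe describes.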
rogpeppedooferlad: ta!14:15
rogpeppedooferlad: BTW if you see mhilton's review, ignore it. he isn't a qualified juju-core reviewer. and he was part-author of the changes.14:17
dooferladsure14:17
fwereadebabbageclunk, you can't -- that's why you have to loop, and check all the things you depend on every time14:18
babbageclunkfwereade: In this case I've just deleted all of the things from all of the collections (but not the txns now :). So there really shouldn't be any asserts failing in the transaction.14:20
rogpeppebabbageclunk: you haven't got any asserts in your txn?14:20
fwereadebabbageclunk, hmm, unless they're all txn.DocMissing I would expect any that imply the existence of a doc to fail14:21
babbageclunkrogpeppe: They're all DocMissing...14:21
fwereadebabbageclunk, but, how are you deleting everything?14:21
fwereadebabbageclunk, once you've used mgo/txn on a document it's basicaly off-limits for everything else14:21
fwereadebabbageclunk, SetBeforeHooks is better suited to changes you can make via the exported interface14:22
babbageclunkfwereade: Ok, that might be it - I'm deleting them using collection.RemoveAll, so outside a transaction.14:22
fwereadebabbageclunk, how are you getting that collection? .Writeable()?14:22
babbageclunkfwereade: weirdly, it works fine in mongo3.2 (for some value of works), just not in 2.414:23
fwereade/ Collection imperfectly insulates clients from the capacity to write to14:23
fwereade/ MongoDB. Query results can still be used to write; and the Writeable14:23
fwereade/ method exposes the underlying *mgo.Collection when absolutely required;14:23
fwereade/ but the general expectation in juju is that writes will occur only via14:23
fwereade/ mgo/txn, and any layer-skipping is done only in exceptional and well-14:23
fwereade/ supported circumstances.14:23
babbageclunkfwereade: Maybe some context is in order. I'm working on this: https://bugs.launchpad.net/juju-core/+bug/157329414:25
mupBug #1573294: state tests run 100x slower with mongodb3.2 <juju-release-support> <mongo3> <mongodb> <juju-core:Triaged> <https://launchpad.net/bugs/1573294>14:25
babbageclunkfwereade: I don't know whether it constitutes exceptional circumstances though.14:26
fwereadebabbageclunk, ahh, ok... hmm14:26
babbageclunkSo I'm trying out an approach where instead of deleting and recreating the DB I scrape all of the data out and make it look all clean and new.14:27
babbageclunkfwereade: But I definitely might be trying to do this at the wrong level.14:28
fwereadebabbageclunk, thinking14:29
fwereadebabbageclunk, so, are we confident it's the index-creation that's slowing us down?14:29
babbageclunkfwereade: not totally - it's big, but it's not the only big part from my instrumenting.14:30
babbageclunkfwereade: I'm partly doing this in the expectation that it'll still be too slow.14:31
babbageclunkfwereade: (I was expecting it to be easier than it has been. ;)14:31
dooferladbabbageclunk: have you just tried deleting all the collections rather than deleting everything in them, then recreating them without indexes?14:32
fwereadebabbageclunk, do we know that a second EnsureIndex runs faster than the first one?14:32
babbageclunkfwereade: Oh, hang on - do you mean the second creation of a specific index?14:33
babbageclunkdooferlad: No, not yet. I tried commenting out the index creation and running the tests and it was still hella slow.14:34
fwereadebabbageclunk, yeah14:35
fwereadebabbageclunk, ok, interesting14:35
babbageclunkfwereade: In mongo 2.4 creating the first index is much slower than any subsequent (like 100x).14:35
babbageclunkfwereade: In 3.2 I don't see the same - they're all about the same time.14:36
babbageclunkfwereade: but I haven't tried dropping and recreating the same one (which I guess would be closer to what the tests are doing).14:36
babbageclunkfwereade: (oh, just looked back - in 2.4 it's 10x, not 100x for subsequent index creation)14:37
fwereadebabbageclunk, ok, and if we don't use indexes at all we don't actually save any significant time anyway? or is "hella slow" notably better than before? ;)14:38
babbageclunkfwereade: Well, in the tests the indexes are probably having minimal effect (maybe even slightly negative).14:39
fwereadebabbageclunk, I was imagining that there was a heavy per-test index-creation cost but that the presence or absence of indexes wouldn't make much difference to the test cases themselves14:42
babbageclunkfwereade: I think my numbers were that database teardown was also a big part of the time, but the tests themselves were also still big.14:42
mupBug #1582264 opened: remove-machine fails with false "machine X is hosting containers" <ci> <lxd> <remove-machine> <status> <juju-core:Triaged> <https://launchpad.net/bugs/1582264>14:42
babbageclunkfwereade: Yeah - that's right.14:43
fwereadebabbageclunk, ok, cool, I'm catching up :)14:43
babbageclunkfwereade: (Sorry - I'd rerun the tests to get numbers, but my timing is in a stash that won't apply after the bulldozing I've been doing.)14:44
fwereadebabbageclunk, no worries, I know how it is14:44
fwereadebabbageclunk, possible derail: have you spotted any 5s-ish waits at all?14:45
babbageclunkfwereade: mmmmmmaybe? Trying to remember.14:46
fwereadebabbageclunk, file it for future reference -- 5s waits usually mean that the txn log watcher is chilling out on its own schedule, and something needs to goose it into activity with a StartSync14:47
babbageclunkfwereade: certainly things were going from 0.05s to ~5ish s, but I don't think any one block of time was ~5s.14:47
fwereadebabbageclunk, or, it it happens all the time, it means StartSync is broken14:47
babbageclunkfwereade: Ok, I'll keep an eye out for that.14:48
fwereadebabbageclunk, and it wouldn't be every test anyway, basically just watcher ones14:48
mattywfwereade, as you're around I have a quick question14:49
fwereademattyw, go on14:49
mattywfwereade, this function is far from ideal because of the panic https://github.com/juju/names/blob/master/unit.go#L2614:49
mattywI know there must be a non panicing version somewhere, but I don't see it14:50
mattywthere's someway you can do it by going around the houses14:50
fwereademattyw, it would make me super happy if you were to write useful versions of those funcs14:50
fwereadeotherwise you're stuck assuming that every client obviously always has a valid unit name because, uh...14:51
mattywfwereade, if I was to do it in pursuit of a potential panic in the apiserver would I earn some kind of prize?14:51
fwereademattyw, my admiration and respect?14:52
mattywfwereade, hmmmm, I'll take it14:52
fwereade<314:52
fwereadealso beer next time we're in the same city14:52
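The kind of non-panicking constructor fwereade is asking for might look like the sketch below. Note this is hypothetical, not the real names package API, and the regexp is only an approximation of juju's unit-name rules:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validUnit approximates juju's unit-name pattern: an application name
// followed by "/" and a unit number (no leading zeros).
var validUnit = regexp.MustCompile(`^[a-z][a-z0-9-]*[a-z0-9]/(0|[1-9][0-9]*)$`)

// parseUnitName returns an error instead of panicking on bad input, so an
// API server can reject a malformed name gracefully rather than crash.
// It returns the tag string form, with "/" replaced by "-".
func parseUnitName(name string) (string, error) {
	if !validUnit.MatchString(name) {
		return "", fmt.Errorf("%q is not a valid unit name", name)
	}
	return "unit-" + strings.Replace(name, "/", "-", -1), nil
}

func main() {
	for _, name := range []string{"wordpress/0", "not-a-unit"} {
		tag, err := parseUnitName(name)
		fmt.Println(tag, err)
	}
}
```

The panicking constructor then becomes a thin wrapper for callers that genuinely hold already-validated names, while everything handling external input uses the error-returning version.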
babbageclunkdooferlad: If I was dropping the collections I'd need to do something to redo the collections that need explicit creation, wouldn't I?14:54
dooferladbabbageclunk: yes14:56
babbageclunkdooferlad: Ok, looking at it that doesn't seem too fiddly.14:57
babbageclunkdooferlad: Do you think that'd avoid the horrible transaction mess I've gotten myself into?14:57
babbageclunkdooferlad: because I'd like that.14:58
dooferladbabbageclunk: yes - if you delete the txn collection you will be home and dry14:58
dooferlad(probably)14:58
babbageclunkdooferlad: Hmm - It was deleting the txns (although not the collection itself) that got me last time.14:59
babbageclunkdooferlad: Still, worth a go!14:59
dooferladbabbageclunk: ask fwereade, but there must be a way of deleting basically everything and starting again.15:00
dooferladbabbageclunk: see https://github.com/go-mgo/mgo/blob/v2/txn/mgo_test.go15:03
mupBug #1582268 opened: TestInstancesGathering fails <intermittent-failure> <test-failure> <juju-core:Triaged> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1582268>15:04
babbageclunkdooferlad: Looks suspiciously like https://github.com/juju/testing/blob/master/mgo.go#L52215:05
babbageclunkdooferlad: I think it's that stuff that I'm trying to avoid doing.15:08
cheryljkatco, frobware - ping?15:17
frobwarecherylj: pong15:17
cheryljhey frobware, can I ask you and katco to help stay on top of CI bugs this week while we're all in vancouver?15:17
frobwarecherylj: yep, sure15:17
cheryljfrobware, katco - the latest run had a lot of bugs:  http://reports.vapour.ws/releases/397315:18
cheryljsinzui, abentley - could you guys help katco and frobware prioritize CI bugs?15:18
frobwarecherylj: who should I bug for the azure related failures?15:18
cheryljfrobware: i think for that one in particular, CI needs to clean up resource groups15:19
cheryljfrobware:  it is unclear if the expectation is that juju should be cleaning those up15:20
abentleycherylj: I am kind of in the middle of submitting bugs, but I am triaging them :-)15:20
cheryljabentley: thanks :)  please work with frobware and katco to get people assigned to blockers15:20
dooferladbabbageclunk: sorry tea + daughter break15:22
babbageclunkdooferlad: :) no worries15:23
dooferladbabbageclunk: the test there does a set up suite that starts the database then has a drop all, redial step.15:23
sinzuifrobware: CI exhausted its resources this weekend. Old and broken jujus are the likely cause. I cleaned up a few hours ago and am retesting15:23
frobwaresinzui: ack15:23
dooferladbabbageclunk: this is part of the txn testing15:24
babbageclunkdooferlad: Yup - ours does something similar, but the server startup is before the suite setup.15:24
dooferladbabbageclunk: oh, I thought we had one server per test15:25
=== frankban|afk is now known as frankban
babbageclunkdooferlad: no, the server is running the whole time - it's started here: https://github.com/juju/juju/blob/master/state/package_test.go#L1815:27
mupBug #1578834 changed: update-alternatives fails to switch between juju-1 and juju-2 <cpe-sa> <packaging> <juju-core:Invalid> <juju-release-tools:Triaged> <https://launchpad.net/bugs/1578834>15:34
dimiternfrobware: do you know if we're having the call with rick_h_ today?15:40
dooferladdimitern: he has cancelled15:44
dimiterndooferlad: ok, thanks15:44
dooferladdimitern: or, at least, he set his response to not attending for this week.15:45
dooferladdimitern: not that I got a message about it :-(15:45
dimiterndooferlad: yeah, I guess I should've looked for that first :)15:45
frobwaredimitern: no15:46
dimiternfrobware: thanks for following up on the private address question15:46
dooferladfrobware / dimitern: is there anything worthy of discussion before the end of day?15:46
dooferladfrobware / dimitern: in a hangout that is15:47
frobwaredooferlad: only to scan the CI failures to see if there's stuff we can take15:47
dooferladfrobware: probably better done tomorrow at start of day?15:50
mattywfwereade, you got time to talk about manifolds and charm upgrades?15:52
fwereademattyw, sure15:53
frobwaredimitern, voidspace, dooferlad, babbageclunk: PTAL @ http://reviews.vapour.ws/r/4844/15:55
dimiternfrobware: looking15:57
dimiternfrobware: LGTM16:07
frobwaredimitern: ty16:07
=== redir_ is now known as redir
frobwaredimitern: am I correct, ... we don't have any explicit tests for machine_linklayer devices?16:15
dimiternfrobware: in state?16:15
frobwareyep16:16
dimiternfrobware: there is complete coverage of that code16:16
hoenirv16:16
frobwaredimitern: through linklayer_devices_test?16:16
dimiternfrobware: there are white-box tests in _internal_ and black-box tests otherwise16:18
frobwaredimitern: ok16:18
dimiternfrobware: so linklayerdevices_internal_test.go and linklayerdevices_test.go16:19
dimiternfrobware: similarly for linklayerdevices_ipaddresses -16:19
frobwaredimitern: I was searching for the wrong pattern; was including machine16:20
dimiternfrobware: yeah, machine-related tests were split across multiple files now16:21
babbageclunkdooferlad: Hmm. Reusing the database actually does seem to bring the 3.2 performance into line with the original tests on 2.4. Except I can't run my updated tests against 2.4.17:05
voidspacedimitern: you still around?17:14
dimiternvoidspace: yeah17:14
voidspacedimitern: you got a minute or two spare to talk about linklayerdevices?17:15
dimiternvoidspace: sure - standup ho?17:15
voidspacedimitern: yep17:15
dimiternvoidspace: i'm in there now17:16
mupBug #1488245 opened: Recurring lxc issue: failed to retrieve the template to clone  <canonical-bootstack> <kanban-cross-team> <lxc> <juju-core:New> <https://launchpad.net/bugs/1488245>18:10
=== redir is now known as redir_afk
katconatefinch: hey any update on the bug? i think ian needs to give a demo sometime this week20:06
natefinchkatco: poking at it now... my wife got sick midday, so I had to take some time off, but actively working on it now and will spend time tonight on it as well, if I don't have it figured out soon.20:08
katconatefinch: k, lmk if you have to get pulled off again. ian needs a fix asap20:09
natefinchkatco: will do20:11
katconatefinch: (i.e. "today") :( lmk if you need help from anyone or another set of eyes or whatever20:12
natefinchkatco: lack of logging is hindering initial efforts at figuring it out, but that's easy enough to add in20:13
wallyworldnatefinch: katco: yeah, sorry about short turnaround time, needs to be done for a scheduled lightning talk20:26
wallyworldnatefinch: katco: and as you can imagine, given the audience here, it needs to work :-)20:27
natefinchwallyworld: np, able to repro, probably a charmstore issue, but trying to narrow it down to make sure right now.20:27
bdxmaas-juju-networking-peeps: hey, so initially a juju deployed maas node gets bridges created on its interfaces, but if I reboot the node, /etc/network/interfaces gets stomped, and juju created bridges are wiped20:33
bdx:-(20:34
bdxfiling a bug now20:34
bdxhttps://github.com/juju/juju/issues/540920:35
natefinchwallyworld, katco, ericsnow:  think I see the issue.  Store isn't returning an origin21:02
katconatefinch: how is our client handling that? why isn't it an error?21:03
natefinchkatco: trying to figure that out.  my guess is we're just retrying21:04
katconatefinch: good find, ty21:05
natefinchkatco: wrote a tiny CLI script frontend to our API client wrapper... was way faster than trying to iterate through bootstrap etc21:06
katconatefinch: very nice! love the quicker iteration :)21:06
natefinchericsnow, katco: http://pastebin.ubuntu.com/16468371/21:07
natefinchI gotta run for dinner, but I'll be back on later.  pretty sure the fix is just to default the origin to store21:08
katconatefinch: that would be nice if it was just a client side fix21:08
katcoericsnow: do you agree with that approach? and override if the resource comes from a --resource flag?21:09
natefinchkatco: this is just while downloading from the store itself21:09
katconatefinch: that seems sane. obviously couldn't originate from anywhere else21:10
natefinchkatco: in theory the store could just always return an origin of "store", but they're not, I guess on the theory that, duh it's from the store21:10
mupBug #1582408 opened: System id is listed during bootstrap and deploy of charms instead of system name <juju-core:New> <https://launchpad.net/bugs/1582408>21:11
ericsnowkatco: we *were* defaulting to store but that changed when we stopped working on the store/csclient code21:11
ericsnowkatco: so I'm on board with defaulting to that21:11
ericsnowkatco: gotta run for a couple hours (unexpectedly); will be back later21:46
katcoericsnow: k21:46
davecheneyso, all the managers are away at a sprint and master hasn't been blocked for days22:01
davecheneycoincidence ? unlikely :)22:01
katcodavecheney: lol i enjoy this theory22:02
davecheneyscience fiction > science fact22:16
natefinchkatco, ericsnow, wallyworld: btw, confirmed that fixes the bug.  Not really here, but I'll have a PR up in a couple hours-ish23:44
katconatefinch: awesome ty23:49

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!