[03:46] Bug #1582065 opened: Juju seems confused about a newly defined MAAS cloud
[06:44] anastasiamac, the only thing that springs to mind is to stop the machine agent, remove all evidence of the failed charm, and let the machine agent come up and deploy it again
[06:46] anastasiamac, (would have been better to do that first, but I can't immediately think of anything that the dying principal would break)
[06:49] anastasiamac, (my working theory is that worker/deployer failed half way through and left garbage it subsequently interpreted as a successful deployment)
[07:12] fwereade: thnx.. i think it's on the state server... so i'm not sure what the implications would be..
[07:12] i mean the unit was deployed on the state server machine
[07:16] fwereade: the only thing I can find is the agent.conf, so you're suggesting remove that?
[07:17] fwereade: as in /var/lib/juju/agents/unit-/agents.conf, I couldn't find any other evidence of it on disk
[07:17] fwereade: and it is deployed to a state server
[07:38] Bug #1582105 opened: lxd provider doesn't honour memory constraints
[07:41] bradm, anastasiamac: also check for the init system conf
[07:42] bradm, anastasiamac: state server shouldn't matter, it's not super-elegant to stop it but it shouldn't do any harm
[07:42] fwereade: no init system conf for it
[07:43] fwereade: so you're saying move the /var/lib/juju/agents/unit- and restart the machine agent?
[07:43] fwereade: bradm: got to run - family, dinner, etc... but i'll read the scroll \o/
[07:43] bradm, I think so, just reading the deployer code to see what I might be missing
[07:44] fwereade: it's odd, because this is the only subordinate of its type that failed
[07:45] bradm, also move the tools dir for that unit
[07:46] fwereade: ah, the symlink? can do
[07:46] bradm, (shouldn't *really* matter, but it's another thing the deployer is responsible for)
[07:46] bradm, what should happen when you start the machine agent is that it'll see unit-dying, nothing-deployed, and advance the unit to dead immediately
[07:47] bradm, ...wait a mo
[07:47] fwereade: ugh, I've already done that
[07:48] bradm, ah no don't worry :)
[07:48] nothing seems to have changed, though.
[07:48] jujud is chewing away at cpu though
[07:49] bradm, anything from juju.worker.deployer in the logs?
[07:49] fwereade: nope
[07:52] fwereade: the last thing was from juju.worker.firewaller
[07:53] bradm, sorry, just to be clear on the syncing: from a fresh start of the state server, we see *nothing* in the juju.worker.deployer logs? but e.g. status shows units assigned to the machine?
[07:54] bradm, I would expect at least a "checking unit %q" message or two...
[07:55] fwereade: I don't see anything about "checking unit" in the machine logs at all
[07:55] fwereade: oh, no, I'm wrong
[07:55] bradm, phew :)
[07:55] bradm, at least some part of the universe makes sense ;p
[07:55] fwereade: there's some from quite a number of hours ago, nothing for my problematic subordinate
[07:56] fwereade: I'm getting quite a bit of stuff thrown out to juju debug-log though, seemingly all from units
[07:56] bradm, huh, I don't see how it could have messed up the deploy in the first place without at least noticing and checking it
[07:56] fwereade: this jujud is seemingly quite flakey
[07:57] I seem to get a lot of "write: broken pipe" in the logs
[07:57] bradm, what sort of things are happening? that doesn't feel *too* unexpected if we're bouncing state servers
[07:58] fwereade: well, for a start a subordinate didn't deploy cleanly.. :)
[07:58] bradm, indeed :)
[07:58] fwereade: this was landscape-client, and only on the state server did it fail
[07:59] ugh, I'm going to run out of day, will have to disappear for dinner soon
[08:00] bradm, can you paste any of the logs anywhere for me to take a look at?
[08:00] there's a bunch of juju.apiserver logs about "login for machine-x-lxc-y blocked because upgrade in progress"
[08:00] which is odd, because it was deployed with 1.25.5 and hasn't been upgraded
[08:02] bradm, what that really means is the state server is still waking up and isn't yet certain that it's fully upgraded
[08:02] bradm, if it keeps saying that for a long time it's a problem, though
[08:04] macgreagoir: fancy seeing you here!
[08:05] bradm: :-D
[08:28] dimitern: ping
[08:28] frobware: pong
[08:29] dimitern: do you have 15 mins before standup - would like to sync on the private-address issue
[08:30] frobware: ok, in :45 ?
[08:31] dimitern: fine, thx
[08:40] does anyone have time to review my patch please?
[08:40] dimitern, fwereade: so what's with the smiley face on juju.fail? i can still see critical bugs open...
[08:41] hoenir: well, beta7 is out, that's why master is unblocked
[08:41] hoenir: sorry :) that ^^ was for rogpeppe1
[08:41] dimitern: so master doesn't get blocked on critical bugs any more?
[08:41] dimitern: woo!
[08:42] hoenir: I was about to say that most of us are less familiar with the windows support code and I'd suggest asking for a review from e.g. gsamfira or bogdanteleaga as they're most familiar with that code
[08:42] rogpeppe1: only for those with 'ci' and 'blocker' tags IIRC
[08:43] dimitern: ah, cool
[08:43] * rogpeppe1 hits $$merge$$ and hopes
[08:43] frobware: joining standup ho now
[08:43] dimitern: i thought it was gonna be months until i was able to land my "high" importance bug fix :)
[08:44] rogpeppe1: merge while you can :D
[08:45] dimitern, bogdanteleaga already reviewed my code and he said it was ok, but he advised me also to seek some more review on irc.
[08:54] dimitern: BTW, i just saw this comment in the ec2 provider: // TODO(dimitern): Both of these shouldn't be restricted for hosted models.
[08:55] dimitern: i think that, for the time being at least, region should remain restricted
[08:55] rogpeppe1: why so?
[08:55] dimitern: because we currently rely on the fact that models will be deployed to the same region
[08:56] dimitern: and there's no mechanism in place to determine a "default" region
[08:59] rogpeppe1: I see, well good that I haven't made 'region' unrestricted :)
[08:59] hoenir: I've updated my review.
[08:59] dimitern: looks like vpc id is still restricted though
[09:00] rogpeppe1: 'vpc-id-force' is
[09:00] dimitern: ah, what does that do?
[09:00] rogpeppe1: forces juju to use 'vpc-id' at bootstrap only
[09:01] dimitern: sorry, i don't understand that
[09:01] Why is it so hard to get from a PR to the originating branch in github? I always end up editing the URL.
[09:01] babbageclunk, thanks again
[09:02] voidspace, babbageclunk: standup?
[09:02] Shouldn't the branches in "hoenirvili wants to merge 1 commit into juju:master from hoenirvili:refactor-userdata-cloudconfig" be links?
[09:02] dimitern: omw
[09:02] Oh yeah, sorry - too much ranting.
[09:05] hoenir, made a couple of comments
[09:05] hoenir, hmm, (I should probably look at the filetoconst one more closely if it wasn't just a move)
[09:07] morning, folks
[09:08] babbageclunk, fwereade, thanks again for the comments and reviewing my code, I will start working right now to fix the mistakes.
[09:17] hoenir, yw, ping me if anything is not clear
[09:20] fwereade, after the modification I will ping you for more advice ..
[09:22] hoenir, cheers
[09:23] dimitern: babbageclunk: frobware: http://reviews.vapour.ws/r/4838/
[09:24] dooferlad: http://reviews.vapour.ws/r/4838/
[09:26] voidspace: that looks like it includes the spaces PR as well
[09:26] ?
[09:26] dimitern: oh yes it does - sorry
[09:26] dimitern: it requires it, and that is still landing
[09:26] dimitern: https://github.com/juju/juju/pull/5394
[09:27] dooferlad: that PR includes the "already reviewed but not yet landed" spaces PR - https://github.com/juju/juju/pull/5394
[09:27] voidspace: cheers
[09:31] voidspace - You can use rbt to change the base commit for the review so it doesn't include the other changes.
[09:31] babbageclunk: ah, ok
[09:32] hmm, if rbt worked for me
[09:32] it's python 2 code and my default python is now 3 (xenial) - I *assume* that is the cause of "no module named datetime" anyway
[09:32] rbt is in a virtualenv, I can blow it away and restart
[09:35] voidspace: what happens with AddSpace when you try to add a space with a yet-unknown name, but a providerID matching an existing space?
[09:36] dimitern: the same as before
[09:36] dimitern: how can the name be unknown?
[09:36] voidspace: won't you get ErrAborted from runTransaction due to txn.DocMissing failing for the providerID?
[09:36] dimitern: just looking
[09:36] voidspace: I mean add space "foo" with providerID "42", then try adding space "bar" also with provID "42"
[09:37] dimitern: it fails
[09:37] dimitern: there's a test for that
[09:37] dimitern: it fails with ProviderId not unique
[09:38] voidspace: hmm because of the refresh + check notfound yeah I see
[09:38] voidspace: ok, just checking :)
[09:38] dimitern: it would have been nice to avoid that Refresh
[09:39] nice when we have tests that already cover this case - makes refactoring with impunity possible :D
[09:39] yeah :-)
[09:39] voidspace, dimitern: Refresh-inside-txn is unhelpful anyway, it means that any call that changes remote state might also change arbitrary local state
[09:39] that test failed initially - I had to move the Refresh code
[09:39] voidspace: you can - looking up the providerIDsC doc by the providerID + global key of the space
[09:39] fwereade: it's "internal local state"
[09:39] i.e. findid()
[09:39] fwereade: the space we're refreshing is one created inside AddSpace and not returned
[09:40] voidspace, dimitern: ah cool
[09:40] fwereade: yeah, I think the refresh there is relatively safe, as it's adding a new doc
[09:40] we're already in an error path at that point
[09:41] voidspace, dimitern: I *would* say that the notion of an error path doesn't quite fit right with txns -- the loop should be (1) check everything in-memory, (2) try to apply the change with matching asserts
[09:42] voidspace, dimitern: so just always reading the space doc fresh might be cleanest?
[09:42] ...I should look at the code before pontificating too much
[09:42] fwereade: that's not a bad idea
[09:42] fwereade: what we're checking for in the assert is that another space with a different name doesn't have the same provider id
[09:42] so it isn't in memory
[09:42] and when we get ErrAborted we need to know why it failed, to return the right error (message)
[09:43] fwereade: however, in that particular case we have no loop (i.e. no buildTxn, but a single try-check-erraborted-and-fail-accordingly)
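For reference, a minimal sketch of the pattern voidspace and dimitern are discussing here: insert the space and its provider ID with txn.DocMissing asserts, and on txn.ErrAborted use cheap reads to decide which assert failed, so the right error can be returned without a Refresh. Collection, field, and key names are illustrative, not Juju's actual state schema.

```go
package spacesexample

import (
	"fmt"

	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/txn"
)

type spaceDoc struct {
	Name       string `bson:"_id"`
	ProviderID string `bson:"providerid,omitempty"`
}

type providerIDDoc struct {
	ID string `bson:"_id"` // e.g. "space:42"
}

// addSpace inserts the space doc and reserves its provider ID in one
// transaction. Both inserts assert txn.DocMissing, so the run aborts if
// either the name or the provider ID is already taken.
func addSpace(db *mgo.Database, name, providerID string) error {
	spaces := db.C("spaces")
	providerIDs := db.C("providerids")
	runner := txn.NewRunner(db.C("txns"))

	key := "space:" + providerID
	ops := []txn.Op{{
		C:      spaces.Name,
		Id:     name,
		Assert: txn.DocMissing,
		Insert: spaceDoc{Name: name, ProviderID: providerID}, // value, not pointer
	}, {
		C:      providerIDs.Name,
		Id:     key,
		Assert: txn.DocMissing,
		Insert: providerIDDoc{ID: key},
	}}

	if err := runner.Run(ops, "", nil); err != txn.ErrAborted {
		return err // nil on success, or a genuine database error
	}
	// Aborted: cheap reads tell us which assert failed, so we can return
	// a meaningful error without refreshing a half-built space value.
	if n, _ := spaces.FindId(name).Count(); n != 0 {
		return fmt.Errorf("space %q already exists", name)
	}
	if n, _ := providerIDs.FindId(key).Count(); n != 0 {
		return fmt.Errorf("provider ID %q not unique", providerID)
	}
	return fmt.Errorf("adding space %q: transaction aborted for an unknown reason", name)
}
```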
[09:43] the refresh failing confirms that it failed due to a duplicate provider id and not because the name already existed
[09:44] the refresh is unnecessary - we've already checked for the "already exists" case so it should be safe to *assume* that it was the other assert that failed
[09:44] we used to *have to refresh* because the failure was silent, no assert failure, so we had to check the insert worked by refreshing
[09:44] voidspace: hmm not quite though, we have 3 asserts - model alive, doc missing for spacesC, doc missing for providerIDsC
[09:44] ah yes
[09:45] the refresh is now only in the error path though, not on the happy path
[09:45] uhm
[09:45] so it's already better
[09:45] actually also 1 assert per subnet
[09:45] when subnetids is not empty
[09:45] voidspace, and that's the trouble with assuming based on asserts :) it's way too easy to lose track of the assumptions
[09:45] yes, but we check for that before the refresh as well
[09:45] so we'll leave the refresh in place then
[09:46] it's an unlikely (theoretically impossible) failure mode anyway...
[09:46] * dimitern brb
[09:50] * fwereade has seen the code, and, yeah, that Aborted branch is too long -- IMO it's crying out to be restructured to loop over: (1) check provider id not already used (2) check subnets all exist (3) shouldn't we check that we're not sneaking subnets away from other spaces?
[09:51] fwereade: that sounds better and more thorough
[09:52] then you (1) check sanity with a bunch of cheap reads instead of firing off an expensive txn without any reason to believe it works and (2) just let the runner mechanism handle the details
[09:52] if you've messed up you'll get an ErrExcessiveContention which usually actually means that your memory checks don't match your asserts
[09:53] voidspace: how about fixing that ^^ in a follow-up?
[09:53] dimitern, voidspace: the other reason to favour that structure, btw, is so you can build up your asserts alongside your memory checks
[09:53] dimitern, voidspace: having to dupe the logic sucks almost unbearably hard anyway
[09:54] fwereade: in the attempt > 0 case in buildTxn you mean?
[09:54] dimitern, voidspace: having the memory version of the code far away from the assert version is a recipe for screwups
[09:54] dimitern, I don't think you need to check attempt, do you?
[09:55] fwereade: even to know when to re-eval your local state?
[09:56] dimitern, yeah, but I thought there wasn't any here?
[09:56] dimitern, voidspace: https://github.com/juju/juju/wiki/mgo-txn-example attempts to exemplify what I'm talking about
[09:58] fwereade: it's not there in AddSpace code, but reading back I see what you mean - nice example btw! thanks
[09:58] dimitern, cheers
[10:02] fwereade: any particular preference between txn.Op{.., Insert: docType{}} vs txn.Op{.., Insert: &docType{}} ?
[10:03] I see the former is more common
[10:03] dimitern, I don't much care really -- I have a slight preference for passing values rather than pointers, to signal that modifications are not expected/intended
[10:04] dimitern, because I *have* seen people silently changing fields when given access
[10:04] fwereade: right, it also looks somewhat cleaner with a value
[10:04] dimitern, indeed
[10:05] dimitern, the only argument against I can think of is that it might take more memory -- and people *do* keep bringing this up -- but it kinda baffles me. maybe when your struct is 4k big or something? in the meantime we have gigs available, let's not fret ;)
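And a simplified, self-contained version of the loop fwereade is advocating (the shape shown on the mgo-txn-example wiki page): each attempt re-does the cheap reads and builds asserts that mirror exactly what was just checked, so repeated aborts point at drift between checks and asserts rather than genuine contention. The retry cap and sentinel error stand in for what the real juju/txn runner provides.

```go
package spacesexample

import (
	"errors"
	"fmt"

	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

var errExcessiveContention = errors.New("state changing too quickly; try again later")

// setSpaceProviderID shows the check-then-assert loop: every attempt
// re-reads the document and builds asserts that match exactly what was
// just checked, so an abort can only mean a concurrent change.
func setSpaceProviderID(db *mgo.Database, name, providerID string) error {
	spaces := db.C("spaces")
	runner := txn.NewRunner(db.C("txns"))

	for attempt := 0; attempt < 3; attempt++ {
		// (1) Cheap reads: verify the preconditions up front.
		var doc struct {
			ProviderID string `bson:"providerid"`
		}
		if err := spaces.FindId(name).One(&doc); err == mgo.ErrNotFound {
			return fmt.Errorf("space %q not found", name)
		} else if err != nil {
			return err
		}
		if doc.ProviderID == providerID {
			return nil // nothing to do
		}
		// (2) Asserts mirror the reads above, passed as values.
		ops := []txn.Op{{
			C:      spaces.Name,
			Id:     name,
			Assert: bson.D{{Name: "providerid", Value: doc.ProviderID}},
			Update: bson.D{{Name: "$set", Value: bson.D{{Name: "providerid", Value: providerID}}}},
		}}
		switch err := runner.Run(ops, "", nil); err {
		case nil:
			return nil
		case txn.ErrAborted:
			continue // someone changed the doc; re-read and retry
		default:
			return err
		}
	}
	// Repeated aborts usually mean the in-memory checks and the asserts
	// have drifted apart, rather than real contention.
	return errExcessiveContention
}
```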
[10:06] fwereade: yeah, I guess for storing binary blobs it makes more sense
[10:06] but even then..
[10:06] dimitern, well, yeah, and if it's a slice it won't copy anyway
[10:07] fwereade: what really scares me are maps as docs actually (or subdocs)
[10:07] dimitern, what's the case you need to worry about... non-pointer receivers on methods backed by array types? I think that copies the whole thing
[10:07] dimitern, yeah, indeed
[10:08] voidspace: reviewed
[10:08] dimitern, fixed a bug just the other day with internal maps leaking out uncopied
[10:08] fwereade: nasty :/
[10:08] dimitern, even then, I will worry about that when I write a type like that ;)
[10:09] dimitern, but anyway
[10:09] fwereade: yeah :)
[10:09] dimitern, I'm pretty sure our txn composers copy internally anyway
[10:14] fwereade: it seems they do
[10:57] voidspace, babbageclunk, dooferlad: a tiny review http://reviews.vapour.ws/r/4839/, please take a look
[11:04] thanks babbageclunk!
[11:23] dimitern: hmm - something interesting.
[11:23] babbageclunk: yeah?
[11:24] dimitern: because the state tests use Reset to teardown and re-setup in tests...
[11:24] dimitern: and cleanups don't get removed in teardown...
[11:25] dimitern: I end up seeing repeated calls to state.Close()
[11:25] babbageclunk: and panics I presume?
[11:26] dimitern: nope
[11:26] babbageclunk: by "cleanups" do you mean the cleanups collection?
[11:30] dimitern: fwereade: thanks for the reviews by the way
[11:30] voidspace, sorry it was only a glance, I think you and dimitern have it in hand
[11:30] dimitern: Yes, functions registered in suite.addCleanup
[11:30] fwereade: cool
[11:31] dimitern: I'm going to change CleanupSuite to set them to nil after executing them.
[11:31] babbageclunk: ah, that's different - that's for bits changed by PatchValue etc.
[11:31] babbageclunk, multiple Close()s should be fine/safe, but we should have got them all out of the way before we hit Reset, I think
[11:33] babbageclunk: setting that to nil without un-patching everything sounds really bad
[11:33] dimitern: Executing the cleanup *is* unpatching everything.
[11:33] babbageclunk: or you mean they're already un-patched, but the slice is never nil
[11:33] dimitern: yes, that
[11:33] babbageclunk: right, then it sounds safe
[11:34] fwereade: What do you mean by "should have got them all out of the way before Reset"?
[11:35] fwereade: Aren't the closes being done by Reset (which calls TearDownTest)?
[11:36] babbageclunk, hm, maybe I have the wrong context, code reference?
[11:36] babbageclunk: if anything, it should be the other way around: TearDownTest is called by gocheck, which probably calls Reset
[11:36] babbageclunk, I was thinking of the nuke-db-state Reset? which I think undercuts everything else
[11:37] babbageclunk, the dummy provider will do it to you in the middle of your tests if you're not careful >_<
[11:38] fwereade: oh, no - this is a method on the allwatcher_internal_test that calls teardown and then setup - it's called in the middle of a test (because the test is running multiple scenarios).
[11:38] babbageclunk, oh ffs :(
[11:38] babbageclunk, don't suppose I can prevail upon you to just pull out the tests that so clearly want to be separate and deserve a test runner written for the job?
[11:40] fwereade: maybe? I'm not sure I get the second bit.
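Roughly the CleanupSuite change babbageclunk describes above: run the registered cleanups in reverse order on teardown, then drop them, so a mid-test teardown/re-setup cycle cannot run the same cleanups (or Close the same state) twice. A simplified sketch, not the real juju/testing code.

```go
package testing

import gc "gopkg.in/check.v1"

// CleanupSuite collects cleanup functions registered during a test
// (for example by PatchValue) and runs them on teardown.
type CleanupSuite struct {
	cleanups []func(*gc.C)
}

// AddCleanup registers a function to run when the test finishes.
func (s *CleanupSuite) AddCleanup(f func(*gc.C)) {
	s.cleanups = append(s.cleanups, f)
}

// TearDownTest runs the cleanups in reverse registration order, like
// deferred calls, and then forgets them.
func (s *CleanupSuite) TearDownTest(c *gc.C) {
	for i := len(s.cleanups) - 1; i >= 0; i-- {
		s.cleanups[i](c)
	}
	// Reset the slice so a test that tears down and re-sets-up mid-run
	// (as allwatcher_internal_test does) cannot execute the same
	// cleanups, and hence the same state.Close calls, a second time.
	s.cleanups = nil
}
```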
[11:41] babbageclunk, insight of katco's, crudely paraphrased by me: table-based tests are rubbish because you're writing your own test runner and that's a waste of effort
[11:41] babbageclunk, TBTs do have their place but it's an incredibly narrow, simple one
[11:41] babbageclunk, if it doesn't look like a table any more, it's probably too complex to be a good TBT IMO
[11:42] babbageclunk, also, imagine how nice it will be when the tests just have their own names
[11:42] babbageclunk: i.e. instead of that test calling TearDownTest inside, it should have cleanSetup and cleanCleanup helpers that do not depend on calling SetUpTest or TearDownTest
[11:42] babbageclunk, and failures get reported individually
[11:42] babbageclunk, and you don't need to find your failure in the middle of 30 pages of successful logs
[11:43] fwereade: Yeah, that makes sense - I think the reason for this one is that the tests are parameterised into allModelWatcher and allwatcher flavours.
[11:43] fwereade: But I'll have a go.
[11:44] babbageclunk, bleh, I see -- thanks
[11:45] dimitern, I'm more saying, use SetUpTest and TearDownTest as expected, and write 50 TestFooBarBaz methods that don't abuse their infrastructure
[11:45] fwereade: I know :) I was just trying to make it slightly easier for babbageclunk
[11:45] dimitern, that way you get to see which bits failed, and see their logs in isolation, and not have to worry about Assert cutting short a run
[11:45] etc :)
[11:46] fwereade: But still, I think the point is a good one - I'll see if I can figure out a better way of making the tests split out.
[11:47] babbageclunk, cheers
[11:47] babbageclunk: might help using multiple suites to better separate concerns
[11:47] and minimize code duplication around setup/teardown
[11:49] babbageclunk: beware though, if you're embedding an "allWatcherBaseSuite" into e.g. "allModelWatcherSuite", and both of those are registered in gocheck (gc.Suite(..)), you'll get all tests run for both suites
[11:50] babbageclunk, I admit I'm not really aware of a good pattern for running the same suite against different fixtures -- everything I can immediately think of smells too much of trying to do inheritance in golang, which is pretty much always a terrible idea
[11:52] fwereade: Well, at the bottom level I can give each of the not-quite-test funcs a name and then multiply out the actual tests.
[11:52] fwereade: then it's just a matter of editor automation
[11:52] babbageclunk, yeah, I think that's progress
[11:54] babbageclunk, dimitern: also, yeah -- I am getting increasingly irritated at suites that are also fixtures
[11:54] fwereade: me too!
[11:55] fwereade: I've fixed a bunch of those recently, around the provisioner
[11:55] babbageclunk, dimitern: have been making a point of `(*FooSuite)` receivers and explicit fixture setup and I think it's working out pretty well
[11:55] dimitern, nice
[11:58] fwereade, dimitern: Oh, cool - I've been mostly doing that for a bit in Python too (since JB pointed me at a blog post about it)
[11:59] Hmm, that said though - allwatcher_internal_test.go is 3244 lines, so maybe I'll rewrite them all in a separate diff.
[12:04] anastasiamac: ericsnow: perrito666: meeting time
[12:07] babbageclunk, holy shit :(
[12:08] anastasiamac: perrito666: hello?
[12:15] katco: ericsnow natefinch anastasiamac getting there, I sent an email about being late
[12:45] Bug #1582214 opened: upgrade-juju output is confusing
[12:45] i'm after a review of this change to the GCE provider if anyone wants to help out, thanks! http://reviews.vapour.ws/r/4840/
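Picking up the test-structure thread above (named TestFooBarBaz methods, explicit fixtures, `(*FooSuite)` receivers), a gocheck-flavoured sketch of the shape fwereade is suggesting; the suite and fixture names are made up for illustration.

```go
package example_test

import (
	"testing"

	gc "gopkg.in/check.v1"
)

// Hook gocheck into the standard test runner.
func Test(t *testing.T) { gc.TestingT(t) }

// watcherFixture holds everything a single test needs; it is built
// explicitly inside each test instead of being smeared across SetUpTest.
type watcherFixture struct {
	// connections, fake backing, etc. would live here
}

func newWatcherFixture(c *gc.C) *watcherFixture {
	return &watcherFixture{}
}

func (f *watcherFixture) Close(c *gc.C) {}

// allWatcherSuite deliberately carries no state, so the (*allWatcherSuite)
// receivers never tempt one test to lean on another's fixture.
type allWatcherSuite struct{}

var _ = gc.Suite(&allWatcherSuite{})

func (*allWatcherSuite) TestAddMachine(c *gc.C) {
	f := newWatcherFixture(c)
	defer f.Close(c)
	// scenario that previously lived in one row of the table
}

func (*allWatcherSuite) TestRemoveMachine(c *gc.C) {
	f := newWatcherFixture(c)
	defer f.Close(c)
	// another scenario, reported and logged under its own name
}
```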
=== rogpeppe1 is now known as rogpeppe
[12:56] dimitern: PR updated
[12:57] voidspace: thanks, will look shortly
[13:00] dimitern: cool
[13:10] dimitern: Ha ha, it turns out that clearing out the transaction system's collections and then asking it to run another set of operations in a transaction doesn't work very well.
[13:10] dimitern: This is obvious in retrospect.
[13:11] babbageclunk, haha
[13:11] babbageclunk, (do you know about SetBeforeHooks et al?)
[13:11] fwereade: no - what are those?
[13:12] babbageclunk, repeatable race tests for txns
[13:12] babbageclunk, hooks in just before executing the txn, so you can change state underneath it and check it aborts as expected
[13:13] babbageclunk, (and just after, if you want, as well: there are a few tests that just change state under every txn and restore it before the SUT reconstructs it and checks that the runner gives up)
[13:13] fwereade: oh, neat
[13:14] babbageclunk, they're not quite in the right place any more because the underlying functionality moved out of state so it's not really a state responsibility any more, but still ;)
[13:14] babbageclunk, they come in very handy ;)
[13:14] babbageclunk, search in state for examples
[13:15] fwereade: thanks - I'll see if I can use them for this.
[13:24] * dimitern needs to step out for ~1h
[13:34] katco: I got kicked out of the call brt
[13:35] k there is something not ok with google calendar, can anyone give me the link? natefinch?
[13:35] https://plus.google.com/hangouts/_/canonical.com/tanzanite?authuser=1
[13:36] perrito666: log out and back in to google?
[13:41] natefinch: tx, google decided that I should log out of every account
[14:13] fwereade, dimitern: If I get an ErrAborted from running a transaction, how can I work out what's causing it?
[14:13] dooferlad: i see you're OCR today... fancy a review? :) http://reviews.vapour.ws/r/4840/
[14:13] babbageclunk: you can't
[14:13] rogpeppe: *click*
[14:13] rogpeppe: :(
[14:13] babbageclunk: the transaction might have been run by another client
[14:14] babbageclunk: that's the way that mgo/txn works
[14:14] babbageclunk: the usual approach is to delve back into the db and see what might've been the cause
[14:14] babbageclunk: and yes, it's pretty crap
[14:14] rogpeppe: Ah, so it doesn't necessarily mean that the changes didn't get made?
[14:15] babbageclunk: if you get ErrAborted from a transaction, no changes have been made
[14:15] babbageclunk: transactions work by first checking all assertions, and only if all of them pass, applying all changes
[14:15] rogpeppe: right - makes sense. Thanks.
[14:15] dooferlad: ta!
[14:17] dooferlad: BTW if you see mhilton's review, ignore it. he isn't a qualified juju-core reviewer. and he was part-author of the changes.
[14:17] sure
[14:18] babbageclunk, you can't -- that's why you have to loop, and check all the things you depend on every time
[14:20] fwereade: In this case I've just deleted all of the things from all of the collections (but not the txns now :). So there really shouldn't be any asserts failing in the transaction.
[14:20] babbageclunk: you haven't got an asserts in your txn?
[14:21] s/an/any/
[14:21] babbageclunk, hmm, unless they're all txn.DocMissing I would expect any that imply the existence of a doc to fail
[14:21] rogpeppe: They're all DocMissing...
[14:21] babbageclunk, but, how are you deleting everything?
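The idea behind the SetBeforeHooks helpers mentioned above, stripped to its essentials: run a hook that mutates state immediately before the transaction is attempted, then check that the operation aborts the way it should. This illustrates the mechanism only; it is not the real state/testing API or its signature.

```go
package txntest

import (
	"gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
	"gopkg.in/mgo.v2/txn"
)

// hookedRunner invokes before() immediately ahead of each Run call,
// giving a test a reliable window in which to change state underneath
// the transaction and provoke the abort path on demand.
type hookedRunner struct {
	runner *txn.Runner
	before func()
}

func (r *hookedRunner) Run(ops []txn.Op) error {
	if r.before != nil {
		r.before()
	}
	return r.runner.Run(ops, "", nil)
}

// forceAbort demonstrates the pattern: the hook sneaks a conflicting
// document in, exactly as a concurrent client would, so the system
// under test's txn.DocMissing assert fails on every run.
func forceAbort(db *mgo.Database) error {
	conflict := func() {
		other := txn.NewRunner(db.C("txns"))
		_ = other.Run([]txn.Op{{
			C:      "spaces",
			Id:     "foo",
			Assert: txn.DocMissing,
			Insert: bson.M{"providerid": "42"},
		}}, "", nil)
	}
	r := &hookedRunner{runner: txn.NewRunner(db.C("txns")), before: conflict}
	// This insert should now return txn.ErrAborted deterministically.
	return r.Run([]txn.Op{{
		C:      "spaces",
		Id:     "foo",
		Assert: txn.DocMissing,
		Insert: bson.M{"providerid": "43"},
	}})
}
```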
[14:21] babbageclunk, once you've used mgo/txn on a document it's basically off-limits for everything else
[14:22] babbageclunk, SetBeforeHooks is better suited to changes you can make via the exported interface
[14:22] fwereade: Ok, that might be it - I'm deleting them using collection.RemoveAll, so outside a transaction.
[14:22] babbageclunk, how are you getting that collection? .Writeable()?
[14:23] fwereade: weirdly, it works fine in mongo 3.2 (for some value of "works"), just not in 2.4
[14:23] // Collection imperfectly insulates clients from the capacity to write to
[14:23] // MongoDB. Query results can still be used to write; and the Writeable
[14:23] // method exposes the underlying *mgo.Collection when absolutely required;
[14:23] // but the general expectation in juju is that writes will occur only via
[14:23] // mgo/txn, and any layer-skipping is done only in exceptional and well-
[14:23] // supported circumstances.
[14:25] fwereade: Maybe some context is in order. I'm working on this: https://bugs.launchpad.net/juju-core/+bug/1573294
[14:25] Bug #1573294: state tests run 100x slower with mongodb3.2
[14:26] fwereade: I don't know whether it constitutes exceptional circumstances though.
[14:26] babbageclunk, ahh, ok... hmm
[14:27] So I'm trying out an approach where instead of deleting and recreating the DB I scrape all of the data out and make it look all clean and new.
[14:28] fwereade: But I definitely might be trying to do this at the wrong level.
[14:29] babbageclunk, thinking
[14:29] babbageclunk, so, are we confident it's the index-creation that's slowing us down?
[14:30] fwereade: not totally - it's big, but it's not the only big part from my instrumenting.
[14:31] fwereade: I'm partly doing this in the expectation that it'll still be too slow.
[14:31] fwereade: (I was expecting it to be easier than it has been. ;)
[14:32] babbageclunk: have you just tried deleting all the collections rather than deleting everything in them, then recreating them without indexes?
[14:32] babbageclunk, do we know that a second EnsureIndex runs faster than the first one?
[14:33] fwereade: Oh, hang on - do you mean the second creation of a specific index?
[14:34] dooferlad: No, not yet. I tried commenting out the index creation and running the tests and it was still hella slow.
[14:35] babbageclunk, yeah
[14:35] babbageclunk, ok, interesting
[14:35] fwereade: In mongo 2.4 creating the first index is much slower than any subsequent one (like 100x).
[14:36] fwereade: In 3.2 I don't see the same - they're all about the same time.
[14:36] fwereade: but I haven't tried dropping and recreating the same one (which I guess would be closer to what the tests are doing).
[14:37] fwereade: (oh, just looked back - in 2.4 it's 10x, not 100x for subsequent index creation)
[14:38] babbageclunk, ok, and if we don't use indexes at all we don't actually save any significant time anyway? or is "hella slow" notably better than before? ;)
[14:39] fwereade: Well, in the tests the indexes are probably having minimal effect (maybe even slightly negative).
[14:42] babbageclunk, I was imagining that there was a heavy per-test index-creation cost but that the presence or absence of indexes wouldn't make much difference to the test cases themselves
[14:42] fwereade: I think my numbers were that database teardown was also a big part of the time, but the tests themselves were also still big.
[14:42] Bug #1582264 opened: remove-machine fails with false "machine X is hosting containers"
[14:43] fwereade: Yeah - that's right.
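A quick way to instrument the first-vs-subsequent EnsureIndex cost being compared between mongo 2.4 and 3.2; the collection and keys are arbitrary, and the 10x/100x figures above come from babbageclunk's own measurements, not this snippet.

```go
package indextiming

import (
	"fmt"
	"time"

	"gopkg.in/mgo.v2"
)

// timeIndexCreation creates a series of distinct indexes on one
// collection and prints how long each EnsureIndex call takes, which is
// enough to compare first-index vs subsequent-index cost on 2.4 and 3.2.
func timeIndexCreation(db *mgo.Database, rounds int) {
	coll := db.C("indextiming")
	for i := 0; i < rounds; i++ {
		index := mgo.Index{Key: []string{fmt.Sprintf("field%d", i)}}
		start := time.Now()
		if err := coll.EnsureIndex(index); err != nil {
			fmt.Println("EnsureIndex failed:", err)
			return
		}
		fmt.Printf("index %d took %v\n", i, time.Since(start))
	}
}
```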
[14:43] babbageclunk, ok, cool, I'm catching up :)
[14:44] fwereade: (Sorry - I'd rerun the tests to get numbers, but my timing is in a stash that won't apply after the bulldozing I've been doing.)
[14:44] babbageclunk, no worries, I know how it is
[14:45] babbageclunk, possible derail: have you spotted any 5s-ish waits at all?
[14:46] fwereade: mmmmmmaybe? Trying to remember.
[14:47] babbageclunk, file it for future reference -- 5s waits usually mean that the txn log watcher is chilling out on its own schedule, and something needs to goose it into activity with a StartSync
[14:47] fwereade: certainly things were going from 0.05s to ~5ish s, but I don't think any one block of time was ~5s.
[14:47] babbageclunk, or, if it happens all the time, it means StartSync is broken
[14:48] fwereade: Ok, I'll keep an eye out for that.
[14:48] babbageclunk, and it wouldn't be every test anyway, basically just watcher ones
[14:49] fwereade, as you're around I have a quick question
[14:49] mattyw, go on
[14:49] fwereade, this function is far from ideal because of the panic https://github.com/juju/names/blob/master/unit.go#L26
[14:50] I know there must be a non-panicking version somewhere, but I don't see it
[14:50] there's some way you can do it by going around the houses
[14:50] mattyw, it would make me super happy if you were to write useful versions of those funcs
[14:51] otherwise you're stuck assuming that every client obviously always has a valid unit name because, uh...
[14:51] fwereade, if I was to do it in pursuit of a potential panic in the apiserver would I earn some kind of prize?
[14:52] mattyw, my admiration and respect?
[14:52] fwereade, hmmmm, I'll take it
[14:52] <3
[14:52] also beer next time we're in the same city
[14:54] dooferlad: If I was dropping the collections I'd need to do something to redo the collections that need explicit creation, wouldn't I?
[14:56] babbageclunk: yes
[14:57] dooferlad: Ok, looking at it that doesn't seem too fiddly.
[14:57] dooferlad: Do you think that'd avoid the horrible transaction mess I've gotten myself into?
[14:58] dooferlad: because I'd like that.
[14:58] babbageclunk: yes - if you delete the txn collection you will be home and dry
[14:58] (probably)
[14:59] dooferlad: Hmm - It was deleting the txns (although not the collection itself) that got me last time.
[14:59] dooferlad: Still, worth a go!
[15:00] babbageclunk: ask fwereade, but there must be a way of deleting basically everything and starting again.
[15:03] babbageclunk: see https://github.com/go-mgo/mgo/blob/v2/txn/mgo_test.go
[15:04] Bug #1582268 opened: TestInstancesGathering fails
[15:05] dooferlad: Looks suspiciously like https://github.com/juju/testing/blob/master/mgo.go#L522
[15:08] dooferlad: I think it's that stuff that I'm trying to avoid doing.
[15:17] katco, frobware - ping?
[15:17] cherylj: pong
[15:17] hey frobware, can I ask you and katco to help stay on top of CI bugs this week while we're all in vancouver?
[15:17] cherylj: yep, sure
[15:18] frobware, katco - the latest run had a lot of bugs: http://reports.vapour.ws/releases/3973
[15:18] sinzui, abentley - could you guys help katco and frobware prioritize CI bugs?
[15:18] cherylj: who should I bug for the azure-related failures?
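On the panicking names function mattyw links above, a non-panicking wrapper could look like the following; unitTag here is hypothetical, not something the names package actually exports.

```go
package namesexample

import (
	"fmt"

	"github.com/juju/names"
)

// unitTag is a hypothetical non-panicking alternative to
// names.NewUnitTag, which panics when handed an invalid unit name:
// validate first, and hand the caller an error instead.
func unitTag(unitName string) (names.UnitTag, error) {
	if !names.IsValidUnit(unitName) {
		return names.UnitTag{}, fmt.Errorf("%q is not a valid unit name", unitName)
	}
	return names.NewUnitTag(unitName), nil
}
```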
[15:19] frobware: i think for that one in particular, CI needs to clean up resource groups
[15:20] frobware: it is unclear if the expectation is that juju should be cleaning those up
[15:20] cherylj: I am kind of in the middle of submitting bugs, but I am triaging them :-)
[15:20] abentley: thanks :) please work with frobware and katco to get people assigned to blockers
[15:22] babbageclunk: sorry tea + daughter break
[15:23] dooferlad: :) no worries
[15:23] babbageclunk: the test there does a suite setup that starts the database, then has a drop-all, redial step.
[15:23] frobware: CI exhausted its resources this weekend. Old and broken jujus are the likely cause. I cleaned up a few hours ago and am retesting
[15:23] sinzui: ack
[15:24] babbageclunk: this is part of the txn testing
[15:24] dooferlad: Yup - ours does something similar, but the server startup is before the suite setup.
[15:25] babbageclunk: oh, I thought we had one server per test
=== frankban|afk is now known as frankban
[15:27] dooferlad: no, the server is running the whole time - it's started here: https://github.com/juju/juju/blob/master/state/package_test.go#L18
[15:34] Bug #1578834 changed: update-alternatives fails to switch between juju-1 and juju-2
[15:40] frobware: do you know if we're having the call with rick_h_ today?
[15:44] dimitern: he has cancelled
[15:44] dooferlad: ok, thanks
[15:45] dimitern: or, at least, he set his response to not attending for this week.
[15:45] dimitern: not that I got a message about it :-(
[15:45] dooferlad: yeah, I guess I should've looked for that first :)
[15:46] dimitern: no
[15:46] frobware: thanks for following up on the private address question
[15:46] frobware / dimitern: is there anything worthy of discussion before the end of day?
[15:47] frobware / dimitern: in a hangout that is
[15:47] dooferlad: only to scan the CI failures to see if there's stuff we can take
[15:50] frobware: probably better done tomorrow at start of day?
[15:52] fwereade, you got time to talk about manifolds and charm upgrades?
[15:53] mattyw, sure
[15:55] dimitern, voidspace, dooferlad, babbageclunk: PTAL @ http://reviews.vapour.ws/r/4844/
[15:57] frobware: looking
[16:07] frobware: LGTM
[16:07] dimitern: ty
=== redir_ is now known as redir
[16:15] dimitern: am I correct, ... we don't have any explicit tests for machine_linklayer devices?
[16:15] frobware: in state?
[16:16] yep
[16:16] frobware: there is complete coverage of that code
[16:16] v
[16:16] dimitern: through linklayer_devices_test?
[16:18] frobware: there are white-box tests in _internal_ and black-box tests otherwise
[16:18] dimitern: ok
[16:19] frobware: so linklayerdevices_internal_test.go and linklayerdevices_test.go
[16:19] frobware: similarly for linklayerdevices_ipaddresses -
[16:20] dimitern: I was searching for the wrong pattern; was including machine
[16:21] frobware: yeah, machine-related tests are split across multiple files now
[17:05] dooferlad: Hmm. Reusing the database actually does seem to bring the 3.2 performance into line with the original tests on 2.4. Except I can't run my updated tests against 2.4.
[17:14] dimitern: you still around?
[17:14] voidspace: yeah
[17:15] dimitern: you got a minute or two spare to talk about linklayerdevices?
[17:15] voidspace: sure - standup ho?
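A rough sketch of the reuse-the-server approach babbageclunk reports at 17:05: rather than restarting mongod or dropping the whole database per test, drop each collection and explicitly recreate the ones that need special options. The capped-collection name and size are placeholders.

```go
package mgotest

import (
	"strings"

	"gopkg.in/mgo.v2"
)

// resetDatabase empties a database for the next test without restarting
// the server. Collections that must exist up front with special options
// (for example a capped log) are recreated explicitly afterwards.
func resetDatabase(db *mgo.Database) error {
	names, err := db.CollectionNames()
	if err != nil {
		return err
	}
	for _, name := range names {
		if strings.HasPrefix(name, "system.") {
			continue // leave mongo's own collections alone
		}
		if err := db.C(name).DropCollection(); err != nil {
			return err
		}
	}
	// Placeholder: recreate whatever capped collections the code under
	// test expects to find; the name and size here are arbitrary.
	return db.C("txns.log").Create(&mgo.CollectionInfo{
		Capped:   true,
		MaxBytes: 10 * 1024 * 1024,
	})
}
```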
[17:15] dimitern: yep
[17:16] voidspace: i'm in there now
[18:10] Bug #1488245 opened: Recurring lxc issue: failed to retrieve the template to clone
=== redir is now known as redir_afk
[20:06] natefinch: hey any update on the bug? i think ian needs to give a demo sometime this week
[20:08] katco: poking at it now... my wife got sick midday, so I had to take some time off, but actively working on it now and will spend time tonight on it as well, if I don't have it figured out soon.
[20:09] natefinch: k, lmk if you have to get pulled off again. ian needs a fix asap
[20:11] katco: will do
[20:12] natefinch: (i.e. "today") :( lmk if you need help from anyone or another set of eyes or whatever
[20:13] katco: lack of logging is hindering initial efforts at figuring it out, but that's easy enough to add in
[20:26] natefinch: katco: yeah, sorry about the short turnaround time, it needs to be done for a scheduled lightning talk
[20:27] natefinch: katco: and as you can imagine, given the audience here, it needs to work :-)
[20:27] wallyworld: np, able to repro, probably a charmstore issue, but trying to narrow it down to make sure right now.
[20:33] maas-juju-networking-peeps: hey, so initially a juju-deployed maas node gets bridges created on its interfaces, but if I reboot the node, /etc/network/interfaces gets stomped, and the juju-created bridges are wiped
[20:34] :-(
[20:34] filing a bug now
[20:35] https://github.com/juju/juju/issues/5409
[21:02] wallyworld, katco, ericsnow: think I see the issue. Store isn't returning an origin
[21:03] natefinch: how is our client handling that? why isn't it an error?
[21:04] katco: trying to figure that out. my guess is we're just retrying
[21:05] natefinch: good find, ty
[21:06] katco: wrote a tiny CLI script frontend to our API client wrapper... was way faster than trying to iterate through bootstrap etc
[21:06] natefinch: very nice! love the quicker iteration :)
[21:07] ericsnow, katco: http://pastebin.ubuntu.com/16468371/
[21:08] I gotta run for dinner, but I'll be back on later. pretty sure the fix is just to default the origin to store
[21:08] natefinch: that would be nice if it was just a client-side fix
[21:09] ericsnow: do you agree with that approach? and override if the resource comes from a --resource flag?
[21:09] katco: this is just while downloading from the store itself
[21:10] natefinch: that seems sane. obviously couldn't originate from anywhere else
[21:10] katco: in theory the store could just always return an origin of "store", but they're not, I guess on the theory that, duh, it's from the store
[21:11] Bug #1582408 opened: System id is listed during bootstrap and deploy of charms instead of system name
[21:11] katco: we *were* defaulting to store but that changed when we stopped working on the store/csclient code
[21:11] katco: so I'm on board with defaulting to that
[21:46] katco: gotta run for a couple hours (unexpectedly); will be back later
[21:46] ericsnow: k
[22:01] so, all the managers are away at a sprint and master hasn't been blocked for days
[22:01] coincidence? unlikely :)
[22:02] davecheney: lol i enjoy this theory
[22:16] science fiction > science fact
[23:44] katco, ericsnow, wallyworld: btw, confirmed that fixes the bug. Not really here, but I'll have a PR up in a couple hours-ish
[23:49] natefinch: awesome ty
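The gist of the fix natefinch describes (defaulting a missing resource origin to "store" on the client side), sketched with stand-in types rather than the actual csclient wrapper that was patched:

```go
package resourcesexample

// Origin records where a charm resource came from.
type Origin string

const (
	OriginUnknown Origin = ""
	OriginStore   Origin = "store"
	OriginUpload  Origin = "upload"
)

// ResourceInfo stands in for the resource metadata returned by the
// charm store API client.
type ResourceInfo struct {
	Name     string
	Revision int
	Origin   Origin
}

// defaultToStoreOrigin fills in the origin for resources fetched from
// the charm store when the store leaves the field empty: a resource
// served by the store is, by definition, a store resource. Resources
// supplied via --resource keep their upload origin untouched because
// they never pass through this path.
func defaultToStoreOrigin(resources []ResourceInfo) []ResourceInfo {
	for i, res := range resources {
		if res.Origin == OriginUnknown {
			resources[i].Origin = OriginStore
		}
	}
	return resources
}
```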