[01:19] thumper: http://reviews.vapour.ws/r/3149/ -- FYI
[01:20] thumper: I'd be keen to know if you still get errors with this applied, since I had to really work hard to get the peergrouper to reliably fail
[01:26] thumper: and another easy one: http://reviews.vapour.ws/r/3150/
[01:26] axw: I'll grab it down and try
[01:29] axw: yay, that looks like it fixed it, for me at least
[01:29] thumper: sweet
[01:52] menn0 thumper: can I retarget https://bugs.launchpad.net/juju-core/+bug/1516144 to a different series? it's blocking master but it's not on the master branch...
[01:52] Bug #1516144: Cannot deploy charms in jes envs
[01:52] axw: yeah, it is
[01:53] thumper: oh. job says functional-jes
[01:53] axw: menn0 is currently looking at it
[01:53] ok then
[01:53] yes it is the master branch, but the JES test
[01:53] righto
[01:53] sorry
[01:53] menn0 has fixed one of the problems, but CI failed again with an IP address error
[01:54] so... weird
[02:18] thumper: if I repeat what the CI test does with the local provider it all works
[02:18] thumper: trying with joyent now
=== cmars` is now known as cmars
[02:58] menn0: any joy with joyent?
[03:06] thumper: everything works fine when I do it manually
[03:06] with local provider and joyent
[03:06] FFS
[03:06] i'm going to look over the logs from the test failure again in more detail
[03:07] I'd reply to the curse email, and make sure it is addressed to Curtis and Aaron, cc juju-dev
[03:07] let them know
[03:08] the test is seeing the dummy-source/0 unit in the first hosted environment is in an error state
[03:08] no idea why or how
[03:08] hmm
[03:09] thumper: the test is passing a config.yaml to create-environment
[03:09] I wonder if that is broken somehow
[03:09] I kinda guessed with that
[03:09] hmm..
[03:35] menn0: there is a cursed email to reply to now
[03:36] thumper: thanks, I will
[03:36] thumper: I'm using the actual CI test script now
[04:40] Bug #1331151 changed: 'juju destroy-environment' sometimes errors
[04:49] axw: ping ?
[04:49] davecheney: pong
[04:50] axw: you mentioned you had some fixes for the peergrouper ?
[04:50] did they land ?
[04:50] ?
[04:50] davecheney: no, master is blocked
[04:51] davecheney: fixes here: http://reviews.vapour.ws/r/3149/
[04:57] axw: crap
[05:01] menn0: is there some trick to getting hosted env... oh, oh dear
[05:01] # TODO(gz): May want to gather logs from hosted env here.
[05:09] thumper: https://bugs.launchpad.net/juju-core/+bug/1516498
[05:09] Bug #1516498: api/unitassigner: data race
[05:16] Bug #1516498 opened: api/unitassigner: data race
=== mup_ is now known as mup
[08:20] dimitern: ping
[08:25] jam, hey, sorry I missed our 1:1 :/
[08:43] dimitern: np, maybe we can chat after the standup if you have anything you want to go over
[08:48] jam, sure, ok
=== mthaddon` is now known as mthaddon
[09:14] this PR updates juju-core to the latest charm package version: http://reviews.vapour.ws/r/3152/)
[09:14] reviews appreciated :)
[09:38] Bug #1516541 opened: payload/api/private: tests do not pass
[09:41] Bug #1516541 changed: payload/api/private: tests do not pass
[09:44] Bug #1516541 opened: payload/api/private: tests do not pass
[10:01] jam, frobware, standup?
[10:46] frobware, dooferlad, voidspace, please take a look when you have a moment - http://reviews.vapour.ws/r/3153/ - almost straight cherry pick from the 1.25 fix for bug 1483879
[10:46] Bug #1483879: MAAS provider: terminate-machine --force or destroy-environment don't DHCP release container IPs
[11:03] voidspace, ok to start?
[11:03] frobware: yep, omw
[11:10] frankban, hey, do you have any idea when the guibundles branch will land on master?
[11:18] dimitern: it is already landed on master
[11:18] dimitern: because it was merged on the chicago-cubs one
[11:18] frankban, awesome! so juju deploy bundle.yaml is usable?
[11:18] dimitern: yes
[11:19] frankban, nice! I'll give it a try now :) it's pity it's not mentioned in juju deploy help
[11:19] dimitern: it should be mentioned actually
[11:20] frankban, oh, sorry - I missed it - it's there
[11:20] dimitern: cool
[11:26] sweet! juju deploy bundle.yaml works just fine with spaces constraints
[11:27] dimitern: \o/
[11:27] * frankban lunches
[11:39] frobware, dooferlad, voidspace, I have another review for you to look at when you can - http://reviews.vapour.ws/r/3155/ - fixes spaces-based deployments on ec2 and brings feature parity between master and 1.25
[11:40] dimitern: is that a straight forward port?
[11:41] voidspace, yes, no changes needed
[11:41] dimitern: you don't need a review then if it's already been reviewed
[11:41] dimitern: but LGTM :-)
[11:41] voidspace, it still needs a review :) thanks!
[11:42] dimitern: I don't think we're re-reviewing stuff that is a straight port between branches
[11:43] dimitern: at least I and other people haven't been :-)
[11:43] dimitern: and it doesn't seem like a good use of time
[11:43] voidspace, it still needs a ship it stamp
[11:43] dimitern: that isn't what we've been doing
[11:43] voidspace, isn't it?
[11:43] dimitern: no
[11:43] dooferlad, would dimitern's change have any impact to your CI tests? ^^
[11:44] dimitern: and people shouldn't "Ship It" *without* reviewing it
[11:44] voidspace, I agree
[11:44] dimitern: and re-reviewing it is a waste of everyone's time
[11:45] voidspace, not if you reviewed it the first time I guess
[11:45] dimitern: heh, well possibly
[11:45] dimitern, voidspace: cherry-pick backports that have already been reviewed shouldn't need re-reviewing, IMO.
[11:45] frobware, ok, I don't mind at all to just land it then :)
[11:45] if they need substantive changes then a re-review is reasonable
[11:46] frobware: it shouldn't have any impact on tests.
[11:46] dimitern, my only observation is for the CI tests
[11:46] dooferlad: how far off getting to the maas test server are you?
[11:46] dooferlad: I'm going to need it "soon-ish"
[11:47] voidspace: I just ran into a KVM not appearing for one of my tests, which may be due to mhy MAAS or may be just flake.
[11:47] voidspace: once I have that sorted I will have got the review answered I was looking at and can get on with the test server
[11:48] voidspace: so, this afternoon.
[11:48] dooferlad: ok
[11:49] frobware, voidspace, thanks for the Ship It! anyway guys :)
[12:36] frobware, dooferlad, voidspace, yet another for you to review - a really small one this time - http://reviews.vapour.ws/r/3156/ fixes bug 1499426
[12:36] Bug #1499426: deploying a service to a space which has no subnets causes the agent to panic
[13:36] frobware, thanks for the review - I've replied and updated the PR
=== Spads_ is now known as Spads
[13:46] fwereade, ping?
[13:46] mattyw, pong
[14:43] dimitern, are you waiting for a review on http://reviews.vapour.ws/r/3153/ I ask because I saw that it was being merged.
[14:50] thank you wwitzel3 and katco !
[14:50] alexisb: yep, we'll get it figured out
[14:51] alexisb: np
[14:58] frobware, nope that one is for master and it's still blocked
[14:59] frobware, and since we're not reviewing forward ports, I'll just merge it when possible, if that's ok
[15:03] natefinch: standup
[15:05] frobware: hey, how close are you to getting a fix for bug 1512371 for 1.25?
[15:05] Bug #1512371: Using MAAS 1.9 as provider using DHCP NIC will prevent juju bootstrap
[15:06] * dimitern steps out to the store; bbl
[15:11] katco, probably tomorrow
[15:11] frobware: kk
[15:11] katco, actively working on it now
[15:11] katco, blocking you?
[15:11] frobware: cool, just trying to figure out how much wiggle room we have on another bug :)
[15:12] frobware: nope not blocked.
[15:12] katco, in terms of a making a 1.25.x release?
[15:12] frobware: yeah
[15:12] frobware: e.g. is everyone waiting on us
[15:12] rather, i.e.
=== akhavr1 is now known as akhavr
[15:14] cherylj, you mentioned you had the replica set problem again - still holding true?
[15:15] frobware: that maas set up was hosed. I ended up tearing it down and rebuilding it. Haven't seen the problem since.
[15:15] frobware: I can't say for sure there wasn't something else going on
[15:19] cherylj: hey, can you read my comment at the bottom of bug 1382556 and give guidance?
[15:19] Bug #1382556: "cannot allocate memory" when running "juju run"
[15:20] katco: sure, taking a look....
[15:20] cherylj: ty
[15:20] cherylj: this is one of the last blockers for 1.25.1
[15:20] katco: yeah. Are you guys in your stand up? Could I come chat with you guys if you are?
[15:20] cherylj: of course: https://plus.google.com/hangouts/_/canonical.com/moonstone?authuser=1
[15:21] wwitzel3 katco - ping
[15:21] lazypower: pong
[15:21] I'm riffing with mbruzek in a hangout, and it appears juju list-payloads isn't available on 1.26-alpha1, is this known/expected behavior?
[15:21] lazypower: pong
[15:22] lazypower: it is not yet in master
[15:22] i'm confused as to how its in 1.25 and not 1.26 :P
[15:22] How did it get into 1.25 if it is not in master?
[15:22] is it hidden by feature flag?
[15:22] You gave us a feature then took it away!
[15:23] ^ yeah, wat
[15:24] lazypower: mbruzek: sorry in meeting. we started the feature based on 1.25, 1.26 was blocked by lack of a >= Go 1.3 process
[15:24] hmm, hokay
[15:24] lazypower: it's on the radar. we'll get it landed asap
[15:24] OK, sorry to interrupt meeting.
[15:24] Thanks for the follow up o/
[15:24] (we're also on bug squad this iteration)
[15:36] Bug #1516668 opened: Switch juju-run to an API model (like actions) rather than SSH.
[15:36] Bug #1516669 opened: Memory/goroutine leaks.
[15:54] I replied to menn0's mail about the blocker again, can natefinch or someone take a look?
[16:06] Bug #1516676 opened: Use of os/exec in juju is problematic in resource limited environments.
[16:17] mgz: reading
[16:32] katco: seems like the jes CI tests are still blocked by code introduced by the unitassigner. Should I work on that or the juju run bug? (I presume the blocker, but wanted to confirm)
[16:33] natefinch: hm
[16:33] natefinch: my inclination is to say the juju run bug since it's blocking the impending 1.25.1 release
[16:34] natefinch: we still have some runway on master
[16:34] natefinch: plus it looks like menno did a fix-committed?
[16:35] katco: menno responded to the CI test failure with some comments, thread title is "Cursed (final): #3310 gitbranch:master:github.com/juju/juju 0bf7c382 (functional-jes)"
[16:36] katco: basically... he thought it should have been fixed, but the CI test was still having problems
[16:37] natefinch: i see. well, i think you should still focus on the 1.25.1 blocker
[16:37] natefinch: that's coming out first
[16:37] katco: yep, that's fine. That's why I asked :)
[16:37] natefinch: yep, ty
[16:42] axw: ping?
[16:44] frobware: change to picking address algorithm landed on 1.25
[16:44] frobware: change discussed in standup fixed that failing test
[16:45] frobware: porting to master now
[16:45] frobware: also I think that the new Subnets implementation is done - but needs tests, which means I need a test harness
[16:45] frobware: I can switch to ListSpaces whilst I wait for that
[16:45] Bug #1516698 opened: Juju never stops trying to contact charm store
[16:57] one wonders if no one thought about what might happen if this was run on an environment of 5000 machines: https://github.com/juju/juju/blob/master/apiserver/client/run.go#L164
[16:58] voidspace: *sigh*, that CI stuff took ages. I am not going to get far with gomaasapi before I need to stop (now-ish). Will see if I can take a look after dinner.
[16:58] voidspace, all sounds good
[17:05] sometimes I think people just randomly decide whether or not to pass around pointers versus values :/
[17:07] dooferlad: thanks
[17:25] natefinch: what's your problem with that function using a pointer?
[17:37] voidspace: it shouldn't be modifying the value, and the value is small enough to be copied easily.
[17:37] voidspace: making it a pointer makes me wonder if it's going to be modified somewhere.
[17:38] natefinch: if it's called 5000 times surely using a pointer is *more* efficient
[17:38] natefinch: and if that's not the issue why does it matter if it's called for 5000 machines as you called out
[17:38] natefinch: or is that a separate issue?
[17:39] katco, does the lxd provider use the container/lxc code to still do container provisioning?
[17:39] voidspace: separate issue... the problem is spawning 5000 goroutines that all do stuff at te same time
[17:39] natefinch: right, instead of queuing
[17:39] yeah, that would be much better...
[17:39] cherylj: i don't think so. container/lxd
[17:40] voidspace: and pointer dereference versus some small amount of memory copying is not always an obvious win
[17:40] voidspace: queueing is what I'm writing right now, since this code is causing OOM issues
[17:40] not always, just usually
[17:40] natefinch: right, cool
[17:40] voidspace: the pointer thing isn't really a problem, just a pet peeve
[17:40] heh
[17:40] natefinch: thanks for expanding, interesting stuff
[18:30] man I love channels and goroutines
[18:30] bbiab
=== natefinch is now known as natefinch-afk
[19:43] natefinch-afk: hey did you get that tech-debt card created?
=== natefinch-afk is now known as natefinch
[19:44] katco: oops, nope, will do now
[19:54] katco: done
[19:55] natefinch: ty
[21:06] sinzui: what's the status with the CI blocker
[21:06] ?
[21:06] sinzui: menno ran the tests locally yesterday and could not reproduce
[21:06] both with local, joyent, and the CI scripts
[21:07] mgz: ^ I think you are versed in this topic
[21:07] thumper: I replied to menno's message
[21:08] * thumper hasn't got to it yet, still reading
[21:08] go there
[21:08] s/go/got
[21:08] thumper: short version, somehow with trunk the units in the hosted environment are going *through* error, when the machines are not up, rather than pending, but once the machines are up are fine
[21:08] wha?
[21:08] oh...
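
[Editorial sketch relating to the juju run discussion above (16:57-17:40), where natefinch notes that run.go spawns a goroutine per machine and mentions rewriting it to queue work instead. This is not juju's actual code; runAll, runOnMachine, and nWorkers are hypothetical names, and a minimal worker-pool pattern is only one way the queuing could be done.]

    // Minimal worker-pool sketch in Go, assuming hypothetical names: instead of
    // one goroutine per machine, a fixed number of workers pull machine IDs from
    // a channel, so goroutine count and memory stay bounded however many
    // machines the environment has.
    package main

    import (
        "fmt"
        "sync"
    )

    // runOnMachine stands in for the real per-machine work (e.g. executing a
    // command over SSH); it is not a real juju function.
    func runOnMachine(id string) string {
        return fmt.Sprintf("ran on machine %s", id)
    }

    // runAll queues every machine for execution but runs at most nWorkers
    // of them concurrently.
    func runAll(machines []string, nWorkers int) []string {
        jobs := make(chan int)
        results := make([]string, len(machines))
        var wg sync.WaitGroup
        for w := 0; w < nWorkers; w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for i := range jobs {
                    results[i] = runOnMachine(machines[i])
                }
            }()
        }
        for i := range machines {
            jobs <- i // queue work; blocks while all workers are busy
        }
        close(jobs)
        wg.Wait()
        return results
    }

    func main() {
        machines := []string{"0", "1", "2", "3", "4"}
        for _, r := range runAll(machines, 2) {
            fmt.Println(r)
        }
    }
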
[21:08] ha
[21:09] I bet this is the unit assignment worker
[21:09] trying to assign them too early
[21:09] and somehow marking them
[21:09] there is some layering thing screwed up
[21:09] natefinch ^^?
[21:09] it thinks they are units for the hosting env... till it gets machines, then it works it out
[21:13] thumper: reading backlog
[21:13] natefinch: I was just about to look at the unit assignment worker
[21:13] it seems that it is trying to assign the unit twice
[21:13] because the machine isn't up yet
[21:13] http://juju-ci.vapour.ws/job/functional-jes/276/console
[21:13] natefinch: see the status output in there
[21:14] natefinch: after two minutes, the status is taken again, and it looks ok
[21:14] so it obviously settles itself down
[21:14] but putting the unit into an error state is confusing the tests (and users)
[21:15] thumper: I've seen it error and then settle, but I thought I'd fixed that when I told it only to run the worker on the master state machine
[21:15] thumper: already assigned does not sound like the error you'd get if you tried to assign it and there was no machine yet
[21:16] no, it sounds like it was assigned, and then attempted to assign it again
[21:18] right, which would imply some sort of race condition - either multiple people getting notified and trying to assign (like I originally fixed) or maybe two notifications firing off in succession and thus causing two unit assignments to run concurrently.... the latter seems possible
[21:19] natefinch: the other thing with this is this doesn't appear in a state server log anywhere
[21:19] despite being an error that appears in status. this seems very wrong.
[21:20] maor logging plz
[21:20] :)
[21:20] hmmm... wonder if I went too crazy in removing my debugging logging
[21:22] this might be strange, but does the collection watcher fire when docs are removed?
[21:22] natefinch: I have a feeling it might, but just a stab in the dark at the moment
[21:22] I thought any change to the doc would fire the watcher
[21:23] not just insertions
[21:23] it appears that the assign units collection just has insertions and deletions and no updates
[21:23] is that right?
[21:24] corret
[21:24] correct
[21:26] natefinch: how about logging the unit ids that are being assigned
[21:26] I wonder if we'll find a dupe
[21:27] thumper: yeah.... I swear I was, but again, maybe I just took out too much logging
[21:28] I can rerun with logging turned up more if that would maybe make things clearer
[21:29] the worker has some tracef calls that you could turn on, but it definitely looks like I took out too many log statements
[21:31] that'll at least tell you wht unit ids the worker is seeing firing from the watcher, and log the results of the unit assignment attempt
[21:32] natefinch: what do I want... "=DEBUG ?=TRACE"
[21:32] juju.worker.unitassigner
[21:32] ta
[21:52] natefinch: http://juju-ci.vapour.ws/job/functional-jes/279/console
[21:52] anyone knows how to ask peergrouper which of the machines is the leader?
[21:52] you'll want the gathered logs when the job completes
[21:54] mgz: thanks
[21:54] wow, juju status --format=tabular doesn't show containers?
[22:00] 2015-11-16 21:52:19 TRACE juju.worker.unitassigner unitassigner.go:56 Unit assignment results: ["cannot assign unit \"dummy-source/0\" to machine: cannot assign unit \"dummy-source/0\" to new machine or container: cannot assign unit \"dummy-source/0\" to new machine: unit is already assigned to a machine" ]
[22:00] lol, this OOM error frmo juju run is a lot harder to repro when we moved to m3.mediums.
[22:02] mgz: not really useful, given that we already knew that. I'm working on another bug for bug squad right now, that's blocking 1.25, but I'll try to look at that one once I get this one finished up
[22:02] mgz: I'll take a look at the logs from the run later tonight and see if anything obvious pops up
[22:02] gotta run and make dinner for the family
=== natefinch is now known as natefinch-afk
[22:06] diner, honestly? at 6PM...
[22:44] I'm sorry, but seriously?
[22:44] a critical blocker stopping the entire team has less priority?
=== Makyo is now known as Guest20767
[23:14] wallyworld: I'm rebasing the azure-arm-provider branch because it's missing fixes from master
[23:14] wallyworld: assuming no need to review
[23:15] axw: once, just finishing meeting
[23:16] axw: sorry, done now, in standup
[23:16] oops, is that the time
[23:48] axw: I have never suggested or required a review of merging master into a feature branch
[23:49] * thumper off to walk the dog
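
[Editorial sketch relating to the unit assigner discussion above (21:09-21:18), where thumper and natefinch suspect two successive watcher notifications can cause the same unit to be assigned twice, surfacing an "already assigned" error in status. This does not use juju's real unitassigner API; assignUnit, the error text, and the simulated notifications are all hypothetical, and it only illustrates the general idea of treating a duplicate assignment as a benign no-op.]

    // Generic Go sketch, assuming hypothetical names: a handler that receives
    // unit IDs from a watcher-style notification stream and treats "already
    // assigned" as success rather than recording an error against the unit.
    package main

    import (
        "errors"
        "fmt"
        "strings"
    )

    var errAlreadyAssigned = errors.New("unit is already assigned to a machine")

    // assignUnit stands in for the real assignment call; here it fails with
    // errAlreadyAssigned the second time it sees the same unit.
    func assignUnit(seen map[string]bool, unit string) error {
        if seen[unit] {
            return errAlreadyAssigned
        }
        seen[unit] = true
        return nil
    }

    func main() {
        // Simulated notifications: the same unit ID arrives in two batches.
        notifications := [][]string{
            {"dummy-source/0"},
            {"dummy-source/0", "dummy-source/1"},
        }
        seen := make(map[string]bool)
        for _, ids := range notifications {
            for _, id := range ids {
                err := assignUnit(seen, id)
                switch {
                case err == nil:
                    fmt.Printf("assigned %s\n", id)
                case strings.Contains(err.Error(), "already assigned"):
                    // Benign duplicate from a later notification: log it and
                    // move on instead of putting the unit into an error state.
                    fmt.Printf("skipping %s: %v\n", id, err)
                default:
                    fmt.Printf("failed to assign %s: %v\n", id, err)
                }
            }
        }
    }
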