[00:08] Hey, does anyone know if this list of supported series is up to date? https://github.com/juju/charmstore/blob/v5-unstable/internal/series/series.go#L37
[00:10] Also any ideas if there is a list of English labels for those series?
[00:15] wallyworld_, ping
[00:16] hey
[00:16] sorry was late, I am on the hangout now
[01:02] alexisb: do we have a hangout now?
[01:04] menn0, yes
[01:04] I am running late
[01:05] menn0, I can see you
[01:06] can you hear me?
[01:15] thumper davecheney: FYI, peergroup fix is landed now
[01:15] thanks axw
[01:15] really appreciate it
[01:15] np
[01:15] axw: good work on the unit assignment bit too
[01:21] axw: hotsauce!
[01:24] axw: yes, thanks for cleaning up my mess :)
=== natefinch-afk is now known as natefinch
[01:24] natefinch: heh, np
[01:57] \o/ sped up the tests by 10s by fixing one test
[01:57] * thumper wonders what else he broke
[01:58] huh, tests all pass
[01:58] magic
[02:13] wallyworld_: ping
[02:14] hi
[02:15] wallyworld_: katco said you needed some clarification on things? I'm around poking this bug with a stick
[02:15] i'm bootstrapping a 1.22 env to debug with
[02:15] i don't really need clarification
[02:15] just need to reproduce locally so i don't mess their environment
[02:16] wallyworld_: ahh ok, I can reproduce the issue even with 1.25.0 .. if you upload-tools, it hoses up the ability to upgrade
[02:17] even when we find and fix that, there are still on 1.22 without the fix :-(
[02:17] it will be an agent related issue
[02:18] wallyworld_: yeah, my hope was that since it is in all versions, the problem code would be the same
[02:18] wallyworld_: also my main goal was trying to see if a new version didn't have the problem, so I could do a git bisect to isolate the problem
[02:18] the other issue of course is the conn disconnect preventing upload tools from working
[02:20] davecheney: http://reviews.vapour.ws/r/3182/
[02:20] davecheney: this has the code I wanted to talk to you about in it
[02:24] wwitzel3: when you say 1.25, you mean a 1.25 back end?
[02:25] wallyworld_: yes, I bootstrapped 1.25.0 using upload-tools and then attempted to upgrade and got the same failure
[02:25] wallyworld_: no tools found OR connection shut down, depending on if I used upload-tools or not
[02:26] wwitzel3: and if you bootstrap without upload-tools but attemt to upgrade with upload-tools?
[02:26] works
[02:26] so the upload-tools on bootstrap is the issue?
[02:26] wallyworld_: maybe, I didn't test upgrading after using upload-tools to upgrade
[02:26] wallyworld_: I can try that really quick though
[02:27] ok
[02:27] I can go 1.22 to 1.25 with upload-tools and then try to go to 1.26
[02:27] so just to be clear, you think bootstrap WITHOUT upload-tools is ok
[02:27] but with uploa-tools on bootstrap is not
[02:27] yes, if I bootstrap without upload-tools I can perform an upgrade to 1.26 if I set the agent-stream to devel
[02:28] wwitzel3: so with upload-tools, you could try clearing ALL the tools metadata as we did yesterday
[02:28] if I use upload-tools, upgrades fails, even if using upload-tools again
[02:28] wallyworld_: yep, I did that
[02:28] and it still fails?
[02:28] now that makes no sense what so ever
[02:29] oh you mean even the metadata put there from the upload-tools?
[02:29] not jsut the streams data
[02:30] yes
[02:30] delete * from toolsmetadata collection
[02:30] wallyworld_: ok, let me try that, bootstrapping 1.22.8 right now (no upload-tools)
[02:31] wallyworld_: then I will upgrade to 1.25 with upload-tools, see if that puts it in the broken state
[02:31] wallyworld_: then I'll db.toolsMetadata.remove() and see if I can upgrade to 1.26
[02:32] ok
[02:38] thumper: me looks
[02:38] davecheney: cheers
[02:39] davecheney: the piece I was talking about this morning is the race condition in the api opening / timeout handling
[02:39] if we hit the timeout, I close a channel
[02:39] the other go routine does check this channel once the open api func returns
[02:40] there is a race condition if the api is returned and the channel checked just before a different goroutine determines that we have timed out
[02:40] so noone reads off the api or error return channels
[02:41] however I figure what I added is better than the guaranteed block forever of the opening goroutine
[02:41] if it did timeout
[02:41] davecheney: I suppose we could fix that by making the apireturn and error return buffered channels of length one?
[02:41] maybe
[02:42] if we did that, perhaps we could get rid of the timeout channel?
[02:42] however...
[02:42] if we did that, the api wouldn't be closed on timeout
[02:42] bah humbug
[02:43] thumper: i don't like what i see
[02:43] do you want me to rip it to shreads in review
[02:43] or should we have a hangout to discuss the design
[02:57] davecheney: shit, we have the second option? Someone should have told me two years ago :)
[02:58] :D
[02:58] thumper gets special treatment
[03:06] davecheney: hmm
[03:07] davecheney: hangout probably best
[03:11] davecheney: read the review, happy to explain... but given we are trying to pass information through several layers, this is the best approach I could come up with
[03:21] thumper: 1:1 hangout ?
[03:22] davecheney: sure
[03:30] wwitzel3: i'vr narrowed it down to the final upload tools step after the tools are built, need to dig into that a bit more. plus i've also got new logging showing juju is missing tools from simplestreams after all
[03:36] gah, testing that things run in goroutines is a PITA
[03:38] wallyworld: have you worked out the path where juju is getting correct tools when simplestreams isn't working as expected?
[03:39] mgz: not yet. the logging i added doesn't dump the raw metadata as i did previously, i added extra logging at the point where the tools list is actually composed and returned to the caller. there's something really weird going on
[03:40] thumper: http://play.golang.org/p/7HqKts_Di1
[03:48] wwitzel3: so i think i found the reason for the no matching error on upgrade, need to double check my logic
=== axw_ is now known as axw
[04:20] cherylj: that peergroup bug is fixed now, I just forgot to tick it over
[04:20] done now
[04:21] and retargeted back to 1.26-alpha2
[04:23] thumper: upgrader working in the dep engine: http://reviews.vapour.ws/r/3185/diff/
[04:40] * thumper looks
[05:18] davecheney: I'm off now to walk the dog
[05:18] davecheney: review updated following review and chat
[05:25] TheMue: ta
[05:25] thumper ta
[06:09] holy crap, 150 lines of test code runs perfectly the first time it successfully compiles.
[06:09] (lol.... all that to test a 23 line function)
[06:18] Bug #1517743 opened: api: more data races
[06:18] Bug #1517744 opened: cmd/jujud/agent: more data races
[06:24] Bug #1517743 changed: api: more data races
[06:24] Bug #1517744 changed: cmd/jujud/agent: more data races
[06:27] Bug #1517743 opened: api: more data races
[06:27] Bug #1517744 opened: cmd/jujud/agent: more data races
[06:33] Bug #1517747 opened: provider/joyen: more data races
[06:33] Bug #1517748 opened: provider/lxd: test suite panics if lxd not installed
[06:39] Bug #1517747 changed: provider/joyen: more data races
[06:39] Bug #1517748 changed: provider/lxd: test suite panics if lxd not installed
[06:45] Bug #1517747 opened: provider/joyen: more data races
[06:45] Bug #1517748 opened: provider/lxd: test suite panics if lxd not installed
[07:28] wallyworld: proposed apiserver/remoterelations, containing just the local watchers for now: https://github.com/juju/juju/pull/3779
[07:42] morning
[07:47] dimitern: o/
[07:47] anastasiamac, o/
[07:47] dimitern: how is sophia? do u have snow? :D
[07:48] there's nothing like an unblocked master in the morning :)
[07:48] anastasiamac, oh not nearly - it's a lat autumn still 10-18 deg.
[07:49] :D
[07:58] axw: awesome, thanks will look soon
[08:55] Bug #1302498 changed: Ensure network names are validated on deploy/add-machine once possible
[10:02] frobware, jam, fwereade, standup?
[10:52] dimitern, dooferlad: the answer is "yes" to does an ordinary deploy with network aliases pause with 'Waiting 120 seconds for network devices'", so not a result of butchering /e/n/i
[10:53] frobware: fun!
[10:54] dooferlad, it does success though
[10:54] succeed
[10:54] frobware, well, that's kinda good news
[10:58] fwereade, hey, I've realized bindings should be updated on svc.SetCharm(), possibly allowing you to do upgrade-charm --bind like with deploy
[10:59] dimitern, good point, yes they should
[10:59] fwereade, initially, we could just use default bindings for new endpoints, and not implement --bind for upgrade-charm
[11:00] fwereade, but the bindings definitely need updating as part of changeCharmOps
[11:00] dimitern, yeah, I think that's the bulk of the cost/complexity
[11:00] dimitern, once that's in, which I think it must be, the extra cost of exposing it via upgrade-charm is minimal
[11:01] fwereade, exactly
[11:01] dimitern, and is likely to actually be helpful in terms of showing us where concept boundaries truly lie
[11:01] dimitern, the more sane clients you have, the more likely your abstraction is sane
[11:01] fwereade, yeah
[11:02] fwereade, cheers
[11:03] fwereade, in order to assert bindings haven't changed while updating them I think I need to use txn-revno explicitly as a field, right?
[11:04] dimitern, hm, quite possibly, yeah
[11:04] dimitern, as long as it's a small doc only used for that purpose it's fine
[11:05] fwereade, ok, so it's very much like for settings, but slightly simpler since the values are strings, not interface{}
[11:05] dimitern, cool
[11:38] dimitern, frobware: PTAL https://code.launchpad.net/~dooferlad/gomaasapi/subnets/+merge/277977
[11:43] dooferlad, looking
[11:58] Bug #1517863 opened: Leadership appears broken in 1.25
[12:04] Bug #1517863 changed: Leadership appears broken in 1.25
[12:07] Bug #1517863 opened: Leadership appears broken in 1.25
[12:14] dooferlad, reviewed, let me know what you think..
[13:49] Bug #1517863 changed: Leadership appears broken in 1.25
[13:58] Bug #1517863 opened: Leadership appears broken in 1.25
[14:01] Bug #1517863 changed: Leadership appears broken in 1.25
[14:36] jw4: you around?
[14:37] * tvansteenburgh shamelessly cross-posts:
[14:37] anyone know how to get the ID of the currently executing juju action from inside the action itself?
[14:37] i've seen other code using JUJU_ACTION_ID, but it's not set, and I don't see any mention of that env var in the docs
[14:43] ah, it's JUJU_ACTION_UUID
[14:57] stokachu: Just submitted a PR to your theblues branch. Once that's landed, it should fix the doc errors that are currently preventing your PR from landing.
[14:59] kadams54: nice!
[15:00] kadams54: i have a initial debian package built would you like me to create a pr for that or just maintain it out of tree?
[15:02] ericsnow-afk: you going to be here today?
[15:02] katco: coming
=== ericsnow-afk is now known as ericsnow
[15:02] kadams54: ok merged that PR
[15:21] tvansteenburgh: glad you found it :)
[15:22] Bug #1440209 changed: juju action do doesn't accept non-string params on command line
[15:23] cherylj, I think this (https://bugs.launchpad.net/juju-core/+bug/1516036) should go into 1.25.1 too. Thoughts?
[15:23] Bug #1516036: provider/maas: test failure because of test isolation failure
[15:24] frobware: yes, if it's a problem on 1.25 and you can get that in, that would be good.
[15:25] cherylj, it depends to be a problem for devs - if your DNS doesn't use something like 8.8.8.8, or whatever CI uses, then the unit tests can fail.
[15:26] frobware: things that prevent us from getting good test / CI runs are definitely good to have everywhere.
[15:27] frobware: but if you can't get it done today, it can wait to 1.25.2
[15:27] frobware: actually
[15:27] frobware: would you be up for looking at a different bug?
[15:27] cherylj, need to land the MAAS/19 DHCP first.
[15:27] cherylj, but sure
[15:28] frobware: ok :) just ping me if you still feel that way after you land the MAAS 1.9 / DHCP fix :)
[15:28] cherylj, what's the LP#
[15:28] cherylj, curiosity killed the cat!
[15:28] indeed!
[15:29] There are two causing problems for 1.25 CI: bug 1514462
[15:29] Bug #1514462: Assertion failure in TestAPI2ResultError
[15:29] and bug 1517611
[15:29] Bug #1517611: TestFilesystemInfo race condition in 1.25
[15:29] cherylj, Xenial - ha... just to add to the matrix
[15:30] frobware: the important bit is less xenial and more go 1.5
[15:30] ENOMOREMATRIX
[15:33] dimitern, dooferlad: please take a look http://reviews.vapour.ws/r/3188/
[15:34] frobware, looking
[15:35] kadams54: just setup a seperate repo for the debian package i did https://github.com/battlemidget/theblues-deb
[15:37] stokachu: thanks for the heads up… I'm kinda surprised we don't already have one :-)
[15:37] dimitern, thanks
[15:47] frobware: I still don't understand your change for bug 1516036
[15:47] Bug #1516036: provider/maas: test failure because of test isolation failure
[15:48] the tests actually query a real dns server, so the solution is changing the value that escapes testing to fail more, rather than correctly isolating the tests from the runner's dns setup?
[15:48] mgz, cherylj are we sure that those two bugs are what is causing the 1.25 curse?
[15:49] alexisb: bug 1517611 is causing the 1.25 curse
[15:49] Bug #1517611: TestFilesystemInfo race condition in 1.25
[15:49] alexisb: bug 1514462 is the main cause of failure for master on go 1.5
[15:49] Bug #1514462: Assertion failure in TestAPI2ResultError
[15:50] mgz, bug 1517611 is marked incomplete?
[15:50] Bug #1517611: TestFilesystemInfo race condition in 1.25
[15:50] alexisb: it's the "not on trunk, just 1.25" marker
[15:50] mgz, I don't know what to do about it atm. the real issue is on the MAAS side - we're trying to workaround a sporadic bug there,
[15:50] ok
[15:51] frobware: what's the maas bug number? I can hit roaksoax in person.
[15:51] mgz, https://bugs.launchpad.net/juju-core/+bug/1412621
[15:51] Bug #1412621: replica set EMPTYCONFIG MAAS bootstrap
[15:51] frobware: reviewed
[15:52] mgz: why not just delete the bug task for master?
[15:52] frobware, that is a juju core bug
[15:52] mgz: (the minus sign in the "Affects" column)
[15:53] frobware, reviewed
[15:55] ericsnow: we don't know it'as not on master, we've just not seen it
[15:55] also, it confuses search
[15:55] mgz: ah
[15:55] mgz: k
[16:00] frobware: I'm not seeing a maas bug referenced, just that it's a generic error we see when maas dhcp gets screwed, which can be a variety of reasons
[16:00] frobware: roaksoax says that once juju has touched /e/n/i it's our problem, basically
[16:02] frobware: I also don't see how that's related to our unit tests really touching dns
[16:06] mgz: the issue is that MAAS does not always update its DNZ zone entry for the node that is just deployed which means it is unresolvable.
[16:07] frobware: okay, but there's no maas bug specifically for that in launchpad?
[16:07] or I'm not finding it?
[16:11] mgz: I'm not sure that there was a separate bug for the MAAS issue. I think they had just looked at the same bug frobware was working on: bug 1412621
[16:11] Bug #1412621: replica set EMPTYCONFIG MAAS bootstrap
[16:11] mgz: or maybe I'm remembering wrong. I thought I saw that the MAAS guys touched that bug
[16:11] frobware, ah, I've haven't notices runScript takes explicit string arguments, so ignore my comment around using test.params...
[16:12] cherylj: I'm just not seeing any work on the maas side to fix things
[16:13] mgz, I just updated the bug
[16:14] frobware: thanks!
[16:15] mgz, if you drop back to IP addresses the replica set can continue; if you don't drop the unresolvable host names then the node will be unusable. I think we're pushing in the wrong direction. This, to me, is a MAAS issue.
[16:15] frobware: okay, I just didn't see that communicated anywhere
[16:15] mgz, but it was suggested we try and work around broken/sporadic provider behaviour.
[16:15] if we want maas to fix things we need to tell them
[16:16] mgz, that's mostly in the git commit
[16:16] mgz, re: "I just didn't see that communicated anywhere"
[16:18] mgz, resolving the names (which broke the unit tests for some) is a bad idea. We either back it out, fix it MAAS, or continue to workaround and add more crud.
[16:18] I'm generally for backing that out.
[16:19] mgz, the advertised solution is to use static IP addresses but I don't think you can do that for 1.8 - true?
=== natefinch is now known as natefinch-afk
[16:21] frobware: not in default config at least, everything just asks dhcp for an address
[16:25] frobware, you can do that in 1.8 - with either devices or ipaddresses APIs
[16:27] mgz, dimitern: so if you're using dhcp you're like to run into this problem. if you can switch to static, there's a workaround. Open to backing this out if it is making things worse, et al.
[16:28] frobware, well, having the option to do it is one thing, but actually doing it is quite a different story :)
[16:28] frobware, I mean juju can't enforce restrictions on how your maas nodes get their addresses
[16:28] dimitern, right. not clear how practical a solution that is (for customers)
[16:29] frobware, in most cases I guess customers will use static addresses for nodes
[17:00] mgz, katco ping
[17:38] Bug #1517992 opened: juju-upgrade to 1.24.7 leaves juju state server unreachable
[17:58] sorry ppl, cannot make it to the hangout
[18:11] I guess I missed it
=== natefinch-afk is now known as natefinch
[18:12] natefinch: we called it. too few people
[18:17] katco: figured. Sorry, kids' thanksgiving thing at school ran over.
[18:59] natefinch: are you porting that fix for bug 1382556 to master?
[18:59] Bug #1382556: "cannot allocate memory" when running "juju run"
[19:00] katco: I am now
[19:00] natefinch: ty
[19:18] man I wish git was better at dealing with moved files.
[19:34] natefinch: did you get that bug ported to 1.25? saw the card is moved over
[19:35] porting to master now, didn't realize the card should include the port, sorry, I can move it back.
[19:35] katco: hit a snag that the ssh stuff moved outside the repo, so going to have to do a more manual port of that part of the code
[19:35] natefinch: no worries, the bug was still open so just wanted to check
[19:35] natefinch: bummer =/
[20:31] we should just rename dependencies.tsv to mergeconflict.tsv
[20:33] when did we start using gopkg.in/inconshreveable/log15.v2 ?
[20:33] natefinch: lxd dependency
[20:33] katco: gah... libraries shouldn't log :/
[20:35] for exactly this reason
[20:42] katco: can I just merge the forward ports? While I had to manually copy over the changes from 1.25 to the new repo, it really is just an exact copy of the code... there were no changes in between the two as far as I can tell.
[20:43] natefinch: do you mean don't go through the bot?
[20:43] natefinch: or do you mean no review?
[20:43] katco: no review
[20:43] natefinch: yeah go for it
[20:44] Bug #1509032 changed: Juju doesn't support is own version of 1.25.0
=== thumper is now known as thumper-afk
[21:17] aww dammit
[21:18] someone changed juju/utils with a breaking API change, without updating juju core
[21:18] dooferlad: ^
[21:23] natefinch, he is sleeping I would think
[21:23] alexisb: yeah, remembered after I pinged him
[21:31] mgz: ping?
[21:31] menn0: yo
[21:31] mgz: i'm helping out with a unit test failure on ppc64 (TestFilesystemInfo)
[21:31] it only seems to fail on ppc64
[21:32] can I get access to the ppc64 unit test host (stilson-09 I believe)?
[21:32] menn0: sure
[21:32] I already have access to stilson-08 from looking at an issue a long time ago but it doesn't appear to have any go tools installed
[21:33] there are a lot more meenos than I'd expect on lp
[21:38] can someone review this 5 line patch so head on master actually compiles with head on juju/utils? http://reviews.vapour.ws/r/3192/
=== thumper-afk is now known as thumper
[21:40] katco, menn0, thumper ^
[21:41] * menn0 looks
[21:41] * thumper shipped it
[21:41] thanks!
[21:41] * menn0 is too slow
[21:48] katco: once my fix to master lands, then my forward port should land
[21:48] dinner time, back in a few hours
=== natefinch is now known as natefinch-afk
[21:50] natefinch-afk: cool ty
[22:00] katco: I'm out like a trout
[22:00] ericsnow: tc
[22:01] ericsnow: how far did you get?
[22:02] katco: not too far
[22:02] katco: it *does* reproduce reliably (by re-running the CI job)
[22:02] katco: I've updated the bug report
[22:10] ericsnow: ty
=== ericsnow is now known as ericsnow-afk
[22:33] menn0: hey, did you repo? my run just before restarting hit the failure
[22:35] mgz: yep, repoed it immediately
[22:35] mgz: I have a "fix", but it's really looking like a Go toolchain bug
[22:36] mwhudson will love you.
[22:37] mgz: printing out the struct created by state.FilesystemAttachmentInfo{MountPoint: "/srv"} gives {MountPoint: ReadOnly: true}
[22:37] but if I do state.FilesystemAttachmentInfo{MountPoint: "/srv", ReadOnly: false}, you get {MountPoint: "/srv", ReadOnly: false}
[22:38] serious wtf territory
[22:38] mwhudson isn't around today unfortunately
[22:47] menn0: sooo... is your "fix" constructing the thing with full params each time?
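Editorial note: for context on why the output reported at 22:37 points at the toolchain, the Go specification says that fields omitted from a keyed composite literal take their zero value, so the two literals should be identical. The sketch below uses a hypothetical local type, `filesystemAttachmentInfo`, that only copies the two field names mentioned in the chat; it is not the real `state.FilesystemAttachmentInfo`.

```go
package main

import "fmt"

// filesystemAttachmentInfo is a hypothetical local type with the two fields
// named in the chat; the real juju type may have a different definition.
type filesystemAttachmentInfo struct {
	MountPoint string
	ReadOnly   bool
}

func main() {
	// ReadOnly is omitted, so the spec requires it to be false.
	partial := filesystemAttachmentInfo{MountPoint: "/srv"}
	// Every field is spelled out explicitly.
	full := filesystemAttachmentInfo{MountPoint: "/srv", ReadOnly: false}

	fmt.Printf("%+v\n", partial)
	fmt.Printf("%+v\n", full)
	// On a conforming compiler the two values compare equal.
	fmt.Println(partial == full)
}
```

On a conforming compiler both values print as `{MountPoint:/srv ReadOnly:false}` and the comparison prints `true`; any other result, like the one described on the ppc64 host, would be a compiler or runtime defect rather than a bug in the calling code.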
[22:48] e
[22:49] menn0: we don't seem to have had any compiler changes or other suspect things on stilson-09 recently, so mystery to me why it started happening on the 1.25 branch here
[23:03] mgz: yeah, it's pretty strange
[23:13] hello again everybody
[23:21] davecheney: I just sent you and mwhudson details of a fun ppc64 issue via email
[23:27] menn0: \o/
[23:29] menn0: unconstructive answer: use go 1.5
[23:29] nobody will fix gccgo4.9
[23:30] davecheney: joy :-/
[23:32] a := s.newAgent(c, ss)
[23:32] go func() { c.Check(a.Run(nil), gc.IsNil) }()
[23:32] defer func() { c.Check(a.Stop(), gc.IsNil) }()
[23:32] // Now run the test.
[23:32] s.assertUpgradeSteps(c, state.JobHostUnits)
[23:32] s.assertHostUpgrades(c)
[23:32] what the f
[23:48] Bug #1518128 opened: Improper address:port joining
[23:48] Bug #1518131 opened: cmd/jujud/agent: different data races
[23:51] Bug #1518128 changed: Improper address:port joining
[23:51] Bug #1518131 changed: cmd/jujud/agent: different data races
[23:57] Bug #1518128 opened: Improper address:port joining
[23:57] Bug #1518131 opened: cmd/jujud/agent: different data races