[02:21] wallyworld: vsphere got a bit broken during the provisioner changes. I'm going to update common.Bootstrap to do the zone selection
[02:22] might leave the other providers alone for now, just aim to get this fixed for beta2
[02:22] yikes, ok
[02:22] sgtm
=== akhavr1 is now known as akhavr
[06:02] wallyworld: I connected to VPN so not sure you saw this: wallyworld: https://github.com/juju/juju/pull/8007 fixes the vsphere issue, do you have time to review?
[06:03] axw_: oh awesome, tyvm, looking in a sec
[06:06] veebers: I've got a fix ^ for the vsphere issue that balloons mentioned, but braixen is still buggered. if you bootstrap with "--to zone=aron.internal", you'll get a "Connection refused" error. I have no idea where that's coming from, or how to debug that - do you?
[06:35] axw: I looked at it as well
[06:36] balloons: veebers: the bot has repeatedly tried to run my patches, and seems to be starting, but failing to comment when the merge fails
[06:36] jam: thanks
[06:36] wallyworld: and thanks
[06:36] np, thanks for fixing
[06:36] veebers: ci.jujucharms.com/job/github-merge-juju/437/console is one of those cases
[06:40] jam: the point of moving to a single az param to start instance is to get that selection logic out of the providers
[06:41] it's been moved up to eg the provisioner
[06:41] by the time start instance is called, we will have selected a single az
[06:41] wallyworld: storage and networking both have a heavy impact on az selection
[06:41] wallyworld: which means all you're really doing is changing *which* function on the provider selects the az
[06:42] they do; i'd have to read the code to refresh my memory
[06:44] jam: i see axw is responding to your pr comments, so i'll defer to those save having same conversation in 2 places
[06:44] jam wallyworld: the provisioner (and with this PR, bootstrap) first consult the provider via DeriveAvailabilityZone. if the provider says it doesn't care, then we iterate over the zones.
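(Editorial sketch, not part of the log.) The flow described in the last message can be illustrated roughly as follows. This is a minimal sketch in Python rather than juju's Go, and every name in it (`select_zone_and_start`, `derive_zone`, `start_instance`, `ZoneUnavailable`) is hypothetical, not juju's API. It also assumes that "iterate over the zones" means trying each zone in turn until one succeeds, which the log does not spell out.

```python
from typing import Callable, Optional, Sequence


class ZoneUnavailable(Exception):
    """Hypothetical error: the given zone cannot host the instance."""


def select_zone_and_start(
    derive_zone: Callable[[], Optional[str]],
    all_zones: Sequence[str],
    start_instance: Callable[[str], str],
) -> str:
    """Ask the provider for a zone first; fall back to trying each zone."""
    # Step 1: consult the provider (stand-in for DeriveAvailabilityZone).
    # A return value of None models "the provider doesn't care".
    zone = derive_zone()
    if zone is not None:
        # The provider pinned a zone (e.g. from placement), so use only that.
        return start_instance(zone)

    # Step 2: no preference, so the caller iterates over the known zones.
    # Assumption: a failed zone is skipped and the next one is tried.
    last_err: Optional[Exception] = None
    for candidate in all_zones:
        try:
            return start_instance(candidate)
        except ZoneUnavailable as err:
            last_err = err
    raise ZoneUnavailable(f"no usable zone among {list(all_zones)}") from last_err
```

Either way, the start function only ever receives a single zone, which matches the point made earlier in the log that a single az has been selected by the time start instance is called.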
[06:44] jam: DeriveAvailabilityZone does things like returning a zone based on placement, or based on the volume attachments, spaces, etc.
[06:45] right, the provider still gets to filter out unsuitable zones
[06:45] axw: does it also account for the subnet= provisioning that we have for AWS (which was quite well received, and we'd like to extend to other providers)
[06:45] axw: I'm not 100% sure why DeriveAZ is better than just StartInstance
[06:45] jam: AFAIK that's unchanged
[06:46] jam: in two words: provisioner parallelisation
[06:46] axw: subnet= *also* impacts zone
[06:46] jam: yep, that code is unchanged. Derive... will still pull out the zone from subnet=
[06:47] axw: how does this impact parallel provisioning at all?
[06:47] you still need to do a derive for every instance, right?
[06:47] whether it is done inside or outside the StartInstance call
[06:48] jam: derivation isn't the problem, automatic spreading is
[06:48] distribution groups
[06:48] axw: because the one doesn't exist yet, it doesn't bias us from not using it?
[06:49] axw: my biggest concern is that Derive only returns 1 entry
[06:49] which seems very much wrong
[06:49] a space might span multiple zones and you want spreading, or it may not
[06:49] if we want to switch StartInstance to only allow 1 that's fine
[06:49] but Derive can certainly return multiple possibilities
[06:50] jam: like I said in the PR, what's there satisfies today's needs.
we can easily change it as needed
[06:52] jam: FWIW I did originally advocate for it returning multiple, I just didn't recall subnet= when I was talked down :)
[06:53] jam: on an entirely different topic, I just ran a couple of tests upgrading from an older version of 2.3-beta2 (698a34c22f2176a6de8cd24c5ea4aa5b11637069) to tip of develop, and it works
[06:53] jam: the second time the upgrade took a while, which might be related to the errors - but it did eventually upgrade
[06:53] wow, just saw NEWS entry that there was a van attack in NYC today
[06:54] axw: k. it might have been in progress, was mostly concerned about a 2.2 to a 2.3 being an issue
[06:54] oh crap :(
[06:57] axw: can I talk you back up? :)
[06:57] axw: does Derive take the Placement directives (it seems like it must)
[07:04] jam: yes it does
[07:04] jam: I'm happy for it to change, but not in this PR
[07:04] jam: ack, seems it'll be an error in the check script. I'll fix that up first thing in my morn
[07:05] sorry about that :-\
[08:19] wallyworld: http://ci.jujucharms.com/job/github-merge-juju/440/console <- error in remote relation firewaller test, FYI in case there's a real issue
[08:19] ta, will look
[08:28] axw: i have a recollection of seeing that failure ages ago but not since.
i'll see if i can repro locally
[08:29] okey dokey
[10:17] balloons: bug #1729248 I went to investigate test suite failures merging code into develop
[10:17] Bug #1729248: ProvisionerSuite.TestProvisionerStopRetryingIfDying fails intermittently
[10:17] however, the stack trace that it is giving appears to be *2.2* code
[10:18] though even there, the numbers don't line up
[10:19] I can't say for sure, but the string it says is on the line that failed "stop(c, p)" was removed in 2.3 and only exists in the 2.2 codebase
[10:23] now, that line *does* seem to exist in the branch from Anastasia, but it *shouldn't* exist after merging into develop
[10:24] is it possible that there is something wrong with the build that it isn't properly merging into develop/pulling source code from the original branch rather than the merged result?
[10:25] axw: I have a concern about the merge bot, are you around possibly to discuss?
[10:30] jam: I'm here but my wife and kids just got home. I can spare a couple of minutes before I head down
[10:31] jam: when did you see that error? I think I fixed it as a drive by in my PR
[10:31] jam: and my PR also removed "stop(c, p)" :)
[10:31] so I think that explains that
[10:34] jam: linked in the bug, gtg - have a good day
[11:11] axw: have a good day. The issue is less about the test failing randomly, and more that the bot seems to be running the test suite against the PR code *rather* than against the merged PR + Develop code
[11:12] though maybe your code landed right in-between run 1 and run 2 ?
[11:12] could be
[11:13] Something weird going on?
[11:14] balloons: we've had an intermittent test, and I went to reproduce it, but couldn't get my source tree to line up with the failure lines reported in the merge request
[11:14] balloons: but I think I've sorted it out
[11:15] axw, thanks for fixing the vsphere issue, though as you've noted, I still don't know why the DC and vcenter refuse to talk to each other
[11:15] Anastasia tried to merge against version X, which failed with intermittent failure. Andrew merged his code which fixed that in passing, then Anastasia landed her code, which now didn't have the failure either
[11:23] jam: balloons: thnx for figuring this out \o/
[11:23] anastasiamac: well, it seems to be mostly axw doing a driveby fix
[11:24] axw: \o/ (i rescind all previous \o/ on the subject then)
[11:24] :,(
[11:24] :)
[11:25] jam: but thank you for looking in my absence
[11:27] I just asked a question :-)
[11:28] balloons: and a valid question it was!
[13:01] I'm trying to bootstrap a controller with strict egress restrictions. I've set up an apt mirror and bootstrapped using it with --config apt-mirror=, however I'm still seeing apt operations fail because they're trying to access security.ubuntu.com. Is there a way to make them point at my apt mirror, or otherwise avoid this issue?
[14:05] axw: so, i just created a controller running 500 applications, each with 3 units to test out the leadership code
[14:05] axw: when doing an upgrade, it spins *hard* on corrupt lease document
[14:05] axw: and while doing that, it actually ends up running out of open file handles
[15:08] axw: looks like your changes to leases might account for a 100:70 improvement (whether you want to call that down 30% or 100/70 = 40% improvement or whatever). But getting away from a shared clock document is a pretty big win.
=== TikTok is now known as michealb
[21:33] balloons: veebers: we having release call?
[22:45] jam: bugger. I'm still looking for a way to fix that.
thanks for testing - the improvement sounds nice though