/srv/irclogs.ubuntu.com/2017/11/01/#juju-dev.txt

axwwallyworld: vsphere got a bit broken during the provisioner changes. I'm going to update common.Bootstrap to do the zone selection02:21
axwmight leave the other providers alone for now, just aim to get this fixed for beta202:22
wallyworldyikes, ok02:22
wallyworldsgtm02:22
=== akhavr1 is now known as akhavr
axw_wallyworld: I connected to VPN so not sure you saw this: wallyworld: https://github.com/juju/juju/pull/8007 fixes the vsphere issue, do you have time to review?06:02
wallyworldaxw_: oh awesome, tyvm, looking in a sec06:03
axw_veebers: I've got a fix ^ for the vsphere issue that balloons mentioned, but braixen is still buggered. if you bootstrap with "--to zone=aron.internal", you'll get a "Connection refused" error. I have no idea where that's coming from, or how to debug that - do you?06:06
jamaxw: I looked at it as well06:35
jamballoons: veebers: the bot has repeatedly tried to run my patches, and seems to be starting, but failing to comment when the merge fails06:36
axwjam: thanks06:36
axwwallyworld: and thanks06:36
wallyworldnp, thanks for fixing06:36
jamveebers: ci.jujucharms.com/job/github-merge-juju/437/console is one of those cases06:36
wallyworldjam: the point of moving to a single az param to start instance is to get that selection logic out of the providers06:40
wallyworldit's been moved up to eg the provisioner06:41
wallyworldby the time start instance is called, we will have selected a single az06:41
jamwallyworld: storage and networking both have a heavy impact on az selection06:41
jamwallyworld: which means all you're really doing is changing *which* function on the provider selects the az06:41
wallyworldthey do; i'd have to read the code to refresh my memory06:42
wallyworldjam: i see axw is responding to your pr comments, so i'll defer to those save having same conversation in 2 places06:44
axwjam wallyworld: the provisioner (and with this PR, bootstrap) first consult the provider via DeriveAvailabilityZone. if the provider says it doesn't care, then we iterate over the zones.06:44
axwjam: DeriveAvailabilityZone does things like returning a zone based on placement, or based on the volume attachments, spaces, etc.06:44
wallyworldright, the provider still gets to filter out unsuitable zones06:45
jamaxw: does it also account for the subnet= provisioning that we have for AWS (which was quite well received, and we'd like to extend to other providers)06:45
jamaxw: I'm not 100% sure why DeriveAZ is better than just StartInstance06:45
axwjam: AFAIK that's unchanged06:45
axwjam: in two words: provisioner parallelisation06:46
jamaxw: subnet= *also* impacts zone06:46
axwjam: yep, that code is unchanged. Derive... will still pull out the zone from subnet=06:46
jamaxw: how does this impact parallel provisioning at all?06:47
jamyou still need to do a derive for every instance, right?06:47
jamwhether it is done inside or outside the StartInstance call06:47
axwjam: derivation isn't the problem, automatic spreading is06:48
axwdistribution groups06:48
jamaxw: because the one doesn't exist yet, it doesn't bias us from not using it?06:48
jamaxw: my biggest concern is that Derive only returns 1 entry06:49
jamwhich seems very much wrong06:49
jama space might span multiple zones and you want spreading, or it may not06:49
jamif we want to switch StartInstance to only allow 1 that's fine06:49
jambut Derive can certainly return multiple possibilities06:49
axwjam: like I said in the PR, what's there satisfies today's needs. we can easily change it as needed06:50
axwjam: FWIW I did originally advocate for it returning multiple, I just didn't recall subnet= when I was talked down :)06:52
axwjam: on an entirely different topic, I just ran a couple of tests upgrading from an older version of 2.3-beta2 (698a34c22f2176a6de8cd24c5ea4aa5b11637069) to tip of develop, and it works06:53
axwjam: the second time the upgrade took a while, which might be related to the errors - but it did eventually upgrade06:53
jamwow, just saw NEWS entry that there was a van attack in NYC today06:53
jamaxw: k. it might have been in progress, was mostly concerned about a 2.2 to a 2.3 being an issue06:54
axwoh crap :(06:54
jamaxw: can I talk you back up? :)06:57
jamaxw: does Derive take the Placement directives (it seems like it must)06:57
axwjam: yes it does07:04
axwjam: I'm happy for it to change, but not in this PR07:04
veebersjam: ack, seems it'll be an error in the check script. I'll fix that up first thing in my morn07:04
veeberssorry about that :-\07:05
axwwallyworld: http://ci.jujucharms.com/job/github-merge-juju/440/console <- error in remote relation firewaller test, FYI in case there's a real issue08:19
wallyworldta, will look08:19
wallyworldaxw: i have a recollection of seeing that failure ages ago but not since. i'll see if i can repro locally08:28
axwokey dokey08:29
jamballoons: bug #1729248 I went to investigate test suite failures merging code into develop10:17
mupBug #1729248: ProvisionerSuite.TestProvisionerStopRetryingIfDying fails intermittently <intermittent-failure> <juju:Triaged> <https://launchpad.net/bugs/1729248>10:17
jamhowever, the stack trace that it is giving appears to be *2.2* code10:17
jamthough even there, the numbers don't line up10:18
jamI can't say for sure, but the string it says is on the line that failed "stop(c, p)" was removed in 2.3 and only exists in the 2.2 codebase10:19
jamnow, that line *does* seem to exist in the branch from Anastasia, but it *shouldn't* exist after merging into develop10:23
jamis it possible that there is something wrong with the build that it isn't properly merging into develop/pulling source code from the original branch rather than the merged result?10:24
jamaxw: I have a concern about the merge bot, are you around possibly to discuss?10:25
axwjam: I'm here but my wife and kids just got home. I can spare a couple of minutes before I head down10:30
axwjam: when did you see that error? I think I fixed it as a drive by in my PR10:31
axwjam: and my PR also removed "stop(c, p)" :)10:31
axwso I think that explains that10:31
axwjam: linked in the bug, gtg - have a good day10:34
jamaxw: have a good day. The issue is less about the test failing randomly, and more that the bot seems to be running the test suite against the PR code *rather* than against the merged PR + Develop code11:11
jamthough maybe your code landed right in-between run 1 and run 2 ?11:12
jamcould be11:12
balloonsSomething weird going on?11:13
jamballoons: we've had an intermittent test, and I went to reproduce it, but couldn't get my source tree to line up with the failure lines reported in the merge request11:14
jamballoons: but I think I've sorted it out11:14
balloonsaxw, thanks for fixing the vsphere issue, though as you've noted, I still don't know why the DC and vcenter refuse to talk to each other11:15
jamAnastasia tried to merge against version X, which failed with intermittent failure. Andrew merged his code which fixed that in passing, then Anastasia landed her code, which now didn't have the failure either11:15
anastasiamacjam: balloons: thnx for figuring this out \o/11:23
jamanastasiamac: well, it seems to be mostly axw doing a driveby fix11:23
anastasiamacaxw: \o/ (i rescind all previous \o/ on the subject then)11:24
jam:,(11:24
jam:)11:24
anastasiamacjam: but thank you for looking in my absence11:25
balloonsI just asked a question :-)11:27
anastasiamacballoons: and a valid question it was!11:28
ryebotI'm trying to bootstrap a controller with strict egress restrictions. I've set up an apt mirror and bootstrapped using it with --config apt-mirror=<my mirror>, however I'm still seeing apt operations fail because they're trying to access security.ubuntu.com. Is there a way to make them point at my apt mirror, or otherwise avoid this issue?13:01
jamaxw: so, i just created a controller running 500 applications, each with 3 units to test out the leadership code14:05
jamaxw: when doing an upgrade, it spins *hard* on corrupt lease document14:05
jamaxw: and while doing that, it actually ends up running out of open file handles14:05
jamaxw: looks like your changes to leases might account for a 100:70 improvement (whether you want to call that down 30% or 100/70 = 40% improvement or whatever). But getting away from a shared clock document is a pretty big win.15:08
=== TikTok is now known as michealb
wallyworld_balloons: veebers: we having release call?21:33
axwjam: bugger. I'm still looking for a way to fix that. the improvement sounds nice though, thanks for testing22:45
