[00:01] da fuq ?
[00:06] davecheney: yeah, this seems to be the fundamental problem behind the lxc containers not upgrading
[00:06] * thumper is still digging
[00:26] * thumper tries to ignore work for a bit and go to lunch
=== natefinch-afk is now known as natefinch
[01:10] thumper: based on circumstantial evidence only it looks like a stuck lease/leadership worker is behind bug 1466565
[01:10] Bug #1466565: Upgraded juju to 1.24 dies shortly after starting
[01:11] thumper: based on a log message indicating that a watcher fired due to a change in the leases collection long after just about everything else was dead
[01:11] * menn0 goes to try a quick repro
[01:18] thumper: http://paste.ubuntu.com/11776424/
[01:18] 5 races, including the obscure apiserver one
[01:18] that we talked about in the standup
[01:20] dave, always playing the race card
[01:27] ericsnow: you around?
=== kadams54 is now known as kadams54-away
[01:49] thumper: who maintains gomaasapi?
[01:49] https://bugs.launchpad.net/juju-core/+bug/1468972
[01:49] Bug #1468972: provider/maas: race in launchpad.net/gomaasapi
[01:51] Bug #1468972 opened: provider/maas: race in launchpad.net/gomaasapi
[01:52] thumper: bingo... able to repro
[01:54] wallyworld: we're never doing another 1.23 release again, are we?
[01:55] no
[01:55] that's the plan
[01:55] cool
[01:56] wallyworld: the reason I ask is that I'm looking at a problem upgrading out of a 1.23 env which seems fairly easy to hit (almost certainly due to the lease/leadership workers not exiting)
[01:57] hmmm
[01:57] wallyworld: adam c has hit it and I can repro it pretty easily
[01:57] wallyworld: seems like anyone who ended up on 1.23 could have trouble getting off it
[01:57] i guess we could do another release then
[01:57] wallyworld: that wouldn't help
[01:57] or have to
=== kadams54-away is now known as kadams54
[01:57] wallyworld: they wouldn't be able to upgrade to that either
[01:58] ah yeah
[01:58] wallyworld: the issue is preventing the agent from exiting to restart into the new version
[01:58] is there a workaround we can document?
[01:58] wallyworld: it should be possible to work around it by manually setting the symlink
[01:58] menn0: I *think* killing the jujud process would fix it
[01:58] that will have to be what we do then i guess
[01:59] it just deadlocks when shutting down
[01:59] if it's the bug I fixed
[01:59] axw: no, that doesn't help because the symlink gets changed as one of the very last things that jujud does b4 it exits
[01:59] ah right
[01:59] axw: and b/c some workers aren't finishing it's not getting to that
[01:59] * axw nods
[02:00] menn0: btw, reviewed your branches. sorry for not doing so yesterday
[02:00] adam gets a minute or so of working Juju before it wants to restart and then gets stuck
[02:00] axw: thanks. no worries.
[02:02] axw: good catches for both of the problems you noticed
=== anthonyf is now known as Guest78303
[02:12] davecheney: technically we maintain gomaasapi
[02:13] menn0: which repro are you talking about?
[02:13] launchpad.net/gomaasapi
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
=== kadams54 is now known as kadams54-away
[02:44] thumper: bug 1466565
[02:44] Bug #1466565: Upgraded juju to 1.24 dies shortly after starting
[02:45] menn0: yes?
[02:45] thumper: this is pretty serious actually... anyone who upgraded to 1.23 is likely to have a hard time getting off it
[02:45] thumper: see the ticket for trivial repro steps
[02:45] thumper: manual steps are required to upgrade
[02:46] thumper: the culprit appears to be the lease worker not honouring kill requests
[02:46] * thumper nods
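For background on the exchange above: juju agent workers of this era manage their lifecycle with a tomb-style Kill/Wait handshake, and the agent only switches the tools symlink and restarts into the new version once every worker has exited. The following is a minimal, hypothetical sketch of that pattern (using gopkg.in/tomb.v1; it is not the actual lease worker code), showing the select on Dying() that a worker which "doesn't honour kill requests" is effectively missing:

```go
// Hypothetical illustration only: a kill-aware worker loop in the
// gopkg.in/tomb.v1 style. It is not the real lease worker; it just shows
// why a loop that never selects on Dying() blocks agent shutdown, and
// therefore blocks the restart into the new tools version.
package main

import (
	"time"

	"gopkg.in/tomb.v1"
)

type worker struct {
	tomb tomb.Tomb
}

func newWorker() *worker {
	w := &worker{}
	go func() {
		defer w.tomb.Done()
		w.tomb.Kill(w.loop())
	}()
	return w
}

func (w *worker) loop() error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-w.tomb.Dying():
			// Honour the kill request. Without this case the worker
			// never exits, the agent never finishes shutting down, and
			// the tools symlink is never switched to the new version.
			return tomb.ErrDying
		case <-ticker.C:
			// Periodic work (e.g. extending a lease) would happen here.
		}
	}
}

func (w *worker) Kill()       { w.tomb.Kill(nil) }
func (w *worker) Wait() error { return w.tomb.Wait() }

func main() {
	w := newWorker()
	w.Kill()
	if err := w.Wait(); err != nil {
		panic(err)
	}
}
```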
[02:58] wallyworld: I'm playing around with the Azure portal, which looks like it's using the new model... and putting machines in the same AS still forces them to the same domain-name/IP
[02:58] :(
[02:58] oh :-(
[02:58] can you email the ms guys we have been talking to and ask about it?
[03:00] wallyworld: ok
[03:00] ty, may not be the answer we want but at least they may be able to explain why etc
[03:09] Bug #1466565 changed: Upgraded juju to 1.24 dies shortly after starting
[03:13] axw: there's a blue card in the Next lane - binding volumes/filesystems. That one has actually been done as part of the volume deletion work
[03:14] wallyworld: yes, apart from UI to change binding
[03:14] wallyworld: so I'll change it to just the missing bits
[03:14] axw: so i reckon we should add an unplanned card worth 5 or 8 for the work done
[03:15] wallyworld: it was part of the persistent volume deletion
[03:15] which was just woefully underestimated
[03:15] yep, i underestimated the resources card too :-(
[03:15] axw: also, if/when you get a chance ptal at the resources pr again :-)
[03:16] wallyworld: sure, just writing this email to guy
[03:16] np
[03:21] Bug #1466565 opened: Upgraded juju to 1.24 dies shortly after starting
[03:27] wallyworld: sorry, dunno why I thought you were storing the URL now. I think I saw the params struct and thought that's what you were storing in state
[03:28] np
[03:31] wallyworld: LGTM
[03:31] yay, ty
[03:33] Bug #1466565 changed: Upgraded juju to 1.24 dies shortly after starting
[04:00] omg so much fail
[04:01] you pull a string and broken stuff appears everywhere
[04:12] wallyworld, axw: can you join a hangout plz?
[04:12] https://plus.google.com/hangouts/_/canonical.com/onyx-standup
[04:13] thumper: omw
[04:15] thumper: are you in? just says "trying to join the call"
[04:15] axw: I had that earlier today too...
[04:15] * thumper tries a direct invite
=== kadams54 is now known as kadams54-away
[04:28] axw: when did this commit land BTW?
[04:28] thumper: 1.24
[04:29] I'm wondering if we should pull 1.24.1
[04:29] because this problem will stop any non-state server upgrading I think
[04:29] thumper: probably not a bad idea. how come this got through CI? is it only affecting things that don't support KVM?
[04:30] no idea
[04:30] maybe...
[04:30] there is an open issue though about CI around upgrades
[04:30] as we have found so many upgrade problems
[04:30] which CI didn't catch
[04:30] thumper: got the OK from OIL too I think, though not sure if they do upgrade or clean install
[04:31] I assigned you to the wrong bug
[04:31] hang on
[04:32] thumper: ta
[04:36] bug 1466969
[04:36] Bug #1466969: Upgrading 1.20.14 -> 1.24.0 fails
[04:36] hurngh, can't test because I have vivid
[04:36] should fail from 1.23 I guess
[04:36] 1.23 is terrible
[04:36] you can't upgrade from 1.23 due to lease / leadership issues
[04:37] try 1.22 or 1.20
[04:37] I have some 1.20.14 binaries if you want them :)
[04:37] thumper: I can build them, juju 1.20 doesn't work on vivid
[04:37] no systemd
[04:37] ugh
[04:38] geez
[04:38] never mind, I'll work something out
[04:38] axw: you could reproduce in ec2
[04:38] yep. I think I have a VM anyway
[04:38] axw: by deploying ubuntu into a container
[04:38] ok
[04:39] axw: was this for 1.24.1 or 1.24.0?
[04:39] axw: because there is another bug about failing to upgrade from 1.24.0 to 1.24.1
[04:39] thumper: pretty sure 1.24, I'll double check
[04:40] .0 I mean
[04:40] * thumper wouldn't be surprised if it is a different bug
[04:40] so many bugs
[04:40] :-(
[04:41] thumper: yep, 1.24.0
[04:41] ok... so this other upgrade problem is something else
[04:41] * thumper takes a deep breath
[04:42] thumper: how do I work around this syslog upgrade issue?
[04:42] upgrade to 1.24.2.1 failed (will retry): move syslog config from LogDir to DataDir: error(s) while moving old syslog config files: invalid argument
[04:42] ha
[04:42] I build from the 1.24.1 tag
[04:43] I see, that was only broken in 1.24.2 ?
[04:43] or mkdir /etc/juju-
[04:43] yep
[04:43] it is the commit after updating the version to 1.24.2
[04:43] okey dokey
[04:43] I'll try that
[05:06] Bug #1468994 opened: Multi-env unsafe leadership documents written to settings collection
[05:17] thumper: digging into the leadership settings issue... the _id field was being prefixed correctly
[05:17] thumper: but the env-uuid field wasn't being added
[05:18] thumper: so there's no cross-env leakage issues, but the upgrade step definitely gets confused
[05:18] * menn0 updates ticket
[05:28] axw: can I get a quick review of http://reviews.vapour.ws/r/2036/ please
[05:28] it's a one-liner :)
[05:28] menn0: sure
[05:29] menn0: is there a minimal test you can add for it? or is that coming later?
[05:30] axw: i'll have a look... i didn't have to change any tests when making this change
[05:30] menn0: right, but we had missing test coverage right?
[05:31] menn0: maybe not worthwhile. I'll LGTM and leave it to your discretion
[05:32] axw: thinking about it, a test at this layer doesn't make sense since it's actually the responsibility of a lower level to add the env-uuid field
[05:32] menn0: fair enough
[05:32] axw: the fact that the lower layer didn't blow up when given a doc like this will be fixed in a later PR
[05:33] axw: and tested there
[05:33] menn0: SGTM
[05:33] shipit
[05:33] axw: cheers
[05:36] thumper: https://github.com/juju/juju/pull/2662 and https://github.com/juju/juju/pull/2661 are merging now. they're the minimum fixes for the leadership settings doc env-uuid issue for 1.24 and master. More to come to avoid this kind of thing in the future of course.
[06:25] thumper: seems there's another problem too :/ 2015-06-26 06:22:31 ERROR juju.worker runner.go:218 exited "api": login for "machine-1" blocked because upgrade in progress
[06:26] thumper: (machine-1 hasn't upgraded yet)
[08:17] voidspace, dooferlad, hey guys, since you're on-call reviewers today, along with fwereade, please review any non-reviewed PRs with priority
[08:17] dimitern, am so doing :)
[08:24] fwereade, cheers :)
[08:30] dimitern: on it.
[08:30] dooferlad, ta!
[08:30] cool
[08:30] dimitern: the other topic for the day seems to be bootstack related. Should we sync up with Peter now?
[08:31] dooferlad, I'm talking to him in #juju @c
[08:31] dimitern: ah, I was expecting on a different channel.
[09:02] dooferlad, standup?
[09:58] Bug #1469077 opened: Leadership claims, document larger than capped size
[11:30] Hello !
[11:30] submitted two bugs last night.
[11:30] [1] https://bugs.launchpad.net/charms/+source/quantum-gateway/+bug/1468939
[11:30] Bug #1468939: Instances fail to get metadata: The 'service_metadata_proxy' option must be enabled.
[11:31] https://bugs.launchpad.net/charms/+source/nova-cloud-controller/+bug/1468918/
[11:31] Bug #1468918: neutron-server fails to start; python-neutron-vpnaas and python-neutron-lbaas packages are missing.
[11:32] jamespage: Hello
[12:01] Bug #1469130 opened: tools migration fails when upgrading 1.20.14 to 1.24.1 on ec2
[12:02] mattyw, would you close http://reviews.vapour.ws/r/1460/ one way or the other? looks like it has ship-its
[12:03] fwereade, oooh, had forgotten about this
[12:03] mattyw, cheers
[12:03] fwereade, the comments seem controversial - care to make a casting vote - land or just close?
[12:04] mattyw, I'm inclined to trust dave and andrew's apparent approval; nobody's complained, so land it
[12:05] fwereade, landing, thanks very much
[12:06] fwereade, thanks for noticing, had totally forgotten about this
[12:15] niedbalski, niedbalski_: so, I'm sorry, I don't know what happened with your patches http://reviews.vapour.ws/r/1698/ and http://reviews.vapour.ws/r/1717/ ; it seems they got ship-its but never landed? if you check whether they need updating, and let me know their status, I will make sure they get landed
[12:16] dimitern, http://reviews.vapour.ws/r/1403/ ?
[12:24] fwereade, looking
[12:25] fwereade, that needs to land yes, it's been a while
[12:26] fwereade, I'll fix/respond to the current reviews and ask you for a final stamp
[12:26] dimitern, cool
[12:38] Syed_A, which openstack release?
[12:38] jamespage: Kilo
[12:38] Syed_A, for that second bug, neutron-server is not supported on nova-cloud-controller - you have to use the neutron-api charm
[12:38] that applies for >= kilo
[12:40] Syed_A, can you make sure that your quantum-gateway charm is up-to-date - the kilo template should have the right things set
[12:40] jamespage: Ok, so if i deploy the neutron-api charm i wouldn't need to install vpnaas or lbaas ?
[12:40] Syed_A, the neutron-api charm knows how to deploy those things for >= kilo
[12:40] jamespage: Roger that.
[12:41] it will enable them - nova-cloud-controller only supported 'embedded neutron-server' up to juno I think
[12:41] jamespage: This may be a silly question but how can i make sure that the quantum-gateway charm is up-to-date ?
[12:41] fwereade, updated http://reviews.vapour.ws/r/1403/ PTAL
[12:41] Syed_A, are you deployed from branches or from the juju charm store?
[12:49] jamespage: juju charm store.
[12:49] Syed_A, which version does 'juju status' say you have deployed then?
[12:50] Syed_A, version 16 has the required templates:
[12:50] https://api.jujucharms.com/charmstore/v4/trusty/quantum-gateway-16/archive/templates/kilo/nova.conf
[12:50] Ok... checking ...
[12:53] jamespage: charm: cs:trusty/quantum-gateway-16
[12:54] dimitern, LGTM
[12:54] Syed_A, what's your openstack-origin configuration?
[12:58] jamespage: Unfortunately, in this setup openstack-origin is not present but there is an ansible variable which specifies the openstack release, which is set to kilo.
[12:58] jamespage: The variable is used to set this repository, repo="deb http://ubuntu-cloud.archive.canonical.com/ubuntu {{ ansible_lsb.codename }}-updates/{{ openstack_release }} main"
[12:59] Syed_A, I need to understand what the charm thinks it should be doing
[12:59] if openstack-origin is not set correctly, it won't use the right templates
[12:59] jamespage: Ok, i am going to set openstack_origin in the config right now.
[12:59] irrespective of what you put in sources :)
[13:00] Syed_A, this may have worked in the past, but for the last release we switched how we determine openstack series to support the deploy from source feature in the charms
[13:01] Syed_A, my statement about openstack-origin will apply across all of the openstack charms btw
[13:01] the template loader is constructed based on that configuration
[13:01] so it will assume a default of icehouse on trusty for example
[13:03] jamespage: Ohhh, i got it, so this might be the reason why this charm, which used to work fine, now fails.
[13:03] Syed_A, that's quite possible
[13:04] Syed_A, before we determined version based on packages installed - however for deploy from source, there are not any openstack packages installed :-)
[13:14] fwereade, last look? http://reviews.vapour.ws/r/1403/
[13:15] dimitern, if that's all you changed just land it :)
[13:15] fwereade, cheers :) will do
[13:42] jamespage: I am deploying a fresh setup with these configs. [1] http://paste.ubuntu.com/11778630/ && [2] http://paste.ubuntu.com/11778641/
[13:43] Syed_A, openstack-dashboard needs openstack-origin as well
[13:43] but looks much better
[13:43] Syed_A, I must introduce you to bundles :-)
[13:44] jamespage: bundles ? :)
[13:44] Syed_A, hmm - you're doing a lot of --to=X to the same machines ?
[13:45] jamespage: Yes, specifying exactly where a service should go. Isn't that a good practice?
[13:45] Syed_A, bundles - https://jujucharms.com/openstack-base/
[13:46] Syed_A, pushing multiple services onto the same machines without using containers won't work
[13:47] Syed_A, https://wiki.ubuntu.com/ServerTeam/OpenStackCharms/ProviderColocationSupport
[13:50] jamespage: This is why i was working on an lxc based OpenStack deployment. But for now we are just deploying nova-compute and quantum-gateway on separate machines, which used to work in the past.
[13:50] jamespage: Our lxc based bits are also ready, just need to patch the lxc-ubuntu-cloud template for our 3 nics per container requirement.
[13:51] Syed_A, I thought you were - good
[13:51] your pastebin confused me
[13:52] jamespage: Sorry about that. alice(controller) is 1, bob(compute) is 2 and charlie(quantum-gateway) is 3. :)
[13:52] Syed_A, but you are going to use lxc containers right?
[13:53] jamespage: No, not in this setup.
[13:53] Syed_A, most of the controller services won't work
[13:53] Syed_A, they assume control over the filesystem, so are not safe to deploy without containers
[13:55] jamespage: ohhh that would be a problem. :/
[13:56] Syed_A, yeah - I know they will all at least conflict on haproxy configuration
[13:56] Syed_A, we enable that by default now
[13:57] jamespage: for haproxy, we have a customized haproxy.cfg which fixes the issue
[14:02] * fwereade was up until 2 last night, taking an extended break, may or may not be back at a reasonable time
[14:03] Syed_A, you guys are terrifying me - all I can say is ymmv
[14:04] Has anyone seen a problem with the GCE provider today? The juju bootstrap command is giving this error: ERROR failed to bootstrap environment: cannot start bootstrap instance: no "trusty" images in us-central1 with arches [amd64 arm64 armhf i386 ppc64el]
[14:07] mbruzek: I am in #cloudware. I haven't gotten any answers
[14:08] mbruzek: there are NO images for gce http://cloud-images.ubuntu.com/releases/streams/v1/com.ubuntu.cloud:released:gce.sjson
[14:08] jamespage: Our goal is to eventually move towards lxc based openstack deployment as suggested by the community. Right now i am only trying to fix this issue for the time being. We have every intention to follow the process as suggested on the ubuntu wiki.
[14:08] sinzui: strange that this worked before, I am just seeing this error today
[14:09] mbruzek: CI tests gce, we saw the failure about 15 hours ago.
[14:10] sinzui: Did you file a bug that I can contribute to?
[14:10] mbruzek: no, because this is an ops issue. I am not aware of a project for gce images
[14:11] mbruzek: I am crafting an email asking for someone with power to explain the situation
[14:17] jamespage: You were right about the conflict at haproxy, neutron-api failed to install and logs this: INFO install error: cannot open 9696/tcp (unit "neutron-api/0"): conflicts with existing 9696/tcp (unit "nova-cloud-controller/0")
[14:20] jamespage: Looks like nova-cloud-controller and neutron-api are both installing neutron-server.
[14:21] Syed_A, yes
[14:21] Syed_A, hmm - yes - that won't work well on a single unit
[14:21] Syed_A, there is a huge assumption in the charms that they 'own' the unit
[14:25] jamespage: Ok, so how can i stop nova-cloud-controller from installing neutron-server?
[14:29] jamespage: Will it work if i deploy the neutron-api unit on the quantum-gateway node ?
[14:30] Syed_A, nope - neutron-api will trample all over the gateway charm's config files
[14:30] jamespage: compute node then ?
[14:30] Syed_A, nova-cc decides to stop managing neutron-server - but not straight away
[14:30] Syed_A, same problem - but this time neutron-openvswitch's config files
[14:30] Syed_A, the charms are just not designed for this type of use
[14:32] Bug #1469184 opened: listSuite teardown fails
[14:32] Bug #1469186 opened: ContextRelationSuite teardown fails
[14:32] jamespage: Don't you think charms should be able to deploy a standalone controller node, say a VM?
[14:34] Syed_A, I'm averse to changing the design principle each charm has in that it 'owns' the unit filesystem
[14:35] Syed_A, LXC containers give us a lightweight way to manage this, without having to have a lot of complexity in the charms to deal with this problem
[14:37] jamespage: I am inclined to agree with you. LXC works better, but the use case that somebody might want to deploy an openstack controller node without using lxc is a valid use case.
[14:38] Syed_A, I don't disagree with that - just saying maybe the charms are not the right way to fulfil that
[14:45] fwereade: why did we write our own RPC implementation when there's one in the stdlib?
[15:02] Bug #1469189 opened: unitUpgraderSuite teardown panic
[15:02] Bug #1469193 opened: juju selects wrong address for API
[15:02] Bug #1469196 opened: runlistener nil pointer / invalid address
[15:02] jamespage: Ok let's say i fix the neutron-server manually but what about the instance metadata not working ?
[15:02] Syed_A, that should be fixed by correctly specifying openstack-origin
[15:06] jamespage: testing ...
=== kadams54 is now known as kadams54-away
=== kadams54-away is now known as kadams54
[15:33] jamespage: Ok so instance metadata is working.
[15:33] natefinch, I can't remember what it does that the stdlib one didn't, but I know it was something :/
[15:33] jamespage: As per your suggestion, correctly specifying openstack-origin fixed the issue.
[15:33] natefinch, rogpeppe would remember
[15:34] natefinch: there were a few reasons
[15:34] natefinch: the main one is that with the stdlib version you don't get to have per-connection context
[15:35] rogpeppe: ahh, interesting, yeah
[15:35] natefinch: also, the way you have to phrase the stdlib methods is awkward
[15:38] jamespage: If somebody is deploying openstack on a public cloud and they cannot use lxc, the suggestion here would be to start a new vm and install neutron-api as a standalone unit there?
[15:41] Bug #1469199 opened: State server seems to have died
[15:41] rogpeppe: yeah, the stdlib way is kind of annoying, I'm surprised they didn't do it the way ours does... (traditional val, error return)... but I'm sure there was a reason at the time
[15:41] natefinch: it's simpler to implement the way they did it
[15:42] natefinch: but my reasoning was we were going to be writing lots of API entry points, so the additional complexity in the rpc package was worth it
[15:46] mgz: ping
[15:49] voidspace: hey
[15:49] mgz: it's alright, I think I've sorted it
[15:50] mgz: had a question about gomaasapi which you seem to have touched
[15:50] voidspace: okay, I shall remain in the dark
[15:51] mgz: heh
[15:53] mgz: I hate creating JSON maps in Go :-/
[15:53] voidspace: it is not the most fun
[15:56] Syed_A, yes - but that is very much an edge case
[15:57] most clouds are deployed on metal :-)
[15:57] Syed_A, in fact what you suggest is exactly how we test the openstack charms - we have a small QA cloud (5 compute nodes) which we can stand up a full openstack cloud on top of
[15:57] we can run ~15 clouds in parallel
[15:57] and do things like test HA etc...
[15:58] jamespage: Correct, most clouds are deployed on metal. But with the latest charms neutron-api and nova-cloud-controller cannot be installed on the same physical machine ?
[16:00] Syed_A, that is absolutely the case - and you will hit issues with other conflicts as well
[16:00] Syed_A, which is why we have https://wiki.ubuntu.com/ServerTeam/OpenStackCharms/ProviderColocationSupport
[16:00] jamespage: We also have a small setup where we test openstack. I set up an HA LXC openstack setup last week. Which was fun :)
[16:00] :-)
[16:00] Syed_A, it's neat - the qa cloud i refer to is juju deployed, and is HA control plane under lxc as well
[16:02] jamespage: Cool !
[16:18] natefinch: regarding RB, did you mean the GH integration isn't working or something else?
=== kadams54 is now known as kadams54-away
[16:27] mbruzek: gce streams are back
[16:27] sinzui: thank you
[16:55] ericsnow: yes, the GH integration... like, I made a PR vs. juju-process-docker and no review was created on RB
[16:56] ericsnow: I probably just missed a steo
[16:56] ste
[16:56] step
[16:56] arg...
[17:06] natefinch: yeah, the repo did not have the web hook set up (I've added it)
=== kadams54-away is now known as kadams54
[17:08] ericsnow: can you document the steps in the wiki?
[17:09] natefinch: sure
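To make the net/rpc exchange above (15:34-15:42) concrete, here is an illustrative Go sketch of the two method shapes being compared. The net/rpc signature is the real stdlib requirement; the "per-connection" half is only a hand-rolled illustration of the value-plus-error style with connection context, not the actual juju/rpc API (the names Facade, conn and newFacadeForConn are invented):

```go
// Sketch contrasting the two RPC method styles discussed above.
package main

import "fmt"

// net/rpc style: to be usable with rpc.Register, a method must have the shape
//     func (t *T) Method(args ArgsType, reply *ReplyType) error
// so results come back through an out-parameter, and there is no natural
// place to hang state belonging to a single client connection.
type Arith struct{}

type MulArgs struct{ A, B int }
type MulReply struct{ Product int }

func (Arith) Multiply(args *MulArgs, reply *MulReply) error {
	reply.Product = args.A * args.B
	return nil
}

// Value-and-error style: the facade value is constructed per connection, so
// every method can see per-connection context (here just an auth tag) and
// return its result directly.
type conn struct{ authTag string }

type Facade struct{ conn *conn }

func newFacadeForConn(c *conn) *Facade { return &Facade{conn: c} }

func (f *Facade) Multiply(args MulArgs) (MulReply, error) {
	fmt.Printf("call on connection authenticated as %s\n", f.conn.authTag)
	return MulReply{Product: args.A * args.B}, nil
}

func main() {
	// stdlib shape: reply via pointer argument.
	var reply MulReply
	_ = Arith{}.Multiply(&MulArgs{A: 6, B: 7}, &reply)
	fmt.Println("net/rpc style:", reply.Product)

	// value+error shape with per-connection state.
	f := newFacadeForConn(&conn{authTag: "machine-1"})
	r, _ := f.Multiply(MulArgs{A: 6, B: 7})
	fmt.Println("value+error style:", r.Product)
}
```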
[18:03] ericsnow: so, process server api in process/api/server.go?
[18:04] natefinch: how about process/api/server/uniter.go
[18:04] natefinch: params would live in process/api/params.go
[18:05] ericsnow: is there a reason to split out the params, server, and client stuff? if each one is fairly simple and probably fits in a single file...
[18:06] natefinch: my expectation is that it won't fit well in a single file
[18:06] ericsnow: ok
=== kadams54 is now known as kadams54-away
[19:43] ericsnow: when are those state functions getting merged into the feature branch?
[19:44] natefinch: likely not before Monday
[19:44] ericsnow: ok
[20:21] this whole "duplicate every single struct in the API" thing gets really tiresome
[21:51] Bug #1469318 opened: apiserver: TestAgentConnectionsShutDownWhenStateDies takes > 30 seconds to run
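On the "duplicate every single struct in the API" complaint (20:21) and the earlier process/api/params.go layout discussion: the pattern being groused about is roughly the one sketched below, with an internal type mirrored by a wire-level params struct and hand-written conversions between the two so the wire format can evolve independently. The type names here are invented for illustration and are not juju's real ones:

```go
// Purely illustrative sketch of the internal-type / params-type duplication.
package main

import "fmt"

// Internal representation, as state or a worker might hold it.
type ProcessInfo struct {
	Name   string
	Status int
}

// API params mirror: a separate struct so the wire format stays stable and
// carries its own serialization tags, even if the internal type changes.
type ProcessInfoParams struct {
	Name   string `json:"name"`
	Status int    `json:"status"`
}

// The tiresome part: hand-written conversions in both directions.
func toParams(p ProcessInfo) ProcessInfoParams {
	return ProcessInfoParams{Name: p.Name, Status: p.Status}
}

func fromParams(p ProcessInfoParams) ProcessInfo {
	return ProcessInfo{Name: p.Name, Status: p.Status}
}

func main() {
	in := ProcessInfo{Name: "docker/webapp", Status: 1}
	fmt.Printf("%+v\n", fromParams(toParams(in)))
}
```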