[00:01] <menn0> davecheney: I knew that but didn't think you were around
[00:02] <davecheney> got back on monday
[00:02] <davecheney> took a few days off to recover
[00:02] <davecheney> now back to hunting races
[00:02] <davecheney> \o/
[00:25] <anastasiamac> wow! first time experiencing an earthquake in Brisbane... 5.3 @ 35km off the Coral Sea :D
[07:29] <mup> Bug #1479653 opened: state depends on system clock <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1479653>
[08:22] <TheMue> dimitern: ping
[08:23] <dimitern> TheMue, pong
[08:25] <TheMue> dimitern: feeling better today, good for work, but not for hangout. but getting closer with the test problem. assigning a new IP address to a machine that existed before (!) the worker started leads to the wanted event. but if the machine is created after the worker, no events are raised. interesting behavior, will add more logging
[08:27] <TheMue> dimitern: like you yesterday, I wondered why the 0.1.2.9 is never shown in the logs.
[08:27] <dimitern> TheMue, I'm glad you're feeling better
[08:27] <TheMue> dimitern: now I simply created a 0.1.2.10 and added it to the 1st machine initialized in SetUp; funnily, it is logged
[08:28] <dimitern> TheMue, hmm this feels like the EntityWatcher is forwarding the initial event, but then it's not triggering on change?
[08:28] <TheMue> dimitern: thanks, thankfully our job allows us to work under a blanket ;)
[08:29] <TheMue> dimitern: the change comes later, but it is related to an already existing machine
[08:29] <dimitern> TheMue, :) yeah
[08:29] <dimitern> TheMue, not quite sure I follow you - please describe the steps when the event is triggered (and when it's not) in detail
[08:30] <TheMue> dimitern: IMHO the state.WatchIPAddresses() should react on state.AddIPAddress()
[08:30] <TheMue> dimitern: ok, prepare a small paste
[08:31] <dimitern> TheMue, hmm wait
[08:31] <dimitern> TheMue, AddIPAddress adds an alive address, right?
[08:32] <dimitern> TheMue, doesn't the entity watcher only trigger on dead addresses?
[08:32] <TheMue> dimitern: http://paste.ubuntu.com/11965055/
[08:34] <TheMue> dimitern: IMHO not, but would have to look. It's only a mapping StringsWatcher, which maps the received string values to their corresponding entity tags
[08:35] <TheMue> dimitern: I wondered, because another existing test adding a new IP doesn't fail. but it uses the existing machine. so I added this fragment to my failing test and found the astonishing behavior.
[08:39] <dimitern> TheMue, looking at the code to remind myself what was implemented
[08:40] <TheMue> dimitern: /me too, digging deeper and adding more logs (have to remove them afterwards, phew)
[08:51] <dimitern> TheMue, so the worker starts the watcher on SetUp ?
[08:51] <dimitern> TheMue, show me your latest branch code please
[08:52] <TheMue> dimitern: the branch is here: https://github.com/TheMue/juju/tree/addresser-worker-using-api
[08:53] <dimitern> TheMue, so far, looking at state, api, and apiserver watcher code I can't see any obvious flaws that might be causing it, so it must be the way the worker uses it
[08:56] <alexisb> fwereade, I will need to steal thumper for 30 minutes
[08:57] <TheMue> dimitern: ah, wait, I've got an idea. may be due to the mix of strings watcher and entity watcher
[08:57] <TheMue> alexisb: heya o/
[08:58] <dimitern> TheMue, I don't see your worker implementing SetUp, where it should start the watcher
[08:58] <alexisb> heya TheMue
[08:58] <dimitern> TheMue, ah, I saw it below
[08:58] <TheMue> dimitern: yep
[08:59] <dimitern> alexisb, hey, btw there's a 30m scheduling conflict for OS+juju call and networking roadmap one
[09:00] <dimitern> alexisb, I guess the OS call is more important than the other one?
[09:02] <alexisb> dimitern, I am not sure what the OS+juju call?
[09:02] <dimitern> alexisb, the one with jamespage
[09:03] <alexisb> dimitern, you are good, they are the same
[09:03] <dimitern> alexisb, I see :) cheers then
[09:11] <dimitern> TheMue, does waitForInitialDead work in your branch (TestWorkerRemovesDeadAddresses) ?
[09:12] <TheMue> dimitern: yes, all other tests are fine
[09:13] <dimitern> TheMue, waitForInitialDead should (almost?) immediately see 2 dead IPs - 0.1.2.4 and 0.1.2.6, right?
[09:14] <TheMue> dimitern: yes, as those belong to machine2
[09:18] <dimitern> TheMue, fetching the IPAddress from state and then calling EnsureDead on it is weird
[09:19] <dimitern> TheMue, it's not what will happen in real life - the address being dead is a side effect of the machine it's assigned to getting destroyed
[09:19] <TheMue> dimitern: you mean in TestWorkerRemovesDeadAddress? that's what I found in the existing tests
[09:19] <TheMue> dimitern: the failing one is TestMachineRemovalTriggersWorker
[09:20] <dimitern> TheMue, but, machine removal should indeed trigger "Set all allocated ips to dead for this machine id"
[09:20] <TheMue> dimitern: so the original test has been wrong all along? I can remove it then
[09:20] <dimitern> TheMue, no :) let's think first why it's failing
[09:21] <dimitern> TheMue, so *only* TestMachineRemovalTriggersWorker fails?
[09:21] <TheMue> dimitern: yes, and the IP address is dead, see the asserts following the machine removal
[09:21] <TheMue> dimitern: exactly, the rest works fine
[09:21] <dimitern> TheMue, then there's the problem :)
[09:22] <TheMue> dimitern: already the adding of the new IP to the new machine isn't reported (at least not as alive), while adding one to a machine existing before the worker is started is reported (see pastebin, the second IP is reported)
[09:23] <dimitern> TheMue, see, removing a machine should include an op to ensure all alive ips of that machine end up dead *without* needing to do anything else (e.g. expecting the provisioner to see the machine getting dead and marking allocated ips as dead)
[09:24] <dimitern> TheMue, so I'd dig more into the list of ops machine.EnsureDead() and/or Remove() includes w.r.t. ipaddressesC
[09:24] <TheMue> dimitern: yeah, just started a state browsing of ipaddressesC to look exactly there ;)
[09:26] <dimitern> TheMue, adding a new alive IP (no need to even allocate it to a machine) *should* trigger the IP addresses watcher, but should be ignored by the worker I *think*
[09:27] <TheMue> dimitern: as it is alive, yes. the logs should show it like "[LOG] 0:00.351 DEBUG juju.worker.addresser IP address ipaddress-03c48ed1-c389-4930-82e1-1df101fb7ab2 is not dead (life "alive"); skipping"
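The skip-alive behavior TheMue's log line shows can be sketched as a toy Go model; the `Life` type and function names here are illustrative, not juju's actual addresser-worker API:

```go
package main

import "fmt"

// Life mirrors juju's alive/dying/dead lifecycle states
// (names are illustrative, not juju's actual API).
type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// handleChange models the addresser worker's filter: only dead
// addresses get released, everything else is skipped, as the
// quoted log line shows for an alive address.
func handleChange(tag string, life Life) (release bool, msg string) {
	if life != Dead {
		return false, fmt.Sprintf("IP address %s is not dead; skipping", tag)
	}
	return true, fmt.Sprintf("releasing dead IP address %s", tag)
}

func main() {
	_, msg := handleChange("ipaddress-03c48ed1", Alive)
	fmt.Println(msg)
}
```

So the watcher firing for a new alive address is harmless; the worker itself decides whether anything needs releasing.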
[09:29] <dimitern> TheMue, I *seriously* hope we dial down this log message to TRACE level (imagine thousands of IPs being reported when rapidly provisioning machines)
[09:30] <TheMue> dimitern: so far I haven't touched the log levels I found, but can do it while finishing the code
[09:30] <dimitern> TheMue, if I've suggested adding it at DEBUG, sorry
[09:31] <TheMue> dimitern: the modification at the worker level has been less than expected, mostly moving the release stuff to the server side and changing the behavior from single to bulk
[09:33] <dimitern> TheMue, state.ensureIPAddressDeadOp looks dangerous on its own - without an assert isAliveDoc (and the corresponding handling of ErrAborted where it's called) it's potentially overwriting the life field of the doc indiscriminately
[09:37] <TheMue> dimitern: I see. so the original intention has been to set the address to dead regardless of its life status? what happens when it isn't alive, i.e. dying or already dead?
[09:37] <TheMue> dimitern: one usage is in Machine.Remove()
[09:38] <dimitern> TheMue, yeah, that's the one without an assert set, the other (with isAliveDoc) is in IPAddress.EnsureDead
[09:38] <TheMue> dimitern: and one is in IPAddress.EnsureDead() with an assert isAliveDoc
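The difference between the two call sites can be modeled in plain Go; this is a toy sketch of the assert semantics, not juju's real mgo/txn ops (`errAborted` stands in for `txn.ErrAborted`):

```go
package main

import (
	"errors"
	"fmt"
)

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// errAborted stands in for mgo/txn's ErrAborted, returned when a
// transaction assert fails.
var errAborted = errors.New("transaction aborted")

type ipAddress struct{ life Life }

// ensureDeadChecked models IPAddress.EnsureDead with the isAliveDoc
// assert: the write only applies while the doc is still alive.
func ensureDeadChecked(doc *ipAddress) error {
	if doc.life != Alive {
		return errAborted
	}
	doc.life = Dead
	return nil
}

// ensureDeadUnchecked models the Machine.Remove usage dimitern
// flags: no assert, so life is overwritten whatever its value.
func ensureDeadUnchecked(doc *ipAddress) {
	doc.life = Dead
}

func main() {
	a := &ipAddress{life: Dying}
	fmt.Println(ensureDeadChecked(a)) // aborted: doc is not alive
	ensureDeadUnchecked(a)            // succeeds unconditionally
	fmt.Println(a.life == Dead)
}
```

The checked variant answers TheMue's question: for a dying or already-dead address the assert fails and the caller must handle the abort; the unchecked variant silently stomps the field.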
[09:38] <TheMue> h5
[09:39]  * TheMue loves mongo transactions <SARCASM OFF/>
[09:46] <dimitern> TheMue, have you tried: 1) adding s.State.StartSync() just after line 234 (asserting the addr is dead); 2) if that doesn't work, try removing one or both of the other StartSync() calls before that, but leave the one introduced in 1)
[09:48] <dimitern> TheMue, looking at the sequence of ops, it looks like waitForReleaseOp is timing out because the apiserver has no chance of observing the address being dead before reading from the dummy ops chan
[09:49] <dimitern> TheMue, however, since opsChan is buffered, that shouldn't be the case (unless buffer size of 10 is somehow not enough)
[09:50] <TheMue> dimitern: already tried with a larger buffer, and played with the StartSync()s. not sure if I've done it how you've described, so I'll do now
[10:00] <dimitern> TheMue, need to get in a call, let's continue later
[10:00] <TheMue> dimitern: ok
[11:29] <perrito666> morning
[11:29]  * perrito666 is devoid of his internet connection
[11:59] <dimitern> TheMue, any luck isolating the issue?
[12:00] <TheMue> dimitern: not yet done, but deeper, heads down in the lifecycle watcher ;) wondering about its merge()
[12:00] <TheMue> dimitern: one moment, showing you an interesting log fragment
[12:01] <dimitern> TheMue, ok
[12:02] <TheMue> dimitern: http://paste.ubuntu.com/11966200/
[12:02] <TheMue> dimitern: so, here the first four addresses are the normal ones
[12:02] <TheMue> dimitern: the 0.1.2.9 is the one for the newly created machine
[12:03] <TheMue> dimitern: the 0.1.2.10 is instead created for the existing machine
[12:05] <TheMue> dimitern: why is the 0.1.2.9 in updates, but not in the updated ids anymore? the only step between is the merge() of the lifecycle watcher and here I'm looking now
[12:05] <dimitern> TheMue, it looks to me the lifecycle watcher is receiving entities with wrongly prefixed IDs
[12:06] <TheMue> dimitern: the updates map contains all known IPs so far, all with the env id as prefix
[12:07] <TheMue> dimitern: so I have to see what merge() exactly does
[12:08] <dimitern> TheMue, hmm that's right - the ids are ok at that point
[12:08]  * TheMue has never been so deep in our watchers. this mix of differently formatted events, transformations, mappings, online queries etc. seems weird sometimes
[12:09] <dimitern> TheMue, merge should combine the updates with the entities with known life and produce ids for the changes
[12:09] <TheMue> dimitern: and you can imagine the large number of debug statements *lol*
[12:10] <TheMue> dimitern: and here it drops the 0.1.2.9, maybe after its machine has been removed <LOOKING />
[12:13] <dimitern> TheMue, nothing should just remove ips without releasing them, the machine removal just triggers "set to dead"
[12:14] <perrito666> TheMue: any part of juju, upon detailed inspection, looks weird
[12:14] <TheMue> perrito666: *rofl* thanks for motivational remarks from Argentina
[12:14] <dimitern> TheMue, and I don't get why 0.1.2.3 is even there
[12:14] <TheMue> perrito666: heya btw
[12:14] <perrito666> TheMue: hi :)
[12:16] <TheMue> dimitern: you mean the received one?
[12:16] <dimitern> TheMue, yeah
[12:18] <TheMue> dimitern: you're right, the machine as well as its IPs aren't touched during the test
[12:20] <TheMue> dimitern: http://paste.ubuntu.com/11966320/ to understand where and what I'm logging in the lifecycle watcher
[12:22]  * TheMue should add a debug log remover based on the comments above to his juju development tool ...
[12:29] <dimitern> TheMue, it seems more and more like a sync issue to me
[12:29] <dimitern> TheMue, have you tried dropping all StartSync() calls?
[12:32] <TheMue> dimitern: yes, the log is w/o sync as well as w/ sync after the assert that the IP address is dead
[12:32] <TheMue> dimitern: doesn't change anything
[12:33] <TheMue> dimitern: and as I said, the IP assigned to the new machine is dropped in the notifications while the one for an existing machine is kept
[12:33] <TheMue> dimitern: look how different the .9 and the .10 behave
[12:35] <TheMue> dimitern: a theoretical question
[12:36] <TheMue> dimitern: oh, forget, got it while formulating it
[12:36] <dimitern> TheMue, :)
[12:37] <dimitern> TheMue, weird issue indeed
[12:37]  * dimitern *hates* debugging watchers
[12:37]  * TheMue too
[13:03] <TheMue> dimitern: boah, no, you won't believe it
[13:04] <TheMue> dimitern: I took a deeper look at merge() with the individual states of the IPs etc
[13:04] <TheMue> dimitern: and I've seen that the .9 always is dead
[13:04] <TheMue> dimitern: and never known as alive
[13:04] <TheMue> dimitern: so no removal
[13:05] <TheMue> dimitern: then I thought we're too fast, dead simple
[13:05] <TheMue> dimitern: and for testing I added a 30secs pause between adding and machine removal
[13:05] <TheMue> dimitern: and now - *TADDAAH* - the test passes
[13:07] <TheMue> dimitern: so, yes, it is a syncing problem, but different from State.StartSync()
[13:12] <dimitern> TheMue, sorry, was afk; catching up..
[13:13] <dimitern> TheMue, that sounds like the desired behavior for watchers (consolidating multiple changes between two events)
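The coalescing dimitern describes, and why the .9 never surfaces, can be sketched as a toy model (this is not juju's actual `merge()`, just the behavior observed here):

```go
package main

import "fmt"

type Life int

const (
	Alive Life = iota
	Dying
	Dead
)

// merge is a toy model of the lifecycle watcher's coalescing: each
// update is folded into the set of known lives, and an id first seen
// already dead is treated as "created and destroyed between two
// events", so it is never reported at all.
func merge(known map[string]Life, id string, life Life) (reported bool) {
	old, wasKnown := known[id]
	if !wasKnown && life == Dead {
		return false // coalesced away: the worker never sees it
	}
	if !wasKnown || old != life {
		known[id] = life
		return true
	}
	return false
}

func main() {
	known := map[string]Life{}
	// 0.1.2.10 was seen alive first, so its later death is reported.
	fmt.Println(merge(known, "0.1.2.10", Alive), merge(known, "0.1.2.10", Dead))
	// 0.1.2.9 shows up already dead: no event, as in the failing test.
	fmt.Println(merge(known, "0.1.2.9", Dead))
}
```

That also explains why inserting a pause made the test pass: with a delay, the .9 is observed alive in one event and dead in a later one instead of both transitions collapsing into a single window.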
[13:14] <TheMue> dimitern: isn't the API watcher a kind of polling?
[13:14] <TheMue> dimitern: because we now don't have a direct state watcher anymore, but using the API
[13:19] <dimitern> TheMue, ok, how about this: instead of sleeping for 30s, just add a short attempt loop between machine removal and adding 0.1.2.9 and setting it to dead
[13:20] <TheMue> dimitern: sure, the hard-coded sleep was just a test
[13:25] <dimitern> TheMue, cheers
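The attempt loop dimitern suggests instead of the 30s sleep could look roughly like this; it is a standalone sketch in the spirit of juju's retry helpers, not that actual API:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// waitFor polls check until it succeeds or the attempts run out,
// replacing a fixed sleep with a short bounded retry loop.
func waitFor(check func() bool, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		if check() {
			return nil
		}
		time.Sleep(delay)
	}
	return errors.New("condition not met within attempt budget")
}

func main() {
	calls := 0
	// Succeeds on the third poll instead of sleeping a fixed 30s.
	err := waitFor(func() bool { calls++; return calls >= 3 }, 10, time.Millisecond)
	fmt.Println(err == nil, calls) // → true 3
}
```

In the test this would sit between the machine removal and the dead-address assertion, so the watcher gets a chance to deliver both lifecycle events.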
[14:43] <katco> wwitzel3: natefinch: ericsnow: ping
[14:43] <ericsnow> katco: heyheyhey
[14:43] <wwitzel3> katco: pong
[14:43] <katco> o/
[14:43] <katco> did you guys get my email?
[14:44] <katco> how are we looking for iteration work?
[14:44] <wwitzel3> katco: good, the state/persistence story won't land
[14:44] <wwitzel3> katco: all others will be done by EOD Friday
[14:45] <wwitzel3> katco: of the pointed stuff that is
[14:45] <katco> wwitzel3: i can live with that :)
[14:45] <wwitzel3> katco: we have some low-prio overhead that probably won't get done
[14:46] <katco> wwitzel3: understood... glad the pointed work is mostly landed
[14:46] <katco> wwitzel3: ericsnow: ty, just wanted to check in!
[14:46] <katco> we'll talk about the sprint sometime after i get back. lots of interesting stuff
[14:47] <ericsnow> katco: sweet
[14:47] <katco> ericsnow: wwitzel3: k gotta run to another meeting... ty again, and if i don't talk to you before, have a great weekend
[14:47] <wwitzel3> katco: dibs on the Python library, lol
[14:47] <katco> rofl
[14:47] <ericsnow> wwitzel3: dang it!
[14:48] <ericsnow> wwitzel3: we should pair up :)
[14:48] <wwitzel3> katco: you too, safe travels
[14:48] <ericsnow> katco: ditto
[15:12] <thumper> o/ sinzui
[15:13] <thumper> sinzui: been working with fwereade on this blocker issue
[15:13] <thumper> just asked the bot to land it
[15:13] <thumper> it has been tested by Ed to deploy a complex openstack bundle that uses leadership a lot
[15:13] <thumper> and it all worked
[15:13] <thumper> \o/
[15:13] <thumper> also, I have run all the tests locally, and they at least pass here
[15:13] <thumper> first time too
[15:14]  * thumper crosses fingers for the bot to do its thing
[15:14] <thumper> hello?
[15:14] <thumper> anyone alive in here?
[15:15]  * thumper streaks through the empty channel
[15:15]  * ericsnow averts eyes
[15:16] <alexisb> ericsnow, all of us in annecy have to see it in real life
[15:16] <mgz> thumper: sorry, I wasn't sure if there was actually a question in all that
[15:16]  * alexisb is blinded
[15:16] <thumper> mgz: there wasn't
[15:16] <ericsnow> alexisb: :)
[15:16] <mgz> thumper: okay then, carry on streaking :)
[15:16] <thumper> but I do like to know that I'm not just talking to myself
[15:16] <wwitzel3> lol
[15:17] <alexisb> mgz, as soon as we land it is release time
[15:17] <thumper> well
[15:17] <thumper> once it passes CI
[15:17] <wwitzel3> I've given up and just assume I'm always talking to myself
[15:17] <alexisb> thumper, details details ;)
[15:17] <alexisb> mgz, what thumper said
[15:17] <thumper> wwitzel3: so... tycho here is doing some lxd container stuff for us
[15:17] <sinzui> thumper: was OTP. CI is ready for your landing
[15:17] <wwitzel3> thumper: awesome
[15:17] <thumper> sinzui: coolio
[15:18] <wwitzel3> thumper: what stuff?
[15:18] <thumper> container/lxd
[15:19] <sinzui> thumper: alexisb mgz: Robie had a brilliant idea to solve the deployer/quickstart/pyjujuclient problem. Maybe we can include those plugins in the juju-core source package to ensure lock-step delivery of compatible plugins to trusty (and everywhere)
[15:19] <wwitzel3> thumper: right, but what about it is being done for us, I mean
[15:21] <alexisb> wwitzel3, tych0 is adding lxd support to juju-core
[15:21] <alexisb> sinzui, thumper and mramm have been pondering that
[15:21] <alexisb> and I am sure would like your input
[15:22] <wwitzel3> alexisb: oh, nice :)
[15:22] <sinzui> alexisb: we can release as we have done in the past. But I think we need to change the policy to release blessed revisions that have passed compatibility and reliability tests. Those tests take days to run and mostly run on weekends when CI has more resources
[15:23] <tych0> thumper: github.com/tych0/juju lxd-container-type
[16:11] <perrito666> anyone more or less familiar with environ.Config?
[16:15] <TheMue> perrito666: don't know if I can help you, but ask
[16:16] <perrito666> I am looking at the implementation because I might want to add a key but I am not sure I understand it properly
[16:17] <TheMue> perrito666: regarding schema and default values?
[17:06] <mup> Bug #1479889 opened: Test failure com_juju_juju_featuretests.TearDownTest.pN44_github.com_juju_juju_featuretests.dblogSuite <ci> <intermittent-failure> <ppc64el> <test-failure> <unit-tests> <juju-core:Triaged> <juju-core trunk:Triaged> <https://launchpad.net/bugs/1479889>
[17:22] <redelmann> Hi there.
[17:23] <redelmann> Need some help upgrading juju 1.23 to 1.24.3
[17:24] <redelmann> 1.23.3 to 1.24.3
[17:24] <perrito666> redelmann: what is going on?
[17:25] <redelmann> perrito666, hi.
[17:25] <redelmann> perrito666, i was trying to upgrade juju in maas environment
[17:25] <redelmann> perrito666, after running "juju upgrade-juju"
[17:26] <redelmann> perrito666, machine0.log says: http://paste.ubuntu.com/11967995/
[17:28] <redelmann> perrito666, Well, after that I can't run any juju command
[17:28] <redelmann> perrito666, that's the problem :P
[17:29] <perrito666> mm, are the machines still there? if so what is on the logs for machine 0? (Assuming you can access it)
[17:30] <redelmann> perrito666, all machines are online, machine0.log:  http://paste.ubuntu.com/11967995/
[17:31] <perrito666> have you tried restarting the juju service by hand?
[17:32] <redelmann> perrito666, yes, and nothing happend
[17:35] <redelmann> perrito666, same log
[17:35] <perrito666> mm, strange, I think you will have to make some changes by hand
[17:36] <redelmann> perrito666, "ls /var/lib/juju/tools": http://paste.ubuntu.com/11968047/
[17:36] <redelmann> perrito666, agents tools are there, but not linked
[17:37] <perrito666> there is more to upgrades than that :)
[17:38] <redelmann> perrito666, well i suppose that moving links will not fix anything
[17:39] <perrito666> redelmann: I cannot really recall what change you need to do
[17:39] <redelmann> perrito666, mhhh.... look at this:
[17:40] <redelmann> perrito666, http://paste.ubuntu.com/11968067/
[17:41] <perrito666> redelmann: the rest are links
[17:41] <redelmann> perrito666, :P i see
[17:43] <redelmann> perrito666, couldn't read wrench directory: stat /var/lib/juju/wrench: no such file or directory
[17:43] <redelmann> perrito666, that's nothing to worry about?
[17:43] <perrito666> that is not a problem, wrench is something for development
[17:44] <perrito666> it is used to introduce failures into juju
[17:54] <redelmann> perrito666, i suppose that: rsyslogd-2039: Could no open output pipe '/dev/xconsole': No such file or directory [try http://www.rsyslog.com/e/2039 ]
[17:54] <redelmann> perrito666, is not a problem too
[18:09] <natefinch> ericsnow: I'd love it if you could review the status stuff again today.  I think it should be all set.
[18:09] <ericsnow> natefinch: will do
[18:40] <redelmann> perrito666, Ok, fixed by hand
[18:40] <perrito666> hey, I was afk, how did you fix it?
[18:40] <marcoceppi> katco: could you or someone from moonstone look into this? https://bugs.launchpad.net/juju-core/+bug/1478156
[18:40] <mup> Bug #1478156: summary format does not give enough details about machine provisioning errors <charmers> <juju-core:Triaged> <https://launchpad.net/bugs/1478156>
[18:40] <marcoceppi> katco: ugh, nvm
[18:41] <marcoceppi> I see it's marked as high now, I had old data on the page
[18:59] <natefinch> wwitzel3: you around?
[19:03] <natefinch> ericsnow: you around?
[19:03] <ericsnow> natefinch: yep
[19:05] <natefinch> ericsnow: I was trying to work out what exactly I needed to do for my kanban card about local file images and docker.... and it seems like there's no such thing as a local file image... they're all stored in a local docker repository and behave exactly like remote ones.... there's no "docker run file://home/nate/mydockerimage"
[19:05] <natefinch> at least as far as I can tell
[19:06] <ericsnow> natefinch: the idea is, for local file images, to load them first
[19:19] <mup> Bug #1479931 opened: Juju 1.22.6 cannot upgrade to 1.24.3/1.24.4 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1479931>
[19:19] <mup> Bug #1479942 opened: Reference to undefined method <ci> <intermittent-failure> <ppc64el> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1479942>
[19:22] <natefinch> ericsnow: sort of a problem... the name of the tar file bears no relation to the name of the image.
[19:23] <natefinch> ericsnow: so if we're given foo.tar as something to load and run... we can load it, but we won't know what it is called once it's in the registry.  I guess we could look in the tar file and figure it out :/
[19:23] <ericsnow> natefinch: wwitzel3 will have to take it from here; I don't know enough about that
[19:28] <natefinch> ericsnow: ok... actually, it looks like a tar can have multiple images, so it's even less likely to work
[19:40] <wwitzel3> natefinch: yeah, looking at some of the other tools out there that wrap docker, they take an inventory first, using docker images
[19:41] <wwitzel3> natefinch: then they load it, and parse the diff
[19:41] <natefinch> wwitzel3: doesn't solve the problem if more than one image is loaded from the tar file
[19:41] <wwitzel3> natefinch: we could also use the remote API instead of wrapping the cmd
[19:42] <wwitzel3> natefinch: it does, since we would parse out both of them and they can only specify a single image name in the process definition
[19:42] <natefinch> wwitzel3: but I thought the feature was that the image name *is* the tar file
[19:42] <ericsnow> natefinch: gave you one last review (LGTM with some minor caveats)
[19:43] <natefinch> ericsnow: thanks
[19:43] <ericsnow> natefinch: np
[19:44] <wwitzel3> natefinch: well, in that case we could launch and register both
[19:44] <wwitzel3> natefinch: or we could leave image as is and make the file to load a type specific arg
[19:45] <natefinch> wwitzel3: so, does this seem like a useful feature?  Is the idea that someone will package a tar file in their charm?
[19:47] <wwitzel3> natefinch: I can't remember the reason for it, it was based on some feedback we got iirc
[19:50] <natefinch> wwitzel3: seems like it needs to be better defined before we work on it.  I don't want to guess at the correct implementation.
[19:51] <wwitzel3> natefinch: I don't even see a card for it
[19:51] <wwitzel3> natefinch: oh, there it is, overhead
[19:51] <natefinch> wwitzel3: yep
[19:53] <wwitzel3> natefinch: so if the file replaces the image name, then it won't matter how many images are in the tar, we would just load and launch any it contained
[19:54] <natefinch> wwitzel3: I don't think that's a good idea... in all other cases, the process specification is for a single process - you give it a command to run, etc.  I think it would be surprising for a single process definition in the yaml to result in multiple registered processes.
[19:55] <natefinch> wwitzel3: maybe if we added a LoadFrom field in the process info that would tell Juju to load the image before launching it
[19:56] <natefinch> or maybe we need a separate step that loads all images before we start launching processes
[19:56] <wwitzel3> natefinch: I don't think it would be a surprise if I, the charm author, provided a tar that had multiple images in it, but we shouldn't be designing this interaction anyway. We should probably ping lazyPower and whit about what that interaction would look like and what they want :)
[19:56] <lazyPower> hello o/
[19:56] <lazyPower> in office hours
[19:56] <lazyPower> will circle back when we're out, because i know what you're talking about and want to be a part of it
[19:57] <natefinch> lazyPower: awesome
[19:57] <natefinch> lazyPower: I love a man that knows what he wants ;)
[20:22] <lazyPower> natefinch: ok my session is over, whaaatt would we like to do with process management in charming? :) i have some ideas already for example workloads to deliver with this.
[20:23] <lazyPower> ah i see, this is wrt multi processes
[20:23] <natefinch> lazyPower: well, so, I had a work item to support loading images from files on disk (a la docker's load from a tar)
[20:24] <lazyPower> ok, i dont see shipping multiple images in the charm, i see more shipping with a dockerfile/compose-formation, and building on the host during deploy, or pulling from a private registry
[20:25] <lazyPower> thats the established pattern. Do we want to advocate for fatpacking images in a charm?
[20:26] <natefinch> I don't know that we want to make that standard practice, but some people may certainly ask for it.  Fat charms are popular.
[20:26] <lazyPower> ok, let me re-check the spec to make sure i'm on the same page
[20:26] <lazyPower> i dont want to try and account for something thats already been discussed.
[20:28] <natefinch> lazyPower: AFAIK, it's not in the spec. So maybe that answers the question
[20:28] <natefinch> ^^ ericsnow
[20:28] <lazyPower> We can always file and iterate
[20:29] <ericsnow> natefinch: this is something we added to the spec late last week in response to feedback katco got prepping for the demo
[20:31] <lazyPower> i think if you put in multiple resource uri's, fetch them
[20:31] <lazyPower> it wont be obvious to the user they only get a single resource, and thats not a one size fits all scenario
[20:32] <lazyPower> and we'll see weird things happening like people tarballing up multiple payloads and then writing extra code to handle that when we could be handling it in the delivery mechanism
[20:35] <natefinch> ericsnow: I think it's a bad idea to munge the idea of images with the tar files that docker supports.  tar != image
[20:36] <natefinch> ericsnow: I'd prefer to either let the charm do the loading itself during install, or add a new field that'll tell juju how to load the info
[20:37] <ericsnow> natefinch: hey, it wasn't *my* idea! :)
[20:38] <wwitzel3> natefinch: I think having another field for the URI seperate from the image is fine
[20:39] <wwitzel3> natefinch: since that would also work for the location of a private docker registry
[20:43] <natefinch> wwitzel3: that seems fine... unless you wanted to specify both
[20:43] <natefinch> wwitzel3: load the images from this tar into this registry... or is that not a thing?
[20:45] <wwitzel3> natefinch: if you want to specify both, then you define two processes
[20:45] <wwitzel3> natefinch: packing two images into a single tar isn't that common from what I know, lazyPower might have more experience with that than me
[20:46] <wwitzel3> natefinch: but I've not seen it done personally, because the size of the tar is already large, most people are trying to make their images and archives smaller, not bigger
[20:46] <lazyPower> wwitzel3: well, you wouldn't have 2 images in a single tar
[20:46] <lazyPower> once you export, it's a single package per container. I can see someone trying to work around an artificial limitation by bundling 2 images in a tar file
[20:46] <lazyPower> but that wouldn't be the norm i dont think.
[20:47] <lazyPower> unless you're trying to get hyper specific with arch and support multi-arch in the charm
[20:47] <lazyPower> ARMHF images will not run on amd64 for example, and vice versa
[20:52] <natefinch> ericsnow: ug, these juju status tests are horrible
[20:53] <ericsnow> natefinch: sorry
[20:53] <natefinch> ericsnow: as well you should be ;)
[20:55] <natefinch> ericsnow, wwitzel3, lazyPower: what do you guys think about adding a resource: key to the process info, that gets passed to the plugin, and the plugin can handle it however it wants (for docker it would do a docker load)
[20:55] <lazyPower> I like that idea
[20:56] <perrito666> natefinch: i would be a bit careful about the use of the word resource
[20:56] <ericsnow> natefinch: as long as it makes sense as a general feature and not just mostly-docker-specific
[20:56] <perrito666> I really dont feel like having State all over again
[20:57] <natefinch> ericsnow: I presume other container technologies might need a separate step for "install the image"  before running it... but I don't know.
[20:58] <ericsnow> natefinch: yeah, who knows
[20:58] <lazyPower> natefinch: looking at the existing things - rocket/docker/runc - its all basically the same delivery mechanism
[20:58] <ericsnow> natefinch: for now we could just support it with a type option
[20:58] <lazyPower> but looking @ say, tomcat - loading a warfile has a different process
[20:59] <natefinch> ericsnow: ahh, yeah, type options... that makes sense
[20:59] <natefinch> ericsnow: forgot about that escape hatch
[20:59] <ericsnow> natefinch: yep, that's why we added them
[21:00] <natefinch> ok I gotta run.  I'll do it via type-option for now, and we can always make it more official later
[21:00] <ericsnow> natefinch: sounds good
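The type-option route natefinch settles on might look roughly like this in Go. Everything here is hypothetical: the struct, the `load-from` key, and the field names are illustrative stand-ins, not juju's actual process-definition schema:

```go
package main

import "fmt"

// procInfo is an illustrative stand-in for a charm's process
// definition; TypeOptions is the "escape hatch" discussed above:
// plugin-specific keys that juju passes through untouched.
type procInfo struct {
	Name        string
	Image       string
	TypeOptions map[string]string
}

// launchCommands shows how a docker plugin might honor a
// hypothetical "load-from" option: load the tarball first, then run
// the named image as usual.
func launchCommands(p procInfo) [][]string {
	var cmds [][]string
	if tarball, ok := p.TypeOptions["load-from"]; ok {
		cmds = append(cmds, []string{"docker", "load", "-i", tarball})
	}
	cmds = append(cmds, []string{"docker", "run", "-d", "--name", p.Name, p.Image})
	return cmds
}

func main() {
	p := procInfo{
		Name:        "webapp",
		Image:       "myapp:latest",
		TypeOptions: map[string]string{"load-from": "myapp.tar"},
	}
	for _, cmd := range launchCommands(p) {
		fmt.Println(cmd)
	}
}
```

Keeping the load step behind a plugin-interpreted option leaves the image name as the single source of truth, which sidesteps the tar-vs-image naming problem discussed earlier.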
[21:00] <lazyPower> natefinch: ericsnow - is this going into a different branch than what landed for the concept wwitzel3 did?
[21:01] <ericsnow> lazyPower: nope, it'll go into feature-proc-mgmt
[21:01] <wwitzel3> lazyPower: it is going in to feature-proc-mgmt branch
[21:01] <lazyPower> ack
[21:01] <lazyPower> i'm going to set up a build and get a container running for this while it's under active dev if you'd like active feedback on the feature before it hits CR
[21:02] <lazyPower> I had intended to do this for wwitzel3 but got sidetracked with the 1.0 launch of k8's
[21:03] <wwitzel3> yes please *bat eyelashes*
[21:03] <lazyPower> :) you got it dude
[21:04] <lazyPower> wwitzel3: i'll ping when i'm working on it tomorrow
[21:04] <wwitzel3> lazyPower: awesome, ty
[21:15] <sinzui> cherylj: you cannot mark CI regressions as fix released, we have tests and cloud checks that say upgrades are broken
[21:16] <cherylj> I didn't do that
[21:17] <cherylj> sinzui: It was set to fix released by the QA bot
[21:18] <sinzui> cherylj: from the same report we can see http://reports.vapour.ws/releases/2934 that the 22 jobs failed
[21:19] <cherylj> sinzui: Yeah, I can recreate the failure.  Debugging it more now.
[21:20] <sinzui> cherylj: sorry, I have two emails with your name first :( I had to make them non-voting for this run because of the command to release 1.24.4, but I will make them voting again soon
[21:22] <cherylj> sinzui: I think this is a problem with 1.22.6, not 1.24.3/4.  The upgrade is failing when it's trying to get the tools for 1.24.3
[21:23] <cherylj> just fyi
[21:23] <sinzui> cherylj: maybe we should try 1.22.7 (1.22 tip). if it works, it is an incentive to release as soon as possible.
[21:25] <cherylj> sinzui: I can give that a try after this debug run I'm doing now.
[21:25] <cherylj> ec2 seems particularly slow for me today :(
[21:27] <sinzui> cherylj: Indeed it is. Installing packages seems to be taking longer
[21:27] <sinzui> cherylj: Joyent and GCE are the fastest clouds. I tend to use joyent
[21:28] <cherylj> sinzui: are there some shared creds for the core team?  or do I need to create my own account?
[21:29] <sinzui> cherylj: in cloud-city? yes you can use default-joyent. and you can try different regions
[21:36] <cherylj> menn0: The state server will refuse connections while it's performing an upgrade, right?
[21:42] <cherylj> menn0: It appears that the state server is hung trying to unpack the tools, and I see the syslog filling up with these errors:  http://paste.ubuntu.com/11969606/
[21:46] <menn0> cherylj: no the state server still accepts connections during an upgrade
[21:46] <cherylj> this is weird.
[21:46] <menn0> cherylj: the available API requests are quite limited though
[21:46] <menn0> cherylj: status should still work
[21:47] <menn0> cherylj: "the not authorized for status" error is worrying
[21:47] <cherylj> yeah
[21:47] <menn0> cherylj: also, the very high connection count
[21:47] <cherylj> just keeps going up!
[21:47] <cherylj> heh
[21:47] <menn0> cherylj: something in juju isn't releasing the connections
[21:47] <menn0> cherylj: that's probably not the root cause but related to it
[21:48] <menn0> cherylj: the authorization errors sound closer to the root cause
[21:48] <menn0> have you got the machine-0.log?
[21:48] <menn0> hang on... flying solo with a kid at the moment and he's calling
[21:48] <cherylj> yeah, I can add your SSH key to this machine.
[22:00] <sinzui> cherylj: menn0 help: I don't know which bugs thumper's merges at tip https://github.com/juju/juju/commits/1.24 fixed. I can make a release, but I cannot say what issues are fixed
[22:01] <menn0> sinzui: looking
[22:03] <menn0> sinzui: looks like will and thumper have been activating the new leadership bits
[22:03] <menn0> sinzui: this will fix bug 1478024
[22:03] <mup> Bug #1478024: Looping config-changed hooks in fresh juju-core 1.24.3 Openstack deployment <blocker> <canonical-bootstack> <leadership> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:In Progress by fwereade> <https://launchpad.net/bugs/1478024>
[22:03] <sinzui> \o/
[22:04] <menn0> sinzui: but I wouldn't cut a release until they say it's done
[22:04] <menn0> sinzui: based on the commit messages it looks like they're close though
[22:07] <sinzui> menn0: I see this in the context of thumper, mgz and alexis a few hours ago
[22:07] <sinzui> <alexisb> mgz, as soon as we land it is release time
[22:07] <menn0> sinzui: ok cool
[22:08]  * sinzui thinks the final job just passed and the rev is blessed by all the old rules
[22:11] <menn0> sinzui: we still have bug 1479931
[22:11] <mup> Bug #1479931: Juju 1.22.6 cannot upgrade to 1.24.3/1.24.4 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1479931>
[22:11] <menn0> sinzui: for some reason the QA bot marked it as fix released for 1.24
[22:11] <menn0> sinzui: but cherylj was able to repro it
[22:12] <menn0> sinzui: we're looking at that one now
[22:12] <sinzui> menn0: We had to make the two jobs that show the regression non-voting, which convinced CI that there was a bless
[22:12] <menn0> sinzui: never mind... I just saw your comment on the bug
[22:13] <menn0> sinzui: cool, makes sense
[22:13] <sinzui> menn0: we are juggling a nasty case of a regression in the wild. 1.24.4 is better than 1.24.3 :/
[22:13] <menn0> sinzui: I don't think we should release another 1.24 until this one is figured out
[22:14] <sinzui> menn0: I think so, I really don't like releasing in this rush. I officially EODed last hour
[22:15] <sinzui> menn0: we can replace the proposed version with another fixed version while in proposed. maybe 1.24.5 can be put in place by your tuesday
[22:15] <menn0> sinzui: ok
[22:15] <menn0> sinzui: this one should be fixed soon I think. i'm getting a sense of the problem from the logs
[22:17] <sinzui> menn0: also, I will hit the delay in Lp's builders. If I see a fix in CI, I can just switch the debs we plan to put in streams :)
[22:17] <menn0> sinzui: sounds good
[22:18] <menn0> waigani: if you get a chance could you have a look at http://reviews.vapour.ws/r/2279/ pls? (no rush though)
[22:21] <waigani> menn0: okay, I'm just finishing some stuff for Will. Probably get to it around 11am?
[22:21] <menn0> waigani: np. i'm looking at this upgrade issue anyway.
[22:30] <menn0> cherylj: I see the problem... the relevant revision is 0e39ac8d6fcc77793e5028e03bfb651707cf1bb6
[22:31] <menn0> cherylj: if the env UUID is missing open() tries to query the DB to figure it out, but that's before the mongodb login happens in newState() so the query isn't allowed
[22:32] <menn0> cherylj: I find it hard to believe that this was tested with an actual upgrade...
[22:33] <menn0> cherylj: it should be fixable by extracting the login into its own method
[22:33] <menn0> and calling that earlier in open()
[22:40] <sinzui> menn0: waigani Can either of you review http://reviews.vapour.ws/r/2283/
[22:41] <menn0> sinzui: ship it
[22:42] <menn0> sinzui: btw I'm pretty sure I have a fix for bug 1479931
[22:42] <mup> Bug #1479931: Juju 1.22.6 cannot upgrade to 1.24.3/1.24.4 <blocker> <ci> <regression> <upgrade-juju> <juju-core:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1479931>
[22:42] <menn0> sinzui: testing now
[22:42] <sinzui> menn0: Thank you. I may need to wait though. Hp cloud got retested and a job failed, so I am retesting
[22:43] <menn0> ok
[22:43] <sinzui> menn0: ping when you want to merge because I might just as well release your fix
[22:44] <menn0> sinzui: ok
[22:59] <menn0> sinzui: ok that fix works... just prepping for proposing now
[23:00] <sinzui> menn0: You rock, as does cherylj . I will let CI accept the current failure and wait for the fix
[23:09] <menn0> waigani or anyone else, review for CI blocker please : http://reviews.vapour.ws/r/2284/
[23:10] <waigani> menn0: looking
[23:12] <menn0> waigani: never mind ... the change breaks the state unit tests
[23:12] <menn0> sinzui: this is going to take longer
[23:13] <waigani> menn0: okay
[23:13] <axw> anastasiamac_: ok to delay 10m to wait for perrito666?
[23:13] <sinzui> menn0: okay. Hp hates me so I am in no rush
[23:13] <anastasiamac_> axw: yes :D brilliant - m going to coffee
[23:13] <anastasiamac_> axw: is ur school run going to b k?
[23:14] <menn0> sinzui: ok. I have to be out of the house for a bit soon so it might be a few more hours
[23:14] <axw> anastasiamac_: should be fine
[23:14] <anastasiamac_> axw: gr8! see u then :D
[23:14] <menn0> sinzui: or perhaps someone else can run with it
[23:14] <menn0> let's see where I get to
[23:20] <menn0> waigani, sinzui: tests fixed
[23:20] <menn0> waigani: pushing now
[23:21] <menn0> waigani: can you take a look again please?
[23:21] <waigani> menn0: yep
[23:21] <menn0> waigani: I need to step away for a bit. if you're happy with the change can you pls hit merge for me?
[23:21] <menn0> back in 10min
[23:21] <waigani> menn0: yep np
[23:22] <perrito666> anastasiamac_: axw I am back, thanks :D
[23:23] <anastasiamac_> perrito666: axw: omw
[23:29] <waigani> menn0: done, I hit merge also
[23:33] <sinzui> menn0: waigani: the magic fixes-1479931 was missing, I am adding it and requeuing the merge
[23:34] <waigani> sinzui: ugh, sorry I keep forgetting that.
[23:36] <menn0> waigani, sinzui: i'm back for 20 mins or so then off again
[23:36] <sinzui> menn0: okay I will watch the merge and retry as needed
[23:37] <waigani> menn0: half day for me, heading to airport in 30min.
[23:37] <menn0> sinzui, waigani: thanks both of you
[23:37] <waigani> :)
[23:57] <mwhudson> davecheney: what's happened to the ppc64le builder?