[01:30] anastasiamac: can i please get a review on this which is a fix for one of the issues discovered last week during the outage analysis http://reviews.vapour.ws/r/1824/
[01:31] wallyworld: looking :D
[01:38] fun \o/
[03:19] ls
[06:08] jam: hey, you around?
[06:08] hiya wallyworld
[06:08] quiet today with everyone away
[06:08] anyways
[06:08] i have a MP for python-jujuclient which retries send requests if juju says it is upgrading
[06:09] it should address some of the core issues deployer is having
[06:09] could you take a look?
[06:09] https://code.launchpad.net/~wallyworld/python-jujuclient/retry-on-upgrade/+merge/260658
[06:11] wallyworld: I do wish it was trivial to back off retries
[06:11] yeah
[06:11] it *could* be implemented, but this i think is an ok first step
[06:12] it covers the small window where juju machine agent needs to first check if upgrades are needed
[06:12] during which time the api is limited and so the "upgrade error" is reported
[06:12] which would be < 1 second normally
[06:13] or thereabouts
[06:13] 99% of the time (or pick your own stat), the api goes from limited -> open because no upgrade is required
[06:14] this stops the case where people juju bootstrap && juju-deploy via a script
[06:14] from going wrong
[06:15] wallyworld: if it's upgrading can't we also get disconnected completely during this time?
[06:16] so, yes but what a recent juju change did was keep the api in limited mode until the check to see if an upgrade is required. if an upgrade is required, then it does that without giving the deployer a chance to connect simply to be disconnected
[06:16] but that change opened a small window where the deployer trying to connect initially got the "upgrading error"
[06:17] because the upgrade worker needed to start
[06:17] wallyworld: so I see that you're retrying "upgrade in progress" which is fine, my concern is are we also retrying "I got disconnected completely". IIRC the latter is what broke OIL, etc.
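wallyworld's branch retries a request only while Juju reports that it is upgrading, and jam would like backoff to be easy. Below is a minimal Python sketch of that policy with capped exponential backoff, assuming a client whose requests raise an exception carrying the server's error text. `RPCError`, `UPGRADE_IN_PROGRESS` and `call_with_upgrade_retry` are hypothetical names, not python-jujuclient's API, and the exact error string is an assumption. Any other error, including a dropped connection, is re-raised at once, matching the narrow scope wallyworld describes.

```python
import time

# Assumed error text; the exact string Juju reports while the API is in
# limited "upgrading" mode is not shown in this log.
UPGRADE_IN_PROGRESS = "upgrade in progress"


class RPCError(Exception):
    """Hypothetical stand-in for the error a Juju API client raises."""


def call_with_upgrade_retry(send, request, attempts=5, initial_delay=0.1,
                            max_delay=2.0, sleep=time.sleep):
    """Call send(request), retrying only while Juju says it is upgrading.

    The delay doubles after each failed attempt, capped at max_delay.
    Any other error (for example a dropped connection) is re-raised
    immediately, as is the upgrade error once attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return send(request)
        except RPCError as err:
            if UPGRADE_IN_PROGRESS not in str(err) or attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # capped exponential backoff
```

Since the limited-API window is normally under a second, a few attempts starting at 0.1 s would cover it; `sleep` is injectable so a test need not actually wait.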
[06:17] i didn't intend to retry anything other than "we are upgrading"
[06:17] for this change
[06:18] the "we got disconnected" case is a bit separate
[06:18] wallyworld: Isn't the original bug about getting disconnected vs upgrading?
[06:18] (the problem with upgrading is that deployer got disconnected and then just died)
[06:18] jam: hangout? a bit easier to explain
[06:19] https://plus.google.com/hangouts/_/canonical.com/tanzanite-stand
[06:19] sec, need to grab headphones
[06:19] ok
[09:01] dooferlad, standup?
[09:31] dimitern: thanks to dooferlad it now works!
[09:32] voidspace, sweet!
[09:32] voidspace, omw to our 1:1
[09:33] dimitern: grabbing coffee first
[09:33] voidspace, sure
[09:40] dimitern: omw
[10:01] morning
[10:06] perrito666, o/
=== jam1 is now known as jam
=== wesleyma` is now known as wesleymason
[13:13] sinzui: hey, i did a python-jujuclient change to fix the issue of deployer complaining that juju is upgrading. but i need to talk to a maintainer to get that merged (it's approved), and then we need to figure out how to unblock landings
[13:31] sinzui: ^^^^ - i can get the fix landed but am unsure what to do next to unblock things. can we add a 1 sec delay to the CI test until python-jujuclient gets rolled out?
[13:32] wallyworld: We automatically test that package.
[13:33] wallyworld: all slaves use the juju ppa to get its packages. That is how we caused the quickstart regression last week
[13:33] sinzui: so as soon as a python-jujuclient fix lands in source, CI will grab that copy?
[13:34] wallyworld: no, CI gets the built packages
[13:34] how long from branch getting merged till CI using the changes?
[13:35] wallyworld: about 1 hour after the package is built by Lp
[13:35] ok, great, i'm just trying to see how long we may still be blocked for
=== psivaa-lunch is now known as psivaa
[14:04] ericsnow: standup
[14:15] wallyworld: katco: Do either of you have a minute to review http://reviews.vapour.ws/r/1829/
[14:15] sinzui: in a meeting, sec
[14:16] sinzui: +1
[14:16] thank you wallyworld
=== redelmann is now known as rudi_gfm
[14:42] wallyworld, hey there
[14:42] wallyworld, if you can find some time, please have a look at http://reviews.vapour.ws/r/1830/ - instancepoller using the api
[14:42] wallyworld: any help I can give on #1460171?
[14:42] Bug #1460171: Deployer fails because juju thinks it is upgrading
[14:42] dooferlad, voidspace, ^^
[14:43] dimitern: looking
[14:43] voidspace, thanks!
[14:44] ericsnow: waiting for patch to land in python-jujuclient - no core changes
[14:44] should be soon i hope
[14:44] wallyworld: cool
[14:44] thanks for asking
[14:44] wallyworld: :)
[14:44] dimitern: sorry, was talking to someone else
[14:45] wallyworld, no worries
[14:45] dimitern: why does facade version start at 1 whilst others start at 0?
[14:45] voidspace, new facades should start at 1
[14:46] (there was some decision about this some time ago)
[14:46] voidspace: 0 is for facades previous to versioning iirc
[14:46] cool, thanks
[14:46] wallyworld, I'd appreciate it if you can confirm the instancepoller should start once per apiserver (rather than per environment)
[14:47] fwereade, ^^
[14:47] dimitern: so long as it knows how to deal with multiple envs
[14:47] machines are per env after all
[14:47] dimitern, wallyworld: yeah, it sounds like a per-env thing to me
[14:48] +1
[14:48] we could have just the one, but polling intervals get tricky
[14:48] dimitern, wallyworld: and including multi-env logic in the instancepoller, rather than just running N of them, would seem suboptimal
[14:48] yeah
[14:49] fwereade, wallyworld, but each running instance should only work for a given env?
[14:49] * dimitern wonders if requiring JobManagerEnviron will make this "just work", like for other "singleton" workers
[14:49] dimitern: almost 1am here, my brain is dead sorry, i need sleep
[14:50] wallyworld, get some sleep then! :)
[14:50] can talk more tomorrow unless fwereade sorts it out
[14:50] sure, no problem
[14:50] see ya later
[14:51] dimitern, yes, each instance is part of one and only one env
[14:51] fwereade, so I guess starting one per env should work, as login will take care of which envs to use and subsequently what the watchers will report
[14:52] dimitern, I think you should just be starting the instancepoller alongside the firewaller and provisioner for each environment
[14:53] fwereade, right
[14:53] fwereade, so I'll change that, but the rest should be fine
[14:53] fwereade, thanks!
[14:55] dimitern, hey, has instancepoller just always been running non-singular?
[14:56] dimitern, I'm pretty sure we don't want one per state server per env
[14:56] dimitern, ...in fact
[14:56] dimitern, instance address-setting txns have been among the ones we've seen clogging up stuck environments, right?
[14:57] fwereade, so far it was started in the StateWorker() method of the MA
[14:57] dimitern, and the problems with mgo/txn absolutely centre around separate flushers racing to write the same doc
[14:57] fwereade, which means once per state server
[14:58] dimitern, it's also in startEnvWorkers
[14:58] dimitern, ...or only there
[14:58] fwereade, now it's only in startEnvWorkers (running tests still)
[14:59] ah ok
[14:59] dimitern, but I *do* see it non-singular in startEnvWorkers
[15:00] fwereade, where?
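fwereade's concern is that the instance poller should run once per environment, not once per state server per environment, because separate copies race to write the same address documents through mgo/txn. The Python sketch below models only that idea (juju itself is written in Go). `Runner`, `SingularRunner`, `start_env_workers` and the `is_master` callable are hypothetical simplifications, not juju's worker API.

```python
class Runner:
    """Hypothetical worker runner: starts each named worker at most once."""

    def __init__(self):
        self.workers = {}

    def start_worker(self, name, factory):
        if name not in self.workers:
            self.workers[name] = factory()
        return self.workers[name]


class SingularRunner:
    """Wraps a Runner so workers only start on the master state server."""

    def __init__(self, runner, is_master):
        self._runner = runner
        self._is_master = is_master  # callable: does this server hold the lease?

    def start_worker(self, name, factory):
        if not self._is_master():
            return None  # another state server runs this worker
        return self._runner.start_worker(name, factory)


def start_env_workers(runner, env_uuid):
    """Start the per-environment workers (here just the instance poller)."""
    runner.start_worker("instancepoller", lambda: ("instancepoller", env_uuid))


def count_pollers(server_is_master, env_uuids, singular):
    """Count instance pollers started across all state servers and envs."""
    total = 0
    for is_master in server_is_master:
        for env_uuid in env_uuids:
            runner = Runner()  # one runner per (server, environment)
            if singular:
                start_env_workers(SingularRunner(runner, lambda m=is_master: m),
                                  env_uuid)
            else:
                start_env_workers(runner, env_uuid)
            total += len(runner.workers)
    return total
```

With three state servers (one master) and two environments, `count_pollers([True, False, False], ["env-a", "env-b"], singular=False)` counts six pollers, while `singular=True` counts two, one per environment.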
[15:00] dimitern, and as a worker that's yammering at the provider api we definitely want it to be singular, I think, not to mention my FUD about it causing the sort of workload that stresses mgo/txn
[15:00] dimitern, :1116 in master
[15:01] runner.StartWorker("instancepoller", func() (worker.Worker, error) {
[15:01] return instancepoller.NewWorker(st), nil
[15:01] })
[15:01] fwereade, right!
[15:02] dimitern, so s/runner/singularRunner/ and we get a little bit better in a couple of good ways too
[15:03] dimitern, (on top of passing in the api instead of the state :))
[15:04] fwereade, in a call, will get back to you
[15:13] sinzui: Should I backport bug 1442308 to 1.23?
[15:13] Bug #1442308: Juju cannot create vivid containers
[15:14] cherylj: no, I don’t think we will make a 1.23.4 release since we will propose 1.24.0 on Thursday
[15:14] ok, thanks!
[15:15] cherylj: I will add a task to the bug as WONT FIX to be clear that we choose not to
[15:15] sinzui: awesome, thank you
[15:20] rebooting *sigh*
[15:29] abentley: you around?
[15:30] natefinch: Yes, but I have standup now. I'll ping you when done.
[15:30] abentley: thx
[15:54] dimitern: ping
[15:54] dimitern: if you're still around
[15:54] dimitern: I'm still doing your review by the way... it's big
[15:54] (the patch I mean)
[15:54] but also trying to bootstrap juju with MAAS
[15:54] and failing - hard to tell if current failure is a MAAS problem or a juju problem, or something else
[15:55] last problem was HP proprietary drivers causing deploy to fail
[15:55] current problem is this:
[15:55] voidspace, yeah, I'm here
[15:55] voidspace, sorry about the size - it's mostly tests though :)
[15:55] dimitern: http://pastebin.ubuntu.com/11499441/
[15:55] dimitern: heh, indeed
[15:55] voidspace, looking
[15:55] dimitern: so juju fails to contact MAAS (connection refused)
[15:55] a
[15:56] fetching that URL in the browser works
[15:56] and there's nothing useful in the MAAS logs
[15:57] the MAAS node is deployed
[15:58] dimitern: I updated MAAS version and am running juju latest master
[15:58] voidspace, why localhost?
[15:58] dimitern: because MAAS is running locally
[15:58] voidspace, on port 80?
[15:58] hmmm... apparently
[15:59] yes
[15:59] that's working fine
[15:59] voidspace, try bootstrapping with --debug to get more context
[15:59] dimitern: ok, will do
[16:00] dimitern: it takes about ten minutes or so because these proliants are *slow* to boot
[16:00] dimitern: the intelligent bios thing takes several minutes to do its thing
[16:00] I might try and disable it
[16:01] but it can run in the background whilst I continue the review
[16:03] voidspace, is MAAS itself configured with http://localhost/MAAS/ ?
[16:03] voidspace, dpkg-reconfigure maas (IIRC)
[16:04] dimitern: I'll check
[16:04] when I went to 127.0.0.1/MAAS instead of localhost I had to log in again
[16:04] so there may be a difference
[16:04] I'll wait until this bootstrap completes
[16:04] voidspace, ok
[16:05] voidspace, I'm pretty sure the MAAS URL has to match exactly - both in maas config and in juju's
[16:06] dimitern: yep, good call
=== rudi_gfm is now known as rudi
=== rudi is now known as redelmann
[16:08] natefinch: I'm free now.
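A likely cause of the connection-refused failure above is the `localhost` MAAS URL. It works from a browser on the MAAS host, but a deployed node resolves `localhost` and 127.0.0.1 to itself. A small check like the hypothetical `is_loopback_url` below could flag such a URL before a slow bootstrap. It is a sketch that only inspects the literal host and does not resolve other DNS names.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url):
    """Return True if the URL's host is a loopback name or address.

    Only the literal host is inspected; other DNS names are not resolved,
    so they are reported as non-loopback.
    """
    host = urlparse(url).hostname  # lower-cased, IPv6 brackets removed
    if host is None:
        raise ValueError("URL has no host: %r" % url)
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name that this check does not resolve
```

A LAN address such as dimitern's `192.168.50.2` passes the check, since nodes on that network can reach it.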
[16:12] dimitern: I think it needs a visible url and not a local url
[16:12] dimitern: trying with the machine IP address
[16:12] voidspace, that sounds good
[16:12] dimitern: i.e. a node can't use 127.0.0.1 to reach the MAAS API
[16:13] taking a break
[16:13] voidspace, I have a similar setup locally, but I use a 192.168.50.X - .2 for maas, the rest for the nodes
[16:14] voidspace, ok, I'll need to go, but might be back later
[16:14] dimitern: thanks, see you later
[16:15] abentley: I was going to do something like this to add the actions feature flag to the CI tests... is this acceptable? http://pastebin.ubuntu.com/11499809/
[16:18] natefinch: That won't work because EnvJujuClient24 is only used for juju 1.24. I meant that you should add an EnvJujuClient22 that was used for juju 1.22, that supplied the 'actions' feature flag.
[16:20] natefinch: A heads-up: jog is landing support for -e with "action do" and "action fetch" today.
[16:21] natefinch: In this branch: https://code.launchpad.net/~jog/juju-ci-tools/start_chaos
[16:23] dooferlad: hah, and four days later I have a working juju bootstrapped to MAAS on an HP proliant
[16:23] dooferlad: the PDU seems to be working fine now too, both for switching machines on and off
[16:24] dooferlad: http://pastebin.ubuntu.com/11500002/
[16:26] abentley: I'
[16:27] abentley: I'm not really prepared to spend very much more time on this CI test. It's already taken 3-4 times as long as I had anticipated & scheduled
[16:27] cc katco ^^
[16:28] abentley: but if I can just remove my action code and merge with what jog lands, that's fine with me, though it would make for a lot of wasted work on my part. It's unfortunate both of us were working on the same functionality.
[16:29] abentley: or maybe I misunderstood what you were talking about.. do you mean he was landing code in the tests or juju-core?
[16:30] natefinch, sorry I was working on another project and just discovered our juju-ci-tools lib needed to handle actions differently on Friday.
[16:31] natefinch: He's just done an alternative implementation of the _full_args change, none of the rest.
[16:31] abentley: oh ok, that's good. I'm glad we didn't overlap much
[16:35] abentley: do I have to do more in the EnvJujuClient22 than implement the _shell_environ, and add a new elif in EnvJujuClient.by_version? Something like this? http://pastebin.ubuntu.com/11500166/
[16:36] natefinch: That's all you need to do for that.
[16:36] abentley: thanks
[16:58] natefinch: abentley: hey... so these CI tests are being wrapped up then?
[16:59] katco: yeah
[17:03] yay :D
[17:05] why the heck do I have to log into ubuntu to "download as text" from pastebin.ubuntu.com?
[17:08] lol
[17:08] you can always report it as a bug
=== natefinch is now known as natefinch_afk
[17:57] wwitzel3: ping
[17:58] katco: pong
[17:59] wwitzel3: hey on the rich status spec? who do you think from ecosystems/accounting would be good to ping?
[18:00] wwitzel3: it has to do with charm metadata, so charmers for sure. and i would think someone from accounts would want to give input on what information they'd like when doing installations
[18:00] katco: not 100% sure, so I'd ping arosales and ask him for some candidates that might have a strong interest/opinion
[18:01] wwitzel3: ty. arosales, any volunteers? https://docs.google.com/document/d/1JcWkE4SNxXuFClZGBcwnU3w13IpRU1yxMhddQG6mKyE/edit#
[18:03] katco, /me looking . .
[18:04] arosales: ty sir
[18:04] katco, I'll bring it up on our daily and send a mail out on it too
[18:04] arosales: ty... please let me know who you'd like to delegate so i can add them to the reviewers list
[18:06] katco, will do
[18:06] katco, thanks for looking for the feedback
[18:06] arosales: ty again!
[18:07] katco, np. I should have some more information this afternoon.
[18:08] arosales: i'm also pulling marcoceppi into https://docs.google.com/document/d/1LORhaYvk_A8yMHkAb9FR_cN9V0S55zEx-T6QXdmr3fU/edit#
[18:08] arosales: he expressed interest in nuremberg
[18:08] katco, ah yes, he's a good one for min version
[18:10] arosales: juju min. version is the one we'll be focusing on next
[18:27] natefinch_afk: jog's stuff has landed now.
[18:51] abentley: thanks
[18:56] abentley, sinzui: I get this error on several of the tests, despite having run make install-deps
[18:56] OSError: /usr/lib/python2.7/dist-packages/lookup3.so: cannot open shared object file: No such file or directory
[18:57] I wonder what that is
[18:59] natefinch_afk: It appears to relate to jenkins and I see several reports of it failing
=== natefinch_afk is now known as natefinch
[18:59] sinzui: yeah, just found some interesting things... I found it in /usr/local/lib/python2.7/dist-packages/
[19:00] natefinch_afk: my apt-cache policy python-jenkins says I have 0.2.1-0ubuntu1
[19:00] Installed: 0.2.1-0.1
[19:01] natefinch: how did you get that version? pip? easy_install?
[19:01] * sinzui thinks we need the ubuntu version
[19:01] sinzui: quite possibly
[19:02] sinzui: I didn't know about make install-deps when I started, so I was just installing stuff however I could find it
[19:03] natefinch: understood. I have to do the same on the win and OS X machines. The issue I am reading implies the jenkins lib does work on OS X, but it is working well enough for our tests
[19:04] I'm on ubuntu... just ran pip install (I think?) because I didn't know how else to get it
[19:04] and..... now pip is dumping a giant stack trace when I do pip uninstall jenkins. Nice.
[19:05] natefinch: If you ran make install-deps, you should have python-jenkins installed via apt.
[19:05] natefinch: you can run pip uninstall jenkins?
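Going back to the juju-ci-tools change natefinch and abentley discussed between 16:15 and 16:36: abentley's suggestion is a client subclass, chosen by Juju version, that adds the 'actions' feature flag to the environment passed to `juju`. The sketch below models that shape in Python. It is not the juju-ci-tools code; the `JUJU_DEV_FEATURE_FLAGS` variable name is an assumption here, and the version dispatch is simplified to a string prefix.

```python
import os

# Assumed name of the environment variable that enables Juju dev feature flags.
FEATURE_FLAGS_VAR = "JUJU_DEV_FEATURE_FLAGS"


class EnvJujuClient:
    """Simplified stand-in for a CI client that shells out to juju."""

    def __init__(self, version):
        self.version = version

    def _shell_environ(self):
        """Environment to use when running juju commands (a copy)."""
        return dict(os.environ)

    @classmethod
    def by_version(cls, version):
        # Simplified dispatch on the version prefix.
        if version.startswith("1.22."):
            return EnvJujuClient22(version)
        return cls(version)


class EnvJujuClient22(EnvJujuClient):
    """Juju 1.22 needs the 'actions' feature flag for action commands."""

    def _shell_environ(self):
        env = super()._shell_environ()
        flags = [f for f in env.get(FEATURE_FLAGS_VAR, "").split(",") if f]
        if "actions" not in flags:
            flags.append("actions")  # keep any flags already set
        env[FEATURE_FLAGS_VAR] = ",".join(flags)
        return env
```

Keeping the flag inside `_shell_environ` leaves command-line construction untouched, which is consistent with abentley's answer that the override plus a new branch in `by_version` is all that is needed.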
[19:05] * sinzui isn’t sure of the pip package name
[19:05] sinzui: I can try and have it fail
[19:05] sinzui: it seemed to recognize the name
[19:05] abentley: yeah, apt seemed to think I had it installed via apt
[19:05] abentley: surely pip is installing in a path that takes precedence.
[19:06] I removed and reinstalled the apt version; it still gives me 0.2.1-0.1
[19:06] I do not have lookup3 installed, and I don't seem to need it.
[19:08] I have python-jenkins 0.2.1-0.1 installed.
[19:09] full stack trace from running tests (there are a handful of these): http://pastebin.ubuntu.com/11502857/
[19:11] natefinch: Can you delete /usr/local/lib/python2.7/dist-packages/jenkins.py or at least move it aside so that the correct jenkins lib gets loaded?
[19:12] abentley: sure
[19:15] FYI, I don't have /usr/lib/python2.7/dist-packages/jenkins.py
[19:15] (if I'm supposed to)
[19:17] It looks like all my jenkins stuff got installed to /usr/local/lib/python2.7/dist-packages/ instead of /usr/lib/python2.7/dist-packages/
[19:17] that sounds like "you installed something with or without sudo when you should have done it the other way" but I have no idea what, being both a linux and python n00b
[19:24] natefinch: No, you shouldn't have that, you should have /usr/lib/python2.7/dist-packages/jenkins/__init__.py
[19:25] abentley: ahh, ok, yes, I have that
[19:27] I guess get_python_lib() must be returning the wrong thing
[19:40] natefinch: There are at least two incompatible packages providing 'jenkins': https://pypi.python.org/pypi/jenkins https://pypi.python.org/pypi/python-jenkins and the one installed in /usr/local/lib is the wrong one.
[19:42] abentley: how am I supposed to install it?
[19:43] natefinch: The right one is already installed. You just have to get rid of the wrong one.
[19:45] abentley: ahh, ok, I figured it out. pip uninstall, instead of saying "Hey, this needs to be run with sudo", dumped a giant ugly stack trace.
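abentley's diagnosis is a shadowing problem: a stray `jenkins.py` from the unrelated `jenkins` PyPI package in `/usr/local/lib/python2.7/dist-packages/` was being imported instead of the apt-installed `jenkins/__init__.py` from python-jenkins. The hypothetical helpers below are one way to check which file an import actually resolves to, and which search-path entries could provide a module of that name. It is a sketch that ignores extension modules, namespace packages and zip imports.

```python
import importlib
import os
import sys


def module_origin(name):
    """Import `name` and return the file it was actually loaded from."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


def providers(name, search_path=None):
    """List files on the search path that could provide top-level `name`.

    Entries are checked in order, so the first result is normally the one
    `import name` picks.
    """
    found = []
    for entry in (sys.path if search_path is None else search_path):
        entry = entry or os.getcwd()  # "" on sys.path means the current dir
        package_init = os.path.join(entry, name, "__init__.py")
        module_file = os.path.join(entry, name + ".py")
        # Within one directory, a package takes precedence over a module.
        for candidate in (package_init, module_file):
            if os.path.isfile(candidate):
                found.append(candidate)
    return found
```

In the situation above, `providers("jenkins")` would presumably list the stray `/usr/local/...` file ahead of the packaged one, and would stop listing it once that file is removed.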
[19:46] which I incorrectly interpreted as "jenkins wasn't installed with pip"
[19:46] that fixed it
[19:57] thumper: hiya
[20:09] is there a bzr plugin that'll let me run an external merge tool to fix conflicts? I found bzr-extmerge, but it appears to be ancient (tries to run with python 2.4)
[20:10] thumper, sinzui, abentley: ^^
[20:13] natefinch: No, extmerge is the only one I'm aware of. But bzr dumps THIS, BASE and OTHER files that you can use an arbitrary tool with.
[20:16] * natefinch closes his eyes and runs sudo python ./setup.py
[20:17] er setup.py install
=== brandon is now known as web
=== brandon is now known as Guest13004
[21:46] wallyworld: do you think the maas 1.7 test would pass if we added a 30s delay between bootstrap and deployer?
[21:46] sinzui: yes
[21:46] sinzui: not even 30s, more like 1 second
[21:46] or 2
[21:46] let me try to solve the issue.
[21:46] wallyworld: I will start with 5 seconds
[21:47] ok :-)
[21:57] katco: you still around?
[22:01] wallyworld: I am adding a call to status between bootstrap and deployer. Do you think that is enough time? Do you have a branch ready to merge to test my change? I don’t want to start a test of an old revision if you have work queued.
[22:06] sinzui: everything you need to test should be in tip of 1.24
[22:06] sinzui: the python-jujuclient work simply retries during the second or so you will be deplaying
[22:06] delaying
[22:06] which would make the delay unnecessary
[22:08] wallyworld: I am pushing a change to all the slaves. I will retest 1.24 tip when I see the changes areive
[22:08] arrive
[22:08] sinzui: tyvm, i will wait with bated breath
[22:09] Bug #1460184 changed: Bootstrapping fails with Maas on Ubuntu Vivid
[22:14] ericsnow: any chance of a trivial time display fix review? the code change is one line, the test changes are a search and replace http://reviews.vapour.ws/r/1823/
[22:15] wallyworld: sure
[22:15] ty
[22:16] nice: "You Require More Vespene Gas" (in a test)
[22:17] wallyworld: ship-it!
[22:17] ericsnow: ty
[22:17] wallyworld: any time
[22:24] marcoceppi: am now, what's up?
=== anthonyf is now known as Guest41448
=== Guest13004 is now known as web
=== anthonyf is now known as Guest28879
[23:45] waigani_: heya, you working on bug 1376246?
[23:46] Bug #1376246: MAAS provider doesn't know about "Failed deployment" instance status
[23:48] great, build is blocked, again
[23:49] wallyworld: no, I should be able to start on that today though.
[23:49] waigani_: great, because we want 1.24 work done so we can look to do a release overnight
[23:50] wallyworld: okay, let me get a bite to eat and I'll get into it
[23:51] ty
[23:52] wallyworld: sorry I missed standup, been on the phone with iinet for 40 minutes trying to get my account unlocked :/
[23:52] axw: gawd, i hate isps. all fixed?
[23:53] wallyworld: yeah, silly error while setting up my new modem. OTOH, seems I got swapped to the new port and now I'm syncing at 16Mb as opposed to the 4Mb I was getting for the last few months
[23:53] oh good :-)
[23:54] axw: you free now for a chat?
[23:54] sure, just a quick one tho
[23:54] see you in standup
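abentley's alternative to bzr-extmerge, from the 20:13 exchange, is to hand bzr's conflict files to a merge tool yourself. On a text conflict bzr leaves FILE.BASE, FILE.THIS and FILE.OTHER beside FILE, which keeps the conflict markers. The hypothetical helpers below find those sets and fill a merge-tool command template. The tool name and argument order used with them are assumptions, so check the chosen tool's own documentation.

```python
import os


def conflict_sets(root):
    """Find bzr text conflicts under root as (base, this, other, merged) paths.

    bzr writes FILE.BASE, FILE.THIS and FILE.OTHER next to FILE on a text
    conflict; FILE itself keeps the conflict markers.
    """
    sets = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".bzr"]  # skip bzr metadata
        names = set(filenames)
        for name in filenames:
            if not name.endswith(".THIS"):
                continue
            stem = name[: -len(".THIS")]
            if stem + ".BASE" in names and stem + ".OTHER" in names:
                sets.append(tuple(
                    os.path.join(dirpath, n)
                    for n in (stem + ".BASE", name, stem + ".OTHER", stem)))
    return sorted(sets)


def merge_command(conflict, template):
    """Fill {base}, {this}, {other} and {merged} in a merge-tool argv template."""
    base, this, other, merged = conflict
    return [arg.format(base=base, this=this, other=other, merged=merged)
            for arg in template]
```

Each generated command could then be run with `subprocess.run`, followed by `bzr resolve FILE` once the merged result has been saved.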