[00:15] menn0: sorry, free now
[01:12] wallyworld_: sorry, I didn't see you
[01:13] np
[01:13] wallyworld_: I was wanting to know about how tools are stored in the controller
[01:13] sure, maybe a hangout
[01:13] wallyworld_: ok cool
[01:14] https://plus.google.com/hangouts/_/canonical.com/tanzanite-stand
[01:15] wallyworld_: i'm there now
[01:15] me too
[01:15] hmmm
[02:47] here is a personal itch being scratched: http://reviews.vapour.ws/r/2970/
[02:49] menn0 ^^
[02:54] thumper: i'm looking
[02:56] menn0: and the related juju branch http://reviews.vapour.ws/r/2971/
[02:56] thumper: looks cool. so in the case of Juju there would be a single well-known location for the aliases file that people could monkey with?
[02:56] menn0: and it works surprisingly well :)
[02:56] menn0: see the second one :)
[02:58] menn0: my testing file: http://paste.ubuntu.com/12891561/
[02:58] well, not testing, but the one I started hacking up
[02:59] thumper: it's cool! I guess we can achieve almost the same thing with shell aliases but the level of integration is better with your PR
[03:00] menn0: I expect to create an aliases command at some stage...
[03:00] like I did with 'bzr alias'
[03:00] to add and remove aliases from the file through the CLI
[03:01] the command could be automagically added by the supercommand if it has been registered with an alias file location
[03:02] thumper: yep that would be nice
[03:02] the interesting bit would be attempting to keep some stability in the file structure
[03:02] so we don't lose user comments or whitespace
[03:02] that'd be easy enough :)
[03:02] but later
[03:03] menn0: ping, got time to chat on these comments?
[03:04] rick_h_: sure. I wanted to set up a call to get the spec sorted more quickly.
now is good too though
[03:04] menn0: cool
[03:05] menn0: https://plus.google.com/hangouts/_/canonical.com/rick?authuser=1 *adjust authuser*
[04:40] * thumper is done
[04:40] laters
[06:53] axw: small one if you have a moment http://reviews.vapour.ws/r/2972/
[06:55] wallyworld: sure, soon as bootstrap stops flooding my connection
[06:55] no rush
[07:02] wallyworld: why not just log it in APIHostPortsSetter.SetAPIHostPorts? rather than sending the results all the way back up only to be logged
[07:04] yeah, probably should have
[07:04] was logging where the original was done
[07:04] i'll change it
[07:04] wallyworld: thanks
[07:16] axw: well that's much simpler. sigh. so stupid
[07:16] wallyworld: cool :)
[07:16] looking
[07:17] wallyworld: LGTM
[07:17] ta
[07:53] waigani, ping
[07:54] (and menn0 if you're around as well)
[07:54] fwereade: i happen to be around
[07:54] waigani, menn0: I think it comes down to "why can't we destroy a system when it's hosting environments?"
[07:55] fwereade: because we have a txn assert that doesn't allow this
[07:56] fwereade: are you asking why don't we remove the assert?
[07:56] fwereade: because if the system goes the API server goes and we don't want the API server to go before the other envs are gone
[07:56] waigani, or at least make it conditional?
[07:56] menn0, that's the purpose of dying/dead
[07:56] menn0, you stay dying while you clean up
[07:56] fwereade: i was guessing somewhat
[07:56] menn0, waigani: once the things that depend on you have gone away, you can become dead, and get cleaned up
[07:57] fwereade: what if one of the environs fails to be cleaned up?
[07:57] how do we back out?
[07:57] waigani, then the system stays dying, that env stays dying, and hopefully we report what's happening
[07:57] waigani, we don't
[07:58] okay
[07:58] fwereade: when you say make it conditional...
[07:58] waigani, (am I missing something? what can/should we back out if one env won't die?)
[07:59] fwereade: FTR i'm going to need some kind of environment mode for environment migration
[07:59] fwereade: I was thinking we could use the same mode
[07:59] field
[07:59] (the field is currently called migration-mode in the spec though)
[07:59] waigani, re conditional, I mean that we have two use cases -- destroy-if-empty and destroy-with-contents, if you like
[08:00] waigani, it doesn't particularly faze me to have those two cases differ by one assert
[08:00] menn0, it feels to me like they're very different
[08:00] menn0, no argument against migration-mode
[08:00] right so we have two destroy paths, one asserting no envs, the other without the assert
[08:00] fwereade: ^?
[08:01] fwereade: agreed. although some of the modes will also need to block the same kinds of transactions.
[08:01] waigani, yeah, I think one of them is set-dying-if-refcount-0 and the other is just set-dying
[08:02] okay. That will also solve the problem for me.
[08:02] as to backing out, as long as we can handle the situation of several envs in different states of life with a dying system that's fine
[08:03] menn0, expand a bit please?
[08:03] oh no wait
[08:04] fwereade: one example is when an environment is being migrated out of a system we want to block provisioning of resources for it
[08:04] fwereade: I guess that could/will also be done at the API server level
[08:05] menn0, I almost think it has to be?
[08:05] sorry, typing out loud. So my original concern was the same as menn0's above. That if we set the system to dying first, we could get zombie resources.
[08:05] waigani, IMO not if we do it properly?
[08:06] waigani, from my perspective "dying" literally means "the user asked us to destroy this"
[08:06] fwereade: right, so a dying system should be considered just as reliable as an alive one, you just can't provision new resources?
[08:07] fwereade: i'm definitely planning to lock down the API for the environment during migration.
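The two destroy paths just described (set-dying-if-refcount-0 vs plain set-dying) could be sketched as below. The `Op` type is a simplified stand-in for `gopkg.in/mgo.v2/txn`'s `Op`, and the collection and field names ("systems", "env-count") are assumptions for illustration, not juju's real schema:

```go
package main

import "fmt"

// Op is a cut-down stand-in for mgo/txn's Op: the Assert document must
// match the current state of the document or the whole transaction fails.
type Op struct {
	C      string
	Id     interface{}
	Assert map[string]interface{}
	Update map[string]interface{}
}

// destroySystemOps builds the op that moves a system to Dying.
// destroyIfEmpty selects between the two paths from the discussion:
// with the assert (destroy-if-empty) the txn aborts if any hosted
// environments remain; without it (destroy-with-contents) the system
// goes Dying unconditionally and stays Dying until its envs are gone.
func destroySystemOps(systemID string, destroyIfEmpty bool) []Op {
	op := Op{
		C:  "systems",
		Id: systemID,
		Update: map[string]interface{}{
			"$set": map[string]interface{}{"life": "dying"},
		},
	}
	if destroyIfEmpty {
		// Refuse to die while hosted environments still exist.
		op.Assert = map[string]interface{}{"env-count": 0}
	}
	return []Op{op}
}

func main() {
	ops := destroySystemOps("system-uuid", true)
	fmt.Println(ops[0].Assert != nil) // true: destroy-if-empty carries the assert
}
```

Keeping the two cases different by exactly one assert, as suggested above, means the rest of the destroy machinery stays shared.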
[08:07] fwereade: I had it in my head that we'd also do it at the txn level
[08:07] waigani, yeah, certain changes to dying entities are no longer appropriate, but generally they should continue to function
[08:07] fwereade: but that's probably overkill
[08:08] menn0, I *think* that if we've got a solid apiserver-level block then nothing else will be touching state and we're safe
[08:08] menn0, so long as we do a resume-all
[08:09] fwereade: there's also the added protection of migrations aborting early if anything is provisioning when the migration is initiated
[08:10] menn0, right, I'm not strongly against some txn-level mechanism to protect migrations
[08:11] fwereade: so just to be clear, we then don't care about the race in the case where we're destroying the system and any environs. I.e. someone adds an env as I destroy everything - that env also gets a bullet?
[08:11] fwereade: by resume-all you mean a mgo/txn Runner.ResumeAll?
[08:11] menn0, yeah
[08:11] waigani, I think so, yes
[08:12] waigani, races where someone wins aren't such a worry, it's races that leave us inconsistent that keep me up at night
[08:12] hmm.. right I see
[08:12] waigani, eg an alive system that's quietly nuking all its tenants, or a dying system accepting new ones
[08:13] lol, fair enough, I see the difference
[08:14] okay I'm happy, I can move forward with this. What should we do with the environ mode branch? Still useful for migration?
[08:15] fwereade: just to be clear, you're suggesting a ResumeAll just after the API gets locked down?
[08:15] menn0, yes, I think we need that -- right?
[08:15] waigani, might well be, menn0 will be able to answer more clearly there
[08:15] waigani: yep, please keep it. we can use much of it when adding the migration-mode field
[08:16] okay cool, will do
[08:16] fwereade: yes I think so. just making sure we're on the same page.
[08:16] menn0, I *think* we'd want the ResumeAll, wouldn't we?
migration is going to write some docs, and we don't want it picking up incomplete transactions from before
[08:16] cool
[08:17] yep that makes sense
[08:17] I hadn't considered that yet but it makes sense
[08:17] otherwise something could start changing the env when we thought it was stable
[08:20] wallyworld: re the API address logging change
[08:21] wallyworld: I noticed you did exactly the same thing which I thought of during dinner :)
[08:21] wallyworld: just log the addresses inside SetAPIHostPorts
[08:21] much simpler
[09:00] dimitern, jam, fwereade, TheMue, voidspace: hangout time!
[09:01] dooferlad: ouch, thx, omw
[09:01] dooferlad, omw
[09:01] omw
[09:03] dooferlad: omw
[10:04] dimitern, frobware, voidspace: hangout time!
[10:05] jam, also ^
[10:05] ooh.. forgot about that one, omw
[10:31] frobware: actually I might be able to reproduce the "machine agent never upgrades" problem - going from 1.20 to 1.24.6
[10:32] frobware: going to see if it really is reproducible (but with debug logging on)
[10:32] and if it happens with 1.24.7
[10:43] voidspace, did you deploy different charms this time?
[10:43] frobware: nope, not sure what was different
[10:44] voidspace, timing related?
[10:44] frobware: in the last deploy (currently bootstrapping again) I saw machine-0 flatline at 100% CPU constant
[10:44] frobware: and machine-1 never upgraded the agent
[10:44] frobware: possibly
[10:44] (possibly timing related I mean)
[10:44] frobware: lots of errors in the logs, but nothing *useful*
[10:45] frobware: so will repeat with debug logging on
[10:45] voidspace, dmesg - anything interesting in there? oom killer, et al?
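The "just log the addresses inside SetAPIHostPorts" recap above is the general pattern of logging at the point of action instead of returning results up the stack only to be logged by the caller. A toy illustration, with a hypothetical setter type standing in for juju's real worker and stdout standing in for its logger:

```go
package main

import "fmt"

// APIHostPortsSetter is a hypothetical stand-in for the worker under
// review; the real juju type and its logger differ.
type APIHostPortsSetter struct {
	current []string
}

// SetAPIHostPorts applies the change and logs it right here, so callers
// don't need the addresses handed back to them purely for logging.
func (s *APIHostPortsSetter) SetAPIHostPorts(addrs []string) {
	s.current = addrs
	fmt.Printf("updated API addresses: %v\n", addrs)
}

func main() {
	s := &APIHostPortsSetter{}
	s.SetAPIHostPorts([]string{"10.0.0.1:17070"})
}
```

The payoff is the one noted in the review: the caller-side plumbing that only existed to carry values back for logging disappears entirely.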
[10:45] frobware: ParseError
[10:45] ah no
[10:45] dmesg
[10:45] I read that as "debug"
[10:45] frobware: didn't look, will check next time
[10:46] frobware: if they have lots of units there will be more load on the API server, so some symptoms may be different
[10:46] right
[11:07] frobware: ah, so this time the mysql/0 unit reports the newer version - but the *machine agent* is still reporting the older version
[11:07] frobware: no 100% CPU usage this time
[11:08] frobware: maybe it did happen before and I just missed it (seeing the unit agent with the upgraded version)
[11:08] frobware: I'll spelunk the logs and see what I can work out
[13:13] dimitern, frobware pint
[13:13] pin
[13:13] ping
[13:14] alexisb, I'll take your first offer. :)
[13:15] lol
[13:15] 6am and look where my mind goes, scary
[13:15] frobware, can you and dimiter jump on a hangout?
[13:15] alexisb, sure
[13:15] I want to chat about hardware so we can get things rolling
[13:16] alexisb, ah I just started a doc on that
[13:16] https://plus.google.com/hangouts/_/canonical.com/andy-alexis
[13:19] alexisb, hey, in a call, bbiab
[13:29] Bug #1508923 opened: Support for Azure Resource Groups
[13:31] dimitern, frobware and I chatted; you are all good, he has it handled
[13:33] alexisb, awesome!
[13:34] dooferlad, ping - can we HO for a bit regarding your current h/w
[13:34] frobware: sure
[13:35] dooferlad, let's use the standup HO
[14:02] Bug #1500760 changed: all juju subcommands need to respect -e env-name flag
[14:09] voidspace, dimitern: see my recent doc invite regarding h/w - would appreciate if you could fill out your sections so that we can complete today and send to rick, et al.
[14:45] frobware: ok
[14:45] voidspace, thx
=== perrito667 is now known as perrito666
=== Odd_Blok1 is now known as Odd_Bloke
=== jcsackett_ is now known as jcsackett
=== Ursinha_ is now known as Ursinha
[15:11] fwereade: (or anyone else) anyone know why my worker would be dying with permission denied constantly?
Looks like it's the watcher returning an error on Next
[15:11] - http://pastebin.ubuntu.com/12894707/
[15:12] jam: ping
=== frobware_ is now known as frobware
[15:39] natefinch: do you know what the status is re: merging feature branches?
[15:45] rogpeppe: no idea
[15:46] natefinch: any idea who might know?
[15:46] voidspace, do you have additional h/w requirements - I noticed you left that section empty, so just double-checking...
[15:47] frobware: well, we don't know yet do we - we haven't specced what we'll need for a sufficient dev environment
[15:47] voidspace, james and dimitern are basing it on 4 machines, per the bundle spec
[15:48] "The openstack-base bundle indicates that 4 machines are required, each with two NICs and 2 disks however it is questionable whether developers will initially need to deploy the full bundle"
[15:48] frobware: if we need the full spec, I'll need two NIC cards, 2 more disks, plus two more machines
[15:49] katco: what's the process for merging feature branches? rogpeppe is asking
[15:49] voidspace, OK, maybe that needs more explanation. For networking I don't think we would need 2 disks.
[15:49] natefinch: $$merge$$
[15:49] frobware: but we'll need the four machines?
[15:49] katco: so it's ok to merge a feature branch at any time, assuming it's blessed?
[15:49] frobware: my requirements are basically identical to dooferlad
[15:50] rogpeppe: yes, as long as tip of branch is blessed
[15:50] frobware: as my existing hardware is very similar in spec
[15:50] voidspace, I say yes.
[15:50] katco: great!
[15:50] voidspace, beware that is NUCs are not AMT, so you would need some kind of PDU to power on/off.
[15:50] s/is/his
[15:50] frobware: I have a PDU
[15:51] voidspace, viola!
[15:51] frobware: but the existing hardware table only had space for machines...
:-)
[15:51] frobware: so that's cool
[15:51] voidspace, bleh
[15:51] frobware: I assumed it was on purpose :-)
[15:51] voidspace, feel free to add more characters :)
[15:52] voidspace, spec and buy what we believe we will need to deliver
[15:54] voidspace, thx for the update; this is a strawman proposal anyway - just wanted to get the ball rolling today
[15:54] frobware: so far - one in three upgrades to 1.24.6 have succeeded
[15:54] frobware: one in one upgrades to 1.24.7 have succeeded
[15:55] trying again with 1.24.7
[15:55] voidspace, interesting
[15:55] I also have debug logs from a failed one
[15:55] (by failed I mean that machine agent stayed on 1.20 - everything still *appeared* to work.)
[15:55] yeah, weird
[15:56] frobware: and will be tricky if it's actually a bug in 1.20
[15:56] voidspace, ian mentioned that they had tried 1.24.7 in the RT ticket but I didn't see any mention of that explicitly in the bug
[15:57] frobware: not in that original report definitely
[15:57] voidspace, he mentioned it in passing, but again, worth confirming.
[15:58] frobware: looking at the rt now
[15:58] Ah, Peter has tried 1.24.7
[15:58] he has logs
[15:59] perrito666: ping
[16:00] voidspace: pong
[16:00] perrito666: are you still looking at bug 1507867
[16:00] Bug #1507867: juju upgrade failures
[16:00] perrito666: the failed upgrade rt?
[16:01] voidspace: I am, I did forget to own it this am
[16:01] voidspace: do you have anything to add to it?
[16:01] perrito666: cool
[16:02] perrito666: not really, I can reproduce an issue - when I upgrade from 1.20 to 1.24 *most* of the time (but not always) the machine agent fails to upgrade its version
[16:02] perrito666: the unit agent reports the correct new version, but not the machine agent
[16:02] perrito666: however, I can't reproduce the bug as described (missing address or corrupted db)
[16:03] perrito666: this is with a deployed mongo unit and ignore-machine-addresses on
[16:03] voidspace: maybe you can help me a bit, from reading the logs, it seems to me that the juju binary in use is in fact the old one
[16:04] perrito666: it would be weird for the machine agent and unit agent to be from different binaries
[16:05] but that's what status is reporting
[16:05] errors in the logs correspond to older versions than the one supposedly running
[16:05] voidspace: do you have that env running?
[16:06] perrito666: no, my *current* env succeeded
[16:06] perrito666: I'll redo it (takes about 15 - 20 mins) and it *usually* fails
[16:06] perrito666: I'll report back shortly
[16:06] voidspace: appreciated
[16:06] a ps faux will shed some light
[16:14] katco, natefinch, wwitzel3: ptal http://reviews.vapour.ws/r/2930/
[16:17] ericsnow: looking
[16:17] wwitzel3: ta
[17:18] sinzui, mgz, have we done long jump upgrade tests from 1.18.* to 1.24.7?
[17:19] cherylj: no, it isn't possible, go to 1.20, then to 1.24.
[17:21] sinzui: is 1.18->1.22->1.24 ok?
[17:22] cherylj: I don't have my table about, but I don't think 1.18 will accept anything but 1.20.x
[17:22] cherylj: if I wasn't busy I would just replay the upgrade tests
[17:22] upgrade steps
[17:23] sinzui: np, I can get with you a bit later on it
[17:25] voidspace: any news?
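The upgrade-path rule sinzui describes above (1.18 will only accept 1.20.x, so a long jump to 1.24 must go via 1.20) could be encoded as a gate like the sketch below. This is an assumption reconstructed from the chat, not juju's actual upgrade-validation code, and `canUpgrade` is a hypothetical helper:

```go
package main

import "fmt"

// canUpgrade sketches the gate from the discussion: a 1.18 source only
// accepts 1.20.x targets; otherwise any same-or-newer minor is allowed.
// Versions are [major, minor] pairs for simplicity; real juju versions
// carry patch and build components too.
func canUpgrade(from, to [2]int) bool {
	if from == [2]int{1, 18} {
		// 1.18 can only step to 1.20.x, per sinzui's table.
		return to == [2]int{1, 20}
	}
	return to[0] > from[0] || (to[0] == from[0] && to[1] >= from[1])
}

func main() {
	fmt.Println(canUpgrade([2]int{1, 18}, [2]int{1, 24})) // false: must go via 1.20
	fmt.Println(canUpgrade([2]int{1, 18}, [2]int{1, 20})) // true
	fmt.Println(canUpgrade([2]int{1, 20}, [2]int{1, 24})) // true
}
```

So the proposed 1.18 -> 1.22 -> 1.24 route would already fail at the first hop under this rule.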
[17:34] sinzui: I am a bit confused on why this bug https://bugs.launchpad.net/juju-core/+bug/1497301 is in the top list of 1.25 http://reports.vapour.ws/releases/top-issues?_charset_=UTF-8&__formid__=deform&previous_days=7&issue_count=20&update=update#1.25
[17:34] Bug #1497301: mongodb3 SASL authentication failure
[17:35] perrito666: the bug happens so often in the single test we run that it dominates the count of bug frequency
[17:36] sinzui: but mongodb3?
[17:36] I have this sensation that I missed something
[17:36] perrito666: http://reports.vapour.ws/releases/issue/55fc1a67749a5674698af639 shows every occurrence. It just happens all the time for the run-unit-tests-mongodb3 job, which core asked us to test
[17:37] this is on our feature branch?
[17:37] perrito666:
[17:38] perrito666: no. this test is run for every revision in every branch. The host we run the unit tests on has mongodb 3
[17:38] ok, didn't know that :)
[17:39] cherylj: bad news. Juju doesn't support 1.25.0. I think tests were written to keep the juju version in devel. I will report the bug in a few minutes
[17:45] cherylj: https://bugs.launchpad.net/juju-core/+bug/1509032 is super important for 1.25.0
[17:45] Bug #1509032: Juju doesn't support is own version of 1.25.0
[17:46] voidspace: I need to relocate, if you get to reproduce the error please get me more info :)
[17:47] Bug #1509032 opened: Juju doesn't support is own version of 1.25.0
[17:47] sinzui: looking
[17:50] oh god, this is just a horribly wrong test
[17:52] sinzui: was that the only failing test?
[17:53] Bug #1509032 changed: Juju doesn't support is own version of 1.25.0
[17:54] cherylj: The bug points to 2 other status tests that failed in my three tries.
I don't see a 1.25.0 connection to the failures
[17:54] cherylj: see 5191 on jenkins github-merge-juju job
[17:54] k
[17:55] cherylj: mgz: http://ci-master.vapour.ws:8080/view/Juju%20Ecosystem/job/github-merge-juju/5192/consoleText is better because it isn't a victim of bad record mac
[17:55] ah, thanks
[17:56] anyway, looks like three real failures, the one in the bug and two status tests with hard to read mismatches :)
[17:56] okay, I'm going to handle the case of the state test failing.
[17:56] Bug #1509032 opened: Juju doesn't support is own version of 1.25.0
[17:57] cmars, katco, can you volunteer someone to look at the status failures here: http://ci-master.vapour.ws:8080/view/Juju%20Ecosystem/job/github-merge-juju/5192/consoleText
[17:59] Bug #1509032 changed: Juju doesn't support is own version of 1.25.0
[18:00] wwitzel3, ericsnow, natefinch can one of you guys look at these failures?
[18:00] cherylj: I can look
[18:00] thanks, natefinch. It's the status failures in this run: http://ci-master.vapour.ws:8080/view/Juju%20Ecosystem/job/github-merge-juju/5192/consoleText
[18:02] Bug #1509032 opened: Juju doesn't support is own version of 1.25.0
[18:21] perrito666: last couple of attempts to repro *failed* (i.e. the upgrade worked - failed to fail)
[18:22] perrito666: and now I'm EOD and off out to Northants Geeks
[18:22] perrito666: when I come back in I may try again as I can do it in front of the TV
[18:24] who the hell writes tests that check 21 lines of textual output?
[18:25] obtained: expected:
[18:25] thanks
[18:26] cherylj: forgot I have to watch the kids while my wife takes my daughter to a doctor's appointment, so I'll be mostly afk for an hour and a half or so.
[18:28] natefinch: ok, had you found anything yet to hand off?
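Comparing 21 lines of literal output, as lamented above, is fragile exactly because invisible differences (such as trailing padding after a fixed-width version column) produce hard-to-read mismatches. A sketch of a normalization helper that sidesteps that class of failure; `normalizeStatus` is a hypothetical name, not the fix that actually landed:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeStatus trims trailing whitespace from every line, so a
// column padded to a fixed width can't break the comparison when the
// contents (e.g. a version string) change length.
func normalizeStatus(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimRight(line, " \t")
	}
	return strings.Join(lines, "\n")
}

func main() {
	obtained := "machine  version\n0        1.25.1  \n"
	expected := "machine  version\n0        1.25.1\n"
	// The raw strings differ only by trailing spaces; normalized, they match.
	fmt.Println(normalizeStatus(obtained) == normalizeStatus(expected)) // true
}
```

A diff of the normalized strings also gives a far more readable "obtained: expected:" mismatch than comparing the whole block verbatim.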
[18:30] omfg, it's a whitespace problem
[18:48] cherylj: sounds like you found more than I did
[18:48] cherylj: I was about to just diff the before and after on those statuses
[18:48] hello
[18:48] I ran juju ensure-availability on AWS and now I can't connect to the api from the CLI
[18:49] cherylj: ahh yeah I see it... the number of spaces after 1.25.1 in the status. amazing.
[18:50] cherylj: (still not really here ;)
[18:50] http://paste.ubuntu.com/12896313/
=== natefinch is now known as natefinch-afk
[18:50] natefinch-afk: no worries, I'm going to fix the status failures in the same patch.
[18:51] db, jujud-machine-0 are both running on machine 0
[19:01] I'll just ask on the list.
=== urulama_ is now known as urulama
[19:19] I need a review so we can move to 1.25.0: http://reviews.vapour.ws/r/2977/
[19:22] cmars, katco natefinch-afk wwitzel3 mgz ^^
[19:24] cherylj: lgtm
[19:31] cherylj: spaces broke the status test? ouch
[19:32] sinzui: yeah, awesome.
[19:33] sinzui: I sort of hacked it so that it won't break if the length of the current version changes again
[19:33] but ideally, we'd fix that test.
[19:53] perrito666: ping?
[19:54] cherylj: pong
[19:54] perrito666: good afternoon :) Wanted to see if this was on your radar yet? bug 1497301
[19:54] Bug #1497301: mongodb3 SASL authentication failure
[19:56] cherylj: it sort of is, I kind of just learned that we are testing that
[19:56] I am not yet sure I understand what this is about
[19:56] perrito666: have you run into it yourself?
[20:01] sinzui: my branch still did not go through? :(
[20:02] cherylj: you got a bad record mac
[20:02] I resubmitted
[20:02] I know :(
[20:02] and chery was faster anyway
[20:02] I win!
[20:03] that particular flavour seems more common in the gating job of late
[20:03] perrito666: I see your merge.
https://github.com/juju/juju/commits/restore-fix we are waiting for 1.25.0 to exist. We stopped CI so we could test it as soon as it existed
[20:06] cherylj: we presumably do want to target that bug against master as well, as the dodgy tests want fixing there too?
[20:06] mgz: yes, I'm working on that now
[20:06] but it wouldn't hit us until we move to 1.26.0
[20:27] cherylj: CI has started testing your revision
[20:28] sinzui: yay!
[20:28] sinzui: Is there any ballpark of when we can expect 1.25.0 in the ppa?
[20:28] Tomorrow cherylj
[20:29] * cherylj sad panda
[20:29] but so it goes...
[20:31] cherylj: 3+ hours to test (and make release artifacts like real agents), then 3+ hours to get the base debs created in the secret PPA, then 1.5 hours to publish to the CPCs, then 1.5+ hours to publish to streams.canonical.com, then 1h to copy to the public PPA.
[20:32] sinzui: Oh I'm sure there's a lot that goes into it, it's just a shame that these bugs added to the delay
[20:32] cherylj: we have no control over Lp or Jerff, so we can only hope we get immediate service
[20:32] sinzui: what do you need from Jerff? I can ask my office mate to help
[20:33] cherylj: We queue the job that makes the agents for streams.canonical.com. We expect it to deliver between 15 and 45 minutes past the hour. Sometimes it is many hours because the machine is busy
[20:36] sinzui: I can have Rob manually trigger the job, rather than waiting the hour to pick it up
[20:36] not that it helps a *whole* lot
[20:37] cherylj: that is nice, I can ask in #cloudware too when it needs to happen quickly. Since the release process is a queue of steps building on each other, there is nothing to ask for now
[20:37] yeah
[21:36] Bug #1509099 opened: juju does not error or warn when agent-stream is ignored
[22:14] wallyworld: ping me when you are here
[22:21] Bug #1509097 opened: Juju 1.24.6 -> 1.24.7, upgrade never finishes
[22:45] wallyworld, ping
[22:46] hey, just talking to horatio, give me 10?
[22:46] wallyworld, np, when you are free
[22:46] no rush
[22:49] menn0: I got the syslog for that replicaset / EMPTYCONFIG bug if you want to take a look: bug 1412621
[22:49] Bug #1412621: replica set EMPTYCONFIG MAAS bootstrap
[22:54] cherylj: i've got a few errands to run right now but i'll take a look today
[23:14] alexisb: just finished but about to do standup, can you wait another 10 minutes or so?
[23:14] geeze wallyworld
[23:15] just keep pushing me off ;)
[23:15] yes I will still be here in 10 minutes
[23:15] but my info may be useful for your standup
[23:15] alexisb: oh, ok let's talk quickly now then