[00:08] thumper -- sorry about that! oops.... sigh
[00:09] I'd cherry-picked the PR content it's waiting on in order to put the work through. let's see here...
[00:11] apologies for wasting your time with that
[00:15] there we go
[00:15] https://github.com/juju/juju/pull/163
[00:36] wallyworld, the job failed even after I cleaned up the environment. Here are the logs http://juju-ci.vapour.ws:8080/job/local-upgrade-precise-amd64/1434/
[00:36] sinzui: sec, talking to alexisb, will contact you soon
[00:37] wallyworld: do you know where I can get the instance type information I need for the client API?
[00:45] wwitzel3: yeah, sorry, i read your email and haven't had a chance to respond yet - been in meetings all morning. there is an api there, i just have to look at it and let you know. will do so soon
[00:47] wallyworld: np, thanks.
[00:51] sinzui: i can see the state server workers all start up, and also mongo. there appears to be no reason why the api client cannot connect to port 17070 - is it possible to do a netstat to see if the state server is indeed listening on the correct port?
[00:55] wallyworld, This is what I saw during the upgrade http://pastebin.ubuntu.com/7703456/
[00:56] wallyworld, WTF, this just happened on the next test
[00:56] http://juju-ci.vapour.ws:8080/job/local-upgrade-precise-amd64/1435/console
[00:56] sinzui: is that done after these lines
[00:56] 2014-06-26 00:13:38 INFO juju.mongo open.go:90 dialled mongo successfully
[00:56] 2014-06-26 00:13:38 DEBUG juju.state open.go:58 connection established
[00:56] * wallyworld looks at new console
[00:57] seriously? it is mocking us
[00:57] wallyworld, it is a pass with a panic...that is a first
[00:58] sinzui: i think that's due to the agent shutting down for upgrade, just caught it at a bad time
[00:58] wallyworld, status may have panicked, it is called several times. the last call gave a result showing all machines upgraded
[00:59] wallyworld, could be...but the code i wrote tries to capture that...that is why status will be called several times
[01:00] sinzui: i can see why status is behaving that way and there was a recent change there - i think it's missing a sanity check
[01:01] we can get back an error from the call to get status and still have partial status to display
[01:01] but we should check that we do indeed have some status and not nil
[01:01] so it will be a simple fix if i am correct
[01:02] sinzui: but i wonder why the CI job failed the first time, there's no obvious reason
[01:03] wallyworld, I am unsure what to do now. If I wasn't watching that happen, I would declare the test good and just focus on the ill azure
[01:03] sinzui: 2 out of 3? :-D
[01:03] wallyworld, exactly
[01:03] let's run it again
[01:04] azure and joyent are messed up. I am in the consoles killing machines
[01:04] menn0: hi ya, i think there's an issue with the recent changes to status. i think we are missing a nil check. see http://pastebin.ubuntu.com/7703483/
[01:05] do you agree?
[01:06] yep. davecheney has already fixed it
[01:06] https://github.com/juju/juju/pull/127
[01:06] wallyworld: it looks like the landing bot didn't pick up the merge request though.
[01:07] wallyworld: how do we make it notice the PR?
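The missing sanity check wallyworld describes - a status call that can return partial status alongside an error - might look roughly like this (a minimal runnable sketch with stand-in names, not the actual fix in PR 127):

```go
package main

import (
	"errors"
	"fmt"
)

// Status stands in for juju's status result; getStatus simulates an
// API call that can return partial status together with an error,
// which is the case described above.
type Status struct{ Machines map[string]string }

func getStatus() (*Status, error) {
	return &Status{Machines: map[string]string{"0": "started"}},
		errors.New("could not read unit agent state")
}

func printStatus() error {
	status, err := getStatus()
	if status == nil {
		// Nothing to display at all: return the error rather than
		// dereferencing nil and panicking, as the CI run did.
		return err
	}
	if err != nil {
		fmt.Println("warning: status may be incomplete:", err)
	}
	fmt.Println("machines:", status.Machines)
	return nil
}

func main() {
	if err := printStatus(); err != nil {
		fmt.Println("error:", err)
	}
}
```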
[01:07] um
[01:07] $$merge$$ should have been enough
[01:07] i'll look into it
[01:07] It could be because this was a PR where the merge started and then was aborted because the initial proposal wasn't quite right
[01:07] wallyworld: ^^
[01:08] ah
[01:08] wallyworld: I think you were the one who killed it in Jenkins (at dave and my request)
[01:08] menn0: yes. but after that it needs $$merge$$ again to re-trigger
[01:09] i'll do that
[01:09] well it's had that
[01:09] but try again I guess
[01:09] wallyworld, (Sorry for this awkward question from my daughter), Are all 40-50 year-old Aussies obsessed with ABBA?
[01:09] I said no, but she doesn't believe me
[01:09] rotfl
[01:09] only some of us :-)
[01:09] abba were very, very popular here
[01:10] who isn't obsessed with ABBA? am I right?
[01:10] I know. I told her about how many weeks Fernando and Dancing Queen spent at number one and she then decided there is a cohort who can't let the band go
[01:14] she is right
[01:14] but i am not one of them :-)
[01:15] wwitzel3: i sent you an email - there's a little refactoring required, sorry
[01:16] wallyworld: ok, yeah, I saw that .. I think I can just add that to the interface in common provider? But I am still not sure how to actually get the provider to call the method on.
[01:17] wwitzel3: you have it in your method
[01:17] func (api *EnvironmentAPI) getInstanceTypes(env environs.Environ)
[01:17] env is the provider
[01:17] so you add the method to Environ
[01:18] wallyworld: lol
[01:18] wallyworld: of course it is
[01:18] the ConstraintsValidator() method is already there
[01:18] :-)
[01:18] I was sooo close
[01:18] yep :-)
[01:18] wallyworld: thanks :)
[01:18] np
[01:19] menn0: i have no idea what's wrong, i'm just going to merge it directly
[01:19] wallyworld: ok thanks
[01:20] sinzui: i just merged in a fix for that status panic
[01:21] :)
[01:21] sinzui: it was proposed a few days ago it seems but the bot just didn't want to pick it up
[01:46] review requested: https://github.com/juju/juju/pull/164 ; this just updates for the newly updated names package and makes the internal structure of the Action consistent with other state structures.
[02:01] wallyworld: with you shortly
[02:01] ok
[02:28] thumper: I'm thinking of picking up this bug: https://github.com/juju/juju/issues/138
[02:29] thumper: I see there are two ssh clients: openssh and gocrypto embedded. Does the gocrypto save known_hosts?
[02:29] waigani: first thing, can you move that bug to launchpad?
[02:29] thumper: sure
[02:32] thumper: https://bugs.launchpad.net/juju-core/+bug/1334481
[02:32] <_mup_> Bug #1334481: juju should not record ssh certificates of ephemeral hosts
[02:32] waigani: can you also link that on github too? for the issue
[02:33] thumper: done
[02:33] added comment
[02:34] thumper: shall I do the same for this one: https://github.com/juju/juju/issues/133
[02:34] waigani: check to see if it has been done already, but yes
[02:34] though there has already been some discussion on github
[02:34] ok
[02:37] thumper: done, and linked on github
[02:37] waigani: yes, working on that issue would be good
[02:38] thumper: cool. I'll start with a failing test. So we will just not store the known hosts at all on ssh right?
[02:40] waigani: yeah... but just for juju ssh
[02:41] thumper: cmd/juju/ssh ?
[02:41] yup
[02:49] axw: got time for a quick hangout?
[02:50] wallyworld: can you give me 5 mins please?
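The pattern wallyworld points wwitzel3 at - declare the method on the Environ interface, next to the existing ConstraintsValidator(), and let each provider implement it - reduced to a minimal sketch (everything except the getInstanceTypes name is hypothetical):

```go
package main

import "fmt"

// InstanceType is a hypothetical stand-in for a provider's
// instance-type description.
type InstanceType struct {
	Name     string
	CpuCores uint64
	MemMB    uint64
}

// Environ sketches the relevant slice of the environs.Environ
// interface: providers already implement methods like
// ConstraintsValidator, so instance-type listing follows the same
// pattern - declare it on the interface, implement it per provider.
type Environ interface {
	InstanceTypes() ([]InstanceType, error)
}

// fakeEnviron is a toy provider implementation.
type fakeEnviron struct{}

func (fakeEnviron) InstanceTypes() ([]InstanceType, error) {
	return []InstanceType{{Name: "m1.small", CpuCores: 1, MemMB: 1740}}, nil
}

// getInstanceTypes mirrors the API-server method from the discussion:
// env arrives as the interface, so the call dispatches to whichever
// provider backs the environment.
func getInstanceTypes(env Environ) ([]InstanceType, error) {
	return env.InstanceTypes()
}

func main() {
	types, err := getInstanceTypes(fakeEnviron{})
	fmt.Println(types, err)
}
```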
[02:50] sure
[02:54] wallyworld: I'm in the tanzanite-daily hangout
[02:55] thumper: fwiw, on PR 163, a lot of the tag/id stuff has been clarified with that last names package update I submitted
[02:55] heh...
[02:55] I'm just commenting on what I see
[02:55] :)
[02:56] thumper: bodie_ told me this morning that he expected to have to do some refactoring once my change was in
[02:56] thumper: :)
[02:56] * jcw4 is just nervous about when thumper's beady eye gets on my PR next
[02:57] jcw4: which PR is it you want me to look at?
[02:57] 164
[02:57] or not... y'know
[02:57] if you need a nap or something...
[02:57] thumper, cleaned up 163, btw -- it looks like you're commenting on the content I removed :(
[02:57] all jokes aside, I'm *loving* the code review process on this project
[02:58] it's still useful -- the code you're commenting on is for PR 140 and 141, so I can still make use of it
[02:59] bodie_: hmm... ok, what should I be looking at?
[02:59] https://github.com/binary132/juju/commit/1de2d29aba97a422da32fcfde1a15c94e150e1ad
[03:00] bodie_: is that because your PR 163 was rebased on top of your pending 140 and 141 ?
[03:00] bodie_, jcw4: something to be aware of, with the upcoming work on multi-environment state servers, all the document _id fields will change to include the env uuid
[03:00] yes, I removed the condensed commit and pushed --force so I thought it would be clear it was gone from the PR
[03:01] but, the rebased commit was just the same content from 140 and 141, so I could run tests
[03:02] thumper: so if we hide that _id behind the public api of the state types we should be fine right?
[03:02] generally...
[03:02] wallyworld: :P how many bundles have you created by hand?
[03:03] rick_h_: none
[03:03] but it sure would be nice to have a cli for it
[03:03] wallyworld: sure, from an existing environment as a dump/backup.
[03:04] eg juju start bundle followed by a series of deploy, relate commands, then juju end bundle
[03:04] wallyworld: but anyway, replied. rubbing some ointment over the gui comment sting :P
[03:04] wallyworld: it's called a shell script, you can do that today
[03:05] rick_h_: sorry if it came out bad, wasn't intended
[03:05] wallyworld: I'm joking with you
[03:05] i just wanted to make the point that most of our target audience don't use guis
[03:05] wallyworld: except I don't think bundle creation is a cli/scriptable thing
[03:05] wallyworld: do we have a target audience now?
[03:05] devop people?
[03:06] i guess windows folks want a gui
[03:06] because if we're going to talk small devs I'll argue with you
[03:06] i'm a small dev
[03:06] i don't use guis
[03:06] but yes, at scale people want scriptable > * (*cough* thumper *cough*)
[03:06] yup
[03:06] wallyworld: never confuse you vs a target audience.
[03:07] I use vim and a terminal all day and the only gui app I run is a browser
[03:07] rick_h_: I'm going to make you happy and make it all script happy
[03:07] rick_h_: leave it with me :)
[03:07] thumper: :) hey wallyworld is preaching it too
[03:07] rick_h_: I've already cleared the approach with fwereade
[03:07] woot
[03:07] * rick_h_ does happy dance
[03:07] thumper: you have a doc to share on that?
[03:08] waigani: nope, it is inside my head, but not complex
[03:08] wallyworld: thumper and I were having this conversation yesterday so glad to see you chime in as well.
[03:08] thumper: must be simple to be in your head
[03:08] wallyworld: it is
[03:08] ouch
[03:08] maybe there's a GUI in it
[03:09] lol
[03:09] har har!
[03:09] you funny
[03:09] I try, it's past my bedtime
[03:09] careful, you'll turn into a pumpkin
[03:09] k
[03:09] half way there, let me get an orange shirt
[03:11] ll
[03:11] o
[03:11] and a green hat
[03:12] and a camera
[03:24] * thumper takes a deep breath and moves to the next PR
[03:27] * bodie_ hands thumper a bottle of water and cheers him on
[03:37] thumper, wallyworld. A recent rev broke the win installer. We cannot compile it https://bugs.launchpad.net/juju-core/+bug/1334493
[03:37] <_mup_> Bug #1334493: Cannot compile win client
[03:42] * sinzui forces a build of a revision before the win and os revisions
[03:45] bodie_, jcw4: some comments on PR 164
[03:45] thumper: right behind you
[03:45] particularly the last one, as that is the biggest question I have
[03:46] I'll comment on the pr thumper, but this goes back to that watcher point
[03:47] if we have a watcher on the actions collection
[03:47] and that watcher gets _id's for *free*
[03:47] we can filter on those _id's without another db hit
[03:48] sure, but that doesn't answer the question
[03:48] thumper: because there could be multiple actions with the same name
[03:48] jcw4: a key question is: "Is the combination of unit and action name unique?"
[03:48] thumper: no
[03:48] why?
[03:48] what is the differentiating point that makes actions here special?
[03:49] how does a user differentiate?
[03:49] if I say "run the backup action" it may mean multiple things?
[03:49] if so, why?
[03:49] or is this "an instance of someone running the backup action" ?
[03:50] thumper: every time a user types 'juju do ' an Action gets queued on the actions collection using the assigned unit and name
[03:50] thumper: I may say the same command twice
[03:50] thumper: intending it to run twice
[03:50] ok...
[03:50] how come an action doesn't have a user?
[03:51] or a date requested?
[03:51] I think an action should have a timestamp that it was created
[03:51] and who requested it
[03:51] thumper: my very first PR for this document had unitName, timestamp, (no user), etc.
[03:51] heh
[03:52] in discussion w/fwereade we eliminated the unitName because it would be encoded in the _id
[03:52] sinzui: looking
[03:52] the timestamp was deemed unnecessary for now
[03:52] thumper: the intent is for us to basically have a super lightweight 'tracer' implementation
[03:52] * thumper coughs
[03:53] thumper: and then fill in the details later
[03:53] * thumper looks shiftily at fwereade's shadow
[03:53] * jcw4 feels guilty for throwing fwereade under the bus
[03:53] jcw4: what is the lifetime of an action?
[03:53] fwiw, I think fwereade's case was sound
[03:53] when do we remove it?
[03:53] thumper: as long as it takes for the unit to execute it.
[03:54] probably minutes or seconds
[03:54] usually
[03:54] so we end up with an action result?
[03:54] how long do they live?
[03:54] forever
[03:54] and ever
[03:54] ouch...
[03:54] * thumper foresees an issue
[03:55] we obviously have different definitions of lightweight
[03:55] to me remembering who asked and when is part of very lightweight
[03:55] to be fair, we haven't discussed any archiving of the results yet
[03:55] when you record the result, you then have a timestamp for finish and can then deduce a duration
[03:55] jcw4: but results could be big right?
[03:55] thumper: indeed
[03:55] jcw4: or do they point to locations on file?
[03:56] not in the current implementation
[03:56] well... they could...
[03:56] thumper: yep... tbh we hadn't thought that far ahead yet
[03:56] (we being me)
[03:56] given that we want to back up the db periodically
[03:56] and I don't want all my postgresql database backups stored in mongo
[03:57] * davecheney shrieks
[03:57] sorry davecheney, bad moment to listen
[03:57] i've been listening for a while
[03:57] i just couldn't stand it any longer :)
[03:57] haha
[03:57] http://paste.ubuntu.com/7703965/
[03:57] still one more race in the state/apiserver package
[03:57] i'm on it
[03:58] ta
[03:58] thumper, davecheney to be fair we don't have *any* actions actually runnable yet, so the danger isn't there until we do :)
[03:59] jcw4: anything that ends with 'my backups are stored in mongodb' is horrifying
[03:59] jcw4: so you are just going to hand us a hand grenade and say "here you go, juggle"
[03:59] * thumper chuckles
[03:59] * jcw4 wonders how to respond to that
[03:59] heh
[03:59] well....
[03:59] :)
[03:59] jcw4: we'll need a way for a user to say "please discard the results for this action now"
[03:59] when the only tool you have is mongodb, everything looks like /dev/null
[04:00] davecheney: mongo is web scale
[04:00] thumper: so's /dev/null
[04:00] :)
[04:00] exactly
[04:00] axw: wallyworld is there a race build in jenkins ?
[04:00] um
[04:00] thumper: so... we're trying to build/define actions here as we go
[04:00] no
[04:00] jcw4: that's going to end in tears
[04:01] haha
[04:01] we are considering it
[04:01] hmm...
[04:01] possible race from sabdfl, possible sadness from your team
[04:01] wallyworld: i'll add it to the weekly meeting notes as a discussion point
[04:01] sure
[04:01] wallyworld: do you know the status of the release / upgrade ?
[04:01] i was watching a bunch of reverts overnight
[04:01] that then got reverted
[04:01] jcw4: so... one question
[04:02] davecheney: reverts were red herring, i think a few conclusions were jumped to
[04:02] * thumper tries to formulate...
[04:02] davecheney: someone broke the windows build, i'm fixing that now
[04:02] wallyworld, the build of the older revision, the one that only reverts dave's rev
[04:02] wallyworld: i think it was a good hunch
[04:03] thumper, davecheney we've purposefully not exposed cli usage yet so that there's minimal exposure until we're done.
[04:03] jcw4: ack
[04:03] wallyworld, This is the first time I have specifically tested a rev to get a pass
[04:03] jcw4: IMO, and fwereade may disagree, the id for any document should be composable from attributes in that document
[04:03] sinzui: you talking about the local upgrade?
[04:04] jcw4: so we don't need to parse the id to get attributes
[04:04] jcw4: especially if parts of said id are used in other places
[04:04] such as the tag
[04:04] wallyworld, yes, but since dave's rev was immediately restored, CI never tested just the revision we wanted
[04:04] going from a set of attributes to an ID is easier than trying to do the reverse
[04:04] wallyworld, the rev shouldn't have been restored until CI had built juju without it
[04:04] thumper: that makes sense; it feels a little redundant, but makes sense to me
[04:04] and the amount of data we are storing is minimal
[04:04] thumper: +100
[04:05] seriously, minimal
[04:05] sinzui: oh, so that one *may* have broken upgrades? i thought we just got a passing CI test?
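thumper's rule of thumb above - compose the _id from the document's attributes rather than parsing attributes back out of the _id - might look like this (a hypothetical Go sketch, not juju's actual state code):

```go
package main

import "fmt"

// actionDoc is a hypothetical sketch, not the real schema: the unit
// and action name live as plain fields, and the _id is derived from
// them, so nothing ever has to parse the id to recover an attribute
// (the direction thumper warns against).
type actionDoc struct {
	Id   string `bson:"_id"`
	Unit string `bson:"unit"`
	Name string `bson:"name"`
}

// actionId composes the id from the attributes; going from attributes
// to id is trivial, while the reverse is fragile string parsing.
func actionId(unit, name string) string {
	return fmt.Sprintf("a#%s#%s", unit, name)
}

func main() {
	doc := actionDoc{Id: actionId("wordpress/0", "backup"), Unit: "wordpress/0", Name: "backup"}
	fmt.Printf("%+v\n", doc)
}
```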
[04:05] https://bugs.launchpad.net/juju-core/+bug/1334500
[04:05] <_mup_> Bug #1334500: state/apiserver: more data races
[04:05] thumper: https://github.com/juju/juju/pull/165 fixes a release blocker sinzui found
[04:05] thumper, davecheney what we need is some way to incrementally build / design what we're doing, and get active feedback (like this), without a fully specified feature doc
[04:05] i'll throw this back in the pool if I can't fix this by EOD
[04:05] wallyworld, I think we got lucky. the test passed, yet there is a panic in it
[04:05] davecheney: ack
[04:05] wallyworld: looking
[04:05] jcw4: yes, if you don't have that, success will be hard
[04:06] sinzui: the panic was just in a juju cmd
[04:06] If this rev I am testing passes I will release it. that is all I want: a rev that passes that developers don't also say has a hidden bug
[04:06] wallyworld: lgtm
[04:06] it's fixed now but would have had little impact
[04:06] thumper: thanks
[04:06] thumper: i'm just going to hit merge directly so sinzui can rerun the windows build
[04:07] davecheney, thumper I know you're in the middle of a couple other issues here; but we're also in somewhat of a tight spot because sabdfl is anxious for a version of actions that works end to end (even if it's very minimal)
[04:07] wallyworld, I cannot
[04:07] I am testing a previous rev
[04:07] jcw4: ok... please can we start by updating the state doc so the id is composed from other attributes?
[04:07] ah ok
[04:07] thumper: +1
[04:07] CI gets nasty if I try to make it change what is being tested
[04:07] jcw4: and if you don't decide to add a timestamp and user, add a note that says thumper wants it there
[04:08] thumper: absolutely, and I'll add jcw4 too
[04:08] \o/
[04:08] thumper: I'll also add a note to ActionResults about the long term risk of not managing old results
[04:09] jcw4: I think that removing old action results must be part of the initial release
[04:09] otherwise crazy ensues
[04:09] thumper: agreed
[04:10] probably something as easy as "juju action rm "
[04:10] jcw4: so... actions are defined in the charm metadata, yes?
[04:11] thumper: yes
[04:11] jcw4: do we do validation somewhere on action names being requested?
[04:11] jcw4: is there a command to list action results?
[04:11] thumper: yes, and will be
[04:11] jcw4: we are going to have to have user there... ASAP
[04:11] jcw4: because I will most of the time only be interested in seeing the actions I asked for
[04:11] jcw4: but I should be able to see all
[04:11] thumper: interesting
[04:12] (assuming I have permissions)
[04:12] thumper: makes sense
[04:12] jcw4: as an aside, we will probably have permissions fine grained enough to say who can do what actions on which service
[04:12] thumper: were you involved in the draft spec of Actions ?
[04:12] * thumper handwaves
[04:12] jcw4: not really, I think that was mostly sabdfl
[04:13] jcw4: although I have spent most of the last two weeks just writing specs
[04:13] * thumper sighs
[04:13] :(
[04:13] https://docs.google.com/document/d/14W1-QqB1pXZxyZW5QzFFoDwxxeQXBUzgj8IUkLId6cc/edit#heading=h.q6wtcjv2r9h
[04:13] thumper: I think I want to capture a lot of your suggestions there
[04:14] heh
[04:14] jcw4: looks like the doc suggests a uuid for an action
[04:14] yep
[04:15] I don't recall if we explicitly discarded that idea or if it just slipped by us when we started worrying about filtering the events on the watcher
[04:16] jcw4: also notice that the spec shows that the action records when it was invoked
[04:16] that looks like a timestamp to me
[04:17] * jcw4 blushes
[04:17] not for the first time tonight
[04:17] hmm...
[04:17] I do think that the design has gotten a little overcomplicated, in that we only need one action doc, not two
[04:17] two?
[04:17] we should have the action results stored with the action
[04:18] I see
[04:18] I don't think we need an ActionResult doc
[04:18] the result belongs to an action
[04:18] this way you don't need to copy fields across
[04:18] consider this:
[04:18] $ juju status action:UUID
[04:18] in the spec, there are two options:
[04:19] running, or failed
[04:19] this indicates to me that we are looking in one place to see the information
[04:19] which means a simple database query
[04:19] to get the action whether it is running or done
[04:19] * thumper takes a deep breath
[04:20] I feel a real design review coming along
[04:20] how much time do you have?
[04:20] thumper: yes that makes sense... believe it or not, we started there and currents and eddies along the way pushed us to the two docs we have now
[04:20] I *want* to go for hours
[04:20] * thumper smiles
[04:20] I *should* have been off hours ago
[04:20] :)
[04:21] * thumper looks in trunk
[04:21] hmm...
[04:22] * thumper goes back to the spec
[04:24] jcw4: ok where should I dump my thoughts?
[04:25] jcw4: I don't want to put them in the spec
[04:25] How about an email to the list?
[04:25] jcw4: do you have a design spec?
[04:25] um... yeah... ok
[04:25] more potential for bikeshedding
[04:25] I almost emailed the list a couple days ago, but didn't
[04:25] but ok
[04:25] thumper: that's true
[04:25] lets try it :)
[04:25] we started a couple spec docs, but nothing worth sharing
[04:26] Maybe you might craft a new doc and link to it from an email?
[04:26] one may fall out of the conversation
[04:26] thumper: ack
[04:26] <--- did you notice that?
[04:26] ;)
[04:27] learning new catch phrases as I go
[04:31] jcw4: nice
[04:38] davecheney: i'm not sure the data races are critical blockers for the 1.19.4 release - so long as CI is happy, we can fix them post release
[04:39] wallyworld: sure thing
[04:39] you're the judge
[04:39] but if you can fix quickly....
[04:39] i'm fixing it anyway
[04:39] but please lets not block this release any further
[04:39] great, may be able to sneak it in :-)
[04:39] yup
[04:39] that was the thinking
[04:39] i'll take off the 1.19.4 milestone
[04:42] sinzui: so what's the verdict with the release at the moment?
[04:43] If I must release, I can use a8f48d14 which is before the 1.18.x upgrade fix
[04:44] The revision under test has that fix, is before the win build broke, and might be without the local precise upgrade problem
[04:46] looks reasonable
[04:48] does github show commits in order?
[04:48] jcw4: sent
[04:48] in order of merging?
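Pulling thumper's suggestions from the action-doc discussion above into one place, a single document might look roughly like this (a hypothetical sketch, not the schema that shipped): one doc instead of a separate ActionResult doc, with the requester and enqueue time recorded, and the result filled into the same record when the unit finishes - so one query answers "running or done?".

```go
package main

import (
	"fmt"
	"time"
)

// actionDoc is a hypothetical combined document: who queued the
// action, when, its current status, and - once finished - the result,
// all on one record, so nothing needs to be copied across docs.
type actionDoc struct {
	Id       string                 `bson:"_id"` // a uuid, as the spec suggests
	Unit     string                 `bson:"unit"`
	Name     string                 `bson:"name"`
	User     string                 `bson:"user"`
	Enqueued time.Time              `bson:"enqueued"`
	Status   string                 `bson:"status"` // pending / running / failed / done
	Results  map[string]interface{} `bson:"results,omitempty"`
}

func main() {
	doc := actionDoc{
		Unit:     "postgresql/0",
		Name:     "backup",
		User:     "thumper",
		Enqueued: time.Now(),
		Status:   "pending",
	}
	fmt.Printf("%+v\n", doc)
}
```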
[04:48] sinzui: remember that I want to disable the user command for the 1.20 release
[04:48] sinzui: if the commits are in order, the 1.18 upgrade fix predates commit a8f48d14 doesn't it?
[04:48] sinzui: but really just in the release branch
[04:50] and yes, i just checked that rev, and my 1.18 fix is there
[04:50] the one to stop peer grouper publishing empty api addresses
[04:50] jcw4: sorry I haven't looked in earnest before now
[04:52] thumper: just one more thing on the list... I'm glad you've done what you have :)
[04:52] np
[04:52] thumper, wallyworld I am honestly just looking for a passing rev. We wanted to release this week, so that rev would not have had any of these fixes. I was a moron for not choosing 6a2c202d when it passed 2 days ago
[04:53] sinzui: you still can right?
[04:53] * thumper is about to leave and cook
[04:53] 2 hours of meetings tonight
=== thumper is now known as thumper-afk
[04:53] wallyworld: fix coming up, 20 seconds
[04:54] FUCK
[04:54] my working copy is screwed
[04:54] I will release the newest passing rev tomorrow, when I am awake enough to not make a mistake
[04:55] sinzui: next time we'll branch off a release candidate so commits to trunk don't screw us
[04:55] https://github.com/juju/juju/pull/166
[04:55] wallyworld: pls review
[04:55] looking
[04:55] wallyworld, Next time I won't listen to developers saying we have a blocking bug.
[04:55] i'm removing that errant wercker.yml file
[04:56] wallyworld, I care about regressions in the recent commit. I don't care about something that has been broken for weeks or months
[04:56] sinzui: agreed. if something does come up and it "needs" to be marked critical, we should then get consensus at least
[04:58] wallyworld, in the case of the upgrade bug, we took more time coming to consensus than fixing it. maybe a deadline is more important. time-based releases work best when we just release
[04:58] wallyworld, 1.19.3 was made from a week-old rev because trunk was broken
[04:59] yeah, if we can keep to a short enough release cadence
[04:59] thumper-afk: thanks for the review
[04:59] wallyworld, maybe we need to stop the line. no one lands a branch until the critical is fixed...no one adds another regression until we fix the current one
[04:59] anyone else ?
[04:59] davecheney: was it just that one attr?
[04:59] sinzui: i'd rather branch
[05:00] that way trunk development is not held up
[05:00] wallyworld: thanks gents
[05:00] i'll submit this now
[05:00] sinzui: a few days before release cut a 1.19.4 rc branch
[05:00] this doesn't feel like the root cause of the mongo panics
[05:00] i'll keep looking
[05:00] and test it, and address any blockers on that branch
[05:00] davecheney: awesome for looking, thank you
[05:01] wallyworld, when do we branch? trunk has been broken most of this month
[05:01] thumper-afk: fyi - i'm going to stop landing names changes for the immediate
[05:01] until
[05:01] sinzui: fair point. ideally, i'd say stop the line when CI breaks. but with unreliable clouds.....
[05:01] a. 1.19.4 lands
[05:01] b. i can find those f'n races
[05:02] wallyworld, exactly. I am awake now because azure and joyent cannot be trusted
[05:02] davecheney: did the race detector pick up that envuuid one?
[05:02] sinzui: that's the root cause of some of our "trunk is broken" issues
[05:02] because we can't *enforce* a "you break it, you fix it" approach
[05:03] because we can't trust why CI is broken
[05:03] wallyworld, HA is the root cause of most trunk brokenness, followed by API
[05:03] ah, true
[05:03] * jcw4 goes to bed...
[05:03] more specifically, mongo is terrible
[05:03] and mongo + replicaset is even worse
[05:04] but mongo is web scale :-/
[05:08] wallyworld: yup
[05:08] i am *sure* there are other races
[05:08] but right now, we can't see the wood for the trees
[05:08] davecheney: there indeed are, but if you fix the apiserver one, then woot
[05:08] there's another in our watcher shutdown code
[05:08] causing session closed errors
[05:09] wallyworld: on it
[05:10] davecheney: you know the error i mean?
[05:10] wallyworld: and it was the one we hoped jam's PR would fix
[05:11] davecheney: bug 1305014
[05:11] <_mup_> Bug #1305014: panic: Session already closed in TestManageEnviron
[05:11] that's the main one i think
[05:11] shit
[05:11] that was supposed to be fixed
[05:11] we spent three days agonising over the bloody fix
[05:11] before it landed
[05:11] oh?
[05:11] wallyworld: this is _not_ the change that you hulk smashed for jam last night ?
[05:12] before the api races over the past week, that session closed one was the main reason tests failed
[05:12] nope
[05:12] my change was about stopping the peer grouper publishing empty apiaddress lists
[05:12] if that's the one you are referring to
[05:13] right
[05:14] on it then
[05:33] wallyworld: is it always one package that blows up with the session already closed error ?
[05:33] axw: wtf does this mean. i haven't pushed to my branch since last time, yet it is saying my remote branch is behind the remote counterpart
[05:33] git push origin managed-resources
[05:33] To https://github.com/wallyworld/juju
[05:33] ! [rejected] managed-resources -> managed-resources (non-fast-forward)
[05:33] error: failed to push some refs to 'https://github.com/wallyworld/juju'
[05:33] davecheney: mostly i think
[05:33] davecheney, wallyworld, axw: hiya
[05:33] the tracebacks in the bug report should show it
[05:34] hi
[05:35] wallyworld: i'll figure it out
[05:35] ... me waits for tests to run
[05:36] davecheney: sorry, yeah i don't have the info to hand, i'd need to go and look at the bug report
[05:37] wallyworld: meh, i'll live
=== vladk|offline is now known as vladk
[05:44] axw: wtf, now the pull request diff shows all the changes to trunk after I did the initial proposal
[05:45] i did rebase my branch so i could confirm there were no conflicts with trunk since it may have bit rotted
[05:45] how the fark then do you do that and not have github mess everything up? this stuff just worked flawlessly with launchpad :-(
[05:46] wallyworld: "i did rebase" sounds like the start of your problems
[05:46] rebase throwing away history means DAG related operations lose context (IMO)
[05:46] wallyworld: so if you merge trunk, and then rebase that later
[05:46] all those changes look like you introduced them (I believe)
[05:47] depends on if rebase throws out the merge commit or not
[05:47] jam1: i thought rebase in this case simply moved your stuff out the way, merged in tip of trunk, and put your changes back?
[05:47] wallyworld: well in your history one of your changes is merging trunk, right?
[05:47] jam1: sure, but with launchpad, that all just worked
[05:47] anyway, it isn't something I've used tremendously
[05:47] wallyworld: you never rebased in LP
[05:48] you don't *have* to rebase in git
[05:48] so how else do i bring in trunk and not have my wip commits all sprinkled through?
[05:48] wallyworld: live with them being sprinkled, like we did with LP
[05:49] well, not really
[05:49] wallyworld: they were hidden by default, but you can get that with "git log --first-parent" as well
[05:49] when you did the merge into trunk in lp, your merge commit was correctly placed in the timeline
[05:50] so if i $$merge$$ what's there now, i assume all the commits already in trunk will be ignored and just my new stuff will go in?
[05:51] wallyworld: well, it should try to merge the two, hopefully the changes that you brought in from trunk just apply cleanly
[05:51] You can try just doing "git merge master --no-commit"
[05:51] and see if that works without conflict.
[05:52] (or upstream/master, or however you relate to the github.com/juju/juju master branch)
[05:52] ok. the rebase workflow i got from rick - that's how he brings in trunk to ensure his work is sane with tip
[05:52] what should i use instead?
[05:52] wallyworld: "git merge master" is what I would use
[05:52] pull upstream/master maybe
[05:52] 'pull' might work, as I think it is just fetch+merge
[05:53] ok, will try that
[05:53] I'm just not sure if it is also going to change the defaults for "fast forward"
[05:53] i really don't understand the love for git
[05:56] wallyworld: I don't know if it is love as much as "it does the job, it was popular so everyone jumped on the bandwagon, and switching tools is hard so I always prefer the one I know"
[05:56] and *probably* a little bit of stockholm syndrome "this was hard so it must be good"
[05:56] wallyworld: good news and bad news
[05:56] sure. i wouldn't mind switching if git were better than bzr
[05:56] good news: i found the data race
[05:56] bad news: its in the upgrade code
[05:56] which is probably why you guys can't cut a release
[05:57] davecheney: which one? that watcher/session closed?
[05:57] wallyworld: this shit takes _so_ long to run, i'm only reporting what I can see
[05:57] the more I look, the more i'll find
[05:57] ok
[05:58] davecheney: upgrade only fails on precise/local
[05:58] works on other clouds and series
[06:00] * davecheney reaches for table leg
[06:00] wallyworld: so a few things that I can concretely say are better: a) git commit is faster for really big trees and lots of history, b) git push/pull logs into github faster than Launchpad, because of LP limitations that I tried to fix, but ran into odd bugs and never got time to finally address, c) the actual transfer times are also a lot faster, d) colocated branches by default are BigDeal(tm) that you could configure Bazaar to work well, but not out of the box
[06:01] jam1: so i came back to work on this branch after several days. when i went to push, it complained that my branch was behind its remote counterpart and to pull. so i did, but there were conflicts which precisely corresponded to the changes i had made locally, and i had to resolve by "accept mine"
[06:01] jam1: bzr handles history and file renames better too
[06:01] wallyworld: that sounds like a mistaken set of targets for your push and pull
[06:01] wallyworld: bzr's view of history (default in log) is beautiful (IMO)
[06:01] *but*
[06:01] it is very expensive to compute
[06:01] as it is O(allhistory)
[06:02] i push/pulled from origin/ where origin is gh.com/wallyworld/juju
[06:02] so we paid a lot of user visible performance, and didn't push hard enough for how much better it actually presents history
[06:02] sure, but computers are fast
[06:02] how fast is fast enough
[06:02] wallyworld: I certainly have my bias in that
[06:02] but it didn't actually win hearts and minds
[06:02] wallyworld: faster than mercurial would be fine :)
[06:03] yup :-)
[06:03] wallyworld: sorry was out. did you sort the PR issue?
[06:03] wallyworld: also, when we had breakpoints like trying to get Mozilla (lost to hg) we were *very* slow because we were using a bad format.
[06:03] * axw hasn't read all the history yet
[06:03] we fixed that format in the next release
[06:03] but too late
[06:03] same thing for python's switchover
[06:03] axw: it's all screwed. if you look, you'll see my latest commits at the end
[06:03] we had an improvement in the works, but it didn't land before they made their decision.
[06:03] yeah :-(
[06:03] mercurial has a strong advantage that they didn't try to abstract things
[06:04] they supported 1 format and focused tightly on it
[06:04] git and hg both went with the "sync to local is important, remote support is not" while Bazaar abstracted out "I can treat anything as just another branch"
[06:04] which also cost Performance and developer time
[06:05] wallyworld: but it means you can "bzr log lp:juju-core" whereas you can't do that with git
[06:05] i do like that about bzr
[06:05] git only supports sync to local, and then you log, etc locally
[06:05] a lot
[06:05] yep
[06:05] sadface, http://paste.ubuntu.com/7704302/
[06:05] wallyworld: but it means the primitives for log, etc, know that they have a local file they can just mmap, etc.
[06:05] indeed
[06:06] davecheney: funny, that test never fails in practice
[06:07] we have other races in production code i'd be more interested in fixing
[06:08] wallyworld: i think it's not a real race, it's just in the cleanup code, like most of our races
[06:08] ok
[06:09] wallyworld: I suspect you rebased on something other than upstream/master
[06:10] axw: i rebased on master (local)
[06:10] after pulling in tip from remote master
[06:10] jam1: so the pr on github doesn't seem to show the latest diff vs tip of trunk like lp does after you just push shit up
[06:11] jam1: because all of the noise in the pr now are actually commits in juju master
[06:11] wallyworld: that I don't really know github, it is possible they find the ancestor they want when they start, and then they just stick with that one for the rest of the review
[06:11] they should be ignored
[06:11] wallyworld: launchpad actually does a merge without committing it, and shows that diff
[06:11] which means it can even show you conflicts, etc.
[06:11] i suspect you are right which makes me very sad
[06:11] jam1 wallyworld: yep, ancestor only for the initial diff AFAIK
[06:12] vs just "diff from common ancestor"
[06:12] that sucks balls
[06:12] really
[06:12] how do so many people work that way?
[06:12] makes it very hard to have work in progress
[06:13] wallyworld: do you want to have a hangout and screen share to fix it?
[06:13] ok
[06:13] brb
[06:15] wallyworld: in the tanzanite hangout
[06:43] axw: changes pushed
[06:44] wallyworld: cool, looks happier now
[06:44] nfi what happened before though
[06:44] axw: still had to do a push -f even the second time
[06:44] wallyworld: yeah because it failed to push before
[06:44] wallyworld: every time you rewrite history you have to do that
[06:45] force push that is. you can only push without force if previously pushed history is unchanged
[06:45] wallyworld: axw: who worked on "consider retry loop for failing direct db operations" ?
[06:45] It looks like a card your team would have worked on, but nobody is assigned
[06:45] jam1: no one yet
[06:45] wallyworld: it was in the 'merged' column as of last week
[06:46] when I moved everything from merged into the archive
[06:46] wallyworld: should it be pulled out somewhere?
[06:46] that wasn't intentional
[06:46] wallyworld: ok, put it in your todo then?
[06:46] accidental, should be in backlog or deleted I think
[06:46] i think it can be deleted now
[06:46] no need for it atm
[06:46] wallyworld: and I'm pretty sure menn0 was the one who worked on "Show relation name in status output", correct?
[06:46] bug #1194481
[06:46] <_mup_> Bug #1194481: Can't determine which relation is in error from status
[06:46] yep
[06:47] i think so
[06:47] wallyworld: is there a user for "unit tests fail on utopic" ?
[06:48] jam1: is that a completed card?
[06:48] i fixed a couple of those
[06:48] bug #1325072
[06:48] <_mup_> Bug #1325072: unit tests fail on utopic
[06:49] wallyworld: that is from "Week Ending June 6"
[06:49] sounds right
[06:50] wallyworld: k, I'm writing a script that pulls out stuff like velocity via the Kanban API and it's showing some holes in our old labels
[06:50] nothing too bad, and I probably won't worry much farther back
[06:50] ok
[06:51] bug #1281394
[06:51] <_mup_> Bug #1281394: uniter failed to run non-existant config-changed hook
[06:52] wallyworld: you changed the name of the result error but didn't change the defers
[06:52] oh ffs, sigh
[06:52] will fix
[06:55] axw: done
[06:56] wallyworld: thanks, reviewed
[06:56] thank you
[06:56] was a good review
[06:59] http://paste.ubuntu.com/7704476/
[06:59] i'm trying to fix the race in the upgrade test
[06:59] but now it fails constantly on the safety check i put in
[07:02] goroutine 4930 [sleep]:
[07:02] time.Sleep(0xdf8475800) /home/dfc/go/src/pkg/runtime/time.goc:39 +0x31
[07:02] github.com/juju/juju/state/api.(*State).heartbeatMonitor(0xc20822d5e0, 0xdf8475800) /home/dfc/src/github.com/juju/juju/state/api/apiclient.go:264 +0x66
[07:02] created by github.com/juju/juju/state/api.Open /home/dfc/src/github.com/juju/juju/state/api/apiclient.go:196 +0xae3
[07:02] we leak a shitload of these goroutines
[07:02] in the tests
[07:02] * davecheney creates issue
[07:06] jam1: i'm off to soccer, but maybe you could get someone to look at why we continue to have very limited success with CI passing the local upgrade test only on precise. the latest machine-0 log from the failed test shows nothing obvious to me - previously there were errors in the log which showed why the api server on port 17070 didn't start. only thing i can see is an apt get of a mongo-server package in the middle of the re-start after upgrade initiated. could just be log interleaving, not sure. here's a link to the latest failing job from which the machine-0 log can be got http://juju-ci.vapour.ws:8080/job/local-upgrade-precise-amd64/1436/
here's a link to the latest failing job from which machine-o log can be got http://juju-ci.vapour.ws:8080/job/local-upgrade-precise-amd64/1436/ [07:07] curtis considers this a release blocker [07:08] the apt get mongo-server thing does look like the only suspicious thing i can see that may be different to trusty [07:10] wtf... 10 minutes ago leankit was reporting 700 cards, it now only reports 225. it just decided that our archive was old enough it could throw it away.... ? [07:10] wallyworld: k, I'll try to give it a look [07:11] jam1: there must be some cutoff [07:11] and many of those cards are OLD [07:11] many of them date back to Atlanta [07:11] davecheney: sure, but 10 minutes ago it gave me 700, I don't think we crossed the threshold in 10 mins [07:12] jam1: dunno, just trying to help [07:12] i'm probably not helping [07:12] jam1: i was thinking you'd delegate to someone [07:12] wallyworld: I'm pretty good at debugging stuff like this, so I'll at least give it a shot. [07:13] ok [07:14] jam1: frustratingly it passes sometimes [07:14] wallyworld: well if it is a racy install of stuff, and sometimes we manage to install first [07:14] jam1: yeah, i didn't get to look to see at what stage we apt installed, i only just looked atthe log [07:25] wallyworld: "2014-06-26 06:26:41 INFO juju.state open.go:337 found existing state servers [] [07:25] " [07:25] sounds problematic... [07:26] erk [07:27] I don't know that it is the specific problem, that is in "cloud-init" so maybe no servers are available during the first connect, but it does seem weird. [07:42] morning === vladk is now known as vladk|offline [08:21] wallyworld, jam1: I am off to get some sleep. I will release the blessed revision from this page, http://juju-ci.vapour.ws:8080/job/revision-results/ it will probably be a8f48d14 because I don't believe trunk will get better in a few hours [08:34] wallyworld, jam1 The rev I forced CI to test will pass, though I still believe local precise upgrades are dodegy [08:35] sinzui: so you think tip will pass, but we still should be investigating getting reliable P upgrades, right? [08:36] jam1 I didn't test tip. I tested an older rev that was skipped [08:36] sinzui: do you mean a8f48d14 [08:36] or something else? as I don't see any other revs being tested in "revision-resultS" [08:37] sinzui: the current loacl-upgrade-precise-amd64 is still blinking red, afaict [08:37] jam1 I tested 1d57f52 [08:37] sinzui: http://juju-ci.vapour.ws:8080/job/local-upgrade-precise-amd64/ shows that rev as failing 3 times [08:37] Jam1 yes I am waiting for the destroy-env to complete http://juju-ci.vapour.ws:8080/job/local-upgrade-precise-amd64/1439/console [08:37] ^ that is a pass [08:38] but I dare note hurry destroy-env for fear that the act will cause an error [08:38] sinzui: so blinking red is because it was red in the past but is running now? [08:39] jam1 yes [08:39] not obvious [08:39] lxc-destroy is taking forever [08:40] apologies for the cross-post, but has anyone ever seen a bug where juju confuses what machines are which machine numbers? [08:41] mivtachyahu, I haven't seen that before [08:42] I've come into work this morning to find that all the servers are jumbled up, ie what was machine 7 yesterday is machine 12 today [08:43] mivtachyahu, yes, that happen, machine numbers cannot be reused. so a number given to a machine that is added them removed also removed the number forever [08:44] ah, no, you misunderstand, 7 is now 12, 12 is 17, 17 is now 8, 8 is now 7, they're jumbled, not removed. 
[08:44] (those numbers illustrative, I've not mapped which machines are actually which)
[08:44] juju-ci's highest machine number is 52, but there are only 10 active machines
[08:45] mivtachyahu, that is mad. How do you know they are jumbled? the ip addresses?
[08:46] wallyworld, jam1, all the circles are blue http://juju-ci.vapour.ws:8080/
[08:46] when I juju ssh they have the wrong contents, when I issue a juju status, the units are showing on the correct machine *numbers*, but the public-addresses have changed.
[08:46] sinzui: so we still have a chance for trunk tip if we get fixes, but we expect to release 1d57f52
[08:47] jam1. You do. so CI has about 4 hours to work
[08:48] mivtachyahu: which version of juju, and which provider type?
[08:48] juju 1.18.1 and on azure.
[08:51] ok, nothing comes to mind. if it were in the 1.19 series then I'd be blaming availability sets because service units get a single load balanced IP
[08:56] wallyworld, jam1. I reopened https://bugs.launchpad.net/juju-core/+bug/1334493 because juju doesn't execute after it is compiled on windows
[08:56] <_mup_> Bug #1334493: Cannot compile win client
[08:57] * sinzui tries to rebuild and hopes for the best
[08:59] wallyworld: AFAICT, the Azure vhds cannot be reused. each one is a disk image for a separate VM instance, like you'd have if you were running VMs in VMWare or VirtualBox
[08:59] wallyworld: i.e. they're not pristine OS images, but VM disks
[09:00] wallyworld: going to move that card to "done"
[09:00] axw, thank you for investigating that.
[09:00] sinzui: nps
[09:01] sinzui: I think we used to leak those VHDs because we were using a more error prone method of deleting disks before
[09:01] sinzui: I switched the code over to using an API that deletes all associated disks when we terminate VMs
[09:01] I think it's only in the 1.19 series tho
[09:02] axw, the official api didn't let you delete them when you deleted disks until a few months ago
[09:02] I had to upgrade the libraries we use to delete them
[09:04] well, good news, my weird bug has fixed itself. :)
[09:05] weird indeed. mivtachyahu if you stumble across the steps to reproduce the issue, please file a bug (or ping someone in here to do so)
[09:06] will do
[09:11] davecheney: this is committed, right? https://bugs.launchpad.net/juju-core/+bug/1334500
[09:11] <_mup_> Bug #1334500: state/apiserver: more data races
[09:21] axw: yes, committed
[09:21] sorry, i didn't update the status
[09:22] nps
[09:36] axw: I believe it is, but due to the revision that sinzui was actually able to get to pass CI, it probably won't be in 1.19.4
[09:36] davecheney: fwiw, we really don't need a get+set operation, just a simple mutexed get that will populate the cached value if it is empty would have been a better fit.
[09:37] jam: i didn't want to hold the lock over that other operation
[09:41] davecheney: given the whole point is that it is just a cache, I don't think we want to trigger the operation 2x while getting it. but it isn't like it is a big deal.
[09:44] davecheney: I'm gonna have a look at the leaking heartbeat goroutine bug
[09:44] seems to be a bunch of api.Opens without corresponding Closes.
[09:46] axw: we seem to do that a fair bit in the test suite, I've caught a few in the past.
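The mutexed, lazily-populated getter jam describes would look roughly like this (a sketch; the cached value and fetch operation are stand-ins for whatever the real code caches):

```go
package main

import (
	"fmt"
	"sync"
)

// cachedValue sketches the pattern: one mutexed getter that populates
// the cache on first use, so racing callers can't trigger the
// expensive lookup twice. fetch is a hypothetical stand-in for the
// real operation.
type cachedValue struct {
	mu    sync.Mutex
	value string
	fetch func() (string, error)
}

func (c *cachedValue) Get() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.value != "" {
		return c.value, nil
	}
	// Note davecheney's trade-off: the lock is held across fetch,
	// which serialises callers but guarantees fetch runs only once.
	v, err := c.fetch()
	if err != nil {
		return "", err
	}
	c.value = v
	return v, nil
}

func main() {
	c := &cachedValue{fetch: func() (string, error) { return "env-uuid-1234", nil }}
	fmt.Println(c.Get())
	fmt.Println(c.Get()) // second call is served from the cache
}
```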
[09:46] Once you have a couple, Copy & Paste helps ensure it spreads :)
[09:47] axw: in other code bases we had things like "ensure 0 threads are running when a test ends"
[09:47] yeah, that'd be nice
[09:47] perhaps we should add that to the base suite's TearDownTest
[09:48] axw: well, we could try to move towards that, I think we'd find a lot of problems to start with.
[09:48] axw: I also don't know if golang gives you a great view of "what is running", but probably it does somewhere
[09:49] dimitern: TheMue: vladk|offline: just a reminder that we're skipping our daily standup for the team standup in 10 min
[09:50] jam: we could compare runtime.NumGoroutine() before/after test run. I expect you're right and it'd be painful initially
[09:50] jam, ok
[09:51] axw: so for threads we compared the set of thread ids at start and end.
[09:51] axw: so set(at_end) - set(at_beginning) must be empty
[09:53] axw: though my google-fu says "you can't get a list of all running goroutines"
[09:54] I know that you can, given that panic can print it out, but I imagine using that trick would be really really bad :)
[09:55] yeah. it would be nice to compare sets, but I think just comparing size would be good enough
[09:55] axw: (runtime.Stack(, all=true) and then parsing that for what is running)
[09:55] morning everyone
[09:56] axw: main problem with just doing the count, is that it sometimes passes accidentally, and it still doesn't give you any information about what is running that shouldn't be that you need to go fix.
[09:56] In that respect, the runtime.Stack() method actually isn't terrible, as you could print out "these goroutine stacks are running and probably shouldn't be"
[09:56] jam: true, though in that case you could just dump runtime.Stack(..., all)
[09:56] axw: or as I was pointing out, you could just use Stack(…,all) and use that for set difference
[09:57] yes, I suppose you could compare entry points
[09:59] TheMue: just a reminder you're OCR today
[10:00] jam: sure, already done the first ones
[10:00] TheMue: great
[10:00] jam: made a calendar entry for it to not forget it ;)
[10:01] TheMue: :), team standup now
[10:02] jam: yeah, here also my calendar reminded me
=== vladk|offline is now known as vladk
=== vladk is now known as vladk|offline
=== vladk|offline is now known as vladk
=== vladk is now known as vladk|offline
[11:03] afk for lunch
=== vladk|offline is now known as vladk
[11:33] dimitern: ping
[11:35] dimitern: I created WatchInterfaces. My current problem is that it's impossible now to add network interfaces after they were provisioned.
[11:35] I can remove this check from machine.go, but this breaks some tests.
[11:36] dimitern: Otherwise, I can't test the watcher when I add the network interface
[11:36] vladk, there was a slight change
[11:37] dimitern: what do you mean?
[11:37] vladk, jam, fwereade and i discussed and we can use a notifywatcher instead of a stringswatcher
[11:37] vladk, jam, for the network interfaces
[11:37] vladk, jam, that way we don't need to care about tags for interfaces
[11:39] vladk, as for your question, you'll need to change AddNetworkInterface slightly, so it doesn't fail when the machine is provisioned
[11:39] dimitern: this breaks some of the tests, so I need to fix them, too
[11:39] vladk, i.e. assertAliveAndNotProvisioned becomes aliveDoc, and the if m.doc.Nonce != "" needs to go
[11:40] vladk, yep, naturally
[11:40] dimitern: should I change stringswatcher to notifywatcher?
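A minimal sketch of the leak check jam and axw discuss above, using runtime.NumGoroutine() for the count and runtime.Stack(buf, true) for the full dump (the same data a panic prints); how a real TearDownTest would diff the two dumps is left out:

```go
package main

import (
	"fmt"
	"runtime"
)

// goroutineDump returns the stacks of all running goroutines, the
// raw material for the "set(at_end) - set(at_beginning)" comparison.
func goroutineDump() string {
	buf := make([]byte, 1<<20)
	n := runtime.Stack(buf, true) // true = all goroutines, not just this one
	return string(buf[:n])
}

func main() {
	before := runtime.NumGoroutine()
	// ... run the test body here; anything that api.Open()s without a
	// matching Close() leaves its heartbeat goroutine behind ...
	after := runtime.NumGoroutine()
	if after > before {
		// A bare count can pass by accident, so report the stacks too
		// so there is something concrete to go and fix.
		fmt.Printf("%d goroutine(s) leaked:\n%s\n", after-before, goroutineDump())
	}
}
```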
[11:42] vladk, yes
[11:43] vladk, i'm updating the model doc today to reflect what we discussed
[11:43] vladk, that's the only thing affecting your work now
[11:43] dimitern: may I do a PR with stringswatcher to get quick feedback and change it later?
[11:44] vladk, of course
[11:44] thanks
[11:54] dimitern: please, review https://github.com/juju/juju/pull/169
=== vladk is now known as vladk|offline
=== vladk|offline is now known as vladk
[12:00] mgz: i'm still in a meeting, i'll ping you soon for 1:1
[12:01] wallyworld: sure, I'll hang out there for when you arrive
[12:01] be there soon
=== vladk is now known as vladk|offline
=== eagles0513875 is now known as greenrice
=== greenrice is now known as eagles0513875
[12:49] trivial update to dependencies.tsv, anyone? https://github.com/juju/juju/pull/177
[12:49] fwereade, dimitern, mgz, natefinch, wwitzel3: ^
=== urulama is now known as uru-food
[12:54] * rogpeppe2 thinks it's trivial enough to just merge anyway
[12:54] * rogpeppe2 does that
[12:54] rogpeppe2: taking a look
[12:55] rogpeppe2: argh, too quick
[12:55] rogpeppe2: ;)
[12:55] TheMue: that's ok - there's not exactly much to review...
[12:55] vladk|offline: made some comments
[12:56] rogpeppe2: have to compare this nice number to available revisions :D
[12:56] TheMue: the 'bot will complain if it doesn't work...
[12:56] rogpeppe2: taedd? (trial-and-error driven development)
[12:57] TheMue: with changes that simple, it seems reasonable to me
[12:57] rogpeppe2: yep
[13:04] natefinch, I'll be with you soon
=== vladk|offline is now known as vladk
=== uru-food is now known as urulama
[13:33] Greetings juju-core. There was an LXC update this morning that wipes mount fstype=rpc_pipefs, if i recall correctly this causes problems with containers does it not?
[13:34] http://i.imgur.com/tjSkSG6.png
[13:37] rogpeppe2: seen that the merge failed?
[13:37] TheMue: no i hadn't. thanks
[13:37] rogpeppe2: yw
[13:38] * rogpeppe2 wants to work out a decent way to get an obvious warning when a merge fails
[13:40] +1
[13:40] oh bugger, it's been changed to break the API
[13:40] i'm stuffed now
[13:41] because the new charm changes require the new names package
[13:41] * rogpeppe2 wonders why all those tag changes needed to happen
=== rogpeppe2 is now known as rogpeppe
[13:44] hmm, i guess i'll just hack around the issue for now
[13:44] wrt my question above, here's a bug that was filed that shows the behavior: https://bugs.launchpad.net/juju-core/+bug/1319525
[13:44] <_mup_> Bug #1319525: juju-local LXC containers hang due to AppArmor denial of rpc_pipefs mount with local charms
[13:44] rogpeppe: "for now"®
[13:46] is there anything in place to tell a machine "hey, apiserver and stateserver have changed" ?
[13:47] perrito666: the state server addresses should change
[13:47] perrito666: and they can be watched
[13:48] rogpeppe: come again please, I cannot join those two things you just said into something I understand
[13:48] perrito666: :)
[13:48] perrito666: what are you trying to do?
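Both the interface watching dimitern and vladk settled on earlier and the address watching rogpeppe just mentioned follow the same pattern; a notifywatcher event carries no ids, it just means "something changed, re-read from state". A minimal sketch of consuming one (interface and helper names are hypothetical, not juju's actual watcher API):

```go
package main

import "fmt"

// NotifyWatcher is the shape of the watcher in question: unlike a
// stringswatcher there are no ids in the events, so no tags are
// needed for network interfaces.
type NotifyWatcher interface {
	Changes() <-chan struct{}
	Stop() error
}

// watchInterfaces is a hypothetical consumer loop: on every event,
// re-read the full set of interfaces from state.
func watchInterfaces(w NotifyWatcher, readAll func() ([]string, error)) error {
	defer w.Stop()
	for range w.Changes() {
		ifaces, err := readAll()
		if err != nil {
			return err
		}
		fmt.Println("interfaces now:", ifaces)
	}
	return nil
}

type fakeWatcher struct{ ch chan struct{} }

func (f fakeWatcher) Changes() <-chan struct{} { return f.ch }
func (f fakeWatcher) Stop() error              { return nil }

func main() {
	ch := make(chan struct{}, 1)
	ch <- struct{}{} // one change event
	close(ch)
	_ = watchInterfaces(fakeWatcher{ch}, func() ([]string, error) {
		return []string{"eth0", "eth1"}, nil
	})
}
```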
[13:48] rogpeppe: restore ;)
[13:49] current restore ssh's into all of the agents and runs a sed script to change apiaddresses and stateaddress
[13:49] I really would like to do something prettier
[13:49] perrito666: well, i think ssh'ing in is probably the only option
[13:50] perrito666: but what you do *when* you've ssh'd in could be prettier
[13:50] perrito666: you could add a jujud subcommand which updates the addresses in the agent.conf file
[13:50] perrito666: and then invoke that from the ssh command
[13:52] Does anyone know why LXC would give up on round robin dns assignment? I have evidence here it has done so: http://pastebin.ubuntu.com/7705971/
[13:53] rogpeppe: it will have to do ¯\_(ツ)_/¯
[13:53] perrito666: what kind of thing would you *like* to be able to do?
[13:55] rogpeppe: well I think that your idea pretty much sums up what I would like to be able to do, perhaps wrapped, something like having agents listen for "control commands" and a mechanism to issue those, I think I have some bias from working too much with embedded devices :p
=== vladk is now known as vladk|offline
[14:07] perrito666: it wouldn't be too hard to get agents to listen on a local socket for control commands
[14:07] mgz, 1.19.4 will be the revision you created. CI had skipped it for a new rev yesterday. I made CI test just your rev to get a pass
[14:07] mgz: I am very interested in your work to run unittests in lxc
[14:08] perrito666: but that does mean the agent has to be up and running at the moment you're doing the restore
[14:08] morning all
[14:08] rogpeppe: well restore always assumed the agents are up
[14:08] perrito666: really?
[14:09] perrito666: how so?
[14:11] rogpeppe: well, the script that runs on all machines does:
[14:11] 450 initctl stop jujud-$agent
[14:12] which would fail and exit the script if jujud-$agent was not up
[14:12] perrito666: that'll work ok if the agent is already stopped though, won't it?
[14:12] perrito666: oh really - i thought initctl stop was idempotent
[14:12] perrito666: that's a bug then
[14:12] perrito666: blame me :-)
[14:12] sinzui: ace, thanks - I did reland the change, so will keep an eye on the job as well
[14:13] rogpeppe: :) oh, then I un-assume that
[14:14] perrito666: no, you're right
[14:15] i wonder if there's a way to tell initctl to stop a service only if it's already running
[14:15] rogpeppe: || true
[14:15] perrito666: ha ha
[14:16] perrito666: that's indeed the simplest solution, though not great
[14:16] perrito666: better would be to test the output of initctl status first, i think
[14:16] rogpeppe: you would have to check status I guess
[14:16] perrito666: yeah
[14:16] returns stop/waiting or sth like that when not started
[14:42] https://bugs.launchpad.net/juju-core/+bug/1334683
[14:42] <_mup_> Bug #1334683: juju machine numbers being incorrectly assigned
[14:42] has anyone seen this before?
[14:43] it's affecting someone in production
[14:49] jcastro: looking
[14:49] they are early adopters, so any help you can lend would be <3
[14:49] jcastro, what version of juju did they hit that bug with?
=== makyo_ is now known as Makyo
[14:50] I wonder if this is azure being wacky
[14:50] I'll ask them to update the bug
[14:50] alexisb: looks like 1.18.1
[14:51] jcastro, wallyworld's team will be tackling azure issues this cycle, this may be one of them
[14:51] ^^ just an fyi
[14:52] rock and roll!
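The status-before-stop check rogpeppe and perrito666 arrive at, sketched in Go rather than the shell script restore actually uses; the upstart output strings ("start/running", "stop/waiting") are the usual ones but should be treated as an assumption:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// stopIfRunning asks `initctl status` first and only issues
// `initctl stop` when upstart reports the job as running, so an
// already-stopped agent doesn't abort the whole restore script.
func stopIfRunning(job string) error {
	out, err := exec.Command("initctl", "status", job).CombinedOutput()
	if err != nil {
		return fmt.Errorf("status %s: %v (%s)", job, err, out)
	}
	if !strings.Contains(string(out), "start/running") {
		return nil // already stopped (e.g. "stop/waiting"); nothing to do
	}
	return exec.Command("initctl", "stop", job).Run()
}

func main() {
	if err := stopIfRunning("jujud-machine-0"); err != nil {
		fmt.Println("stop failed:", err)
	}
}
```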
[14:55] jcastro: updating to 1.18.4 couldn't hurt
[15:09] natefinch, do you have a few minutes to review https://github.com/juju/juju/pull/178
[15:10] sinzui: what happened with 1.19.5?
[15:11] perrito666, well. we really want 1.20.0, though my scripts want to make 1.19.5. We will create a stable 1.20 branch and let master think it is 1.20.0
[15:12] sinzui: that explains the commit message which says something very different from the actual patch
[15:13] perrito666, natefinch yep. I realised that if I make the branch 1.19.5, I need to land another branch next week to get the version right for june 30
[15:13] maybe I am wrong
[15:15] sinzui: perhaps I am about to say something sinful but, wouldn't it be nice if you re-wrote the past a bit so that the commit message says the right thing?
[15:15] perrito666, I was thinking something a little different, but it also means retracting the PR
[15:16] sinzui: if you use git amend then push it should look as if this little mistake never happened
[15:17] sinzui: LGTM
[15:17] perrito666, I need to fork at juju-1.19.4 to create the stable 1.20 branch. I think I need to merge a branch into both devel and stable that sets the version. stable branch will want to be 1.20.0 and I will merge select revs from devel into it. Maybe devel needs to be 1.20-alpha to indicate it is devel
[15:18] ^ natefinch maybe I want to do something different because I need a stable branch and juju will switch to the new version rules
[15:20] perrito666, natefinch and the *next* unstable version that thumper and I discussed would be 1.21-alpha1
[15:20] * sinzui deletes PR
=== vladk|offline is now known as vladk
=== vladk is now known as vladk|offline
[17:00] natefinch: are we doing standup now?
[17:06] ericsnow: I can't, sorry. Probably will have to be very late today if at all. I have to take my daughter to her 1 year checkup in an hour.
[17:06] natefinch: no worries
[17:06] Let's shoot for 3.5 hours for now, hopefully I'll be back in and working
[17:08] natefinch: 3.5 hours from now?
[17:12] natefinch: sounds good
[17:19] perrito666: from now, yeah, sorry
[17:19] natefinch: I think I'll be around
[17:24] natefinch, Do we need more time scheduled with gsamfira and team for the workload stuff?
[17:26] wwitzel3, ping
[17:27] alexisb: probably.... it's been slow going. Good, but not fast
[17:28] natefinch, ok, I will put an hour on the calendar for tomorrow, then we can discuss if we want to do a few days next week
[17:30] alexisb: I'm on vacation next week :/
[17:31] crap, that's right
[17:31] they're actually doing well, so it might not be so bad
[17:31] heh ok I will schedule a bit more time tomorrow then
[17:31] and then we can exit with a game plan while you are gone
[18:25] hi i'm trying to bootstrap an environment on azure and it is not coming up. all-machines.log shows that 'machiner' cannot set the machine address and it is constantly restarting: http://paste.ubuntu.com/7707189/
[18:25] is this situation recoverable?
=== BradCrittenden is now known as bac
[20:25] * sinzui ponders 1.19.5 for master until 1.20.0 is released
[20:47] ericsnow:
[20:47] news about nate?
[20:48] perrito666: nope
[20:49] ericsnow: he is not in the hangout
[20:49] perrito666: yeah, not on IRC either
[20:50] he most likely got dropped in the netsplit
[20:53] sinzui: do you have much juju/azure experience?
[20:53] sinzui: you got lgtm'd
[20:53] thank you perrito666
[20:53] bac, I have a lot of janitorial azure experience
=== Guest8558 is now known as wallyworld
[21:35] sinzui: hi, you finally got a rev to release :-) with the 1.20 branch you want to create off master, will CI be able to run tests for both the release candidate branch and trunk? will you set up a jenkins slave to test our future RC branches as well as trunk?
[21:40] wallyworld, CI knows how to watch any bzr or git branch
[21:45] wallyworld_, will the lander/git-merge-juju work with a non-master branch?
[21:46] I have a merge ready to try when we want
[21:46] wallyworld_, also I have built a juju env from 3 clouds and a private vpn, and have some nigh-impossible archs: http://juju-ci.vapour.ws:8080/computer/
[21:47] I think I can now afford to be sick and get rest
[21:53] is there a problem upgrading my 14.04 ubuntu that I use for development to go 1.3?
[22:00] jcw4, you will discover the 1.2-to-1.3 bugs faster than CI's gccgo testing will report them
[22:00] hehe
[22:01] that's what I was afraid of
[22:01] does juju have a 'support matrix' of which versions of Go are supported on which platforms?
[22:01] jcw4, OSX appears to be building with 1.3. It was disconcerting to see, since I don't have osx hardware to test with.
[22:01] sinzui: the lander should handle a non-master branch - i'll confirm with martin
[22:02] sinzui: I had a hard time getting all the tests to work (go 1.2) on osx
[22:02] jcw4, we are officially 1.2 on all OSes for all series... except ubuntu doesn't officially provide 1.2 for precise
[22:02] sinzui: I see
[22:03] wallyworld_, do I not have $$merge$$ special powers? I thought my inc of master to 1.19.5 would work
[22:04] sinzui: anyone on the juju team should be able to type $$merge$$, did it not work?
[22:04] jcw4, you added series (maverick) support to the version name? I was pleased to see that in my test today
[22:04] wallyworld_: ^^
[22:04] I may be impatient
[22:05] jcw4: ?
[22:05] sinzui addressed that comment to me, but I think it was intended for you, wallyworld_?
[22:06] i didn't add maverick support, not sure why we did since maverick is EOL
[22:06] isn't it?
[22:06] jcw4, no, you. I was surprised to not see "unknown" when I bootstrapped today with an osx client
[22:06] sorry, osx mavericks
[22:06] sinzui: you are right, the lander has not picked up your $$merge$$, i'll look into it
[22:07] oh; no :-(
[22:07] sinzui: I don't even know how to do that yet :)
[22:07] sinzui: just to confirm - you created the 1.20 branch off the rev used to cut 1.19.4, right?
[22:08] that's okay, I am EOD now. No 19-hour days now that I have a release to create stable from. And I have an army of slaves to do my bidding
[22:08] wallyworld_, I sure did
[22:08] awesome
[22:08] i'll inc the version number to 1.20 also if it hasn't been done
[22:09] sinzui: you have indeed been working too hard, you need to go rest and get better, perhaps with a glass of red
[22:09] :)
[22:10] I will call that medicine for my sore throat. A cough suppressant.
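For reference, the branch setup sinzui confirms above amounts to roughly the following; the tag name, branch name, and remote are assumptions inferred from the chat, not the exact commands used:

    # Fork the stable branch from the revision that cut 1.19.4, then let
    # master carry the next development version.
    git checkout -b 1.20 juju-1.19.4   # assumed tag name for the 1.19.4 rev
    git push origin 1.20
    # Each branch then lands its own version bump: 1.20.0 on the stable
    # branch, and (per the plan above) 1.21-alpha1 on master once 1.20.0
    # is released.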
[22:10] sinzui: i'll add a lander job to look at and land stuff off the 1.20 branch
[22:11] sinzui: so maybe tomorrow when you come in to work you can then hook 1.20 up to CI
[22:12] wallyworld_, I will be visiting mgz tomorrow. Now that I have all my slaves, I want to run unit tests in lxc on them. I think that will take 30-40 minutes off the time it takes CI to run unit tests, build packages, and test the local provider
[22:12] \o/
=== alexisb is now known as alexisb_afk
[22:32] wallyworld_, I just added 1.20 to the list of branches to test. CI is testing it now.
[22:32] sinzui: you are f*cking amazing
[22:32] wallyworld_, +1 to that :)
=== Guest28217 is now known as wallyworld
[22:40] * perrito666 needs to autodocument his code because he is losing track of it
[22:42] sinzui: is there a separate dashboard for 1.20 vs trunk?
[22:42] wallyworld, no, sorry
[22:42] that's ok, just wondering
[22:43] so how do you see that 1.20 vs trunk is ok?
[22:51] thumper: can you ping me after your standup?
[23:09] davecheney: standup take two
[23:13] waigani: righto
[23:26] wallyworld: is the tree open or closed?
[23:26] open, we've created a separate 1.20 branch
[23:27] on my todo list to send email
[23:43] wallyworld: ping
[23:43] thumper: hey, have you seen this issue? https://launchpad.net/bugs/1329051
[23:43] <_mup_> Bug #1329051: local charm deployment fails on "git not found" due to wrong apt proxy
[23:43] wrong proxy being used inside lxc
[23:44] no
[23:44] ok, it seems Juju uses the apt_proxy setting from the host machine when setting up the proxy inside lxc
[23:44] which is wrong
[23:45] i'll schedule it for the next stable milestone
[23:46] yes, we do just blindly use the apt proxy of the host
[23:50] ok, seems like a legit issue then
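A quick way to see the behaviour that bug describes, for anyone reproducing it: inspect the apt proxy juju wrote inside a local-provider container. The container name, the exact config file location, and the example proxy address below are all illustrative assumptions:

    # Show any apt proxy configured inside the container; with the bug,
    # this mirrors the host's apt_proxy even when that address is wrong
    # or unreachable from inside the container.
    sudo lxc-attach -n juju-local-machine-1 -- grep -ri proxy /etc/apt/apt.conf.d/
    # Typical (wrong) output, pointing at the host's proxy:
    #   Acquire::http::Proxy "http://10.0.0.5:8000";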