[00:57] anyone know how the vlan work is going on, particularly testing against maas? [00:57] bigjools: dimiter is leading that AFAIK [00:58] bigjools: but, no, I have no idea [00:58] thumper: ok thanks. I had not heard anything and, well, it's kinda late to make changes now, [00:59] * thumper nods [00:59] AH HA! [00:59] oh, no mwhudson here [00:59] i've figured out why some tests fail to compile on gccgo [00:59] davecheney: mwhudson is on leave for the start of this week [00:59] * davecheney works on smaller repro [01:01] nah, it's a bug when compiing a package for test that has both internal, package agent, and external, package agent_test suites [01:01] well, that is my hypothesis [01:01] now to make a smaller repro [01:01] ah... [01:02] it's really hard to pin down [01:02] delete either bootstrap_test.go -or- package_test.go in juju-core/agent [01:03] and the test binary builds [01:03] the common thing between them is they are both external test files [01:45] http://paste.ubuntu.com/7149183/ [01:45] http://paste.ubuntu.com/7149202/ [01:45] thumper: strong correlation between duplicate symbols and those with coverage [01:45] i need to take a walk and think about this [02:29] * thumper goes to make a coffee [02:40] YES! [02:40] i have a reproductoin [02:41] * davecheney blushes [02:54] thumper: https://code.google.com/p/go/issues/detail?id=7627 [02:54] victorious [02:56] \o/ [03:11] thumper: one-liner (excluding test) https://codereview.appspot.com/79620043/ [03:11] including, two-liner ;p [03:29] axw: and I assume you have tested locally? [03:29] thumper: yup [03:30] thumper: modified sudoers to prevent setting env vars, and ensured it failed then worked [03:30] axw: I almost have a test for the mongo service cleanup [03:30] thumper: cool [03:30] cool [03:32] yikes, I chewed through 7GB yesterday installing MAAS [03:32] forgot to tell it to not download the Internet [03:35] axw: yikes, was any of that unmetered ? [03:35] from a local mirror ? [03:36] davecheney: nope. but it's okay, I use very little of my quota [03:36] ow. [03:36] suggested daily usage: 24G ;) [03:41] daily? [03:41] geez === vladk|offline is now known as vladk === Ursinha is now known as Ursinha-afk === Ursinha-afk is now known as Ursinha [03:45] axw: store test failed :-( [03:45] le sigh [03:45] thanks [03:46] fucking kick that thing out [03:46] why do we keep having this discussion [03:46] month after month [04:00] davecheney: because we like punishing ourselves [04:07] thumper: is this the right room for an argument [04:07] davecheney: it depends [04:08] argument about what? [04:08] thumper: doing things many times to keep seeing if they hurt [04:15] axw: just pushing the test branch now [04:16] thumper: thanks [04:16] I'll go and help make dinner, and come back to check on the review a little later [04:16] after I have proposed that is [04:16] axw: and yours merged \o/ [04:17] thumper: I'm going to block SIGINT right up in cmd/juju/bootstrap.go - but now I need to make it possible to cancel tools upload [04:17] thumper: hooray [04:17] hmm... [04:18] cos otherwise you could cancel during upload and you'll also have a broken .jenv file [04:19] why do we create an empty .jenv so early? [04:19] ^ i have asked myself that question. [04:19] axw: don't forget you can't unhook a signal handler [04:20] also, sorry for butting in. Just saw something that made me extremely happy. [04:20] axw: https://codereview.appspot.com/79690043 [04:20] davecheney: don't need to in this case, but you can ...? 
os/signal.Stop [04:20] thumper lazyPower: yeah I kinda think we should not write it out until things are working [04:21] however we do want to still catch the interrupt so we can clean up [04:21] I dont think its a problem if there's a cleanup on exception. Thats the part that gets me, it leaves traces behind that prevent normal users from bootstrapping and they dont understand why they have to delete it. [04:22] especially now we tell them they can use ^C to undo a partial boostrap [04:22] the inverse of that - the more I think about it - is what exception is ok to remove it? I can see someone trying to bootstrap an env and wiping out their jenv of a working deployment.. [04:23] which is exponentially worse [04:23] lazyPower: i don' think it works like that [04:23] if .jenv exits [04:23] lazyPower: .jenv is only removed if created for the bootstrap [04:23] exists [04:23] then the environment is bootstrapped [04:23] what davecheney said [04:23] cogito ego bootstrap [04:23] axw: I haven't done the 'remove the rsyslog file if it exists', perhaps I should add that... [04:23] any way, dinner cooking time [04:24] thumper: would be good. I will review what you've got so far anyway. [04:24] later === thumper is now known as thumper-cooking [04:25] ahhh ok. talking about completely different phases of the bootstrap process. [04:25] Some day i'll dig through the juju-core source and try to make sense of this "go" thing you guys are having so much fun with === vladk is now known as vladk|offline [07:49] mornin' all [07:57] jam1: i'm not quite sure how your Worker.Running suggestion solves the problem [07:58] jam1: AFAICS there's a dependency between two workers - we must only start the API worker after the localstorage worker is ready. [08:00] morning [08:00] dimitern: hiya [08:01] rogpeppe, jam is proposing a mechanism for communicating that dependency [08:01] fwereade: right [08:01] rogpeppe, jam, I'm not really that keen though -- do we have use cases for that that *aren't* localstorage? because localstorage is a bit of a hack [08:01] rogpeppe: right now, we don't have a way to detect that the local storage worker is ready [08:01] fwereade: we shouldn't start the environ provisioner until the API server is ready [08:01] we do it [08:02] and just bounce [08:02] firewaller, etc. [08:02] we have a "startAfterUpgrader" [08:02] that is our hack to handle doing upgrades before running the other workers. [08:02] but it isn't generic in any fashion. [08:02] fwereade: in that particular case we want to start after the other one is stopped [08:02] i don't really see how the proposed mechanism would work for communicating the dependency, though i'm probably just being stupid here [08:02] but it is still a case of "don't start all Workers simultaneously and hope for the best" [08:03] rogpeppe: so it doesn't actually do the dependency [08:03] but right now, we don't even have visibility [08:03] rogpeppe: we know when something has been "started" but not when it would actually be able to respond to requests [08:03] jam: i don't see how you'd use the mechanism in this case to achieve the desired goal [08:04] rogpeppe: so in my first cut, it would be that Bootstrap can wait until all services claim to be Running() [08:04] jam: Bootstrap? [08:04] rogpeppe: so it doesn't actually handle dependencies, except that all dependencies must be met if everything is running. 
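A note on the interrupt handling axw and lazyPower discuss above (04:17-04:23): catch SIGINT during bootstrap, remove the .jenv only if this run created it, and detach the handler afterwards with os/signal.Stop. The sketch below is illustrative only, not juju-core's actual cmd/juju/bootstrap.go; the function and variable names are invented.

    package bootstrapsketch

    import (
        "fmt"
        "os"
        "os/signal"
    )

    // runWithInterruptCleanup runs bootstrap and, if it is interrupted or
    // fails, removes jenvPath -- but only when this invocation created it,
    // so a working environment's .jenv is never wiped out by accident.
    func runWithInterruptCleanup(jenvPath string, bootstrap func() error) error {
        _, statErr := os.Stat(jenvPath)
        createdHere := os.IsNotExist(statErr)

        interrupted := make(chan os.Signal, 1)
        signal.Notify(interrupted, os.Interrupt)
        defer signal.Stop(interrupted) // handlers can be unhooked after all

        done := make(chan error, 1)
        go func() { done <- bootstrap() }()

        var err error
        select {
        case <-interrupted:
            err = fmt.Errorf("bootstrap interrupted")
        case err = <-done:
        }
        if err != nil && createdHere {
            os.Remove(jenvPath)
        }
        return err
    }

Note that nothing here actually cancels the in-flight bootstrap goroutine; as axw says above, making the tools upload cancellable is the harder part.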
[08:04] rogpeppe: "juju bootstrap" would only return once we determine all services have gotten to Running() [08:04] so we could add an API for "are you Running()" [08:04] which reports that info [08:05] bootstrap starts, ssh's in, sets everything up, then waits to connect to the API, and waits for Running() to return true [08:05] Client.AllServicesRunning() or whatever. [08:05] jam, I worry that that would be both tricky and brittle [08:05] jam: oh, i see, this is just about bootstrap [08:05] fwereade: worse than the race conditions we have today ? [08:05] rogpeppe: well, *today* we have a clear race condition [08:06] but we have a general pattern [08:06] jam: but this could be a problem at other times too - any time the machine agent bounces [08:06] of assuming that "if I start X, it is ready" [08:06] which isn't true. [08:06] jam, I don't think that's ever really a sane assumption to make, and I'm not sure that waiting for self-reported readiness makes it much better [08:07] fwereade: so alternatives are that we put retries at all places where we think something might be starting/restarting/etc. [08:07] which is a lot of whack-a-mole (from what I can tell) [08:08] jam: alternatively we could add explicit support to localstorage for determining when that it's running [08:09] rogpeppe: sure, but we have the same thing with provisioner and api worker [08:09] so it isn't *that* special of a case [08:09] jam: do we? [08:09] rogpeppe: environ provisioner bounces until the API server is up [08:09] which is "ok" I guess [08:09] its better than when we restarted the whole thnig [08:09] jam: i don't think that's user-visible [08:09] so you kept racing until it won [08:10] rogpeppe: note, I'm not strictly suggesting that this is high priority, but it does feel like we are missing a pattern of being able to express and take action based on dependencies [08:10] jam: i see how your suggestion allows bootstrap itself to wait until everything has started, but i don't see how it could be used in a more general way [08:11] rogpeppe: so if you wanted to express "don't start the API Server until local storage is ready" [08:11] how do you know when local storage is ready [08:11] if it doesn't give a way to tell you ? [08:11] similarly [08:11] don't start env provisioner until api server is ready [08:11] you can do it bespoke each time [08:11] so the env provisioner just tries to connect, and bounces until it succeeds [08:12] the api server would try to connect to port 8040 and bounce until it can [08:12] etc. [08:12] jam: i think i would add a mechanism (not in worker.Runner) to individual workers to allow them to express readiness [08:13] jam: part of the problem is that there's not a good definition of "readiness" - for example the API server could come up and then promply go down again [08:14] rogpeppe: that is true, being able to handle transient failures is nice, though having the common cases just work is a good thing, too. === vladk|offline is now known as vladk [08:14] we *know* about start-the-world [08:14] it happens at least as often as things dying out of band [08:14] jam: but i think we could work around that in an approximate way, by saying that we'll start as soon as we know it's been up very recently [08:14] good morning [08:14] morning vladk [08:18] rogpeppe: so it is entirely possible to fix the current race by just adding a retry in AddCharm if Put fails, or even put it in the local Storage implementation that retries a Put() that fails. 
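The "just add a retry in Put" fix jam mentions in the last message might look roughly like the wrapper below. The putter interface is a deliberately minimal stand-in for the real storage interface (which has more methods), and the retry count and delay are illustrative.

    package storagesketch

    import (
        "io"
        "time"
    )

    // putter is a minimal stand-in for the storage writer under discussion.
    type putter interface {
        Put(name string, r io.Reader, length int64) error
    }

    // putWithRetry retries a failed Put a few times with a short pause,
    // covering the window where the localstorage worker has been started
    // but is not yet able to serve requests.
    func putWithRetry(s putter, name string, r io.ReadSeeker, length int64) error {
        const attempts = 5
        const delay = 200 * time.Millisecond
        var err error
        for i := 0; i < attempts; i++ {
            if i > 0 {
                time.Sleep(delay)
                if _, err = r.Seek(0, 0); err != nil {
                    return err // cannot rewind the payload; give up
                }
            }
            if err = s.Put(name, r, length); err == nil {
                return nil
            }
        }
        return err
    }

Retrying in the dependent code, rather than gating worker start-up, is the approach the discussion below eventually settles on.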
[08:18] I was bringing up the larger discussion because we've had a couple cases where workers depend on eachother [08:18] in ways that it feels would be good to actually model. [08:18] jam: yeah, i think it's a good discussion to bring up [08:19] jam: i'm just putting together a possible way that it might be done [08:20] rogpeppe: so "add to individual workers" doesn't seem that different from having some workers implement a Running() method. [08:20] (or Ready(), Started(), whatever else we would want to call it) [08:20] it is still difficult to define the "I want X to wait for Y" [08:20] jam: i haven't yet seen how that method allows dependencies between workers [08:20] rogpeppe: necessary but not sufficient. [08:21] as in, you have to know when something is ready before you could react to it. [08:21] jam: because workers don't have access to the Worker types created for other workers [08:21] jam, rogpeppe: I am in general in favour of the idea that workers should themselves wait for their dependencies, rather than complicating the decisions about when to start workers [08:21] an alternative is a generic event bus and things starting up fire events that others can chose to listen to. [08:21] fwereade: in a Poll and bounce method, or in a "wait for an event" style ? [08:22] fwereade: i'm not sure about that [08:22] jam, I think it's probably situational [08:22] fwereade: i don't think that, for example, the API server should know about the possibility of a localstorage worker that it maybe has to wait for [08:22] fwereade, hey [08:22] rogpeppe: right, I would want the localstorage to indicate the API server should wait, rather than the other way around. [08:23] rogpeppe, well we shouldn't have a localstorage worker anyway, which is why I'm nervous about this -- the motivating case doesn't seem like a good argument for anything [08:23] dimitern, heyhey [08:23] fwereade: don't we end up with *something* listening even if we have gridfs storage? [08:23] fwereade, re your comment on https://codereview.appspot.com/79250044/diff/20001/state/compat_test.go - do you mean readNetworks should return []string{nil}, []string{nil}, nil (networks and error) when mgo reports NotFound ? [08:24] jam, certainly we do, but doesn't that want to be part of the API server, like the charm put api etc? [08:24] dimitern, I think so, yes [08:24] fwereade: we have UploadTools and AddCharm that now both sit on a more standard HTTP POST URL, presumably object storage would also be just-another HTTP PUT/POST/etc [08:24] jam, yeah, that's my thought [08:25] jam, although probably not even that -- all we need to expose is GET, surely? [08:25] fwereade: well, fwiw, it seems really strange to have the API server be POSTing back to itself. [08:25] fwereade, I fail to see the point in that though [08:25] jam, fwereade: here's how i see that it might work: http://paste.ubuntu.com/7150179/ [08:25] jam, yeah, exactly [08:25] fwereade: so localstorage today is a way to make local look like other providres. [08:25] fwereade, what if we do want to know if there are no networks set and there are nil ones set [08:26] s/know if/distinguish between/ [08:26] rogpeppe: anyway, I don't mean to distract us from the rest of our work too much. I'm not sure that fixing this bug is worthwile yet anyway [08:26] actually, this is a little better: http://paste.ubuntu.com/7150180/ [08:26] dimitern, restate use case please? 
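readNetworks is the state-layer helper being discussed; treating a missing networks document the same as "no networks", as dimitern and fwereade agree just above, would look roughly like this. The document and collection names are illustrative, and the import path is the labix.org one juju-core used at the time.

    package statesketch

    import (
        "labix.org/v2/mgo"
    )

    type networksDoc struct {
        Networks []string `bson:"networks"`
    }

    // readNetworks returns the networks requested for the entity with the
    // given global key. Entities created before networks existed have no
    // document at all, and that is deliberately not an error.
    func readNetworks(networks *mgo.Collection, globalKey string) ([]string, error) {
        var doc networksDoc
        err := networks.FindId(globalKey).One(&doc)
        if err == mgo.ErrNotFound {
            // Pre-1.17.7 machines and services: treat as "no networks".
            return nil, nil
        }
        if err != nil {
            return nil, err
        }
        return doc.Networks, nil
    }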
my model has been hat we shouldn't be able to tell the difference between a machine/service started with no networks set, and one created before it was possible to set networks [08:26] dimitern, when do we need to distinguish those cases? because that's basically what that NotFound does for us, and I think it's a disadvantage [08:27] jam: yes, i don't think i mind a little instability at bootstrap time *that* much [08:27] rogpeppe, honestly I do [08:27] fwereade, hmm.. ok then [08:28] rogpeppe, bootstrap weirdness is disproportionately painful for new users [08:28] fwereade, mgo.ErrNotFound will be ignored in readNetworks [08:28] fwereade: yeah, consider me convinced [08:28] dimitern, plus an explanatory comment, and we're good [08:28] fwereade: so this *particular* race requires you to be able to download a charm faster than the localstorage can start [08:29] jam, rogpeppe: so what's the actual rate we're seeing [08:30] fwereade: but I'm perfectly happy that if this takes precedence over other work, fixing it with retry code in LocalStorage.Put/Get is the most expedient fix [08:30] fwereade: CI has to do a retry loop in their local provider bootstrap && deploy test [08:30] fwereade: I've never seen it in the wild, and had to put a Sleep() in the code to trigger it myself [08:30] fwereade, something in the line of "We treat the non-existence of networks for a service or machine as a valid case and return nil error" ? [08:31] dimitern, "because pre-1.17.7" [08:31] dimitern, but yeah [08:31] fwereade, ok, cheers [08:32] jam, if it's really only hurting CI that that's a bit less of a stress for me -- and makes me all the more reluctant to build a general worker-dependency mechanism vs just implementing in-env storage [08:32] fwereade: so if you are using the local provider on a machine with low latency to launchpad and slow CPU, you could probably trigger it, that would generally be exercising the local provider on Canonistack [08:33] fwereade: the worker-dependency was never the fix for the bug [08:33] it was a "we have this pattern that seemed to come up that we aren't really handling" [08:33] jam, ok, I see [08:35] fwereade: it seems the outcome of the discussion is that Retry loops handle more cases [08:35] jam, that fits my instinct too [08:35] fwereade: the one thing I *did* think we also wanted [08:35] was a way to have Bootstrap know when things are actually ready [08:36] we can trivially hook into something like "workersStarted" and some sort of global flag the API server can see. 
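The ready-signal mechanism being kicked around here (jam's Running()/workersStarted idea, rogpeppe's localStorageReady paste) is small in essence. A rough sketch with invented names: a worker closes its channel once it can actually serve, and bootstrap (or a dependent worker) waits on the set.

    package readysketch

    import "time"

    // readySignal is closed by a worker once it can actually serve
    // requests, which is a stronger statement than "it has been started".
    type readySignal chan struct{}

    func newReadySignal() readySignal { return make(readySignal) }

    // markReady must be called at most once per signal.
    func (r readySignal) markReady() { close(r) }

    // allReady reports whether every signal fires before the timeout; a
    // bootstrap-side "are all workers running" check could poll this.
    func allReady(timeout time.Duration, signals ...readySignal) bool {
        deadline := time.After(timeout)
        for _, s := range signals {
            select {
            case <-s:
            case <-deadline:
                return false
            }
        }
        return true
    }

It also shows why fwereade is wary: a closed channel only ever says "was ready at some point", which says nothing about a worker that has since bounced.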
[08:36] that Bootstrap could then request "have you started all workers" [08:36] though functionally that is probably the same as "is the API Server up" [08:36] jam, that's definitely interesting [08:37] jam, I'm a little suspicious of the quantity of gubbins we'd need but I'm not against the idea ;) [08:38] fwereade: so I already added a "workersStarted" channel to have the test suite be able to wait until it knew the workers were actually running, but I was personally aware that "started != running" [08:38] and we have no channel today to signal that [08:39] jam: i think it would be more like "are all workers currently up" than "have you started all workers" [08:41] rogpeppe: sure, but we *do* have code today that knows when all workers were started, we *don't* have a way to poll all the workers to know if they are running [08:41] (running as in able to respond to requests) [08:42] we could certainly phrase it in "is everything running" [08:42] and just return the best answer we have today [08:42] which is everything was started [08:42] and is not currently Dead [08:43] jam: i can only think of two workers like that - localstorage and API [08:43] rogpeppe, zero one infinity ;) [08:43] jam: none of the others respond to external requests AFAICS [08:43] fwereade: :) [08:44] fwereade: istm that it would be more appropriate to make those two special cases than to change the whole worker mechanism around them [08:44] hence my suggestion above [08:45] jam: what did you think of that as an idea, BTW? [08:46] rogpeppe, I'm not sure that what jam proposes is as much of a change as you seem to think [08:46] rogpeppe, sorry, I can't parse the important parts out of that paste [08:46] fwereade: ah, sorry, i'll trim [08:47] (jam, rogpeppe: there *is* some legitimate fuzziness in between "zero one infinity" and "three strikes and you refactor") [08:47] fwereade: in fact, it's easier than that - the important lines all contain the string "localStorageReady" [08:48] fwereade: Ithink it is essentially "NewWorkerReady" is a way to signal, you pass it into LocalStorage if you create it, otherwise just set ready immediately, and API Server waits on it. [08:49] jam: yeah. NewDependentWorker provides a way to gate the starting of one worker on the readiness of one or more ready signals [08:50] if the dependency graph got more complex, you could make it more table-driven [08:50] jam, rogpeppe: that does seem like a reasonable cut at it, but I do worry about that sort of mechanism's sanity in the face of restarting workers [08:51] fwereade: i think it can work ok, as far as it's possible to make it work at all [08:51] fwereade: in particular, once a worker has started, you have no idea that it remains up for any period of time [08:51] rogpeppe, you speak eloquently of the core of my objections :) [08:52] fwereade: but you have to do something some time... [08:52] fwereade: so the idea behind NewDependentWorker is that it waits for readiness and then starts the worker. [08:52] fwereade: but readiness is not set-once [08:52] fwereade: when a worker goes down, it should reset its ready flag [08:53] rogpeppe, and more mechanism creeps in to iteratively refine the quality of the guess... 
but we still need to be prepared for it failing to work [08:53] fwereade: another possibility i suppose is that if the thing you're dependent on goes down, then you should go down too [08:54] fwereade: sure, it can certainly fail to work - we should be prepared for workers to fail at any time [08:55] fwereade: but if we can't take a worker's readiness as a signal that we can start its dependents, then i'm not sure we can *ever* start its dependents :-) [08:56] rogpeppe, if the dependents can be resilient in the face of the occasional absences of heir dependencies, I think we sidestep the whole issue [08:57] fwereade: that's a reasonable point [08:57] fwereade: though i hate arbitrary retries :-) [08:58] rogpeppe, intellectually, so do I, but they often quite useful in practice [08:59] fwereade: you've certainly convinced me that this is the best solution at this time to this problem [08:59] * fwereade is happy then :) [08:59] * rogpeppe reluctantly throws away the nascent DependentWorker implementation [09:01] we should probably just use gridfs for storage, so there's no need for another service (though I realise worker dependency is bigger than this one issue) [09:02] fwereade: unfortunately goose has a very nice retrying HTTP Client that is not factored into a 3rd party lib, and is slightly Openstack specific (because of the Retry-after: header not being everywhere) [09:03] though I don't think the goose one retries if it fails to connect completel [09:03] completely [09:03] vs getting a Retry response [09:14] axw: +1 [09:17] mgz, hey [09:21] mgz, you should really set up some alerts on the (other) machine you're using irc to notify you when there's a message :) [09:23] axw, +100 to that [09:23] fwereade, one last look at https://codereview.appspot.com/79250044/ before I submit it for landing? [09:24] * fwereade looks [09:26] dimitern, LGTM [09:27] fwereade, thanks! [09:30] dimitern, would you take a look at https://codereview.appspot.com/79730043/ please? mostly moves, only significant change is using uniter/charm.Bundle instead of charm.Bundle [09:30] fwereade, sure, looking [09:30] dimitern, and abit of preparatory test gubbins that will be used in a followup [09:44] hello [09:45] fwereade, reviewed [09:45] hey wwitzel3 [10:07] dimitern: I have to miss the standup today, do you care to crack the Whip ? [10:07] jam1, sure, my pleasure :) [10:10] rogpeppe: did you push your changes from last night? [10:10] dimitern, reproposing [10:11] wwitzel3: checking [10:11] fwereade, will look when it appears [10:11] dimitern, cheers [10:12] good morning everyone [10:12] wwitzel3: pushed [10:13] voidspace: hiya [10:13] voidspace: hope you're feeling a bit better today [10:13] hi perrito666 [10:13] perrito666: hiya [10:13] hey voidspace , you getting along better today? [10:15] hey guys [10:15] thanks [10:16] rogpeppe: does that build for you? [10:16] wwitzel3: rogpeppe: definitely better - I wouldn't recommend coming to visit but I got a good night's sleep [10:16] wwitzel3: no, it's WIP [10:16] which makes a lot of difference :-) [10:16] voidspace: yeah [10:16] rogpeppe: k, just checking :) [10:16] fwereade, LGTM [10:18] rogpeppe: want to join the standup hangout, I'd like to talk about some of the changes? [10:18] wwitzel3: joining [10:18] rogpeppe: I didn't get back from dinner early enough last night to catch you :) [10:19] wwitzel3: i'm there [10:19] rogpeppe: I don't see you .. [10:19] rogpeppe: these hangouts are a PITA [10:20] wwitzel3: i've joined the usual juju-core standup hangout, right? 
[10:20] rogpeppe: yeah, same here :/ [10:20] wwitzel3: https://plus.google.com/hangouts/_/calendar/Y2Fub25pY2FsLmNvbV9pYTY3ZTFhN2hqbTFlNnMzcjJsaWQ5bmhzNEBncm91cC5jYWxlbmRhci5nb29nbGUuY29t.mf0d8r5pfb44m16v9b2n5i29ig [10:28] morning core devs: I am trying to track down the issue described in bug 1277445. It seems to be a juju-core provisioning issue, could you please take a look? [10:28] https://bugs.launchpad.net/ubuntu-advantage/+bug/1277445 [10:38] frankban, looking [10:38] dimitern: thanks [10:45] fwereade, mgz, standup [10:54] frankban, I think you're on the right track for that bug [10:54] frankban, wallyworld is the person most likely to have immediate insight, but I can talk about it myself in a bit, I think [10:54] fwereade: cool thanks. should this be assigned to juju-core? [10:55] frankban, yes, I think it should [10:55] fwereade: cool thanks [10:56] fwereade: uhm... Proprietary bugs cannot affect multiple projects. do you have a team to which assign this bug? [10:57] i can look at the bug a little later [10:58] frankban, maybe assign it to wallyworld then :) [10:58] fwereade, wallyworld: thank you! [10:58] * wallyworld has no idea off hand about root cause but will try and figure it out [11:10] voidspace: separate hangout? (unless you particularly want to follow the topics) [11:13] rogpeppe: sure, you start one and I'll grab coffee [11:13] wwitzel3, natefinch: i've pushed my WIP branch to lp:~rogpeppe/juju-core/030-MA-HA/ [11:13] voidspace: k [11:13] rogpeppe: ok, thank you [11:14] wwitzel3: did it work? I'm Z. Nate Finch noew, but of course I'm still all the way right for myself [11:14] natefinch: you're still in the same spot, you probably have to rejoin [11:15] wwitzel3: I did. Boo. [11:16] wwitzel3: wonder if making it an initial instead of a name was a mistake [11:16] voidspace: https://plus.google.com/hangouts/_/76cpijifqckugj18ncu09812oc?authuser=1&hl=en [11:28] rogpeppe: currently failing to join, trying again [11:30] natefinch: ready for my call? [11:32] wwitzel3: not exactly. Tuesday & Thursday, Lily, my older daughter, goes to preschool, and 7:30-8:30 is where I help get her up and dressed etc. Around 8:30 I'll be able to call, but I'll have Zoƫ, the baby, until about 9:15 or so, when my wife gets back from dropping off Lily. [11:32] wwitzel3: and now you know more than you ever wanted to know about my weekly schedule :) [11:33] natefinch: ok, I will look in to the upgrade stuff we discussed during standup then [11:33] wwitzel3: cool [11:33] and just ping me when you're ready, I've merged rogers latest code and pushed to my branch [11:34] natefinch: I need more nate, I need to know it all. :) [11:34] but the cameras will be installed next week, so we'll be cool then. [11:34] lol [11:35] jam: haha... we make boring TV, it's all poopy diapers and finger paints :) [11:35] (and hopefully not vice versa) [11:36] bbiab [11:36] natefinch: poopy paints and finger diapers ? Doesn't sound too bad [11:36] lol [11:38] * perrito666 discover a whole new world by having hangout in a completely different device than work [11:44] wwitzel3, natefinch, rogpeppe: so how is HA looking? Did we manage to sort out actually bring up a replica cluster? 
[11:45] jam: we actually got EnsureMongoServer working yesterday [11:45] rogpeppe: I thought Wayne's summary is that you still needed keyfile to have everything work together, but that would be a follow up patch [11:45] jam: we still do need keyfile, yes [11:46] jam: there should probably be an API to allow the machine agent to retrieve the keyfile data [11:46] jam: i've been considering whether we could just use a hash of the state server private key [11:47] rogpeppe: it is just a big password, right? I'd rather it just be a unique string that we hand to all the state server agents, otherwise there is bits of "why is it the hash(X)" rather than just "this is the string" [11:48] jam: if we use a hash of some existing data, we don't need any new API entry points [11:49] jam: and i can't *really* see why one might want to give a server the private key but not the keyfile [11:49] jam: or vice versa [11:49] jam: but... [11:49] rogpeppe: while that is True, I'm guessing we don't have 100% the API we need for everything today, do we? In which case we are changing the API anyway. [11:49] jam: i don't mind much either way. i was just wondering if we could save ourselves some work [11:50] jam: that's true, and i think that reminds me of some tickets we don't have on the board [11:50] jam: we need API entry points for retrieving the API server private key [11:51] rogpeppe: so I would be fine extending that API to be "give me the stuff I need to be a state server" and it hands the private key and the keyfile content and anything else we find is relevant. [11:51] jam: sgtm [11:52] jam: +1 [12:02] * perrito666 wonders why our ssh command doesn't prepend ubuntu@ to the host [12:11] perrito666: it does in cmd/juju/ssh.go [12:12] jam: I was talking about the utils one, ssh.Command [12:17] perrito666: because the utils/ssh one is meant to be less opinionated than that [12:18] i.e. it should allow ssh'ing to hosts as other users - that is required for the manual provider [12:23] axw: I see, thank you for the info :D [12:23] perrito666: nps [12:34] morning all [12:35] axw: ping [12:35] rogpeppe: yo [12:36] axw: if you wanted to deliberately inject a bad environ config into the state (for a test), how would you go about it? [12:36] axw: i am being thwarted by the Policy [12:36] rogpeppe: reopen state with a nil policy [12:36] axw: doh! of course. [12:36] axw: ta [12:36] rogpeppe: if you grep for state.Policy(nil) you'll find an example somewhere [12:36] nps [12:37] axw: presumably you could just use nil rather than state.Policy(nil) [12:37] axw: although it *was* useful to grep for :-) [12:38] rogpeppe: yeah that's just for documentation [13:00] fairly trivial review, anyone? (just moving code) https://codereview.appspot.com/76890049 [13:03] rogpeppe: lgtm [13:03] mgz: ta! === Ursinha is now known as Ursinha-afk [13:15] rogpeppe: natefinch: did you guys get the EnsureMongoServer patch put up for review? [13:15] jam1: i pushed the changes i'd made, but it wasn't anything like ready for review [13:16] jam1: i'm currently working with michael on getting the instance-ids published in the environ [13:17] jam1: I think we got the last of the unknowns out of the way. The code needs some cleanup, and I think the only thing left to do is implement upgrading existing environments. which is what we were talking about this morning in the standup [13:17] natefinch: so if we didn't support upgrade, but we did support just working in 1.18, I'd be happier working on the follow up patches. 
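The "give me the stuff I need to be a state server" API entry point discussed above (around 11:51) might hand back something like the struct below, with the mongo keyfile carried as an opaque shared secret rather than derived from the private key. Field and function names here are guesses for illustration, not the API that was eventually written.

    package hasketch

    import (
        "crypto/rand"
        "encoding/base64"
    )

    // StateServerInfo sketches what the API could hand a machine agent
    // that is becoming a state server.
    type StateServerInfo struct {
        StatePort    int    // mongod port
        APIPort      int    // API server port
        Cert         string // state server certificate, PEM encoded
        PrivateKey   string // state server private key, PEM encoded
        SharedSecret string // contents of the mongo replica-set keyfile
    }

    // newSharedSecret generates the keyfile contents as a random string
    // handed to every state server, rather than hashing existing key
    // material -- the option jam prefers above.
    func newSharedSecret() (string, error) {
        buf := make([]byte, 32)
        if _, err := rand.Read(buf); err != nil {
            return "", err
        }
        return base64.StdEncoding.EncodeToString(buf), nil
    }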
[13:18] jam1: I think we can get that ready for review by EOD, then [13:18] that is, if upgrading to 1.18 doesn't break, just doesn't give you the ability to do HA [13:19] mgz: what's up with the destroy-environment empty .jenv stuff? We'd like to do a release today to get closer to 1.18 [13:19] jam1: we won't have actual HA today, just a replicaset of 1 [13:19] natefinch: sure [13:19] perrito666: what's up with https://code.launchpad.net/~hduran-8/juju-core/1295650_rsyslog_on_restore/+merge/212207 ? [13:19] Is it something we still want to have for 1.18 ? [13:20] jam1: definitely [13:20] perrito666: well, we'd like to do a release today [13:20] jam1: I am applying the suggested changes from the review [13:20] and will re submit-it in a moment [13:21] jam1: we'll have to test upgrade === Ursinha-afk is now known as Ursinha [13:22] * perrito666 types faster [13:29] perrito666: I mostly just wanted to know if there was a blocker and if there is anything I can do to unblock it [13:29] but it sounds like you're active on it [13:30] jam1: 1.17.7 is planned for today? [13:30] axw: we would like to if there isn't any remaining blockers [13:30] the milestones look possible [13:31] axw: do you know of other things ? https://launchpad.net/juju-core/+milestone/1.17.7 [13:31] cool, looks good to me [13:31] jam1: nope, I got a little bit delayed on that one bc of the time it takes to run the suite on some situations [13:31] jam1: nope. there's the empty jenv thing, but tbh I think it could wait [13:31] axw: yeah, I was trying to poke mgz to see where that was at [13:35] rogpeppe: are you and voidspace able to pair again today? [13:35] jam1: that's what we're doing, yes [13:37] rogpeppe: sounds great [13:37] night all [13:45] jam1: I may be able to get it in before the release [13:47] mgz: so it doesn't seem critical, but it really feels like we need to make a decision whether we want to try to get it in. [13:47] okay [13:49] jam, mgz: I would be most grateful if we did, *but* I was wondering again [13:51] fwereade: certainly I'd rather have a release than wait on it (IMO), but it does seem like it should be a small amount of work. [13:51] jam, mgz: do we not namespace our local environments per actual user anyway? I don't see why we need to claim space with an empty jenv... the cost appears not to match the benefit, of defending against a user bootstrapping the sameenvmultiple times in parallel [13:51] jam, mgz: have I missed some nuance? [13:51] fwereade: JUJU_HOME [13:51] apart from that, no [13:52] mgz, explicitly shared JUJU_HOME across multiple users? [13:52] yeah, or other per-role account stuff [13:52] fwereade: the particular nuance that i'm thinking of is if the config store is actually a network-shared resource [13:52] eg, I ssh into a box to do server stack things, there can be someone else doing the same thing [13:53] fwereade: (which something that i've tried to keep as a possibility) [13:54] rogpeppe: hoping that locking scheme actually *works* on a network-shared resource is probably a bit much :) [13:54] rogpeppe, so two users storing config for their local dev environments remotely? surely the answer is not "try to act to block collisions" but "don't collide in the first place" [13:54] mgz: the locking scheme isn't defined by the interface [13:54] it's a create-a-file, error if it exists one, no? [13:55] rogpeppe, and I'm not sure it really helps for remote environments either [13:56] wwitzel3: back finally. Want to get on a call? 
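The "create a file, error if it exists" scheme mgz describes above (13:54) is essentially an O_EXCL create. A sketch, not the actual configstore code:

    package configstoresketch

    import (
        "fmt"
        "os"
    )

    // claimEnvironment writes an empty .jenv with O_EXCL so the file
    // itself acts as a claim on the environment name: a second bootstrap
    // of the same name, or another user sharing the same JUJU_HOME, gets
    // an error instead of silently colliding.
    func claimEnvironment(jenvPath string) error {
        f, err := os.OpenFile(jenvPath, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0600)
        if os.IsExist(err) {
            return fmt.Errorf("environment info %q already exists", jenvPath)
        }
        if err != nil {
            return err
        }
        return f.Close()
    }

As rogpeppe points out, this is only as trustworthy as the filesystem underneath it, which is why a network-shared config store makes the scheme shakier.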
[13:56] mgz, we are aiming for a model where, if you need to communicate with juju, you have your own identity and are expected to use that [13:57] fwereade: rogpeppe: So I see "bootstrap.CreateStateFile" but I don't see anything calling it. [13:57] fwereade: i'm thinking of a situation where there's a group of users that have a shared environment name space [13:57] mgz, I'm not really interested in the multiple-admins-debugging-a-system-with-the-systems-own-identity situation [13:57] fwereade: so that one of them might destroy an environment and re-bootstrap it, and others can still connect to it [13:58] fwereade: this corresponds closely with the way that some existing users are working [13:58] mgz, if they aren't sufficiently well-coordinated that they can avoid having two admins fixing the same problem at the same time, we are unlikely to be able to help them [13:59] mgz: well, it does require that there is *something* which the various clients can rendezvous on [13:59] I think we have a variant of this issue even if we drop the empty file creation anyway [13:59] jam: i don't think CreateStateFile is pertinent here, is it? [13:59] mgz, ok, I do not want to confuse the issue and distract you from a simple fix that solves the problem anyway [14:00] FWIW i have actually got a web-server implementation of configstore.Storage lying around somewhere [14:01] mgz, if you're having trouble with that AddService branch I can do it btw [14:01] fwereade: the real question is: do we expect users to be able to share an environments name space? [14:02] dimitern: I shall be hitting propose ina sec [14:02] fwereade: and the failure case i'm concerned about is orphan environments [14:02] mgz, ta! [14:02] anyway, i must lunch [14:03] * rogpeppe lunches [14:03] * fwereade will think on rogpeppe's words [14:11] natefinch: yep, sounds good [14:19] fwereade: rogpeppe: so I think we have a specific case where the chances of creating an empty file causing problems * the actual problem is much higher than the chance of failing because of a real race [14:20] jam, agreed [14:26] core devs: I am encountering an error while trying to deploy a service in trusty using a local env: see http://pastebin.ubuntu.com/7151411/ Is this a known issue? [14:46] fwereade, got a moment? [14:55] dimitern: https://codereview.appspot.com/79800043 - want anything else for merging with your bits? [14:56] I was faffing with making the passing around use a shared struct, but don't think you need that right now, so have deferred it for later [14:56] mgz, looking [15:00] mgz, reviewed [15:02] ah yeah, forgot nil is just fine for a slice for some reaso [15:06] dimitern: done, and landing [15:09] jam: i definitely agree that the chances of a real race is pretty low to non-existent in most realistic cases [15:10] jam: but that might not be the case in some future scenarios [15:11] rogpeppe: hey, I re submitted https://codereview.appspot.com/78870043/patch/20001/30002 would you bash-review it for me? :) [15:12] perrito666: looking [15:12] tx [15:12] perrito666: please could you use tabs where the code is already using tabs [15:13] rogpeppe: certainly, sorry, I dragged the python config from my editor [15:13] perrito666: which editor? 
[15:13] wwitzel3: vi [15:14] I just set tabs-> spaces for all files intead of only python [15:15] perrito666: best to avoid any tab<->space conversion in Go [15:16] perrito666: reviewed, BTW [15:16] rogpeppe: t [15:16] tx [15:16] I just re-set the editor to avoid mixups in the future [15:16] perrito666: also running gocmt on save will fix any tab/space issues as well [15:16] gofmt === hatch__ is now known as hatch [15:18] perrito666: though I still haven't got gofmt to highlight my error line in vim, so if you get that going let me know so I can steal it :) [15:18] wwitzel3: Ill take a look during the weekend, I could certainly use that [15:20] mgz: remind me how to kick the gobot into action again, please [15:21] mgz: (it hasn't done anything for a couple of hours now) [15:21] rogpeppe: depends on the kind of stuck, generally ssh in and look at the logs [15:21] and borked mongo tends to be the blame, so kill some processes and see [15:22] mgz: how does one rename the *(default) branch in bzr? [15:22] being a bit careful to distingush between test juju ones, and actual on the machine juju ones :) [15:22] mgz: that was it [15:22] wwitzel3: to setup colo, you do `bzr switch -b NAME` and NAME becomes trunk [15:22] mgz: ta! [15:22] mgz: thanks :) [15:25] hey, I am suddenly getting this while trying to fetch status http://pastebin.ubuntu.com/7151665/ [15:25] any ideas? [15:28] perrito666: looks like a testing cert, in a non-testing env [15:28] somehow the client doesn't have the self-signed bit client side [15:28] mgz: and then to get a remote branch from lp .. bzr switch -b NAME lp:BRANCH ? [15:28] maybe you're trying to talk to an older env with a fresh env? [15:29] wwitzel3: `bzr branch --switch lp:BRANCH co:NAME` [15:29] otherwise you bind, which is probably not what you want [15:41] mgz: thank you [15:46] ah, pants, conflict [15:54] mgz: what are your pants conflicting with? [15:54] my socks maybe? [15:59] fwereade: what do you know about the Characteristics field inside bootstrap.BootstrapState? [15:59] fwereade: ISTM that it's now redundant [15:59] rogpeppe, I was just about to say I think it is redundant [16:00] fwereade: and even if it isn't, it was only ever used at bootstrap time, right? [16:00] even if we do use it we should have easier ways of getting the data in [16:00] rogpeppe, concur [16:00] fwereade: all this leads to: i don't think it matters if publishing state server instance ids overwrites it [16:01] rogpeppe, agreed, but please make sure it really is unused (or guaranteed no longer used by the time it could be overwritten) [16:01] rogpeppe, and file a tech-debt bug for doing it right if we don't already [16:07] fwereade: it seems like we do it right already - i deleted [16:07] * fwereade cheers [16:08] the field [16:08] and all the code still compiles [16:08] deleting code is always my favorite thing to do [16:22] rogpeppe: https://codereview.appspot.com/79820043 [16:24] ah, it was the succesful restore of the backup that was breaking the dev certificate... price of success [16:25] simple review anyone? https://codereview.appspot.com/79820043 [16:27] fwereade, natefinch, jam: ^ [16:27] rogpeppe, meetings I'm afraid [16:27] fwereade: np [16:29] if I have a bzr branch, lets call it B, originally from branch A, then I want to start depending on branch C instead of A, is as simple as a merge and tell lbox that I depend on C? [16:30] even more trivial review anyone? 
https://codereview.appspot.com/79830043/ === tvansteenburgh is now known as tvansteenburgh-l [16:31] perrito666: it depends if you want launchpad to know about the prereq [16:31] perrito666: in general, prereqs are a pain === tvansteenburgh-l is now known as tvanlunchburgh [16:33] natefinch, wwitzel3: how's it going? === vladk is now known as vladk|offline [16:36] is there a known issue on trunk and issues bootstrapping.. seems like the ssh connection stalls out, even though things are potentially done. [16:48] hazmat: not that i'm aware of [16:55] rogpeppe: going good. Merged with trunk, had a few conflicts to fix... tests are compiling and almost passing now [16:55] voidspace: lp:~rogpeppe/juju-core/531-SetStateInstances [16:56] rogpeppe: same as nate, I'm just trying to get a set of test failures passing, not sure why your message before didn't hilight [16:56] rogpeppe: all the cmd/jujud bootstrap tests are compling about no --env-config being passed in, but it is clearly being passedi n. [17:01] mgz: so, using the branches workflow you taught me at the sprint [17:01] mgz: "bzr switch -b new_branch_name" [17:01] ...which I should still send to the list [17:02] mgz: always creates a branch from *the current branch* [17:02] voidspace: yup [17:02] mgz: what I inevitably (usually) want is "create a branch from trunk" instead [17:02] just `bzr switch trunk` first for normal tings [17:02] mgz: is there a handy shortcut for that [17:02] heh, ok [17:02] I usually forget... [17:02] I wondered if there was an alternative [17:05] voidspace: `alias bzrnew="bzr switch trunk&&bzr switch -b" [17:05] ` [17:05] mgz: yeah, I guess so :-) thanks [17:11] voidspace: what I hate most is forgetting to switch before doing a big merge... [17:11] as they can't be carried over, unlike general changes [17:12] right [17:12] all good fun [17:12] mgz: the thing I hate is doing a merge when I have a bunch of local changes (or switching branches when I have local changes) [17:13] mgz: I really want bzr to complain that I have local changes and make me add a flag or confirm that I really want to muck with my local changes on this branch first [17:15] hm, yeah [17:21] hey could someone remind me the process to get something merged once it got reviewed and approved? [17:21] perrito666: `bzr rv-submit` is the easy way [17:21] I think I walked you through setting that up [17:21] mgz: it is set up [17:22] :D I had forgot the command I was not sure if there was anything else to do there === tvanlunchburgh is now known as tvansteenburgh [17:31] Simple review for someone: https://codereview.appspot.com/79820043 [17:35] jam: my blocking proposal has been submited for merge [17:40] anyone could hint me the syntax for merging two lightweight branches??? [17:41] perrito666: you mean in bzr? 
[17:41] voidspace: reviewed [17:41] mgz: thanks, looking [17:42] perrito666: generally want to care about which one is the left hand side, so which branch you're merging into the other [17:42] mgz: on your first point, I'll add a comment [17:42] so, switch/goto that branch, then `bzr merge LOCATION_OF_OTHER_BRANCH`, then resolve conflicts, then commit [17:42] mgz: on your second, they're actually being published separately [17:43] mgz: so if we bundle them together we'll just need to unbundle them [17:43] mgz: so bundling as a struct not really appropriate [17:43] mgz: I'll add the comment and push a new revision [17:43] voidspace: fair enough [17:43] mgz: if you're happy with my excuse I'll just approve after that [17:43] mgz: I want to sort of run an integration test between two branches of mine [17:43] perrito666: okay. good special case [17:44] best thing for that, is go to trunk [17:44] merge the first branch [17:44] mgz: to add to that: address is orthogonal to instance id - we can potentially have an instance id but no address yet [17:44] merge --force the second branch [17:44] then run tests [17:44] (then throw away the changes on trunk with revert or similar) [17:44] bzr merge just fails stating that it cant find the path [17:45] rogpeppe: was just an api neatness comment, if there are good reasons to think of them as seperate values like that, then it makes sense as is [17:45] you need to address the other locations correctly [17:45] `bzr info LOCATION` should work, for bzr merge to understand it [17:46] mgz: rogpeppe: so I've updated the comment that we need to actually use instanceIds [17:46] so, you'd need a co: for the colo workflow, or some leading ../../ for seperate trees layout [17:46] mgz: rogpeppe: ok to approve? [17:46] mgz: tx [17:47] mgz: there *was* a TODO comment two lines up from your "at least a TODO comment" remark :-) [17:48] yeah, but not one I parsed apparently :) [17:48] I did giggle at the diff [17:48] mgz: thanks [17:49] mgz: a review of this would be appreciated too, if poss (deleting code only) https://codereview.appspot.com/79830043/ [17:49] hm, I kind of wanted Ian to look at that [17:52] http://beza1e1.tuxen.de/articles/accidentally_turing_complete.html [17:52] I guess we all knew C++ templates were turing complete, but did you know that Magic the Gathering rules are turing complete? [17:53] voidspace: I think Mt Gox was written in MTG [17:53] heh [17:53] apparently mediawiki templates, sendmail configuration and apache rewrite rules are all turing complete [17:53] I guess I knew there were infinite loops in MtG [17:54] wait, did I miss a conversation about Magic the Gathering, or what? [17:54] rewrite rules are a special bit of hell [17:54] can I mark my own bugs as fixed after merged? [17:55] perrito666: the bot should actually do it for you [17:55] natefinch: apparently the rules are complex enough to be turing complete [17:55] if you linked the bug correctly to the branch [17:55] voidspace: I can believe it [17:56] mgz: the branch is linked https://code.launchpad.net/~hduran-8/juju-core/1295650_rsyslog_on_restore and says there it is meged === vladk|offline is now known as vladk [17:58] perrito666: I've manually flipped the bug, in case the bot was just having a holiday [17:59] mgz: have you got a few moments to chat? [17:59] rogpeppe: sure [18:00] standup hangout? 
[18:00] mgz: https://plus.google.com/hangouts/_/76cpijifqckugj18ncu09812oc?authuser=1&hl=en [18:00] damn, something in agent/bootstrap tests isn't properly cleaning up behind itself :/ [18:01] rogpeppe: can you do the invite thing to that? [18:01] mgz: invited [18:02] mgz: tx [18:27] g'night all [18:28] EOD [18:28] see you tomorrow [18:30] mgz: there's one wrinkle still: what about old clients talking to the new version? i'm not sure what degree of compatibility guarantees we provide for that. [18:31] mgz: even if we leave the original instance id there, since it won't be updated, once the original instance dies, old clients won't have the fallback-to-environment option. [19:01] rogpeppe: so I'm stuck, when the cmd/jujud bootstrap_test calls selectPreferredStateServerAddress, the inst.Addresses() is returning nil [19:01] rogpeppe: can't figure out how that is even that case [19:01] wwitzel3: are you in a hangout? [19:01] yeah [19:01] wwitzel3: link? [19:01] rogpeppe: https://plus.google.com/hangouts/_/76cpitlq3utehe1ddn4giaao8s [19:08] i.addresses = instance.NewAddresses([]string{string(i.id) + ".dns"}) [19:09] wwitzel3: http://paste.ubuntu.com/7152689/ [19:14] wwitzel3: http://paste.ubuntu.com/7152710/ === Ursinha is now known as Ursinha-afk [19:24] hey, anybody from the juju core team got 30 min to work on a "special project" with me? [19:27] wwitzel3, vladk, mgz, natefinch, jam, fwereade, hazmat, cmars -- any of you available? [19:27] mramm: sure [19:27] mramm: hi [19:27] mramm: oh, missed the 30 min requirement... not really === Ursinha-afk is now known as Ursinha [19:29] what is "special project"? [19:29] natefinch: understandable, I know you are on the HA stuff [19:29] vladk: I'll ping you in private [19:29] mramm: yep. Too bad. It's close though. [19:44] :D amazon restored [19:44] the whole thing worked [20:02] natefinch: pushed the fixes for cmd/jujud/bootstrap_test.go to my HA branch. [20:02] EOD for me but I'll be floating around [20:13] sinzui: re bug 1297306 [20:13] <_mup_> Bug #1297306: local juju bootstrap fails, cannot talk to db 37017 [20:13] yes? [20:13] sinzui: I'm guessing either mongo issue or no-proxy/http-proxy issue [20:14] sinzui: do we have access to the machine? [20:14] thumper, I suspect the mongo 64k page size bug is the issue [20:14] check status of juju-db service [20:14] thumper, It is not ours [20:14] this line "sudo: unable to resolve host frb-juju" is suspect [20:40] mramm: if you'd pinged me, i might've been up for a bit of light relief :-) [20:44] rogpeppe: sorry, missed your name in the IRC list [20:44] mramm: np. i'm better doing this anyway... === vladk is now known as vladk|offline [22:43] hey did .7 got released? [22:44] perrito666: not yet AFAIK [23:57] thumper: plugins boilerplate is ready for review: https://code.launchpad.net/~waigani/juju-core/suspend-resume-local-plugin/+merge/212743 [23:58] thumper: === RUN Test [23:58] OK: 16 passed [23:58] --- PASS: Test (0.06 seconds) [23:58] PASS [23:58] ok launchpad.net/juju-core/agent 0.360s [23:58] passed with gccgo
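The fix pasted at 19:08 gives each fake instance a predictable ".dns" address. In context it amounts to something like the test double below, using the instance.NewAddresses signature quoted in the log; the real dummy provider implements the full Instance interface and lives elsewhere, so treat this purely as illustration.

    package dummysketch

    import (
        "launchpad.net/juju-core/instance"
    )

    // dummyInstance is a cut-down test double: without the addresses
    // field populated, code like selectPreferredStateServerAddress has
    // nothing to choose from and the cmd/jujud bootstrap tests fall over.
    type dummyInstance struct {
        id        instance.Id
        addresses []instance.Address
    }

    func newDummyInstance(id instance.Id) *dummyInstance {
        return &dummyInstance{
            id:        id,
            addresses: instance.NewAddresses([]string{string(id) + ".dns"}),
        }
    }

    func (i *dummyInstance) Addresses() ([]instance.Address, error) {
        return i.addresses, nil
    }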