[00:14] <davecheney> sinzui: i'm open for assignment of bug fixes
[01:35] <bigjools> thumper: I need to have a call with you today about VLANs, let me know when is convenient please
[01:36] <thumper> bigjools: now is as good as ever
[01:36] <thumper> bigjools: also, not sure how useful I'm going to be :)
[01:37] <bigjools> thumper: ok let me grab a drink and I'll call in 5 mins
[01:38] <bigjools> it's a start if nothing else :)
[01:41] <bigjools> thumper: calling
[03:36] <thumper> wallyworld_: here are the test changes I was telling you about https://codereview.appspot.com/25460045/
[03:39] <wallyworld_> ok, i've already changed the method names locally. i'll pick up the changes once you land
[03:45] <thumper> wallyworld_: ack
[03:48]  * thumper runs off on yet another small errand...
[07:21] <jam> axw: we might do a config.New, but the warning is inside Validate
[07:22] <jam> (it is at the end of environs/config.Validate)
[07:22] <axw> jam: eh, sorry, not sure how I confused that
[07:22] <jam> I guess that is somehow different from Config.Validate() ?
[07:22] <axw> jam: ah, config.New calls that
[07:23] <jam> axw: at *one* point we had explicitly discussed not even parsing sections we don't know about so that you could have pyjuju and juju-core environments in the same file
[07:23] <axw> it's just for validating common configuration
[07:23] <jam> but it also applies for multi-version stuff.
[07:23] <jam> axw: I don't see why we need to config.New for anything we won't use
[07:23] <axw> yeah, I don't see any value in parsing if we're not using it
[07:23] <axw> we should just defer to first use
[07:24] <jam> ReadEnvirons just parses everything into Environs objects
[07:25] <axw> jam: yep, so we could just modify Environs.Config to do the parse on first reference
[07:28] <jam> axw: what is silly is in environs/open.go we ReadEnvirons("") , just to get name from envs.Default (that we may not use), and then we actually read the info from store.ReadInfo(name), and only if that fails do we actually use the envs we just read
[07:28] <axw> heh
[07:30] <jam> axw: so we *do* assert that environs.ReadEnvironsBytes doesn't generate an error, and you only get the error when you use environs.Config()
[07:30] <jam> well, the environs.ReadEnvironsBytes().Config(name)
[07:31] <jam> but that is actually because we've subtly put the err on part of the struct
[07:31] <jam> waiting to report it until later.
[07:32] <axw> should be nice and easy then :)
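The "err stashed on the struct, reported at first use" pattern jam describes can be sketched like this; a minimal stand-in with illustrative names (`readEnvirons`, `Config`), not juju-core's actual `environs` types:

```go
package main

import (
	"errors"
	"fmt"
)

// environ records a deferred parse/validation error instead of failing
// the whole read, mirroring the ReadEnvironsBytes behaviour above.
type environ struct {
	config map[string]string
	err    error // deferred error, surfaced only on use
}

type environs struct {
	environs map[string]environ
}

// readEnvirons parses every section, stashing per-section errors
// rather than aborting the entire file.
func readEnvirons(raw map[string]map[string]string) *environs {
	es := &environs{environs: make(map[string]environ)}
	for name, section := range raw {
		e := environ{config: section}
		if section["type"] == "" {
			e.err = errors.New("missing required 'type' key")
		}
		es.environs[name] = e
	}
	return es
}

// Config reports the deferred error only when the environment is used.
func (es *environs) Config(name string) (map[string]string, error) {
	e, ok := es.environs[name]
	if !ok {
		return nil, fmt.Errorf("no environment named %q", name)
	}
	if e.err != nil {
		return nil, e.err
	}
	return e.config, nil
}

func main() {
	es := readEnvirons(map[string]map[string]string{
		"good": {"type": "ec2"},
		"bad":  {},
	})
	if _, err := es.Config("good"); err != nil {
		fmt.Println("unexpected:", err)
	}
	_, err := es.Config("bad")
	fmt.Println("bad env error:", err)
}
```

The upside is exactly what axw notes: a broken or unknown section only bites if you actually reference it, so pyjuju and juju-core environments can coexist in one file.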
[08:57] <jam> fwereade: thanks for doing the backport
[08:58] <fwereade> jam,np
[08:58] <jam> axw: so there is a small point about creating a new config each time. If we are creating a warning, we'll do it twice.
[08:58] <fwereade> jam, the --force one might be a little trickier
[08:58] <jam> I wonder if that is a problem
[08:58] <jam> fwereade: because the code is different, or because it is more invasive?
[08:59] <fwereade> jam, just because it's a few branches and I'm paranoid
[08:59] <jam> fwereade: just because you're paranoid doesn't mean they *aren't* out to get you. :)
[08:59] <fwereade> jam, words to live by
[09:08] <jam> axw: for https://code.launchpad.net/~axwalk/juju-core/jujud-uninstallscript/+merge/194994
[09:08] <jam> how do we handle upgrades ?
[09:08] <jam> as in, there won't be anything in agent.conf on a system that we upgraded
[09:09] <jam> fwereade: in  https://code.launchpad.net/~wallyworld/juju-core/provisioner-api-supported-containers/+merge/194982 he mentions "A change was also made to the server side implementation so that the machine doc txn-revno is no longer checked."
[09:09] <jam> that sounds risky to me, but I'd like to get your feedback on it.
[09:12] <jam> axw: I didn't mean to scare you away
[09:14] <jam> fwereade: one thing about our CLI API work. The new CLI is likely to be incompatible with old server versions when we have to create an API for the command. (the "easy" case vs the "trivial" case). Do we care?
[09:14] <jam> we definitely haven't been implementing backwards compatibility fallback code.
[09:17] <fwereade> jam, looking at wallyworld_'s
[09:17] <wallyworld_> jam: we rarely check txn-revno. mainly for env settings.  never previously for machines. i was trying to be more stringent by introducing it
[09:17] <fwereade> jam, pondering the latter
[09:17] <fwereade> jam, wallyworld_: a txn-revno check is a big hammer and should not generally be used until we've exhausted all other possibilities
[09:17] <fwereade> jam, wallyworld_: far better to check only the fields we actively care about
[09:18] <wallyworld_> fwereade: yes, i came to that conclusion
[09:18] <fwereade> jam, wallyworld_: but sometimes that's not practical
[09:18] <davecheney> http://paste.ubuntu.com/6409771/
[09:18] <wallyworld_> i like optimistic locking in general as a pattern
[09:18] <davecheney> juju compiled with gccgo
[09:18] <jam> fwereade: so that sounds like what he's done
[09:18] <jam> as in, we used to assert the whole thing, but now we just assert the one field
[09:18] <fwereade> jam, wallyworld_: last I was aware, we'd managed to eliminate them all, I guess another crept in
[09:19] <wallyworld_> i introduced it
[09:19] <wallyworld_> in a recent branch
[09:19] <fwereade> davecheney, cool,does it work? ;)
[09:19] <wallyworld_> it worked in practice but tests failed with some new work
[09:19] <rogpeppe1> davecheney: cool
[09:19] <rogpeppe1> davecheney: does it work?
[09:20] <jam> davecheney: nice. Almost 3MB smaller than the static one. :) Wish it was more like 15MB smaller.
[09:22] <davecheney> rogpeppe1: i didn't try to bootstrap it
[09:22] <davecheney> oh speaking of that
[09:22] <fwereade> jam, wallyworld_: it looks sane to me -- the only way we could be screwing that up is with multiple agents for the same machine on separate instances, and the nonce stuff should guard against that effectively
[09:22] <davecheney> i need someone to commit a mgo change
[09:22] <wallyworld_> fwereade: right. that attr is only set once as the machine agent spins up
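The contrast between the txn-revno "big hammer" and asserting only the field you care about can be sketched with minimal local stand-ins for mgo/txn's `Op` shape; the collection and field names here are illustrative, not juju-core's actual schema:

```go
package main

import "fmt"

// docElem is a minimal stand-in for a bson.D element.
type docElem struct {
	Name  string
	Value interface{}
}

// op is a minimal stand-in for mgo/txn.Op.
type op struct {
	C      string // collection
	Id     interface{}
	Assert []docElem
	Update []docElem
}

// assertRevno pins the entire machine document: any concurrent change,
// even to an unrelated field, aborts the transaction. The big hammer.
func assertRevno(machineId string, revno int64) op {
	return op{
		C:      "machines",
		Id:     machineId,
		Assert: []docElem{{"txn-revno", revno}},
		Update: []docElem{{"$set", []docElem{
			{"supportedcontainers", []string{"lxc"}},
		}}},
	}
}

// assertField checks only the field we actively care about, so
// unrelated concurrent updates to the machine doc do not abort us.
func assertField(machineId string) op {
	return op{
		C:      "machines",
		Id:     machineId,
		Assert: []docElem{{"supportedcontainersknown", false}},
		Update: []docElem{{"$set", []docElem{
			{"supportedcontainers", []string{"lxc"}},
			{"supportedcontainersknown", true},
		}}},
	}
}

func main() {
	fmt.Printf("revno assert: %+v\n", assertRevno("0", 42).Assert)
	fmt.Printf("field assert: %+v\n", assertField("0").Assert)
}
```

The field-level assert is the version that survives other agents touching the same document, which is why fwereade prefers it until all other options are exhausted.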
[09:23] <rogpeppe1> davecheney: you can lbox propose the change, i think
[09:23] <davecheney> it's not mine
[09:23] <wallyworld_> fwereade: btw, you forgot to look at my wip branch :-(
[09:23] <fwereade> wallyworld_, hell, sorry
[09:24] <fwereade> this is what happens when I write code :(
[09:24] <wallyworld_> np. it's at the point now where i just need to add one more test
[09:24] <wallyworld_> and i can propose formally
[09:24] <davecheney> rogpeppe1: https://code.launchpad.net/~mwhudson/mgo/evaluation-order/+merge/194968
[09:24] <davecheney> mike doesn't know how to use lbox
[09:24] <wallyworld_> i've done live testing and it all seems fine
[09:24] <fwereade> wallyworld_, link please?
[09:24] <wallyworld_> https://codereview.appspot.com/25040043/
[09:24] <wallyworld_> just some tests to add
[09:25] <wallyworld_> i've proposed the addsupportedcontainers stuff separately
[09:25] <wallyworld_> hence the discussion earlier about txn-revno
[09:25] <fwereade> wallyworld_, ta
[09:26] <wallyworld_> davecheney: the fact that lbox is not used is not a bad thing :-)
[09:26] <davecheney> fwiw: lucky(/tmp) % strip juju
[09:26] <davecheney> lucky(/tmp) % ls -al juju
[09:26] <davecheney> -rwxrwxr-x 1 dfc dfc 11438248 Nov 13 20:26 juju
[09:27] <davecheney> rogpeppe1: anyway, that needs to land before juju will work properly
[09:27] <jam> wallyworld_: I have a review in.
[09:28] <wallyworld_> thanks :-)
[09:28] <rogpeppe1> davecheney: yeah
[09:28] <rogpeppe1> davecheney: you could propose it yourself, i guess
[09:28] <jam> wallyworld_: I did have some comments where I think we are missing test coverage, and possibly having a client-side API that matches the other functions
[09:29] <jam> but hopefully minor stuff
[09:29] <wallyworld_> ok
[09:29] <rogpeppe1> davecheney: it would be nice if go vet (or some other tool) could give warnings about undefined behaviour like that. i guess it's not possible in general though.
[09:30] <wallyworld_> jam: " after api.Set* from the API point of view" - there is no api call to get supported containers
[09:30] <wallyworld_> the api is currently write only
[09:30] <davecheney> rogpeppe1: /me considers what it would take to detect this behavior
[09:30] <jam> wallyworld_: so... do we just do it on every startup ?
[09:30] <jam> it would be nice if we would check if we've already done the lookup
[09:30] <jam> unless the lookup is exceptionally cheap
[09:30] <jam> I guess
[09:31] <wallyworld_> every machine agent start up
[09:31] <jam> wallyworld_: anyway, if you *can't* test it, just say so. :)
[09:31] <wallyworld_> ok :-)
[09:31] <rogpeppe1> davecheney: the oracle might have enough information to find some simple cases
[09:31] <wallyworld_> jam: i've not done anything with permissions yet so i'll need to see how to manipulate them to add a test
[09:33] <jam> wallyworld_: there should be other tests you can crib from. Usually it is "set up 3 machines, use the agent for machine-1, try to change something on all 3 machines, and assert that you get EPERM on the ones you're not allowed"
[09:33] <jam> I was actually surprised that in one call you could change
[09:33] <jam> both machine-0 and machine-1
[09:33] <wallyworld_> jam: i simply copied another test and used a different api call
[09:34] <jam> wallyworld_: so I guess, "lets think what the perms on this should be, and assert them in a test"
[09:34] <davecheney> rogpeppe1: I also have a fix in for gccgo to fix the go/build breakage
[09:34] <jam> *I* would say that the only thing that is allowed to change the value of a machine's supported containers is that machine's assigned agent
[09:34] <jam> which is an AuthOwner sort of test.
[09:34] <rogpeppe1> davecheney: cool
[09:38] <jam> wallyworld_: and if you remember the thing you copied, we might have a security hole there, so at least file a tech-debt bug to track it.
[09:39] <wallyworld_> "if" is the relevant word :-)
[09:39] <wallyworld_> i'll take a look
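The AuthOwner-style permission test jam describes ("use the agent for machine-1, try all 3 machines, expect EPERM on the others") can be sketched in a self-contained form; the `authorizer` shape and `errPerm` are simplifications of juju-core's apiserver idea, not its real API:

```go
package main

import (
	"errors"
	"fmt"
)

// errPerm mirrors the ErrPerm-style rejection discussed above.
var errPerm = errors.New("permission denied")

// authorizer carries the caller's own identity, e.g. "machine-1".
type authorizer struct {
	authTag string
}

// authOwner permits an operation only on the caller's own entity,
// the AuthOwner sort of test jam mentions.
func (a authorizer) authOwner(tag string) bool {
	return tag == a.authTag
}

// setSupportedContainers applies the change only if the caller is the
// machine's own assigned agent.
func setSupportedContainers(auth authorizer, machineTag string, containers []string) error {
	if !auth.authOwner(machineTag) {
		return errPerm
	}
	// ... would update the machine document here ...
	return nil
}

func main() {
	// The test pattern: authenticate as machine-1's agent, try all
	// three machines, and expect errPerm everywhere except machine-1.
	auth := authorizer{authTag: "machine-1"}
	for _, tag := range []string{"machine-0", "machine-1", "machine-2"} {
		err := setSupportedContainers(auth, tag, []string{"lxc"})
		fmt.Printf("%s: err=%v\n", tag, err)
	}
}
```

A test built this way would catch the security hole jam is worried about: if the copied test's facade lets one agent write another machine's supported containers, the machine-0/machine-2 cases fail loudly.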
[09:40] <rogpeppe1> fwereade: i've just realised that for ensure-ha to be transactional, State.AddMachine needs to take a count argument and for that to create all its machines in the same transaction. Looking at the transactions around AddMachine, this is a bit OMG.
[09:41] <fwereade> rogpeppe1, how much of it is applicable in that case? IIRC most of the complexity is around containers rather than machines themselves
[09:41] <rogpeppe1> fwereade: i'm not sure - i haven't grokked the code yet
[09:41] <fwereade> jam, I'm still thinking about api backward compatibility and thinking that it kinda sinks the --force fix for 1.16
[09:42] <rogpeppe1> fwereade: just the idea of making transactions that can be hundreds of operations long fills me with doubt
[09:42] <fwereade> jam, 1.16s should really work with other 1.16s
[09:42] <fwereade> rogpeppe1, how would they be that long?
[09:42] <rogpeppe1> fwereade: juju add-machine -n 100 ?
[09:43] <fwereade> rogpeppe1, ah wait, unexplored assumption
[09:43] <rogpeppe1> fwereade: i guess we wouldn't need to use the count for add-machine
[09:43] <fwereade> rogpeppe1, why does AddMachine need a count argument?
[09:43] <fwereade> rogpeppe1, jinx :)
[09:44] <rogpeppe1> fwereade: not quite: State.AddMachine needs a count argument. juju add-machine doesn't need to use it.
[09:44] <fwereade> rogpeppe1, in general if things like -n need to be transactional (which I agree they should) I think the sane answer is to stick it in a queue of somesort
[09:44] <fwereade> rogpeppe1, not quite so sure about that
[09:44] <rogpeppe1> fwereade: AddMachine needs a count argument because otherwise we'd be able to have an even number of state servers
[09:44] <fwereade> rogpeppe1, wouldn't HA methods on state be saner?
[09:45] <rogpeppe1> fwereade: i'd intended to do so. but those methods need to add machines
[09:45] <fwereade> rogpeppe1, right, so they should use addMachineOps
[09:47] <fwereade> rogpeppe1, the various unexported *Ops methods in state are the building blocks of transactions
[09:47] <fwereade> rogpeppe1, they are I admit kinda gunky in cases, like lego buried in leafmould for years
[09:48] <fwereade> rogpeppe1, but they're internal and therefore subject to safe improvement as required
[09:48] <rogpeppe1> fwereade: and then we make State.AddMachine barf if its jobs contain state server jobs?
[09:48] <fwereade> rogpeppe1, probably
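The `addMachineOps` composition fwereade describes, per-machine op lists glued into one transaction, can be sketched with local stand-in types (this is not the real `state` package, whose builders are unexported):

```go
package main

import "fmt"

// op is a minimal stand-in for a single mgo/txn operation.
type op struct {
	C      string
	Id     string
	Insert map[string]interface{}
}

// addMachineOps returns the ops to insert one machine document,
// a simplified stand-in for the unexported *Ops builders in state.
func addMachineOps(id string, jobs []string) []op {
	return []op{{
		C:      "machines",
		Id:     id,
		Insert: map[string]interface{}{"jobs": jobs},
	}}
}

// ensureHAOps composes the ops for all state-server machines into one
// list, so the whole ensure-ha change runs as a single transaction and
// an even number of state servers can never be observed.
func ensureHAOps(count int) []op {
	var ops []op
	for i := 0; i < count; i++ {
		ops = append(ops, addMachineOps(fmt.Sprint(i+1), []string{"JobManageState"})...)
	}
	return ops
}

func main() {
	ops := ensureHAOps(3)
	fmt.Println("ops in one transaction:", len(ops))
}
```

This is also why the count stays out of `juju add-machine -n 100`: only the HA path needs the all-or-nothing guarantee, so only it pays the long-transaction cost.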
[09:48] <rogpeppe1> fwereade: this all makes me feel highly uncomfortable
[09:49] <fwereade> rogpeppe1, which is I admit a hassle from a test-fixing perspective
[09:49] <fwereade> rogpeppe1, if it's hard for you to write the code this is all the more reason we should not just hand the user the same toolkit you're reacting against and tell them to figure it out
[09:49] <rogpeppe1> fwereade: we'd also need to have another special case for adding machine 0
[09:50] <fwereade> rogpeppe1, machine 0 is already a special case
[09:50] <rogpeppe1> fwereade: not in state, currently
[09:50] <rogpeppe1> fwereade: AFAIK
[09:50] <jam> fwereade: because --force requires a new API? I guess it does add a parameter, but won't that just be ignored otherwise ?
[09:50] <rogpeppe1> fwereade: it's hard to write the code in this particular style
[09:50] <fwereade> rogpeppe1, it's an InjectMachine not an AddMachine
[09:51] <rogpeppe1> fwereade: well, InjectMachine would need the same restrictions as AddMachine, no
[09:51] <rogpeppe1> ?
[09:53] <fwereade> jam, maybe it's not such a big deal -- it won't work if the agent-version is old, but it'll be silent
[09:53] <jam> fwereade: for good or bad that has been our answer for API compatibility
[09:53] <rogpeppe1> fwereade: another possibility that means that we wouldn't have to transactionalise all this fairly arbitrary logic is to just put the ensure-ha bool value in the state
[09:53] <fwereade> rogpeppe1, how many other InjectMachine cases are there?
[09:53] <jam> fwereade: and that isn't much different than a 1.18 client trying to do it against a 1.16 system.
[09:53] <fwereade> jam, I think there's a distinction between stuff not working across minor versions vs patch versions
[09:55] <jam> fwereade: maybe, though I think from a *client* perspective patch versions shouldn't really break things either.
[09:55] <fwereade> jam, s/patch/minor/?
[09:55] <jam> *I* have been hoping to push that more once we actually got everything into the api
[09:55] <jam> fwereade: right
[09:55] <rogpeppe1> fwereade: only one - in the manual provisioner
[09:55] <jam> fwereade: I *really* want the client that is on 14.04 initially to still work 2 years later
[09:55] <fwereade> jam, yes indeed
[09:55] <fwereade> jam, at that point I think it's a matter of freezing Client and writing new methods that are a little bit consistent with each other, and with the style of the internal API
[09:56] <jam> fwereade: yeah, I was thinking about the Batch stuff we did. And realizing that the thing we *really* want to be Batch is Client, which was written before we were focusing on it. :(
[09:58] <fwereade> rogpeppe1, remind me, does manual bootstrap use jujud bootstrap-state? if so the other case is more like RegisterMachine -- which is really an AddMachine with instance id/hardware
[09:58] <rogpeppe1> fwereade: in fact, the more i think about it, the more i think it would be better if we just signalled the HA intention in the state, and let an agent sort it out, the same way the rest of our model works.
[09:58] <fwereade> rogpeppe1, if it's a matter of rearranging the state methods that's just the usual process of development, I think
[09:59] <rogpeppe1> fwereade: a significant part of it is that we may very well want more on-going logic around ensure-ha in the future
[09:59] <fwereade> rogpeppe1, the concern there is about automatic failover -- I don't want to be rearranging mongo all the time on the basis of presence alone, without user input
[09:59] <fwereade> rogpeppe1, expand on that bit please?
[10:00] <fwereade> jam, it has been a constant source of low-level irritation to me as well ;)
[10:00] <rogpeppe1> fwereade: so, for example: at some point we will probably want to automatically start a new state server machine when one falls over
[10:00] <fwereade> rogpeppe1, we might
[10:00] <fwereade> rogpeppe1, in which case it's a trivial agent that keeps an eye on presence and calls the ensure-ha logic that we currently require user intervention for
[10:00] <rogpeppe1> fwereade, jam: i'm unconvinced that the one-size-fits-all batch approach we use in our internal API is appropriate for the client API.
[10:01] <fwereade> rogpeppe1, it is wholly appropriate for the client API, the internal API is the bit that's arguable
[10:03] <rogpeppe1> fwereade: expand, please
[10:03] <fwereade> rogpeppe1, the argument that there are times when you really only want one entity to be messed with is reasonable in the case of, say, internal Machine.EnsureDead -- because it's governed by an agent that (currently) only has responsibility for one machine
[10:03] <jam> rogpeppe1: a Client is much more likely to care about more than one unit/machine/thingy at a time. most of the agents have a 1-1 correspondence
[10:04] <fwereade> rogpeppe1, I don't see any justification for requiring that any client-led change must be broken into N calls
[10:05] <rogpeppe1> fwereade, jam: it's easy for a client to make many calls concurrently
[10:05] <fwereade> rogpeppe1, take DestroyMachines/DestroyUnits for example -- that goes halfway, and then does that horrible glue-errors-together business
[10:05] <fwereade> rogpeppe1, you have a different idea of "easy" than many other people
[10:05] <fwereade> rogpeppe1, and it also prevents us from ever batching things up usefully internally
[10:05] <rogpeppe1> fwereade: i have no particular objection to making some calls batch-like, on a case-by-case basis
[10:05] <fwereade> rogpeppe1, I do
[10:06] <rogpeppe1> fwereade: so you do
[10:06] <fwereade> rogpeppe1, because we're bad at predicting, and the cost of a 1-elem array vs a single object is negligible, and allows for compatibility when we get it wrong
[10:07] <rogpeppe1> fwereade: it's *not fucking negligible*
[10:07] <fwereade> rogpeppe1, I'm not saying the HA stuff is easy, if that's what you're referring to
[10:08] <fwereade> rogpeppe1, are you making an ease-of-development argument?
[10:08] <rogpeppe1> fwereade: i'm making a keep-it-fucking-simple-please and this-is-unneccessary-and-insufficient argument
[10:10] <fwereade> rogpeppe1, unnecessary I think we'll have to differ on, insufficient is more interesting
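The 1-element-array convention fwereade is arguing for can be sketched like this; the type names loosely follow juju-core's `params` style but are illustrative, not the real API:

```go
package main

import "fmt"

// entity names one thing to operate on, e.g. a machine tag.
type entity struct {
	Tag string
}

// entities is the batch argument: a single-entity call is just a
// 1-element array, so the wire format never changes if we later batch.
type entities struct {
	Entities []entity
}

// errorResult reports a per-entity error instead of glueing errors
// together into one string, the DestroyMachines problem above.
type errorResult struct {
	Error error
}

type errorResults struct {
	Results []errorResult
}

// destroyMachines handles one or many machines with the same signature.
func destroyMachines(args entities) errorResults {
	res := errorResults{Results: make([]errorResult, len(args.Entities))}
	for i := range args.Entities {
		// ... destroy args.Entities[i] here, recording any error
		// in the matching slot ...
		res.Results[i] = errorResult{Error: nil}
	}
	return res
}

func main() {
	one := destroyMachines(entities{Entities: []entity{{Tag: "machine-3"}}})
	many := destroyMachines(entities{Entities: []entity{{Tag: "machine-1"}, {Tag: "machine-2"}}})
	fmt.Println(len(one.Results), len(many.Results))
}
```

The cost of wrapping a single object in a slice is small, and the per-entity results slice is what lets a client batch later without a second API version, which is the compatibility point being made.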
[10:13]  * rogpeppe1 goes for a walk
[10:21] <jam> fwereade: man, you drove everyone away. :) (axw, mramm, etc.)
[10:22] <fwereade> jam, I'm pretty unpleasant really when it comes down to it
[10:22] <jam> I know *I'm* glad I only have to put up with you for 1 week every few months. :)
[10:22] <jam> anyway
[10:24] <fwereade> ;p
[10:25] <frankban> hi coredevs: I need to implement a "bootstrap an environment only if it's not already bootstrapped" logic for the quickstart plugin. I thought about two options. 1) try: juju bootstrap; except error, if error is "already bootstrapped" then ok. Option 2) is: if JUJU_HOME/environments/<envname>.jenv exists then ok, already bootstrapped. 1) seems weak (I'd have to parse the command error string) and 2) seems to rely
[10:25] <frankban> on an internal detail (those jenv files). Suggestions?
[10:27] <jam> frankban: it is also *possible* for foo.jenv to exist but not be bootstrapped, though that is subject to bugs in bootstrap (which are hopefully rare enough to not worry about)
[10:27] <jam> frankban: an alternative is to try to connect to the env rather than try to bootstrap first
[10:28] <fwereade> frankban, I would prefer (2) because I see .jenv files as pretty fundamental, and waxing in importance -- the wrinkle is that a .jenv might be created without the environment being bootstrapped (by sync-tools)
[10:28] <fwereade> jam, frankban: however if no jenv exists the env is certainly not bootstrapped
[10:29] <fwereade> jam, frankban: and I would imagine that quickstart is *always* going to want to create a new environment
[10:29] <fwereade> jam, frankban: so just picking a name not used by a jenv, or environments.yaml, might end-run around the problem?
[10:30] <jam> fwereade: quickstart is meant to "help you along your way"
[10:30] <jam> so if you haven't bootstrapped yet it starts there
[10:30] <jam> if you've bootstrapped but not yet installed juju-gui
[10:30] <jam> then it starts there
[10:30] <jam> etc
[10:30] <jam> so running it multiple times should be convergent
[10:30] <jam> even if it gets ^c in the middle.
[10:31] <fwereade> jam, bah, ok
[10:31] <frankban> fwereade: one of the goals is to make quickstart idempotent, so, if the environment is already bootstrapped it must skip that step, if the GUI is in the env then skip that step too, and so on... jam: to connect to the API i need to know the endpoint, and I'd like to avoid asking permission and calling juju api-endpoints.
[10:31] <jam> frankban: I would probably go with "if .jenv exists, try to connect"
[10:31] <jam> frankban: "asking permission" ?
[10:31] <fwereade> jam, frankban: +1
[10:32] <jam> don't you have to call "juju api-endpoints" at some point regardless
[10:32] <fwereade> jam, frankban: and if that fails, fall back to bootstrapping?
[10:32] <jam> we *do* plan to cache the api addresses in the .jenv file
[10:32] <jam> but I don't know when that code will actually be written.
[10:32] <frankban> jam: ok so, if jenv exists, try to call api-endpoints, if the latter returns an error, bootstrap, otherwise consider the env bootstrapped?
[10:32] <jam> frankban: that sounds like what I would do
[10:33] <jam> frankban: well "if jenv exists, try api-endpoints if it succeeds the env is bootstrapped, else bootstrap'
[10:33] <jam> frankban: there is a small potential for "it is bootstrapped but not done starting yet"
[10:33] <jam> though I think "api-endpoints" should notice and hang for a while waiting for it to start
[10:33] <frankban> fwereade, jam: when exactly I can expect the jenv file to exist without a bootstrapped env?
[10:34] <fwereade> frankban, if someone ran sync-tools first
[10:34] <fwereade> frankban, (or if there's a bug)
[10:34] <jam> frankban: *today* it only happens if (a) someone runs sync-tools, (b) there is a bug during bootstrap and the machine fails to start, (c) someone *else* bootstrapped copied the file to you, and then did destroy-environment
[10:36] <frankban> fwereade, jam, ok: so maybe the logic is: jenv does not exist -> bootstrap.
[10:37] <frankban> jenv exists: run "juju status" until 1) it returns an error, in which case consider the environment not bootstrapped -> bootstrap
[10:37] <frankban> or 2) it returns normally -> the env is bootstrapped
[10:37] <fwereade> frankban, sgtm
[10:37] <frankban> (and ready to be connected to)
[10:37] <frankban> cool
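The decision logic frankban settled on ("no .jenv -> bootstrap; .jenv present -> trust a status check") can be sketched as a small pure function; quickstart itself is a plugin, so the names here (`needsBootstrap`, the `statusOK` callback standing in for running `juju status`) are hypothetical:

```go
package main

import "fmt"

// needsBootstrap decides whether quickstart should run bootstrap.
// jenvExists reports whether JUJU_HOME/environments/<envname>.jenv is
// present; statusOK stands in for "juju status returned normally with
// a started agent".
func needsBootstrap(jenvExists bool, statusOK func() bool) bool {
	if !jenvExists {
		// No .jenv: the env is certainly not bootstrapped.
		return true
	}
	// A .jenv may exist without a working env (sync-tools, a failed
	// bootstrap, instances deleted out of band), so confirm with a
	// status check before skipping the bootstrap step.
	return !statusOK()
}

func main() {
	fmt.Println(needsBootstrap(false, nil))                         // no .jenv: bootstrap
	fmt.Println(needsBootstrap(true, func() bool { return false })) // stale .jenv: bootstrap
	fmt.Println(needsBootstrap(true, func() bool { return true }))  // healthy: skip
}
```

Keeping the check as a pure function of the two observations makes the convergent, ^c-safe behaviour jam asked for easy to test without a live environment.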
[10:39] <frankban> fwereade, jam: thanks
[10:39] <jam> frankban: "juju status" or "juju api-endpoints" ?
[10:39] <jam> if you need the contents of status, go for it
[10:40] <frankban> jam: I guess juju status until the agent is started, and then api-endpoints
[10:40] <jam> frankban: shouldn't api-endpoints wait for the agent as well?
[10:40] <jam> Doesn't it today?
[10:41] <frankban> jam: I don't know, and maybe you are right, and I am being too paranoid
[10:43] <jam> frankban: well, it *should* because it should have to connect to the environment to answer your question, but I would certainly consider testing it first
[10:43] <frankban> jam: ack, thanks
[10:43] <fwereade> jam, frankban: I suspect that api-endpoints uses Environ.StateInfo and will thus return the address for a machine that is not necessarily ready
[10:43] <fwereade> jam, frankban: status needs a *working* machine
[10:44] <frankban> fwereade: oh, ok. so the current quickstart behavior seems ok
[10:44] <fwereade> frankban, cool
[10:49] <jam> fwereade: so, api-endpoints *should* talk to the API to see if there are any other machines it doesn't know about yet.
[10:49] <jam> for the HA sort of stuff. But probably *today* it may not.
[10:49] <fwereade> jam, I agree, but I don't think it does today
[10:49] <jam> fwereade: and we're overdue for standup
[11:09] <jam> rogpeppe1 and dimitern: time to rumble for who gets which 1:1 time slot. :) We can have you both go at 9am local time (8UTC for Dimiter and 9UTC for Roger), or we can twist roger's arm into starting earlier
[11:09] <jam> but that means Dimiter needs to bring Roger food or something once he finally wakes up.
[11:11] <dimitern> :)
[11:11] <dimitern> so 8 UTC should be 10 local time I guess
[11:11] <dimitern> jam, I'm ok with that
[11:12] <jam> dimitern: you changed TZ? CET is +1 now that daylight saving time has ended.
[11:12] <dimitern> jam, ah, so 9 am then
[11:13] <dimitern> jam, well, I think I can live with that for now :)
[11:19] <jam> dimitern: you should have an invite, though I don't know if your calender woes have been sorted out
[11:34]  * TheMue => lunch
[11:42] <dimitern> jam, I'll take a look, 10x
[11:51]  * fwereade lunch
[11:52] <dimitern> jam, accepted and added the invite
[12:20] <dimitern> fwereade, rogpeppe1, updated https://codereview.appspot.com/25080043/ - take a look when you have some time please
[12:20]  * dimitern lunch
[13:03] <hazmat> frankban, there are various pathological conditions where the jenv exists but the environment is not functional
[13:04] <frankban> hazmat: in which cases the status command fails, right?
[13:04] <hazmat> api-endpoints definitely does not wait for anything atm re the env endpoint actually being available, it just returns the address
[13:04] <hazmat> frankban, yeah
[13:05] <hazmat> jam, its quite a bit more robust when it doesn't talk to the actual api, and it gives a user an easy way to find a failed bootstrap node without resorting to provider tools.
[13:05] <hazmat> although i guess --debug does the same wrt to discovery of state server node
[13:06] <frankban> hazmat: so we can consider an environment to be bootstrapped when the jenv exists AND status returns a started agent
[13:06] <hazmat> jam, clients can query that info if they need it
[13:06] <hazmat> frankban, hmm.. i wouldn't invoke status.. i'd just try to connect to the api with a timeout
[13:07] <hazmat> i guess status works, if you don't want to do a sane/long timeout period, its just expensive to throwaway the result
[13:11] <rogpeppe1> frankban: tbh i favour the "try to bootstrap and succeed if the error says we're already bootstrapped" approach
[13:12] <rogpeppe1> frankban: that's essentially what you'd be trying to replicate (flakily) by trying to decide if the environment is bootstrapped by trying to connect to it
[13:13] <frankban> rogpeppe1: that was my first intent, it also avoids races. the only problem is having to parse stderr, which seems weak (i.e. the error message can change)
[13:13] <rogpeppe1> frankban: let's not change it then :-)
[13:13] <frankban> :-)
[13:14] <rogpeppe1> frankban: alternatively, just assume that if the .jenv file exists, the environment is bootstrapped
[13:14] <rogpeppe1> frankban: ignoring the pathological cases that hazmat mentions, because we're hoping to eliminate those
[13:15] <frankban> rogpeppe1: what about sync-tools?
[13:15] <rogpeppe1> frankban: most users will never use sync-tools, i think
[13:15] <hazmat> rogpeppe1, there's no removing them.. i delete all the instances in the aws console for example.
[13:16] <rogpeppe1> hazmat: well, in general there is *no* way to tell if an environment is non-functional because it hasn't been bootstrapped or because it is just not working
[13:17] <rogpeppe1> hazmat: in this case, we want to bootstrap if the env hasn't been bootstrapped
[13:17] <rogpeppe1> hazmat: if you've deleted all the instances, the environment has still (logically) been bootstrapped - it's just highly non-functional...
[13:21]  * frankban lunches
[13:44] <TheMue> rogpeppe1: ping
[13:45] <rogpeppe1> TheMue: pong
[13:45] <TheMue> rogpeppe1: you wrote in your review that the connection has to be closed too
[13:46] <rogpeppe1> TheMue: that's fwereade's suggestion yes.
[13:46] <rogpeppe1> TheMue: personally i think it's a risky thing to do
[13:46] <TheMue> rogpeppe1: so is it ok to explicitly kill the srv in the root too inside of root.Kill()?
[13:46] <rogpeppe1> TheMue: not really
[13:47] <rogpeppe1> TheMue: you should probably just close the underlying connection
[13:47] <rogpeppe1> TheMue: (you'll need to actually pass it around - it's not currently available in the right place)
[13:48] <TheMue> rogpeppe1: what would then happen to the other users/holders of the connection?
[13:49] <rogpeppe1> TheMue: there's only one - the agent at the other end
[13:50] <TheMue> rogpeppe1: I meant differently, there are references to the conn inside the initial root (if I see it correctly). how will it behave if I close the connection?
[13:50] <rogpeppe1> TheMue: it should all just shut down in an orderly fashion
[13:51] <rogpeppe1> TheMue: i don't think there's any other way to drop the connection
[13:51] <TheMue> rogpeppe1: so in order to shut it down close the conn instead of shut it down to close the conn?
[13:52] <TheMue> rogpeppe1: so I have to find a nice way how to pass the conn to where I need it
[13:54] <TheMue> rogpeppe1: ah, found one
[13:55] <rogpeppe1> TheMue: i don't think you can close the rpc.Conn, BTW
[13:56] <rogpeppe1> TheMue: although... maybe it might work
[13:57] <rogpeppe1> TheMue: in fact, that's the right thing to do
[13:59] <TheMue> rogpeppe1: hmm? to close or not to close, that's the question
[13:59] <rogpeppe1> TheMue: if you do close, you'll need to do it asynchronously
[14:01] <rogpeppe1> TheMue: if you're going to close the connection, i think the right thing to do is just close it. that will take care of killing the relevant pinger
[14:01] <rogpeppe1> TheMue: hmm, except that by default we don't want to kill the pinger, we'll just stop it
[14:01] <TheMue> rogpeppe1: and what do you mean with async?
[14:02] <rogpeppe1> TheMue: rpc.Conn.Close blocks until all requests have completed
[14:02] <rogpeppe1> TheMue: if you call it within a request, you'll deadlock
[14:02] <rogpeppe1> TheMue: hmm, except in fact it'll be called in separate goroutine anyway, so it might work ok
[14:02] <TheMue> rogpeppe1: ah, ic
[14:03]  * TheMue dislikes terms like "should" or "might" ;)
[14:04]  * rogpeppe1 goes to check that Pinger.Kill followed by Pinger.Stop will work
[14:05] <rogpeppe1> TheMue: the "might" comes from the fact that you'll have to make sure that a request can't block itself on the timeout goroutine because the timeout goroutine is trying to close the connection
[14:05] <dimitern> fwereade, https://codereview.appspot.com/25080043/ updated
[14:06] <rogpeppe1> TheMue: it's probably best just to do go conn.Close() tbh
[14:06] <dimitern> fwereade, I'll do some live upgrade testing later today
[14:07] <TheMue> rogpeppe1: ok, thanks, will take that approach
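The deadlock rogpeppe1 describes, and why `go conn.Close()` avoids it, can be shown with a toy connection whose Close has rpc.Conn-like semantics (blocks until in-flight requests finish); this is a stdlib sketch, not the real `rpc` package:

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a toy connection: Close waits for every in-flight request,
// so calling Close synchronously from inside a request handler would
// wait on the handler itself and deadlock.
type conn struct {
	inflight sync.WaitGroup
	once     sync.Once
	closed   chan struct{}
}

func newConn() *conn {
	return &conn{closed: make(chan struct{})}
}

// Close blocks until every request started with serve has returned,
// then marks the connection closed exactly once.
func (c *conn) Close() {
	c.inflight.Wait()
	c.once.Do(func() { close(c.closed) })
}

// serve runs one request handler, tracking it as in-flight.
func (c *conn) serve(handler func()) {
	c.inflight.Add(1)
	defer c.inflight.Done()
	handler()
}

func main() {
	c := newConn()
	c.serve(func() {
		// Inside a request: c.Close() here would wait on ourselves.
		// Spawning it asynchronously lets this request return first.
		go c.Close()
	})
	<-c.closed // Close completes once the request has returned
	fmt.Println("connection closed cleanly")
}
```

Replacing `go c.Close()` with a plain `c.Close()` inside the handler makes the program hang forever, which is the "you'll need to do it asynchronously" point above.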
[14:10] <fwereade> jam, mgz: so, when I bzr annotate, and I see a number like "1982.5.6"... how do I turn that into an actual revision on trunk?
[14:10] <jam> fwereade: you mean when it was merged?
[14:10] <jam> you could do "bzr log -r 1982.5.6..-1"
[14:10] <jam> or use bzr qannotate
[14:10] <jam> (apt-get install qbzr)
[14:10] <jam> which shows that stuff in the log
[14:11] <fwereade> jam, thanks
[14:13] <mgz> fwereade: also, `bzr log -rmainline:1982.5.6`
[14:15] <fwereade> mgz, thanks also :)
[14:17] <jam> fwereade: the thing I really like about qbzr is it is pretty easy to jump around, so you can see what rev modified a file, and then quickly see what it looked like before that change
[14:17] <jam> but I see how for --force you really just want to see the mainline commits to merge it back to 1.16
[14:18] <rogpeppe1> mgz: interesting. what does the "mainline:" in there make a difference?
[14:18] <rogpeppe1> s/what/why/
[14:19] <fwereade> rogpeppe1, I think I'm being stupid -- can you explain how https://codereview.appspot.com/14619045/ precipitated the change in jujud/machine_test.go ? because I can fix my 1.16 problem by adding JobManageState, and I can see why it's necessary -- but I can't figure out what in the minimal set of branches necessary to get destroy-machine --force might have actually triggered it
[14:20] <jam> rogpeppe1: "bzr log -r mainline:X" logs the first mainline revision (no dots) that includes the revision in question
[14:20] <jam> if you do a range like "bzr log -r 1982.5.6..-1" you'll be able to see which one it is
[14:20] <jam> but the "mainline:" is *just that rev*
[14:21]  * fwereade waits for someone to point out something that'd be obvious to a partially-sighted and fully-intoxicated monkey
[14:21] <rogpeppe1> jam: so that signifies something to log in particular, or is that something that's useful anywhere a revno can be used?
[14:21] <jam> rogpeppe1: should be anywhere a revno can be used
[14:21] <jam> diff, etc
[14:21] <jam> it is just for "-r"
[14:22] <jam> fwereade: so the change for Uniter to use the cached addresses from state
[14:22] <jam> means that Uniter.APIAddresses
[14:22] <jam> needs to have a JobManageState somewhere
[14:22] <jam> to report its IP address
[14:22] <jam> fwereade: the general change to state/api/common/addresser.
[14:22] <jam> fwereade: does that make sense?
[14:22] <rogpeppe1> jam: so can 1982.5.6 actually specify a different revision to mainline:1982.5.6 ?
[14:23] <jam> rogpeppe1: 1982.5.6 is the revision itself, mainline:1982.5.6 is the revision which merged that revision into trunk
[14:23] <jam> rogpeppe1: "try it" ?
[14:24] <rogpeppe1> fwereade: it's necessary because one of the agents started by TestManageEnviron calls State.Addresses
[14:24] <rogpeppe1> fwereade: i can't quite remember the details of which
[14:25] <rogpeppe1> fwereade: when you say "triggered it" what are you referring to?
[14:25] <jam> rogpeppe1: he is trying to backport his destroy machine --force, and it seems it is carrying with it some unexpected baggage
[14:26] <fwereade> rogpeppe1, I cherrypicked a few unrelated branches and got those JobManageEnviron tests failing, and I can't figure out why -- apart from that, obviously, it can't work as written, and does work if it also runs JobManageState
[14:26]  * rogpeppe1 goes to pull 1.16
[14:28] <fwereade> rogpeppe1, sorry, I don't want to properly distract you -- if there's no immediate "oh yeah that was weird" that springs to mind I'll keep poking happily
[14:29] <rogpeppe1> fwereade: if none of the new addressing stuff made it into 1.16, it's weird that this is happening
[14:29] <rogpeppe1> fwereade: i.e. that adding JobManageState fixes anything
[14:30] <fwereade> rogpeppe1, yeah, indeed
[14:54] <abentley> I am trying to destroy an azure environment and failing: http://pastebin.ubuntu.com/6411031/
[14:57] <fwereade> abentley, consistent?
[14:57] <abentley> fwereade: Yes.
[14:58] <abentley> fwereade: using 1.16.3, but it may have been bootstrapped with 1.17.x
[14:59] <fwereade> abentley, I don't know azure at all really, but is it possible you've got some instances still around? or running under the same account? iirc the failing bit is one of the last steps at teardown time
[14:59] <abentley> fwereade: I don't know much about azure either.  I've just been using it through juju.
[15:00] <fwereade> jam, do you recall, did natefinch do any of the azure stuff?
[15:44] <rogpeppe1> lunch
[15:51] <jam> fwereade: natefinch has worked with azure, and has done our Windows builds
[15:52] <fwereade> jam, thought so -- he's coming back later, right?
[15:52] <jam> fwereade: I believe so.
[15:52] <jam> abentley: I know that instance teardown is particularly bad there
[15:52] <jam> gossip says it takes minutes to tear down one machine, and deleting machines is protected by a single lock
[15:53] <abentley> jam: Yes, I've witnessed the slow teardown myself.
[16:18] <jcsackett> sinzui or abentley: either of you have time to review https://code.launchpad.net/~jcsackett/charmworld/better-latency-round-2/+merge/195091 ?
[16:18] <abentley> jcsackett: sure.
[16:18] <jcsackett> abentley: thanks.
[16:55]  * fwereade needs to stop for a while, will probably be back to say hi to those in the antipodes at least
[17:40] <dimitern> fwereade, is https://codereview.appspot.com/25080043/ good to land?
[18:40] <sinzui> CI found a critical regression, bug #1250974. I'll talk to wallyworld_ when he comes online about it.
[18:40] <_mup_> Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>
[18:47] <sinzui> abentley, ^ do you want to revise the details I put in the description
[18:50]  * rogpeppe1 is done
[19:08] <abentley> sinzui: done.  (just swapped 2052 to 2053 at the end).
[19:09] <sinzui> thank you
[20:28] <natefinch> Three months on Ubuntu and I only just now realized that (0:08) next to my battery means it'll be charged in 8 minutes, not that it thinks there's only 8 minutes of charge left :/
[20:40] <hazmat> anybody know golang internals? we're debugging some reflection issues in gccgo
[20:48] <hazmat> natefinch, how's the xps15 treating you?
[20:48] <hazmat> fwereade, i know the azure stuff, what's up?
[20:49]  * hazmat reads abentley's traceback
[20:49] <hazmat> abentley, so the azure provider is basically synchronous and very careful about cleaning up after itself
[20:49] <natefinch> hazmat: so far... some problems.  The hardware is awesome, but Ubuntu is having some problems... trying to get bumblebee working to enable optimus so it'll use the NVidia GPU instead of just the built-in one on the Intel chipset.
[20:50] <hazmat> natefinch, how's the keyboard?
[20:50]  * hazmat has nightmares about old dell laptop keyboards
[20:50] <abentley> hazmat: That's good to know, but it did not succeed in this case.
[20:50] <hazmat> abentley, two things.. you can log into the console (or use the nodejs cli tools) to verify you have no machines running, and then try running destroy again with --debug
[20:51] <natefinch> hazmat: not terrible... it's standard size... takes a little getting used to. but I can still generally type without looking.  Of course, stuff like home end etc is moved around some.
[20:51] <abentley> hazmat: But this may be an Azure bogosity.  As far as sinzui and I can tell, it is impossible to delete the network.  We have tried from the Azure web console, and though nothing is using it, Azure says it can't be deleted because things are using it.
[20:51] <hazmat> abentley, normally the azure provider does some polling against the operation event stream
[20:51] <hazmat> hmm
[20:51] <hazmat> abentley, it's worked in the past but there have been changes on both ends
[20:52] <hazmat> it's been about 1.5 m since i last ran the azure provider..
[20:52] <abentley> hazmat: But even the web console doesn't work, so I'm inclined not to blame the azure provider.
[20:52] <hazmat> abentley, fair enough.. there are lots of resources not necessarily exposed in the ui.. so both you and sinzui had this issue?
[20:53] <abentley> hazmat: We are using the same subscription, so we have the same set of resources.
[20:53] <hazmat> abentley, aha
[20:53] <hazmat> abentley, but you're using different env names and controls and networks?
[20:54] <hazmat> abentley, i dunno that cross env sharing of resources is going to work so well
[20:54] <abentley> hazmat: Yes.
[20:54] <abentley> hazmat: We're both doing fairly limited testing, with different environment names, so it doesn't seem likely that we'll exhaust each others' resources.
[20:55] <abentley> Or otherwise trip on each others' feet.
[20:57] <hazmat> abentley,  well...
[20:58] <hazmat> abentley, so everything in your azure provider section is different?
[21:00] <abentley> hazmat: No, storage-account-name, management-subscription-id, management-certificate-path are the same.  Not sure about admin-secret.
[21:01] <abentley> hazmat: But I don't see how that's relevant to the unkillable network.
[21:04]  * hazmat logs into azure console 
[21:06] <hazmat> abentley, does it show networks == 0 ?
[21:06] <hazmat> in the console
[21:07] <hazmat> abentley, the azure provider does some stuff with networking (interlink between services) which isn't represented in the console
[21:07] <hazmat> or the api really, just raw xml
[21:07] <abentley> hazmat: No, it shows networks == 2.
[21:08] <hazmat> abentley, so possibly you have interlinks between the services in two different environments
[21:08] <hazmat> which would explain why neither can be deleted
[21:09] <hazmat> just guessing though.. easy to reproduce if that's the case
[21:10] <abentley> hazmat: That can't be the cause, because the second network was created when we found we couldn't destroy the first network.
[21:11] <abentley> hazmat: i.e. the problem pre-dated the second network.
[21:12] <hazmat> abentley, hmm..
[21:12] <hazmat> abentley, okay.. let me create and destroy an env.. which version are you using?
[21:12] <hazmat> of juju
[21:12] <abentley> 1.16.3
[21:14] <hazmat> k, i'm trying with trunk
[21:16] <hazmat> we should really default the region to the same one the imagestream refs
[21:16] <hazmat> ie East US
[21:19] <abentley> hazmat: sinzui reports he's had a lot of trouble with East.
[21:20] <hazmat> abentley, but simplestreams metadata refs images there.. how do you get around the affinity group otherwise?
[21:20] <hazmat> abentley, ie https://bugs.launchpad.net/juju-core/+bug/1251025
[21:20] <_mup_> Bug #1251025: azure provider sample config should default to East US <juju-core:New> <https://launchpad.net/bugs/1251025>
[21:20] <abentley> hazmat: All I know is that it works.
[21:31] <natefinch> hazmat: bwahaha, finally got it working (mostly user error I think).    24" 1920x1200, 30" 2560x1600, 15.6" 3200x1800  all running smoothly.
[21:34] <abentley> hazmat: I've run destroy-environment and we're back to just 1 network.
[21:35] <abentley> hazmat: And it still fails to delete.
[21:36] <hazmat> abentley, the azure provider doesn't provide very much log output..
[21:37] <hazmat> abentley, created and destroyed env without issue here.
[21:37] <hazmat> abentley, by chance you know what was deployed in the first env.. just the ci test of wordpress/mysql ?
[21:37] <abentley> hazmat: Yes, that's what it was.
[21:39] <hazmat> there's like this 3 minute pause during bootstrap with no info given as to why
[21:47] <jcsackett> sinzui: do you have time to look at a one line MP? https://code.launchpad.net/~jcsackett/charms/precise/charmworld/fix-lp-creds-ini-override/+merge/195146
[21:47] <sinzui> I do
[21:48] <sinzui> jcsackett, r=me
[21:49] <jcsackett> sinzui: thanks.
[21:50] <wallyworld_> sinzui: hi there. is there another upgrade bug? :-(
[21:52] <sinzui> wallyworld_, yes. take your time and read though it and maybe the log: https://bugs.launchpad.net/juju-core/+bug/1250974
[21:52] <_mup_> Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>
[21:53] <wallyworld_> ok. i used the same deprecation mechanism as for public bucket url so i'll have to see why that's not working here
[21:54] <sinzui> wallyworld_, maybe it is how we test
[21:54] <sinzui> wallyworld_, abentley hp cloud is not failing. are the configs different?
[21:55] <wallyworld_> interesting. hp cloud does have a slightly different config boilerplate written out for it via juju init
[21:55] <wallyworld_> but that's just some extra comments in the yaml afaik
[21:56] <sinzui> wallyworld_, I only speculated on the order of events that could lead to testing/ being ignored. I think the 1.16.3 tools were selected from the testing location, but by the moment of the upgrade, testing/ was no longer known
[21:56] <abentley> sinzui: I haven't tested in a way that would show hp failing.  We test canonistack before hp, so we never bother to check whether hp is failing.
[21:56] <abentley> sinzui: Because we already know it's a fail.
[21:58] <abentley> sinzui: As far as I know, everything except "provider" and "floating-ip" is configured differently between hp and canonistack.
[21:58] <wallyworld_> sinzui: it could have something to do with the jenv stuff - that is relatively new and has been evolving since the public bucket url deprecation mechanism last worked
[21:58] <sinzui> abentley, if we don't use tools-metadata-url in the config, does it work?
[21:59] <abentley> sinzui: Gotta go, but I'll check that first thing tomorrow.
[21:59] <sinzui> I suspect it does. I think it is not possible to upgrade with a config that users are likely to have if they read the release notes or just respond to what juju is saying
[21:59] <sinzui> thanks abentley
[22:01] <sinzui> wallyworld_, per what I asked of abentley ^ I think the issue with upgrades is mixed configs.
[22:02] <wallyworld_> as in they put tools-metadata-url in their new 1.17 env config and upgrade a 1.16 which didn't have it?
[22:04] <wallyworld_> actually, if the jenv files are still present, i think any changes to the env yaml are ignored
[22:04] <wallyworld_> i'm not 100% sure, but i've had to delete the jenv files previously if i wanted to introduce new config
[22:04] <wallyworld_> not very intuitive if you ask me
[22:06] <wallyworld_> so perhaps the user reads the release notes, edits their env yaml to add tools-metadata-url, but it is ignored because the jenv file is there and the yaml is ignored?
[22:06] <wallyworld_> i'll have to check with the folks who did the jenv stuff, or read the code
[22:07] <sinzui> wallyworld_, I want the old config to just work for users to upgrade. If users have a perfect new config it should work of course. In the case of a config with old and new values, we need to be certain we honour them. juju servers will be different than juju clients, so the configs need to work for all cases
[22:08] <wallyworld_> sinzui: agreed. that's what the current code in 1.7 does - it sees tools-url, logs a warning, and sets tools-metadata-url to the tools-url value. that's how the public bucket url deprecation worked also. so i'm not sure what's happening to make it fail
[22:08] <sinzui> wallyworld_, if we decided to fix bug 1247232, we might be able to be strict about what is in the config
[22:08] <_mup_> Bug #1247232: Juju client deploys agent newer than itself <ci> <deploy> <juju-core:Triaged> <https://launchpad.net/bugs/1247232>
[22:09] <sinzui> wallyworld_, also, how does the old bootstrap node get the new tools? It looks like it is searching for them. since it doesn't know about tools-metadata-url, it cannot find them?
[22:10] <sinzui> Shouldn't the client tell the server the exact version and location to use since it had to do the lookups?
[22:11] <wallyworld_> sinzui: i'm not entirely familiar with the upgrade workflow - what data is passed to where etc. but that would seem sensible
[22:12] <wallyworld_> sinzui: one thing i can think of - i remove the old tools-url from the config struct that is parsed from the yaml once the new tools-metadata-url is set. perhaps that data is being sent to the old node which then doesn't see a tools-url it recognises?
[22:13] <sinzui> wallyworld_, The log implies the server has a new config with 1.16.3. it doesn't have a tools-url set
[22:13]  * sinzui wishes zless showed line numbers
[22:16] <wallyworld_> sinzui: so it seems perhaps that my assumption that the tools-url could be deleted from the in memory config struct once tools-metadata-url is set is wrong, since that data ends up being passed to the older 1.16 nodes. that surprises me because i wasn't aware that would happen
[22:16] <sinzui> wallyworld_, about line 2674 of the log, I see the last known occurrence of /testing. After we see that config we see juju search for the new tool, in the wrong location
[22:18] <sinzui> wallyworld_, I think you mean the value is cleared. I believe axw reported a bug that it is not possible to delete a config key.
[22:19] <sinzui> And maybe that has helped insulate us from upgrade issues
[22:19] <wallyworld_> sinzui: right, yes. i clear the tools-url from the map of config values held by the config struct when the yaml is parsed
[22:21] <wallyworld_> so, thinking out loud, if the wrong tools location is being used, perhaps it's the new 1.17 nodes not having a config with tool-metadata-url set
[22:23] <wallyworld_> sinzui: so that log file just has warnings logged - it would be nice to see debug so we can see the simplestreams search path used
[22:24] <wallyworld_> i'll see if i can do a test to get that happening
[22:25] <sinzui> wallyworld_, we see all the paths being checked at 2735 and the entries say DEBUG
[22:26] <wallyworld_> sinzui: i am stupid - i was looking at the console log from the link in the description. i didn't see the attachment
[22:26] <sinzui> wallyworld_, no need to think ill of yourself. Aaron and I also made the same mistake
[22:27] <wallyworld_> maybe i need more coffee
[22:28] <sinzui> wallyworld_, we are a day or two away from being able to run an arbitrary branch + rev + cloud to do a single test. Aaron hacked two runs to replay the last success and first fail today
[22:28] <wallyworld_> sinzui: i'll go through the log and figure out wtf is happening and hopefully have a fix for you. sorry about the bug. i suspect it is due to slightly new config workflow interfering with it but am not sure
[22:30] <sinzui> wallyworld_, thank you. Take your time to do a proper fix
[22:31] <wallyworld_> will do. there's been a bit of churn under the covers that i'm not involved with so old assumptions maybe don't apply anymore. i think i'll have to do a full upgrade test by hand to validate
[23:24] <sinzui> davecheney, I don't have a short list of bugs to address. I see http://tinyurl.com/juju-stakeholders and https://launchpad.net/juju-core/+milestone/1.17.0 as bugs we want to help fix. There are so many that I think we should work on those we can confidently fix quickly
[23:27] <davecheney> sinzui: ack
[23:27] <davecheney> sinzui: work is ongoing to get juju building under gccgo
[23:28] <davecheney> i should find some time convenient to your team to break the bad news about this extra dimension of the testing matrix
[23:30] <sinzui> davecheney, I saw that. I had an unexpected event last evening. I had thought I would have time to ask you about that. Did I release still-born juju-core arm packages for 1.16.3?
[23:31] <davecheney> sinzui: not really sure
[23:31] <davecheney> does anyone use those packages ?
[23:33] <sinzui> I suspect not since the test suite doesn't work.
[23:33] <sinzui> We only CI on amd64. I sign tarballs that pass on amd64
[23:36] <davecheney> sinzui: do you have an example failure ?
[23:36] <davecheney> that sounds like the sort of thing that is in my bailiwick to fix
[23:37] <sinzui> I don't. I saw a bug report from michael hudson and saw you comment on it
[23:54] <davecheney> sinzui: yeah, we have a fix
[23:54] <davecheney> need it to be merged on the mgo repo
[23:55] <sinzui> davecheney, speaking of mgo, does the test suite always pass for you? I often see mgo failures. they are common for me now, but were rare back in september
[23:57] <davecheney> sinzui: i haven't run that test suite even once in the last year
[23:57] <davecheney> is there a jenkins job for it ?
[23:57] <davecheney> please assign any failure reports to me