/srv/irclogs.ubuntu.com/2013/11/13/#juju-dev.txt

davecheneysinzui: i'm open for assignment of bug fixes00:14
bigjoolsthumper: I need to have a call with you today about VLANs, let me know when is convenient please01:35
thumperbigjools: now is as good as ever01:36
thumperbigjools: also, not sure how useful I'm going to be :)01:36
bigjoolsthumper: ok let me grab a drink and I'll call in 5 mins01:37
bigjoolsit's a start if nothing else :)01:38
bigjoolsthumper: calling01:41
thumperwallyworld_: here are the test changes I was telling you about https://codereview.appspot.com/25460045/03:36
wallyworld_ok, i've already changed the method names locally. i'll pick up the changes once you land03:39
thumperwallyworld_: ack03:45
* thumper runs on yet another small errand...03:48
jamaxw: we might do a config.New, but the warning is inside Validate07:21
jam(it is at the end of environs/config.Validate)07:22
axwjam: eh, sorry, not sure how I confused that07:22
jamI guess that is somehow different from Config.Validate() ?07:22
axwjam: ah, config.New calls that07:22
jamaxw: at *one* point we had explicitly discussed not even parsing sections we don't know about so that you could have pyjuju and juju-core environments in the same file07:23
axwit's just for validating common configuration07:23
jambut it also applies for multi-version stuff.07:23
jamaxw: I don't see why we need to config.New for anything we won't use07:23
axwyeah, I don't see any value in parsing if we're not using it07:23
axwwe should just defer to first use07:23
jamReadEnvirons just parses everything into Environs objects07:24
axwjam: yep, so we could just modify Environs.Config to do the parse on first reference07:25
jamaxw: what is silly is in environs/open.go we ReadEnvirons("") , just to get name from envs.Default (that we may not use), and then we actually read the info from store.ReadInfo(name), and only if that fails do we actually use the envs we just read07:28
axwheh07:28
jamaxw: so we *do* assert that environs.ReadEnvironsBytes doesn't generate an error, and you only get the error when you use environs.Config()07:30
jamwell, the environs.ReadEnvironsBytes().Config(name)07:30
jambut that is actually because we've subtly put the err on part of the struct07:31
jamwaiting to report it until later.07:31
axwshould be nice and easy then :)07:32
jamfwereade: thanks for doing the backport08:57
fwereadejam,np08:58
jamaxw: so there is a small point about creating a new config each time. If we are creating a warning, we'll do it twice.08:58
fwereadejam, the --force one might be a little trickier08:58
jamI wonder if that is a problem08:58
jamfwereade: because the code is different, or because it is more invasive?08:58
fwereadejam, just because it's a few branches and I'm paranoid08:59
jamfwereade: just because you're paranoid doesn't mean they *aren't* out to get you. :)08:59
fwereadejam, words to live by08:59
=== axw_ is now known as axw
jamaxw: for https://code.launchpad.net/~axwalk/juju-core/jujud-uninstallscript/+merge/19499409:08
jamhow do we handle upgrades ?09:08
jamas in, there won't be anything in agent.conf on a system that we upgraded09:08
jamfwereade: in  https://code.launchpad.net/~wallyworld/juju-core/provisioner-api-supported-containers/+merge/194982 he mentions "A change was also made to the server side implementation so that the machine doc txn-revno is no longer checked."09:09
jamthat sounds risky to me, but I'd like to get your feedback on it.09:09
jamaxw: I didn't mean to scare you away09:12
jamfwereade: one thing about our CLI API work. The new CLI is likely to be incompatible with old server versions when we have to create an API for the command. (the "easy" case vs the "trivial" case). Do we care?09:14
jamwe definitely haven't been implementing backwards compatibility fallback code.09:14
fwereadejam, looking at wallyworld_'s09:17
wallyworld_jam: we rarely check txn-revno. mainly for env settings.  never previously for machines. i was trying to be more stringent by introducing it09:17
fwereadejam, pondering the latter09:17
fwereadejam, wallyworld_: a txn-revno check is a big hammer and should not generally be used until we've exhausted all other possibilities09:17
fwereadejam, wallyworld_: far better to check only the fields we actively care about09:17
wallyworld_fwereade: yes, i came to that conclusion09:18
fwereadejam, wallyworld_: but sometimes that's not practical09:18
davecheneyhttp://paste.ubuntu.com/6409771/09:18
wallyworld_i like optimistic locking in general as a pattern09:18
davecheneyjuju compiled with gccgo09:18
jamfwereade: so that sounds like what he's done09:18
jamas in, we used to assert the whole thing, but now we just assert the one field09:18
fwereadejam, wallyworld_: last I was aware, we'd managed to eliminate them all, I guess another crept in09:18
wallyworld_i introduced it09:19
wallyworld_in a recent branch09:19
fwereadedavecheney, cool,does it work? ;)09:19
wallyworld_it worked in practice but tests failed with some new work09:19
rogpeppe1davecheney: cool09:19
rogpeppe1davecheney: does it work?09:19
jamdavecheney: nice. Almost 3MB smaller than the static one. :) Wish it was more like 15MB smaller.09:20
davecheneyrogpeppe1: i didn't try to bootstrap it09:22
davecheneyoh speaking of that09:22
fwereadejam, wallyworld_: it looks sane to me -- the only way we could be screwing that up is with multiple agents for the same machine on separate instances, and the nonce stuff should guard against that effectively09:22
davecheneyi need someone to commit a mgo change09:22
wallyworld_fwereade: right. that attr is only set once as the machine agent spins up09:22
rogpeppe1davecheney: you can lbox propose the change, i think09:23
davecheneyit's not mine09:23
wallyworld_fwereade: btw, you forgot to look at my wip branch :-(09:23
fwereadewallyworld_, hell, sorry09:23
fwereadethis is what happens when I write code :(09:24
wallyworld_np. it's at the point now where i just need to add one more tests09:24
wallyworld_and i can propose formally09:24
davecheneyrogpeppe1: https://code.launchpad.net/~mwhudson/mgo/evaluation-order/+merge/19496809:24
davecheneymike doesn't know how to use lbox09:24
wallyworld_i've done live testing abd it all seems fine09:24
fwereadewallyworld_, link please?09:24
wallyworld_https://codereview.appspot.com/25040043/09:24
wallyworld_just some tests to add09:24
wallyworld_i've proposed the addsuportedcontainers stuff separately09:25
wallyworld_hence the discussion earlier about txn-revno09:25
fwereadewallyworld_, ta09:25
wallyworld_davecheney: the fact that lbox is not used is not a bad thing :-)09:26
davecheneyfwiw: lucky(/tmp) % strip juju09:26
davecheneylucky(/tmp) % ls -al juju09:26
davecheney-rwxrwxr-x 1 dfc dfc 11438248 Nov 13 20:26 juju09:26
davecheneyrogpeppe1: anyway, that needs to land before juju will work properly09:27
jamwallyworld_: I have a review in.09:27
wallyworld_thanks :-)09:28
rogpeppe1davecheney: yeah09:28
rogpeppe1davecheney: you could propose it yourself, i guess09:28
jamwallyworld_: I did have some comments where I think we are missing test coverage, and possibly having a client-side API that matches the other functions09:28
jambut hopefully minor stuff09:29
wallyworld_ok09:29
rogpeppe1davecheney: it would be nice if go vet (or some other tool) could give warnings about undefined behaviour like that. i guess it's not possible in general though.09:29
wallyworld_jam: " after api.Set* from the API point of view" - there is no api call to get supported containers09:30
wallyworld_the api is currently write only09:30
davecheneyrogpeppe1: /me considers what it would take to detect this behavior09:30
jamwallyworld_: so... do we just do it on every startup ?09:30
jamit would be nice if we would check if we've already done the lookupb09:30
jamunless the lookup is exceptionally cheap09:30
jamI guess09:30
wallyworld_every machine agent start up09:31
jamwallyworld_: anyway, if you *can't* test it, just say so. :)09:31
wallyworld_ok :-)09:31
rogpeppe1davecheney: the oracle might have enough information to find some simple cases09:31
wallyworld_jam: i've not done anything with permissions yet so i'll nned to see how to manipulate them to add a test09:31
jamwallyworld_: there should be other tests you can crib from. Usually it is "set up 3 machines, use the agent for machine-1, try to change something on all 3 machines, and assert that you get EPERM on the ones you're not allowed"09:33
jamI was actually suprrised that in one call you could change m09:33
jamboth machine-0 and machine-109:33
wallyworld_jam: i simply copied another test and used a different api call09:33
jamwallyworld_: so I guess, "lets think what the perms on this should be, and assert them in a test"09:34
davecheneyrogpeppe1: I also have a fix in for gccgo to fix the go/build breakage09:34
jam*I* would say that the only thing that is allowed to change the value of a machine's supported containers is that machine's assigned agent09:34
jamwhich is an AuthOwner sort of test.09:34
rogpeppe1davecheney: cool09:34
jamwallyworld_: and if you remember the thing you copied, we might have a security hole there, so at least file a tech-debt bug to track it.09:38
wallyworld_"if" is the relevant word :-)09:39
wallyworld_i'll take a look09:39
rogpeppe1fwereade: i've just realised that for ensure-ha to be transactional, State.AddMachine needs to take a count argument and for that to create all its machines in the same transaction. Looking at the transactions around AddMachine, this is a bit OMG.09:40
fwereaderogpeppe1, how much of it is applicable in that case? IIRC most of the complexity is around containers rather than machines themselves09:41
rogpeppe1fwereade: i'm not sure - i haven't grokked the code yet09:41
fwereadejam, I'm still thinking about api backward compatibility and thinking that it kinda sinks the --force fix for 1.1609:41
rogpeppe1fwereade: just the idea of making transactions that can be hundreds of operations long fills me with doubt09:42
fwereadejam, 1.16s should really work with other 1.16s09:42
fwereaderogpeppe1, how would they be that long?09:42
rogpeppe1fwereade: juju add-machine -n 100 ?09:42
fwereaderogpeppe1, ah wait, unexplored assumption09:43
rogpeppe1fwereade: i guess we wouldn't need to use the count for add-machine09:43
fwereaderogpeppe1, why does AddMachine need a count argument?09:43
fwereaderogpeppe1, jinx :)09:43
rogpeppe1fwereade: not quite: State.AddMachine needs a count argument. juju add-machine doesn't need to use it.09:44
fwereaderogpeppe1, in general if things like -n need to be transactional (which I agree they should) I think the sane answer is to stick it in a queue of somesort09:44
fwereaderogpeppe1, not quite so sure about that09:44
rogpeppe1fwereade: AddMachine needs a count argument because otherwise we'd be able to have an even number of state servers09:44
fwereaderogpeppe1, wouldn't HA methods on state be saner?09:44
rogpeppe1fwereade: i'd intended to do so. but those methods need to add machines09:45
fwereaderogpeppe1, right, so they should use addMachineOps09:45
fwereaderogpeppe1, the various unexported *Ops methods in state are the building blocks of transactions09:47
fwereaderogpeppe1, they are I admit kinda gunky in cases, like lego buried in leafmould for years09:47
fwereaderogpeppe1, but they're internal and therefore subject to safe improvement as required09:48
rogpeppe1fwereade: and then we make State.AddMachine barf if its jobs contain state server jobs?09:48
fwereaderogpeppe1, probably09:48
rogpeppe1fwereade: this all makes me feel highly uncomfortable09:48
fwereaderogpeppe1, which is I admit a hassle from a test-fixing perspective09:49
fwereaderogpeppe1, if it's hard for you to write the code this is all the more reason we should not just hand the user the same toolkit you're reacting against and tell them to figure it out09:49
rogpeppe1fwereade: we'd also need to have another special case for adding machine 009:49
fwereaderogpeppe1, machine 0 is already a special case09:50
rogpeppe1fwereade: not in state, currently09:50
rogpeppe1fwereade: AFAIK09:50
jamfwereade: because --force requires a new API? I guess it does add a parameter, but won't that just be ignored otherwise ?09:50
rogpeppe1fwereade: it's hard to write the code in this particular style09:50
fwereaderogpeppe1, it's an InjectMachine not an AddMachine09:50
rogpeppe1fwereade: well, InjectMachine would need the same restrictions as AddMachine, no09:51
rogpeppe1?09:51
fwereadejam, maybe it's not such a big deal -- it won't work if the agent-version is old, but it'll be silent09:53
jamfwereade: for good or bad that has been our answer for API compatibility09:53
rogpeppe1fwereade: another possibility that means that we wouldn't have to transactionalise all this fairly arbitrary logic is to just put the ensure-ha bool value in the state09:53
fwereaderogpeppe1, how many other InjectMachine cases are there?09:53
jamfwereade: and that isn't much different than a 1.18 client trying to do it against a 1.16 system.09:53
fwereadejam, I think there's a distinction between stuff not working across minor versions vs patch versions09:53
jamfwereade: maybe, though I think from a *client* perspective patch versions shouldn't really break things either.09:55
fwereadejam, s/patch/minor/?09:55
jam*I* have been hoping to push that more once we actually got everything into the api09:55
jamfwereade: right09:55
rogpeppe1fwereade: only one - in the manual provisioner09:55
jamfwereade: I *really* want the client that is on 14.04 initially to still work 2 years later09:55
fwereadejam, yes indeed09:55
fwereadejam, at that point I think it's a matter of freezing Client and writing new methods that are a little bit consistent with each other, and with the style of the internal API09:55
jamfwereade: yeah, I was thinking about the Batch stuff we did. And realizing that the thing we *really* want to be Batch is Client, which was written before we were focusing on it. :(09:56
fwereaderogpeppe1, remind me, does manual bootstrap use jujud bootstrap-state? if so the other case is more like RegisterMachine -- which is really an AddMachine with instance id/hardware09:58
rogpeppe1fwereade: in fact, the more i think about it, the more i think it would be better if we just signalled the HA intention in the state, and let an agent sort it out, the same way the rest of our model works.09:58
fwereaderogpeppe1, if it's a matter of rearranging the state methods that's just the usual process of development, I think09:58
rogpeppe1fwereade: a significant part of it is that we may very well want more on-going logic around ensure-ha in the future09:59
fwereaderogpeppe1, the concern there is about automatic failover -- I don't want to be rearranging mongo all the time on the basis of presence alone, without user input09:59
fwereaderogpeppe1, expand on that bit please?09:59
fwereadejam, it has been a constant source of low-level irritation to me as well ;)10:00
rogpeppe1fwereade: so, for example: at some point we will probably want to automatically start a new state server machine when one falls over10:00
fwereaderogpeppe1, we might10:00
fwereaderogpeppe1, in which case it's a trivial agent that keeps an eye on presence and calls the ensure-ha logic that we currently require user intervention for10:00
rogpeppe1fwereade, jam: i'm unconvinced that the one-size-fits-all batch approach we use in our internal API is appropriate for the client API.10:00
fwereaderogpeppe1, it is wholly apropriate for the client API, the internal API is the bit that's arguable10:01
rogpeppe1fwereade: expand, please10:03
fwereaderogpeppe1, the argument that there are times when you really only want one entity to be messed with is reasonable in the case of, say, internal Machine.EnsureDead -- because it's governed by an agent that (currently) only has responsibility for one machine10:03
jamrogpeppe1: a Client is much more likely to care about more than one unit/machine/thingy at a time. most of the agents have a 1-1 correspondence10:03
fwereaderogpeppe1, I don't see any justification for requiring that any client-led change must be broken into N calls10:04
rogpeppe1fwereade, jam: it's easy for a client to make many calls concurrently10:05
fwereaderogpeppe1, take DestroyMachines/DestroyUnits for example -- that goes halfway, and then does that horrible glue-errors-together business10:05
fwereaderogpeppe1, you have a different idea of "easy" than many other people10:05
fwereaderogpeppe1, and it also prevents us from ever batching things up usefully internally10:05
rogpeppe1fwereade: i have no particular objection to making some calls batch-like, on a case-by-case basis10:05
fwereaderogpeppe1, I do10:05
rogpeppe1fwereade: so you do10:06
fwereaderogpeppe1, because we're bad at predicting, and the cost of a 1-elem array vs a single object is negligible, and allows for compatibility when we get it wrong10:06
rogpeppe1fwereade: it's *not fucking negligible*10:07
fwereaderogpeppe1, I'mnot saying the HA stuff is easy, if that's what you're referring to10:07
fwereaderogpeppe1, are you making an ease-of-development argument?10:08
rogpeppe1fwereade: i'm making a keep-it-fucking-simple-please and this-is-unneccessary-and-insufficient argument10:08
fwereaderogpeppe1, unnecessary I think we'll have to differ on, insufficient is more interesting10:10
* rogpeppe1 goes for a walk10:13
jamfwereade: man, you drove everyone away. :) (axw, mramm, etc.)10:21
fwereadejam, I'm pretty unpleasant really when it comes down to it10:22
jamI know *I'm* glad I only have to put up with you for 1 week every few months. :)10:22
jamanyway10:22
fwereade;p10:24
frankbanhi coredevs: I need to implement a "bootstrap an environment only if it's not already bootstrapped" logic for the quickstart plugin. I thought about two option 1) try: juju bootstrap; except error, if error is "already bootstrapped" then ok. Options 2) is: if JUJU_HOME/environments/<envname>.jenv exists then ok, already bootstrapped. 1) seems weak (I'd have to parse the command error string) and 2) seems to rely10:25
frankbanon an internal detail (those jenv files). Suggestions?10:25
jamfrankban: it is also *possible* for foo.jenv to exist but not be bootstrapped, though that is subject to bugs in bootstrap (which are hopefully rare enough to not worry about)10:27
jamfrankban: an alternative is to try to connect to the env rather than try to bootstrap first10:27
fwereadefrankban, I would prefer (2) because Isee .jenv files as pretty fundamental, and waxing in importance -- the wrinkle is that a .jenv might be created without the environment being bootstrapped (by sync-tools)10:28
fwereadejam, frankban: however if no jenv exists the env is certainly not bootstrapped10:28
fwereadejam, frankban: and I would imagine that quickstart is *always* going to want to create a new environment10:29
fwereadejam, frankban: so just picking a name not used by a jenv, or environments.yaml, might end-run around the problem?10:29
jamfwereade: quickstart is meant to "help you along your way"10:30
jamso if you haven't bootstrapped yet it starts there10:30
jamif you've bootstrapped but not yet installed juju-gui10:30
jamthen it starts there10:30
jametc10:30
jamso running it multiple times should be convergent10:30
jameven if it gets ^c in the middle.10:30
fwereadejam, bah, ok10:31
frankbanfwereade: one of the goal is make quickstart idempotent, so, if the environment is already bootstrapped it must skip that step, if the GUI is in the env then skip that step too, and so on... jam: to connect to the API i need to know the endpoint, and I'd like to avoid asking permission and calling juju api-endpoints.10:31
jamfrankban: I would probably go with "if .jenv exists, try to connect"10:31
jamfrankban: "asking permission" ?10:31
fwereadejam, frankban: +110:31
jamdon't you have to call "juju api-endpoints" at some point regardless10:32
fwereadejam, frankban: and if that fails, fall back to bootstrapping?10:32
jamwe *do* plan to cache the api addresses in the .jenv file10:32
jambut I don't know when that code will actually be written.10:32
frankbanjam: ok so, if jenv exists, try to call api-endpoints, if the latter returns an error, bootstrap, otherwise consider the env bootstrapped?10:32
jamfrankban: that sounds like what I would do10:32
jamfrankban: well "if jenv exists, try api-endpoints if it succeeds the env is bootstrapped, else bootstrap'10:33
jamfrankban: there is a small potential for "it is bootstrapped but not done starting yet"10:33
jamthough I think "api-endpoints" should notice and hang for a while waiting for it to start10:33
frankbanfwereade, jam: when exactly I can expect the jenv file to exist without a bootstrapped env?10:33
fwereadefrankban, if someone ran sync-tools first10:34
fwereadefrankban, (or if there's a bug)10:34
jamfrankban: *today* it only happens if (a) someone runs sync-tools, (b) there is a bug during bootstrap and the machine fails to start, (c) someone *else* bootstrapped copied the file to you, and then did destroy-environment10:34
frankbanfwereade, jam, ok: so maybe the logic is: jenv does not exist -> bootstrap.10:36
frankbanjenv exists: run "juju status" until 1) it returns an error, in which case consider the environment not bootstrapped -> bootstrap10:37
frankbanor 2) it returns normally -> the env is bootstrapped10:37
fwereadefrankban, sgtm10:37
frankban(and ready to be connected to)10:37
frankbancool10:37
frankbanfwereade, jam: thanks10:39
jamfrankban: "juju status" or "juju api-endpoints" ?10:39
jamif you need the contents of status, go for it10:39
frankbanjam: I guess juju status until the agent is started, and then api-endpoints10:40
jamfrankban: shouldn't api-endpoints wait for the agent as well?10:40
jamDoesn't it today?10:40
frankbanjam: I don't know, and maybe you are right, and I am being too paranoid10:41
jamfrankban: well, it *should* because it should have to connect to the environment to answer your question, but I would certainly consider testing it first10:43
frankbanjam: ack, thanks10:43
fwereadejam, frankban: I suspect that api-endpoints uses Environ.StateInfo and will thus return the address for a machine that is not necessarily ready10:43
fwereadejam, frankban: status needs a *working* machine10:43
frankbanfwereade: oh, ok. so the current quickstart behavior seems ok10:44
fwereadefrankban, cool10:44
jamfwereade: so, api-endpoints *should* talk to the API to see if there are any other machines it doesn't know about yet.10:49
jamfor the HA sort of stuff. But probably *today* it may not.10:49
fwereadejam, I agree, but I don't think it does today10:49
jamfwereade: and we're overdue for standup10:49
jamrogpeppe1 and dimitern: time to rumble for who gets which 1:1 time slot. :) We can have you both go at 9am local time (8UTC for Dimiter and 9UTC for Roger), or we can twist roger's arm into starting earlier11:09
jambut that means Dimiter needs to bring Roger food or something once he finally wakes up.11:09
dimitern:)11:11
dimiternso 8 UTC should be 10 local time I guess11:11
dimiternjam, I'm ok with that11:11
jamdimitern: you changed TZ? CEST is +1 after daylight savings time ended.11:12
dimiternjam, ah, so 9 am then11:12
dimiternjam, well, I think I can live with that for now :)11:13
jamdimitern: you should have an invite, though I don't know if your calender woes have been sorted out11:19
* TheMue => lunch11:34
dimiternjam, I'll take a look, 10x11:42
* fwereade lunch11:51
dimiternjam, accepted and added the invite11:52
dimiternfwereade, rogpeppe1, updated https://codereview.appspot.com/25080043/ - take a look when you have some time please12:20
* dimitern lunch12:20
hazmatfrankban, there are various pathological conditions where the jenv exists but the environment is not functional13:03
frankbanhazmat: in which cases the status command fails, right?13:04
hazmatapi-endpoints definitely does not wait for anything atm re the env endpoint actually being available, it just returns the address13:04
hazmatfrankban, yeah13:04
hazmatjam, its quite a bit more robust when it doesn't talk to the actual api, and it gives a user an easy way to find a failed bootstrap node without resorting to provider tools.13:05
hazmatalthough i guess --debug does the same wrt to discovery of state server node13:05
frankbanhazmat: so we can consider an environment to be bootstrapped when the jenv exists AND status returns a started agent13:06
hazmatjam, clients can query that info if they need it13:06
hazmatfrankban, hmm.. i wouldn't invoke status.. i'd just try to connect to the api with a timeout13:06
hazmati guess status works, if you don't want to do a sane/long timeout period, its just expensive to throwaway the result13:07
rogpeppe1frankban: tbh i favour the "try to bootstrap and succeed if the error says we're already bootstrapped" approach13:11
rogpeppe1frankban: that's essentially what you'd be trying to replicate (flakily) by trying to decide if the environment is bootstrapped by trying to connect to it13:12
=== gary_poster|away is now known as gary_poster
frankbanrogpeppe1: that was my first intent, it also avoids races. the only problem is having to parse stderr, which seems weak (i.e. the error message can change)13:13
rogpeppe1frankban: let's not change it then :-)13:13
frankban:-)13:13
rogpeppe1frankban: alternatively, just assume that if the .jenv file exists, the environment is bootstrapped13:14
rogpeppe1frankban: ignoring the pathological cases that hazmat mentions, because we're hoping to eliminate those13:14
frankbanrogpeppe1: what about sync-tools?13:15
rogpeppe1frankban: most users will never use sync-tools, i think13:15
hazmatrogpeppe1, there's no removing them.. i delete all the instances in the aws console for example.13:15
rogpeppe1hazmat: well, in general there is *no* way to tell if an environment is non-functional because it hasn't been bootstrapped or because it is just not working13:16
rogpeppe1hazmat: in this case, we want to bootstrap if the env hasn't been bootstrapped13:17
rogpeppe1hazmat: if you've deleted all the instances, the environment has still (logically) been bootstrapped - it's just highly non-functional...13:17
* frankban lunches13:21
TheMuerogpeppe1: ping13:44
rogpeppe1TheMue: pong13:45
TheMuerogpeppe1: you wrote in your review that the connection has to be closed too13:45
rogpeppe1TheMue: that's fwereade's suggestion yes.13:46
rogpeppe1TheMue: personally i think it's a risky thing to do13:46
TheMuerogpeppe1: so is it ok to explicit kill the srv in the root too inside of root.Kill()?13:46
rogpeppe1TheMue: not really13:46
rogpeppe1TheMue: you should probably just close the underlying connection13:47
rogpeppe1TheMue: (you'll need to actually pass it around - it's not currently available in the right place)13:47
TheMuerogpeppe1: what would then happen to the other users/holders of the connection?13:48
rogpeppe1TheMue: there's only one - the agent at the other end13:49
TheMuerogpeppe1: I meant differently, there are references to the conn inside the initial root (if I see it correctly). how will it behave if I close the connection?13:50
rogpeppe1TheMue: it should all just shut down in an orderly fashion13:50
rogpeppe1TheMue: i don't think there's any other way to drop the connection13:51
TheMuerogpeppe1: so in order to shut it down close the conn instead of shut it down to close the conn?13:51
TheMuerogpeppe1: so I have to find a nice way how to pass the conn to where I need it13:52
TheMuerogpeppe1: ah, found one13:54
rogpeppe1TheMue: i don't think you can close the rpc.Conn, BTW13:55
rogpeppe1TheMue: although... maybe it might work13:56
rogpeppe1TheMue: in fact, that's the right thing to do13:57
TheMuerogpeppe1: hmm? to close or not to close, that's the question13:59
rogpeppe1TheMue: if you do close, you'll need to do it asynchronously13:59
rogpeppe1TheMue: if you're going to close the connection, i think the right thing to do is just close it. that will take care of killing the relevant pinger14:01
rogpeppe1TheMue: hmm, except that by default we don't want to kill the pinger, we'll just stop iit14:01
TheMuerogpeppe1: and what do you mean with async?14:01
rogpeppe1TheMue: rpc.Conn.Close blocks until all requests have completed14:02
rogpeppe1TheMue: if you call it within a request, you'll deadlock14:02
rogpeppe1TheMue: hmm, except in fact it'll be called in separate goroutine anyway, so it might work ok14:02
TheMuerogpeppe1: ah, ic14:02
* TheMue dislikes terms like "should" or "might" ;)14:03
* rogpeppe1 goes to check that Pinger.Kill followed by Pinger.Stop will work14:04
rogpeppe1TheMue: the "might" comes from the fact that you'll have to make sure that a request can't block itself on the timeout goroutine because the timeout goroutine is trying to close the connection14:05
dimiternfwereade, https://codereview.appspot.com/25080043/ updated14:05
rogpeppe1TheMue: it's probably best just to do go conn.Close() tbh14:06
dimiternfwereade, I'll do some live upgrade testing later today14:06
TheMuerogpeppe1: ok, thanks, will take that approach14:07
fwereadejam, mgz: so, when I bzr annotate, and I see a number like "1982.5.6"... how do I turn that into an actual revision on trunk?14:10
jamfwereade: you mean when it was merged?14:10
jamyou could do "bzr log -r 1982.5.6..-1"14:10
jamor use bzr qannotate14:10
jam(apt-get install qbzr)14:10
jamwhich shows that stuff in the log14:10
fwereadejam, thanks14:11
mgzfwereade: also, `bzr log -rmainline:1982.5.6`14:13
fwereademgz, thanks also :)14:15
jamfwereade: the thing I really like about qbzr is it is pretty easy to jump around, so you can see what rev modified a file, and then quickly see what it looked like before that change14:17
jambut I see how for --force you reall just want to see the mainline commits to merge it back to 1.1614:17
rogpeppe1mgz: interesting. what does the "mainline:" in there make a difference?14:18
rogpeppe1s/what/why/14:18
fwereaderogpeppe1, I think I'm being stupid -- can you explain how https://codereview.appspot.com/14619045/ precipitated the change in jujud/machine_test.go ? because I can fix my 1.16 problem by adding JobManageState, and I can see why it's necessary -- but I can't figure out what in the minimal set of branches necessary to get destroy-machine --force might have actually triggered it14:19
jamrogpeppe1: "bzr log -r mainline:X" logs the first mainline revision (no dots) that includes the revision in question14:20
jamif you do a range like "bzr log -r 1982.5.6..-1" you'll be able to see which one it is14:20
jambut the "mainline:" is *just that rev*14:20
* fwereade waits for someone to point out something that'd be obvious to an partially-sighted and fully-intoxicated monkey14:21
rogpeppe1jam: so that signifies something to log in particular, or is that something that's useful anywhere a revno can be used?14:21
jamrogpeppe1: should be anywhere a revno can be used14:21
jamdiff, etc14:21
jamit is just for "-r"14:21
jamfwereade: so the change for Uniter to use the cached addresses from state14:22
jammeans that Uniter.APIAddresses14:22
jamneeds to have a JobMachineState somewhere14:22
jamto report its IP address14:22
jamfwereade: the general change to state/api/common/addresser.14:22
jamfwereade: does that make sense?14:22
rogpeppe1jam: so can 1982.5.6 actually specify a different revision to mainline:1982.5.6 ?14:22
jamrogpeppe1: 1982.5.6 is the revision itself, mainline:1982.5.6 is the revision which merged that revision into trunk14:23
jamrogpeppe1: "try it" ?14:23
rogpeppe1fwereade: it's necessary because one of the agents started by TestManageEnviron calls State.Addresses14:24
rogpeppe1fwereade: i can't quite remember the details of which14:24
rogpeppe1fwereade: when you say "triggered it" what are you referring to?14:25
jamrogpeppe1: he is trying to backport his destroy machine --force, and it seems it is carrying with it some unexpected baggage14:25
fwereaderogpeppe1, I cherrypicked a few unrelated branches and got those JobManageEnviron tests failing, and I can't figure out why -- apart from that, obviously, it can't work as written, and does work if it also runs JobManageState14:26
* rogpeppe1 goes to pull 1.1614:26
fwereaderogpeppe1, sorry, I don't want to properly distract you -- if there's no immediate "oh yeah that was weird" that springs to mind I'll keep poking happily14:28
rogpeppe1fwereade: if none of the new addressing stuff made it into 1.16, it's weird that this is happening14:29
rogpeppe1fwereade: i.e. that adding JobManageState fixes anything14:29
fwereaderogpeppe1, yeah, indeed14:30
abentleyI am trying to destroy an azure environment and failing: http://pastebin.ubuntu.com/6411031/14:54
fwereadeabentley, consistent?14:57
abentleyfwereade: Yes.14:57
abentleyfwereade: using 1.16.3, but it may have been bootstrapped with 1.17.x14:58
fwereadeabentley, I don't know azure at all really, but is it possible you've got some instances still around? or running under the same account? iirc the failing bit is one of the last steps at teardown time14:59
abentleyfwereade: I don't know much about azure either.  I've just been using it through juju.14:59
fwereadejam, do you recall, did natefinch do any of the azure stuff?15:00
rogpeppe1lunch15:44
jamfwereade: natefinch has worked with azure, and has done our Windows builds15:51
fwereadejam, thought so -- he's coming back later, right?15:52
jamfwereade: I believe so.15:52
jamabentley: I know that instance teardown is particularly bad there15:52
jamgossip says it takes minutes to tear down one machine, and deleting machines is protected by a single lock15:52
abentleyjam: Yes, I've witnessed the slow teardown myself.15:53
jcsackettsinzui or abentley: either of you have time to review https://code.launchpad.net/~jcsackett/charmworld/better-latency-round-2/+merge/195091 ?16:18
abentleyjcsackett: sure.16:18
jcsackettabentley: thanks.16:18
* fwereade needs to stop for a while, will probably be back to say hi to those in the antipodes at least16:55
dimiternfwereade, is https://codereview.appspot.com/25080043/ good to land?17:40
sinzuiCI found a critical regression, bug #1250974. I'll talk to wallyworld_ when he comes on online about it.18:40
_mup_Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>18:40
sinzuiabentley, ^ do you want to revise the details I put in the description18:47
* rogpeppe1 is done18:50
abentleysinzui: done.  (just swapped 2052 to 2053 at the end).19:08
sinzuithank you19:09
natefinchThree months on Ubuntu and I only just now realized that (0:08) next to my battery means it'll be charged in 8 minutes, not that it thinks there's only 8 minutes of charge left :/20:28
hazmatanybody know golang internals? we're debugging some reflection issues in gccgo20:40
hazmatnatefinch, how's the xps15 treating you?20:48
hazmatfwereade, i know the azure stuff, what's up?20:48
* hazmat reads abentley's traceback20:49
hazmatabentley, so the azure provider is basically synchronous and very careful about cleaning up after itself20:49
natefinchhazmat: so far... some problems.  The hardware is awesome, but Ubuntu is having some problems... trying to get bumblebee working to enable optimus so it'll use the NVidia GPU instead of just the built-in one on the Intel chipset.20:49
hazmatnatefinch, how's the keyboard?20:50
* hazmat has nightmares about old dell laptop keyboards20:50
abentleyhazmat: That's good to know, but it did not succeed in this case.20:50
hazmatabentley, two things.. you can log into the console (or use the nodejs cli tools) to verify you have no machines running, and then try running destroy again with --debug20:50
natefinchhazmat: not terrible... it's standard size... takes a little getting used to it. but I can still generally type without looking.  Of course, stuff like home end etc is moved around some.20:51
abentleyhazmat: But this may be an Azure bogosity.  As far as sinzui and I can tell, it is impossible to delete the network.  We have tried from the Azure web console, and though nothing is using it, Azure says it can't be deleted because things are using it.20:51
hazmatabentley, normally the azure provider does some polling against the operation event stream20:51
hazmathmm20:51
hazmatabentley, its worked in the past but there have been changes on both end20:51
hazmatits been about 1.5 m since i last run the azure provider..20:52
abentleyhazmat: But even the web console doesn't work, so I'm inclined not to blame the azure provider.20:52
hazmatabentley, fair enough.. there are lots of resources not nesc. exposed in the ui.. so both you and sinzui had this issue?20:52
abentleyhazmat: We are using the same subscription, so we have the same set of resources.20:53
hazmatabentley, aha20:53
hazmatabentley, but your using different env names and controls and networks?20:53
hazmatabentley, i dunno that cross env sharing of resources is going to work so well20:54
abentleyhazmat: Yes.20:54
abentleyhazmat: We're both doing fairly limited testing, with different environment names, so it doesn't seem likely that we'll exhaust each others' resources.20:54
abentleyOr otherwise trip on each others' feet.20:55
hazmatabentley,  well...20:57
hazmatabentley, so everything in your azure provider section is different?20:58
=== BradCrittenden is now known as bac
abentleyhazmat: No, storage-account-name, management-subscription-id, management-certificate-path are the same.  Not sure about admin-secret.21:00
abentleyhazmat: But I don't see how that's relevant to the unkillable network.21:01
* hazmat logs into azure console 21:04
hazmatabentley, does it show networks == 0 ?21:06
hazmatin the console21:06
hazmatabentley, the azure provider does some stuff with networking (interlink between services) which isn't represented in the console21:07
hazmator the api really, just raw xml21:07
abentleyhazmat: No, it shows networks == 2.21:07
hazmatabentley, so possibly you have interlinks between the services in two different environments21:08
hazmatwhich would explain why neither can be deleted21:08
hazmatjust guessing though.. easy to find reproduce if that's the case21:09
abentleyhazmat: That can't be the cause, because the second network was created when we found we couldn't destroy the first network.21:10
abentleyhazmat: i.e. the problem pre-dated the second network.21:11
hazmatabentley, hmm..21:12
hazmatabentley, okay.. let me create and destroy and env.. which version you using?21:12
hazmatof juju21:12
abentley1.16.321:12
hazmatk, i'm trying with trunk21:14
hazmatwe should really default the region to the same one the imagestream refs21:16
hazmatie East US21:16
abentleyhazmat: sinzui reports he's had a lot of trouble with East.21:19
hazmatabentley, but simplestreams metadata refs  images  there.. how do you get around the affinity group otherwise?21:20
hazmatabentley, ie https://bugs.launchpad.net/juju-core/+bug/125102521:20
_mup_Bug #1251025: azure provider sample config should default to East US <juju-core:New> <https://launchpad.net/bugs/1251025>21:20
abentleyhazmat: All I know is that it works.21:20
natefinchhazmat: bwahaha, finally got it working (mostly user error I think).    24" 1920x1200, 30" 2560x1600, 15.6" 3200x1800  all running smoothly.21:31
abentleyhazmat: I've run destroy-environment and we're back to just 1 network.21:34
abentleyhazmat: And it still fails to delete.21:35
hazmatabentley, the azure provider doesn't provider very much log output..21:36
hazmatabentley, created and destroyed env without issue here.21:37
hazmatabentley, by chance you know what was deployed in the first env.. just the ci test of wordpress/mysql ?21:37
abentleyhazmat: Yes, that's what it was.21:37
hazmatthere's like this 3 minute pause during bootstrap with no info given to why21:39
jcsackettsinzui: do you have time to look at a one line MP? https://code.launchpad.net/~jcsackett/charms/precise/charmworld/fix-lp-creds-ini-override/+merge/19514621:47
sinzuiI do21:47
sinzuijcsackett, r=me21:48
jcsackettsinzui: thanks.21:49
wallyworld_sinzui: hi there. is there another upgrade bug? :-(21:50
sinzuiwallyworld_, yes. take your time and read though it and maybe the log: https://bugs.launchpad.net/juju-core/+bug/125097421:52
_mup_Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>21:52
wallyworld_ok. i used the same deprecation mechanism as for public bucket url so i'll have to see why that's not working here21:53
sinzuiwallyworld_, maybe it is how we test21:54
sinzuiwallyworld_, abentley hp cloud is not failing. are the configs different?21:54
wallyworld_interesting. hp cloud does have a sloightly different config boilerplate written out for it via juju init21:55
wallyworld_but that's just some extra comments in the yaml afaik21:55
sinzuiwallyworld_, I only speculated as the order of event that could lead to testing/ being ignored. I think the 1.16.3 tools was selected from the testing location, but by the moment of the upgrade, testing/ was no longer known21:56
abentleysinzui: I haven't tested in a way that would show hp failing.  We test canonistack before hp, so we never bother to check whether hp is failing.21:56
abentleysinzui: Because we already know it's a fail.21:56
abentleysinzui: As far as I know, everything except "provider" and and "floating-ip" is configured differently between hp and canonistack.21:58
wallyworld_sinzui: it could have something to do with the jenv stuff - that is relatively new and has been evolving since the public bucket url deprecation mechanism last worked21:58
sinzuiabentley, if we don't use metadata-tools-url in the config, does it work?21:58
abentleysinzui: Gotta go, but I'll check that first thing tomorrow.21:59
sinzuiI suspect it does. I think it is not possible to upgrade with a config that users are likely to have if the read the release notes or just respond to what juju is saying21:59
sinzuithanks abentley21:59
sinzuiwallyworld_, per what I asked of abentley ^ I think the issue with upgrades is mixed configs.22:01
wallyworld_as in they put tools-metadata-url in their new 1.7 env config and upgrade a 1.6 which didn't have it?22:02
wallyworld_actually, if the jenv files are still present, i think any changes to the env yaml are ignored22:04
wallyworld_i'm not 100% sure, but i've had to delete the jenv files previously if i wanted to introduce new config22:04
wallyworld_not very intuitive if you ask me22:04
wallyworld_so perhaps the user reads the release notes, edits their env yaml to add tools-metadata-url, but it is ignored because the jenv file is there and the yaml is ignored?22:06
wallyworld_i'll have to check with the folks who did the jenv stuff, or read the code22:06
sinzuiwallyworld_, I want the old config to just work for users to upgrade. If users have a perfect new config it should work of course. In the case of a config with old and new values, We need to be certain we honour them. juju servers will be different that juju clients, so the configs need to work for all cases22:07
wallyworld_sinzui: agreed. that's what the current code in 1.7 does - it sees tools-url, logs a warning, and sets tools-metadata-url to the tools-url value. that's how the public bucket url deprecation worked also. so i'm not sure what's happening to make it fail22:08
sinzuiwallyworld_, if we decided to fix bug 1247232, we might be able to be strict about what is in the config22:08
_mup_Bug #1247232: Juju client deploys agent newer than itself <ci> <deploy> <juju-core:Triaged> <https://launchpad.net/bugs/1247232>22:08
sinzuiwallyworld_, once how to the old bootstrap node get the new tool? It looks like it is searching for it. since it doesn't know about metadata-tools-url, it cannot find it?22:09
sinzuiShouldn't the client tell the server the exact version and location to use since it had to do the lookups?22:10
wallyworld_sinzui: i'm not entirely familiar with the upgrade workflow - what data is passed to where etc. but that would seem sensibl22:11
wallyworld_sinzui: one thing i can think of - i remove the old tools-url from the config struct that is parsed from the yaml once the new tools-metadata-url is set. perhaps that data is being sent to the old node which then doesn't see a tools-url is recognises?22:12
sinzuiwallyworld_, The log implies the server has a new config with 1.16.3. it doesn't have a tools-url set22:13
* sinzui wishes zless shows line numbers22:13
wallyworld_sinzui: so it seems perhaps that my assumption that the tools-url could be deleted from the in memory config struct once tools-metadata-url is set is wrong, since that data ends up being passed to the older 1.16 nodes. that surprises me because i wasn't aware that would happen22:16
sinzuiwallyworld_, about line 2674 of the log, I see the last know occurrence of /testing After we see that config we see juju search to the new tool, in the wrong location22:16
sinzuiwallyworld_, I think you mean the value is cleared. I believe axw reported a bug that it is not possible to delete a config key.22:18
sinzuiAnd maybe that has helped insulate us from upgrade issues22:19
wallyworld_sinzui: right, yes. i clear the tools-url from the map of config values held by the config struct when the yaml is parsed22:19
wallyworld_so, thinking out loud, if the wrong tools location is being used, perhaps it's the new 1.17 nodes not having a config with tool-metadata-url set22:21
wallyworld_sinzui: so that log file just has warnings logged - it would be nice to see debug so we can see the simplestreams search path used22:23
wallyworld_i'll see if i can do a test to get that happening22:24
sinzuiwallyworld_, we see all the paths being checked at 2735 and the entries say DEBUG22:25
wallyworld_sinzui: i am stupid - i was looking at the console log from the link in the descripton. i didn't see the attachment22:26
sinzuiwallyworld_, no need to think ill of yourself. Aaron and I also made the same mistake22:26
wallyworld_maybe i need more coffee22:27
sinzuiwallyworld_, we are a day or two away from being able to run an arbitrary branch + rev + cloud to do a singe test. Aaron hacked two runs to replay the last success and first fail today22:28
wallyworld_sinzui: i'll go through the log and figure out wtf is happening and hopefully have a fix for you. sorry about the bug. i suspect it is due to slightly new config workflow interfering with it but am not sure22:28
sinzuiwallyworld_, thank you. Take your time to do a proper fix22:30
wallyworld_will do. there's been a bit of churn under the covers that i'm not involved with so old assumptions maybe don't apply anymore. i think i'll have to do a full upgrade test by hand to validate22:31
sinzuidavecheney, I don't have a short list of bugs to address. I see http://tinyurl.com/juju-stakeholders and https://launchpad.net/juju-core/+milestone/1.17.0 as bugs we want to help fix. There are so many I think we can confidently work on those that we can confidently fix quickly23:24
davecheneysinzui: ack23:27
davecheneysinzui: work is ongoing to get juju building under gccgo23:27
davecheneyi should find some time convenient to your team to break the bad news about this extra dimention of the testing matrix23:28
sinzuidavecheney, I saw that. I had an unexpected event last evening. I had thought I would have time to ask you about that. Did I deploy still-born juju-core arm packages for 1.16.3?23:30
sinzuis/deploy/release/23:30
davecheneysinzui: not really sure23:31
davecheneydoes anyone use those packages ?23:31
sinzuiI suspect not since the test suite doesn't work.23:33
sinzuiWe only CI on amd64. I sign tarballs that pass on amd6423:33
davecheneysinzui: do you have an example failure ?23:36
davecheneythat sounds like the sort of thing that is in my baliwak to fix23:36
sinzuiI don't. I saw a bug report from michael hudson and saw you comment on it23:37
davecheneysinzui: yeah, we have a fix23:54
davecheneyneed it to be merged on the mgo repo23:54
sinzuidavecheney, speaking of mgo, does the test suite always pass for you? I often see mgo failures. they are are commong for me now, but rare back in september23:55
davecheneysinzui:  i haven't run that test suite even once int he last year23:57
davecheneyis there a jenkins job for it ?23:57
davecheneyplease assign any failure reports to me23:57

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!