/srv/irclogs.ubuntu.com/2013/11/13/#juju-dev.txt

davecheney	sinzui: i'm open for assignment of bug fixes	00:14
bigjools	thumper: I need to have a call with you today about VLANs, let me know when is convenient please	01:35
thumper	bigjools: now is as good as ever	01:36
thumper	bigjools: also, not sure how useful I'm going to be :)	01:36
bigjools	thumper: ok let me grab a drink and I'll call in 5 mins	01:37
bigjools	it's a start if nothing else :)	01:38
bigjools	thumper: calling	01:41
thumper	wallyworld_: here are the test changes I was telling you about https://codereview.appspot.com/25460045/	03:36
wallyworld_	ok, i've already changed the method names locally. i'll pick up the changes once you land	03:39
thumper	wallyworld_: ack	03:45
* thumper runs on yet another small errand...		03:48
jam	axw: we might do a config.New, but the warning is inside Validate	07:21
jam	(it is at the end of environs/config.Validate)	07:22
axw	jam: eh, sorry, not sure how I confused that	07:22
jam	I guess that is somehow different from Config.Validate() ?	07:22
axw	jam: ah, config.New calls that	07:22
jam	axw: at one point we had explicitly discussed not even parsing sections we don't know about so that you could have pyjuju and juju-core environments in the same file	07:23
axw	it's just for validating common configuration	07:23
jam	but it also applies for multi-version stuff.	07:23
jam	axw: I don't see why we need to config.New for anything we won't use	07:23
axw	yeah, I don't see any value in parsing if we're not using it	07:23
axw	we should just defer to first use	07:23
jam	ReadEnvirons just parses everything into Environs objects	07:24
axw	jam: yep, so we could just modify Environs.Config to do the parse on first reference	07:25
jam	axw: what is silly is in environs/open.go we ReadEnvirons("") , just to get name from envs.Default (that we may not use), and then we actually read the info from store.ReadInfo(name), and only if that fails do we actually use the envs we just read	07:28
axw	heh	07:28
jam	axw: so we do assert that environs.ReadEnvironsBytes doesn't generate an error, and you only get the error when you use environs.Config()	07:30
jam	well, the environs.ReadEnvironsBytes().Config(name)	07:30
jam	but that is actually because we've subtly put the err on part of the struct	07:31
jam	waiting to report it until later.	07:31
axw	should be nice and easy then :)	07:32
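A minimal sketch of the "defer to first use" idea discussed above, assuming a simplified stand-in for the real types. The names `environ`, `validate`, and `validateCalls` are hypothetical and are not juju-core's API; `ReadEnvirons` and `Config` only borrow the names used in the conversation. Reading stores raw attributes, validation runs on the first `Config` call, and the result (including any error) is cached on the struct so a warning would be produced once rather than on every call.

```go
package main

import (
	"errors"
	"fmt"
)

// validateCalls counts validations so the laziness is observable.
var validateCalls int

// environ holds the raw attributes for one environment plus the
// cached outcome of validating them. Hypothetical type.
type environ struct {
	attrs  map[string]interface{}
	parsed bool
	err    error
}

// Environs is cheap to build: nothing is validated up front.
type Environs struct {
	Default string
	envs    map[string]*environ
}

// ReadEnvirons records raw attributes without validating them.
func ReadEnvirons(raw map[string]map[string]interface{}, def string) *Environs {
	e := &Environs{Default: def, envs: make(map[string]*environ)}
	for name, attrs := range raw {
		e.envs[name] = &environ{attrs: attrs}
	}
	return e
}

// validate is a stand-in for the real validation (and its warnings).
func validate(attrs map[string]interface{}) error {
	validateCalls++
	if _, ok := attrs["type"]; !ok {
		return errors.New("environment has no type")
	}
	return nil
}

// Config validates the named environment on first reference only
// and reports the cached error on every later call.
func (e *Environs) Config(name string) (map[string]interface{}, error) {
	env, ok := e.envs[name]
	if !ok {
		return nil, fmt.Errorf("unknown environment %q", name)
	}
	if !env.parsed {
		env.err = validate(env.attrs)
		env.parsed = true
	}
	if env.err != nil {
		return nil, env.err
	}
	return env.attrs, nil
}
```

With this shape, an environment that is never requested is never validated, so an entry meant for another tool in the same file would not cause an error or a warning.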
jam	fwereade: thanks for doing the backport	08:57
fwereade	jam, np	08:58
jam	axw: so there is a small point about creating a new config each time. If we are creating a warning, we'll do it twice.	08:58
fwereade	jam, the --force one might be a little trickier	08:58
jam	I wonder if that is a problem	08:58
jam	fwereade: because the code is different, or because it is more invasive?	08:58
fwereade	jam, just because it's a few branches and I'm paranoid	08:59
jam	fwereade: just because you're paranoid doesn't mean they aren't out to get you. :)	08:59
fwereade	jam, words to live by	08:59
=== axw_ is now known as axw
jam	axw: for https://code.launchpad.net/~axwalk/juju-core/jujud-uninstallscript/+merge/194994	09:08
jam	how do we handle upgrades ?	09:08
jam	as in, there won't be anything in agent.conf on a system that we upgraded	09:08
jam	fwereade: in https://code.launchpad.net/~wallyworld/juju-core/provisioner-api-supported-containers/+merge/194982 he mentions "A change was also made to the server side implementation so that the machine doc txn-revno is no longer checked."	09:09
jam	that sounds risky to me, but I'd like to get your feedback on it.	09:09
jam	axw: I didn't mean to scare you away	09:12
jam	fwereade: one thing about our CLI API work. The new CLI is likely to be incompatible with old server versions when we have to create an API for the command. (the "easy" case vs the "trivial" case). Do we care?	09:14
jam	we definitely haven't been implementing backwards compatibility fallback code.	09:14
fwereade	jam, looking at wallyworld_'s	09:17
wallyworld_	jam: we rarely check txn-revno. mainly for env settings. never previously for machines. i was trying to be more stringent by introducing it	09:17
fwereade	jam, pondering the latter	09:17
fwereade	jam, wallyworld_: a txn-revno check is a big hammer and should not generally be used until we've exhausted all other possibilities	09:17
fwereade	jam, wallyworld_: far better to check only the fields we actively care about	09:17
wallyworld_	fwereade: yes, i came to that conclusion	09:18
fwereade	jam, wallyworld_: but sometimes that's not practical	09:18
davecheney	http://paste.ubuntu.com/6409771/	09:18
wallyworld_	i like optimistic locking in general as a pattern	09:18
davecheney	juju compiled with gccgo	09:18
jam	fwereade: so that sounds like what he's done	09:18
jam	as in, we used to assert the whole thing, but now we just assert the one field	09:18
fwereade	jam, wallyworld_: last I was aware, we'd managed to eliminate them all, I guess another crept in	09:18
wallyworld_	i introduced it	09:19
wallyworld_	in a recent branch	09:19
fwereade	davecheney, cool, does it work? ;)	09:19
wallyworld_	it worked in practice but tests failed with some new work	09:19
rogpeppe1	davecheney: cool	09:19
rogpeppe1	davecheney: does it work?	09:19
jam	davecheney: nice. Almost 3MB smaller than the static one. :) Wish it was more like 15MB smaller.	09:20
davecheney	rogpeppe1: i didn't try to bootstrap it	09:22
davecheney	oh speaking of that	09:22
fwereade	jam, wallyworld_: it looks sane to me -- the only way we could be screwing that up is with multiple agents for the same machine on separate instances, and the nonce stuff should guard against that effectively	09:22
davecheney	i need someone to commit a mgo change	09:22
wallyworld_	fwereade: right. that attr is only set once as the machine agent spins up	09:22
rogpeppe1	davecheney: you can lbox propose the change, i think	09:23
davecheney	it's not mine	09:23
wallyworld_	fwereade: btw, you forgot to look at my wip branch :-(	09:23
fwereade	wallyworld_, hell, sorry	09:23
fwereade	this is what happens when I write code :(	09:24
wallyworld_	np. it's at the point now where i just need to add one more test	09:24
wallyworld_	and i can propose formally	09:24
davecheney	rogpeppe1: https://code.launchpad.net/~mwhudson/mgo/evaluation-order/+merge/194968	09:24
davecheney	mike doesn't know how to use lbox	09:24
wallyworld_	i've done live testing and it all seems fine	09:24
fwereade	wallyworld_, link please?	09:24
wallyworld_	https://codereview.appspot.com/25040043/	09:24
wallyworld_	just some tests to add	09:24
wallyworld_	i've proposed the addsupportedcontainers stuff separately	09:25
wallyworld_	hence the discussion earlier about txn-revno	09:25
fwereade	wallyworld_, ta	09:25
wallyworld_	davecheney: the fact that lbox is not used is not a bad thing :-)	09:26
davecheney	fwiw: lucky(/tmp) % strip juju	09:26
davecheney	lucky(/tmp) % ls -al juju	09:26
davecheney	-rwxrwxr-x 1 dfc dfc 11438248 Nov 13 20:26 juju	09:26
davecheney	rogpeppe1: anyway, that needs to land before juju will work properly	09:27
jam	wallyworld_: I have a review in.	09:27
wallyworld_	thanks :-)	09:28
rogpeppe1	davecheney: yeah	09:28
rogpeppe1	davecheney: you could propose it yourself, i guess	09:28
jam	wallyworld_: I did have some comments where I think we are missing test coverage, and possibly having a client-side API that matches the other functions	09:28
jam	but hopefully minor stuff	09:29
wallyworld_	ok	09:29
rogpeppe1	davecheney: it would be nice if go vet (or some other tool) could give warnings about undefined behaviour like that. i guess it's not possible in general though.	09:29
wallyworld_	jam: " after api.Set* from the API point of view" - there is no api call to get supported containers	09:30
wallyworld_	the api is currently write only	09:30
davecheney	rogpeppe1: /me considers what it would take to detect this behavior	09:30
jam	wallyworld_: so... do we just do it on every startup ?	09:30
jam	it would be nice if we would check if we've already done the lookup	09:30
jam	unless the lookup is exceptionally cheap	09:30
jam	I guess	09:30
wallyworld_	every machine agent start up	09:31
jam	wallyworld_: anyway, if you can't test it, just say so. :)	09:31
wallyworld_	ok :-)	09:31
rogpeppe1	davecheney: the oracle might have enough information to find some simple cases	09:31
wallyworld_	jam: i've not done anything with permissions yet so i'll need to see how to manipulate them to add a test	09:31
jam	wallyworld_: there should be other tests you can crib from. Usually it is "set up 3 machines, use the agent for machine-1, try to change something on all 3 machines, and assert that you get EPERM on the ones you're not allowed"	09:33
jam	I was actually surprised that in one call you could change	09:33
jam	both machine-0 and machine-1	09:33
wallyworld_	jam: i simply copied another test and used a different api call	09:33
jam	wallyworld_: so I guess, "let's think what the perms on this should be, and assert them in a test"	09:34
davecheney	rogpeppe1: I also have a fix in for gccgo to fix the go/build breakage	09:34
jam	I would say that the only thing that is allowed to change the value of a machine's supported containers is that machine's assigned agent	09:34
jam	which is an AuthOwner sort of test.	09:34
rogpeppe1	davecheney: cool	09:34
jam	wallyworld_: and if you remember the thing you copied, we might have a security hole there, so at least file a tech-debt bug to track it.	09:38
wallyworld_	"if" is the relevant word :-)	09:39
wallyworld_	i'll take a look	09:39
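A rough sketch of the permission test jam describes: three machines, one authenticated agent, and an assertion that only that agent's own machine may be changed. Everything here is hypothetical (`containerArgs`, `setSupportedContainers`, `errPerm`, and the plain map standing in for state); it is not the juju-core API, only an illustration of an owner-only check applied per entity in a bulk call.

```go
package main

// errPerm is the per-entity result for a refused change.
const errPerm = "permission denied"

// containerArgs names a machine and the containers it claims to support.
type containerArgs struct {
	MachineTag string
	Containers []string
}

// setSupportedContainers applies each change only when the target
// machine is the one the calling agent authenticated as, and returns
// one result string per argument ("" means success).
func setSupportedContainers(authTag string, store map[string][]string, args []containerArgs) []string {
	results := make([]string, len(args))
	for i, a := range args {
		if a.MachineTag != authTag {
			results[i] = errPerm
			continue
		}
		store[a.MachineTag] = a.Containers
	}
	return results
}
```

A test in the style described would authenticate as machine-1, send changes for machine-0, machine-1 and machine-2 in one call, and assert that only the middle result is empty.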
rogpeppe1	fwereade: i've just realised that for ensure-ha to be transactional, State.AddMachine needs to take a count argument and for that to create all its machines in the same transaction. Looking at the transactions around AddMachine, this is a bit OMG.	09:40
fwereade	rogpeppe1, how much of it is applicable in that case? IIRC most of the complexity is around containers rather than machines themselves	09:41
rogpeppe1	fwereade: i'm not sure - i haven't grokked the code yet	09:41
fwereade	jam, I'm still thinking about api backward compatibility and thinking that it kinda sinks the --force fix for 1.16	09:41
rogpeppe1	fwereade: just the idea of making transactions that can be hundreds of operations long fills me with doubt	09:42
fwereade	jam, 1.16s should really work with other 1.16s	09:42
fwereade	rogpeppe1, how would they be that long?	09:42
rogpeppe1	fwereade: juju add-machine -n 100 ?	09:42
fwereade	rogpeppe1, ah wait, unexplored assumption	09:43
rogpeppe1	fwereade: i guess we wouldn't need to use the count for add-machine	09:43
fwereade	rogpeppe1, why does AddMachine need a count argument?	09:43
fwereade	rogpeppe1, jinx :)	09:43
rogpeppe1	fwereade: not quite: State.AddMachine needs a count argument. juju add-machine doesn't need to use it.	09:44
fwereade	rogpeppe1, in general if things like -n need to be transactional (which I agree they should) I think the sane answer is to stick it in a queue of some sort	09:44
fwereade	rogpeppe1, not quite so sure about that	09:44
rogpeppe1	fwereade: AddMachine needs a count argument because otherwise we'd be able to have an even number of state servers	09:44
fwereade	rogpeppe1, wouldn't HA methods on state be saner?	09:44
rogpeppe1	fwereade: i'd intended to do so. but those methods need to add machines	09:45
fwereade	rogpeppe1, right, so they should use addMachineOps	09:45
fwereade	rogpeppe1, the various unexported *Ops methods in state are the building blocks of transactions	09:47
fwereade	rogpeppe1, they are I admit kinda gunky in cases, like lego buried in leafmould for years	09:47
fwereade	rogpeppe1, but they're internal and therefore subject to safe improvement as required	09:48
rogpeppe1	fwereade: and then we make State.AddMachine barf if its jobs contain state server jobs?	09:48
fwereade	rogpeppe1, probably	09:48
rogpeppe1	fwereade: this all makes me feel highly uncomfortable	09:48
fwereade	rogpeppe1, which is I admit a hassle from a test-fixing perspective	09:49
fwereade	rogpeppe1, if it's hard for you to write the code this is all the more reason we should not just hand the user the same toolkit you're reacting against and tell them to figure it out	09:49
rogpeppe1	fwereade: we'd also need to have another special case for adding machine 0	09:49
fwereade	rogpeppe1, machine 0 is already a special case	09:50
rogpeppe1	fwereade: not in state, currently	09:50
rogpeppe1	fwereade: AFAIK	09:50
jam	fwereade: because --force requires a new API? I guess it does add a parameter, but won't that just be ignored otherwise ?	09:50
rogpeppe1	fwereade: it's hard to write the code in this particular style	09:50
fwereade	rogpeppe1, it's an InjectMachine not an AddMachine	09:50
rogpeppe1	fwereade: well, InjectMachine would need the same restrictions as AddMachine, no	09:51
rogpeppe1	?	09:51
fwereade	jam, maybe it's not such a big deal -- it won't work if the agent-version is old, but it'll be silent	09:53
jam	fwereade: for good or bad that has been our answer for API compatibility	09:53
rogpeppe1	fwereade: another possibility that means that we wouldn't have to transactionalise all this fairly arbitrary logic is to just put the ensure-ha bool value in the state	09:53
fwereade	rogpeppe1, how many other InjectMachine cases are there?	09:53
jam	fwereade: and that isn't much different than a 1.18 client trying to do it against a 1.16 system.	09:53
fwereade	jam, I think there's a distinction between stuff not working across minor versions vs patch versions	09:53
jam	fwereade: maybe, though I think from a client perspective patch versions shouldn't really break things either.	09:55
fwereade	jam, s/patch/minor/?	09:55
jam	I have been hoping to push that more once we actually got everything into the api	09:55
jam	fwereade: right	09:55
rogpeppe1	fwereade: only one - in the manual provisioner	09:55
jam	fwereade: I really want the client that is on 14.04 initially to still work 2 years later	09:55
fwereade	jam, yes indeed	09:55
fwereade	jam, at that point I think it's a matter of freezing Client and writing new methods that are a little bit consistent with each other, and with the style of the internal API	09:55
jam	fwereade: yeah, I was thinking about the Batch stuff we did. And realizing that the thing we really want to be Batch is Client, which was written before we were focusing on it. :(	09:56
fwereade	rogpeppe1, remind me, does manual bootstrap use jujud bootstrap-state? if so the other case is more like RegisterMachine -- which is really an AddMachine with instance id/hardware	09:58
rogpeppe1	fwereade: in fact, the more i think about it, the more i think it would be better if we just signalled the HA intention in the state, and let an agent sort it out, the same way the rest of our model works.	09:58
fwereade	rogpeppe1, if it's a matter of rearranging the state methods that's just the usual process of development, I think	09:58
rogpeppe1	fwereade: a significant part of it is that we may very well want more on-going logic around ensure-ha in the future	09:59
fwereade	rogpeppe1, the concern there is about automatic failover -- I don't want to be rearranging mongo all the time on the basis of presence alone, without user input	09:59
fwereade	rogpeppe1, expand on that bit please?	09:59
fwereade	jam, it has been a constant source of low-level irritation to me as well ;)	10:00
rogpeppe1	fwereade: so, for example: at some point we will probably want to automatically start a new state server machine when one falls over	10:00
fwereade	rogpeppe1, we might	10:00
fwereade	rogpeppe1, in which case it's a trivial agent that keeps an eye on presence and calls the ensure-ha logic that we currently require user intervention for	10:00
rogpeppe1	fwereade, jam: i'm unconvinced that the one-size-fits-all batch approach we use in our internal API is appropriate for the client API.	10:00
fwereade	rogpeppe1, it is wholly appropriate for the client API, the internal API is the bit that's arguable	10:01
rogpeppe1	fwereade: expand, please	10:03
fwereade	rogpeppe1, the argument that there are times when you really only want one entity to be messed with is reasonable in the case of, say, internal Machine.EnsureDead -- because it's governed by an agent that (currently) only has responsibility for one machine	10:03
jam	rogpeppe1: a Client is much more likely to care about more than one unit/machine/thingy at a time. most of the agents have a 1-1 correspondence	10:03
fwereade	rogpeppe1, I don't see any justification for requiring that any client-led change must be broken into N calls	10:04
rogpeppe1	fwereade, jam: it's easy for a client to make many calls concurrently	10:05
fwereade	rogpeppe1, take DestroyMachines/DestroyUnits for example -- that goes halfway, and then does that horrible glue-errors-together business	10:05
fwereade	rogpeppe1, you have a different idea of "easy" than many other people	10:05
fwereade	rogpeppe1, and it also prevents us from ever batching things up usefully internally	10:05
rogpeppe1	fwereade: i have no particular objection to making some calls batch-like, on a case-by-case basis	10:05
fwereade	rogpeppe1, I do	10:05
rogpeppe1	fwereade: so you do	10:06
fwereade	rogpeppe1, because we're bad at predicting, and the cost of a 1-elem array vs a single object is negligible, and allows for compatibility when we get it wrong	10:06
rogpeppe1	fwereade: it's not fucking negligible	10:07
fwereade	rogpeppe1, I'm not saying the HA stuff is easy, if that's what you're referring to	10:07
fwereade	rogpeppe1, are you making an ease-of-development argument?	10:08
rogpeppe1	fwereade: i'm making a keep-it-fucking-simple-please and this-is-unnecessary-and-insufficient argument	10:08
fwereade	rogpeppe1, unnecessary I think we'll have to differ on, insufficient is more interesting	10:10
* rogpeppe1 goes for a walk		10:13
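For readers unfamiliar with the batch style being argued over, here is a small sketch of a call that takes a list of entities and returns one result per entity, so a single-entity call is just a one-element list. The type and function names (`Entity`, `Entities`, `ErrorResult`, `ErrorResults`, `DestroyMachines`, `refuseStateServer`) are illustrative assumptions and are not claimed to match the real API; the sketch takes no side in the argument.

```go
package main

import "errors"

// Entity identifies one thing to act on.
type Entity struct{ Tag string }

// Entities is the argument to a batch call.
type Entities struct{ Entities []Entity }

// ErrorResult carries the outcome for one entity; "" means success.
type ErrorResult struct{ Error string }

// ErrorResults has exactly one entry per requested entity, in order.
type ErrorResults struct{ Results []ErrorResult }

// DestroyMachines applies destroy to every entity and reports each
// outcome separately instead of gluing the errors together.
func DestroyMachines(args Entities, destroy func(tag string) error) ErrorResults {
	out := ErrorResults{Results: make([]ErrorResult, len(args.Entities))}
	for i, e := range args.Entities {
		if err := destroy(e.Tag); err != nil {
			out.Results[i].Error = err.Error()
		}
	}
	return out
}

// refuseStateServer is a toy destroy hook that rejects machine-0.
func refuseStateServer(tag string) error {
	if tag == "machine-0" {
		return errors.New("machine-0 is a state server")
	}
	return nil
}
```

The trade-off under debate is visible here: callers always index into `Results`, even for one entity, in exchange for a signature that need not change if the call later has to handle many.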
jam	fwereade: man, you drove everyone away. :) (axw, mramm, etc.)	10:21
fwereade	jam, I'm pretty unpleasant really when it comes down to it	10:22
jam	I know I'm glad I only have to put up with you for 1 week every few months. :)	10:22
jam	anyway	10:22
fwereade	;p	10:24
frankban	hi coredevs: I need to implement a "bootstrap an environment only if it's not already bootstrapped" logic for the quickstart plugin. I thought about two options: 1) try: juju bootstrap; except error, if error is "already bootstrapped" then ok. Option 2) is: if JUJU_HOME/environments/<envname>.jenv exists then ok, already bootstrapped. 1) seems weak (I'd have to parse the command error string) and 2) seems to rely	10:25
frankban	on an internal detail (those jenv files). Suggestions?	10:25
jam	frankban: it is also possible for foo.jenv to exist but not be bootstrapped, though that is subject to bugs in bootstrap (which are hopefully rare enough to not worry about)	10:27
jam	frankban: an alternative is to try to connect to the env rather than try to bootstrap first	10:27
fwereade	frankban, I would prefer (2) because I see .jenv files as pretty fundamental, and waxing in importance -- the wrinkle is that a .jenv might be created without the environment being bootstrapped (by sync-tools)	10:28
fwereade	jam, frankban: however if no jenv exists the env is certainly not bootstrapped	10:28
fwereade	jam, frankban: and I would imagine that quickstart is always going to want to create a new environment	10:29
fwereade	jam, frankban: so just picking a name not used by a jenv, or environments.yaml, might end-run around the problem?	10:29
jam	fwereade: quickstart is meant to "help you along your way"	10:30
jam	so if you haven't bootstrapped yet it starts there	10:30
jam	if you've bootstrapped but not yet installed juju-gui	10:30
jam	then it starts there	10:30
jam	etc	10:30
jam	so running it multiple times should be convergent	10:30
jam	even if it gets ^c in the middle.	10:30
fwereade	jam, bah, ok	10:31
frankban	fwereade: one of the goals is to make quickstart idempotent, so, if the environment is already bootstrapped it must skip that step, if the GUI is in the env then skip that step too, and so on... jam: to connect to the API i need to know the endpoint, and I'd like to avoid asking permission and calling juju api-endpoints.	10:31
jam	frankban: I would probably go with "if .jenv exists, try to connect"	10:31
jam	frankban: "asking permission" ?	10:31
fwereade	jam, frankban: +1	10:31
jam	don't you have to call "juju api-endpoints" at some point regardless	10:32
fwereade	jam, frankban: and if that fails, fall back to bootstrapping?	10:32
jam	we do plan to cache the api addresses in the .jenv file	10:32
jam	but I don't know when that code will actually be written.	10:32
frankban	jam: ok so, if jenv exists, try to call api-endpoints, if the latter returns an error, bootstrap, otherwise consider the env bootstrapped?	10:32
jam	frankban: that sounds like what I would do	10:32
jam	frankban: well "if jenv exists, try api-endpoints if it succeeds the env is bootstrapped, else bootstrap"	10:33
jam	frankban: there is a small potential for "it is bootstrapped but not done starting yet"	10:33
jam	though I think "api-endpoints" should notice and hang for a while waiting for it to start	10:33
frankban	fwereade, jam: when exactly I can expect the jenv file to exist without a bootstrapped env?	10:33
fwereade	frankban, if someone ran sync-tools first	10:34
fwereade	frankban, (or if there's a bug)	10:34
jam	frankban: today it only happens if (a) someone runs sync-tools, (b) there is a bug during bootstrap and the machine fails to start, (c) someone else bootstrapped, copied the file to you, and then did destroy-environment	10:34
frankban	fwereade, jam, ok: so maybe the logic is: jenv does not exist -> bootstrap.	10:36
frankban	jenv exists: run "juju status" until 1) it returns an error, in which case consider the environment not bootstrapped -> bootstrap	10:37
frankban	or 2) it returns normally -> the env is bootstrapped	10:37
fwereade	frankban, sgtm	10:37
frankban	(and ready to be connected to)	10:37
frankban	cool	10:37
frankban	fwereade, jam: thanks	10:39
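The decision logic frankban settles on can be sketched as below. This is an assumption-laden illustration, not quickstart's code: the three hooks (`jenvExists`, `status`, `bootstrap`) are hypothetical and injected as functions so the sketch does not shell out to `juju`, and the retry loop implied by "run juju status until" is collapsed into a single call.

```go
package main

import "errors"

// errStatusFailed is a sample error a status hook might return.
var errStatusFailed = errors.New("status failed")

// ensureBootstrapped bootstraps only when needed and reports whether
// it did so. No .jenv file means the environment is certainly not
// bootstrapped; a .jenv file plus a failing status is treated the
// same way; a .jenv file plus a working status means skip the step.
func ensureBootstrapped(jenvExists func() bool, status func() error, bootstrap func() error) (ran bool, err error) {
	if jenvExists() && status() == nil {
		return false, nil
	}
	return true, bootstrap()
}
```

Running the function twice in a row against a healthy environment bootstraps at most once, which is the convergent behaviour asked for earlier in the conversation.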
jam	frankban: "juju status" or "juju api-endpoints" ?	10:39
jam	if you need the contents of status, go for it	10:39
frankban	jam: I guess juju status until the agent is started, and then api-endpoints	10:40
jam	frankban: shouldn't api-endpoints wait for the agent as well?	10:40
jam	Doesn't it today?	10:40
frankban	jam: I don't know, and maybe you are right, and I am being too paranoid	10:41
jam	frankban: well, it should because it should have to connect to the environment to answer your question, but I would certainly consider testing it first	10:43
frankban	jam: ack, thanks	10:43
fwereade	jam, frankban: I suspect that api-endpoints uses Environ.StateInfo and will thus return the address for a machine that is not necessarily ready	10:43
fwereade	jam, frankban: status needs a working machine	10:43
frankban	fwereade: oh, ok. so the current quickstart behavior seems ok	10:44
fwereade	frankban, cool	10:44
jam	fwereade: so, api-endpoints should talk to the API to see if there are any other machines it doesn't know about yet.	10:49
jam	for the HA sort of stuff. But probably today it may not.	10:49
fwereade	jam, I agree, but I don't think it does today	10:49
jam	fwereade: and we're overdue for standup	10:49
jam	rogpeppe1 and dimitern: time to rumble for who gets which 1:1 time slot. :) We can have you both go at 9am local time (8UTC for Dimiter and 9UTC for Roger), or we can twist roger's arm into starting earlier	11:09
jam	but that means Dimiter needs to bring Roger food or something once he finally wakes up.	11:09
dimitern	:)	11:11
dimitern	so 8 UTC should be 10 local time I guess	11:11
dimitern	jam, I'm ok with that	11:11
jam	dimitern: you changed TZ? CEST is +1 after daylight savings time ended.	11:12
dimitern	jam, ah, so 9 am then	11:12
dimitern	jam, well, I think I can live with that for now :)	11:13
jam	dimitern: you should have an invite, though I don't know if your calendar woes have been sorted out	11:19
* TheMue => lunch		11:34
dimitern	jam, I'll take a look, 10x	11:42
* fwereade lunch		11:51
dimitern	jam, accepted and added the invite	11:52
dimitern	fwereade, rogpeppe1, updated https://codereview.appspot.com/25080043/ - take a look when you have some time please	12:20
* dimitern lunch		12:20
hazmat	frankban, there are various pathological conditions where the jenv exists but the environment is not functional	13:03
frankban	hazmat: in which cases the status command fails, right?	13:04
hazmat	api-endpoints definitely does not wait for anything atm re the env endpoint actually being available, it just returns the address	13:04
hazmat	frankban, yeah	13:04
hazmat	jam, it's quite a bit more robust when it doesn't talk to the actual api, and it gives a user an easy way to find a failed bootstrap node without resorting to provider tools.	13:05
hazmat	although i guess --debug does the same wrt discovery of state server node	13:05
frankban	hazmat: so we can consider an environment to be bootstrapped when the jenv exists AND status returns a started agent	13:06
hazmat	jam, clients can query that info if they need it	13:06
hazmat	frankban, hmm.. i wouldn't invoke status.. i'd just try to connect to the api with a timeout	13:06
hazmat	i guess status works, if you don't want to do a sane/long timeout period, it's just expensive to throw away the result	13:07
rogpeppe1	frankban: tbh i favour the "try to bootstrap and succeed if the error says we're already bootstrapped" approach	13:11
rogpeppe1	frankban: that's essentially what you'd be trying to replicate (flakily) by trying to decide if the environment is bootstrapped by trying to connect to it	13:12
=== gary_poster|away is now known as gary_poster
frankban	rogpeppe1: that was my first intent, it also avoids races. the only problem is having to parse stderr, which seems weak (i.e. the error message can change)	13:13
rogpeppe1	frankban: let's not change it then :-)	13:13
frankban	:-)	13:13
rogpeppe1	frankban: alternatively, just assume that if the .jenv file exists, the environment is bootstrapped	13:14
rogpeppe1	frankban: ignoring the pathological cases that hazmat mentions, because we're hoping to eliminate those	13:14
frankban	rogpeppe1: what about sync-tools?	13:15
rogpeppe1	frankban: most users will never use sync-tools, i think	13:15
hazmat	rogpeppe1, there's no removing them.. i delete all the instances in the aws console for example.	13:15
rogpeppe1	hazmat: well, in general there is no way to tell if an environment is non-functional because it hasn't been bootstrapped or because it is just not working	13:16
rogpeppe1	hazmat: in this case, we want to bootstrap if the env hasn't been bootstrapped	13:17
rogpeppe1	hazmat: if you've deleted all the instances, the environment has still (logically) been bootstrapped - it's just highly non-functional...	13:17
* frankban lunches		13:21
TheMue	rogpeppe1: ping	13:44
rogpeppe1	TheMue: pong	13:45
TheMue	rogpeppe1: you wrote in your review that the connection has to be closed too	13:45
rogpeppe1	TheMue: that's fwereade's suggestion yes.	13:46
rogpeppe1	TheMue: personally i think it's a risky thing to do	13:46
TheMue	rogpeppe1: so is it ok to explicitly kill the srv in the root too inside of root.Kill()?	13:46
rogpeppe1	TheMue: not really	13:46
rogpeppe1	TheMue: you should probably just close the underlying connection	13:47
rogpeppe1	TheMue: (you'll need to actually pass it around - it's not currently available in the right place)	13:47
TheMue	rogpeppe1: what would then happen to the other users/holders of the connection?	13:48
rogpeppe1	TheMue: there's only one - the agent at the other end	13:49
TheMue	rogpeppe1: I meant differently, there are references to the conn inside the initial root (if I see it correctly). how will it behave if I close the connection?	13:50
rogpeppe1	TheMue: it should all just shut down in an orderly fashion	13:50
rogpeppe1	TheMue: i don't think there's any other way to drop the connection	13:51
TheMue	rogpeppe1: so in order to shut it down close the conn instead of shut it down to close the conn?	13:51
TheMue	rogpeppe1: so I have to find a nice way how to pass the conn to where I need it	13:52
TheMue	rogpeppe1: ah, found one	13:54
rogpeppe1	TheMue: i don't think you can close the rpc.Conn, BTW	13:55
rogpeppe1	TheMue: although... maybe it might work	13:56
rogpeppe1	TheMue: in fact, that's the right thing to do	13:57
TheMue	rogpeppe1: hmm? to close or not to close, that's the question	13:59
rogpeppe1	TheMue: if you do close, you'll need to do it asynchronously	13:59
rogpeppe1	TheMue: if you're going to close the connection, i think the right thing to do is just close it. that will take care of killing the relevant pinger	14:01
rogpeppe1	TheMue: hmm, except that by default we don't want to kill the pinger, we'll just stop it	14:01
TheMue	rogpeppe1: and what do you mean with async?	14:01
rogpeppe1	TheMue: rpc.Conn.Close blocks until all requests have completed	14:02
rogpeppe1	TheMue: if you call it within a request, you'll deadlock	14:02
rogpeppe1	TheMue: hmm, except in fact it'll be called in separate goroutine anyway, so it might work ok	14:02
TheMue	rogpeppe1: ah, ic	14:02
* TheMue dislikes terms like "should" or "might" ;)		14:03
* rogpeppe1 goes to check that Pinger.Kill followed by Pinger.Stop will work		14:04
rogpeppe1	TheMue: the "might" comes from the fact that you'll have to make sure that a request can't block itself on the timeout goroutine because the timeout goroutine is trying to close the connection	14:05
dimitern	fwereade, https://codereview.appspot.com/25080043/ updated	14:05
rogpeppe1	TheMue: it's probably best just to do go conn.Close() tbh	14:06
dimitern	fwereade, I'll do some live upgrade testing later today	14:06
TheMue	rogpeppe1: ok, thanks, will take that approach	14:07
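A toy illustration of why `go conn.Close()` is suggested when the close is triggered from inside a request. The `conn` type below is a hypothetical stand-in, not the real rpc.Conn: its `Close` waits for all in-flight requests, as rogpeppe1 describes, so calling it synchronously from within a request would wait on that very request.

```go
package main

import "sync"

// conn is a hypothetical connection whose Close blocks until every
// in-flight request has finished.
type conn struct {
	inflight sync.WaitGroup
	once     sync.Once
	closed   chan struct{}
}

func newConn() *conn { return &conn{closed: make(chan struct{})} }

// Close waits for in-flight requests, then marks the conn closed.
func (c *conn) Close() {
	c.inflight.Wait()
	c.once.Do(func() { close(c.closed) })
}

// serve runs handler as one in-flight request.
func (c *conn) serve(handler func()) {
	c.inflight.Add(1)
	defer c.inflight.Done()
	handler()
}

// killFromRequest closes the connection from inside a request.
// Calling c.Close() directly in the handler would deadlock, because
// Close would wait for the request that is calling it; starting it in
// its own goroutine lets the request return first.
func killFromRequest(c *conn) {
	c.serve(func() {
		go c.Close()
	})
}
```

After `killFromRequest(c)` returns, receiving from `c.closed` unblocks once the background `Close` has completed.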
fwereade	jam, mgz: so, when I bzr annotate, and I see a number like "1982.5.6"... how do I turn that into an actual revision on trunk?	14:10
jam	fwereade: you mean when it was merged?	14:10
jam	you could do "bzr log -r 1982.5.6..-1"	14:10
jam	or use bzr qannotate	14:10
jam	(apt-get install qbzr)	14:10
jam	which shows that stuff in the log	14:10
fwereade	jam, thanks	14:11
mgz	fwereade: also, `bzr log -rmainline:1982.5.6`	14:13
fwereade	mgz, thanks also :)	14:15
jam	fwereade: the thing I really like about qbzr is it is pretty easy to jump around, so you can see what rev modified a file, and then quickly see what it looked like before that change	14:17
jam	but I see how for --force you really just want to see the mainline commits to merge it back to 1.16	14:17
rogpeppe1	mgz: interesting. what does the "mainline:" in there make a difference?	14:18
rogpeppe1	s/what/why/	14:18
fwereade	rogpeppe1, I think I'm being stupid -- can you explain how https://codereview.appspot.com/14619045/ precipitated the change in jujud/machine_test.go ? because I can fix my 1.16 problem by adding JobManageState, and I can see why it's necessary -- but I can't figure out what in the minimal set of branches necessary to get destroy-machine --force might have actually triggered it	14:19
jam	rogpeppe1: "bzr log -r mainline:X" logs the first mainline revision (no dots) that includes the revision in question	14:20
jam	if you do a range like "bzr log -r 1982.5.6..-1" you'll be able to see which one it is	14:20
jam	but the "mainline:" is just that rev	14:20
* fwereade waits for someone to point out something that'd be obvious to a partially-sighted and fully-intoxicated monkey		14:21
rogpeppe1	jam: so that signifies something to log in particular, or is that something that's useful anywhere a revno can be used?	14:21
jam	rogpeppe1: should be anywhere a revno can be used	14:21
jam	diff, etc	14:21
jam	it is just for "-r"	14:21
jam	fwereade: so the change for Uniter to use the cached addresses from state	14:22
jam	means that Uniter.APIAddresses	14:22
jam	needs to have a JobManageState somewhere	14:22
jam	to report its IP address	14:22
jam	fwereade: the general change to state/api/common/addresser.	14:22
jam	fwereade: does that make sense?	14:22
rogpeppe1	jam: so can 1982.5.6 actually specify a different revision to mainline:1982.5.6 ?	14:22
jam	rogpeppe1: 1982.5.6 is the revision itself, mainline:1982.5.6 is the revision which merged that revision into trunk	14:23
jam	rogpeppe1: "try it" ?	14:23
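The distinction jam describes between a dotted revno and its "mainline:" form can be sketched as below. This is only a toy model, not how bzr stores history: the MAINLINE table, its revnos, and the mainline_of helper are all made up for illustration.

```python
# Hypothetical model of a bzr-style history: each mainline revision lists the
# dotted revisions it merged. The revnos below are invented for illustration.
MAINLINE = [
    ("1982", []),
    ("1983", ["1982.4.1"]),
    ("1984", ["1982.5.5", "1982.5.6"]),
    ("1985", []),
]


def mainline_of(revno, mainline=MAINLINE):
    """Return the first mainline revno that is, or merged, `revno`."""
    for main_revno, merged in mainline:
        if revno == main_revno or revno in merged:
            return main_revno
    raise KeyError(revno)


print(mainline_of("1982.5.6"))  # the mainline revision that merged it
print(mainline_of("1983"))      # in this sketch a mainline revno maps to itself
```

In this model "1982.5.6" names the revision itself, while mainline_of("1982.5.6") names the trunk revision that brought it in, which is the one you would want when backporting a merge.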
rogpeppe1	fwereade: it's necessary because one of the agents started by TestManageEnviron calls State.Addresses	14:24
rogpeppe1	fwereade: i can't quite remember the details of which	14:24
rogpeppe1	fwereade: when you say "triggered it" what are you referring to?	14:25
jam	rogpeppe1: he is trying to backport his destroy machine --force, and it seems it is carrying with it some unexpected baggage	14:25
fwereade	rogpeppe1, I cherrypicked a few unrelated branches and got those JobManageEnviron tests failing, and I can't figure out why -- apart from that, obviously, it can't work as written, and does work if it also runs JobManageState	14:26
* rogpeppe1 goes to pull 1.16		14:26
fwereade	rogpeppe1, sorry, I don't want to properly distract you -- if there's no immediate "oh yeah that was weird" that springs to mind I'll keep poking happily	14:28
rogpeppe1	fwereade: if none of the new addressing stuff made it into 1.16, it's weird that this is happening	14:29
rogpeppe1	fwereade: i.e. that adding JobManageState fixes anything	14:29
fwereade	rogpeppe1, yeah, indeed	14:30
abentley	I am trying to destroy an azure environment and failing: http://pastebin.ubuntu.com/6411031/	14:54
fwereade	abentley, consistent?	14:57
abentley	fwereade: Yes.	14:57
abentley	fwereade: using 1.16.3, but it may have been bootstrapped with 1.17.x	14:58
fwereade	abentley, I don't know azure at all really, but is it possible you've got some instances still around? or running under the same account? iirc the failing bit is one of the last steps at teardown time	14:59
abentley	fwereade: I don't know much about azure either. I've just been using it through juju.	14:59
fwereade	jam, do you recall, did natefinch do any of the azure stuff?	15:00
rogpeppe1	lunch	15:44
jam	fwereade: natefinch has worked with azure, and has done our Windows builds	15:51
fwereade	jam, thought so -- he's coming back later, right?	15:52
jam	fwereade: I believe so.	15:52
jam	abentley: I know that instance teardown is particularly bad there	15:52
jam	gossip says it takes minutes to tear down one machine, and deleting machines is protected by a single lock	15:52
abentley	jam: Yes, I've witnessed the slow teardown myself.	15:53
jcsackett	sinzui or abentley: either of you have time to review https://code.launchpad.net/~jcsackett/charmworld/better-latency-round-2/+merge/195091 ?	16:18
abentley	jcsackett: sure.	16:18
jcsackett	abentley: thanks.	16:18
* fwereade needs to stop for a while, will probably be back to say hi to those in the antipodes at least		16:55
dimitern	fwereade, is https://codereview.appspot.com/25080043/ good to land?	17:40
sinzui	CI found a critical regression, bug #1250974. I'll talk to wallyworld_ when he comes online about it.	18:40
_mup_	Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>	18:40
sinzui	abentley, ^ do you want to revise the details I put in the description	18:47
* rogpeppe1 is done		18:50
abentley	sinzui: done. (just swapped 2052 to 2053 at the end).	19:08
sinzui	thank you	19:09
natefinch	Three months on Ubuntu and I only just now realized that (0:08) next to my battery means it'll be charged in 8 minutes, not that it thinks there's only 8 minutes of charge left :/	20:28
hazmat	anybody know golang internals? we're debugging some reflection issues in gccgo	20:40
hazmat	natefinch, how's the xps15 treating you?	20:48
hazmat	fwereade, i know the azure stuff, what's up?	20:48
* hazmat reads abentley's traceback		20:49
hazmat	abentley, so the azure provider is basically synchronous and very careful about cleaning up after itself	20:49
natefinch	hazmat: so far... some problems. The hardware is awesome, but Ubuntu is having some problems... trying to get bumblebee working to enable optimus so it'll use the NVidia GPU instead of just the built-in one on the Intel chipset.	20:49
hazmat	natefinch, how's the keyboard?	20:50
* hazmat has nightmares about old dell laptop keyboards		20:50
abentley	hazmat: That's good to know, but it did not succeed in this case.	20:50
hazmat	abentley, two things.. you can log into the console (or use the nodejs cli tools) to verify you have no machines running, and then try running destroy again with --debug	20:50
natefinch	hazmat: not terrible... it's standard size... takes a little getting used to it. but I can still generally type without looking. Of course, stuff like home end etc is moved around some.	20:51
abentley	hazmat: But this may be an Azure bogosity. As far as sinzui and I can tell, it is impossible to delete the network. We have tried from the Azure web console, and though nothing is using it, Azure says it can't be deleted because things are using it.	20:51
hazmat	abentley, normally the azure provider does some polling against the operation event stream	20:51
hazmat	hmm	20:51
hazmat	abentley, it's worked in the past but there have been changes on both ends	20:51
hazmat	it's been about 1.5 m since i last ran the azure provider..	20:52
abentley	hazmat: But even the web console doesn't work, so I'm inclined not to blame the azure provider.	20:52
hazmat	abentley, fair enough.. there are lots of resources not necessarily exposed in the ui.. so both you and sinzui had this issue?	20:52
abentley	hazmat: We are using the same subscription, so we have the same set of resources.	20:53
hazmat	abentley, aha	20:53
hazmat	abentley, but you're using different env names and controls and networks?	20:53
hazmat	abentley, i dunno that cross env sharing of resources is going to work so well	20:54
abentley	hazmat: Yes.	20:54
abentley	hazmat: We're both doing fairly limited testing, with different environment names, so it doesn't seem likely that we'll exhaust each others' resources.	20:54
abentley	Or otherwise trip on each others' feet.	20:55
hazmat	abentley, well...	20:57
hazmat	abentley, so everything in your azure provider section is different?	20:58
=== BradCrittenden is now known as bac
abentley	hazmat: No, storage-account-name, management-subscription-id, management-certificate-path are the same. Not sure about admin-secret.	21:00
abentley	hazmat: But I don't see how that's relevant to the unkillable network.	21:01
* hazmat logs into azure console		21:04
hazmat	abentley, does it show networks == 0 ?	21:06
hazmat	in the console	21:06
hazmat	abentley, the azure provider does some stuff with networking (interlink between services) which isn't represented in the console	21:07
hazmat	or the api really, just raw xml	21:07
abentley	hazmat: No, it shows networks == 2.	21:07
hazmat	abentley, so possibly you have interlinks between the services in two different environments	21:08
hazmat	which would explain why neither can be deleted	21:08
hazmat	just guessing though.. easy to reproduce if that's the case	21:09
abentley	hazmat: That can't be the cause, because the second network was created when we found we couldn't destroy the first network.	21:10
abentley	hazmat: i.e. the problem pre-dated the second network.	21:11
hazmat	abentley, hmm..	21:12
hazmat	abentley, okay.. let me create and destroy an env.. which version are you using?	21:12
hazmat	of juju	21:12
abentley	1.16.3	21:12
hazmat	k, i'm trying with trunk	21:14
hazmat	we should really default the region to the same one the imagestream refs	21:16
hazmat	ie East US	21:16
abentley	hazmat: sinzui reports he's had a lot of trouble with East.	21:19
hazmat	abentley, but simplestreams metadata refs images there.. how do you get around the affinity group otherwise?	21:20
hazmat	abentley, ie https://bugs.launchpad.net/juju-core/+bug/1251025	21:20
_mup_	Bug #1251025: azure provider sample config should default to East US <juju-core:New> <https://launchpad.net/bugs/1251025>	21:20
abentley	hazmat: All I know is that it works.	21:20
natefinch	hazmat: bwahaha, finally got it working (mostly user error I think). 24" 1920x1200, 30" 2560x1600, 15.6" 3200x1800 all running smoothly.	21:31
abentley	hazmat: I've run destroy-environment and we're back to just 1 network.	21:34
abentley	hazmat: And it still fails to delete.	21:35
hazmat	abentley, the azure provider doesn't provide very much log output..	21:36
hazmat	abentley, created and destroyed env without issue here.	21:37
hazmat	abentley, by chance you know what was deployed in the first env.. just the ci test of wordpress/mysql ?	21:37
abentley	hazmat: Yes, that's what it was.	21:37
hazmat	there's like this 3 minute pause during bootstrap with no info given to why	21:39
jcsackett	sinzui: do you have time to look at a one line MP? https://code.launchpad.net/~jcsackett/charms/precise/charmworld/fix-lp-creds-ini-override/+merge/195146	21:47
sinzui	I do	21:47
sinzui	jcsackett, r=me	21:48
jcsackett	sinzui: thanks.	21:49
wallyworld_	sinzui: hi there. is there another upgrade bug? :-(	21:50
sinzui	wallyworld_, yes. take your time and read though it and maybe the log: https://bugs.launchpad.net/juju-core/+bug/1250974	21:52
_mup_	Bug #1250974: upgrade to 1.17.0 fails <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1250974>	21:52
wallyworld_	ok. i used the same deprecation mechanism as for public bucket url so i'll have to see why that's not working here	21:53
sinzui	wallyworld_, maybe it is how we test	21:54
sinzui	wallyworld_, abentley hp cloud is not failing. are the configs different?	21:54
wallyworld_	interesting. hp cloud does have a slightly different config boilerplate written out for it via juju init	21:55
wallyworld_	but that's just some extra comments in the yaml afaik	21:55
sinzui	wallyworld_, I only speculated as to the order of events that could lead to testing/ being ignored. I think the 1.16.3 tools were selected from the testing location, but by the moment of the upgrade, testing/ was no longer known	21:56
abentley	sinzui: I haven't tested in a way that would show hp failing. We test canonistack before hp, so we never bother to check whether hp is failing.	21:56
abentley	sinzui: Because we already know it's a fail.	21:56
abentley	sinzui: As far as I know, everything except "provider" and "floating-ip" is configured differently between hp and canonistack.	21:58
wallyworld_	sinzui: it could have something to do with the jenv stuff - that is relatively new and has been evolving since the public bucket url deprecation mechanism last worked	21:58
sinzui	abentley, if we don't use tools-metadata-url in the config, does it work?	21:58
abentley	sinzui: Gotta go, but I'll check that first thing tomorrow.	21:59
sinzui	I suspect it does. I think it is not possible to upgrade with a config that users are likely to have if they read the release notes or just respond to what juju is saying	21:59
sinzui	thanks abentley	21:59
sinzui	wallyworld_, per what I asked of abentley ^ I think the issue with upgrades is mixed configs.	22:01
wallyworld_	as in they put tools-metadata-url in their new 1.17 env config and upgrade a 1.16 which didn't have it?	22:02
wallyworld_	actually, if the jenv files are still present, i think any changes to the env yaml are ignored	22:04
wallyworld_	i'm not 100% sure, but i've had to delete the jenv files previously if i wanted to introduce new config	22:04
wallyworld_	not very intuitive if you ask me	22:04
wallyworld_	so perhaps the user reads the release notes, edits their env yaml to add tools-metadata-url, but it is ignored because the jenv file is there and the yaml is ignored?	22:06
wallyworld_	i'll have to check with the folks who did the jenv stuff, or read the code	22:06
sinzui	wallyworld_, I want the old config to just work for users to upgrade. If users have a perfect new config it should work of course. In the case of a config with old and new values, we need to be certain we honour them. juju servers will be different than juju clients, so the configs need to work for all cases	22:07
wallyworld_	sinzui: agreed. that's what the current code in 1.17 does - it sees tools-url, logs a warning, and sets tools-metadata-url to the tools-url value. that's how the public bucket url deprecation worked also. so i'm not sure what's happening to make it fail	22:08
sinzui	wallyworld_, if we decided to fix bug 1247232, we might be able to be strict about what is in the config	22:08
_mup_	Bug #1247232: Juju client deploys agent newer than itself <ci> <deploy> <juju-core:Triaged> <https://launchpad.net/bugs/1247232>	22:08
sinzui	wallyworld_, how does the old bootstrap node get the new tool? It looks like it is searching for it. since it doesn't know about tools-metadata-url, it cannot find it?	22:09
sinzui	Shouldn't the client tell the server the exact version and location to use since it had to do the lookups?	22:10
wallyworld_	sinzui: i'm not entirely familiar with the upgrade workflow - what data is passed to where etc. but that would seem sensible	22:11
wallyworld_	sinzui: one thing i can think of - i remove the old tools-url from the config struct that is parsed from the yaml once the new tools-metadata-url is set. perhaps that data is being sent to the old node which then doesn't see a tools-url it recognises?	22:12
sinzui	wallyworld_, The log implies the server has a new config with 1.16.3. it doesn't have a tools-url set	22:13
* sinzui wishes zless shows line numbers		22:13
wallyworld_	sinzui: so it seems perhaps that my assumption that the tools-url could be deleted from the in memory config struct once tools-metadata-url is set is wrong, since that data ends up being passed to the older 1.16 nodes. that surprises me because i wasn't aware that would happen	22:16
sinzui	wallyworld_, about line 2674 of the log, I see the last known occurrence of /testing. After we see that config we see juju search for the new tool, in the wrong location	22:16
sinzui	wallyworld_, I think you mean the value is cleared. I believe axw reported a bug that it is not possible to delete a config key.	22:18
sinzui	And maybe that has helped insulate us from upgrade issues	22:19
wallyworld_	sinzui: right, yes. i clear the tools-url from the map of config values held by the config struct when the yaml is parsed	22:19
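The deprecation handling wallyworld_ describes (see tools-url, warn, copy the value to tools-metadata-url, then clear the old key) can be sketched roughly as follows. This is an illustration only, not the juju-core code: the migrate_tools_url function, its keep_old flag, and the rule that an explicitly set new key wins are assumptions; only the two key names come from the discussion.

```python
import logging

logger = logging.getLogger("config-sketch")

# Key names are taken from the discussion; everything else is hypothetical.
OLD_KEY = "tools-url"
NEW_KEY = "tools-metadata-url"


def migrate_tools_url(attrs, keep_old=False):
    """Return a copy of attrs with the deprecated key mapped onto the new one."""
    result = dict(attrs)
    old = result.get(OLD_KEY)
    if old:
        logger.warning("%s is deprecated, use %s", OLD_KEY, NEW_KEY)
        # Assumption: an explicitly set new key takes precedence over the old one.
        if not result.get(NEW_KEY):
            result[NEW_KEY] = old
        if not keep_old:
            # Clearing the old key means a reader that only understands
            # tools-url finds nothing in the migrated config.
            del result[OLD_KEY]
    return result


print(migrate_tools_url({"tools-url": "http://example.com/tools"}))
print(migrate_tools_url({"tools-url": "http://example.com/tools"}, keep_old=True))
```

The keep_old variant shows the alternative raised in the channel: leaving both keys in place so that an older node receiving the config can still find the value under the name it knows.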
wallyworld_	so, thinking out loud, if the wrong tools location is being used, perhaps it's the new 1.17 nodes not having a config with tools-metadata-url set	22:21
wallyworld_	sinzui: so that log file just has warnings logged - it would be nice to see debug so we can see the simplestreams search path used	22:23
wallyworld_	i'll see if i can do a test to get that happening	22:24
sinzui	wallyworld_, we see all the paths being checked at 2735 and the entries say DEBUG	22:25
wallyworld_	sinzui: i am stupid - i was looking at the console log from the link in the description. i didn't see the attachment	22:26
sinzui	wallyworld_, no need to think ill of yourself. Aaron and I also made the same mistake	22:26
wallyworld_	maybe i need more coffee	22:27
sinzui	wallyworld_, we are a day or two away from being able to run an arbitrary branch + rev + cloud to do a single test. Aaron hacked two runs to replay the last success and first fail today	22:28
wallyworld_	sinzui: i'll go through the log and figure out wtf is happening and hopefully have a fix for you. sorry about the bug. i suspect it is due to slightly new config workflow interfering with it but am not sure	22:28
sinzui	wallyworld_, thank you. Take your time to do a proper fix	22:30
wallyworld_	will do. there's been a bit of churn under the covers that i'm not involved with so old assumptions maybe don't apply anymore. i think i'll have to do a full upgrade test by hand to validate	22:31
sinzui	davecheney, I don't have a short list of bugs to address. I see http://tinyurl.com/juju-stakeholders and https://launchpad.net/juju-core/+milestone/1.17.0 as bugs we want to help fix. There are so many that I think we can just work on those that we can confidently fix quickly	23:24
davecheney	sinzui: ack	23:27
davecheney	sinzui: work is ongoing to get juju building under gccgo	23:27
davecheney	i should find some time convenient to your team to break the bad news about this extra dimension of the testing matrix	23:28
sinzui	davecheney, I saw that. I had an unexpected event last evening. I had thought I would have time to ask you about that. Did I deploy still-born juju-core arm packages for 1.16.3?	23:30
sinzui	s/deploy/release/	23:30
davecheney	sinzui: not really sure	23:31
davecheney	does anyone use those packages ?	23:31
sinzui	I suspect not since the test suite doesn't work.	23:33
sinzui	We only CI on amd64. I sign tarballs that pass on amd64	23:33
davecheney	sinzui: do you have an example failure ?	23:36
davecheney	that sounds like the sort of thing that is in my bailiwick to fix	23:36
sinzui	I don't. I saw a bug report from michael hudson and saw you comment on it	23:37
davecheney	sinzui: yeah, we have a fix	23:54
davecheney	need it to be merged on the mgo repo	23:54
sinzui	davecheney, speaking of mgo, does the test suite always pass for you? I often see mgo failures. they are common for me now, but were rare back in september	23:55
davecheney	sinzui: i haven't run that test suite even once in the last year	23:57
davecheney	is there a jenkins job for it ?	23:57
davecheney	please assign any failure reports to me	23:57
