thumper | o/ wallyworld_ | 00:31 |
---|---|---|
wallyworld_ | hi | 00:31 |
thumper | wallyworld_: how's tricks? | 00:31 |
wallyworld_ | ok, just typing stuff into shietveld | 00:31 |
thumper | :) | 00:31 |
wallyworld_ | for william's branch | 00:31 |
thumper | wallyworld_: have you looked at fwereade's branch or shall I do it? | 00:31 |
wallyworld_ | just finishing up now | 00:31 |
wallyworld_ | i have a branch of my own that i have tweaked that you could look at | 00:32 |
thumper | sure | 00:32 |
wallyworld_ | cool :-) | 00:32 |
thumper | I have another for the kvm prelude too | 00:32 |
wallyworld_ | ok | 00:32 |
wallyworld_ | will look after i finish current one | 00:32 |
wallyworld_ | faaark. just lost all my comments in the review :-( paste buffer error :-( | 00:34 |
wallyworld_ | cause some kiwi distracted me | 00:34 |
* thumper looks around for that nasty kiwi | 00:36 | |
* thumper can't see anyone that matches that description | 00:36 | |
wallyworld_ | thumper needs a mirror :-P | 00:37 |
thumper | wallyworld_: is it the supported containers that needs review? | 00:38 |
wallyworld_ | yeah | 00:38 |
wallyworld_ | thumper: i knew i would get pushback on nil vs empty. sigh | 00:38 |
thumper | heh | 00:38 |
thumper | I think that [instance.NONE] is pretty clear for supported containers | 00:39 |
thumper | but I do agree | 00:39 |
thumper | in general | 00:39 |
thumper | that empty != nil | 00:39 |
wallyworld_ | well it's true in python, java, c++ etc etc | 00:39 |
wallyworld_ | everything except Go it seems | 00:39 |
thumper | well, they are different in go too | 00:40 |
thumper | but some people treat them the same | 00:40 |
wallyworld_ | well, sometimes | 00:40 |
thumper | nil != [] | 00:40 |
wallyworld_ | sure | 00:40 |
wallyworld_ | but you can append and iterate using a nil slice | 00:40 |
wallyworld_ | and it behaves just like empty | 00:40 |
thumper | yeah... | 00:40 |
wallyworld_ | and marshalling stuffs it up | 00:40 |
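
For reference, the behaviour under discussion in plain Go: `append` and `range` treat nil and empty slices identically, but marshalling does not.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var nilSlice []string    // nil
	emptySlice := []string{} // empty but non-nil

	// append and range treat the two identically:
	nilSlice = append(nilSlice, "x")
	for _, s := range nilSlice {
		fmt.Println(s) // "x"
	}
	fmt.Println(len(emptySlice)) // 0, same as a nil slice

	// ...but marshalling does not:
	a, _ := json.Marshal([]string(nil)) // null
	b, _ := json.Marshal([]string{})    // []
	fmt.Printf("%s vs %s\n", a, b)      // "null vs []"
}
```
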
wallyworld_ | i do wonder what crack the language designers were smoking | 00:41 |
wallyworld_ | not only in this instance | 00:41 |
thumper | someone making opinionated decisions that work *most* of the time | 00:42 |
thumper | I can see it from their point of view | 00:42 |
wallyworld_ | opinionated decisions is ok if they are sensible | 00:45 |
wallyworld_ | but throwing out decades of established semantic meaning is not a good decision | 00:45 |
wallyworld_ | thumper: so when you have finished +1'ing my branch :-) do you want a quick hangout? | 00:52 |
thumper | sure for the hangout, not convinced it'll get a +1 yet :P | 00:53 |
wallyworld_ | ok, hangout first perhaps | 00:53 |
wallyworld_ | https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig | 00:53 |
wallyworld_ | when you are ready | 00:53 |
thumper | wallyworld_: can you not hear me? | 01:03 |
wallyworld_ | no | 01:03 |
thumper | I can see bars when I talk | 01:03 |
wallyworld_ | i'll try a new hangout | 01:03 |
wallyworld_ | https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.rmi4baa0vuu8vp4qs7eb2pd41g | 01:04 |
wallyworld_ | thumper: ? | 01:05 |
* thumper goes on a quick dog walk and school run | 01:54 | |
wallyworld_ | thumper: ping for when you return | 02:25 |
thumper | return of the thumper | 02:40 |
jam | morning thumper and wallyworld_ | 02:48 |
wallyworld_ | yello | 02:48 |
wallyworld_ | thumper: i am sad that the provisioner code is full of switch statements - if environ do this else if lxc do that. what sort of non-oo is that? | 02:49 |
thumper | wallyworld_: it is not full | 02:50 |
thumper | there are a few switches in the factory methods | 02:50 |
wallyworld_ | and the loop | 02:51 |
wallyworld_ | but ok | 02:51 |
thumper | the vast majority of the provisioning logic is reused | 02:52 |
thumper | geez | 02:52 |
wallyworld_ | alright :-) | 02:54 |
jam | thumper: is it possible to pull the bits that aren't out into an interface rather than using a switch ? | 02:55 |
jam | (wallyworld isn't the first to see it and wonder) | 02:55 |
jam | which probably means he won't be the last | 02:56 |
wallyworld_ | jam: an interface is what i was hoping for | 02:56 |
thumper | probably | 02:59 |
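
A hedged sketch of what that could look like -- hypothetical names, not the actual provisioner API -- where each machine kind satisfies one interface so the provisioning loop never switches on environ-vs-lxc:

```go
// Hypothetical sketch: the provisioning loop depends only on Broker;
// the single remaining switch lives in the factory.
package provisioner

type Broker interface {
	StartInstance(machineId string) error
	StopInstances(machineIds []string) error
}

type environBroker struct{}

func (environBroker) StartInstance(id string) error    { return nil } // provider API call goes here
func (environBroker) StopInstances(ids []string) error { return nil }

type lxcBroker struct{}

func (lxcBroker) StartInstance(id string) error    { return nil } // lxc-specific logic goes here
func (lxcBroker) StopInstances(ids []string) error { return nil }

func newBroker(containerType string) Broker {
	switch containerType {
	case "lxc":
		return lxcBroker{}
	default:
		return environBroker{}
	}
}
```
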
jam | thumper: fwiw, your container-interface bounced which caused your followup to also get blocked. Not sure if you saw it. The failure doesn't look like a flakey test immediately to me. | 03:00 |
thumper | ok | 03:09 |
thumper | jam: I'll take a look shortly, so much on... | 03:10 |
thumper | jam: that is interesting, because the tests all passed locally | 03:13 |
thumper | jam: I'm going to mark approved to see if it is intermittent like I think it is | 03:13 |
thumper | jam, wallyworld_: yes it may well make sense to split that shit out into an interface | 03:14 |
thumper | I probably should have thought of it at the time, but obviously didn't :) | 03:14 |
thumper | I will leave it as an exercise for the reader (wallyworld_), as a prereq | 03:14 |
wallyworld_ | \o/ | 03:22 |
wallyworld_ | not | 03:22 |
thumper | wallyworld_: aw, c'mon, you love this shit | 03:25 |
wallyworld_ | sometimes | 03:25 |
thumper | jam: yes, it was an intermittent test failure :( | 03:29 |
hazmat | is this normal.. not found error on https://streams.canonical.com/juju/tools/streams/v1/index.json w/ trunk | 03:43 |
hazmat | looks like some of the cli flags have been changing around.. | 03:49 |
hazmat | breaking existing scripts | 03:49 |
bigjools | thumper: mind your language, it's a family channel | 03:51 |
thumper | :P | 03:51 |
hazmat | hmm.. does trunk work? | 03:58 |
hazmat | trying to bootstrap right after destroy-env.. http://pastebin.ubuntu.com/6403355/ | 03:58 |
hazmat | if i leave off the upload-tools it hangs indefinitely.. | 03:59 |
hazmat | hmm... but if i do a bootstrap without upload-tools after upload-tools it succeeds. | 04:01 |
hazmat | sigh.. new control bucket and everything works fine. | 04:09 |
thumper | wallyworld_: plz help hazmat :) | 04:13 |
wallyworld_ | hmm? | 04:13 |
wallyworld_ | new control bucket and it works sounds like an s3 issue | 04:15 |
wallyworld_ | error says tools tarball could not be uploaded to s3 bucket | 04:15 |
wallyworld_ | i'm not sure what juju related issue that might be | 04:15 |
wallyworld_ | i wonder how many times such an error has come up | 04:16 |
hazmat | wallyworld_, here's another one.. right before that.. where bootstrap hangs forever.. http://pastebin.ubuntu.com/6403423/ | 04:17 |
hazmat | well 10m, before i killed it. | 04:17 |
hazmat | changing to a new bucket and things seem to work again. | 04:17 |
hazmat | wallyworld_, the odd part is i was able to upload tools in a subsequent run to that same bucket before changing buckets. | 04:17 |
wallyworld_ | hazmat: the logs just make it seem like juju asks ec2 to do something (eg write data to a bucket, send data to an instance) and ec2 says no | 04:19 |
wallyworld_ | are we sure it's not just transient ec2 weirdness? | 04:19 |
hazmat | wallyworld_, dunno, i couldn't reach the aws s3 web console.. to poke around more.. although i can hit it with the cli tools | 04:20 |
wallyworld_ | the stuff that's logged looks like one would expect on the surface eg looking for tools, not finding them, looking elsewhere etc | 04:21 |
hazmat | wallyworld_, so we end up hitting about 12 urls on bootstrap (4 urls x 3 times) is that normal? | 04:21 |
wallyworld_ | we look for tools in control bucket, streams.canonical.com, and for each of those, we look for signed and unsigned metadata | 04:22 |
wallyworld_ | and we try once initially, then upload if needed, then look again | 04:22 |
wallyworld_ | so there are indeed several urls we hit | 04:22 |
wallyworld_ | and we also look for tools mirrors | 04:23 |
wallyworld_ | signed and unsigned | 04:23 |
wallyworld_ | the debug logs can therefore be quite noisy | 04:23 |
hazmat | ic | 04:24 |
wallyworld_ | it can be distracting | 04:26 |
wallyworld_ | but the debug is needed if things go wrong | 04:26 |
wallyworld_ | eg i put tools or image metadata somewhere and it is not used, does juju even look where i think it is looking | 04:27 |
=== axw_ is now known as axw | |
axw | wallyworld_: any tips on getting insight into what mgo/txn is doing? I've added a new Assert-only (no Insert/Update) op, doesn't seem to be doing anything | 06:24 |
axw | or jam, or thumper... | 06:25 |
wallyworld_ | axw: i've only ever used println etc, i'm not sure what tools there are | 06:25 |
jam | axw: I think the only master of txn is Wallyworld, but I have some understanding of how it works. | 06:25 |
wallyworld_ | so i print what is there in the db before the call, and again after | 06:26 |
axw | mmk. I kinda need to know what's going on inside mongo I think :/ | 06:26 |
wallyworld_ | what are you trying to do? | 06:26 |
axw | stop service/unit/machine adds if environment's (new) life field is not Alive | 06:26 |
wallyworld_ | there are a few notAlive asserts you can copy from i think | 06:27 |
axw | yeah, I've been referring to them | 06:27 |
axw | it's a bit more complex, because this one has to cater for non-existent life field too | 06:27 |
wallyworld_ | you could try making the assert deliberately fail | 06:28 |
axw | also, it's on a foreign doc | 06:28 |
wallyworld_ | ah | 06:28 |
axw | ah yeah, good point | 06:28 |
wallyworld_ | bear in mind, all the asserts are evaluated up front | 06:28 |
wallyworld_ | so if you have a slice of txns | 06:28 |
wallyworld_ | an operation early on will not cause an assert later on to fail | 06:28 |
axw | that's fine, that's what I want | 06:29 |
wallyworld_ | it sucks balls actually | 06:29 |
wallyworld_ | for me anyway :-) | 06:29 |
axw | well, it suffices anyway :) | 06:29 |
wallyworld_ | anyway, it's a bit of trial and error i've found | 06:29 |
wallyworld_ | start simple, with it failing initially, and build up from there | 06:30 |
axw | no worries, just thought I'd ask before I head into the hole | 06:30 |
axw | sounds good, I'll try that | 06:30 |
axw | ta | 06:30 |
wallyworld_ | good luck :-) ask if you get stuck | 06:35 |
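
For reference, the shape of the assert-only op being discussed, sketched against the mgo/txn API with illustrative collection and field names, and assuming (per axw below) that Alive is the zero value of the life field:

```go
package state

import (
	"labix.org/v2/mgo/bson"
	"labix.org/v2/mgo/txn"
)

type Life int8

const Alive Life = 0 // the zero value, per the discussion

// addUnitOps guards a unit insert on the environment still being alive.
// The first op has no Insert/Update/Remove: it only asserts, and the
// whole transaction aborts if the assertion fails. Note that all
// asserts are evaluated up front, before any op is applied.
func addUnitOps(envID, unitName string, unitDoc interface{}) []txn.Op {
	return []txn.Op{{
		C:      "environments",
		Id:     envID,
		Assert: bson.D{{"life", Alive}},
	}, {
		C:      "units",
		Id:     unitName,
		Insert: unitDoc,
	}}
}
```
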
wallyworld_ | fwereade: ping? | 07:04 |
fwereade | wallyworld_, pong | 07:05 |
axw | wallyworld_: I was comparing integers with strings.. duh. Turns out I can just use the existing assertion anyway, since Alive is the zero value | 07:05 |
wallyworld_ | \o/ | 07:05 |
fwereade | wallyworld_, thanks for the review | 07:05 |
wallyworld_ | fwereade: i hope it made sense | 07:05 |
wallyworld_ | fwereade: i have started some work to delay the start up of the container provisioners until they are needed | 07:06 |
wallyworld_ | it's really my first foray into that tar pit | 07:06 |
fwereade | wallyworld_, largely, yeah, but jam's comments largely anticipate my own responses | 07:06 |
wallyworld_ | i would not be surprised if what i have splatted out is shitful | 07:07 |
wallyworld_ | i didn't realise about the life issue | 07:07 |
wallyworld_ | seems kinda confusing | 07:07 |
fwereade | wallyworld_, in particular, we can't necessarily make machines dying | 07:07 |
jam | fwereade: you seem back to work awfully soon after you signed off to go to sleep :) | 07:07 |
wallyworld_ | it sort of goes against expectations that a machine is destroyed but still "alive" | 07:08 |
fwereade | jam, ehh, I'm awake now, might slope off for another part of the day :) | 07:10 |
fwereade | wallyworld_, agreed -- sorry about the currently-crappy model | 07:11 |
wallyworld_ | fwereade: so i was hoping you could look at my shitful attempt to produce something and tell me it's all crackful and i need to start again | 07:11 |
wallyworld_ | it's just a bit of a brain dump so far | 07:11 |
fwereade | wallyworld_, it would be my pleasure ;) | 07:11 |
wallyworld_ | but i want feedback before i go any further | 07:12 |
wallyworld_ | https://code.launchpad.net/~wallyworld/juju-core/lazy-container-provisioners/+merge/194795 | 07:12 |
wallyworld_ | i'm not so sure about the watcher | 07:12 |
wallyworld_ | i know it's a bit of duplication | 07:12 |
wallyworld_ | there's similar code in other packages | 07:12 |
wallyworld_ | but i sorta want to see if it is the right thing to do first and then it can be consolidated | 07:13 |
fwereade | wallyworld_, would you lbox propose -wip so I can line-by-line it please? | 07:13 |
wallyworld_ | sure | 07:13 |
wallyworld_ | so, the idea is - when machine agent starts, do not start provisioner for containers | 07:14 |
wallyworld_ | use a watcher to see when a container is first requested | 07:14 |
wallyworld_ | and only then start the provisioner | 07:14 |
wallyworld_ | also at start up, update the machine to record the supported containers | 07:15 |
wallyworld_ | so if a user add-machine an unsupported one, it will fail | 07:15 |
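
Roughly the shape being described, with hypothetical names -- the real watcher and provisioner types in the branch may differ:

```go
// Lazy start: the container provisioner is created only once the first
// container machine is actually requested, not at agent startup.
package provisioner

import "fmt"

type ContainerWatcher interface {
	// Changes reports the ids of newly requested container machines.
	Changes() <-chan []string
}

func startWhenFirstContainerSeen(w ContainerWatcher, startProvisioner func() error) error {
	for {
		ids, ok := <-w.Changes()
		if !ok {
			return fmt.Errorf("container watcher closed")
		}
		if len(ids) > 0 {
			// First container requested: start the provisioner now.
			return startProvisioner()
		}
	}
}
```
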
wallyworld_ | fwereade: https://codereview.appspot.com/25040043 | 07:15 |
wallyworld_ | i have to go to soccer for a bit but will be back in a couple of hours | 07:15 |
axw | fwereade: I have a WIP incoming for destroy-environment changes that I'd love some feedback on too | 07:16 |
wallyworld_ | fwereade: bear in mind i've not really done much with watchers before so i'm not sure if what i've done is even correct or as per the design of watchers. i've also not run the code or anything yet either | 07:17 |
wallyworld_ | take a number :-P | 07:17 |
axw | :) | 07:17 |
wallyworld_ | i'll be away from keyboard for a bit anyway :-) | 07:17 |
rogpeppe1 | mornin' all | 07:27 |
axw | morning rogpeppe1 | 07:27 |
rogpeppe1 | axw: hi | 07:28 |
axw | fwereade: https://codereview.appspot.com/22870045/ -- the title is probably misleading, but when you have some time | 07:40 |
axw | just looking for a general in-the-right-direction or not | 07:41 |
jam | axw: so machine agents can't actually stop their own machines, right? That has to be done by the Provisioner (or the juju client) because nobody else has the provider secrets. | 07:49 |
jam | so I'm wondering if the watch needs to be a step higher as well. (Provisioner watches for env destruction, waits a bit (but probably not forever) for things to go to dying/dead and then kills the container/machine) | 07:50 |
axw | jam: I don't follow | 07:51 |
axw | there's a global flag, saying "environment is dead" | 07:51 |
axw | the machiner watches for that, and then terminates when it is set | 07:51 |
jam | axw: an important step in destroy-environment is to actually terminate the machines | 07:51 |
jam | the machiner can't terminate itself | 07:52 |
axw | jam: right, that bit is still on the CLI | 07:52 |
axw | well | 07:52 |
jam | (can't call EC2.TerminateMachine(my-instance-id)) | 07:52 |
axw | still done by the provisioner & CLI | 07:52 |
jam | axw: so if the CLI is terminating the instances, they won't have a chance to cleanup the shutdown, right? | 07:52 |
jam | and the mongodb main server is being killed, so do the watches fire? | 07:52 |
jam | I guess the question is, what waits for what and how long do we wait? | 07:52 |
jam | I would *tend* to say, if a Provisioner can Terminate a machine, there isn't much point waiting for the Machiner to tear itself down first. | 07:53 |
jam | we really like this for the manually registered machines | 07:53 |
jam | but for the rest, I go back and forth. | 07:54 |
axw | jam: still not following :) where do we wait for the machiner to tear itself down? | 07:54 |
axw | jam: mostly everything works as it does today, with the exception of machine agents being told to shut down just before we return to the CLI | 07:56 |
axw | for the manual provider, this is essential to having the state servers tear themselves down | 07:56 |
axw | for the others, it's superfluous, as the CLI will kill them anyway | 07:56 |
axw | (but left in for the sake of generality, and it doesn't really hurt) | 07:57 |
jam | axw: so if we don't wait until we know that the machiner's actually got the message, we end up killing mongo, and they'll never see the message. | 08:01 |
jam | How do we know when they've gotten the message? | 08:01 |
jam | and why bother waiting for the ones that we know we're just going to Terminate anyway | 08:01 |
jam | so *my* model would be something like "for all machines listed as manually registered, wait until the agent shows as dying, and then return to the CLI to nuke everything else" | 08:02 |
axw | jam: ehh good point, I mucked that bit up. I *had* been destroying all non-state machines previously | 08:02 |
jam | *except* the caveat from fwereade | 08:02 |
jam | which is that you can't actually set a Machine to Dying until Units on that machine are Dead | 08:03 |
jam | which has been proposed as something we should really fix in the model. | 08:03 |
jam | axw: so when would the state machines get terminated? | 08:03 |
jam | by themselves? | 08:03 |
axw | jam: yes. DestroyJuju should kill non-state machines after destroying units, and wait for them. Then set environment to Dead, and the state servers see that and kill themselves | 08:04 |
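
A hedged sketch of the agent-side half of that sequence; the names here are hypothetical, since the environment life/dead flag is exactly what this branch adds:

```go
// The machiner watches the environment document and shuts the agent
// down once the environment is no longer alive.
package machiner

import "errors"

var ErrTerminateAgent = errors.New("agent should be terminated")

type EnvironWatcher interface {
	Changes() <-chan struct{}
	Stop() error
}

type Environment interface {
	Watch() EnvironWatcher
	Dead() (bool, error)
}

func waitForEnvironDeath(env Environment) error {
	w := env.Watch()
	defer w.Stop()
	for {
		if _, ok := <-w.Changes(); !ok {
			return errors.New("environment watcher closed")
		}
		dead, err := env.Dead()
		if err != nil {
			return err
		}
		if dead {
			// The global "environment is dead" flag is set: time to go.
			return ErrTerminateAgent
		}
	}
}
```
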
jam | also, I'm not a big fan of waiting for things indefinitely, so some sort of "wait for a while for them to notice, but then go ahead and kill everything else anyway" is a fair point for me. | 08:04 |
axw | I had that in the code before, not sure why I took it out... | 08:05 |
jam | axw: so in the HA world, it is possible that only the machine agent that is running the active Provisioner will be able to Terminate machines. | 08:05 |
jam | axw: so I like your "DestroyJuju should ..." except for nodes that are Provisioned properly, I'm not sure why we should wait for them to tear down rather than just destroying them. If we have a use case, do it, but otherwise we just slow down destroy-environment. | 08:06 |
axw | jam: we can make it only for manually provisioned nodes, I was just trying to avoid special casing | 08:07 |
axw | but that's the only time it's needed | 08:07 |
jam | axw: I'd be okay seeing it in practice, but it *is* nice that today "juju destroy-environment" with 20+ machines can still return in a few seconds. | 08:07 |
axw | fair enough | 08:08 |
jam | axw: now, I could hypothesize a use case | 08:08 |
jam | like | 08:08 |
jam | when we have Storage | 08:08 |
jam | you may want to cleanly unmount the storage before terminating | 08:08 |
jam | so that your PG data is safely flushed to disk for the next round of the environment | 08:08 |
jam | or something like that | 08:08 |
jam | but it is a bit hypothetical | 08:08 |
axw | hmm yeah. I think, ideally, environ.Destroy should handle everything still (for cases where the state server is stuffed), but that sort of thing may be outside its reach | 08:10 |
jam | axw: I think whatever we do to try and make things a "nice, peaceful death" we should still keep an axe around in case it all goes badly | 08:10 |
axw | yup | 08:10 |
jam | that may become "destroy-environment --force" | 08:10 |
jam | but I'd like to make sure that normal shutdown is "fast enough" that people don't feel they should habitually supply the --force parameter. | 08:11 |
axw | yup, fair enough | 08:11 |
jam | axw: but I'm mostly just talking through ideas with you. I haven't looked at the specific code | 08:12 |
axw | easy enough | 08:12 |
jam | I'm not 100% sure what the use case / model is when you have the ability to pass out credentials to an environment that don't have the EC2 creds directly available. | 08:12 |
jam | I'm guessing that the security model says that those people aren't allowed to trigger DestroyEnvironment | 08:12 |
jam | so the fact that the client can't | 08:13 |
jam | isn't a problem | 08:13 |
jam | although... | 08:13 |
jam | hosted Juju (aka JaaS) | 08:13 |
jam | is a bit more unclear on this point. The state machines would run on hardware you *don't* control, but you should have the ability to stop the environment, start a new one, etc. I don't know what the design for that is, TBH. | 08:14 |
jam | Does the juju CLI still keep env provider creds around? | 08:14 |
jam | these are still hypotheticals, and nothing we have to get right today, but it can be useful to look at some future use cases and see if we align well enough with them. | 08:15 |
axw | jam: yeah, I'm not sure about how the JaaS story is meant to work exactly. fwereade mentioned that we'd still want to have destroy-environment behind the API (entirely) for that, probably | 08:16 |
jam | axw: I do think that one of the designs of that is the Provider creds are always secret from the user (known only inside the JaaS servers) | 08:17 |
jam | in which case, we do need all the Terminate calls to happen from the Provisioner | 08:17 |
axw | jam: actually, that should work with what I've coded (when I add destruction of non-state machines back into the code) | 08:19 |
jam | axw: I don't think it hurts to do 2 passes (one more time in the client if we have the appropriate secrets to do so), as Terminate tends to be idempotent :) | 08:20 |
axw | yup | 08:20 |
fwereade | axw, so the issue I have on first glance at the description is that waiting for all units to clean themselves up makes env destruction take orders of magnitude longer | 08:21 |
jam | fwereade: welcome back | 08:21 |
jam | you missed a small amount of discussion :) | 08:21 |
fwereade | jam, ah sorry | 08:21 |
fwereade | jam, easily precisable? | 08:21 |
axw | fwereade: heya. we can make that manual-provider specific | 08:21 |
fwereade | (I did eat some bacon though, that was nice) | 08:22 |
jam | fwereade: http://irclogs.ubuntu.com/2013/11/12/ though it appears to not be quite up to date with the last 20 minutes. | 08:22 |
jam | I would guess it polls every X minutes or something. | 08:22 |
jam | fwereade: I had the same concern | 08:23 |
jam | fwereade: and could come up with a hypothetical "clean shutdown benefits stuff like storage, or database saving their state" but it was pretty hypothetical | 08:23 |
jam | fwereade: it also brings up the idea that "we don't have to wait for Machine to be dead, we just care that Manual machines have seen the 'please shut down' message" | 08:24 |
jam | fwereade: so if we had a Machine.Dying that actually worked | 08:24 |
jam | then we could just wait for that | 08:24 |
jam | fwereade: and we have "we probably want the State machines to issue Terminate requests before returning to the CLI because we'd like to have that in a JaaS-ish world." | 08:24 |
jam | and we can deal with the ACL of who has rights to issue destroy-environment as a purely ACL issues | 08:25 |
fwereade | jam, axw: I'm sorry, I'm having some trouble following | 08:26 |
fwereade | jam, axw may have already responded to this but I was wondering yesterday whether it made sense to refuse destroy-environment requests when there exist manually provisioned machines with units on them | 08:27 |
jam | fwereade: so the question, "Do we wait for all things to cleanup nicely before we Terminate them" | 08:27 |
fwereade | jam, I don't think we should in general | 08:27 |
jam | fwereade: for Manually registered machines, we can't Terminate them, so we sort of have to | 08:27 |
jam | for the rest | 08:27 |
jam | I can hypothesize a "backup your stuff to storage we're shutting down" sort of hook | 08:27 |
fwereade | jam, cattle not pets -- but manually provisioned machines *are* more like pets so we shouldn't abuse and abandon them | 08:28 |
jam | but we don't have that today | 08:28 |
jam | fwereade: right. my hypothesis is around a DB charm and Storage implementation that we want clean shutdown so we can bring it up in the next life | 08:28 |
axw | brb, making a cup of tea | 08:28 |
jam | though you could just say "that must be handled before destroy-env" | 08:28 |
fwereade | jam, I'm inclined that way for now at least | 08:28 |
jam | fwereade: right. That was my position as well. | 08:29 |
jam | fwereade: so I think my statement was "change the environment to dying, wait for manually provisioned machine agents to show up as Dying, and then Terminate the rest of the world" | 08:31 |
jam | fwereade: with the caveat that we don't have a good Machine.Dying, so we actually have to kill the units and wait for the Machiner to show up as Dead | 08:31 |
fwereade | jam, I'm not sure what that complexity buys us over just refusing to destroy when we have manually provisioned machines | 08:32 |
fwereade | jam, in particular I'm not quite sure what shutdown sequence you imagine for those machines | 08:32 |
axw | fwereade: so, how do we kill the manually provisioned bootstrap nodes? | 08:33 |
jam | fwereade: my understanding is that what we care about is to uninstall jujud from upstart, etc, on those machines. | 08:33 |
jam | which is something we want anyway, even if it isn't part of destroy-environment. | 08:33 |
jam | because otherwise once you've registered machines with An environment, reusing them is going to be a real pain | 08:33 |
jam | fwereade: certainly I don't think your suggestion is that once an env has ever had any machines manually registered, you can never destroy that env again ? | 08:34 |
fwereade | axw, my concern is with the units, not the machines | 08:36 |
fwereade | axw, we can't just stop units, can we? | 08:36 |
fwereade | er that was for jam | 08:36 |
fwereade | axw, re bootstrap machine, good question | 08:36 |
jam | fwereade: well, why can't we just stop units? | 08:37 |
fwereade | jam, my suggestion re manual machines is that they need to have their units removed explicitly | 08:37 |
fwereade | jam, because we'd thus leave the services running | 08:37 |
jam | fwereade: well, we are in a "terminate the world" condition, there won't be any services in just a few moments. | 08:37 |
fwereade | jam, but there will | 08:37 |
axw | fwereade: setting them to Dying causes the uniter to remove them eventually...? | 08:38 |
fwereade | jam, those machines will keep on running their services forever | 08:38 |
jam | fwereade: but whatever mechanism exists to actually stop the Postgres instance on that machine is something that we can trigger an DestroyEnv time, no? | 08:38 |
jam | fwereade: the code that axw put up does have a "destroyUnits" function. | 08:38 |
fwereade | jam, axw: I'm still going by the description -- but that makes sense *only* for manual machines, right? | 08:39 |
axw | fwereade: so it goes like this: 1. prevent new units/machines from being added; 2. destroy all units (and non-state machines). 3. tell state machines to destroy themselves | 08:39 |
axw | fwereade: yes, we can make it manual-only | 08:39 |
jam | fwereade: "destroys (and waits on death of) all units" is in the description | 08:39 |
axw | it's not currently, but it can be | 08:39 |
jam | fwereade, axw: I'm not so sure that we can get an accurate (1) today | 08:40 |
jam | at least from what fwereade has said about "destroy-machine --force" | 08:40 |
* jam goes to grab some coffee, biab | 08:40 | |
axw | jam: I've coded it into add-unit/add-machine transactions, I don't see any problems, but then that was the first time I touched mongo transactions :) | 08:41 |
fwereade | jam, axw: well, we could (i) destroy all services and (ii) gate add-machine and add-service on destroy-flag-not-set | 08:41 |
fwereade | jam, axw: is there any other operation that can demand resources? | 08:41 |
fwereade | jam, axw: add-unit, add-relation are already gated on the relevant services being alive | 08:41 |
axw | fwereade: yep, that's what I have done. see state/environ.go, state/service.go, state/state.go | 08:41 |
axw | fwereade: I added it into add-machine and add-unit, since they're the ones that we're waiting on the death of, and they're the ones that create agents | 08:42 |
axw | I added it into add-service too, but I think that assertion can come out | 08:43 |
fwereade | axw, I think it's a bad idea on add-unit | 08:43 |
fwereade | axw, it's bad enough that all unit additions to a given service have to be serialized | 08:43 |
fwereade | axw, bringing the environment doc into play makes *all* unit additions serialized | 08:43 |
axw | hmm true :( | 08:44 |
fwereade | axw, hence the destroy-all-services and prevent-add-service suggestion | 08:44 |
axw | fair enough, I didn't think about that bit | 08:44 |
fwereade | axw, core idea is solid though, I agree | 08:44 |
fwereade | axw, my main concern is not slowing down destruction of non-manual environments | 08:44 |
axw | fwereade: I will update the code to only do this for manual machines now | 08:45 |
axw | fwereade: well, I can do it quick and dirty - if instanceId startswith "manual:" is basically all we have to go by at the moment | 08:46 |
axw | is that acceptable? | 08:46 |
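
i.e. something like the following, where the "manual:" instance-id prefix is the stated (and admittedly dirty) assumption:

```go
package provider

import "strings"

// isManuallyProvisioned is the quick-and-dirty check mentioned above:
// manually provisioned machines are currently only identifiable by
// their instance id prefix.
func isManuallyProvisioned(instanceId string) bool {
	return strings.HasPrefix(instanceId, "manual:")
}
```
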
rogpeppe1 | axw, fwereade: i'm somewhat uncomfortable with gating the destruction of an environment on the orderly teardown of every unit in the environment | 08:47 |
fwereade | rogpeppe1, agreed | 08:47 |
rogpeppe1 | axw, fwereade: (manually provisioned units excepted, probably) | 08:47 |
axw | rogpeppe1: likewise, I'll only gate it on units allocated to manually provisioned machines | 08:47 |
rogpeppe1 | ok, cool. | 08:47 |
rogpeppe1 | the other thought i had was to split up the process into two parts | 08:48 |
rogpeppe1 | so that you don't need a timeout parameter | 08:48 |
rogpeppe1 | (which is always going to be wrong) | 08:48 |
axw | fwereade: I'm going to leave in the bit that calls Destroy on each unit/machine, though, to cater for JaaS | 08:48 |
axw | it just won't wait on them | 08:48 |
fwereade | axw, I think so, in itself -- but I feel like we're putting the cart before the horse here | 08:48 |
fwereade | axw, why would we do that? | 08:49 |
fwereade | axw, in a jaas environment we just nuke all the instances, bam, done, just like today (excepting manual) | 08:49 |
rogpeppe1 | so you could have a "Clean" or "StartDestroy" or something which initiated the process of terminating the manually provisioned machines | 08:49 |
axw | fwereade: I was thinking we just set them to Dying, and have the provisioner do its job in the background | 08:49 |
fwereade | axw, that will take forever | 08:49 |
fwereade | axw, and block on hook errors | 08:50 |
rogpeppe1 | and then a final "Destroy" which would fail to work if there were any non-dead manually provisioned machines | 08:50 |
axw | ok | 08:50 |
fwereade | axw, I'm not even against what you have in principle | 08:50 |
fwereade | axw, I think it's exactly right for taking down manual machines | 08:50 |
fwereade | axw, but I think it's a serious degradation in the non-manual case | 08:50 |
rogpeppe1 | that way the GUI can watch the status as usual while the manually provisioned machines are destroyed | 08:50 |
axw | yup, I can see that | 08:51 |
fwereade | axw, and I think we take a good step forward by simply moving a nuke-everything command inside the api server | 08:51 |
axw | rogpeppe1: sounds fair enough. the CLI has to poll for status tho? | 08:51 |
rogpeppe1 | we could potentially do that in the command line too | 08:51 |
rogpeppe1 | axw: it could use an AllWatcher just like the GUI | 08:51 |
rogpeppe1 | axw: or... hmm, yes, it would need some kind of watcher otherwise | 08:52 |
axw | fwereade: "the nuke-everything inside the api server" bit can come later, right? :) | 08:52 |
fwereade | axw, if the API call were *just* to do a StopInstances([all-non-manager-machines]) (and abort if manual machines exist) that would move a good chunk of the functionality to where we want it | 08:52 |
axw | ok | 08:52 |
axw | I can do that | 08:53 |
fwereade | axw, I do like the destroying flag too though | 08:53 |
axw | fwereade: I'm thinking about what you said about disallowing destroy-env if units eexist, and now I'm starting to change my mind | 08:54 |
axw | because a unit may fail to die | 08:54 |
fwereade | axw, yeah, I'm just thinking we can make progress without solving that -- and it will be easier to solve as we restrict the scope of what we're doing | 08:54 |
jam | axw: we could error both ways here, but should a bad hook on one unit keep all the other machines running? | 08:55 |
jam | fwereade: do you need anything more from us on: https://code.launchpad.net/~fwereade/juju-core/api-force-destroy-machines/+merge/194764 | 08:55 |
fwereade | axw, jam: ofc there is *also* independent demand for an ignore-hook-errors mode for unit shutdown | 08:55 |
axw | fwereade: true, the problem would go away with that | 08:56 |
axw | jam: ideally not, but we'd need some kind of feedback to let people know at least | 08:56 |
axw | alright, I'll just put a TODO in to fix it with ignore-hooks | 08:56 |
fwereade | jam, I'm not sure -- the machine-dying problem is out of scope, and the other bits wallyworld_ flags should be trivial fixes | 08:57 |
fwereade | jam, my current plan is just to tidy up the clear code problems and resubmit | 08:58 |
jam | fwereade: I just want to make sure you're unblocked, and still backporting the other fix to 1.16 | 08:58 |
fwereade | jam, I was going to backport both of those after I'd got this landed | 08:59 |
fwereade | jam, I'll ping you or wallyworld_ for a review once I've had a moment to work on it a bit :) | 09:00 |
jam | fwereade: also, you wanted to talk about Pinger stuff today instead of yesterday. | 09:00 |
jam | I don't want to INTERRUPT flood you, though. | 09:00 |
fwereade | jam, ah, yes, I did read back -- and I think I'm +1 on the idea that each SetAgentAlive creates an independent slot, we record all those slots somewhere in state, and we infer that any of those slots being alive indicates that the agent is | 09:02 |
fwereade | jam, minimally invasive and rewritey, I think | 09:02 |
fwereade | jam, the idea of defending against double-pings also has some merit but feels orthogonal | 09:03 |
fwereade | jam, best to interrupt-flood me in the morning, sometime this PM I plan to go dark and start simplifying some of the state test setup stuff | 09:03 |
fwereade | jam, because OMG it hurts to write that sort of code | 09:04 |
jam | fwereade: you're welcome to go dark whenever you need to. | 09:04 |
jam | fwereade: so when does SetAgentAlive actually trigger? It is the new heartbeat? or just on connect or ? | 09:04 |
fwereade | jam, I was thinking on connect still? | 09:05 |
jam | fwereade: I've personally moved on so that my favorite design actually disconnects the agents from the presence table, and we let the API server batch up "here are all the bits for agents I know about, set them all" | 09:05 |
fwereade | jam, that's fine in principle -- not sure it entirely justifies the effort but I'm willing to let someone closer to the code make that judgment | 09:06 |
fwereade | jam, the api server is certainly a good place to do all that, and if it's easy it definitely sgtm | 09:08 |
jam | fwereade: so SetAgentAlive would kick off a new slot for that agent in whatever mechanism we are using (independent Pingers or one global Pinger) | 09:09 |
jam | fwereade: I can say that a hypothetical 30s ping *10000 agents connected to an api server means you write an "$inc" to mongo every 3ms | 09:10 |
jam | I guess because of 64-bit limitations on the int field, you only ever decrease the number of $inc requests by a factor of 64 | 09:10 |
jam | so a constant multiplier which is good, but doesn't actually change your O() | 09:11 |
fwereade | jam, yeah | 09:12 |
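
The write being costed there, sketched with mgo and illustrative collection/field names: each agent owns one bit of a 64-bit int field, and a ping is a single $inc that adds `1<<bit`. That only amounts to "set the bit" while the bit is clear, which is the double-ping corruption that comes up just below.

```go
package presence

import (
	"labix.org/v2/mgo"
	"labix.org/v2/mgo/bson"
)

// ping records this agent as alive in the document for the current
// time slot by adding its bit to the agent's 64-bit field.
func ping(pings *mgo.Collection, timeSlot int64, fieldKey string, bit uint) error {
	_, err := pings.UpsertId(timeSlot, bson.M{
		"$inc": bson.M{"alive." + fieldKey: int64(1) << bit},
	})
	return err
}
```
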
jam | you do batch them up nicer | 09:13 |
jam | but I was hoping for more of a log() sort of improvement. | 09:13 |
fwereade | jam, bright ideas in that direction will be accepted most gratefully ;) | 09:13 |
jam | fwereade: well, in mongo 2.6 (whenever that actually releases) we'll have binary operators like $bin setbitX | 09:14 |
jam | that doesn't change the load factor. | 09:14 |
jam | but it does avoid double ping corruption | 09:14 |
jam | It does pose the question "is it cheaper to have one request with 50 $inc, or 50 $inc requests" | 09:15 |
fwereade | jam, one would hope the former, wouldn't one | 09:16 |
jam | fwereade: *I* would hope so, yes. Though if you had distributed master I could see a case where it might parallelize the latter better. | 09:16 |
jam | fwereade: in my scale testing, I never did get mongo above 100% CPU | 09:17 |
jam | so we still have to watch out for it becoming the bottleneck | 09:17 |
fwereade | jam, yeah, it's nice that it isn't today though | 09:17 |
jam | fwereade: well, it might have slightly been the bottleneck in the add-unit one-by-one case. | 09:18 |
jam | It is hard to tell when you have 600% CPU in Juju, and 90+% in mongo | 09:18 |
jam | and "add-relation nrpe-external-master" just goes into death throws, so I don't know who to blame there. :) | 09:19 |
fwereade | jam, for that I think we should be moving the add-subordinate logic off the unit agent and into the api server, I suspect | 09:22 |
fwereade | jam, distributing the logic doesn't actually help anyone in any way afaict | 09:22 |
fwereade | jam, but that might itself be fiddly | 09:22 |
fwereade | jam, and there are some benefits to how we have it today | 09:23 |
jam | fwereade: well when I tested it on m1.xlarge (15GB) with just 1000 machines, we start at about 1GB of Res, and as soon as nrpe-external-master is related, we go OOM | 09:23 |
jam | so *something* on the API server is consuming *lots* of memory for running this | 09:24 |
jam | and we only end up with something like 3-4 actual "nrpe-external-master" agents in status | 09:24 |
jam | that actually even get into "pending" | 09:24 |
jam | and I think when jujud restarts after OOM, we end up at 0% CPU on all processes, and nothing making progress. | 09:24 |
fwereade | jam, I suspect the problem is in the complexity of EnterScope having to add units as well | 09:25 |
fwereade | jam, if we created appropriate subordinates at AddUnit time, and queued up a bunch of them at AddRelation time, we might have a happier time | 09:26 |
fwereade | jam, or *maybe* we can just write EnterScope a bit more smartly | 09:27 |
jamespage | jam, fwereade: not sure whether this has been discussed or not, but are we going to provide any sort of upgrade path for juju 12.04 users to 14.04? | 09:27 |
fwereade | jam, it was the first complex txn logic I ever wrote so it's unlikely to be the best | 09:27 |
jamespage | just trying to figure out what we need todo re charms for openstack this cycle | 09:27 |
jamespage | (12.04 -> 14.04 deployments that is) | 09:27 |
fwereade | jamespage, the assumption has hitherto been that machines on one series will stay forever on that series | 09:28 |
jam | jamespage: as in, how to take your existing workload and upgrade all of your units so they are running on 14.04 ? | 09:28 |
jamespage | fwereade, yeah - that was what I thought | 09:28 |
jamespage | jam: yes | 09:28 |
fwereade | jamespage, it's a bit crap, but I don't really see us addressing it this cycle | 09:28 |
jamespage | if its not a supported path thats OK | 09:28 |
fwereade | jamespage, cool | 09:28 |
jam | fwereade: and the Charm story means that you can't just upgrade the service, you have to deploy a new service ? | 09:28 |
fwereade | jam, yeah, the model doesn't really allow for series changes at all | 09:29 |
jamespage | jam: yeah - that was the complicating factor from my perspective | 09:29 |
jamespage | the 12.04 charms support cloud-archive through to havana and up to saucy OK | 09:29 |
fwereade | jam, apart from anything else, we'd need cross-series charms for it to even begin to make sense | 09:29 |
jamespage | but it feels like we have a bit of a clean sheet with 14.04 | 09:29 |
jamespage | i.e. we can drop some of the cruft :-) | 09:30 |
fwereade | jam, maybe that's a bullet that's better bitten sooner than later | 09:30 |
fwereade | jam, and it dovetails nicely with the workload-not-os focus | 09:30 |
fwereade | jam, but it also rather gives me the fear | 09:30 |
fwereade | jam, it'd demand a good solid block of scheduled time at the very least | 09:31 |
jam | fwereade: it would be nice if you could say "new units of service X are deployed with this version of the charm" | 09:32 |
jam | and then you can deploy 10 more 14.04 units, and destroy the original ones. | 09:32 |
jam | It is also a nice story for upgrade of charms in general | 09:32 |
jam | even without a series bump | 09:33 |
jam | so that you can do a "safe" migration rather than "everything at once" | 09:33 |
fwereade | jam, sure, but precise/nova is currently considered completely distinct from trusty/nova | 09:33 |
fwereade | jam, that's what I mean by cross-series charms | 09:33 |
fwereade | jam, I agree with what you're saying about smooth upgrades regardless | 09:34 |
fwereade | jam, for os upgrades, even if the charms were the same you'd have to deal with all the subordinates and colocated services too | 09:35 |
fwereade | jam, doable in theory but an *awful* lot of work in practice | 09:36 |
axw | rogpeppe1: just thinking about what you were suggesting before: it's not *that* easy, because the last thing DestroyJuju does is tell the state servers to kill themselves | 09:36 |
rogpeppe1 | axw: that's why i'd split it into two | 09:36 |
axw | so... destroy everything else, wait in the CLI, then finalise? | 09:36 |
rogpeppe1 | axw: that was my thought, yes | 09:37 |
rogpeppe1 | axw: and the finalisation step would fail if the appropriate pieces were not torn down correctly | 09:37 |
rogpeppe1 | axw: (it would probably have a force flag too though) | 09:37 |
rogpeppe1 | axw: then we can give useful feedback in the CLI if we want | 09:38 |
axw | hmm yes, that would be nice | 09:38 |
axw | sold | 09:38 |
* axw deletes the comment about timeout length | 09:38 | |
rogpeppe1 | axw: cool | 09:38 |
rogpeppe1 | fwereade: does the above seem reasonable to you? | 09:38 |
fwereade | rogpeppe1, axw: I feel like the best first cut is still: (1) abort if manual machines exist (2) destroy services, set destroying flag (3) directly StopInstances for non-managers in the API server (4) environ.Destroy in the CLI | 09:40 |
fwereade | rogpeppe1, axw: that can then be upgraded to allow for clean shutdown of manual machines, and force-destroy in state of other non-manager ones | 09:41 |
rogpeppe1 | fwereade: what do you mean when you say "destroy services" ? | 09:41 |
axw | fwereade: ah sorry, I thought you came around to my proposal ;) | 09:42 |
axw | mk | 09:42 |
fwereade | rogpeppe1: set dying to prevent more units being added | 09:42 |
fwereade | axw, um, sorry, I probably misread something | 09:42 |
rogpeppe1 | fwereade: ah yes, seems reasonable | 09:42 |
rogpeppe1 | fwereade: is the destroying flag not just Dying on the environment? | 09:43 |
fwereade | axw, I thought we were at least in agreement about where we wanted to end up, maybe there was confusion about the exact steps | 09:43 |
axw | fwereade: yeah, I can pare it back a bit for the first cut | 09:43 |
fwereade | axw, I'm not so bothered about those so long as none of the steps leaves us with a significantly less performant destroy-environment | 09:44 |
axw | ok | 09:44 |
rogpeppe1 | fwereade: tbh i think that if we're going to call environ.Destroy in the CLI there's not really any point in destroying any services | 09:44 |
fwereade | rogpeppe1, the service destruction prevents add-unit | 09:44 |
fwereade | rogpeppe1, without making every add-unit have to check the destroying flag | 09:44 |
rogpeppe1 | fwereade: but if we're killing the state server, add-unit is nicely prevented anyway :-) | 09:44 |
fwereade | rogpeppe1, add-unit during StopInstances seems like a recipe for races | 09:45 |
rogpeppe1 | fwereade: hmm, you're probably right there | 09:46 |
rogpeppe1 | fwereade: isn't it racy anyway? | 09:46 |
fwereade | rogpeppe1, "stop all non-manual non-manager instances" is, I think, an atom of functionality that remains useful in most contexts | 09:46 |
fwereade | rogpeppe1, hmm, maybe? what in particular? | 09:47 |
rogpeppe1 | fwereade: if the provisioner has just seen a unit, a machine can be provisioned just as we're calling StopInstances | 09:47 |
rogpeppe1 | fwereade: i'm not sure there's any way around it | 09:47 |
fwereade | rogpeppe1, well, StopInstances has to take its instance ids from the machines in state, doesn't it? | 09:48 |
fwereade | rogpeppe1, (btw I think StopInstances may be a bit broken anyway -- it should surely take ids, not instances?) | 09:48 |
rogpeppe1 | fwereade: no, i don't think it does | 09:49 |
rogpeppe1 | fwereade: (have to take instance ids from the machines in state, that is) | 09:49 |
rogpeppe1 | fwereade: it takes instance ids from the instances returned by AllInstances, i think | 09:49 |
fwereade | rogpeppe1, depends how providers react to being told to stop instances they're claiming don't exist in an Instances call | 09:49 |
rogpeppe1 | fwereade: Environ.Destroy does anyway | 09:50 |
fwereade | rogpeppe1, yeah, but AllInstances is kinda useless in practice :( | 09:50 |
rogpeppe1 | fwereade: because of eventual consistency? | 09:50 |
fwereade | rogpeppe1, in ec2 at least it's not uncommon for Instances/AllInstances to straight-up lie for minutes at a time | 09:50 |
fwereade | rogpeppe1, yeah | 09:51 |
rogpeppe1 | fwereade: in which case we're stuffed - there's nothing we can do to prevent an instance escaping after an environment destroy | 09:51 |
fwereade | rogpeppe1, wasn't there at one stage a Destroy param that was explicitly "and stop these instances that we *know* exist even if the provider's lying"? | 09:51 |
rogpeppe1 | fwereade: there was (maybe still is) | 09:51 |
fwereade | rogpeppe1, well we can at least issue a Stop call for everything we believe to exist | 09:52 |
rogpeppe1 | fwereade: yeah | 09:52 |
fwereade | rogpeppe1, which means taking those ids from state | 09:52 |
rogpeppe1 | fwereade: but i *think* i still feel that destroying the services is overkill and won't do the job correctly | 09:53 |
fwereade | rogpeppe1, are you saying there's no point restricting add-* commands when we know the env is being destroyed? | 09:53 |
rogpeppe1 | fwereade: yeah, because we immediately destroy all the machines anyway | 09:54 |
rogpeppe1 | fwereade: and there's a race regardless | 09:54 |
fwereade | rogpeppe1, I think it's actually quite nice to inform other users that their commands will have no effect | 09:54 |
rogpeppe1 | fwereade: if they issued the command half a second earlier, their commands would have no effect either | 09:55 |
rogpeppe1 | fwereade: but they wouldn't get told | 09:55 |
fwereade | rogpeppe1, stopping all instances might take more than half a second | 09:55 |
rogpeppe1 | fwereade: true. we should probably start by destroying the state servers... | 09:56 |
rogpeppe1 | fwereade: i just feel that destroying all the services might itself take quite a long time, and it's not really necessary | 09:58 |
rogpeppe1 | fwereade: and that we should just cut to the chase and stop the user burning money asap | 09:58 |
axw | rogpeppe1: it is useful for manual, because it wants to know which units to kill | 09:58 |
axw | (not that we're doing that yet) | 09:58 |
rogpeppe1 | axw: agreed - we definitely need to tear down manual units first | 09:59 |
fwereade | rogpeppe1, destroying the state servers really feels to me like the thing to do *last* | 09:59 |
axw | yes | 09:59 |
fwereade | rogpeppe1, axw: but can I recuse myself briefly? I'd like to address wallyworld_'s review before he goes to bed | 10:00 |
axw | yup, thanks for your time fwereade | 10:00 |
rogpeppe1 | axw, fwereade: destroying the state servers puts the environment into an inactive state - no new instances will be created, for example. | 10:02 |
axw | gtg, will come back to this tomorrow | 10:07 |
fwereade | rogpeppe1, putting them into a temporary not-accepting-requests mode is kinder and cleaner, I think | 10:09 |
rogpeppe1 | fwereade: depends how long it takes - if i've got an almost-not-working environment, running n transactions to destroy the svcs might take a long time when all i want to do is destroy the darn environment. | 10:10 |
rogpeppe1 | fwereade: but fair enough, if you think it's that important | 10:11 |
dimitern | fwereade, ping | 10:20 |
dimitern | fwereade, I think I'm finally ready to propose the first part of upgrade-juju changes, and I'd appreciate it if you look into it closely, because I probably missed something important | 10:21 |
jam | rogpeppe1, fwereade: I'm going to poke at the GOMAXPROCS thing again. Though I'd like to (a) not do it during the test suite and (b) check for GOMAXPROCS if set. Is there a sane way to know that we *aren't* in the test suite? I'd rather do it that way then have to change every test case to turn off the effect. | 10:21 |
fwereade | dimitern, cool, I just need to pop out before the standup, I will take a look afterwards | 10:22 |
dimitern | fwereade, cheers | 10:22 |
fwereade | jam, set a "this is real code flag" in the various main.main() funcs? | 10:24 |
jam | fwereade: yeah, that's sort of what I was thinking. | 10:24 |
fwereade | wallyworld_, jam: https://codereview.appspot.com/24790044 addressed, I think | 10:37 |
wallyworld_ | fwereade: thanks, will take a look | 10:38 |
jam | saw the email, looking already | 10:38 |
dimitern | fwereade, there it is https://codereview.appspot.com/25080043 | 10:40 |
fwereade | dimitern, cheers | 10:40 |
rogpeppe1 | jam: one way to know we're not in a test suite is to do the GOMAXPROCS initialization in a package that's not imported by a test | 10:41 |
rogpeppe1 | jam: though i would really like it much better if our tests passed reliably with GOMAXPROCS=anything | 10:41 |
jam | rogpeppe1: if it is imported by main() won't it get imported by a test ? | 10:41 |
rogpeppe1 | jam: hmm, yeah you're right, for cmd/* tests | 10:42 |
jam | rogpeppe1: I'm sure I'd like it if our tests just worked :) | 10:43 |
wallyworld_ | fwereade: did you forget to put in the destroyMachineDoc in the cmd Info? | 10:43 |
rogpeppe1 | jam: another possibility is to put GOMAXPROCS=1 in test initialization | 10:43 |
rogpeppe1 | jam: actually, the best solution is just to call the GOMAXPROCS initialization inside main() | 10:44 |
rogpeppe1 | jam: main only gets called in very limited circumstances in our test suite | 10:45 |
rogpeppe1 | jam: and if we needed to, we could override it in that case (when we're execing the test binary) by setting the GOMAXPROCS env var | 10:46 |
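
The approach that settles on, sketched -- the env-var check is the "if set" caveat from above (the Go runtime already honours an explicit GOMAXPROCS environment variable, so the guard preserves it):

```go
package main

import (
	"os"
	"runtime"
)

// setMaxProcs is called from main() only, so the test suite -- which
// never runs main -- keeps the default, and an explicitly set
// GOMAXPROCS env var still wins.
func setMaxProcs() {
	if os.Getenv("GOMAXPROCS") == "" {
		runtime.GOMAXPROCS(runtime.NumCPU())
	}
}

func main() {
	setMaxProcs()
	// ... normal jujud startup follows ...
}
```
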
wallyworld_ | fwereade: btw that lgtm assumes the destroyMachineDoc is used in the right place | 10:58 |
fwereade | oh ffs | 11:42 |
fwereade | google hates me today | 11:42 |
fwereade | dimitern, rogpeppe1, mgz: I think I will pick this discussion up when I've had a chance to review the branch | 11:42 |
dimitern | fwereade, ok, np | 11:43 |
fwereade | ...and eat lunch ;p | 11:48 |
* TheMue => lunch | 12:33 | |
mattyw | fwereade, when you're back from lunch I've got a couple of small questions | 12:35 |
rogpeppe1 | fwereade, dimitern, jam: it looks to me as if we can't use upsert in a mongo transaction. is that right? | 12:45 |
rogpeppe1 | niemeyer: ^ | 12:47 |
* rogpeppe1 goes for lunch | 13:03 | |
=== gary_poster|away is now known as gary_poster | ||
rogpeppe1 | fwereade: do you know of any existing logic in state that's similar to what we need for the collection of state servers? originally i was thinking of using $addToSet, but it doesn't quite fit the requirements; then i thought of using upsert, but it looks like we can't use that with transactions. | 14:10 |
fwereade | rogpeppe1, not offhand -- I'd been vaguely expecting the meat to be a map[id]address for convenience of insert/remove | 14:11 |
rogpeppe1 | fwereade: i'd thought that too | 14:12 |
fwereade | rogpeppe1, wrt interpreting an existing environment without one of those docs, I don't think there's a way around counting first to figure out whether to insert/update | 14:12 |
dimitern | fwereade, how about my review btw? | 14:13 |
fwereade | dimitern, I'm doing it :) | 14:14 |
rogpeppe1 | fwereade: i'm not sure whether a map[id]address can work, but i've just thought of something; maybe it can | 14:14 |
dimitern | fwereade, cheers | 14:14 |
TheMue | one review please: https://codereview.appspot.com/24040044 | 14:24 |
TheMue | rogpeppe1: calendar says you're reviewer today ;) | 14:24 |
rogpeppe1 | TheMue: ah, good point, will have a look | 14:24 |
TheMue | rogpeppe1: and there's also https://codereview.appspot.com/15080044/, the script friendly output of env/switch | 14:42 |
rogpeppe1 | TheMue: will look at that next, thanks | 14:43 |
TheMue | rogpeppe1: there's one field to remove in there, see martins comment, and he has troubles with the idea (you remember, default vc argument raw output) | 14:43 |
TheMue | rogpeppe1: I've got to thank you | 14:44 |
fwereade | dimitern, sorry, but I'm having real trouble following what's actually going on in that CL -- I have a bunch of comments on the documentation, but my brain keeps melting when I try to follow what's actually happening overall | 14:48 |
fwereade | dimitern, I freely stipulate that this is because it's all based on my own shitty code that should never have seen the light of day | 14:49 |
fwereade | dimitern, but it's giving me real problems when it comes to reviewing helpfully | 14:49 |
fwereade | dimitern, especially since --upload-tools keeps on interfering | 14:50 |
fwereade | dimitern, but I'm coming to feel this is a case where we should fix it, not patch it, if that phrasing makes sense to you | 14:51 |
dimitern | fwereade, expand a bit please | 14:51 |
dimitern | fwereade, the main changes are in validate(), when --version is not given, hence v.chosen is Zero | 14:52 |
dimitern | fwereade, in addition to the removal of --dev flag and logic around it | 14:52 |
fwereade | dimitern, I'm saying the code was largely incomprehensible before you started, and that makes it hard to validate your changes | 14:52 |
dimitern | fwereade, I go over all discovered tools for that major version and filter out the incompatible within [current, nextSupported], picking newest at the end | 14:53 |
dimitern | fwereade, ah | 14:53 |
dimitern | fwereade, well, it is convoluted somewhat yes :) | 14:53 |
fwereade | dimitern, (fwiw it is all my fault, I'm not going to defend the original form of the code) | 14:53 |
fwereade | dimitern, but I'm wondering whether you're in a good position to rip all that underlying crap out and start afresh with something sane | 14:54 |
dimitern | fwereade, that's a bit scary | 14:55 |
fwereade | dimitern, yeah, I can understand that :) | 14:55 |
dimitern | fwereade, the mess is around --upload-tools now mostly | 14:55 |
fwereade | dimitern, yeah, and that's a lot of what bothers me | 14:56 |
dimitern | fwereade, but didn't we decide to keep that behavior for now and just report it as deprecated? | 14:57 |
fwereade | dimitern, because I no longer really understand how upload-tools interacts with anything else | 14:57 |
fwereade | dimitern, I don't know, the internet hated me so I lost the end of that conversation | 14:57 |
dimitern | fwereade, well, in summary | 14:57 |
dimitern | fwereade, we discussed eventually replacing the need of --upload-tools everywhere with some dev docs + possibly scripts/tools that use sync-tools when you would've used --upload-tools | 14:58 |
dimitern | fwereade, but for now iirc decided to deprecate --upload-tools with a message and keep it working for easier transition; it'll also need some announcement on the mailing list at least | 14:59 |
fwereade | dimitern, if we've ever recommended that anyone use upload-tools I guess we should | 15:00 |
fwereade | dimitern, but I would hope we never had | 15:00 |
fwereade | dimitern, it's always been a dev tool | 15:00 |
dimitern | fwereade, yep | 15:01 |
fwereade | dimitern, so who's transitioning? | 15:01 |
fwereade | dimitern, keeping it around makes the transition harder | 15:01 |
jcsackett | abentley, sinzui: can you look at https://code.launchpad.net/~jcsackett/charmworld/rip-out-old-queues/+merge/194647 and/or https://code.launchpad.net/~jcsackett/charmworld/multiple-appflowers-2/+merge/194648 | 15:01 |
fwereade | dimitern, and just preserves the shitty devs-first attitude that has plagued the whole problem from the beginning | 15:01 |
dimitern | fwereade, it's useful when developing/testing stuff from source, mainly for us (one reason rogpeppe1 was not sold on removing it right away, and I get that), but we should replace it with sync-tools + scripts / makefile commands to see how it works out | 15:02 |
sinzui | jcsackett, I can | 15:02 |
fwereade | dimitern, the fundamental *problem* with all our tools stuff is, and always has been, that --upload-tools works for devs | 15:03 |
jcsackett | sinzui: thanks. | 15:03 |
dimitern | fwereade, despite being tempting to rip it out and rewrite it, let's do it in steps I suggest | 15:03 |
rogpeppe1 | fwereade: given that people *are* using it, i think peremptorily removing it without at least some notification to the group might be considered a little rude | 15:04 |
fwereade | dimitern, I'm just worried that building on what we have inevitably takes us *further* from a good solution | 15:04 |
fwereade | rogpeppe1, who's using it? | 15:04 |
rogpeppe1 | fwereade: anyone that's following the dev version for whatever reason | 15:04 |
fwereade | rogpeppe1, surely not? we distribute dev tools | 15:05 |
dimitern | fwereade, rogpeppe1, well, if it's just us, then it's no problem | 15:05 |
rogpeppe1 | fwereade: true. anyone that builds from tip | 15:05 |
fwereade | rogpeppe1, anyone who's building from source themselves is a developer | 15:05 |
fwereade | rogpeppe1, and can, I think, be expected to deal with it | 15:06 |
rogpeppe1 | fwereade: we should at least provide some reasonable replacement so everyone doesn't have to reinvent their own script | 15:07 |
fwereade | rogpeppe1, yeah, no argument there | 15:07 |
dimitern | absolutely +1 | 15:07 |
rogpeppe1 | fwereade: in particular i'd like to be able to keep my current workflow (change some source, try to bootstrap with tools built from the current source) | 15:08 |
dimitern | rogpeppe1, or upgrade for that matter | 15:08 |
fwereade | rogpeppe1, dimitern: I don't really care how hard it is for us tbh... making it easy for ourselves made it crap for everyone else | 15:09 |
rogpeppe1 | dimitern: indeed - that's also important to get right, so the developer doesn't constantly need to be fiddling with the build version in version.go (something that will inevitably leak into trunk at some point) | 15:09 |
rogpeppe1 | fwereade: i don't really see how it interfered with non-dev usage | 15:09 |
fwereade | rogpeppe1, tools distribution was total crack until just before 13.04, and was only just downgraded to 98% crack in time for release | 15:10 |
fwereade | rogpeppe1, this is precisely because we were all using a non-standard code path | 15:10 |
rogpeppe1 | fwereade: wasn't the crackfulness only exposed if you did upload-tools? | 15:11 |
fwereade | rogpeppe1, the cost to us of constructing a realistic environment in which to run the same code as our users do is *far* less than the cost of having the real code path so little-travelled as to be barely a game trail | 15:11 |
fwereade | rogpeppe1, no, upload-tools did the right thing | 15:12 |
rogpeppe1 | fwereade: you're probably thinking of a different piece of crack to me | 15:12 |
fwereade | rogpeppe1, normal users would basically run random versions of the tools | 15:12 |
rogpeppe1 | fwereade: oh, you mean the "select highest version" logic? | 15:12 |
fwereade | rogpeppe1, and those versions would usually work well enough to downgrade themselves to the desired versions | 15:13 |
dimitern | :D | 15:13 |
fwereade | rogpeppe1, but none of us *noticed* because we were running a *different code path* | 15:13 |
fwereade | rogpeppe1, and now we have a bit of time scheduled to work on upgrades I am *very* keen that we avoid getting back into that particular failure mode | 15:14 |
rogpeppe1 | fwereade: ok | 15:14 |
fwereade | rogpeppe1, cheers -- it's not even an issue with screwing up the code, because that basically always happens, because we're human -- it's about making sure we have a chance of detecting the problem | 15:15 |
* fwereade quick ciggie, think | 15:16 | |
niemeyer | rogpeppe1: Yes, you cannot upsert in a mongo transaction.. you can insert and update, though | 15:30 |
niemeyer | rogpeppe1: or update and insert | 15:30 |
niemeyer | rogpeppe1: (ordering matters) | 15:30 |
niemeyer | rogpeppe1: Sorry for missing your message earlier | 15:31 |
fwereade | niemeyer, ha, ofc -- nice :) | 15:31 |
rogpeppe1 | niemeyer: so if you don't know if something exists, it would be ok to do both the insert and the update, and one would succeed, so all would work? | 15:31 |
fwereade | rogpeppe1, I think you'd want update-then-insert, so the update silently fails | 15:31 |
niemeyer | rogpeppe1: Yeah | 15:32 |
niemeyer | fwereade: Right | 15:32 |
rogpeppe1 | niemeyer: cool, thanks | 15:32 |
niemeyer | There's a message in juju-dev | 15:32 |
niemeyer | on Aug 30th | 15:32 |
niemeyer | Subject "Upcoming fix in mgo/txn" | 15:32 |
rogpeppe1 | niemeyer: cool, thanks | 15:33 |
rogpeppe1 | niemeyer: in fact i'm probably going to be using a single document, but that's really useful to know | 15:33 |
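A sketch of the update-then-insert pattern settled on above, assuming (as the discussion says) that an mgo/txn Update op on a missing document is a silent no-op, so exactly one of the two ops takes effect and ordering matters; upsertOps, the database, and the collection names are invented for illustration:

```go
package main

import (
	"labix.org/v2/mgo"
	"labix.org/v2/mgo/bson"
	"labix.org/v2/mgo/txn"
)

// upsertOps emulates an upsert inside a transaction: the Update op
// silently does nothing when the document is missing, and the Insert
// then creates it; when the document exists, the Update applies.
func upsertOps(coll string, id interface{}, fields bson.M) []txn.Op {
	return []txn.Op{{
		C:      coll,
		Id:     id,
		Update: bson.M{"$set": fields}, // no-op when the doc is missing
	}, {
		C:      coll,
		Id:     id,
		Insert: fields, // creates the doc when the update found nothing
	}}
}

func main() {
	session, err := mgo.Dial("localhost")
	if err != nil {
		panic(err)
	}
	defer session.Close()
	runner := txn.NewRunner(session.DB("demo").C("txns"))
	if err := runner.Run(upsertOps("settings", "env-config", bson.M{"debug": true}), "", nil); err != nil {
		panic(err)
	}
}
```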
rogpeppe1 | TheMue: you've got a review | 16:01 |
rogpeppe1 | TheMue: and another one | 16:11 |
jcastro | what was the final name of the ssh/manual provider? | 16:26 |
TheMue | rogpeppe1: thx | 16:27 |
natefinch | mental note: don't do apt-get update && apt-get upgrade during the workday :/ | 16:32 |
mgz | natefinch: indeed, it's an after-work leisure activity :) | 16:32 |
natefinch | mgz: My old laptop is pissed I'm replacing it, evidently | 16:33 |
TheMue | natefinch: reasonable | 16:54 |
jam | jcastro: manual | 17:05 |
jcastro | jam, ok so https://juju.ubuntu.com/docs/config-manual.html is correct terminology-wise? | 17:05 |
mattyw | what version of mongo do people use to test on precise? I get all kinds of panics when running go test ./state/... | 17:06 |
jam | mattyw: there should be a version in ppa:juju/stable or the cloud tools archive. | 17:06 |
jam | I think it is 2.2.4? | 17:06 |
jcastro | jam, I assume that core will write manual instead of "null" to the environments yaml? | 17:06 |
jam | mattyw: we need SSL support which wasn't in the stock precise version | 17:06 |
mattyw | jam, I remember that being the case - 2.2.4 is what I'm on - I'll paste the errors in case anyone is interested | 17:07 |
dimitern | rogpeppe1, thanks for the review as well | 17:12 |
rogpeppe1 | dimitern: np | 17:12 |
=== racedo` is now known as racedo | ||
mattyw | this is more or less a summary of the errors I see http://paste.ubuntu.com/6406334/ | 17:25 |
* rogpeppe1 is done for the day. | 19:30 | |
rogpeppe1 | g'night all | 19:30 |
thumper | natefinch: if you have time... https://codereview.appspot.com/24980043/ | 20:46 |
thumper | pretty trivial move of functions | 20:46 |
thumper | + a few tests | 20:46 |
natefinch | thumper: sure thing | 20:46 |
thumper | natefinch: I wish go had a better way to handle dependencies than just packages... | 20:48 |
natefinch | thumper: you mean like versioning on dependencies? | 20:48 |
thumper | I have to move some interfaces out of a package where they fit naturally to avoid circular imports | 20:48 |
natefinch | thumper: Oh I see | 20:48 |
thumper | I don't have a good answer for that yet | 20:49 |
natefinch | thumper: yeah, the circular imports thing is annoying... but it generally means you're doing something wrong anyway. | 20:49 |
thumper | I had moved them up a package, from container/kvm to container, but it doesn't naturally fit | 20:49 |
* thumper tries to move something else | 20:49 | |
* thumper has an idea | 20:50 | |
natefinch | thumper: there's actually the exact same problem in C# - you can't have X depend on Y and Y depend on X. | 20:50 |
* thumper nods | 20:51 | |
thumper | natefinch: ya know what, this circular import was because I was doing something wrong | 20:59 |
thumper | changed it, and now it is better | 20:59 |
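The usual way out of a cycle like the one above, sketched under assumptions: hoist the shared interfaces into the parent package so imports only ever point upward. The layout and method sets here are illustrative, not juju-core's actual declarations:

```go
// Before: container/kvm needed types from container, while container
// named types from container/kvm -- an import cycle. After hoisting,
// the dependency graph is a tree:
//
//	container       defines the interfaces (imports no children)
//	container/lxc   imports container, implements the interfaces
//	container/kvm   imports container, implements the interfaces

package container

// Container is the behaviour shared by LXC and KVM instances.
type Container interface {
	Name() string
	Start() error
	Stop() error
}

// Manager is how juju creates and destroys containers without caring
// which concrete implementation sits behind the interface.
type Manager interface {
	CreateContainer(name string) (Container, error)
	DestroyContainer(c Container) error
}
```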
natefinch | thumper: is it me, or is RemoveContainerDirectory actually supposed to be MoveContainerDirectory? | 21:01 |
thumper | well, it should probably be renamed :) | 21:02 |
thumper | but move is a little general | 21:02 |
thumper | it does remove it from the main dir | 21:02 |
thumper | but keeps it around for "debugging" | 21:02 |
thumper | which most people never need | 21:02 |
thumper | I have a mental bug about cleaning it out when you start a new local provider | 21:03 |
natefinch | thumper: do we care that there's a race condition in RemoveContainerDirectory if two things both find a unique directory name and then both try to rename their directory to it? | 21:06 |
thumper | natefinch: no, because there is only ever one goroutine doing this | 21:06 |
thumper | the provisioner | 21:07 |
thumper | may be worth noting in the function description though | 21:07 |
natefinch | thumper: yeah.... it would be good to note it's not thread safe | 21:08 |
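A sketch of the move-aside behaviour under discussion, with the concurrency caveat recorded as a doc comment as agreed; the function name and path handling are assumptions, not juju-core's implementation:

```go
package container

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeContainerDir moves a container's directory into a "removed"
// area rather than deleting it, so the contents stay around for
// debugging.
//
// Not safe for concurrent use: two callers could probe the same free
// name and then race on the rename. In juju only the single
// provisioner goroutine calls this, so that is acceptable.
func removeContainerDir(containerDir, removedDir string) error {
	base := filepath.Base(containerDir)
	target := filepath.Join(removedDir, base)
	// Probe for a name not already taken by an earlier removal.
	for i := 1; ; i++ {
		if _, err := os.Stat(target); os.IsNotExist(err) {
			break
		}
		target = filepath.Join(removedDir, fmt.Sprintf("%s.%d", base, i))
	}
	return os.Rename(containerDir, target)
}
```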
* thumper wonders why he didn't get an email about natefinch's reply | 21:30 | |
natefinch | thumper: I just hit "go" like 30 seconds ago | 21:30 |
thumper | oh | 21:30 |
thumper | :) | 21:30 |
natefinch | thumper: sorry for all the nitpicks | 21:31 |
thumper | that's fine | 21:31 |
* thumper goes over the nits | 21:31 |
thumper | natefinch: I have another on Rietveld: https://codereview.appspot.com/25550043 | 21:31 |
thumper | natefinch: next branch in the pipeline | 21:31 |
thumper | lots of initial boilerplate for the kvm containers | 21:32 |
thumper | natefinch: we can't export them just for tests because the tests are in other packages | 21:32 |
natefinch | thumper: oh yeah. Man I hate that. | 21:33 |
thumper | this does make me sad | 21:33 |
thumper | but go only has one protection mechanism | 21:33 |
thumper | and is kinda anal about you circumventing it | 21:33 |
natefinch | thumper: to be fair, it's because we're trying too hard to have other packages know about the internals of this one | 21:34 |
natefinch | thumper: other packages shouldn't need to know about the removedDirectory, even for tests | 21:34 |
thumper | well... | 21:34 |
thumper | we want to be able to mock them out though | 21:34 |
thumper | having a nice clean way to do that is difficult | 21:35 |
natefinch | yeah | 21:35 |
thumper | because whatever you choose | 21:35 |
thumper | it is icky | 21:35 |
natefinch | yep | 21:35 |
natefinch | wonder if the way to do it is to just have Mock() and Unmock() methods in the package, and let it figure out how to do it. Not great, though. I wonder if anyone else has figured out cleaner ways to do it. | 21:37 |
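One conventional answer to the mocking question natefinch raises, sketched here: keep the dependency in a package-level variable and expose a patch hook only through export_test.go. Note this only helps the package's own external test file (package container_test), not tests living in other packages; all names are illustrative:

```go
package container

import "os"

// removeDir is a package-level variable rather than a direct call so
// tests can substitute a fake without the package exporting any
// internals -- one conventional Go take on the Mock()/Unmock() idea.
var removeDir = os.RemoveAll

// A hook in export_test.go (compiled only alongside this package's
// tests) then lets the external container_test package patch it:
//
//	func PatchRemoveDir(f func(string) error) (restore func()) {
//		orig := removeDir
//		removeDir = f
//		return func() { removeDir = orig }
//	}
```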
thumper | natefinch: most other projects have one package :-| | 21:37 |
natefinch | thumper: I'm not convinced of that. Not real projects that actually do things. | 21:38 |
thumper | :) | 21:38 |
thumper | MoveToRemovedDirectory? | 21:39 |
thumper | awkward | 21:39 |
natefinch | It's honestly a little weird that this detail is exposed. The reason you're doing it is to avoid circular dependencies? | 21:39 |
thumper | which details? | 21:40 |
thumper | the directory stuff? | 21:40 |
natefinch | yeah | 21:40 |
thumper | I've moved it up because it is common between lxc and kvm | 21:40 |
thumper | we need somewhere to write the user data and have container logs | 21:40 |
natefinch | yeah.. yeah, that's true. | 21:41 |
thumper | I may just go with container.RemoveDirectory | 21:42 |
natefinch | thumper: yeah, I think that's fine | 21:43 |
natefinch | thumper: I'm not going to be able to get that review done before I have to go relieve my wife of the screaming baby at 5. | 21:53 |
thumper | natefinch: that's fine | 21:53 |
thumper | I'll get wallyworld_ to do it :) | 21:54 |
wallyworld_ | \/ | 21:54 |
thumper | wallyworld_: here is that IsKVMSupported method you are after :) https://code.launchpad.net/~thumper/juju-core/kvm-containers/+merge/194946 | 21:58 |
wallyworld_ | great, will look soon | 21:59 |
thumper | wallyworld_: although, I was thinking perhaps we should return (bool, error) | 21:59 |
thumper | instead of just bool | 21:59 |
wallyworld_ | yes, +1 to error | 21:59 |
thumper | so if the cpu-checkers package isn't installed, or for some reason /bin/kvm-ok isn't there, we get an error | 21:59 |
wallyworld_ | yes, agreed. need that error | 22:03 |
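A sketch of the agreed (bool, error) shape, shelling out to kvm-ok; the output matching and the exit-status handling are assumptions, not the eventual juju-core implementation:

```go
package kvm

import (
	"os/exec"
	"strings"
)

// IsKVMSupported reports whether the host can run KVM guests by
// running kvm-ok, returning an error when the check itself cannot
// be performed.
func IsKVMSupported() (bool, error) {
	// A missing kvm-ok binary is an error, not merely "unsupported".
	path, err := exec.LookPath("kvm-ok")
	if err != nil {
		return false, err
	}
	// kvm-ok exits non-zero when acceleration is unavailable, so key
	// off its output rather than its exit status (a simplification).
	out, _ := exec.Command(path).CombinedOutput()
	return strings.Contains(string(out), "KVM acceleration can be used"), nil
}
```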
=== gary_poster is now known as gary_poster|away | ||
wallyworld_ | thumper: just started looking - the kvmContainer retains a reference to the factory, the lxc container doesn't - is there a fundamental difference i will encounter as i read more of the code? | 22:20 |
thumper | I'm not sure... | 22:23 |
thumper | wallyworld_: all you should really care about is the public interfaces | 22:26 |
thumper | because implementation is likely to change | 22:27 |
wallyworld_ | as a user yes :-) i was trying to grok the implementation | 22:27 |
thumper | :) | 22:27 |
thumper | the implementation may change as it is all not-there-yet :) | 22:28 |
* wallyworld_ nods | 22:28 | |
wallyworld_ | i was comparing the lxc and kvm implementations / structs etc to see the differences | 22:28 |
* thumper nods | 22:29 | |
wallyworld_ | since lxc is already in the code base, if this mp is similar it makes reviewing easier | 22:30 |
wallyworld_ | thumper: why does the kvm package redefine the ContainerFactory interface? | 22:34 |
thumper | ContainerFactory is about the low level containers themselves | 22:34 |
thumper | the container.Manager interface is how juju interacts with containers | 22:34 |
thumper | the mock implementation is around the ContainerFactory and Container | 22:34 |
thumper | not the container.Manager | 22:34 |
wallyworld_ | ok | 22:34 |
wallyworld_ | although since the interface is identical for lxc and kvm, i would have thought the implementing structs would be different, but the interface abstracted out as common | 22:36 |
wallyworld_ | otherwise value of having an interface is diminished | 22:37 |
wallyworld_ | may as well just use structs | 22:37 |
thumper | part of the problem is the lxc versions are in the golxc package | 22:38 |
thumper | not juju | 22:38 |
thumper | interfaces mean I can mock it out | 22:38 |
thumper | that in itself is good | 22:38 |
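What that mockability looks like in practice, as an illustrative sketch against the Container shape assumed in the earlier sketch; none of this is juju-core's actual test code:

```go
package kvm_test

// mockContainer is the kind of test double the interfaces buy:
// code written against the Container interface can be exercised
// without creating real kvm instances. The method set here is an
// assumption, matching the earlier sketch.
type mockContainer struct {
	name    string
	started bool
}

func (m *mockContainer) Name() string { return m.name }
func (m *mockContainer) Start() error { m.started = true; return nil }
func (m *mockContainer) Stop() error  { m.started = false; return nil }
```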
* thumper has to nip out | 22:39 | |
thumper | bbl | 22:39 |