/srv/irclogs.ubuntu.com/2013/11/18/#juju-dev.txt

wallyworld	thumper: you around?	01:29
thumper	wallyworld hey	02:10
wallyworld	thumper: want a hangout?	02:16
thumper	sure	02:16
wallyworld	https://plus.google.com/hangouts/_/72cpi2rmduripigvi5lgv18b7c	02:16
axw	wallyworld: sorry, I guess I took one person's preference on US spelling for policy :)	03:16
wallyworld	axw: no worries :-)	03:16
wallyworld	no need to apologise	03:16
axw	and certainly no need to apologize	03:17
axw	hur hur	03:17
wallyworld	ha ha ha	03:17
wallyworld	axw: did my review on your bootstrap tools change make sense?	03:17
axw	yes, just making some changes now	03:17
axw	thanks for that	03:17
wallyworld	np. just wanted to check you were unblocked	03:17
wallyworld	thumper: there's a potential issue with checking if a host machine can run a container. we currently support "juju add-machine lxc" which means create a new machine and add an lxc container. in that case, it's not possible at add-machine time to know if a container is supported or not. there was talk at one stage of removing the syntax allowing just "lxc" or "kvm" and forcing <containertype>:<machineid>	03:26
wallyworld	do you think we could remove the add-machine <containertype> syntax?	03:27
wallyworld	since i think there was feedback we should be explicit about where containers go	03:27
wallyworld	but even in that case, add-machine <container>:<machine> may need to block for a while if the machine is not up yet	03:27
wallyworld	eg if the user has scripted add-machine followed by add-machine container	03:28
thumper	wallyworld: I think it is reasonable for someone to request it	03:30
thumper	wallyworld: even though it may not be fully actionable	03:31
thumper	so if there is a pending machine for a particular machine	03:31
thumper	and that machine on starting says "I don't support containers"	03:31
wallyworld	i guess we could reflect in status	03:31
thumper	we need to be able to put that container into an error state	03:31
thumper	since we are operating asyncly	03:31
wallyworld	yes, ok	03:31
thumper	we need to be able to handle this cleanly	03:32
thumper	or at least	03:32
thumper	cleanish	03:32
thumper	o/ axw	03:32
wallyworld	i can do a retry strategy to allow a little waiting if supported containers not known	03:32
axw	heya thumper	03:32
thumper	wallyworld: nah don't do that	03:32
thumper	just expect it to be ok	03:32
thumper	it is up to juju to manage conflicts	03:32
wallyworld	ok	03:32
thumper	don't block	03:32
wallyworld	in many cases, we will hopefully know	03:33
thumper	ack	03:33
wallyworld	and can error immediately	03:33
wallyworld	well, reject the add-machine command i mean	03:33
* thumper nods		03:48
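A minimal Go sketch of the non-blocking approach agreed above: add-machine is accepted, and once the host reports which container types it supports, any pending container of another type is put into an error state instead of waiting forever. The `container` type, the status strings, and `markUnsupported` are hypothetical names for illustration, not juju-core's actual API.

```go
package main

import "fmt"

// container is a hypothetical, simplified record of a requested container.
type container struct {
	ID     string
	Type   string
	Status string // "pending" or "error" in this sketch
	Info   string
}

// markUnsupported flips pending containers whose type the host does not
// support into an error state, and leaves everything else alone.
func markUnsupported(pending []container, supported []string) {
	ok := make(map[string]bool, len(supported))
	for _, t := range supported {
		ok[t] = true
	}
	for i := range pending {
		c := &pending[i]
		if c.Status == "pending" && !ok[c.Type] {
			c.Status = "error"
			c.Info = fmt.Sprintf("host does not support %s containers", c.Type)
		}
	}
}

func main() {
	pending := []container{
		{ID: "1/lxc/0", Type: "lxc", Status: "pending"},
		{ID: "1/kvm/0", Type: "kvm", Status: "pending"},
	}
	// The host has just come up and reported that it can only run lxc.
	markUnsupported(pending, []string{"lxc"})
	for _, c := range pending {
		fmt.Println(c.ID, c.Status, c.Info)
	}
}
```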
axw	wallyworld: I had to merge cmd/juju/bootstrap_test.go, and some other minor things. I forgot to remove the --source flag from cmd/juju	06:21
axw	if you want to re-review let me know, otherwise I'll just push and land	06:22
wallyworld	axw: otp, but i trust you :-)	06:22
axw	ta	06:23
wallyworld	jam: yeah, looks like it may be about to hail, go to race and get my son from cricket training	06:35
jam	wallyworld: try not to get hurt :)	06:35
wallyworld	will do :-)	06:35
axw	jam: I'm looking at that uninstall-script thing again. What do you think about this alternative: store the agent's upstart service name in agent.conf, as well as a list of subordinate services (for the moment, just juju-db)	07:33
axw	then we just stop/remove them	07:33
axw	and rm -fr config.DataDir()	07:34
axw	no opaque script, so upgrading should be simpler	07:34
jam	axw: can't we just derive the upstart service name? We derive it when we create the service in the first place?	07:35
jam	I will admit I don't know exactly what steps need to be done, I'm mostly just thinking about (a) how much can we just do so that when we change that list it is easy to do so	07:36
axw	jam: could do that too. we'd need to tell the agent that it's a state server some way other than through the state database	07:36
jam	axw: why is that?	07:36
jam	I thought the idea for manual teardown was that the state machines go down last, so we still have the database a bit before they're dead (I think)	07:37
axw	yes.. hmm, maybe that's ok	07:37
jam	axw: at the very least, we determined what jobs we were running at startup, right?	07:37
axw	I'll need to look at the conditions for ErrTerminateAgent again	07:37
axw	yes	07:37
jam	given we needed to, ya know, do them :)	07:37
jam	axw: this might even make it clearer for the HA w/ manual stuff. If you add another node, that one may not start out as a state server, but become one later	07:38
jam	so deciding what needs to be cleaned up just before you do it, sounds better to me.	07:38
axw	hmm true, good point	07:38
axw	jam: ErrTerminateAgent requires a state conn anyway (makes sense; db err could be transient), so yes, that'd be fine	07:39
jam	axw: in general my experience with upgrades and Upstart is that we don't have a good way to change Upstart config once we've installed. So I'd like to avoid putting stuff in at that level. If it looks plausible to have the "this is how I clean myself up" clearly expressed inside the thing that is running, that sounds the best to me.	07:40
jam	Actually, the best is to have the newest thing possible know how that thing should clean up (like Upgrade should do the clean up in the New code, etc) but some of that is tricky to do.	07:41
axw	jam: I never would've modified upstart config itself, but agent.conf maybe. But anyway, it looks like it's all doable at runtime, without config changes.	07:41
axw	I'll dig in	07:42
jam	axw: so as for the specifics, I don't care if it is a script file that we generate and then run, vs commands we run directly, or whatever	07:43
jam	I don't quite understand why setUninstallScript has a restore function	07:43
jam	is that so it happens in a defer avoiding panic conditions?	07:43
axw	jam: that was just so changes to AgentEnvironment are contained	07:44
axw	after Configure returns, the original value is restored	07:44
jam	axw: so I think I now understand what it does, but I don't quite understand why you don't want AgentEnvironment to stay changed	07:47
axw	jam: it's only of philosophical value - I prefer input variables to be considered immutable	07:47
jam	axw: so I can see where we may not want to mutate what the caller thinks, except if its the whole point of the function :)	07:48
jam	o	07:49
jam	axw: it is suspicious that configure takes and returns a cloudinit.Config which is the same object	07:49
jam	but it appears to be the whole point of the function to mutate the c that is passed in.	07:49
jam	otherwise we should copy it, and return a new one	07:49
jam	which I do prefer	07:50
jam	but it would still be important that we don't unset the thing we just configured	07:50
axw	jam: AgentEnvironment belongs to the MachineConfig (input), not cloudinit.Config	07:50
axw	agreed that the c being input and output is odd - I changed that in another branch the other day :)	07:50
axw	(removed the output)	07:50
jam	axw: the only reason you might want in & out is because you want the caller to nil their object if there is an error, but it does seem like you either want an INOUT var or an IN and OUT but not an INOUT and an OUT	07:51
jam	and if you really need the caller to nil, then take an **obj	07:52
jam	cloudinit_test.go is the only place that doesn't pass it back into the same object (as far as I can tell)	07:53
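The "restore function" pattern axw describes can be sketched in Go roughly as follows. This is an illustration only: `MachineConfig` is cut down to one field, `uninstallKey` and the script text are made-up placeholders, and the real setUninstallScript and Configure may look quite different.

```go
package main

import "fmt"

// MachineConfig is a cut-down stand-in; only the field under discussion is kept.
type MachineConfig struct {
	AgentEnvironment map[string]string
}

// uninstallKey is a made-up key name for this sketch.
const uninstallKey = "UNINSTALL_SCRIPT"

// setUninstallScript sets the entry and returns a function that puts the
// entry back the way it was, so the change is scoped to the caller.
func setUninstallScript(cfg *MachineConfig, script string) (restore func()) {
	if cfg.AgentEnvironment == nil {
		cfg.AgentEnvironment = make(map[string]string)
	}
	old, had := cfg.AgentEnvironment[uninstallKey]
	cfg.AgentEnvironment[uninstallKey] = script
	return func() {
		if had {
			cfg.AgentEnvironment[uninstallKey] = old
		} else {
			delete(cfg.AgentEnvironment, uninstallKey)
		}
	}
}

// configure changes cfg only for the duration of the call; the deferred
// restore runs on return, including when the body panics.
func configure(cfg *MachineConfig) string {
	defer setUninstallScript(cfg, "placeholder script")()
	return cfg.AgentEnvironment[uninstallKey]
}

func main() {
	cfg := &MachineConfig{}
	fmt.Println("during configure:", configure(cfg))
	_, stillSet := cfg.AgentEnvironment[uninstallKey]
	fmt.Println("still set afterwards:", stillSet)
}
```

The trade-off jam raises still applies: if setting the value is the whole point of the function, restoring it afterwards undoes the work.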
jam	morning fwereade and dimitern	08:04
dimitern	morning	08:04
rogpeppe	mornin' all	08:07
axw	morning	08:07
axw	jam: do you think it'd be horrible to just attempt stopping/removing the juju-db service, and ignore the ENOENT?	08:09
axw	i.e. no check for state server	08:09
rogpeppe	axw: what's the context?	08:10
axw	rogpeppe: uninstalling mongo (juju-db) when destroying a manual provider env	08:10
axw	currently the machine agent just removes its own upstart config, and exits	08:11
rogpeppe	axw: when will it get ENOENT?	08:12
axw	rogpeppe: if the machine agent is not a state server, then juju-db won't exist	08:12
fwereade	jam, dimitern, rogpeppe,axw: mornings	08:12
rogpeppe	fwereade: hiya	08:12
axw	ahoj	08:12
rogpeppe	axw: i think it sounds reasonable	08:13
rogpeppe	axw: but i'd've thought it might be just as easy to check for state-serverness	08:14
axw	yeah it probably is. just looking at the options	08:14
axw	blind removal is tempting, because it keeps it all in one spot	08:15
rogpeppe	axw: which spot is that?	08:15
axw	rogpeppe: func (m *MachineAgent) uninstallAgent() error	08:15
axw	cmd/jujud/machine.go	08:15
rogpeppe	axw: presumably you could just pass isStateServer into that function (or machine.Jobs())	08:16
axw	rogpeppe: there's an error condition that the agent deals with that would cause termination, where the agent wouldn't be able to determine its jobs	08:17
axw	i.e. the machine entry does not exist in state	08:17
axw	but hey, maybe we don't care about nonsense like that :)	08:17
rogpeppe	axw: hmm, i wondered if something like that was possible	08:18
rogpeppe	axw: in that case, i think just delete and ignore ENOENT	08:18
rogpeppe	axw: but...	08:18
rogpeppe	axw: how can we know to destroy things if we can't get the jobs?	08:19
rogpeppe	axw: don't the jobs arrive in the same reply as the machine life status?	08:19
axw	rogpeppe: yes. if that returns not found or unauthorized, the agent terminates	08:20
rogpeppe	axw: ah, of course	08:21
rogpeppe	axw: in which case, i think that ignoring ENOENT is preferable to the alternative (caching locally whether we did have state server jobs)	08:21
rogpeppe	axw: in a sense, the upstart config is that local cache	08:22
axw	yeah, I'm thinking that too	08:22
rogpeppe	axw: my only hesitation is whether there might be something else with a juju-db service that might get annoyed, but i think enough things will break in that case that we can safely ignore the possibility	08:23
axw	rogpeppe: indeed, I came to the same conclusion	08:24
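A rough Go sketch of the "delete and ignore ENOENT" idea. It covers only removing the upstart job file, not stopping the service, and it works in a temporary directory instead of /etc/init. `removeServiceConf` and `demo` are hypothetical helpers, not the code in cmd/jujud/machine.go.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// removeServiceConf deletes the upstart job file for the named service from
// initDir. A missing file counts as success, so the caller does not need to
// know whether this machine was ever a state server.
func removeServiceConf(initDir, name string) error {
	path := filepath.Join(initDir, name+".conf")
	err := os.Remove(path)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return fmt.Errorf("removing %s: %w", path, err)
	}
	return nil
}

// demo simulates a machine that only ever ran a machine agent: the
// juju-machine-0 job exists, the juju-db job does not.
func demo() error {
	dir, err := os.MkdirTemp("", "init")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	conf := filepath.Join(dir, "juju-machine-0.conf")
	if err := os.WriteFile(conf, nil, 0o644); err != nil {
		return err
	}
	for _, name := range []string{"juju-machine-0", "juju-db"} {
		if err := removeServiceConf(dir, name); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("cleanup failed:", err)
		os.Exit(1)
	}
	fmt.Println("cleanup ok")
}
```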
rogpeppe	fwereade: i think launchpad.net/juju-core/agent/bootstrap.go:123 is crackful and that it should use config.DefaultSeries. what do you think?	08:53
rogpeppe	fwereade: although... hmm, maybe not	08:54
rogpeppe	fwereade: in fact, no i think it's right	08:55
rogpeppe	fwereade: ignore me :-)	08:55
jam	axw: As long as what we are getting rid of is clearly a juju script (juju-db, juju-machine-0, etc) I think we're fine.	08:55
jam	we can't run 2 juju's on a given machine without a lot of other pain	08:55
rogpeppe	jam: yeah. it's a pity, that, really.	08:56
jam	(You might be able to run a unit of one environment and the state server of another environment, but that just sounds terrible)	08:56
jam	rogpeppe: well, we'd have to put namespaces to do tat	08:56
jam	that	08:56
axw	jam: you can with local, but these changes just won't work with local (which I think is reasonable)	08:56
jam	/etc/init/juju-env-X-machine-0	08:56
rogpeppe	jam: yeah - we'd probably put the env uuid in there	08:57
jam	axw: I'd think we'd want local to clean up properly	08:57
axw	why? env.Destroy does that anyway	08:58
jam	axw: well, we still want local environments to clean up properly, right? (it may be done in a different layer, but we might want to consider how to avoid redundancy as well)	08:59
axw	jam: I consider this to be like freeing memory before exiting a process	09:00
axw	there may be some use case in the future, but I don't see one right now	09:01
axw	handling the local provider with non-standard service names takes us back to modifying agent.conf	09:02
axw	jam, rogpeppe: https://codereview.appspot.com/28270043 -- take a look, let me know if you think it's worthwhile involving agent.conf to fix the local provider case	09:50
mgz	mornin'	10:00
jam	morning mgz	10:02
jam	standup time	10:45
jam	fwereade, rogpeppe, TheMue, https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig	10:45
mattyw	fwereade, thanks for the reviews :)	10:54
jam	fwereade: you seem to be having connection issues	10:58
fwereade	jam, ha, even my g+ chats don't seem to be getting through	11:02
fwereade	ian, ok, will do	11:02
fwereade	wallyworld, ^	11:02
jam	fwereade: I got your "isn't that just 2" but that was the last one	11:02
fwereade	grar, v quick break, we'll see if it's happier in 3 mins	11:07
fwereade	wallyworld, fwiw, a watcher for "kinds of containers this machine is expected to run" would be easy, and was the original plan a while ago	11:16
jam	fwereade: well we have "containers this machine is running"	11:16
fwereade	jam, I thought that was container-type-specific	11:16
jam	fwereade: right, that is what we are talking about. making one that is non-container type specific, but just doing "all" and reporting back errors for ones it doesn't support	11:17
fwereade	jam, wallyworld: isn't the simplest way to do that to launch a provisioner task with a broker that always just errors on provisioning?	11:20
jam	fwereade: the concern is that you're launching 5 different provisioners that will never do anything	11:20
jam	and a lot of duplicate code	11:20
jam	why not just run 1 that can handle N container types	11:20
fwereade	jam, because watcher->broker is a simple clean chunk of functionality that already exists	11:21
fwereade	jam, that is the point of a provisioner -- it watches a specific set of machines and provisions them using a specific broker	11:21
fwereade	jam, adding multiple brokers into the mix complicates that unnecessarily	11:21
fwereade	jam, compared to starting one provisioner for each kind of machine, and using a, ha, "null broker" when that machine kind is not known	11:22
jam	fwereade: but why multiple watchers?	11:22
jam	why not watch all possible containers?	11:22
fwereade	jam, to avoid complicating the provisioner task, mainly	11:23
jam	(it may be the internal DB structure don't support it well)	11:23
fwereade	jam, I'm not sure it's in a great position to have 1->N-ness poked into it	11:23
jam	fwereade: but what would an LXC provisioner do differently than a KVM one?	11:23
jam	the commands are different, but that is a lower level	11:23
fwereade	jam, talk to a different broker, where the two brokers are independent and needn't block one another	11:24
jam	fwereade: overloading your system because you start an LXC and a KVM and an OpenVZ and a doesn't seem a better User Experience :)	11:25
jam	I agree in the external vs local provisioning case	11:25
fwereade	jam, seriously, a provisioner overloads the system?	11:25
fwereade	jam, if that's actually the case then fair enough	11:25
jam	fwereade: starting up a container does	11:25
jam	apt-get update	11:25
jam	starting 10 of them is actually quite bad (from reports I've seen)	11:26
jam	someone did the local provider and really hosed his system	11:26
fwereade	jam, sure, that was jorge doing deploy -n 50, but that was still with one single provisioner in play, so I think not germane	11:26
fwereade	jam, the problem was that he asked for his system to be overloaded	11:26
fwereade	jam, and besides the provider/container distinction I think holes your argument -- you're asking for two kinds of provisioner, a 1->1 and a 1->N one	11:28
fwereade	jam, a single provisioner, that maps from machine-set to broker, seems like the clearest model	11:28
jam	fwereade: I personally don't see why machine-set needs to split by type	11:29
jam	I guess	11:29
fwereade	jam, it doesn't need to be, but doing so extracts extraneous functionality from the provisioner, which doesn't need any additional complexity imo	11:30
wallyworld	jam: fwereade: so at the moment, there's an lxcBroker which acts as a provisioner task, as well as a cloud instance provisioner task. i'm sure we'll iterate to get the best model, whether that's one provision task for all container types or one per container type. the kvm provisioner code is not done yet. let's see how it falls out. in the meantime, we will achieve the required user facing functionality wrt containers and all that	11:37
wallyworld	ie users will be able to start supported containers, and get sensible errors if they try and start non-supported ones	11:37
fwereade	wallyworld, ok, that's cool -- I'm just saying I will get a bit shirty if the kvm code requires a single kvm-specific line in the provisioner task itself ;p	11:38
wallyworld	fwereade: as will i. don't fret too much. all i'm saying, it's a work in progress :-)	11:39
fwereade	wallyworld, don't worry, I won't, I know you know what you're doing :)	11:39
wallyworld	sometimes :-)	11:39
wallyworld	fwereade: one of my aims is to get rid of the switch statements in the current provisioner so that the kvm/lxc logic is isolated behind an interface	11:40
* fwereade cheers at wallyworld		11:41
wallyworld	scaling as discussed in the backscroll is a valid concern. we will have to look at that also as part of the solution	11:41
wallyworld	and there's always a trade off between conceptual complexity, number of moving parts etc	11:42
jam	fwereade: so the other logical thoughts are "what about new types", having to add yet-another-thing to monitor for another type that is normally not doing anything, etc.	11:47
jam	we know today that people want openvz, vagrant, vmware, ...	11:47
fwereade	jam, that is the last thing I want	11:47
jam	fwereade: so the nice thing about having 1 "ContainerProvisioner" is that it can also not think about types it doesn't know, but it can still say "I don't know about that type, so here is your error", rather than nothing listening for the OpenVZ container type, so when you go to deploy it just sits in pending forever.	11:48
jam	It seemed to smooth things out to have a generic one	11:48
fwereade	jam, the machine agent asks for the types of the containers it's meant to be running; starts provisioners with appropriate brokers for those it understands, and null-broker provisioners for the ones it doesn't	11:48
jam	but it does depend on how things align	11:48
fwereade	jam, null brokers just error on StartInstance	11:48
jam	fwereade: sure, though that does mean Machine Agents run N watchers and N brokers for however many types that we might support	11:49
fwereade	jam, no, it means they run one provisioner for every container type they are currently using	11:50
fwereade	jam, plus one task that starts/stops them	11:50
jam	fwereade: "and null-broker provisioners for the ones it doesn't" is still N	11:50
fwereade	jam, it's a very small N compared to the number of possible container types out there	11:51
jam	I'm not sure I follow.	11:51
fwereade	jam, it's only for those cases where someone deployed an invalid container before the machine came up and was able to set its supported types	11:51
fwereade	jam, most machines just run a container-types watcher	11:52
fwereade	jam, no containers? no types, thus no provisioners	11:52
fwereade	jam, kvm and lxc and vagrant containers added? start 3 of them	11:52
fwereade	jam, one of which is null	11:52
jam	fwereade: I do finally see, though I don't think that has been reflected in the discussions so far.	11:52
jam	wallyworld: does that make sense to you? ^^	11:53
fwereade	jam, although with a bit of cleverness in state we should be able to auto-error the unsupported ones anyway I think	11:53
* wallyworld reads		11:53
jam	I haven't heard about the container-types as a separate thing being watched.	11:53
fwereade	jam, that was the original idea a while back	11:53
jam	fwereade: well, defense-in-depth is always useful. (how will this thing act if we poke something that could be construed as invalid)	11:53
wallyworld	jam: yes, it makes sense as that's how i've implemented it. when a machine agent starts up, it determines what container types the host can support	11:54
fwereade	jam, that's a benefit of having a null broker for unknown ones that might slip through	11:54
wallyworld	it then watches for containers of those types to be requested	11:54
wallyworld	once a container of a type is first requested, a provisioner task is started	11:54
fwereade	jam, in the ideal case we should be able to squish those weird ones before they even make it to the agent anyway	11:55
wallyworld	the provisioner task for a container type may well then do other stuff to prepare for the first container of a type to start	11:55
fwereade	jam, the types watcher implementation on the server side could filter those out and error them itself, or possibly the SetSupportedContainers stuff could do so itself with a bit of nasty state prestidigitation	11:56
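fwereade's "null broker" can be sketched in Go as below. The `Broker` interface here is a hypothetical, cut-down stand-in with a single `StartInstance` method, and `fakeBroker` and `brokerFor` exist only to make the example runnable; none of this is the real juju-core interface.

```go
package main

import "fmt"

// Broker is a hypothetical, minimal version of what a provisioner task
// would use to start instances.
type Broker interface {
	StartInstance(machineID string) (instanceID string, err error)
}

// nullBroker never starts anything: every request fails with a clear error,
// so an unsupported container lands in an error state instead of pending.
type nullBroker struct{ containerType string }

func (b nullBroker) StartInstance(machineID string) (string, error) {
	return "", fmt.Errorf("cannot start %s: container type %q is not supported on this host",
		machineID, b.containerType)
}

// fakeBroker stands in for a real lxc or kvm broker in this sketch.
type fakeBroker struct{}

func (fakeBroker) StartInstance(machineID string) (string, error) {
	return "fake-" + machineID, nil
}

// brokerFor picks a working broker for supported types and a null broker
// for everything else, so the provisioner task needs no per-type logic.
func brokerFor(containerType string, supported map[string]bool) Broker {
	if supported[containerType] {
		return fakeBroker{}
	}
	return nullBroker{containerType: containerType}
}

func main() {
	supported := map[string]bool{"lxc": true}
	for _, t := range []string{"lxc", "openvz"} {
		id, err := brokerFor(t, supported).StartInstance("0/" + t + "/0")
		fmt.Println(t, id, err)
	}
}
```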
rogpeppe	natefinch: any chance of seeing your proposed package interface, please?	12:14
natefinch	rogpeppe: yeah, sure	12:16
rogpeppe	natefinch: godoc output would be ideal	12:16
* fwereade lunch		12:16
natefinch	rogpeppe: let me just whip up some godoc comments :)	12:17
rogpeppe	natefinch: always write your doc comments before writing the code :-)	12:17
natefinch	rogpeppe: :) I usually do... this was kind of exploratory coding, so I didn't. shrug	12:19
rogpeppe	natefinch: np	12:19
rogpeppe	natefinch: at this point i'm mostly interested to see if you've got AddPeer or SetPeers	12:20
natefinch	rogpeppe: add and remove	12:20
natefinch	rogpeppe: I could do set, that's actually easier than add or remove	12:21
rogpeppe	natefinch: i'm wondering if set might be more appropriate for our use case	12:21
rogpeppe	natefinch: although you might want to leave add and remove since you've already implemented them	12:21
jam	fwereade, wallyworld: your models don't actually match. wallyworld starts by introspecting the machine, fwereade has a list of requested-container-types that you watch	12:21
jam	so once you have the list, then they mostly match	12:21
wallyworld	i introspect the machine to determine what container types are supported, then watch those only	12:22
jam	wallyworld: which is not watching a list of requested container types	12:22
jam	and it is starting watchers for each type the machine might support	12:22
jam	rather than only ones that have already been requested	12:23
wallyworld	till the first container is requested, it's a strings watcher for the first container instance yes	12:23
jam	thats the key bit that I was missing at least. It may be that I'm misunderstanding the things you've said, but I do feel there is a communication gap between what fwereade is actually describing and what you have	12:23
wallyworld	which then starts a suitable provisioner task	12:24
jam	wallyworld: "first container instance" ?	12:24
wallyworld	afaik, all we have at the moment is the ability to call WatchMachineContainers	12:24
wallyworld	which triggers whenever a container is added to a machine	12:24
jam	which requires a list of container types, right ?	12:24
wallyworld	yes	12:25
jam	wallyworld: so what fwereade is describing, is another watcher	12:25
jam	which is watching the list-of-requested-container-types	12:25
jam	rather than the list-of-known-supported-types	12:25
jam	so we would still have a startup "these are the types I know how to support"	12:25
wallyworld	i currently call WatchMachineContainers - what it gives when it triggers is the container ids	12:25
jam	we actually ask back "and what types would you like me to run"	12:25
jam	wallyworld: right, that is something we also need, but fwereade is giving us a layer where we don't start provisioners until the list of requested containers now contains them	12:26
wallyworld	that's right, i only start a provisioner when the first container of a type is requested	12:26
wallyworld	until then, it's a simple strings watcher	12:27
jam	so WatchMachineContainers would be run by each of the LXCProvisioner and KVMProvisioner, with their personal subset. But we don't start one until this other field includes that type in the list.	12:27
wallyworld	no	12:27
jam	wallyworld: but what happens when you have another one	12:27
wallyworld	the provisioners are not started	12:27
jam	or one that is a type you didn't probe for	12:27
jam	or	12:27
jam	etc	12:27
wallyworld	well, the machine agent has to know what possible containers to probe for	12:27
wallyworld	cause there's different initialisation code required for each type	12:28
wallyworld	so it has to be baked in to the system	12:28
wallyworld	we can't suddenly support new container types	12:28
wallyworld	without the code which knows how to set up for that	12:28
wallyworld	which packages to apt-get install etc	12:28
jam	wallyworld: to give a hypothetical. Wouldn't it be nice if someone could request an OpenVZ which Juju 2.2 knows about, it goes into the DB, but the agent on machine-2 is only running Juju 2.0 and can just say "sorry, container type X not supported"	12:28
jam	wallyworld: I'm certainly not saying we support things we don't know how to support	12:29
jam	but what about being able to give error messages about things we haven't heard about before	12:29
wallyworld	i can certainly modify the code to do that	12:29
wallyworld	that would be easy to do	12:30
wallyworld	it would only be a small tweak	12:30
wallyworld	actually	12:30
wallyworld	that's how it works now	12:30
wallyworld	when the machine agent starts up	12:30
wallyworld	it sets the supported container list	12:30
jam	wallyworld: and I think it actually handles the "you asked for KVM which I know about but I don't actually support that" as well as "you asked for OpenVZ which I know nothing about"	12:30
wallyworld	and if juju client 2.2 comes along	12:30
wallyworld	and asks for something new, it will error immediately	12:31
jam	wallyworld: except if the machine hasn't finished starting yet, which means it will go off into lala land because nobody is checking for a type that wasn't baked in.	12:31
jam	which is why the "give me the list of types that have been requested, so I can start things for them, and oh, these ones I don't know about so put them into error state"	12:32
wallyworld	no, cause i'm still writing that code	12:32
jam	similar to "these ones I know about but don't actually support"	12:32
wallyworld	and i will be checking for requested stuff that's not supported	12:32
wallyworld	in fact, the code i have does do that already	12:32
jam	it isn't hard to say "if I don't know about it, it isn't supported"	12:32
wallyworld	yes	12:32
natefinch	rogpeppe: http://pastebin.ubuntu.com/6437236/	12:32
wallyworld	jam: the code in progress i have iterates over all requested containers, and sets status if not supported	12:33
wallyworld	so that will pick up new container type XYZ	12:33
jam	wallyworld: but it only does that at startup time ?	12:33
adeuring	rogpeppe: could you have a look at this MP: https://codereview.appspot.com/28310043 ?	12:33
rogpeppe	natefinch: why ...[]ReplsetMember ?	12:33
rogpeppe	adeuring: looking	12:33
wallyworld	jam: yes, but the iteration happens after the block has been established to prevent unknown containers from being requested	12:33
adeuring	thanks!	12:33
jam	and afterwards it starts watching for only the types that it does support	12:34
rogpeppe	natefinch: wouldn't ...ReplsetMember be sufficient?	12:34
jam	but it starts watching for all things that it might support	12:34
wallyworld	yes	12:34
natefinch	rogpeppe: sorry, I just changed that... yeah, that's what I meant	12:34
wallyworld	jam: but only a strings watcher, not a provisioner	12:34
jam	the model from fwereade is to put a Watcher on actually requested types, for those that are requested, start a provisioner watching for containers of that type. When the list of requested types changes, the first one fires and starts a new provisioner which may be a "not supported provisioner"	12:35
natefinch	rogpeppe: I always forget the exact syntax for the variable params arrays	12:35
wallyworld	jam: i don't plan on starting any "not supported provisioners"	12:35
rogpeppe	adeuring: LGTM	12:35
wallyworld	no point	12:35
jam	wallyworld: my point is if we do the fwereade one, we don't ever end up in a race condition where we might not notice when someone requests something we don't support, even if the client is buggy and doesn't actually respect the supported types field	12:35
adeuring	rogpeppe: thanks!	12:35
rogpeppe	natefinch: ... replaces []	12:35
wallyworld	buggy client, never :-)	12:36
adeuring	fwereade: could you have a look here: https://codereview.appspot.com/28310043 ?	12:36
natefinch	rogpeppe: yeah, I remembered that after you mentioned it.	12:36
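For reference, the variadic syntax rogpeppe is pointing at, as a small Go sketch. `ReplsetMember` is reduced to a single assumed field and `addMembers` is a hypothetical function; natefinch's actual package is only in the pastebin.

```go
package main

import "fmt"

// ReplsetMember is reduced to one field for this sketch.
type ReplsetMember struct {
	Host string
}

// addMembers takes zero or more members. Inside the function, members has
// type []ReplsetMember; the "..." in the signature replaces the "[]".
func addMembers(current []ReplsetMember, members ...ReplsetMember) []ReplsetMember {
	// Copy first so the caller's slice is never modified through aliasing.
	out := append([]ReplsetMember(nil), current...)
	return append(out, members...)
}

func main() {
	// Pass members one by one...
	set := addMembers(nil, ReplsetMember{Host: "a:27017"}, ReplsetMember{Host: "b:27017"})
	// ...or spread an existing slice with a trailing "...".
	more := []ReplsetMember{{Host: "c:27017"}}
	set = addMembers(set, more...)
	fmt.Println(len(set), set)
}
```

Declaring the parameter as `...[]ReplsetMember` would instead accept any number of slices, which is rarely what is wanted.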
wallyworld	jam: the back end is the thing that looks at the supported types field	12:36
jam	so while we could write it the other way, this way handles the cases we do care about, saves resources internally (doesn't have to even start watchers for supported types that aren't in use), and handles failure more gracefully	12:36
wallyworld	jam: the client just gets an error	12:36
wallyworld	here's no logic in the client	12:36
wallyworld	there's	12:37
jam	wallyworld: buggy code	12:37
jam	regardless client	12:37
natefinch	rogpeppe: I'd add SetReplicas that just takes an array of ReplsetMember and replaces the set in the mongo document	12:37
jam	anyway, your model can be made to work, the other just seems more robust and actually consumes less resources because you aren't even starting Watchers until one is requested	12:37
wallyworld	jam: the client asks for a container xyz, that goes to the server side, the server side is the thing that errors	12:37
jam	you start A watcher	12:37
jam	and never N watchers	12:37
rogpeppe	natefinch: i think i'd prefer to see bools rather than *bools in ReplsetMember, even if it means having another type for marshalling	12:38
wallyworld	jam: from memory, i start one strings watcher per supported container type, after the supported container types have been set, preventing new unsupported ones from being requested	12:38
jam	wallyworld: anyway, your design can certainly be made to work. My #1 point is that it doesn't actually match what fwereade is saying, and his does have an interesting benefit.	12:38
wallyworld	i don't see the benefit just yet	12:39
wallyworld	my current implementation doesn't allow unsupported containers, is robust to old clients, and doesn't start unnecessary watchers	12:39
wallyworld	s/old/new	12:39
jam	wallyworld: benefit #1, for 90% of all machines that don't run any containers, they start 1 watcher of requested-container-types, even when we support 10 different Virtualization types	12:39
natefinch	rogpeppe: the only problem with that is that buildindexes defaults to true if unset, which is annoying to do in go marshalling .... doable, but annoying.	12:40
jam	so you could deploy to any of those types (say you are running on MaaS so you have full support for whatever you want). but then you still have only 1 watcher because you aren't actually using any of them.	12:40
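The "one watcher, provisioners on demand" model jam describes might look something like this in Go. A plain channel of string slices stands in for a watcher on the requested container types; `startProvisioners` and its `start` callback are hypothetical and say nothing about how the real watcher API is shaped.

```go
package main

import "fmt"

// startProvisioners consumes changes from a single "requested container
// types" watcher (modelled as a channel) and calls start once for each type
// the first time it appears. It returns the number of provisioners started
// once the channel is closed.
func startProvisioners(changes <-chan []string, start func(containerType string)) int {
	started := make(map[string]bool)
	for types := range changes {
		for _, t := range types {
			if !started[t] {
				started[t] = true
				start(t)
			}
		}
	}
	return len(started)
}

func main() {
	changes := make(chan []string, 2)
	changes <- []string{"lxc"}
	changes <- []string{"lxc", "kvm"}
	close(changes)

	n := startProvisioners(changes, func(t string) {
		fmt.Println("starting provisioner for", t)
	})
	fmt.Println(n, "provisioners started")
}
```

A machine that never has a container requested runs only the one watcher, however many container types are supported.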
wallyworld	jam: the current WatchContainer method doesn't take a list	12:40
rogpeppe	natefinch: then have NoIndexBuilding or something?	12:40
wallyworld	it just takes a single container to watch for	12:41
jam	wallyworld: absolutely. It needs code written	12:41
wallyworld	hence right now I need N	12:41
jam	the stuff we have today doesn't match the design fwereade stated we were trying for	12:41
wallyworld	i could change it yes	12:41
natefinch	rogpeppe: hrm... I sorta hate to modify the API for replicasets away from the Mongo documentation.	12:41
jam	we need another Watcher to watch the list-of-requested-container-types	12:41
jam	fwereade's claim is that was the design	12:41
jam	so the data may already be present in the DB	12:42
wallyworld	jam: so whether we start 1 initial watcher or N, it's essentially the same design	12:42
wallyworld	jam: not list-of-requested-container-types, but list of supported container types	12:42
rogpeppe	natefinch: we're still having the same defaults, so i think it's reasonable. (I think it would be quite unintuitive if the zero value of some of those *bools implied true and others false)	12:42
wallyworld	no need to watch for those we don't support	12:43
rogpeppe	natefinch: i think we can be go-idiomatic even while sticking reasonably close to mongo docs	12:43
rogpeppe	natefinch: even better, we can include links to the relevant parts of the doc	12:43
rogpeppe	s/can/could/	12:43
natefinch	rogpeppe: I'm just going off the defaults of what Mongo gives you. Not my fault they're inconsistent ;) To me, the point shows that they're optional, and the default is whatever mongo says the default is.	12:44
natefinch	s/point/pointer	12:44
rogpeppe	natefinch: i think it's going to be really awkward to build a ReplsetMember	12:44
jam	wallyworld: sure, but if we don't support them they likely won't end up in the requested set either.	12:44
rogpeppe	natefinch: i'd much prefer if it didn't need pointers to bool	12:44
jam	wallyworld: the point is to handle failure modes "oh it did end up in the requested set somehow"	12:45
rogpeppe	natefinch: or float, for that matter	12:45
natefinch	rogpeppe: I guess pointers to booleans are annoying, it's true.	12:45
jam	and still be able to respond to it	12:45
wallyworld	jam: sure, my code does that now	12:45
jam	rather than just staying in Pending forever	12:45
jam	wallyworld: sure, but it also starts N watchers when there are 0 requested containers	12:45
wallyworld	by my code i mean the stuff in progress	12:45
natefinch	rogpeppe: the problem is that the defaults aren't zero for Priority or Votes.	12:45
wallyworld	jam: it has to start a watcher (or N currently)	12:46
jam	wallyworld: it has to start 1	12:46
jam	yes	12:46
jam	but 1 != N	12:46
wallyworld	so it knows when to kick off a new provisioner and prepare for that container type to be deployed	12:46
rogpeppe	natefinch: tbh, i don't mind if we don't have defaults for those	12:46
wallyworld	jam: sure, but with the api we have now, i have to start N	12:46
jam	wallyworld: you were asking for fewer moving parts	12:46
wallyworld	i can change that	12:46
wallyworld	i'm just using what we have to get it working	12:46
wallyworld	in user visible sense	12:47
wallyworld	can iterate behind the scenes once it's running	12:47
jam	wallyworld: mostly it felt like you didn't quite understand the design fwereade was talking about, because you certainly weren't designing it in the same fashion. I wanted to make sure we are actually having the same conversation	12:47
natefinch	rogpeppe: now who's making it more difficult to create replica members? As it is, all you have to specify is the Host (actually that's something I should work out - the Ids of the members are really just their index in the list... I probably shouldn't expose them)	12:47
jam	steps along the way, as long as we're actually headed in the same direction	12:47
rogpeppe	natefinch: one possibility is to have a separate type, say MemberSettings, and have a value, say DefaultMemberSettings	12:47
rogpeppe	natefinch: which holds all the usual defaults	12:47
wallyworld	but i am designing it in the same way - just differ initially on the initial watcher	12:47
wallyworld	1 vs N	12:48
wallyworld	or at least i think so	12:48
natefinch	rogpeppe: likely, most people will only use Host as a value. The rest are rarely used, other than ArbiterOnly	12:48
wallyworld	jam: i haven't actually talked to fwereade about this at all, just going by scanning the scrollback	12:48
natefinch	rogpeppe: (at least from what the documentation says, it seems unlikely they're options that are used very often, since the doc says stuff like "generally you shouldn't change this" etc)	12:49
jam	wallyworld: you aren't watching a list-of-requested-container-types at all, which is quite different from what he described.	12:49
wallyworld	the core concepts though i think are in alignment - doesn't start provisioners until needed, delay host init until needed etc	12:49
jam	I think after you have a watcher of something requested	12:49
jam	you're doing the same thing	12:49
wallyworld	maybe you and i differ on terminology	12:50
wallyworld	i think i am watching a list of supported container types	12:50
jam	(15:16:00) fwereade: wallyworld, fwiw, a watcher for "kinds of containers this machine is expected to run" would be easy, and was the original plan a while ago	12:50
natefinch	rogpeppe: I like the default value of the struct.. that's not a bad idea. I just think it's less straightforward than just doing it the way I have it.	12:50
jam	wallyworld: no, you are starting N watchers on each type that you support	12:50
jam	which is not a watcher of "what types have been requested for this machine"	12:51
wallyworld	jam: yes - a list of supported container types = a list of kinds this is expected to run	12:51
wallyworld	jam: no 1 watcher on each type = N	12:51
wallyworld	not N watchers on each type	12:51
wallyworld	and that's only because there's no api to support one watcher for all supported types	12:52
wallyworld	yet	12:52
jam	so, yes we were talking past each other a bit	12:52
jam	N watchers, 1 for each type you support	12:52
jam	vs	12:53
jam	wallyworld: you are starting N watchers	12:53
jam	vs	12:53
jam	starting 1 watcher that gets a list of things that it should then start watchers for	12:53
wallyworld	i don't get the last line, but	12:54
wallyworld	i would only like to start 1 watcher for the N container types	12:54
wallyworld	when the api supports that	12:54
wallyworld	it would be a small change to what we do now	12:54
jam	wallyworld: if I have a doc in the DB, that aggregates all container types requested for this machine, which is just a list of [LXC, KVM], though in the common case is just []	12:54
jam	fwereade's contention was that "that was the original design"	12:54
jam	which clearly didn't match what is being implemented	12:55
wallyworld	ok, i see now	12:55
jam	and I was trying to be clear about where he at least thought we were going	12:55
wallyworld	i watch for container ids	12:55
wallyworld	and on the first one, start the provisioner	12:55
wallyworld	essentially the same thing	12:55
wallyworld	with less moving parts	12:55
wallyworld	cause the db model is simpler	12:55
jam	wallyworld: except in the common case each machine-agent starts up N watchers, which is resources in the API server	12:55
jam	to watch for things that will never actually have changes	12:56
wallyworld	yes, but will be one	12:56
wallyworld	when the api is fixed	12:56
jam	wallyworld: so I was proposing that, but fwereade seemed to think it was bad	12:56
jam	and preferred the "original design"	12:56
jam	which is why this thread got started	12:56
jam	so I've at least illuminated where you two differ	12:56
jam	and I will happily step back and let fwereade and you finish the conversation	12:56
wallyworld	seems so :-) sorry, i was not getting it at first	12:56
jam	wallyworld: neither did I. I just got it slightly sooner than you.	12:57
wallyworld	well, i've had a few glasses of wine here by now, it's almost 11pm :-)	12:57
jam	It wasn't until (15:53:01) jam: wallyworld: does that make sense to you/ ^^	12:57
jam	that I got what the difference was	12:57
wallyworld	i think i skimmed that bit, sorry :-)	12:58
wallyworld	in any case, i'd like to finish what i currently have - it works, is robust to different client versions	12:58
jam	wallyworld: np, I didn't catch that you weren't understanding the "starting N watchers, 1 of each type" thing	12:58
wallyworld	and we can argue about how to tweak it	12:58
wallyworld	jam: yeah, i would have preferred to have only 1 watcher, but thought getting the new api done would take too much time	12:59
wallyworld	and i wanted to try to have kvm done for this week	12:59
wallyworld	cause the new api may well involve back end changes	12:59
wallyworld	to the watcher infrastructure	13:00
jam	wallyworld: sure. I think the discussion is just whether we have a higher-level watcher that then fires off these sub ones, or an aggregate watcher across them.	13:01
wallyworld	yeah. once the provisioner is started, it essentially becomes that watcher	13:01
wallyworld	s/that/the	13:01
jam	yeah	13:01
wallyworld	all the work to add the supported containers db model, and the code to update status etc is essentially independent of the initial watcher thing	13:02
wallyworld	hence i can get that done and provide the user visible functionality up front	13:03
wallyworld	and tweak the behind the scenes stuff later	13:03
rogpeppe	natefinch: why does InitReplicaSet need to be run on the same machine when initially setting up a replica set?	13:13
=== gary_poster|away is now known as gary_poster
natefinch	rogpeppe: hmm.. I thought it had to be so mongo would know to replicate the stuff in this mongo instance, but it looks like I was wrong. The docs don't mention that, so I'll remove it.	13:15
rogpeppe	natefinch: i actually didn't think it was possible to change an existing mongo to use a replica set without restarting it, but presumably you found out a way?	13:16
natefinch	rogpeppe: oh, no... this code is outside that. I didn't write the restarting code yet, since that's outside the scope of what mgo can do. But that's pretty trivial, a couple exec commands	13:17
natefinch	rogpeppe: which is to say, yes, you have to restart mongo	13:17
rogpeppe	natefinch: i think that perhaps we can ignore that - we'll perhaps use a different upstart name, and make sure that the old one is removed before ensuring the new one exists.	13:18
rogpeppe	natefinch: do we actually need an InitReplicaSet call then?	13:18
natefinch	rogpeppe: you have to do both, the flag on mongo startup and initreplicaset	13:19
natefinch	rogpeppe: brb	13:20
jam	natefinch, rogpeppe: do you have to restart mongo anytime you change the replica set ? It looks like you have to set the startup flag, but we could just do that always, right?	13:25
rogpeppe	jam: no you don't	13:26
jam	rogpeppe: so http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/ certainly says you can't take a running service and make it HA without stopping it	13:26
rogpeppe	jam: we need to pass in the replica set name, but otherwise i think there's no need to restart	13:26
jam	unless we default to always starting in a replica set with just 1 entry	13:26
rogpeppe	jam: that's what we'd do, i think	13:26
rogpeppe	jam: unless you can think of a reason that's a bad idea	13:26
natefinch	jam, rogpeppe: so are you saying, always start mongo with the flag, but then just don't do replsetinitiate?	13:27
rogpeppe	natefinch: i think so. i'm not quite sure what your InitReplicaset function is doing though.	13:28
jam	rogpeppe: http://paste.ubuntu.com/6437458/	13:28
jam	ah, "we'd do"	13:29
jam	not "thats what we already do"	13:29
natefinch	rogpeppe: it's what actually sets up the replica... it passes in the list of replicas. I don't know what it does behind the scenes. I can test what happens if you start mongo with --replSet and try to use it as an individual database	13:29
rogpeppe	natefinch: from an API user's p.o.v., i'd prefer not to have to call InitReplicaset ever	13:30
jam	natefinch: so it sounds like we'd really like to do all of that work in "bootstrap-state"	13:30
jam	so that we have a replica set with just 1 entry	13:30
jam	natefinch: rs.initiate() takes an optional configuration	13:30
jam	which sure sounds like the initial value can just be 1 node	13:31
natefinch	jam: good point	13:31
rogpeppe	my experiments seemed to show it worked fine with just one node in the replica set	13:31
rogpeppe	there's one slight problem though	13:32
rogpeppe	in bootstrap-state, we don't necessarily know the machine's address	13:32
rogpeppe	or... do we?	13:32
jam	rogpeppe: again rs.initiate() can just be started without passing in anything	13:32
jam	let mongo sort it out	13:32
jam	we can change it later when we expand it	13:32
rogpeppe	jam: it might not sort it out correctly	13:32
rogpeppe	jam: at least on my laptop, it got the wrong address	13:33
jam	rogpeppe: there is a warning that you shouldn't use localhost for a member unless all entries are on localhost	13:33
jam	rogpeppe: but couldn't we do that when expanding?	13:33
jam	certainly it doesn't matter when there is no entries	13:33
rogpeppe	jam: yeah, you're probably right	13:33
jam	well, when there are no other mongod's	13:33
jam	mongods ?	13:33
jam	natefinch: extra exciting is that the docs explicitly say you shouldn't use mongorestore to seed the new guys, but you could snapshot the filesystem when mongo is in a consistent state.	13:35
jam	sounds like "stop mongod, snapshot the filesystem, then start it again"	13:35
jam	which is pretty terrible.	13:35
jam	I'm hoping as long as you haven't written data you don't have to	13:36
jam	it just needs a really long oplog	13:36
jam	yeah, while earlier in http://docs.mongodb.org/manual/tutorial/expand-replica-set/ it says "you can seed it this way", the next section says "you should not have any data already"	13:36
natefinch	jam: It should just sync	13:37
natefinch	jam: yeah, adding replicas with data in them would be bad	13:37
jam	so a little Schizophrenic	13:37
natefinch	brb, diaper	13:37
jam	natefinch: so it sounds like as long as you have an exact FS snapshot, then you'd be ok	13:37
jam	probably mongorestore doesn't preserve some oplog property	13:37
jam	it does sound like you just want to start empty	13:38
jam	I don't think we want to try producing a stable snapshot to copy on our own	13:38
rogpeppe	natefinch: here's an idea for a possible package interface. it's pretty close to what you have now, semantically, but with somewhat different names: http://paste.ubuntu.com/6437493/	13:39
rogpeppe	natefinch: need to go to lunch, back soon	13:40
jam	rogpeppe: interestingly, it does look like you have to add replica set members one by one, and they must already be started	13:41
natefinch	jam: pretty sure I've tried adding them before they were started, but I should double check	13:42
jam	natefinch: it does appear that you could set the configuration	13:42
jam	and then they should come online "by magic" as the command doesn't look to be synchronous	13:42
jam	but the docs certainly tell you to start them first	13:42
jam	presumably you could add them, and it would just go into non-quorum state	13:43
jam	which might be pretty bad if you are going from 1-3	13:43
jam	1 to 3	13:43
jam	rather than adding 1, waiting for the sync to finish, then adding another	13:43
natefinch	jam: isn't 2 a problem either way?	13:44
jam	natefinch: might be worth trying. create a bunch of data, start 1, add 2 and see if it accepts more data while it is bringing 2 and 3 up	13:44
jam	natefinch: so if you start with 1, and add 1, you still work, though if either one fails you've lost quorum	13:44
jam	I think	13:44
jam	but at least it should still take writes (i would think)	13:44
jam	as it knows it is the elected master	13:44
jam	if you start 1 and add 2	13:44
jam	maybe it is still true	13:44
jam	that it knows it is the master by election process	13:44
jam	worth trying to see if adding 2 immediately puts it into "unavailable until sync is done"	13:45
jam	natefinch: anyway, if you add 2 and they aren't up yet	13:45
natefinch	jam: it definitely says it re-elects when you remove a replica, but doesn't say it does when you add... so it's possible it'll just work	13:45
jam	it should refuse writes	13:45
jam	"should"	13:45
natefinch	jam: yeah, lotta shoulds. I'll do tests and figure out what it does do :)	13:46
jam	they may get put in some sort of "pending" nodes that don't actually change quorum	13:46
jam	natefinch: hopefully you don't have to test across version permutations	13:46
natefinch	jam: versions of mongo?	13:46
jam	natefinch: right	13:47
natefinch	jam: well, they just released 2.4 recently, and it looks like they're on about an 18 month cycle, so I think we're good for a while	13:48
jam	natefinch: http://engineering.foursquare.com/2011/05/24/fun-with-mongodb-replica-sets/ is interesting, though I don't think we'll actually be setting up hidden backup nodes	13:48
jam	natefinch: except we're running 2.2.? in production today :)	13:48
natefinch	jam: oh.	13:48
jam	so we know we need at least 2 versions	13:48
natefinch	jam: I guess testing 2.2.x and 2.4 is probably a good idea. Are we likely to start using 2.4 soon? What determines that?	13:49
jam	natefinch: #1 thing is what version will be in Trusty	13:49
jam	but I'm quite sure we're stuck with 2.2 for precise->saucy for a while	13:50
jam	if we want to go to 2.4 we probably have to get it into trusty real-soon-now	13:50
jam	jamespage: ^^ do you know the plans for upgrading MongoDB version? I'm guessing we don't want to be using mongodb 2.2 in 3 years	13:50
jam	anyway, dinner time here	13:51
jam	see you all later	13:51
jamespage	jam: trusty already has 2.4	13:51
jamespage	so did saucy	13:51
natefinch	jamespage: cool, thanks	13:51
jamespage	natefinch, np	13:51
jam	natefinch: so in other words, we already deploy to 2.2 and 2.4	13:52
jam	given ppa:juju/stable is running 2.2	13:52
jam	for P	13:52
jam	jamespage: unless my "apt-cache madison" is lying somehow :)	13:53
jamespage	jam: yes - but juju auto-adds cloud-tools which contains 2.4	13:53
rogpeppe	natefinch: do you know if there's any way of asking whether replica set members are up to date with the log?	13:58
=== adeuring1 is now known as adeuring
adeuring	jam: could you have another look here: https://codereview.appspot.com/28310043/ ?	14:09
natefinch	rogpeppe: I'm not sure	14:11
rogpeppe	natefinch: i've just found it	14:12
rogpeppe	natefinch: http://docs.mongodb.org/v2.2/reference/replica-status/#repl-set-member-statuses	14:12
rogpeppe	natefinch: what do you think of my proposed package interface, BTW?	14:12
rogpeppe	natefinch: i tried to formulate it from the top down as something i'd like to use rather than from the bottom up	14:13
rogpeppe	natefinch: http://paste.ubuntu.com/6437493/ in case you missed it	14:13
natefinch	rogpeppe: saw it	14:13
natefinch	rogpeppe: mostly looks good to me. The one problem I have with memberdefaults is that if anyone just constructs members to pass in without noticing they should use the defaults... they'll get pretty bad defaults (no votes, 0 priority, and no indexes)	14:18
rogpeppe	natefinch: yeah; i think that's ok though. the defaults are there and obvious.	14:19
natefinch	rogpeppe: hrmph. It's not horrible, but not my favorite thing. The defaults on the struct are not what the struct actually defaults to.	14:22
sinzui	fwereade, rogpeppe: can either of you help me triage this bug? Is it really in Juju? Do we commit to fix it in the next 6 months? Bug #1250965	14:24
_mup_	Bug #1250965: Loopback mounts do not work with local provider <local-provider> <juju-core:New> <swift-storage (Juju Charms Collection):New> <https://launchpad.net/bugs/1250965>	14:24
rogpeppe	natefinch: yeah, maybe better to leave out the "defaulting to" remarks and just refer to MemberDefaults in the Member doc comment	14:24
rogpeppe	sinzui: looking	14:24
* rogpeppe looks up "loopback mounts"		14:25
rogpeppe	sinzui: by my very limited understanding of the issue, it looks like something we could probably fix soon and easily	14:27
rogpeppe	sinzui: and that we should do	14:27
sinzui	thank you!	14:27
rogpeppe	sinzui: but there might be security or other issues that i'm not aware of	14:28
sinzui	understood.	14:28
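The MemberDefaults idea rogpeppe and natefinch discuss above could look something like the sketch below. The pastes themselves are not reproduced in this log, so the field names and the `NewMember` helper are assumptions; the default values (priority 1, one vote, index building on) follow MongoDB's documented defaults.

```go
package main

import "fmt"

// Member describes one replica set member. Field names are illustrative.
type Member struct {
	Address      string
	Priority     float64
	Votes        int
	BuildIndexes bool
	ArbiterOnly  bool
}

// MemberDefaults holds the usual non-zero defaults, so callers copy it
// instead of relying on Go zero values (which would mean 0 priority,
// no votes and no index building).
var MemberDefaults = Member{Priority: 1, Votes: 1, BuildIndexes: true}

// NewMember is a hypothetical helper that returns a member at addr with
// the usual defaults applied.
func NewMember(addr string) Member {
	m := MemberDefaults // struct copy; MemberDefaults itself is unchanged
	m.Address = addr
	return m
}

func main() {
	fmt.Printf("%+v\n", NewMember("10.0.0.2:37017"))

	// The pitfall natefinch describes: a bare literal gets the zero values.
	bare := Member{Address: "10.0.0.3:37017"}
	fmt.Printf("%+v\n", bare)
}
```

This avoids pointers to bool and float, at the price that a hand-built `Member{}` silently gets no votes, zero priority, and no indexes.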
dimitern	rogpeppe, ping	14:46
rogpeppe	dimitern: pong	14:46
dimitern	rogpeppe, what's the preferred way to get the environ from an api connection?	14:47
rogpeppe	dimitern: juju.NewConnFromState	14:47
dimitern	rogpeppe, if I use NewAPIClientFromName I only get the api client, not the underlying APIConn, which has both state and environ	14:47
rogpeppe	dimitern: best to avoid the necessity if possible though	14:48
rogpeppe	dimitern: which call is this that needs it?	14:48
dimitern	rogpeppe, I need something like NewConnFromState, but connecting to the API and returning the APIConn	14:48
dimitern	rogpeppe, upgrade juju needs an environ in order to call FindTools with it	14:49
rogpeppe	dimitern: oh, i see, as an agent	14:49
dimitern	rogpeppe, as a client	14:49
dimitern	rogpeppe, right now conn.Environ is used to get the environ in the command	14:50
rogpeppe	dimitern: cfg, err := st.EnvironConfig(); env, err := environs.New(cfg)	14:50
dimitern	rogpeppe, ah, ok, so I can call client.EnvironmentGet() and use that to construct an environ object	14:51
rogpeppe	dimitern: yeah	14:51
dimitern	rogpeppe, cheers	14:51
rogpeppe	dimitern: although...	14:51
rogpeppe	dimitern: we might possibly want to provide a way for a client to find tools without necessarily providing them with the whole environ config	14:52
rogpeppe	dimitern: so there may well be an argument for a new API call here	14:52
rogpeppe	fwereade: what thinkest thou?	14:52
dpb1	fwereade: ping	14:57
dimitern	rogpeppe, I realized I don't need to implement anything else than client.SetEnvironAgentVersion() in the API, and use EnvironmentGet() initially	14:57
rogpeppe	dimitern: sounds good	14:57
jcsackett	sinzui, abentley: either of you free to look at https://code.launchpad.net/~jcsackett/charmworld/better-jobs/+merge/195443 ?	15:05
abentley	jcsackett: sure.	15:06
jcsackett	also, do we have a new "ping the team" word, since we're not orange anymore?	15:06
jcsackett	thanks, abentley.	15:06
abentley	Maybe juju-qa?	15:06
abentley	jcsackett: It looks like you've added tests for your github changes, but not askubuntu.	15:10
jcsackett	abentley: that's true, since i didn't think it was really changing askubuntu execution.	15:10
jcsackett	abentley: oh wait, the backoff thing should have a test.	15:11
abentley	jcsackett: That's what I was thinking.	15:11
jcsackett	abentley: dig, i'll add that.	15:11
sinzui	fwereade, I see a report using the 1.16.4 potential client has a problem when the state-server is 1.16.3. ERROR no such request "DestroyMachines" on Client.	15:33
* sinzui is attempting to reproduce		15:33
fwereade	sinzui, oh shite	15:33
fwereade	sinzui, ofc it's reproable, I am an idiot, I even thought of it and then forgot it	15:34
* rogpeppe sees 1.16.5 arriving pronto		15:34
fwereade	sinzui, unless I convinced myself that it was an expected and transient error	15:34
fwereade	rogpeppe, that'll have the same problem	15:34
sinzui	fwereade, we do not normally see this in tests because they assume you are savvy enough to upload your tools if you have a release candidate, or we have released the actual tools	15:34
rogpeppe	fwereade: unless 1.16.5 rolls back some client changes i guess	15:35
fwereade	rogpeppe, and thus rolls back the bugfix	15:35
rogpeppe	fwereade: the bugfix can't apply client-side? i guess not unless you factor out stuff to statecmd	15:36
sinzui	1.16.4 is not out, We are going to release today I think.	15:36
rogpeppe	fwereade: sorry, i should have thought of this in my review	15:36
sinzui	fwereade, caribou reported the issue.	15:36
fwereade	rogpeppe, or tangles the source tree by introducing a 1.16-only statecmd bit	15:36
sinzui	I gave him the script that make a package	15:36
fwereade	sinzui, all praise to caribou	15:36
rogpeppe	fwereade: i'm not quite sure what you're thinking of there	15:37
fwereade	sinzui, I don't suppose it's reasonable to ask people to upgrade both server and client if they want the bugfixes?	15:37
fwereade	rogpeppe, the more 1.16 diverges from the shape of trunk the harder it will be to maintain -- I don't want to make that experience suck until we have 1.18 out, at which point we needn't worry about 1.16 so much anyway	15:38
sinzui	fwereade, I consider a bug if the client ever selects a newer server.	15:38
fwereade	sinzui, yeah, normal use will lead to breakage	15:38
rogpeppe	fwereade: ah, bit==piece, not 1-or-0	15:38
sinzui	fwereade, I do think it is reasonable to say upgrade your client, then upgrade the server	15:38
* fwereade kicks himself around a bit		15:39
jcsackett	abentley: tests are pushed up.	15:39
fwereade	sinzui, new server with old client still works, but doesn't allow for --force, right?	15:39
fwereade	sinzui, it's just old server with new client?	15:40
sinzui	fwereade, I am not sure, caribou has stepped away for a bit	15:40
abentley	jcsackett: This also adds remove_server_start_time.py. Is that deliberate?	15:42
sinzui	fwereade, this is the background I have about the issue: after the report of the error: http://pastebin.ubuntu.com/6438018/	15:43
abentley	Oh, I guess that's a merge.	15:43
abentley	jcsackett: r=me.	15:44
sinzui	fwereade, "even without the --force it fails"	15:53
sinzui	fwereade, basic "juju terminate-machine 1" fails with the message mentioned previously	15:53
fwereade	sinzui, I think it is clear -- I backported the DestroyMachines and DestroyUnits API methods to 1.16.4, so 1.16.3 client still works by talking direct to the db	15:55
fwereade	sinzui, but 1.16.4 client expects the APIs to exist, and a 1.16.3 server does not have them	15:55
fwereade	sinzui, FWIW this will also break destroy-unit in the same circumstances	15:55
sinzui	fwereade, This issue might also be alleviated with "best practice". I have advised "juju upgrade-juju --version=1.16.4" to be clear about putting everything on the same version	15:56
fwereade	sinzui, well, if we can be very clear about it in the release notes, it does expose very useful new functionality	15:57
rogpeppe	natefinch: here's the replicaset package interface suggestion with status added, FWIW: http://paste.ubuntu.com/6438102/	16:06
jcsackett	abentley: thanks.	16:08
dimitern	fwereade_, rogpeppe: upgrade-juju + api https://codereview.appspot.com/21940044/ PTAL	16:16
natefinch	rogpeppe: reading it	16:20
natefinch	rogpeppe: are you running Mongo 2.2 or 2.4?	17:01
rogpeppe	natefinch: 2.2.4	17:01
rogpeppe	natefinch: ah, i was looking at the 2.2 docs when i was doing that package description	17:02
natefinch	rogpeppe: yeah, figured. I was poking at mongo and noticed some more info in status, but it must be added in 2.4	17:03
rogpeppe	TheMue: ping	17:12
* fwereade_ will bbl		17:12
TheMue	rogpeppe: pong	17:12
rogpeppe	TheMue: i've just been looking at https://codereview.appspot.com/24040044 again	17:12
TheMue	rogpeppe: yep	17:12
rogpeppe	TheMue: it still doesn't seem quite right to me, unless i'm missing something	17:12
TheMue	rogpeppe: ok, I'm listening	17:13
rogpeppe	TheMue: if a connection drops, what cleans up the pingTimeout?	17:13
TheMue	rogpeppe: if it drops Ping() isn't called, so the timer isn't reset, after 3 minutes there's a timeout which calls the passed action. and here rpcConn.Close() is called, which also calls Kill() on the root (it implements the killer interface, but that already existed)	17:15
TheMue	rogpeppe: in the initial code Ping() already existed, but with no code inside, only a comment	17:15
natefinch	rogpeppe: I'm going to go with your suggestion and move the code I have over to it (it's really just some minor changes). I don't have the status code written, but that should be easy.	17:15
rogpeppe	natefinch: cool, thanks.	17:16
natefinch	rogpeppe: one thing - is it really that useful to return maps of statuses and members	17:16
rogpeppe	TheMue: so if a client drops a connection, the goroutine will remain around for up to 3 minutes. that seems a bit wrong to me - surely we can clean it up?	17:17
rogpeppe	natefinch: i dunno, i wondered about that	17:17
rogpeppe	natefinch: it nicely suggests the fact that there's only one entry per address	17:17
rogpeppe	natefinch: and it might work out more nicely in the actual agent code	17:18
TheMue	rogpeppe: eh, until those 3 minutes are done we're not sure that the connection is dropped (or the agent on the other side is just blocked)	17:18
rogpeppe	TheMue: what if the client explicitly drops the connection?	17:18
natefinch	rogpeppe: seems like it just makes it a little more annoying to iterate... it's also a difference from the way the data is input. It's not too hard to construct a map from a list if you need a map... it just doesn't seem like it actually fits the data model (other than, yes, there's only one per host... but that's generally more useful on input than output)	17:19
TheMue	rogpeppe: how is apiserver notified about that today?	17:19
rogpeppe	TheMue: the Kill method is called	17:20
rogpeppe	natefinch: the scenario i'm thinking about is you get info, then you want to see how the info corresponds with info you already hold.	17:22
rogpeppe	natefinch: but if you really think it doesn't fit very well, then slices could be fine, probably	17:22
TheMue	rogpeppe: ok, so I should stop the goroutine here too, but as Kill() is called in Close() I have to ensure that it doesn't deadlock	17:23
rogpeppe	TheMue: yup	17:23
TheMue	rogpeppe: do you add a note in the review? otherwise I'll do it ;)	17:24
rogpeppe	natefinch: having CurrentMembers return the same thing as is passed to replicaset.Set and Add seems like a reasonably strong argument for returning a slice actually	17:24
rogpeppe	TheMue: i will	17:24
TheMue	rogpeppe: great, thx	17:24
natefinch	rogpeppe: that was my thinking	17:25
rogpeppe	natefinch: and CurrentStatus should be similar to CurrentMembers, so yeah, go with slices all round	17:25
natefinch	rogpeppe: plus, there's only a max of 12 items in the list, so even if you have to do naive N^2 logic, it isn't going to hurt anything	17:26
natefinch	rogpeppe: cool	17:26
rogpeppe	natefinch: performance was not part of my considerations	17:26
natefinch	rogpeppe: going to be out for about an hour. Turning into a late working day for me, but so be it. I should have that code all set by EOD, and hopefully some tests too.	17:34
rogpeppe	natefinch: brilliant, thanks!	17:35
TheMue	rogpeppe, natefinch: anybody knows that error that made my merge fail: https://code.launchpad.net/~themue/juju-core/054-env-more-script-friendly/+merge/191838	17:35
rogpeppe	TheMue: sporadic failure	17:36
rogpeppe	TheMue: just mark as approved again to try once more	17:36
TheMue	rogpeppe: ah, already wondered	17:36
TheMue	rogpeppe: thx	17:36
sinzui	fwereade_, do you have a moment to discuss terminate-machine from new client to old server?	17:51
rogpeppe	right, done for the day	18:42
rogpeppe	g'night all	18:42
rogpeppe	off to see Gravity at the local 3D imax; should be good if the reviews are anything to go by.	18:43
mramm	rogpeppe: have fun!	18:44
natefinch	rogpeppe: night. Supposed to be good, yeah.	18:44
=== marcoceppi_ is now known as marcoceppi
thumper	morning	19:52
natefinch	thumper: morning	19:52
=== gary_poster is now known as gary_poster|away
thumper	natefinch: o/	19:57
=== gary_poster|away is now known as gary_poster
sinzui	thumper, could you read and reply to the message "Geting bug 1222671 into 1.16.4" that I sent to juju-dev	20:05
_mup_	Bug #1222671: Using the same maas user in different juju environments causes them to clash <cts-cloud-review> <maas-provider> <Go MAAS API Library:Fix Committed> <juju-core:Fix Committed by thumper> <juju-core 1.16:In Progress by sinzui> <https://launchpad.net/bugs/1222671>	20:05
thumper	hi sinzui	20:05
thumper	sinzui: it was my understanding that the merge that rog did into the 1.16 branch fixed that	20:06
thumper	which is why I marked it fix committed or fix released in that series	20:06
sinzui	Fab. Thanks thumper	20:06
thumper	I may be mistaken, but that is what I thought	20:06
thumper	fwereade_: you around?	20:12
fwereade_	thumper, heyhey	21:11
thumper	fwereade_: hey dude	21:11
thumper	got time for a hangout?	21:11
fwereade_	thumper, how's it going?	21:11
fwereade_	thumper, sure	21:11
thumper	good,	21:11
* thumper starts one		21:11
thumper	fwereade_: https://plus.google.com/hangouts/_/7ecpjvqj508h694vc55hjnqsvo?hl=en	21:12
thumper	wallyworld: so... we don't have any shared storage any more?	21:47
thumper	wallyworld: I could remove a config key from the local provider	21:47
thumper	shared-storage-port	21:47
wallyworld	nope, cause only ec2 and openstack had it anyway	21:47
wallyworld	and now we have simplestreams it's not needed	21:48
wallyworld	i guess so, not sure what shared-storage-port did	21:48
thumper	wallyworld: we should really fix the tools uploading for the local provider	21:48
wallyworld	yes	21:48
thumper	wallyworld: also, fwereade_ wants to chat with you	21:48
wallyworld	i'm available	21:49
fwereade_	wallyworld, heyhey	21:49
wallyworld	yello	21:50
wallyworld	fwereade_: did you want a hangout?	21:51
fwereade_	let's have a go	21:51
fwereade_	google has started hating me again after a goodish week	21:51
wallyworld	https://plus.google.com/hangouts/_/72cpil41gfi1iafo9ljflprqds	21:51
* thumper pokes the local provider with a long stick to see if it moves		21:57
* thumper opens up the beast again for more surgery		21:59
wallyworld	fwereade_: google does hate you	22:23
wallyworld	fwereade_: so, some remaining issues. i thought it best to keep the notion of managing the container dependencies out of the provisioner - those are separate concerns to me. the model is: wait until a container type is required, ensure stuff is set up, then start the provisioner to manage the creation of the containers	22:26
fwereade_	wallyworld, I'm +1 on that	22:26
wallyworld	so, the initial watcher does then kill itself once it has done that job	22:27
fwereade_	wallyworld, what you have does a solid job of starting the appropriate provisioner at the appropriate time	22:27
thumper	um...	22:27
wallyworld	cause it has served its purpose	22:27
thumper	do we have an initial watcher for each type of container?	22:27
wallyworld	yes, but only because the api only allows that	22:27
wallyworld	the api needs to change	22:27
fwereade_	wallyworld, I'm just saying that the setup bit is not implemented	22:28
wallyworld	and that involves unknown unknowns	22:28
wallyworld	fwereade_: it is in a downstream mp	22:28
wallyworld	the next one in the pipe	22:28
wallyworld	fwereade_: https://codereview.appspot.com/22980045/	22:28
* thumper fetches the paddles to shock the local provider back to life		22:29
thumper	CLEAR	22:29
wallyworld	lol	22:29
wallyworld	fwereade_: so, each container package is responsible for knowing how to set itself up so containers of that type can be started on a given host. the machine agent calls out when required to do that and then starts a suitable provisioner. i still want to get the hammer out and fix the provisioner task as previously discussed	22:31
wallyworld	my main goal this week is to get kvm supported with what apis we currently have	22:31
wallyworld	once that user facing functionality is delivered, then we can tweak the behind the scenes things to clean it up	22:32
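
[Editor's sketch] The model wallyworld describes (wait until a container type is first needed, run that type's setup, start a provisioner, then let the initial watcher exit) could look roughly like the Go below. This is only an illustration under assumptions: `Initialiser`, `countingSetup` and `watchForFirstContainer` are hypothetical names, not juju APIs, and the real watcher is driven by the state API rather than a plain channel.

```go
package main

import "fmt"

// Initialiser is a hypothetical interface: each container package would
// know how to prepare a host so containers of its type can be started.
type Initialiser interface {
	Initialise() error
}

// countingSetup is a stand-in that only records how often it was called.
type countingSetup struct{ calls int }

func (s *countingSetup) Initialise() error {
	s.calls++
	return nil
}

// watchForFirstContainer blocks until the first request arrives, runs the
// setup once, starts the provisioner, and then returns. Returning is the
// "watcher kills itself" step: later requests are the provisioner's job.
func watchForFirstContainer(requests <-chan struct{}, setup Initialiser, startProvisioner func()) error {
	if _, ok := <-requests; !ok {
		return nil // channel closed: no container was ever requested
	}
	if err := setup.Initialise(); err != nil {
		return fmt.Errorf("container setup failed: %v", err)
	}
	startProvisioner()
	return nil
}

func main() {
	requests := make(chan struct{}, 1)
	requests <- struct{}{} // the first container of this type is requested

	setup := &countingSetup{}
	err := watchForFirstContainer(requests, setup, func() {
		fmt.Println("provisioner started")
	})
	fmt.Println("setup calls:", setup.calls, "error:", err)
}
```

Keeping the provisioner start behind a callback mirrors the point made in the discussion that dependency setup and provisioning are separate concerns.
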
* thumper wondered why the local provider wasn't starting the machine		22:48
thumper	network traffic is spiking	22:48
thumper	last line in the log file	22:48
thumper	2013-11-18 22:44:33 DEBUG juju.container.kvm container.go:32 Synchronise images for precise amd64	22:48
wallyworld	thumper: we need user feedback on that shit :-)	22:49
thumper	wallyworld: not sure how...	22:49
wallyworld	to eliminate the wondering	22:49
wallyworld	we need to establish a channel back to the client	22:49
wallyworld	and the service business logic can pop progress events into that channel	22:50
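
[Editor's sketch] The idea of business logic popping progress events into a channel that leads back to the client can be illustrated in Go as below. It is a minimal in-process sketch only: `ProgressEvent`, `syncImages` and `collect` are hypothetical names, and a real implementation would have to carry the events over the API connection rather than a local channel.

```go
package main

import "fmt"

// ProgressEvent is a hypothetical message describing one step of a
// long-running operation; it is not a juju type.
type ProgressEvent struct {
	Step    string
	Percent int
}

// syncImages stands in for slow business logic such as image
// synchronisation. It pushes events into progress as it goes and closes
// the channel when finished so the receiver knows to stop.
func syncImages(series, arch string, progress chan<- ProgressEvent) {
	defer close(progress)
	for _, pct := range []int{0, 50, 100} {
		progress <- ProgressEvent{
			Step:    fmt.Sprintf("synchronise images for %s %s", series, arch),
			Percent: pct,
		}
	}
}

// collect drains the channel and returns everything received, in order.
func collect(progress <-chan ProgressEvent) []ProgressEvent {
	var events []ProgressEvent
	for ev := range progress {
		events = append(events, ev)
	}
	return events
}

func main() {
	progress := make(chan ProgressEvent)
	go syncImages("precise", "amd64", progress)
	for _, ev := range collect(progress) {
		fmt.Printf("%3d%% %s\n", ev.Percent, ev.Step)
	}
}
```
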
thumper	machine provisioning failed...	22:54
thumper	now to figure out why	22:55
fwereade	wallyworld, everything to do with computers hates me	23:00
fwereade	wallyworld, thanks for that link though	23:00
fwereade	wallyworld, for some reason it makes it all look less objectionable	23:00
fwereade	wallyworld, do you think there's a reasonable evolution that lets us drop all the frickin' switching though?	23:01
wallyworld	fwereade: one sec. otp	23:01
fwereade	wallyworld, because I'm still feeling that if we're going to be lazy we should be really lazy	23:01
fwereade	wallyworld, install the packages only when we're actually trying to run a container and find missing deps	23:02
fwereade	wallyworld, separating out the thing that takes the decision on whether to start a provisioner is a great step	23:03
fwereade	wallyworld, so that is a win in itself, no argument there	23:03
fwereade	wallyworld, but smearing the specific-container-related logic out so widely stresses me out a little, because it introduces subtle dependencies	23:04
* thumper sighs		23:05
* thumper pulls apart the threads looking for the issue		23:05
* thumper opens the patient up again		23:06
fwereade	wallyworld, I am cool with the watcher strategy though	23:06
wallyworld	fwereade: sorry, just about finished phone call, one more sec	23:07
fwereade	wallyworld, and while I would love to have further discussions about SOA I am flagging a little -- so I'm off for a quick ciggie, then a short chat before bed	23:07
wallyworld	ok, i'll read your comments while you kill your lungs	23:07
* thumper frowns		23:08
wallyworld	fwereade: we do only install packages when the first container is needed to be run, so we are really lazy	23:11
fwereade	wallyworld, they're still twitching a bit	23:12
wallyworld	all the container logic is in one place - the containers package. the agent (for setup) and the provisioner task (for running) call into that	23:12
fwereade	wallyworld, but codewise the actual invocation of the setup has a very tenuous and distant connection to the actual launching of the container	23:12
wallyworld	set up and launching are separate	23:13
wallyworld	you could use the same argument when we used cloud init	23:13
fwereade	wallyworld, how about if we jammed the setup bits into broker creation? that feels much closer	23:13
wallyworld	eg just before this branch	23:13
wallyworld	we always apt-get installed lxc in cloud init	23:13
wallyworld	which is the lxc container set up	23:13
fwereade	wallyworld, indeed I could :)	23:13
wallyworld	that is very distant	23:13
wallyworld	now, all related container logic is at least together	23:13
wallyworld	and the worker task that uses it can call into it	23:14
fwereade	wallyworld, I'm certainly not defending that practice -- it was expedient but rather hairy really	23:14
fwereade	wallyworld, and it caused us problems ;p	23:14
fwereade	wallyworld, I agree it's closer now, and better than before	23:14
wallyworld	the containers package exposes 2 main semantics - setup and management	23:14
wallyworld	and the task uses those 2 main concepts	23:14
wallyworld	by management, i mean runtime - start/stop etc	23:15
* thumper falls foul to wallyworld's hack on debug levels		23:15
wallyworld	i think we have separation of concerns ok, or if not, we're heading in the right direction	23:15
fwereade	wallyworld, it just seems reasonable that (say) broker creation be a decent signal implying the need for setup, rather than having broker creation either sane or not depending on the action of distant code	23:15
wallyworld	fwereade: so that implies there's some flag somewhere which records if setup has been done	23:16
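
[Editor's sketch] fwereade's suggestion of tying setup to broker creation, together with wallyworld's observation that this implies a flag recording whether setup has been done, might be sketched in Go as follows. Everything here is an assumption for illustration: `Broker`, `brokerFactory` and the `install` callback are hypothetical, and the flag is held in process memory, which is only one of several places it could live.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker is a hypothetical handle used to start and stop containers.
type Broker struct{ ctype string }

// brokerFactory runs setup lazily: the first NewBroker call for a
// container type installs its dependencies, later calls skip that step.
type brokerFactory struct {
	mu      sync.Mutex
	ready   map[string]bool // the "setup has been done" flag, per type
	install func(ctype string) error
}

// NewBroker makes broker creation the signal that setup is needed.
// A failed install leaves the flag unset, so the next call retries.
func (f *brokerFactory) NewBroker(ctype string) (*Broker, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	if !f.ready[ctype] {
		if err := f.install(ctype); err != nil {
			return nil, fmt.Errorf("setting up %s: %v", ctype, err)
		}
		if f.ready == nil {
			f.ready = make(map[string]bool)
		}
		f.ready[ctype] = true
	}
	return &Broker{ctype: ctype}, nil
}

func main() {
	installs := 0
	f := &brokerFactory{install: func(ctype string) error {
		installs++
		return nil
	}}
	f.NewBroker("lxc")
	f.NewBroker("lxc")
	fmt.Println("installs run:", installs)
}
```
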
fwereade	thumper, heh, I thought that'd cause trouble, but I couldn't think of anything better and you were on holiday	23:16
fwereade	thumper, if you're of a mind to fix it please correspond with davecheney, whose use cases we were trying to support	23:16
fwereade	wallyworld, it's definitely going in the right direction	23:16
* thumper nods		23:16
wallyworld	thumper: i did tell you about it and the need to fix it :-)	23:16
thumper	wallyworld: yeah I know	23:17
thumper	I just hadn't gotten around to it	23:17
fwereade	wallyworld, I'm just complaining because it feels like almost virgin soil, and that we could get further	23:17
* thumper puts it on the stack of shit to fix		23:17
wallyworld	i know, just pressing buttons :-P	23:17
fwereade	wallyworld, however, I remind myself, progress not perfection :)	23:17
fwereade	wallyworld, and as I said I'm pretty happy with how it looks in the context of the followup	23:17
wallyworld	fwereade: understood. the way i see it - we have container setup and management "nicely" packaged. we have stuff that calls it. we can adjust the stuff that calls it	23:18
thumper	heh, oops, found a weirdness...	23:18
wallyworld	fwereade: if you wanted to +1, or +1 with fixes, i can do that today	23:18
fwereade	wallyworld, yeah, I'm just rereading my whining	23:19
wallyworld	lol	23:19
wallyworld	thanks for staying up late to do this	23:19
fwereade	wallyworld, don't suppose I can convince you of a SetSupportedContainers call? I'd still prefer that to the mooted result from UpdateSC	23:20
fwereade	wallyworld, that's the only bit that still feels really wrong, and isn't amenable to easy fixes by virtue of being part of the api	23:20
wallyworld	fwereade: actually, in the current branch, i think i'm going to have to go to that api anyway, at least at the service level	23:20
fwereade	wallyworld, sweet :)	23:20
wallyworld	fwereade: so i did the Add api in isolation	23:21
wallyworld	and then as the implementation has evolved, it needs to be changed i think	23:21
wallyworld	so i could land what's there, and it will be reworked in my current branch	23:21
wallyworld	fwereade: thanks for +1. i am fully aware it's not yet perfect, but is a step along the way :-)	23:23
wallyworld	and we are delivering new functionality to users	23:23
fwereade	wallyworld, no worries -- and my concern about the Set/Add bit is that if 1.17 goes out with Add it will be waaay more tedious to make Set work without ugliness	23:24
fwereade	wallyworld, so if you're confident that can be proposed today I can handle it	23:24
wallyworld	yeah. i was thinking with Add, we would be less prone to races	23:25
wallyworld	cause i wasn't sure if we would have multiple callers each adding their own	23:25
wallyworld	and add is more robust to that	23:25
fwereade	wallyworld, I'm not so bothered about those, because I feel we can restrict it to a single caller quite naturally	23:25
wallyworld	but i think calling Set will be limited to machine agent at start up	23:25
fwereade	wallyworld, yeah	23:26
wallyworld	that wasn't clear at the time initially	23:26
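
[Editor's sketch] The Add versus Set trade-off discussed above can be shown with a small Go example. The method name `SetSupportedContainers` comes from the conversation; `AddSupportedContainer`, the `machine` type and both method bodies are hypothetical and are not the juju implementation. Add merges into the existing set, so concurrent callers cannot clobber each other, while Set replaces the whole list, which is simplest when a single caller (the machine agent at start up) owns the value.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// machine is a hypothetical record of which container types a host supports.
type machine struct {
	mu        sync.Mutex
	supported map[string]bool
}

// AddSupportedContainer merges one type into the set. Calls from
// different callers commute, so ordering between them does not matter.
func (m *machine) AddSupportedContainer(ctype string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.supported == nil {
		m.supported = make(map[string]bool)
	}
	m.supported[ctype] = true
}

// SetSupportedContainers replaces the whole set. The last caller wins,
// which is fine when only one caller ever sets the value.
func (m *machine) SetSupportedContainers(ctypes []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.supported = make(map[string]bool)
	for _, c := range ctypes {
		m.supported[c] = true
	}
}

// SupportedContainers returns the current set in sorted order.
func (m *machine) SupportedContainers() []string {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make([]string, 0, len(m.supported))
	for c := range m.supported {
		out = append(out, c)
	}
	sort.Strings(out)
	return out
}

func main() {
	m := &machine{}
	m.AddSupportedContainer("lxc")
	m.AddSupportedContainer("kvm")
	fmt.Println(m.SupportedContainers())

	m.SetSupportedContainers([]string{"lxc"})
	fmt.Println(m.SupportedContainers())
}
```
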
fwereade	understood :0	23:26
wallyworld	thanks again, go get some sleep :-)	23:26
fwereade	wallyworld, cheers, enjoy your day	23:26
wallyworld	i'll try :-)	23:26
* thumper is stonewalled by kvm tools		23:28
* thumper needs to email robie to get answers		23:28
thumper	perhaps I'll look at the logging stuff while I wait	23:28
wallyworld	thumper: or you could do this for me :-D https://codereview.appspot.com/28190043/	23:34
* thumper goes to have lunch first		23:36
