/srv/irclogs.ubuntu.com/2013/11/18/#juju-dev.txt

wallyworldthumper: you around?01:29
thumper wallyworld hey02:10
wallyworldthumper: want a hangout?02:16
thumpersure02:16
wallyworldhttps://plus.google.com/hangouts/_/72cpi2rmduripigvi5lgv18b7c02:16
axwwallyworld: sorry, I guess I took one person's preference on US spelling for policy :)03:16
wallyworldaxw: no worries :-)03:16
wallyworldno need to apologies03:16
axwand certainly no need to apologize03:17
axwhur hur03:17
wallyworldha ha ha03:17
wallyworldaxw: did my review on your bootstrap tools change make sense?03:17
axwyes, just makign some changes now03:17
axwthanks for that03:17
wallyworldnp. just wanted to check you were unblocked03:17
wallyworldthumper: there's a potential issue with checking if a host machine can run a container. we currently support "juju add-machine lxc" which means create a new machine and add an lxc container. in that case, it's not possible at add-machine time to know if a container is supported or not. there was talk at one stage of removing the syntax allowing just "lxc" or "kvm" and forcing <containertype>:<machineid>03:26
wallyworlddo you think we could remove the add-machine <containertype> syntax?03:27
wallyworldsince i think there was feedback we should be explicit about where containers go03:27
wallyworldbut even in that case, add-machine <container>:<machine> may need to block for a while if the machine is not up yet03:27
wallyworldeg if the user has scripted add-machine followed by add-machine container03:28
thumperwallyworld: I think it is reasonable for someone to request it03:30
thumperwallyworld: even though it may not be fully actionalbe03:31
thumperso if there is a pending machine for a particular machine03:31
thumperand that machine on starting says "I don't support containers"03:31
wallyworldi guess we could relfect in status03:31
thumperwe need to be able to put that container into an error state03:31
thumpersince we are operating asyncly03:31
wallyworldyes, ok03:31
thumperwe need to be able to handle this cleanly03:32
thumperor at least03:32
thumpercleanish03:32
thumpero/ axw03:32
wallyworldi can do a retry strategy to allow a little waiting if supported containers not known03:32
axwheya thumper03:32
thumperwallyworld: nah don't do that03:32
thumperjust expect it to be ok03:32
thumperit is up the juju to manage conflicts03:32
wallyworldok03:32
thumperdon't block03:32
wallyworldin many cases, we will hopefully know03:33
thumperack03:33
wallyworldand can error immediately03:33
wallyworldwell, reject the add-machine command i mean03:33
* thumper nods03:48
axwwallyworld: I had to merge cmd/juju/bootstrap_test.go, and some other minor things. I forgot to remove the --source flag from cmd/juju06:21
axwif you want to re-review let me know, otherwise I'll just push and land06:22
wallyworldaxw: otp, but i trust you :-)06:22
axwta06:23
wallyworldjam: yeah, looks like it may be about to hail, go to race and get my son from cricket training06:35
jamwallyworld: try not to get hurt :)06:35
wallyworldwill do :-)06:35
axwjam: I'm looking at that uninstall-script thing again. What do you think about this alternative: store the agent's upstart service name in agent.conf, as well as a list of subordinate services (for the moment, just juju-db)07:33
axwthen we just stop/remove them07:33
axwand rm -fr config.DataDir()07:34
axwno opaque script, so upgrading should be simpler07:34
jamaxw: can't we just derive the upstart service name? We derive it when we create the service in the first place?07:35
jamI will admit I don't know exactly what steps need to be done, I'm mostly just thinking about (a) how much can we just do so that when we change that list it is easy to do so07:36
axwjam: could do that too. we'd need to tell the agent that it's a state server some way other than through the state database07:36
jamaxw: why is that?07:36
jamI thought the idea for manual teardown was that the state machines go down last, so we still have the database a bit before their dead (I think)07:37
axwyes.. hmm, maybe that's ok07:37
jamaxw: at the very least, we determined what jobs we were running at startup, right?07:37
axwI'll need to look at the conditions for ErrTerminateAgent again07:37
axwyes07:37
jamgiven we needed to, ya know, *do* them :)07:37
jamaxw: this *might* even make it clearer for the HA w/ manual stuff. If you add another node, that one may not start out as a state server, but become one later07:38
jamso deciding what needs to be cleaned up just before you do it, *sounds* better to me.07:38
axwhmm true, good point07:38
axwjam: ErrTerminateAgent requires a state conn anyway (makes sense; db err could be transient), so yes, that'd be fine07:39
jamaxw: in general my experience with upgrades and Upstart is that we don't have a good way to change Upstart config once we've installed. So I'd like to avoid putting stuff in at that level. If it looks plausible to have the "this is how I clean myself up" clearly expressed inside the thing that is running, that sounds the best to me.07:40
jamActually, the best is to have the newest thing possible know how that thing should clean up (like Upgrade should do the clean up in the New code, etc) but some of that is tricky to do.07:41
axwjam: I never would've modified upstart config itself, but agent.conf maybe. But anyway, it looks like it's all doable at runtime, without config changes.07:41
axwI'll dig in07:42
jamaxw: so as for the specifics, I don't care if it is a script file that we generate and then run, vs commands we run directly, or whatever07:43
jamI don't quite understand why setUninstallScript has a restore function07:43
jamis that so it happens in a defer avoiding panic conditions?07:43
axwjam: that was just so changes to AgentEnvironment are contained07:44
axwafter Configure returns, the original value is restored07:44
jamaxw: so I think I now understand *what* it does, but I don't quite understand why you don't want AgentEnvironment to stay changed07:47
axwjam: it's only of philosophical value - I prefer input variables to be considered immutable07:47
jamaxw: so I can see where we may not want to mutate what the caller thinks, except if its the whole point of the function :)07:48
jamo07:49
jamaxw: it is suspicious that configure takes and returns a cloudinit.Config which is the same object07:49
jambut it appears to be the whole point of the function to mutate the c that is passed in.07:49
jamotherwise we should copy it, and return a new one07:49
jamwhich I do prefer07:50
jambut it would still be important that we don't unset the thing we just configured07:50
axwjam: AgentEnvironment belongs to the MachineConfig (input), not cloudinit.Config07:50
axwagreed that the c being input and output is odd - I changed that in another branch the other day :)07:50
axw(removed the output)07:50
jamaxw: the only reason you might want in & out is because you want the caller to nil their object if there is an error, but it does seem like you either want an INOUT var or an IN and OUT but not an INOUT and an OUT07:51
jamand if you *really* need the caller to nil, then take an **obj07:52
jamcloudinit_test.go is the only place that doesn't pass it back into the same object (as far as I can tell007:53
jammorning fwereade and dimitern08:04
dimiternmorning08:04
rogpeppemornin' all08:07
axwmorning08:07
axwjam: do you think it'd be horrible to just attempt stopping/removing the juju-db service, and ignore the ENOENT?08:09
axwi.e. no check for state server08:09
rogpeppeaxw: what's the context?08:10
axwrogpeppe: uninstalling mongo (juju-db) when destroying a manual provider env08:10
axwcurrently the machine agent just removes its own upstart config, and exits08:11
rogpeppeaxw: when will it get ENOENT?08:12
axwrogpeppe: if the machine agent is not a state server, then juju-db won't exist08:12
fwereadejam, dimitern, rogpeppe,axw: mornings08:12
rogpeppefwereade: hiya08:12
axwahoj08:12
rogpeppeaxw: i *think* it sounds reasonable08:13
rogpeppeaxw: but i'd've thought it might be just as easy to check for state-serverness08:14
axwyeah it probably is. just looking at the options08:14
axwblind removal is tempting, because it keeps it all in one spot08:15
rogpeppeaxw: which spot is that?08:15
axwrogpeppe: func (m *MachineAgent) uninstallAgent() error08:15
axwcmd/jujud/machine.go08:15
rogpeppeaxw: presumably you could just pass isStateServer into that function (or machine.Jobs())08:16
axwrogpeppe: there's an error condition that the agent deals with  that would cause termination, where the agent wouldn't be able to determine its jobs08:17
axwi.e. the machine entry does not exist in state08:17
axwbut hey, maybe we don't care about nonsense like that :)08:17
rogpeppeaxw: hmm, i wondered if something like that was possible08:18
rogpeppeaxw: in that case, i think just delete and ignore ENOENT08:18
rogpeppeaxw: but...08:18
rogpeppeaxw: how can we know to destroy things if we can't get the jobs?08:19
rogpeppeaxw: don't the jobs arrive in the same reply as the machine life status?08:19
axwrogpeppe: yes. if that returns not found or unauthorized, the agent terminates08:20
rogpeppeaxw: ah, of course08:21
rogpeppeaxw: in which case, i think that ignoring ENOENT is preferable to the alternative (caching locally whether we *did* have state server jobs)08:21
rogpeppeaxw: in a sense, the upstart config *is* that local cache08:22
axwyeah, I'm thinking that too08:22
rogpeppeaxw: my only hesitation is whether there might be something else with a juju-db service that might get annoyed, but i think enough things will break in that case that we can safely ignore the possibility08:23
axwrogpeppe: indeed, I came to the same conclusion08:24
rogpeppefwereade: i think launchpad.net/juju-core/agent/bootstrap.go:123 is crackful and that it should use config.DefaultSeries. what do you think?08:53
rogpeppefwereade: although... hmm, maybe not08:54
rogpeppefwereade: in fact, no i think it's right08:55
rogpeppefwereade: ignore me :-)08:55
jamaxw: As long as what we are getting rid of is *clearly* a juju script (juju-db, juju-machine-0, etc) I think we're fine.08:55
jamwe can't run 2 juju's on a given machine without a lot of other pain08:55
rogpeppejam: yeah. it's a pity, that, really.08:56
jam(You *might* be able to run a unit of one environment and the state server of another environment, but that just sounds terrible)08:56
jamrogpeppe: well, we'd have to put namespaces to do tat08:56
jamthat08:56
axwjam: you can with local, but these changes just won't work with local (which I think is reasonable)08:56
jam/etc/init/juju-env-X-machine-008:56
rogpeppejam: yeah - we'd probably put the env uuid in there08:57
jamaxw: I'd think we'd want local to clean up properly08:57
axwwhy? env.Destroy does that anyway08:58
jamaxw: well, we still want local environments to clean up properly, right? (it may be done in a different layer, but we might want to consider how to avoid redundancy as well)08:59
axwjam: I consider this to be like freeing memory before exiting a process09:00
axwthere may be some use case in the future, but I don't see one right now09:01
axwhandling the local provider with non-standard service names takes us back to modifying agent.conf09:02
axwjam, rogpeppe: https://codereview.appspot.com/28270043 -- take a look, let me know if you think it's worthwhile involving agent.conf to fix the local provider case09:50
mgzmornin'10:00
jammorning mgz10:02
jamstandup time10:45
jamfwereade, rogpeppe, TheMue, https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig10:45
mattywfwereade, thanks for the reviews :)10:54
jamfwereade: you seem to be having connection issues10:58
fwereadejam, ha, even my g+ chats don't seem to be getting though11:02
fwereadeian, ok, will do11:02
fwereadewallyworld, ^11:02
jamfwereade: I got your "isn't that just 2" but that was the last one11:02
fwereadegrar, v quick break, we'll see if it's happier in 3 mins11:07
fwereadewallyworld, fwiw, a watcher for "kinds of containers this machine is expected to run" wouldbe easy, and was the originalplan a while ago11:16
jamfwereade: well we have "containers this machine is runinng"11:16
fwereadejam, I thought that was container-type-specific11:16
jamfwereade: right, that is what we are talking about. making one that is non-container type specific, but just doing "all" and reporting back errors for ones it doesn't support11:17
fwereadejam, wallyworld: isn't the simplest way to do that to launch a provisioner task with a broker that always just errors on provisioning?11:20
jamfwereade: the concern is that you're launching 5 different provisioners that will never do anything11:20
jamand a lot of duplicate code11:20
jamwhy not just run 1 that can handle N container types11:20
fwereadejam, because watcher->broker is a simple clean chunk of functionality that already exists11:21
fwereadejam, that is the *point* of a provisioner -- it watches a specific set of machines and provisions them using a specific broker11:21
fwereadejam, adding multiple brokers into the mix complicates that unnecessarily11:21
fwereadejam, compared to starting one provisioner for each kind of machine, and using a, ha, "null broker" when that machine kind is not known11:22
jamfwereade: but why multiple watchers?11:22
jamwhy not watch all possible containers?11:22
fwereadejam, to avoid complicating the provisioner task, mainly11:23
jam(it may be the internal DB structure don't support it well)11:23
fwereadejam, I'm not sure it's in a great position to have 1->N-ness poked into it11:23
jamfwereade: but what would an LXC provisioner do differently than a KVM one?11:23
jamthe commands are different, but that is a lower level11:23
fwereadejam, talk to a different broker, where the two brokers are independent and needn't blockone another11:24
jamfwereade: overloading your system because you start an LXC and a KVM and an OpenVZ and a doesn't seem a better User Experience :)11:25
jamI agree in the external vs local provisioning case11:25
fwereadejam, seriously, a provisioner overloads the system?11:25
fwereadejam, if that's actually the case then fair enough11:25
jamfwereade: starting up a container does11:25
jamapt-get update11:25
jamstarting 10 of them is actually quite bad (from reports I've seen)11:26
jamsomone did the local provider and really hosed his sytem11:26
fwereadejam, sure, that was jorge doing deploy -n 50, but that was still with one single provisioner in play, so I think not germane11:26
fwereadejam, the problem was that he asked for his system to be overloaded11:26
fwereadejam, and besides the provider/container distinction I think holes your argument -- you're asking for two kinds of provisioner, a 1->1 and a 1->N one11:28
fwereadejam, a single provisioner, that maps from machine-set to broker, seems like the clearest model11:28
jamfwereade: I personally don't see why machine-set needs to split by type11:29
jamI guess11:29
fwereadejam, it doesn't *need* to be, but doing so extracts extraneous functionality from the provisioner, which doesn't need any additional complexity imo11:30
wallyworldjam: fwereade: so at the moment, there's an lxcBroker which acts as a provisioner task, as well as a cloud instance provisioner task. i'm sure we'll iterate to get the best model, whether that's one provision task for all container types or one per container type. the kvm provisioner code is not done yet. let's see how it falls out. in the meantime, we will achieve the required user facing functionality wrt containers and all that11:37
wallyworldie users will be able to start supported containers, and get sensible errors if they try and start non-supported ones11:37
fwereadewallyworld, ok, that's cool -- I'm just saying I will get a bit shirty if the kvm code requires a single kvm-specific line in the provisioner task itself ;p11:38
wallyworldfwereade: as will i. don't fret too much. all i'm saying, it's a work in progress :-)11:39
fwereadewallyworld, don't worry, I won't, I know you know what you're doing :)11:39
wallyworldsometimes :-)11:39
wallyworldfwereade: one of my aims is to get rid of the switch statements in the current provisioner so that the kvm/lxc logic is isolated behind an interface11:40
* fwereade cheers at wallyworld11:41
wallyworldscaling as discussed in the backscroll is a valid concern. we will have to look at that also as part of the solution11:41
wallyworldand there's always a trade off between conceptual complexity, number of moving parts etc11:42
jamfwereade: so the other logical thoughts are "what about new types", having to add yet-another-thing to monitor for another type that is normally not doing anything, etc.11:47
jamwe know today that people want openvz, vagrant, vmware, ...11:47
fwereadejam, that it the last thing I want11:47
jamfwereade: so the nice thing about having 1 "ContainerProvisioner" is that it can also not think about types it doesn't know, but it can still say "I don't know about that type, so here is your error", rather than nothing listening for the OpenVS container type, so when you go to deploy it just sits in pending forever.11:48
jamIt seemed to smooth things out to have a generic one11:48
fwereadejam, the machine agent asks for the types of the containers it's meant to be running; starts provisioners with appropriate brokers for those it understands, and null-broker provisioners for the ones it doesn't11:48
jambut it does depend on how things align11:48
fwereadejam, null brokers just error on StartInstance11:48
jamfwereade: sure, though that does mean Machine Agents run N watchers and N brokers for however many types that we might support11:49
fwereadejam, no, it means they run one provisioner for every container type they are currently using11:50
fwereadejam, plus one task that starts/stops them11:50
jamfwereade: "and null-broker provisioners for the ones it doesn't" is still N11:50
fwereadejam, it's a very small N compared to the number of possible container types out there11:51
jamI'm not sure I follow.11:51
fwereadejam, it's only for those cases where someone deployed an invalid container before the machine came up and was able to setits supported types11:51
fwereadejam, most machines just run a container-types watcher11:52
fwereadejam, no containers? no types, thus no provisioners11:52
fwereadejam, kvm and lxc and vagrant containers added? start 3 of them11:52
fwereadejam,one of which is null11:52
jamfwereade: I do finally see, though I don't think that has been reflected in the discussions so far.11:52
jamwallyworld: does that make sense to you/ ^^11:53
fwereadejam, although with a bit of cleverness in state we should be able to auto-error the unsupported ones *anyway* I think11:53
* wallyworld reads11:53
jamI haven't heard about the container-types as a separate thing being watched.11:53
fwereadejam, that was the original idea a while back11:53
jamfwereade: wel, defense-in-dept is always useful. (how will this thing act if we poke something that could be construed as invalid)11:53
wallyworldjam: yes, it makes sense as that's how i've implemented it. when a machine agent starts up, it determines what container types the host can support11:54
fwereadejam, that's a benefit of having a null broker for unknown ones that might slip through11:54
wallyworldit then watches for containers of those types to be requested11:54
wallyworldonce a container of a type is first requested, a provisoner task is started11:54
fwereadejam, in the ideal case we should be able to squish those weird ones before they even make it to the agent anyway11:55
wallyworldthe provisioner task for a container type may well then do other stuff to prepare for the firt contaner of a type to start11:55
fwereadejam, the types watcher implementation on the server side could filter those out and error them itself, or possibly the SetSupportedContainers stuff could do so itself with a bit of nasty state prestidigitation11:56
rogpeppenatefinch: any chance of seeing your proposed package interface, please?12:14
natefinchrogpeppe: yeah, sure12:16
rogpeppenatefinch: godoc output would be ideal12:16
* fwereade lunch12:16
natefinchrogpeppe: let me just whip up some godoc comments :)12:17
rogpeppenatefinch: always write your doc comments before writing the code :-)12:17
natefinchrogpeppe: :)  I usually do... this was kind of exploratory coding, so I didn't.  *shrug*12:19
rogpeppenatefinch: np12:19
rogpeppenatefinch: at this point i'm mostly interested to see if you've got AddPeer or SetPeers12:20
natefinchrogpeppe: add and remove12:20
natefinchrogpeppe: I could do set, that's actually easier than add or remove12:21
rogpeppenatefinch: i'm wondering if set might be more appropriate for our use case12:21
rogpeppenatefinch: although you might want to leave add and remove since you've already implemented them12:21
jamfwereade, wallyworld: your models don't actually match. wallyworld starts by introspecting the machine, fwereade has a list of requested-container-types that you watch12:21
jamso once you have the list, then they mostly match12:21
wallyworldi introspect the machine to determine what container types are supported, then watch those only12:22
jamwallyworld: which is *not* watching a list of requested container *types*12:22
jamand it *is* starting watchers for each type the machine might support12:22
jamrather than only ones that have already been requested12:23
wallyworldtill the first container is requested, it's a strings watcher for the first container instance yes12:23
jamthats the key bit that I was missing at least. It may be that I'm misunderstanding the things you've said, but I do feel there is a communication gap between what fwereade is actually describing and what you have12:23
wallyworldwhich then starts a suitable provisioner task12:24
jamwallyworld: "first container instance" ?12:24
wallyworldafaik, all we have at the moment is the ability to call WatchMachineContainers12:24
wallyworldwhich triggers whenever a container is added to a machine12:24
jamwhich requires a list of container types, right ?12:24
wallyworldyes12:25
jamwallyworld: so what fwereade is describing, is another watcher12:25
jamwhich is watching the list-of-requested-container-types12:25
jamrather than the list-of-known-supported-types12:25
jamso we would still have a startup "these are the types I know how to support"12:25
wallyworldi currently call WatchMachineContainers - what it gives when it triggers is the container ids12:25
jamwe actually ask back "and what types would you like me to run"12:25
jamwallyworld: right, that is something we *also* need, but fwereade is giving us a layer where we don't start provisioners until the list of *requested* containers now contains them12:26
wallyworldthat's right, i only start a provisioner when the first conainer of a type is requested12:26
wallyworlduntil then, it's a simple strings watcher12:27
jamso WatchMachineContainers would be run by *each* of the LXCProvisioner and KVMProvisioner, with their personal subset. *But* we don't start one until this other field includes that type in the list.12:27
wallyworldno12:27
jamwallyworld: but what happens when you have another one12:27
wallyworldthe provisioners are not started12:27
jamor one that is a type you didn't probe for12:27
jamor12:27
jametc12:27
wallyworldwell, the machine agent has to know what possible containers to probe for12:27
wallyworldcause there's different initialisation code required for each type12:28
wallyworldso it has to be baked in to the system12:28
wallyworldwe can't suddenly support new container types12:28
wallyworldwithout the code which knows how to set up for that12:28
wallyworldwhich packages to apt-get install etc12:28
jamwallyworld: to give a hypothetical. Wouldn't it be nice if someone could request an OpenVZ which Juju 2.2 knows about, it goes into the DB, but the agent on machine-2 is only running Juju 2.0 and can just say "sorry, container type X not supported"12:28
jamwallyworld: I'm certainly not saying we support things we don't know how to support12:29
jambut what about being able to give error messages about things we haven't heard about before12:29
wallyworldi can certainly modify the code to do that12:29
wallyworldthat would be easy to do12:30
wallyworldit would only be a small tweak12:30
wallyworldactually12:30
wallyworldthat's how it woeks now12:30
wallyworldwhen the machine agent starts up12:30
wallyworldit sets the supported container list12:30
jamwallyworld: and I *think* it actually handles the "you asked for KVM which I know about but I don't actually support that" as well as "you asked for OpenVZ which I know nothing about"12:30
wallyworldand if juju client 2.2 comes along12:30
wallyworldand asks for something new, it will error immediately12:31
jamwallyworld: except if the machine hasn't finished starting yet, which means it will go off into lala land because nobody is checking for a type that wasn't baked in.12:31
jamwhich is why the "give me the list of types that have been requested, so I can start things for them, and oh, these ones I don't know about so put them into error state"12:32
wallyworldno, cause i'm still writing that code12:32
jamsimilar to "these ones I know about but don't actually support"12:32
wallyworldand i will be checking for requested stuff that's not supported12:32
wallyworldin fact, the code i have does do that already12:32
jamit isn't hard to say "if I don't know about it, it isn't supported"12:32
wallyworldyes12:32
natefinchrogpeppe: http://pastebin.ubuntu.com/6437236/12:32
wallyworldjam: the code in progress i have iterates over all requested containers, and sets status if not supported12:33
wallyworldso that will pick up new container type XYZ12:33
jamwallyworld: but it only does that at startup time ?12:33
adeuringrogpeppe: could you have a look athis MP: https://codereview.appspot.com/28310043 ?12:33
rogpeppenatefinch: why ...[]ReplsetMember ?12:33
rogpeppeadeuring: looking12:33
wallyworldjam: yes, but the iteration happens after the block has been established to prevent unknown containers from being reuested12:33
adeuringthanks!12:33
jamand aftewards it starts watching for only the types that it does support12:34
rogpeppenatefinch: wouldn't  ...ReplsetMember be sufficient?12:34
jambut it starts watching for *all* things that it might support12:34
wallyworldyes12:34
natefinchrogpeppe: sorry, I just changed that... yeah, that's what I meant12:34
wallyworldjam: but only a strings watcher, not a provisioner12:34
jamthe model from fwereade is to put a Watcher on actually requested types, for those that are requested, start a provisioner watching for containers of that type. When the list of requested types change, the first one first and starts a new provisioner which may be a "not supported provisioner"12:35
natefinchrogpeppe: I always forget the exact syntax for the variable params arrays12:35
wallyworldjam: i don't plan on starting any "not supported provisioners"12:35
rogpeppeadeuring: LGTM12:35
wallyworldno point12:35
jamwallyworld: my point is if we do the fwereade one, we don't ever end up in a race condition where we might not notice when someone requests something we don't support, even if the client is buggy and doesn't actually respect the supported types field12:35
adeuringrogpeppe: thanks!12:35
rogpeppenatefinch: ... replaces []12:35
wallyworldbuggy client, never :-)12:36
adeuringfwereade: could you have a llok here: https://codereview.appspot.com/28310043 ?12:36
natefinchrogpeppe: yeah, I remembered that after you mentioned it.12:36
wallyworldjam: the back end is the thing that looks at the supported types field12:36
jamso while we *could* write it the other way, this way handles the cases we do care about, saves resources internally (doesn't have to even start watchers for supported types that aren't in use), and handles failure more gracefully12:36
wallyworldjam:  the client just gets an error12:36
wallyworldhere's no logic in the client12:36
wallyworldthere's12:37
jamwallyworld: buggy code12:37
jamregardless client12:37
natefinchrogpeppe: I'd add SetReplicas that just takes an array of ReplsetMember and replaces the set in the mongo document12:37
jamanyway, your model can be made to work, the other just seems more robust and actually consumes less resources because you aren't even starting Watchers until one is requested12:37
wallyworldjam: the client asks for a container xyz, that goes to the server side, the server side is the thing that error's12:37
jamyou start A watcher12:37
jamand never N watchers12:37
rogpeppenatefinch: i think i'd prefer to see bools rather than *bools in ReplsetMember, even if it means having another type for marshalling12:38
wallyworldjam: from memory, i start oen strings watcher per supported container type, after the suported container types have been set, preventing new unsupported ones from being rwquested12:38
jamwallyworld: anyway, your design can certainly be made to work. My #1 point is that it doesn't actually match what fwereade is saying, and his does have an interesting benefit.12:38
wallyworldi don't see the benefit just yet12:39
wallyworldmy current implementation doesn't allow unsupported containers, is robust to old clients, and doesn't start unnecessary watchers12:39
wallyworlds/old/new12:39
jamwallyworld: benefit #1, for 90% of all machines that don't run any containers, they start 1 watcher of requested-container-types, even when we support 10 different Virtualization types12:39
natefinchrogpeppe: the only problem with that is that buildindexes defaults to true if unset, which is annoying to do in go marshalling .... doable, but annoying.12:40
jamso you *could* deploy to any of those types (say you are running on MaaS so you have full support for whatever you want). but then you still have only 1 watcher because you aren't actually using any of them.12:40
wallyworldjam:  the current WatchContainer method doesn't take a list12:40
rogpeppenatefinch: then have NoIndexBuilding or something?12:40
wallyworldit just takes a single container to watch for12:41
jamwallyworld: absolutely. It needs code written12:41
wallyworldhence right now I need N12:41
jamthe stuff we have today doesn't match the design fwereade stated we were trying for12:41
wallyworldi could change it yes12:41
natefinchrogpeppe: hrm... I sorta hate to modify the API for replicasets away from the Mongo documentation.12:41
jamwe need another Watcher to watch the list-of-requested-container-types12:41
jamfwereade's claim is that was the design12:41
jamso the data may already be present in the DB12:42
wallyworldjam: so whether we start 1 initial watcher or N, it's essentially the same design12:42
wallyworldjam: not  list-of-requested-container-types, but list of supported container types12:42
rogpeppenatefinch: we're still having the same defaults, so i think it's reasonable. (I think it would be quite unintuitive if the zero value of some of those *bools implied true and others false)12:42
wallyworldno need to watch for those we don't support12:43
rogpeppenatefinch: i think we can be go-idiomatic even while sticking reasonably close to mongo docs12:43
rogpeppenatefinch: even better, we can include links to the relevant parts of the doc12:43
rogpeppes/can/could/12:43
natefinchrogpeppe: I'm just going off the defaults of what Mongo gives you.  Not my faults they're inconsistent ;)  To me, the point shows that they're optional, and the default is whatever mongo says the default is.12:44
natefinchs/point/pointer12:44
rogpeppenatefinch: i think it's going to be really awkward to build a ReplsetMember12:44
jamwallyworld: sure, but if we don't support them they likely won't end up in the requested set either.12:44
rogpeppenatefinch: i'd much prefer if it didn't need pointers to bool12:44
jamwallyworld: the point is to handle failure modes "oh it *did* end up in the requseted set somehow"12:45
rogpeppenatefinch: or float, for that matter12:45
natefinchrogpeppe: I guess pointers to booleans are annoying, it's true.12:45
jamand still be able to respond to it12:45
wallyworldjam: sure, my code does that now12:45
jamrather than just staying in Pending forever12:45
jamwallyworld: sure, but it *also* starts N watchers when there are 0 requested containers12:45
wallyworldby my code i mean the stuff in progress12:45
natefinchrogpeppe: the problem is that the defaults aren't zero for Priority or Votes.12:45
wallyworldjam: it has to start a watcher (or N currently)12:46
jamwallyworld: it *has* to start 112:46
jamyes12:46
jambut 1 != N12:46
wallyworldso it knows when to kick off a new provisoner and prepare for that container type to be deployed12:46
rogpeppenatefinch: tbh, i don't mind if we don't have defaults for those12:46
wallyworldjam: sure, but with the api we have now, i have to start N12:46
jamwallyworld: you were asking for fewer moving parts12:46
wallyworldi can change that12:46
wallyworldi'm just using what we have to get it working12:46
wallyworldin  user visible sense12:47
wallyworldcan iterate behind the scenes once it's running12:47
jamwallyworld: mostly it felt like you didn't quite understand the design fwereade was talking, because you certainly weren't designing it in the same fashion. I wanted to make sure we are actually having the same conversation12:47
natefinchrogpeppe: now who's making it more difficult to create replica members?  As it is, all you have to specify is the Host (actually that's something I shold work out - the Ids of the members are really just their index i nthe list... I probably shouldn't expose them)12:47
jamsteps along the way, as long as we're actually headed in the same direction12:47
rogpeppenatefinch: one possibility is to have a separate type, say MemberSettings, and have a value, say DefaultMemberSettings12:47
rogpeppenatefinch: which holds all the usual defaults12:47
wallyworldbut i am designing it in the same way - just differ initially on the initial watcher12:47
wallyworld1 vs N12:48
wallyworldor at least i think so12:48
natefinchrogpeppe: likely, most people will only use Host as a value.  The rest are rarely used, other than ArbiterOnly12:48
wallyworldjam:  i haven't actually talked to fwereade about this at all, just going by scanning the scrol back12:48
natefinchrogpeppe: (at least from what the documentation says, it seems unlikely they're options that are used very often, since the doc says stuff like "generally you shouldn't change this" etc)12:49
jamwallyworld: you aren't watching a list-of-requested-container-types at all, which is quite different from what he described.12:49
wallyworldthe core concepts though i think are in alignment - doesn't start provisoners until needed, delay host init until needed etc12:49
jamI think after you have a watcher of something requested12:49
jamyou're doing the same thing12:49
wallyworldmaybe you and i differ on terminology12:50
wallyworldi think i am watching a list of supported continer types12:50
jam(15:16:00) fwereade: wallyworld, fwiw, a watcher for "kinds of containers this machine is expected to run" wouldbe easy, and was the originalplan a while ago12:50
natefinchrogpeppe: I like the default value of the struct.. that's not a bad idea.  I just think it's less straightforward than just doing it the way I have it.12:50
jamwallyworld: no, you are starting N watchers on each type that you support12:50
jamwhich is not a watcher of "what types have been requested for this machine"12:51
wallyworldjam: yes - alist of supported contsiner typrs = a list of kinds this is expected to run12:51
wallyworldjam: no 1 watcher on each type = N12:51
wallyworldnot N watchers on each type12:51
wallyworldand that's only because there's no the api to support one watcher for all supported typrs12:52
wallyworldyet12:52
jamso, yes we were talking past eachother a bit12:52
jamN watchers, 1 for each type you support12:52
jamvs12:53
jamwallyworld: you are starting N watchers12:53
jamvs12:53
jamstarting 1 watcher that gets a list of things that it should then start watchers for12:53
wallyworldi don't get the last line, but12:54
wallyworldi would only like to start 1 watcher for the N container types12:54
wallyworldwhen the api supports that12:54
wallyworldit would be a small change to what we do now12:54
jamwallyworld: if I have a doc in the DB, that aggregates all container types requested for this machine, which is just a list of [LXC, KVM], though in the common case is just []12:54
jamfwereade's contention was that "that was the original design"12:54
jamwhich was clear didn't match what is being implemented12:55
wallyworldok, i see now12:55
jamand I was trying to be clear about where he at least thought we were going12:55
wallyworldi watch for cotainer ids12:55
wallyworldand on the first one, start yje provisoner12:55
wallyworldessentially the same thing12:55
wallyworldwith less moving parts12:55
wallyworldcause the db model is simpler12:55
jamwallyworld: except in the common case each machine-agent starts up N watchers, which is resources in the API server12:55
jamto watch for things that will never actually have chanegs12:56
jamchanges12:56
wallyworldyes, but will be one12:56
wallyworldwhen the api is fixed12:56
jamwallyworld: so I was proposing that, but fwereade seemed to think it was bad12:56
jamand prefered the "original design"12:56
jamwhich is why this thread got started12:56
jamso I've at least illuminated where you two differe12:56
jamand I will happily step back and let fwereade and you finish the conversation12:56
wallyworldseems so :-) sorry, i was not getting it at first12:56
jamwallyworld: neither did I. I just got it slightly sooner than you.12:57
wallyworldwell, i've had a few glasses of wine here by now, it's almost 11pm :-)12:57
jamIt wasn't until (15:53:01) jam: wallyworld: does that make sense to you/ ^^12:57
jamthat I got what the difference was12:57
wallyworldi think i skimmed that bit, sorry :-)12:58
wallyworldin any case, i'd like to finish what i currently have - it works, is robust to different client versions12:58
jamwallyworld: np, I didn't catch that you weren't understanding the "starting N watchers, 1 of each type" thing12:58
wallyworldand we can argue about how to tweak it12:58
wallyworldjam: yeah, i would have preferred to have only 1 watcher, but thought getting the new api done would take too much time12:59
wallyworldand i wanted to try to have kvm done for this week12:59
wallyworldcause the new api may we involve back end changes12:59
wallyworldto the watcher infrastructure13:00
jamwallyworld: sure. I think the discussion is just whether we have a higher-level watcher that then fires off these sub ones, or an aggregate watcher across them.13:01
wallyworldyeah. once the provisioner is started, it essentially becomes that watcher13:01
wallyworlds/that/the13:01
jamyeah13:01
wallyworldall the work to add the supported containers db model, and the code to update status etc is essentially independent of the initial watcher thing13:02
wallyworldhence i can get that done and provide the user visible functionality up front13:03
wallyworldand tweak the behind the scenes stuff later13:03
rogpeppenatefinch: why does InitReplicaSet need to be run on the same machine when initially setting up a replica set?13:13
=== gary_poster|away is now known as gary_poster
natefinchrogpeppe: hmm.. I thought it had to be so mongo would know to replicate the stuff in this mongo instance, but it looks like I was wrong.  The docs don't mention that, so I'll remove it.13:15
rogpeppenatefinch: i actually didn't think it was possible to change an existing mongo to use a replica set without restarting it, but presumably you found out a way?13:16
natefinchrogpeppe: oh, no... this code is outside that.  I didn't write the restarting code yet, since that's outside the scope of what mgo can do.  But that's pretty trivial, a couple exec commands13:17
natefinchrogpeppe: which is to say, yes, you have to restart mongo13:17
rogpeppenatefinch: i think that perhaps we can ignore that - we'll perhaps use a different upstart name, and make sure that the old one is removed before ensuring the new one exists.13:18
rogpeppenatefinch: do we actually need an InitReplicaSet call then?13:18
natefinchrogpeppe: you have to do both, the flag on mongo startup and initreplicaset13:19
natefinchrogpeppe: brb13:20
jamnatefinch, rogpeppe: do you have to restart mongo anytime you change the replica set ? It looks like you have to set the startup flag, but we could just do that always, right?13:25
rogpeppejam: no you don't13:26
jamrogpeppe: so http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/ certainly says you can't take a running service and make it HA without stopping it13:26
rogpeppejam: we need to pass in the replica set name, but otherwise i think there's no need to restart13:26
jamunless we default to always starting in a replica set with just 1 entry13:26
rogpeppejam: that's what we'd do, i think13:26
rogpeppejam: unless you can think of a reason that's a bad idea13:26
natefinchjam, rogpeppe:  so are you saying, always start mongo with the flag, but then just don't do replsetinitiate?13:27
rogpeppenatefinch: i think so. i'm not quite sure what your InitReplicaset function is doing though.13:28
jamrogpeppe: http://paste.ubuntu.com/6437458/13:28
jamah, "we'd do"13:29
jamnot "thats what we already do"13:29
natefinchrogpeppe: its what actually sets up the replica... it passes in the list of replicas.  I don't know what it does behind the scenes.  I can test what happens if you start mongo with --repl and try to use it as an individual database13:29
rogpeppenatefinch: from an API user's p.o.v., i'd prefer not to have to call InitReplicaset ever13:30
jamnatefinch: so it sounds like we'd really like to do all of that work in "bootstrap-state"13:30
jamso that we have *a* replica set with just 1 entry13:30
jamnatefinch: rs.initiate() takes an *optional* configuraiton13:30
jamwhich sure sounds like the intial value can just be 1 node13:31
natefinchjam: good point13:31
rogpeppemy experiments seemed to show it worked fine with just one node in the replica set13:31
rogpeppethere's one slight problem though13:32
rogpeppein bootstrap-state, we don't necessarily know the machine's address13:32
rogpeppeor... do we?13:32
jamrogpeppe: again rs.initialize() can just be started without passing in anything13:32
jamlet mongo sort it out13:32
jamwe can change it later when we expand it13:32
rogpeppejam: it might not sort it out correctly13:32
rogpeppejam: at least on my laptop, it got the wrong address13:33
jamrogpeppe: there is a warning that you shouldn't use localhost for a member unless all entries are on localhost13:33
jamrogpeppe: *but* couldn't we do that when expanding?13:33
jamcertainly it doesn't matter when there is no entries13:33
rogpeppejam: yeah, you're probably right13:33
jamwell, when there are no *other* mongod's13:33
jammongods ?13:33
jamnatefinch: extra exciting is that the docs *explicitly* say you shouldn't use mongorestore to seed the new guys, but you *could* snapshot the filesystem when mongo is in a consistent state.13:35
jamsounds like "stop mongod, snapshot the filesystem, then start it again"13:35
jamwhich is pretty terrible.13:35
jamI'm hoping as long as you haven't written data you don't *have* to13:36
jamit just needs a really long oplog13:36
jamyeah, while earlier in http://docs.mongodb.org/manual/tutorial/expand-replica-set/ it says "you can seed it this way", the next section says "you should not have any data already"13:36
natefinchjam: It should just sync13:37
natefinchjam: yeah, adding replicas with data in them would be bad13:37
jamso a little Schizophrenic13:37
natefinchbrb, diaper13:37
jamnatefinch: so it sounds like as long as you have an exact FS snapshot, then you'd be ok13:37
jamprobably mongodbrestore doesn't preserve some oplog property13:37
jamit does sound like you just want to start empty13:38
jamI don't think we want to try producing a stable snapshot to copy on our own13:38
rogpeppenatefinch: here's an idea for a possible package interface. it's pretty close to what you have now, semantically, but with somewhat different names: http://paste.ubuntu.com/6437493/13:39
rogpeppenatefinch: need to go to lunch, back soon13:40
jamrogpeppe: interestingly, it does look like you have to add replica set members one by one, and they must already be started13:41
natefinchjam: pretty sure I've tried adding them before they were started, but I should double check13:42
jamnatefinch: it does appear that you could set the configuration13:42
jamand then they should come online "by magic" as the command doesn't look to be synchronous13:42
jambut the docs certainly tell you to start them first13:42
jampresumably you could add them, and it would just go into non-quorum state13:43
jamwhich might be pretty bad if you are going from 1-313:43
jam1 to 313:43
jamrather than adding 1, waiting for the sync to finish, then adding another13:43
natefinchjam: isn't 2 a problem either way?13:44
jamnatefinch: might be worth trying. create a bunch of data, start 1, add 2 and see if it accepts more data while it is bringing 2 and 3 up13:44
jamnatefinch: so if you start with 1, and add 1, you still work, though if either one fails you've lost quorum13:44
jamI tihnk13:44
jambut at least it should still take writes (i would think)13:44
jamas it knows it is the elected master13:44
jamif you start 1 and add 2d13:44
jam213:44
jammaybe it is still true13:44
jamthat it knows it is the master by election process13:44
jamworth trying to see if adding 2 immediately puts it into "unavailable until sync is don"13:45
jamdon'13:45
jamdone13:45
jamnatefinch: anyway, if you add 2 and they aren't up yet13:45
natefinchjam: it definitely says it re-elects when you remove a replica, but doesn't say it does when you add... so it's possible it'll just work13:45
jamit should refuse writes13:45
jam"should"13:45
natefinchjam: yeah, lotta shoulds.  I'll do tests and figure out what it *does* do :)13:46
jamthey may get put in some sort of "pending" nodes that don't actually change quorum13:46
jamnatefinch: hopefully you don't have to test across version permutations13:46
natefinchjam: versions of mongo?13:46
jamnatefinch: right13:47
natefinchjam: well, they just released 2.4 recently, and it looks like they're on about an 18 month cycle, so I think we're good for a while13:48
jamnatefinch: http://engineering.foursquare.com/2011/05/24/fun-with-mongodb-replica-sets/ is interesting, though I don't think we'll actually be setting up hidden backup nodes13:48
jamnatefinch: except we're running 2.2.? in production today :)13:48
natefinchjam: oh.13:48
jamso we know we need at least 2 version13:48
jamversions13:48
natefinchjam: I guess testing 2.2.x and 2.4 is probably a good idea.  Are we likely to start using 2.4 soon?  What determines that?13:49
jamnatefinch: #1 thing is what version will be in Trusty13:49
jambut I'm quite sure we're stuck with 2.2 for precise->saucy for a while13:50
jamif we want to go to 2.4 we probably have to get it into trusty real-soon-now13:50
jamjamespage: ^^ do you know the plans for upgrading MongoDB version? I'm guessing we don't want to be using mongodb 2.2 in 3 years13:50
jamanyway, dinner time here13:51
jamsee you all later13:51
jamespagejam: trusty already has 2.413:51
jamespageso did saucy13:51
natefinchjamespage: cool, thanks13:51
jamespagenatefinch, np13:51
jamnatefinch: so in other words, we already deploy to 2.2 and 2.413:52
jamgiven ppa:juju/stable is running 2.213:52
jamfor P13:52
jamjamespage: unless my "apt-cache madison" is lying somehow :)13:53
jamespagejam: yes - but juju auto-adds cloud-tools which contains 2.413:53
rogpeppenatefinch: do you know if there's any way of asking whether replica set members are up to date with the log?13:58
=== adeuring1 is now known as adeuring
adeuringjam: could you have another look here: https://codereview.appspot.com/28310043/ ?14:09
natefinchrogpeppe: I'm not sure14:11
rogpeppenatefinch: i've just found it14:12
rogpeppenatefinch: http://docs.mongodb.org/v2.2/reference/replica-status/#repl-set-member-statuses14:12
rogpeppenatefinch: what do you think of my proposed package interface, BTW?14:12
rogpeppenatefinch: i tried to formulate it from the top down as something i'd like to use rather than from the bottom up14:13
rogpeppenatefinch: http://paste.ubuntu.com/6437493/ in case you missed it14:13
natefinchrogpeppe: saw it14:13
natefinchrogpeppe: mostly looks good to me. The one problem I have with memberdefaults is that if anyone just constructs members to pass in without noticing they should use the defaults... they'll get pretty bad defaults (no votes, 0 priority, and no indexes)14:18
rogpeppenatefinch: yeah; i think that's ok though. the defaults are there and obvious.14:19
natefinchrogpeppe: hrmph. It's not horrible, but not my favorite thing.  The defaults on the struct are not what the struct actually defaults to.14:22
sinzuifwereade, rogpeppe: can either you of help me triage this bug? Do we is it really in Juju? Do we commit to fix it in the next 6 months? Bug #125096514:24
_mup_Bug #1250965: Loopback mounts do not work with local provider <local-provider> <juju-core:New> <swift-storage (Juju Charms Collection):New> <https://launchpad.net/bugs/1250965>14:24
rogpeppenatefinch: yeah, maybe better to leave out the "defaulting to" remarks and just rever to MemberDefaults in the Member doc comment14:24
rogpeppesinzui: looking14:24
* rogpeppe looks up "loopback mounts"14:25
rogpeppesinzui: by my very limited understanding of the issue, it looks like something we could probably fix soon and easily14:27
rogpeppesinzui: and that we should do14:27
sinzuithank you!14:27
rogpeppesinzui: but there might be security or other issues that i'm not aware of14:28
sinzuiunderstood.14:28
dimiternrogpeppe, ping14:46
rogpeppedimitern: pong14:46
dimiternrogpeppe, what's the preferred way to get an the environ from an api connection?14:47
rogpeppedimitern: juju.NewConnFromState14:47
dimiternrogpeppe, if I use NewAPIClientFromName I only get the api client, not the underlying APIConn, which has both state and environ14:47
rogpeppedimitern: best to avoid the necessity if possible though14:48
rogpeppedimitern: which call is this that needs it?14:48
dimiternrogpeppe, I need something like NewConnFromState, but connecting to the API and returning the APIConn14:48
dimiternrogpeppe, upgrade juju needs an environ in order to call FindTools with it14:49
rogpeppedimitern: oh, i see, as an agent14:49
dimiternrogpeppe, as a client14:49
dimiternrogpeppe, right now conn.Environ is used to get the environ in the command14:50
rogpeppedimitern: cfg, err := st.EnvironConfig(); env, err := environs.New(cfg)14:50
dimiternrogpeppe, ah, ok, so I can call client.EnvironmentGet() and use that to construct and environ object14:51
rogpeppedimitern: yeah14:51
dimiternrogpeppe, cheers14:51
rogpeppedimitern: although...14:51
rogpeppedimitern: we might possibly want to provide a way for a client to find tools without necessarily providing them with the whole environ config14:52
rogpeppedimitern: so there may well be an argument for a new API call here14:52
rogpeppefwereade: what thinkest thou?14:52
dpb1fwereade: ping14:57
dimiternrogpeppe, I realized I don't need to implement anything else than client.SetEnvironAgentVersion() in the API, and use EnvironmentGet() initially14:57
rogpeppedimitern: sounds good14:57
jcsackettsinzui, abentley: either of you free to look at https://code.launchpad.net/~jcsackett/charmworld/better-jobs/+merge/195443 ?15:05
abentleyjcsackett: sure.15:06
jcsackettalso, do we have a new "ping the team" word, since we're not orange anymore?15:06
jcsackettthanks, abentley.15:06
abentleyMaybe juju-qa?15:06
abentleyjcsackett: It looks like you've added tests for your github changes, but not askubuntu.15:10
jcsackettabentley: that's true, since i didn't think it was really changing askubuntu execution.15:10
jcsackettabentley: oh wait, the backoff thing should have a test.15:11
abentleyjcsackett: That's what I was thinking.15:11
jcsackettabentley: dig, i'll add that.15:11
sinzuifwereade, I see a report using the 1.16.4 potential client has a problem when the state-server is 1.16.3. ERROR no such request "DestroyMachines" on Client.15:33
* sinzui is attempting to reproduce15:33
fwereadesinzui, oh *shite*15:33
fwereadesinzui, ofc it's reproable, I am an idiot, I even thought of it and then forgot it15:34
* rogpeppe sees 1.16.5 arriving pronto15:34
fwereadesinzui, *unless* I convinced myself that it was an expected and transient error15:34
fwereaderogpeppe, that'll have the same problem15:34
sinzuifwereade, we do not normally see this in tests because they assume you are savvy enough to upload your tools if you have a release candidate, or we have release the actual tools15:34
rogpeppefwereade: unless 1.16.5 rolls back some client changes i guess15:35
fwereaderogpeppe, and thus rolls back the bugfix15:35
rogpeppefwereade: the bugfix can't apply client-side? i guess not unless you factor out stuff to statecmd15:36
sinzui1.16.4 is not out, We are going to release today I think.15:36
rogpeppefwereade: sorry, i should have thought of this in my review15:36
sinzuifwereade, caribou reported the issue.15:36
fwereaderogpeppe, or tangles the source tree by introducing a 1.16-only statecmd bit15:36
sinzuiI gave him the script that make a package15:36
fwereadesinzui, all praise to caribou15:36
rogpeppefwereade: i'm not quite sure what you're thinking of there15:37
fwereadesinzui, I don't suppose it's reasonable to ask people to upgrade both server and client if they want the bugfixes?15:37
fwereaderogpeppe, the more 1.16 diverges from the shape of trunk the harder it will be to maintain -- I don't want to make that experience suck until we have 1.18 out, at which point we needn;t worry about 1.16 so much anyway15:38
sinzuifwereade, I consider a bug if the client ever selects a newer server.15:38
fwereadesinzui, yeah, normal use will lead to breakage15:38
rogpeppefwereade: ah, bit==piece, not 1-or-015:38
sinzuifwereade, I do think it is reasonable to say upgrade your client, then upgrade the server15:38
* fwereade kicks himself around a bit15:39
jcsackettabentley: tests are pushed up.15:39
fwereadesinzui, new server with old client still works, but doesn't allow for --force, right?15:39
fwereadesinzui, it's just old server with new client?15:40
sinzuifwereade, I am not sure, caribou has stepped away for a bit15:40
abentleyjcsackett: This also adds remove_server_start_time.py.  Is that deliberate?15:42
sinzuifwereade, this is the background I have about the issue: after the report of the error: http://pastebin.ubuntu.com/6438018/15:43
abentleyOh, I guess that's a merge.15:43
abentleyjcsackett: r=me.15:44
sinzuifwereade, "even without the --force it fails"15:53
sinzuifwereade, basic "juju terminate-machine 1" fails with the message mentioned previously15:53
fwereadesinzui, I think it is clear -- I backported the DestroyMachines and DestroyUnits API methods to 1.16.4, so 1.16.3 client still works by talking direct to the db15:55
fwereadesinzui, but 1.16.4 client expects the APIs to exist, and a 1.16.3 server does not have them15:55
fwereadesinzui, FWIW this will also break destroy-unit in the same circumstances15:55
sinzuifwereade, This issue might also be alleviated with "best practice". I have advised "juju upgrade-juju --version=1.16.4" to be clear about putting everything on the same version15:56
fwereadesinzui, well, if we can be very clear about it in the release notes, it *does* expose very useful new functionality15:57
rogpeppenatefinch: here's the replicaset package interface suggestion with status added, FWIW: http://paste.ubuntu.com/6438102/16:06
jcsackettabentley: thanks.16:08
dimiternfwereade_, rogpeppe: upgrade-juju + api https://codereview.appspot.com/21940044/ PTAL16:16
natefinchrogpeppe: reading it16:20
natefinchrogpeppe: are you running Mongo 2.2 or 2.4?17:01
rogpeppenatefinch: 2.2.417:01
rogpeppenatefinch: ah, i was looking at the 2.2 docs when i was doing that package description17:02
natefinchrogpeppe: yeah, figured. I was poking at mongo and noticed some more info in status, but it must be added in 2.417:03
rogpeppeTheMue: ping17:12
* fwereade_ will bbl17:12
TheMuerogpeppe: pong17:12
rogpeppeTheMue: i've just been looking at https://codereview.appspot.com/24040044 again17:12
TheMuerogpeppe: yep17:12
rogpeppeTheMue: it still doesn't seem quite right to me, unless i'm missing something17:12
TheMuerogpeppe: ok, I'm listening17:13
rogpeppeTheMue: if a connection drops, what cleans up the pingTimeout?17:13
TheMuerogpeppe: if it drops Ping() isn't called, so the timer isn't reset, after 3 minutes there's a timeout which calls the passed action. and here rpcConn.Close() is called, which also call Kill() on the root (it implements the killer interface, but that already existed)17:15
TheMuerogpeppe: in the inital code Ping() already existed, but with no code inside, only a comment17:15
natefinchrogpeppe: I'm going to go with your suggestion and move the code I have over to it (it's really just some minor changes). I don't have the status code written, but that should be easy.17:15
rogpeppenatefinch: cool, thanks.17:16
natefinchrogpeppe: one thing - is it really that useful to return maps of statuses and members17:16
rogpeppeTheMue: so if a client drops a connection, the goroutine will remain around for up to 3 minutes. that seems a bit wrong to me - surely we can clean it up?17:17
rogpeppenatefinch: i dunno, i wondered about that17:17
rogpeppenatefinch: it nicely suggests the fact that there's only one entry per address17:17
rogpeppenatefinch: and it *might* work out more nicely in the actual agent code17:18
TheMuerogpeppe: eh, until those 3 minutes are done we're not sure that the connection is dropped (or the agent on the other side is just blocked)17:18
rogpeppeTheMue: what if the client explicitly drops the connection?17:18
natefinchrogpeppe: seems like it just makes it a little more annoying to iterate... it's also a difference from the way the data is input.  It's not too hard to construct a map from a list if you need a map... it just doesn't seem like it actually fits the data model (other than, yes, there's only one per host... but that's generally more useful on input than output)17:19
TheMuerogpeppe: how is apiserver notified about that today?17:19
rogpeppeTheMue: the Kill method is called17:20
rogpeppenatefinch: the scenario i'm thinking about is you get info, then you want to see how the info corresponds with info you already hold.17:22
rogpeppenatefinch: but if you really think it doesn't fit very well, then slices could be fine, probably17:22
TheMuerogpeppe: ok, so I should stop the goroutine here too, but as Kill() is called in Close() I have to ensure that it doesn't deadlock17:23
rogpeppeTheMue: yup17:23
TheMuerogpeppe: do you add a note in the review? otherwise I'll do it ;)17:24
rogpeppenatefinch: having CurrentMembers return the same thing as is passed to replicaset.Set and Add seems like a reasonably strong argument for returning a slice actually17:24
rogpeppeTheMue: i will17:24
TheMuerogpeppe: great, thx17:24
natefinchrogpeppe: that was my thinking17:25
rogpeppenatefinch: and CurrentStatus should be similar to CurrentMembers, so yeah, go with slices all round17:25
natefinchrogpeppe: plus, there's only a max of 12 items in the list, so even if you have to do naive N^2 logicm it isn't going to hurt anything17:26
natefinchrogpeppe: cool17:26
rogpeppenatefinch: performance was not part of my considerations17:26
natefinchrogpeppe: going to be out for about an hour.  Turning into a late working day for me, but so be it.  I should have that code all set by EOD, and hopefully some tests too.17:34
rogpeppenatefinch: brilliant, thanks!17:35
TheMuerogpeppe, natefinch: anybody knows that error that made my merge fail: https://code.launchpad.net/~themue/juju-core/054-env-more-script-friendly/+merge/19183817:35
rogpeppeTheMue: sporadic failure17:36
rogpeppeTheMue: just mark as approved again to try once more17:36
TheMuerogpeppe: ah, already wondered17:36
TheMuerogpeppe: thx17:36
sinzuifwereade_, do you have a moment to discuss terminate-machine from new client to old server?17:51
rogpepperight, done for the day18:42
rogpeppeg'night all18:42
rogpeppeoff to see Gravity at the local 3D imax; should be good if the reviews are anything to go by.18:43
mrammrogpeppe: have fun!18:44
natefinchrogpeppe: night.  Supposed to be good, yeah.18:44
=== marcoceppi_ is now known as marcoceppi
thumpermorning19:52
natefinchthumper: morning19:52
=== gary_poster is now known as gary_poster|away
thumpernatefinch: o/19:57
=== gary_poster|away is now known as gary_poster
sinzuithumper, could you read and reply to the message "Geting bug 1222671 into 1.16.4" that I sent to juju-dev20:05
_mup_Bug #1222671: Using the same maas user in different juju environments causes them to clash <cts-cloud-review> <maas-provider> <Go MAAS API Library:Fix Committed> <juju-core:Fix Committed by thumper> <juju-core 1.16:In Progress by sinzui> <https://launchpad.net/bugs/1222671>20:05
thumperhi sinzui20:05
thumpersinzui: it was my understanding that the merge that rog did into the 1.16 branch fixed that20:06
thumperwhich is why I marked it fix committed or fix released in that series20:06
sinzuiFab. Thanks thumper20:06
thumperI may be mistaken, but that is what I thought20:06
thumperfwereade_: you around?20:12
fwereade_thumper, heyhey21:11
thumperfwereade_: hey dude21:11
thumpergot time for a hangout?21:11
fwereade_thumper, how's it going?21:11
fwereade_thumper, sure21:11
thumpergood,21:11
* thumper starts one21:11
thumperfwereade_: https://plus.google.com/hangouts/_/7ecpjvqj508h694vc55hjnqsvo?hl=en21:12
thumperwallyworld: so... we don't have any shared storage any more?21:47
thumperwallyworld: I could remove a config key from the local provider21:47
thumpershared-storage-port21:47
wallyworldnope, cause only ec2 and openstack had it anyway21:47
wallyworldand now we have simplestreams it's not needed21:48
wallyworldi guess so, not sure what shared-storage-port did21:48
thumperwallyworld: we should really fix the tools uploading for the local provider21:48
wallyworldyes21:48
thumperwallyworld: also, fwereade_ wants to chat with you21:48
wallyworldi'm available21:49
fwereade_wallyworld, heyhey21:49
wallyworldyello21:50
wallyworldfwereade_: did you want a hangout?21:51
fwereade_let's have a go21:51
fwereade_google has started hating me again after a goodish week21:51
wallyworldhttps://plus.google.com/hangouts/_/72cpil41gfi1iafo9ljflprqds21:51
* thumper pokes the local provider with a long stick to see if it moves21:57
* thumper opens up the beast again for more surgery21:59
wallyworldfwereade_: google does hate you22:23
wallyworldfwereade_: so, some remaining issues. i thought it best to keep the notion of managing the container dependencies out of the provisioner - those are separate concerns to me. the model is: wait until a container type is required, ensure stuff is set up, then start the provisioner to manage the creation of the containers22:26
fwereade_wallyworld, I'm +1 on that22:26
wallyworldso, the initial watcher does then kill itself once it has done that job22:27
fwereade_wallyworld, what you have does a solid job of starting the appropriate provisioner at the appropriate time22:27
thumperum...22:27
wallyworldcause it has served its purpose22:27
thumperdo we have an initial watcher for each type of container?22:27
wallyworldyes, but only because the api only allows that22:27
wallyworldthe api needs to change22:27
fwereade_wallyworld, I'm just saying that the setup bit is not implemented22:28
wallyworldand that involves uknown unknowns22:28
wallyworldfwereade_: it is in a downstream mp22:28
wallyworldthe next one in the pipe22:28
wallyworldfwereade_: https://codereview.appspot.com/22980045/22:28
* thumper fetches the paddles to shock the local provider back to life22:29
thumperCLEAR22:29
wallyworldlol22:29
wallyworldfwereade_: so, each container package is responsible for knowing how to set itself up so containers of that type can be started on a given host. the machine agent calls out when required to do that and then starts a suitable provisioner. i still want to get the hammer out and fix the provisioner task as previously discussed22:31
wallyworldmy main goal this week is to get kvm supported with what apis we currently have22:31
wallyworldonce that user facing functionailty is delivered, then we can tweak the behind the scenes things to clean it up22:32
* thumper wondered why the local provider wasn't starting the machine22:48
thumpernetwork traffic is spiking22:48
thumperlast line in the log file22:48
thumper2013-11-18 22:44:33 DEBUG juju.container.kvm container.go:32 Synchronise images for precise amd6422:48
wallyworldthumper: we need user feedback on that shit :-)22:49
thumperwallyworld: not sure how...22:49
wallyworldto eliminate the wondering22:49
wallyworldwe need to establish a channel back to the client22:49
wallyworldand the service business logic can pop progress events into that channel22:50
thumpermachine provisioning failed...22:54
thumpernow to figure out why22:55
fwereadewallyworld, everything to do with computers hates me23:00
fwereadewallyworld, thanks for that link though23:00
fwereadewallyworld, for some reason it makes it all look less objectionable23:00
fwereadewallyworld, do you think there's a reasonable evolution that lets us drop all the frickin' switching though?23:01
wallyworldfwereade: one sec.  otp23:01
fwereadewallyworld, because I'm still feeling that if we're going to be lazy we should be really lazy23:01
fwereadewallyworld, install the packages only when we're actually trying to run a container and find missing deps23:02
fwereadewallyworld, separating out the thing that takes the decision on whether to start a provisioner is a great step23:03
fwereadewallyworld, so that is a win in itself, no argument there23:03
fwereadewallyworld, but smearing the specific-container-related logic out so widely stresses me out a little, because it introduces subtle dependencies23:04
* thumper sighs23:05
* thumper pulls apart the threads looking for the issue23:05
* thumper opens the patient up again23:06
fwereadewallyworld, I am cool with the watcher strategy though23:06
wallyworldfwereade: sorry, just about finished phone cll, one more sec23:07
fwereadewallyworld, and while I would love to have further discussions about SOA I am flagging a little -- so I'm off for a quick ciggie, then a short chat before bed23:07
wallyworldok, ill read you r comments while you kill your lungs23:07
* thumper frowns23:08
wallyworldfwereade: we do only install packages when the first container is needed to be run, so we are really lazy23:11
fwereadewallyworld, they're still twitching a bit23:12
wallyworldall the container logic is in one place - the containers package. the agent (for setup) provisioner task (for running) calls into that23:12
fwereadewallyworld, but codewise the actual invocation of the setup has a very tenuous and distant connection to the actual launching of the container23:12
wallyworldset up and launching are separate23:13
wallyworldyou could use the same argument when we used cloud init23:13
fwereadewallyworld, how about if we jammed the setup bits it into broker creation? that feels much closer23:13
wallyworldeg just efore this branch23:13
wallyworldwe always apt-get installed lxc in cloud init23:13
wallyworldwhich is the lxc container set up23:13
fwereadewallyworld, indeed I could :)23:13
wallyworldthat is very distant23:13
wallyworldnow, all related container logic is at least together23:13
wallyworldand the worker task that uses it can call into it23:14
fwereadewallyworld, I'm certainly not defending that practice -- it was expedient but rather hairy really23:14
fwereadewallyworld, and it caused us problems ;p23:14
fwereadewallyworld, I agree it's closer now, and better than before23:14
wallyworldthe containers package exposes 2 main sematics - setup and management23:14
wallyworldand the task uses those 2 main concenpts23:14
wallyworldby management, i mean runtime - start/stop etc23:15
* thumper falls foul to wallyworld's hack on debug levels23:15
wallyworldi *think* we have separate of concerns ok, if not heading in the right direction23:15
fwereadewallyworld, it just seems reasonable that (say) broker creation be a decent signal implying the need for setup, ratherthan having broker creation either sane or not depending on the action of distant code23:15
wallyworldfwereade: so that implies there's some flag somewhere which records if setup has been done23:16
fwereadethumper, heh, I thought that'd cause trouble, but I couldn't think of anything better and you were on holiday23:16
fwereadethumper, if you're of a mind to fix it please correspond with davecheney, whose use cases we were trying to support23:16
fwereadewallyworld, it's definitely going in the right direction23:16
* thumper nods23:16
wallyworldthumper: i did tell you about it and the need to fix it :-)23:16
thumperwallyworld: yeah I know23:17
thumperI just hadn't gotten around to it23:17
fwereadewallyworld, I'm just complaining because it feels like almost virgin soil, and that we could get further23:17
* thumper puts it on the stack of shit to fix23:17
wallyworldi know, just pressing buttons :-P23:17
fwereadewallyworld, however, I remind myself, progress not perfection :)23:17
fwereadewallyworld, and as I said I'm pretty happy with how it looks in the context of the followup23:17
wallyworldfwereade: understood. the way i see it - we have container setup and management "nicely" packaged. we have stuff that calls it. we can adjust the stuff that calls it23:18
thumperheh, oops, found a weirdness...23:18
wallyworldfwereade: if you wanted to +1, or +1 with fixes, i can do that today23:18
fwereadewallyworld, yeah, I'm just rereading my whining23:19
wallyworldlol23:19
wallyworldthanks for staying up late todo this23:19
fwereadewallyworld, don't suppose I can convince you of a SetSupportedContainers call? I'd still prefer that to the mooted result from UpdateSC23:20
fwereadewallyworld, that's the only bit that still feels really wrong, and isn't amenable to easy fixes by virtue of being part of the api23:20
wallyworldfwereade: actually, in the current branch, i think i'm going to have to go to that api anyway, at least at the service level23:20
fwereadewallyworld, sweet :)23:20
wallyworldfwereade: so i did the Add api in isolation23:21
wallyworldand then as the implementation has evolved, it needs to be changed i think23:21
wallyworldso i could kand what's there, and it will be reworked in my current branch23:21
wallyworldfwereade: thanks for +1. i am fully aware it's not yet perfect, but is a step along the way :-)23:23
wallyworldand we are delivering new functionality to users23:23
fwereadewallyworld, no worries -- and my concern about the Set/Add bit is that if 1.17 goes out with Add it will be waaay more tedious to make Set work without ugliness23:24
fwereadewallyworld, so if you're confident that can be proposed today I can handle it23:24
wallyworldyeah. i was thinking with Add, we would be less immune to races23:25
wallyworldcause i wasn't sure if we would have multiple callers each adding their own23:25
wallyworldand add is more robust to that23:25
fwereadewallyworld, I'm not so bothered about those, because I feel we can restrict it to a single caller quite naturally23:25
wallyworldbut i think callling Set will be limited to machine agent at start up23:25
fwereadewallyworld, yeah23:26
wallyworldthat wasn't clear at the time initially23:26
fwereadeunderstood :023:26
wallyworldthanks again, go get some sleep :-)23:26
fwereadewallyworld, cheers, enjoy your day23:26
wallyworldi'll try :-)23:26
* thumper is stonewalled by kvm tools23:28
* thumper needs to email robie to get answers23:28
thumperperhaps I'll look at the logging stuff while I wait23:28
wallyworldthumper: or you could do this for me :-D https://codereview.appspot.com/28190043/23:34
* thumper goes to have lunch first23:36

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!