/srv/irclogs.ubuntu.com/2013/11/12/#juju-dev.txt

thumpero/ wallyworld_00:31
wallyworld_hi00:31
thumperwallyworld_: how's tricks?00:31
wallyworld_ok, just typing stuff into Rietveld00:31
thumper:)00:31
wallyworld_for william's branch00:31
thumperwallyworld_: have you looked at fwereade's branch or shall I do it?00:31
wallyworld_just finishing up now00:31
wallyworld_i have a branch of my own that i have tweaked that you could look at00:32
thumpersure00:32
wallyworld_cool :-)00:32
thumperI have another for the kvm prelude too00:32
wallyworld_ok00:32
wallyworld_will look after i finish current one00:32
wallyworld_faaark. just lost all my comments in the review :-( paste buffer error :-(00:34
wallyworld_cause some kiwi distracted me00:34
* thumper looks around for that nasty kiwi00:36
* thumper can't see anyone that matches that description00:36
wallyworld_thumper needs a mirror :-P00:37
thumperwallyworld_: is it the supported containers that needs review?00:38
wallyworld_yeah00:38
wallyworld_thumper:  i knew i would get pushback on nil vs empty. sigh00:38
thumperheh00:38
thumperI think that [instance.NONE] is pretty clear for supported containers00:39
thumperbut I do agree00:39
thumperin general00:39
thumperthat empty != nil00:39
wallyworld_well it's true in python, java, c++ etc etc00:39
wallyworld_everything except Go it seems00:39
thumperwell, they are different in go too00:40
thumperbut some people treat them the same00:40
wallyworld_well, sometimes00:40
thumpernil != []00:40
wallyworld_sure00:40
wallyworld_but you can append and iterate using a nil slice00:40
wallyworld_and it behaves just like empty00:40
thumperyeah...00:40
wallyworld_and marshalling stuffs it up00:40
wallyworld_i do wonder what crack the language designers were smoking00:41
wallyworld_not only in this instance00:41
thumpersomeone making opinionated decisions that work *most* of the time00:42
thumperI can see it from their point of view00:42
wallyworld_opinionated decisions is ok if they are sensible00:45
wallyworld_but throwing out decades of established semantic meaning is not a good decision00:45
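
A minimal, self-contained sketch of the nil-vs-empty distinction being argued here -- plain Go and the standard encoding/json package, nothing juju-specific:

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var nilSlice []string    // nil: no backing array allocated
	emptySlice := []string{} // empty: allocated, length zero

	// For len, append, and range the two behave identically.
	fmt.Println(len(nilSlice), len(emptySlice)) // 0 0
	nilSlice = append(nilSlice, "lxc")          // appending to a nil slice is fine

	// But they are distinguishable, and marshalling tells them apart,
	// which is the "stuffs it up" complaint above:
	a, _ := json.Marshal([]string(nil)) // null
	b, _ := json.Marshal([]string{})    // []
	fmt.Println(string(a), string(b))
}
```

The same asymmetry is why a nil supported-containers list ("not known yet") and an explicit list like [instance.NONE] ("known to support none") want to stay distinct.
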
wallyworld_thumper: so when you have finished +1'ing my branch :-) do you want a quick hangout?00:52
thumpersure for the hangout, not convinced it'll get a +1 yet :P00:53
wallyworld_ok, hangout first perhaps00:53
wallyworld_https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig00:53
wallyworld_when you are ready00:53
thumperwallyworld_: can you not hear me?01:03
wallyworld_no01:03
thumperI can see bars when I talk01:03
wallyworld_i'll try a new hangout01:03
wallyworld_https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.rmi4baa0vuu8vp4qs7eb2pd41g01:04
wallyworld_thumper: ?01:05
* thumper goes on a quick dog walk and school run01:54
wallyworld_thumper: ping for when you return02:25
thumperreturn of the thumper02:40
jammorning thumper and wallyworld_02:48
wallyworld_yello02:48
wallyworld_thumper: i am sad that the provisioner code is full of switch statements - if environ do this else if lxc do that. what sort of non-oo is that?02:49
thumperwallyworld_: it is not full02:50
thumperthere are a few switches in the factory methods02:50
wallyworld_and the loop02:51
wallyworld_but ok02:51
thumperthe vast majority of the provisioning logic is reused02:52
thumpergeez02:52
wallyworld_alright :-)02:54
jamthumper: is it possible to pull the bits that aren't out into an interface rather than using a switch ?02:55
jam(wallyworld isn't the first to see it and wonder)02:55
jamwhich probably means he won't be the last02:56
wallyworld_jam: an interface is what i was hoping for02:56
thumperprobably02:59
jamthumper: fwiw, your container-interface bounced which caused your followup to also get blocked. Not sure if you saw it. The failure doesn't look like a flakey test immediately to me.03:00
thumperok03:09
thumperjam: I'll take a look shortly, so much on...03:10
thumperjam: that is interesting, because the tests all passed locally03:13
thumperjam: I'm going to mark approved to see if it is intermittent like I think it is03:13
thumperjam, wallyworld_: yes it may well make sense to split that shit out into an interface03:14
thumperI probably should have thought of it at the time, but obviously didn't :)03:14
thumperI will leave it as an exercise for the reader (wallyworld_), as a prereq03:14
wallyworld_\o/03:22
wallyworld_not03:22
thumperwallyworld_: aw, c'mon, you love this shit03:25
wallyworld_sometimes03:25
thumperjam: yes, it was an intermittent test failure :(03:29
hazmatis this normal.. not found error on https://streams.canonical.com/juju/tools/streams/v1/index.json  w/ trunk03:43
hazmatlooks like some of the cli flags have been changing around..03:49
hazmatbreaking existing scripts03:49
bigjoolsthumper: mind your language, it's a family channel03:51
thumper:P03:51
hazmathmm.. does trunk work?03:58
hazmattrying to bootstrap right after destroy-env.. http://pastebin.ubuntu.com/6403355/03:58
hazmatif i leave off the upload-tools it hangs indefinitely..03:59
hazmathmm... but if i do a bootstrap without upload-tools after upload-tools it succeeds.04:01
hazmatsigh.. new control bucket and everything works fine.04:09
thumperwallyworld_: plz help hazmat :)04:13
wallyworld_hmm?04:13
wallyworld_new control bucket and it works sounds like an s3 issue04:15
wallyworld_error says tools tarball could not be uploaded to s3 bucket04:15
wallyworld_i'm not sure what juju related issue that might be04:15
wallyworld_i wonder how many times such an error has come up04:16
hazmatwallyworld_, here's another one.. right before that.. where bootstrap hangs forever.. http://pastebin.ubuntu.com/6403423/04:17
hazmatwell 10m, before i killed it.04:17
hazmatchanging to a new bucket and things seem to work again.04:17
hazmatwallyworld_, the odd part is i was able to upload tools in a subsequent run to that same bucket before changing buckets.04:17
wallyworld_hazmat: the logs just make it seem like juju asks ec2 to do something (eg write data to a bucket, send data to an instance) and ec2 says no04:19
wallyworld_are we sure it's not just transient ec2 weirdness?04:19
hazmatwallyworld_, dunno, i couldn't reach the aws s3 web console.. to poke around more.. although i can hit it with the cli tools04:20
wallyworld_the stuff that's logged looks like one would expect on the surface eg looking for tools, not finding them, looking elsewhere etc04:21
hazmatwallyworld_, so we end up hitting about 12 urls on bootstrap (4 url x 3 times) is that normal?04:21
wallyworld_we look for tools in control bucket, streams.canonical.com, and for each of those, we look for signed and unsigned metadata04:22
wallyworld_and we try once initially, then upload if needed, then look again04:22
wallyworld_so there are indeed several urls we hit04:22
wallyworld_and we also look for tools mirrors04:23
wallyworld_signed and unsigned04:23
wallyworld_the debug logs can therefore be quite noisy04:23
hazmatic04:24
wallyworld_it can be distracting04:26
wallyworld_but the debug is needed if things go wrong04:26
wallyworld_eg i put tools or image metadata somewhere and it is not used, does juju even look where i think it is looking04:27
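
A rough sketch of the URL fan-out wallyworld_ is describing -- each tools source probed for signed and unsigned simplestreams metadata plus mirrors, with the whole lookup repeated (initially, after upload, then again), which is how hazmat's "about 12 urls" count climbs. Only the streams.canonical.com index.json URL appears verbatim in this log; the bucket URL and the other paths here are illustrative:

```go
package main

import "fmt"

func main() {
	// Sources are tried in order: the environment's control bucket
	// first, then the public stream. (Bucket URL is hypothetical.)
	sources := []string{
		"https://<control-bucket>.s3.amazonaws.com/tools",
		"https://streams.canonical.com/juju/tools",
	}
	// Signed metadata (.sjson) is preferred, unsigned is the fallback;
	// mirrors get the same signed/unsigned treatment.
	paths := []string{
		"streams/v1/index.sjson",
		"streams/v1/index.json",
		"streams/v1/mirrors.sjson",
		"streams/v1/mirrors.json",
	}
	for _, src := range sources {
		for _, p := range paths {
			fmt.Printf("probing %s/%s\n", src, p)
		}
	}
}
```
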
=== axw_ is now known as axw
axwwallyworld_: any tips on getting insight into what mgo/txn is doing? I've added a new Assert-only (no Insert/Update) op, doesn't seem to be doing anything06:24
axwor jam, or thumper...06:25
wallyworld_axw: i've only ever used println etc, i'm not sure what tools there are06:25
jamaxw: I think the only master of txn is Wallyworld, but I have some understanding of how it works.06:25
wallyworld_so i print what is there in the db before the call, and again after06:26
axwmmk. I kinda need to know what's going on inside mongo I think :/06:26
wallyworld_what are you trying to do?06:26
axwstop service/unit/machine adds if environment's (new) life field is not Alive06:26
wallyworld_there are a few notAlive asserts you xcan copy from i think06:27
axwyeah, I've been referring to them06:27
axwit's a bit more complex, because this one has to cater for non-existent life field too06:27
wallyworld_you could try making the assert deliberately fail06:28
axwalso, it's on a foreign doc06:28
wallyworld_ah06:28
axwah yeah, good point06:28
wallyworld_bear in mind, all the asserts are evaluated up front06:28
wallyworld_so if you have a slice of txns06:28
wallyworld_an operation early on will not cause an assert later on to fail06:28
axwthat's fine, that's what I want06:29
wallyworld_it sucks balls actually06:29
wallyworld_for me anyway :-)06:29
axwwell, it suffices anyway :)06:29
wallyworld_anyway, it's a bit of trial and error i've found06:29
wallyworld_start simple, with it failing initially, and build up from there06:30
axwno worries, just thought I'd ask before I head into the hole06:30
axwsounds good, I'll try that06:30
axwta06:30
wallyworld_good luck :-) ask if you get stuck06:35
wallyworld_fwereade: ping?07:04
fwereadewallyworld_, pong07:05
axwwallyworld_: I was comparing integers with strings.. duh. Turns out I can just use the existing assertion anyway, since Alive is the zero value07:05
wallyworld_\o/07:05
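
For concreteness, a hedged sketch of the kind of assert-only guard axw ends up with, written against labix.org/v2/mgo/txn (the API in use at the time); the collection name, field name, and helper names are illustrative, not juju's actual schema:

```go
package state

import (
	"labix.org/v2/mgo/bson"
	"labix.org/v2/mgo/txn"
)

// alive stands in for the zero value of the hypothetical Life type.
const alive = 0

// guardedOps prefixes ops with an assert-only txn.Op: it has no
// Insert/Update/Remove, so it changes nothing itself, but it aborts
// the whole transaction if the environment doc is no longer Alive
// when the transaction is applied.
func guardedOps(envID string, ops []txn.Op) []txn.Op {
	guard := txn.Op{
		C:      "environments",
		Id:     envID,
		Assert: bson.D{{"life", alive}},
	}
	return append([]txn.Op{guard}, ops...)
}

// run applies the ops; txn.ErrAborted means an assertion failed,
// i.e. the add-machine/add-unit request was refused.
func run(r *txn.Runner, ops []txn.Op) error {
	return r.Run(ops, bson.NewObjectId(), nil)
}
```

Note wallyworld_'s caveat above: all asserts in the slice are evaluated against state when the transaction runs, so an earlier op in the same slice can't cause a later assert to fail.
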
fwereadewallyworld_, thanks for the review07:05
wallyworld_fwereade: i hope it made sense07:05
wallyworld_fwereade: i have started some work to delay the start up of the container provisioners until they are needed07:06
wallyworld_it's really my first foray into that tar pit07:06
fwereadewallyworld_, largely, yeah, but jam's comments largely anticipate my own responses07:06
wallyworld_i would not be surprised if what i have splatted out is shitful07:07
wallyworld_i didn't realise about the life issue07:07
wallyworld_seems kinda confusing07:07
fwereadewallyworld_, in particular, we can't necessarily make machines dying07:07
jamfwereade: you seem back to work awfully soon after you signed off to go to sleep :)07:07
wallyworld_it sort of goes against expectations that a machine is destroyed but still "alive"07:08
fwereadejam, ehh, I'm awake now, might slope off for another part of the day :)07:10
fwereadewallyworld_, agreed -- sorry about the currently-crappy model07:11
wallyworld_fwereade: so i was hoping you could look at my shitful attempt to produce something and tell me it's all crackful and i need to start again07:11
wallyworld_it's just a bit of a brain dump so far07:11
fwereadewallyworld_, it would be my pleasure ;)07:11
wallyworld_but i want feedback before i go any further07:12
wallyworld_https://code.launchpad.net/~wallyworld/juju-core/lazy-container-provisioners/+merge/19479507:12
wallyworld_i'm no so sure about the watcher07:12
wallyworld_i know it's a bit of duplication07:12
wallyworld_there's similar code in other packages07:12
wallyworld_but i sorta want to see if it is the right thing to do first and then it can be consolidated07:13
fwereadewallyworld_, would you lbox propose -wip so I can line-by-line it please?07:13
wallyworld_sure07:13
wallyworld_so, the idea is - when machine agent starts, do not start provisioner for containers07:14
wallyworld_use a watcher to see when a container is first requested07:14
wallyworld_and only then start the provisioner07:14
wallyworld_also at start up, update the machine to record the supported containers07:15
wallyworld_so if a user add-machine an unsupported one, it will fail07:15
wallyworld_fwereade: https://codereview.appspot.com/2504004307:15
wallyworld_i have to go to soccer for a bit but will be back in a couple of hours07:15
axwfwereade: I have a WIP incoming for destroy-environment changes that I'd love some feedback on too07:16
wallyworld_fwereade: bear in mind i've not really done much with watchers before so i'm not sure if what i've done is even correct or as per the design of watchers. i've also not run the code or anything yet either07:17
wallyworld_take a number :-P07:17
axw:)07:17
wallyworld_i'll be away from keyboard for a bit anyway :-)07:17
rogpeppe1mornin' all07:27
axwmorning rogpeppe107:27
rogpeppe1axw: hi07:28
axwfwereade: https://codereview.appspot.com/22870045/   -- the title is probably misleading, but when you have some time07:40
axwjust looking for a general in-the-right-direction or not07:41
jamaxw: so machine agents can't actually stop their own machines, right? That has to be done by the Provisioner (or the juju client) because nobody else has the provider secrets.07:49
jamso I'm wondering if the watch needs to be a step higher as well. (Provisioner watches for env destruction, waits a bit (but probably not forever) for things to go to dying/dead and then kills the container/machine)07:50
axwjam: I don't follow07:51
axwthere's a global flag, saying "environment is dead"07:51
axwthe machiner watches for that, and then terminates when it is set07:51
jamaxw: an important step in destroy-environment is to actually terminate the machines07:51
jamthe machiner can't terminate itself07:52
axwjam: right, that bit is still on the CLI07:52
axwwell07:52
jam(can't call EC2.TerminateMachine(my-instance-id)07:52
axwstill done by the provisioner & CLI07:52
jamaxw: so if the CLI is terminating the instances, they won't have a chance to cleanup the shutdown, right?07:52
jamand the mongodb main server is being killed, so do the watches fire?07:52
jamI guess the question is, what waits for what and how long do we wait?07:52
jamI would *tend* to say, if a Provisioner can Terminate a machine, there isn't much point waiting for the Machiner to tear itself down first.07:53
jamwe really like this for the manually registered machines07:53
jambut for the rest, I go back and forth.07:54
axwjam: still not following :)   where do we wait for the machiner to tear itself down?07:54
axwjam: mostly everything works as it does today, with the exception of machine agents being told to shut down just before we return to the CLI07:56
axwfor the manual provider, this is essential to having the state servers tear themselves down07:56
axwfor the others, it's superfluous, as the CLI will kill them anyway07:56
axw(but left in for the sake of generality, and it doesn't really hurt)07:57
jamaxw: so if we don't wait until we know that the machiner's actually got the message, we end up killing mongo, and they'll never see the message.08:01
jamHow do we know when they've gotten the message?08:01
jamand why bother waiting for the ones that we know we're just going to Terminate anyway08:01
jamso *my* model would be something like "for all machines listed as manually registered, wait until the agent shows as dying, and then return to the CLI to nuke everything else"08:02
axwjam: ehh good point, I mucked that bit up. I *had* been destroying all non-state machines previously08:02
jam*except* the caveat from fwereade08:02
jamwhich is that you can't actually set a Machine to Dying until Units on that machine are Dead08:03
jamwhich has been proposed as something we should really fix in the model.08:03
jamaxw: so when would the state machines get terminated?08:03
jamby themselves?08:03
axwjam: yes. DestroyJuju should kill non-state machines after destroying units, and wait for them. Then set environment to Dead, and the state servers see that and kill themselves08:04
jamalso, I'm not a big fan of waiting for things indefinitely, so some sort of "wait for a while for them to notice, but then go ahead and kill everything else anyway" is a fair point for me.08:04
axwI had that in the code before, not sure why I took it out...08:05
jamaxw: so in the HA world, it is possible that only the machine agent that is running the active Provisioner will be able to Terminate machines.08:05
jamaxw: so I like your "DestroyJuju should ..." except for nodes that are Provisioned properly, I'm not sure why we should wait for them to tear down rather than just destroying them. If we have a use case, do it, but otherwise we just slow down destroy-environment.08:06
axwjam: we can make it only for manually provisioned nodes, I was just trying to avoid special casing08:07
axwbut that's the only time it's needed08:07
jamaxw: I'd be okay seeing it in practice, but it *is* nice that today "juju destroy-environment" with 20+ machines can still return in a few seconds.08:07
axwfair enough08:08
jamaxw: now, I could hypothesize a use case08:08
jamlike08:08
jamwhen we have Storage08:08
jamyou may want to cleanly unmount the storage before terminating08:08
jamso that your PG data is safely flushed to disk for the next round of the environment08:08
jamor something like that08:08
jambut it is a bit hypothetical08:08
axwhmm yeah. I think, ideally, environ.Destroy should handle everything still (for cases where the state server is stuffed), but that sort of thing may be outside its reach08:10
jamaxw: I think whatever we do to try and make things a "nice, peaceful death" we should still keep an axe around in case it all goes badly08:10
axwyup08:10
jamthat may become "destroy-environment --force"08:10
jambut I'd like to make sure that normal shutdown is "fast enough" that people don't feel they should habitually supply the --force parameter.08:11
axwyup, fair enough08:11
jamaxw: but I'm mostly just talking through ideas with you. I haven't looked at the specific code08:12
axweasy enough08:12
jamI'm not 100% sure what the use case / model is when you have the ability to pass out credentials to an environment that don't have the EC2 creds directly available.08:12
jamI'm guessing that the security model says that those people aren't allowed to trigger DestroyEnvironment08:12
jamso the fact that the client can't08:13
jamisn't a problem08:13
jamalthough...08:13
jamhosted Juju (aka JaaS)08:13
jamis a bit more unclear on this point. The state machines would run on hardware you *don't* control, but you should have the ability to stop the environment, start a new one, etc. I don't know what the design for that is, TBH.08:14
jamDoes the juju CLI still keep env provider creds around?08:14
jamthese are still hypotheticals, and nothing we have to get right today, but it can be useful to look at some future use cases and see if we align well enough with them.08:15
axwjam: yeah, I'm not sure about how the JaaS story is meant to work exactly. fwereade mentioned that we'd still want to have destroy-environment behind the API (entirely) for that, probably08:16
jamaxw: I do think that one of the designs of that is the Provider creds are always secret from the user (known only inside the JaaS servers)08:17
jamin which case, we do need all the Terminate calls to happen from the Provisioner08:17
axwjam: actually, that should work with what I've coded (when I add destruction of non-state machines  back into the code)08:19
jamaxw: I don't think it hurts to do 2 passes (one more time in the client if we have the appropriate secrets to do so), as Terminate tends to be idempotent :)08:20
axwyup08:20
fwereadeaxw, so the issue I have on first glance at the description is that waiting for all units to clean themselves up makes env destruction take orders of magnitude longer08:21
jamfwereade: welcome back08:21
jamyou missed a small amount of discussion :)08:21
fwereadejam, ah sorry08:21
fwereadejam, easily precisable?08:21
axwfwereade: heya. we can make that manual-provider specific08:21
fwereade(I did eat some bacon though, that was nice)08:22
jamfwereade: http://irclogs.ubuntu.com/2013/11/12/ though it appears to not be quite up to date with the last 20 minutes.08:22
jamI would guess it polls every X minutes or something.08:22
jamfwereade: I had the same concern08:23
jamfwereade: and could come up with a hypothetical "clean shutdown benefits stuff like storage, or database saving their state" but it was pretty hypothetical08:23
jamfwereade: it also brings up the idea that "we don't have to wait for Machine to be dead, we just care that Manual machines have seen the "please shut down message"08:24
jamfwereade: so if we had a Machine.Dying that actually worked08:24
jamthen we could just wait for that08:24
jamfwereade: and we have "we probably want the State machines to issue Terminate requests before returning to the CLI because we'd like to have that in a JaaS-ish world."08:24
jamand we can deal with the ACL of who has rights to issue destroy-environment as a purely ACL issues08:25
fwereadejam, axw: I'm sorry, I'm having some trouble following08:26
fwereadejam, axw may have already responded to this but I was wondering yesterday whether it made sense to refuse destroy-environment requests when there exist manually provisioned machines with units on them08:27
jamfwereade: so the question, "Do we wait for all things to cleanup nicely before we Terminate them"08:27
fwereadejam, I don't think we should in general08:27
jamfwereade: for Manually registered machines, we can't Terminate them, so we sort of have to08:27
jamfor the rest08:27
jamI can hypothesize a "backup your stuff to storage we're shutting down" sort of hook08:27
fwereadejam, cattle not pets -- but manually provisioned machines *are* more like pets so we shouldn't abuse and abandon them08:28
jambut we don't have that today08:28
jamfwereade: right. my hypothesis is around a DB charm and Storage implementation that we want clean shutdown so we can bring it up in the next life08:28
axwbrb, making a cup of tea08:28
jamthough you could just say "that must be handled before destroy-env"08:28
fwereadejam, I'm inclined that way for now at least08:28
jamfwereade: right. That was my position as well.08:29
jamfwereade: so I think my statement was "change the environment to dying, wait for manually provisioned machine agents to show up as Dying, and then Terminate the rest of the world"08:31
jamfwereade: with the caveat that we don't have a good Machine.Dying, so we actually have to kill the units and wait for the Machiner to show up as Dead08:31
fwereadejam, I'm not sure what that complexity buys us over just refusing to destroy when we have manually provisioned machines08:32
fwereadejam, in particular I'm not quite sure what shutdown sequence you imagine for those machines08:32
axwfwereade: so, how do we kill the manually provisioned bootstrap nodes?08:33
jamfwereade: my understanding is that what we care about is to uninstall jujud from upstart, etc, on those machines.08:33
jamwhich is something we want anyway, even if it isn't part of destroy-environment.08:33
jambecause otherwise once you've registered machines with An environment, reusing them is going to be a real pain08:33
jamfwereade: certainly I don't think your suggestion is that once an env has ever had any machines manually registered, you can never destroy that env again ?08:34
fwereadeaxw, my concern is with the units, not the machines08:36
fwereadeaxw, we can't just stop units, can we?08:36
fwereadeer that was for jam08:36
fwereadeaxw, re bootstrap machine, good question08:36
jamfwereade: well, why can't we just stop units?08:37
fwereadejam, my suggestion re manual machines is that they need to have their units removed explicitly08:37
fwereadejam, because we'd thus leave the services running08:37
jamfwereade: well, we are in a "terminate the world" condition, there won't be any services in just a few moments.08:37
fwereadejam, but there will08:37
axwfwereade: setting them to Dying causes the uniter to remove them eventually...?08:38
fwereadejam, those machines will keep on running their services forever08:38
jamfwereade: but whatever mechanism exists to actually stop the Postgres instance on that machine is something that we can trigger an DestroyEnv time, no?08:38
jamfwereade: the code that axw put up does have a "destroyUnits" function.08:38
fwereadejam, axw: I'm still going by the description -- but that makes sense *only* for manual machines, right?08:39
axwfwereade: so it goes like this: 1. prevent new units/machines from being added; 2. destroy all units (and non-state machines). 3. tell state machines to destroy themselves08:39
axwfwereade: yes, we can make it manual-only08:39
jamfwereade: "destroys (and waits on death of) all units" is in the description08:39
axwit's not currently, but it can be08:39
jamfwereade, axw: I'm not so sure that we can get an accurate (1) today08:40
jamat least from what fwereade has said about "destroy-machine --force"08:40
* jam goes to grab some coffee, biab08:40
axwjam: I've coded it into add-unit/add-machine transactions, I don't see any problems, but then that was the first time I touched mongo transactions :)08:41
fwereadejam, axw: well, we could (i) destroy all services and (ii) gate add-machine and add-service on destroy-flag-not-set08:41
fwereadejam, axw: is there any other operation that can demand resources?08:41
fwereadejam, axw: add-unit, add-relation are already gated on the relevant services being alive08:41
axwfwereade: yep, that's what I have done. see state/environ.go, state/service.go, state/state.go08:41
axwfwereade: I added it into add-machine and add-unit, since they're the ones that we're waiting on the death of, and they're the ones that create agents08:42
axwI added it into add-service too, but I think that assertion can come out08:43
fwereadeaxw, I think it's a bad idea on add-unit08:43
fwereadeaxw, it's bad enough that all unit additions to a given service have to be serialized08:43
fwereadeaxw, bringing the environment doc into play makes *all* unit additions serialized08:43
axwhmm true :(08:44
fwereadeaxw, hence the destroy-all-services and prevent-add-service suggestion08:44
axwfair enough, I didn't think about that bit08:44
fwereadeaxw, core idea is solid though, I agree08:44
fwereadeaxw, my main concern is not slowing down destruction of non-manual environments08:44
axwfwereade: I will update the code to only do this for manual machines now08:45
axwfwereade: well, I can do it quick and dirty - if instanceId startswith "manual:" is basically all we have to go by at the moment08:46
axwis that acceptable?08:46
rogpeppe1axw, fwereade: i'm somewhat uncomfortable with gating the destruction of an environment on the orderly teardown of every unit in the environment08:47
fwereaderogpeppe1, agreed08:47
rogpeppe1axw, fwereade: (manually provisioned units excepted, probably)08:47
axwrogpeppe1: likewise, I'll only gate it on units allocated to manually provisioned machines08:47
rogpeppe1ok, cool.08:47
rogpeppe1the other thought i had was to split up the process into two parts08:48
rogpeppe1so that you don't need a timeout parameter08:48
rogpeppe1(which is always going to be wrong)08:48
axwfwereade: I'm going to leave in the bit that calls Destroy on each unit/machine, though, to cater for JaaS08:48
axwit just won't wait on them08:48
fwereadeaxw, I think so, in itself -- but I feel like we're putting the cart before the horse here08:48
fwereadeaxw, why would we do that?08:49
fwereadeaxw, in a jaas environment we just nuke all the instances, bam, done, just like today (excepting manual)08:49
rogpeppe1so you could have a "Clean" or "StartDestroy" or something which initiated the process of terminating the manually provisioned machines08:49
axwfwereade: I was thinking we just set them to Dying, and have the provisioner do its job in the background08:49
fwereadeaxw, that will take forever08:49
fwereadeaxw, and block on hook errors08:50
rogpeppe1and then a final "Destroy" which would fail to work if there were any non-dead manually provisioned machines08:50
axwok08:50
fwereadeaxw, I'm not even against what you have in principle08:50
fwereadeaxw, I think it's exactly right for taking down manual machines08:50
fwereadeaxw, but I think it's a serious degradation in the non-manual case08:50
rogpeppe1that way the GUI can watch the status as usual while the manually provisioned machines are destroyed08:50
axwyup, I can see that08:51
fwereadeaxw, and I think we take a good step forward by simply moving a nuke-everything command inside the api server08:51
axwrogpeppe1: sounds fair enough. the CLI has to poll for status tho?08:51
rogpeppe1we could potentially do that in the command line too08:51
rogpeppe1axw: it could use an AllWatcher just like the GUI08:51
rogpeppe1axw: or... hmm, yes, it would need some kind of watcher otherwise08:52
axwfwereade: "the nuke-everything inside the api server" bit can come later, right? :)08:52
fwereadeaxw, if the API call were *just* to do a StopInstances([all-non-manager-machines]) (and abort if manual machines exist) that would move a good chunk of the functionality to where we want it08:52
axwok08:52
axwI can do that08:53
fwereadeaxw, I do like the destroying flag too though08:53
axwfwereade: I'm thinking about what you said about disallowing destroy-env if units eexist, and now I'm starting to change my mind08:54
axwbecause a unit may fail to die08:54
fwereadeaxw, yeah, I'm just thinking we can make progress without solving that -- and it will be easier to solve as we restrict the scope of what we're doing08:54
jamaxw: we could error both ways here, but should a bad hook on one unit keep all the other machines running?08:55
jamfwereade: do you need anything more from us on: https://code.launchpad.net/~fwereade/juju-core/api-force-destroy-machines/+merge/19476408:55
fwereadeaxw, jam: ofc there is *also* independent demand for an ignore-hook-errors mode for unit shutdown08:55
axwfwereade: true, the problem would go away with that08:56
axwjam: ideally not, but we'd need some kind of feedback to let people know at least08:56
axwalright, I'll just put a TODO in to fix it with ignore-hooks08:56
fwereadejam, I'm not sure -- the machine-dying problem is out of scope, and the other bits wallyworld_ flags should be trivial fixes08:57
fwereadejam, my current plan is just to tidy up the clear code problems and resubmit08:58
jamfwereade: I just want to make sure you're unblocked, and still backporting the other fix to 1.1608:58
fwereadejam, I was going to backport both of those after I'd got this landed08:59
fwereadejam, I'll ping you or wallyworld_ for a review once I've had a moment to work on it a bit :)09:00
jamfwereade: also, you wanted to talk about Pinger stuff today instead of yesterday.09:00
jamI don't want to INTERRUPT flood you, though.09:00
fwereadejam, ah, yes, I did read back -- and I think I'm +1 on the idea that each SetAgentAlive creates an independent slot, we record all those slots somewhere in state, and we infer that any of those slots being alive indicates that the agent is09:02
fwereadejam, minimally invasive and rewritey, I think09:02
fwereadejam, the idea of defending against double-pings also has some merit but feels orthogonal09:03
fwereadejam, best to interrupt-flood me in the morning, sometime this PM I plan to go dark and start simplifying some of the state test setup stuff09:03
fwereadejam, because OMG it hurts to write that sort of code09:04
jamfwereade: you're welcome to go dark whenever you need to.09:04
jamfwereade: so when does SetAgentAlive actually trigger? It is the new heartbeat? or just on connect or ?09:04
fwereadejam, I was thinking on connect still?09:05
jamfwereade: I've personally moved on so that my favorite design actually disconnects the agents from the presence table, and we let the API server batch up "here are all the bits for agents I know about, set them all"09:05
fwereadejam, that's fine in principle -- not sure it entirely justifies the effort but I'm willing to let someone closer to the code make that judgment09:06
fwereadejam, the api server is certainly a good place to do all that, and if it's easy it definitely sgtm09:08
jamfwereade: so SetAgentAlive would kick off a new slot for that agent in whatever mechanism we are using (independent Pingers or one global Pinger)09:09
jamfwereade: I can say that a hypothetical 30s ping *10000 agents connected to an api server means you write an "$inc" to mongo every 3ms09:10
jamI guess because of 64-bit limitations on the int field, you only ever decrease the number of $inc requests by 6409:10
jamso a constant multiplier which is good, but doesn't actually change your O()09:11
fwereadejam, yeah09:12
jamyou do batch them up nicer09:13
jambut I was hoping for more of a log() sort of improvement.09:13
fwereadejam, bright ideas in that direction will be accepted most gratefully ;)09:13
jamfwereade: well, in mongo 2.6 (whenever that actually releases) we'll have binary operators like $bin setbitX09:14
jamthat doesn't change the load factor.09:14
jambut it does avoid double ping corruption09:14
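
A simplified reconstruction of the ping scheme being costed here -- one document per 30s time slot, each agent flipping its own bit inside packed int fields via $inc, so 10000 agents works out to a write every ~3ms. The collection layout and field names are assumptions for illustration, not juju's actual presence schema:

```go
package presence

import (
	"fmt"
	"time"

	"labix.org/v2/mgo"
	"labix.org/v2/mgo/bson"
)

const period = 30 * time.Second

// ping marks agent seq as alive in the current time slot. Each field
// packs many agents' bits, which is why batching pings only divides
// the write rate by a constant (~64) rather than changing its O().
func ping(pings *mgo.Collection, seq int64, now time.Time) error {
	slot := now.Unix() - now.Unix()%int64(period/time.Second)
	field := fmt.Sprintf("alive.%d", seq/63)
	bit := int64(1) << uint(seq%63) // 63 usable bits per signed int64 in this sketch
	// $inc only adds the bit correctly if each agent pings at most once
	// per slot -- the "double ping corruption" risk mentioned above,
	// which mongo 2.6's bitwise update operators would remove.
	_, err := pings.UpsertId(slot, bson.D{{"$inc", bson.D{{field, bit}}}})
	return err
}
```
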
jamIt does pose the question "is it cheaper to have one request with 50 $inc, or 50 $inc requests"09:15
fwereadejam, one would hope the former, wouldn't one09:16
jamfwereade: *I* would hope so, yes. Though if you had distributed master I could see a case where it might parallelize the latter better.09:16
jamfwereade: in my scale testing, I never did get mongo above 100% CPU09:17
jamso we still have to watch out for it becoming the bottleneck09:17
fwereadejam, yeah, it's nice that it isn't today though09:17
jamfwereade: well, it might have slightly been the bottleneck in the add-unit one-by-one case.09:18
jamIt is hard to tell when you have 600% CPU in Juju, and 90+% in mongo09:18
jamand "add-relation nrpe-external-master" just goes into death throws, so I don't know who to blame there. :)09:19
fwereadejam, for that I think we should be moving the add-subordinate logic off the unit agent and into the api server, I suspect09:22
fwereadejam, distributing the logic doesn't actually help anyone in any way afaict09:22
fwereadejam, but that might itself be fiddly09:22
fwereadejam, and there are some benefits to how we have it today09:23
jamfwereade: well when I tested it on m1.xlarge (15GB) with just 1000 machines, we start at about 1GB of Res, and as soon as nrpe-external-master is related, we go OOM09:23
jamso *something* on the API server is consuming *lots* of memory for running this09:24
jamand we only end up with something like 3-4 actual "nrpe-external-master" agents in status09:24
jamthat actually even get into "pending"09:24
jamand I think when jujud restarts after OOM, we end up at 0% CPU on all processes, and nothing making progress.09:24
fwereadejam, I suspect the problem is in the complexity of EnterScope having to add units as well09:25
fwereadejam, if we created appropriate subordinates at AddUnit time, and queued up a bunch of them at AddRelation time, we might have a happier time09:26
fwereadejam, or *maybe* we can just write EnterScope a bit more smartly09:27
jamespagejam, fwereade: not sure whether this has been discussed or not, but are we going to provide any sort of upgrade path for juju 12.04 users to 14.04?09:27
fwereadejam, it was the first complex txn logic I ever wrote so it's unlikely to be the best09:27
jamespagejust trying to figure out what we need todo re charms for openstack this cycle09:27
jamespage(12.04 -> 14.04 deployments that is)09:27
fwereadejamespage, the assumption has hitherto been that machines on one series will stay forever on that series09:28
jamjamespage: as in, how to take your existing workload and upgrade all of your units so they are running on 14.04 ?09:28
jamespagefwereade, yeah - that was what I thought09:28
jamespagejam: yes09:28
fwereadejamespage, it's a bit crap, but I don't really see us addressing it this cycle09:28
jamespageif it's not a supported path that's OK09:28
fwereadejamespage, cool09:28
jamfwereade: and the Charm story means that you can't just upgrade the service, you have to deploy a new service ?09:28
fwereadejam, yeah, the model doesn't really allow for series changes at all09:29
jamespagejam: yeah - that was the complicating factor from my perspective09:29
jamespagethe 12.04 charms support cloud-archive through to havana and up to saucy OK09:29
fwereadejam, apart from anything else, we'd need cross-series charms for it to even begin to make sense09:29
jamespagebut it feels like we have a bit of a clean sheet with 14.0409:29
jamespagei.e. we can drop some of the cruft :-)09:30
fwereadejam, maybe that's a bullet that's better bitten sooner than later09:30
fwereadejam, and it dovetails nicely with the workload-not-os focus09:30
fwereadejam, but it also rather gives me the fear09:30
fwereadejam, it'd demand a good solid block of scheduled time at the very least09:31
jamfwereade: it would be nice if you could say "new units of service X are deployed with this version of the charm"09:32
jamand then you can deploy 10 more 14.04 units, and destroy the original ones.09:32
jamIt is also a nice story for upgrade of charms in general09:32
jameven without a series bump09:33
jamso that you can do a "safe" migration rather than "everything at once"09:33
fwereadejam, sure, but precise/nova is currently considered completely distinct from trusty/nova09:33
fwereadejam, that's what I mean by cross-series charms09:33
fwereadejam, I agree with what you're saying about smooth upgrades regardless09:34
fwereadejam, for os upgrades, even if the charms were the same you'd have to deal with all the subordinates and colocated services too09:35
fwereadejam, doable in theory but an *awful* lot of work in practice09:36
axwrogpeppe1: just thinking about what you were suggesting before: it's not *that* easy, because the last thing DestroyJuju does is tell the state servers to kill themselves09:36
rogpeppe1axw: that's why i'd split it into two09:36
axwso... destroy everything else, wait in the CLI, then finalise?09:36
rogpeppe1axw: that was my thought, yes09:37
rogpeppe1axw: and the finalisation step would fail if the appropriate pieces were not torn down correctly09:37
rogpeppe1axw: (it would probably have a force flag too though)09:37
rogpeppe1axw: then we can give useful feedback in the CLI if we want09:38
axwhmm yes, that would be nice09:38
axwsold09:38
* axw deletes the comment about timeout length09:38
rogpeppe1axw: cool09:38
rogpeppe1fwereade: does the above seem reasonable to you?09:38
fwereaderogpeppe1, axw: I feel like the best first cut is still: (1) abort if manual machines exist (2) destroy services, set destroying flag (3) directly StopInstances for non-managers in the API server (4) environ.Destroy in the CLI09:40
fwereaderogpeppe1, axw: that can then be upgraded to allow for clean shutdown of manual machines, and force-destroy in state of other non-manager ones09:41
rogpeppe1fwereade: what do you mean when you say "destroy services" ?09:41
axwfwereade: ah sorry, I thought you came around to my proposal ;)09:42
axwmk09:42
fwereaderogpeppe1: set dying to prevent more units being added09:42
fwereadeaxw, um, sorry, I probably misread something09:42
rogpeppe1fwereade: ah yes, seems reasonable09:42
rogpeppe1fwereade: is the destroying flag not just Dying on the environment?09:43
fwereadeaxw, I thought we were at least in agreement about where we wanted to end up, maybe there was confusion about the exact steps09:43
axwfwereade: yeah, I can pare it back a bit for the first cut09:43
fwereadeaxw, I'm not so bothered about those so long as none of the steps leaves us with a significantly less performant destroy-environment09:44
axwok09:44
rogpeppe1fwereade: tbh i think that if we're going to call environ.Destroy in the CLI there's not really any point in destroying any services09:44
fwereaderogpeppe1, the service destruction prevents add-unit09:44
fwereaderogpeppe1, without making every add-unit have to check the destroying flag09:44
rogpeppe1fwereade: but if we're killing the state server, add-unit is nicely prevented anyway :-)09:44
fwereaderogpeppe1, add-unit during StopInstances seems like a recipe for races09:45
rogpeppe1fwereade: hmm, you're probably right there09:46
rogpeppe1fwereade: isn't it racy anyway?09:46
fwereaderogpeppe1, "stop all non-manual non-manager instances" is, I think, an atom of functionality that remains useful in most contexts09:46
fwereaderogpeppe1, hmm, maybe? what in particular?09:47
rogpeppe1fwereade: if the provisioner has just seen a unit, a machine can be provisioned just as we're calling StopInstances09:47
rogpeppe1fwereade: i'm not sure there's any way around it09:47
fwereaderogpeppe1, well, StopInstances has to take its instance ids from the machines in state, doesn't it?09:48
fwereaderogpeppe1, (btw I think StopInstances may be a bit broken anyway -- it should surely take ids, not instances?)09:48
rogpeppe1fwereade: no, i don't think it does09:49
rogpeppe1fwereade: (have to take instance ids from the machines in state, that is)09:49
rogpeppe1fwereade: it takes instance ids from the instances returned by AllInstances, i think09:49
fwereaderogpeppe1, depends how providers react to being told to stop instances they're claiming don't exist in an Instances call09:49
rogpeppe1fwereade: Environ.Destroy does anyway09:50
fwereaderogpeppe1, yeah, but AllInstances is kinda useless in practice :(09:50
rogpeppe1fwereade: because of eventual consistency?09:50
fwereaderogpeppe1, in ec2 at least it's not uncommon for Instances/AllInstances to straight-up lie for minutes at a time09:50
fwereaderogpeppe1, yeah09:51
rogpeppe1fwereade: in which case we're stuffed - there's nothing we can do to prevent an instance escaping after an environment destroy09:51
fwereaderogpeppe1, wasn't there at one stage a Destroy param that was explicitly "and stop these instances that we *know* exist even if the provider's lying"?09:51
rogpeppe1fwereade: there was (maybe still is)09:51
fwereaderogpeppe1, well we can at least issue a Stop call for everything we believe to exist09:52
rogpeppe1fwereade: yeah09:52
fwereaderogpeppe1, which means taking those ids from state09:52
rogpeppe1fwereade: but i *think* i still feel that destroying the services is overkill and won't do the job correctly09:53
fwereaderogpeppe1, are you saying there's no point restricting add-* commands when we know the env is being destroyed?09:53
rogpeppe1fwereade: yeah, because we immediately destroy all the machines anyway09:54
rogpeppe1fwereade: and there's a race regardless09:54
fwereaderogpeppe1, I think it's actually quite nice to inform other users that their commands will have no effect09:54
rogpeppe1fwereade: if they issued the command half a second earlier, their commands would have no effect either09:55
rogpeppe1fwereade: but they wouldn't get told09:55
fwereaderogpeppe1, stopping all instances might take more than half a second09:55
rogpeppe1fwereade: true. we should probably start by destroying the state servers...09:56
rogpeppe1fwereade: i just feel that destroying all the services might itself take quite a long time, and it's not really necessary09:58
rogpeppe1fwereade: and that we should just cut to the chase and stop the user burning money asap09:58
axwrogpeppe1: it is useful for manual, because it wants to know which units to kill09:58
axw(not that we're doing that yet)09:58
rogpeppe1axw: agreed - we definitely need to tear down manual units first09:59
fwereaderogpeppe1, destroying the state servers really feels to me like the thing to do *last*09:59
axwyes09:59
fwereaderogpeppe1, axw: but can I recuse myself briefly? I'd like to address wallyworld_'s review before he goes to bed10:00
axwyup, thanks for your time fwereade10:00
rogpeppe1axw, fwereade: destroying the state servers puts the environment into an inactive state - no new instances will be created, for example.10:02
axwgtg, will come back to this tomorrow10:07
fwereaderogpeppe1, putting them into a temporary not-accepting-requests mode is kinder and cleaner, I think10:09
rogpeppe1fwereade: depends how long it takes - if i've got an almost-not-working environment, running n transactions to destroy the svcs might take a long time when all i want to do is destroy the darn environment.10:10
rogpeppe1fwereade: but fair enough, if you think it's that important10:11
dimiternfwereade, ping10:20
dimiternfwereade, I think I'm finally ready to propose the first part of upgrade-juju changes, and I'll appreciate if you look into it closely, because I probably missed something important10:21
jamrogpeppe1, fwereade: I'm going to poke at the GOMAXPROCS thing again. Though I'd like to (a) not do it during the test suite and (b) check for GOMAXPROCS if set. Is there a sane way to know that we *aren't* in the test suite? I'd rather do it that way than have to change every test case to turn off the effect.10:21
fwereadedimitern, cool, I just need to pop out before the standup, I will take a look afterwards10:22
dimiternfwereade, cheers10:22
fwereadejam, set a "this is real code flag" in the various main.main() funcs?10:24
jamfwereade: yeah, that's sort of what I was thinking.10:24
fwereadewallyworld_, jam: https://codereview.appspot.com/24790044 addressed, I think10:37
wallyworld_fwereade: thanks, will take a look10:38
jamsaw the email, looking already10:38
dimiternfwereade, there it is https://codereview.appspot.com/2508004310:40
fwereadedimitern, cheers10:40
rogpeppe1jam: one way to know we're not in a test suite is to do the GOMAXPROCS initialization in a package that's not imported by a test10:41
rogpeppe1jam: though i would really like it much better if our tests passed reliably with GOMAXPROCS=anything10:41
jamrogpeppe1: if it is imported by main() won't it get imported by a test ?10:41
rogpeppe1jam: hmm, yeah you're right, for cmd/* tests10:42
jamrogpeppe1: I'm sure I'd like it if our tests just worked :)10:43
wallyworld_fwereade: did you forget to put in the destroyMachineDoc in the cmd Info?10:43
rogpeppe1jam: another possibility is to put GOMAXPROCS=1 in test initialization10:43
rogpeppe1jam: actually, the best solution is just to call the GOMAXPROCS initialization inside main()10:44
rogpeppe1jam: main only gets called in very limited circumstances in our test suite10:45
rogpeppe1jam: and if we needed to, we could override it in that case (when we're execing the test binary) by setting the GOMAXPROCS env var10:46
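
A minimal sketch of that last suggestion -- do the bump inside main() so test binaries (which have their own entry point) never hit it, and leave an explicitly set GOMAXPROCS env var alone:

```go
package main

import (
	"os"
	"runtime"
)

// useMultipleCPUs raises GOMAXPROCS to the CPU count unless the user
// has already chosen a value via the environment. Living in main()
// rather than an init() keeps it out of the test suite entirely.
func useMultipleCPUs() {
	if os.Getenv("GOMAXPROCS") == "" {
		runtime.GOMAXPROCS(runtime.NumCPU())
	}
}

func main() {
	useMultipleCPUs()
	// ... normal jujud startup continues here ...
}
```
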
wallyworld_fwereade: btw that lgtm assumes the destroyMachineDoc is used in the right place10:58
fwereadeoh ffs11:42
fwereadegoogle hates me today11:42
fwereadedimitern, rogpeppe1, mgz: I think I will pick this discussion up when I've had a chance to review the branch11:42
dimiternfwereade, ok, np11:43
fwereade...and eat lunch ;p11:48
* TheMue => lunch12:33
mattywfwereade, when you're back from lunch I've got a couple of small questions12:35
rogpeppe1fwereade, dimitern, jam: it looks to me as if we can't use upsert in a mongo transaction. is that right?12:45
rogpeppe1niemeyer: ^12:47
* rogpeppe1 goes for lunch13:03
=== gary_poster|away is now known as gary_poster
rogpeppe1fwereade: do you know of any existing logic in state that's similar to what we need for the collection of state servers? originally i was thinking of using $addToSet, but it doesn't quite fit the requirements; then i thought of using upsert, but it looks like we can't use that with transactions.14:10
fwereaderogpeppe1, not offhand -- I'd been vaguely expecting the meat to be a map[id]address for convenience of insert/remove14:11
rogpeppe1fwereade: i'd thought that too14:12
fwereaderogpeppe1, wrt interpreting an existing environment without one of those docs, I don't think there's a way around counting first to figure out whether to insert/update14:12
dimiternfwereade, how about my review btw?14:13
fwereadedimitern, I'm doing it :)14:14
rogpeppe1fwereade: i'm not sure whether a map[id]address can work, but i've just thought of something; maybe it can14:14
dimiternfwereade, cheers14:14
TheMueone review please: https://codereview.appspot.com/2404004414:24
TheMuerogpeppe1: calendar says you're reviewer today ;)14:24
rogpeppe1TheMue: ah, good point, will have a look14:24
TheMuerogpeppe1: and there's also https://codereview.appspot.com/15080044/, the script-friendly output of env/switch14:42
rogpeppe1TheMue: will look at that next, thanks14:43
TheMuerogpeppe1: there's one field to remove in there, see martin's comment, and he has troubles with the idea (you remember, default vs argument raw output)14:43
TheMuerogpeppe1: I've got to thank you14:44
fwereadedimitern, sorry, but I'm having real trouble following what's actually going on in that CL -- I have a bunch of comments on the documentation, but my brain keeps melting when I try to follow what's actually happening overall14:48
fwereadedimitern, I freely stipulate that this is because it's all based on my own shitty code that should never have seen the light of day14:49
fwereadedimitern, but it's giving me real problems when it comes to reviewing helpfully14:49
fwereadedimitern, especially since --upload-tools keeps on interfering14:50
fwereadedimitern, but I'm coming to feel this is a case where we should fix it, not patch it, if that phrasing makes sense to you14:51
dimiternfwereade, expand a bit please14:51
dimiternfwereade, the main changes are in validate(), when --version is not given, hence v.chosen is Zero14:52
dimiternfwereade, in addition to the removal of --dev flag and logic around it14:52
fwereadedimitern, I'm saying the code was largely incomprehensible before you started, and that makes it hard to validate your changes14:52
dimiternfwereade, I go over all discovered tools for that major version and filter out the incompatible within [current, nextSupported], picking newest at the end14:53
dimiternfwereade, ah14:53
dimiternfwereade, well, it is convoluted somewhat yes :)14:53
fwereadedimitern, (fwiw it is all my fault, I'm not going to defend the original form of the code)14:53
fwereadedimitern, but I'm wondering whether you're in a good position to rip all that underlying crap out and start afresh with something sane14:54
dimiternfwereade, that's a bit scary14:55
fwereadedimitern, yeah, I can understand that :)14:55
dimiternfwereade, the mess is around --upload-tools now mostly14:55
fwereadedimitern, yeah, and that's a lot of what bothers me14:56
dimiternfwereade, but didn't we decide to keep that behavior for now and just report it as deprecated?14:57
fwereadedimitern, because I no longer really understand how upload-tools interacts with anything else14:57
fwereadedimitern, I don't know, the internet hated me so I lost the end of that conversation14:57
dimiternfwereade, well, in summary14:57
dimiternfwereade, we discussed eventually replacing the need of --upload-tools everywhere with some dev docs + possibly scripts/tools that use sync-tools when you would've used --upload-tool14:58
dimiternfwereade, but for now iirc decided to deprecate --upload-tools with a message and keep it working for easier transition; it'll also need some announcement on the mailing list at least14:59
fwereadedimitern, if we've ever recommended that anyone use upload-tools I guess we should15:00
fwereadedimitern, but I would hope we never had15:00
fwereadedimitern, it's always been a dev tool15:00
dimiternfwereade, yep15:01
fwereadedimitern, so who's transitioning?15:01
fwereadedimitern, keeping it around makes the transition harder15:01
jcsackettabently, sinzui: can you look at https://code.launchpad.net/~jcsackett/charmworld/rip-out-old-queues/+merge/194647 and/or https://code.launchpad.net/~jcsackett/charmworld/multiple-appflowers-2/+merge/19464815:01
fwereadedimitern, and just preserves the shitty devs-first attitude that has plagued the whole problem from the beginning15:01
dimiternfwereade, it's useful when developing/testing stuff from source, mainly for us (one reason rogpeppe1 was not sold on removing it right away, and I get that), but we should replace it with sync-tools + scripts / makefile commands to see how it works out15:02
sinzuijcsackett, I can15:02
fwereadedimitern, the fundamental *problem* with all our tools stuff is, and always has been, that --upload-tools works for devs15:03
jcsackettsinzui: thanks.15:03
dimiternfwereade, despite being tempting to rip it out and rewrite it, let's do it in steps I suggest15:03
rogpeppe1fwereade: given that people *are* using it, i think peremptorily removing it without at least some notification to the group might be considered a little rude15:04
fwereadedimitern, I'm just worried that building on what we have inevitably takes us *further* from a good solution15:04
fwereaderogpeppe1, who's using it?15:04
rogpeppe1fwereade: anyone that's following the dev version for whatever reason15:04
fwereaderogpeppe1, surely not? we distribute dev tools15:05
dimiternfwereade, rogpeppe1, well, if it's just us, then it's no problem15:05
rogpeppe1fwereade: true. anyone that builds from tip15:05
fwereaderogpeppe1, anyone who's building from source themselves is a developer15:05
fwereaderogpeppe1, and can I think be expected to deal with it15:06
rogpeppe1fwereade: we should at least provide some reasonable replacement so everyone doesn't have to reinvent their own script15:07
fwereaderogpeppe1, yeah, no argument there15:07
dimiternabsolutely +115:07
rogpeppe1fwereade: in particular i'd like to be able to keep my current workflow (change some source, try to bootstrap with tools built from the current source)15:08
dimiternrogpeppe1, or upgrade for that matter15:08
fwereaderogpeppe1, dimitern: I don't really care how hard it is for us tbh... making it easy for ourselves made it crap for everyone else15:09
rogpeppe1dimitern: indeed - that's also important to get right, so the developer doesn't constantly need to be fiddling with the build version in version.go (something that will inevitably leak into trunk at some point)15:09
rogpeppe1fwereade: i don't really see how it interfered with non-dev usage15:09
fwereaderogpeppe1, tools distribution was total crack until just before 13.04, and was only just downgraded to 98% crack in time for release15:10
fwereaderogpeppe1, this is precisely because we were all using a non-standard code path15:10
rogpeppe1fwereade: wasn't the crackfulness only exposed if you did upload-tools?15:11
fwereaderogpeppe1, the cost to us of constructing a realistic environment in which to run the same code as our users do is *far* less than the cost of having the real code path so little-travelled as to be barely a game trail15:11
fwereaderogpeppe1, no, upload-tools did the right thing15:12
rogpeppe1fwereade: you're probably thinking of a different piece of crack to me15:12
fwereaderogpeppe1, normal users would basically run random versions of the tools15:12
rogpeppe1fwereade: oh, you mean the "select highest version" logic?15:12
fwereaderogpeppe1, and those versions would usually work well enough to downgrade themselves to the desired versions15:13
dimitern:D15:13
fwereaderogpeppe1, but none of us *noticed* because we were running a *different code path*15:13
fwereaderogpeppe1, and now we have a bit of time scheduled to work on upgrades I am *very* keen that we avoid getting back into that particular failure mode15:14
rogpeppe1fwereade: ok15:14
fwereaderogpeppe1, cheers -- it's not even an issue with screwing up the code, because that basically always happens,because we're human -- it's about making sure we have a chance of detecting the problem15:15
* fwereade quick ciggie, think15:16
niemeyerrogpeppe1: Yes, you cannot upsert in a mongo transaction.. you can insert and update, though15:30
niemeyerrogpeppe1: or update and insert15:30
niemeyerrogpeppe1: (ordering matters)15:30
niemeyerrogpeppe1: Sorry for missing your message earlier15:31
fwereadeniemeyer, ha, ofc -- nice :)15:31
rogpeppe1niemeyer: so if you don't know if something exists, it would be ok to do both the insert and the update, and one would succeed, so all would work?15:31
fwereaderogpeppe1, I think you'd want update-then-insert, so the update silently fails15:31
niemeyerrogpeppe1: Yeah15:32
niemeyerfwereade: Right15:32
rogpeppe1niemeyer: cool, thanks15:32
niemeyerThere's a message in juju-dev15:32
niemeyeron Aug 30th15:32
niemeyerSubject "Upcoming fix in mgo/txn"15:32
rogpeppe1niemeyer: cool, thanks15:33
rogpeppe1niemeyer: in fact i'm probably going to be using a single document, but that's really useful to know15:33
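
Spelled out, the pattern niemeyer and fwereade describe looks something like the sketch below -- an update op followed by an insert op for the same doc in one transaction, where whichever op doesn't apply fails silently (per the mgo/txn fix referenced above). The collection, doc id, and field names are illustrative:

```go
package state

import (
	"labix.org/v2/mgo/bson"
	"labix.org/v2/mgo/txn"
)

// upsertAddressOps emulates an upsert, which mgo/txn does not offer
// directly: the update op does nothing when the doc is missing, and
// the insert op does nothing when it already exists. Per the
// discussion above, ordering matters: update first, then insert.
func upsertAddressOps(addr string) []txn.Op {
	return []txn.Op{{
		C:      "stateServers",
		Id:     "addresses",
		Update: bson.D{{"$addToSet", bson.D{{"addresses", addr}}}},
	}, {
		C:      "stateServers",
		Id:     "addresses",
		Insert: bson.D{{"addresses", []string{addr}}},
	}}
}
```
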
rogpeppe1TheMue: you've got a review16:01
rogpeppe1TheMue: and another one16:11
jcastrowhat was the final name of the ssh/manual provider?16:26
TheMuerogpeppe1: thx16:27
natefinchmental note: don't do apt-get update && apt-get upgrade during the workday :/16:32
mgznatefinch: indeed, it's an after-work leisure activity :)16:32
natefinchmgz: My old laptop is pissed I'm replacing it, evidently16:33
TheMuenatefinch: reasonable16:54
jamjcastro: manual17:05
jcastrojam, ok so https://juju.ubuntu.com/docs/config-manual.html is correct terminology-wise?17:05
mattywwhat version of mongo do people use to test on precise? I get all kinds of panics when running go test ./state/...17:06
jammattyw: there should be a version in ppa:juju/stable or the cloud tools archive.17:06
jamI think it is 2.2.4?17:06
jcastrojam, I assume that core will write manual instead of "null" to the environments yaml?17:06
jammattyw: we need SSL support which wasn't in the stock precise version17:06
mattywjam, I remember that being the case - 2.2.4 is what I'm on - I'll paste the errors in case anyone is interested17:07
dimiternrogpeppe1, thanks for the review as well17:12
rogpeppe1dimitern: np17:12
=== racedo` is now known as racedo
mattywthis is more or less a summary of the errors I see http://paste.ubuntu.com/6406334/17:25
* rogpeppe1 is done for the day.19:30
rogpeppe1g'night all19:30
thumpernatefinch: if you have time... https://codereview.appspot.com/24980043/20:46
thumperpretty trivial move of functions20:46
thumper+ a few tests20:46
natefinchthumper: sure thing20:46
thumpernatefinch: I wish go had a better way to handle dependencies than just packages...20:48
natefinchthumper: you mean like versioning on dependencies?20:48
thumperI have to move some interfaces out of a package where they fit naturally to avoid circular imports20:48
natefinchthumper: Oh I see20:48
thumperI don't have a good answer for that yet20:49
natefinchthumper: yeah, the circular imports thing is annoying... but it generally means you're doing something wrong anyway.20:49
thumperI had moved them up a package, from container/kvm to container, but it doesn't naturally fit20:49
* thumper tries to move something else20:49
* thumper has an idea20:50
natefinchthumper: there's actually the exact same problem in C# - you can't have X depend on y and Y depend on x.20:50
* thumper nods20:51
thumpernatefinch: ya know what, this circular import was because I was doing something wrong20:59
thumperchanged it, and now it is better20:59
natefinchthumper: is it me, or is RemoveContainerDirectory actually supposed to be MoveContainerDirectory?21:01
thumperwell, it should probably be renamed :)21:02
thumperbut move is a little general21:02
thumperit does remove it from the main dir21:02
thumperbut keeps it around for "debugging"21:02
thumperwhich most people never need21:02
thumperI have a mental bug about cleaning it out when you start a new local provider21:03
natefinchthumper: do we care that there's a race condition in RemoveContainerDirectory if two things both find a unique directory name and then both try to rename their directory to it?21:06
thumpernatefinch: no, because there is only ever one goroutine doing this21:06
thumperthe provisioner21:07
thumpermay be worth noting in the function description though21:07
natefinchthumper: yeah.... it would be good to note it's not thread safe21:08
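
A hypothetical reconstruction of the pattern under review (the real function lives in the container package and differs in detail): the gap between picking an unused target name and the rename is the race, tolerated because only the provisioner's single goroutine ever calls it:

```go
package container

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeContainerDirectory moves a container's directory into a
// "removed" area instead of deleting it, keeping logs and user data
// around for debugging. NOT goroutine-safe: two concurrent callers
// could pick the same unused target name before either renames.
func removeContainerDirectory(containerDir, removedDir, name string) error {
	target := filepath.Join(removedDir, name)
	for i := 1; ; i++ {
		if _, err := os.Stat(target); os.IsNotExist(err) {
			break // found an unused name
		}
		target = filepath.Join(removedDir, fmt.Sprintf("%s.%d", name, i))
	}
	return os.Rename(filepath.Join(containerDir, name), target)
}
```
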
* thumper wonders why he didn't get an email about natefinch's reply21:30
natefinchthumper: I just hit "go" like 30 seconds ago21:30
thumperoh21:30
thumper:)21:30
natefinchthumper:  sorry for all the nitpicks21:31
thumperthat's fine21:31
* thumper goes over the nits21:31
thumpernatefinch: I have another, Rietveld: https://codereview.appspot.com/2555004321:31
thumpernatefinch: next branch in the pipeline21:31
thumperlots of initial boilerplate for the kvm containers21:32
thumpernatefinch: we can't export them just for tests because the tests are in other packages21:32
natefinchthumper: oh yeah.  Man I hate that.21:33
thumperthis does make me sad21:33
thumperbut go only has one protection mechanism21:33
thumperand is kinda anal about you circumventing it21:33
natefinchthumper: to be fair, it's because we're trying too hard to have other packages know about the internals of this one21:34
natefinchthumper: other packages shouldn't need to know about the removedDirectory, even for tests21:34
thumperwell...21:34
thumperwe want to be able to mock them out though21:34
thumperhaving a nice clean way to do that is difficult21:35
natefinchyeah21:35
thumperbecause whatever you choose21:35
thumperit is icky21:35
natefinchyep21:35
natefinchwonder if the way to do it is to just have Mock() and Unmock() methods in the package, and let it figure out how to do it.  Not great, though.  I wonder if anyone else has figured out cleaner ways to do it.21:37
thumpernatefinch: most other projects have one package :-|21:37
natefinchthumper: I'm not convinced of that.  Not real projects that actually do things.21:38
thumper:)21:38
thumperMoveToRemovedDirectory?21:39
thumperawkward21:39
natefinchIt's honestly a little weird that this detail is exposed.  The reason you're doing it is to avoid circular dependencies?21:39
thumperwhich details?21:40
thumperthe directory stuff?21:40
natefinchyeah21:40
thumperI've moved it up because it is common between lxc and kvm21:40
thumperwe need somewhere to write the user data and have container logs21:40
natefinchyeah.. yeah, that's true.21:41
thumperI may just go with container.RemoveDirectory21:42
natefinchthumper: yeah, I think that's fine21:43
natefinchthumper: I'm not going to be able to get that review done before I have to go relieve my wife of the screaming baby at 5.21:53
thumpernatefinch: that's fine21:53
thumperI'll get wallyworld_ to do it :)21:54
wallyworld_\/21:54
thumperwallyworld_: here is that IsKVMSupported method you are after :) https://code.launchpad.net/~thumper/juju-core/kvm-containers/+merge/19494621:58
wallyworld_great, will look soon21:59
thumperwallyworld_: although, I was thinking perhaps we should return (bool, error)21:59
thumperinstead of just bool21:59
wallyworld_yes, +1 to error21:59
thumperso if the cpu-checkers package isn't installed, or for some reason /bin/kvm-ok isn't there, we get an error21:59
wallyworld_yes, agreed. need that error22:03
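
A sketch of the (bool, error) shape just agreed on; kvm-ok is Ubuntu's cpu-checker helper, and the point is that a missing binary surfaces as an error rather than a silent false. The body here is illustrative, not the landed implementation:

```go
package kvm

import "os/exec"

// IsKVMSupported reports whether this host can run KVM guests by
// shelling out to kvm-ok (from the cpu-checker package).
func IsKVMSupported() (bool, error) {
	err := exec.Command("kvm-ok").Run()
	if err == nil {
		return true, nil
	}
	if _, ok := err.(*exec.ExitError); ok {
		return false, nil // kvm-ok ran and reported no KVM support
	}
	return false, err // e.g. cpu-checker not installed at all
}
```
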
=== gary_poster is now known as gary_poster|away
wallyworld_thumper: just started looking - the kvmContainer retains a reference to the factory, the lxc container doesn't - is there a fundamental difference i will encounter as i read more of the code?22:20
thumperI'm not sure...22:23
thumperwallyworld_: all you should really care about is the public interfaces22:26
thumperbecause implementation is likely to change22:27
wallyworld_as a user yes :-) i was trying to grok the implementation22:27
thumper:)22:27
thumperthe implementation may change as it is all not-there-yet :)22:28
* wallyworld_ nods22:28
wallyworld_i was comparing the lxc and kvm implementations / structs etc to see the differences22:28
* thumper nods22:29
wallyworld_since as lxc is already in the code base, if this mp is similar, makes reviewing easier22:30
wallyworld_thumper: why does kvm package redefine ContainerFactory interface?22:34
thumperContainerFactory is about the low level containers themselves22:34
thumperthe container.Manager interface is how juju interacts with containers22:34
thumperthe mock implementation is around the ContainerFactory and Container22:34
thumpernot the container.Manager22:34
wallyworld_ok22:34
wallyworld_although since the interface is identical for lxc and kvm, i would have thought the implementing structs would have been different, but interface abstracted out as common22:36
wallyworld_otherwise value of having an interface is diminished22:37
wallyworld_may as well just use structs22:37
thumperpart of the problem is the lxc versions are in the golxc package22:38
thumpernot juju22:38
thumperinterfaces mean I can mock it out22:38
thumperthat in itself is good22:38
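
An illustration of the point about interfaces and mocking -- one Manager interface that juju codes against, with lxc, kvm, and test doubles all able to satisfy it. The method set here is invented for the example; the real container.Manager differs:

```go
package container

// Manager is the abstraction juju uses to interact with containers,
// whatever the backing technology (lxc, kvm, or a mock in tests).
type Manager interface {
	CreateContainer(name string) error
	DestroyContainer(name string) error
}

// mockManager records calls instead of touching real containers --
// the testing payoff of coding against the interface rather than
// the concrete lxc/kvm structs.
type mockManager struct {
	created, destroyed []string
}

func (m *mockManager) CreateContainer(name string) error {
	m.created = append(m.created, name)
	return nil
}

func (m *mockManager) DestroyContainer(name string) error {
	m.destroyed = append(m.destroyed, name)
	return nil
}

var _ Manager = (*mockManager)(nil) // compile-time interface check
```
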
* thumper has to nip out22:39
thumperbbl22:39
