thumper | o/ wallyworld_ | 00:31 |
---|---|---|
wallyworld_ | hi | 00:31 |
thumper | wallyworld_: how's tricks? | 00:31 |
wallyworld_ | ok, just typing stuff into shietveld | 00:31 |
thumper | :) | 00:31 |
wallyworld_ | for william's branch | 00:31 |
thumper | wallyworld_: have you looked at fwereade's branch or shall I do it? | 00:31 |
wallyworld_ | just finishing up now | 00:31 |
wallyworld_ | i have a branch of my own that i have tweaked that you could look at | 00:32 |
thumper | sure | 00:32 |
wallyworld_ | cool :-) | 00:32 |
thumper | I have another for the kvm prelude too | 00:32 |
wallyworld_ | ok | 00:32 |
wallyworld_ | will look after i finish current one | 00:32 |
wallyworld_ | faaark. just lost all my comments in the review :-( paste buffer error :-( | 00:34 |
wallyworld_ | cause some kiwi distracted me | 00:34 |
* thumper looks around for that nasty kiwi | 00:36 | |
* thumper can't see anyone that matches that description | 00:36 | |
wallyworld_ | thumper needs a mirror :-P | 00:37 |
thumper | wallyworld_: is it the supported containers that needs review? | 00:38 |
wallyworld_ | yeah | 00:38 |
wallyworld_ | thumper: i knew i would get pushback on nil vs empty. sigh | 00:38 |
thumper | heh | 00:38 |
thumper | I think that [instance.NONE] is pretty clear for supported containers | 00:39 |
thumper | but I do agree | 00:39 |
thumper | in general | 00:39 |
thumper | that empty != nil | 00:39 |
wallyworld_ | well it's true in python, java, c++ etc etc | 00:39 |
wallyworld_ | everything except Go it seems | 00:39 |
thumper | well, they are different in go too | 00:40 |
thumper | but some people treat them the same | 00:40 |
wallyworld_ | well, sometimes | 00:40 |
thumper | nil != [] | 00:40 |
wallyworld_ | sure | 00:40 |
wallyworld_ | but you can append and iterate using a nil slice | 00:40 |
wallyworld_ | and it behaves just like empty | 00:40 |
thumper | yeah... | 00:40 |
wallyworld_ | and marshalling stuffs it up | 00:40 |
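
For reference, the behaviour under discussion in plain Go: `append` and `range` treat nil and empty slices identically, but marshalling does not.

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	var nilSlice []string    // nil
	emptySlice := []string{} // empty but non-nil

	// append and range treat the two identically:
	nilSlice = append(nilSlice, "x")
	for _, s := range nilSlice {
		fmt.Println(s) // "x"
	}
	fmt.Println(len(emptySlice)) // 0, same as a nil slice

	// ...but marshalling does not:
	a, _ := json.Marshal([]string(nil)) // null
	b, _ := json.Marshal([]string{})    // []
	fmt.Printf("%s vs %s\n", a, b)      // "null vs []"
}
```
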
wallyworld_ | i do wonder what crack the language designers were smoking | 00:41 |
wallyworld_ | not only in this instance | 00:41 |
thumper | someone making opinionated decisions that work *most* of the time | 00:42 |
thumper | I can see it from their point of view | 00:42 |
wallyworld_ | opinionated decisions is ok if they are sensible | 00:45 |
wallyworld_ | but throwing out decades of established semantic meaning is not a good decision | 00:45 |
wallyworld_ | thumper: so when you have finished +1'ing my branch :-) do you want a quick hangout? | 00:52 |
thumper | sure for the hangout, not convinced it'll get a +1 yet :P | 00:53 |
wallyworld_ | ok, hangout first perhaps | 00:53 |
wallyworld_ | https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig | 00:53 |
wallyworld_ | when you are ready | 00:53 |
thumper | wallyworld_: can you not hear me? | 01:03 |
wallyworld_ | no | 01:03 |
thumper | I can see bars when I talk | 01:03 |
wallyworld_ | i'll try a new hangout | 01:03 |
wallyworld_ | https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.rmi4baa0vuu8vp4qs7eb2pd41g | 01:04 |
wallyworld_ | thumper: ? | 01:05 |
* thumper goes on a quick dog walk and school run | 01:54 | |
wallyworld_ | thumper: ping for when you return | 02:25 |
thumper | return of the thumper | 02:40 |
jam | morning thumper and wallyworld_ | 02:48 |
wallyworld_ | yello | 02:48 |
wallyworld_ | thumper: i am sad that the provisioner code is full of switch statements - if environ do this else if lxc do that. what sort of non-oo is that? | 02:49 |
thumper | wallyworld_: it is not full | 02:50 |
thumper | there are a few switches in the factory methods | 02:50 |
wallyworld_ | and the loop | 02:51 |
wallyworld_ | but ok | 02:51 |
thumper | the vast majority of the provisioning logic is reused | 02:52 |
thumper | geez | 02:52 |
wallyworld_ | alright :-) | 02:54 |
jam | thumper: is it possible to pull the bits that aren't out into an interface rather than using a switch ? | 02:55 |
jam | (wallyworld isn't the first to see it and wonder) | 02:55 |
jam | which probably means he won't be the last | 02:56 |
wallyworld_ | jam: an interface is what i was hoping for | 02:56 |
thumper | probably | 02:59 |
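
A hedged sketch of what that could look like -- hypothetical names, not the actual provisioner API -- where each machine kind satisfies one interface so the provisioning loop never switches on environ-vs-lxc:

```go
// Hypothetical sketch: the provisioning loop depends only on Broker;
// the single remaining switch lives in the factory.
package provisioner

type Broker interface {
	StartInstance(machineId string) error
	StopInstances(machineIds []string) error
}

type environBroker struct{}

func (environBroker) StartInstance(id string) error    { return nil } // provider API call goes here
func (environBroker) StopInstances(ids []string) error { return nil }

type lxcBroker struct{}

func (lxcBroker) StartInstance(id string) error    { return nil } // lxc-specific logic goes here
func (lxcBroker) StopInstances(ids []string) error { return nil }

func newBroker(containerType string) Broker {
	switch containerType {
	case "lxc":
		return lxcBroker{}
	default:
		return environBroker{}
	}
}
```
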
jam | thumper: fwiw, your container-interface bounced which caused your followup to also get blocked. Not sure if you saw it. The failure doesn't look like a flakey test immediately to me. | 03:00 |
thumper | ok | 03:09 |
thumper | jam: I'll take a look shortly, so much on... | 03:10 |
thumper | jam: that is interesting, because the tests all passed locally | 03:13 |
thumper | jam: I'm going to mark approved to see if it is intermittent like I think it is | 03:13 |
thumper | jam, wallyworld_: yes it may well make sense to split that shit out into an interface | 03:14 |
thumper | I probably should have thought of it at the time, but obviously didn't :) | 03:14 |
thumper | I will leave it as an exercise for the reader (wallyworld_), as a prereq | 03:14 |
wallyworld_ | \o/ | 03:22 |
wallyworld_ | not | 03:22 |
thumper | wallyworld_: aw, c'mon, you love this shit | 03:25 |
wallyworld_ | sometimes | 03:25 |
thumper | jam: yes, it was an intermittent test failure :( | 03:29 |
hazmat | is this normal.. not found error on https://streams.canonical.com/juju/tools/streams/v1/index.json w/ trunk | 03:43 |
hazmat | looks like some of the cli flags have been changing around.. | 03:49 |
hazmat | breaking existing scripts | 03:49 |
bigjools | thumper: mind your language, it's a family channel | 03:51 |
thumper | :P | 03:51 |
hazmat | hmm.. does trunk work? | 03:58 |
hazmat | trying to bootstrap right after destroy-env.. http://pastebin.ubuntu.com/6403355/ | 03:58 |
hazmat | if i leave off the upload-tools it hangs indefinitely.. | 03:59 |
hazmat | hmm... but if i do a bootstrap without upload-tools after upload-tools it succeeds. | 04:01 |
hazmat | sigh.. new control bucket and everything works fine. | 04:09 |
thumper | wallyworld_: plz help hazmat :) | 04:13 |
wallyworld_ | hmm? | 04:13 |
wallyworld_ | new control bucket and it works sounds like an s3 issue | 04:15 |
wallyworld_ | error says tools tarball could not be uploaded to s3 bucket | 04:15 |
wallyworld_ | i'm not sure what juju related issue that might be | 04:15 |
wallyworld_ | i wonder how many times such an error has come up | 04:16 |
hazmat | wallyworld_, here's another one.. right before that.. where bootstrap hangs forever.. http://pastebin.ubuntu.com/6403423/ | 04:17 |
hazmat | well 10m, before i killed it. | 04:17 |
hazmat | changing to a new bucket and things seem to work again. | 04:17 |
hazmat | wallyworld_, the odd part is i was able to upload tools in a subsequent run to that same bucket before changing buckets. | 04:17 |
wallyworld_ | hazmat: the logs just make it seem like juju asks ec2 to do something (eg write data to a bucket, send data to an instance) and ec2 says no | 04:19 |
wallyworld_ | are we sure it's not just transient ec2 weirdness? | 04:19 |
hazmat | wallyworld_, dunno, i couldn't reach the aws s3 web console.. to poke around more.. although i can hit it with the cli tools | 04:20 |
wallyworld_ | the stuff that's logged looks like one would expect on the surface eg looking for tools, not finding them, looking elsewhere etc | 04:21 |
hazmat | wallyworld_, so we end up hitting about 12 urls on bootstrap (4 urls x 3 times) is that normal? | 04:21 |
wallyworld_ | we look for tools in control bucket, streams.canonical.com, and for each of those, we look for signed and unsigned metadata | 04:22 |
wallyworld_ | and we try once initially, then upload if needed, then look again | 04:22 |
wallyworld_ | so there are indeed several urls we hit | 04:22 |
wallyworld_ | and we also look for tools mirrors | 04:23 |
wallyworld_ | signed and unsigned | 04:23 |
wallyworld_ | the debug logs can therefore be quite noisy | 04:23 |
hazmat | ic | 04:24 |
wallyworld_ | it can be distracting | 04:26 |
wallyworld_ | but the debug is needed if things go wrong | 04:26 |
wallyworld_ | eg i put tools or image metadata somewhere and it is not used, does juju even look where i think it is looking | 04:27 |
=== axw_ is now known as axw | |
axw | wallyworld_: any tips on getting insight into what mgo/txn is doing? I've added a new Assert-only (no Insert/Update) op, doesn't seem to be doing anything | 06:24 |
axw | or jam, or thumper... | 06:25 |
wallyworld_ | axw: i've only ever used println etc, i'm not sure what tools there are | 06:25 |
jam | axw: I think the only master of txn is Wallyworld, but I have some understanding of how it works. | 06:25 |
wallyworld_ | so i print what is there in the db before the call, and again after | 06:26 |
axw | mmk. I kinda need to know what's going on inside mongo I think :/ | 06:26 |
wallyworld_ | what are you trying to do? | 06:26 |
axw | stop service/unit/machine adds if environment's (new) life field is not Alive | 06:26 |
wallyworld_ | there are a few notAlive asserts you can copy from i think | 06:27 |
axw | yeah, I've been referring to them | 06:27 |
axw | it's a bit more complex, because this one has to cater for non-existent life field too | 06:27 |
wallyworld_ | you could try making the assert deliberately fail | 06:28 |
axw | also, it's on a foreign doc | 06:28 |
wallyworld_ | ah | 06:28 |
axw | ah yeah, good point | 06:28 |
wallyworld_ | bear in mind, all the asserts are evaluated up front | 06:28 |
wallyworld_ | so if you have a slice of txns | 06:28 |
wallyworld_ | an operation early on will not cause an assert later on to fail | 06:28 |
axw | that's fine, that's what I want | 06:29 |
wallyworld_ | it sucks balls actually | 06:29 |
wallyworld_ | for me anyway :-) | 06:29 |
axw | well, it suffices anyway :) | 06:29 |
wallyworld_ | anyway, it's a bit of trial and error i've found | 06:29 |
wallyworld_ | start simple, with it failing initially, and build up from there | 06:30 |
axw | no worries, just thought I'd ask before I head into the hole | 06:30 |
axw | sounds good, I'll try that | 06:30 |
axw | ta | 06:30 |
wallyworld_ | good luck :-) ask if you get stuck | 06:35 |
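
For reference, the shape of the assert-only op being discussed, sketched against the mgo/txn API with illustrative collection and field names, and assuming (per axw below) that Alive is the zero value of the life field:

```go
package state

import (
	"labix.org/v2/mgo/bson"
	"labix.org/v2/mgo/txn"
)

type Life int8

const Alive Life = 0 // the zero value, per the discussion

// addUnitOps guards a unit insert on the environment still being alive.
// The first op has no Insert/Update/Remove: it only asserts, and the
// whole transaction aborts if the assertion fails. Note that all
// asserts are evaluated up front, before any op is applied.
func addUnitOps(envID, unitName string, unitDoc interface{}) []txn.Op {
	return []txn.Op{{
		C:      "environments",
		Id:     envID,
		Assert: bson.D{{"life", Alive}},
	}, {
		C:      "units",
		Id:     unitName,
		Insert: unitDoc,
	}}
}
```
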
wallyworld_ | fwereade: ping? | 07:04 |
fwereade | wallyworld_, pong | 07:05 |
axw | wallyworld_: I was comparing integers with strings.. duh. Turns out I can just use the existing assertion anyway, since Alive is the zero value | 07:05 |
wallyworld_ | \o/ | 07:05 |
fwereade | wallyworld_, thanks for the review | 07:05 |
wallyworld_ | fwereade: i hope it made sense | 07:05 |
wallyworld_ | fwereade: i have started some work to delay the start up of the container provisioners until they are needed | 07:06 |
wallyworld_ | it's really my first foray into that tar pit | 07:06 |
fwereade | wallyworld_, largely, yeah, but jam's comments largely anticipate my own responses | 07:06 |
wallyworld_ | i would not be surprised if what i have splatted out is shitful | 07:07 |
wallyworld_ | i didn't realise about the life issue | 07:07 |
wallyworld_ | seems kinda confusing | 07:07 |
fwereade | wallyworld_, in particular, we can't necessarily make machines dying | 07:07 |
jam | fwereade: you seem back to work awfully soon after you signed off to go to sleep :) | 07:07 |
wallyworld_ | it sort of goes against expectations that a machine is destroyed but still "alive" | 07:08 |
fwereade | jam, ehh, I'm awake now, might slope off for another part of the day :) | 07:10 |
fwereade | wallyworld_, agreed -- sorry about the currently-crappy model | 07:11 |
wallyworld_ | fwereade: so i was hoping you could look at my shitful attempt to produce something and tell me it's all crackful and i need to start again | 07:11 |
wallyworld_ | it's just a bit of a brain dump so far | 07:11 |
fwereade | wallyworld_, it would be my pleasure ;) | 07:11 |
wallyworld_ | but i want feedback before i go any further | 07:12 |
wallyworld_ | https://code.launchpad.net/~wallyworld/juju-core/lazy-container-provisioners/+merge/194795 | 07:12 |
wallyworld_ | i'm not so sure about the watcher | 07:12 |
wallyworld_ | i know it's a bit of duplication | 07:12 |
wallyworld_ | there's similar code in other packages | 07:12 |
wallyworld_ | but i sorta want to see if it is the right thing to do first and then it can be consolidated | 07:13 |
fwereade | wallyworld_, would you lbox propose -wip so I can line-by-line it please? | 07:13 |
wallyworld_ | sure | 07:13 |
wallyworld_ | so, the idea is - when machine agent starts, do not start provisioner for containers | 07:14 |
wallyworld_ | use a watcher to see when a container is first requested | 07:14 |
wallyworld_ | and only then start the provisioner | 07:14 |
wallyworld_ | also at start up, update the machine to record the supported containers | 07:15 |
wallyworld_ | so if a user add-machine an unsupported one, it will fail | 07:15 |
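
Roughly the shape being described, with hypothetical names -- the real watcher and provisioner types in the branch may differ:

```go
// Lazy start: the container provisioner is created only once the first
// container machine is actually requested, not at agent startup.
package provisioner

import "fmt"

type ContainerWatcher interface {
	// Changes reports the ids of newly requested container machines.
	Changes() <-chan []string
}

func startWhenFirstContainerSeen(w ContainerWatcher, startProvisioner func() error) error {
	for {
		ids, ok := <-w.Changes()
		if !ok {
			return fmt.Errorf("container watcher closed")
		}
		if len(ids) > 0 {
			// First container requested: start the provisioner now.
			return startProvisioner()
		}
	}
}
```
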
wallyworld_ | fwereade: https://codereview.appspot.com/25040043 | 07:15 |
wallyworld_ | i have to go to soccer for a bit but will be back in a couple of hours | 07:15 |
axw | fwereade: I have a WIP incoming for destroy-environment changes that I'd love some feedback on too | 07:16 |
wallyworld_ | fwereade: bear in mind i've not really done much with watchers before so i'm not sure if what i've done is even correct or as per the design of watchers. i've also not run the code or anything yet either | 07:17 |
wallyworld_ | take a number :-P | 07:17 |
axw | :) | 07:17 |
wallyworld_ | i'll be away from keyboard for a bit anyway :-) | 07:17 |
rogpeppe1 | mornin' all | 07:27 |
axw | morning rogpeppe1 | 07:27 |
rogpeppe1 | axw: hi | 07:28 |
axw | fwereade: https://codereview.appspot.com/22870045/ -- the title is probably misleading, but when you have some time | 07:40 |
axw | just looking for a general in-the-right-direction or not | 07:41 |
jam | axw: so machine agents can't actually stop their own machines, right? That has to be done by the Provisioner (or the juju client) because nobody else has the provider secrets. | 07:49 |
jam | so I'm wondering if the watch needs to be a step higher as well. (Provisioner watches for env destruction, waits a bit (but probably not forever) for things to go to dying/dead and then kills the container/machine) | 07:50 |
axw | jam: I don't follow | 07:51 |
axw | there's a global flag, saying "environment is dead" | 07:51 |
axw | the machiner watches for that, and then terminates when it is set | 07:51 |
jam | axw: an important step in destroy-environment is to actually terminate the machines | 07:51 |
jam | the machiner can't terminate itself | 07:52 |
axw | jam: right, that bit is still on the CLI | 07:52 |
axw | well | 07:52 |
jam | (can't call EC2.TerminateMachine(my-instance-id)) | 07:52 |
axw | still done by the provisioner & CLI | 07:52 |
jam | axw: so if the CLI is terminating the instances, they won't have a chance to cleanup the shutdown, right? | 07:52 |
jam | and the mongodb main server is being killed, so do the watches fire? | 07:52 |
jam | I guess the question is, what waits for what and how long do we wait? | 07:52 |
jam | I would *tend* to say, if a Provisioner can Terminate a machine, there isn't much point waiting for the Machiner to tear itself down first. | 07:53 |
jam | we really like this for the manually registered machines | 07:53 |
jam | but for the rest, I go back and forth. | 07:54 |
axw | jam: still not following :) where do we wait for the machiner to tear itself down? | 07:54 |
axw | jam: mostly everything works as it does today, with the exception of machine agents being told to shut down just before we return to the CLI | 07:56 |
axw | for the manual provider, this is essential to having the state servers tear themselves down | 07:56 |
axw | for the others, it's superfluous, as the CLI will kill them anyway | 07:56 |
axw | (but left in for the sake of generality, and it doesn't really hurt) | 07:57 |
jam | axw: so if we don't wait until we know that the machiner's actually got the message, we end up killing mongo, and they'll never see the message. | 08:01 |
jam | How do we know when they've gotten the message? | 08:01 |
jam | and why bother waiting for the ones that we know we're just going to Terminate anyway | 08:01 |
jam | so *my* model would be something like "for all machines listed as manually registered, wait until the agent shows as dying, and then return to the CLI to nuke everything else" | 08:02 |
axw | jam: ehh good point, I mucked that bit up. I *had* been destroying all non-state machines previously | 08:02 |
jam | *except* the caveat from fwereade | 08:02 |
jam | which is that you can't actually set a Machine to Dying until Units on that machine are Dead | 08:03 |
jam | which has been proposed as something we should really fix in the model. | 08:03 |
jam | axw: so when would the state machines get terminated? | 08:03 |
jam | by themselves? | 08:03 |
axw | jam: yes. DestroyJuju should kill non-state machines after destroying units, and wait for them. Then set environment to Dead, and the state servers see that and kill themselves | 08:04 |
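
A hedged sketch of the agent-side half of that sequence; the names here are hypothetical, since the environment life/dead flag is exactly what this branch adds:

```go
// The machiner watches the environment document and shuts the agent
// down once the environment is no longer alive.
package machiner

import "errors"

var ErrTerminateAgent = errors.New("agent should be terminated")

type EnvironWatcher interface {
	Changes() <-chan struct{}
	Stop() error
}

type Environment interface {
	Watch() EnvironWatcher
	Dead() (bool, error)
}

func waitForEnvironDeath(env Environment) error {
	w := env.Watch()
	defer w.Stop()
	for {
		if _, ok := <-w.Changes(); !ok {
			return errors.New("environment watcher closed")
		}
		dead, err := env.Dead()
		if err != nil {
			return err
		}
		if dead {
			// The global "environment is dead" flag is set: time to go.
			return ErrTerminateAgent
		}
	}
}
```
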
jam | also, I'm not a big fan of waiting for things indefinitely, so some sort of "wait for a while for them to notice, but then go ahead and kill everything else anyway" is a fair point for me. | 08:04 |
axw | I had that in the code before, not sure why I took it out... | 08:05 |
jam | axw: so in the HA world, it is possible that only the machine agent that is running the active Provisioner will be able to Terminate machines. | 08:05 |
jam | axw: so I like your "DestroyJuju should ..." except for nodes that are Provisioned properly, I'm not sure why we should wait for them to tear down rather than just destroying them. If we have a use case, do it, but otherwise we just slow down destroy-environment. | 08:06 |
axw | jam: we can make it only for manually provisioned nodes, I was just trying to avoid special casing | 08:07 |
axw | but that's the only time it's needed | 08:07 |
jam | axw: I'd be okay seeing it in practice, but it *is* nice that today "juju destroy-environment" with 20+ machines can still return in a few seconds. | 08:07 |
axw | fair enough | 08:08 |
jam | axw: now, I could hypothesize a use case | 08:08 |
jam | like | 08:08 |
jam | when we have Storage | 08:08 |
jam | you may want to cleanly unmount the storage before terminating | 08:08 |
jam | so that your PG data is safely flushed to disk for the next round of the environment | 08:08 |
jam | or something like that | 08:08 |
jam | but it is a bit hypothetical | 08:08 |
axw | hmm yeah. I think, ideally, environ.Destroy should handle everything still (for cases where the state server is stuffed), but that sort of thing may be outside its reach | 08:10 |
jam | axw: I think whatever we do to try and make things a "nice, peaceful death" we should still keep an axe around in case it all goes badly | 08:10 |
axw | yup | 08:10 |
jam | that may become "destroy-environment --force" | 08:10 |
jam | but I'd like to make sure that normal shutdown is "fast enough" that people don't feel they should habitually supply the --force parameter. | 08:11 |
axw | yup, fair enough | 08:11 |
jam | axw: but I'm mostly just talking through ideas with you. I haven't looked at the specific code | 08:12 |
axw | easy enough | 08:12 |
jam | I'm not 100% sure what the use case / model is when you have the ability to pass out credentials to an environment that don't have the EC2 creds directly available. | 08:12 |
jam | I'm guessing that the security model says that those people aren't allowed to trigger DestroyEnvironment | 08:12 |
jam | so the fact that the client can't | 08:13 |
jam | isn't a problem | 08:13 |
jam | although... | 08:13 |
jam | hosted Juju (aka JaaS) | 08:13 |
jam | is a bit more unclear on this point. The state machines would run on hardware you *don't* control, but you should have the ability to stop the environment, start a new one, etc. I don't know what the design for that is, TBH. | 08:14 |
jam | Does the juju CLI still keep env provider creds around? | 08:14 |
jam | these are still hypotheticals, and nothing we have to get right today, but it can be useful to look at some future use cases and see if we align well enough with them. | 08:15 |
axw | jam: yeah, I'm not sure about how the JaaS story is meant to work exactly. fwereade mentioned that we'd still want to have destroy-environment behind the API (entirely) for that, probably | 08:16 |
jam | axw: I do think that one of the designs of that is the Provider creds are always secret from the user (known only inside the JaaS servers) | 08:17 |
jam | in which case, we do need all the Terminate calls to happen from the Provisioner | 08:17 |
axw | jam: actually, that should work with what I've coded (when I add destruction of non-state machines back into the code) | 08:19 |
jam | axw: I don't think it hurts to do 2 passes (one more time in the client if we have the appropriate secrets to do so), as Terminate tends to be idempotent :) | 08:20 |
axw | yup | 08:20 |
fwereade | axw, so the issue I have on first glance at the description is that waiting for all units to clean themselves up makes env destruction take orders of magnitude longer | 08:21 |
jam | fwereade: welcome back | 08:21 |
jam | you missed a small amount of discussion :) | 08:21 |
fwereade | jam, ah sorry | 08:21 |
fwereade | jam, easily precisable? | 08:21 |
axw | fwereade: heya. we can make that manual-provider specific | 08:21 |
fwereade | (I did eat some bacon though, that was nice) | 08:22 |
jam | fwereade: http://irclogs.ubuntu.com/2013/11/12/ though it appears to not be quite up to date with the last 20 minutes. | 08:22 |
jam | I would guess it polls every X minutes or something. | 08:22 |
jam | fwereade: I had the same concern | 08:23 |
jam | fwereade: and could come up with a hypothetical "clean shutdown benefits stuff like storage, or database saving their state" but it was pretty hypothetical | 08:23 |
jam | fwereade: it also brings up the idea that "we don't have to wait for Machine to be dead, we just care that Manual machines have seen the 'please shut down' message" | 08:24 |
jam | fwereade: so if we had a Machine.Dying that actually worked | 08:24 |
jam | then we could just wait for that | 08:24 |
jam | fwereade: and we have "we probably want the State machines to issue Terminate requests before returning to the CLI because we'd like to have that in a JaaS-ish world." | 08:24 |
jam | and we can deal with the ACL of who has rights to issue destroy-environment as a purely ACL issues | 08:25 |
fwereade | jam, axw: I'm sorry, I'm having some trouble following | 08:26 |
fwereade | jam, axw may have already responded to this but I was wondering yesterday whether it made sense to refuse destroy-environment requests when there exist manually provisioned machines with units on them | 08:27 |
jam | fwereade: so the question, "Do we wait for all things to cleanup nicely before we Terminate them" | 08:27 |
fwereade | jam, I don't think we should in general | 08:27 |
jam | fwereade: for Manually registered machines, we can't Terminate them, so we sort of have to | 08:27 |
jam | for the rest | 08:27 |
jam | I can hypothesize a "backup your stuff to storage we're shutting down" sort of hook | 08:27 |
fwereade | jam, cattle not pets -- but manually provisioned machines *are* more like pets so we shouldn't abuse and abandon them | 08:28 |
jam | but we don't have that today | 08:28 |
jam | fwereade: right. my hypothesis is around a DB charm and Storage implementation that we want clean shutdown so we can bring it up in the next life | 08:28 |
axw | brb, making a cup of tea | 08:28 |
jam | though you could just say "that must be handled before destroy-env" | 08:28 |
fwereade | jam, I'm inclined that way for now at least | 08:28 |
jam | fwereade: right. That was my position as well. | 08:29 |
jam | fwereade: so I think my statement was "change the environment to dying, wait for manually provisioned machine agents to show up as Dying, and then Terminate the rest of the world" | 08:31 |
jam | fwereade: with the caveat that we don't have a good Machine.Dying, so we actually have to kill the units and wait for the Machiner to show up as Dead | 08:31 |
fwereade | jam, I'm not sure what that complexity buys us over just refusing to destroy when we have manually provisioned machines | 08:32 |
fwereade | jam, in particular I'm not quite sure what shutdown sequence you imagine for those machines | 08:32 |
axw | fwereade: so, how do we kill the manually provisioned bootstrap nodes? | 08:33 |
jam | fwereade: my understanding is that what we care about is to uninstall jujud from upstart, etc, on those machines. | 08:33 |
jam | which is something we want anyway, even if it isn't part of destroy-environment. | 08:33 |
jam | because otherwise once you've registered machines with An environment, reusing them is going to be a real pain | 08:33 |
jam | fwereade: certainly I don't think your suggestion is that once an env has ever had any machines manually registered, you can never destroy that env again ? | 08:34 |
fwereade | axw, my concern is with the units, not the machines | 08:36 |
fwereade | axw, we can't just stop units, can we? | 08:36 |
fwereade | er that was for jam | 08:36 |
fwereade | axw, re bootstrap machine, good question | 08:36 |
jam | fwereade: well, why can't we just stop units? | 08:37 |
fwereade | jam, my suggestion re manual machines is that they need to have their units removed explicitly | 08:37 |
fwereade | jam, because we'd thus leave the services running | 08:37 |
jam | fwereade: well, we are in a "terminate the world" condition, there won't be any services in just a few moments. | 08:37 |
fwereade | jam, but there will | 08:37 |
axw | fwereade: setting them to Dying causes the uniter to remove them eventually...? | 08:38 |
fwereade | jam, those machines will keep on running their services forever | 08:38 |
jam | fwereade: but whatever mechanism exists to actually stop the Postgres instance on that machine is something that we can trigger an DestroyEnv time, no? | 08:38 |
jam | fwereade: the code that axw put up does have a "destroyUnits" function. | 08:38 |
fwereade | jam, axw: I'm still going by the description -- but that makes sense *only* for manual machines, right? | 08:39 |
axw | fwereade: so it goes like this: 1. prevent new units/machines from being added; 2. destroy all units (and non-state machines). 3. tell state machines to destroy themselves | 08:39 |
axw | fwereade: yes, we can make it manual-only | 08:39 |
jam | fwereade: "destroys (and waits on death of) all units" is in the description | 08:39 |
axw | it's not currently, but it can be | 08:39 |
jam | fwereade, axw: I'm not so sure that we can get an accurate (1) today | 08:40 |
jam | at least from what fwereade has said about "destroy-machine --force" | 08:40 |
* jam goes to grab some coffee, biab | 08:40 | |
axw | jam: I've coded it into add-unit/add-machine transactions, I don't see any problems, but then that was the first time I touched mongo transactions :) | 08:41 |
fwereade | jam, axw: well, we could (i) destroy all services and (ii) gate add-machine and add-service on destroy-flag-not-set | 08:41 |
fwereade | jam, axw: is there any other operation that can demand resources? | 08:41 |
fwereade | jam, axw: add-unit, add-relation are already gated on the relevant services being alive | 08:41 |
axw | fwereade: yep, that's what I have done. see state/environ.go, state/service.go, state/state.go | 08:41 |
axw | fwereade: I added it into add-machine and add-unit, since they're the ones that we're waiting on the death of, and they're the ones that create agents | 08:42 |
axw | I added it into add-service too, but I think that assertion can come out | 08:43 |
fwereade | axw, I think it's a bad idea on add-unit | 08:43 |
fwereade | axw, it's bad enough that all unit additions to a given service have to be serialized | 08:43 |
fwereade | axw, bringing the environment doc into play makes *all* unit additions serialized | 08:43 |
axw | hmm true :( | 08:44 |
fwereade | axw, hence the destroy-all-services and prevent-add-service suggestion | 08:44 |
axw | fair enough, I didn't think about that bit | 08:44 |
fwereade | axw, core idea is solid though, I agree | 08:44 |
fwereade | axw, my main concern is not slowing down destruction of non-manual environments | 08:44 |
axw | fwereade: I will update the code to only do this for manual machines now | 08:45 |
axw | fwereade: well, I can do it quick and dirty - if instanceId startswith "manual:" is basically all we have to go by at the moment | 08:46 |
axw | is that acceptable? | 08:46 |
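
i.e. something like the following, where the "manual:" instance-id prefix is the stated (and admittedly dirty) assumption:

```go
package provider

import "strings"

// isManuallyProvisioned is the quick-and-dirty check mentioned above:
// manually provisioned machines are currently only identifiable by
// their instance id prefix.
func isManuallyProvisioned(instanceId string) bool {
	return strings.HasPrefix(instanceId, "manual:")
}
```
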
rogpeppe1 | axw, fwereade: i'm somewhat uncomfortable with gating the destruction of an environment on the orderly teardown of every unit in the environment | 08:47 |
fwereade | rogpeppe1, agreed | 08:47 |
rogpeppe1 | axw, fwereade: (manually provisioned units excepted, probably) | 08:47 |
axw | rogpeppe1: likewise, I'll only gate it on units allocated to manually provisioned machines | 08:47 |
rogpeppe1 | ok, cool. | 08:47 |
rogpeppe1 | the other thought i had was to split up the process into two parts | 08:48 |
rogpeppe1 | so that you don't need a timeout parameter | 08:48 |
rogpeppe1 | (which is always going to be wrong) | 08:48 |
axw | fwereade: I'm going to leave in the bit that calls Destroy on each unit/machine, though, to cater for JaaS | 08:48 |
axw | it just won't wait on them | 08:48 |
fwereade | axw, I think so, in itself -- but I feel like we're putting the cart before the horse here | 08:48 |
fwereade | axw, why would we do that? | 08:49 |
fwereade | axw, in a jaas environment we just nuke all the instances, bam, done, just like today (excepting manual) | 08:49 |
rogpeppe1 | so you could have a "Clean" or "StartDestroy" or something which initiated the process of terminating the manually provisioned machines | 08:49 |
axw | fwereade: I was thinking we just set them to Dying, and have the provisioner do its job in the background | 08:49 |
fwereade | axw, that will take forever | 08:49 |
fwereade | axw, and block on hook errors | 08:50 |
rogpeppe1 | and then a final "Destroy" which would fail to work if there were any non-dead manually provisioned machines | 08:50 |
axw | ok | 08:50 |
fwereade | axw, I'm not even against what you have in principle | 08:50 |
fwereade | axw, I think it's exactly right for taking down manual machines | 08:50 |
fwereade | axw, but I think it's a serious degradation in the non-manual case | 08:50 |
rogpeppe1 | that way the GUI can watch the status as usual while the manually provisioned machines are destroyed | 08:50 |
axw | yup, I can see that | 08:51 |
fwereade | axw, and I think we take a good step forward by simply moving a nuke-everything command inside the api server | 08:51 |
axw | rogpeppe1: sounds fair enough. the CLI has to poll for status tho? | 08:51 |
rogpeppe1 | we could potentially do that in the command line too | 08:51 |
rogpeppe1 | axw: it could use an AllWatcher just like the GUI | 08:51 |
rogpeppe1 | axw: or... hmm, yes, it would need some kind of watcher otherwise | 08:52 |
axw | fwereade: "the nuke-everything inside the api server" bit can come later, right? :) | 08:52 |
fwereade | axw, if the API call were *just* to do a StopInstances([all-non-manager-machines]) (and abort if manual machines exist) that would move a good chunk of the functionality to where we want it | 08:52 |
axw | ok | 08:52 |
axw | I can do that | 08:53 |
fwereade | axw, I do like the destroying flag too though | 08:53 |
axw | fwereade: I'm thinking about what you said about disallowing destroy-env if units eexist, and now I'm starting to change my mind | 08:54 |
axw | because a unit may fail to die | 08:54 |
fwereade | axw, yeah, I'm just thinking we can make progress without solving that -- and it will be easier to solve as we restrict the scope of what we're doing | 08:54 |
jam | axw: we could error both ways here, but should a bad hook on one unit keep all the other machines running? | 08:55 |
jam | fwereade: do you need anything more from us on: https://code.launchpad.net/~fwereade/juju-core/api-force-destroy-machines/+merge/194764 | 08:55 |
fwereade | axw, jam: ofc there is *also* independent demand for an ignore-hook-errors mode for unit shutdown | 08:55 |
axw | fwereade: true, the problem would go away with that | 08:56 |
axw | jam: ideally not, but we'd need some kind of feedback to let people know at least | 08:56 |
axw | alright, I'll just put a TODO in to fix it with ignore-hooks | 08:56 |
fwereade | jam, I'm not sure -- the machine-dying problem is out of scope, and the other bits wallyworld_ flags should be trivial fixes | 08:57 |
fwereade | jam, my current plan is just to tidy up the clear code problems and resubmit | 08:58 |
jam | fwereade: I just want to make sure you're unblocked, and still backporting the other fix to 1.16 | 08:58 |
fwereade | jam, I was going to backport both of those after I'd got this landed | 08:59 |
fwereade | jam, I'll ping you or wallyworld_ for a review once I've had a moment to work on it a bit :) | 09:00 |
jam | fwereade: also, you wanted to talk about Pinger stuff today instead of yesterday. | 09:00 |
jam | I don't want to INTERRUPT flood you, though. | 09:00 |
fwereade | jam, ah, yes, I did read back -- and I think I'm +1 on the idea that each SetAgentAlive creates an independent slot, we record all those slots somewhere in state, and we infer that any of those slots being alive indicates that the agent is | 09:02 |
fwereade | jam, minimally invasive and rewritey, I think | 09:02 |
fwereade | jam, the idea of defending against double-pings also has some merit but feels orthogonal | 09:03 |
fwereade | jam, best to interrupt-flood me in the morning, sometime this PM I plan to go dark and start simplifying some of the state test setup stuff | 09:03 |
fwereade | jam, because OMG it hurts to write that sort of code | 09:04 |
jam | fwereade: you're welcome to go dark whenever you need to. | 09:04 |
jam | fwereade: so when does SetAgentAlive actually trigger? It is the new heartbeat? or just on connect or ? | 09:04 |
fwereade | jam, I was thinking on connect still? | 09:05 |
jam | fwereade: I've personally moved on so that my favorite design actually disconnects the agents from the presence table, and we let the API server batch up "here are all the bits for agents I know about, set them all" | 09:05 |
fwereade | jam, that's fine in principle -- not sure it entirely justifies the effort but I'm willing to let someone closer to the code make that judgment | 09:06 |
fwereade | jam, the api server is certainly a good place to do all that, and if it's easy it definitely sgtm | 09:08 |
jam | fwereade: so SetAgentAlive would kick off a new slot for that agent in whatever mechanism we are using (independent Pingers or one global Pinger) | 09:09 |
jam | fwereade: I can say that a hypothetical 30s ping *10000 agents connected to an api server means you write an "$inc" to mongo every 3ms | 09:10 |
jam | I guess because of 64-bit limitations on the int field, you only ever decrease the number of $inc requests by a factor of 64 | 09:10 |
jam | so a constant multiplier which is good, but doesn't actually change your O() | 09:11 |
fwereade | jam, yeah | 09:12 |
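
The write being costed there, sketched with mgo and illustrative collection/field names: each agent owns one bit of a 64-bit int field, and a ping is a single $inc that adds `1<<bit`. That only amounts to "set the bit" while the bit is clear, which is the double-ping corruption that comes up just below.

```go
package presence

import (
	"labix.org/v2/mgo"
	"labix.org/v2/mgo/bson"
)

// ping records this agent as alive in the document for the current
// time slot by adding its bit to the agent's 64-bit field.
func ping(pings *mgo.Collection, timeSlot int64, fieldKey string, bit uint) error {
	_, err := pings.UpsertId(timeSlot, bson.M{
		"$inc": bson.M{"alive." + fieldKey: int64(1) << bit},
	})
	return err
}
```
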
jam | you do batch them up nicer | 09:13 |
jam | but I was hoping for more of a log() sort of improvement. | 09:13 |
fwereade | jam, bright ideas in that direction will be accepted most gratefully ;) | 09:13 |
jam | fwereade: well, in mongo 2.6 (whenever that actually releases) we'll have binary operators like $bin setbitX | 09:14 |
jam | that doesn't change the load factor. | 09:14 |
jam | but it does avoid double ping corruption | 09:14 |
jam | It does pose the question "is it cheaper to have one request with 50 $inc, or 50 $inc requests" | 09:15 |
fwereade | jam, one would hope the former, wouldn't one | 09:16 |
jam | fwereade: *I* would hope so, yes. Though if you had distributed master I could see a case where it might parallelize the latter better. | 09:16 |
jam | fwereade: in my scale testing, I never did get mongo above 100% CPU | 09:17 |
jam | so we still have to watch out for it becoming the bottleneck | 09:17 |
fwereade | jam, yeah, it's nice that it isn't today though | 09:17 |
jam | fwereade: well, it might have slightly been the bottleneck in the add-unit one-by-one case. | 09:18 |
jam | It is hard to tell when you have 600% CPU in Juju, and 90+% in mongo | 09:18 |
jam | and "add-relation nrpe-external-master" just goes into death throws, so I don't know who to blame there. :) | 09:19 |
fwereade | jam, for that I think we should be moving the add-subordinate logic off the unit agent and into the api server, I suspect | 09:22 |
fwereade | jam, distributing the logic doesn't actually help anyone in any way afaict | 09:22 |
fwereade | jam, but that might itself be fiddly | 09:22 |
fwereade | jam, and there are some benefits to how we have it today | 09:23 |
jam | fwereade: well when I tested it on m1.xlarge (15GB) with just 1000 machines, we start at about 1GB of Res, and as soon as nrpe-external-master is related, we go OOM | 09:23 |
jam | so *something* on the API server is consuming *lots* of memory for running this | 09:24 |
jam | and we only end up with something like 3-4 actual "nrpe-external-master" agents in status | 09:24 |
jam | that actually even get into "pending" | 09:24 |
jam | and I think when jujud restarts after OOM, we end up at 0% CPU on all processes, and nothing making progress. | 09:24 |
fwereade | jam, I suspect the problem is in the complexity of EnterScope having to add units as well | 09:25 |
fwereade | jam, if we created appropriate subordinates at AddUnit time, and queued up a bunch of them at AddRelation time, we might have a happier time | 09:26 |
fwereade | jam, or *maybe* we can just write EnterScope a bit more smartly | 09:27 |
jamespage | jam, fwereade: not sure whether this has been discussed or not, but are we going to provide any sort of upgrade path for juju 12.04 users to 14.04? | 09:27 |
fwereade | jam, it was the first complex txn logic I ever wrote so it's unlikely to be the best | 09:27 |
jamespage | just trying to figure out what we need todo re charms for openstack this cycle | 09:27 |
jamespage | (12.04 -> 14.04 deployments that is) | 09:27 |
fwereade | jamespage, the assumption has hitherto been that machines on one series will stay forever on that series | 09:28 |
jam | jamespage: as in, how to take your existing workload and upgrade all of your units so they are running on 14.04 ? | 09:28 |
jamespage | fwereade, yeah - that was what I thought | 09:28 |
jamespage | jam: yes | 09:28 |
fwereade | jamespage, it's a bit crap, but I don't really see us addressing it this cycle | 09:28 |
jamespage | if its not a supported path thats OK | 09:28 |
fwereade | jamespage, cool | 09:28 |
jam | fwereade: and the Charm story means that you can't just upgrade the service, you have to deploy a new service ? | 09:28 |
fwereade | jam, yeah, the model doesn't really allow for series changes at all | 09:29 |
jamespage | jam: yeah - that was the complicating factor from my perspective | 09:29 |
jamespage | the 12.04 charms support cloud-archive through to havana and up to saucy OK | 09:29 |
fwereade | jam, apart from anything else, we'd need cross-series charms for it to even begin to make sense | 09:29 |
jamespage | but it feels like we have a bit of a clean sheet with 14.04 | 09:29 |
jamespage | i.e. we can drop some of the cruft :-) | 09:30 |
fwereade | jam, maybe that's a bullet that's better bitten sooner than later | 09:30 |
fwereade | jam, and it dovetails nicely with the workload-not-os focus | 09:30 |
fwereade | jam, but it also rather gives me the fear | 09:30 |
fwereade | jam, it'd demand a good solid block of scheduled time at the very least | 09:31 |
jam | fwereade: it would be nice if you could say "new units of service X are deployed with this version of the charm" | 09:32 |
jam | and then you can deploy 10 more 14.04 units, and destroy the original ones. | 09:32 |
jam | It is also a nice story for upgrade of charms in general | 09:32 |
jam | even without a series bump | 09:33 |
jam | so that you can do a "safe" migration rather than "everything at once" | 09:33 |
fwereade | jam, sure, but precise/nova is currently considered completely distinct from trusty/nova | 09:33 |
fwereade | jam, that's what I mean by cross-series charms | 09:33 |
fwereade | jam, I agree with what you're saying about smooth upgrades regardless | 09:34 |
fwereade | jam, for os upgrades, even if the charms were the same you'd have to deal with all the subordinates and colocated services too | 09:35 |
fwereade | jam, doable in theory but an *awful* lot of work in practice | 09:36 |
axw | rogpeppe1: just thinking about what you were suggesting before: it's not *that* easy, because the last thing DestroyJuju does is tell the state servers to kill themselves | 09:36 |
rogpeppe1 | axw: that's why i'd split it into two | 09:36 |
axw | so... destroy everything else, wait in the CLI, then finalise? | 09:36 |
rogpeppe1 | axw: that was my thought, yes | 09:37 |
rogpeppe1 | axw: and the finalisation step would fail if the appropriate pieces were not torn down correctly | 09:37 |
rogpeppe1 | axw: (it would probably have a force flag too though) | 09:37 |
rogpeppe1 | axw: then we can give useful feedback in the CLI if we want | 09:38 |
axw | hmm yes, that would be nice | 09:38 |
axw | sold | 09:38 |
* axw deletes the comment about timeout length | 09:38 | |
rogpeppe1 | axw: cool | 09:38 |
rogpeppe1 | fwereade: does the above seem reasonable to you? | 09:38 |
fwereade | rogpeppe1, axw: I feel like the best first cut is still: (1) abort if manual machines exist (2) destroy services, set destroying flag (3) directly StopInstances for non-managers in the API server (4) environ.Destroy in the CLI | 09:40 |
fwereade | rogpeppe1, axw: that can then be upgraded to allow for clean shutdown of manual machines, and force-destroy in state of other non-manager ones | 09:41 |
rogpeppe1 | fwereade: what do you mean when you say "destroy services" ? | 09:41 |
axw | fwereade: ah sorry, I thought you came around to my proposal ;) | 09:42 |
axw | mk | 09:42 |
fwereade | rogpeppe1: set dying to prevent more units being added | 09:42 |
fwereade | axw, um, sorry, I probably misread something | 09:42 |
rogpeppe1 | fwereade: ah yes, seems reasonable | 09:42 |
rogpeppe1 | fwereade: is the destroying flag not just Dying on the environment? | 09:43 |
fwereade | axw, I thought we were at least in agreement about where we wanted to end up, maybe there was confusion about the exact steps | 09:43 |
axw | fwereade: yeah, I can pare it back a bit for the first cut | 09:43 |
fwereade | axw, I'm not so bothered about those so long as none of the steps leaves us with a significantly less performant destroy-environment | 09:44 |
axw | ok | 09:44 |
rogpeppe1 | fwereade: tbh i think that if we're going to call environ.Destroy in the CLI there's not really any point in destroying any services | 09:44 |
fwereade | rogpeppe1, the service destruction prevents add-unit | 09:44 |
fwereade | rogpeppe1, without making every add-unit have to check the destroying flag | 09:44 |
rogpeppe1 | fwereade: but if we're killing the state server, add-unit is nicely prevented anyway :-) | 09:44 |
fwereade | rogpeppe1, add-unit during StopInstances seems like a recipe for races | 09:45 |
rogpeppe1 | fwereade: hmm, you're probably right there | 09:46 |
rogpeppe1 | fwereade: isn't it racy anyway? | 09:46 |
fwereade | rogpeppe1, "stop all non-manual non-manager instances" is, I think, an atom of functionality that remains useful in most contexts | 09:46 |
fwereade | rogpeppe1, hmm, maybe? what in particular? | 09:47 |
rogpeppe1 | fwereade: if the provisioner has just seen a unit, a machine can be provisioned just as we're calling StopInstances | 09:47 |
rogpeppe1 | fwereade: i'm not sure there's any way around it | 09:47 |
fwereade | rogpeppe1, well, StopInstances has to take its instance ids from the machines in state, doesn't it? | 09:48 |
fwereade | rogpeppe1, (btw I think StopInstances may be a bit broken anyway -- it should surely take ids, not instances?) | 09:48 |
rogpeppe1 | fwereade: no, i don't think it does | 09:49 |
rogpeppe1 | fwereade: (have to take instance ids from the machines in state, that is) | 09:49 |
rogpeppe1 | fwereade: it takes instance ids from the instances returned by AllInstances, i think | 09:49 |
fwereade | rogpeppe1, depends how providers react to being told to stop instances they're claiming don't exist in an Instances call | 09:49 |
rogpeppe1 | fwereade: Environ.Destroy does anyway | 09:50 |
fwereade | rogpeppe1, yeah, but AllInstances is kinda useless in practice :( | 09:50 |
rogpeppe1 | fwereade: because of eventual consistency? | 09:50 |
fwereade | rogpeppe1, in ec2 at least it's not uncommon for Instances/AllInstances to straight-up lie for minutes at a time | 09:50 |
fwereade | rogpeppe1, yeah | 09:51 |
rogpeppe1 | fwereade: in which case we're stuffed - there's nothing we can do to prevent an instance escaping after an environment destroy | 09:51 |
fwereade | rogpeppe1, wasn't there at one stage a Destroy param that was explicitly "and stop these instances that we *know* exist even if the provider's lying"? | 09:51 |
rogpeppe1 | fwereade: there was (maybe still is) | 09:51 |
fwereade | rogpeppe1, well we can at least issue a Stop call for everything we believe to exist | 09:52 |
rogpeppe1 | fwereade: yeah | 09:52 |
fwereade | rogpeppe1, which means taking those ids from state | 09:52 |
rogpeppe1 | fwereade: but i *think* i still feel that destroying the services is overkill and won't do the job correctly | 09:53 |
fwereade | rogpeppe1, are you saying there's no point restricting add-* commands when we know the env is being destroyed? | 09:53 |
rogpeppe1 | fwereade: yeah, because we immediately destroy all the machines anyway | 09:54 |
rogpeppe1 | fwereade: and there's a race regardless | 09:54 |
fwereade | rogpeppe1, I think it's actually quite nice to inform other users that their commands will have no effect | 09:54 |
rogpeppe1 | fwereade: if they issued the command half a second earlier, their commands would have no effect either | 09:55 |
rogpeppe1 | fwereade: but they wouldn't get told | 09:55 |
fwereade | rogpeppe1, stopping all instances might take more than half a second | 09:55 |
rogpeppe1 | fwereade: true. we should probably start by destroying the state servers... | 09:56 |
rogpeppe1 | fwereade: i just feel that destroying all the services might itself take quite a long time, and it's not really necessary | 09:58 |
rogpeppe1 | fwereade: and that we should just cut to the chase and stop the user burning money asap | 09:58 |
axw | rogpeppe1: it is useful for manual, because it wants to know which units to kill | 09:58 |
axw | (not that we're doing that yet) | 09:58 |
rogpeppe1 | axw: agreed - we definitely need to tear down manual units first | 09:59 |
fwereade | rogpeppe1, destroying the state servers really feels to me like the thing to do *last* | 09:59 |
axw | yes | 09:59 |
fwereade | rogpeppe1, axw: but can I recuse myself briefly? I'd like to address wallyworld_'s review before he goes to bed | 10:00 |
axw | yup, thanks for your time fwereade | 10:00 |
rogpeppe1 | axw, fwereade: destroying the state servers puts the environment into an inactive state - no new instances will be created, for example. | 10:02 |
axw | gtg, will come back to this tomorrow | 10:07 |
fwereade | rogpeppe1, putting them into a temporary not-accepting-requests mode is kinder and cleaner, I think | 10:09 |
rogpeppe1 | fwereade: depends how long it takes - if i've got an almost-not-working environment, running n transactions to destroy the svcs might take a long time when all i want to do is destroy the darn environment. | 10:10 |
rogpeppe1 | fwereade: but fair enough, if you think it's that important | 10:11 |
dimitern | fwereade, ping | 10:20 |
dimitern | fwereade, I think I'm finally ready to propose the first part of upgrade-juju changes, and I'd appreciate it if you look into it closely, because I probably missed something important | 10:21 |
jam | rogpeppe1, fwereade: I'm going to poke at the GOMAXPROCS thing again. Though I'd like to (a) not do it during the test suite and (b) check for GOMAXPROCS if set. Is there a sane way to know that we *aren't* in the test suite? I'd rather do it that way then have to change every test case to turn off the effect. | 10:21 |
fwereade | dimitern, cool, I just need to pop out before the standup, I will take a look afterwards | 10:22 |
dimitern | fwereade, cheers | 10:22 |
fwereade | jam, set a "this is real code flag" in the various main.main() funcs? | 10:24 |
jam | fwereade: yeah, that's sort of what I was thinking. | 10:24 |
fwereade | wallyworld_, jam: https://codereview.appspot.com/24790044 addressed, I think | 10:37 |
wallyworld_ | fwereade: thanks, will take a look | 10:38 |
jam | saw the email, looking already | 10:38 |
dimitern | fwereade, there it is https://codereview.appspot.com/25080043 | 10:40 |
fwereade | dimitern, cheers | 10:40 |
rogpeppe1 | jam: one way to know we're not in a test suite is to do the GOMAXPROCS initialization in a package that's not imported by a test | 10:41 |
rogpeppe1 | jam: though i would really like it much better if our tests passed reliably with GOMAXPROCS=anything | 10:41 |
jam | rogpeppe1: if it is imported by main() won't it get imported by a test ? | 10:41 |
rogpeppe1 | jam: hmm, yeah you're right, for cmd/* tests | 10:42 |
jam | rogpeppe1: I'm sure I'd like it if our tests just worked :) | 10:43 |
wallyworld_ | fwereade: did you forget to put in the destroyMachineDoc in the cmd Info? | 10:43 |
rogpeppe1 | jam: another possibility is to put GOMAXPROCS=1 in test initialization | 10:43 |
rogpeppe1 | jam: actually, the best solution is just to call the GOMAXPROCS initialization inside main() | 10:44 |
rogpeppe1 | jam: main only gets called in very limited circumstances in our test suite | 10:45 |
rogpeppe1 | jam: and if we needed to, we could override it in that case (when we're execing the test binary) by setting the GOMAXPROCS env var | 10:46 |
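
The approach that settles on, sketched -- the env-var check is the "if set" caveat from above (the Go runtime already honours an explicit GOMAXPROCS environment variable, so the guard preserves it):

```go
package main

import (
	"os"
	"runtime"
)

// setMaxProcs is called from main() only, so the test suite -- which
// never runs main -- keeps the default, and an explicitly set
// GOMAXPROCS env var still wins.
func setMaxProcs() {
	if os.Getenv("GOMAXPROCS") == "" {
		runtime.GOMAXPROCS(runtime.NumCPU())
	}
}

func main() {
	setMaxProcs()
	// ... normal jujud startup follows ...
}
```
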
wallyworld_ | fwereade: btw that lgtm assumes the destroyMachineDoc is used in the right place | 10:58 |
fwereade | oh ffs | 11:42 |
fwereade | google hates me today | 11:42 |
fwereade | dimitern, rogpeppe1, mgz: I think I will pick this discussion up when I've had a chance to review the branch | 11:42 |
dimitern | fwereade, ok, np | 11:43 |
fwereade | ...and eat lunch ;p | 11:48 |
* TheMue => lunch | 12:33 | |
mattyw | fwereade, when you're back from lunch I've got a couple of small questions | 12:35 |
rogpeppe1 | fwereade, dimitern, jam: it looks to me as if we can't use upsert in a mongo transaction. is that right? | 12:45 |
rogpeppe1 | niemeyer: ^ | 12:47 |
* rogpeppe1 goes for lunch | 13:03 | |
=== gary_poster|away is now known as gary_poster | ||
rogpeppe1 | fwereade: do you know of any existing logic in state that's similar to what we need for the collection of state servers? originally i was thinking of using $addToSet, but it doesn't quite fit the requirements; then i thought of using upsert, but it looks like we can't use that with transactions. | 14:10 |
fwereade | rogpeppe1, not offhand -- I'd been vaguely expecting the meat to be a map[id]address for convenience of insert/remove | 14:11 |
rogpeppe1 | fwereade: i'd thought that too | 14:12 |
fwereade | rogpeppe1, wrt interpreting an existing environment without one of those docs, I don't think there's a way around counting first to figure out whether to insert/update | 14:12 |
dimitern | fwereade, how about my review btw? | 14:13 |
fwereade | dimitern, I'm doing it :) | 14:14 |
rogpeppe1 | fwereade: i'm not sure whether a map[id]address can work, but i've just thought of something; maybe it can | 14:14 |
dimitern | fwereade, cheers | 14:14 |
TheMue | one review please: https://codereview.appspot.com/24040044 | 14:24 |
TheMue | rogpeppe1: calendar says you're reviewer today ;) | 14:24 |
rogpeppe1 | TheMue: ah, good point, will have a look | 14:24 |
TheMue | rogpeppe1: and there's also https://codereview.appspot.com/15080044/, the script friendly output of env/switch | 14:42 |
rogpeppe1 | TheMue: will look at that next, thanks | 14:43 |
TheMue | rogpeppe1: there's one field to remove in there, see martins comment, and he has troubles with the idea (you remember, default vc argument raw output) | 14:43 |
TheMue | rogpeppe1: I've got to thank you | 14:44 |
fwereade | dimitern, sorry, but I'm having real trouble following what's actually going on in that CL -- I have a bunch of comments on the documentation, but my brain keeps melting when I try to follow what's actually happening overall | 14:48 |
fwereade | dimitern, I freely stipulate that this is because it's all based on my own shitty code that should never have seen the light of day | 14:49 |
fwereade | dimitern, but it's giving me real problems when it comes to reviewing helpfully | 14:49 |
fwereade | dimitern, especially since --upload-tools keeps on interfering | 14:50 |
fwereade | dimitern, but I'm coming to feel this is a case where we should fix it, not patch it, if that phrasing makes sense to you | 14:51 |
dimitern | fwereade, expand a bit please | 14:51 |
dimitern | fwereade, the main changes are in validate(), when --version is not given, hence v.chosen is Zero | 14:52 |
dimitern | fwereade, in addition to the removal of --dev flag and logic around it | 14:52 |
fwereade | dimitern, I'm saying the code was largely incomprehensible before you started, and that makes it hard to validate your changes | 14:52 |
dimitern | fwereade, I go over all discovered tools for that major version and filter out the incompatible within [current, nextSupported], picking newest at the end | 14:53 |
dimitern | fwereade, ah | 14:53 |
dimitern | fwereade, well, it is convoluted somewhat yes :) | 14:53 |
fwereade | dimitern, (fwiw it is all my fault, I'm not going to defend the original form of the code) | 14:53 |
fwereade | dimitern, but I'm wondering whether you're in a good position to rip all that underlying crap out and start afresh with something sane | 14:54 |
dimitern | fwereade, that's a bit scary | 14:55 |
fwereade | dimitern, yeah, I can understand that :) | 14:55 |
dimitern | fwereade, the mess is around --upload-tools now mostly | 14:55 |
fwereade | dimitern, yeah, and that's a lot of what bothers me | 14:56 |
dimitern | fwereade, but didn't we decide to keep that behavior for now and just report it as deprecated? | 14:57 |
fwereade | dimitern, because I no longer really understand how upload-tools interacts with anything else | 14:57 |
fwereade | dimitern, I don't know, the internet hated me so I lost the end of that conversation | 14:57 |
dimitern | fwereade, well, in summary | 14:57 |
dimitern | fwereade, we discussed eventually replacing the need of --upload-tools everywhere with some dev docs + possibly scripts/tools that use sync-tools when you would've used --upload-tools | 14:58 |
dimitern | fwereade, but for now iirc decided to deprecate --upload-tools with a message and keep it working for easier transition; it'll also need some announcement on the mailing list at least | 14:59 |
fwereade | dimitern, if we've ever recommended that anyone use upload-tools I guess we should | 15:00 |
fwereade | dimitern, but I would hope we never had | 15:00 |
fwereade | dimitern, it's always been a dev tool | 15:00 |
dimitern | fwereade, yep | 15:01 |
fwereade | dimitern, so who's transitioning? | 15:01 |
fwereade | dimitern, keeping it around makes the transition harder | 15:01 |
jcsackett | abentley, sinzui: can you look at https://code.launchpad.net/~jcsackett/charmworld/rip-out-old-queues/+merge/194647 and/or https://code.launchpad.net/~jcsackett/charmworld/multiple-appflowers-2/+merge/194648 | 15:01 |
fwereade | dimitern, and just preserves the shitty devs-first attitude that has plagued the whole problem from the beginning | 15:01 |
dimitern | fwereade, it's useful when developing/testing stuff from source, mainly for us (one reason rogpeppe1 was not sold on removing it right away, and I get that), but we should replace it with sync-tools + scripts / makefile commands to see how it works out | 15:02 |
sinzui | jcsackett, I can | 15:02 |
fwereade | dimitern, the fundamental *problem* with all our tools stuff is, and always has been, that --upload-tools works for devs | 15:03 |
jcsackett | sinzui: thanks. | 15:03 |
dimitern | fwereade, despite being tempting to rip it out and rewrite it, let's do it in steps I suggest | 15:03 |
rogpeppe1 | fwereade: given that people *are* using it, i think peremptorily removing it without at least some notification to the group might be considered a little rude | 15:04 |
fwereade | dimitern, I'm just worried that building on what we have inevitably takes us *further* from a good solution | 15:04 |
fwereade | rogpeppe1, who's using it? | 15:04 |
rogpeppe1 | fwereade: anyone that's following the dev version for whatever reason | 15:04 |
fwereade | rogpeppe1, surely not? we distribute dev tools | 15:05 |
dimitern | fwereade, rogpeppe1, well, if it's just us, then it's no problem | 15:05 |
rogpeppe1 | fwereade: true. anyone that builds from tip | 15:05 |
fwereade | rogpeppe1, anyone who's building from source themselves is a developer | 15:05 |
fwereade | rogpeppe1, and can, I think, be expected to deal with it | 15:06 |
rogpeppe1 | fwereade: we should at least provide some reasonable replacement so everyone doesn't have to reinvent their own script | 15:07 |
fwereade | rogpeppe1, yeah, no argument there | 15:07 |
dimitern | absolutely +1 | 15:07 |
rogpeppe1 | fwereade: in particular i'd like to be able to keep my current workflow (change some source, try to bootstrap with tools built from the current source) | 15:08 |
dimitern | rogpeppe1, or upgrade for that matter | 15:08 |
fwereade | rogpeppe1, dimitern: I don't really care how hard it is for us tbh... making it easy for ourselves made it crap for everyone else | 15:09 |
rogpeppe1 | dimitern: indeed - that's also important to get right, so the developer doesn't constantly need to be fiddling with the build version in version.go (something that will inevitably leak into trunk at some point) | 15:09 |
rogpeppe1 | fwereade: i don't really see how it interfered with non-dev usage | 15:09 |
fwereade | rogpeppe1, tools distribution was total crack until just before 13.04, and was only just downgraded to 98% crack in time for release | 15:10 |
fwereade | rogpeppe1, this is precisely because we were all using a non-standard code path | 15:10 |
rogpeppe1 | fwereade: wasn't the crackfulness only exposed if you did upload-tools? | 15:11 |
fwereade | rogpeppe1, the cost to us of constructing a realistic environment in which to run the same code as our users do is *far* less than the cost of having the real code path so little-travelled as to be barely a game trail | 15:11 |
fwereade | rogpeppe1, no, upload-tools did the right thing | 15:12 |
rogpeppe1 | fwereade: you're probably thinking of a different piece of crack to me | 15:12 |
fwereade | rogpeppe1, normal users would basically run random versions of the tools | 15:12 |
rogpeppe1 | fwereade: oh, you mean the "select highest version" logic? | 15:12 |
fwereade | rogpeppe1, and those versions would usually work well enough to downgrade themselves to the desired versions | 15:13 |
dimitern | :D | 15:13 |
fwereade | rogpeppe1, but none of us *noticed* because we were running a *different code path* | 15:13 |
fwereade | rogpeppe1, and now we have a bit of time scheduled to work on upgrades I am *very* keen that we avoid getting back into that particular failure mode | 15:14 |
rogpeppe1 | fwereade: ok | 15:14 |
fwereade | rogpeppe1, cheers -- it's not even an issue with screwing up the code, because that basically always happens, because we're human -- it's about making sure we have a chance of detecting the problem | 15:15 |
* fwereade quick ciggie, think | 15:16 | |
niemeyer | rogpeppe1: Yes, you cannot upsert in a mongo transaction.. you can insert and update, though | 15:30 |
niemeyer | rogpeppe1: or update and insert | 15:30 |
niemeyer | rogpeppe1: (ordering matters) | 15:30 |
niemeyer | rogpeppe1: Sorry for missing your message earlier | 15:31 |
fwereade | niemeyer, ha, ofc -- nice :) | 15:31 |
rogpeppe1 | niemeyer: so if you don't know if something exists, it would be ok to do both the insert and the update, and one would succeed, so all would work? | 15:31 |
fwereade | rogpeppe1, I think you'd want update-then-insert, so the update silently fails | 15:31 |
niemeyer | rogpeppe1: Yeah | 15:32 |
niemeyer | fwereade: Right | 15:32 |
rogpeppe1 | niemeyer: cool, thanks | 15:32 |
niemeyer | There's a message in juju-dev | 15:32 |
niemeyer | on Aug 30th | 15:32 |
niemeyer | Subject "Upcoming fix in mgo/txn" | 15:32 |
rogpeppe1 | niemeyer: cool, thanks | 15:33 |
rogpeppe1 | niemeyer: in fact i'm probably going to be using a single document, but that's really useful to know | 15:33 |
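A sketch of the update-then-insert pattern settled on above, assuming (as the discussion says) that an mgo/txn Update op on a missing document is a silent no-op, so exactly one of the two ops takes effect and ordering matters; upsertOps, the database, and the collection names are invented for illustration:

```go
package main

import (
	"labix.org/v2/mgo"
	"labix.org/v2/mgo/bson"
	"labix.org/v2/mgo/txn"
)

// upsertOps emulates an upsert inside a transaction: the Update op
// silently does nothing when the document is missing, and the Insert
// then creates it; when the document exists, the Update applies.
func upsertOps(coll string, id interface{}, fields bson.M) []txn.Op {
	return []txn.Op{{
		C:      coll,
		Id:     id,
		Update: bson.M{"$set": fields}, // no-op when the doc is missing
	}, {
		C:      coll,
		Id:     id,
		Insert: fields, // creates the doc when the update found nothing
	}}
}

func main() {
	session, err := mgo.Dial("localhost")
	if err != nil {
		panic(err)
	}
	defer session.Close()
	runner := txn.NewRunner(session.DB("demo").C("txns"))
	if err := runner.Run(upsertOps("settings", "env-config", bson.M{"debug": true}), "", nil); err != nil {
		panic(err)
	}
}
```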
rogpeppe1 | TheMue: you've got a review | 16:01 |
rogpeppe1 | TheMue: and another one | 16:11 |
jcastro | what was the final name of the ssh/manual provider? | 16:26 |
TheMue | rogpeppe1: thx | 16:27 |
natefinch | mental note: don't do apt-get update && apt-get upgrade during the workday :/ | 16:32 |
mgz | natefinch: indeed, it's an after-work leisure activity :) | 16:32 |
natefinch | mgz: My old laptop is pissed I'm replacing it, evidently | 16:33 |
TheMue | natefinch: reasonable | 16:54 |
jam | jcastro: manual | 17:05 |
jcastro | jam, ok so https://juju.ubuntu.com/docs/config-manual.html is correct terminology-wise? | 17:05 |
mattyw | what version of mongo do people use to test on precise? I get all kinds of panics when running go test ./state/... | 17:06 |
jam | mattyw: there should be a version in ppa:juju/stable or the cloud tools archive. | 17:06 |
jam | I think it is 2.2.4? | 17:06 |
jcastro | jam, I assume that core will write manual instead of "null" to the environments yaml? | 17:06 |
jam | mattyw: we need SSL support which wasn't in the stock precise version | 17:06 |
mattyw | jam, I remember that being the case - 2.2.4 is what I'm on - I'll paste the errors in case anyone is interested | 17:07 |
dimitern | rogpeppe1, thanks for the review as well | 17:12 |
rogpeppe1 | dimitern: np | 17:12 |
=== racedo` is now known as racedo | ||
mattyw | this is more or less a summary of the errors I see http://paste.ubuntu.com/6406334/ | 17:25 |
* rogpeppe1 is done for the day. | 19:30 | |
rogpeppe1 | g'night all | 19:30 |
thumper | natefinch: if you have time... https://codereview.appspot.com/24980043/ | 20:46 |
thumper | pretty trivial move of functions | 20:46 |
thumper | + a few tests | 20:46 |
natefinch | thumper: sure thing | 20:46 |
thumper | natefinch: I wish go had a better way to handle dependencies than just packages... | 20:48 |
natefinch | thumper: you mean like versioning on dependencies? | 20:48 |
thumper | I have to move some interfaces out of a package where they fit naturally to avoid circular imports | 20:48 |
natefinch | thumper: Oh I see | 20:48 |
thumper | I don't have a good answer for that yet | 20:49 |
natefinch | thumper: yeah, the circular imports thing is annoying... but it generally means you're doing something wrong anyway. | 20:49 |
thumper | I had moved them up a package, from container/kvm to container, but it doesn't naturally fit | 20:49 |
* thumper tries to move something else | 20:49 | |
* thumper has an idea | 20:50 | |
natefinch | thumper: there's actually the exact same problem in C# - you can't have X depend on Y and Y depend on X. | 20:50 |
* thumper nods | 20:51 | |
thumper | natefinch: ya know what, this circular import was because I was doing something wrong | 20:59 |
thumper | changed it, and now it is better | 20:59 |
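The usual way out of a cycle like the one above, sketched under assumptions: hoist the shared interfaces into the parent package so imports only ever point upward. The layout and method sets here are illustrative, not juju-core's actual declarations:

```go
// Before: container/kvm needed types from container, while container
// named types from container/kvm -- an import cycle. After hoisting,
// the dependency graph is a tree:
//
//	container       defines the interfaces (imports no children)
//	container/lxc   imports container, implements the interfaces
//	container/kvm   imports container, implements the interfaces

package container

// Container is the behaviour shared by LXC and KVM instances.
type Container interface {
	Name() string
	Start() error
	Stop() error
}

// Manager is how juju creates and destroys containers without caring
// which concrete implementation sits behind the interface.
type Manager interface {
	CreateContainer(name string) (Container, error)
	DestroyContainer(c Container) error
}
```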
natefinch | thumper: is it me, or is RemoveContainerDirectory actually supposed to be MoveContainerDirectory? | 21:01 |
thumper | well, it should probably be renamed :) | 21:02 |
thumper | but move is a little general | 21:02 |
thumper | it does remove it from the main dir | 21:02 |
thumper | but keeps it around for "debugging" | 21:02 |
thumper | which most people never need | 21:02 |
thumper | I have a mental bug about cleaning it out when you start a new local provider | 21:03 |
natefinch | thumper: do we care that there's a race condition in RemoveContainerDirectory if two things both find a unique directory name and then both try to rename their directory to it? | 21:06 |
thumper | natefinch: no, because there is only ever one goroutine doing this | 21:06 |
thumper | the provisioner | 21:07 |
thumper | may be worth noting in the function description though | 21:07 |
natefinch | thumper: yeah.... it would be good to note it's not thread safe | 21:08 |
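A sketch of the move-aside behaviour under discussion, with the concurrency caveat recorded as a doc comment as agreed; the function name and path handling are assumptions, not juju-core's implementation:

```go
package container

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeContainerDir moves a container's directory into a "removed"
// area rather than deleting it, so the contents stay around for
// debugging.
//
// Not safe for concurrent use: two callers could probe the same free
// name and then race on the rename. In juju only the single
// provisioner goroutine calls this, so that is acceptable.
func removeContainerDir(containerDir, removedDir string) error {
	base := filepath.Base(containerDir)
	target := filepath.Join(removedDir, base)
	// Probe for a name not already taken by an earlier removal.
	for i := 1; ; i++ {
		if _, err := os.Stat(target); os.IsNotExist(err) {
			break
		}
		target = filepath.Join(removedDir, fmt.Sprintf("%s.%d", base, i))
	}
	return os.Rename(containerDir, target)
}
```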
* thumper wonders why he didn't get an email about natefinch's reply | 21:30 | |
natefinch | thumper: I just hit "go" like 30 seconds ago | 21:30 |
thumper | oh | 21:30 |
thumper | :) | 21:30 |
natefinch | thumper: sorry for all the nitpicks | 21:31 |
thumper | that's fine | 21:31 |
* thumper goes over the nits | 21:31 |
thumper | natefinch: I have another on Rietveld: https://codereview.appspot.com/25550043 | 21:31 |
thumper | natefinch: next branch in the pipeline | 21:31 |
thumper | lots of initial boilerplate for the kvm containers | 21:32 |
thumper | natefinch: we can't export them just for tests because the tests are in other packages | 21:32 |
natefinch | thumper: oh yeah. Man I hate that. | 21:33 |
thumper | this does make me sad | 21:33 |
thumper | but go only has one protection mechanism | 21:33 |
thumper | and is kinda anal about you circumventing it | 21:33 |
natefinch | thumper: to be fair, it's because we're trying too hard to have other packages know about the internals of this one | 21:34 |
natefinch | thumper: other packages shouldn't need to know about the removedDirectory, even for tests | 21:34 |
thumper | well... | 21:34 |
thumper | we want to be able to mock them out though | 21:34 |
thumper | having a nice clean way to do that is difficult | 21:35 |
natefinch | yeah | 21:35 |
thumper | because whatever you choose | 21:35 |
thumper | it is icky | 21:35 |
natefinch | yep | 21:35 |
natefinch | wonder if the way to do it is to just have Mock() and Unmock() methods in the package, and let it figure out how to do it. Not great, though. I wonder if anyone else has figured out cleaner ways to do it. | 21:37 |
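One conventional answer to the mocking question natefinch raises, sketched here: keep the dependency in a package-level variable and expose a patch hook only through export_test.go. Note this only helps the package's own external test file (package container_test), not tests living in other packages; all names are illustrative:

```go
package container

import "os"

// removeDir is a package-level variable rather than a direct call so
// tests can substitute a fake without the package exporting any
// internals -- one conventional Go take on the Mock()/Unmock() idea.
var removeDir = os.RemoveAll

// A hook in export_test.go (compiled only alongside this package's
// tests) then lets the external container_test package patch it:
//
//	func PatchRemoveDir(f func(string) error) (restore func()) {
//		orig := removeDir
//		removeDir = f
//		return func() { removeDir = orig }
//	}
```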
thumper | natefinch: most other projects have one package :-| | 21:37 |
natefinch | thumper: I'm not convinced of that. Not real projects that actually do things. | 21:38 |
thumper | :) | 21:38 |
thumper | MoveToRemovedDirectory? | 21:39 |
thumper | awkward | 21:39 |
natefinch | It's honestly a little weird that this detail is exposed. The reason you're doing it is to avoid circular dependencies? | 21:39 |
thumper | which details? | 21:40 |
thumper | the directory stuff? | 21:40 |
natefinch | yeah | 21:40 |
thumper | I've moved it up because it is common between lxc and kvm | 21:40 |
thumper | we need somewhere to write the user data and have container logs | 21:40 |
natefinch | yeah.. yeah, that's true. | 21:41 |
thumper | I may just go with container.RemoveDirectory | 21:42 |
natefinch | thumper: yeah, I think that's fine | 21:43 |
natefinch | thumper: I'm not going to be able to get that review done before I have to go relieve my wife of the screaming baby at 5. | 21:53 |
thumper | natefinch: that's fine | 21:53 |
thumper | I'll get wallyworld_ to do it :) | 21:54 |
wallyworld_ | \/ | 21:54 |
thumper | wallyworld_: here is that IsKVMSupported method you are after :) https://code.launchpad.net/~thumper/juju-core/kvm-containers/+merge/194946 | 21:58 |
wallyworld_ | great, will look soon | 21:59 |
thumper | wallyworld_: although, I was thinking perhaps we should return (bool, error) | 21:59 |
thumper | instead of just bool | 21:59 |
wallyworld_ | yes, +1 to error | 21:59 |
thumper | so if the cpu-checkers package isn't installed, or for some reason /bin/kvm-ok isn't there, we get an error | 21:59 |
wallyworld_ | yes, agreed. need that error | 22:03 |
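A sketch of the agreed (bool, error) shape, shelling out to kvm-ok; the output matching and the exit-status handling are assumptions, not the eventual juju-core implementation:

```go
package kvm

import (
	"os/exec"
	"strings"
)

// IsKVMSupported reports whether the host can run KVM guests by
// running kvm-ok, returning an error when the check itself cannot
// be performed.
func IsKVMSupported() (bool, error) {
	// A missing kvm-ok binary is an error, not merely "unsupported".
	path, err := exec.LookPath("kvm-ok")
	if err != nil {
		return false, err
	}
	// kvm-ok exits non-zero when acceleration is unavailable, so key
	// off its output rather than its exit status (a simplification).
	out, _ := exec.Command(path).CombinedOutput()
	return strings.Contains(string(out), "KVM acceleration can be used"), nil
}
```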
=== gary_poster is now known as gary_poster|away | ||
wallyworld_ | thumper: just started looking - the kvmContainer retains a reference to the factory, the lxc container doesn't - is there a fundamental difference i will encounter as i read more of the code? | 22:20 |
thumper | I'm not sure... | 22:23 |
thumper | wallyworld_: all you should really care about is the public interfaces | 22:26 |
thumper | because implementation is likely to change | 22:27 |
wallyworld_ | as a user yes :-) i was trying to grok the implementation | 22:27 |
thumper | :) | 22:27 |
thumper | the implementation may change as it is all not-there-yet :) | 22:28 |
* wallyworld_ nods | 22:28 | |
wallyworld_ | i was comparing the lxc and kvm implementations / structs etc to see the differences | 22:28 |
* thumper nods | 22:29 | |
wallyworld_ | since lxc is already in the code base, if this mp is similar it makes reviewing easier | 22:30 |
wallyworld_ | thumper: why does the kvm package redefine the ContainerFactory interface? | 22:34 |
thumper | ContainerFactory is about the low level containers themselves | 22:34 |
thumper | the container.Manager interface is how juju interacts with containers | 22:34 |
thumper | the mock implementation is around the ContainerFactory and Container | 22:34 |
thumper | not the container.Manager | 22:34 |
wallyworld_ | ok | 22:34 |
wallyworld_ | although since the interface is identical for lxc and kvm, i would have thought the implementing structs would be different, but the interface abstracted out as common | 22:36 |
wallyworld_ | otherwise value of having an interface is diminished | 22:37 |
wallyworld_ | may as well just use structs | 22:37 |
thumper | part of the problem is the lxc versions are in the golxc package | 22:38 |
thumper | not juju | 22:38 |
thumper | interfaces mean I can mock it out | 22:38 |
thumper | that in itself is good | 22:38 |
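What that mockability looks like in practice, as an illustrative sketch against the Container shape assumed in the earlier sketch; none of this is juju-core's actual test code:

```go
package kvm_test

// mockContainer is the kind of test double the interfaces buy:
// code written against the Container interface can be exercised
// without creating real kvm instances. The method set here is an
// assumption, matching the earlier sketch.
type mockContainer struct {
	name    string
	started bool
}

func (m *mockContainer) Name() string { return m.name }
func (m *mockContainer) Start() error { m.started = true; return nil }
func (m *mockContainer) Stop() error  { m.started = false; return nil }
```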
* thumper has to nip out | 22:39 | |
thumper | bbl | 22:39 |