[00:41] <pullies> hi, i'm trying out ensemble but when running `ensemble status` i get an Invalid SSH key message
[00:42] <pullies> i have specified authorized-keys-path in my environment (that's what let me run bootstrap in the first place)
[00:43] <pullies> this is when using the ppa, btw.
[00:43] <pullies> any advice?
[00:51] <niemeyer> pullies: Hey there
[00:51] <niemeyer> pullies: Hmmm
[00:53] <niemeyer> pullies: What's the content of the file you're pointing at with the authorized-keys-path?
[00:55] <niemeyer> pullies: It usually gets that automatically from your ~/.ssh/id_dsa.pub or ~/.ssh/id_rsa.pub files
[00:55] <niemeyer> pullies: Did you have these when you were seeing the error earlier?
[00:58] <RoAkSoAx_> niemeyer: who creates id_{dsa|rsa}.pub
[00:58] <RoAkSoAx_> in the zookeeper?
[00:58] <niemeyer> RoAkSoAx_: deploy does
[00:58] <niemeyer> RoAkSoAx_: It serializes with the environment
[01:02] <RoAkSoAx_> niemeyer: ok cool, cause I was encountering situations in which the zookeeper complained about the keys but there were no keys
[01:03] <RoAkSoAx_> niemeyer: err was looking for keys, but there were no keys
[01:03] <niemeyer> RoAkSoAx_: Ok.. pullies was just reporting a similar issue above
[01:03] <niemeyer> RoAkSoAx_: Maybe that's the problem
[01:03] <niemeyer> pullies: Have you deployed something?  That could be the issue
[01:03] <niemeyer> We should handle that bootstrapping phase better in that sense
[01:05] <RoAkSoAx_> niemeyer: but in my case, after bootstrapping the zookeeper doesn't have any .pub keys created
[11:22] <hazmat> g'morning
[11:46] <fwereade> heya hazmat :)
[12:22] <pullies> sorry, i disappeared for the night, as you could probably tell.  ;-)  i have deployed nothing.
[12:22] <pullies> -----BEGIN RSA PRIVATE KEY-----
[12:22] <pullies> (a base64 encoded string which i won't paste here)
[12:22] <pullies> -----END RSA PRIVATE KEY-----
[12:26]  * hazmat has to give an ensemble presentation tonight, but his laptop does a kernel panic everytime it sleeps
[12:26] <hazmat> sadness
[12:46] <fwereade> did something about how we handle $PYTHONPATH change?
[12:47] <hazmat> fwereade, no.. not in a very long time
[12:47] <fwereade> hazmat: ok, I'm doing something dumb then :)
[12:47] <hazmat> fwereade, there is a new bug open about pythonpath being set for hooks which it probably shouldn't be
[12:47] <hazmat> fwereade, what's the symptom?
[12:48] <fwereade> I set PYTHONPATH, run bin/ensemble, and it still picks up the system version
[12:48] <hazmat> fwereade, a system version (ppa install) of ensemble?
[12:48] <fwereade> hazmat: yep
[12:49] <hazmat> fwereade, is it ppa or a manual installation via sudo setup.py install?
[12:49] <fwereade> ppa
[12:49] <hazmat> fwereade, i'd suggest first removing the package
[12:50] <hazmat> PYTHONPATH=$PWD python -c "import ensemble; print ensemble"
[12:50] <hazmat> is a quick verification of the import path for ensemble
[12:51] <hazmat> er.. PYTHONPATH=$PWD:$PYTHONPATH  is probably better
[12:51] <hazmat> for real usage
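The import-path check hazmat describes can also be done from inside Python. This is a minimal sketch, not part of ensemble: `module_location` is a hypothetical helper name, and it only reports where the interpreter found a module, which shows whether the local checkout or the system package wins.

```python
import importlib


def module_location(name):
    """Return the file a module was imported from, or None if it has no file."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # "json" is used here only because it is always available; with
    # PYTHONPATH set to a checkout, pass "ensemble" instead.
    print(module_location("json"))
```

If the printed path is under the system site-packages rather than the checkout, the PYTHONPATH setting is not taking effect.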
[12:51] <fwereade> hazmat: that appears to work
[12:51] <fwereade> hazmat: anyway, don't worry, all I really needed was verification that it was me being stupid ;)
[12:52] <hazmat> fwereade, path problems hit all of us one time or another..
[12:54]  * fwereade is reassured ;)
[12:58] <hazmat> fwereade, btw, when you're the second reviewer on a branch, you should adjust the merge proposal status (at the top): if both reviews are approve, then approved, else work in progress.
[12:59]  * hazmat dog walks, bbiab
[13:01] <fwereade> hazmat: cool, thanks
[13:02] <fwereade> hazmat: I think I may have thought it happened by magic :p
[13:02] <pullies> sorry, a little context would probably help.  last night i reported an error that `ensemble status` gives an Invalid SSH key message when i specify my authorized-keys-path in environment
[13:03] <hazmat> pullies, so you have the path to your ssh key in 'authorized-keys-path' in the provider section of the environment? 
[13:04] <hazmat> pullies, and yes the context helps ;-)
[13:05] <pullies> hazmat, i have a path to my ssh key in environments.sample.authorized-keys-path
[13:05] <hazmat> pullies, looking at the code, it looks like the path is a misnomer :-( it wants the name of a file in the .ssh directory
[13:05] <pullies> AHA
[13:05] <hazmat> pullies, if you're up for it please file a bug that this is misleading/confusing regarding the name and usage of this setting
[13:06] <pullies> hazmat, where's the bug tracker?
[13:06] <pullies> launchpad?
[13:06] <hazmat> http://launchpad.net/ensemble
[13:07] <hazmat> pullies, you're going to need to shutdown and bootstrap again, we need the ssh key active for doing any connection from the cli to the ensemble setup
[13:07] <hazmat> pullies, so you're saying you where able to bootstrap with an invalid key?
[13:07] <hazmat> s/where/were
[13:07] <hazmat> that's also a problem/bug
[13:07] <hazmat> if so
[13:07] <pullies> hazmat, running the instance was successful.  i believe connecting to it was not
[13:08] <pullies> this launchpad dashboard is confusingly worded about openid
[13:08] <pullies> :-)
[13:08] <hazmat> pullies, suggestions welcome on #launchpad ;-)
[13:09] <hazmat> pullies, thanks for filing a bug, i'm heading out for a few minutes, let us know how it goes
[13:10] <pullies> still the same error message
[13:11] <pullies> i'm assuming "the ssh directory" is ~/.ssh
[13:11] <pullies> ?
[13:14] <_mup_> Bug #819803 was filed: authorized-keys-path is actually a filename, not a path. <Ensemble:New> < https://launchpad.net/bugs/819803 >
[13:17] <pullies> hazmat, it's possible that ssh access hasn't been enabled for the security group, poking around at ec2 docs.  can you confirm that this is a necessary precondition that ensemble doesn't take care of?
[13:31] <hazmat> pullies, ensemble does indeed take care of ec2 security groups
[13:31] <hazmat> pullies, it takes a minute or two for the instance to be up and responding
[13:31] <hazmat> pullies, actually it does look like it will try either a full path or a name
[13:32] <pullies> i've skipped the ensemble part.
[13:32] <hazmat> pullies, it looks like you would get a LookupError
[13:32] <pullies> i'm trying to use the keypair itself
[13:32] <hazmat> not an invalid ssh key error, if it couldn't find the key
[13:32] <pullies> to make sure i can login
[13:32] <pullies> and i can't.
[13:33] <pullies> ssh -i ~/.ssh/mykey.pem ubuntu@ec2-ip.compute-1.amazonaws.com
[13:33] <pullies> that should succeed, yes?
[13:33] <pullies> ssh -v is telling me that it reads the rsa private key
[13:34] <pullies> "authentications that can continue: public key" is issued twice
[13:34] <pullies> i'm a bit confused why i can't ssh directly in
[13:35] <hazmat> pullies, the ssh key specified for ensemble is the public key, not the private key
[13:35] <hazmat> pullies, do you have an id_[dsa|rsa].pub in ~/.ssh ?
[13:36] <pullies> amazon only gave me a .pem file to download
[13:36] <hazmat> pullies, ensemble  doesn't use that
[13:36] <hazmat> pullies, try ssh-keygen -t rsa
[13:36] <pullies> will generate a local key and try that
[13:36] <pullies> :-)  duh.
[13:37] <pullies> thanks
[13:37] <hazmat> pullies, you can remove the authorized-keys-path as well, ensemble picks up default keys automatically if none are specified
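The key lookup discussed above (a configured value tried as a full path or as a name in ~/.ssh, default id_dsa.pub / id_rsa.pub otherwise, LookupError when nothing is found) can be sketched as follows. This is an illustration under stated assumptions, not ensemble's actual code: `find_public_key`, `is_private_key`, and `DEFAULT_KEY_NAMES` are hypothetical names.

```python
import os

# Default public key file names, as mentioned in the discussion.
DEFAULT_KEY_NAMES = ("id_dsa.pub", "id_rsa.pub")


def is_private_key(text):
    """Detect a PEM private key, e.g. an EC2 .pem file given by mistake."""
    return "PRIVATE KEY-----" in text


def find_public_key(configured=None, ssh_dir=None):
    """Return the path of a public key file, or raise LookupError."""
    if ssh_dir is None:
        ssh_dir = os.path.expanduser("~/.ssh")
    if configured:
        # Try the setting as a full path first, then as a name in ssh_dir.
        candidates = [os.path.expanduser(configured),
                      os.path.join(ssh_dir, configured)]
    else:
        candidates = [os.path.join(ssh_dir, n) for n in DEFAULT_KEY_NAMES]
    for path in candidates:
        if os.path.isfile(path):
            return path
    raise LookupError("no public key found; tried: " + ", ".join(candidates))
```

A file that passes `find_public_key` but whose contents make `is_private_key` return True matches the situation here: the file exists, so there is no LookupError, but it is not a usable public key.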
[13:37] <fwereade> difference between orchestra and ec2: we can't easily tell whether an orchestra machine is running
[13:38] <fwereade> so get_zookeeper_machines is problematic on orchestra
[13:38] <fwereade> because it can't verify the sanity of the state it gets from FileStorage
[13:39] <fwereade> in orchestra, should we be (say) trashing state in shutdown, or should we figure out some way to query the machines and thereby match ec2 better?
[13:41] <hazmat> fwereade, orchestra doesn't know if machines it setup are running? 
[13:42] <hazmat> i thought it was doing a tftp/dhcp setup, maybe that's not exposed via the api?
[13:42] <fwereade> hazmat: nope, you can theoretically query power status
[13:43] <fwereade> but that's acted weird for me
[13:43] <fwereade> and that still doesn't tell us whether they're actually running, or if something went wrong part way through install (for example)
[13:43] <hazmat> fwereade, it looks like the examples specify remote power management but not status
[13:44] <fwereade> the api includes a "status" command, which AFAICT acts like an "off" command
[13:45] <hazmat> fwereade, that seems like an upstream bug if that's the case
[13:46] <hazmat> fwereade, not having any orchestra/cobbler experience, i'm not sure what the options are. but if the zk pointer file is invalid, the whole thing basically breaks down.
[13:46] <niemeyer> Hey guys!
[13:46] <fwereade> hazmat: I'm assuming for now that it's something I'm doing wrong, I tend to defer to RoAkSoAx_ for the final word on these things
[13:46] <hazmat> niemeyer, top of the morning
[13:47] <fwereade> heya niemeyer
[13:47] <fwereade> hazmat: it's fine if provider-state is borked, on ec2, because we can check machine status
[13:47] <fwereade> hazmat: we bootstrap if there's no state, or if the state is nonsensical => probably already shut down
[13:47] <hazmat> fwereade, well on ec2 we always intersect the two, provider state against machine status, because we need the ip resolution
[13:48] <niemeyer> hazmat: balance to you my friend
[13:48] <fwereade> hazmat: maybe that's the intent, I'm just telling you what I could infer from the code :)
[13:48] <hazmat> fwereade, yup, indeed that's the case.. orchestra is a different beast a bit
[13:49] <pullies> hazmat, now this is progress
[13:49] <pullies> 2011-08-02 09:48:45,857 ERROR SSH forwarding error: Agent admitted failure to sign using the key. Permission denied (publickey).  
[13:49] <niemeyer> hazmat, fwereade Got the conversation mid-way through, but it sounds sensible to trash state on shutdown
[13:49] <niemeyer> I'd rather rename shutdown to destroy-environment, but that's another topic
[13:49] <hazmat> pullies, if you modify/change the key, you'll need to ensemble shutdown && ensemble bootstrap
[13:49] <pullies> this is after doing that
[13:50] <pullies> and i've removed the path from environments.yaml
[13:51] <niemeyer> pullies: Have you run deploy already?
[13:51] <hazmat> pullies, so if you do ec2-describe-instances do you see the instance running (the security group should match the environment name prefixed with 'ensemble-')
[13:51] <fwereade> niemeyer: cool, that feels like it would make life easier on my side and do no harm to ec2
[13:51] <niemeyer> Well, I guess it doesn't really matter actually
[13:51] <niemeyer> fwereade: Agreed
[13:51] <hazmat> niemeyer, yeah.. failure to connect precludes deploy
[13:52] <niemeyer> hazmat: I was thinking that the key is serialized with the env, which happens at deploy
[13:52] <niemeyer> hazmat: But that's something else.. we need a key there to connect to zk in the first place
[13:52] <hazmat> pullies, you should be able to ssh into the machine directly using ssh ubuntu@ec2-host-name
[13:52] <hazmat> niemeyer, the public key is sent at launch time via cloud-init
[13:52] <hazmat> niemeyer, its not stored in zk
[13:53] <pullies> hazmat, the dashboard shows the machine.
[13:53] <niemeyer> hazmat: It is stored in zk during deploy
[13:53] <hazmat> niemeyer, the environment is yes
[13:53] <niemeyer> hazmat: Otherwise how would it be in cloud-init for the other machines
[13:53] <niemeyer> hazmat: The keys
[13:53] <hazmat> niemeyer, totally 
[13:53] <hazmat> but not for the bootstrap
[13:53] <niemeyer> hazmat: Yes, I guess that's what I said above?
[13:53] <hazmat> yup
[13:53] <hazmat> :-)
[13:55] <hazmat> pullies, so what i'd like to verify is that from the cli you can log into that machine via ssh, if you didn't rename the ssh key, it should just pick up the private side of the same default
[13:56] <hazmat> ssh will try a few from what it finds in ~/.ssh
[13:58] <pullies> i generated the key twice.  it's possible something is cached in either my client or theirs.  will have to log out and try again
[13:58] <pullies> will attempt it tonight
[13:58] <pullies> thanks for the help.  will definitely focus on the ssh portion, i don't think it's ensemble at this point
[14:11] <niemeyer> statik: Morning
[14:12] <statik> morning niemeyer
[14:16] <RoAkSoAx_> fwereade: howdy!!
[14:17] <fwereade> RoAkSoAx_: heyhey!
[14:17] <fwereade> how's it going?
[14:17] <RoAkSoAx_> fwereade: pretty good, you?
[14:17] <fwereade> RoAkSoAx_: pretty good thanks :)
[14:17] <fwereade> RoAkSoAx_: and I got netboot 9% working on my cobbler, too
[14:18] <fwereade> shadow-trunk is up to date, and might even work for you now ;)
[14:18] <RoAkSoAx_> fwereade: cool, where are you stuck?
[14:18] <fwereade> er, that should have been a 99% up there :)
[14:18] <fwereade> the ubuntu-orchestra-client install
[14:18] <RoAkSoAx_> fwereade: cool, I'm actually pulling your branch to test now
[14:18] <RoAkSoAx_> fwereade: you mean the variable?
[14:19] <RoAkSoAx_> on the preseed?
[14:19] <fwereade> (1) it asks for an rsyslog server, and then fails with "cannot stat /var/something/puppet"
[14:19] <RoAkSoAx_> fwereade: show me the line in the preseed
[14:19] <fwereade> RoAkSoAx_: yeah, I remember you telling me to "just comment it out for now" a while ago, so that's what I did
[14:20] <fwereade> can't copy from VM, but it's the pkgsel one as copied from your mail
[14:20] <RoAkSoAx_> fwereade: is the creation of the cloud-init data fixed?
[14:20] <fwereade> RoAkSoAx_: I *think* so
[14:21] <fwereade> RoAkSoAx_: I now generate something that actually looks like a working EC2 one
[14:21] <RoAkSoAx_> fwereade: ok, gonna test now then ;)
[14:21] <fwereade> RoAkSoAx_: the precise details of how I screwed it up the first time are far too embarrassing to relate :p
[14:21] <fwereade> RoAkSoAx_: sweet, tyvm
[14:22] <RoAkSoAx_> fwereade: hehe its all good
[14:31] <fwereade> RoAkSoAx_: hm, I seem to be getting "204 No Content"s from webdav, which I wasn't before, but it all works (apart from the error, heh)
[14:32] <RoAkSoAx_> fwereade: on the orchestra server, what's in /var/lib/webdav
[14:32] <fwereade> RoAkSoAx_: the right stuff
[14:33] <RoAkSoAx_> fwereade: formulas dir and provider-state?
[14:33] <fwereade> RoAkSoAx_: no but yes (I haven't got a formulas dir at the moment, but the right content was written to provider-state)
[14:35] <RoAkSoAx_> fwereade: mkdir -p /var/lib/webdav/formulas && chown -R www-data:www-data /var/lib/webdav/formulas
[14:36] <RoAkSoAx_> fwereade: ok, so bootstrapping works, ensemble status doesn't
[14:37] <fwereade> awesome! I haven't even thought about what status does, so that's the progress I wanted :)
[14:38] <RoAkSoAx_> fwereade: ok, in orchestra means that was related to having @property def _machines:
[14:38] <fwereade> RoAkSoAx_: indeed, and my understanding was that that was something you wanted to defer until the sprint
[14:38] <RoAkSoAx_> fwereade: but anyways, what's the last thing merged there and what was left to "separate" from the old bootstrap-orchestra branch?
[14:38] <RoAkSoAx_> fwereade: i wanna have it working though
[14:39] <fwereade> RoAkSoAx_: sounds good to me
[14:39] <fwereade> RoAkSoAx_: were we going with "stick it into ks_meta" for now?
[14:39] <fwereade> last thing merged into shadow-trunk is cobbler-launch-machine
[14:39] <RoAkSoAx_> fwereade: so what's missing, the shutdown stuff?
[14:39] <fwereade> cobbler-kill-machine is WIP
[14:40] <fwereade> bootstrap-verify-storage is an unrelated bug I picked up lest I spin my wheels on monday, and that should be good soon
[14:40] <RoAkSoAx_> fwereade: alright, so I'll re-read your branch and try to identify what's missing from the things I wanted to do
[14:41] <fwereade> RoAkSoAx_: I plan to make one more change -- to treat 204 as success (as I think is correct: processed successfully, doesn't feel it needs to return any content)
[14:42] <RoAkSoAx_> fwereade: I haven't seen that, where did you see that?
[14:42] <RoAkSoAx_> or in what situation
[14:42] <fwereade> I seem to be getting that every time I PUT provider-state to webdav
[14:42] <RoAkSoAx_> fwereade: i haven't seen anything
[14:43] <RoAkSoAx_> fwereade: make sure the formulas dir is there and restart apache2 and see if it keeps throwing that error
[14:43] <RoAkSoAx_> fwereade: is the default storage-url also in?
[14:43] <fwereade> well, it's not an error, it seems like a perfectly legitimate response: "yep, cool, I've handled your request and I have nothing more to tell you, but here's a fresh etag maybe"
[14:44] <fwereade> but twisted getPage seems to consider "not 200" == "something happened, raise an exception"
[14:45] <fwereade> I'll bounce apache anyway, but I think what'll fix it is deleting provider-state, I'll let you know in 5
[14:45] <RoAkSoAx_> fwereade: nah nothing will delete provider-state
[14:45] <RoAkSoAx_> fwereade: you'd have to do it manually in order to be able to bootstrap again
[14:46] <fwereade> RoAkSoAx_: what I'm doing is setting it to {} on shutdown
[14:47] <fwereade> RoAkSoAx_: and, yes, if I overwrite I get 204, if I trash it manually I get 200
[14:47] <fwereade> RoAkSoAx_: overwrite is perfectly reasonable behaviour, I'll just make sure ensemble understands that
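The fix fwereade describes, accepting 204 alongside 200 when PUTting provider-state, can be sketched like this. It is a minimal illustration, not the actual ensemble or twisted code: `check_put_status` and `SUCCESS_CODES` are hypothetical names, and the status code is assumed to be already extracted from the response.

```python
# Success codes a WebDAV PUT may legitimately return:
# 200 OK, 201 Created, 204 No Content (e.g. overwriting an existing file).
SUCCESS_CODES = (200, 201, 204)


def check_put_status(status):
    """Return the status if the PUT succeeded, otherwise raise IOError."""
    status = int(status)
    if status in SUCCESS_CODES:
        return status
    raise IOError("unexpected HTTP status for PUT: %d" % status)
```

With this check, overwriting provider-state (204) and writing it fresh (200) are both treated as success, while genuine failures such as 403 or 500 still raise.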
[14:47] <RoAkSoAx_> fwereade: i don't think we would need to delete provider-state on shutdown
[14:48] <RoAkSoAx_> fwereade: remember that we are dealing with physical hw
[14:48] <RoAkSoAx_> and it is expensive
[14:48] <RoAkSoAx_> to be installing every time we want a zookeeper
[14:48] <RoAkSoAx_> when we already have one
[14:48] <fwereade> hm, I thought that ensemble shutdown was intended to wipe out the whole environment -- inverse of bootstrap
[14:49] <RoAkSoAx> fwereade: that's one of the things I'm also planning to address.
[14:49] <fwereade> that's what it seems to do on EC2 anyway :)
[14:49] <RoAkSoAx> fwereade: yes, on ec2 is non-expensive because you can fire up instances or destroy them on demand
[14:49] <fwereade> RoAkSoAx: heh, ok
[14:49] <RoAkSoAx> fwereade: but in real hardware is not the same approach
[14:50] <fwereade> RoAkSoAx: I've been working under the assumption that I should mirror ec2 behaviour as closely as possible
[14:51] <fwereade> RoAkSoAx: at least for now ;)
[14:51] <RoAkSoAx> fwereade: yes, but I think things like that
[14:51] <RoAkSoAx> can be avoided for now
[14:51] <RoAkSoAx> fwereade: I mean, wiping out provider-state is a super minor change
[14:51] <RoAkSoAx> and I don't think it is necessary
[14:52] <fwereade> RoAkSoAx: well, keeping a zookeeper around is quite a major difference, it seems to me :)
[14:52] <fwereade> RoAkSoAx: well... it's very convenient for me :)
[14:52] <RoAkSoAx> fwereade: indeed, but again, we are dealing with real hardware in this case
[14:53] <RoAkSoAx> fwereade: sysadmins *won't* install zookeepers every week to deploy environments but rather, they would keep one zookeeper up and running at all times
[14:54] <RoAkSoAx> fwereade: it is expensive in many ways, 1: downtime 2. network bandwidth wasted 3. hardware is useless 4. reinstallations at all times are expensive
[14:54] <RoAkSoAx> fwereade: why this works on ec2? simply because i can fire up/destroy instances on demand and costs 2 cents?
[14:54] <RoAkSoAx> fwereade: where there's a prebuilt image
[14:55] <fwereade> RoAkSoAx: heh, got you, it's the system install cost not the zookeeper install cost (right?)
[14:55] <RoAkSoAx> fwereade: yes
[14:55] <fwereade> RoAkSoAx: ...but we still pay the system install cost for every other machine, right?
[14:55] <RoAkSoAx> fwereade: right, *but* the idea is now to figure out a way of *re-using* the machines instead
[14:55] <fwereade> RoAkSoAx: and if we have a local mirror it's not going to be *that* big a difference is it?
[14:56] <fwereade> RoAkSoAx: ha -- I see :)
[14:56] <fwereade> RoAkSoAx: that goal has escaped me
[14:56] <fwereade> RoAkSoAx: sorry :)
[14:56] <RoAkSoAx> fwereade: hehe but yeah having a mirror is still big difference when deploying services
[14:57] <RoAkSoAx> cause it still uses bandwidth
[14:57] <RoAkSoAx> and multiplied by lots of servers
[14:57] <RoAkSoAx> it is huge
[14:57] <RoAkSoAx> fwereade: but yes, that's another thing that I was gonna bring up during the sprint
[14:57] <fwereade> RoAkSoAx: you make a lot of sense
[14:58]  * RoAkSoAx better starts writing down all this stuff otherwise he'll forget :)
[14:58] <fwereade> RoAkSoAx: it just doesn't precisely fit with what I'd understood our goals to be -- I thought we were aiming for parity with ec2 for now, and figuring out the tricky stuff at the sprint
[14:58]  * fwereade would appreciate that :)
[14:59] <RoAkSoAx> fwereade: yeah we can do that if you want too
[14:59] <RoAkSoAx> fwereade: dealing with VM's is as inexpensive as ec2
[14:59] <niemeyer> I seem to remember the wiki sent us to the right page after authenticating
[14:59] <niemeyer> It doesn't look like that's the case anymore
[15:00] <fwereade> RoAkSoAx: well, that's my justification for what I've been doing
[15:00] <RoAkSoAx> fwereade: but right now, what I was personally looking for is having it bootstrapping, deploying, etc, working (not really exactly the same as ec2, but close), so that during the sprint we could address these issues and differences with ec2
[15:00] <fwereade> RoAkSoAx: I feel it's currently useful, towards that goal, even if things change as the plans firm up
[15:01] <RoAkSoAx> fwereade: you don't need to justify as we didn't set any boundaries about stuff like these when we started
[15:01] <fwereade> RoAkSoAx: that's my idea too, with the added condition of "on my local VM network"
[15:01] <RoAkSoAx> fwereade: but my concern is that you might end up writing code that might be later dismissed :)
[15:01] <fwereade> RoAkSoAx: deleting code is one of the great joys in life ;)
[15:02] <RoAkSoAx> fwereade: hehehe alright
[15:02] <RoAkSoAx> fwereade: again I don't mind you doing that, seriously, just giving you a broad view of what I have in my mind at the moment :)
[15:02] <fwereade> RoAkSoAx: cool, I was worried I was going off into the weeds :)
[15:02] <fwereade> RoAkSoAx: good to resync ;)
[15:03] <RoAkSoAx> fwereade: nah.. either way, this things are gonna have to be discussed next week so my thoughts may change given input from others
[15:04] <RoAkSoAx> these*
[15:04] <fwereade> RoAkSoAx: cool -- anyway, I'll handle 204s on .put() and propose launch-machine and bootstrap-verify-storage
[15:04] <RoAkSoAx> fwereade: cool
[15:04] <fwereade> RoAkSoAx: and that'll probably be my day, but I might be able to check in later when cath's gone to bed
[15:05] <fwereade> RoAkSoAx: I'll make sure shadow-trunk is up to date with whatever I've proposed
[15:05] <RoAkSoAx> i'll work on reviewing what would be missing from shadow-trunk in comparison to bootstrap's branch
[15:05] <fwereade> fantastic
[15:54] <niemeyer> RoAkSoAx: destroy-environment should really destroy it..
[15:55] <niemeyer> RoAkSoAx: I agree with you that physical hardware may make the admin act differently
[15:55] <niemeyer> RoAkSoAx: E.g. not destroying the environment
[15:55] <niemeyer> RoAkSoAx: It should be possible to terminate services and take them off the machines so that we can reuse not only the bootstrap machine but all of them
[15:55] <fwereade> everyone: I need to be away sharpish, I'm afraid
[15:55] <niemeyer> RoAkSoAx: But that's about _using_ the env
[15:55] <niemeyer> RoAkSoAx: Without destroying it
[15:56] <niemeyer> RoAkSoAx: Having ensemble destroy-environment not destroying it for reuse would be awkward
[15:56] <fwereade> but I have a couple of new mps, and I would appreciate reviews from one and all, either on those or on their various prerequisites :)
[15:56] <fwereade> enjoy your afternoons :)
[15:57] <niemeyer> I'm stepping out as well, but for lunch.. biab
[16:04] <RoAkSoAx> niemeyer: right, but from my point of view, destroy an environment should destroy everything, but leave the information from the zookeeper, so next time someone will like to bootstrap, it can detect "hey there's already a zookeeper, if it is sleeping, let's wake it up, if it is awake, let's use it"
[16:05] <RoAkSoAx> niemeyer: and that way we save ourselves from reinstalling a machine again
[16:47] <niemeyer> RoAkSoAx: zk is part of the environment
[16:47] <niemeyer> RoAkSoAx: It's actually a key part of it
[16:48] <niemeyer> RoAkSoAx: If one wants to save the time to redeploy zk, just don't destroy the environment
[16:48] <niemeyer> RoAkSoAx: It's a "doctor, it hurts!" case :)
[16:48] <_mup_> ensemble/states-with-principals r303 committed by kapil.thangavelu@canonical.com
[16:48] <_mup_> statebase retry topology change respects change functions which yield control.
[17:55] <niemeyer> bcsaller: How's it going with the local dev stuff?
[17:57] <bcsaller> niemeyer: I'm working on trying to add flexibility to how machine assignment is done in deploy/add_unit. Those both use state.service.assign_to_unassigned_machine which clearly isn't always what we want.
[17:58] <bcsaller> niemeyer: but specifying machines in deploy/add-unit is a little at odds with the co-location spec. It's a different axis to plot unit placement on
[17:58] <niemeyer> bcsaller: Don't worry about co-location for the moment..
[17:58] <niemeyer> bcsaller: This is really a different angle of the problem
[17:58] <bcsaller> just keeping it in mind
[17:58] <niemeyer> bcsaller: Cool, that's nice
[17:59] <niemeyer> bcsaller: Hmm.. but we do have specific assignment, right?
[17:59] <niemeyer> bcsaller: assign_to_unassigned is just one method we have
[17:59] <bcsaller> its the only one used
[17:59] <bcsaller> in the cli
[18:00] <bcsaller> so really it becomes about providing access to other means for placement (as a starting point)
[18:01] <bcsaller> I know there is a desire down the road to say things like `ensemble add-unit -n <num> service`
[18:01] <bcsaller> but if deploy and add-unit grow syntax to support machine assignment I want it to be future friendly 
[18:02] <niemeyer> bcsaller: Have you seen assign_to_machine?
[18:02] <bcsaller> yes
[18:03] <bcsaller> niemeyer: I think the issue comes in at the cli level to be clear
[18:03] <niemeyer> bcsaller: That's why I don't get the problem you're describing.. sure, we have assign_to_unassigned_machine, which is the hard one..
[18:03] <niemeyer> bcsaller: We also have an explicit one
[18:03] <niemeyer> bcsaller: Which is easy to use
[18:03] <bcsaller> niemeyer: its literally an issue of cli syntax I'm talking about, not a coding hurdle 
[18:03] <niemeyer> bcsaller: Ahh, ok
[18:04] <bcsaller> I don't want to blindly add new syntax that isn't friendly to the other efforts we have in mind 
[18:04] <niemeyer> bcsaller: 100% with you
[18:05] <niemeyer> bcsaller: Hmmm
[18:07] <niemeyer> bcsaller: Here is an idea..
[18:07] <SpamapS> How is this at all relevant to local dev?
[18:07] <SpamapS> There's only one machine in local dev.
[18:07] <niemeyer> bcsaller: Let's introduce a command named "set-devel-flag"
[18:07] <niemeyer> SpamapS: Let's cover this in a moment..
[18:07] <SpamapS> Which would be "available" because it can add containers.
[18:08] <niemeyer> bcsaller: Or even better, "set-devel"
[18:08] <bcsaller> SpamapS: thats an important part of the change, but now the cli tools only look for unassigned machines so its a little more pervasive 
[18:08] <niemeyer> bcsaller: Takes a json blob
[18:09] <niemeyer> bcsaller: and stores it in zookeeper, within the topology in a "devel" key
[18:09] <SpamapS> So to me, the current way, "find me an available machine" should just find you machine 0 .. your local machine. For EC2, since they can't do containers, they are unavailable as soon as they have 1 thing on them.
[18:09] <niemeyer> bcsaller: So we can experiment with different settings
[18:10] <niemeyer> SpamapS: Don't worry about it.. we're just splitting development in logical steps
[18:10] <niemeyer> SpamapS: We'll eventually give you the feature you want.
[18:10] <niemeyer> bcsaller: Or maybe it should really be "set-flag"
[18:10] <niemeyer> bcsaller: So that we can use that later
[18:11] <niemeyer> bcsaller: (rather than being specific to "development")
[18:11] <niemeyer> bcsaller: This way you can create an alternative path within the logic by consulting specific flags
[18:11] <niemeyer> bcsaller: Without altering the standard operation
[18:11] <niemeyer> bcsaller: Thoughts?
[18:11] <bcsaller> niemeyer: we could easily add arguments to deploy/add-unit that were conceptually --placement <strategy_or_plan> where it could be a machine id or the name of an available planner which could choose local, reuse, etc
[18:11] <bcsaller> as a counter proposal 
[18:12] <niemeyer> bcsaller: Yes, we could, .. we'd also have to worry about getting it right.. you already spent a day thinking about it and didn't get to a good plan, so my suggestion is to get unblocked and
[18:12] <bcsaller> std ops through the code paths would all have to check those flags, which is fine, we want something like that anyway
[18:12] <niemeyer> bcsaller: have the actual goal in mind for the moment.. we can worry about neat placement strategies down the road
[18:12] <niemeyer> bcsaller: The problem we have at hand right now doesn't depend on this
[18:12] <bcsaller> I don't need to build those now, that wasn't the point
[18:13] <niemeyer> bcsaller: That's my point! ;-0
[18:13] <bcsaller> +1
[18:14] <bcsaller> I find that syntax better than talking about setting development flags in a json bucket, but under the hood it will play out much the same from the internals of those tools
[18:14] <niemeyer> bcsaller: So every time you do deploy wordpress/mysql/etc, you'll have --placement ?
[18:14] <cole> roadmap question:  I get that ~/.ensemble/environments.yaml can be very easily modified to scale an app.  is there a framework for allowing this to be done based on some performance threshold? like memory consumed or cpu utilization / overall cluster throughput etc… ??
[18:15] <niemeyer> bcsaller: In local development there can't be anything besides --placement=local
[18:15] <niemeyer> bcsaller: So where do you store the fact placement _has_ to be local?
[18:15] <bcsaller> niemeyer: it would just default to doing what it does, "unassigned" which points to the existing method
[18:16] <bcsaller> `local` is a method that says return machine<0>
[18:16] <niemeyer> bcsaller: Ok.. that sounds good as well.. can you please describe the semantics end-to-end?
[18:16] <niemeyer> cole: We'll be with you in a sec
[18:17] <bcsaller> `ensemble deploy --placement local mysql`
[18:17] <bcsaller> `ensemble deploy --placement local wordpress`
[18:17] <bcsaller> would place two units and assign them to the machine returned by the policy, in this case machine 0 which is the local box
[18:19] <bcsaller> internally this would replace the code in deploy and add unit that maps/finds machines and does unit assignment with a callout to policy by name. If that option is an int, the machine id is resolved and used with a different policy function doing specific assignment
[18:19] <bcsaller> add-unit -n <num> --placement xxx could still be strange, with a policy it could work, with a machine id... ?
[18:19] <bcsaller> but that doesn't seem to be a blocker to me 
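The `--placement` semantics bcsaller outlines (a named policy, defaulting to "unassigned", with an integer treated as a specific machine id) can be sketched as a simple dispatch table. This is an illustration only, not ensemble code: `POLICIES`, `place_unassigned`, `place_local`, `resolve_placement`, and the `machines` mapping (machine id to list of unit names) are all hypothetical.

```python
def place_unassigned(machines):
    """Pick the lowest-numbered machine that has no units assigned."""
    for machine_id in sorted(machines):
        if not machines[machine_id]:
            return machine_id
    raise LookupError("no unassigned machine available")


def place_local(machines):
    """Always pick machine 0, the local box."""
    return 0


# Named policies selectable from the command line.
POLICIES = {"unassigned": place_unassigned, "local": place_local}


def resolve_placement(option, machines):
    """Map a --placement value (policy name or machine id) to a machine id."""
    if option is None:
        option = "unassigned"  # default keeps the existing behaviour
    if str(option).isdigit():
        machine_id = int(option)
        if machine_id not in machines:
            raise LookupError("unknown machine id: %d" % machine_id)
        return machine_id
    if option not in POLICIES:
        raise ValueError("unknown placement policy: %r" % (option,))
    return POLICIES[option](machines)
```

For example, with `machines = {0: ["mysql/0"], 1: []}`, the "local" policy returns machine 0 even though it already hosts a unit, while the default policy returns machine 1.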
[18:21] <hazmat> niemeyer, bcsaller unrelated to current discussion, i was looking over the co-location stuff on the ML, and was wondering if this isn't easier with the relation qualification co-located or a new relation type container, the distinguishing characteristic is the physical placement, its odd indeed for a local co-located service to talk to an opposite end remote service. its more of a local either p2p relations between those units deployed in the same container, or a bus/ring container relation containing only the local units.
[18:21] <hazmat> bcsaller, that sounds good if default placement policy can derive from provider
[18:21] <niemeyer> bcsaller: We don't have to address specific assignment for the moment
[18:21] <hazmat> thus obviating the need for specifying it in the common case
[18:21] <niemeyer> bcsaller: I want to avoid the "I want this in machine X" feature for now
[18:21] <niemeyer> bcsaller: Because it blocks other characteristics we're interested in
[18:22] <bcsaller> niemeyer: I prefer that as well 
[18:22] <niemeyer> hazmat: Sorry, I'll be with you soon.. let me unroll the stack of questions
[18:22] <niemeyer> bcsaller: Ok
[18:22] <bcsaller> niemeyer: a couple of named policies that map cli stuff to the service assignment code then is pretty simple and seems future aware
[18:23] <niemeyer> bcsaller: So --placement local sounds fine to bootstrap.. the local provider can somehow determine the default policy down the road
[18:23] <bcsaller> hazmat: it makes total sense that providers can carry code for specific policies 
[18:23] <hazmat> cole, its definitely something we're thinking about, but its probably a ways out, we're currently working out how to get things like default service monitoring onto systems. in future with monitoring, and a remote api for ensemble, a user could provide scaling logic, its probably a while till ensemble provides it as a generic feature.
[18:24] <hazmat> bcsaller, not that they should per se have code, ideally it could be generic, just that they specify a default named policy
[18:24] <niemeyer> bcsaller: The point was more that we need to tweak default policy according to backend
[18:24] <bcsaller> ok
[18:24] <niemeyer> bcsaller: We don't want --placement switches on every single call on a local dev
[18:25] <cole> hazmat: thanks!  I figured as much.  I think we might be able to help in that area.  project looks like it's coming along nicely!
[18:25] <bcsaller> right
[18:25] <bcsaller> got it
[18:25] <niemeyer> bcsaller: But I see your overall plan, it's a good idea, +1
[18:25] <bcsaller> cool, I can work on a branch for that today, sounds pretty simple
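The named-policy idea discussed above can be sketched roughly as follows. This is a minimal illustration, not Ensemble's actual code: the function names, the registry, the provider-default table, and the behaviour of "unassigned" (pick the first unused machine, else a new id) are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical signature: a policy gets the known machine ids and the ids
# already in use, and returns the machine id the new unit should go on.
PlacementPolicy = Callable[[List[int], Set[int]], int]


def place_local(machine_ids: List[int], used_ids: Set[int]) -> int:
    """The `local` policy from the discussion: always return machine 0."""
    return 0


def place_unassigned(machine_ids: List[int], used_ids: Set[int]) -> int:
    """Assumed stand-in for the existing behaviour: first unused machine,
    otherwise a fresh id that the caller would have to provision."""
    for machine_id in sorted(machine_ids):
        if machine_id not in used_ids:
            return machine_id
    return max(machine_ids, default=-1) + 1


# Policies are looked up by the name given to --placement.
POLICIES: Dict[str, PlacementPolicy] = {
    "local": place_local,
    "unassigned": place_unassigned,
}

# Assumed mapping so a provider can name its default policy, which avoids
# passing --placement on every call in local development.
PROVIDER_DEFAULTS: Dict[str, str] = {"local": "local"}


def place_unit(machine_ids: List[int], used_ids: Set[int],
               policy_name: Optional[str] = None,
               provider_type: str = "ec2") -> int:
    """Resolve the policy by name (or provider default) and call it."""
    name = policy_name or PROVIDER_DEFAULTS.get(provider_type, "unassigned")
    try:
        policy = POLICIES[name]
    except KeyError:
        raise ValueError("unknown placement policy: %r" % name)
    return policy(machine_ids, used_ids)
```

With this shape, `ensemble deploy --placement local mysql` would correspond to `place_unit(..., policy_name="local")`, and omitting the flag falls back to whatever the provider names as its default.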
[18:26] <hazmat> cole, fwiw, as is though ensemble cli already enables the ability to scale a service and automatically reconfigure clusters for the additional capacity, just not as the automated scaling bit in response to service conditions.
[18:27] <niemeyer> SpamapS: So..
[18:27] <cole> hazmat: yep, got it!
[18:27] <niemeyer> SpamapS: The way the work is being structured is this:
 1) Make multiple units work on a single machine across the board (no LXC)
 2) Make local deployments work with one or multiple units (no LXC)
 3) Make LXC work to deploy units locally (doesn't matter if EC2 can't do it yet)
[18:28] <niemeyer> SpamapS: bcsaller is working on step (1) still (he started yesterday :-)
[18:29] <SpamapS> Cool, I had a branch that did 1 with --machine $machine_id .. tho it was failing tests last I checked.
[18:31] <niemeyer> SpamapS: That's exactly the context of the conversation.. I don't want to nail the problem of specific assignment for the moment.. there are other approaches we can take for that (resource interest, service proximity, etc), and it's really unrelated to the core problem we're solving for local development
[18:31] <niemeyer> SpamapS: So I had one suggestion, and bcsaller has a better suggestion which we'll go down with.. --placement local..
[18:32] <niemeyer> SpamapS: This is a trivial bootstrap process that keeps the complex problems for later
[18:32] <bcsaller> SpamapS: where using a local provider would change the default placement policy for you
[18:36] <SpamapS> ACK
[18:37] <hazmat> niemeyer, although placement considering the service to be deployed (resource interest, service proximity)  will need to receive it as part of the placement api
[18:37] <niemeyer> hazmat: Ok, re. co-location.. I agree the flag on the relation is probably all we need
[18:37] <niemeyer> hazmat: I don't see it as being special, though
[18:38] <niemeyer> hazmat: These relations still need well defined interfaces
[18:38] <hazmat> niemeyer, yeah.. well its not clear that a local co-located service needs to have any access to the remote units
[18:38] <niemeyer> hazmat: They don't _have_ to
[18:38] <hazmat> er. its opposite end
[18:38] <niemeyer> hazmat: But they should be _able_ to
[18:39] <niemeyer> hazmat: re. the placement point above, yes, I'm not trying to define how that's going to work right now
[18:39] <niemeyer> hazmat: Was rather just mentioning there are additional things we'll want to talk about and understand when sorting this actual issue
[18:39] <niemeyer> hazmat: The problem we have at hand right now is much simpler, though
[18:39] <hazmat> indeed
[18:42] <hazmat> okay.. i did some reviews and security work today, switching tracks i'm going to do a presentation tonight at a local python user group, going to prep for that
[18:43] <niemeyer> hazmat: Cool.. I'll switch to reviews.. is there something blocking you on that front?
[18:43] <niemeyer> I'd like to sort all of William's branches today, hopefully
[18:45] <hazmat> niemeyer, nope.. i've just been going through william's branches.. on the security front, the integration work is coming along, i've reworked the interfaces a few times, most recently to enable us to turn off security by default for tests (default for now is enabled), still a little bit of refactoring to do on the policy.. i'm trying to finish the end to end so i can fix up policy-rules branch based on better knowledge of its application.
[18:45] <niemeyer> hazmat: Cool
[19:08] <niemeyer> Huge wind storm here today
[19:11] <jcastro> how do you move between VTs in the tmux thing when you're in debug mode?
[19:17] <hazmat> jcastro, ctrl-a
[19:18] <hazmat> is the escape sequence, tmux config in debug-mode is setup to emulate screen
[19:19] <jcastro> ah, been spoiled by byobu I guess, heh
[19:19]  * jcastro finishes up his ensemble screencast
[19:27] <niemeyer> jcastro: We hope to use byobu again at some point
[19:27] <jcastro> easy to forget how spoiled I was
[19:27] <niemeyer> jcastro: kirkland is working on a set of configs for tmux, and hopefully we can also bring screen back in the future
[20:18] <hazmat> niemeyer, any progress on the repo work?
[20:18] <hazmat> just using the principia-tools to setup a demo.. and thinking ick
[20:20] <niemeyer> hazmat: None..
[20:21] <niemeyer> hazmat: Stuck on reviews, interviews, conversations, etc
[20:21] <niemeyer> hazmat: Hoping to get to it this week still
[20:59] <SpamapS> Ok I just uploaded txzookeeper 0.8.0 to oneiric.. and will upload trunk shortly as well.
[21:00] <SpamapS> hazmat: If there's anything minor I can do to make principia less "ick" .. let me know. I've tried to make it a little better of late. :-P
[21:00] <SpamapS> hazmat: don't want to spend much time on it though.. :)
[21:01] <hazmat> SpamapS, i appreciate the work on it, just wishing for a repository to obviate the need for additional tools to deploy
[21:01] <SpamapS> hazmat: exactly
[21:02] <SpamapS> hazmat: I'd like a better repo too.. principia is.. well a nice experiment. :()
[21:02] <SpamapS> hazmat: note that there's a 'principia update' command now.. which pulls a new list of formulas
[21:02] <SpamapS> hazmat: and some of the commands have --help
[21:03] <jcastro> SpamapS: what!
[21:03] <jcastro> where?
[21:03] <hazmat> SpamapS, if i had to capture in one line the three things to make the dev story better.. it would be "local dev, no formula revs, pre-allocate machines"
[21:03] <SpamapS> jcastro: in the ppa
[21:03] <jcastro> oh man
[21:03] <SpamapS> jcastro: sudo apt-get install principia-tools
[21:03] <jcastro> I totally missed that
[21:03] <hazmat> SpamapS, getall by itself seems to do the trick of updating (mr seems to do it)
[21:03] <jcastro> also, check it out: http://www.youtube.com/watch?v=4Rl7wTlUqkY
[21:03] <SpamapS> hazmat: getall calls update :)
[21:03] <hazmat> well of grabbing new formulas
[21:03] <hazmat> nice :-)
[21:04] <hazmat> SpamapS, was that a good summation of things? or are there others that get top billing?
[21:04] <SpamapS> hazmat: yeah definitely.. though I have to say, the formula dev story is already pretty damn good.. our standards just keep going up. :)
[21:05] <hazmat> SpamapS, indeed, but precious seconds get lost, and turned into minutes.. we keep getting busier ;-)
[21:06] <hazmat> i think i figured out a quick way to pre-allocate machines, but the allocation doesn't take place till the first formula is deployed
[21:06] <hazmat> which is kind of a bummer, it's more like a delayed pre-allocation
[21:06] <SpamapS> yeah I think it actually makes sense to enable it as its own command
[21:06] <hazmat> SpamapS, like add-machines 5 ?
[21:07] <SpamapS> ensemble bootstrap && ensemble allocate-machines --ec2.instance-type=m1.small 10
[21:07] <SpamapS> It would be cool to have every aspect of the environment available as --env.x=foo
[21:08] <SpamapS> Would solve a lot of the "need a way to specify X at runtime"
[21:08] <SpamapS> jcastro: cool video
[21:10] <hazmat> hmmm.. that sounds good re allocate-machines.. the env.x syntax is likely problematic.. it's kind of redundant in that the cli is already targeting an env for any op, so the qualification is odd
[21:11] <SpamapS> hazmat: its to prevent namespace collision
[21:11] <SpamapS> hazmat: doesn't have to be ec2. .. could be --envset instance-type=m1.small
[21:12] <SpamapS> hazmat: or just bury it in the positional args
[21:12] <SpamapS> hazmat: just seems like a good idea to be able to override settings at runtime, that's all
[21:17] <hazmat> SpamapS, ic.. i was thinking just allocate-machines --provider-size=m1.small  4
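The runtime-override idea floated above (the `--envset instance-type=m1.small` variant) could look something like the sketch below. Nothing here is an existing ensemble command: `allocate-machines`, the `--envset` flag, and the settings dictionary are hypothetical and only illustrate merging KEY=VALUE overrides over the settings loaded from the environment.

```python
import argparse
from typing import Dict, List, Tuple


def parse_allocate_machines(argv: List[str],
                            env_settings: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Parse a hypothetical `allocate-machines` call and return the machine
    count plus the environment settings with runtime overrides applied."""
    parser = argparse.ArgumentParser(prog="allocate-machines")
    # Repeatable flag; each occurrence overrides one environment setting.
    parser.add_argument("--envset", action="append", default=[],
                        metavar="KEY=VALUE")
    parser.add_argument("count", type=int, help="number of machines")
    args = parser.parse_args(argv)

    merged = dict(env_settings)  # leave the caller's settings untouched
    for item in args.envset:
        key, sep, value = item.partition("=")
        if not sep or not key:
            parser.error("expected KEY=VALUE, got %r" % item)
        merged[key] = value
    return args.count, merged
```

For example, `parse_allocate_machines(["--envset", "instance-type=m1.small", "10"], settings)` yields a count of 10 and a copy of `settings` with `instance-type` replaced, which keeps the override scoped to that one invocation.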
[22:43] <niemeyer> Ugh.. almost 8
[22:43] <niemeyer> Time flew by today
[23:02] <niemeyer> ALRIGHT!
[23:02] <niemeyer> We have an almost empty review queue!
[23:02] <niemeyer> It's been a while..
[23:02] <niemeyer> But!
[23:02] <niemeyer> We still need  a hand on this one:
[23:02] <niemeyer> https://code.launchpad.net/~fwereade/ensemble/webdav-storage
[23:02] <niemeyer> It lacks a second review
[23:02] <niemeyer> Any takers?
[23:35] <_mup_> Bug #820107 was filed: Ensemble should enable flexible unit placement <Ensemble:In Progress by bcsaller> < https://launchpad.net/bugs/820107 >