/srv/irclogs.ubuntu.com/2011/12/07/#juju.txt

nijaba	SpamapS: merge done	00:02
nijaba	resubmitted	00:02
* nijaba off to see morpheus		00:03
SpamapS	nijaba: merging now, THANKS!	00:04
nijaba	SpamapS: my pleasure	00:05
* nijaba really having fun		00:05
_mup_	juju/ssh-known_hosts r427 committed by jim.baker@canonical.com	00:13
_mup_	Initial commit	00:13
nijaba	SpamapS: was wondering why having a readme is not mandatory for charms. Aren't they a bit dry without built-in docs? Shouldn't juju offer a "man" option to access documentation about charms?	00:22
SpamapS	nijaba: I've thought about exactly that, having a 'charm info xxx' command that intelligently looks for README* readme* and cats them together into less would be cool. :)	00:23
SpamapS	s/less/$PAGER/	00:23
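
The 'charm info' idea above could be sketched roughly as follows. This is only an illustration under assumptions: the function names (find_readmes, charm_info_text, charm_info) are hypothetical and not part of juju, and it only looks at the top level of a local charm directory. pydoc.pager is used because it picks up $PAGER when run on a terminal.

```python
import glob
import os
import pydoc


def find_readmes(charm_dir):
    """Return sorted paths of README*/readme* files at the top of charm_dir."""
    paths = set()
    for pattern in ("README*", "readme*"):
        paths.update(glob.glob(os.path.join(charm_dir, pattern)))
    return sorted(p for p in paths if os.path.isfile(p))


def charm_info_text(charm_dir):
    """Concatenate every README found, each under a small filename header."""
    chunks = []
    for path in find_readmes(charm_dir):
        with open(path, encoding="utf-8", errors="replace") as handle:
            header = "==> %s <==" % os.path.basename(path)
            chunks.append("%s\n%s" % (header, handle.read()))
    return "\n".join(chunks)


def charm_info(charm_dir):
    """Page the combined README text (hypothetical 'charm info' behaviour)."""
    text = charm_info_text(charm_dir)
    pydoc.pager(text or "No README found in %s\n" % charm_dir)
```

Calling charm_info("path/to/charm") would page whatever READMEs exist; a charm with none just gets a one-line notice.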
SpamapS	nijaba: I think they also need a maintainer: field in metadata.yaml	00:24
nijaba	SpamapS: this last bit was acked a few days ago, IIRC	00:24
nijaba	SpamapS: I'll open a bug :)	00:24
adam_g	SpamapS: hey clint, should i push this lonely precise branch (lp:~gandelman-a/charm/precise/rabbitmq-server/900440) directly to lp:charm/precise/rabbitmq-server since there's nowhere to file a proper merge proposal?	00:35
SpamapS	adam_g: how about you push the oneiric branch into precise, then do a MP against that?	00:40
_mup_	juju/ssh-known_hosts r428 committed by jim.baker@canonical.com	00:40
_mup_	Support machine recycling	00:40
_mup_	Bug #901017 was filed: Juju should have a "info" or "man" option <juju:New> < https://launchpad.net/bugs/901017 >	00:40
adam_g	SpamapS: ahh	00:41
adam_g	SpamapS: hmm. no dice pushing the current lp:charm/rabbitmq-server up to lp:charm/precise/rabbitmq-server http://paste.ubuntu.com/762302/	00:45
SpamapS	adam_g: right.. I guess I have to do the "initialize" step before we can push to that series. :(	00:46
SpamapS	adam_g: can you at least push it to ~gandelman-a/charm/precise/... ?	00:46
adam_g	SpamapS: yeah, already there: lp:~gandelman-a/charm/precise/rabbitmq-server/900440	00:47
SpamapS	adam_g: so you can probably push the oneiric one to lp:~charmers/.....	00:48
SpamapS	adam_g: then do the MP	00:48
SpamapS	adam_g: meanwhile I'll try to figure out how we initialize the series	00:48
adam_g	that might work	00:49
adam_g	i could push to lp:~charmers/charm/precise/rabbitmq-server/trunk, suppose i'll propose against that...	00:53
SpamapS	yeah that will work	00:54
_mup_	juju/upgrade-config-defaults r429 committed by kapil.thangavelu@canonical.com	02:26
_mup_	use lazy computation of default values instead of recording them to config state	02:26
_mup_	juju/upgrade-config-defaults r430 committed by kapil.thangavelu@canonical.com	02:32
_mup_	config value validation no longer returns defaults	02:32
_mup_	juju/upgrade-config-defaults r431 committed by kapil.thangavelu@canonical.com	02:35
_mup_	no longer explicitly touch defaults in upgrade, the lazy computation suffices.	02:35
_mup_	Bug #901043 was filed: switch charm subcommand to change origin of charm and upgrade <juju:New> < https://launchpad.net/bugs/901043 >	02:47
hazmat	SpamapS, is bug 900517 different than the upgrade config defaults issue?	02:51
_mup_	Bug #900517: config-get on an int set to 0 does not return '0' but an empty string <juju:New> < https://launchpad.net/bugs/900517 >	02:51
* SpamapS reads		02:52
SpamapS	hazmat: its entirely possible that this was actually the same effect.	02:52
SpamapS	hazmat: easy to test that hypothesis	02:52
* hazmat does a UTSL		02:53
hazmat	SpamapS, i still haven't managed to reproduce bug 861928, i suspect its timing dependent, if you do manage to reproduce, it would be helpful to attach the entire provisioning agent log	02:55
_mup_	Bug #861928: provisioning agent gets confused when machines are terminated <juju:New for jimbaker> < https://launchpad.net/bugs/861928 >	02:55
SpamapS	hazmat: interesting	02:58
SpamapS	hazmat: you know.. kees was experiencing it on the oneiric version (r398) .. its possible that its been fixed inadvertently with some of the ZK / API fixes	02:58
hazmat	SpamapS, yeah.. jimbaker fixed another provisioning agent bug post oneiric afaicr	02:59
SpamapS	have we broken backward compatibility with r398 at all? I have half a mind to propose that we just put r427 in oneiric-updates	02:59
SpamapS	features be damned. ;)	03:00
SpamapS	only problem is.. we can't actually upgrade deployed environments	03:00
hazmat	SpamapS, i doubt that's an issue in practice	03:01
SpamapS	that we can't upgrade the provisioning agent?	03:01
hazmat	SpamapS, that there are long lived juju environments extant	03:01
SpamapS	kees had one very long lived for doing sbuild fanout	03:01
hazmat	SpamapS, but fair enough	03:01
hazmat	SpamapS, he shut it down though.. 45usd spend	03:02
SpamapS	until it stopped working	03:02
SpamapS	Anyway, I agree, nobody should have a long lived 11.10 juju cluster. :)	03:02
SpamapS	would be good to come up with an upgrade story for 12.04's juju	03:03
SpamapS	if william finishes the upstart job stuff.. we can at least put in the packages to stop/start the agents on upgrade	03:03
hazmat	indeed that will be key, we can probably do some sort of dance around that, but the biggest question mark on the upgrade story is just coordinating a code drop/rev across a cluster of different release series	03:05
hazmat	ideally just a binary drop..	03:05
SpamapS	Which is why I think we're going to eventually have to host juju packages on the juju service nodes	03:12
SpamapS	Otherwise precise won't be able to play in a "Q" managed cluster	03:12
SpamapS	should be fairly easy... each juju package just needs to include a script which builds itself for every series you want to support	03:14
SpamapS	and of course, we have to build a test suite which makes sure that actually works ;)	03:15
hazmat	SpamapS, re the config set to 0, afaics its not an issue	03:38
hazmat	hmm.. maybe it is	03:39
hazmat	something sounds familiar	03:39
SpamapS	I was thinking it might be an issue where the value might not be carefully checked for None	03:41
_mup_	juju/sshclient-refactor r428 committed by kapil.thangavelu@canonical.com	04:49
_mup_	refactor the sshclient (zk over ssh tunnel)	04:49
_mup_	juju/sshclient-refactor r429 committed by kapil.thangavelu@canonical.com	04:55
_mup_	increase the default timeout	04:55
_mup_	juju/sshclient-refactor r430 committed by kapil.thangavelu@canonical.com	04:55
_mup_	robust zk conn	04:55
* SpamapS cheers hazmat on		04:59
* hazmat falls asleep		05:06
rog	mornin'	07:41
TheMue	moo rog	08:19
rog	TheMue: yo	08:21
mpl	rog: is the example in the example dir of zookeeper working for you (that is, once you've replaced Init with Dial and fixed the err.String() calls)?	10:32
rog	mpl: i'll try it	10:32
mpl	rog: here I have two problems with it. 1) it doesn't return as it should if I don't have any zookeeper server running. 2) I get loads of error messages for error, coming apparently from this point: event := <-session (it doesn't get past there apparently).	10:34
mpl	s/for error//	10:34
rog	mpl: yeah, me too - loads of time out errors	10:38
rog	mpl: i think the timeout must be wrong	10:38
rog	mpl: yeah, the timeout should be 5e9 not 5000	10:39
rog	mpl: BTW i'm not sure what it should do if there's no zk server running	10:40
rog	mpl: here's my updated version: http://paste.ubuntu.com/762565/	10:40
mpl	rog: well, I don't know what it should do, but err should be != nil when Dial fails, and it seems it's not the case for me.	10:40
rog	mpl: i'm not sure that Dial can ever fail	10:41
mpl	oh	10:41
mpl	how come?	10:41
rog	mpl: because the connection itself is asynchronous	10:41
mpl	ah yes	10:41
mpl	good point, thx	10:41
mpl	so that err check is pretty moot	10:41
rog	mpl: i think that's wrong, and gustavo and i have talked about changing it in the past, but the changes haven't been made yet	10:41
mpl	ok, another thing I don't get, why do I get tons of messages and not just one? that chan read is not in a loop.	10:42
rog	mpl: looking at zk C source, it looks like the only way it can return an error is if the hosts arg is malformed	10:45
rog	mpl: the messages are printed by the zk client code	10:45
rog	mpl: (logging is turned on by default, which i think is wrong too)	10:46
mpl	rog: you mean they come from underlying calls of Dial?	10:49
rog	mpl: yeah - they come from within the C API	10:49
mpl	rog: and not in any case as a result of this: "event := <-session" ?	10:49
rog	mpl: indeed - that blocks until the connection is made. i don't know if zk ever decides that it can't connect.	10:50
mpl	rog: ok, that's reassuring then, thx.	10:50
rog	mpl: you can turn the debugging messages off	10:50
mpl	ah cool, it finally worked.	10:52
rog	mpl: zookeeper.SetLogLevel(0)	10:52
mpl	good to know, thx.	10:53
mpl	rog: ok, I'll elaborate from that example to play with ssh.	10:54
rog	mpl: sounds good	11:48
TheMue	re	12:30
hazmat	g'morning	12:45
TheMue	moo hazmat	12:55
TheMue	for documentation purposes: are there some special bazaar configuration settings for juju?	13:14
rog	TheMue: not as far as i know	13:35
TheMue	fine, makes it easier	13:35
TheMue	I'm working on a "Getting Started"	13:36
fwereade	hazmat, is there some reason you know of for the particular shape of the code around CharmUpgradeOperation?	13:38
fwereade	hazmat, because the workflow is perfectly capable of synchronising the state if we make the charm upgrade much more like a normal transition, but it's much hairier if there's a reason not to do it as a normal transition	13:40
hazmat	fwereade, not sure what you mean	13:52
hazmat	fwereade, you mean push more of the operation out of the watch callback and into the transition?	13:52
fwereade	hazmat, that everything done in CharmUpgradeOperation ought IMO to be done on the lifecycle, like the other things that happen as part of a state transition	13:54
fwereade	hazmat, and if we do that we can easily just call "self.workflow.synchronize(executor)" in place of the boolean tangle in the original MP	13:55
hazmat	fwereade, hmm. so my thought there is it's not something that is manageable completely internal to the lifecycle, it depends on external mutable persistent settings, which is very different than anything else in the lifecycle	13:55
fwereade	hazmat, on the service's charm id?	13:56
hazmat	ie. you can't just call lifecycle.upgrade() and expect it to work, the external state needs to be put in place first.. whereas you can call any of the other lifecycle methods	13:56
hazmat	fwereade, on the upgrade flag	13:56
fwereade	hazmat, hmm, hadn't had that perspective	13:56
hazmat	fwereade, i thought the plan was not to do anything on upgrade_error	13:57
hazmat	fwereade, how does this issue arise?	13:57
fwereade	hazmat, you recall the plan to make the workflow know how to set up the lifecycle and executor to match the current state	13:58
fwereade	hazmat, to do so, we need to be able to detect the errors which occur while the executor is paused, so we can restore it correctly	13:59
hazmat	fwereade, i thought we'd moved on to: it's an easy thing to distinguish in the upgrade transition, and we'll be dealing with disconnected op sync anyway, so an exact match isn't necessary (queueing in the background)	13:59
hazmat	fwereade, the error from while the executor is paused is noted in the state	14:00
fwereade	hazmat, how is it noted?	14:00
fwereade	hazmat, we don't even try to fire a transition until some time after we've stopped the executor	14:00
hazmat	fwereade, although juju could probably use a more robust setup there from pause, to enclose the rest in a try/except block	14:00
hazmat	fwereade, so from pause to transition, its set a zk value, and extract a charm to disk	14:02
hazmat	if the transition/hook fails we'll get into a recorded error state	14:02
fwereade	hazmat, and if anything goes wrong during the extract or the zookeeper set, we'll be in a weird state	14:02
hazmat	fwereade, a try/except around the others can manually fire transition to an error state	14:03
hazmat	on error	14:03
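
The try/except idea just described could look something like the sketch below. It is an assumption-laden illustration, not juju's code: guarded_upgrade, fire_transition and the transition names "upgrade_charm" / "upgrade_charm_error" are hypothetical stand-ins, and the risky steps (setting the zk value, extracting the charm) are passed in as plain callables.

```python
def guarded_upgrade(pause_executor, risky_steps, fire_transition):
    """Pause hook execution, run the risky steps, and record any failure.

    fire_transition and the transition names are hypothetical stand-ins.
    """
    pause_executor()
    try:
        for step in risky_steps:
            step()
    except Exception as error:
        # Anything that breaks between the pause and the hook becomes a
        # recorded error state instead of leaving the unit in limbo.
        fire_transition("upgrade_charm_error", str(error))
        return False
    # Only reached when every risky step succeeded.
    fire_transition("upgrade_charm", None)
    return True
```

The point of the shape is that a failure in the zk write or the extract ends up in the same recorded error state as a failing hook would.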
hazmat	fwereade, its an odd scenario regardless if we have a half extracted charm on disk	14:03
* hazmat ponders		14:04
fwereade	hazmat, agreed, but I don't think we can guarantee that that will never happen	14:04
hazmat	fwereade, agreed, although we can do a better job of minimizing, but its not clear that encompassing more to the error state, is helpful wrt to retry, the coordination state is gone on retry	14:10
hazmat	the flag is cleared, and we don't know that we can safely execute the upgrade hook again, because we don't know the state on disk or zk of the charm	14:11
hazmat	and if we re-enter the entire upgrade operation, we don't have the coordination state to trigger any changes, and it will early exit	14:12
fwereade	hazmat, isn't it just down to the order of operations?	14:12
hazmat	perhaps	14:12
fwereade	hazmat, if we extract, then set in ZK, then fire the hook	14:13
hazmat	fwereade, i don't see how that helps, the flag is cleared	14:13
hazmat	fwereade, and you can't set the flag in an error state	14:14
hazmat	fwereade, you're right though, an error here should be recorded as a charm upgrade error	14:14
fwereade	hazmat, because we can know by the unit charm id whether or not the extraction of the latest charm has completed; if it has we can move straight on to firing the hooks(or not) according to the "resolved" command	14:16
fwereade	hazmat, if the charm ids don't match, we start the operation from scratch	14:16
fwereade	hazmat, (when we retry)	14:16
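
The retry rule described here (compare the unit's charm id with the service's, redo the extraction only if they differ) can be written down as a small decision function. This is a sketch under assumptions: plan_upgrade_retry and the step labels are illustrative names, not juju calls, and run_hooks stands for whatever the operator requested via the "resolved" command.

```python
def plan_upgrade_retry(unit_charm_id, service_charm_id, run_hooks):
    """Return the ordered steps for retrying a failed charm upgrade.

    The step names are illustrative labels, not juju API calls.
    """
    steps = []
    if unit_charm_id != service_charm_id:
        # The extraction never completed, so start from scratch: extract,
        # then record the new charm id, in that order.
        steps += ["download", "extract", "set_unit_charm_id"]
    if run_hooks:
        # Only fire the upgrade hook when the operator asked for it.
        steps.append("run_upgrade_hook")
    return steps
```

Because the unit charm id is only written after a successful extract, matching ids are enough to know the files on disk are complete.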
hazmat	fwereade, so right now error states always refer to hook errors..	14:17
fwereade	hazmat, from the POV of the workflow state, which represents what the unit is actually doing, I feel that "half-extracted charm that's 100% broken" should absolutely represent an error	14:18
hazmat	fwereade, it definitely should, i'm just trying to work through the implications of changing the meaning of an error state, what retry means in this context, and changing the interactions/responsibilities of lifecycle compared to any extant uses.	14:20
hazmat	fwereade, there's a notion that upgrades flags shouldn't survive restarts, which is one reason why we cleared the flag early	14:20
hazmat	i'm trying to recall if there was more to it than that	14:21
* SpamapS stretches and yawns		14:21
hazmat	fwereade, so when would the upgrade flag get cleared?	14:22
fwereade	hazmat, my idea is that we clear the upgrade flag as soon as we see it, but we kick off an upgrade_charm transition, which is "started"->"started"	14:22
fwereade	hazmat, if we're not in a started state we just bail before we even try the transition	14:23
* hazmat nods		14:23
fwereade	hazmat, the lifecycle.upgrade_charm will do the early parts before stopping the hooks and quietly bail out on errors, equivalently to now	14:23
fwereade	hazmat, but once we hit the stop-hooks-start-messing-with-disk-state point, any subsequent errors should come out and be detected as transition failures	14:24
hazmat	fwereade, how do you re-enter the upgrade charm state?	14:24
hazmat	error state that is	14:24
hazmat	on a process restart	14:25
robbiew	hazmat: just got an email from fernanda...TZ mixup?	14:25
fwereade	hazmat, it's just an existing workflow state, I'm already in that state when I come up	14:25
hazmat	robbiew, doh.. indeed that is tz mixup, i thought it was +1 hr	14:26
hazmat	fwereade, but the process mem state is different	14:26
hazmat	fwereade, ah.. so the executor is still stopped	14:26
hazmat	because we never started the lifecycle, and we're not listening to any rel lifecycles	14:27
fwereade	hazmat, lifecycle.running and executor.running are not especially closely related	14:27
hazmat	fwereade, yup.. so if we restart in a charm upgrade error state.. the lifecycle is stopped, the exec is running, but nothing feeding into it	14:28
fwereade	hazmat, the executor needs to be stopped during upgrade error states	14:29
fwereade	hazmat, all the rest of the time it's fine	14:29
hazmat	fwereade, how does it get stopped on restart	14:29
fwereade	hazmat, we just don't start it explicitly, we let the workflow do so if it's in a state which needs it	14:30
hazmat	fwereade, and how is it any different than the lifecycle just being stopped	14:31
fwereade	hazmat, so it's just "self.workflow.synchronize(self.executor)" and then we're in the state we must have been in when we left off last time	14:31
fwereade	hazmat, from outside perspective no different, I guess -- no hooks are executing -- but... well, why exactly are we explicitly stopping the executor when we could just stop the lifecycle like we do with, say, configure?	14:33
fwereade	hazmat, ...only just thought of that :/	14:34
hazmat	robbiew, just rescheduled for 20m from now	14:34
robbiew	hazmat: cool	14:34
hazmat	fwereade, because the ability to run a hook now (ahead of any queued hooks) has a safety notion that the executor is stopped, in part to guarantee that there are no other currently executing hooks	14:37
hazmat	fwereade, i need to switch tracks for a little bit, but i'll definitely ponder this some more	14:38
fwereade_	hazmat, isn't the reason that the unit relation lifecycles' schedulers could still be busily executing queued hooks at any stage?	14:38
hazmat	fwereade, not sure if you saw this.. because the ability to run a hook now (ahead of any queued hooks) has a safety notion that the executor is stopped, in part to guarantee that there are no other currently executing hooks	14:39
fwereade_	hazmat, exactly so	14:39
hazmat	fwereade, i need to switch tracks for a little bit, but i'll definitely ponder this some more.. i think this is worthwhile.. part of the issue though on either an extract failure or a state change failure, is that it signals a significant problem	14:40
fwereade_	hazmat, I think it comes down to my conviction that we're better off restoring process state on startup -- which state can be encapsulated in 2 bools -- than we are by complicating the logic we run all the time	14:41
fwereade_	hazmat, ok, ttyl -- ping me to continue when you're free :)	14:41
hazmat	fwereade_, isn't restoring the state as simple as -> if not self.running: self.lifecycle.start(), else self.executor.start()	14:42
fwereade_	hazmat, well, "started" implies both running, but yeah, it's not complicated	14:42
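
The "state in 2 bools" idea can be illustrated with a tiny lookup table. Everything here is an assumption drawn from the conversation rather than from juju's source: the state names are assumed spellings, Switch is a stand-in for the real lifecycle and executor objects, and synchronize only mirrors the shape of the "self.workflow.synchronize(self.executor)" call mentioned earlier.

```python
# (lifecycle running, executor running) per workflow state, as described in
# the discussion; the state names are assumed spellings.
PROCESS_STATE = {
    "started": (True, True),
    "configure_error": (False, True),
    "charm_upgrade_error": (False, False),
}


class Switch(object):
    """Stand-in for a lifecycle or executor with a running flag."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True


def synchronize(state, lifecycle, executor):
    """Start only the parts that the persisted workflow state calls for."""
    lifecycle_on, executor_on = PROCESS_STATE[state]
    if executor_on and not executor.running:
        executor.start()
    if lifecycle_on and not lifecycle.running:
        lifecycle.start()
```

On startup nothing is started explicitly; the agent would hand the recorded state to synchronize and end up where it left off.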
fwereade_	hazmat, you seemed at one stage to be arguing against it	14:42
hazmat	fwereade_, actually i was hoping for that since it was the simplest thing, but the notion that upgrade error should encapsulate non hook errors has some merit	14:48
hazmat	fwereade_, definitely worth exploring, and i think a good track	14:49
fwereade_	hazmat, I think it is the simplest thing	14:49
SpamapS	http://www.ustream.tv/channel/vclug-venturaphp .. me.. talking about juju to a local LUG ... unfortunately, the demo failed because I had a lucid AMI in my environments.yaml	15:33
SpamapS	totally forgot that I had been monkeying around with the AMI. :-P	15:33
SpamapS	Pretty much flies off the rails at 22:00	15:33
* SpamapS goes off to get the family out so he can get work done.		15:33
hazmat	fwereade, connectivity problems?	16:12
fwereade	hazmat, yeah, sorry about that, didn't actually notice it happening until just now	16:13
hazmat	fwereade, no worries	16:13
* kees waves "hi"		16:21
kees	so, I discussed some of the trouble I had with the provision here last sunday. not the best time for catching people, i realize.	16:22
kees	*provisioner	16:22
kees	SpamapS pointed me to where cloud-init does its work, but ultimately I wasn't able to get the provisioner back on its feet.	16:22
kees	hazmat: what's the best way for me to help debug the troubles I ran into?	16:23
SpamapS	kees: using the PPA version would go a long way to figuring out if this is already fixed or not.. which I suspect it may have been	16:25
SpamapS	kees: we still need to make the agents more robust and restartable, which fwereade is working on right now.. but I think some of the ZK stuff has been fixed since 11.10 released	16:25
kees	SpamapS: how do I find AMIs with the PPA version built-in?	16:26
kees	SpamapS: and why not SRU these fixes to Oneiric?	16:26
SpamapS	kees: you don't need an AMI.. you just add 'juju-origin: ppa' to your environment settings	16:26
SpamapS	kees: It's hard to isolate the fixes because there have been massive changes.	16:27
kees	SpamapS: hrm, let me try...	16:28
SpamapS	kees: also if your client version is from the PPA, it will automatically deploy with the PPA	16:28
* SpamapS curses himself for forgetting to run the test suite before commit to trunk.. https://launchpadlibrarian.net/86855907/buildlog_ubuntu-precise-i386.juju_0.5%2Bbzr428-1juju2~precise1_FAILEDTOBUILD.txt.gz		16:29
kees	SpamapS: if I just set "juju-origin: ppa", is that sufficient, or do I need to also install juju from the PPA?	16:29
* SpamapS puts on the cowboy hat		16:29
hazmat	kees, that's sufficient	16:30
SpamapS	Has the ZK schema bumped since r398?	16:30
hazmat	SpamapS, no	16:30
* kees attempts a bootstrap...		16:30
hazmat	SpamapS, there's been some minor additions, but no changes to the cli interactions	16:30
kees	one of the really goofy bugs I ran into was that --environment seemed to be ignored by a lot of commands	16:31
hazmat	kees, that's odd just about every command takes that option	16:32
hazmat	kees, it has to be specified after the sub command.	16:32
kees	i.e. I tried to do juju bootstrap --environment sample2 after my "sample" environment's provisioner freaked out.	16:32
kees	and then juju status --environment sample2 always failed.	16:32
kees	then I destroyed sample2, and then juju status couldn't find sample any more	16:32
kees	so I had to hard-code the instance list in the source to get control back.	16:33
SpamapS	kees: did they both have the same control-bucket ?	16:33
kees	what is a control-bucket? :)	16:33
SpamapS	kees: the thing that uniquely identifies an environment in the provider...	16:34
hazmat	kees, its an s3 bucket that's spec'd in environments.yaml.. its env specific	16:34
hazmat	it gets autogenerated the first time around, but it can't be copied between multiple environments, without causing issues	16:34
kees	ah, I see that now. does that get added automatically? I don't remember adding that or admin-secret	16:34
kees	yeah, that would totally be what happened then	16:35
kees	I just copied the entire "sample" section and changed the name.	16:35
hazmat	hmm.. we should probably warn/error if we see that come up	16:35
kees	heh, d'oh.	16:35
SpamapS	hazmat: yeah control-bucket should have the env name in it.. so we should be able to error out.. "control bucket foo has env name X not Y"	16:35
kees	seems like that should be stored somewhere else instead of injected into environment.yaml	16:35
kees	okay, well, that explains that glitch at least. :)	16:36
SpamapS	kees: its used by clients to find the ZK server, so it has to be in environments.yaml	16:36
SpamapS	Tho one thing that would work is to change it to control-bucket-prefix: .. and by default just prepend that to the env name.	16:37
hazmat	SpamapS, that would create an implicit fail scenario around changing an env name	16:37
kees	it might be nice to have the finding of the master instance show up in --verbose (i.e. the processing of the ec2 instance list, etc)	16:37
hazmat	although for local provider it already is	16:37
hazmat	since we use the env name on disk	16:38
kees	I spent a lot of time trying to figure out how juju was deciding which was a master instance when I broke it with sample2.	16:38
hazmat	kees, its always machine 0 atm	16:38
SpamapS	hazmat: err, env name can't be changed AFAICT, its used for so many things... ec2 group names for one.	16:38
hazmat	SpamapS, ugh.. good point	16:38
kees	hazmat: I mean the stuff before "Connecting to environment".	16:39
hazmat	SpamapS, that sounds quite sensible then.. along with a nice warning in the doc about it	16:39
kees	hazmat: when I bootstrapped using the same control-bucket, suddenly juju would only talk to the new instance	16:39
rog	could we derive the control bucket name by combining the env name and the access id in some way?	16:39
rog	thus removing the need for a user to invent another name	16:40
kees	https://juju.ubuntu.com/docs/getting-started.html#configuring-your-environment <- this could add some details about what the control bucket is.	16:40
_mup_	Bug #901311 was filed: automatically prefix control bucket with the environment name <juju:New> < https://launchpad.net/bugs/901311 >	16:41
rog	hazmat: could that work?	16:42
_mup_	juju/ssh-known_hosts r429 committed by jim.baker@canonical.com	16:42
_mup_	Merged trunk	16:42
SpamapS	rog: I'm a little hesitant to make use of the access key id in any permanent context	16:42
hazmat	robbiew, not sure we want to include access id its tied to an external/provider notion	16:43
rog	e.g. envname + salt + hash(salt+accessid)	16:43
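
A minimal sketch of the naming scheme rog proposes (envname + salt + hash(salt+accessid)), together with the "bucket foo has env name X not Y" check suggested earlier. The function names are hypothetical, SHA-1 is just one possible hash, and the digest is truncated to 16 hex characters here so that short environment names stay within S3's 63-character bucket name limit.

```python
import hashlib
import os


def control_bucket_name(env_name, access_id, salt=None):
    """Build env_name-salt-hash(salt + access_id); names are illustrative."""
    if salt is None:
        salt = os.urandom(4).hex()  # 8 random hex characters
    digest = hashlib.sha1((salt + access_id).encode("utf-8")).hexdigest()
    return "%s-%s-%s" % (env_name, salt, digest[:16])


def env_name_from_bucket(bucket):
    """Recover the environment name; salt and digest never contain '-'."""
    return bucket.rsplit("-", 2)[0]


def check_bucket(bucket, env_name):
    """Raise if the bucket was generated for a different environment."""
    found = env_name_from_bucket(bucket)
    if found != env_name:
        raise ValueError("control bucket %s has env name %s not %s"
                         % (bucket, found, env_name))
```

With a check like this, copying the "sample" section to "sample2" without changing control-bucket would fail loudly instead of silently sharing state.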
SpamapS	rog: they can be created and discarded quite often	16:43
hazmat	rog, &	16:43
kees	is there documentation on the potential contents of environment.yaml?	16:43
hazmat	rog, take orchestra for example.. what's an access id.. or local provider, its a provider specific notion	16:43
kees	e.g. how would I discover "juju-origin: ppa" otherwise?	16:43
rog	hazmat: does orchestra have a control-bucket field?	16:44
hazmat	kees, https://juju.ubuntu.com/docs/provider-configuration-ec2.html?highlight=origin	16:44
hazmat	rog, doh.. good point	16:44
kees	hazmat: ah-ha! thanks. I knew I'd found that before at some point.	16:44
* SpamapS goes OTP		16:44
kees	hazmat: maybe link to that from https://juju.ubuntu.com/docs/getting-started.html#configuring-your-environment ?	16:44
hazmat	rog, the other thing with access id, is it assumes the identity is shared across all users of the env	16:45
hazmat	rog, which is true/required atm for bootstrap/destroy-environment	16:45
kees	what about making "juju-origin" be "PPA" by default, since that should always be the latest/greatest? that could be SRUed to oneiric.	16:45
rog	hazmat: that's true.	16:45
rog	hazmat: but it might be a useful default	16:46
rog	hazmat: if there's no entry for control-bucket, for example	16:46
hazmat	rog, maybe not though.. they need access to bucket, which we have setup as private by default atm.. i just want to leave options open for delegation of access	16:46
rog	hazmat: if we want multiuser access, the bucket must be readable by other users, right?	16:47
hazmat	rog, yeah.. i'm not sure we'd ever make that not a required arg for ec2, if its an auto on deterministic setting.. well you can change your id or switch accounts, and then poof your env is gone	16:47
kees	hazmat: okay, so, I spawned a bunch of units, and I've hit exactly what I saw on Sunday.	16:47
kees	machines:	16:47
kees	...	16:47
kees	6: {dns-name: '', instance-id: i-0b65094c}	16:47
kees	...	16:48
kees	builder-debian/5:	16:48
kees	machine: 6	16:48
kees	public-address: null	16:48
kees	relations: {}	16:48
kees	state: null	16:48
kees	machine 6 hasn't been noticed, and the unit stays "public-address: null"	16:48
rog	hazmat: isn't that already true? (given that the bucket is private)	16:48
hazmat	kees, also fwiw the latest client btw shows more information on status regarding machine state (pending from the provider, running, etc)	16:48
hazmat	kees, public-address is null till the machine actually comes up and starts the machine agent..	16:49
hazmat	its not instantaneous	16:49
hazmat	it takes a minute, for the machine to launch, and have packages installed and to be available	16:49
kees	ah, well, it just came up. heh. sunday I waited though. it wasn't up after an hour.	16:49
hazmat	kees, definitely broken then, but its not something you can determine instantaneously is all i'm saying	16:50
kees	hazmat: right, absolutely.	16:50
hazmat	kees, what i'm trying to verify though is.. A) is the bug something we've already fixed in the ppa B) if not what's the provisioning agent log look like	16:50
* kees nods		16:50
kees	let me try to trigger the missing machine fault, one sec.	16:50
kees	kaboooom	16:52
kees	here was my steps:	16:52
kees	$ juju terminate-machine 10	16:52
kees	oops, ignore that	16:52
kees	steps:	16:52
kees	$ juju remove-unit builder-debian/7	16:52
kees	$ juju terminate-machine 10	16:52
kees	$ juju add-unit builder-debian	16:52
kees	at which point the provisioner explodes with python backtraces	16:53
kees	2011-12-07 08:52:01,217 provision:ec2: twisted ERROR: KeyError: 'Message'	16:53
kees	2011-12-07 08:52:01,217 provision:ec2: twisted ERROR: Logged from file provision.py, line 156	16:53
kees	what logs can I provide? :)	16:53
hazmat	kees, awesome. the log is in /var/log/juju .. i think its provisioning-agent.log but i'm not sure of the exact filename	16:54
hazmat	kees, its on machine 0 of the env	16:54
hazmat	i think i kept using destroy-service instead of remove-unit when i was trying to reproduce this	16:55
drt24	So I am trying to use orchestra as per http://cloud.ubuntu.com/2011/09/oneiric-server-deploy-server-fleets-p2/ and this is failing because following those instructions does not appear to result in pxe booting being setup correctly on the provisioning server.	16:56
kees	hazmat: http://paste.ubuntu.com/762905/	16:56
drt24	I now have got the dhcp server running but it still isn't configured to do pxe things properly.	16:56
drt24	and so I get "No filename" errors when trying to boot client VMs	16:57
hazmat	kees, thanks thats very helpful	16:57
hazmat	that looks like a bug in txaws	16:57
drt24	(this is on oneiric VMs)	16:57
kees	hazmat: cool, excellent.	16:58
kees	hazmat: I assume that moving to the ppa fixed the bring-up bug, or it's a hard race to lose and I just got "lucky" on sunday	16:58
hazmat	kees, it's really not racy normally, i'm sorry that was your first juju experience. the client cli status reporting is much better now about keeping the user informed about what's going on (is the provider machine up, is juju ready on the machine). the provisioning bug in particular has been a little hard to reproduce, and it's been unclear what version and what the bug is.. but i think thanks to your help we should be able to fix that in the next day or two. and indeed it seems to be a bug in txaws, in that it varies/reproduces based on ec2 error response variation.	17:06
kees	cool, thanks for looking into it!	17:09
TheMue	hazmat: you wrote about a presentation about juju. would you please send it to me?	17:10
kees	it was frustrating for sure, but it was still _way_ easier to bring up a bunch of identical instances this way.	17:10
kees	the charm stuff is nice :)	17:10
hazmat	TheMue, we have them shared in an ubuntu one folder atm	17:17
TheMue	hazmat: ah, ok. still have not used my account. so I'll try it now.	17:19
TheMue	hazmat: does it cover the dependencies of external components (like zk) and internal/external modules and libraries	17:20
TheMue	?	17:20
hazmat	TheMue, no	17:20
hazmat	TheMue, its a very high level architecture diagram	17:20
TheMue	hazmat: ok, but I think it will help	17:21
fwereade	hazmat, btw, need an opinion on how it's acceptable to detect unexpected shutdowns during the critical window of filesystem-screwage during upgrade-charm	17:48
fwereade	hazmat, the workflow state seems like such an obvious place to put it, but I don't think it's a good idea to fire a transition while midway through executing another transition	17:50
fwereade	hazmat, so if I were to do that I'd have to have a callback on workflow that called set_state on itself explicitly	17:51
fwereade	hazmat, which feels like a bit of a perversion of the state machine	17:51
fwereade	hazmat, hm, I have to stop now :( I'll pop back on later	17:52
hazmat	fwereade, doh.. sorry.. definitely i think your idea is good (collapse part of upgrade op into the transition), go for it	17:53
fwereade	hazmat, the issue is that I feel I should be able to handle the fact that the process could suddenly die while we're half way through extracting the charm	17:54
hazmat	fwereade, that's independent really of the workflow aspect	17:54
fwereade	hazmat, well, the trouble is it's intimately bound up with it, because if we come up from an incomplete upgrade we need to go into upgrade_charm_error state	17:56
fwereade	hazmat, it certainly can't go on the lifecycle, we don't want that explicitly controlling the workflow	17:56
kees	hazmat, SpamapS: if you're interested, I've got another set of juju blog posts up now:	17:56
kees	http://www.outflux.net/blog/archives/2011/12/07/juju-bug-fixing/	17:56
kees	http://www.outflux.net/blog/archives/2011/12/07/how-to-throw-an-ec2-party/	17:56
hazmat	fwereade, sure, that state can be signaled by the error handler, but the aspect of doing the upgrade in such a way as to handle unexpected errors is independent of the location of the code	17:57
fwereade	hazmat, I guess it could go on the unit agent itself, but it's a step in the opposite direction from the (IMO nice) move of state-reconciliation from unit agent to workflow	17:59
fwereade	hazmat, the workflow really feels like the right place for it	17:59
hazmat	fwereade, so what happens on a retry?	18:00
hazmat	of upgrade_error	18:00
fwereade	hazmat, the usual: if unit charm id doesn't match service charm id, download and unpack before running the hooks	18:01
fwereade	hazmat, and if it does, we know we're recovering from a state post-successful-replace, and we just fire the hooks if we're asked	18:02
hazmat	fwereade, sounds good	18:02
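A minimal Python sketch of the retry decision fwereade describes just above, for readers following the thread. Every name here (`retry_upgrade`, `download_and_unpack`, `run_hooks`, `fire_hooks`) is hypothetical and is not juju's API. It also assumes that "if we're asked" applies to the hooks in both branches.

```python
def retry_upgrade(unit_charm_id, service_charm_id, download_and_unpack,
                  run_hooks, fire_hooks=True):
    """Retry an upgrade from an error state (hypothetical helper, not juju's API).

    Returns the list of steps performed so the decision is easy to inspect.
    """
    steps = []
    if unit_charm_id != service_charm_id:
        # The charm on disk is not the one the service expects: the replace
        # never happened or was interrupted, so redo it before any hooks.
        download_and_unpack(service_charm_id)
        steps.append("download_and_unpack")
    # If the ids match, the replace already succeeded and only the hooks
    # may still need to run.
    if fire_hooks:
        run_hooks()
        steps.append("run_hooks")
    return steps
```

The point of comparing ids is that the retry needs no extra "how far did we get" flag: the unit's recorded charm id already says whether the replace completed.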
fwereade	hazmat, I'm just trying to figure out whether an "unlicensed" state transition, that doesn't go through the normal transition logic, is in any way acceptable	18:02
hazmat	fwereade, just make an additional transition	18:02
hazmat	fwereade, what's the scenario?	18:02
fwereade	hazmat, and it's explicitly OK to fire a transition in the course of another transition?	18:02
hazmat	fwereade, no.. but the lifecycle can call other lifecycle methods	18:03
fwereade	hazmat, when we hit the point of no return something needs to record the fact that we're in a risky state	18:03
kickinz1_	hi!	18:04
fwereade	hazmat, as said above I think the workflow is the right place for it	18:04
SpamapS	kees: ty, reading your posts now. ;)	18:04
hazmat	fwereade, huh? the transition handler itself is supposed to be risky/failable.. that's the benefit of it, and i thought the point.. it will record failures	18:04
kickinz1_	May I ask a question?	18:04
SpamapS	kees: btw, you should be able to use us-west-2 now ;)	18:05
hazmat	kickinz1_, sure	18:05
jimbaker	kees, cool post. i'm working on the ssh key management now, so that will take one step out of your process	18:05
kickinz1_	I'm in the process of using juju with orchestra	18:05
kickinz1_	When creating the boot strap, it fails with this error:	18:06
kickinz1_	/root/.juju/environments.yaml: environments.orchestra.default-series:	18:06
kickinz1_	The only place I see this is on bugs, but while using etckeeper.	18:06
hazmat	fwereade, ah.. i think we're agreeing.. i think the point of no return stuff should be in the transition handler with a conditional guard, hence failures there record state, and can be retried. sounds good.	18:07
kickinz1_	(https://bugs.launchpad.net/bugs/872553)	18:07
_mup_	Bug #872553: [SRU] upon creating a node via juju & orchestra, etckeeper hangs <verification-done> <Orchestra:Invalid by andreserl> <etckeeper (Ubuntu):Fix Released by kirkland> <orchestra (Ubuntu):Invalid by andreserl> <etckeeper (Ubuntu Oneiric):Fix Released by kirkland> <orchestra (Ubuntu Oneiric):Invalid by andreserl> < https://launchpad.net/bugs/872553 >	18:07
fwereade	hazmat, I'm not totally certain whether we're talking past one another or not, 1 sec	18:08
SpamapS	kickinz1_: can you maybe pastebin the whole error, like from $ juju .... to the next $ ?	18:08
kickinz1_	ok	18:08
fwereade	hazmat, I'm talking about something like this: http://paste.ubuntu.com/762978/	18:09
fwereade	hazmat, on UnitWorkflowState	18:10
fwereade	hazmat, damn, really must go, bbl	18:10
kickinz1_	http://pastebin.com/NNqBkiNn	18:10
kickinz1_	any idea?	18:15
kickinz1_	I'm using precise	18:16
hazmat	fwereade, the state changes should go in the watch callback not the workflow	18:16
hazmat	fwereade, the existing upgradecharm op will continue to exist, and it can do some basic checks, but it will kick off the state change after clearing the upgrade flag, the transition handler holds the rest of the code to the upgrade, it should be retryable cleanly, if it fails the unit goes into an upgrade_charm_error.	18:18
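One way to picture hazmat's suggestion (a guarded transition handler whose failures land the unit in `upgrade_charm_error` and can be retried) is the rough Python sketch below. Only the `upgrade_charm_error` state name comes from the discussion. The class, the `started` state, and the callables are assumptions for illustration, not juju's workflow code.

```python
class UpgradeWorkflowSketch:
    """Toy state holder; not juju's UnitWorkflowState."""

    def __init__(self):
        self.state = "started"  # assumed name for the normal running state

    def fire_upgrade(self, needs_upgrade, do_upgrade):
        # Conditional guard: skip the risky work when nothing is pending.
        if not needs_upgrade():
            return self.state
        try:
            # The point-of-no-return work lives inside the handler.
            do_upgrade()
        except Exception:
            # A failure is recorded as state, so it can be retried later.
            self.state = "upgrade_charm_error"
        else:
            self.state = "started"
        return self.state
```

Retrying is then just calling `fire_upgrade` again. The guard makes a repeat call harmless once the upgrade has gone through.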
hazmat	kickinz1_, do you have a default-series set in your environments.yaml ?	18:19
_mup_	Bug #901343 was filed: juju.control.tests.test_status.StatusTest.test_render_dot broken <juju:In Progress by clint-fewbar> < https://launchpad.net/bugs/901343 >	18:20
kickinz1_	no	18:20
kickinz1_	I'm getting the source of juju to look at what it expects.	18:21
kickinz1_	Funny names...."astounding, mgnificent, overridden, puissant"...	18:24
kickinz1_	thanks! default-series: oneiric made it work!	18:25
niemeyer	Hello!	18:38
mainerror	o/	18:39
niemeyer	mainerror: Yo	18:42
niemeyer	rog: You'll like some of the upcoming improvements on lbox..	18:42
rog	niemeyer: cool	18:42
niemeyer	Just need to test them now.. no Launchpad connection on the flight :)	18:42
rog	niemeyer: a couple of new reviews for you BTW	18:43
niemeyer	rog: and you just got one	18:44
rog	niemeyer: yay!	18:44
rog	niemeyer: make that 3 new reviews - i'd forgotten about that one!	18:45
rog	niemeyer: i've updated the cloudinit package merge proposal	18:45
niemeyer	rog: Sorry, btw, I did a big mess before leaving while working on lbox..	18:45
niemeyer	rog: Repeatedly sending the same message	18:45
rog	niemeyer: that's fine. i just ignored 'em all :-)	18:46
rog	niemeyer: was there any signal in there, in fact?	18:46
drt24	solution to my problem: run sudo orchestra-import-isos and then add and remove the cobbler server configuration	18:46
niemeyer	rog: Any signal? How do you mean?	18:48
rog	niemeyer: did any of the messages mean anything?	18:49
niemeyer	rog: No, in the end I was on crack consistently, because both changesets were already merged	18:49
rog	niemeyer: i thought so. just checking.	18:49
rog	niemeyer: http://codereview.appspot.com/5444043/ in case you didn't get a notification email	18:50
rog	niemeyer: (that one's independent of the others)	18:50
niemeyer	rog: Thanks	18:51
niemeyer	rog: That was one of the things I fixed in the plane, btw	18:51
rog	niemeyer: cool	18:51
niemeyer	rog: It should now send a ptal	18:51
niemeyer	rog: The other is to detect the -cr automatically after first use	18:51
rog	niemeyer: ideally it should let me look at the codereview page before sending any mail	18:51
niemeyer	rog: and the other is to checkout target branches automatically for diffing	18:52
rog	niemeyer: just to do a last sanity check	18:52
niemeyer	rog: and finally I've added support for default flags	18:52
rog	niemeyer: with Go reviews, i often end up uploading several times before mailing	18:52
niemeyer	rog: Most of that is untested, though, obviously.. will be fun to see what works :-)	18:52
niemeyer	rog: That was done too	18:52
rog	niemeyer: +50 for auto downloading!	18:53
niemeyer	rog: there's a new -prep flag now	18:53
rog	oh yes, this one often bites me	18:53
rog	:	18:53
niemeyer	rog: You can use at any time to upload without requesting the review	18:53
rog	i'll do -target ../foo-trunk	18:53
niemeyer	rog: It will also leave the Merge Proposal in Launchpad as Work In Progress, rather than Needs Review	18:53
niemeyer	rog: We should put that in the branch itself	18:53
rog	and lbox propose doesn't check that the dir exists until after the file's been edited	18:53
niemeyer	rog: It looks for ".lbox"	18:53
rog	(the description)	18:53
niemeyer	Ok, let me give you some quick reviews	18:54
kees	SpamapS: us-west-2> yay! I will save a little money and a little latency. :)	18:56
kees	jimbaker: excellent! I look forward to that. :)	18:56
niemeyer	kees: <envy>	19:00
kees	niemeyer: ?	19:01
kees	niemeyer: oh, that I have an ec2 region in my state?	19:01
niemeyer	kees: Yeah :-D	19:01
SpamapS	https://code.launchpad.net/~clint-fewbar/juju/fix-dot-test/+merge/84827	19:02
SpamapS	Woudl appreciate a quick review cycle on that.. fixes the test suite on trunk.	19:02
SpamapS	Would even	19:02
kickinz1_	bye!	19:03
nijaba	SpamapS: Hello. Can you think of anything else that would be needed for Limesurvey's charm, or should I move on to roundcube?	19:11
SpamapS	nijaba: if it has the ability to make use of readonly slaves so we can scale it out even more, that would be cool, but its not really necessary. ;)	19:11
nijaba	SpamapS: I do not think this is possible in Limesurvey	19:12
SpamapS	nijaba: I plan to write a mysql-proxy charm when subordinate charms land that will direct SELECT to a slave, and all others to a master. Should be interesting. :)	19:13
nijaba	SpamapS: sounds really cool :)	19:13
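The read/write split SpamapS has in mind can be sketched in a few lines of Python. This is only an illustration of the routing rule (SELECT to a slave, everything else to the master). It is not mysql-proxy's actual scripting interface, and `make_router` and the round-robin choice among slaves are assumptions.

```python
import itertools


def make_router(master, slaves):
    """Return a function mapping a SQL string to a backend name."""
    slave_cycle = itertools.cycle(slaves) if slaves else None

    def route(query):
        # Naive classification: only the leading keyword is inspected, so
        # cases such as SELECT ... FOR UPDATE are not handled here.
        is_select = query.lstrip()[:6].upper() == "SELECT"
        if is_select and slave_cycle is not None:
            return next(slave_cycle)
        # Writes, and reads when no slave exists, go to the master.
        return master

    return route
```

For example, `make_router("master", ["slave1", "slave2"])` alternates reads between the two slaves and sends any `INSERT` or `UPDATE` to `master`.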
SpamapS	I wonder if MySQL cluster works in any useful way on EC2.. probably not with the latency spikes.	19:13
SpamapS	7.2 will have memcache protocol access built in, that should be cool. :)	19:14
nijaba	SpamapS: ok, so I'll move on to Roundcube, making a first version of it that carries smtp/imap server address in the config. Will update it once someone will have charmed a mail server to depend on it	19:14
* nijaba wonders if dependencies can be made optional		19:15
SpamapS	nijaba: really even after the mail server is charmed, it will be useful to be able to just set it in the configs and not have to relate anything.	19:15
SpamapS	nijaba: yes, optional: true can be added as an attribute after the interface: xxx	19:15
SpamapS	which I find completely brain imploding.. requires: optional..	19:15
SpamapS	:-P	19:15
nijaba	SpamapS: so that's what we should do	19:15
rog	niemeyer: off for the day, see ya tomorrow?	19:16
rog	ttfn all	19:16
niemeyer	rog: Yeah, have a good evening	19:16
niemeyer	rog: and you have another review	19:16
_mup_	juju/sshclient-refactor r431 committed by kapil.thangavelu@canonical.com	21:13
_mup_	cleanup cli output when connection refused	21:13
SpamapS	bcsaller: I noticed you had some subordinate branches in review. How close are we to having some things to play with? I had this crazy idea for a charm..	21:22
SpamapS	mk-query-digest can take tcpdump output, and tell you what queries sucked	21:23
SpamapS	So.. throw that on your apps for 5 minutes, related back to somewhere to store the output.. and you can get like, an instant picture of your app	21:23
SpamapS	and where it sucks	21:23
bcsaller	SpamapS: thats cool. While the feature set is getting closer to alpha its still at the starting gate in terms of reviews.	21:24
bcsaller	SpamapS: and history shows that always takes a while	21:25
SpamapS	Yeah	21:25
SpamapS	I'm eager	21:25
SpamapS	I have a bunch of cool ideas and I want to try them out. ;)	21:25
SpamapS	negronjl: btw, would appreciate a review on this https://code.launchpad.net/~clint-fewbar/charm/oneiric/mysql/add-config/+merge/84697	21:27
negronjl	SpamapS: ok ... working on it now	21:27
_mup_	juju/ssh-known_hosts r430 committed by jim.baker@canonical.com	21:34
_mup_	Do not create known_hosts files in actual home directory when testing	21:34
_mup_	juju/ssh-find-zk r430 committed by jim.baker@canonical.com	21:35
_mup_	Initial refactoring	21:35
_mup_	juju/ssh-find-zk r431 committed by jim.baker@canonical.com	21:36
_mup_	Merged upstream	21:36
negronjl	SpamapS: looks good ... deployed with multiple changes in config.yaml and it all works good as far as I can tell. Approved.	21:41
negronjl	SpamapS: should I merge it as well ?	21:41
SpamapS	negronjl: no I think the proposer should merge if they're a member of charmers	21:49
SpamapS	negronjl: and thanks for the review!	21:50
* SpamapS is eager to start writing tests as well		21:50
negronjl	SpamapS: no prob.	21:50
SpamapS	Pushed up to revision 69.	22:06
* SpamapS giggles like beavis and butthead		22:06
hazmat	niemeyer, i wanted to try one of the lbox -cr reviews, but it seems to have trouble.. it wants the -for branch to point something on disk, but it doesn't like a trunk checkout, any ideas?	22:13
_mup_	Bug #901463 was filed: SSH Client code and output cleanups <juju:In Progress by hazmat> < https://launchpad.net/bugs/901463 >	22:15
hazmat	also wasn't clear on the -bp if it wants the name of a blueprint or a link,	22:16
* SpamapS goes to get a RedBull from 7-11...		22:21
hazmat	bcsaller, can subordinate charms talk to each other in the same container level isolation that they can with the master?	22:24
bcsaller	hazmat: they need a relationship defined	22:25
hazmat	bcsaller, but would that be a normal s2s rel? or a container scoped one	22:25
bcsaller	hazmat: we talked about that, it wasn't clear that it was a high priority use case, I would think we'd honor the subordinate flag on the relationship. It would be a special case though as they are not subordinate to each other	22:27
bcsaller	hazmat: what use case do you see?	22:28
hazmat	bcsaller, in terms of impl if the container scope is just another type of relation	22:28
hazmat	bcsaller, i was thinking about doing something with volume management and backup for cassandra, a volume manager that can attach volumes to the node, and a backup cassandra plugin, that could snapshot and transfer data to the volume	22:28
_mup_	juju/ssh-find-zk r432 committed by jim.baker@canonical.com	22:29
_mup_	Fix tests to support refactoring	22:29
bcsaller	hazmat: so even without special support those could both be subordinate with a normal relationship and they would be able to filter for the right pair	22:29
bcsaller	but I think we can do better and you seem to as well	22:29
* SpamapS chugs redbull...		22:32
SpamapS	http://bit.ly/surjbS	22:32
SpamapS	BUG TRIAGE RAMPAGE!!!	22:33
* SpamapS storms off into launchpad		22:33
marcoceppi	louel, Gotenks.	22:35
Daviey	hazmat: bug 804203, is https://issues.apache.org/jira/browse/HBASE-2418 related?	22:39
_mup_	Bug #804203: Juju needs to communicate securely with Zookeeper <security> <juju:Confirmed for hazmat> < https://launchpad.net/bugs/804203 >	22:39
SpamapS	Daviey: its related in that HBASE needs to implement the same level of auth controls as Juju does when working with zookeeper.	22:44
hazmat	yup	22:45
hazmat	basically node level acls to protect portions of the zk tree from anon clients	22:46
* hazmat heads out to dinner		22:51
hazmat	Daviey, its not on the roadmap atm for 12.04	22:52
elmo	blink	22:53
elmo	seriously?	22:53
Daviey	SpamapS / hazmat: thanks	22:56
* SpamapS is somewhat frustrated about that one as well		22:58
niemeyer	hazmat: Hmm	22:58
niemeyer	hazmat: What does bzr info print for your checkout	22:58
niemeyer	?	22:58
SpamapS	Hmm, so config settings can't contain non-ASCII data	23:50
SpamapS	print >>stream, str(result)	23:50
SpamapS	UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)	23:50
SpamapS	seems like just changing that to unicode(result) would work	23:51
SpamapS	of course, really, I just want the raw bytes no matter what..	23:52
niemeyer	Woohay.. new features of lbox working well	23:56
niemeyer	SpamapS: Ugh..	23:56
niemeyer	SpamapS: That's a super well known wart of Python :-(	23:56
SpamapS	wart in what way?	23:57
SpamapS	unicode is tricky?	23:57
niemeyer	SpamapS: Luckily 3.0 is fixing it, so people will stop doing it all the time	23:57
niemeyer	SpamapS: It's not in itself	23:57
niemeyer	SpamapS: The problem is how Unicode evolved within the language	23:57
SpamapS	Seems like a wart of all programming done before 2005 :-P	23:57
SpamapS	Java tried, even they got it wrong. :-P	23:57
niemeyer	SpamapS: >>> u"é" + "é"	23:57
niemeyer	Traceback (most recent call last):	23:57
niemeyer	File "<stdin>", line 1, in <module>	23:57
niemeyer	UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)	23:57
niemeyer	SpamapS: Just too easy to get wrong	23:58
niemeyer	SpamapS: 3.0 is fixing that by separating raw bytes from human text more clearly	23:58
SpamapS	oh thats good	23:58
niemeyer	SpamapS: 3.X, that is	23:58
SpamapS	so how do you get raw bytes with 2.7 ? unicode(var) ?	23:59
SpamapS	that seems wrong	23:59
niemeyer	SpamapS: "é"	23:59
niemeyer	SpamapS: That's raw bytes	23:59
niemeyer	SpamapS: But was also the correct way to do human text until several years ago	23:59
niemeyer	SpamapS: Hence the mess	23:59
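For comparison, here is a small Python 3 sketch of the separation niemeyer mentions. Text (`str`) and raw bytes (`bytes`) are distinct types, mixing them raises `TypeError` instead of attempting an implicit ASCII conversion, and raw bytes are written to a binary stream. The helper names `write_value` and `mixing_fails` are made up for this example and have nothing to do with juju's config code.

```python
import io


def write_value(stream, value, encoding="utf-8"):
    """Write value to a binary stream as raw bytes, one value per line."""
    if isinstance(value, str):
        # Human text must be encoded explicitly; nothing is guessed.
        data = value.encode(encoding)
    else:
        # Already raw bytes: pass them through untouched.
        data = bytes(value)
    stream.write(data + b"\n")


def mixing_fails():
    """Return True if text + bytes is rejected, as it is in Python 3."""
    try:
        "\u00e9" + b"\xc3\xa9"
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    buf = io.BytesIO()
    write_value(buf, "\u00e9")      # text, encoded to UTF-8 on the way out
    write_value(buf, b"\xc3\xa9")   # raw bytes, written as-is
    print(buf.getvalue(), mixing_fails())
```

The choice of UTF-8 as the default encoding is an assumption. What matters is that the encoding step is explicit, which is what the Python 2 `str(result)` call in the traceback above was missing.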
