/srv/irclogs.ubuntu.com/2011/12/07/#juju.txt

nijabaSpamapS: merge done00:02
nijabaresubmitted00:02
* nijaba off to see morpheus00:03
SpamapSnijaba: merging now, THANKS!00:04
nijabaSpamapS: my pleasure00:05
* nijaba really having fun00:05
_mup_juju/ssh-known_hosts r427 committed by jim.baker@canonical.com00:13
_mup_Initial commit00:13
nijabaSpamapS: was wondering why having a readme is not mandatory for charms. Aren't they a bit dry without built-in docs? Shouldn't juju offer a "man" option to access documentation about charms?00:22
SpamapSnijaba: I've thought about exactly that, having a 'charm info xxx' command that intelligently looks for README* readme* and cats them together into less would be cool. :)00:23
SpamapSs/less/$PAGER/00:23
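
The 'charm info' idea above could be sketched roughly as follows. This is only an illustration: the function names collect_readmes and charm_info are hypothetical and no such command exists in juju at this point in the log. It gathers README*/readme* files from a charm directory and hands the combined text to the user's pager.

```python
import glob
import os
import pydoc


def collect_readmes(charm_dir):
    """Return the concatenated text of every README*/readme* file in charm_dir."""
    # A set removes duplicates in case both patterns match the same path.
    paths = sorted(
        set(glob.glob(os.path.join(charm_dir, "README*")))
        | set(glob.glob(os.path.join(charm_dir, "readme*")))
    )
    chunks = []
    for path in paths:
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as handle:
                chunks.append(handle.read())
    return "\n".join(chunks)


def charm_info(charm_dir):
    """Show the collected README text through the user's pager."""
    text = collect_readmes(charm_dir)
    if not text:
        print("no README found in %s" % charm_dir)
        return
    # pydoc.pager honours the PAGER environment variable when one is set.
    pydoc.pager(text)
```

Calling charm_info on a local charm checkout would page through whatever README files it contains.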
SpamapSnijaba: I think they also need a maintainer: field in metadata.yaml00:24
nijabaSpamapS: this last bit was acked a few days ago, IIRC00:24
nijabaSpamapS: I'll open a bug :)00:24
adam_gSpamapS: hey clint, should i push this lonely precise branch (lp:~gandelman-a/charm/precise/rabbitmq-server/900440) directly to lp:charm/precise/rabbitmq-server since theres nowhere to file a proper merge proposal?00:35
SpamapSadam_g: how about you push the oneiric branch into precise, then do a MP against that?00:40
_mup_juju/ssh-known_hosts r428 committed by jim.baker@canonical.com00:40
_mup_Support machine recycling00:40
_mup_Bug #901017 was filed: Juju should have a "info" or "man" option <juju:New> < https://launchpad.net/bugs/901017 >00:40
adam_gSpamapS: ahh00:41
adam_gSpamapS: hmm. no dice pushing the current lp:charm/rabbitmq-server up to lp:charm/precise/rabbitmq-server http://paste.ubuntu.com/762302/00:45
SpamapSadam_g: right.. I guess I have to do the "initialize" step before we can push to that series. :(00:46
SpamapSadam_g: can you at least push it to ~gandelman-a/charm/precise/... ?00:46
adam_gSpamapS: yeah, already there: lp:~gandelman-a/charm/precise/rabbitmq-server/90044000:47
SpamapSadam_g: so you can probably push the oneiric one to lp:~charmers/.....00:48
SpamapSadam_g: then do the MP00:48
SpamapSadam_g: meanwhile I'll try to figure out how we initialize the series00:48
adam_gthat might work00:49
adam_gi could push to lp:~charmers/charm/precise/rabbitmq-server/trunk, suppose i'll propose against that...00:53
SpamapSyeah that will work00:54
_mup_juju/upgrade-config-defaults r429 committed by kapil.thangavelu@canonical.com02:26
_mup_use lazy computation of default values instead of recording them to config state02:26
_mup_juju/upgrade-config-defaults r430 committed by kapil.thangavelu@canonical.com02:32
_mup_config value validation no longer returns defaults02:32
_mup_juju/upgrade-config-defaults r431 committed by kapil.thangavelu@canonical.com02:35
_mup_no longer explicitly touch defaults in upgrade, the lazy computation suffices.02:35
_mup_Bug #901043 was filed: switch charm subcommand to change origin of charm and upgrade <juju:New> < https://launchpad.net/bugs/901043 >02:47
hazmatSpamapS, is bug 900517 different than the upgrade config defaults issue?02:51
_mup_Bug #900517: config-get on an int set to 0 does not return '0' but an empty string <juju:New> < https://launchpad.net/bugs/900517 >02:51
* SpamapS reads02:52
SpamapShazmat: its entirely possible that this was actually the same effect.02:52
SpamapShazmat: easy to test that hypothesis02:52
* hazmat does a UTSL02:53
hazmatSpamapS, i still haven't managed to reproduce bug 861928, i suspect its timing dependent, if you do manage to reproduce, it would be helpful to attach the entire provisioning agent log02:55
_mup_Bug #861928: provisioning agent gets confused when machines are terminated <juju:New for jimbaker> < https://launchpad.net/bugs/861928 >02:55
SpamapShazmat: interesting02:58
SpamapShazmat: you know.. kees was experiencing it on the oneiric version (r398) .. its possible that its been fixed inadvertently with some of the ZK / API fixes02:58
hazmatSpamapS, yeah.. jimbaker fixed another provisioning agent bug post oneiric afaicr02:59
SpamapShave we broken backward compatibility with r398 at all? I have half a mind to propose that we just put r427 in oneiric-updates02:59
SpamapSfeatures be damned. ;)03:00
SpamapSonly problem is.. we can't actually upgrade deployed environments03:00
hazmatSpamapS, i doubt that's an issue in practice03:01
SpamapSthat we can't upgrade the provisioning agent?03:01
hazmatSpamapS, that there are long lived juju environments extant03:01
SpamapSkees had one very long lived for doing sbuild fanout03:01
hazmatSpamapS, but fair enough03:01
hazmatSpamapS, he shut it down though.. 45usd spend03:02
SpamapSuntil it stopped working03:02
SpamapSAnyway, I agree, nobody should have a long lived 11.10 juju cluster. :)03:02
SpamapSwould be good to come up with an upgrade story for 12.04's juju03:03
SpamapSif william finishes the upstart job stuff.. we can at least put in the packages to stop/start the agents on upgrade03:03
hazmatindeed that will be key, we can probably do some sort of dance around that, but the biggest question mark on the upgrade story is just coordinating a code drop/rev across a cluster of different release series03:05
hazmatideally just a binary drop..03:05
SpamapSWhich is why I think we're going to eventually have to host juju packages on the juju service nodes03:12
SpamapSOtherwise precise won't be able to play in a "Q" managed cluster03:12
SpamapSshould be fairly easy... each juju package just needs to include a script which builds itself for every series you want to support03:14
SpamapSand of course, we have to build a test suite which makes sure that actually works ;)03:15
hazmatSpamapS, re the config set to 0, afaics its not an issue03:38
hazmathmm.. maybe it is03:39
hazmatsomething sounds familiar03:39
SpamapSI was thinking it might be an issue where the value might not be carefully checked for None03:41
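
The suspicion about bug 900517 can be illustrated with a generic Python sketch. This is not juju's actual config-get code; both function names are hypothetical. It only shows how a truthiness test loses an integer 0 while an explicit None check keeps it.

```python
def render_config_value_buggy(value):
    """Truthiness check: 0, False and "" are all treated like a missing value."""
    if not value:
        return ""
    return str(value)


def render_config_value(value):
    """Identity check against None: only a genuinely unset value becomes ""."""
    if value is None:
        return ""
    return str(value)
```

With these definitions, an option set to 0 renders as an empty string in the first function and as '0' in the second.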
_mup_juju/sshclient-refactor r428 committed by kapil.thangavelu@canonical.com04:49
_mup_refactor the sshclient (zk over ssh tunnel)04:49
_mup_juju/sshclient-refactor r429 committed by kapil.thangavelu@canonical.com04:55
_mup_increase the default timeout04:55
_mup_juju/sshclient-refactor r430 committed by kapil.thangavelu@canonical.com04:55
_mup_robust zk conn04:55
* SpamapS cheers hazmat on04:59
* hazmat falls asleep05:06
rogmornin'07:41
TheMuemoo rog08:19
rogTheMue: yo08:21
mplrog: is the example in the example dir of zookeeper working for you (that is, once you've replaced Init with Dial and fixed the err.String() calls)?10:32
rogmpl: i'll try it10:32
mplrog: here I have two problems with it. 1) it doesn't return as it should if I don't have any zookeeper server running. 2) I get loads of error messages for error, coming apparently from this point: event := <-session (it doesn't get past there apparently).10:34
mpls/for error//10:34
rogmpl: yeah, me too - loads of time out errors10:38
rogmpl: i think the timeout must be wrong10:38
rogmpl: yeah, the timeout should be 5e9 not 500010:39
rogmpl: BTW i'm not sure what it should do if there's no zk server running10:40
rogmpl: here's my updated version: http://paste.ubuntu.com/762565/10:40
mplrog: well, I don't know what it should do, but err should be != nil when Dial fails, and it seems it's not the case for me.10:40
rogmpl: i'm not sure that Dial can ever fail10:41
mploh10:41
mplhow come?10:41
rogmpl: because the connection itself is asynchronous10:41
mplah yes10:41
mplgood point, thx10:41
mplso that err check is pretty moot10:41
rogmpl: i think that's wrong, and gustavo and i have talked about changing it in the past, but the changes haven't been made yet10:41
mplok, another thing I don't get, why do I get tons of messages and not just one? that chan read is not in a loop.10:42
rogmpl: looking at zk C source, it looks like the only way it can return an error is if the hosts arg is malformed10:45
rogmpl: the messages are printed by the zk client code10:45
rogmpl: (logging is turned on by default, which i think is wrong too)10:46
mplrog: you mean they come from underlying calls of Dial?10:49
rogmpl: yeah - they come from within the C API10:49
mplrog: and not in any case as a result of this: "event := <-session" ?10:49
rogmpl: indeed - that blocks until the connection is made. i don't know if zk ever decides that it can't connect.10:50
mplrog: ok, that's reassuring then,  thx.10:50
rogmpl: you can turn the debugging messages off10:50
mplah cool, it finally worked.10:52
rogmpl: zookeeper.SetLogLevel(0)10:52
mplgood to know, thx.10:53
mplrog: ok, I'll elaborate from that example to play with ssh.10:54
rogmpl: sounds good11:48
TheMuere12:30
hazmatg'morning12:45
TheMuemoo hazmat12:55
TheMuefor documentation purposes: are there some special bazaar configuration settings for juju?13:14
rogTheMue: not as far as i know13:35
TheMuefine, makes it easier13:35
TheMueI'm working on a "Getting Started"13:36
fwereadehazmat, is there some reason you know of for the particular shape of the code around CharmUpgradeOperation?13:38
fwereadehazmat, because the workflow is perfectly capable of synchronising the state if we make the charm upgrade much more like a normal transition, but it's much hairier if there's a reason *not* to do it as a normal transition13:40
hazmatfwereade, not sure what you mean13:52
hazmatfwereade, you mean push more of the operation out of the watch callback and into the transition?13:52
fwereadehazmat, that everything done in CharmUpgradeOperation ought IMO to be done on the lifecycle, like the other things that happen as part of a state transition13:54
fwereadehazmat, and if we do that we can easily just call "self.workflow.synchronize(executor)" in place of the boolean tangle in the original MP13:55
hazmatfwereade, hmm. so my thought there is that it's not something that is manageable completely internal to the lifecycle, it depends on external mutable persistent settings, which is very different than anything else in the lifecycle13:55
fwereadehazmat, on the service's charm id?13:56
hazmatie. you can't just call lifecycle.upgrade() and expect it to work, the external state needed to be put in place first.. where as you can call any of the other lifecycle methods13:56
hazmatfwereade, on the upgrade flag13:56
fwereadehazmat, hmm, hadn't had that perspective13:56
hazmatfwereade, i thought the plan was not to do anything on upgrade_error13:57
hazmatfwereade, how does this issue arise?13:57
fwereadehazmat, you recall the plan to make the workflow know how to set up the lifecycle and executor to match the current state13:58
fwereadehazmat, to do so, we need to be able to detect the errors which occur while the executor is paused, so we can restore it correctly13:59
hazmatfwereade, i thought we'd moved on to: it's an easy thing to distinguish in the upgrade transition, and we'll be dealing with disconnected op sync anyways, so exact match isn't necessary (queueing in the background)13:59
hazmatfwereade, the error from when the executor is paused is noted in the state14:00
fwereadehazmat, how is it noted?14:00
fwereadehazmat, we don't even try to fire a transition until some time after we've stopped the executor14:00
hazmatfwereade, although juju could probably use a more robust setup there from pause, to enclose the rest in a try/except block14:00
hazmatfwereade, so from pause to transition, its set a zk value, and extract a charm to disk14:02
hazmatif the transition/hook fails we'll get into a recorded error state14:02
fwereadehazmat, and if anything goes wrong during the extract or the zookeeper set, we'll be in a weird state14:02
hazmatfwereade, a try/except around the others can manually fire transition to an error state14:03
hazmaton error14:03
hazmatfwereade, its an odd scenario regardless if we have a half extracted charm on disk14:03
* hazmat ponders14:04
fwereadehazmat, agreed, but I don't think we can guarantee that that will *never* happen14:04
hazmatfwereade, agreed, although we can do a better job of minimizing, but its not clear that encompassing more to the error state, is helpful wrt to retry, the coordination state is gone on retry14:10
hazmatthe flag is cleared, and we don't know that we can safely execute the upgrade hook again, because we don't know the state on disk or zk of the charm14:11
hazmatand if we re-enter the entire upgrade operation, we don't have the coordination state to trigger any changes, and it will early exit14:12
fwereadehazmat, isn't it just down to the order of operations?14:12
hazmatperhaps14:12
fwereadehazmat, if we extract, then set in ZK, then fire the hook14:13
hazmatfwereade, i don't see how that helps, the flag is cleared14:13
hazmatfwereade, and you can't set the flag in an error state14:14
hazmatfwereade, you're right though, an error here should be recorded as a charm upgrade error14:14
fwereadehazmat, because we can know by the unit charm id whether or not the extraction of the latest charm has completed; if it has we can move straight on to firing the hooks (or not) according to the "resolved" command14:16
fwereadehazmat, if the charm ids don't match, we start the operation from scratch14:16
fwereadehazmat, (when we retry)14:16
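
The retry rule described in the last few messages can be sketched as a small planning function. This is a simplified illustration, not code from the juju tree: the function name and step labels are hypothetical, and the real operation would be asynchronous and talk to ZooKeeper.

```python
def plan_upgrade_retry(unit_charm_id, service_charm_id, run_hooks):
    """Return the ordered steps for retrying a failed charm upgrade.

    unit_charm_id: charm id recorded for the unit (set only after a
        successful extract).
    service_charm_id: charm id the service is supposed to be running.
    run_hooks: whether the "resolved" command asked for hooks to be fired.
    """
    steps = []
    if unit_charm_id != service_charm_id:
        # Ids differ, so extraction never completed: start from scratch.
        steps += ["download", "extract", "set_unit_charm_id"]
    if run_hooks:
        steps.append("fire_upgrade_hook")
    return steps
```

Recording the unit charm id only after extraction succeeds is what makes the comparison a reliable "did we finish replacing the charm" marker in this sketch.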
hazmatfwereade, so right now error states always refer to hook errors..14:17
fwereadehazmat, from the POV of the workflow state, which represents what the unit is actually doing, I feel that "half-extracted charm that's 100% broken" should absolutely represent an error14:18
hazmatfwereade, it definitely should, i'm just trying to work through the implications of changing the meaning of an error state, what retry means in this context, and changing the interactions/responsibilities of lifecycle compared to any extant uses.14:20
hazmatfwereade, there's a notion that upgrade flags shouldn't survive restarts, which is one reason why we cleared the flag early14:20
hazmati'm trying to recall if there was more to it than that14:21
* SpamapS stretches and yawns14:21
hazmatfwereade, so when would the upgrade flag get cleared?14:22
fwereadehazmat, my idea is that we clear the upgrade flag as soon as we see it, but we kick off an upgrade_charm transition, which is "started"->"started"14:22
fwereadehazmat, if we're not in a started state we just bail before we even try the transition14:23
* hazmat nods14:23
fwereadehazmat, the lifecycle.upgrade_charm will do the early parts before stopping the hooks and quietly bail out on errors, equivalently to now14:23
fwereadehazmat, but once we hit the stop-hooks-start-messing-with-disk-state point, any subsequent errors should come out and be detected as transition failures14:24
hazmatfwereade, how do you re-enter the upgrade charm state?14:24
hazmaterror state that is14:24
hazmaton a process restart14:25
robbiewhazmat: just got an email from fernanda...TZ mixup?14:25
fwereadehazmat, it's just an existing workflow state, I'm already in that state when I come up14:25
hazmatrobbiew, doh.. indeed that is tz mixup, i thought it was +1 hr14:26
hazmatfwereade, but the process mem state is different14:26
hazmatfwereade, ah.. so the executor is still stopped14:26
hazmatbecause we never started the lifecycle, and we're not listening to any rel lifecycles14:27
fwereadehazmat, lifecycle.running and executor.running are not especially closely related14:27
hazmatfwereade, yup.. so if we restart in a charm upgrade error state.. the lifecycle is stopped, the exec is running, but nothing feeding into it14:28
fwereadehazmat, the executor needs to be stopped during upgrade error states14:29
fwereadehazmat, all the rest of the time it's fine14:29
hazmatfwereade, how does it get stopped on restart14:29
fwereadehazmat, we just don't start it explicitly, we let the workflow do so if it's in a state which needs it14:30
hazmatfwereade, and how is it any different than the lifecycle just being stopped14:31
fwereadehazmat, so it's just "self.workflow.synchronize(self.executor)" and then we're in the state we must have been in when we left off last time14:31
fwereadehazmat, from outside perspective no different, I guess -- no hooks are executing -- but... well, why exactly are we explicitly stopping the executor when we could just stop the lifecycle like we do with, say, configure?14:33
fwereadehazmat, ...only just thought of that :/14:34
hazmatrobbiew, just rescheduled for 20m from now14:34
robbiewhazmat: cool14:34
hazmatfwereade, because the ability to run a hook now (ahead of any queued hooks) has a safety notion that the executor is stopped, in part to guarantee that there are no other currently executing hooks14:37
hazmatfwereade, i need to switch tracks for a little bit, but i'll definitely ponder this some more14:38
fwereade_hazmat, isn't the reason that the unit relation lifecycles' schedulers could still be busily executing queued hooks at any stage?14:38
hazmatfwereade, not sure if you saw this.. because the ability to run a hook now (ahead of any queued hooks) has a safety notion that the executor is stopped, in part to guarantee that there are no other currently executing hooks14:39
fwereade_hazmat, exactly so14:39
hazmatfwereade, i need to switch tracks for a little bit, but i'll definitely ponder this some more.. i think this is worthwhile.. part of the issue though on either an extract failure or a state change failure, is that it signals a significant problem14:40
fwereade_hazmat, I think it comes down to my conviction that we're better off restoring process state on startup -- which state can be encapsulated in 2 bools -- than we are by complicating the logic we run all the time14:41
fwereade_hazmat, ok, ttyl -- ping me to continue when you're free :)14:41
hazmatfwereade_, isn't restoring the state as simple as is -> if not self.running: self.lifecycle.start, else self.executor.start()14:42
fwereade_hazmat, well, "started" implies both running, but yeah, it's not complicated14:42
fwereade_hazmat, you seemed at one stage to be arguing against it14:42
hazmatfwereade_, actually i was hoping for that since it was the simplest thing, but the notion that upgrade error should encapsulate non hook errors has some merit14:48
hazmatfwereade_, definitely worth exploring, and i think a good track14:49
fwereade_hazmat, I think it is the simplest thing14:49
SpamapShttp://www.ustream.tv/channel/vclug-venturaphp  .. me.. talking about juju to a local LUG ... unfortunately, the demo failed because I had a lucid AMI in my environments.yaml15:33
SpamapStotally forgot that I had been monkeying around with the AMI. :-P15:33
SpamapSPretty much flies off the rails at 22:0015:33
* SpamapS goes off to get the family out so he can get work done.15:33
hazmatfwereade, connectivity problems?16:12
fwereadehazmat, yeah, sorry about that, didn't actually notice it happening until just now16:13
hazmatfwereade, no worries16:13
* kees waves "hi"16:21
keesso, I discussed some of the trouble I had with the provision here last sunday. not the best time for catching people, i realize.16:22
kees*provisioner16:22
keesSpamapS pointed me to where cloud-init does its work, but ultimately I wasn't able to get the provisioner back on its feet.16:22
keeshazmat: what's the best way for me to help debug the troubles I ran into?16:23
SpamapSkees: using the PPA version would go a long way to figuring out if this is already fixed or not.. which I suspect it may have been16:25
SpamapSkees: we still need to make the agents more robust and restartable, which fwereade is working on right now.. but I think some of the ZK stuff has been fixed since 11.10 released16:25
keesSpamapS: how do I find AMIs with the PPA version built-in?16:26
keesSpamapS: and why not SRU these fixes to Oneiric?16:26
SpamapSkees: you don't need an AMI.. you just add 'juju-origin: ppa' to your environment settings16:26
SpamapSkees: Its hard to isolate the fixes because there have been massive changes.16:27
keesSpamapS: hrm, let me try...16:28
SpamapSkees: also if your client version is from the PPA, it will automatically deploy with the PPA16:28
* SpamapS curses himself for forgetting to run the test suite before commit to trunk.. https://launchpadlibrarian.net/86855907/buildlog_ubuntu-precise-i386.juju_0.5%2Bbzr428-1juju2~precise1_FAILEDTOBUILD.txt.gz16:29
keesSpamapS: if I just set "juju-origin: ppa", is that sufficient, or do I need to also install juju from the PPA?16:29
* SpamapS puts on the cowboy hat16:29
hazmatkees, that's sufficient16:30
SpamapSHas the ZK schema bumped since r398?16:30
hazmatSpamapS, no16:30
* kees attempts a bootstrap...16:30
hazmatSpamapS, there's been some minor additions, but no changes to the cli interactions16:30
keesone of the really goofy bugs I ran into was that --environment seemed to be ignored by a lot of commands16:31
hazmatkees, that's odd just about every command takes that option16:32
hazmatkees, it has to be specified after the sub command.16:32
keesi.e. I tried to do  juju bootstrap --environment sample2 after my "sample" environment's provisioner freaked out.16:32
keesand then juju status --environment sample2 always failed.16:32
keesthen I destroyed sample2, and then juju status couldn't find sample any more16:32
keesso I had to hard-code the instance list in the source to get control back.16:33
SpamapSkees: did they both have the same control-bucket ?16:33
keeswhat is a control-bucket? :)16:33
SpamapSkees: the thing that uniquely identifies an environment in the provider...16:34
hazmatkees, its an s3 bucket that's spec'd in environments.yaml.. its env specific16:34
hazmatit gets autogenerated the first time around, but it can't be copied between multiple environments, without causing issues16:34
keesah, I see that now. does that get added automatically? I don't remember adding that or admin-secret16:34
keesyeah, that would totally be what happened then16:35
keesI just copied the entire "sample" section and changed the name.16:35
hazmathmm.. we should probably warn/error if we see that come up16:35
keesheh, d'oh.16:35
SpamapShazmat: yeah control-bucket should have the env name in it.. so we should be able to error out.. "control bucket foo has env name X not Y"16:35
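
The warning being discussed here could look something like the following check. It is a sketch under assumptions: it takes the already-parsed "environments" mapping from environments.yaml as a plain dict, and the function name is hypothetical. It reports any control-bucket value that appears in more than one environment, which is what happens when a section is copied and only renamed.

```python
from collections import defaultdict


def find_shared_control_buckets(environments):
    """Map each control-bucket used by more than one environment to their names.

    environments: dict of environment name -> settings dict, as it would
        appear under the "environments" key once the YAML is parsed.
    """
    by_bucket = defaultdict(list)
    for name, settings in environments.items():
        bucket = settings.get("control-bucket")
        if bucket is not None:
            by_bucket[bucket].append(name)
    # Keep only buckets claimed by two or more environments.
    return {
        bucket: sorted(names)
        for bucket, names in by_bucket.items()
        if len(names) > 1
    }
```

An empty result means every environment has its own bucket; any entry is a candidate for a warning or hard error before bootstrap.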
keesseems like that should be stored somewhere else instead of injected into environment.yaml16:35
keesokay, well, that explains that glitch at least. :)16:36
SpamapSkees: its used by clients to find the ZK server, so it has to be in environments.yaml16:36
SpamapSTho one thing that would work is to change it to control-bucket-prefix: .. and by default just prepend that to the env name.16:37
hazmatSpamapS, that would create an implicit fail scenario around changing an env name16:37
keesit might be nice to have the finding of the master instance show up in --verbose (i.e. the processing of the ec2 instance list, etc)16:37
hazmatalthough for local provider it already is16:37
hazmatsince we use the env name on disk16:38
keesI spent a lot of time trying to figure out how juju was deciding which was a master instance when I broke it with sample2.16:38
hazmatkees, its always machine 0 atm16:38
SpamapShazmat: err, env name can't be changed AFAICT, its used for so many things... ec2 group names for one.16:38
hazmatSpamapS, ugh.. good point16:38
keeshazmat: I mean the stuff before "Connecting to environment".16:39
hazmatSpamapS, that sounds quite sensible then.. along with a nice warning in the doc about it16:39
keeshazmat: when I bootstrapped using the same control-bucket, suddenly juju would only talk to the new instance16:39
rogcould we derive the control bucket name by combining the env name and the access id in some way?16:39
rogthus removing the need for a user to invent another name16:40
keeshttps://juju.ubuntu.com/docs/getting-started.html#configuring-your-environment <- this could add some details about what the control bucket is.16:40
_mup_Bug #901311 was filed: automatically prefix control bucket with the environment name <juju:New> < https://launchpad.net/bugs/901311 >16:41
roghazmat: could that work?16:42
_mup_juju/ssh-known_hosts r429 committed by jim.baker@canonical.com16:42
_mup_Merged trunk16:42
SpamapSrog: I'm a little hesitant to make use of the access key id in any permanent context16:42
hazmatrobbiew, not sure we want to include access id its tied to an external/provider notion16:43
roge.g. envname + salt + hash(salt+accessid)16:43
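
rog's formula can be written out as a short sketch. This only illustrates the suggestion as stated, not anything juju does: the function name, the choice of SHA-1, the salt length and the digest truncation are all assumptions made for the example.

```python
import hashlib
import os


def derive_control_bucket(env_name, access_id, salt=None):
    """Build a bucket name as envname + salt + hash(salt + accessid)."""
    if salt is None:
        # A random salt generated once; it would have to be stored so the
        # same name can be recomputed later.
        salt = os.urandom(4).hex()
    digest = hashlib.sha1((salt + access_id).encode("utf-8")).hexdigest()
    # Truncate the digest to keep the name short (an arbitrary choice here).
    return "%s-%s-%s" % (env_name, salt, digest[:16])
```

Because the access id feeds the hash, changing credentials yields a different name, which is the concern raised in the surrounding discussion.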
SpamapSrog: they can be created and discarded quite often16:43
hazmatrog, &16:43
keesis there documentation on the potential contents of environment.yaml?16:43
hazmatrog, take orchestra for example.. what's an access id.. or local provider, its a provider specific notion16:43
keese.g. how would I discover "juju-origin: ppa" otherwise?16:43
roghazmat: does orchestra have a control-bucket field?16:44
hazmatkees, https://juju.ubuntu.com/docs/provider-configuration-ec2.html?highlight=origin16:44
hazmatrog, doh.. good point16:44
keeshazmat: ah-ha! thanks. I knew I'd found that before at some point.16:44
* SpamapS goes OTP16:44
keeshazmat: maybe link to that from https://juju.ubuntu.com/docs/getting-started.html#configuring-your-environment ?16:44
hazmatrog, the other thing with access id, is it assumes the identity is shared across all users of the env16:45
hazmatrog, which is true/required atm for bootstrap/destroy-environment16:45
keeswhat about making "juju-origin" be "PPA" by default, since that should always be the latest/greatest? that could be SRUed to oneiric.16:45
roghazmat: that's true.16:45
roghazmat: but it might be a useful default16:46
roghazmat: if there's no entry for control-bucket, for example16:46
hazmatrog, maybe not though.. they need access to bucket, which we have setup as private by default atm.. i just want to leave options open for delegation of access16:46
roghazmat: if we want multiuser access, the bucket must be readable by other users, right?16:47
hazmatrog, yeah.. i'm not sure we'd ever make that not a required arg for ec2, if its an auto on deterministic setting.. well you can change your id or switch accounts, and then poof your env is gone16:47
keeshazmat: okay, so, I spawned a bunch of units, and I've hit exactly what I saw on Sunday.16:47
keesmachines:16:47
kees...16:47
kees  6: {dns-name: '', instance-id: i-0b65094c}16:47
kees...16:48
kees      builder-debian/5:16:48
kees        machine: 616:48
kees        public-address: null16:48
kees        relations: {}16:48
kees        state: null16:48
keesmachine 6 hasn't been noticed, and the unit stays "public-address: null"16:48
roghazmat: isn't that already true? (given that the bucket is private)16:48
hazmatkees, also fwiw the latest client btw shows more information on status regarding machine state (pending from the provider, running, etc)16:48
hazmatkees, public-address is null till the machine actually comes up and starts the machine agent..16:49
hazmatit's not instantaneous16:49
hazmatit takes a minute, for the machine to launch, and have packages installed and to be available16:49
keesah, well, it just came up. heh. sunday I waited though. it wasn't up after an hour.16:49
hazmatkees, definitely broken then, but it's not something you can determine instantaneously is all i'm saying16:50
keeshazmat: right, absolutely.16:50
hazmatkees, what i'm trying to verify though is.. A) is the bug something we've already fixed in the ppa B) if not what's the provisioning agent log look like16:50
* kees nods16:50
keeslet me try to trigger the missing machine fault, one sec.16:50
keeskaboooom16:52
keeshere was my steps:16:52
kees$ juju terminate-machine 1016:52
keesoops, ignore that16:52
keessteps:16:52
kees$ juju remove-unit builder-debian/716:52
kees$ juju terminate-machine 1016:52
kees$ juju add-unit builder-debian16:52
keesat which point the provisioner explodes with python backtraces16:53
kees2011-12-07 08:52:01,217 provision:ec2: twisted ERROR: KeyError: 'Message'16:53
kees2011-12-07 08:52:01,217 provision:ec2: twisted ERROR: Logged from file provision.py, line 15616:53
keeswhat logs can I provide? :)16:53
hazmatkees, awesome. the log is in /var/log/juju .. i think its provisioning-agent.log but i'm not sure of the exact filename16:54
hazmatkees, its on machine 0 of the env16:54
hazmati think i kept using destroy-service instead of remove-unit when i was trying to reproduce this16:55
drt24So I am trying to use orchestra as per http://cloud.ubuntu.com/2011/09/oneiric-server-deploy-server-fleets-p2/ and this is failing because following those instructions does not appear to result in pxe booting being setup correctly on the provisioning server.16:56
keeshazmat: http://paste.ubuntu.com/762905/16:56
drt24I now have got the dhcp server running but it still isn't configured to do pxe things properly.16:56
drt24and so I get "No filename" errors when trying to boot client VMs16:57
hazmatkees, thanks thats very helpful16:57
hazmatthat looks like a bug in txaws16:57
drt24(this is on oneiric VMs)16:57
keeshazmat: cool, excellent.16:58
keeshazmat: I assume that moving the ppa fixed the bring-up bug, or it's a hard race to lose and I just got "lucky" on sunday16:58
hazmatkees, it's really not racy normally, i'm sorry that was your first juju experience. the client cli status reporting is much better now about keeping the user informed about what's going on (is the provider machine up, is juju ready on the machine). the provisioning bug in particular has been a little hard to reproduce, and it's been unclear what version and what the bug is.. but i think thanks to your help we should be able to fix that in the next day or two. and indeed it seems to be a bug in txaws in that it varies/reproduces based on ec2 error response variation.17:06
keescool, thanks for looking into it!17:09
TheMuehazmat: you wrote about a presentation about juju. would you please send it to me?17:10
keesit was frustrating for sure, but it was still _way_ easier to bring up a bunch of identical instances this way.17:10
keesthe charm stuff is nice :)17:10
hazmatTheMue, we have them shared in an ubuntu one folder atm17:17
TheMuehazmat: ah, ok. still have not used my account. so I'll try it now.17:19
TheMuehazmat: does it cover the dependencies of external components (like zk) and internal/external modules and libraries17:20
TheMue?17:20
hazmatTheMue, no17:20
hazmatTheMue, its a very high level architecture diagram17:20
TheMuehazmat: ok, but I think it will help17:21
fwereadehazmat, btw, need an opinion on how it's acceptable to detect unexpected shutdowns during the critical window of filesystem-screwage during upgrade-charm17:48
fwereadehazmat, the workflow state seems like such an obvious place to put it, but I don't think it's a good idea to fire a transition while midway through executing another transition17:50
fwereadehazmat, so if I were to do that I'd have to have a callback on workflow that called set_state on itself explicitly17:51
fwereadehazmat, which feels like a bit of a perversion of the state machine17:51
fwereadehazmat, hm, I have to stop now :( I'll pop back on later17:52
hazmatfwereade, doh.. sorry.. definitely i think your idea is  good (collapse part of upgrade op into the transition), go for it17:53
fwereadehazmat, the issue is that I feel I should be able to handle the fact that the process could suddenly die while we're half way through extracting the charm17:54
hazmatfwereade, that's independent really of the workflow aspect17:54
fwereadehazmat, well, the trouble is it's intimately bound up with it, because if we come up from an incomplete upgrade we need to go into upgrade_charm_error state17:56
fwereadehazmat, it certainly can't go on the lifecycle, we don't want that explicitly controlling the workflow17:56
keeshazmat, SpamapS: if you're interested, I've got another set of juju blog posts up now:17:56
keeshttp://www.outflux.net/blog/archives/2011/12/07/juju-bug-fixing/17:56
keeshttp://www.outflux.net/blog/archives/2011/12/07/how-to-throw-an-ec2-party/17:56
hazmatfwereade, sure, that state can be signaled by the error handler, but the aspect of doing the upgrade in such a way as to handle unexpected errors is independent of the location of the code17:57
fwereadehazmat, I guess it could go on the unit agent itself, but it's a step in the opposite direction from the (IMO nice) move of state-reconciliation from unit agent to workflow17:59
fwereadehazmat, the workflow really feels like the right place for it17:59
hazmatfwereade, so what happens on a retry?18:00
hazmatof upgrade_error18:00
fwereadehazmat, the usual: if unit charm id doesn't match service charm id, download and unpack before running the hooks18:01
fwereadehazmat, and if it does, we know we're recovering from a state post-successful-replace, and we just fire the hooks if we're asked18:02
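
The retry rule fwereade describes can be sketched roughly as below. This is a minimal illustration, not juju's actual code: the function name, the step labels and the charm id strings are all hypothetical, and it assumes the "if we're asked" condition on hooks applies to both branches.

```python
def plan_upgrade_retry(unit_charm_id, service_charm_id, run_hooks):
    """Return the ordered steps a retry of a failed upgrade would take.

    All names here are hypothetical; only the comparison of the unit's
    charm id against the service's charm id comes from the discussion.
    """
    steps = []
    if unit_charm_id != service_charm_id:
        # The unit is still on the old charm, so the replace never
        # completed: fetch and unpack the new charm first.
        steps.extend(["download", "unpack"])
    # When the ids match, the charm was already replaced successfully
    # and nothing needs to be unpacked again.
    if run_hooks:
        # Assumption: hooks fire only when the retry asks for them.
        steps.append("upgrade-charm hook")
    return steps
```

For example, plan_upgrade_retry("charm-1", "charm-2", True) yields the download and unpack steps followed by the hook, while matching ids with run_hooks=False yields no steps at all.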
hazmatfwereade, sounds good18:02
fwereadehazmat, I'm just trying to figure out whether an "unlicensed" state transition, that doesn't go through the normal transition logic, is in any way acceptable18:02
hazmatfwereade, just make an additional transition18:02
hazmatfwereade, what's the scenario?18:02
fwereadehazmat, and it's explicitly OK to fire a transition in the course of another transition?18:02
hazmatfwereade, no.. but the lifecycle can call other lifecycle methods18:03
fwereadehazmat, when we hit the point of no return *something* needs to record the fact that we're in a risky state18:03
kickinz1_hi!18:04
fwereadehazmat, as said above I think the workflow is the right place for it18:04
SpamapSkees: ty, reading your posts now. ;)18:04
hazmatfwereade, huh? the transition handler itself is supposed to be risky/failable.. that's the benefit of it and i thought the point.. it will record failures18:04
kickinz1_May I ask a question?18:04
SpamapSkees: btw, you should be able to use us-west-2 now ;)18:05
hazmatkickinz1_, sure18:05
jimbakerkees, cool post. i'm working on the ssh key management now, so that will take one step out of your process18:05
kickinz1_I'm in the process of using juju with orchestra18:05
kickinz1_When creating the boot strap, it fails with this error:18:06
kickinz1_/root/.juju/environments.yaml: environments.orchestra.default-series:18:06
kickinz1_The only place I see this is on bugs, but while using etckeeper.18:06
hazmatfwereade, ah.. i think we're agreeing.. i think the point of no return stuff should be in the transition handler with a conditional guard, hence failures there record state, and can be retried. sounds good.18:07
kickinz1_(https://bugs.launchpad.net/bugs/872553)18:07
_mup_Bug #872553: [SRU] upon creating a node via juju & orchestra, etckeeper hangs <verification-done> <Orchestra:Invalid by andreserl> <etckeeper (Ubuntu):Fix Released by kirkland> <orchestra (Ubuntu):Invalid by andreserl> <etckeeper (Ubuntu Oneiric):Fix Released by kirkland> <orchestra (Ubuntu Oneiric):Invalid by andreserl> < https://launchpad.net/bugs/872553 >18:07
fwereadehazmat, I'm not totally certain whether we're talking past one another or not, 1 sec18:08
SpamapSkickinz1_: can you maybe pastebin the whole error, like from $ juju ....   to the next $ ?18:08
kickinz1_ok18:08
fwereadehazmat, I'm talking about something like this: http://paste.ubuntu.com/762978/18:09
fwereadehazmat, on UnitWorkflowState18:10
fwereadehazmat, damn, really must go, bbl18:10
kickinz1_http://pastebin.com/NNqBkiNn18:10
kickinz1_any idea?18:15
kickinz1_I'm using precise18:16
hazmatfwereade, the state changes should go in the watch callback not the workflow18:16
hazmatfwereade, the existing upgradecharm op will continue to exist, and it can do some basic checks, but it will kick off the state change after clearing the upgrade flag, the transition handler holds the rest of the code to the upgrade, it should be retryable cleanly, if it fails the unit goes into an upgrade_charm_error.18:18
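
A rough sketch of the shape hazmat outlines here: the risky work lives inside the transition handler behind a conditional guard, a failure is recorded as upgrade_charm_error, and the same handler can be retried cleanly. The class, method and callback names are hypothetical and this is not juju's workflow API; only the upgrade_charm_error state name comes from the discussion, and "started" as the resting state is an assumption.

```python
class UpgradeWorkflowSketch:
    """Toy stand-in for a unit workflow; not juju's real UnitWorkflowState."""

    def __init__(self, unit_charm_id):
        self.unit_charm_id = unit_charm_id
        self.state = "started"  # assumed resting state

    def do_upgrade_charm(self, service_charm_id, replace_charm):
        """Transition handler: returns True on success, False on failure."""
        try:
            # Conditional guard: only redo the risky filesystem work when
            # the unit has not yet reached the service's charm.
            if self.unit_charm_id != service_charm_id:
                replace_charm(service_charm_id)  # may die midway
                self.unit_charm_id = service_charm_id
        except Exception:
            # The failure is recorded as a state, so it can be retried.
            self.state = "upgrade_charm_error"
            return False
        self.state = "started"
        return True
```

Calling do_upgrade_charm again after a failure repeats the replace step, whereas calling it after a success skips straight past the guard.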
hazmatkickinz1_, do you have a default-series set in your environments.yaml ?18:19
_mup_Bug #901343 was filed: juju.control.tests.test_status.StatusTest.test_render_dot broken <juju:In Progress by clint-fewbar> < https://launchpad.net/bugs/901343 >18:20
kickinz1_no18:20
kickinz1_I'm getting the source of juju to look at what it expect.18:21
kickinz1_Funny names...."astounding, magnificent, overridden, puissant"...18:24
kickinz1_thanks! default-series: oneiric made it work!18:25
niemeyerHello!18:38
mainerroro/18:39
niemeyermainerror: Yo18:42
niemeyerrog: You'll like some of the upcoming improvements on lbox..18:42
rogniemeyer: cool18:42
niemeyerJust need to test them now.. no Launchpad connection on the flight :)18:42
rogniemeyer: a couple of new reviews for you BTW18:43
niemeyerrog: and you just got one18:44
rogniemeyer: yay!18:44
rogniemeyer: make that 3 new reviews - i'd forgotten about that one!18:45
rogniemeyer: i've updated the cloudinit package merge proposal18:45
niemeyerrog: Sorry, btw, I did a big mess before leaving while working on lbox..18:45
niemeyerrog: Repeatedly sending the same message18:45
rogniemeyer: that's fine. i just ignored 'em all :-)18:46
rogniemeyer: was there any signal in there, in fact?18:46
drt24solution to my problem: run sudo orchestra-import-isos and then add and remove the cobbler server configuration18:46
niemeyerrog: Any signal? How do you mean?18:48
rogniemeyer: did any of the messages mean anything?18:49
niemeyerrog: No, in the end I was on crack consistently, because both changesets were already merged18:49
rogniemeyer: i thought so. just checking.18:49
rogniemeyer: http://codereview.appspot.com/5444043/ in case you didn't get a notification email18:50
rogniemeyer: (that one's independent of the others)18:50
niemeyerrog: Thanks18:51
niemeyerrog: That was one of the things I fixed in the plane, btw18:51
rogniemeyer: cool18:51
niemeyerrog: It should now send a ptal18:51
niemeyerrog: The other is to detect the -cr automatically after first use18:51
rogniemeyer: ideally it should let me look at the codereview page before sending any mail18:51
niemeyerrog: and the other is to checkout target branches automatically for diffing18:52
rogniemeyer: just to do a last sanity check18:52
niemeyerrog: and finally I've added support for default flags18:52
rogniemeyer: with Go reviews, i often end up uploading several times before mailing18:52
niemeyerrog: Most of that is untested, though, obviously.. will be fun to see what works :-)18:52
niemeyerrog: That was done too18:52
rogniemeyer: +50 for auto downloading!18:53
niemeyerrog: there's a new -prep flag now18:53
rogoh yes, this one often bites me18:53
rog:18:53
niemeyerrog: You can use at any time to upload without requesting the review18:53
rogi'll do -target ../foo-trunk18:53
niemeyerrog: It will also leave the Merge Proposal in Launchpad as Work In Progress, rather than Needs Review18:53
niemeyerrog: We should put that in the branch itself18:53
rogand lbox propose doesn't check that the dir exists until after the file's been edited18:53
niemeyerrog: It looks for ".lbox"18:53
rog(the description)18:53
niemeyerOk, let me give you some quick reviews18:54
keesSpamapS: us-west-2> yay! I will save a little money and a little latency. :)18:56
keesjimbaker: excellent! I look forward to that. :)18:56
niemeyerkees: <envy>19:00
keesniemeyer: ?19:01
keesniemeyer: oh, that I have an ec2 region in my state?19:01
niemeyerkees: Yeah :-D19:01
SpamapShttps://code.launchpad.net/~clint-fewbar/juju/fix-dot-test/+merge/8482719:02
SpamapSWoudl appreciate a quick review cycle on that.. fixes the test suite on trunk.19:02
SpamapSWould even19:02
kickinz1_bye!19:03
nijabaSpamapS: Hello.  Can you think of anything else that would be needed for Limesurvey's charm, or should I move on to roundcube?19:11
SpamapSnijaba: if it has the ability to make use of readonly slaves so we can scale it out even more, that would be cool, but it's not really necessary. ;)19:11
nijabaSpamapS: I do not think this is possible in Limesurvey19:12
SpamapSnijaba: I plan to write a mysql-proxy charm when subordinate charms land that will direct SELECT to a slave, and all others to a master. Should be interesting. :)19:13
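
The routing rule SpamapS mentions (SELECT to a slave, everything else to the master) might look something like this in outline. It is only an illustration of the rule in Python with hypothetical names; it is not how mysql-proxy itself is scripted, and the round-robin choice among slaves is an assumption.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical read/write splitter: picks a backend name per statement."""

    def __init__(self, master, slaves):
        self.master = master
        # Assumption: rotate through slaves round-robin; with no slaves,
        # everything falls back to the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql):
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        # INSERT, UPDATE, DELETE, DDL and anything unrecognised go to
        # the master.
        return self.master
```

A real splitter would also have to keep statements inside a transaction, and reads such as SELECT ... FOR UPDATE, on the master; this sketch ignores both.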
nijabaSpamapS: sounds really cool :)19:13
SpamapSI wonder if MySQL cluster works in any useful way on EC2.. probably not with the latency spikes.19:13
SpamapS7.2 will have memcache protocol access built in, that should be cool. :)19:14
nijabaSpamapS: ok, so I'll move on to Roundcube, making a first version of it that carries smtp/imap server address in the config.  Will update it once someone will have charmed a mail server to depend on it19:14
* nijaba wonders if dependencies can be made optional19:15
SpamapSnijaba: really even after the mail server is charmed, it will be useful to be able to just set it in the configs and not have to relate anything.19:15
SpamapSnijaba: yes, optional: true can be added as an attribute after the interface: xxx19:15
SpamapSwhich I find completely brain imploding.. requires: optional..19:15
SpamapS:-P19:15
nijabaSpamapS: so that's what we should do19:15
rogniemeyer: off for the day, see ya tomorrow?19:16
rogttfn all19:16
niemeyerrog: Yeah, have a good evening19:16
niemeyerrog: and you have another review19:16
_mup_juju/sshclient-refactor r431 committed by kapil.thangavelu@canonical.com21:13
_mup_cleanup cli output when connection refused21:13
SpamapSbcsaller: I noticed you had some subordinate branches in review. How close are we to having some things to play with? I had this *crazy* idea for a charm..21:22
SpamapSmk-query-digest can take tcpdump output, and tell you what queries sucked21:23
SpamapSSo.. throw that on your apps for 5 minutes, related back to somewhere to store the output.. and you can get like, an instant picture of your app21:23
SpamapSand where it sucks21:23
bcsallerSpamapS: that's cool. While the feature set is getting closer to alpha it's still at the starting gate in terms of reviews.21:24
bcsallerSpamapS: and history shows that always takes a while21:25
SpamapSYeah21:25
SpamapSI'm eager21:25
SpamapSI have a bunch of cool ideas and I want to try them out. ;)21:25
SpamapSnegronjl: btw, would appreciate a review on this https://code.launchpad.net/~clint-fewbar/charm/oneiric/mysql/add-config/+merge/8469721:27
negronjlSpamapS: ok ... working on it now21:27
_mup_juju/ssh-known_hosts r430 committed by jim.baker@canonical.com21:34
_mup_Do not create known_hosts files in actual home directory when testing21:34
_mup_juju/ssh-find-zk r430 committed by jim.baker@canonical.com21:35
_mup_Initial refactoring21:35
_mup_juju/ssh-find-zk r431 committed by jim.baker@canonical.com21:36
_mup_Merged upstream21:36
negronjlSpamapS: looks good ... deployed with multiple changes in config.yaml and it all works good as far as I can tell.  Approved.21:41
negronjlSpamapS: should I merge it as well ?21:41
SpamapSnegronjl: no I think the proposer should merge if they're a member of charmers21:49
SpamapSnegronjl: and thanks for the review!21:50
* SpamapS is eager to start writing tests as well21:50
negronjlSpamapS: no prob.21:50
SpamapSPushed up to revision 69.22:06
* SpamapS giggles like beavis and butthead22:06
hazmatniemeyer, i wanted to try one of the lbox -cr reviews, but it seems to have trouble.. it wants the -for branch to point something on disk, but it doesn't like a trunk checkout, any ideas?22:13
_mup_Bug #901463 was filed: SSH Client code and output cleanups <juju:In Progress by hazmat> < https://launchpad.net/bugs/901463 >22:15
hazmatalso wasn't clear on the -bp if it wants the name of a blueprint or  a link,22:16
* SpamapS goes to get a RedBull from 7-11...22:21
hazmatbcsaller, can subordinate charms talk to each other in the same container level isolation that they can with the master?22:24
bcsallerhazmat: they need a relationship defined22:25
hazmatbcsaller, but would that be a normal s2s rel? or a container scoped one22:25
bcsallerhazmat: we talked about that, it wasn't clear that it was a high priority use case, I would think we'd honor the subordinate flag on the relationship. It would be a special case though as they are not subordinate to each other22:27
bcsallerhazmat: what use case do you see?22:28
hazmatbcsaller, in terms of impl if the container scope is just another type of relation22:28
hazmatbcsaller, i was thinking about doing something with volume management and backup for cassandra, a volume manager that can attach volumes to the node, and a backup cassandra plugin, that could snapshot and transfer data to the volume22:28
_mup_juju/ssh-find-zk r432 committed by jim.baker@canonical.com22:29
_mup_Fix tests to support refactoring22:29
bcsallerhazmat: so even without special support those could both be subordinate with a normal relationship and they would be able to filter for the right pair22:29
bcsallerbut I think we can do better and you seem to as well22:29
* SpamapS chugs redbull...22:32
SpamapShttp://bit.ly/surjbS22:32
SpamapSBUG TRIAGE RAMPAGE!!!22:33
* SpamapS storms off into launchpad22:33
marcoceppilouel, Gotenks.22:35
Davieyhazmat: bug 804203, is https://issues.apache.org/jira/browse/HBASE-2418 related?22:39
_mup_Bug #804203: Juju needs to communicate securely with Zookeeper <security> <juju:Confirmed for hazmat> < https://launchpad.net/bugs/804203 >22:39
SpamapSDaviey: it's related in that HBASE needs to implement the same level of auth controls as Juju does when working with zookeeper.22:44
hazmatyup22:45
hazmatbasically node level acls to protect portions of the zk tree from anon clients22:46
* hazmat heads out to dinner22:51
hazmatDaviey, it's not on the roadmap atm for 12.0422:52
elmo*blink*22:53
elmoseriously?22:53
DavieySpamapS / hazmat: thanks22:56
* SpamapS is somewhat frustrated about that one as well22:58
niemeyerhazmat: Hmm22:58
niemeyerhazmat: What does bzr info print for your checkout22:58
niemeyer?22:58
SpamapSHmm, so config settings can't contain non-ASCII data23:50
SpamapSprint >>stream, str(result)23:50
SpamapSUnicodeEncodeError: 'ascii' codec can't encode characters in position 0-10: ordinal not in range(128)23:50
SpamapSseems like just changing that to unicode(result) would work23:51
SpamapSof course, really, I just want the raw bytes no matter what..23:52
niemeyerWoohay.. new features of lbox working well23:56
niemeyerSpamapS: Ugh..23:56
niemeyerSpamapS: That's a super well known wart of Python :-(23:56
SpamapSwart in what way?23:57
SpamapSunicode is tricky?23:57
niemeyerSpamapS: Luckily 3.0 is fixing it, so people will stop doing it all the time23:57
niemeyerSpamapS: It's not on itself23:57
niemeyerSpamapS: The problem is how Unicode evolved within the language23:57
SpamapSSeems like a wart of all programming done before 2005 :-P23:57
SpamapSJava tried, even they got it wrong. :-P23:57
niemeyerSpamapS: >>> u"é" + "é"23:57
niemeyerTraceback (most recent call last):23:57
niemeyer  File "<stdin>", line 1, in <module>23:57
niemeyerUnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)23:57
niemeyerSpamapS: Just too easy to get wrong23:58
niemeyerSpamapS: 3.0 is fixing that by separating raw bytes from *human text* more clearly23:58
SpamapSoh thats good23:58
niemeyerSpamapS: 3.X, that is23:58
SpamapSso how do you get raw bytes with 2.7 ? unicode(var) ?23:59
SpamapSthat seems wrong23:59
niemeyerSpamapS: "é"23:59
niemeyerSpamapS: That's raw bytes23:59
niemeyerSpamapS: But was also the correct way to do human text until several years ago23:59
niemeyerSpamapS: Hence the mess23:59
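
For comparison, a small Python 3 sketch of the separation niemeyer is referring to. The variable and helper names are made up for illustration; the behaviour shown is standard Python 3, where human text (str) and raw bytes (bytes) are distinct types and never mix implicitly.

```python
text = "é"                  # human text: str in Python 3
raw = text.encode("utf-8")  # raw bytes: two bytes, starting with 0xc3


def try_mix(a, b):
    """Attempt a + b and report the exception name if the types clash."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


# Mixing text and bytes fails immediately with a TypeError instead of
# attempting an implicit ASCII decode as Python 2 does.
mixed = try_mix(text, raw)

# An explicit decode states the encoding and makes the join legal.
joined = text + raw.decode("utf-8")

# Encoding non-ASCII text as ASCII still fails, but only when asked for.
try:
    text.encode("ascii")
    ascii_result = "ok"
except UnicodeEncodeError:
    ascii_result = "UnicodeEncodeError"
```

The design point is that the encoding is always named at the boundary between bytes and text, so a mismatch surfaces at the mixing site rather than later with some non-ASCII input.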
