[00:00] <SpamapS> Ryan_Lane: w00t!
[00:00] <SpamapS> Ryan_Lane: m_3 is on holiday all week.. best email him to be sure he gets the message. :)
[00:02] <Ryan_Lane> ah. ok. cool ;)
[00:22] <hazmat> jimbaker, figured that one out
[00:22] <hazmat> re connection error, it was the chain-aware deferred thingy on errors
[00:22] <hazmat> jimbaker, i'm wondering about resurrecting your last rewrite
[00:23] <SpamapS> config-get needs the same --format=shell that relation-get has
[00:24] <SpamapS> config-changed runs ridiculously slow with 40 execs .. :-P
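SpamapS's point is that a single `config-get --format=shell` exec could replace dozens of per-key calls in a hook. Here is a minimal Python sketch of what such a formatter might do; the variable-naming and quoting conventions are assumptions for illustration, not juju's actual implementation:

```python
import shlex

def shell_format(settings):
    """Render a settings dict as eval-able shell assignments, so a
    hook can read every config value with a single exec instead of
    one config-get call per key."""
    lines = []
    for key, value in sorted(settings.items()):
        # Assumed convention: uppercase the key and replace dashes
        # so the result is a valid shell identifier.
        name = key.upper().replace("-", "_")
        lines.append("%s=%s" % (name, shlex.quote(str(value))))
    return "\n".join(lines)

# A hook could then do something like: eval "$(config-get --format=shell)"
print(shell_format({"listen-port": 8080, "motd": "hello world"}))
```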
[00:34] <akgraner> ok, silly question, but do you all have weekly IRC meetings? and if so, where can I find the wiki link to the agenda, meeting logs, etc? I wanted to include juju in the development teams' weekly meetings section in UWN
[00:35] <SpamapS> No there's no weekly IRC meeting.
[00:35] <akgraner> well that explains why I couldn't find anything - thanks :-)
[00:35] <jimbaker> SpamapS, sounds good
[00:36] <SpamapS> akgraner: we do a google hangout weekly, and it definitely *needs* an agenda and minutes..
[00:36] <akgraner> ahh ok when you all formalize that can you drop the link into the news channel so we can include it please :-)
[00:36] <jimbaker> hazmat, i hope you have better luck in that rewrite
[00:37] <akgraner> SpamapS, looking forward to the juju webinar...:-)
[00:37] <jimbaker> but it makes perfect sense that it was the chain aware stuff - that was the obvious suspect
[00:39] <SpamapS> akgraner: yeah should be good. Hopefully we can finish our slides tomorrow and have time to practice. :)
[01:20] <_mup_> Bug #900560 was filed: after upgrade-charm, new config settings do not all have default values <juju:New> < https://launchpad.net/bugs/900560 >
[01:40] <SpamapS> jimbaker: I just used juju scp btw. :)
[01:41] <SpamapS> jimbaker: ty!
[01:54] <negronjl> SpamapS: I just uploaded hadoop-mapreduce charm ... It's a charm that allows you to decouple your mapreduce job from your hadoop cluster.
[01:54] <SpamapS> negronjl: w00t!
[01:54] <SpamapS> negronjl: would love to see a web interface for that.. isn't that kind of what HIVE does?
[01:55] <negronjl> SpamapS: hive is more of a layer on top of hadoop, this charm just takes care of all the setup and cleanup that is supposed to happen before and after a job is run
[01:55] <negronjl> SpamapS: I'll work on hive, pig, etc. eventually :)
[01:55] <negronjl> SpamapS: hive lets you use more SQL like commands to query hadoop
[01:56] <SpamapS> that would be nice for demos.. dump a giant chunk of data in and start doing ad-hoc queries
[01:56] <negronjl> SpamapS: I'll work on that soon ...
[01:57] <negronjl> SpamapS: this new hadoop-mapreduce charm shows users how they can use their own custom mapreduce jobs, make a charm out of it and run it against a hadoop cluster that's already deployed via juju
[01:57] <negronjl> SpamapS: so, it allows you to juju deploy it all :)
[01:57] <negronjl> SpamapS: I'll work on hive next ( as time permits )
[01:59] <Skaag> can I make my own ubuntu servers juju compatible so that my apps do not use amazon's ec2 cloud?
[02:00] <SpamapS> sounds great tho
[02:00] <Skaag> so it's not possible?
[02:02] <SpamapS> Skaag: you can, I was responding to negronjl ;)
[02:02] <Skaag> ah, very cool
[02:02] <SpamapS> Skaag: right now you only can do that using the 'orchestra' provider
[02:02] <Skaag> which costs money?
[02:02] <SpamapS> nope
[02:02] <SpamapS> just time.. ;)
[02:02] <SpamapS> its kind of raw
[02:02] <SpamapS> and you only get one service per machine
[02:02] <Skaag> yes that sucks
[02:03] <Skaag> I assume it's going to be fixed...
[02:03] <SpamapS> Yeah there are two features which will help with that.. subordinate services and placement constraints.
[02:03] <SpamapS> subordinate services will let you marry two things to basically create like a super charm.
[02:04] <SpamapS> placement constraints will let you define where you want services to be placed based on things like cpu, ram, or just arbitrary tags that you assign to machines
[02:04] <SpamapS> Skaag: so depending on what you want, your needs should be served
[02:04] <SpamapS> cripes, its late.. I need to go
[02:04]  * SpamapS disappears
[02:05] <Skaag> sounds awesome!
[02:05] <Skaag> thanks SpamapS
[02:53] <shang> hi all, anyone experienced issues when deploying ganglia to EC2? (using the "bzr checkout lp:charm/ganglia ganglia" command)
[05:00]  * hazmat yawns
[05:20] <jimbaker> SpamapS, glad juju scp proved to be useful
[11:10] <niemeyer> Good morning!
[11:11] <fwereade> morning niemeyer!
[11:15] <niemeyer> rog: I've mixed things up in your review, sorry about that
[11:17] <niemeyer> That's what I get for reviewing late at night
[11:18] <TheMue> moo
[11:38] <niemeyer> rog: You have a saner review
[11:38] <niemeyer> TheMue: Yo
[12:06] <rog> niemeyer: thanks, i'm on it
[12:07] <rog> niemeyer, fwereade, TheMue: mornin' everyone BTW
[12:07] <fwereade> heya rog
[12:08] <TheMue> yo rog and fwereade
[12:17] <niemeyer> rog: Btw, gocheck has a FitsTypeOf checker that does what you wanted there
[12:18] <niemeyer> Assert(x, FitsTypeOf, (*environ)(nil))
[12:23] <rog> niemeyer: thanks. all done. https://codereview.appspot.com/5449065/
[12:29] <Daviey> why isn't this review handled on launchpad?
[12:29] <niemeyer> Daviey: It is..
[12:30] <niemeyer> Daviey: https://code.launchpad.net/~rogpeppe/juju/go-juju-ec2-region/+merge/84256
[12:30] <niemeyer> Daviey: But codereview does inline code reviews
[12:30] <niemeyer> Daviey: So we use both
[12:32] <Daviey> seems a quirky workflow, but ok. :/
[12:33] <niemeyer> Daviey: http://blog.labix.org/2011/11/17/launchpad-rietveld-happycodereviews
[12:34] <niemeyer> Daviey: The quirky workflow is significantly simpler and faster to deal with
[12:35] <TheMue> Do we have a complete paper for the used tools, conventions, workflows, and standards?
[12:35] <niemeyer> TheMue: Nope
[12:36] <niemeyer> TheMue: You can check out the Landscape one for some background
[12:36] <TheMue> something like golang's "Getting Started" would be fine. nothing big.
[12:37] <niemeyer> TheMue: This blog post is a Getting Started pretty much
[12:37] <TheMue> landscape one?
[12:37] <rog> niemeyer: next: https://codereview.appspot.com/5449103/
[12:39] <niemeyer> TheMue: Nah, nevermind.. reading it again I notice it's of little use for us
[12:40] <TheMue> niemeyer: ok
[12:40] <rog> niemeyer: one mo, i need to merge trunk
[12:40] <fwereade> niemeyer: consider a unit agent coming up to discover that the unit workflow is in state "stop_error"
[12:41] <TheMue> niemeyer: maybe I'll write down my "starter experiences" ;)
[12:41] <niemeyer> TheMue: Sounds like a plan
[12:41] <niemeyer> fwereade, rog: ok
[12:41] <fwereade> niemeyer: it seems to me that that could only have happened as a result of a failed attempt to shut down cleanly at some stage
[12:42] <niemeyer> fwereade: Hmm.. yeah
[12:42] <fwereade> niemeyer, and that in that case it would be reasonable to retry into "stopped" without hooks, and then to transition back to "started" as we would have done had the original stop worked properly
[12:43] <fwereade> niemeyer, please poke holes in my theory
[12:43] <niemeyer> fwereade: What do you mean by `to retry into "stopped"`
[12:43] <niemeyer> ?
[12:43] <niemeyer> fwereade: The user is retrying? THe system? etc
[12:44] <fwereade> niemeyer, I was referring to the "retry" transition alias
[12:44] <fwereade> niemeyer, may as well just explicitly "retry_stop"
[12:45] <niemeyer> fwereade: I'm missing something still
[12:45] <niemeyer> fwereade: Probably because I don't understand it well enough
[12:45] <fwereade> niemeyer, I think I've misthought anyway
[12:45] <niemeyer> fwereade: Does stop transition back onto "started" when it works?
[12:46] <fwereade> niemeyer, a clean shutdown of the unit agent puts the workflow into "stopped", and that will go back into "started" when it comes up again
[12:46] <niemeyer> fwereade: Aha, ok
[12:46] <niemeyer> fwereade: That makes sense
[12:46] <niemeyer> fwereade: So the unit shuts down, and goes onto stop_error because stop failed..
[12:47] <fwereade> niemeyer: so I think the general idea, that coming up in "stop_error" should somehow be turned into a "started" state, remains sensible
[12:47] <niemeyer> fwereade: You mean automatically?
[12:47] <rog> niemeyer: all clear now.
[12:47] <fwereade> niemeyer, I do, but I may well be wrong
[12:48] <niemeyer> fwereade: I'm not sure
[12:48] <niemeyer> fwereade: I mean, I'm not sure that doing this automatically is a good idea
[12:48] <niemeyer> fwereade: I'd vote for explicitness until we understand better how people are using this
[12:48] <fwereade> niemeyer, that's perfectly reasonable
[12:48] <niemeyer> fwereade: The reason being that a stop_error can become an exploded start and/or upgrade soon enough
[12:49] <niemeyer> fwereade: I'd personally appreciate knowing that stop failed, so I can investigate what happened on time, rather than blowing up in cascade in later phases, which will be harder to debug
[12:49] <fwereade> niemeyer, that definitely makes sense as far as it goes
[12:50]  * fwereade thinks a moment
[12:50] <niemeyer> fwereade: resolved [--retry] enables the workflow of ignoring it
[12:51] <fwereade> niemeyer, but it leaves the unit in a "stopped" state without an obvious way to return to a "started" state
[12:51] <fwereade> niemeyer, I'll need to check whether bouncing the unit agent will then do the correct switch to "started"
[12:52] <fwereade> niemeyer, which I guess would be sort of OK, but it's not necessarily the obvious action to take to get back into "started" once you've resolved the stop error
[12:53] <niemeyer> fwereade: What about "resolved"?
[12:53] <niemeyer> fwereade: "resolved" should transition it to stopped rather than started
[12:54] <niemeyer> fwereade: It always moves it to the next intended state, rather than the previous one
[12:54] <fwereade> niemeyer, yes, but "stopped" is not a useful state for a running unit to be in
[12:55] <niemeyer> fwereade: Indeed.. but I don't know what you mean by that
[12:56] <niemeyer> fwereade: The stop hook has run to transition it onto stopped.. if you resolved a stop_error, it should be stopped, not running
[12:56] <fwereade> niemeyer: it should, yes, but the purpose of that stopped state is to signal that it's in a nice clean state for restart (if the restart ever happens ofc)
[12:57] <fwereade> niemeyer: (it could just have gone down for ever, but I don't think that's relevant here)
[12:57] <fwereade> niemeyer: we end up past the point at which we should detect "ok, we were shut down cleanly so we can be brought up cleanly, let's bring ourselves back up"
[12:59] <niemeyer> fwereade: You're implying that the stop hook is only called for starting, which is awkward to me
[12:59] <fwereade> "without painful considerations like 'we're in charm_upgrade_error state, reestablish all watches and relation lifecycles but keep the hook executor stopped watches'"
[12:59] <fwereade> (er, lose the trailing "watches" above)
[13:01] <fwereade> niemeyer: I'm implying, I think, that the stop hook is called for stopping, and that that can either happen because the whole unit is going away forever *or* because we've been killed cleanly for, say, machine reboot
[13:02] <niemeyer> fwereade: It can also happen for any other reason, like a migration, or cold backup, etc
[13:02] <fwereade> niemeyer: and that when the agent comes up again, "stopped" is considered to be a sensible state from which to transition safely to "started", just as None or "installed" would be
[13:03] <fwereade> niemeyer: I'm not sure I see the consequences there, expand please?
[13:03] <niemeyer> fwereade: That's fine, but the point is that some of these actions are not safe to execute if stopped has actually blown up, I think
[13:04] <fwereade> niemeyer: I'm comfortable with the decision not to automatically start after stop_error
[13:05] <fwereade> niemeyer: I'm not confident that we have a sensible plan for transitioning back to started once the user has fixed and resolved
[13:06] <niemeyer> fwereade: It feels like there are two situations:
[13:06] <niemeyer> fwereade: 1) The unit was stopped temporarily, without the machine rebooting
[13:07] <niemeyer> fwereade: 2) The unit was stopped because the machine is rebooting or is being killed
[13:07] <niemeyer> fwereade: Are you handling only 2) right now?
[13:07] <niemeyer> fwereade: Or is there some situation where you're handling 1)?
[13:08] <niemeyer> rog: Review delivered
[13:08] <rog> niemeyer: on it
[13:08] <fwereade> niemeyer, I'm handling both, I think -- how do we tell which is the case?
[13:09] <niemeyer> fwereade: Is there a scenario where the unit stops without a reboot? How?
[13:10] <fwereade> niemeyer: not a *planned* situation
[13:10] <niemeyer> fwereade: What about unplanned situations.. how could it happen?
[13:10] <fwereade> niemeyer: but what about, say, an OOM kill?
[13:11] <niemeyer> fwereade: OOM kill of what?
[13:11] <fwereade> niemeyer, the unit agent... could that never happen?
[13:11] <fwereade> niemeyer, even, a poorly written charm that accidentally kills the wrong PID
[13:12] <niemeyer> fwereade: Wow, hold on. The unit agent dying and the unit workflow are disconnected, aren't they?
[13:12] <fwereade> niemeyer: the workflow transitions to "stopped" when the agent is stopped
[13:12] <niemeyer> fwereade: WHAAA
[13:13] <niemeyer> fwereade: How can it possibly *transition* when it *dies*?
[13:14] <fwereade> niemeyer: ok, as I understand it, when we're killed normally, stopService will be called
[13:14] <fwereade> niemeyer: it's only a kill -9 that will take us down without our knowledge
[13:14] <niemeyer> fwereade: and an OOM, and a poorly written charm, and ...
[13:15] <fwereade> niemeyer: indeed, I'm considering that there are two scenarios, respectively equivalent to kill and kill -9
[13:15] <rog> niemeyer: response made.
[13:15] <fwereade> niemeyer: in neither case do we know for what reason we're shutting down
[13:16] <fwereade> niemeyer: (and in only the first case do we know it's even happened)
[13:17] <niemeyer> fwereade: So why are we taking a kill as a stop? Didn't you implement logic that enables the unit agent to catch up gracefully after a period of being down?
[13:17] <fwereade> niemeyer: we were always taking a kill as a stop
[13:17] <fwereade> http://paste.ubuntu.com/761597/
[13:18] <fwereade> (given that a "friendly" kill will stopService)
[13:18] <niemeyer> fwereade: ok, the question remains
[13:22] <fwereade> niemeyer: ...sorry, I thought you were following up your last message
[13:23] <fwereade> niemeyer: I have implemented that logic, or thought I had, but have discovered a subtlety in response to hazmat's latest review
[13:23] <fwereade> niemeyer: well, in hindsight, an obviousty :/
[13:23] <hazmat> g'morning
[13:23] <fwereade> heya hazmat
[13:24] <hazmat> i ended up rewriting the ssh client stuff last night
[13:24] <hazmat> fwereade, i'm heading back to your review right now
[13:24] <fwereade> hazmat: might be better to join the conversation here
[13:24]  * hazmat catches up with the channel log
[13:24] <fwereade> hazmat: I'm pretty sure the initial version of the unit state handling is flat-out wrong :(
[13:26] <fwereade> niemeyer: ok, stepping back a mo
[13:27] <fwereade> niemeyer: when the unit agent comes up, the workflow could be in *any* state, and we need to make sure the lifecycles end up in the correct state
[13:27] <niemeyer> fwereade: Right
[13:27] <niemeyer> fwereade: What I'm questioning is the implicit stop
[13:28] <fwereade> niemeyer: cool, we're on the same page then
[13:28] <niemeyer> fwereade: We should generally not kill the unit unless we've been explicitly asked to
[13:28] <niemeyer> fwereade: In the face of uncertainty, attempt to preserve the service running
[13:28] <fwereade> niemeyer: I am very sympathetic to arguments that we should *not* explicitly go into the "stopped" when we're shut down
[13:28] <fwereade> niemeyer: cool
[13:29] <fwereade> niemeyer: am I right in thinking there's some bug where the stop hook doesn't get run anyway?
[13:29] <niemeyer> fwereade: I agree that we should transition stop_error => started when we're coming from a full reboot, though
[13:30] <niemeyer> fwereade: The stop story was mostly uncovered before you started working on it
[13:32] <fwereade> niemeyer: heh, I think I'm more confused now :(
[13:32] <fwereade> niemeyer: the discussion of start hooks on reboot seems to me to be unconnected with the actual state we're in
[13:33] <fwereade> niemeyer: by its nature, that guarantee is an end-run around whatever normal workflow we've set up, isn't it?
[13:33]  * fwereade resolves once again to lrn2grammar, lrn2spell
[13:34] <hazmat> fwereade, why would it be an end run around to what the state of the system is?
[13:34] <hazmat> errors should be explicitly resolved
[13:35] <hazmat> i see the point re stop being effectively a final transition for resolved
[13:35] <niemeyer> fwereade: Hmm
[13:35] <niemeyer> fwereade: I agree with hazmat in the general case
[13:35] <niemeyer> fwereade: What I was pointing at, though, is that there's one specific case where that's not true: reboots
[13:36] <niemeyer> fwereade: Because we're *sure* the unit was stopped, whether it liked it or not
[13:36] <fwereade> hazmat, niemeyer: surely there are only a few "expected" states in which to run the start hook?
[13:36] <niemeyer> fwereade: That's a special case where *any* transition should be transitioned onto "stopped", and then the normal startup workflow should run
[13:37] <fwereade> niemeyer: so we should set up state transitions to "stopped" for every state?
[13:38] <niemeyer> fwereade: That's a bit of a truism (sure, we don't run start arbitrarily)
[13:38] <hazmat> i'm wondering if the reboot scenario is better handled explicitly
[13:38] <fwereade> niemeyer: ...including, say, "install_error"?
[13:38] <niemeyer> hazmat: Right, that's my thinking too
[13:38] <hazmat> via marking the zk unit state, rather than implicit detection
[13:38] <niemeyer> hazmat: Nope
[13:38] <niemeyer> hazmat: Machines do crash, and that's also a reboot
[13:39] <hazmat> true
[13:39] <niemeyer> fwereade: No.. why would we?
[13:39] <niemeyer> fwereade: You seem to be in a different track somehow
[13:39] <niemeyer> fwereade: Maybe I'm missing what you're trying to understand/point out
[13:40] <fwereade> niemeyer: I think the discussion has broadened to cover several subjects, each with several possible tracks ;)
[13:41]  * fwereade marshals his thoughts again
[13:41] <niemeyer> fwereade: There are two cases:
[13:41] <niemeyer> 1) Reboot
[13:42] <niemeyer> In that case, when the unit agent starts in this scenario, it should reset the state to stopped, and then handle the starting cleanly.
[13:43] <niemeyer> 2) Explicit stop + start (through a command to be introduced, or whatever)
[13:43] <niemeyer> In this scenario, a stop_error should remain as such until the user explicitly resolves it
[13:43] <niemeyer> resolving should transition to *stopped*, and not attempt to start it again
[13:43] <niemeyer> Then,
[13:43] <niemeyer> There's a third scenario
[13:43] <niemeyer> 3) Unknown unit agent crashes
[13:44] <niemeyer> Whether kill or kill -9 or any other signal, the unit agent should *not* attempt to transition onto stopped, because the user didn't ask for the service to stop.
[13:44] <niemeyer> Instead, the unit agent should be hooked up onto upstart
[13:45] <niemeyer> So that it is immediately kicked back on
[13:46] <niemeyer> Even if the user says "stop unit-agent".. we should not stop the service
[13:47] <hazmat> 3) sounds good, the stop dance needs explicit coordination with non-container agents; decoupling explicit transitions from implicit process actions is a win.
[13:47] <fwereade> niemeyer: ok, so I guess at the moment we don't have anything depending on the existing stop-on-kill behaviour
[13:48] <hazmat> 2) isn't really a use case we have atm, but sounds good. 1) the handling needs some exploration; what does an install_error mean in this context?
[13:48] <niemeyer> hazmat: Good point, agreed
[13:48] <hazmat> are we resetting to a null state, or are we creating guide paths to start from every point in the graph
[13:49] <niemeyer> hazmat: Even on reboots, it should take into account which states are fine to ignore
[13:49] <niemeyer> stop_error is one of them
[13:50] <niemeyer> start_error is another
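The scenarios above can be condensed into a small transition table: "resolved" always moves to the intended state (stop_error goes to stopped, not back to started), and a full reboot resets recoverable states to "stopped" so the normal startup path runs. This is an illustrative Python sketch using the state names from the conversation, not juju's actual workflow code:

```python
# (current_state, event) -> next_state, per the discussion above.
TRANSITIONS = {
    (None, "install"): "installed",
    ("installed", "start"): "started",
    ("stopped", "start"): "started",
    ("started", "stop"): "stopped",
    ("stop_error", "resolved"): "stopped",   # next intended state
    ("start_error", "resolved"): "started",
}

# On a full reboot we *know* the unit was stopped, so start_error and
# stop_error are safe to ignore; install_error still needs the user.
REBOOT_RESETTABLE = {"started", "stopped", "start_error", "stop_error"}

def after_reboot(state):
    """Reset any recoverable state to "stopped" after a reboot."""
    return "stopped" if state in REBOOT_RESETTABLE else state

def fire(state, event):
    """Apply one workflow transition."""
    return TRANSITIONS[(state, event)]

# Scenario 1: reboot with a lingering stop_error -> clean restart.
print(fire(after_reboot("stop_error"), "start"))  # started
# Scenario 2: explicit stop that failed, then resolved -> stopped.
print(fire("stop_error", "resolved"))  # stopped
```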
[13:50] <fwereade> niemeyer: so indeed I'm very happy with (3) and only a little confused about (2): because I'm now not sure when we *should* be in a "stopped" state
[13:50] <hazmat> niemeyer, right.. by ignore you mean ignoring the resolved action, and just resetting the workflow to a point where we can call start, and possibly install, again
[13:51] <niemeyer> fwereade: Imagine we introduce three fictitious commands (we have to think, let's not do them now): juju start, juju stop, and juju restart.
[13:51] <hazmat> juju transition <state>
[13:51] <niemeyer> fwereade: Do you get the picture?
[13:51] <niemeyer> hazmat: Ugh, nope.. mixing up implementation and interface
[13:51] <hazmat> ;-)
[13:52] <fwereade> niemeyer: yes, but I'm worried about here-and-now; I see https://bugs.launchpad.net/juju/+bug/802995 and https://bugs.launchpad.net/juju/+bug/872264
[13:52] <_mup_> Bug #802995: Destroy service should invoke unit's stop hook, verify/investigate this is true <juju:New> < https://launchpad.net/bugs/802995 >
[13:52] <_mup_> Bug #872264: stop hook does not fire when units removed from service <juju:Confirmed> < https://launchpad.net/bugs/872264 >
[13:52] <hazmat> juju coffee, bbiam
[13:53] <niemeyer> fwereade: Both of these look like variations of the 1) scenario.. why would we care about any erroring states if the unit is effectively being doomed?
[13:56] <fwereade> niemeyer: it seems to me that if we don't stop on stopService, we will *never* go into the "stopped" state
[13:56] <fwereade> niemeyer: I'm keen on making that change
[13:57] <niemeyer> fwereade: Why? Stop can (and should, IMO) be an explicitly requested action.
[13:57] <niemeyer> fwereade: Stop is "Hey, your hundred-thousand dollars database server is going for a rest"
[13:58] <fwereade> niemeyer: back up again: you're saying that we shouldn't go into "stopped" just because the unit agent is shutting down
[13:58] <niemeyer> fwereade: It doesn't feel like the kind of thing to be done without an explicit "Hey, stop it!" from the admin
[13:58] <fwereade> niemeyer: so it's a state that we won't ever enter until some hypothetical future juju start/stop command is implemented?
[13:58] <niemeyer> fwereade: Define "shutting down"
[13:59] <niemeyer> fwereade: Shutting down can mean lots of things.. if it means "kill", yes, that's what I mean
[13:59] <niemeyer> fwereade: If it means, the unit was notified that it should stop and shut down, no, it should execute the stop hook
[14:00] <fwereade> niemeyer: ok, that makes perfect sense, it just seems that we don't actually do that now
[14:00] <niemeyer> fwereade: No, it's not a hypothetical future.. a unit destroy request can execute stop
[14:00] <niemeyer> fwereade: Because it's an explicit shut down request from the admin
[14:01] <niemeyer> fwereade: But that's not the same as twisted's stopService
[14:01] <fwereade> niemeyer: ok, but in that case we'll never be in a position where we're coming back up in "stopped" or "stop_error", which renders the original question moot
[14:01] <niemeyer> fwereade: That's right
[14:02] <niemeyer> fwereade: it's a problem we have, and that will have to be solved at some point, but it doesn't feel like you have it now
[14:02] <niemeyer> Alright, and I really have to finish packing, get lunch, and leave, or will miss the bus
[14:02] <fwereade> niemeyer: sorry to hold you up, didn't realise :(
[14:03] <fwereade> niemeyer: take care & have fun :)
[14:03] <niemeyer> fwereade: No worries, it was a good conversation
[14:03] <niemeyer> Cheers all!
[14:06] <hazmat> niemeyer, have a good trip
[14:06] <hazmat> fwereade, so do you feel like you've got a clear path forward?
[14:06] <fwereade> hazmat: still marshalling my thoughts a bit
[14:07] <fwereade> hazmat: the way I see it right now, I have to fix the stop behaviour and see what falls out of the rafters, and then move forward from there again
[14:07]  * hazmat checks out the mongo conference lineup
[14:08] <hazmat> fwereade, fixing the stop behavior is a larger scope of work
[14:09] <hazmat> fwereade, i'd remove/decouple stop state from process shutdown and move forward with the restart work
[14:09] <fwereade> hazmat: ...then my path is not clear, because I have to deal with coming up in states that are ...well, wrong
[14:11] <hazmat> fwereade, a writeup on stop stuff.. http://pastebin.ubuntu.com/761640/
[14:12] <hazmat> i feel it's mixing up different things though
[14:12] <hazmat> fwereade, got a moment for g+?
[14:12] <fwereade> hazmat, yeah, sounds good
[14:47] <niemeyer> hazmat: Thanks!
[14:47] <niemeyer> Heading off..
[14:47] <niemeyer> Will be online later from the airport..
[15:05] <rog> quick query: anyone know of a way to get bzr, for a given directory, to add all files not previously known and remove all files not in the directory that *were* previously known? a kind of "sync", i guess.
[15:06] <rog> i can do "bzr rm $dir" then "bzr add $dir" but that's a bit destructive
[15:21] <mpl> rog: I'm still a bit lost between the different gozk repos. what's the difference (in terms of import) between launchpad.net/gozk and launchpad.net/gozk/zookeeper? is one of them a subpart of the other or are they really just different versions of the same thing overall?
[15:21] <rog> mpl: the latter is the later version
[15:21] <rog> mpl: the former is the currently public version
[15:21] <rog> mpl: i just found a pending merge that needs approval from niemeyer, BTW
[15:23] <mpl> rog: ok, thx. and with which one of them do you think I should work?
[15:23] <rog> mpl: launchpad.net/gozk/zookeeper
[15:24] <mpl> rog: good, because gotest pass with that one for me, while they don't with the public one. also, how come there is no Init() in launchpad.net/gozk/zookeeper? has it been moved/renamed to something else?
[15:25] <rog> mpl: let me have a look
[15:27] <rog> mpl: Init is now called Dial
[15:28] <mpl> yeah looks like it, thanks for the confirmation.
[15:31] <lynxman> hazmat: ping
[15:31] <hazmat> lynxman, pong
[15:33] <rog> mpl: seems like my merge has actually gone in recently
[15:33] <lynxman> hazmat: quick question for you, we're trying to see how to do a remote deployment with juju
[15:34]  * hazmat nods
[15:34] <lynxman> hazmat: so we were thinking about a "headless" juju deployment in which it connects to zookeeper once the formula has been deployed
[15:34] <lynxman> hazmat: part of the formula being setting up a tunnel or route to the zookeeper node
[15:34] <lynxman> hazmat: what are your thoughts about that? :)
[15:35] <hazmat> lynxman, what do you mean by remote in this context?
[15:35] <lynxman> hazmat: let's say I have a physical platform and I want to extend it by deploying nodes on another platform which is not on the same network
[15:35] <lynxman> hazmat: but it's just an extension of the first platform, in N nodes as necessary
[15:36] <hazmat> lynxman, like different racks different or like different data centers different
[15:36] <lynxman> hazmat: different data centers different :)
[15:36] <hazmat> ie. what's the normal net latency
[15:37] <lynxman> hazmat: latency would be a bit higher than usual, let's say around 200ms
[15:37] <hazmat> and increased risk of splits
[15:37] <lynxman> hazmat: exactly :)
[15:38] <hazmat> lynxman, so in that case, i'd say it's probably best to model two different juju environments, and have proxies for any inter-data-center relations
[15:38] <lynxman> hazmat: hmm could you point me towards proper documentation for proxying? is Juju so far okay with that?
[15:39] <hazmat> lynxman, hmm.. maybe its worth taking a step back, what sort of activities do you want that coordinate across the data centers
[15:39] <lynxman> hazmat: we want to extend a current deployment, let's say a wiki app, set up the appropriate tunnels for all the necessary db backend comms
[15:40] <lynxman> hazmat: just to support periods of higher traffic
[15:42] <lynxman> hazmat: so I'd extend my serverfarm on certain hours of the day, for example :)
[15:42] <hazmat> lynxman, well, that's not exactly the well-known case.. extending it across the world during certain hours of the day, on an architecture that wants connected clients, is a different order of problem
[15:43] <hazmat> lynxman, so proxies are probably something that would be best provided by juju itself. it's possible to do so in a charm, but in cases like this you'd effectively be forking the charm you're proxying
[15:44] <lynxman> hazmat: pretty much yeah
[15:44] <lynxman> hazmat: we want to use the colocation facility of juju to add a proxy charm under the regular one and such
[15:44] <hazmat> lynxman, atm we're working on making it so that juju agents can disconnect for extended periods and come back in a sane state
[15:45] <lynxman> hazmat: hmm interesting, any ETA for that or is it in the far future?
[15:45] <hazmat> lynxman, its for 12.04
[15:45] <lynxman> hazmat: neat :)
[15:46] <hazmat> lynxman, it's not clear how the remote dc is exposed via the provider in this case, which i assume is orchestra
[15:46] <lynxman> hazmat: the idea is to add either a stunnel or a route to a vpn concentrator, which will be deployed by a small charm or orchestra itself as necessary
[15:47] <hazmat> lynxman, right, but it wouldn't be a single orchestra endpoint controlling machines at each data center, they would be separate
[15:47] <lynxman> hazmat: exactly
[15:48] <hazmat> lynxman, so i'm still thinking its better to try and model this as separate environments
[15:48] <lynxman> hazmat: it'd be configured by cloud-init
[15:48] <lynxman> hazmat: hmm I see
[15:48] <lynxman> hazmat: so any docs you can point me at on how to connect two zookeeper instances?
[15:49] <hazmat> it's not a single endpoint we're talking to, and even just for redundancy, we'd want each data center to be functional in the case of a split
[15:50]  * hazmat ponders proxies
[15:51] <hazmat> so in this case you'd want to have the tunnel/vpn as a subordinate charm, and a proxy db, that you can communicate with.
[15:52] <hazmat> hmm.. lynxman i think the nutshell is cross dc isn't on the roadmap for 12.04, we will support different racks, different availability zones, etc. but i don't think we have the bandwidth to do cross-dc
[15:52] <hazmat> well
[15:52] <lynxman> hazmat: well we're trying to investigate the options on that basically
[15:53] <lynxman> hazmat: our first idea was a headless juju that could deploy a charm and as part of the charm connect itself back to the zookeeper
[15:53] <lynxman> hazmat: just to keep it as atomic as possible
[15:53] <hazmat> lynxman, fair enough. lacking support in the core, the options are: if you have a single provider endpoint, you can try it anyway and it might work, or you'll be doing custom charm work to try and cross the chasm.
[15:54] <hazmat> headless juju is not a meaningful term to me
[15:54] <lynxman> hazmat: headless as the head being zookeeper :)
[15:54] <hazmat> lynxman, still not meaningful ;-)
[15:54] <hazmat> its like saying a web browser without html
[15:55] <lynxman> hazmat: hmm the idea is to deploy a charm through the juju client and once the charm is setup let it connect through a tunnel to zookeeper to report back
[15:55] <lynxman> hazmat: does that make more sense?
[15:55] <hazmat> less
[15:55] <hazmat> it would make more sense to register a machine for use by juju
[15:56] <hazmat> since a single provider is lacking here
[15:56] <hazmat> and then it would be available to the juju env, and you could deploy particular units/services across it with the appropriate constraint specification
[15:56] <lynxman> hazmat: so I'd need to do the tunneling part as a pre-deployment before juju using another tool, be it cloud-init or such, right?
[15:56] <hazmat> but the act of registration starts up a zk-connected machine agent
[15:57] <lynxman> hazmat: then just tell juju to deploy a certain charm onto that specific machine
[15:57] <hazmat> lynxman, what's controlling machines in the other dc?
[15:57] <hazmat> lynxman, are they two dcs with two orchestras ?
[15:57] <lynxman> hazmat: best case scenario cloud-init
[15:57] <lynxman> hazmat: not necessarily
[15:57] <lynxman> hazmat: but cloud-init is also integrated into orchestra so...
[15:58] <hazmat> lynxman, yup
[15:58] <lynxman> hazmat: it's a good single point
[16:00] <hazmat> lynxman, the notion of connecting back on charm deploy isn't really the right one.. juju owns the machine since it created it, and the machine is connected to zk independent of any services deployed to it
[16:00] <lynxman> hazmat: that's why I wanted to pass the idea through you to know what you thought :)
[16:00] <hazmat> hence the notion of registering the machine to make it available to the environment, but thats something out of core as it violates any notion of interacting with the machine via the provider
[16:02] <lynxman> hazmat: exactly, it does violate the model somehow
[16:02] <hazmat> as far as approaching this in a way thats supportable in an on-going fashion, i think its valuable to try and model the different dcs as different juju environments that are communicating
[16:03] <hazmat> then you could deploy your vpn charm as a subordinate charm throughout one environment to provide the connectivity to the other env
[16:04] <hazmat> the lack of cross env relations and no support in the core is problematic, but it sounds more like a solvable case of a one-off deployment
[16:04] <hazmat> via custom charms
[16:05] <hazmat> actually maybe even generically if its done right..
[16:05] <lynxman> hazmat: but that's the idea, the other dc can be used on and off, different machines, different allocations
[16:05] <lynxman> hazmat: that's why I was opting for an atomic solution
[16:05] <hazmat> a proxy charm would be fairly generic
[16:53] <mpl> rog: which merge were you talking about?
[16:54] <rog> mpl: update-server-interface (revision 24 in gozk/zookeeper trunk)
[17:24] <mpl> rog: bleh, I find launchpad's interface to view changes and browse files really awkward :/
[17:25] <rog> mpl: i use bzr qdiff when codereview isn't available
[17:26] <rog> bzr: (apt-get install qbzr)
[17:26] <mpl> rog: anyway, what is this merge about. are you pointing it out because it is relevant to the Init -> Dial change we talked about?
[17:30] <rog> mpl: it had lots of changes in it
[17:31] <rog> mpl: and quite possibly that change included, i can't remember
[17:32] <mpl> ok
[18:32] <hazmat> fwereade, its not clear that coming up with an installed state would result in a transition.
[18:32] <hazmat> to started
[18:32] <hazmat> re ml
[18:38] <_mup_> Bug #900873 was filed: Automatically terminate machines that do not register with ZK <juju:New> < https://launchpad.net/bugs/900873 >
[19:07] <hazmat> jimbaker, incidentally this is my cleanup of sshclient.. http://pastebin.ubuntu.com/761938/
[19:09] <jimbaker> hazmat, you have a yield in _internal_connect, but no inlineCallbacks
[19:09] <hazmat> jimbaker, sure
[19:11] <jimbaker> hazmat, so i like the intent (the inline form is much better for managing errors imho), but is that supposed to work as-is?
[19:12] <hazmat> jimbaker, do you see a problem with it?
[19:12] <hazmat> jimbaker, there's a minor accompanying change to sshforward
[19:12] <jimbaker> hazmat, i just wonder why it's not decorated with @inlineCallbacks, that's all
[19:20] <marcoceppi> So, can a charm run config-set and set config options?
[19:21] <SpamapS> no
[19:21] <SpamapS> well
[19:22] <SpamapS> maybe?
[19:22] <SpamapS> marcoceppi: its worth trying 'juju set' from inside a charm.. but my inclination would be that it couldn't because it wouldn't have AWS creds to find the ZK server.
[19:22] <marcoceppi> Ah, okay.
[19:22] <SpamapS> marcoceppi: I do think its an interesting idea to be able to adjust the whole service's settings from hooks.
[19:23] <marcoceppi> For things like blowfish encryption key, I'd like to randomly generate it and have it set in the config so juju get <service> will show it
[19:23]  * marcoceppi writes a bug
[19:23] <SpamapS> marcoceppi: not sure ZK is a super safe place for private keys
[19:24] <marcoceppi> Neither is plaintext files, but that's what I'm working with
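The generate-the-key half of what marcoceppi describes is easy inside a hook; it's only the report-it-back half that lacks support. A minimal sketch (variable name and key length are arbitrary choices; openssl ships on the stock Ubuntu images juju deploys):

```shell
# Generate a random 32-hex-character secret during the install hook.
# It could then be written into the app's config file; surfacing it
# back to the user is the part juju doesn't yet support.
BLOWFISH_KEY=$(openssl rand -hex 16)
```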
[19:24] <niemeyer> Yo
[19:25] <SpamapS> marcoceppi: what you want is the ability to feed data back to the user basically, right?
[19:25] <marcoceppi> more or less
[19:25] <marcoceppi> yes
[19:25] <SpamapS> marcoceppi: yeah there's a need for that, Not sure if a "config-set" would be the right place
[19:26] <marcoceppi> mm, it's just the first thing that came to mind for me
[19:26] <SpamapS> marcoceppi: bug #862418 might encompass what you want
[19:26] <_mup_> Bug #862418: Add a way to show warning/error messages back to the user <juju:Confirmed> < https://launchpad.net/bugs/862418 >
[19:27] <marcoceppi> thanks
[19:30] <marcoceppi> Ugh, this is probably the wrong channel, but I can't get this sed statement to work. Surely someone has had to escape a variable that was a path to work with sed before
[19:30] <marcoceppi> I started with s/\//\\\//g because that seems like it would logically work, but it doesn't.
[19:30] <SpamapS> marcoceppi: you can use other chars than /
[19:30] <marcoceppi> For a file path?
[19:31] <SpamapS> s,/etc/hosts,/etc/myhosts,g
[19:31] <marcoceppi> ohhh
[19:31] <marcoceppi> that actually helps a lot.
[19:31] <marcoceppi> I completely forgot about that
[19:31] <SpamapS> Yeah, I get all backslash crazy sometimes too then remember
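The trick SpamapS mentions, sketched with made-up paths: whatever character follows `s` becomes the delimiter, so a `,` (or `|`) sidesteps escaping every slash in a path variable.

```shell
# Paths are arbitrary examples. With ',' as the sed delimiter, the
# slashes inside the variables need no escaping at all:
SRC=/var/www/html
DST=/srv/limesurvey
echo "docroot=/var/www/html/index.php" | sed "s,$SRC,$DST,g"
# The '/'-delimited equivalent would need every slash mangled:
#   sed "s/\/var\/www\/html/\/srv\/limesurvey/g"
```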
[19:42] <nijaba> hello.  Quick question: if I want to pass parameters to my charm (retrieved by config-get), do I need to explicitly mention the yaml file on the deploy command?  I could not find doc about this.  Did I miss it?
[19:55] <SpamapS> nijaba: you can pass a full configuration via yaml in deploy, or use 'juju set' after deploy.
[19:55] <nijaba> SpamapS: but do I have to specify the yaml or will charmname.yaml be automatically used?
[19:56] <SpamapS> nijaba: if you want deploy to use a yaml file you have to mention it
[19:56] <nijaba> SpamapS: ok, thanks a lot :)
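To make SpamapS's answer concrete (file, service, and option names here are invented; pyjuju-era syntax): the config file is never picked up implicitly, it must be named on the command line.

```shell
# The YAML maps service name -> options, e.g.:
#   limesurvey:
#     admin-email: ops@example.com
juju deploy --config limesurvey.yaml limesurvey
# or adjust a setting on an already-deployed service:
juju set limesurvey admin-email=ops@example.com
```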
[20:08] <nijaba> SpamapS: another question.  in my config.yaml, how do I set my default to be an empty string?  I get an "expected string, got None" error
[20:12] <marcoceppi> nijaba: default: ""
[20:12] <nijaba> SpamapS: nm, found the doc finally :)
[20:16] <nijaba> marcoceppi: thanks :)
[20:18] <SpamapS> IMO the default for type: string *should* be ""
[20:18] <SpamapS> None comes out as "None" from config-get
[20:19] <SpamapS> or empty, I forget actually
[20:19] <SpamapS> hrm
[20:22] <nijaba> "" works as expected
[20:30] <marcoceppi> SpamapS: I've got a couple of improvements for charm-helper-sh (mainly fixes to wget) should I just push those straight to the branch or would a review still be a good thing?
[20:30] <SpamapS> marcoceppi: lets do reviews for everything.. we'll pretend people are actually using these and we don't want to break it. :)
[20:31] <marcoceppi> <3 sounds good
[20:31] <nijaba> you'd better, cause I do now :)
[20:31] <marcoceppi> \o/
[20:31] <marcoceppi> It's actually a bug fix that would result in an install error :\
[20:35] <hazmat> SpamapS, do you have ideas on how to reproduce  bug 861928
[20:35] <_mup_> Bug #861928: provisioning agent gets confused when machines are terminated <juju:New for jimbaker> < https://launchpad.net/bugs/861928 >
[20:35] <SpamapS> hazmat: yes, just terminate a machine using ec2-terminate-machines
[20:35] <hazmat> SpamapS, ah
[20:35] <hazmat> SpamapS, thanks
[20:36] <SpamapS> hazmat: I don't know if there's a race and you have to do it at a certain time.
[20:37] <hazmat> SpamapS, i've been playing around with juju terminate-machine .. haven't been able to reproduce, i'll try triggering it externally with ec2-terminate-instances
[20:38] <SpamapS> hazmat: right, because juju terminate-machine cleans up ZK first. :)
[20:45] <hazmat> jimbaker, have you tried reproducing this one?
[20:45] <jimbaker> hazmat, not yet, i've been working on known_hosts actually
[20:47] <nijaba> another stupid question: can I do a relation-get in a config-changed hook?  how do I specify which relation I am talking about?
[20:49] <niemeyer> Flight time..
[20:49] <niemeyer> Laters
[20:51] <hazmat> nijaba, not at the moment
[20:52] <nijaba> hazmat: harg.  So if I have a config file that takes stuff from the relation, the logic to implement config-changed is going to be quite complicated...
[20:52] <hazmat> nijaba, you can store it on disk in your relation hooks, and use it in config-changed
[20:52] <hazmat> nijaba, or just configure the service in the relation hook
[20:53] <nijaba> hazmat: ah, coool.  thanks a lot
[20:53] <hazmat> nijaba, don't get me wrong, that is a bug
[20:53] <nijaba> hazmat: I don't, I just like the workaround
[21:00] <SpamapS> nijaba: one way I've done what you're dealing with is to just have the hooks feed into one place, and then at the end, try to build the configuration if possible.. otherwise exit 0
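A sketch of the pattern hazmat and SpamapS describe (the cache path, key name, and helper function are all invented): the db relation hook writes `relation-get` output to disk, and a shared routine bails out cleanly until that data exists.

```shell
#!/bin/sh
# In db-relation-changed you would cache the relation data, e.g.:
#   mkdir -p "$CACHE" && relation-get host > "$CACHE/db_host"
# config-changed (where relation-get is unavailable) then reuses it:
build_config() {
    cache=$1
    # No relation data cached yet: nothing we can configure.
    # Returning nonzero lets the hook exit 0 without doing anything,
    # as SpamapS suggests, instead of failing.
    [ -s "$cache/db_host" ] || return 1
    printf 'db_host=%s\n' "$(cat "$cache/db_host")"
}
```

In a real hook the output would go into the service's config file followed by a restart; the return code just signals "not ready yet".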
[21:21] <_mup_> txzookeeper/trunk r45 committed by kapil.foss@gmail.com
[21:21] <_mup_> [trivial] ensure we always attempt to close the zk handle on client.close, stops background connect activity associated to the handle.
[21:36] <SpamapS> nijaba: tsk tsk.. idempotency issues in limesurvey charm. ;)
[21:37] <nijaba> SpamapS: can you be a bit clearer?
[21:37] <SpamapS> nijaba: I've got a diff now.. :)
[21:37] <SpamapS> nijaba: if you remove the db relation and add it back in.. fail abounds ;)
[21:37] <SpamapS> nijaba: or more correctly, if you try to relate it to a different db server
[21:37] <nijaba> SpamapS: ah.  let me finish the current test and will fix
[21:38] <SpamapS> nijaba: no I have a fix already
[21:38] <SpamapS> nijaba: I'll push the diff up
[21:38] <nijaba> SpamapS: ok thanks, will look
[21:38] <SpamapS> mv admin/install admin/install.done
[21:38] <SpamapS> chmod a-rwx admin/install.done
[21:39] <SpamapS> This bit is rather perplexing tho
[21:39] <nijaba> SpamapS: for "security" reason, install procs are to be moved away or admin interface will compllain
[21:40] <nijaba> SpamapS: they recommend to completely remove, but that'smy way of doing it
[21:40] <nijaba> SpamapS: just so that it could be reused
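The fix SpamapS pasted, wrapped as a re-runnable fragment (the function name is invented; the admin/install layout is LimeSurvey's): moving `admin/install` aside satisfies the admin interface's check, and guarding on the directory's existence keeps a second run of the install hook from failing.

```shell
#!/bin/sh
# Disable LimeSurvey's web installer idempotently: the mv only fires
# on the first run, so re-running the install hook is harmless.
disable_installer() {
    root=$1
    if [ -d "$root/admin/install" ]; then
        mv "$root/admin/install" "$root/admin/install.done"
        chmod a-rwx "$root/admin/install.done"
    fi
}
```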
[21:41] <robbiew> negronjl: you ever see this type of java error with the hadoop charms? -> https://pastebin.canonical.com/56816/
[21:41] <negronjl> robbiew: checking
[21:41] <SpamapS> nijaba: ok, makes sense
[21:42] <negronjl> robbiew: bad jar file ...
[21:42] <negronjl> robbiew: Where is the jar file itself in the system ?
[21:42] <robbiew> yeah...but the same example worked when I deployed using default 32bit oneiric system...this is a 64bit
[21:42] <SpamapS> nijaba: seems rather silly to remove perfectly good tools.. ;)
[21:43] <robbiew> negronjl:  /usr/lib/hadoop
[21:43] <robbiew> i can unzip it
[21:43] <negronjl> robbiew: did you do this all with juju ?
[21:44] <negronjl> robbiew: if so, was it in AWS ?  Do you have the AMI so I can re-create ?  I normally use 64-bit oneiric ones and I haven't seen that error ..
[21:44] <negronjl> robbiew: using the following ....      default-image-id: ami-7b39f212
[21:44] <negronjl>     default-instance-type: m1.large
[21:44] <negronjl>     default-series: oneiric
[21:45] <negronjl> robbiew: in my environment.yaml file
[21:45]  * robbiew double checks his
[21:45] <robbiew> hmm
[21:46] <robbiew> negronjl:  default-image-id: ami-c162a9a8
[21:46] <negronjl> robbiew: I normally use the latest m1.large, oneiric AMI that I can find in http://uec-images.ubuntu.com/oneiric/current/
[21:48] <negronjl> robbiew: Today's latest one ( for oneiric, m1.large ) is: ami-c95e95a0
[21:48] <negronjl> robbiew: I am currently testing that one just to be sure
[21:48] <robbiew> negronjl: do we do them for m1.xlarge?
[21:49]  * robbiew was playing around with types 
[21:49] <negronjl> robbiew: I haven't tried them but, now is as good a time as any so, trying now :)
[21:49] <negronjl> robbiew: give me a sec
[21:49] <robbiew> lol
[21:51] <SpamapS> nijaba: http://bazaar.launchpad.net/~charmers/charm/oneiric/limesurvey/trunk/revision/10
[21:51]  * SpamapS should have pushed that to a review branch.. bad bad SpamapS
[21:51] <negronjl> robbiew: no xlarge images
[21:51]  * SpamapS runs off to dentist appt that starts in 8 minutes
[21:51] <negronjl> robbiew: I see cc1.4xlarge though
[21:51] <negronjl> robbiew: test that one ?
[21:52] <robbiew> negronjl: yeah...so I'm clearly off in the weeds :)
[21:52] <robbiew> eh...nah
[21:52] <robbiew> let me use the right settings first ;)
[21:52] <robbiew> I had it working with the defaults
[21:52] <negronjl> robbiew: k ... let me know if I can break anything for you :)
[21:52] <robbiew> so I figured it was user error
[21:52] <robbiew> lol...sure
[21:54] <nijaba> SpamapS: thanks :) but this means that the install proc may run multiple times even for the same db.  does this work?
[22:07] <nijaba> SpamapS: just proposed a merge for you
[22:16] <fwereade> hazmat, you're right; I don't think a unit would automatically transition from installed to started, in the current trunk
[22:16] <fwereade> hazmat, but the only valid starting state for the original code was None
[22:18] <fwereade> hazmat, and given the clearly-expressed intent of the install transition's success_transition of "start", it seemed like a clear and obvious thing to do :)
[22:18] <fwereade> gn all
[22:42] <marcoceppi> SpamapS: I'm still writing tests for the helper, so I'll just push that up as a different merge request later
[22:47] <robbiew> hazmat: any chance we could get some workitems listed on https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-juju-roadmap?  at least around the features we want to deliver?  I need to confirm, but i think we can attach any bugs we want to fix.
[22:48]  * robbiew is sure SpamapS would love to help you guys with this (wink wink)
[22:50]  * hazmat takes a look
[22:52] <robbiew> hazmat: doesn't necessarily have to be you....if there's a list of who's doing what, I can put the info in myself...is it the kanban?
[22:52] <hazmat> robbiew, its in other blueprints
[22:53] <robbiew> oh
[22:53] <robbiew> hazmat: juju project blueprints?
[22:53]  * robbiew can look there
[22:54] <hazmat> robbiew, https://blueprints.launchpad.net/juju
[22:54] <Daviey> upstream vs distro blueprints.
[22:54] <robbiew> hazmat: cool, thx
[22:57] <robbiew> negronjl: so I'm still getting the error...maybe I'm missing something
[22:58] <robbiew> do I need to use the hadoop-mapreduce charm now?
[22:58] <robbiew> I was simply deploying hadoop-master, hadoop-slave, and ganglia
[22:59] <robbiew> then relating the master and slave services...and ganglia and slave services
[22:59] <negronjl> robbiew: you shouldn't ....
[22:59] <negronjl> robbiew: the hadoop-mapreduce is there for convenience ... what are the steps of what you are doing ?
[23:00] <robbiew> relate master slave....relate ganglia slave
[23:00] <robbiew> pretty much m_3's blog post
[23:01] <negronjl> robbiew: let me deploy a cluster ... give me a sec
[23:01] <robbiew> it worked with default type and oneiric today....only difference is the 64bit types.
[23:01] <robbiew> ok
[23:03] <_mup_> juju/upgrade-config-defaults r428 committed by kapil.thangavelu@canonical.com
[23:03] <_mup_> ensure new config defaults are applied on upgrade
[23:04] <negronjl> robbiew: deploying now .. let me wait for it to complete and I'll see  what it does
[23:05] <robbiew> cool
[23:06] <robbiew> negronjl: I wonder if it's a weird java bug
[23:10] <robbiew> negronjl:  was following the steps here: http://cloud.ubuntu.com/2011/11/monitoring-hadoop-benchmarks-teragenterasort-with-ganglia-2/
[23:10] <robbiew> fyi
[23:10] <robbiew> gotta run and pick up kids...and do the dad thing...will be back on later tonight.  If you find anything just update here or shoot me a mail
[23:11] <negronjl> robbiew: ok ... I'll email it to ya
[23:11] <robbiew> cool...good luck! :P
[23:38] <SpamapS> nijaba: yes install just exits if it is run a second time on the same DB
[23:39] <nijaba> SpamapS: k, good :)
[23:40] <nijaba> SpamapS: do you code your charms locally or on ec2?
[23:45] <SpamapS> nijaba: ec2
[23:45] <SpamapS> the ssh problem with the local provider makes it basically unusable for me.
[23:45] <SpamapS> nijaba: I was using canonistack for a while but it also became unreliable. :-P
[23:45] <SpamapS> EC2 is still the most reliable way to get things done with juju. :)
[23:46] <nijaba> SpamapS: which one?  canonistack or yours
[23:46] <SpamapS> us-east-1 ;)
[23:46] <nijaba> was talking about openstack
[23:46] <SpamapS> nijaba: pinged you back on the merge proposal.. I think the right thing to do is just re-delete the "already configured" check
[23:47] <nijaba> sounds good to me
[23:47] <SpamapS> nijaba: I only ever tried against canonistack.
[23:47] <nijaba> k
[23:47] <SpamapS> nijaba: I really like limesurvey. Its a lot more fun than wordpress for testing things. :)
[23:48] <SpamapS> just have to convince it to stop using myisam. :)
[23:48] <nijaba> SpamapS: hehe, it's been my learning project for both packaging and juju now :)
[23:48] <nijaba> SpamapS: I still can't get the package in though
[23:49] <nijaba> SpamapS: been ready for 2 years, lack sponsor for both debian and ubuntu
[23:49] <nijaba> SpamapS: my merge should help you use InnoDB, it's now in the config
[23:50] <SpamapS> nijaba: packaging PHP apps is so pre-JUJU
[23:51] <nijaba> SpamapS: true, true