[00:06] hazmat: does that mean that some degree of polling is necessary?
[00:06] SpamapS, actually it was a bad test, watches are sent reliably in the event of disconnection, when the client reconnects
[00:07] SpamapS, i'm trying to tackle our zk error handling with an eye to solving disconnect problems
[00:07] i think i've got a good game plan, just writing some tests to verify behavior
[00:08] i ended up having to create a tcp proxy, else the whole thing is way too timing dependent
[00:08] MITM FTW ;-)
[00:08] nice
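The tcp proxy mentioned at 00:08 sits between the ZooKeeper client and server so a test can cut the connection at a deterministic point instead of depending on timing. Below is a minimal sketch of that kind of man-in-the-middle proxy using only the Python standard library; the port numbers and structure are illustrative, not juju's actual test harness.

    import socket
    import threading

    def pipe(src, dst):
        """Copy bytes one way until either socket closes."""
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        except OSError:
            pass
        finally:
            src.close()
            dst.close()

    def serve(listen_port=21810, backend=("127.0.0.1", 2181)):
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("127.0.0.1", listen_port))
        listener.listen(5)
        while True:
            client, _ = listener.accept()
            server = socket.create_connection(backend)
            # one thread per direction; closing either socket severs the session,
            # which lets a test force a disconnect at a known moment
            threading.Thread(target=pipe, args=(client, server), daemon=True).start()
            threading.Thread(target=pipe, args=(server, client), daemon=True).start()

    if __name__ == "__main__":
        serve()

A test would point its ZooKeeper client at 127.0.0.1:21810 and then kill the proxy (or close its sockets) at exactly the moment it wants the server to appear to drop away.
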
[00:38] sad reality.. my laptop is slower than EC2 for deploying
[00:39] iterating on a 2 node ceph cluster over and over takes about 3 minutes per deploy/add-unit ..
[00:39] EC2 can do it in 2
[00:39] (assuming my units go from pending to running in the usual 30 seconds)
[01:48] bcsaller, if we ever have a sprint in boulder: http://www.yelp.com/biz/zudaka-healthy-latin-food-boulder-2 - all vegetarian (and vegan friendly), and it's south american cuisine, which is the unusual part
[01:50] jimbaker: nice
[02:12] <_mup_> txzookeeper/four-letter-client r44 committed by kapil.foss@gmail.com
[02:12] <_mup_> four letter command admin client
=== _mup__ is now known as _mup_
[11:43] <_mup_> Bug #878114 was filed: launch multiple bootstrap nodes < https://launchpad.net/bugs/878114 >
[12:09] hi guys
[12:10] anyone used juju with OpenStack - is it as simple as configuring the environments.yaml and away you go or is there more to it?
[12:13] uksysadmin: I haven't used it myself, so I'm afraid I don't know the details, but I'm pretty sure people have
[12:14] uksysadmin: if hazmat's around he may be able to tell you more details
[12:14] cheers - I'm about to have a go, but nice to know any gotchas up front
[12:15] uksysadmin: we've certainly fixed bugs related to openstack compatibility
[12:15] uksysadmin: please do let us know if you find any more :)
[12:25] will do
[12:47] before I start - I'm using 11.10 - do I need to add the ppas as per the documentation or is it ok to use the ones provided in universe?
[12:54] (ignore - all is going ok so far... :))
[13:04] uksysadmin, sorry I missed you, the ones in universe should be just fine
[13:15] great cheers
[13:16] one thing (and I'll cross post) - to do a bootstrap of a server under openstack, do I need 11.04/11.10 and if so where can I get suitable images from?
[13:16] uksysadmin: look at Orchestra
[13:17] Cheers HarryPanda, I am. just getting the basics going first by being able to do the juju bits
[13:17] although... that comes with images...
[13:17] I see your thinking
[13:21] no dice (if that was what you were implying)
[13:35] seeing if these work instead: http://uec-images.ubuntu.com/server/oneiric/current/
[13:56] uksysadmin, its pretty much as simple as configuring the environment
[13:57] uksysadmin, use euca-describe-images.. to find an image that you can specify in environments.yaml
[14:08] uksysadmin, course that has to be set up in openstack glance
[14:12] sure, cheers hazmat
[15:18] kim0, (or anyone) I'm trying to get the juju charms as per https://www.linux.com/learn/tutorials/495738-conducting-clouds-the-ubuntu-way-an-introduction-to-juju and I get bzr errors
[15:18] not only under 11.10 am I getting a warning about bzrlib being different to bzr (fresh installs still get this) it says
[15:19] bzr: ERROR: Not a branch: "http://bazaar.launchpad.net/~charmers/charm/oneiric/mysql/trunk/sql/".
[15:20] if I forget the bzr repos and just do a juju deployment it says it can't find store.juju.ubuntu.com
[15:21] crap .. principia is now gone, this article has been waiting at linux.com for too long
[15:21] if it makes anyone feel better my juju bootstrap worked a charm ;-)
[15:21] d'oh
[15:22] good article though kim0 - is there a slightly modified version you've done elsewhere?
[15:22] uksysadmin: prolly the most updated would be docs: https://juju.ubuntu.com/docs/
[15:23] I followed that too/similar stuff - but failed at: juju deploy --repository=examples mysql
[15:23] oh wow it used principia?
[15:23] yeah thats been gone for like 6 weeks
[15:23] DNS lookup failed: address 'store.juju.ubuntu.com' not found: [Errno -5] No address associated with hostname.
[15:23] 2011-10-19 16:20:08,676 ERROR DNS lookup failed: address 'store.juju.ubuntu.com' not found: [Errno -5] No address associated with hostname.
[15:23] kim0: will they let us edit it?
[15:24] uksysadmin: the docs haven't been updated in a while.. its supposed to be juju deploy --repository=examples local:mysql
[15:25] * SpamapS is getting rather annoyed that the website still says the wrong thing
[15:25] ok I'll see if that makes a difference... though strange if it fixes a DNS error...
[15:25] uksysadmin: its not a "DNS error" .. the store isn't live yet.
[15:26] uksysadmin: though we are pondering changing it so local: is always assumed
[15:27] uksysadmin: bug 872164 if you'd like to comment/track it
[15:27] <_mup_> Bug #872164: [Oneiric] Cannot deply services - store.juju.ubuntu.com not found < https://launchpad.net/bugs/872164 >
[15:28] ta - will do
[15:28] in the meantime - how do I get those lucky charms?
[15:28] <_mup_> Bug #873907 was filed: Security group on EC2 does not open proper port < https://launchpad.net/bugs/873907 >
[15:29] uksysadmin, easiest way is to apt-get install bzr, bzr branch lp:charm-tools, cd charm-tools.. ./charm get-all ../path/to/place-to-store charms
[15:29] er.. s/bzr/mr
[15:30] that will fetch all the charms of lp
[15:30] s/of/off
[15:30] cheers - no docs mention that (or ones that I'm being referred to)
[15:30] juju.ubuntu.com/Charms should mention it
[15:31] hazmat: we're about to get re-flooded.. linux.com article went out with the same problems.
[15:31] would be good if the tutorial at /docs/ mentioned it then...
[15:31] uksysadmin, we've been working on a charm store, that's directly integrated with the client, its not expected that folks will normally have to go down the charm-tools road
[15:31] SpamapS, bummer
[15:31] sorry to be the bearer of bad news - was good up until the point about principia
[15:31] uksysadmin, we've got a technical problem getting our docs updated that we're waiting on a sysadmin to fix..
[15:32] oh dear - I'll not pester you guys. I've probably got enough to get me going, thanks.
[15:32] uksysadmin, its very good to know where the bad docs are ;-)
[15:32] but the rest should be good
[15:32] :)
[15:40] uksysadmin: for 'lp:principia' you can replace it with 'lp:charm' and that should solve those problems
[15:40] I think somebody already said that
[15:41] ok, I'll give that a shot whilst the charms are checking out using the tool
[15:42] kim0: btw, juju.ubuntu.com/docs is actually quite out of date at the moment. We're still trying to figure out why its not updating from the bzr tree.
[15:45] on my 11.10 machine I did the tools getall, and in the directory I specified I needed to do a quick symlink to the current dir: (ln -s . oneiric) as it was failing with 2011-10-19 16:43:37,007 ERROR Charm 'local:oneiric/mysql' not found in repository ...
[15:46] uksysadmin, hmm.. yeah. the charms need to be prefixed with their release series
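As the 15:45-15:46 exchange shows, a local repository handed to --repository has to keep each charm under a release-series directory (e.g. <repo>/oneiric/mysql); the quick `ln -s . oneiric` symlink satisfies that. A small Python sketch of the same fix, with an illustrative repository path:

    import os

    def add_series_dir(repo, series="oneiric"):
        """Make <repo>/<series>/<charm> resolvable, as the deploy error expects."""
        series_dir = os.path.join(repo, series)
        if not os.path.exists(series_dir):
            # the workaround from the channel: <repo>/oneiric -> <repo> itself
            os.symlink(".", series_dir)
        return series_dir

    if __name__ == "__main__":
        add_series_dir(os.path.expanduser("~/charms"))

After that, something like `juju deploy --repository=/path/to/charms local:oneiric/mysql` resolves the charm URL shown in the error message above.
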
[15:46] after that it seems to be working
[15:47] one could say its working like a charm
[15:47] (are there openings for a PR person? ;-))
[15:48] uksysadmin: too easy. ;)
[15:50] in terms of the docs needing to be updated from bzr....if someone has hosting space and time to setup the docs, we can fix this faster than apparently IS can
[15:50] at least until they are able to respond
[15:53] robbiew, sure i can do it
[15:53] robbiew, i'm tired of waiting
[15:53] * hazmat sets up a dns entry
[15:53] hazmat: cool
[15:53] have we tried #is yet?
[15:54] SpamapS, i tried them last week, someone (?) took a look around, didn't see the cron job
[15:54] or where to update, feel free to try again
[15:54] on it
[15:54] ticket#?
[15:55] SpamapS, 48456
[15:55] ty
[16:03] thanks all - apart from environmental restrictions (you try sanely running virtual under virtual) all seems to be good!
[16:03] LXC isn't virtual. :)
[16:03] its just contained
[16:03] but yeah, my box hits a load of 12 quite often
[16:04] not running under lxc - vbox
[16:06] temporary issue fortunately
[16:06] hometime here in the good ol' uk. thanks for your help.
[16:07] ugh.. mongodb randomly restarted on me
[16:19] hazmat: chaos monkey support.. disable with --no-chaos-monkey
[16:20] SpamapS, lol
[16:20] I was thinking we should have a chaos-monkey built into juju
[16:21] We do, his name is Ben and he's a vegan
[16:22] :)
[16:25] i'm increasingly vegan. except for honey. or fish. also, i couldn't give up yogurt. and i have a soft spot for cheese, especially parmigiano-reggiano. but perhaps a little more vegan than not ;)
[16:27] SpamapS: hazmat: docs fixed with hazmat's workaround until IS can respond
[16:27] https://juju.ubuntu.com/Documentation should have the right stuff
[16:28] jimbaker: maybe you just hate dairy cows and bees
[16:28] jimbaker, thats awesome ;-)
[16:28] SpamapS, to the contrary, i love them too much ;)
[16:29] robbiew: at least we have somewhere to send them now. :)
[16:29] robbiew: we should really fix that frame to be an iframe so there aren't two scroll bars.
[16:29] eh..whatever
[16:31] SpamapS: now you
[16:31] are just trying to get cute
[16:35] no.. the embedded window is way too small to hold the whole page
[16:44] SpamapS: I can change that...one sec
[16:46] SpamapS: reload
[16:46] ;)
[16:46] robbiew: better than before :)
[16:47] I wonder if there's a way to say "embed this and make it as big as it wants to be"
[16:47] * SpamapS *hates* html
[16:47] probably
[16:47] one sec...let me try
[16:47] the doc html should scale down pretty well
[16:52] bah! I haven't written html in forever....Spamaps, it's a wiki, so feel free to try :D
[16:54] * SpamapS will just use the direct link
[16:54] lol
[16:54] So, my new favorite use for the local provider is to spin up a giant EC2 instance and use it there
[16:55] because it just *kills* my laptop
[16:55] $0.50/hour for an m2.xlarge is better than 40 $0.08/hour m1.small's per hour ;)
[16:56] lol
[16:56] * SpamapS wonders if lenovo makes a thinkpad w/ 12GB
[16:56] just need 8GB of RAM for charms and the rest for compiz
[17:09] heh.. piping 'debug-log' through 'ccze' makes the day a lot more fun I have to say
[17:10] oooooooo
[17:11] juju status | ccze -m ansi
[17:11] *pretty*
[17:15] interesting
[17:47] So, with peer relations.. there needs to be something analogous to 'remove-relation ; add-relation'
[17:47] I have all these idempotent hooks, I want to run them again
[17:59] SpamapS, i don't follow
[17:59] SpamapS, you can remove a peer relation, and add it
[17:59] SpamapS, juju does auto activate peer relation is all
[18:03] OH
[18:03] very useful. :)
[18:03] * SpamapS did not actually *try* removing it
[18:03] so thats how I can get refreshes after upgrading charms
[18:04] Tho that also calls broken.. which will likely take services down
[18:04] SpamapS, m_3 made a nice suggestion on how to add relation iteration capabilities
[18:04] before we were sort of blocked on the anonymous relations from a server's provides.. but the easy solution is to just qualify them during iteration with the endpoint service name in addition to the local relation name
[18:06] EPARSE
[18:06] huh?
[18:07] SpamapS, just thinking of making relations addressable from upgrade hooks
[18:07] *that* is 100% necessary
[18:07] (you may recall, I suggested very early that upgrade would require re-running every hook)
[18:08] SpamapS, one of the issues is that for something that provides an interface, it could have multiple relations, from things that require it, to effectively the same named relation interface
[18:08] ie. provides creates what are effectively anonymous/non-addressable relations
[18:08] bug 873116 and bug 767195 may be duplicates.. both I think opened by me. ;)
[18:09] an easy way to qualify those would be to just suffix the endpoint service name
[18:09] so they can be iterated and addressed in a non-ambiguous fashion
[18:09] relation-get variable_name [ unit_name ] [ relation_name ] ?
[18:10] mysql charm ... list-relations -> db-wordpress, db-drupal .. to pass to relation-get
[18:10] Oh the active relations
[18:11] yeah thats bug 767195 .. you opened it.. I recall discussing this a while back
[18:11] yeah...
[18:13] Very high order stuff.. important, but can be worked around for now.
[18:13] It would be quite useful in the ceph charm. What I have to do there is just store relation data locally and keep regenerating stuff from that.
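The ceph workaround SpamapS describes at 18:13 -- cache each remote unit's relation settings locally and regenerate configuration from the whole cache -- keeps hooks idempotent even though relations cannot yet be iterated by name. A rough sketch of such a -changed hook in Python; the cache path and the final write_config step are hypothetical, while relation-get and the JUJU_REMOTE_UNIT environment variable are the ordinary hook-time tools:

    #!/usr/bin/env python
    import json
    import os
    import subprocess

    CACHE = "/var/lib/mycharm/mon-relation-cache.json"  # hypothetical path

    def relation_get(key):
        # relation-get with no unit argument reads the remote unit in context
        return subprocess.check_output(["relation-get", key]).decode().strip()

    def main():
        cache = {}
        if os.path.exists(CACHE):
            with open(CACHE) as f:
                cache = json.load(f)

        unit = os.environ["JUJU_REMOTE_UNIT"]
        cache[unit] = {"hostname": relation_get("hostname")}

        cache_dir = os.path.dirname(CACHE)
        if not os.path.isdir(cache_dir):
            os.makedirs(cache_dir)
        with open(CACHE, "w") as f:
            json.dump(cache, f)

        # regenerate whatever depends on full membership (ceph.conf, etc.)
        # from `cache`, so replaying this idempotent hook is always safe
        # write_config(cache)  # hypothetical helper

    if __name__ == "__main__":
        main()

The same cache is something a -departed or stop hook can fall back on when, as in the 19:04 traceback later on, relation-get no longer has unit state to hand out.
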
[18:13] fwereade_, any interest in looking at that? or do you want to roll with ha stuff, i can comment on your proposal on the bug
[18:14] fwereade_: btw, great discussion on the SSH key management stuff. I feel like that is going to be really cool when we get to it.
[18:14] i just sent out a large mail regarding conn failure and session expiration analysis and what my plan is
[18:14] hazmat: oh good. :) I just got hit by it
[18:14] SpamapS, yeah.. making local provider work through hibernate is a great test scenario ;-)
[18:15] that would definitely be nice
[18:16] for me it just stops working after lunch
[18:33] <_mup_> juju/ssh-passthrough r408 committed by jim.baker@canonical.com
[18:33] <_mup_> PEP8, PyFlakes, docstrings
[18:36] SpamapS, thanks :)
[18:36] hazmat, sorry, need to catch up on context, was putting laura to bed
[18:36] fwereade_, no worries
[18:36] ah, I just saw that email, and suspected it would be relevant to my interests :)
[18:37] fwereade_, the context for the relation stuff is in the bug links, just responding to your other email re ha, and then going to try and do some reviews
[18:37] the review queue is overflowing
[18:37] fwereade_, bcsaller, jimbaker if you have some time, we really should get some more reviews in
[18:38] hazmat, duly noted :)
[18:38] hazmat, will do, just want to get this ssh passthrough stuff done first
[18:38] hazmat: I'll try for a couple today
[18:38] thanks guys
[18:39] taking too long but almost there (too many things that should be easy in argparse, aren't ;) )
[18:39] and, hazmat, I can happily work on whatever seems most sensible to you
[18:40] hazmat, the HA stuff is interesting, but whether or not I should be working on it can probably be determined by how much I appear to have been on crack while writing that comment ;)
[18:41] fwereade_, we should do a round discussion on it, the bug is probably a decent place for it, we can decide after that.. there's another low hanging but critical task, which is upstartifying all the agents
[18:41] currently only local provider uses upstart for the unit agent, but creating an upstart module that can be used for all the agents would be a huge win on the way to ha
[18:42] +100 for that
[18:42] it would seem that the two features are hugely related
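The "upstart module" discussed at 18:41 amounts to rendering one small upstart job per agent so init restarts it on failure and brings it back after a reboot. A sketch of what such a helper might look like; the job name, command line, and paths are guesses, not the eventual juju implementation:

    import os

    UPSTART_TEMPLATE = """\
    description "{description}"
    start on runlevel [2345]
    stop on runlevel [!2345]
    respawn

    exec {command} >> {log_file} 2>&1
    """

    def install_upstart_job(name, description, command,
                            log_file="/var/log/juju/agent.log",
                            init_dir="/etc/init"):
        # render and write /etc/init/<name>.conf; upstart picks it up on
        # `start <name>` and respawns the process if it dies
        conf = UPSTART_TEMPLATE.format(
            description=description, command=command, log_file=log_file)
        path = os.path.join(init_dir, name + ".conf")
        with open(path, "w") as f:
            f.write(conf)
        return path

    # e.g. install_upstart_job("juju-machine-agent", "juju machine agent",
    #                          "python -m juju.agents.machine --nodaemon")
    # (the agent command line above is a guess)
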
[18:43] * hazmat grabs some coffee
[18:43] i still think just relying on the fact that the provisioning agent can be restarted + a leader election would suffice for provisioning agent HA, for now. not the scalable solution of course
[18:43] They're related in that they both will help with the resiliency of the system
[18:43] jimbaker: ZK is mroe important
[18:43] more rather
[18:44] SpamapS, in terms of upstart of ZK?
[18:44] And as long as you're going to run two ZK's, why not run two provisioning agents? ;)
[18:44] jimbaker: in terms of HA for bootstrap
[18:44] SpamapS, exactly
[18:44] for upstart.. thats more about being able to reboot nodes
[18:44] every ZK should have a corresponding provisioning agent, just makes sense
[18:45] in terms of layout
[18:45] there's a second task for rebooting, which is making sure that agents that are disconnected can recover from a long absence gracefully.
[18:46] but really.. if you can just reboot them and block zk changes while something is gone.. thats a huge step forward.
[18:46] maybe something like: is_bootstrap = fib(num_active_machines +2) while num_active_machines < 4, on all those machines we run a PA and ZK and the lowest machine id is leader
[18:46] i think reboot is fine, certainly that works for the provisioning agent
[18:46] for multi pa, i think fine grain locks are preferable
[18:46] there is parallel work to be done
[18:46] for zk, it's unnecessary. zk does its own leader election
[18:46] hazmat, exactly, that would support better scalability
[18:47] and the clients can connect to all of them and route appropriately
[18:47] bcsaller: simple and elegant.. I would have no problem with that solution. :)
[18:47] hazmat, leader is only for determining whether a given provisioning agent is active, other than the too simple solution i have in mind :)
[18:47] upstart is a good first step though
[18:47] true, no need for a leader PA
[18:47] sorry, under the too simple solution
[18:48] the zk service and pa just become another service managed by juju is the end goal i'd like to get to
[18:48] make it just another service managed through juju
[18:48] hazmat: +1
[18:48] it just removes the SPOF. but i agree with the end goal
[18:49] though, like we've mentioned, namespaces for things like status seem even more important then
[18:49] because you don't want to see juju internal services by default
[18:49] I don't know if its that clear
[18:50] some would say they want to see *everything* they are responsible for.
[18:50] I mean, every machine is effectively related to bootstrap
[18:50] bcsaller, yeah.. one step at a time though... namespacing might be a nice alternative to do strange internal service name checks for protection... with an option to show all namespaces
[18:50] it also helps when we start exploring hierarchies of services
[18:51] Cars existed a long time w/o seatbelts. :)
[18:51] I don't know if you need safety up front. Just sanity.
[18:51] did rockets ever not have seat belts though?
[18:52] Oh dear.. please don't tell me I've wandered into the rocket science lab? ;)
[18:55] again, wouldn't it make more sense to follow this plan: remove the SPOF by just having one active provisioning agent + some number of standbys. the PA is extensively tested to follow its design, that it can always be restarted
[18:55] then implement a better provisioning agent, which in fact does parallelize work
[18:56] this can also get to the more desirable quality that the PA is just another service
[18:57] Honestly
[18:57] even the most active site with 1000's of nodes
[18:57] I doubt can overwhelm a single PA
[18:58] ZK will be the choke point there
[18:58] SpamapS, agreed. i think we might have some issues in how we iterate the topology, etc. but not in instructing the cloud provider on what to do next
[19:01] SpamapS, for instance, i think we could be smarter about the watch mgmt in the expose logic in the PA. too much duplicate work when it's just operating on the topology node. but that's just a matter of having a better watch setup. (or alternatively, moving expose to the machine agent!)
[19:01] I kind of like how we're taking advantage of the provider firewall
[19:03] SpamapS, sure, and it is transparent in the usage, so point well taken
[19:04] 2011-10-19 19:03:20,032: hook.output@ERROR: + relation-get hostname
[19:04] 2011-10-19 19:03:20,222: hook.output@ERROR: Traceback (most recent call last):
[19:04] Failure: juju.hooks.protocol.NoSuchUnit: The relation 'mon' has no unit state for 'ceph/11'
[19:04] So...
[19:04] in a departed hook..
[19:04] I can't get the relation data?
[19:07] SpamapS, the provider firewall is just an optimization for ec2 at this point, implementing it as the sole network security prevents security for other providers
[19:07] Yeah seems like both are useful
[19:07] in the future with machine level firewalls, the ec2 firewall can be maintained just as a provider specific optimization
[19:08] i've got another long email in the works on that topic
[19:08] time for a doctor's appt, bbiab
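For context on the expose/firewall exchange above: `juju expose` only opens, in the provider firewall (EC2 security groups today), the ports a charm has declared from its hooks with open-port. A minimal hook-side sketch of that declaration; the port number and the hook it lives in are illustrative:

    #!/usr/bin/env python
    import subprocess

    # declare the port; it only becomes reachable externally once the admin
    # also runs `juju expose <service>`
    subprocess.check_call(["open-port", "6789/tcp"])
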
[19:50] does anyone have any experience with installing hbase on ec2? I am trying to find any documentation. I currently have Juju installed with hadoop-master / slave charms and would like to get Hbase running
[19:51] evandev: IIRC, the hadoop master and slave charms only give you HDFS
[19:51] Correct, I was just wondering if anyone had taken it a step further and installed hbase on top of that
[19:52] Have not.. but if you want to take a crack at it, I'm sure m_3 would be interested in helping. :)
[19:52] m_3: ^^
[19:55] <_mup_> Bug #878462 was filed: resolved --retry does not retry the hook < https://launchpad.net/bugs/878462 >
[19:57] cool thanks
[19:58] evandev: can HBase and HDFS share the same set of namenode/slaves ?
[19:58] evandev: if so it might be best implemented as a config option on top of the existing hadoop charms.
[19:59] ugh.. departed hook executes in parallel with unit removal...
[19:59] no way to gracefully remove a ceph monitor node from another one then. HRM
[19:59] I suppose the stop hook would work
[20:00] Yea that was my next question
[20:00] I thought about modify the charms
[20:00] evandev: if you have pulled them down with 'charm getall', simplest thing to do is to unbind the charm, commit your changes, and then push to a bzr branch.
[20:00] modifying*
[20:00] * SpamapS realizes thats not in a wiki page and remedies the situation
[20:01] SpamapS: thnx
[20:01] ahh I think ill try that
[20:02] thanks SpamapS
[21:00] SpamapS, you mean in parallel across different units i assume, yeah..
[21:01] SpamapS, stop hooks are not executed atm, nother topic for discussion
[21:06] hazmat: so, yeah, we need to provide graceful shutdown for clusters
[21:07] hazmat: I'm thinking that you should not actually destroy the unit until its departed hooks have finished on all related nodes
[21:07] this is already well documented in bugs tho..
[21:40] <_mup_> juju/ssh-passthrough r409 committed by jim.baker@canonical.com
[21:40] <_mup_> Test for parse errors
[22:02] bcsaller, looks like you meant to do an approve on https://code.launchpad.net/~hazmat/juju/unlocalize-network/+merge/79476
[22:03] jimbaker: I thought the second person did the approve
[22:04] sure, and i can do that, but your comment was just a normal comment, not an approve comment
[22:04] bcsaller, ^^^
[22:04] yeah, it should have been an approve then
[22:05] ok, i've just approved it, since it's pretty clear in the merge proposal the intent
[22:11] bcsaller, do you want to propose your branch for bug 873643? it looks good to me
[22:11] <_mup_> Bug #873643: config values are re-set to their default values when only one is changed < https://launchpad.net/bugs/873643 >
[22:12] jimbaker: I never got a reply from SpamapS about what he wanted done, I can propose my extension of his branch or he can merge and repush.
[22:12] SpamapS, maybe you can delete your merge proposal for that bug? i don't know what the process should be, but i'm ready to approve bcsaller's work, between your trivial change and the reasonable test, it looks good with just the caveat that there's a grammatical error
[22:12] in a comment
[22:13] bcsaller, you can just unlink the old branch and link yours
[22:13] its been pending for a while
[22:13] hazmat: that might be best then
[22:13] and it's high
[22:13] on life
[22:13] it's really impacting actual usage, so we should get it in
[22:13] Yeah if bcsaller's is complete please do move forward
[22:15] there
[22:15] handled
[22:15] https://code.launchpad.net/~bcsaller/juju/config-do-not-overwrite/+merge/79890
[23:22] hazmat: great email about the timeouts. I think I hit that just when my system load gets high because some things take 3+ seconds
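The timeouts in hazmat's email are ZooKeeper session timeouts: if a client cannot heartbeat within the negotiated window -- because the box is swapping, or something blocks the process for a few seconds -- the server expires the session and the agent's ephemeral presence nodes vanish, which is the "dead in the water" symptom described below. A rough sketch of the knob involved, assuming the zkpython binding's API; the function and constant names should be checked against the installed module:

    import zookeeper  # zkpython binding; names here are assumptions

    def watcher(handle, event_type, state, path):
        # an expired session cannot be resumed: the client must open a new
        # handle and re-create its watches and ephemeral nodes
        if event_type == zookeeper.SESSION_EVENT and state == zookeeper.EXPIRED_SESSION_STATE:
            print("session expired: reconnect and re-establish state")

    # request a 30s session timeout rather than the default; the server may
    # negotiate it down, but it gives a loaded laptop more slack before expiry
    handle = zookeeper.init("127.0.0.1:2181", watcher, 30000)

Raising the requested timeout only buys slack; an expired session still has to be detected and the connection, watches, and ephemeral nodes rebuilt, which is the error-handling work discussed earlier in the day.
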
[23:40] hazmat: I think this may be another "production" bug.. when this hits.. the agents basically are dead in the water.
[23:41] SpamapS, ?
[23:41] hazmat: The weirdness I reported last week seems to be a timeout
[23:41] SpamapS, what's the defect?
[23:42] bug 875903
[23:42] <_mup_> Bug #875903: Zookeeper errors in local provider cause strange status view and possibly broken topology < https://launchpad.net/bugs/875903 >
[23:43] SpamapS its two different issues
[23:43] SpamapS, one the session expired, so the units are dead
[23:43] SpamapS, two status was reporting based on the recorded state instead of taking into account the presence of the connected agent
[23:43] the second issue has been addressed by a branch fwereade_ has in the review queue
[23:44] the first by the timeout email
[23:44] Ok
[23:44] I have been running into a lot of the weird status..
[23:44] not suspending/hibernating/anything
[23:44] just using it through the day
[23:45] SpamapS, hmm.. with high load / swapping?
[23:45] some load, no swap
[23:45] disk is definitely *slammed*
[23:45] SpamapS, i'd probably attribute it to the same
[23:46] I have no doubt that occasionally some things block for 3 seconds
[23:46] which is why I moved my testing to an m2.xlarge for a while today
[23:46] with a giant tmpfs volume
[23:46] no such issues over there. :)
[23:47] at $0.50/hour, its a bargain compared to dealing with my silly laptop
[23:48] 2011-10-19 23:48:45,173:480(0x7fa28ae7f700):ZOO_ERROR@handle_socket_error_msg@1621: Socket [192.168.122.1:48263] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x1331e917b100004 has expired.
[23:49] 16:49:25 up 3 days, 17:12, 2 users, load average: 9.03, 5.04, 3.26
[23:51] hazmat: so .. yeah.. this is frustrating.
[23:52] * SpamapS realizes he's late to pick up the little one and signs off for the day
[23:52] SpamapS, yeah.. the fix is actually pretty small and straightforward, just needs some good tests
[23:52] i'm in progress on it, but trying to take some time today to reviews
[23:54] SpamapS, is suspect on ec2 its the vagaries of virtual and the load from the multiple units
[23:54] s/is/i