[00:03] SpamapS, next time you do want to do a review though, ping me, I can at least pick up the easy ones to prescreen for ya.
[00:10] jcastro: I need to get the python charm helpers into charm-tools actually.. that's the current priority
[00:12] I mean an opportunistic "whenever"
[00:19] adam_g, if you have the provisioning agent log that would be helpful to diagnose.. is that against maas or orchestra?
[00:19] ooh.
[00:19] baremetal that is
[00:20] that is odd, it's not even showing the unit
[00:24] hazmat: yeah, i watched the logs and there was nothing odd, let me go see if i can grep out that deployment
[00:24] it's since been working
[00:25] this is an orchestra provider
[00:28] http://paste.ubuntu.com/884140/
[00:28] adam_g, with no units like that, it would appear the units were destroyed via juju remove-unit
[00:29] adam_g, that's a fragment of the log
[00:30] hazmat: on ec2, i've seen juju get trigger happy and start taking out nodes that i've manually added to the security group. is it capable of doing similar things with the orchestra provider?
[00:30] hazmat: how much context would you like? the log is big
[00:31] adam_g, yes.. it owns the security group on ec2, and will treat things it doesn't know about on ec2 as runaways and clean them up.. that behavior is also present on orchestra
[00:31] adam_g, but again something would have to have removed the rabbitmq unit
[00:32] ie.. juju remove-unit
[00:32] and even then juju wouldn't kill the machine.. because it knows about it
[00:32] and if the machine were dead out of band, the unit would still show
[00:32] adam_g, i'll take as much context as you have
[00:33] right
[00:33] sure one sec
[00:33] http://paste.ubuntu.com/884146/
[00:34] ^ that is from the teardown of the previous deployment through to the deployment following the failure
[00:34] http://paste.ubuntu.com/884147/ <- that's the whole thing
[00:35] hazmat: does zk have transactions of any kind, or could it be a transient thing caused by a timeout of some kind between client and zk?
[00:38] SpamapS, it has atomic operations we use, and it has a limited tx in 3.4.. as for the cause of this issue, i haven't seen anything in the logs that shows me it's a bug
[00:38] versus just acting on an executed command
[00:38] adam_g, how are you tearing down the env?
[00:39] hmm.. it would be nice to get a dump of
[00:39] zk
[00:40] as is, i see the unit was destroyed explicitly, and the machine to which it was assigned was removed as well
[00:40] a service with no units, looks like the original status output
[00:40] hazmat: i keep the bootstrap node in place, and do something like: destroy all services, terminate all machines but the bootstrap, usually sleeping for some seconds between terminate-machine calls to allow the power unit to catch up with requests
[00:41] hazmat: 'the unit was destroyed explicitly'... which unit? the rabbitmq that is missing its machine?
[00:41] ahh so remove-unit won't clean up an empty service
[00:42] adam_g, it's missing any units
[00:42] SpamapS, yes
[00:42] adam_g: does add-unit resolve things?
[00:42] * hazmat tries to come up with a remote zk dump script
[00:43] yeah.. that would verify
[00:43] SpamapS: i can try next time i hit this...
[00:44] adam_g, do you have this teardown automated?
[00:44] adam_g, you should try the charmrunner tools
[00:44] hmm
[00:44] actually i guess the snapshot/restore assumes a local provider
[00:45] easy to fix though
[00:47] hazmat: yea, teardown is automated. i'd definitely like to combine efforts and standardize on whatever tools you guys are using at some point
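For reference, the automated teardown adam_g describes above boils down to something like the following. This is a minimal sketch, not the actual automation: the service names, machine ids, and sleep interval are illustrative, and only the juju subcommands confirmed in the conversation (destroy-service, terminate-machine) are used.

```python
#!/usr/bin/env python
# Hypothetical sketch of an automated environment teardown: destroy
# every service, then terminate every machine except the bootstrap
# node (machine 0), sleeping between calls so the power unit can
# catch up with the requests.
import subprocess
import time

SERVICES = ["rabbitmq", "mysql", "nova-compute"]  # illustrative deployment
MACHINES = ["1", "2", "3"]                        # everything but the bootstrap node

for service in SERVICES:
    subprocess.check_call(["juju", "destroy-service", service])

for machine in MACHINES:
    subprocess.check_call(["juju", "terminate-machine", machine])
    time.sleep(10)  # give the power unit time to process the request
```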
[00:47] FWIW, i'd never seen this issue until recently though, last 1.5 weeks or so
[00:50] adam_g, please keep that env alive for a few minutes more if not already dead
[00:50] hazmat: still in place
[00:50] adam_g, i'm almost done with a remote zk dump script
[00:54] adam_g, the tools are a bit split.. i've got a few useful ones in charmrunner (charm test thingy), and there are some in jujujitsu
[00:54] SpamapS, btw, nice name
[00:54] adam_g, here's the script http://paste.ubuntu.com/884162/
[00:54] you can just run python dumpzk.py -f filen.zip -e env_name
[00:55] hazmat: name?
[00:55] SpamapS, the jujujitsu name
[00:55] Oh, hah, yeah, I love it. :)
[00:56] I do hope others like the idea and want to dump more things into it.
[00:59] hazmat: people.canonical.com/~agandelman/zk.zip this is from the current deployment in the same environment. the failed unit in that pastebin is gone by now. i'll hang onto that script and dump it next time i run into the issue
[00:59] adam_g, cool
=== Guest18667 is now known as jrgifford
[01:04] adam_g, till then, afaics from looking at the status code, the rabbitmq unit was removed explicitly with juju remove-unit, and then the machine removed with juju terminate-machine
[01:06] adam_g, but that seems odd, since i assume you're just using destroy-service and terminate-machine for cleanup
[01:06] hazmat: that's strange. nowhere in any of the automation we use is remove-unit called
[01:06] right
[01:06] anyway.. if you can run that script when it happens again, that would be helpful
[01:06] for sure
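hazmat's actual dumpzk.py is in the paste above and takes -f/-e flags; the sketch below only shows the general shape of a remote ZooKeeper dump. The kazoo client library and the placeholder hostname are assumptions, not what dumpzk.py necessarily uses.

```python
# Rough sketch of a remote ZooKeeper dump: walk the tree recursively
# and write each znode's data into a zip archive for offline
# inspection. Assumes the kazoo client library; host is hypothetical.
import zipfile
from kazoo.client import KazooClient

def dump_tree(zk, path, archive):
    """Recursively store every znode's data in the archive."""
    data, _stat = zk.get(path)
    archive.writestr(path.lstrip("/") or "root", data or b"")
    for child in zk.get_children(path):
        dump_tree(zk, path.rstrip("/") + "/" + child, archive)

zk = KazooClient(hosts="zookeeper.example.com:2181")  # placeholder host
zk.start()
with zipfile.ZipFile("zk.zip", "w") as archive:
    dump_tree(zk, "/", archive)
zk.stop()
```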
[03:34] <_mup_> Bug #955677 was filed: provisioning agent crashes when deploying to a maas node < https://launchpad.net/bugs/955677 >
=== almaisan-away is now known as al-maisan
=== al-maisan is now known as almaisan-away
=== almaisan-away is now known as al-maisan
=== Leseb_ is now known as Leseb
=== Leseb_ is now known as Leseb
=== tobin is now known as Guest24410
[10:14] <_mup_> Bug #955576 was filed: 'local:' services not started on reboot < https://launchpad.net/bugs/955576 >
=== hspencer is now known as hspencer[afk]
=== asavu_ is now known as asavu
=== TheMue_ is now known as TheMue
=== al-maisan is now known as almaisan-away
=== medberry is now known as med_
=== fjlacoste is now known as flacoste
=== elmo_ is now known as elmo
=== almaisan-away is now known as al-maisan
=== Guest24410 is now known as otbin
=== otbin is now known as tobin
=== tobin is now known as Guest10799
[16:11] <_mup_> juju/local-survive-restart r477 committed by kapil.thangavelu@canonical.com
[16:11] <_mup_> upstartify local provider zk
[16:31] \o/
=== al-maisan is now known as almaisan-away
[16:45] hazmat: my hero! :)
[16:45] lxc and the local provider have gotten much better of late
[16:49] SpamapS, it's mostly unchanged outside of the upstartification of some bits
[16:50] SpamapS, there's still some love needed for the whole failure scenario around lxc-wait
[16:51] hazmat: yeah that's being looked at upstream... apparently you can only have one lxc-wait running at a time, and that is the krux of the problem
[16:51] crux even .. :-P
[16:51] SpamapS, well.. we're not properly passing it a bit mask of multiple states, we're just waiting for it to get to started, and on error it never does. but yeah.. the ability to ask it multiple times is also nice
[16:51] er. concurrently
=== lifeless_ is now known as lifeless
[16:53] hazmat: apparently it listens for a signal from lxc-start on a private socket, so only one lxc-wait can be listening at one time
[16:53] <_mup_> Bug #956183 was filed: Support suspending environment < https://launchpad.net/bugs/956183 >
[16:53] hazmat: I'm pretty sure that master-customize also doesn't error on failure of any of its commands.
=== medberry is now known as Guest35857
[17:22] hazmat: around?
[17:25] adam_g, yes
[17:25] headless chicken
[17:28] same here heh
[17:29] maybe try a tourniquet to stop the bleeding?
[17:30] hazmat: so there seem to be some issues ATM w/ juju + essex, which i think are security group related. i was going to see if you had a script/doc around that mimics the boto calls juju runs in the ec2 provider. i was having trouble recreating using euca2ools. i can extract it all myself if you're bogged down, but figured i'd check first
[17:33] one moment
[17:35] adam_g, http://paste.ubuntu.com/885124/
[17:36] those are all the calls, but re security groups, there is one for the environment, and then one per machine
[17:36] adam_g: one thing.. juju uses txaws, not boto
[17:36] the environment group has a rule to allow for internal group access
[17:36] and then the ones per machine are manipulated to allow for external access as the services with units on a given machine are exposed
[17:38] the environment group is also used to help identify which machines in the provider juju has responsibility for, i.e. as a form of tagging
[17:43] hazmat: thanks, i'll check those. i'd like to be able to recreate the same security groups + rules manually on ec2 and nova. i think there's something screwy going on with rules that reference other groups
[17:45] great idea >> iptables-save for ec2 security groups
[17:47] adam_g, yeah.. it was a bit wonky last cycle as well for self-referential security group rules, i.e. the metadata looked suspect; i think it worked well because effectively the enforcement wasn't in place.
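A rough sketch of recreating the layout hazmat describes above, for manual testing against ec2 or nova. juju itself uses txaws, not boto (as hazmat notes), so boto here is just a convenient way to mimic the calls; the group names and the exposed port are illustrative.

```python
# Hypothetical recreation of juju's security-group layout:
# one group for the whole environment with a self-referential rule
# (machines in the environment can talk to each other), plus one
# group per machine whose rules are manipulated as exposed services
# acquire units on it.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")

# Environment group: internal group access via a self-referential rule.
# juju also uses membership in this group as a form of machine tagging.
env = conn.create_security_group("juju-sample", "juju environment group")
env.authorize(src_group=env)

# Per-machine group: opened for external access as services on the
# machine are exposed (e.g. a webserver charm exposing port 80).
machine = conn.create_security_group("juju-sample-0", "juju machine 0")
machine.authorize(ip_protocol="tcp", from_port=80, to_port=80,
                  cidr_ip="0.0.0.0/0")
```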
[18:13] right
[18:21] <_mup_> juju/refactor-machine-agent r461 committed by jim.baker@canonical.com
[18:21] <_mup_> Merged trunk & resolved conflict
=== med___ is now known as med__
=== med__ is now known as med_
=== marcoceppi_ is now known as marcoceppi
=== dvestal is now known as dvestal|away
[18:53] <_mup_> juju/relation-reference-spec r6 committed by jim.baker@canonical.com
[18:53] <_mup_> Initial commit
[19:34] <_mup_> juju/relation-hook-commands-spec r6 committed by jim.baker@canonical.com
[19:35] <_mup_> Initial commit
[19:42] <_mup_> juju/relation-info-command-spec r6 committed by jim.baker@canonical.com
[19:42] <_mup_> Initial commit
[19:42] <_mup_> Bug #956352 was filed: Enable relation hook commands to work with arbitrary relations. < https://launchpad.net/bugs/956352 >
[19:45] <_mup_> juju/juju-status-changes-spec r6 committed by jim.baker@canonical.com
[19:45] <_mup_> Initial commit
[19:47] <_mup_> Bug #956357 was filed: Fix `juju status` bug when working with multiple relations for a service. < https://launchpad.net/bugs/956357 >
[19:52] <_mup_> Bug #956372 was filed: Add `relation-info` to list relation ids associated with a service < https://launchpad.net/bugs/956372 >
[19:56] <_mup_> Bug #956377 was filed: Enable unambiguous reference to relations by using a relation id < https://launchpad.net/bugs/956377 >
[21:15] SpamapS, this might be more of an m_3 question but
[21:16] if I want to see a big list of what charms are currently failing tests and that I should be looking to fix I go to .... ?
[21:17] hm, why does yaml.dump have to make such ugly yaml?
[21:17] jcastro: charmtests.markmims.com is what I've been looking at
[21:18] Looks dead tho
[21:19] bummer
[21:20] jcastro: It's a single charm, so you can also just deploy it.. ;)
[21:21] ahh.. default_flow_style=False helps
[21:22] jcastro: how much would you love a juju-jitsu subcommand called 'setup-environment' that did Q&A to fill in the blanks?
[21:22] I would have a party
[21:22] jcastro: polishing it off now
[21:22] hey is this in the PPA yet?
[21:23] no
[21:23] still pretty raw.. so.. bzr branch and play..
[21:23] oh dude
[21:23] you put the gource thing in here
[21:24] jcastro: yes!
[21:24] jcastro: just run it.. you get a gourcer on your default environment. :)
[21:51] jcastro SpamapS charmtests back up
[21:52] hit by the overly-strict type checking across the whole repo
[21:52] m_3: did you pull the latest changes? I fixed most of them over the last week.
[21:53] SpamapS: I did... essentially `charm list | grep lp:charms`
[21:54] m_3: that's part of why I added the new --fix stuff to 'charm update'
[21:55] I thought that was for existing local repos.. this wipes and cleans branches
[21:56] win 17
[23:53] jcastro: there's a little surprise waiting for you on your blog ;)
[23:58] hazmat: lots of progress! charmtests.markmims.com
[23:58] looks like most of them are completing the graph runs without hanging
[23:59] m_3, nice
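For reference, the yaml.dump fix jcastro lands on at [21:21] looks like this; the sample data is made up.

```python
# PyYAML of that era defaulted to flow style for nested mappings of
# scalars, which reads badly; default_flow_style=False forces block
# style throughout.
import yaml

config = {"environments": {"sample": {"type": "ec2", "admin-secret": "abc123"}}}

# Auto (old default) behavior, flow style for the innermost mapping:
#   environments:
#     sample: {admin-secret: abc123, type: ec2}
print(yaml.dump(config, default_flow_style=None))

# Block style throughout:
#   environments:
#     sample:
#       admin-secret: abc123
#       type: ec2
print(yaml.dump(config, default_flow_style=False))
```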