[00:39] hazmat, this looks provocative. the last line in the formula.log for wordpress/0 - 2011-04-21 23:30:08,060: twisted@ERROR: TypeError: 'NoneType' object is not callable
[00:40] jimbaker, hmm
[00:40] so maybe the relation workflows are running, but they are going into a bad state?
[00:40] jimbaker, possibly
[00:41] surprised we don't have anything nicer in the traceback. that's unfortunate
[00:41] jimbaker, the status looks right
[00:41] hazmat, no, it's missing the relation status info
[00:42] jimbaker, one option is to have a look at the zkshell.. ensemble ssh 0 && /usr/share/zookeeper/bin/zkCli.sh
[00:42] jimbaker, hmmm..
[00:42] jimbaker, i'll have a look in the morning
[00:43] compare it against this output: http://pastebin.ubuntu.com/597192/
[00:43] (from an earlier run)
[00:44] hazmat, sounds good
[00:54] hazmat, for later consumption - shouldn't this be more than two? zk: localhost:2181(CONNECTED) 26] ls /units --- [unit-0000000000, unit-0000000001]
[01:00] hmmm... maybe not that part after all - i was looking at zk_workflow_identity
[01:00] it looks like it uses the same path for both ServiceUnitState and UnitRelationState
[01:01] but those are not the same paths
[01:01] based on looking at those specific classes
[01:01] either i'm confused or ensemble is confused ;)
[01:46] jimbaker, they are at the same path
[01:46] all the workflows for a unit-agent are managed on a single node
[02:16] hazmat, thanks for the clarification
[02:16] this would have caused more issues if it were not the case, i guess
[02:16] jimbaker np.. sorry i had to run out
[02:17] jimbaker, yeah.. it seems strange the workflows on the unit are initialized and showing them as running the but the units weren't up or in an error state which seems strange
[02:17] i can't think of any reason why that would be the case.
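The bare `TypeError: 'NoneType' object is not callable` line in the formula.log, with no traceback behind it, is the sort of thing a top-level wrapper around callbacks can improve. A minimal sketch, assuming plain synchronous callbacks and the standard `logging` module; `log_errors` is a hypothetical name, and nothing here is Ensemble or Twisted API:

```python
import functools
import logging

log = logging.getLogger("callbacks")


def log_errors(func):
    """Log the full traceback of any exception raised by func, then re-raise."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # log.exception records the message plus the active traceback.
            log.exception("unhandled error in %s", func.__name__)
            raise
    return wrapper
```

Re-raising after logging keeps the caller's error handling unchanged; the wrapper only adds the missing context to the log.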
[02:18] anyway, just curious it's happening in us-west now - one more thing to try is in eu-west (if that's the other region completely set up)
[02:18] i'll take a look at it tomorrow
[02:18] jimbaker, it is
[02:18] hazmat, have a good night, ttyl
[06:24] <_mup_> ensemble/refactor-to-yamlstate r197 committed by bcsaller@gmail.com
[06:24] <_mup_> set not taking any random data, but insisting on a dict in tests (for YAMLState)
[08:52] Morning everyone
[13:06] anyone around o/
[15:02] kim0, g'morning
[15:02] or hello is probably more appropriate
[15:30] hazmat: hey o/
[15:33] team on vacation huh :)
[15:59] <_mup_> ensemble/merged-alt-region-logging r210 committed by kapil.thangavelu@canonical.com
[15:59] <_mup_> merge ensemble-log-level
[16:00] <_mup_> ensemble/merged-alt-region-logging r211 committed by kapil.thangavelu@canonical.com
[16:00] <_mup_> merge ensemble-log-crash
[16:07] kim0, hi
[16:08] kim0, we have been trying out the alternative region branch. it's not worked for me with deploying our example formulas. you want to give it a try too?
[16:08] it did work w/ kapil however
=== deryck is now known as deryck[lunch]
[16:16] ec2 has recovered it seems .. I'm writing a small user level tutorial now, but if you need someone else to test that branch, sure
[16:17] kim0, i think it would be useful for sure
[16:17] it's also a good way to play w/ the environments.yaml file
[16:18] jimbaker: cool, any instructions on using that branch ? I'll try it in a few hours though, now right now
[16:20] let me know how do I use it, thanks
[16:27] <_mup_> Bug #769030 was filed: Enable one control bucket to be used for multiple regions < https://launchpad.net/bugs/769030 >
[16:29] you just need to configure two things in your environments.yaml file: region - us-east-1, us-west-1, eu-west-1; and ensemble-branch - https://code.launchpad.net/~hazmat/ensemble/ensemble-alternate-regions
[16:30] got it
[16:30] kim0, i also have one with the logging stuff merged in..
lp:~hazmat/ensemble/merged-alt-region-logging
[16:30] hazmat, sounds good, then we don't use an earlier formula set
[16:30] have to use
[16:30] jimbaker, it didn't work entirely
[16:31] jimbaker, i'm trying out trunk in east atm to verify the delta
[16:31] hazmat, sounds good, less mystifying then
[16:31] we prefer our failures to be consistent ;)
[16:31] jimbaker, what's odd is that the unit relation state in zk (in west) is good, but status isn't reporting it, and wordpress isn't running
[16:32] also ensemble-log seems to return 0 even when it fails/errors.
[16:33] hazmat, i just got this weird output in a log
[16:33] 2011-04-22 09:20:52,519 unit:mysql/0: twisted ERROR: TypeError: 'Port' object is not callable
[16:34] jimbaker, can you paste the full log to pastebin
[16:34] hazmat, will do
[16:34] thanks
[16:35] http://pastebin.ubuntu.com/597493/
[16:35] jimbaker, is this on the open-port/close-port branches?
[16:36] hazmat, no, the alternative region branch, running with trunk r200 formulas
[16:36] i'm going to try the new alt region branch w/ logging merge in now
[16:36] jimbaker, hmm.. there are no port classes in ensemble, only in twisted.
[16:36] hazmat, indeed
[16:37] are we sure we got good versions of python, twisted, etc built in these amis?
[16:37] it's as if we got some version skew going on here
[16:38] <_mup_> Bug #769035 was filed: Need a top level decorataor on all independent callbacks to do nice error printing < https://launchpad.net/bugs/769035 >
[16:39] jimbaker, just stock natty
[16:41] <_mup_> Bug #769036 was filed: Ensemble hook cli api needs to do correct exit codes < https://launchpad.net/bugs/769036 >
[16:44] jimbaker, so trunk works for me
[16:45] hazmat, trunk formulas, or using trunk with us-east-1?
[16:45] trunk and us-east-1
[16:45] trying alt-region with us-east-1
[16:46] there's nothing in the branch remotely related to units or hooks..
[16:46] hazmat, exactly, that's what is so strange here.
some other unexpected dependency seemingly
=== deryck[lunch] is now known as deryck
[17:37] jimbaker, the merged-alt-region-logging branch seems to work okay
[17:37] in us-east-1
[17:43] hazmat, trying it out now
[17:44] jimbaker, cool, i'm trying it out in a different region and then i do see a problem
[17:49] hazmat, doesn't work for me this try. speaking of round trip overhead, now doing "watch ensemble status" ;)
[17:50] i had proposed building that in for ensemble status with actual watches, but repeatedly polling like this is the poor man's approach for sure
[17:51] and the interesting thing is seeing the relation service state disappear... crazy
[17:57] jimbaker: the plan with status is to have a mode where it blocks on a topo watch and then reissues the status in a loop
[17:57] unlike watch it knows when things change
[17:57] bcsaller, yes, intelligent watches :)
[17:57] bcsaller, good to know it is in the works
[17:58] bcsaller, doing "watch ensemble status" is still useful right now
[17:58] good
[17:59] i think once we have the relation settings added to ensemble status, that's going to be pretty awesome
[18:42] aha
[18:42] unit agents are dead
[18:43] hazmat, that would make sense
[18:43] jimbaker, the fact there is nothing in the log is rather frightening
[18:44] hazmat, indeed. i'm just about to testing us-east-1 w/ trunk at r200, which is the last good one i observed
[18:44] try testing
[18:45] jimbaker, i was able to get trunk latest from merge-alt-region-logging working on us-east-1
[18:46] hazmat, i was unable to get that - i was just getting "it works" plus empty relation service states
[18:46] probably because of dead unit agents
[18:46] jimbaker, no..
i'm actually got populated relation states with dead units agents
[18:46] hazmat, crazy
[18:46] jimbaker, the variations i'm using are trunk, ensemble-alternate-region, merge-alt-region-logging
[18:47] all with formulas from that are the equivalent of the trunk versions
[18:47] i've seen trunk and merge-alt-region-logging working on us-east-1
[18:47] hazmat, i have tried all of those, both with us-east-1 and us-west-1
[18:47] nothing is working end-to-end for me today
[18:48] everything starts off fine... then it just mysteriously fails
[18:49] hazmat, maybe i should rebuild my buckets, don't know if that's an issue based on bug 769030
[18:49] <_mup_> Bug #769030: Enable one control bucket to be used for multiple regions < https://launchpad.net/bugs/769030 >
[18:49] jimbaker, yeah.. i switch my buckets when changing regions atm
[18:50] they should recover fine
[18:50] ie. detect dead instance stale file, and create new one
[18:50] although given how the control bucket works, i wouldn't expect it to impact
[18:50] which is what they normally do, or we'd be cleaning it all the time
[19:00] bcsaller, hazmat - standup?
[19:00] jimbaker, sounds good
[19:00] I have little to report, but sure
[19:01] then it will go fast :)
[19:09] <_mup_> Bug #769120 was filed: Ensemble status shouldn't report dead units based soley on state, but also on presence. < https://launchpad.net/bugs/769120 >
[19:19] http://dtrace.org/blogs/bmc/2010/08/30/dtrace-node-js-and-the-robinson-projection/
[19:20] http://wiki.joyent.com/display/node/Using+Cloud+Analytics
[19:20] http://dtrace.org/blogs/dap/2011/03/01/welcome-to-cloud-analytics/
[19:30] allergies miserable.
[19:39] hazmat, you really should try hot yoga. i found it really helps clear sinuses and it would seem prevent allergic symptoms too
[20:15] jimbaker, sadly hot yoga isn't my thing
[20:16] does anyone understand apport handling of core files?
[20:28] jimbaker, can you give a hook at this look trivial patch for trunk..
https://pastebin.canonical.com/46611/
[20:28] not sure if argparse version changed, but i currently have these two tests failing for me on trunk
[20:32] bcsaller, ^ if you have a moment and could look at the trivial.. i'm waiting on that before doing some merges.
[20:33] hazmat: the change to generation happens outside the patch?
[20:33] bcsaller, yeah.. the error output change cause is not clear, i just matched to what the current production is
[20:34] seems like it should have been caught when the change happend and the tests would have broken.
[20:34] I'm fine with the change, but want to understand how it happeded
[20:34] happened
[20:37] bcsaller, yeah.. i'm bisecting the last 5revs now to double check
[20:40] bcsaller, just went back a month history, still getting the errors, i'd have to guess its an argparse change and rev increment
[20:42] maybe, yeah
[20:43] thanks for checking
[20:43] bcsaller, seems to be a change between pypi version of argparse and the builtin 2.7 version
[20:43] ahh, natty on 2.7
[20:43] makes sense now
[20:45] not sure if that's it.. also happens with python 2.6 using the distro argparse
[20:45] but it does work with the pypi argparse 1.2.1
[20:46] where as the distro version (for 2.6) is 1.1-1
[20:47] no.. actually it didn't work 1.2.1
[20:47] i had used a patched trunk to test that one
[20:48] <_mup_> ensemble/trunk r205 committed by kapil.thangavelu@canonical.com
[20:48] <_mup_> argparse error output seems to have changed, match tests to match current output [trivial][r=bcsaller]
[20:54] <_mup_> ensemble/trunk r206 committed by kapil.thangavelu@canonical.com
[20:54] <_mup_> merge ensemble-log-level [a=niemeyer][r=niemeyer][f=767364]
[20:54] <_mup_> This fixes a problem with the ensemble-log hook CLI API,
[20:54] <_mup_> not correctly taking a -lLOG_LEVEL option.
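The two failing tests discussed above compared argparse's error output exactly, which is why a wording change between argparse versions broke them. One way to make such tests less brittle is to assert on the exit status instead. A minimal sketch, not the Ensemble test suite: the parser, its `-l` choices, the program name `example-log`, and the helper `exit_status_for` are all hypothetical.

```python
import argparse
import contextlib
import io


def build_parser():
    # Hypothetical stand-in for a hook CLI parser; not Ensemble's real one.
    parser = argparse.ArgumentParser(prog="example-log")
    parser.add_argument("-l", dest="level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    parser.add_argument("message")
    return parser


def exit_status_for(argv):
    """Parse argv and return (exit status, captured stderr); status is 0 on success."""
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            build_parser().parse_args(argv)
    except SystemExit as exc:
        # argparse reports usage errors by printing to stderr and exiting.
        return exc.code, stderr.getvalue()
    return 0, stderr.getvalue()
```

Checking for status 2, which is what `ArgumentParser.error` exits with, and at most the program name in the stderr text, keeps the test independent of the exact error wording.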
[20:57] <_mup_> ensemble/trunk r207 committed by kapil.thangavelu@canonical.com
[20:57] <_mup_> merge ensemble-log-crash [a=niemeyer][r=kapil][f=767391]
[20:57] <_mup_> This fixes a traceback when attempting to use the ensemble-log
[20:57] <_mup_> hook CLI API from hooks.
[21:36] jimbaker, the principia trunk seems to work with trunk, but the trunk formulas don't on natty.
[21:44] still seeing unit agents die though
[21:48] hmm
[23:18] jimbaker, txzookeeper unit tests are segfaulting with default natty it appears.
[23:19] jimbaker, bcsaller do you run with the package zk or a local build?
[23:19] local
[23:20] yeah.. we've been getting away with not having our own packages
[23:20] even though there were known issues with the lucid one, it still worked for our uses.
[23:20] doesn't appear to be the case with natty, we're going to need package trunk (3.4) or backport perhaps
[23:31] hazmat, curiously i'm reinstalling stuff now
[23:32] is everyone running on python 2.7 at this point?
[23:32] jimbaker, i am
[23:32] jimbaker, i test with 2.6 occasionally as well
[23:32] i'm now getting some test errors on trunk, just building a new virtualenv to try 2.7 out
[23:36] jimbaker, what does ./test need in a ZOOKEEPER_PATH ? just the zkServer.sh script?
[23:36] hazmat, iirc, it doesn't use zkServer.sh
[23:37] hmm
[23:37] jimbaker, looks like i just need a directory with the jar
[23:38] hazmat, sounds about right
[23:38] it looks for both dev and prod installs
[23:38] which are laid out differently
[23:39] hmm.. yeah.. it doesn't work just pointing to a directory of jars.. which is what the deb does for install into /usr/share/java
[23:39] makes sense
[23:40] fortunately easy enough to change in ensemble.tests.common.ManagedZooKeeper
[23:40] basically a variant of zkServer.sh
[23:40] was that adjusted in the deb package?
[23:43] jimbaker, just had to fix the test/common get class path to not hardcode src release stuff
[23:44] jimbaker, debian uses /usr/share/java for java libs
[23:44] which was the whole point of that classpath property, so that's cool
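The classpath fix described above, building the ZooKeeper classpath from whatever jars a directory holds rather than from a hardcoded source-release layout, might look roughly like this. A sketch only: `classpath_from_dir` is a hypothetical helper, not the code in `ensemble.tests.common`.

```python
import glob
import os


def classpath_from_dir(jar_dir):
    """Join every .jar directly inside jar_dir into a Java classpath string.

    Hypothetical helper: it makes no assumption about a source-release
    layout, so a flat jar directory such as /usr/share/java works too.
    """
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    if not jars:
        raise ValueError("no jars found in %r" % jar_dir)
    # os.pathsep is ':' on POSIX, the separator the JVM expects there.
    return os.pathsep.join(jars)
```

Failing loudly when the directory holds no jars surfaces a wrong ZOOKEEPER_PATH at setup time rather than as an opaque JVM startup error.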