[11:15] hi gmb
[11:23] Hi bac.
[11:24] gmb: hi there. have you any experience getting an 8 instance ec2 up and running? i've configured per gary's email. do you then just do the normal thing, deploying a master and slave or do you have to deploy multiple slaves?
[11:26] bac: I'm just deploying the one slave. I haven't actually tried to make it build anything yet, though.
[11:26] But the one slave, plus testr, plus multiple cores, should be all we need.
[11:26] ok, cool. i'm trying to test new steps for the slave i've added to the lpbuildbot master.cfg
[11:27] originally i had a slave start error...so it didn't get to the part i was interested in
[11:59] hey gmb. it would have been fine if you had started the zope.testing thing. Better than fine, great. ;-) Did you? I'll arrange the cards if so
[11:59] I'm assuming you completed the automation script?
[12:00] restarting post upgrade...
[12:08] what is the trick for resetting the apt cache? from ec2 i'm getting hash errors.
[12:09] benji: do you know ^^ ?
[12:09] let me check my notes
[12:10] thanks
[12:11] bac: apt-get clean
[12:11] hullo
[12:12] through a series of misadventures I'm on the mac side
[12:12] https://talkgadget.google.com/hangouts/_/extras/canonical.com/goldenhorde
[12:12] I'm waiting here should anyone care to join me ^^^
[12:12] bac benji gmb
[12:12] gary_poster: Not yet, but I will (on both counts). I've spent my morning trying to get a working desktop environment, but I've given up and switched to OSX+SSH for the afternoon.
[12:13] ok gmb
[12:38] benji, I'm collecting, not filing yet, but I'll pastebin what I find here as I go
[12:38] http://pastebin.ubuntu.com/912936/ is first
[12:40] http://pastebin.ubuntu.com/912940/ is second, looks essentially the same (first line is mistake, should not have been copied)
[12:41] * benji looks.
[12:41] aaaand that looks familiar...
http://pastebin.ubuntu.com/912942/
[12:42] same http://pastebin.ubuntu.com/912944/
[12:42] hmm, it looks like our previous fix for the locale whining of perl might have been good to keep
[12:42] yeah maybe so
[12:43] same http://pastebin.ubuntu.com/912948/
[12:43] my first tack will be to treat it as a test bug, if the test doesn't care about locales then we should ignore this warning when comparing output
[12:44] same: http://pastebin.ubuntu.com/912950/
[12:44] * gmb -> late lunch
[12:44] that would be nice, if possible
[12:45] whee, same http://pastebin.ubuntu.com/912951/
[12:47] I see several of these sorts of errors but they are not connected clearly with a test (they break subunit formatting). http://pastebin.ubuntu.com/912955/ wgrant talked about these memcache things on the list recently. I thought the resolution was that it was caused by a newer version of some package and we were going to try and roll back
[12:48] bzr locale again http://pastebin.ubuntu.com/912957/
[12:49] again http://pastebin.ubuntu.com/912959/
[12:49] OK, I will announce if this is *not* the same bzr locale bug. If I don't say anything, assume it is bzr locale
[12:49] http://pastebin.ubuntu.com/912961/
[12:50] I wonder why this is the first time we're seeing this error.
[12:50] http://pastebin.ubuntu.com/912962/
[12:50] me too
[12:50] this is first time on Precise beta 2. could be cause
[12:51] http://pastebin.ubuntu.com/912965/
[12:52] http://pastebin.ubuntu.com/912967/
[12:52] http://pastebin.ubuntu.com/912968/
[12:52] That's it
[12:52] So I only saw three sorts of errors
[12:52] bzr locale
[12:53] the Twisted reactor thing that benji fixed yesterday
[12:53] and the connection errors that wgrant said are happening on the main buildbot
[12:53] cool
[12:54] gary_poster: do you want me to file the bug about the locale thing?
[13:28] benji, sure thank you
[13:28] k
[14:02] benji, if you give me the bug number I'm happy to add all the affected tests we found, or at least the pastebins.
Or something
[14:02] gary_poster: 972456
[14:02] ack thanks benji
[14:03] bug 972456
[14:03] <_mup_> Bug #972456: Tests can fail when bzr emits an unexpected "unsupported locale setting" warning < https://launchpad.net/bugs/972456 >
[14:33] gmb, fwiw, I was able to get the zope.testing tests to run by doing it in a lucid lxc (with python 2.6.5). Maybe that's the trick. I stashed the current failures at this pastebin: http://pastebin.ubuntu.com/913071/ . I saw four, though I thought frankban said we were down to three. I'm pretty tempted to dig into the subunit ones, since subunit support is so important to us, and the source of the regression.
[14:34] The other three (testrunner-edge-cases.txt, testrunner-debugging.txt, testrunner-debugging-layer-setup.test) I'm planning on continuing to ignore
[14:35] gary_poster: Noted, thanks. I'm not yet in a position to pick up the zope.testing work, so if you want to forge ahead with it and then (if you're not done with it by your EOD, which you may well be) send me a handoff email, I'll be happy to carry the baton tomorrow AM.
[14:35] sounds perfect gmb, thx
[14:35] (Or Precise might stop being a pain before then. Who knows?)
[14:35] :-)
[15:14] gary_poster: juju expose for the master did not work for me. can you give me details of how you fixed it via the AWS console?
[15:14] bac, sure (and how weird!)
[15:14] bac, I'll do it at same time
[15:14] go to http://aws.amazon.com/
[15:14] there
[15:15] My Account/Console -> console
[15:15] sign in
[15:15] click on ec2
[15:15] click on [N] Security Groups on right side
[15:15] (make sure you aren't in region 'singapore'! )
[15:15] :-)
[15:16] look for group representing machine
[15:16] ok, now i have a ton of groups
[15:16] juju-[name of the environment you used]-[number of the machine from juju status]
[15:16] So for instance my environment was big-ec2
[15:17] and my machine was 2
[15:17] so I clicked on juju-big-ec2-2
[15:17] 2 for the bb master, no?
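(Editor's sketch, not part of the conversation.) The naming rule quoted above, juju-[environment]-[machine number], can be written as a tiny helper. This assumes only the pattern stated in the chat; the function name `security_group_name` is hypothetical and is not part of juju or any AWS library.

```python
def security_group_name(environment: str, machine: int) -> str:
    """Build the EC2 security group name for a juju machine.

    Assumes the pattern quoted in the chat:
    juju-[environment name]-[machine number from `juju status`].
    """
    if machine < 0:
        raise ValueError("machine number must be non-negative")
    return f"juju-{environment}-{machine}"


if __name__ == "__main__":
    # Example from the chat: environment "big-ec2", machine 2.
    print(security_group_name("big-ec2", 2))
```

As noted in the chat, `juju status` is the authoritative source for the machine number, so the value passed in should come from there rather than from the order in which services were deployed.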
[15:17] On bottom click on "Inbound" tab
[15:17] bac, depends on the order that you started, but yes, that's the way it is for me too
[15:17] juju status is authoritative
[15:18] ok
[15:18] In Port range on "Inbound" tab type 8010
[15:18] now i already have (to the right) 8010 0.0.0.0/0
[15:18] but no one is answering
[15:18] ok, so this is not the problem
[15:19] bac, next possibility is that the master is not actually up
[15:19] juju status shows the master as started
[15:19] juju ssh 2
[15:19] go to /var/lib/buildbot/masters/master and look at twistd.log
[15:20] As of now, bac, I have never encountered the situation you describe, fwiw
[15:20] uh oh
[15:20] i have no /var/lib/buildbot
[15:20] heh
[15:20] yeah, that doesn't sound so good
[15:20] i ran the hooks/install and start manually and they seemed happy
[15:20] you are sure--you are looking as root?
[15:21] yes, not /var/lib/buildbopt
[15:21] yes, not /var/lib/buildbot
[15:21] yes, no /var/lib/buildbot
[15:22] um
[15:28] bac, I'm afraid I have no idea whatsoever. How did you deploy the master?
[15:28] juju deploy --config=/home/bac/juju/oneiric/buildbot-master/examples/lpbuildbot.yaml --repository=~/juju local:buildbot-master
[15:28] that failed due to the non-existent apt repo
[15:28] i fixed /etc/apt/sources.list
[15:29] and then did an 'apt-get update' / 'apt-get upgrade'
[15:29] then i ran hooks/install and hooks/start
[15:29] oh, manually?
[15:29] yes
[15:29] how should i have done it?
[15:30] bac, I think the right thing to have done would have been to do "juju resolved --retry buildbot-master/0"
[15:31] there are environment variables that are not around when you run them by hand
[15:31] I described doing this in my second "Starting..."
email yesterday
[15:31] gary_poster: i'm a bit confused
[15:31] i thought the "resolved" command was done before using 'juju debug-hooks', which i did
[15:32] so are you saying:
[15:32] juju deploy
[15:32] see it fail
[15:32] juju ssh and then fix /etc/apt
[15:32] then juju resolved -- and all should carry on happily?
[15:33] *if* you do a retry
[15:33] "resolved" alone means "I handled it; don't retry"
[15:33] unless you add --retry
[15:33] that means "I resolved the problem you encountered; please retry"
[15:33] i always used --retry, even if i was going to use debug-hooks
[15:34] ok, i'll shoot this environment and try again after lunch
[15:34] * bac argh
[15:34] so...when you said "i fixed /etc/apt/sources.list" that means you did it within debug-hooks?
[15:34] yes
[15:34] oh
[15:34] bad, huh?
[15:35] no
[15:35] that sounds fine on the face of it, just not what I did
[15:35] I thought you meant that you had done it with juju ssh
[15:35] well, something i did caused it to remain unhappy
[15:35] right
[15:35] mm
[15:35] you could try deploying another master?
[15:36] i don't see the benefit. i'd rather clean house and try again from fresh
[15:36] you could try killing this master? I don't remember if they said there was a "die die die" for a machine, or if the only option is to redo an existing machine
[15:37] bac, the only benefit is that you've already paid the price for the slave setup
[15:37] and it is unrelated to the master
[15:37] because you hadn't gotten that far yet
[15:37] so if you got a working master
[15:37] then you'd be able to connect your existing slave
[15:37] yeah, but i did the same lame-o dance with the slave, so i probably screwed it up
[15:37] but I'm just brainstorming
[15:38] I don't think it was necessarily all that lame-o
[15:38] did you run install during the install step and start during the start step?
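(Editor's sketch, not part of the conversation.) The distinction drawn above between plain "resolved" and "resolved --retry" can be summarised in a few lines. This only assembles the command line quoted in the chat ("juju resolved --retry buildbot-master/0") and restates the meanings given there; the helper names `resolved_command` and `describe` are hypothetical, and nothing is executed against juju.

```python
from typing import List


def resolved_command(unit: str, retry: bool) -> List[str]:
    """Return the argv for `juju resolved`, optionally with --retry."""
    argv = ["juju", "resolved"]
    if retry:
        argv.append("--retry")
    argv.append(unit)
    return argv


def describe(retry: bool) -> str:
    """Restate the meaning given in the chat for each form."""
    if retry:
        return "I resolved the problem you encountered; please retry"
    return "I handled it; don't retry"


if __name__ == "__main__":
    for retry in (False, True):
        argv = resolved_command("buildbot-master/0", retry)
        print(" ".join(argv), "->", describe(retry))
```

The reason suggested in the chat for preferring the retry form over running hooks/install and hooks/start by hand is that the hooks then run with the environment variables juju provides.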
[15:39] If so, AFAIK, you did everything the way it was supposed to be done, for some story
[15:39] ok, on the slave machine i do have /var/lib/buildbot/slaves/slave -- so it seems to be happier
[15:39] yes, that's what i did
[15:39] yeah
[15:40] It's not clear to me that you did anything wrong :-/
[15:40] ok, i destroyed the master
[15:40] ok
[15:40] i'll now try to redeploy
[15:41] try again, yeah. master is fast
[15:42] ok, he's up and 'pending'
[15:42] ok
[15:42] now 'installed'
[15:42] huh
[15:42] oh, it is the same machine, so the apt problem is pre-fixed
[15:42] right
[15:42] and 'started'
[15:42] whee
[15:43] so is there a buildbot?
[15:43] yes
[15:43] uh, great, I'm so glad we figured out this problem! :-D
[15:43] i will now add-relation
[15:44] cool
[15:45] and it is available on the web. glad i didn't blow it all away!
[15:45] great
[15:45] ok, so do i have to manually kick off a build?
[15:45] yeah bac
[15:48] ok, so it tried to run my script 'init_testr.sh' but it was not there
[15:48] but it is there...in /var/lib/buildbot/slaves/slave/lucid-devel/build
[15:49] bac, right mode?
[15:49] -r-xr--r-- 1 buildbot buildbot 395 Apr 3 15:46 init_testr.sh*
[15:49] looks reasonable
[15:49] so perhaps it thinks it should be somewhere else?
[15:50] I don't think so...lemme look at working example. Could you also give me url of web interface to master?
[15:58] gary_poster: yeah, the ./ made it work
[15:58] cool bac.
[15:58] so i need to let the build finish and then restart another to ensure i have all of the data in the .testrepository?
[16:08] yes bac
[16:08] bac, it should take no time at all
[16:08] it should be done already, in fact
[16:08] root@ip-10-82-27-185:/var/lib/buildbot/slaves/slave/lucid-devel/.testrepository# ls
[16:08] 0 1 failing format next-stream times.dbm
[16:08] i think we have a wiener
[16:08] looks perfect bac :-)
[16:08] with that i lunch and bike
[16:09] cool!
talk to you in a bot
[16:09] bit
[16:22] gary_poster: is the kanban update bot running again? and do we have to do anything to get it to update a card?
[16:23] benji, I think it is running (80% sure), though it gets confused by bugs that do not have LP tasks (and maybe other situations)
[16:23] gary_poster: cool, thanks
[16:23] we don't need to do anything to get it to update except wait and have an LP task for the bug (and maybe it must be the only one?)
[16:23] welcome
[17:35] gary_poster (or bac): any thoughts on a next task? I think the main lanes are full so helping with one of those cards would seem best.
[17:35] benji, looking
[17:35] bac's is almost done; he just needs to wrap it up and send it off
[17:36] I'm doing frightening, embarrassing things. Consider my most recent check-in message for our zope.testing fork:
[17:36] bzr commit -m 'Consider this check-in suspect: I reviewed the test failures in the file and they seemed innocuous, rather than correct.'
[17:37] I would have you join in with me but my internet connection is supposed to die for a bit soon
[17:38] maybe we should do that anyway. You can make progress while I'm disconnected. You up for a call benji?
[17:38] gary_poster: I think so.
[17:39] k
[17:40] benji https://talkgadget.google.com/hangouts/_/extras/canonical.com/goldenhorde
[17:51] benji left in a huff
[17:52] or a cough
[17:52] or a sneeze
[17:53] gary_poster: my machine crashed
[17:53] benji, I figured as much. Come on by when things are normal again
[17:57] * bac wraps things up
[18:11] gary_poster, benji: could one of you gentlemens please review https://code.launchpad.net/~bac/lpbuildbot/remember-the-testr/+merge/100664 ?
[18:12] bac, I'll look in a sec, sure
[18:22] approved bac. thank you
[18:23] thanks
[18:35] * bac looks for a card
[18:36] gary_poster: did you have your francis call?
[18:36] bac, 4PM
[18:54] "lunch"
[18:56] benji: are you about to grab a spot in the coding lane?
[18:57] if so, anything i can help with?
[18:57] bac: nope, I'm helping Gary with his card
[18:57] okey doke. i'll throw a dart at my monitor then
[19:41] benji, I suspect you are almost done?
[19:42] gary_poster: totally done: lp:~benji/zope.testing/3.9.4-fork/
[19:42] I don't know if we're MP-ready or not.
[19:42] awesome benji. Are you making an MP? Shockingly, I'd be happy to review...oh ok. What's up?
[19:43] gary_poster: I'll do an MP. I just wasn't sure if there was more you wanted to do.
[19:43] no, cool. We fixed the bug, and did some other good stuff on the way.
[19:44] You'll be happy to know that my MIL's low-end FIOS Internet connection is close to double my maxed-out U-verse connection
[19:44] I need to get her an AppleTV or Roku or something
[19:45] Because she should be able to take advantage of this easily
[19:46] Maybe I'll just run CAT5 down the interstate from NJ to NC
[19:46] I'm sure that will work out perfectly
[19:46] gary_poster: hey, could i tap off it?
[19:47] heh, sure bac
[19:47] gary_poster: https://code.launchpad.net/~benji/zope.testing/3.9.4-fork/+merge/100679
[19:47] ack benji, on it
[19:47] fwiw, i used to have a roku but it looks ugly and ham fisted next to the appletv
[19:48] ok
[19:48] yeah I like my appletv
[19:48] benji, wrong merge target
[19:48] so conflicts and other bad stuff
[19:48] darn
[19:48] make merge target of lp:~launchpad/zope.testing/3.9.4-fork
[19:49] benji, ^^
[19:49] already done
[19:50] gary_poster: https://code.launchpad.net/~benji/zope.testing/3.9.4-fork/+merge/100681
[19:50] cool benji, much better thanks :-)
[19:51] benji, line 8 of diff is my fault. I regarded it as a hack and I didn't mean to check it in.
I suggest removal, but if you like it for some reason that's fine
[19:52] looking
[19:52] no that's evil, I'll remove it
[19:53] pushed
[19:53] thanks
[19:54] benji, funny that there was already a test of the odd behavior of list tests
[19:54] benji, approved
[19:56] gary_poster: cool, I'll merge it into ~launchpad/zope.testing/trunk, then
[19:56] er, lp:~launchpad/zope.testing/3.9.4-fork, rather
[19:56] benji, you mean https://code.launchpad.net/~launchpad/zope.testing/3.9.4-fork ?
[19:56] yeah ok
[19:56] :)
[19:56] :-)
[19:58] gary_poster: I just posted my March EC2 expenses
[19:58] how bad were they benji?
[19:59] it wasn't nearly as bad as yours: $162.12
[19:59] Still, a new high?
[20:02] gary_poster: by a (base 2) order of magnitude
[20:02] ok, branch merged and pushed
[20:02] I guess saying "twice as much" would be easier.
[20:03] gary_poster: is the /etc/apt/sources.list problem on ec2 going to be something we have to deal with long term? do we want a card to fix it?
[20:04] bac, I don't know. Maybe we should. :-/
[20:04] i'll add it. deleting later is cheap
[20:14] benji, btw, please make a lp branch for getting the new egg in the tree
[20:15] gary_poster: ah, will do
[20:15] thx
[20:15] benji + card ;-)
[20:15] k
[20:27] gary_poster: any suggestions on a task to pick?
[20:28] bac on call
[20:40] darn, my setuplxc-created dependencies directory has HTTP checkouts
[21:21] bac, the most helpful task would be to investigate the memcache ConnectionError failures that wgrant described on the list ("[Launchpad-dev] memcache errors in ec2 and buildbot -- newer python-memcached to blame?"). They are affecting the main buildbot and us as well, so it's a generic "green buildbot" thing.
[21:21] alternatively...
[21:22] "teach buildbot to understand subunit in test results to properly report failure numbers in waterfall" card is interesting
[21:24] alternatively "Fix /etc/apt/sources.list on ec2" from you.
It's not that the debian repo isn't there, it's that it has the wrong hash. Not sure how that would happen
[21:26] gary_poster: I give up. I have the new p6 egg in the dependencies repo but I can't for the life of me get a LP build to work to make sure it's used and does the right thing. I think my cold has infected my brain. I'll try again tomorrow if someone else hasn't done it.
[21:27] benji, ok. how weird! so you've pushed the change?
[21:27] to the download-cache I mean?
[21:27] the addition of the p6 egg
[21:31] gary_poster: yep
[21:32] cool benji. feel better. I will be out tomorrow it looks like.
[21:32] I'll send a note to yellow list
[21:32] ok; enjoy yourself
[21:32] with suggested tasks
[21:32] thanks