[08:27] morning gmb, please call me if you need a code review, a film review, and in general a review of anything reviewable
[08:27] frankban, Duly noted :). I should have something for you before the morning is out, actually.
[08:27] cool
[08:33] ah gmb, if you have time: https://code.launchpad.net/~frankban/lpsetup/branch-subcommand/+merge/105857
[08:34] frankban, Sure, I'll look shortly.
[08:34] thank you gmb
[09:17] frankban, https://code.launchpad.net/~gmb/launchpad/bug-999554/+merge/105944 needs a review when you've got a second.
[09:18] sure gmb
[09:47] frankban, Your lpsetup branch looks good.
[09:54] gmb: cool, your branch looks good too, i have just a comment, probably the LoC fever will hit us in the future
[10:03] frankban, Oh, crumbs. Good catch :)
[10:04] Fixed & pushed.
[10:21] thanks gmb
[11:53] morning frankban -- how goes the reviewing?
[12:06] hi bac: it's going well
[12:06] i hope it is as glamorous as you anticipated! :)
[12:09] bac benji frankban gmb call in 2 (11 after the hour)
[12:10] Yarp.
[12:13] benji gmb yoo hoo
[12:13] gary_poster, I keep getting redirected to plus.google.com when I try to join. Can you invite Canonical-me please?
[12:14] gmb, I just sent to graham.binns@canonical.com. It doesn't auto-complete :-/
[12:26] https://docs.google.com/a/canonical.com/document/d/1NCbUDCxjnsXbz9AeW0HAmmMdOM7g3eEhhCxyE3_3fRM/edit
[12:31] * gmb -> lunch
[12:39] * benji reboots.
[12:46] gary_poster: The only thing I can think to add to the two-week summary would be to harden (or plan to discuss hardening) what exactly we mean by 95% success rate. The two-week metric is pretty good but probably too big. You also mention using the last three days, which seems good.
[12:47] A particular number of runs might be a good idea too.
[12:47] benji, good point, thank you. I'll incorporate that
[12:54] gary_poster: in "goals for next meeting" the switch to lpsetup is still present
[12:55] frankban, right, I think I mentioned on the call that "Progress on tracked items" and "Goals for next meeting" are not yet updated. I'll be sure to update those. Thank you.
[12:56] oh... right
[13:02] now I can't reproduce the sound problem. I'm going to consider it fixed until it isn't.
[13:02] :-)
[13:03] gmb, I'm approving things in canonicaladmin. already approved bank/swap days. looking at expenses. Why don't you need per diems for April 30 & May 1--ah right, because they were vacation days because of the flight scheduling snafu, right?
[13:04] gary_poster: re. stub and 992184; I'm inclined to set up a buildbot setup in EC2 and verify that I can reproduce the problem there and then give him instructions on how to do the same. How does that sound?
[13:06] benji, sounds good, if you think that can be super cheap to do. Other alternatives include letting him look at the data center, which had one instance in the past 8, but does not give him sudo access to much of anything; or to just tell him how to set up our buildbot juju thing, which would be cheaper in developer time probably if more expensive in EC2
[13:07] gmb, similar question about May 6, 7, 11
[13:07] no per diem
[13:07] gary_poster: I was going to do the latter after verifying that I could reproduce the problem there, but I guess since we've seen the problem there, then we can assume it is reproducible
[13:08] gmb, that's Sunday, Monday, Friday?
[13:09] benji, "I guess since we've seen the problem there, then we can assume it is reproducible" there = data center?
[13:09] gary_poster: no, EC2 using our Buildbot charms
[13:10] oh cool. except that I'm confused by your sentence then. Rereading...
[13:10] oic
[13:10] I don't think I communicated my plan very well. Let me try again. I was going to do a buildbot deployment using juju, then log in there and reproduce the problem. Once I can do that I would tell stub how to do the same.
[13:10] yes, I think we can assume it is reproducible
[13:11] +1 benji. The only trick may be that we are relying on our own yellow versions of the charms
[13:11] in that case I'll skip to telling stub how to build an environment with juju
[13:11] benji, I've been using runparallel every day for a while, from Brad. I will send
[13:12] cool, anything that makes it easier for him would be good
[13:13] benji, I sent it to you with cryptic notes. :-) lemme know if not clear
[13:13] heh, ok
[13:13] gary_poster, 6, 7, 11 had food provided.
[13:13] (Or were paid on a company card)
[13:13] gmb, wow swanky :-) ok cool
[13:14] gmb approved. thanks for diligence.
[13:15] gary_poster, No worries. I kept track as I went; given that the per diem was very generous (compared to London's, say) I didn't want to over-claim.
[13:15] fun fact: I'm trying out our parallel tests on a cool
[13:15] cool
[13:16] that was an aborted message at the beginning there :-P
[13:16] heh
[13:17] fun fact: I'm preparing to try out our parallel tests on a cc2.8xlarge (very fast 16 core). Will report back. :-)
[13:34] gary_poster: as I write this out, it seems like a lot to ask; maybe we should just build a slave and hand it off to him
[14:01] maybe fun to follow along: http://ec2-23-22-101-96.compute-1.amazonaws.com:8010/waterfall
[14:28] ec2 with 32 cores: that was anticlimactic. trying again with the timeout for ssh/dhcp increased to 60...
[14:45] gary_poster: amazing! if you want to try the lxc-start-ephemeral/lp-lxc-ip combo, you can find it here: http://pastebin.ubuntu.com/990779/
[14:46] awesome thanks frankban, will do
[14:47] I'm going to increase the timeout to 120 locally first; only 10 of the 32 started up properly this time within 60 seconds. concerning
[14:47] then will try that other version
[15:00] heh, if that had actually worked, the timing would be pretty freaking awesome: 23 minutes, 12 seconds.
[15:14] First run only had 10 successful slaves
[15:14] second had 7!
[15:14] first had timeout of 60 seconds
[15:14] second 120
[15:15] Now trying with new version of lxc-start-ephemeral
[15:22] frankban, new version of lxc-start-ephemeral got us up to 16 of 32 instances. The other 16 could not get IP addresses. I suspect there's some kind of provisioning limit we're hitting...
[15:23] benji, do you know of a reason why 16 of 32 lxc container instances would not get IPs within 60 seconds?
[15:23] or something to check?
[15:23] other than serge? ;-)
[15:23] you could try again using the new version and TRIES=120
[15:25] frankban, yeah, I'll try that next--or maybe even 240, 'cause what the heck--though upping the old version seemed to show that you could give it 120 seconds and it would still not come up.
[15:27] wow it's fast. If we could get this working that would be incredible
[15:30] hi frankban, i just tried using 'lp-setup branch' and it worked fine.
[15:31] cool bac!
[15:31] frankban: but i did notice you've chosen different defaults from rocketfuel-setup for branches and dependencies
[15:31] even though they can be overridden, i think that may be an irritant for people with existing setups
[15:38] gary_poster: re: the subvertpy clean up card...is it as simple as this http://paste.ubuntu.com/990857/ ?
[15:41] gary_poster: (your last message got eaten by the beep monster) no; my first guess would be to see how many DHCP addresses are available, 16 sounds like a nice round number
[15:42] bac, I think so
[15:42] ok, great. not looking for complication but didn't want to miss something
[15:43] benji, how would I determine this?
[15:43] yes bac, I thought about this, I am undecided: I agree with you but maybe, from the perspective of someone who wants to just try "lp-setup lxc-install", having different paths could be safer
[15:44] gary_poster: no idea; first I'd figure out what allocates the DHCP addresses and then look at its configuration options (I now realize that that might have been more pedantic than you were looking for) ;)
[15:44] heh
[15:45] ack benji thanks. will check in with serge with that as an opening volley when I get back from lunch
[15:46] gary_poster: can i use r=gary for both of those? i don't plan to do a MP
[15:46] gary_poster: ps aux | grep lxc-dnsmasq
[15:46] bac: +1
[15:46] thx
[15:46] frankban, ok
[15:47] frankban, benji, "--dhcp-range 10.0.3.2,10.0.3.254" doesn't look like that should be a problem :-)
[15:47] gary_poster: if I read that well, it doesn't seem to be a leases problem
[15:47] ah...
[15:48] right, --dhcp-lease-max=253 too
[15:48] also --dhcp-lease-max=253
[15:48] heh
[15:48] :-)
[15:48] frankban, running to lunch. build 4 of http://ec2-23-22-101-96.compute-1.amazonaws.com:8010/waterfall has the new lxc-start-ephemeral with 240 tries
[15:49] biab
[15:53] * benji reboots to try to fix sound problem.
[16:11] I found out one thing. As long as I keep the sound settings app open, beeps remain audible.
[16:16] gary_poster: 32 workers correctly started and a subunit assertion failure spoiled our joy...
[16:31] it finally happened: bug 1000000
[16:31] <_mup_> Bug #1000000: For every bug on Launchpad, 67 iPads are sold. < https://launchpad.net/bugs/1000000 >
[16:39] * benji reboots
[17:13] :-/
[17:18] Wow, it took 2.5 minutes to get all instances up
[17:20] benji: i wonder how long stephane waited to snag that one? he is the new mpt
[17:20] lame bug if you ask me :-P
[17:21] way to fix bug: release software with more bugs!
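The eyeball check above ([15:46]-[15:48]) - reading dnsmasq's `--dhcp-range` off the `ps` output to rule out lease exhaustion - can be mechanized. A minimal sketch, assuming Python 3 and its stdlib `ipaddress` module (the helper name is ours, not from any of the scripts discussed):

```python
import ipaddress

def dhcp_range_size(range_arg):
    """Number of addresses covered by a dnsmasq --dhcp-range value.

    `range_arg` looks like "10.0.3.2,10.0.3.254"; dnsmasq allows a
    trailing lease time (e.g. ",12h"), which is ignored here.
    """
    parts = range_arg.split(",")
    start = ipaddress.IPv4Address(parts[0])
    end = ipaddress.IPv4Address(parts[1])
    return int(end) - int(start) + 1
```

For the range in the log this gives 253, matching the `--dhcp-lease-max=253` also seen on the command line, which is why 16 stuck containers could not be a lease-pool problem.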
[17:21] they used the 1M number reasonably well, but it could have been bigger/funnier
[17:22] manually killing 32 lxc instances is annoying
[17:23] yay, western digital replaced my dead 9-month-old drive with a new one of a different design. dead 'elements' morphed into a WD Studio. hoping for better longevity.
[17:23] cool
[17:28] frankban, would be nice to have TRIES be configurable with command-line args
[17:29] trying again
[17:30] the subunit assertion is another variation of bug 996729
[17:30] <_mup_> Bug #996729: zope.testing --subunit allows bad output on stdout, which can break subunit processing < https://launchpad.net/bugs/996729 >
[19:18] gary_poster: heads up, i'd like to start early so i can leave early tomorrow afternoon. we're going to the beach tomorrow night. will be working from there on friday.
[19:19] bac, fun. Sounds good.
[19:20] some academic friends of ours rent a place every year and invite us down for the weekend to fix their phones, computers, and now tablets. that's not how they would characterize it but it is how it happens. :)
[19:20] lol
[19:21] well, I hope the payoff is worth it :-)
[19:21] nice beachfront house, good food and beverages
[19:21] sounds nice
[19:22] i made no headway with bug 992814 so now i'm working on the .testrepository cleanup
[19:22] <_mup_> Bug #992814: lib/lp/services/webservice/doc/launchpadlib.txt fails intermittently/rarely in parallel tests < https://launchpad.net/bugs/992814 >
[19:22] ok
[19:22] I hope the ones we have left are not too unpleasant :-(
[19:37] gary_poster: I need to go AFK for a while, have a bad headache.
[19:38] benji :-( ok feel better
[19:38] thanks
[20:39] gary_poster: i've rewritten lpbuildbot/scripts/init_testr.sh as a python script so i could more easily find and delete the old testr files. so far so good.
[20:39] since it is run as a 'step', can i get access to the logging mechanism?
[20:40] bac, the buildbot logging stuff? If so, no, I don't think so...
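The "manually killing 32 lxc instances is annoying" complaint ([17:22]) is easy to script around with the standard `lxc-*` command-line tools. A minimal sketch; the `"lptests-"` name prefix is purely a guess at how the test containers might be named, not what the yellow charms actually use:

```python
import subprocess

def unique_names(lxc_ls_output):
    """Parse `lxc-ls` output into a de-duplicated list of container
    names (lxc-ls can list a container more than once, e.g. as both
    defined and running)."""
    seen = []
    for name in lxc_ls_output.split():
        if name not in seen:
            seen.append(name)
    return seen

def stop_matching(prefix):
    """Stop every container whose name starts with `prefix`
    (for ephemeral containers, stopping also cleans them up)."""
    output = subprocess.check_output(["lxc-ls"]).decode()
    for name in unique_names(output):
        if name.startswith(prefix):
            subprocess.call(["sudo", "lxc-stop", "-n", name])

# Example (actually stops containers, so shown as a comment):
#     stop_matching("lptests-")
```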
[20:41] gary_poster: yeah, for putting error messages into the waterfall should something go wrong
[20:41] oh
[20:41] i thought that might be asking too much
[20:41] well
[20:41] if you dump things to stdout it will be available in the stdout link
[20:41] from the waterfall
[20:41] isn't that good for what you need?
[20:43] yeah, that'll work
[20:44] * bac was overcomplicating things
[20:49] cool, glad that'll work.
[20:49] * gary_poster is ready for a break.
[20:49] * gary_poster is going to go get some water
[21:09] * gary_poster is going
[21:10] bye
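The stdout-only logging gary suggests above is all a buildbot shell step needs: whatever the script prints shows up under the step's stdio link on the waterfall. A minimal sketch of the kind of cleanup script bac describes ([20:39]); the staleness threshold and function name are illustrative, not the real lpbuildbot code:

```python
import os
import shutil
import time

MAX_AGE_DAYS = 7  # illustrative threshold, not lpbuildbot's actual value

def clean_stale(root, max_age_days=MAX_AGE_DAYS, now=None):
    """Delete .testrepository directories under `root` whose mtime is
    older than `max_age_days`. Returns the paths removed.

    Errors are simply printed: run as a buildbot step, stdout is
    captured and shown in the waterfall's stdio log.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if ".testrepository" in dirnames:
            dirnames.remove(".testrepository")  # don't descend into it
            target = os.path.join(dirpath, ".testrepository")
            if os.path.getmtime(target) < cutoff:
                try:
                    shutil.rmtree(target)
                    removed.append(target)
                except OSError as err:
                    print("could not remove %s: %s" % (target, err))
    return removed

# Invoked from a buildbot ShellCommand step, e.g.:
#     clean_stale("/path/to/build/trees")
```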