[11:38] <bac> benji: i think the problem with our zope.testing fork tests is that now, for subunit, we are writing directly to __stdout__, so the testrunner actually running the tests cannot capture the output.  it's all going directly to the screen and the tests fail
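A minimal sketch of the capture problem described above (hypothetical code, not the zope.testing fork itself): a test runner captures output by swapping sys.stdout for a buffer, so anything written to sys.__stdout__ bypasses the swap and lands on the terminal instead:

    import sys
    from io import StringIO

    capture = StringIO()
    real_stdout = sys.stdout
    sys.stdout = capture  # what a testrunner does to capture output
    try:
        print("via sys.stdout")                   # goes into the buffer
        sys.__stdout__.write("via __stdout__\n")  # bypasses the swap; straight to the screen
    finally:
        sys.stdout = real_stdout

    assert "via sys.stdout" in capture.getvalue()
    assert "via __stdout__" not in capture.getvalue()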
[11:48]  * gmb -> lunch
[11:59] <benji> bac: I'm over here now. :)
[12:06] <gary_poster> bac benji frankban gmb, as a reminder, we will have the daily & weekly call in 2 hours and 4 minutes
[12:07] <benji> gary_poster: thanks, I had forgotten about the time move
[12:07] <gary_poster> welcome
[12:09] <benji> frankban: I had some problems with lpsetup yesterday afternoon. Once I have my EC2 instance up and configured correctly will you have a few minutes to help me?
[12:09] <frankban> benji: sure
[12:09] <benji> thanks
[12:31] <benji> frankban: is it best to use lpsetup from a checkout or from the PPA?
[12:32] <gary_poster> I need to restart.  back soon hopefully
[12:32] <frankban> benji: same revision, so no real difference. in general, PPA is better
[12:55] <benji> frankban: the hangout is open when you're ready: https://plus.google.com/hangouts/_/e5a32b64b2b26fda880db39e37125e9f6733ae75
[12:56] <frankban> benji: joining
[13:04] <gary_poster> frankban, thank you for great lpsetup analysis.  I (optimistically?) made cards for them on the board.  This might help us in the discussion; also, I only made blocks where I thought they were absolutely necessary, which I think is a bit more flexible than your three steps, so we can talk about that on the call also.  Thank you!
[13:05] <frankban> gary_poster: cool! thank you
[13:52] <gary_poster> bac benji frankban gmb, call in 18 (early warning since it is an unusual time for us)
[13:52] <gmb> k
[13:52] <benji> Forewarned is forearmed
[13:53] <frankban> benji: could you please paste the output of `locale` in your ec2 instance? gary_poster: could you do the same on the slave if you are running parallel tests?
[13:53] <gary_poster> frankban, so in the host?
[13:53] <benji> frankban: sure, one sec
[13:53] <frankban> gary_poster: yes
[13:53] <frankban> gary_poster: ah, no, in the lxc
[13:54] <benji> frankban: http://paste.ubuntu.com/1042422/
[13:54] <frankban> (so, between two runs, no rush)
[13:54] <benji> frankban: note, that is from the host
[13:54] <gary_poster> frankban, oh ok.  this is the host, cause I already did it. http://pastebin.ubuntu.com/1042421/ .  I'll get it from a container as soon as I can
[13:57] <frankban> benji: sorry, I need it in the lxc: ssh `sudo lp-lxc-ip -i eth0 -n lptests` locale
[13:58] <benji> frankban: ok, I'll do it after lpsetup finishes; I don't want to spook it
[14:08] <benji> frankban: so, lpsetup has stopped, I don't see an error message but it also didn't have a parade about how everything went fine; will you look at termbeamer and see if it looks good to you?
[14:08] <frankban> benji: lpsetup finished, and it seems without error, but I believe that if you start the container, ssh into it and run make schema in devel we will see an error
[14:08] <gary_poster> bac benji frankban gmb https://plus.google.com/hangouts/_/9fc6f87349a45167b19d7b96789e769a23e20c1c?authuser=1&hl=en-US in 2
[14:08] <gary_poster> urg
[14:08] <gary_poster> https://plus.google.com/hangouts/_/9fc6f87349a45167b19d7b96789e769a23e20c1c?authuser=1&hl=en-US
[14:09] <gary_poster> and...how odd...my camera is lit as if it is being used, but I am not visible...
[14:09]  * benji tries
[14:10] <frankban> benji: because I've seen that launchpad-database-dependencies has found postgres 8.4... that's the weird behavior I encountered locally, and it seems to be related to LC_ALL settings.
[14:10] <benji> :(
[14:12] <benji> frankban: here is the locale output from inside the container http://paste.ubuntu.com/1042454/
[14:19] <bac> the "madness" quip has been superseded by "have you tried turning it off and on again"
[16:08] <frankban> benji: so make schema fails: could you please try to run:
[16:08] <frankban> $ LC_ALL=C sudo pg_createcluster 9.1 main --start --encoding UNICODE
[16:09] <frankban> then: utilities/launchpad-database-setup ubuntu
[16:09] <frankban> and finally: `make schema` again
[16:15] <benji> frankban: I'll try after lunch.  I'll email you the results if you're not around.
[16:17] <frankban> cool, thanks benji
[17:12] <benji> frankban: (if you're still around) that appears to have worked
[17:15] <frankban> thanks benji, on Monday I will try to fix lpsetup to use LC_ALL . have a good weekend!
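A sketch of the kind of fix frankban describes, assuming lpsetup shells out to pg_createcluster (illustrative code, not lpsetup's actual implementation): force LC_ALL=C in the child environment, mirroring the manual command that worked above:

    import os
    import subprocess

    def create_postgres_cluster(version='9.1', name='main'):
        # Force the C locale so pg_createcluster honours the requested
        # encoding instead of inheriting a stray locale from the host.
        env = os.environ.copy()
        env['LC_ALL'] = 'C'
        subprocess.check_call(
            ['sudo', 'pg_createcluster', version, name,
             '--start', '--encoding', 'UNICODE'],
            env=env)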
[17:15] <gary_poster> bye
[17:16] <gary_poster> hey benji, I'm watching a slave while tests are running
[17:16] <gary_poster> it is very interesting
[17:16] <benji> frankban: cool; enjoy your weekend
[17:16] <benji> how so?
[17:16] <gary_poster> so, first CPU is 77% idle (this is on a 24-container instance on a 32 thread/16 core machine)
[17:16] <gary_poster> there is 0 wait time
[17:16] <gary_poster> for io
[17:17] <gary_poster> according to vmstat
[17:17] <gary_poster> memory has 33638M free all the time
[17:17] <gary_poster> I mean, around there
[17:17] <gary_poster> entropy varies widely but never seems to get beneath the 1000s
[17:18] <benji> gary_poster: what is the % utilization for io?  (iostat -x 10) would be a good way to see it; ignore the first output, which is supposed to be "recent history" but I don't have much faith in it
[17:19] <gary_poster> there are 8.49 writes/sec and a wrqm/s of 29.64 which is the only thing that looks suspicious so far
[17:19] <gary_poster> trying that
[17:20] <gary_poster> I was doing a watch rather than a -x 10...
[17:21] <gary_poster> but benji it never goes over 3.84% and gets as low as 0.4%
[17:21] <benji> wow; that's quite good
[17:21] <gary_poster> now cpu is up to 33% idle
[17:22] <gary_poster> well it ought to be!  remember, as far as we know, we are writing and reading to memory
[17:22] <benji> yep
[17:22] <gary_poster> I'm not entirely sure what we are writing tbh, unless it is just the testr recordings
[17:22] <benji> is there ever a non-trivial %steal?
[17:22] <gary_poster> it's always 0
[17:22] <gary_poster> 0.00
[17:23] <gary_poster> Now back up to 50.82% idle
[17:24] <gary_poster> I don't see what the hang up is, unless it's something like reading and writing memory or something crazy like that
[17:24] <benji> oh, wait... that will be 0 on the host; thinko on my part
[17:24] <benji> so did one of the containers take a long time to start in this scenario?
[17:26] <gary_poster> benji, no more than 3 minutes, but will look one sec
[17:28] <benji> it would be interesting to run an "iostat -x 10" while they start to see if there is much resource contention, then run a pidstat on any stragglers to see why they are being slow
[17:28] <gary_poster> benji, first one was ready @ 16:57:42, last one reported for duty @ approx 17:00:05
[17:29] <gary_poster> That can only explain up to 3 minutes though
[17:29] <gary_poster> of 10-ish
[17:29] <benji> right, but the non-loadbalancing could explain the rest
[17:30] <benji> I would be interested in seeing if, say, a 14 container run on the 16 core machine did or did not have any stragglers
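A toy illustration of the non-loadbalancing point (hypothetical code, not the actual test scheduler): round-robin assignment gives every worker the same number of tests, not the same amount of work, so a worker that draws the slow tests becomes a straggler:

    from itertools import cycle

    def round_robin(tests, n_workers):
        # Deal tests out to workers in strict rotation, ignoring cost.
        buckets = [[] for _ in range(n_workers)]
        workers = cycle(range(n_workers))
        for test in tests:
            buckets[next(workers)].append(test)
        return buckets

    # Every bucket holds the same number of tests, but if a few tests are
    # much slower than the rest, whichever worker drew them finishes long
    # after the others, and the idle workers cannot steal its leftover work.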
[17:31] <gary_poster> The last worker to run was worker-7, which started work at 17:00:55 (so I was wrong about the last one reporting)...
[17:32] <gary_poster> I mean, that was the last one to stop; and it may have been the last one to start
[17:36] <gary_poster> worker-10 was penultimate to finish, and started @ 17:00:56; worker-17 started @ 17:00:53 and was antepenultimate to finish...
[17:37] <gary_poster> ok, doing this systematically.
[17:37] <gary_poster> worker-0: 17:00:57 - 17:25:28
[17:38] <gary_poster> worker-1: 16:57:48 - 17:25:39
[17:39] <gary_poster> worker-2: 17:00:56 - 17:24:35
[17:39] <gary_poster> worker-3: 16:57:51 - 17:21:32
[17:40] <gary_poster> worker-4: 17:00:56 - 17:26:30
[17:42] <gary_poster> worker-5: 17:00:57 - 17:25:45
[17:43] <gary_poster> (Note that this was a particularly fast run, at 32 mins, 45 secs; and this was a round-robin-assigned version)
[17:43] <gary_poster> wait, something is wrong...
[17:44] <gary_poster> that one lost a worker
[17:45] <gary_poster> So 24 is too much
[17:46] <gary_poster> During building, hard drive is getting up to 57.44 %util
[17:47] <gary_poster> ok now starting tests...
[17:47] <gary_poster> well, listing tests...
[17:47] <gary_poster> low %util
[17:48] <gary_poster> 23% idle cpu
[17:48] <gary_poster> plenty of free memory
[17:48] <gary_poster> entropy still never below 1000
[17:49] <gary_poster> benji ^^ what else could it be?
[17:49]  * benji reads the backlog.
[17:50] <gary_poster> benji you only need to go back 10 or so
[17:50] <gary_poster> to "now starting tests..."
[17:51] <benji> gary_poster: what is the load?
[17:53] <gary_poster> benji, 18.99, 11.93, 7.74
[17:53] <gary_poster> not sure if we should regard 16 or 32 as the expected top load
[17:53] <benji> how many containers did you run?  24 again?
[17:53] <gary_poster> benji, yes
[17:54] <gary_poster> and 23.2% idle
[17:54] <gary_poster> at highest
[17:54] <gary_poster> usually a lot more
[17:54] <gary_poster> well, often a lot more
[17:55] <benji> interesting, so over the last five minutes there have been on average 19 (rounding up) processes that were runnable; it seems significant to me that the number is so much less than 24
[17:55] <gary_poster> well, there was a 14.35 % idle, but still
[17:56] <gary_poster> benji, fwiw, that was relatively near the beginning of a test
[17:56] <gary_poster> right now our 1 minute time is 21.something
[17:57] <benji> good, that's much closer to what I would expect
[17:57] <gary_poster> 22.53 now even
[17:57] <gary_poster> 23.53! 23.81! whee!
[17:57] <benji> :)
[17:58] <benji> given that each test is non-parallel (even if something is running in another process, like a DB query, the other process is waiting on the result), I would expect that perfect utilization would mean that load == # containers
[17:58] <gary_poster> %idle still in the 30s
[17:58] <gary_poster> I guess that makes sense-ish
[17:58] <gary_poster> given 24 test runs on 32 "cores"
[17:59] <gary_poster> so that is 75% usage
[17:59] <gary_poster> but where's the slow-down?
[17:59] <benji> "the slow-down" as in the variation in start up times?
[18:00] <gary_poster> yeah
[18:00] <benji> gary_poster: want to hang out?
[18:00] <gary_poster> we are well past that now of course, i.e., this test run
[18:00] <gary_poster> sure
[18:01] <gary_poster> benji https://plus.google.com/hangouts/_/4ff44d4a05bb2e19cabdfe6963ad1235f6d40fc6?authuser=1&hl=en-US
[18:01] <gary_poster> I am still blue mushroom head
[18:02] <benji> gary_poster: my browser seems to be mid-crash, one second
[18:02] <gary_poster> ok
[18:03] <benji> hmm, maybe it is my OS
[18:03] <gary_poster> uh oh
[18:03] <benji> rebooting
[20:48] <gary_poster> benji http://pastebin.ubuntu.com/1042973/
[21:00] <bac> benji, those ebs instructions look great.  i'm confused, though, as to what makes the ebs volume.  is there something magic about /dev/xvdf?
[21:01] <benji> bac: I made it by hand.  If we end up productizing it we will use the AWS API to make them and the snapshot and associate the volumes with the instances, etc.
[21:01] <bac> benji, so that part is not shown in your instructions?
[21:02] <benji> bac: I think I mentioned it but didn't give step-by-step instructions
[21:02] <bac> yeah, you said create them and make sure they are in the same zone
[21:03] <bac> ok, i was just going to be really confused if there wasn't more to it
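For reference, a sketch of the automation benji mentions, using the boto EC2 API (assumed code: the region, zone, size, and instance id are placeholders; only the /dev/xvdf device name comes from the discussion above):

    import time
    import boto.ec2

    conn = boto.ec2.connect_to_region('us-east-1')

    # The volume must live in the same availability zone as the instance
    # it will attach to, as the instructions note.
    volume = conn.create_volume(20, 'us-east-1a')  # size in GiB, zone
    while volume.status != 'available':
        time.sleep(5)
        volume.update()

    conn.attach_volume(volume.id, 'i-0123abcd', '/dev/xvdf')

    # A snapshot of the prepared volume can then seed future copies:
    snapshot = conn.create_snapshot(volume.id, 'prepared test volume')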