bac | benji: i think the problem with our zope.testing fork tests is that now, for subunit, we are writing directly to __stdout__, so the testrunner actually running the tests cannot capture the output. it's all going directly to the screen and the tests fail | 11:38 |
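The capture problem bac describes can be sketched in a few lines of Python (hypothetical buffer names, not the actual zope.testing code): a test runner captures output by swapping in a fake `sys.stdout`, but `sys.__stdout__` is the real file object saved at interpreter startup, so writes to it bypass the capture entirely.

```python
import io
import sys

# A test runner typically captures output by replacing sys.stdout.
capture = io.StringIO()
original = sys.stdout
sys.stdout = capture
try:
    print("captured by the runner")
    # sys.__stdout__ is the original stream saved at startup, so this
    # write skips the capture buffer and goes straight to the terminal:
    print("goes straight to the screen", file=sys.__stdout__)
finally:
    sys.stdout = original

assert "captured by the runner" in capture.getvalue()
assert "screen" not in capture.getvalue()
```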
---|---|---|
* gmb -> lunch | 11:48 | |
benji | bac: I'm over here now. :) | 11:59 |
gary_poster | bac benji frankban gmb, as a reminder, we will have the daily & weekly call in 2 hours and 4 minutes | 12:06 |
benji | gary_poster: thanks, I had forgotten about the time move | 12:07 |
gary_poster | welcome | 12:07 |
benji | frankban: I had some problems with lpsetup yesterday afternoon. Once I have my EC2 instance up and configured correctly will you have a few minutes to help me? | 12:09 |
frankban | benji: sure | 12:09 |
benji | thanks | 12:09 |
benji | frankban: is it best to use lpsetup from a checkout or from the PPA? | 12:31 |
gary_poster | I need to restart. back soon hopefully | 12:32 |
frankban | benji: same revision, so no real difference. in general, PPA is better | 12:32 |
benji | frankban: the hangout is open when you're ready: https://plus.google.com/hangouts/_/e5a32b64b2b26fda880db39e37125e9f6733ae75 | 12:55 |
frankban | benji: joining | 12:56 |
gary_poster | frankban, thank you for great lpsetup analysis. I (optimistically?) made cards for them on the board. This might help us in the discussion; also, I only made blocks where I thought they were absolutely necessary, which I think is a bit more flexible than your three steps, so we can talk about that on the call also. Thank you! | 13:04 |
frankban | gary_poster: cool! thank you | 13:05 |
gary_poster | bac benji frankban gmb, call in 18 (early warning since it is an unusual time for us) | 13:52 |
gmb | k | 13:52 |
benji | Forewarned is forearmed | 13:52 |
frankban | benji: could you please paste the output of `locale` in your ec2 instance? gary_poster: could you do the same on the slave if you are running parallel tests? | 13:53 |
gary_poster | frankban, so in the host? | 13:53 |
benji | frankban: sure, one sec | 13:53 |
frankban | gary_poster: yes | 13:53 |
frankban | gary_poster: ah, no, in the lxc | 13:53 |
benji | frankban: http://paste.ubuntu.com/1042422/ | 13:54 |
frankban | (so, between two runs, no rush) | 13:54 |
benji | frankban: note, that is from the host | 13:54 |
gary_poster | frankban, oh ok. this is the host, cause I already did it. http://pastebin.ubuntu.com/1042421/ . I'll get it from a container as soon as I can | 13:54 |
frankban | benji: sorry, I need it in the lxc: ssh `sudo lp-lxc-ip -i eth0 -n lptests` locale | 13:57 |
benji | frankban: ok, I'll do it after lpsetup finishes; I don't want to spook it | 13:58 |
benji | frankban: so, lpsetup has stopped, I don't see an error message but it also didn't have a parade about how everything went fine; will you look at termbeamer and see if it looks good to you? | 14:08 |
frankban | benji: lpsetup finished, and it seems without error, but I believe that if you start the container, ssh into it and run make schema in devel we will see an error | 14:08 |
gary_poster | bac benji frankban gmb https://plus.google.com/hangouts/_/9fc6f87349a45167b19d7b96789e769a23e20c1c?authuser=1&hl=en-US in 2 | 14:08 |
gary_poster | urg | 14:08 |
gary_poster | https://plus.google.com/hangouts/_/9fc6f87349a45167b19d7b96789e769a23e20c1c?authuser=1&hl=en-US | 14:08 |
gary_poster | and...how odd...my camera is lit as if it is being used, but I am not visible... | 14:09 |
* benji tries | 14:09 | |
frankban | benji: because I've seen that launchpad-database-dependencies has found postgres 8.4... that's the weird behavior I encountered locally, and it seems to be related to LC_ALL settings. | 14:10 |
benji | :( | 14:10 |
benji | frankban: here is the locale output from inside the container http://paste.ubuntu.com/1042454/ | 14:12 |
bac | the "madness" quip has been superseded by "have you tried turning it off and on again" | 14:19 |
frankban | benji: so make schema fails: could you please try to run: | 16:08 |
frankban | $ LC_ALL=C sudo pg_createcluster 9.1 main --start --encoding UNICODE | 16:08 |
frankban | then: utilities/launchpad-database-setup ubuntu | 16:09 |
frankban | and finally: `make schema` again | 16:09 |
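Frankban's recovery steps above, collected into one sequence for reference (a sketch assuming the setup described in the log; the cluster version and script paths are taken from the conversation):

```shell
# Recreate the PostgreSQL cluster under a known-good locale; the
# LC_ALL=C override sidesteps the locale-dependent misbehavior that
# picked up postgres 8.4 instead of 9.1.
LC_ALL=C sudo pg_createcluster 9.1 main --start --encoding UNICODE

# Re-run the Launchpad database setup, then rebuild the schema.
utilities/launchpad-database-setup ubuntu
make schema
```

Benji confirms later in the log that this sequence worked.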
benji | frankban: I'll try after lunch. I'll email you the results if you're not around. | 16:15 |
frankban | cool, thanks beji | 16:17 |
frankban | hem, thanks benji | 16:17 |
benji | frankban: (if you're still around) that appears to have worked | 17:12 |
frankban | thanks benji, on Monday I will try to fix lpsetup to use LC_ALL . have a good weekend! | 17:15 |
gary_poster | bye | 17:15 |
gary_poster | hey benji, I'm watching a slave while tests are running | 17:16 |
gary_poster | it is very interesting | 17:16 |
benji | frankban: cool; enjoy your weekend | 17:16 |
benji | how so? | 17:16 |
gary_poster | so, first CPU is 77% idle (this is on a 24-container instance on a 32 thread/16 core machine) | 17:16 |
gary_poster | there is 0 wait time | 17:16 |
gary_poster | for io | 17:16 |
gary_poster | according to vmstat | 17:17 |
gary_poster | memory has 33638M free all the time | 17:17 |
gary_poster | I mean, around there | 17:17 |
gary_poster | entropy varies widely but never seems to get beneath the 1000s | 17:17 |
benji | gary_poster: what is the % utilization for io? (iostat -x 10) would be a good way to see it; ignoring the first output which is supposed to be "recent history" but I don't have much faith in | 17:18 |
gary_poster | there are 8.49 writes/sec and a wrqm/s of 29.64 which is the only thing that looks suspicious so far | 17:19 |
gary_poster | trying that | 17:19 |
gary_poster | I was doing a watch rather than a -x 10... | 17:20 |
gary_poster | but benji it never goes over 3.84% and gets as low as 0.4% | 17:21 |
benji | wow; that's quite good | 17:21 |
gary_poster | now cpu is up to 33% idle | 17:21 |
gary_poster | well it ought to be! remember, as far as we know, we are writing and reading to memoru | 17:22 |
gary_poster | memory | 17:22 |
benji | yep | 17:22 |
gary_poster | I'm not entirely sure what we are writing tbh, unless it is just the testr recordings | 17:22 |
benji | is there ever a non-trivial %steal? | 17:22 |
gary_poster | it's always 0 | 17:22 |
gary_poster | 0.00 | 17:22 |
gary_poster | Now back up to 50.82% idle | 17:23 |
gary_poster | I don't see what the hang up is, unless it's something like reading and writing memory or something crazy like that | 17:24 |
benji | oh, wait... that will be 0 on the host; thinko on my part | 17:24 |
benji | so did one of the containers take a long time to start in this scenario? | 17:24 |
gary_poster | benji, no more than 3 minutes, but will look one sec | 17:26 |
benji | it would be interesting to run an "iostat -x 10" while they start to see if there is much resource contention, then run a pidstat on any stragglers to see why they are being slow | 17:28 |
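The monitoring commands discussed in this stretch of the conversation, for reference (the PID below is a placeholder for a straggling worker, not a value from the log):

```shell
# Extended device stats every 10s; the first report averages since
# boot, so ignore it and watch the %util column in later intervals.
iostat -x 10

# Run-queue length (r), free memory, and I/O wait, sampled every 5s.
vmstat 5

# CPU (-u) and disk (-d) activity for one process; 12345 is a
# placeholder PID for a slow-starting container's worker.
pidstat -u -d -p 12345 5
```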
gary_poster | benji, first one was ready @ 16:57:42, last one reported for duty @ approx 17:00:05 | 17:28 |
gary_poster | That can only explain up to 3 minutes though | 17:29 |
gary_poster | of 10-ish | 17:29 |
benji | right, but the non-loadbalancing could explain the rest | 17:29 |
benji | I would be interested in seeing if, say, a 14 container run on the 16 core machine did or did not have any stragglers | 17:30 |
gary_poster | The last worker to run was worker-7, which started work at 17:00:55 (so I was wrong about the last one reporting)... | 17:31 |
gary_poster | I mean, that was the last one to stop; and it may have been the last one to start | 17:32 |
gary_poster | worker-10 was penultimate to finish, and started @ 17:00:56; worker-17 started @ 17:00:53 and was antepenultimate to finish... | 17:36 |
gary_poster | ok, doing this systematically. | 17:37 |
gary_poster | worker-0: 17:00:57 - 17:25:28 | 17:37 |
gary_poster | worker-1: 16:57:48 - 17:25:39 | 17:38 |
gary_poster | worker-2: 17:00:56 - 17:24:35 | 17:39 |
gary_poster | worker-3: 16:57:51 - 17:21:32 | 17:39 |
gary_poster | worker-4: 17:00:56 - 17:26:30 | 17:40 |
gary_poster | worker-5: 17:00:57 - 17:25:45 | 17:42 |
gary_poster | (Note that this was a particularly fast run, at 32 mins, 45 secs; and this was a round-robin-assigned version | 17:43 |
gary_poster | ) | 17:43 |
gary_poster | wait, something is wrong... | 17:43 |
gary_poster | that one lost a worker | 17:44 |
gary_poster | So 24 is too much | 17:45 |
gary_poster | During building, hard drive is getting up to 57.44 %util | 17:46 |
gary_poster | ok now starting tests... | 17:47 |
gary_poster | well, listing tests... | 17:47 |
gary_poster | low %util | 17:47 |
gary_poster | 23% idle cpu | 17:48 |
gary_poster | plenty of free memory | 17:48 |
gary_poster | entropy still never below 1000 | 17:48 |
gary_poster | benji ^^ what else could it be? | 17:49 |
* benji reads the backlog. | 17:49 | |
gary_poster | benji you only need to go back 10 or so | 17:50 |
gary_poster | now starting tests | 17:50 |
gary_poster | "now starting tests..." | 17:50 |
benji | gary_poster: what is the load? | 17:51 |
gary_poster | benji, 18.99, 11.93, 7.74 | 17:53 |
gary_poster | not sure if we should regard 16 or 32 as the expected top load | 17:53 |
benji | how many containers did you run? 24 again? | 17:53 |
gary_poster | benji, yes | 17:53 |
gary_poster | and 23.2% idle | 17:54 |
gary_poster | at highest | 17:54 |
gary_poster | usually a lot more | 17:54 |
gary_poster | well, often a lot more | 17:54 |
benji | interesting, so over the last five minutes there have been on average 19 (rounding up) processes that were runnable; it seems significant to me that the number is so much less than 24 | 17:55 |
gary_poster | well, there was a 14.35 % idle, but still | 17:55 |
gary_poster | benji, fwiw, that was relatively near the beginning of a test | 17:56 |
gary_poster | right now our 1 minute time is 21.something | 17:56 |
benji | good, that's much closer to what I would expect | 17:57 |
gary_poster | 22.53 now even | 17:57 |
gary_poster | 23.53! 23.81! whee! | 17:57 |
benji | :) | 17:57 |
benji | given that each test is non-parallel (even if something is running in another process, like a DB query, the other process is waiting on the result), I would expect that perfect utilization would mean that load == # containers | 17:58 |
gary_poster | %idle still in the 30s | 17:58 |
gary_poster | I guess that makes sense-ish | 17:58 |
gary_poster | given 24 test runs on 32 "cores" | 17:58 |
gary_poster | so that is 75% usage | 17:59 |
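Benji's rule of thumb checks out numerically: with serial workers, ideal load equals the container count, so some idle CPU is expected on this host even at perfect utilization. A sketch of the arithmetic from the log:

```python
containers = 24   # parallel test workers, each running tests serially
hw_threads = 32   # 16 cores x 2 hardware threads on the host

# Perfect utilization means one runnable process per container,
# so the load average should approach the container count.
ideal_load = containers

# The hardware threads with nothing to run show up as idle CPU,
# which matches the ~23-25% idle readings in the log.
expected_idle_pct = 100 * (hw_threads - containers) / hw_threads

print(ideal_load, expected_idle_pct)  # 24 25.0
```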
gary_poster | but where's the slow-down? | 17:59 |
benji | "the slow-down" as in the variation in start up times? | 17:59 |
gary_poster | yeah | 18:00 |
benji | gary_poster: want to hang out? | 18:00 |
gary_poster | we are well past that now of course, in this test run | 18:00 |
gary_poster | sure | 18:00 |
gary_poster | benji https://plus.google.com/hangouts/_/4ff44d4a05bb2e19cabdfe6963ad1235f6d40fc6?authuser=1&hl=en-US | 18:01 |
gary_poster | I am still blue mushroom head | 18:01 |
benji | gary_poster: my browser seems to be mid-crash, one second | 18:02 |
gary_poster | ok | 18:02 |
benji | hmm, maybe it is my OS | 18:03 |
gary_poster | uh oh | 18:03 |
benji | rebooting | 18:03 |
gary_poster | benji http://pastebin.ubuntu.com/1042973/ | 20:48 |
bac | benji, those ebs instructions look great. i'm confused, though, as to what makes the ebs volume. is there something magic about /dev/xvdf? | 21:00 |
benji | bac: I made it by hand. If we end up productizing it we will use the AWS API to make them and the snapshot and associate the volumes with the instances, etc. | 21:01 |
bac | benji, so that part is not shown in your instructions? | 21:01 |
benji | bac: I think I mentioned it but didn't give step-by-step instructions | 21:02 |
bac | yeah, you said create them and make sure they are in the same zone | 21:02 |
bac | ok, i was just going to be really confused if there wasn't more to it | 21:03 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!