bac | benji: i think the problem with our zope.testing fork tests is that now, for subunit, we are writing directly to __stdout__, so the testrunner actually running the tests cannot capture the output. it's all going directly to the screen and the tests fail | 11:38 |
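The capture problem bac describes can be sketched in a few lines of Python (hypothetical buffer names, not the actual zope.testing code): a test runner captures output by swapping in a fake `sys.stdout`, but `sys.__stdout__` is the real file object saved at interpreter startup, so writes to it bypass the capture entirely.

```python
import io
import sys

# A test runner typically captures output by replacing sys.stdout.
capture = io.StringIO()
original = sys.stdout
sys.stdout = capture
try:
    print("captured by the runner")
    # sys.__stdout__ is the original stream saved at startup, so this
    # write skips the capture buffer and goes straight to the terminal:
    print("goes straight to the screen", file=sys.__stdout__)
finally:
    sys.stdout = original

assert "captured by the runner" in capture.getvalue()
assert "screen" not in capture.getvalue()
```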
---|---|---|
* gmb -> lunch | 11:48 | |
benji | bac: I'm over here now. :) | 11:59 |
gary_poster | bac benji frankban gmb, as a reminder, we will have the daily & weekly call in 2 hours and 4 minutes | 12:06 |
benji | gary_poster: thanks, I had forgotten about the time move | 12:07 |
gary_poster | welcome | 12:07 |
benji | frankban: I had some problems with lpsetup yesterday afternoon. Once I have my EC2 instance up and configured correctly will you have a few minutes to help me? | 12:09 |
frankban | benji: sure | 12:09 |
benji | thanks | 12:09 |
benji | frankban: is it best to use lpsetup from a checkout or from the PPA? | 12:31 |
gary_poster | I need to restart. back soon hopefully | 12:32 |
frankban | benji: same revision, so no real difference. in general, PPA is better | 12:32 |
benji | frankban: the hangout is open when you're ready: https://plus.google.com/hangouts/_/e5a32b64b2b26fda880db39e37125e9f6733ae75 | 12:55 |
frankban | benji: joining | 12:56 |
gary_poster | frankban, thank you for great lpsetup analysis. I (optimistically?) made cards for them on the board. This might help us in the discussion; also, I only made blocks where I thought they were absolutely necessary, which I think is a bit more flexible than your three steps, so we can talk about that on the call also. Thank you! | 13:04 |
frankban | gary_poster: cool! thank you | 13:05 |
gary_poster | bac benji frankban gmb, call in 18 (early warning since it is an unusual time for us) | 13:52 |
gmb | k | 13:52 |
benji | Forewarned is forearmed | 13:52 |
frankban | benji: could you please paste the output of `locale` in your ec2 instance? gary_poster: could you do the same on the slave if you are running parallel tests? | 13:53 |
gary_poster | frankban, so in the host? | 13:53 |
benji | frankban: sure, one sec | 13:53 |
frankban | gary_poster: yes | 13:53 |
frankban | gary_poster: ah, no, in the lxc | 13:53 |
benji | frankban: http://paste.ubuntu.com/1042422/ | 13:54 |
frankban | (so, between two runs, no rush) | 13:54 |
benji | frankban: note, that is from the host | 13:54 |
gary_poster | frankban, oh ok. this is the host, cause I already did it. http://pastebin.ubuntu.com/1042421/ . I'll get it from a container as soon as I can | 13:54 |
frankban | benji: sorry, I need it in the lxc: ssh `sudo lp-lxc-ip -i eth0 -n lptests` locale | 13:57 |
benji | frankban: ok, I'll do it after lpsetup finishes; I don't want to spook it | 13:58 |
benji | frankban: so, lpsetup has stopped, I don't see an error message but it also didn't have a parade about how everything went fine; will you look at termbeamer and see if it looks good to you? | 14:08 |
frankban | benji: lpsetup finished, and it seems without error, but I believe that if you start the container, ssh into it and run make schema in devel we will see an error | 14:08 |
gary_poster | bac benji frankban gmb https://plus.google.com/hangouts/_/9fc6f87349a45167b19d7b96789e769a23e20c1c?authuser=1&hl=en-US in 2 | 14:08 |
gary_poster | urg | 14:08 |
gary_poster | https://plus.google.com/hangouts/_/9fc6f87349a45167b19d7b96789e769a23e20c1c?authuser=1&hl=en-US | 14:08 |
gary_poster | and...how odd...my camera is lit as if it is being used, but I am not visible... | 14:09 |
* benji tries | 14:09 | |
frankban | benji: because I've seen that launchpad-database-dependencies has found postgres 8.4... that's the weird behavior I encountered locally, and it seems to be related to LC_ALL settings. | 14:10 |
benji | :( | 14:10 |
benji | frankban: here is the locale output from inside the container http://paste.ubuntu.com/1042454/ | 14:12 |
bac | the "madness" quip has been superseded by "have you tried turning it off and on again" | 14:19 |
frankban | benji: so make schema fails: could you please try to run: | 16:08 |
frankban | $ LC_ALL=C sudo pg_createcluster 9.1 main --start --encoding UNICODE | 16:08 |
frankban | then: utilities/launchpad-database-setup ubuntu | 16:09 |
frankban | and finally: `make schema` again | 16:09 |
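Frankban's recovery steps above, collected into one sequence for reference (a sketch assuming the setup described in the log; the cluster version and script paths are taken from the conversation):

```shell
# Recreate the PostgreSQL cluster under a known-good locale; the
# LC_ALL=C override sidesteps the locale-dependent misbehavior that
# picked up postgres 8.4 instead of 9.1.
LC_ALL=C sudo pg_createcluster 9.1 main --start --encoding UNICODE

# Re-run the Launchpad database setup, then rebuild the schema.
utilities/launchpad-database-setup ubuntu
make schema
```

Benji confirms later in the log that this sequence worked.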
benji | frankban: I'll try after lunch. I'll email you the results if you're not around. | 16:15 |
frankban | cool, thanks beji | 16:17 |
frankban | hem, thanks benji | 16:17 |
benji | frankban: (if you're still around) that appears to have worked | 17:12 |
frankban | thanks benji, on Monday I will try to fix lpsetup to use LC_ALL . have a good weekend! | 17:15 |
gary_poster | bye | 17:15 |
gary_poster | hey benji, I'm watching a slave while tests are running | 17:16 |
gary_poster | it is very interesting | 17:16 |
benji | frankban: cool; enjoy your weekend | 17:16 |
benji | how so? | 17:16 |
gary_poster | so, first CPU is 77% idle (this is on a 24-container instance on a 32 thread/16 core machine) | 17:16 |
gary_poster | there is 0 wait time | 17:16 |
gary_poster | for io | 17:16 |
gary_poster | according to vmstat | 17:17 |
gary_poster | memory has 33638M free all the time | 17:17 |
gary_poster | I mean, around there | 17:17 |
gary_poster | entropy varies widely but never seems to get beneath the 1000s | 17:17 |
benji | gary_poster: what is the % utilization for io? (iostat -x 10) would be a good way to see it; ignoring the first output which is supposed to be "recent history" but I don't have much faith in | 17:18 |
gary_poster | there are 8.49 writes/sec and a wrqm/s of 29.64 which is the only thing that looks suspicious so far | 17:19 |
gary_poster | trying that | 17:19 |
gary_poster | I was doing a watch rather than a -x 10... | 17:20 |
gary_poster | but benji it never goes over 3.84% and gets as low as 0.4% | 17:21 |
benji | wow; that's quite good | 17:21 |
gary_poster | now cpu is up to 33% idle | 17:21 |
gary_poster | well it ought to be! remember, as far as we know, we are writing and reading to memoru | 17:22 |
gary_poster | memory | 17:22 |
benji | yep | 17:22 |
gary_poster | I'm not entirely sure what we are writing tbh, unless it is just the testr recordings | 17:22 |
benji | is there ever a non-trivial %steal? | 17:22 |
gary_poster | it's always 0 | 17:22 |
gary_poster | 0.00 | 17:22 |
gary_poster | Now back up to 50.82% idle | 17:23 |
gary_poster | I don't see what the hang up is, unless it's something like reading and writing memory or something crazy like that | 17:24 |
benji | oh, wait... that will be 0 on the host; thinko on my part | 17:24 |
benji | so did one of the containers take a long time to start in this scenario? | 17:24 |
gary_poster | benji, no more than 3 minutes, but will look one sec | 17:26 |
benji | it would be interesting to run an "iostat -x 10" while they start to see if there is much resource contention, then run a pidstat on any stragglers to see why they are being slow | 17:28 |
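The monitoring commands discussed in this stretch of the conversation, for reference (the PID below is a placeholder for a straggling worker, not a value from the log):

```shell
# Extended device stats every 10s; the first report averages since
# boot, so ignore it and watch the %util column in later intervals.
iostat -x 10

# Run-queue length (r), free memory, and I/O wait, sampled every 5s.
vmstat 5

# CPU (-u) and disk (-d) activity for one process; 12345 is a
# placeholder PID for a slow-starting container's worker.
pidstat -u -d -p 12345 5
```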
gary_poster | benji, first one was ready @ 16:57:42, last one reported for duty @ approx 17:00:05 | 17:28 |
gary_poster | That can only explain up to 3 minutes though | 17:29 |
gary_poster | of 10-ish | 17:29 |
benji | right, but the non-loadbalancing could explain the rest | 17:29 |
benji | I would be interested in seeing if, say, a 14 container run on the 16 core machine did or did not have any stragglers | 17:30 |
gary_poster | The last worker to run was worker-7, which started work at 17:00:55 (so I was wrong about the last one reporting)... | 17:31 |
gary_poster | I mean, that was the last one to stop; and it may have been the last one to start | 17:32 |
gary_poster | worker-10 was penultimate to finish, and started @ 17:00:56; worker-17 started @ 17:00:53 and was antepenultimate to finish... | 17:36 |
gary_poster | ok, doing this systematically. | 17:37 |
gary_poster | worker-0: 17:00:57 - 17:25:28 | 17:37 |
gary_poster | worker-1: 16:57:48 - 17:25:39 | 17:38 |
gary_poster | worker-2: 17:00:56 - 17:24:35 | 17:39 |
gary_poster | worker-3: 16:57:51 - 17:21:32 | 17:39 |
gary_poster | worker-4: 17:00:56 - 17:26:30 | 17:40 |
gary_poster | worker-5: 17:00:57 - 17:25:45 | 17:42 |
gary_poster | (Note that this was a particularly fast run, at 32 mins, 45 secs; and this was a round-robin-assigned version | 17:43 |
gary_poster | ) | 17:43 |
gary_poster | wait, something is wrong... | 17:43 |
gary_poster | that one lost a worker | 17:44 |
gary_poster | So 24 is too much | 17:45 |
gary_poster | During building, hard drive is getting up to 57.44 %util | 17:46 |
gary_poster | ok now starting tests... | 17:47 |
gary_poster | well, listing tests... | 17:47 |
gary_poster | low %util | 17:47 |
gary_poster | 23% idle cpu | 17:48 |
gary_poster | plenty of free memory | 17:48 |
gary_poster | entropy still never below 1000 | 17:48 |
gary_poster | benji ^^ what else could it be? | 17:49 |
* benji reads the backlog. | 17:49 | |
gary_poster | benji you only need to go back 10 or so | 17:50 |
gary_poster | now starting tests | 17:50 |
gary_poster | "now starting tests..." | 17:50 |
benji | gary_poster: what is the load? | 17:51 |
gary_poster | benji, 18.99, 11.93, 7.74 | 17:53 |
gary_poster | not sure if we should regard 16 or 32 as the expected top load | 17:53 |
benji | how many containers did you run? 24 again? | 17:53 |
gary_poster | benji, yes | 17:53 |
gary_poster | and 23.2% idle | 17:54 |
gary_poster | at highest | 17:54 |
gary_poster | usually a lot more | 17:54 |
gary_poster | well, often a lot more | 17:54 |
benji | interesting, so over the last five minutes there have been on average 19 (rounding up) processes that were runnable; it seems significant to me that the number is so much less than 24 | 17:55 |
gary_poster | well, there was a 14.35 % idle, but still | 17:55 |
gary_poster | benji, fwiw, that was relatively near the beginning of a test | 17:56 |
gary_poster | right now our 1 minute time is 21.something | 17:56 |
benji | good, that's much closer to what I would expect | 17:57 |
gary_poster | 22.53 now even | 17:57 |
gary_poster | 23.53! 23.81! whee! | 17:57 |
benji | :) | 17:57 |
benji | given that each test is non-parallel (even if something is running in another process, like a DB query, the other process is waiting on the result), I would expect that perfect utilization would mean that load == # containers | 17:58 |
gary_poster | %idle still in the 30s | 17:58 |
gary_poster | I guess that makes sense-ish | 17:58 |
gary_poster | given 24 test runs on 32 "cores" | 17:58 |
gary_poster | so that is 75% usage | 17:59 |
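Benji's rule of thumb checks out numerically: with serial workers, ideal load equals the container count, so some idle CPU is expected on this host even at perfect utilization. A sketch of the arithmetic from the log:

```python
containers = 24   # parallel test workers, each running tests serially
hw_threads = 32   # 16 cores x 2 hardware threads on the host

# Perfect utilization means one runnable process per container,
# so the load average should approach the container count.
ideal_load = containers

# The hardware threads with nothing to run show up as idle CPU,
# which matches the ~23-25% idle readings in the log.
expected_idle_pct = 100 * (hw_threads - containers) / hw_threads

print(ideal_load, expected_idle_pct)  # 24 25.0
```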
gary_poster | but where's the slow-down? | 17:59 |
benji | "the slow-down" as in the variation in start up times? | 17:59 |
gary_poster | yeah | 18:00 |
benji | gary_poster: want to hang out? | 18:00 |
gary_poster | we are well past that now of course, in this test run | 18:00 |
gary_poster | sure | 18:00 |
gary_poster | benji https://plus.google.com/hangouts/_/4ff44d4a05bb2e19cabdfe6963ad1235f6d40fc6?authuser=1&hl=en-US | 18:01 |
gary_poster | I am still blue mushroom head | 18:01 |
benji | gary_poster: my browser seems to be mid-crash, one second | 18:02 |
gary_poster | ok | 18:02 |
benji | hmm, maybe it is my OS | 18:03 |
gary_poster | uh oh | 18:03 |
benji | rebooting | 18:03 |
gary_poster | benji http://pastebin.ubuntu.com/1042973/ | 20:48 |
bac | benji, those ebs instructions look great. i'm confused, though, as to what makes the ebs volume. is there something magic about /dev/xvdf? | 21:00 |
benji | bac: I made it by hand. If we end up productizing it we will use the AWS API to make them and the snapshot and associate the volumes with the instances, etc. | 21:01 |
bac | benji, so that part is not shown in your instructions? | 21:01 |
benji | bac: I think I mentioned it but didn't give step-by-step instructions | 21:02 |
bac | yeah, you said create them and make sure they are in the same zone | 21:02 |
bac | ok, i was just going to be really confused if there wasn't more to it | 21:03 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!