=== _salem is now known as salem_
=== salem_ is now known as _salem
=== chihchun_afk is now known as chihchun
=== maclin__ is now known as maclin
[05:17] Good morning
[07:08] pitti, how do I connect to running autopkgtest VMs? there are defunct runcmd, but no evidence of what happened on the console and on the host.
[07:22] jibel: sorry, missed your ping
[07:22] jibel: ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -p 10022 ubuntu@localhost
[07:23] jibel: could also be 10023 or higher if multiple VMs are running; it says the port in the log, though
[07:26] pitti, ah sorry, I didn't see the port at the end of the command line. Thanks
[07:38] pitti, there are serious performance issues with 9p
[07:38] pitti, I compared dd on the mountpoint and locally inside the VM http://paste.ubuntu.com/7403086/
[07:42] jibel: right; but it's still much faster than squeezing everything through a pipe and tar
[07:43] jibel: we run the tests and builds in /tmp/ in the testbed though, not in /autopkgtest
[07:43] jibel: the former is a workaround for the too old qemu in saucy/precise (that caused the "invalid numeric value" breakage)
[07:43] which means we actually have to copy large packages, which is what causes these timeouts on libo & friends
[07:43] I need to look into that
[07:44] (aside from all the other fires that are burning)
[07:44] jibel: does that performance limitation hamper some test?
[07:44] jibel: the 142 MB/s is just writing to memory, right? whereas 9p actually goes to the disk in the host's /tmp/
[07:44] I suppose that explains most of the difference
[07:45] pitti, right, but but 2.9MB/s is really slow even for disks
[07:45] -but
[07:46] pitti, adt-virt-qemu copies data from /autopkgtest right?
[07:46] I think that's what impacts performance and makes some test hang
[07:47] jibel: yes; /autopkgtest is used for everything that needs to copy up/down data
[07:48] jibel: originally I had all tests and source packages there, but as we can't make it owned by the user running the test I had to play some chmod tricks and move building and the tests tree out
[07:48] jibel: which test is hanging due to that?
[07:48] jibel: NB that libo etc. fail due to the copy timeout when copying the unpackaged source tree between host and testbed (that's the bit on my list)
[07:48] pitti, 51200000 bytes (51 MB) copied, 0.228209 s, 224 MB/s
[07:49] pitti, ^this is on disk
[07:49] faster than mem even :)
[07:49] jibel: are you sure? that's going to the overlay in /dev/shm, isn't it?
[07:49] back in 3 mins
[07:50] pitti, I am sure, last test was in $HOME as auto-package-testing on alderamin
[07:51] and the VM runs with -drive file=/run/shm/adt-utopic-amd64-cloud.img.overlay-1399316707.83,if=virtio,index=0
[07:55] jibel: ah, you ran it on the host, not in the VM
[07:56] so 221 MB/s on my workstation's disk
[07:56] * pitti boots VM
[07:58] jibel: right, I get 6.9 MB/s here
[08:00] pitti, yes, 224 is on the host with a raid array, 2.9 is on 9p in the VM and 142 in the VM on disk (which is an overlay in shm)
[08:07] jibel: so at least chromium and libo don't have build-needed, for those I can radically optimize the copying and thus avoid the timeout
[08:08] ah crap, no, these might actually call stuff from the full tree
[08:08] jibel: as an immediate workaround I'd suggest I temporarily change the default copy timeout so that this doesn't keep blocking our tests?
[08:09] -copy_timeout = int(os.getenv('ADT_VIRT_COPY_TIMEOUT', '300'))
[08:09] +copy_timeout = int(os.getenv('ADT_VIRT_COPY_TIMEOUT', '3000'))
[08:10] man, why is it so ridiculously hard to communicate with a QEMU VM
[08:11] it seems pretty much the only thing that's fast is ssh, and that makes a lot of assumptions
[08:21] pitti, 3000 sounds good for now, let's see if it makes binutils more stable.
[08:22] jibel: the recent one succeeded; I retried chromium, libo, and friends with the 3000
[08:23] for LO and linux, I am not confident. It took 40min to copy the built tree with the previous version of autopkgtest
[08:26] * pitti wants a way to access a VM image without root privs. Now!
=== qwebirc854243 is now known as slickymasterWork
[09:42] pitti, it seems to be related to the block size used by cp, if I dd with bs=1M I get an 85x performance improvement on my machine
=== vrruiz_ is now known as rvr
[09:58] jibel: oh, wow! so maybe we could apply a similar trick in the copy{up,down}_shareddir bits
[09:59] Morning all
[10:02] jibel: https://lists.gnu.org/archive/html/coreutils/2011-07/msg00059.html seems related
[10:05] morning davmor2 :)
=== _salem is now known as salem_
[13:22] pitti, FYI, I didn't find any way to really improve cp on 9p. I tried cpio too and specified a block size but it doesn't make a difference.
The best I could do is with rsync, which is 2 times faster than cp
[13:29] jibel: ah, I'm also currently playing around with this; my hope was that cpio and/or tar would help as you can specify big block sizes
[13:32] dd bs=1M if=/dev/zero count=100 | cpio -o --file=out.cpio
[13:32] heh, that only creates a sparse file
[13:33] pitti, I tried find /autopkgtest/tmp -depth -print|cpio -pdm /tmp/adt/
[13:34] jibel: I'm getting 44 MB/s with tar, while I got 6 with 512 byte block (default) dd
[13:34] pitti, I also tried the options msize and cache of 9p but visible improvement
[13:34] *no visible
[13:36] jibel: same with reading, btw
[13:36] dd if=/autopkgtest/out.tar of=/dev/null -> 7.0 MB/s
[13:36] with bs=1M -> 63 MB/s
[13:39] jibel: so it seems funneling that through tar if both the host and testbed paths are *not* already in the shared dir will give some x10 improvement
[13:48] pitti, right, the best I can get is by creating a tar file on the host then on the guest run: time dd if=/autopkgtest/shared.tar bs=1M|tar x -C /tmp/adt/
[13:48] pitti, it takes 2s for a 100M tar file
[13:50] jibel: right, I found reading to be much less sensitive to the block size
[13:52] while cp -a takes 52s and kills the cpu
[13:55] jibel: I'm now testing with a more realistic scenario with lots of smaller files, not a single big one
[13:55] with an unpacked postgresql-9.3 tree (113 MB, 5680 files)
[13:55] 11 seconds with cp -r
[13:56] pitti, I tried with /usr/share/doc + libpng = 17806 files
[15:22] pitti, I pushed fixes and new tests to https://code.launchpad.net/~jibel/britney/fix_missing_results
[15:22] pitti, I verified that I could reproduce the bug with the current version of britney and the new tests
[15:22] \o/
[15:22] jibel: you rock, thanks
[15:23] pitti, do you think you'd have time tomorrow for a pre-review, then I'll propose a merge against britney
[15:23] jibel: yes, absolutely; this is the #1 issue for wrecking utopic, I'll make time
[15:23] jibel: it seems this bug
currently happens more often than not; presumably because of the large amount of package influx from syncs, etc.?
[15:23] pitti, that'd be great, many thanks
[15:25] jibel: perhaps you can already propose, then we see the diff and have some comment thread for the discussion/review?
[15:25] jibel: set it as "WIP" for now
[15:27] pitti, it's essentially because there are more packages with autopkgtest, and the last result is taken into account.
[15:27] pitti, okay, I'll do that
=== roadmr is now known as roadmr_afk
[15:28] pitti, any reason the tests aren't merged in britney2-ubuntu?
[15:29] jibel: I don't know; I proposed it ages ago, but got no reaction to it yet; probably needs more poking
=== roadmr_afk is now known as roadmr
[15:58] jibel: OOI, why logging.warning -> print() ?
=== chihchun is now known as chihchun_afk
[16:04] pitti, because that's the only call to logging in all the code, print() is used everywhere else. Probably a copy/paste at some point
[16:06] jibel: ah, thanks
=== chihchun_afk is now known as chihchun
=== roadmr is now known as roadmr_afk
=== bfiller is now known as bfiller_afk
=== roadmr_afk is now known as roadmr
=== chihchun is now known as chihchun_afk
=== bfiller is now known as bfiller-afk
=== bfiller-afk is now known as bfiller
=== salem_ is now known as _salem
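
Note on the block-size finding from the 9p discussion in this log (dd with bs=1M being far faster than the 512-byte default): a minimal Python sketch of a copy loop with an explicit block size. This is only an illustration, not code from autopkgtest; the function name `copy_in_blocks` and the 1 MiB default are assumptions, and whether it helps on a given 9p mount would have to be measured.

```python
def copy_in_blocks(src, dst, block_size=1024 * 1024):
    """Copy src to dst with reads/writes of block_size bytes.

    Hypothetical helper (not part of autopkgtest). Files are opened
    unbuffered so each read/write request uses the chosen block size,
    similar in spirit to `dd bs=1M`. Returns the number of bytes copied.
    """
    copied = 0
    with open(src, "rb", buffering=0) as fin, \
            open(dst, "wb", buffering=0) as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:
                break  # end of file
            # A raw write may be partial; keep writing until the chunk is done.
            view = memoryview(chunk)
            while len(view) > 0:
                written = fout.write(view)
                view = view[written:]
            copied += len(chunk)
    return copied
```

Calling `copy_in_blocks(path_a, path_b)` with the default uses 1 MiB requests; passing `block_size=512` would mimic the slow default dd case mentioned above for comparison.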