[14:55] <shadeslayer> rbasak: hi there, it seems like I'm hitting the same issue as you were here http://irclogs.ubuntu.com/2013/08/01/%23ubuntu-kernel.html
[14:55] <shadeslayer> rbasak: was there any fix for it?
[14:55] <shadeslayer> I have a very reliable way of reproducing it in my schroot, when unpacking the firefox tar
[15:03] <rbasak> shadeslayer: only fix is to not use overlayfs
[15:03] <shadeslayer> :(
[15:03] <shadeslayer> but I need it :(
[15:03] <rbasak> You can configure schroot to clone the entire tree instead I think?
[15:03] <rbasak> Or something like that.
[15:04] <shadeslayer> that sounds expensive
[15:04] <rbasak> Indeed.
[15:05] <shadeslayer> Well, the workaround I added seems to work, which is basically, keep retrying the tar command
[15:05] <rbasak> It's a race, so that might work eventually for you. It's not guaranteed to, though.
[15:05] <shadeslayer> aha 
[15:05] <rbasak> IIRC, the code in tar makes an assumption that doesn't hold true on overlayfs.
[15:06] <rbasak> I can't remember the details. Something about fstat after the file has been written or something.
[15:06] <rbasak> Probably inode number related.
[15:08] <shadeslayer> well, the exact combination is eatmydata + overlayfs + tar, so I can imagine things going wrong
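A rough sketch of the retry workaround shadeslayer describes; the tarball name, destination path, and retry count are illustrative, not taken from the log:

    # Retry the extraction a few times: the overlayfs failure is a race,
    # so a later attempt may succeed (though, as noted above, this is
    # not guaranteed).
    for attempt in 1 2 3 4 5; do
        tar xf firefox.tar.bz2 -C /build/firefox && break
        echo "tar failed (attempt $attempt), retrying..." >&2
        sleep 1
    done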
[16:04] <Diziet> Hi.  I'm having a problem with     HOME=/ strace -ttfot git-clone -q git://kernel.ubuntu.com/ubuntu/linux.git
[16:05] <Diziet> It falls over after almost exactly 2 minutes.
[16:05] <Diziet> Without -q it works.
[16:05] <Diziet> The strace shows the server sending EOF while the client is still waiting.
[16:05] <Diziet> Where should I report this ?  It's quite inconvenient as it's making something in our CI system not work...
[16:10] <apw> Diziet, which version of git is that ?
[16:10] <Diziet> Debian wheezy
[16:10] <Diziet>  1.7.10.4
[16:10] <apw> why does that ring a bell
[16:11] <Diziet> (Obviously the failure happens without the strace.)
[16:11] <Diziet> The fact that it works without -q suggests to me that something is deciding that this process or this connection is "stuck", using a timer which gets reset by output (which is fairly copious without -q)
[16:12] <apw> yep, that is something like the bug, and it is tickling my "this is familiar" noddle
[16:13] <Diziet> As a workaround, perhaps the timeout could be increased to an amount sufficient to make "git clone -q linux" work ?
[16:13] <henrix> apw: a grep in my irc logs (because it *does* ring a bell) shows a ref to bug #1228148
[16:14] <apw> henrix, thanks ... i knew i'd seen something
[16:14] <Diziet> I am experiencing this problem with our git caching proxy.  Naturally I can't really sensibly tell the proxy not to use -q in its own git invocations.
[16:14] <apw> henrix, oh yeah, i think we shelved this till zinc got upgraded to P
[16:14] <apw> which i assume it already has
[16:15] <henrix> yeah, it looks like it has been upgraded
[16:16] <Diziet> That LP bug has a workaround (sending KA packets) but I'm pretty sure the timeout isn't in git itself.
[16:16] <Diziet> It is probably in some proxy or wrapper you have.
[16:16] <Diziet> So I don't think it is necessary for you to patch your git (although that would help people with broken^W NAT).
[16:17] <Diziet> For me it would suffice to increase your timeout.  (I know it is at your end because I have repro'd the problem from my colo.)
[16:20] <apw> Diziet, what is the config for the timeout
[16:20] <apw> Diziet, as i can likely get that changed quickly
[16:20] <Diziet> IDK.  I'm not sure it is actually in git.
[16:21] <Diziet> There's a --timeout option to git-daemon which might be relevant.
[16:21] <Diziet> And also a --timeout option to git-upload-pack.
[16:21] <Diziet> I think the latter may be the one.
[16:22] <Diziet> Are you running vanilla git-daemon out of inetd, or what ?
[16:23] <Diziet> I'm going to try adding a --timeout to my own git-daemon here to see if I can repro the bug.
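A minimal sketch of that local reproduction, assuming a repository served by git-daemon with a deliberately short timeout (paths are illustrative):

    # Serve a local repo with a short client timeout...
    git daemon --export-all --base-path=/srv/git --timeout=30 &
    # ...then clone it quietly; -q suppresses the progress traffic that
    # would otherwise keep the connection busy during a long clone.
    git clone -q git://localhost/linux.git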
[16:26] <apw> Diziet, i am pretty sure it is going to be timing out, so 1) i am gc'ing the repo you are cloning, and 2) looking to change the timeout
[16:27] <Diziet> apw: Thanks.  If you bear with me 10 mins or so I can probably confirm what to do to git to increase the timeout.
[16:28] <Diziet> Are you running git from inetd, or from upstart ?  Is it git-daemon or something else ?
[16:28] <apw> Diziet, great, pretty sure it is from xinetd
[16:28] <Diziet> Can you check the command line you're giving it ?  I think it's probably   git-daemon blah blah blah --timeout=120 blah blah
[16:29] <Diziet> But before you change that 120 I am going to try to repro the fault here.
[16:29] <apw> Diziet, yeah looks to be ... bah ... i'll wait on your test to confirm, and then can set those wheels in motion
[16:30] <apw> Diziet, though the bigger wheels would be to add these patches
[16:30] <Diziet> Great, will get back to you.
[16:30] <Diziet> Heh.
[16:30] <Diziet> col tells me that it's fixed (as in, the KA patch is included) in trusty.
[16:35] <apw> Diziet, yeah and i am sure that box is on P right now
[16:36] <Diziet> I can confirm that --timeout=30 makes my own git server produce the same problem.
[16:36] <Diziet> So I think adjusting your --timeout=120 to (say) --timeout=2000 will probably help.
[16:36] <Diziet> Actually-stuck processes consume very little resource so I think you should be fine with a big timeout.
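For a git-daemon run from xinetd, the suggested change would roughly amount to editing the service's server_args; the actual service file on the server is not shown in this log, so the path and other arguments here are assumptions:

    server_args = daemon --inetd --export-all --base-path=/srv/git --timeout=2000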
[16:37] <apw> Diziet, yeah, i'll see what i can get fixed
[16:38] <Diziet> Thanks.  Should I expect a change soon (eg today) ?
[16:48] <Diziet> I have sent an email to the bug suggesting increasing --timeout as a workaround.
[16:48] <Diziet> (Of course that won't help people behind a NAT with a short timeout.)
[16:52] <apw> Diziet, i'd hope, but not expect it to change quite that quick
[16:59] <Diziet> OK, thanks.
[19:41] <apw> Diziet, ok i've requested the timeout change, and am working on the fixes, we shall see what occurs
[21:34] <smoser> hey
[21:34] <smoser> wonder if someone has an idea... not necessarily kernel, but somewhat related.
[21:35] <smoser> from trusty: http://paste.ubuntu.com/9887959/
[21:35] <smoser> from utopic: http://paste.ubuntu.com/9887962/
[21:37] <arges> smoser: its like one of those 'spot the differences' pictures. So you're wondering why the estimated minimum size is less in 3.16?
[21:37] <smoser> yes. i had more info coming. but good job spotting the diff :)
[21:38] <smoser> I'm not terribly concerned about the difference, but its causing a failure on a
[21:38] <smoser> build of maas images
[21:39] <arges> smoser: so without looking at the code, I wonder if some meta-data structures changed size and thus the minimal size changed. Whats the failure?
[21:38] <smoser> after doing a bunch of apt-get installs and such to a loop mounted image, on trusty I try to 'resize2fs' to 1400M (arbitrary historic size) and it fits fine.
[21:47] <arges> smoser: one thing to try is to use the older e2fsprogs with the newer kernel to see if there are any calculation differences there
[21:47] <smoser> do you think that resize2fs is able to use anything from the kernel ?
[21:47] <smoser> i would not have thought it would.
[21:48] <smoser> but your suggestion would give more info to that
[21:48] <arges> smoser: there is a patch that adjusts the inode struct which means it could be padded differently and have a different size too.. haven't exhaustively searched
[21:49] <arges> the difference is 8114 bytes, but we have 145152 inodes... so maybe its a superblock change? 
[21:51] <arges> smoser: another thing to look at is config differences between 3.13/3.16 to see if something was turned on that could affect structure sizes
[21:51] <smoser> arges, but the file isn't mounted.
[21:52] <smoser> so i'm confused as to how the kernel would be involved.
[21:52] <smoser> isn't resize2fs just opening a file and looking around at it ?
[21:52] <arges> smoser: not sure. strace and find out
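A sketch of that check, assuming the image file name; -P asks resize2fs only to print its estimated minimum size:

    # Trace resize2fs while it computes the minimum size, then look for
    # anything that talks to the kernel beyond plain file I/O.
    strace -f -o resize2fs.trace resize2fs -P root-image.img
    grep -E 'ioctl|uname' resize2fs.trace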
[21:53] <arges> downloading the image here and looking at it
[21:54] <smoser> strace output: http://paste.ubuntu.com/9888176/
[21:56] <arges> smoser: so you see a few ioctls before 'Minimum' gets printed
[21:56] <arges> i'm not sure exactly how minimum size is calculated though.
[22:01] <smoser> what i'm doing is in lp:maas-images.  something like this ends up getting run:
[22:01] <smoser>   time maas-cloudimg2eph2 -vv --kernel=linux-generic --arch=arm64 \
[22:01] <smoser>      $url root-image.gz \
[22:01] <smoser>     --krd-pack=linux-generic,boot-kernel,boot-initrd 2>&1 | tee out.log
[22:01] <smoser> and on utopic, it doesn't fit into 1400M and on trusty it does.
[22:01] <smoser> and i think (doing a more controlled test now) that its significantly different.
[22:02] <arges> smoser: do you know what size it ends up being overall?
[22:03] <smoser> i'll have that later tonight. just started a run on utopic and one on trusty.
[22:04] <smoser> i'll keep the images around too so i can grab more debug info on them.
[22:04] <smoser> but i have to go afk for a while. thanks for your thoughts.
[22:04] <arges> smoser: hmm running this in vivid makes me run 'e2fsck -f *' first, then I get 290304 (this is on 3.19)
[22:05] <arges> smoser: sure. this might be worth filing a bug at some point. feel free to ping me again
[22:05] <smoser> hm..
[22:05] <smoser> the downloaded image is dirty ?
[22:05] <arges> that's what resize2fs 1.42.12 says
[22:06] <smoser> hm..
[22:06] <arges> md5 sum matches
[22:06] <smoser> that could be relevant.
[22:06] <smoser> well, you see the output that i got, it doesn't complain. 
[22:06] <smoser> maybe i'll try more liberally sprinkling 'e2fsck -fy the-image'
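A sketch of that step, keeping the 'the-image' placeholder from the line above; force a full fsck first and then re-check the estimate:

    # Check and repair the filesystem non-interactively, then ask for
    # the minimum size again to see whether the estimate changes.
    e2fsck -fy the-image
    resize2fs -P the-image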
[22:07] <arges> 1.42.12 vs 1.42.9 wonder if there is some fix that exposes this... ugh
[22:09]  * arges tries a trusty image for the heck of it
[22:12] <arges> trusty image on vivid doesn't complain that i need to run e2fsck. size on 3.19/vivid 236103, 3.13/trusty 230329 so still a difference
[22:20] <arges> smoser: if you want i can do a bisect and figure out which commit made it blow up in size.
[23:34] <smoser> arges, 
[23:34] <smoser> trusty: http://paste.ubuntu.com/9889226/
[23:34] <smoser> utopic: http://paste.ubuntu.com/9889235/
[23:35] <smoser> basically, note that 'df' reports used space differing only from '980200' to '980216'
[23:36] <smoser> but resize2fs on trusty reports min size of ~1073M versus ~1582M on utopic.
[23:36] <smoser> resize2fs does claim "estimated" size, but you'd hope it could estimate a bit closer than that.
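For reference, the comparison being made is roughly this; the image name and mount point are illustrative:

    # Loop-mount the image and note 'df' usage...
    mount -o loop root-image.img /mnt
    df -k /mnt            # 'Used' column, 1K blocks (~980200 vs ~980216 above)
    umount /mnt
    # ...then compare against resize2fs's estimated minimum size.
    resize2fs -P root-image.img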
[23:38] <Diziet> apw: (git-daemon timeout) Thanks.