[14:55] <shadeslayer> rbasak: hi there, it seems like I'm hitting the same issue as you were here http://irclogs.ubuntu.com/2013/08/01/%23ubuntu-kernel.html
[14:55] <shadeslayer> rbasak: was there any fix for it?
[14:55] <shadeslayer> I have a very reliable way of reproducing it in my schroot, when unpacking the firefox tar
[15:03] <rbasak> shadeslayer: only fix is to not use overlayfs
[15:03] <shadeslayer> :(
[15:03] <shadeslayer> but I need it :(
[15:03] <rbasak> You can configure schroot to clone the entire tree instead I think?
[15:03] <rbasak> Or something like that.
[15:04] <shadeslayer> that sounds expensive
[15:04] <rbasak> Indeed.
[15:05] <shadeslayer> Well, the workaround I added seems to work, which is basically, keep retrying the tar command
[15:05] <rbasak> It's a race, so that might work eventually for you. It's not guaranteed to, though.
[15:05] <shadeslayer> aha 
[15:05] <rbasak> IIRC, the code in tar makes an assumption that doesn't hold true on overlayfs.
[15:06] <rbasak> I can't remember the details. Something about fstat after the file has been written or something.
[15:06] <rbasak> Probably inode number related.
[15:08] <shadeslayer> well, the exact combination is eatmydata + overlayfs + tar, so I can imagine things going wrong
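A rough sketch of the retry workaround shadeslayer describes; the tarball name, destination path, and retry count are illustrative, not taken from the log:

    # Retry the extraction a few times: the overlayfs failure is a race,
    # so a later attempt may succeed (though, as noted above, this is
    # not guaranteed).
    for attempt in 1 2 3 4 5; do
        tar xf firefox.tar.bz2 -C /build/firefox && break
        echo "tar failed (attempt $attempt), retrying..." >&2
        sleep 1
    done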
[16:04] <Diziet> Hi.  I'm having a problem with     HOME=/ strace -ttfot git-clone -q git://kernel.ubuntu.com/ubuntu/linux.git
[16:05] <Diziet> It falls over after almost exactly 2 minutes.
[16:05] <Diziet> Without -q it works.
[16:05] <Diziet> The strace shows the server sending EOF while the client is still waiting.
[16:05] <Diziet> Where should I report this ?  It's quite inconvenient as it's making something in our CI system not work...
[16:10] <apw> Diziet, which version of git is that ?
[16:10] <Diziet> Debian wheezy
[16:10] <Diziet>  1.7.10.4
[16:10] <apw> why does that ring a bell
[16:11] <Diziet> (Obviously the failure happens without the strace.)
[16:11] <Diziet> The fact that it works without -q suggests to me that something is deciding that this process or this connection is "stuck", using a timer which gets reset by output (which is fairly copious without -q)
[16:12] <apw> yep, that is something like the bug, and it is tickling my "this is familiar" noddle
[16:13] <Diziet> As a workaround, perhaps the timeout could be increased to an amount sufficient to make "git clone -q linux" work ?
[16:13] <henrix> apw: a grep in my irc logs (because it *does* ring a bell) shows a ref to bug #1228148
[16:14] <apw> henrix, thanks ... i knew i'd seen something
[16:14] <Diziet> I am experiencing this problem with our git caching proxy.  Naturally I can't really sensibly tell the proxy not to use -q in its own git invocations.
[16:14] <apw> henrix, oh yeah, i think we shelved this till zinc got upgraded to P
[16:14] <apw> which i assume it already has
[16:15] <henrix> yeah, it looks like it has been upgraded
[16:16] <Diziet> That LP bug has a workaround (sending KA packets) but I'm pretty sure the timeout isn't in git itself.
[16:16] <Diziet> It is probably in some proxy or wrapper you have.
[16:16] <Diziet> So I don't think it is necessary for you to patch your git (although that would help people with broken^W NAT).
[16:17] <Diziet> For me it would suffice to increase your timeout.  (I know it is at your end because I have repro'd the problem from my colo.)
[16:20] <apw> Diziet, what is the config for the timeout
[16:20] <apw> Diziet, as i can likely get that changed quickly
[16:20] <Diziet> IDK.  I'm not sure it is actually in git.
[16:21] <Diziet> There's a --timeout option to git-daemon which might be relevant.
[16:21] <Diziet> And also a --timeout option to git-upload-pack.
[16:21] <Diziet> I think the latter may be the one.
[16:22] <Diziet> Are you running vanilla git-daemon out of inetd, or what ?
[16:23] <Diziet> I'm going to try adding a --timeout to my own git-daemon here to see if I can repro the bug.
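A minimal sketch of that local reproduction, assuming a repository served by git-daemon with a deliberately short timeout (paths are illustrative):

    # Serve a local repo with a short client timeout...
    git daemon --export-all --base-path=/srv/git --timeout=30 &
    # ...then clone it quietly; -q suppresses the progress traffic that
    # would otherwise keep the connection busy during a long clone.
    git clone -q git://localhost/linux.git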
[16:26] <apw> Diziet, i am pretty sure it is going to be timing out, so 1) i am gc'ing the repo you are cloning, and 2) looking to change the timeout
[16:27] <Diziet> apw: Thanks.  If you bear with me 10 mins or so I can probably confirm what to do to git to increase the timeout.
[16:28] <Diziet> Are you running git from inetd, or from upstart ?  Is it git-daemon or something else ?
[16:28] <apw> Diziet, great, pretty sure it is from xinetd
[16:28] <Diziet> Can you check the command line you're giving it ?  I think it's probably   git-daemon blah blah blah --timeout=120 blah blah
[16:29] <Diziet> But before you change that 120 I am going to try to repro the fault here.
[16:29] <apw> Diziet, yeah looks to be ... bah ... i'll wait on your test to confirm, and then can set those wheels in motion
[16:30] <apw> Diziet, though the bigger wheels would be to add these patches
[16:30] <Diziet> Great, will get back to you.
[16:30] <Diziet> Heh.
[16:30] <Diziet> col tells me that it's fixed (as in, the KA patch is included) in trusty.
[16:35] <apw> Diziet, yeah and i am sure that box is on P right now
[16:36] <Diziet> I can confirm that --timeout=30 makes my own git server produce the same problem.
[16:36] <Diziet> So I think adjusting your --timeout=120 to (say) --timeout=2000 will probably help.
[16:36] <Diziet> Actually-stuck processes consume very little resource so I think you should be fine with a big timeout.
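For a git-daemon run from xinetd, the suggested change would roughly amount to editing the service's server_args; the actual service file on the server is not shown in this log, so the path and other arguments here are assumptions:

    server_args = daemon --inetd --export-all --base-path=/srv/git --timeout=2000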
[16:37] <apw> Diziet, yeah, i'll see what i can get fixed
[16:38] <Diziet> Thanks.  Should I expect a change soon (eg today) ?
[16:48] <Diziet> I have sent an email to the bug suggesting increasing --timeout as a workaround.
[16:48] <Diziet> (Of course that won't help people behind a NAT with a short timeout.)
[16:52] <apw> Diziet, i'd hope, but not expect it to change quite that quick
[16:59] <Diziet> OK, thanks.
[19:41] <apw> Diziet, ok i've requested the timeout change, and am working on the fixes, we shall see what occurs
[21:34] <smoser> hey
[21:34] <smoser> wonder if someone has an idea... not necessarily kernel, but somewhat related.
[21:35] <smoser> from trusty: http://paste.ubuntu.com/9887959/
[21:35] <smoser> from utopic: http://paste.ubuntu.com/9887962/
[21:37] <arges> smoser: its like one of those 'spot the differences' pictures. So you're wondering why the estimated minimum size is less in 3.16?
[21:37] <smoser> yes. i had more info coming. but good job spotting the diff :)
[21:38] <smoser> I'm not terribly concerned about the difference, but its causing a failure on a
[21:38] <smoser> build of maas images
[21:39] <arges> smoser: so without looking at the code, I wonder if some meta-data structures changed size and thus the minimal size changed. Whats the failure?
[21:38] <smoser> after doing a bunch of apt-get installs and such to a loop mounted image, on trusty I try to 'resize2fs' to 1400M (arbitrary historic size) and it fits fine.
[21:47] <arges> smoser: one thing to try is to use the older e2fsprogs with the newer kernel to see if there are any calculation differences there
[21:47] <smoser> do you think that resize2fs is able to use anything from the kernel ?
[21:47] <smoser> i would not have thought it would.
[21:48] <smoser> but your suggestion would give more info to that
[21:48] <arges> smoser: there is a patch that adjusts the inode struct which means it could be padded differently and have a different size too.. haven't exhaustively searched
[21:49] <arges> the difference is 8114 bytes, but we have 145152 inodes... so maybe its a superblock change? 
[21:51] <arges> smoser: another thing to look at is config differences between 3.13/3.16 to see if something was turned on that could affect structure sizes
[21:51] <smoser> arges, but the file isn't mounted.
[21:52] <smoser> so i'm confused as to how the kernel would be involved.
[21:52] <smoser> isn't resize2fs just opening a file and looking around at it ?
[21:52] <arges> smoser: not sure. strace and find out
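A sketch of that check, assuming the image file name; -P asks resize2fs only to print its estimated minimum size:

    # Trace resize2fs while it computes the minimum size, then look for
    # anything that talks to the kernel beyond plain file I/O.
    strace -f -o resize2fs.trace resize2fs -P root-image.img
    grep -E 'ioctl|uname' resize2fs.trace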
[21:53] <arges> downloading the image here and looking at it
[21:54] <smoser> strace output: http://paste.ubuntu.com/9888176/
[21:56] <arges> smoser: so you see a few ioctls before 'Minimum' gets printed
[21:56] <arges> i'm not sure exactly how minimum size is calculated though.
[22:01] <smoser> what i'm doing is in lp:maas-images.  something like this ends up getting run:
[22:01] <smoser>   time maas-cloudimg2eph2 -vv --kernel=linux-generic --arch=arm64 \
[22:01] <smoser>      $url root-image.gz \
[22:01] <smoser>     --krd-pack=linux-generic,boot-kernel,boot-initrd 2>&1 | tee out.log
[22:01] <smoser> and on utopic, it doesn't fit into 1400M and on trusty it does.
[22:01] <smoser> and i think (doing a more controlled test now) that its significantly different.
[22:02] <arges> smoser: do you know what size it ends up being overall?
[22:03] <smoser> i'll have that later tonight. just started a run on utopic and one on trusty.
[22:04] <smoser> i'll keep the images around too so i can grab more debug info on them.
[22:04] <smoser> but i have to go afk for a while. thanks for your thoughts.
[22:04] <arges> smoser: hmm running this in vivid makes me run 'e2fsck -f *' first, then I get 290304 (this is on 3.19)
[22:05] <arges> smoser: sure. this might be worth filing a bug at some point. feel free to ping me again
[22:05] <smoser> hm..
[22:05] <smoser> the downloaded image is dirty ?
[22:05] <arges> that's what resize2fs 1.42.12 says
[22:06] <smoser> hm..
[22:06] <arges> md5 sum matches
[22:06] <smoser> that could be relevant.
[22:06] <smoser> well, you see the output that i got, it doesn't complain. 
[22:06] <smoser> maybe i'll try more liberally sprinkling 'e2fsck -fy the-image'
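A sketch of that step, keeping the 'the-image' placeholder from the line above; force a full fsck first and then re-check the estimate:

    # Check and repair the filesystem non-interactively, then ask for
    # the minimum size again to see whether the estimate changes.
    e2fsck -fy the-image
    resize2fs -P the-image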
[22:07] <arges> 1.42.12 vs 1.42.9 wonder if there is some fix that exposes this... ugh
[22:09]  * arges tries a trusty image for the heck of it
[22:12] <arges> trusty image on vivid doesn't complain that i need to run e2fsck. size on 3.19/vivid 236103, 3.13/trusty 230329 so still a difference
[22:20] <arges> smoser: if you want i can do a bisect and figure out which commit made it blow up in size.
[23:34] <smoser> arges, 
[23:34] <smoser> trusty: http://paste.ubuntu.com/9889226/
[23:34] <smoser> utopic: http://paste.ubuntu.com/9889235/
[23:35] <smoser> basically, note that 'df' reports used space differing only from '980200' to '980216'
[23:36] <smoser> but resize2fs on trusty reports min size of ~1073M versus ~1582M on utopic.
[23:36] <smoser> resize2fs does claim "estimated" size, but you'd hope it could estimate a bit closer than that.
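For reference, the comparison being made is roughly this; the image name and mount point are illustrative:

    # Loop-mount the image and note 'df' usage...
    mount -o loop root-image.img /mnt
    df -k /mnt            # 'Used' column, 1K blocks (~980200 vs ~980216 above)
    umount /mnt
    # ...then compare against resize2fs's estimated minimum size.
    resize2fs -P root-image.img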
[23:38] <Diziet> apw: (git-daemon timeout) Thanks.