/srv/irclogs.ubuntu.com/2013/01/23/#ubuntu-kernel.txt

=== yofel_ is now known as yofel
=== chiluk is now known as chiluk_away
=== chiluk_away is now known as chiluk
ricardo-hehay03:54
=== chiluk is now known as chiluk_away
ppisatimoin09:04
=== henrix_ is now known as henrix
jodh`apw: I think I may be seeing the same problem we discussed yesterday where your server dies on boot.10:21
jodh`apw: I've got similar behaviour over here on a macbook, but it's definitely a kernel issue fwics10:21
shadeslayerogasawara: ping10:39
shadeslayerogasawara: golem.de (http://www.golem.de/news/linux-distribution-ubuntu-erwaegt-umstieg-auf-rolling-releases-1301-97086.html) says that Ubuntu will become a rolling release, citing you as the source10:40
shadeslayer( I haven't watched the video just yet )10:41
=== yofel_ is now known as yofel
xnoxshadeslayer: it's not like that hasn't been pondered before... as usual, it's all moot until there are clear quality, testing & stability criteria.11:01
shadeslayerokay, it's just that the tone of golem.de is "We're going to switch, soonish"11:02
shadeslayerbut then again, it might be just google translate11:02
=== henrix is now known as henrix_
=== henrix_ is now known as henrix
psivaajodh: just wondering if you had a chance to test the fix that you mentioned for bug 110038612:11
ubot2Launchpad bug 1100386 in linux (Ubuntu) "Raring server installations on VMs fail to reboot after the installations" [High,Confirmed] https://launchpad.net/bugs/110038612:11
apwjodh, yeah it seems to be made happier by 'nomodeset' on my virtual instances, which suggests the resemblance to yours is coincidence, and likely we are losing all console output12:13
apwjodh, do you have any symptoms other than a hang ?  a nice panic perhaps?12:14
jodhapw: no - nothing.12:15
jodhpsivaa: been attempting to test it all morning but everything I've touched so far today has broken (including kernel, compiler, apport, unity, router, sbuild :()12:16
psivaajodh: okay, good luck for the rest of the day :)12:41
jodhpsivaa: fix for bug 1096531 now uploaded.13:11
ubot2Launchpad bug 1096531 in upstart (Ubuntu) "After touch /forcefsck and reboot: Assertion failed in log_clear_unflushed" [High,In progress] https://launchpad.net/bugs/109653113:11
psivaajodh: thanks13:19
* henrix -> lunch13:24
apwpsivaa, do we know when the boot hang at fsck started ?13:25
apwpsivaa, when did it show up in your testing13:25
psivaaapw: it's after fsck started13:25
apwpsivaa, yes that one, what version did it first appear, what version of the kernel13:26
psivaaapw: i need to check, 2 mins 13:27
apwpsivaa, thanks13:27
=== rsalveti_ is now known as rsalveti
psivaaapw: it's fsck from util-linux 2.20.113:34
psivaaif that's what you asked :)13:35
apwpsivaa, i think this is possibly a kernel issue, as you have only recently noticed it13:36
apwpsivaa, i assume you have an image which does not show this, and one that does, i want to know the kernel version in the last working one 13:36
psivaaapw: i have a server image from the 9th Nov, but iirc this issue was noticed early jan, so checking if we have a newer image13:40
=== henrix is now known as henrix_
=== henrix_ is now known as henrix
ogasawarashadeslayer: there might be some issue with translation there, you should probably just watch the video.  I mentioned that a rolling release has only been a discussion at this point and nothing set in stone.  as xnox also noted, I mentioned in the interview the quality and stability aspects need to be in place before such a plan could move forward.13:53
shadeslayerI see13:54
shadeslayerIt's just that I don't have enough bandwidth at the moment to stream :()13:54
shadeslayer:)13:54
xnoxshadeslayer: you should definitely watch it. ogasawara was a real star in that interview: informative and inspiring. Overall very entertaining hangout.13:55
shadeslayer:D13:57
=== henrix is now known as henrix_
=== henrix_ is now known as henrix
apwsforshee, so i have not been using it very long, but my preliminary feeling is that this brcm fix has worked, i will be using it a bit longer before i say for sure ... but i have been bitching much less about my machine with it14:20
sforsheeapw, great. It tested well for me too. It's already on the linux-wireless list, so I expect it in the next 3.8-rc.14:21
sforsheeapw, otherwise is brcmsmac working well on your machine?14:22
apwsforshee, i think it is ok yeah14:27
apwsforshee, if you get me the commit i'll push it to raring as a sauce patch, and it will either rebase out of existence or not depending on whether it makes the next -rc14:28
apwas it is very annoying :)14:28
sforsheeapw, ack. I'll send it to the list in a bit.14:29
wookeyHmm. I see that some bright spark decided that the variable holding the kernel's version of DEB_HOST_ARCH (e.g. 'arm') should be called 'build_arch'14:33
wookeyIt's only used in the rules files. Would people object if I sent in a patch to rename it 'host_arch' to reduce cognitive dissonance?14:34
wookey(I know the GNU naming is bloody confusing, but as we are already using it for everything else it seems perverse to use this)14:34
wookeyI guess it was called 'build' to distinguish it from 'header'. 14:35
apwwookey, i guess that would be ok14:35
apwwookey, i hate all of those names ... sigh14:35
wookeyyes it really hurts your head, especially if you don't do it all the time14:36
apwwookey, oh but is that really what it is ?14:36
wookeySo anyone coming to it fresh is likely to get it all wrong14:36
wookeydo you know why we have 'build_arch and 'header_arch'? when are they different?14:36
=== chiluk_away is now known as chiluk
apwbuild_arch is actually the kernel arch one uses when making things14:37
apwso the arch you are building for, is that HOST14:37
wookeycorrect14:37
wookeyand the arch you are building on is BUILD14:37
apwif you are changing it maybe kernel_arch might make more sense, as it is not an _ARCH in the sense of those arches anyhow14:38
wookeyand tools stuff probably doesn't keep all this straight. I've fixed one bug. looking for more now. 14:38
wookeyright. that's a better name. thank you14:38
apwand would probably match header_arch as well14:38
apwheader_arch used to be different back when 32bit and 64bit x86 were different in the kernel, so it is possible it could reappear14:39
wookeyOK14:39
wookeyI'm not sure which is correct when building tools (which use that arch to find the syscalls, for example)14:40
wookeydoesn't matter for 'arm' or 'arm64' but I'd like to get it right...14:40
wookeyI suspect header_arch...14:41
apwwookey, you are likely right, hard to be sure15:12
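The naming tangle above boils down to three separate things: BUILD (the machine doing the compiling), HOST (the machine the result runs on) and the kernel's own name for that HOST architecture (what apw suggests calling `kernel_arch`). The Python sketch below only illustrates that split; the `DEBIAN_TO_KERNEL_ARCH` table is an assumption covering a handful of architectures (mapped to their `arch/` directory in the kernel tree), not the Ubuntu packaging's actual `build_arch`/`header_arch` settings.

```python
# Illustrative subset only (assumption): Debian arch name -> kernel arch/ directory.
DEBIAN_TO_KERNEL_ARCH = {
    "armel": "arm",
    "armhf": "arm",
    "arm64": "arm64",
    "i386": "x86",
    "amd64": "x86",
}


def describe_build(deb_build_arch, deb_host_arch):
    """Return (is_cross, kernel_arch) for a package build.

    deb_build_arch: the machine we build *on* (GNU/Debian BUILD).
    deb_host_arch:  the machine the result runs *on* (GNU/Debian HOST).
    The kernel arch is derived from HOST, because that is what we build for.
    """
    is_cross = deb_build_arch != deb_host_arch
    kernel_arch = DEBIAN_TO_KERNEL_ARCH[deb_host_arch]
    return is_cross, kernel_arch
```

For example, cross-building armhf packages on an amd64 machine gives `describe_build("amd64", "armhf") == (True, "arm")`: the kernel arch follows HOST, never BUILD.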
apwjodh, hey ... ok these machines are actually not dead at all, in fact we have just lost the console driver in toto.  i need to find out what upstart thought was going on during early boot, is that recorded?15:13
jodhapw: how 'early' are we talking?15:14
apwjodh, i want to know what triggered plymouth to start15:16
apwstart on (started plymouth15:17
apw          and (graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=115:17
apw               or drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=115:17
apw               or stopped udev-fallback-graphics))15:17
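The `start on` stanza pasted above (presumably from plymouth-splash, judging by the env dump apw reads from it later) needs `started plymouth` plus any one of three display events. A minimal Python model of how that condition evaluates, assuming events are given as (name, environment) pairs; it deliberately ignores Upstart's real matching and state-reset semantics and is only meant to show why an efifb `graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1` is enough to fire the job before cirrus appears.

```python
def _seen(events, name, **env):
    """True if some event called `name` carries all of the given env values."""
    return any(
        ev_name == name and all(ev_env.get(k) == v for k, v in env.items())
        for ev_name, ev_env in events
    )


def plymouth_splash_condition(events):
    """Model of:
    start on (started plymouth
              and (graphics-device-added PRIMARY_DEVICE_FOR_DISPLAY=1
                   or drm-device-added PRIMARY_DEVICE_FOR_DISPLAY=1
                   or stopped udev-fallback-graphics))
    For started/stopped events the positional argument matches JOB.
    """
    plymouth_up = _seen(events, "started", JOB="plymouth")
    display_ready = (
        _seen(events, "graphics-device-added", PRIMARY_DEVICE_FOR_DISPLAY="1")
        or _seen(events, "drm-device-added", PRIMARY_DEVICE_FOR_DISPLAY="1")
        or _seen(events, "stopped", JOB="udev-fallback-graphics")
    )
    return plymouth_up and display_ready
```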
jodhapw: could have been started in the initramfs.15:17
apwjodh, basically i want to know what lit off that15:17
apwjodh, i want to know what order and when these events occurred in the boot15:18
apwjodh, as we seem to be lighting off plymouth just before we switch framebuffers instead of just after15:18
apwjodh, is that level of detail recorded anywhere on disk, can i ask for it15:19
jodhapw: your best bet if you can't get a console log is to have plymouth.conf call 'set' or 'env' and write a persistent file. That will contain UPSTART_EVENTS (see http://upstart.ubuntu.com/cookbook/#standard-environment-variables).15:19
jodhapw: that env var will tell you which events triggered plymouth to start.15:20
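jodh's suggestion amounts to having the job dump its environment to a persistent file (for instance something like `env > /plymouth-env.txt`; that path is hypothetical) and reading `UPSTART_EVENTS` back after boot. A small Python helper for the reading side, assuming the file holds one `NAME=value` line per variable as printed by `env` (multi-line values are not handled) and that `UPSTART_EVENTS` is a space-separated list of event names, as the cookbook page linked above describes. The sample values in the usage below are made up.

```python
def parse_env_dump(text):
    """Parse `env`-style output (one NAME=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            env[name] = value
    return env


def triggering_events(text):
    """Return the event names listed in UPSTART_EVENTS, or [] if absent."""
    return parse_env_dump(text).get("UPSTART_EVENTS", "").split()
```

Usage: `triggering_events(open("/plymouth-env.txt").read())` would list the events that started the job on that boot.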
apwjodh, ahh will have a go at that15:21
bjfarges, jsalisbury any progress on the inotify bugs?15:23
argesbjf: testing today on an earlier version15:25
jsalisburybjf, I requested testing of earlier kernels in all the bugs, I'll go check for updates now15:30
bjfarges, jsalisbury have you been able to reproduce the issue?15:30
argesbjf: ok i'm looking at 1101666 and i haven't been able to reproduce yet15:33
bjfarges, ack, was just wondering if we were dependent on testing by the reporters15:34
argesbjf: so for this bug, essentially it's to run this test program and expect an error of 'inotify_init: Too many open files'? am i missing something15:36
bjfarges, that was my reading of the bug15:36
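The reproducer discussed here creates inotify instances in a loop until `inotify_init` fails with "Too many open files" (EMFILE, which inotify_init(2) also returns when the per-user instance limit is reached). The actual test program and bash script from the bug comments are not reproduced; the Python sketch below is a hypothetical equivalent that creates and immediately closes one instance per iteration, so on a kernel that releases instances correctly it should never fail. The `init`/`close` parameters exist so the loop can be exercised without a real kernel.

```python
import ctypes
import errno
import os


def libc_inotify_init():
    """Call inotify_init(2) through libc (Linux only); raise OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    fd = libc.inotify_init()
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


def first_emfile_iteration(iterations, init=libc_inotify_init, close=os.close):
    """Create and immediately close one inotify instance per iteration.

    Returns None if every iteration succeeds. If closed instances are
    still being counted against a limit (the suspected leak), init()
    eventually fails with EMFILE and the 1-based iteration is returned.
    Any other error is re-raised.
    """
    for i in range(1, iterations + 1):
        try:
            fd = init()
        except OSError as e:
            if e.errno == errno.EMFILE:
                return i
            raise
        close(fd)
    return None
```

On a Linux machine, `first_emfile_iteration(300)` exercises the real syscall, roughly matching the 300 and 800 iteration runs reported later in this log.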
* ogasawara back in 2015:36
argesbjf: i've tried in a 3.2.0-36-generic kernel, and a 3.2.0-32-virtual15:36
argesand no luck reproducing15:37
argesalso looks like this user is reproducing on vmware fwiw15:37
argeswhich i don't have15:37
jsalisburybjf, arges, so far only one person has done some testing.  3.2.0-36 the bug and 3.2.0-35 does not.  I've also asked for testing of 3.2.0-36.56 versus 3.2.0-36.57, but don't have those results yet.15:37
jsalisburys/3.2.0-36/3.2.0-36 has/15:38
bjfjsalisbury, arges so this bug has been around for more than one release as i understand it so it's not a regression that just went in15:38
argesjsalisbury: also looks like lino (the patch author) is looking at this too15:38
bjfjsalisbury, arges though i'd like to get it resolved quickly and into the point release, i'm not going to wait a long time for it15:39
argesok i see a bash script in the comments, trying that too15:39
jsalisburyarges, bjf, There is also this comment in one of the bugs: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1101355/comments/615:40
ubot2Ubuntu bug 1101666 in linux (Ubuntu Quantal) "duplicate for #1101355 inotify fd leak" [Medium,In progress]15:40
bjfjsalisbury, yes we have seen that. we generally don't fix a regression by applying a bunch of new commits15:41
jsalisburybjf, arges, however, I haven't looked at those patches as of yet.15:41
jsalisburybjf, ack15:41
jsalisburyarges, I haven't tried to reproduce it yet, are you working on that?  15:42
bjfjsalisbury, but i'm not ruling it out15:42
argesjsalisbury: i'm running the bash script in 1101666 comment #2515:42
jsalisburyarges, ok15:43
argesso far no problem on bare metal 3.2.0-3615:43
argesafter iteration 15815:43
argeswait15:43
argesnope I see the same issue15:43
argestesting on earlier version 15:43
jsalisburyarges, are you running .56 or .57?15:43
argesumm15:44
argesjsalisbury: this is a kernel with that headset patch on top... so let me re-test with an official build15:45
argestoo many things flying around15:45
henrixjsalisbury: arges: i was able to reproduce the issue on a VM (kvm), after iteration 11715:48
argeshenrix: which version?15:48
henrixarges: the version in -updates15:48
henrix3.2.0-36.5715:49
argeshttps://bugs.launchpad.net/ubuntu/+source/linux/+bug/1101666/comments/3315:49
ubot2Ubuntu bug 1101666 in linux (Ubuntu Quantal) "inotify fd leak" [Medium,In progress]15:49
argesthis is also interesting15:49
henrixok, so i'll just build a test kernel reverting the fsnotify commits we added on 3.2.0-36.57 15:51
henrixarges: jsalisbury: or do you have another suggestion?15:52
argeshenrix: no however do we think this is already verified?15:52
jsalisburyhenrix, shouldn't that just be 3.2.0-36.56 ?15:52
argesand it looks like that fails from reading the comments. the only fsnotify patches are the ones i SRUed15:53
henrixjsalisbury: well, 3.2.0-36.56 actually contains some of these patches already. 3.2.0-36.57 reverts some SAUCE patches and applies them again from mainline15:53
argeshenrix: did we test 3.2.0-36.57?15:53
jsalisburyhenrix, ok, got it.15:53
apwso it seems we'd want to know if .56 was good (i believe we are saying .57 is bad)15:54
argesalthough the fsnotify patches in .56 (the original SAUCEd ones) exclude a single patch from that 9-patch series15:54
henrixyes, .57 is bad. but most probably .56 is also bad.15:55
henrixif you run git log --oneline Ubuntu-3.2.0-36.56..Ubuntu-3.2.0-36.57 you'll see they are pretty much the same, apart from one patch15:55
henrix.57 = .56 + 692115c15:55
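`git log A..B` lists the commits reachable from B but not from A, which is how the comparison of .56 and .57 above is read. A toy Python model of that set difference over a hypothetical linear history (only `692115c` comes from the discussion; the other commit names are invented, and git's output ordering is ignored):

```python
def reachable(parents, tip):
    """All commits reachable from `tip` by following parent links (inclusive)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def two_dot_range(parents, a, b):
    """Commits in a..b: reachable from b but not from a, like `git log a..b`."""
    return reachable(parents, b) - reachable(parents, a)


# Hypothetical history: each commit maps to its list of parents.
HISTORY = {
    "c1": [],
    "v56": ["c1"],          # stands in for the Ubuntu-3.2.0-36.56 release commit
    "692115c": ["v56"],
    "v57": ["692115c"],     # stands in for the Ubuntu-3.2.0-36.57 release commit
}
```

Here `two_dot_range(HISTORY, "v56", "v57")` yields `{"692115c", "v57"}`, while the reverse range is empty.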
henrixand i don't think we want to leave 692115c out15:56
argesno15:56
argesbut I think its clear it causes a regression15:56
argesdespite fixing something else15:56
apwwell, knowing if .56 was good would tell us that indeed; is .56 good?15:57
henrixok, i'll build .56 and test it15:57
jsalisburyhenrix, you can get .56 from here: https://launchpad.net/ubuntu/+source/linux/3.2.0-36.5615:58
henrixjsalisbury: ack, thanks15:58
jsalisburyhenrix, under "Builds" per arch15:58
argesalso keep in mind that these 9 patches are in ubuntu-raring too15:59
argesso i can test raring as well15:59
apwjodh, ok that shows what i feared, that the efifb is being detected as a sane framebuffer, then recycled16:00
jsalisburyarges, I believe the patches in 3.8 are different 16:00
argesjsalisbury: well the ones in 3.2 3.5 are backports of the 3.8 patches16:01
apwslangasek, can you remember whether we expect to use efifb as anything other than a fallback during bootsplash ?16:01
apwslangasek, ie are there systems where that would be the only one commonly16:01
slangasekapw: I don't know about commonly, but if there's no other framebuffer available I think we would wind up just using efifb?16:01
jsalisburyarges, ok, were they backported from commit: 96680d2b9174668100824d763382240c71baa811 ?16:02
apwslangasek, the issue i am seeing is in VMs, efifb comes up, we splash to it (or tell plymouth to anyhow) and then cirrus comes up and replaces it, failing cause it cannot replace the framebuffer16:03
argesjsalisbury: that's the merge... 16:03
apwslangasek, in the bad old world we had the same issue with vesafb and it went onto the 'fallback' thing to change the order16:03
jsalisburyarges, ack16:03
apwslangasek, efifb is triggering the same badness i believe here, and so i am considering whether we can not consider efifb as a primary display just as a fallback16:04
argestesting .56 btw16:04
argesjsalisbury: henrix : .56 also fails for me16:06
arges3.2.0-36.5616:06
apwjodh, your mac which was showing perhaps the same issue.  for me the machine is coming up successfully with a completely dead console (no VTs no nothing)16:08
apwjodh, but it is on the network, can you tell if yours is the same, if you can login have a look at /proc/fb which in my failing case was empty16:08
apwjodh, if you can login i would also like a dmesg16:09
jodhapw: will try to navigate my menu "blind" and see what happens :)16:09
henrixarges: i'm about to test it too16:10
argesjsalisbury: Ubuntu-3.8.0-1.5 works fine up to 300 iterations of this bash script 16:10
apwjodh, would you not be beyond menus when it fails, ie booting16:10
argeshenrix: ok yours should fail also16:11
henrixarges: yep :)16:11
jodhapw: no - my menu waits for input by design. I can turn it off though...16:11
apwjodh, oh you have your own menu after boot16:14
apwsomething i would not see16:14
jodhapw: you're right - I've got it in the odd state now. There is *nothing* in dmesg for the kernel - first entries are for upstart.16:15
apwso you can login ?16:15
jodhapw: to be more precise, nothing from the kernel until *after* upstart has started.16:15
jodhapw: logged in now.16:15
apwjodh, well that is odd (re: dmesg being empty of kernel things)16:15
jodhapw: keyboard lights fail to work which coupled with a blank screen led me to believe it was dead.16:15
apwjodh, what does /proc/fb have in it16:15
apwjodh, but can i have dmesg regardless please16:16
jodh0 inteldrmfb\n1 VESA VGA16:16
jodhapw: sure...16:16
henrixarges: ok, took me a while to reproduce as i was booting the wrong kernel :p16:16
apwjodh, ok do you have upstart debug on, maybe we are losing the early dmesg due to size16:17
argesoops16:17
apwjodh, i am not sure i expect to see VESA loaded there, if you have inteldrmfb loaded ...16:17
apwslangasek, ^^ we only load vesafb if we don't get a normal driver in the normal run of things right ?16:17
apwjodh, let me know where you have put the dmesg, pastebin it perhaps16:18
jodhapw: yeah - one sec. wondering if I can run 'apport-collect' again for the bug?16:19
apwjodh, hmm no idea16:19
jodhapw: seems you can: https://launchpadlibrarian.net/129184147/BootDmesg.txt (from bug 1103406)16:22
ubot2Launchpad bug 1103406 in linux (Ubuntu) "kernel disabled console output and keyboard lights in early boot" [High,Confirmed] https://launchpad.net/bugs/110340616:22
slangasekapw: cirrusfb> isn't there an open bug report about this, which I filed and smb worked on for a bit?  Considering that we have *no console* in the EFI world until efifb is loaded, I don't think it's sane to treat efifb as a "fallback"16:30
apwjodh, ok ... can we get an env >/FOO from -fallback-graphics on your system please16:31
bipulI am looking for some help writing C code for packet capture with libpcap on Linux (Ubuntu)16:31
apwjodh, i want to know if it is seeing the intelfb right16:31
slangasekapw: vesafb> yes, AFAIK we only load that via the upstart job you wrote for this16:31
jodhapw: on it...16:31
apwslangasek, so yeah we don't have a console till efifb starts true, but we get those when we are going to get a better one later as well16:32
apwslangasek, which sounds like the vesafb issue in a sense16:32
ogra_there is also /usr/share/initramfs-tools/scripts/init-top/framebuffer16:32
bipulAnyone here who can help me?16:32
apwslangasek, we want to use efifb, but only if we don't have something more useful16:32
slangasekogra_: which is also "fallback" in its logic16:32
ogra_(indeed only if FRAMEBUFFER exists)16:32
slangasekapw: why can we not fix the efifb->cirrusfb handoff?16:33
slangasekI don't think we want the kernel to delay all video output until it has a chance to probe around and try to find a chip-specific video driver16:33
apwslangasek, this is exactly the same handoff issue we had with vesafb, wherein upstream said don't do that16:34
slangasekhmm, is it definitely the same16:34
slangasek?16:34
apwslangasek, output will come out on it, the issue is if plymouth has it open when the kernel tries to replace it16:34
apwslangasek, then it cannot rip out efifb correctly, nor replace it with cirrus so we have none16:35
apwslangasek, so i am not suggesting no output, just no splash on it16:35
slangasekah16:35
apwso i guess it would be purple from grub if that is on16:36
apwor in my case it has kernel dmesg vomit16:36
apwas it is a server image16:36
slangasekso you intend efifb to not be autoloaded?16:37
apwi seem to be able to 'fix' my machine here by preventing efifb fb0 from triggering PRIMARY_DEVICE_FOR_DISPLAY16:37
apwslangasek, i think it is builtin for reasons i now forget, so not stopping it autoload, just not marking it PRIMARY_16:38
apwthen it gets used, as a fallback rather than first16:38
apwat least i think it should; it is very hard to tell here, as it's not initialised long enough before the machine boots and turns off splash16:38
jodhapw: http://paste.ubuntu.com/1563545/16:39
slangasekapw: hmm, ok16:39
slangasekI guess that sounds sane16:39
jodhapw: ... but oddly, this time /proc/fb just contains '0 inteldrmfb'16:39
apwjodh, in this case is /proc/fb showing the same two lines ?16:40
apwjodh, ok that one seems a 'good' boot then16:40
slangasekapw: can you confirm that if you make this change to efifb, the per-chip driver *does* get marked PRIMARY_ in its place?16:40
jodhapw: no - this boot was also 'stuck'.16:40
apwslangasek, it does in my VM here that triggers the issue, cirrus comes along very soon after and makes the same things happen16:41
apwjodh, and a dmesg off that one16:41
slangasekapw: "makes the same things happen" - I specifically want to know that it sets the PRIMARY_ flag, such that we don't have to wait for udev to settle before we get splash init16:42
argesjsalisbury: ok i'm building a precise kernel with patches identified here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1101355/comments/616:42
ubot2Ubuntu bug 1101666 in linux (Ubuntu Quantal) "duplicate for #1101355 inotify fd leak" [Medium,In progress]16:42
apwjodh, i did find in my testing that adding env > can change the timing enough to change the issue, which is also annoying16:42
jodhapw: updated dmesg - http://paste.ubuntu.com/1563556/16:42
jsalisburyarges, cool.  were those patches just missing from the original backport?16:42
apwslangasek, sorry yes, that is what i am saying; env from fallback has16:43
jodhapw: ack16:43
slangasekapw: ok, then that sounds sane16:43
apwslangasek, sorry from plymouth-splash has:16:43
apwPRIMARY_DEVICE_FOR_DISPLAY=116:43
apwand what looks to be a sane pci id for the cirrus fake thing16:43
argesjsalisbury: no, the original backport that fixed the bug i was looking at was 9 patches16:44
apwslangasek, i assume where we only have efifb splash will be late of course16:44
argesthese 6 additional ones came later16:44
jsalisburyarges, ahh, ok16:44
apwslangasek, ok i will put together a patch for udev and we can get it tested officially16:45
argesanyway building. i want to confirm that these new patches fix _both_ issues16:45
slangasekapw: we may also need/want to touch udev-fallback-graphics to not try to load vesafb on top of efifb16:46
apwslangasek, a point indeed, hmm i wonder what happens there16:48
apwslangasek, i'll try preventing cirrus from loading in my case and see16:48
slangasekok16:48
argesok uploading16:57
apwslangasek, ok it seems that we will try and load vesafb (as we surmised) but that actually that is ok because it fails16:59
apwERROR: could not insert 'vesafb': No such device16:59
apweven with a VGA device installed rather than cirrus, because efifb has it, it fails to load17:00
slangasekright17:00
slangasekthough we preferably would know this and fail silently instead of throwing noise to /var/log/upstart17:01
apwi guess we could check /proc/fb for EFI VGA as well if we wanted to be paranoid17:01
slangasek(IMHO)17:01
argesjsalisbury: henrix : please test http://people.canonical.com/~arges/lp1101666/17:02
apwslangasek, oddly it doesn't seem to record anything in there17:03
apwslangasek, for that job even when i believe it ran and failed17:04
slangasekhmm, odd17:04
slangasekwell, we do call modprobe with '-q'?17:05
apwslangasek, ahh yes, -q -b17:05
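The fallback logic discussed here (apw's idea of also checking /proc/fb for "EFI VGA" before trying vesafb, plus the `modprobe -q -b` flags that keep the failure silent) can be modelled in a few lines. This Python sketch is hypothetical: the real udev-fallback-graphics job is an Upstart shell job, and the decision rule below only mirrors apw's suggestion. /proc/fb is assumed to hold one `<index> <name>` line per framebuffer, as in jodh's `0 inteldrmfb` / `1 VESA VGA` output.

```python
def parse_proc_fb(text):
    """Parse /proc/fb ("<index> <name>" per line) into {index: name}."""
    fbs = {}
    for line in text.splitlines():
        index, sep, name = line.strip().partition(" ")
        if sep and index.isdigit():
            fbs[int(index)] = name
    return fbs


def fallback_modprobe_cmd(proc_fb_text):
    """Hypothetical fallback-graphics decision.

    Returns None when efifb ("EFI VGA") already owns a framebuffer, since
    loading vesafb on top would only fail with "No such device"; otherwise
    returns the modprobe argv to try.
    """
    if "EFI VGA" in parse_proc_fb(proc_fb_text).values():
        return None
    # -q: don't complain about modules modprobe cannot find or insert quietly
    # -b: apply blacklist entries from the modprobe configuration
    return ["modprobe", "-q", "-b", "vesafb"]
```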
=== henrix is now known as henrix_
=== henrix_ is now known as henrix
jsalisburyarges, will do shortly.  just need to setup an environment to reproduce.17:08
argesjsalisbury ok i'm testing on this end17:09
henrixarges: do you have a link to gomeisa/tangerine with the .debs?17:10
argeshenrix: gomeisa.buildd:lp1101666/17:10
henrixarges: cool, thanks17:10
arges300 iterations without a failure...17:12
argesbrb17:12
arges800 iterations without a failure17:20
* ppisati -> gym17:24
jsalisburyarges, I'll test in about 10 minutes.  Just finishing up another bug17:27
henrixarges: same here. and i've just started to run in parallel the inotify stress test that should trigger the original bug that should have been fixed by 3.2.0-36.5717:27
apwarges, so is that a further 6 patches on top of what we have already in the tree ?17:28
psivaaapw, so the hang issue started with Ubuntu 3.7.0-6-generic (20121213 raring server and later ones)17:34
psivaaapw: i tried with 3.7.0-5-generic and earlier ones (20121212 and earlier images) and could not see the hang17:35
jsalisburyarges, testing your kernel now.  I changed the script to run for 3000 iterations, so I'll just let it go17:36
jsalisburyarges, crap, got a panic17:37
henrixjsalisbury: inotify related, or something else?17:37
jsalisburyhenrix, idk yet.  Happened right at the login screen.  Booting again.17:38
henrixouch17:38
jsalisburyhenrix, ok, booted fine that time.  Let me do a few reboots, then I'll kick off the test.17:40
jsalisburyhenrix, arges, got another panic right at the login screen.  The RIP is effective_load() in the scheduler.17:50
jsalisburyhenrix, arges, This may be another bug, so let me see if I can reproduce it with the Precise kernel 17:51
henrixjsalisbury: just curious, are you running on baremetal or VM?18:11
jsalisburyhenrix, baremetal18:37
jsalisburyhenrix, I let the script run that reproduces the bug, and it's at 2300 iterations without reproducing it.18:37
jsalisburyhenrix, I'll see if I can reproduce the panic shortly.18:38
* ppisati goes to find some food...19:55
apwpsivaa, thanks for the info20:02
ogasawarajsalisbury_, henrix: what's the bug# for this fsnotify regression?  I wanted to throw it in this weekly status report20:22
jsalisbury_ogasawara, bug 110166620:23
ubot2Launchpad bug 1101666 in linux (Ubuntu Quantal) "inotify fd leak" [Medium,In progress] https://launchpad.net/bugs/110166620:23
ogasawarajsalisbury_: thanks20:23
=== jsalisbury_ is now known as jsalisbury
=== henrix is now known as henrix_
