=== martink3_ is now known as martink3 | ||
kernelbug | problems with bisection... | 13:36 |
---|---|---|
kernelbug | what if the kernel is not listed here? http://people.canonical.com/~kernel/info/kernel-version-map.html | 13:36 |
kernelbug | how i'm supposed to map anything then? :) | 13:36 |
kernelbug | i ran mainline-build-one and got errors... | 13:37 |
kernelbug | fatal: invalid reference: Ubuntu-3.18.0-7.8 | 13:38 |
kernelbug | vivid-amd64: chroot not found (::,) | 13:38 |
mamarley | apw: So I figured out the cause of my problem. It wasn't actually a regression in the kernel, it is the "watchdog" service. If I stop that service, suspend works perfectly. Odd... | 13:40 |
ogra_ | wow | 13:42 |
ogra_ | there is an option in /etc/default/watchdog to disable it permanently | 13:43 |
ogra_ | (and please fil a bug too) | 13:44 |
mamarley | I want to try it on another box first to see if it has the same problem there. | 13:44 |
ogra_ | good idea | 13:44 |
mamarley | I actually had to add an [Install] section with a WantedBy to get it to work at all, so I am afraid any bug I file would be invalid anyway. | 13:47 |
kernelbug | so? :) | 13:48 |
kernelbug | why errors? | 13:48 |
kernelbug | :< | 13:52 |
mamarley | I just tried another system with the iTCO_wdt device and watchdog enabled. It does not seem to suffer from the same issue. | 13:53 |
kernelbug | am i ignored or are you just busy? :) | 13:57 |
apw | or perhaps, maybe, it is the weekend? the first error implies you are not in the appropriate git repo, the second that you do not have build chroots created | 14:11 |
kernelbug | apw, hmm | 14:17 |
kernelbug | apw, maybe it's morning somewhere ;) | 14:17 |
kernelbug | apw, well i followed the instructions | 14:18 |
kernelbug | https://wiki.ubuntu.com/Kernel/KernelBisection | 14:19 |
kernelbug | apw, "HEAD is now at 32ac5b4... UBUNTU: Ubuntu-3.19.0-56.62" | 14:49 |
kernelbug | apw, why 3.19? :) | 14:49 |
kernelbug | well, see you later... | 14:59 |
=== DevBox|2 is now known as DevBox | ||
alkisg | Hi guys, post the 3.2 kernel I'm getting a 10-second delay at boot: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1259861 | 17:01 |
ubot5 | Launchpad bug 1259861 in linux (Ubuntu) "5-10 second delay in kernel boot" [Medium,Confirmed] | 17:01 |
alkisg | The weird thing is that it happens in some real hardware and under virtualbox, but I was not able to reproduce it under KVM | 17:02 |
alkisg | I'm guessing there's some "wait up to 10 seconds for that event" code somewhere, but how can I pinpoint it to help in solving the issue? | 17:03 |
alkisg | It's Ubuntu specific, I haven't seen it in any vanilla kernels so far | 17:03 |
TJ- | alkisg: does a boot with 'debug' indicate where the delay is by the last messages before the delay? | 17:17 |
alkisg | TJ-: no, I tried with debug and it made no difference in dmesg | 17:19 |
alkisg | And booting with various clients, the messages I see around the delay are not the same | 17:19 |
alkisg | TJ-, it's possible that it happens on your PC as well, you could run dmesg and see... | 17:20 |
apw | alkisg, hmmmm, there must be a pattern | 17:20 |
apw | citainly i do not have a 10s that i can see | 17:20 |
TJ- | alkisg: which exact Ubunt kernel version do you see it on first (after 3.2) - we can go back through the commits from that | 17:20 |
TJ- | alkisg: no, it doesn't happen for me :) | 17:21 |
alkisg | I think that if I run the stock ubuntu live cds in virtualbox, it always happens | 17:21 |
alkisg | So I can test with e.g. 12.04.1 (it doesn't), 12.10, 12.04.2... | 17:21 |
alkisg | Will that help? | 17:21 |
* alkisg has something running in kvm and can't use virtualbox right now, but will try it in half an hour or so when it finishes | 17:26 | |
alkisg | One dmesg from 16.04 on some i5, netbooted with LTSP: http://paste.debian.net/417051/ | 17:31 |
alkisg | The delay is before the initramfs gets loaded, at 2 => 12 sec | 17:31 |
ogra_ | different compressions ? | 17:32 |
infinity | That kinda looks like you have a sleep 10 in your initrd. :P | 17:33 |
infinity | Which would be a local thing (or a weird package), cause I've never seen it. | 17:34 |
infinity | alkisg: rgrep sleep /usr/share/initramfs-tools/ | pastebinit | 17:34 |
alkisg | infinity: the delay is before the initramfs gets loaded | 17:35 |
alkisg | But weird as it sounds, I tried removing the "ip=" parameters from the cmdline | 17:35 |
alkisg | And the delay seems to go away | 17:35 |
infinity | On what are you basing "before the initramfs"? | 17:35 |
alkisg | ip= is normally processed by the initramfs, but in this case it's also causing 10 sec delay before the initramfs | 17:35 |
alkisg | infinity: because the delay happens before e.g. break=top | 17:36 |
ogra_ | you mean "before the initramfs writes to logs" ... | 17:37 |
TJ- | alkisg: have you considered its due to system delays in loading/decompressing the initrd.img, as ogra_ hinted at? | 17:37 |
ogra_ | how do you know it doesnt operate | 17:38 |
infinity | TJ-: No way that machine would take 10s to load an initrd unless it was a several hundred megs. | 17:38 |
ogra_ | ... it simply doesnt print for 10sec ... it might as well process something | 17:38 |
alkisg | TJ-: I think it is indeed because somehow ip=xxx is processed by the kernel (so same initrd and compression methods etc) | 17:38 |
alkisg | Give me 5 mins to check | 17:38 |
TJ- | alkisg: if the mass storage device has bad sectors there may be I/O errors going on | 17:40 |
ogra_ | infinity, well, a 25MB xz compressed initrd (typical ubuntu initrd nowadays) on a 600MHz single core CPU can surely take a while | 17:41 |
ogra_ | (indeed this isnt a 600MHz single core :) ) | 17:41 |
alkisg | Yup, ip= is what's causing it | 17:42 |
alkisg | So, my test so far is: | 17:42 |
infinity | Indeed. And it's obviously not the issue if removing cmdline args "fixes" it. | 17:42 |
infinity | Nor is it I/O issues, etc. | 17:42 |
ogra_ | yeah | 17:42 |
alkisg | I put break=top. I get to the initramfs prompt in 2 secs. | 17:42 |
alkisg | I put ip=dhcp break=top. I get to the initramfs prompt in 12 secs. | 17:42 |
alkisg | And of course ip= is processed long after "top" by the initramfs | 17:43 |
ogra_ | is the NIC driver builtin ? | 17:43 |
alkisg | No idea, but udev hasn't ran yet at that point... | 17:43 |
alkisg | Let me reboot to see which nic it has | 17:43 |
* ogra_ was more interested in modprobe than udev | 17:44 | |
infinity | Indeed, nothing *should* be doing much with IP before break=top | 17:45 |
infinity | But that doesn't stop people from being silly. | 17:45 |
infinity | (note that conf/conf.d is sourced, so one can have code in there, for instance) | 17:45 |
alkisg | I can put a custom init inside the initramfs if that'll help in proving that initramfs isn't to blame | 17:46 |
alkisg | Hmm or I could just not use an initramfs and check the time of the kernel panic | 17:46 |
infinity | Sure, just 'ln sh init' | 17:46 |
alkisg | ogra_: 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c | 17:46 |
alkisg | Stock 16.04, I suppose that's not builtin, right? | 17:46 |
infinity | Or drop the initrd, yes. | 17:46 |
ogra_ | right,, not builtin ... | 17:47 |
ogra_ | could be the module loading (or the firmware loading) that slows you down | 17:47 |
infinity | Nah. | 17:47 |
infinity | The modules are loaded after his delay. | 17:47 |
infinity | dmesg is pretty clear on that. | 17:47 |
infinity | I suppose the kernel could be attempting to configure a nonexistent device for 10s, but that seems pretty amazingly silly if it is (and seems like a bug people would have yelled about earlier) | 17:48 |
ogra_ | how would ip= work without a network device ? | 17:48 |
infinity | ogra_: The initrd processes it later. | 17:48 |
ogra_ | so it times out til it notices there is no NIC yet ? | 17:49 |
alkisg | So, without initrd. (1) without ip= → kernel panic in 0.58. (2) with ip= → kernel panic in 12 sec. | 17:49 |
alkisg | I think that rules out the initramfs issues | 17:50 |
* alkisg tries various ip=xxxx parameters, to see if "none" or something bypasses it... | 17:50 | |
infinity | It's entirely possible the kernel is blocking with ip=dhcp because there's no interface yet, but man, that seems braindead. | 17:51 |
alkisg | There's a way to have the kernel assign an ip | 17:51 |
alkisg | It needs some CONFIG_XXX lines and the module to be included | 17:51 |
alkisg | But those aren't there by default in Ubuntu kernel builds | 17:51 |
TJ- | net/ipv4/ipconfig.c is responsible for processing the "ip=" internally, dhcp is going to result in a delay if the DHCP server doesn't respond. Have you tcpdump-ed the network link? | 17:51 |
infinity | TJ-: The dhcp server can't respond, there's no interface. :) | 17:52 |
infinity | (nothing to tcpdump) | 17:52 |
alkisg | TJ-: and also in my initial try, I was putting a static ip=x:x:x:x: there, that too caused a delay | 17:52 |
alkisg | (IPAPPEND 3 in pxelinux) | 17:52 |
TJ- | infinity: that'll not help :) | 17:53 |
ogra_ | just go for https://sourceforge.net/projects/ubuntubsd/ ... i heard BSD is a lot better for network stuff *g* | 17:55 |
alkisg | Haha | 17:56 |
TJ- | alkisg: try enabling dynamic debug during boot for the ipconfig handler to begin with: ... "ddebug_query=file net/ipv4/ipconfig.c +pflm" ... | 18:00 |
alkisg | TJ-: I just put that part in the cmdline? Thanks, trying... | 18:02 |
TJ- | alkisg: depending on the age of the kernel that may need some modification since things have changed alot with both the key and the pr_debug call sites | 18:03 |
alkisg | Stock 16.04, vmlinuz-4.4.0-14-generic i386 | 18:03 |
alkisg | Linux 4.4.0-14-generic #30-Ubuntu SMP Tue Mar 15 13:02:52 UTC 2016 i686 i686 i686 GNU/Linux | 18:04 |
TJ- | ok, you may need to replace 'ddebug_query=' with 'dyndbg=' | 18:05 |
TJ- | see Documentation/dynamic-debug-howto.txt for more detail | 18:07 |
alkisg | I tried both, but I didn't see any changes. Would I be seeing more output in the screen, or does it go to some internal files e.g. under /sys, /proc or whatever? | 18:08 |
TJ- | you will see it in the dmesg output | 18:09 |
alkisg | Nothing there... are you sure that's the correct file? Isn't that from klibc which goes inside the initramfs? Is that same file included in the kernel as well? | 18:10 |
TJ- | there are a lot of pr_debug() sites in ipconfig.c so if you've set "ip=dhcp" i'd expect to see something. I seem to recall the kernel will stop processing the options after a "--" so make sure its before that if it occurs | 18:10 |
TJ- | net/ipv4/ipconfig.c is the source file where the code for 'ip=' lives | 18:11 |
alkisg | I have ip=some:static:ip, and no "--"... I'll try with ip=dhcp as well | 18:12 |
TJ- | it is possible, if there's no network device, that code never gets called. in which case you'd need to identify which part of the code is responsible for 'looking' for network devices and looking for pr_debug() call sites there, then putting them on the command line as well | 18:13 |
alkisg | No difference with ip=dhcp. | 18:14 |
alkisg | The good thing is that now it's very easily reproducible | 18:14 |
alkisg | Anyone can just put ip=dhcp in his cmdline and reproduce it | 18:15 |
alkisg | (maybe not in kvm though, maybe the kernel handles virtualized networking differently) | 18:19 |
infinity | alkisg: VIRTIO_NET is builtin on that kernel. | 18:22 |
infinity | alkisg: Which helps the argument that the kernel is blocking when there's no device to configure. | 18:22 |
alkisg | Looks like it... there are also several timeouts in ipconfig.c that may match what I'm seeing | 18:25 |
alkisg | ip=none doesn't cause the issue | 18:25 |
alkisg | ip=10.161.254.61:10.161.254.11:10.161.254.1:255.255.255.0:pc61::none does cause it | 18:29 |
TJ- | alkisg: you might want to add to that existing dyndbg this: ... "; file drivers/net/virtio_net.c +pflm"... | 18:37 |
alkisg | Thank you TJ-, trying... | 18:38 |
TJ- | basically, in the source tree once you identify a location (in the built-in source at this point) do a "git grep 'pr_debug' path/to/files" to see if there are dyn-debug sites to make use of | 18:39 |
alkisg | I think that I will need to debug ipconfig.c though, and that only has "ifdef IPCONFIG_DEBUG" there, no pr_debug... | 18:40 |
alkisg | And since the driver is realtek, I'm not sure that debugging virtio will help | 18:41 |
alkisg | Maybe all that means I'll have to build the kernel myself, while setting IPCONFIG_DEBUG there? :-/ | 18:41 |
alkisg | Virtualbox with virtio instead of intel nic emulation, doesn't have the delay | 18:44 |
alkisg | (it takes 3 minutes to load the kernel/initrd via the network with virtio under vbox, but ok that's completely unrelated, it just makes debugging that way suck) | 18:45 |
alkisg | Hrm, I can't see any debug messages with virtio either | 18:58 |
alkisg | So yesterday booting LTSP clients in 16.04 took 45 seconds, now with this and 2 other delays I got rid of caused by the initramfs, it got down to 20 seconds :) | 19:03 |
kbug | anyone awake? :) | 20:59 |
kbug | still issues with the bisection... | 20:59 |
=== hallyn81 is now known as hallyn |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!