=== martink3_ is now known as martink3
[13:36] problems with bisection...
[13:36] what if the kernel is not listed here? http://people.canonical.com/~kernel/info/kernel-version-map.html
[13:36] how am i supposed to map anything then? :)
[13:37] i ran mainline-build-one and got errors...
[13:38] fatal: invalid reference: Ubuntu-3.18.0-7.8
[13:38] vivid-amd64: chroot not found (::,)
[13:40] apw: So I figured out the cause of my problem. It wasn't actually a regression in the kernel, it is the "watchdog" service. If I stop that service, suspend works perfectly. Odd...
[13:42] wow
[13:43] there is an option in /etc/default/watchdog to disable it permanently
[13:44] (and please file a bug too)
[13:44] I want to try it on another box first to see if it has the same problem there.
[13:44] good idea
[13:47] I actually had to add an [Install] section with a WantedBy to get it to work at all, so I am afraid any bug I file would be invalid anyway.
[13:48] so? :)
[13:48] why errors?
[13:52] :<
[13:53] I just tried another system with the iTCO_wdt device and watchdog enabled. It does not seem to suffer from the same issue.
[13:57] am i ignored or are you just busy? :)
[14:11] or perhaps, maybe, it is the weekend? the first error implies you are not in the appropriate git repo, the second that you do not have build chroots created
[14:17] apw, hmm
[14:17] apw, maybe it's morning somewhere ;)
[14:18] apw, well i followed the instructions
[14:19] https://wiki.ubuntu.com/Kernel/KernelBisection
[14:49] apw, "HEAD is now at 32ac5b4... UBUNTU: Ubuntu-3.19.0-56.62"
[14:49] apw, why 3.19? :)
[14:59] well, see you later...
=== DevBox|2 is now known as DevBox
[17:01] Hi guys, post the 3.2 kernel I'm getting a 10-second delay at boot: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1259861
[17:01] Launchpad bug 1259861 in linux (Ubuntu) "5-10 second delay in kernel boot" [Medium,Confirmed]
[17:02] The weird thing is that it happens on some real hardware and under virtualbox, but I was not able to reproduce it under KVM
[17:03] I'm guessing there's some "wait up to 10 seconds for that event" code somewhere, but how can I pinpoint it to help in solving the issue?
[17:03] It's Ubuntu specific, I haven't seen it in any vanilla kernels so far
[17:17] alkisg: does a boot with 'debug' indicate where the delay is by the last messages before the delay?
[17:19] TJ-: no, I tried with debug and it made no difference in dmesg
[17:19] And booting with various clients, the messages I see around the delay are not the same
[17:20] TJ-, it's possible that it happens on your PC as well, you could run dmesg and see...
[17:20] alkisg, hmmmm, there must be a pattern
[17:20] certainly i do not have a 10s that i can see
[17:20] alkisg: which exact Ubuntu kernel version do you see it on first (after 3.2) - we can go back through the commits from that
[17:21] alkisg: no, it doesn't happen for me :)
[17:21] I think that if I run the stock ubuntu live cds in virtualbox, it always happens
[17:21] So I can test with e.g. 12.04.1 (it doesn't), 12.10, 12.04.2...
[17:21] Will that help?
[17:26] * alkisg has something running in kvm and can't use virtualbox right now, but will try it in half an hour or so when it finishes
[17:31] One dmesg from 16.04 on some i5, netbooted with LTSP: http://paste.debian.net/417051/
[17:31] The delay is before the initramfs gets loaded, at 2 => 12 sec
[17:32] different compressions ?
[17:33] That kinda looks like you have a sleep 10 in your initrd. :P
[17:34] Which would be a local thing (or a weird package), cause I've never seen it.
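One way to pinpoint a stall like the "2 => 12 sec" jump described above is to look for the largest gap between consecutive dmesg timestamps. The following is a minimal sketch, assuming the default `[seconds.microseconds]` prefix that dmesg prints; the function name `largest_gap` and the sample lines are hypothetical and are not taken from the pasted log.

```python
import re

# Matches the "[   12.345678] " timestamp prefix that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the biggest
    jump between consecutive timestamped lines, or None if fewer than two
    lines carry a timestamp."""
    previous = None
    best = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        current = (float(match.group(1)), match.group(2))
        if previous is not None:
            gap = current[0] - previous[0]
            if best is None or gap > best[0]:
                best = (gap, previous[1], current[1])
        previous = current
    return best


# Hypothetical sample lines, made up for illustration only.
SAMPLE = [
    "[    1.900000] first message",
    "[    2.000000] last message before the stall",
    "[   12.100000] first message after the stall",
    "[   12.300000] later message",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print(f"{gap:.1f}s between {before!r} and {after!r}")
```

Feeding it the saved output of `dmesg` (one line per list element) would show which two messages bracket the delay, which is the information being asked for at [17:17].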
[17:34] alkisg: rgrep sleep /usr/share/initramfs-tools/ | pastebinit
[17:35] infinity: the delay is before the initramfs gets loaded
[17:35] But weird as it sounds, I tried removing the "ip=" parameters from the cmdline
[17:35] And the delay seems to go away
[17:35] On what are you basing "before the initramfs"?
[17:35] ip= is normally processed by the initramfs, but in this case it's also causing 10 sec delay before the initramfs
[17:36] infinity: because the delay happens before e.g. break=top
[17:37] you mean "before the initramfs writes to logs" ...
[17:37] alkisg: have you considered it's due to system delays in loading/decompressing the initrd.img, as ogra_ hinted at?
[17:38] how do you know it doesnt operate
[17:38] TJ-: No way that machine would take 10s to load an initrd unless it was several hundred megs.
[17:38] ... it simply doesnt print for 10sec ... it might as well process something
[17:38] TJ-: I think it is indeed because somehow ip=xxx is processed by the kernel (so same initrd and compression methods etc)
[17:38] Give me 5 mins to check
[17:40] alkisg: if the mass storage device has bad sectors there may be I/O errors going on
[17:41] infinity, well, a 25MB xz compressed initrd (typical ubuntu initrd nowadays) on a 600MHz single core CPU can surely take a while
[17:41] (indeed this isnt a 600MHz single core :) )
[17:42] Yup, ip= is what's causing it
[17:42] So, my test so far is:
[17:42] Indeed. And it's obviously not the issue if removing cmdline args "fixes" it.
[17:42] Nor is it I/O issues, etc.
[17:42] yeah
[17:42] I put break=top. I get to the initramfs prompt in 2 secs.
[17:42] I put ip=dhcp break=top. I get to the initramfs prompt in 12 secs.
[17:43] And of course ip= is processed long after "top" by the initramfs
[17:43] is the NIC driver builtin ?
[17:43] No idea, but udev hasn't run yet at that point...
[17:43] Let me reboot to see which nic it has
[17:44] * ogra_ was more interested in modprobe than udev
[17:45] Indeed, nothing *should* be doing much with IP before break=top
[17:45] But that doesn't stop people from being silly.
[17:45] (note that conf/conf.d is sourced, so one can have code in there, for instance)
[17:46] I can put a custom init inside the initramfs if that'll help in proving that initramfs isn't to blame
[17:46] Hmm or I could just not use an initramfs and check the time of the kernel panic
[17:46] Sure, just 'ln sh init'
[17:46] ogra_: 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c
[17:46] Stock 16.04, I suppose that's not builtin, right?
[17:46] Or drop the initrd, yes.
[17:47] right, not builtin ...
[17:47] could be the module loading (or the firmware loading) that slows you down
[17:47] Nah.
[17:47] The modules are loaded after his delay.
[17:47] dmesg is pretty clear on that.
[17:48] I suppose the kernel could be attempting to configure a nonexistent device for 10s, but that seems pretty amazingly silly if it is (and seems like a bug people would have yelled about earlier)
[17:48] how would ip= work without a network device ?
[17:48] ogra_: The initrd processes it later.
[17:49] so it times out til it notices there is no NIC yet ?
[17:49] So, without initrd. (1) without ip= → kernel panic in 0.58. (2) with ip= → kernel panic in 12 sec.
[17:50] I think that rules out the initramfs issues
[17:50] * alkisg tries various ip=xxxx parameters, to see if "none" or something bypasses it...
[17:51] It's entirely possible the kernel is blocking with ip=dhcp because there's no interface yet, but man, that seems braindead.
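When trying different ip= values as described above, it can help to see which field each colon-separated piece lands in. The sketch below assumes the field order given in the kernel's nfsroot documentation (client IP, server IP, gateway, netmask, hostname, device, autoconf) and handles only those first seven fields; `parse_ip_param` is a hypothetical helper, not something from the kernel or initramfs-tools. The static example value is the one quoted later in this log at [18:29].

```python
# Field order assumed from the kernel's nfsroot documentation; only the first
# seven fields are handled here (later DNS/NTP fields are ignored).
FIELDS = (
    "client_ip",
    "server_ip",
    "gw_ip",
    "netmask",
    "hostname",
    "device",
    "autoconf",
)


def parse_ip_param(value):
    """Split an ip= value into named fields, dropping empty ones.

    A value without any colon (for example "dhcp" or "none") is treated as
    the autoconf setting on its own.
    """
    if ":" not in value:
        return {"autoconf": value}
    parts = value.split(":")
    # zip() stops at the shorter sequence, so extra trailing fields are ignored.
    return {name: part for name, part in zip(FIELDS, parts) if part}


if __name__ == "__main__":
    static = "10.161.254.61:10.161.254.11:10.161.254.1:255.255.255.0:pc61::none"
    for value in ("dhcp", "none", static):
        print(value, "->", parse_ip_param(value))
```

For the static value the device field is empty and autoconf is "none", so the kernel is given an address but no interface name, which is consistent with the suspicion that it waits for a device to appear.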
[17:51] There's a way to have the kernel assign an ip
[17:51] It needs some CONFIG_XXX lines and the module to be included
[17:51] But those aren't there by default in Ubuntu kernel builds
[17:51] net/ipv4/ipconfig.c is responsible for processing the "ip=" internally, dhcp is going to result in a delay if the DHCP server doesn't respond. Have you tcpdump-ed the network link?
[17:52] TJ-: The dhcp server can't respond, there's no interface. :)
[17:52] (nothing to tcpdump)
[17:52] TJ-: and also in my initial try, I was putting a static ip=x:x:x:x: there, that too caused a delay
[17:52] (IPAPPEND 3 in pxelinux)
[17:53] infinity: that'll not help :)
[17:55] just go for https://sourceforge.net/projects/ubuntubsd/ ... i heard BSD is a lot better for network stuff *g*
[17:56] Haha
[18:00] alkisg: try enabling dynamic debug during boot for the ipconfig handler to begin with: ... "ddebug_query=file net/ipv4/ipconfig.c +pflm" ...
[18:02] TJ-: I just put that part in the cmdline? Thanks, trying...
[18:03] alkisg: depending on the age of the kernel that may need some modification since things have changed a lot with both the key and the pr_debug call sites
[18:03] Stock 16.04, vmlinuz-4.4.0-14-generic i386
[18:04] Linux 4.4.0-14-generic #30-Ubuntu SMP Tue Mar 15 13:02:52 UTC 2016 i686 i686 i686 GNU/Linux
[18:05] ok, you may need to replace 'ddebug_query=' with 'dyndbg='
[18:07] see Documentation/dynamic-debug-howto.txt for more detail
[18:08] I tried both, but I didn't see any changes. Would I be seeing more output on the screen, or does it go to some internal files e.g. under /sys, /proc or whatever?
[18:09] you will see it in the dmesg output
[18:10] Nothing there... are you sure that's the correct file? Isn't that from klibc which goes inside the initramfs? Is that same file included in the kernel as well?
[18:10] there are a lot of pr_debug() sites in ipconfig.c so if you've set "ip=dhcp" i'd expect to see something.
I seem to recall the kernel will stop processing the options after a "--" so make sure it's before that if it occurs
[18:11] net/ipv4/ipconfig.c is the source file where the code for 'ip=' lives
[18:12] I have ip=some:static:ip, and no "--"... I'll try with ip=dhcp as well
[18:13] it is possible, if there's no network device, that code never gets called. in which case you'd need to identify which part of the code is responsible for 'looking' for network devices and looking for pr_debug() call sites there, then putting them on the command line as well
[18:14] No difference with ip=dhcp.
[18:14] The good thing is that now it's very easily reproducible
[18:15] Anyone can just put ip=dhcp in his cmdline and reproduce it
[18:19] (maybe not in kvm though, maybe the kernel handles virtualized networking differently)
[18:22] alkisg: VIRTIO_NET is builtin on that kernel.
[18:22] alkisg: Which helps the argument that the kernel is blocking when there's no device to configure.
[18:25] Looks like it... there are also several timeouts in ipconfig.c that may match what I'm seeing
[18:25] ip=none doesn't cause the issue
[18:29] ip=10.161.254.61:10.161.254.11:10.161.254.1:255.255.255.0:pc61::none does cause it
[18:37] alkisg: you might want to add to that existing dyndbg this: ... "; file drivers/net/virtio_net.c +pflm"...
[18:38] Thank you TJ-, trying...
[18:39] basically, in the source tree once you identify a location (in the built-in source at this point) do a "git grep 'pr_debug' path/to/files" to see if there are dyn-debug sites to make use of
[18:40] I think that I will need to debug ipconfig.c though, and that only has "ifdef IPCONFIG_DEBUG" there, no pr_debug...
[18:41] And since the driver is realtek, I'm not sure that debugging virtio will help
[18:41] Maybe all that means I'll have to build the kernel myself, while setting IPCONFIG_DEBUG there?
:-/
[18:44] Virtualbox with virtio instead of intel nic emulation, doesn't have the delay
[18:45] (it takes 3 minutes to load the kernel/initrd via the network with virtio under vbox, but ok that's completely unrelated, it just makes debugging that way suck)
[18:58] Hrm, I can't see any debug messages with virtio either
[19:03] So yesterday booting LTSP clients in 16.04 took 45 seconds, now with this and 2 other delays I got rid of caused by the initramfs, it got down to 20 seconds :)
[20:59] anyone awake? :)
[20:59] still issues with the bisection...
=== hallyn81 is now known as hallyn
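The dynamic-debug suggestions between [18:00] and [18:37] amount to joining several "file PATH FLAGS" queries with ";" into a single boot parameter. As a rough sketch of that composition, assuming the `dyndbg=` spelling and the `+pflm` flags quoted in the log, the hypothetical helper `dyndbg_param` below builds the string; whether the quoting survives a particular boot loader (pxelinux, GRUB) is not verified here.

```python
def dyndbg_param(files, flags="+pflm", key="dyndbg"):
    """Build a kernel command-line fragment enabling dynamic debug output
    for every pr_debug() call site in the given source files.

    `key` can be set to "ddebug_query" for the older spelling mentioned in
    the log.
    """
    queries = "; ".join(f"file {path} {flags}" for path in files)
    # Double quotes keep the spaces inside one command-line parameter.
    return f'{key}="{queries}"'


if __name__ == "__main__":
    print(dyndbg_param(["net/ipv4/ipconfig.c", "drivers/net/virtio_net.c"]))
```

As noted at [18:40], this only helps for files that actually contain pr_debug() call sites; code guarded by a compile-time define such as IPCONFIG_DEBUG would not produce output this way.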