[13:36] <kernelbug> problems with bisection...
[13:36] <kernelbug> what if the kernel is not listed here? http://people.canonical.com/~kernel/info/kernel-version-map.html
[13:36] <kernelbug> how i'm supposed to map anything then? :)
[13:37] <kernelbug> i ran mainline-build-one and got errors...
[13:38] <kernelbug> fatal: invalid reference: Ubuntu-3.18.0-7.8
[13:38] <kernelbug> vivid-amd64: chroot not found (::,)
[13:40] <mamarley> apw: So I figured out the cause of my problem.  It wasn't actually a regression in the kernel, it is the "watchdog" service.  If I stop that service, suspend works perfectly.  Odd...
[13:42] <ogra_> wow
[13:43] <ogra_> there is an option in /etc/default/watchdog to disable it permanently 
[13:44] <ogra_> (and please fil a bug too)
[13:44] <mamarley> I want to try it on another box first to see if it has the same problem there.
[13:44] <ogra_> good idea
[13:47] <mamarley> I actually had to add an [Install] section with a WantedBy to get it to work at all, so I am afraid any bug I file would be invalid anyway.
[13:48] <kernelbug> so? :)
[13:48] <kernelbug> why errors?
[13:52] <kernelbug> :<
[13:53] <mamarley> I just tried another system with the iTCO_wdt device and watchdog enabled.  It does not seem to suffer from the same issue.
[13:57] <kernelbug> am i ignored or are you just busy? :)
[14:11] <apw> or perhaps, maybe, it is the weekend?  the first error implies you are not in the appropriate git repo, the second that you do not have build chroots created
[14:17] <kernelbug> apw, hmm
[14:17] <kernelbug> apw, maybe it's morning somewhere ;)
[14:18] <kernelbug> apw, well i followed the instructions
[14:19] <kernelbug> https://wiki.ubuntu.com/Kernel/KernelBisection
[14:49] <kernelbug> apw, "HEAD is now at 32ac5b4... UBUNTU: Ubuntu-3.19.0-56.62"
[14:49] <kernelbug> apw, why 3.19? :)
[14:59] <kernelbug> well, see you later...
[17:01] <alkisg> Hi guys, post the 3.2 kernel I'm getting a 10-second delay at boot: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1259861
[17:02] <alkisg> The weird thing is that it happens in some real hardware and under virtualbox, but I was not able to reproduce it under KVM
[17:03] <alkisg> I'm guessing there's some "wait up to 10 seconds for that event" code somewhere, but how can I pinpoint it to help in solving the issue?
[17:03] <alkisg> It's Ubuntu specific, I haven't seen it in any vanilla kernels so far
[17:17] <TJ-> alkisg: does a boot with 'debug' indicate where the delay is by the last messages before the delay?
[17:19] <alkisg> TJ-: no, I tried with debug and it made no difference in dmesg
[17:19] <alkisg> And booting with various clients, the messages I see around the delay are not the same
[17:20] <alkisg> TJ-, it's possible that it happens on your PC as well, you could run dmesg and see...
[17:20] <apw> alkisg, hmmmm, there must be a pattern
[17:20] <apw> citainly i do not have a 10s that i can see
[17:20] <TJ-> alkisg: which exact Ubunt kernel version do you see it on first (after 3.2) - we can go back through the commits from that
[17:21] <TJ-> alkisg: no, it doesn't happen for me :)
[17:21] <alkisg> I think that if I run the stock ubuntu live cds in virtualbox, it always happens
[17:21] <alkisg> So I can test with e.g. 12.04.1 (it doesn't), 12.10, 12.04.2...
[17:21] <alkisg> Will that help?
[17:26]  * alkisg has something running in kvm and can't use virtualbox right now, but will try it in half an hour or so when it finishes
[17:31] <alkisg> One dmesg from 16.04 on some i5, netbooted with LTSP: http://paste.debian.net/417051/
[17:31] <alkisg> The delay is before the initramfs gets loaded, at 2 => 12 sec
[17:32] <ogra_> different compressions ?
[17:33] <infinity> That kinda looks like you have a sleep 10 in your initrd. :P
[17:34] <infinity> Which would be a local thing (or a weird package), cause I've never seen it.
[17:34] <infinity> alkisg: rgrep sleep /usr/share/initramfs-tools/ | pastebinit
[17:35] <alkisg> infinity: the delay is before the initramfs gets loaded
[17:35] <alkisg> But weird as it sounds, I tried removing the "ip=" parameters from the cmdline
[17:35] <alkisg> And the delay seems to go away
[17:35] <infinity> On what are you basing "before the initramfs"?
[17:35] <alkisg> ip= is normally processed by the initramfs, but in this case it's also causing 10 sec delay before the initramfs
[17:36] <alkisg> infinity: because the delay happens before e.g. break=top
[17:37] <ogra_> you mean "before the initramfs writes to logs" ... 
[17:37] <TJ-> alkisg: have you considered its due to system delays in loading/decompressing the initrd.img, as ogra_ hinted at?
[17:38] <ogra_> how do you know it doesnt operate 
[17:38] <infinity> TJ-: No way that machine would take 10s to load an initrd unless it was a several hundred megs.
[17:38] <ogra_> ... it simply doesnt print for 10sec ... it might as well process something 
[17:38] <alkisg> TJ-: I think it is indeed because somehow ip=xxx is processed by the kernel (so same initrd and compression methods etc)
[17:38] <alkisg> Give me 5 mins to check
[17:40] <TJ-> alkisg: if the mass storage device has bad sectors there may be I/O errors going on
[17:41] <ogra_> infinity, well, a 25MB xz compressed initrd (typical ubuntu initrd nowadays) on a 600MHz single core CPU can surely take a while 
[17:41] <ogra_> (indeed this isnt a 600MHz single core :) )
[17:42] <alkisg> Yup, ip= is what's causing it
[17:42] <alkisg> So, my test so far is:
[17:42] <infinity> Indeed.  And it's obviously not the issue if removing cmdline args "fixes" it.
[17:42] <infinity> Nor is it I/O issues, etc.
[17:42] <ogra_> yeah
[17:42] <alkisg> I put break=top. I get to the initramfs prompt in 2 secs.
[17:42] <alkisg> I put ip=dhcp break=top. I get to the initramfs prompt in 12 secs.
[17:43] <alkisg> And of course ip= is processed long after "top" by the initramfs
[17:43] <ogra_> is the NIC driver builtin ? 
[17:43] <alkisg> No idea, but udev hasn't ran yet at that point...
[17:43] <alkisg> Let me reboot to see which nic it has
[17:44]  * ogra_ was more interested in modprobe than udev
[17:45] <infinity> Indeed, nothing *should* be doing much with IP before break=top
[17:45] <infinity> But that doesn't stop people from being silly.
[17:45] <infinity> (note that conf/conf.d is sourced, so one can have code in there, for instance)
[17:46] <alkisg> I can put a custom init inside the initramfs if that'll help in proving that initramfs isn't to blame
[17:46] <alkisg> Hmm or I could just not use an initramfs and check the time of the kernel panic
[17:46] <infinity> Sure, just 'ln sh init'
[17:46] <alkisg> ogra_: 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 0c
[17:46] <alkisg> Stock 16.04, I suppose that's not builtin, right?
[17:46] <infinity> Or drop the initrd, yes.
[17:47] <ogra_> right,, not builtin ... 
[17:47] <ogra_> could be the module loading (or the firmware loading) that slows you down
[17:47] <infinity> Nah.
[17:47] <infinity> The modules are loaded after his delay.
[17:47] <infinity> dmesg is pretty clear on that.
[17:48] <infinity> I suppose the kernel could be attempting to configure a nonexistent device for 10s, but that seems pretty amazingly silly if it is (and seems like a bug people would have yelled about earlier)
[17:48] <ogra_> how would ip= work without a network device ?
[17:48] <infinity> ogra_: The initrd processes it later.
[17:49] <ogra_> so it times out til it notices there is no NIC yet ? 
[17:49] <alkisg> So, without initrd. (1) without ip= → kernel panic in 0.58. (2) with ip= → kernel panic in 12 sec.
[17:50] <alkisg> I think that rules out the initramfs issues
[17:50]  * alkisg tries various ip=xxxx parameters, to see if "none" or something bypasses it...
[17:51] <infinity> It's entirely possible the kernel is blocking with ip=dhcp because there's no interface yet, but man, that seems braindead.
[17:51] <alkisg> There's a way to have the kernel assign an ip
[17:51] <alkisg> It needs some CONFIG_XXX lines and the module to be included
[17:51] <alkisg> But those aren't there by default in Ubuntu kernel builds
[17:51] <TJ-> net/ipv4/ipconfig.c is responsible for processing the "ip=" internally, dhcp is going to result in a delay if the DHCP server doesn't respond. Have you tcpdump-ed the network link?
[17:52] <infinity> TJ-: The dhcp server can't respond, there's no interface. :)
[17:52] <infinity> (nothing to tcpdump)
[17:52] <alkisg> TJ-: and also in my initial try, I was putting a static ip=x:x:x:x: there, that too caused a delay
[17:52] <alkisg> (IPAPPEND 3 in pxelinux)
[17:53] <TJ-> infinity: that'll not help :)
[17:55] <ogra_> just go for https://sourceforge.net/projects/ubuntubsd/ ... i heard BSD is a lot better for network stuff *g*
[17:56] <alkisg> Haha
[18:00] <TJ-> alkisg: try enabling dynamic debug during boot for the ipconfig handler to begin with:   ... "ddebug_query=file net/ipv4/ipconfig.c +pflm" ...
[18:02] <alkisg> TJ-: I just put that part in the cmdline? Thanks, trying...
[18:03] <TJ-> alkisg: depending on the age of the kernel that may need some modification since things have changed alot with both the key and the pr_debug call sites
[18:03] <alkisg> Stock 16.04, vmlinuz-4.4.0-14-generic i386
[18:04] <alkisg> Linux 4.4.0-14-generic #30-Ubuntu SMP Tue Mar 15 13:02:52 UTC 2016 i686 i686 i686 GNU/Linux
[18:05] <TJ-> ok, you may need to replace 'ddebug_query=' with 'dyndbg='
[18:07] <TJ-> see Documentation/dynamic-debug-howto.txt for more detail
[18:08] <alkisg> I tried both, but I didn't see any changes. Would I be seeing more output in the screen, or does it go to some internal files e.g. under /sys, /proc or whatever?
[18:09] <TJ-> you will see it in the dmesg output
[18:10] <alkisg> Nothing there... are you sure that's the correct file? Isn't that from klibc which goes inside the initramfs? Is that same file included in the kernel as well?
[18:10] <TJ-> there are a lot of pr_debug() sites in ipconfig.c so if you've set "ip=dhcp" i'd expect to see something. I seem to recall the kernel will stop processing the options after a "--" so make sure its before that if it occurs
[18:11] <TJ-> net/ipv4/ipconfig.c is the source file where the code for 'ip=' lives
[18:12] <alkisg> I have ip=some:static:ip, and no "--"... I'll try with ip=dhcp as well
[18:13] <TJ-> it is possible, if there's no network device, that code never gets called. in which case you'd need to identify which part of the code is responsible for 'looking' for network devices and looking for pr_debug() call sites there, then putting them on the command line as well
[18:14] <alkisg> No difference with ip=dhcp.
[18:14] <alkisg> The good thing is that now it's very easily reproducible
[18:15] <alkisg> Anyone can just put ip=dhcp in his cmdline and reproduce it
[18:19] <alkisg> (maybe not in kvm though, maybe the kernel handles virtualized networking differently)
[18:22] <infinity> alkisg: VIRTIO_NET is builtin on that kernel.
[18:22] <infinity> alkisg: Which helps the argument that the kernel is blocking when there's no device to configure.
[18:25] <alkisg> Looks like it... there are also several timeouts in ipconfig.c that may match what I'm seeing
[18:25] <alkisg> ip=none doesn't cause the issue
[18:29] <alkisg> ip=10.161.254.61:10.161.254.11:10.161.254.1:255.255.255.0:pc61::none does cause it
[18:37] <TJ-> alkisg: you might want to add to that existing dyndbg this: ... "; file drivers/net/virtio_net.c +pflm"...
[18:38] <alkisg> Thank you TJ-, trying...
[18:39] <TJ-> basically, in the source tree once you identify a location (in the built-in source at this point) do a "git grep 'pr_debug' path/to/files" to see if there are dyn-debug sites to make use of
[18:40] <alkisg> I think that I will need to debug ipconfig.c though, and that only has "ifdef IPCONFIG_DEBUG" there, no pr_debug...
[18:41] <alkisg> And since the driver is realtek, I'm not sure that debugging virtio will help
[18:41] <alkisg> Maybe all that means I'll have to build the kernel myself, while setting IPCONFIG_DEBUG there? :-/
[18:44] <alkisg> Virtualbox with virtio instead of intel nic emulation, doesn't have the delay
[18:45] <alkisg> (it takes 3 minutes to load the kernel/initrd via the network with virtio under vbox, but ok that's completely unrelated, it just makes debugging that way suck)
[18:58] <alkisg> Hrm, I can't see any debug messages with virtio either
[19:03] <alkisg> So yesterday booting LTSP clients in 16.04 took 45 seconds, now with this and 2 other delays I got rid of caused by the initramfs, it got down to 20 seconds :)
[20:59] <kbug> anyone awake? :)
[20:59] <kbug> still issues with the bisection...