[07:41] apw, ring ring
[07:42] apw, can you please tell me how to do git bisection and build kernel from source ?
[07:42] already did s10gopal: "git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git" then "cd linux" and you're in the 'root' of the source tree
[07:43] s10gopal, that is not a simple thing to do, there is a wiki page with the details somewhere; or it might be better if we get someone to feed you built kernels to test
[07:43] s10gopal, do you h
[07:43] and this too 'git checkout v4.12' then do 'git bisect start' 'git bisect good v4.12' 'git bisect bad v4.13'
[07:43] s10gopal, do you have a 'good and bad' kernel pair
[07:43] 4.12 is good and 4.13 is bad
[07:44] then you might want to test the 4.13-rc1 mainline build, as that is 'kind of' in the middle of those two
[07:44] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc1/
[07:44] that is already pre-built and will save you a bit of time
[07:45] should i test all 4.13rc* ?
[07:45] and what about 4.12.* line ?
[07:47] well most of the change goes in in -rc1 so that is the nearest to the 'middle' in a bisection sense as we have built
[07:47] if you are lucky that one will be good then it is worth testing the other -rc's too, -rc4 or whatever and hone in on it, but test -rc1 and see if that is good/bad
[07:48] what if all rc are bad ?
[07:50] well if -rc1 is bad, then the issue is between 4.12 and 4.13-rc1 and that is the most likely outcome, but that is what we are trying to narrow in on
[07:52] apw, thx testing it , and after getting good and bad pair , this bug can be fixed or i have to wait for years ?
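The good/bad narrowing described above is a binary search over the history between a known-good and a known-bad point. A minimal Python sketch of that idea follows, assuming a simplified linear history (real git history is a graph, which `git bisect` handles itself). The `history` list, the `commit-N` names, the culprit position and the `is_bad` callback are all hypothetical stand-ins for "build this kernel, boot it, test for the bug".

```python
from typing import Callable, Sequence, Tuple


def first_bad(history: Sequence[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad entry, number of tests run).

    Assumes history[0] is known good, history[-1] is known bad, and that
    every entry after the first bad one is also bad.
    """
    good, bad = 0, len(history) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2  # test the point in the middle
        tests += 1
        if is_bad(history[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return history[bad], tests


# Hypothetical history: 999 made-up commits between the two tags.
history = ["v4.12"] + [f"commit-{i}" for i in range(1, 1000)] + ["v4.13-rc1"]
first_broken = history.index("commit-613")  # made-up culprit for the demo


def is_bad(commit: str) -> bool:
    # Stand-in for building, booting and testing a kernel at this commit.
    return history.index(commit) >= first_broken


culprit, tests = first_bad(history, is_bad)
print(culprit, tests)
```

Each test halves the remaining range, so even around a thousand candidate commits need at most ten build-and-test rounds in this simplified model.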
[07:53] s10gopal, once the bisect is complete we will know the exact commit which introduced the issue, then it needs looking at, sometimes it is obvious sometimes it needs work upstream
[07:53] so i cannot possibly answer your question
[07:53] thx , going to test rc , bye
[07:53] at least there is hope, and it is within your control to move it along; consider how it would be with windows
[07:59] dijuremo, there may be some movement on at least some of the hangs we have been seeing, i wonder if you could try out the kernels at the bottom of LP: #1759920
[07:59] Launchpad bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed] https://launchpad.net/bugs/1759920
[07:59] see the last comment, it has the links to the kernels and hints what to collect
[08:00] iirc you were seeing issues with both 4.4 and 4.13 kernels
=== Elimin8r is now known as Elimin8er
=== oSoMoN_ is now known as oSoMoN
=== shadeslayer_ is now known as shadeslayer
[11:19] seriously kernel folks, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1760973
[11:19] apw: It's done -> https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920/comments/68
[11:19] Ubuntu bug 1760973 in linux (Ubuntu) "virtualbox-dkms 5.2.8-dfsg-5: virtualbox kernel module failed to build [Makefile:976: "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel"]" [Critical,Confirmed]
[11:19] Ubuntu bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed]
[11:20] LocutusOfBorg, what? we broke the kernel in -proposed ? is this no a crime ?
[11:20] now
[11:21] dijuremo, do i take it it worked from all that :)
[11:22] sforshee, ^
[11:23] so far so good... But this is the T3610 that was stable for a long time yesterday without noibpb
[11:23] dijuremo, well do let us know how you get on, the more the merrier testing wise
[11:24] The one where I was running prime95 on. I left it running and ran for many hours but today it was frozen with the *older* *unpatched* 4.13.0-38
[11:24] So this morning, force restarted it, applied the patches and now we will see... If I leave it running prime95 for a full day and does not crash, we have a winner...
[11:25] tyhicks, ^
[11:30] apw: Do you guys plan to patch the 4.4.0 series as well to fix that one?
[11:31] apw: What would be your best estimate on a release of both patched 4.4.0 and 4.13.0 to the main Ubuntu channels?
[11:31] dijuremo, yes we would be planning on fixing any affected kernels, it would most likely be in the next cycle, we might expedite this fix it depends
[11:31] dijuremo, so about 3 weeks
[11:33] I guess I will try to hold off on adding noibpb to my Ubuntu machines for now, only do it for those freezing, do not want to end up with all of them using noibpb and being vulnerable. Just do not have enough personnel and time for trying something that does config management, i.e. ansible, salt, etc...
[12:07] apw, rc1 is also bad , which next to test ?
[12:07] s10gopal, then the next is a real bisect from v4.12 to v4.13-rc1
[12:08] s10gopal, which it might make sense for someone to generate the kernels for you
[12:08] jsalisbury, ^ if you have time perhaps you could help out ?
[12:08] should i test 4.12.14 too ?
[12:10] apw, 4.12.1 to 4.12.14
=== Blub0\0 is now known as Blub\0
[12:31] apw, ack
[12:32] !ping
[12:32] pong!
[12:32] s10gopal, do you have a bug opened for this bisect?
[12:32] jsalisbury, https://bugzilla.kernel.org/show_bug.cgi?id=198665
[12:32] bugzilla.kernel.org bug 198665 in Power-Off "Battery drains when laptop is off (shutdown) . WOL disabled and no usb device connected." [High,Needinfo]
[12:33] jsalisbury, and this too https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1745646
[12:33] Ubuntu bug 1745646 in linux (Ubuntu) "Battery drains when laptop is off (shutdown)" [Medium,New]
[12:34] s10gopal, I'll start a bisect between 4.12 and 4.13-rc1 and build a test kernel. I'll post it to the bug shortly.
[12:34] jsalisbury, i am having ssd and core i5 , can you please teach me how to do it ? i will build it and test
[12:35] s10gopal, if you'd like to try, there is a wiki that explains it. Let me grab the link
[12:35] jsalisbury, but i am unable to understand it
[12:36] s10gopal, https://wiki.ubuntu.com/Kernel/KernelBisection
[12:36] jsalisbury, but i am unable to understand it
[12:36] s10gopal, Ok, I'll build the test kernels for you to test then.
[12:37] jsalisbury, thx a lot
[12:37] jsalisbury, where you are going to post them ?
[12:38] s10gopal, I'll post a link in the bug and instructions. It will be here when built:
[12:38] http://kernel.ubuntu.com/~jsalisbury/lp1745646/
=== ghostcube_ is now known as ghostcube
[14:15] jsalisbury, i need to install it by sudo dpkg -i linux-image-4.12.0-041200-generic_4.12.0-041200.201804041240_amd64.deb only ?
[14:18] s10gopal, use $ sudo apt install ./linux....deb
[14:18] s10gopal, note that './' is needed. If you have the full set of packages to install in a dir you can use $ sudo apt install ./*.deb
[14:18] i did cd Downloads
[14:18] s10gopal, that will tell you if you have all the right deps / if you forgot or are missing any downloaded packages.
[14:18] then sudo dpkg -i linux-image-4.12.0-041200-generic_4.12.0-041200.201804041240_amd64.deb
[14:19] it is no longer advisable to ever use dpkg -i interactively.... using apt with downloaded debs is better.
[14:19] xnox, can we install kernel by single file ? generally it is required to download 3 deb files
[14:20] s10gopal, why? generally, no....
[14:20] why would you want to install it by single file?
[14:21] it is not packaged like that... thus by not getting all the files, you will cause broken packages and broken dependencies on your system, preventing further installations of any packages if you force it
[14:22] xnox, i downloaded a kernel from http://kernel.ubuntu.com/~jsalisbury/lp1745646/
[14:22] using apt install ./path-to.deb ./path-to.other.deb -> safeguards you from breaking your system.
[14:22] s10gopal, if that installs fine using apt, it is fine.
[14:22] xnox: is there any documentation on using 'apt' eith deb files? I don't see anything in 'man apt'
[14:23] s10gopal, Just be sure to reboot after installing the new kernel and select it from the GRUB menu.
[14:23] TJ-, i did sudo dpkg -i linux-image-4.12.0-041200-generic_4.12.0-041200.201804041240_amd64.deb , any other command or file is required too or i should reboot now ?
[14:23] s10gopal, after a reboot, run 'uname -a' to ensure you booted into it.
[14:23] TJ-, that would need ping to juliank, but he is not in this channel
[14:23] * xnox summons juliank
[14:24] ok thx
[14:24] rebooting
[14:24] xnox: you called
[14:24] :D
[14:24] xnox: is there any documentation on using 'apt' eith deb files? I don't see anything in 'man apt'
[14:24] juliank, ^
[14:24] s/eith/with/
[14:25] hmm
[14:25] maybe not
[14:25] It would be nice to refer users to using apt that way
[14:25] juliank, i do recall our extensive discussion of specifying things with ./*.deb at all.
[14:25] yeah
[14:26] What works is apt --with-source names..., and apt install /absolute/path ./relative/path/with/dot/in/front
[14:26] Linux gopal-HP-Notebook 4.12.0-041200-generic #201804041240 SMP Wed Apr 4 12:44:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[14:27] --with-source is documented (it adds e.g. a .changes or .deb file to the cache like a normal package set)
[14:27] TJ-, yeah in general these days, after building things i only ever use $ sudo apt install ./*.deb -> e.g. when testing new builds of systemd, etc.
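The './' rule mentioned above (a bare file name is taken as a package name, so local files need a path) can be captured in a small helper for scripts that currently call `dpkg -i`. This is only a sketch: `apt_install_command` is a hypothetical helper name, and it merely builds the command string rather than running anything.

```python
import os
import shlex


def apt_install_command(deb_paths):
    """Build an `apt install` command line for local .deb files.

    Relative paths get a leading './' so apt treats them as files rather
    than package names; absolute and already-relative paths are kept.
    """
    args = []
    for path in deb_paths:
        if not path.endswith(".deb"):
            raise ValueError(f"not a .deb file: {path}")
        if os.path.isabs(path) or path.startswith(("./", "../")):
            args.append(path)
        else:
            args.append("./" + path)
    # shlex.quote protects against spaces or shell metacharacters in names.
    return "sudo apt install " + " ".join(shlex.quote(a) for a in args)


cmd = apt_install_command(
    ["linux-image-4.12.0-041200-generic_4.12.0-041200.201804041240_amd64.deb"]
)
print(cmd)
```

Passing every downloaded package of the set in one call lets apt check the dependencies between them, which is the safeguard described in the discussion.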
[14:27] cause there are tricky depends that dpkg -i cannot get right, if i forget a package, or list them out of order.
[14:27] but the other not
[14:27] Nice things for testing builds are
[14:27] xnox: right - I could do with apt instead of dpkg -i in some of my scripts
[14:28] apt install --only-upgrade ../package_version_arch.changes
[14:28] or without --only-upgrade if you need to install new stuff
[14:28] TJ-, i am in jsalisbury kernel ?
[14:28] and/or --with-source changes and running upgrade or something
[14:28] I guess that makes a lot of sense
[14:28] apt upgrade --with-source ../
[14:29] * TJ- pipes juliank into 'man-db'
[14:29] For with-source at least, you can also generate a Packages file and install from that
[14:30] You can use APT to upgrade locally compiled packages by passing it only a changes file? Where has this been all my life?
[14:30] I think juliank is overdue a blogpost on all the planets which just lists these commands and ends with
[14:31] .... "guess what all of the above do?! You're welcome!"
[14:31] s10gopal, post the output of uname -a from a terminal and we will know.
[14:31] Linux gopal-HP-Notebook 4.12.0-041200-generic #201804041240 SMP Wed Apr 4 12:44:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[14:32] s10gopal, great. Now test for the bug. I'll build the next test kernel based on the results.
[14:32] s10gopal, then we will do the same over again about 10 or so times.
[14:32] xnox: I'd expect apt upgrade to do the wrong thing, BTW
[14:32] jsalisbury, how much time i should turn off my laptop ?
[14:32] s10gopal, however long you did in the past to reproduce the bug.
[14:33] s10gopal, we just want to know if this kernel exhibits the bug or not.
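Checking that the test kernel is really the one running, as requested above with `uname -a`, comes down to comparing the third field of that output with the release string of the installed package. A minimal Python sketch, assuming the usual field order of `uname -a` (kernel name, hostname, release, ...); the function names are made up for illustration.

```python
import platform


def kernel_release_from_uname(uname_line: str) -> str:
    """Extract the kernel release (third field) from `uname -a` output."""
    fields = uname_line.split()
    if len(fields) < 3:
        raise ValueError("unexpected uname -a output")
    return fields[2]


def booted_into(expected_release: str, uname_line: str) -> bool:
    """True if the uname output reports the expected kernel release."""
    return kernel_release_from_uname(uname_line) == expected_release


# The line pasted in the channel.
line = (
    "Linux gopal-HP-Notebook 4.12.0-041200-generic #201804041240 SMP "
    "Wed Apr 4 12:44:39 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux"
)
print(booted_into("4.12.0-041200-generic", line))

# On a live system the same release string is available without parsing:
print(platform.release())
```

Doing this check before every test round avoids reporting a result for the wrong kernel, for example when GRUB silently booted the default entry.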
[14:33] mamarley: it's only been around for 2 years
[14:34] going to test it
[14:34] thx
[14:39] jsalisbury: you're going to love s10gopal :) Thanks for stepping in
[14:42] juliank, but at King's Cross you can take pictures on the platform 9¾
[14:42] wrong channel
[14:42] oh well
[14:57] TJ-, anytime :-)
[14:58] jsalisbury: did you see that s10gopal is suspecting the RTC sync on shutdown due to discovering an old Debian bug?
[15:00] TJ-, I didn't. That's what is great about a bisect. No guessing needed :-D
[15:00] TJ-, many times no thinking needed either, lol
[15:01] jsalisbury: I think it's an ACPI issue
[15:02] TJ-, probably. Once he tests a couple of kernels, the number of commits will come way down, so we may be able to just pick it out
[15:03] Yes, I hope so. It's a hell of a one to test though, because the PC needs leaving shutdown for quite a few hours to be sure of the battery drain
=== aaa_ is now known as aaa_|away
[15:04] TJ-, arg, well I imagine this bisect will take a while then. I'll skim through the commits between v4.12 and v4.13-rc1 and see if anything sticks out.
[15:06] jsalisbury: I was going to prepare a script to generate all the kernels bisect would need to test ahead of time so he could just install them without waiting for bisect+build but he started 'clinging' to me so I had to back off
[15:07] TJ-, so he's a "Klingon" ?
[15:07] :)
[15:08] jsalisbury: desperately wants a solution but has very little ability to read documentation/wiki and transpose into what the specific commands he needs are
[15:09] TJ-, I'll help him along, he seems willing to test, so that's good.
[15:09] * TJ- nods
[16:06] A quick question about the changes to the kernel per Ubuntu bug 1759920, Will you guys pass the fix upstream to Debian? Were their kernels also affected similarly with ibpb freezes?
[16:06] Ubuntu bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed] https://launchpad.net/bugs/1759920
=== himcesjf_ is now known as him-cesjf
[17:10] dijuremo: hi, i've been sent your way from #ubuntu-server about an intel-microcode issue, possibly related to bug 1759920
[17:10] bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed] https://launchpad.net/bugs/1759920
[17:12] i have some Kaby Lake laptops which are displaying the same symptoms, which seems to be closely related to sssd
[17:20] dijuremo: the fix is already in the upstream linux kernel - we were using a slightly different patch that had come from a processor vendor
[17:21] Gargravarr: sssd is a good way to trigger it
[17:21] Gargravarr: if you have time, please follow the instructions in this comment: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920/comments/67
[17:21] Ubuntu bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed]
[17:24] tyhicks: thanks for the link. is this known to affect 4.4 kernels as well? my laptop really does not like 4.13
[17:25] Gargravarr: it does affect 4.4 but I don't yet have a test kernel
[17:25] Gargravarr: keep an eye out as I'll have a 4.4 test kernel soon
[17:25] thanks. i do have an affected laptop installed with 4.13 but i'm currently testing with the 4.4 one
[17:35] Gargravarr: All our machines had sssd, so probably why we were seeing them freeze consistently.
[17:37] dijuremo: part of my job here has been migrating local users onto LDAP. dogfooding has meant i ran into the problem pretty frequently
[17:38] Gargravarr: I maintain RHEL 7.x and Ubuntu Linux desktops and laptops all tied with sssd to Active Directory...
[17:39] you have my sympathy ;)
[17:40] Gargravarr: honestly has worked very well, had some issues related to DNS and very slow lookups of things, but working well now.
[17:41] i built an OpenLDAP cluster from scratch
[17:41] * waveform is stupid enough to run an openldap cluster at home
=== aaa_|away is now known as YR3aG4hQ
[17:41] that's now nice and stable, but trying to do this all open-source has been really quite painful
[17:41] (there are many raspberry pis ... that's my excuse)
[17:42] waveform: i have more laptops than relatives :P the thought of running my own has crossed my mind
=== YR3aG4hQ is now known as sMFts2gy
[17:42] can't deny, AD does make building and maintaining a domain significantly easier
[17:43] Gargravarr: Don't want to hijack this channel talking about sssd, so PM me if you like. I did run openldap back in the beginning of the 2000s, but it was even more challenging, compiling and running Openldap on Solaris and authenticate against kerberos to get rid of NIS. Was a fun project. But do not want to deal with ldap, etc....
[17:43] i now speak fluent LDIF...
[17:44] tyhicks helpfully explained why this is a kernel issue in the other channel
[17:44] and why sssd is particularly good at hitting it
[17:44] tyhicks: okay, i've loaded this laptop up with your -38 kernel, here goes
[17:46] tyhicks: no, it froze again, immediately after successful auth
[17:47] Gargravarr: You have just gave me some news, I was not aware that sssd would trigger the freeze, but now it makes sense that more and more of my systems began to freeze. I thought originally it was related to running X and the Nvidia driver since I could sometimes log in, but then after log out the system would freeze restarting lightdm
[17:47] You have just *given* me some news....
[17:47] we all English good on IRC :)
[17:48] Not a native english speaker, so I have an excuse, but try to make up for my mistakes when I see them ;)
[17:49] yeah, definitely not related to nvidia drivers (that came up a lot on the duplicate bugs, but I've only got AMD/intel graphics here and still experienced the lockup)
[17:49] Gargravarr: could you boot the same kernel but add "noibpb" to the kernel command line in grub (so that you can fully boot) and then paste the output that I requested here? https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920/comments/67
[17:49] Ubuntu bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed]
=== sMFts2gy is now known as KM9K62TkUq
[17:50] * Gargravarr is learning why netcat is such an invaluable tool
[17:52] tyhicks: it booted successfully without the noibpb flag, what does that do?
[17:52] Gargravarr: disables the problematic code path
[17:52] (among other things)
[17:53] right, that explains it, version signature confirms i've booted the wrong kernel :)
[17:53] woohoo! that's why I insisted on proof :)
[17:53] I'm pretty confident that the fix is right
[17:53] but I really do appreciate your testing
[17:54] so give it another go with the right kernel and keep your fingers crossed
[17:56] indeed. wasn't sure if i booted the right kernel in the first place (or whether i already had -38 installed), but it's bloody hard to see on these XPS13's - ours have frikkin' 4k screens
[18:00] tyhicks: so do you want me to boot -38 with or without that flag to test?
[18:03] i'm guessing without
=== KM9K62TkUq is now known as aaa_
[18:03] Gargravarr: without
[18:06] okay, so everything is matching your output on comment #67 so far
[18:06] just fired up sssd
[18:06] let's see what happens...
[18:08] okay, it did an auth and failed on something else (my bad, quirk of our LDAP setup i think)
[18:08] most importantly, it DIDN'T freeze
[18:10] nice!
[18:10] yuh, script crashed before it set up the PAM profile, lemme install it...
[18:11] Gargravarr: please paste the output of those commands so that I can double check your machine state
[18:11] Gargravarr: either in the bug report or via paste.ubuntu.com
[18:13] tyhicks: collecting it now
[18:13] thanks
[18:14] halle-freaking-lujah, logged in to desktop successfully
[18:15] your fix looks like the ticket
[18:16] :)
[18:22] tyhicks: comment #72
[18:24] does the CPU generation make much difference on how badly affected the machine is? i notice our Skylake desktops haven't run into it yet, but the one machine i've checked in depth doesn't have intel-microcode installed. can't say for sure if the others do
[18:24] and all the ones that are affected are Kaby Lake
[18:24] Gargravarr: hmm, could you paste the output of 'sudo cat /sys/devices/system/cpu/cpu0/microcode/version' into this channel?
[18:25] Gargravarr: run that command on the machine that you left the bug comment about
[18:25] okay, lemme start it back up
[18:25] as for CPU generation, I'm not sure... some of your machines may not have updated microcode
[18:25] indeed. i thought it was part of my build script
[18:26] oh feck. that was stupid.
[18:26] you may not have the latest microcode on the machine that you left the comment about
[18:27] my LDAP setup here involves storing the ecryptfs recovery key for a user directly in LDAP against their profile
[18:27] i seem to have neglected to make an exception for root...
[18:29] tyhicks: 0x7c
[18:29] for the XPS 13 running your -38 kernel
[18:29] i7 7560U chip
[18:31] Gargravarr: what version of intel-microcode is installed?
[18:32] ...it ain't
[18:35] interesting. so i have (at some point) updated the BIOS (which includes microcode) to v2.5.0, released 18th Feb
[18:35] Gargravarr: and 'cat /proc/sys/kernel/ibpb_enabled' prints out '1'?
[18:35] only i can't find that version on Dell's site any more
[18:36] seems to have been entirely replaced by 2.5.1, which makes me think the .0 version is known broken
[18:37] I think you should have revision 0x84
[18:37] see https://newsroom.intel.com/wp-content/uploads/sites/11/2018/04/microcode-update-guidance.pdf for better info than I can provide
[18:38] okay. i'll push the firmware up to the latest version while i'm at it
[18:38] anyways, IBPB is available in your microcode so that's all that I needed to know
[18:38] I need to get heads-down on these backports now
[18:38] thanks again!
[18:46] tyhicks: okay, so updating the system firmware has indeed pushed the microcode up to 0x84
[18:52] good deal
[18:52] this is where it gets confusing, with the BIOS and userspace stuff overlapping :)
[18:52] and it all meets in the kernel
[18:52] jsalisbury, it is bad
[18:57] s10gopal, ack. thanks for testing. I'll build the next kernel and post it shortly.
[18:58] right, that's 3 hours past clocking-off time, time to go home :P
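The microcode check above (0x7c reported, 0x84 expected) is a comparison of two hexadecimal revision strings of the kind read from /sys/devices/system/cpu/cpu0/microcode/version. A minimal Python sketch, assuming that a numerically higher revision is the newer one for the same CPU model; the function names are hypothetical, and the values are the ones quoted in the channel.

```python
def parse_microcode_revision(text: str) -> int:
    """Parse a revision such as '0x7c' (surrounding whitespace allowed)."""
    return int(text.strip(), 16)


def is_older(current: str, expected: str) -> bool:
    """True if `current` is a lower revision than `expected`.

    Assumption: revisions for one CPU model increase over time.
    """
    return parse_microcode_revision(current) < parse_microcode_revision(expected)


# Values quoted in the discussion above.
before_firmware_update = "0x7c\n"
after_firmware_update = "0x84\n"

print(is_older(before_firmware_update, "0x84"))
print(is_older(after_firmware_update, "0x84"))
```

On a real machine the string would come from reading the sysfs file (with root privileges, as in the command quoted above) instead of being hard-coded.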