[08:38] apw, please see https://bugzilla.kernel.org/show_bug.cgi?id=198665 , how to disable "hwclock --systohc" during shutdown ?
[08:38] bugzilla.kernel.org bug 198665 in Power-Off "Battery drains when laptop is off (shutdown) . WOL disabled and no usb device connected." [High,Needinfo]
[08:42] i think he did it by stopping the clock update in "/etc/init.d/hwclock.sh" in the stop phase
[08:42] if false && /sbin/hwclock --rtc=/dev/$HCTOSYS_DEVICE --systohc $HWCLOCKPARS $BADYEAR; then
[08:42] ^ i am going to guess that the original script has that line without the initial false && in it
[08:42] apw, sorry i am unable to understand it
[08:44] cat: /etc/init.d/hwclock.sh: No such file or directory
[08:45] s10gopal, hmm, i guess it would be in a systemd unit now, somewhere, sigh
[08:45] ubuntu 14.04 dont come with systemd
[08:46] odd i have a hwclock.sh on my system so i am surprised you don't
[08:48] s10gopal, likely it is /etc/init/hwclock-save.conf
[08:48] cat q
[08:49] apw, http://paste.ubuntu.com/p/GMP8PMKxFB/
[08:50] s10gopal, looks likely from the description and the commands it executes ... that that is what is writing back your clock
[08:51] s10gopal, the test they want you to do is to comment out the bit which updates the clock, the hwclock line there, and then turn
[08:51] s10gopal, off your laptop and see how bad the drain is then
[08:52] sorry i didnt got it
[08:53] apw, you think they will fix it ?
[08:54] s10gopal, no they are asking you to test something to see if your bug is the same as another bug
[08:55] s10gopal, to find that out they want you to stop the system from updating the clock, which is that line in the file
[08:56] the bug which zai posted is reported in 2012 and still it is not fixed
[08:56] apw, my problem dont occur up to kernel 4.12
[08:56] s10gopal, right, as they decided if writing to your clock breaks your machine that that is not really an OS issue
[08:57] it starts from kernel 4.13
[08:57] s10gopal, so likely this is not the same problem
[08:57] s10gopal, and doing what they ask will prove that and they can realise it is a new problem
[08:58] apw, #ubuntu-kernel users can fix it too?
[09:04] apw, can you please guide me, how i can learn linux kernel and driver development ?
[09:05] s10gopal, you should be able to comment out that line and do the test they want pretty easily
[09:05] as for learning kernel development you really want to get a book if you are starting from nothing
[09:05] which book ?
[09:07] is knowledge of digital electronics is also required ?
[09:08] i had things like the linux device drivers from o'reilly, but that would be woefully out of date these days
[09:10] i can't say i have needed a book for like 15 years, so i have no current suggestions
[09:10] and you know about electronics things too ?
[09:10] i happen to, but i don't think it is necessary
[09:11] first i should read linux book ?
[10:57] There was a typo, should have been DELL 5820, Sorry: Guys, I have another example, a DELL T5820, running the 4.4.0-116 kernel. It freezes with any BIOS past 1.13. I am lucky I have a copy of that BIOS saved, because the DELL site is only listing 1.32 and 1.40, both of which result in the machine immediately freezing when it reaches the GUI login window.
That particular machine was running BIOS 1.20 and just by downgrading to 1.13 the m
[10:58] tomreyn: Just as TJ-: explained it is very crazy, but seems like many DELL and Lenovo systems with their latest BIOS result in frozen machines, sometimes just on boot to the lightdm GUI login screen, sometimes they freeze after logging out after a first successful log in.
[11:01] uuh, fun! not!
[11:02] dijuremo: your last but one message was cut off after: "That particular machine was running BIOS 1.20 and just by downgrading to 1.13 the m"
[11:02] dijuremo, we have had reports of new bios carrying unwell microcode for the cpu
[11:02] (due to IRC line length limits)
[11:03] dijuremo, have you tested with latest intel-microcode in the archive ?
[11:03] which would have released in the last couple of days by the looks of it
[11:04] apw: By archive you mean the latest PPA repo for Intel-Microcode? We did and both the .deb and BIOS had the same Level.
[11:05] dijuremo, i meant the one in -security "now"
[11:05] https://launchpad.net/~ubuntu-security-proposed/+archive/ubuntu/ppa/+build/14453957 from 20180313? We tried that microcode
[11:06] It matches the microcode in the latest BIOS'
[11:06] dijuremo, that looks like the right one
[11:06] and if you downgrade your bios and leave that microcode installed does it keep working or continue to break
[11:09] Yes, we tried downgrading BIOS with and without latest intel-microcode package. With the older BIOS the machine does not freeze with or without the microcode installed
[11:09] All of those tests were conducted on a DELL T3610.
[11:10] We have seen the behavior on both DELL T5810 and the new T5820 as well as Laptop 7480 and a Lenovo M93p
[11:10] It is easily reproducible. On a Machine with 16.04 either 4.4.0-116 or 4.13.0 kernels if you install the latest BIOS then the machines freeze.
[11:11] Now I have a repo of old BIOS' for these machines, but DELL is no longer posting the very old BIOS' so for other people who cannot get the older BIOS' it will be end of game...
[11:11] so is there an older kernel which works with this bios or is the bios simply trash
[11:12] It is not just one BIOS.
[11:12] it is all bioses since 1.13 right ?
[11:12] In the DELL T5820 it seems that way.
[11:12] that all of them have the issue does not tell us whether the bios or kernel are at fault
[11:13] We had one machine on 1.20 which worked OK for a few days then started doing the constant freeze. We downgraded to 1.13 and it worked.
[11:13] have you asked dell about it ?
[11:13] For the T3610 then you have to stay in BIOS A14 because the latest A16 freezes the machine.
[11:14] likely whatever they make the bios kits from is the same and has the same fixes applied across the board
[11:14] Not yet, but as I stated, it is not DELL only. Lenovo M93p is the same issue.
[11:14] with a total freeze it is hard to be sure if it is the same issue, so we should take care
[11:14] with making that as an assertion imo
[11:15] Also issue with DELL is they would not care likely, they sold the machines with windows, so when I say I am running Ubuntu they are going to say put windows and we will give you support...
[11:15] often the way, though many of the machines are certified with ubuntu too
[11:16] You are right about the assumption, but it has been very repeatable that the BIOS downgrade fixes the issues.
[11:16] Also none of the nopti or nospectre_v2 kernel options have helped. And the problem freeze seems to be triggered much faster in GUI mode with Nvidia cards.
[11:17] right, but that symptom that downgrading the bios works, would normally say "bios to blame"
[11:17] them not caring is somewhat of an orthogonal issue
[11:18] if we are to have any real hope of finding out what in the kernel is tickling the broken bit of the bios
[11:18] we would need to find a kernel which does not trigger the issue
[11:18] Some machines were working OK until the latest 4.4.0 and 4.13.0 releases... but I have not really tried going down that rabbit hole of trying 4.4.0-abc (older versions)
[11:18] so we can find out what changed
[11:20] dijuremo, it may also be worth testing the latest kernel which just promoted to -proposed
[11:20] Which latest? 4.13.0 the hwe-16.04?
[11:21] iirc there is a new 4.4.0
[11:22] a -119, you never know it might be the magic bullet
[11:24] Where would I find that one?
[11:24] xenial-proposed
[11:27] seems like I got bitten by the new intel microcode as it comes with Ubuntu now...
[11:27] bitten ?
[11:28] system hanging after first boot with that installed, but it works fine after an UEFI firmware update
[11:29] that sounds a bit mad on first reading
[11:29] at least, I hope it works fine; it was hanging during boot every time and I can work now...
[11:29] tyhicks, ^ food for thought on the "when to add hard depends on microcode" debate
[11:36] this motherboard: https://www.asus.com/Motherboards/H170M-PLUS/ with a Core i5-6600
[11:37] and Ubuntu 17.10
[11:46] "sig 0x000506e3, pf_mask 0x36, 2017-11-16, rev 0x00c2, size 99328"
[11:46] apw: I cannot seem to find the -119 update, it should be part of the same repo where I got the microcode from, right?
[11:52] dijuremo, in the primary archive in the -proposed pocket
[11:53] https://launchpad.net/~ubuntu-security-proposed/+archive/ubuntu/ppa
[11:54] Is that where it should be?
[11:54] dijuremo, no in the primary archive, in the xenial-proposed pocket
[11:55] $ rmadison -a source linux -s xenial,xenial-updates,xenial-security,xenial-proposed
[11:55] linux | 4.4.0-21.37 | xenial | source
[11:55] linux | 4.4.0-116.140 | xenial-updates | source
[11:55] linux | 4.4.0-116.140 | xenial-security | source
[11:55] linux | 4.4.0-119.143 | xenial-proposed | source
[11:55] So add to my sources.list: deb http://us.archive.ubuntu.com/ubuntu/ xenial-proposed main restricted
[11:56] or just download it from the pool
[11:56] or indeed from the launchpad librarian
[11:56] https://launchpad.net/ubuntu/+source/linux/4.4.0-119.143
[12:00] apw: got it now, updating...
[12:19] apw, hi, did you see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1758856/comments/1
[12:19] Ubuntu bug 1758856 in linux (Ubuntu Artful) "retpoline hints: primary infrastructure and initial hints" [Undecided,New]
[12:20] Ubuntu-4.15.0-14.15 likely has the same issue
[12:28] ricotz, no, but i also hit the issue myself this morning, and am already poking it
[12:29] Setting up linux-image-generic-hwe-16.04 (4.13.0.38.57) -> Was able to boot and log in to GUI, but on log off, machine froze. Numlock key no longer works, cannot ssh into it.
[12:29] apw, alright
[12:29] Next test 4.4.0-119
[12:42] apw: So far seems like 4.4.0-119 is *the* magic bullet on the DELL T3610 with A19 BIOS. I was able to log in and log out twice, ran some Matlab and glxgears and no freezes so far, whereas 4.13.0.38.57 froze the machine after logging out the first time.
[12:44] # uname -r
[12:44] 4.4.0-119-generic
[12:44] Using the microcode from 20180312
[12:44] ii  intel-microcode                             3.20180312.0~ubuntu16.04.1                   amd64        Processor microcode firmware for Intel CPUs
[12:44] cat /sys/devices/system/cpu/cpu0/microcode/version
[12:44] 0x42c
[12:53] Ugh I cannot type today...
*DELL T3610 with A16 BIOS*
[12:54] Running the torture test with prime95, keeping an eye on Temps on the machine as well.
[12:57] dijuremo, well that is something, so that implies there is something between -116 and -119 which helps with this hang, so it is worth getting the lenovo tested too
[12:57] dijuremo, and i do know we had a heap of stable updates applied in this new cycle
[12:58] I have another one, an Optiplex 7050 freezing on 4.4.0-116 so going to try 119 on that one.
[12:58] Will also try and find the Lenovo and test 119
[12:58] dijuremo, so if the latest xenial-proposed hwe kernel doesn't fix the issue for that bios level, we might need you
[12:59] dijuremo, to try some of the stable updates to see if we can narrow down where the fix is in the 4.4.x series so we can apply it to the later kernels too
[12:59] dijuremo, but first lets see if -119 is the universal panacea for the hangs on all your boxes
[12:59] dijuremo, and do report the lot into your bug
[13:00] Sounds like a plan.
[13:07] apw: Here is another finding. One system without BIOS update started freezing. It has several 4.4.0 kernels installed. It recently got the intel-microcode update from 20180312.
[13:07] * apw crawls into a corner and quietly dies
[13:07] Both the 4.4.0-112 and 4.4.0-116 kernels lead to a frozen machine right after lightdm starts. However, 4.4.0-109 does not freeze the machine once the GUI starts.
[13:09] On that machine, the intel-microcode package was installed on 3/30/2018 and then the user started to see the freezing.
[13:21] dijuremo, 109 or 119 ?
[13:45] if there is no UEFI upgrade available, you should be able to boot into the recovery console from GRUB & remove the microcode from there
[13:53] apw: we've gotten quite a few reports of lockups (bug #1759920) with new BIOS or updated intel-microcode
[13:53] bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed] https://launchpad.net/bugs/1759920
[13:53] apw: I haven't been able to reproduce it
[13:54] tyhicks, it sounds like there might be some relief in later 4.4 kernels at least, from whatever is tickling that
[13:54] tyhicks, dijuremo is chasing it down in their bug
[13:55] apw, dijuremo: while debugging, this is an important detail to remember about how the intel-microcode package works: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920/comments/45
[13:55] Ubuntu bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed]
[14:09] tyhicks, a good point indeed
[14:09] and sensibly too from a regression perspective
[14:19] apw: I had an older 4.4.0-109 (one zero nine) available in the machine and with the latest microcode installed 3/31/2018 and the older BIOS 1.4.4 on the 7050 the machine would boot just fine without freezing, whereas the 4.4.0-112 and 4.4.0-116 would freeze the machine
[14:20] So it seems that instability starts on 4.4.0-112 when the latest microcode is present.
[14:22] I went ahead and installed 4.4.0-119 from the proposed release repo and it also worked well initially. This machine is an Optiplex 7050 which uses Intel Core i7 with built-in GPU. Since everything seemed to be OK so far, I added an Nvidia card, installed the Nvidia driver and now the machine freezes after lightdm starts even with 4.4.0-119
[14:23] sweet. dont use nvidia - problem solved!
[14:23] j/k
[14:24] tyhicks: sounds good, will keep it in mind, though when we did all the prior testing, we were doing cat /sys/devices/system/cpu/cpu0/microcode/version which was in fact showing us the proper microcode versions.
[14:28] FYI, the T3610 machine is still running mprime solid, CPU temps hovering around 67C, with a maximum of 70C
[14:30] 4.4.0-112.135 sets CONFIG_KERNEL_NOBP=y according to http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_4.4.0-116.140/changelog
[14:31] BP = branch prediction, https://patchwork.kernel.org/patch/10168887/
[14:32] umm, ignore, wrong architecture
[14:52] So for the DELL 7050 I am currently on BIOS Version: 1.8.2, the microcode is: 0x84 and I have the Intel microcode package installed 3.20180312.0~ubuntu16.04.1. Nvidia driver is also installed and version 384.111-0ubuntu0.16.04.1. I can consistently freeze the machine with kernels 4.4.0-112 4.4.0-116 and 4.4.0-119 with the previous configuration. If I boot the kernel 4.4.0-109 the machine does *not* freeze.
[15:02] apw, the 4.15.0-14-generic kernel in proposed is causing some dkms breakage, any ideas about this? https://launchpadlibrarian.net/363092142/DKMSBuildLog.txt
[15:03] apw, i see the same thing for sysdig and other DKMS modules too
[15:05] cking, yes, i know what that is
[15:05] /bin/bash: ./debian/scripts/retpoline-extract-one: No such file or directory
[15:05] yep
[15:06] i know how to fix that, and will be doing so shortly
[15:06] ok ta, perhaps it can be fixed against bug 1760876
[15:06] bug 1760876 in fwts (Ubuntu) "fwts-efi-runtime-dkms 18.03.00-0ubuntu1: fwts-efi-runtime-dkms kernel module failed to build" [Medium,In progress] https://launchpad.net/bugs/1760876
[15:06] cking, sure
[15:06] thanks!
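The microcode check dijuremo describes at 14:24 (reading /sys/devices/system/cpu/cpu0/microcode/version) could be scripted roughly as below. This is a minimal sketch and not something taken from the log: the function names parse_microcode_revision and read_microcode_revision are made up for illustration, and it assumes the sysfs file holds a single hexadecimal value such as the 0x42c and 0x84 quoted above.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the log; assumed to exist only on Linux systems that expose it.
MICROCODE_PATH = Path("/sys/devices/system/cpu/cpu0/microcode/version")


def parse_microcode_revision(text: str) -> int:
    """Convert a revision string such as '0x42c' into an integer."""
    return int(text.strip(), 16)


def read_microcode_revision(path: Path = MICROCODE_PATH) -> Optional[int]:
    """Return the running microcode revision, or None if the file cannot be read."""
    try:
        return parse_microcode_revision(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    revision = read_microcode_revision()
    print("unavailable" if revision is None else hex(revision))
```

Recording this value next to the kernel version and BIOS level for each test makes it easier to tell which combination actually froze.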
[15:34] djinni: that makes sense because the 4.4.0-109 kernel didn't make use of the IBRS/IBPB features provided by the new microcode
[15:34] bah
[15:35] dijuremo: that makes sense because the 4.4.0-109 kernel didn't make use of the IBRS/IBPB features provided by the new microcode
[15:35] djinni: sorry for the invalid ping
[15:36] tyhicks, 'yay'
[15:36] that microcode is just a living disaster in all its forms
[15:37] apw: either the microcode for some processors is bad or we have a bad IBRS/IBPB patch in our 4.4 and 4.13 kernels
[15:37] i guess that is entirely possible, though we have been using those kernels heavily in-house, but ...
[15:38] then again i thought that the reporter had turned off those mitigations
[15:39] i guess we will find out when the bug is updated
[15:39] tyhicks, you might want to check that dijuremo has tested with the ibrs ibpb sysfs thingies disabled too
[15:39] though we should not be using ibrs if we have retpoline on, so its not likely to be that one
[15:41] dijuremo: it would be helpful if you could verify if this bug comment also works for you: https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1759920/comments/55
[15:41] Ubuntu bug 1759920 in linux (Ubuntu Artful) "intel-microcode 3.20180312.0 causes lockup at login screen(w/ linux-image-4.13.0-37-generic)" [High,Confirmed]
[15:42] dijuremo: I think the best situation to test would be the 4.4.0-119 kernel, with updated microcode (0x84 for your Dell 7050), and seeing if adding noibpb to the kernel command line prevents the lockups
[15:43] dijuremo: if that doesn't work, do the same test but add "noibrs to the kernel command
[15:44] dijuremo: let me try that again without hitting enter too early... if that doesn't work, do the same test but add "noibrs noibpb" to the kernel command
[16:27] OK, here are the latest developments, without just yet trying noibpb or noibrs noibpb: The DELL 5820 came back for freezing.
I have updated it to the latest BIOS, 1.4.0. Removed all kernels but 4.4.0-92 (Does not freeze), installed 4.4.0-109 (Does not freeze). Installed 4.4.0-119 and machine freezes. Next step, use -119 with noibpb....
[16:36] Both the DELL T5820 and the Optiplex 7050 work properly (no more immediate freezes when it gets to the GUI login screen) when I use 4.4.0-119 and the *noibpb* kernel option
[16:39] ok, very good to know
[16:43] So noibpb should work for both 4.4.0-119 and 4.13.0-38 right? How about 4.13.0-37?
[16:46] dijuremo: noibpb will be honored in all three of those kernels
[16:58] 4.13.0-38 tested on T5820 and no freeze with noibpb, 4.13.0-37 tested on Optiplex 7050 and no freeze with the same option.
=== himcesjf_ is now known as him-cesjf
[17:35] tyhicks, ugg
[18:39] apw, can you please guide me how to do a git bisect run to find the first bad commit ?
[18:44] So for now by disabling ibpb we are making the systems vulnerable to some form of spectre, does this sound right?
[18:54] if it works on 4.12 but not on 4.13, it is a regression
[18:54] TJ-, can you please guide me ?
[19:01] s10gopal: see https://wiki.ubuntu.com/Kernel/KernelBisection#How_do_I_bisect_the_upstream_kernel.3F
[19:01] TJ-, should i test sub version between 4.12 and 4.13 too ?
[19:03] TJ-, i am unable to understand above guide , can you please explain it ?
[19:04] s10gopal: identify the closest 2 versions where the lower works and the higher does not, then git bisect between those 2 versions and test it one
[19:04] s10gopal: Do you know what "bisect" means?
[19:04] TJ-, no
[19:05] TJ-, line which cut two things ?
[19:05] s10gopal: it means 'split in two' ... let me give you an example
[19:06] dijuremo: yes you are
[19:10] s10gopal: let's assume we have an imaginary project with releases v1.0 (commit #1) through v1.5 (commit #20) with 20 commits between them.
There's a regression somewhere between them - so the fastest way to find it is take the halfway point commit (#10), build the project, and test. If that test works then we know the problem must be higher than #10, so we halve again between #10 and #20 and build again
[19:10] for #15, test. If it fails then we know the problem is between #10 and #15 so halve again and build #12 and test. If that works we know the issue is between #12 and #15 ... that process continues until you've only got 1 commit left, which is where the regression was introduced
[19:11] TJ-, apply binary search?
[19:11] s10gopal: correct, that's a bisect(ion)
[19:12] using the 'git bisect' tool it does this for you, you just tell it whether the last build was GOOD or BAD and it figures out the next commit and checks it out ready for you to build
[19:13] can i manually do it by installing kernel build between 4.12 and 4.13 ?
[19:13] from http://kernel.ubuntu.com/~kernel-ppa/mainline/
[19:14] s10gopal: no, you need to do it from a clone of the mainline source code git repository
[19:15] TJ-, can you please guide me how to build kernel from git source ?
[19:15] s10gopal: it's an upstream issue so you work on the upstream repo. It's also much easier than trying to git-bisect Ubuntu kernels because they are rebased so there isn't a single linear history
[19:15] s10gopal: read the guide I showed
[19:22] TJ-, which link should i use ?
https://launchpad.net/ubuntu/lucid/+source/linux
[19:22] https://launchpad.net/ubuntu/maverick/+source/linux
[19:22] https://launchpad.net/ubuntu/natty/+source/linux
[19:22] https://launchpad.net/ubuntu/oneiric/+source/linux
[19:22] https://launchpad.net/ubuntu/precise/+source/linux
[19:22] https://launchpad.net/ubuntu/quantal/+source/linux
[19:22] https://launchpad.net/ubuntu/raring/+source/linux
[19:22] https://launchpad.net/ubuntu/saucy/+source/linux
[19:22] https://launchpad.net/ubuntu/trusty/+source/linux
[19:22] https://launchpad.net/ubuntu/utopic/+source/linux
[19:22] https://launchpad.net/ubuntu/vivid/+source/linux
[19:24] s10gopal: none, use upstream mainline
[19:25] TJ-, this one ? http://kernel.ubuntu.com/~kernel-ppa/mainline/
[19:26] s10gopal: "git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git" then "cd linux" and you're in the 'root' of the source tree
[19:38] TJ-, done
[19:38] now?
[19:46] TJ-, git bisect is going to check 4.12 to 4.16 right ? how i can modify to check only between 4.12 .. 4.13 ?
[19:47] s10gopal: Use the versions that are applicable to you in place of those in the example on the Wiki
[19:47] TJ-, git log --oneline 4.12..4.13 | wc
[19:48] TJ-, ambiguous argument '4.12..4.13': unknown revision or path not in the working tree.
[19:49] s10gopal: so you checkout the last known good, as in 'git checkout v4.12' then do 'git bisect start' 'git bisect good v4.12' 'git bisect bad v4.13' ... that'll checkout at the 1/2 way commit, so then you build, install, and test.
[19:52] TJ-, how to build and install ?
[19:53] s10gopal: I feel like this is way beyond you. It is documented in the Wiki. I don't have the time to be able to talk you through it.
[19:54] TJ-, thx a lot , can #ubuntu users help me ?
[19:55] s10gopal it's beyond that kind of support, it's a very niche procedure for kernel devs
[19:56] TJ-, anything like https://lwn.net/Articles/317154/ ?
[19:59] s10gopal: no, running 'git bisect' is the easy bit.
I'm referring to preparing to build by copying in an appropriate .config file, doing 'make silentoldconfig' then 'make bzImage' then having to copy that into /boot/vmlinuz-test, build an initrd.img 'update-initramfs -c -k test' and 'update-grub' then reboot test and repeat as required.
[20:00] sorry i didnt got it
[20:00] s10gopal: that's my point, this is way beyond you.
[20:01] TJ-, what i can do ?
[20:02] be patient! unless you can find an upstream kernel developer who wants to take it on it could take years to be addressed
[20:03] TJ-, can i test sub version ? it will make bisect process faster ?
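TJ-'s description of bisection at 19:10 can be sketched as a short Python function. This only illustrates the halving idea under stated assumptions: commits are numbered 1 through 20 as in the imaginary project, is_good is a hypothetical stand-in for "build, boot and test this commit", and the regression is assumed to appear once and never go away again. It is not how 'git bisect' itself is implemented, since git works on a commit graph rather than a numbered list.

```python
from typing import Callable


def first_bad_commit(good: int, bad: int, is_good: Callable[[int], bool]) -> int:
    """Return the first bad commit number, given a known-good and a known-bad one.

    Assumes every commit up to some point is good and every later one is bad.
    """
    if good >= bad:
        raise ValueError("the known-good commit must come before the known-bad one")
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_good(middle):
            good = middle   # regression was introduced after the midpoint
        else:
            bad = middle    # regression is at or before the midpoint
    return bad


if __name__ == "__main__":
    tested = []

    def is_good(commit: int) -> bool:
        # Hypothetical test result: pretend commit 13 introduced the regression.
        tested.append(commit)
        return commit < 13

    print(first_bad_commit(1, 20, is_good), tested)
```

With commit 13 as the pretend culprit, the function tests commits 10, 15, 12 and 13 in that order, which matches the first three steps of the walk-through above. Each round halves the remaining range, so starting from a tighter good/bad pair saves roughly one build-and-test round per halving of the range.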