=== himcesjf_ is now known as him-cesjf | ||
f_g | jsalisbury: I/we don't have any affected hardware for testing, and our production kernel has the problematic commit reverted currently. if alkisg does not report back, feel free to ping me and I can see about spinning up a test kernel and attempting to convince our affected users to try it out ;) | 07:39 |
---|---|---|
alkisg | It's rainy today, I'm not sure I'll be able to bike to the affected school... | 07:40 |
alkisg | OK, I got remote access, testing http://kernel.ubuntu.com/~jsalisbury/lp1742630/i386/linux-image-4.13.0-25-generic_4.13.0-25.29~lp1742630_i386.deb | 08:16 |
f_g | apw: the 4.4 origin/pti branch is missing GPR scrubbing on vmexit for Intel / VMX | 08:17 |
f_g | the upstream mainline commit has both Intel and AMD in one commit, your branch contains one by AMD for AMD only | 08:18 |
f_g | a1c61c3a6dec6fca6380fa7aa294978dc84e616c in xenial's pti branch | 08:18 |
f_g | 0cb5b30698fdc8f6b4646012e3acb4ddce430788 in mainline | 08:20 |
alkisg | jsalisbury: in cooperation with the school teacher, he tells me that 4.10.0-42 boots, your 4.13.0-25 does NOT boot, and the stock 4.13.0-26 does NOT boot either | 08:59 |
alkisg | Also, now I have 3 schools affected | 09:00 |
bjf | alkisg, i can't tell if you're saying that jsalisbury's test kernel also does not boot | 09:08 |
alkisg | bjf: exactly, jsalisbury's test kernel does also not boot | 09:08 |
alkisg | I can go visit some school in person if that can somehow help | 09:09 |
alkisg | E.g. maybe I could boot with `debug` and get some better error message than "it reboots"... :) | 09:18 |
=== jdstrand_ is now known as jdstrand | ||
f_g | alkisg jsalisbury: we've had positive feedback from affected users for our kernel with the revert, and Debian went the same route in their 4.9 (which got the buggy commit via -stable it seems?). maybe that would be the better short-term option given the lack of response upstream, especially with users being forced to 4.13 from a previously working 4.10.. | 09:28 |
alkisg | f_g: I don't know your case exactly, but if you have a kernel with some reverted commit that I could test, different from the one of jsalisbury's, I'd be glad to | 09:29 |
alkisg | (i386 only here though) | 09:29 |
apw | f_g, what bug number is that ... | 09:30 |
apw | alkisg, if someone can tell me what patch we are reverting i can give you a kernel to test at least | 09:30 |
f_g | alkisg: it's a derivative, so it's not a stock Ubuntu kernel. amd64 only as well, so that is likely not much help to you. you can just do a git revert of the offending commit, bump the version and build your own (for testing purposes). the ubuntu wiki has details ;) | 09:30 |
f_g | apw: the scsi/libsas one? #1726519 | 09:31 |
apw | f_g, the one you and alkisg are talking about | 09:31 |
alkisg | apw: jsalisbury mentioned https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1742630 | 09:31 |
ubot5 | Launchpad bug 1742630 in linux (Ubuntu Artful) "Booting from 4.13 leads to Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]" [High,Triaged] | 09:31 |
f_g | both are the same bug ;) | 09:31 |
alkisg | I tested his kernel but it did NOT solve the issue | 09:31 |
apw | alkisg, his was adding the "likely fix" i assume | 09:32 |
apw | f_g, and 909657615d9b ("scsi: libsas: allow async aborts") is what you reverted with success ? | 09:33 |
f_g | apw: exactly | 09:33 |
alkisg | apw: he's mentioning what he did in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1742630 comment #5 | 09:33 |
ubot5 | Launchpad bug 1742630 in linux (Ubuntu Artful) "Booting from 4.13 leads to Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]" [High,Triaged] | 09:33 |
apw | alkisg, and did you test the kernel he refers to in comment #34, which to my eye claims the revert is applied ? | 09:34 |
alkisg | apw: exactly, although he had to rebuild for i386 just for me | 09:34 |
apw | bah, i need to smack him, there are no patches in that directory | 09:34 |
apw | alkisg, and it did not fix you is that also true ? | 09:35 |
alkisg | Hehe | 09:35 |
alkisg | apw: the school teacher that tested jsalisbury's kernel reported that it did not boot | 09:35 |
alkisg | I installed it for him, he said he got a black screen when he selected it in grub, and then he booted with the previous 4.10 kernel | 09:35 |
alkisg | apw: if you spin an i386 kernel for me, I can test it as soon as it's ready | 09:36 |
alkisg | (in person) | 09:36 |
apw | no it appears if you had an i386 built then it wasn't that one | 09:36 |
alkisg | apw, that one: http://kernel.ubuntu.com/~jsalisbury/lp1742630/i386/ | 09:37 |
apw | alkisg, right, that actually #34 points to this one, which has no i386 ... http://kernel.ubuntu.com/~jsalisbury/lp1726519/ | 09:37 |
apw | alkisg, so i don't think you have actually tested the revert, but some other "likely fix" | 09:38 |
alkisg | apw: jsalisbury's prepared an amd64 build with the revert. I told him I don't have amd64 to test with that, and he issued another build of that just for me | 09:38 |
f_g | apw: that likely fix comment is by me - I based that on the Fixes stanza and commit message, like I said I don't have any affected hardware to test. | 09:38 |
alkisg | apw: so afaik they both have the same fix; but jsalisbury would know more... | 09:39 |
apw | alkisg, right but the one you just pasted to me, is not the same build ... it is a build pointed to by the "likely fix" commit | 09:39 |
alkisg | ok | 09:39 |
f_g | there is one user reporting that the patched kernel fixes the issue, so maybe alkisg's "does not boot" is unrelated | 09:39 |
apw | alkisg, so i would say basically we have no idea because jsalisbury didn't include the patches in the results so we cannot tell ... *grrrr* | 09:39 |
apw | alkisg, so all i can suggest is i build you an i386 with that definitively reverted | 09:40 |
alkisg | apw, if you can build/point me to an i386 build with the revert, I can test it immediately | 09:40 |
apw | ok | 09:40 |
alkisg | Cool | 09:40 |
apw | alkisg, so this is a xenial linux-hwe ... right ? | 09:40 |
alkisg | Right | 09:40 |
f_g | alkisg: if you go there in person, it would be good to check if you get an oops that matches the description. all the reports I have print the oops loud and clear right at boot | 09:41 |
alkisg | f_g: I can arrange to be there in half an hour; the problem is I can't stay there for a loooong time, let's say half/one hour | 09:42 |
alkisg | So, it'll be best if we gather whatever I need to test before going there | 09:42 |
f_g | alkisg: yes, having the kernel with revert at hand for testing is a good idea :) | 09:42 |
apw | alkisg, you can test this kernel remotely i think ? | 09:43 |
alkisg | apw: sure, but I don't mind going there, in case I can get more info with "debug" or something | 09:43 |
apw | alkisg, let's see if it is this i guess :) | 09:43 |
alkisg | ok | 09:43 |
apw | alkisg, http://people.canonical.com/~apw/lp1726519-xenial/ | 10:00 |
alkisg | apw: ty | 10:00 |
alkisg | Eh the teacher left the school, I'll go there and test in 30' | 10:05 |
apw | how annoying | 10:06 |
apw | you need a school in your house, the only sensible solution :/ | 10:07 |
f_g | apw: missing the GPR scrub on vmexit for VMX in 4.13/pti as well (compared to mainline again) | 11:18 |
apw | f_g, yep, thanks, will get that replaced, what a mess the world is right now | 11:42 |
f_g | indeed. | 11:49 |
* apw wonders how alkisg is getting on | 11:49 | |
apw | f_g, pti branch is updated with those now, pending them passing some kind of testing | 11:50 |
apw | alkisg, any luck? we are running out of runway to include anything in this respin | 11:52 |
f_g | apw: we've been running with them for a week already without any negative reports | 11:52 |
apw | f_g, great, the vmx one i am pretty confident with as they were clean applications | 11:53 |
apw | it is the other thing that is worrying me right now, but it may become moot fairly soon | 11:53 |
f_g | yep, the vmx part is from google to fix a google PoC - I guess that part is already pretty battle-tested ;) | 11:54 |
f_g | apw: initial limited testing looks good for both 4.4 and 4.13 | 12:57 |
f_g | I hope re-integrating the pti branches with mainline RETPOLINE won't cause too many problems - lots of users out there still running affected CPUs that will never get IBRS and IBPB | 12:58 |
apw | f_g, upstream will be adding ibrs/ibpb on top of their retpoline, once that exists and stops moving we'll be wanting to flip to that | 13:27 |
alkisg | apw: sorry, I went to the school but there were many students in the computer lab and it was very hard to reboot. I did install the kernel, and I'm waiting for the result tomorrow morning. | 13:34 |
alkisg | (eh, I didn't explain that we're using a server/netbooted client model, and the problem happened on the server) | 13:35 |
apw | okies, then we'll ignore that one for now | 13:35 |
jsalisbury | alkisg, f_g I'm reviewing the scrollback on this channel now. I'll review the bug comments too, but it sounds like the kernel posted in comment #5 of bug 1742630 does not boot for you, alkisg? Did the kernel apw built for you resolve the bug? | 15:11 |
ubot5 | bug 1742630 in linux (Ubuntu Artful) "Booting from 4.13 leads to Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]" [High,Triaged] https://launchpad.net/bugs/1742630 | 15:11 |
alkisg | jsalisbury: correct, your kernel didn't boot, I'll know tomorrow about apw's kernel | 15:31 |
jsalisbury | alkisg, ok, thanks! | 15:32 |
alkisg | Thank you too :) | 15:32 |
jsalisbury | alkisg, if you can, can you grab a screen shot of digital picture of my kernel when it does not boot? | 15:32 |
jsalisbury | s/of/or/ | 15:32 |
alkisg | jsalisbury: sure; will "debug" help more? | 15:32 |
alkisg | I'll check both anyways | 15:33 |
jsalisbury | alkisg, it would be good to see if there is a panic or stack trace for my test kernel | 15:33 |
alkisg | ok | 15:33 |
jsalisbury | alkisg, I'd like to send that info upstream as well, and provide feedback on the patch. | 15:33 |
jsalisbury | IDENTIFY jsalisbury | 21:15 |
apw_ | he shoots ... and misses | 21:15 |
=== apw_ is now known as apw | ||
jsalisbury | heh | 21:16 |