=== himcesjf_ is now known as him-cesjf
f_gjsalisbury: I/we don't have any affected hardware for testing, and our production kernel has the problematic commit reverted currently. if alkisg does not report back, feel free to ping me and I can see about spinning up a test kernel and attempting to convince our affected users to try it out ;)07:39
alkisgIt's rainy today, I'm not sure I'll be able to bike to the affected school...07:40
alkisgOK, I got remote access, testing http://kernel.ubuntu.com/~jsalisbury/lp1742630/i386/linux-image-4.13.0-25-generic_4.13.0-25.29~lp1742630_i386.deb08:16
f_gapw: the 4.4 origin/pti branch is missing GPR scrubbing on vmexit for Intel / VMX08:17
f_gthe upstream mainline commit has both Intel and AMD in one commit, your branch contains one by AMD for AMD only08:18
f_ga1c61c3a6dec6fca6380fa7aa294978dc84e616c in xenial's pti branch08:18
f_g0cb5b30698fdc8f6b4646012e3acb4ddce430788 in mainline08:20
alkisgjsalisbury: in cooperation with the school teacher, he tells me that  4.10.0-42 boots, your 4.13.0-25 does NOT boot, and the stock 4.13.0-26 does NOT boot either08:59
alkisgAlso, now I have 3 schools affected09:00
bjfalkisg, i can't tell if you're saying that jsalisbury's test kernel also does not boot09:08
alkisgbjf: exactly, jsalisbury's test kernel also does not boot09:08
alkisgI can go visit some school in person if that can somehow help09:09
alkisgE.g. maybe I could boot with `debug` and get some better error message than "it reboots"... :)09:18
=== jdstrand_ is now known as jdstrand
f_galkisg jsalisbury: we've had positive feedback from affected users for our kernel with the revert, and Debian went the same route in their 4.9 (which got the buggy commit via -stable it seems?). maybe that would be the better short-term option given the lack of response upstream, especially with users being forced to 4.13 from a previously working 4.10..09:28
alkisgf_g: I don't know your case exactly, but if you have a kernel with some reverted commit that I could test, different from the one of jsalisbury's, I'd be glad to09:29
alkisg(i386 only here though)09:29
apwf_g, what bug number is that ... 09:30
apwalkisg, if someone can tell me what patch we are reverting i can give you a kernel to test at least09:30
f_galkisg: it's a derivative, so it's not a stock Ubuntu kernel. amd64 only as well, so that is likely not much help to you. you can just do a git revert of the offending commit, bump the version and build your own (for testing purposes). the ubuntu wiki has details ;)09:30
f_gapw: the scsi/libsas one? #172651909:31
apwf_g, the one you and alkisg are talking about09:31
alkisgapw: jsalisbury mentioned https://bugs.launchpad.net/ubuntu/+source/linux/+bug/174263009:31
ubot5Launchpad bug 1742630 in linux (Ubuntu Artful) "Booting from 4.13 leads to Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]" [High,Triaged]09:31
f_gboth are the same bug ;)09:31
alkisgI tested his kernel but it did NOT solve the issue09:31
apwalkisg, his was adding the "likely fix" i assume09:32
apwf_g, and 909657615d9b ("scsi: libsas: allow async aborts") is what you reverted with success ?09:33
f_gapw: exactly09:33
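The revert-and-rebuild route f_g suggests for testing can be sketched roughly as follows. This is a minimal dry-run sketch, not the procedure anyone in the channel actually ran: the commit id is the one apw names, while the `fakeroot debian/rules` targets are an assumption based on the usual Ubuntu kernel build instructions, and the function names are hypothetical. The version bump f_g mentions is a manual changelog edit and is left out on purpose.

```python
import shlex
import subprocess

# Commit named in the log as the one reverted with success.
OFFENDING_COMMIT = "909657615d9b"


def revert_and_build_commands(commit=OFFENDING_COMMIT):
    """Return, in order, the commands for a revert-and-rebuild test kernel.

    Hypothetical helper: the build targets are assumed, not taken from the log.
    The version bump (a changelog edit) is deliberately not automated here.
    """
    return [
        ["git", "revert", "--no-edit", commit],
        ["fakeroot", "debian/rules", "clean"],
        ["fakeroot", "debian/rules",
         "binary-headers", "binary-generic", "binary-perarch"],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run(revert_and_build_commands())
```

Run inside a checkout of the kernel tree with `dry_run=False` only after checking the targets against the Ubuntu wiki instructions for the release in question.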
alkisgapw: he's mentioning what he did in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1742630 comment #509:33
ubot5Launchpad bug 1742630 in linux (Ubuntu Artful) "Booting from 4.13 leads to Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]" [High,Triaged]09:33
apwalkisg, and did you test the kernel he refers to in comment #34, which to my eye claims the revert is applied ?09:34
alkisgapw: exactly, although he had to rebuild for i386 just for me09:34
apwbah, i need to smack him, there are no patches in that directory09:34
apwalkisg, and it did not fix it for you, is that also true ?09:35
alkisgapw: the school teacher that tested jsalisbury's kernel reported that it did not boot09:35
alkisgI installed it for him, he said he got a black screen when he selected it in grub, and then he booted with the previous 4.10 kernel09:35
alkisgapw: if you spin an i386 kernel for me, I can test it as soon as it's ready09:36
alkisg(in person)09:36
apwno, it appears that if you had an i386 build then it wasn't that one09:36
alkisgapw, that one: http://kernel.ubuntu.com/~jsalisbury/lp1742630/i386/09:37
apwalkisg, right, that actually #34 points to this one, which has no i386 ... http://kernel.ubuntu.com/~jsalisbury/lp1726519/09:37
apwalkisg, so i don't think you have actually tested the revert, but some other "likely fix"09:38
alkisgapw: jsalisbury's prepared an amd64 build with the revert. I told him I don't have amd64 to test with that, and he issued another build of that just for me09:38
f_gapw: that likely fix comment is by me - I based that on the Fixes stanza and commit message, like I said I don't have any affected hardware to test.09:38
alkisgapw: so afaik they both have the same fix; but jsalisbury would know more...09:39
apwalkisg, right but the one you just pasted to me, is not the same build ... it is a build pointed to by the "likely fix" commit09:39
f_gthere is one user reporting that the patched kernel fixes the issue, so maybe alkisg's "does not boot" is unrelated09:39
apwalkisg, so i would say basically we have no idea because jsalisbury didn't include the patches in the results so we cannot tell ... *grrrr*09:39
apwalkisg, so all i can suggest is i build you an i386 with that definitely reverted09:40
alkisgapw, if you can build/point me to an i386 build with the revert, I can test it immediately09:40
apwalkisg, so this is a xenial linux-hwe ... right ?09:40
f_galkisg: if you go there in person, it would be good to check if you get an oops that matches the description. all the reports I have print the oops loud and clear right at boot09:41
alkisgf_g: I can arrange to be there in half an hour; the problem is I can't stay there for a loooong time, let's say half/one hour09:42
alkisgSo, it'll be best if we gather whatever I need to test before going there09:42
f_galkisg: yes, having the kernel with revert at hand for testing is a good idea :)09:42
apwalkisg, you can test this kernel remotely i think ?09:43
alkisgapw: sure, but I don't mind going there, in case I can get more info with "debug" or something09:43
apwalkisg, let's see if it is this i guess :)09:43
apwalkisg, http://people.canonical.com/~apw/lp1726519-xenial/10:00
alkisgapw: ty10:00
alkisgEh the teacher left the school, I'll go there and test in 30'10:05
apwhow annoying10:06
apwyou need a school in your house, the only sensible solution :/10:07
f_gapw: missing the GPR scrub on vmexit for VMX in 4.13/pti as well (compared to mainline again)11:18
apwf_g, yep, thanks, will get that replaced, what a mess the world is right now11:42
* apw wonders how alkisg is getting on11:49
apwf_g, pti branch is updated with those now, pending them passing some kind of testing11:50
apwalkisg, any luck?  we are running out of runway to include anything in this respin11:52
f_gapw: we've been running with them for a week already without any negative reports11:52
apwf_g, great, the vmx one i am pretty confident with as they were clean applications11:53
apwit is the other thing that is worrying me right now, but it may become moot fairly soon11:53
f_gyep, the vmx part is from google to fix a google PoC - I guess that part is already pretty battle-tested ;)11:54
f_gapw: initial limited testing looks good for both 4.4 and 4.1312:57
f_gI hope re-integrating the pti branches with mainline RETPOLINE won't cause too many problems - lots of users out there still running affected CPUs that will never get IBRS and IBPB12:58
apwf_g, upstream will be adding ibrs/ibpb on top of their retpoline, once that exists and stops moving we'll be wanting to flip to that13:27
alkisgapw: sorry, I went to the school but there were many students in the computer lab and it was very hard to reboot. I did install the kernel, and I'm waiting for the result tomorrow morning.13:34
alkisg(eh, I didn't explain that we're using a server/netbooted client model, and the problem happened on the server)13:35
apwokies, then we'll ignore that one for now13:35
jsalisburyalkisg, f_g I'm reviewing the scrollback on this channel now.  I'll review the bug comments too, but it sounds like the kernel posted in comment #5 of bug 1742630 does not boot for you, alkisg?  Did the kernel apw built for you resolve the bug?15:11
ubot5bug 1742630 in linux (Ubuntu Artful) "Booting from 4.13 leads to Oops: NULL pointer dereference - RIP: isci_task_abort_task+0x30/0x3e0 [isci]" [High,Triaged] https://launchpad.net/bugs/174263015:11
alkisgjsalisbury: correct, your kernel didn't boot, I'll know tomorrow about apw's kernel15:31
jsalisburyalkisg, ok, thanks!  15:32
alkisgThank you too :)15:32
jsalisburyalkisg, if you can, can you grab a screenshot or digital picture of my kernel when it does not boot?15:32
alkisgjsalisbury: sure; will "debug" help more?15:32
alkisgI'll check both anyways15:33
jsalisburyalkisg, it would be good to see if there is a panic or stack trace for my test kernel15:33
jsalisburyalkisg, I'd like to send that info upstream as well, and provide feedback on the patch.15:33
jsalisburyIDENTIFY jsalisbury21:15
apw_he shoots ... and misses21:15
=== apw_ is now known as apw
