[07:39] <f_g> jsalisbury: I/we don't have any affected hardware for testing, and our production kernel has the problematic commit reverted currently. if alkisg does not report back, feel free to ping me and I can see about spinning up a test kernel and attempting to convince our affected users to try it out ;)
[07:40] <alkisg> It's rainy today, I'm not sure I'll be able to bike to the affected school...
[08:16] <alkisg> OK, I got remote access, testing http://kernel.ubuntu.com/~jsalisbury/lp1742630/i386/linux-image-4.13.0-25-generic_4.13.0-25.29~lp1742630_i386.deb
[08:17] <f_g> apw: the 4.4 origin/pti branch is missing GPR scrubbing on vmexit for Intel / VMX
[08:18] <f_g> the upstream mainline commit has both Intel and AMD in one commit, your branch contains one by AMD for AMD only
[08:18] <f_g> a1c61c3a6dec6fca6380fa7aa294978dc84e616c in xenial's pti branch
[08:20] <f_g> 0cb5b30698fdc8f6b4646012e3acb4ddce430788 in mainline
[08:59] <alkisg> jsalisbury: in cooperation with the school teacher, he tells me that 4.10.0-42 boots, your 4.13.0-25 does NOT boot, and the stock 4.13.0-26 does NOT boot either
[09:00] <alkisg> Also, now I have 3 schools affected
[09:08] <bjf> alkisg, i can't tell if you're saying that jsalisbury's test kernel also does not boot
[09:08] <alkisg> bjf: exactly, jsalisbury's test kernel does also not boot
[09:09] <alkisg> I can go visit some school in person if that can somehow help
[09:18] <alkisg> E.g. maybe I could boot with `debug` and get some better error message than "it reboots"... :)
[09:28] <f_g> alkisg jsalisbury: we've had positive feedback from affected users for our kernel with the revert, and Debian went the same route in their 4.9 (which got the buggy commit via -stable it seems?). maybe that would be the better short-term option given the lack of response upstream, especially with users being forced to 4.13 from a previously working 4.10..
[09:29] <alkisg> f_g: I don't know your case exactly, but if you have a kernel with some reverted commit that I could test, different from the one of jsalisbury's, I'd be glad to
[09:29] <alkisg> (i386 only here though)
[09:30] <apw> f_g, what bug number is that ... 
[09:30] <apw> alkisg, if someone can tell me what patch we are reverting i can give you a kernel to test at least
[09:30] <f_g> alkisg: it's a derivative, so it's not a stock Ubuntu kernel. amd64 only as well, so that is likely not much help to you. you can just do a git revert of the offending commit, bump the version and build your own (for testing purposes). the ubuntu wiki has details ;)
[09:31] <f_g> apw: the scsi/libsas one? #1726519
[09:31] <apw> f_g, the one you and alkisg are talking about
[09:31] <alkisg> apw: jsalisbury mentioned https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1742630
[09:31] <f_g> both are the same bug ;)
[09:31] <alkisg> I tested his kernel but it did NOT solve the issue
[09:32] <apw> alkisg, his was adding the "likely fix" i assume
[09:33] <apw> f_g, and 909657615d9b ("scsi: libsas: allow async aborts") is what you reverted with success ?
[09:33] <f_g> apw: exactly
[09:33] <alkisg> apw: he's mentioning what he did in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1742630 comment #5
[09:34] <apw> alkisg, and did you test the kernel he refers to in comment #34, which to my eye claims the revert is applied ?
[09:34] <alkisg> apw: exactly, although he had to rebuild for i386 just for me
[09:34] <apw> bah, i need to smack him, there are no patches in that directory
[09:35] <apw> alkisg, and it did not fix you is that also true ?
[09:35] <alkisg> Hehe
[09:35] <alkisg> apw: the school teacher that tested jsalisbury's kernel reported that it did not boot
[09:35] <alkisg> I installed it for him, he said he got a black screen when he selected it in grub, and then he booted with the previous 4.10 kernel
[09:36] <alkisg> apw: if you spin an i386 kernel for me, I can test it as soon as it's ready
[09:36] <alkisg> (in person)
[09:36] <apw> no it appears if you had an i386 built then it wasn't that one
[09:37] <alkisg> apw, that one: http://kernel.ubuntu.com/~jsalisbury/lp1742630/i386/
[09:37] <apw> alkisg, right, that actually #34 points to this one, which has no i386 ... http://kernel.ubuntu.com/~jsalisbury/lp1726519/
[09:38] <apw> alkisg, so i don't think you have actually tested the revert, but some other "likely fix"
[09:38] <alkisg> apw: jsalisbury's prepared an amd64 build with the revert. I told him I don't have amd64 to test with that, and he issued another build of that just for me
[09:38] <f_g> apw: that likely fix comment is by me - I based that on the Fixes stanza and commit message, like I said I don't have any affected hardware to test.
[09:39] <alkisg> apw: so afaik they both have the same fix; but jsalisbury would know more...
[09:39] <apw> alkisg, right but the one you just pasted to me, is not the same build ... it is a build pointed to by the "likely fix" commit
[09:39] <alkisg> ok
[09:39] <f_g> there is one user reporting that the patched kernel fixes the issue, so maybe alkisg's "does not boot" is unrelated
[09:39] <apw> alkisg, so i would say basically we have no idea because jsalisbury didn't include the patches in the results so we cannot tell ... *grrrr*
[09:40] <apw> alkisg, so all i can suggest is i build you an i386 with that definitely reverted
[09:40] <alkisg> apw, if you can build/point me to an i386 build with the revert, I can test it immediately
[09:40] <apw> ok
[09:40] <alkisg> Cool
[09:40] <apw> alkisg, so this is a xenial linux-hwe ... right ?
[09:40] <alkisg> Right
[09:41] <f_g> alkisg: if you go there in person, it would be good to check if you get an oops that matches the description. all the reports I have print the oops loud and clear right at boot
[09:42] <alkisg> f_g: I can arrange to be there in half an hour; the problem is I can't stay there for a loooong time, let's say half/one hour
[09:42] <alkisg> So, it'll be best if we gather whatever I need to test before going there
[09:42] <f_g> alkisg: yes, having the kernel with revert at hand for testing is a good idea :)
[09:43] <apw> alkisg, you can test this kernel remotely i think ?
[09:43] <alkisg> apw: sure, but I don't mind going there, in case I can get more info with "debug" or something
[09:43] <apw> alkisg, let's see if it is this i guess :)
[09:43] <alkisg> ok
[10:00] <apw> alkisg, http://people.canonical.com/~apw/lp1726519-xenial/
[10:00] <alkisg> apw: ty
[10:05] <alkisg> Eh the teacher left the school, I'll go there and test in 30'
[10:06] <apw> how annoying
[10:07] <apw> you need a school in your house, the only sensible solution :/
[11:18] <f_g> apw: missing the GPR scrub on vmexit for VMX in 4.13/pti as well (compared to mainline again)
[11:42] <apw> f_g, yep, thanks, will get that replaced, what a mess the world is right now
[11:49] <f_g> indeed.
[11:49]  * apw wonders how alkisg is getting on
[11:50] <apw> f_g, pti branch is updated with those now, pending them passing some kind of testing
[11:52] <apw> alkisg, any luck?  we are running out of runway to include anything in this respin
[11:52] <f_g> apw: we've been running with them for a week already without any negative reports
[11:53] <apw> f_g, great, the vmx one i am pretty confident with as they were clean applications
[11:53] <apw> it is the other thing that is worrying me right now, but it may become moot fairly soon
[11:54] <f_g> yep, the vmx part is from google to fix a google PoC - I guess that part is already pretty battle-tested ;)
[12:57] <f_g> apw: initial limited testing looks good for both 4.4 and 4.13
[12:58] <f_g> I hope re-integrating the pti branches with mainline RETPOLINE won't cause too many problems - lots of users out there still running affected CPUs that will never get IBRS and IBPB
[13:27] <apw> f_g, upstream will be adding ibrs/ibpb on top of their retpoline, once that exists and stops moving we'll be wanting to flip to that
[13:34] <alkisg> apw: sorry, I went to the school but there were many students in the computer lab and it was very hard to reboot. I did install the kernel, and I'm waiting for the result tomorrow morning.
[13:35] <alkisg> (eh, I didn't explain that we're using a server/netbooted client model, and the problem happened on the server)
[13:35] <apw> okies, then we'll ignore that one for now
[15:11] <jsalisbury> alkisg, f_g I'm reviewing the scrollback on this channel now.  I'll review the bug comments too, but it sounds like the kernel posted in comment #5 of bug 1742630 does not boot for you, alkisg?  Did the kernel apw built for you resolve the bug?
[15:31] <alkisg> jsalisbury: correct, your kernel didn't boot, I'll know tomorrow about apw's kernel
[15:32] <jsalisbury> alkisg, ok, thanks!  
[15:32] <alkisg> Thank you too :)
[15:32] <jsalisbury> alkisg, if you can, can you grab a screen shot of digital picture of my kernel when it does not boot?
[15:32] <jsalisbury> s/of/or/
[15:32] <alkisg> jsalisbury: sure; will "debug" help more?
[15:33] <alkisg> I'll check both anyways
[15:33] <jsalisbury> alkisg, it would be good to see if there is a panic or stack trace for my test kernel
[15:33] <alkisg> ok
[15:33] <jsalisbury> alkisg, I'd like to send that info upstream as well, and provide feedback on the patch.
[21:15] <jsalisbury> IDENTIFY jsalisbury
[21:15] <apw_> he shoots ... and misses
[21:16] <jsalisbury> heh