[11:07] <smb> tseliot, Hey, we had a look at bug 1426492 this morning, as it has an ugly bus error in the logs. I just tried but could only confirm the problem of false crash reports being produced, which actually seems to be a dkms issue
[11:08] <smb> tseliot, Have you heard of any other "bus error" breakage, or might that way of failing be a one-off?
[11:38] <tseliot> smb: no, I haven't heard of it but DKMS seems to fail at random, or rather report failing even when it doesn't (LP: #1268257)
[11:39] <smb> tseliot, Right, that is what I see often enough, and it looks to be dkms' fault for using exec in the middle of /etc/kernel/postinst.d/dkms
[11:40] <smb> Basically one attempt fails because the headers are not installed yet; then the next one succeeds and you get a crash report while everything is shiny
[11:40] <tseliot> oh
[11:42] <tseliot> smb: I think it's desirable to catch any failures though
[11:43] <smb> tseliot, Adam had spotted this. But exec ends the current script, doesn't it?
[11:43] <tseliot> yep
[11:43] <smb> What we want is the error message about the headers missing, and probably not to fail the postinst in that special case
[11:43] <tseliot> yes, I've just noticed
[11:44]  * tseliot nods
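(A minimal sketch of the failure mode smb describes, assuming a much-simplified /etc/kernel/postinst.d/dkms; the hook shown is illustrative, not the actual Ubuntu script.)

    #!/bin/sh
    KVER="$1"
    # exec replaces this shell with dkms, so nothing after this line can
    # run: if the headers for $KVER are not installed yet, dkms fails,
    # the postinst exits non-zero, and apport files a crash report --
    # even though a later run (after the headers land) succeeds.
    exec dkms autoinstall -k "$KVER"

(And roughly the handling smb asks for at 11:43 -- keep the message about missing headers, but don't fail the postinst for that special case:)

    #!/bin/sh
    KVER="$1"
    if [ ! -e "/lib/modules/$KVER/build" ]; then
        echo "dkms: headers for $KVER not installed, skipping autoinstall" >&2
        exit 0    # non-fatal: the modules get built once the headers arrive
    fi
    dkms autoinstall -k "$KVER"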
[11:44] <smb> tseliot, We should use one of those nvidia dkms fail reports to actually fix the dkms problem (which I see even in Trusty). I just was not sure whether the one with the bus error was "suitable"
[11:45] <tseliot> smb: I think we have more than enough duplicates of 1268257 to make a point ;)
[11:48] <smb> tseliot, I bet. :) Any of those you would prefer? That other one you mentioned is for 331.113... would that be U or before?
[11:49] <smb> Maybe I should open a new report against dkms and we can dup the nvidia ones against that as we see fit
[11:49] <tseliot> smb: that would work too
[11:56] <smb> tseliot, ok, so here we go ... bug 1427175
[11:56] <tseliot> smb: great, thanks. I'll also talk to upstream about it
[11:57] <smb> tseliot, Ok, sounds good
[15:51] <smb> bjf, sforshee, I think patch #3 which was sent for bug 1410852 really has an issue. I just did a backport myself which does not remove a function. Would it be enough to re-submit just that one, or should I send all three again (which I just used for a test build)?
[15:52] <bjf> smb, do all 3 so it's all clear
[15:53] <smb> bjf, ok. ack
[16:15] <smb> bjf, sforshee, kamal, Ok, sent... of course with another (though rather minor) glitch. If you apply it, maybe you can drop that additional newline (an OCD fix)
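(For reference, a typical way to resend the full series as bjf asks, assuming the three backports sit on a local branch; the branch name is hypothetical, the list address is the Ubuntu kernel team's:)

    # regenerate all three patches as a v2 series with a cover letter
    git format-patch --cover-letter -v2 -3 -o outgoing/ lp1410852-fixes
    # mail the series to the kernel team list for review
    git send-email --to=kernel-team@lists.ubuntu.com outgoing/*.patch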
[16:16] <bjf> apw, so just how broken _is_ ipv6?
[16:16] <bjf> apw, completely or only for certain cases?
[16:16] <apw> bjf, as broken as it was last time, exploding at random several times a day
[16:16] <apw> bjf, remember stgraber's issue?
[16:16] <bjf> apw, no
[16:17] <apw> we had a bad backport in stable of an ipv6 sizing patch, which was triggering bangs on mapped ipv4 addresses
[16:17] <bjf> apw, i'm trying to decide if it warrants a respin to fix this one issue or wait for the SRU cycle which starts today
[16:17] <apw> kamal, what did we do last time, did we respin for it then ?
[16:18] <apw> bjf, it's pretty serious if you have ipv6 enabled and you are a server
[16:18] <stgraber> last time I had to wait for two kernel updates (about a month) to get a fix
[16:18] <stgraber> which meant reverting to a kernel with a known security issue on public facing machines...
[16:18] <bjf> apw, sounds like a respin
[16:18] <apw> oh we clearly don't care about you :)
[16:18] <apw> stgraber, ugg
[16:19] <kamal> iirc, we fixed and released it promptly, once the problem had been identified
[16:20] <bjf> kamal, you did such a fantastic job last week ... :-)
[16:20] <kamal> ... someone else doesn't want a turn?
[16:21] <bjf> kamal, i'd be happy to do it
[16:21] <apw> sforshee, anyhow i've written that up in the bug ... sigh ... i guess we need to apply it ... again
[16:21] <bjf> will have to respin hwe-trusty as well
[16:22] <kamal> bjf, I'm swamped (with the *other* IPv6 issue, among other things) ... if you want to take this one, that would be great.
[16:22] <bjf> kamal, not a problem. i'll take it
[16:22] <bjf> apw, i should just revert the second one and spin right?
[16:23] <stgraber> kamal: previous bug was reported on the 20th of December, fixed on the 6th of January and released to -updates on the 2nd of February, you and I have a different definition of "promptly" :)
[16:23] <apw> bjf, yeah that is the more correct version
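(bjf's plan, sketched: revert the bad ipv6 backport in the trusty tree rather than patch over it, then respin; the sha below is a placeholder, not the real commit:)

    # in the Ubuntu trusty kernel tree
    git revert --no-edit $BAD_BACKPORT_SHA    # placeholder sha
    # then rebuild the packages for the respin, e.g.:
    fakeroot debian/rules clean
    fakeroot debian/rules binary-headers binary-generic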
[16:23] <apw> stgraber, i don't think we were engaged with it till much later; once we started looking at it, it was quick (for us) from there
[16:23] <apw> i blame those christmas things
[16:24] <bjf> stgraber, we would have loved to get to it quicker but were forced to take shutdown days ... so sorry
[16:26] <stgraber> oh yeah, the whole stretch from the 20th of December till early January was totally expected due to the company shutdown :) It's just that to me a month doesn't really qualify as "promptly" :)
[16:26] <stgraber> anyway, enough complaining for today :)
[16:26] <bjf> sforshee, you want to do a quick respin of trusty and hwe-trusty? would be good practice :-)
[16:28] <sforshee> bjf: well *want* might be a strong word, but I'm willing ;-)
[16:32] <bjf> sforshee, it's yours to do. 
[16:34] <bjf> sforshee, this should not be an ABI bump
[16:37] <sforshee> bjf: ack
[19:19] <gchao_> Hi
[19:20] <gchao_> Is there someone who knows about kernel crashes? XD
[19:21] <apw> gchao_, if you have a kernel crash, file a bug, as that will get the requisite info from your machine
[19:22] <gchao_> hi apw! thanks for the response
[19:22] <gchao_> so filing a bug is standard procedure even if I'm just troubleshooting?
[19:24] <apw> gchao_, if you want help from outside, a bug is the easiest way to share things like the crash stack, as they are large
[19:24] <apw> if you have one you could pastebin it too, and someone might have ideas
[19:25] <apw> as people are in varying timezones it is hard to keep abreast of the whole story if it isn't all in one place, and a bug is a good place for that
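(Filing that bug is a single command on Ubuntu; apport collects the kernel version, logs, and hardware info automatically:)

    ubuntu-bug linux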
[19:26] <gchao_> The thing is, I don't even get a crash stack (the one under /var/crash/?). I was trying to use crash to debug it, but nothing is there. A real "crash" never occurs; it gets stuck in an endless panic loop
[19:26] <gchao_> and seems to ignore the magic button.
[19:27] <gchao_> magic key*
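(One thing worth checking in that situation is whether magic SysRq is enabled at all, since many kernels restrict it by default -- though if the machine is already stuck in a panic loop, the keyboard path may be dead regardless:)

    # show the current SysRq setting (0 = disabled, 1 = all functions,
    # other values are a bitmask of allowed functions)
    cat /proc/sys/kernel/sysrq
    # enable all SysRq functions for this boot
    echo 1 | sudo tee /proc/sys/kernel/sysrq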
[19:33] <apw> gchao_, that is tricky, can you get a photo of the errors perhaps when they start?
[19:34] <apw> some kind of hint might tell someone enough to help
[19:35] <pepee> I'd try using netconsole
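(pepee's suggestion, sketched: netconsole streams kernel messages over UDP to a second machine, which can capture a panic that never makes it to /var/crash. All addresses, ports, and interface names below are examples:)

    # on the receiving machine (example address 192.168.1.10):
    nc -u -l -p 6666    # option syntax varies between netcat variants
    # on the crashing machine: send kernel messages out via eth0 to the
    # receiver's IP and MAC (format: src-port@src-ip/dev,dst-port@dst-ip/dst-mac)
    sudo modprobe netconsole \
        netconsole=6665@192.168.1.5/eth0,6666@192.168.1.10/00:11:22:33:44:55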