=== gerald is now known as Guest65404 | ||
smb | tseliot, Hey, we had a look at bug 1426492 this morning as it has an ugly bus error in the logs. I just tried but could only confirm the problem of false crash reports being produced, which actually seems to be a dkms issue | 11:07 |
---|---|---|
ubot5 | bug 1426492 in nvidia-graphics-drivers-340 (Ubuntu) "nvidia-340 340.76-0ubuntu1: nvidia-340 kernel module failed to build" [High,Confirmed] https://launchpad.net/bugs/1426492 | 11:07 |
smb | tseliot, Have you heard of any other "bus error" breakage, or might that way of failing be a one-off? | 11:08 |
=== swordsmanz is now known as hugbot | ||
tseliot | smb: no, I haven't heard of it but DKMS seems to fail at random, or rather report failing even when it doesn't (LP: #1268257) | 11:38 |
ubot5 | Launchpad bug 1268257 in nvidia-graphics-drivers-331-updates (Ubuntu) "nvidia-331-updates 331.38-0ubuntu3: nvidia-331-updates kernel module failed to build, with only error: "objdump: '... .tmp_nv.o': No such file"" [High,Triaged] https://launchpad.net/bugs/1268257 | 11:38 |
smb | tseliot, Right, that is the failure I see often enough, and it looks to be dkms' fault for using exec in the middle of /etc/kernel/postinst.d/dkms | 11:39 |
smb | Basically one attempt fails because the headers are not done yet. Then the next one succeeds and you get a crash report while everything is shiny | 11:40 |
tseliot | oh | 11:40 |
tseliot | smb: I think it's desirable to catch any failures though | 11:42 |
smb | tseliot, Adam had spotted this. Well but exec ends the current script, doesn't it? | 11:43 |
tseliot | yep | 11:43 |
smb | What we want is the error message about headers missing and probably not error the postinst for that special case | 11:43 |
tseliot | yes, I've just noticed | 11:43 |
* tseliot nods | 11:44 | |
smb | tseliot, We should use one of those nvidia dkms fail reports to actually fix the dkms problem (which I see even in Trusty). I just was not sure whether the one with the bus error was "suitable" | 11:44 |
tseliot | smb: I think we have more than enough duplicates of 1268257 to make a point ;) | 11:45 |
smb | tseliot, I bet. :) Any of those you would prefer? That other one you mentioned is for 331.113... that would be U or before. | 11:48 |
smb | Maybe I should open a new report against dkms and we can dup the nvidia ones against that as we see fit | 11:49 |
tseliot | smb: that would work too | 11:49 |
smb | tseliot, ok, so here we go ... bug 1427175 | 11:56 |
ubot5 | bug 1427175 in dkms (Ubuntu) "dkms postinst should handle missing headers" [Undecided,New] https://launchpad.net/bugs/1427175 | 11:56 |
tseliot | smb: great, thanks. I'll also talk to upstream about it | 11:56 |
smb | tseliot, Ok, sounds good | 11:57 |
=== hugbot is now known as swordsmanz | ||
smb | bjf, sforshee, I think patch#3 which was sent for bug 1410852 really has some issue. Just did a backport myself which would not remove a function. Would it be enough to re-submit just that one or should I send all three again (which I just used for a test build) | 15:51 |
ubot5 | bug 1410852 in linux (Ubuntu Trusty) "restarting container with a vlan interface results in kernel stack trace" [Medium,In progress] https://launchpad.net/bugs/1410852 | 15:51 |
bjf | smb, do all 3 so it's all clear | 15:52 |
smb | bjf, ok. ack | 15:53 |
smb | bjf, sforshee, kamal, Ok, sent... of course with another (though slightly minor) glitch. If you apply it, maybe you can drop the OCD drop of that additional newline | 16:15 |
bjf | apw, so just how broken _is_ ipv6? | 16:16 |
bjf | apw, completely or only for certain cases? | 16:16 |
apw | bjf, as broken as it was last time, exploding at random several times a day | 16:16 |
apw | bjf, remember stgraber's issue? | 16:16 |
bjf | apw, no | 16:16 |
apw | we had a bad backport in stable of an ipv6 sizing patch, which was triggering bangs on mapped ipv4 addresses | 16:17 |
bjf | apw, i'm trying to decide if it warrants a respin to fix this one issue or wait for the SRU cycle which starts today | 16:17 |
apw | kamal, what did we do last time, did we respin for it then? | 16:17 |
apw | bjf, it's pretty serious if you have ipv6 enabled and you are a server | 16:18 |
stgraber | last time I had to wait for two kernel updates (about a month) to get a fix | 16:18 |
stgraber | which meant reverting to a kernel with a known security issue on public facing machines... | 16:18 |
bjf | apw, sounds like a respin | 16:18 |
apw | oh we clearly don't care about you :) | 16:18 |
apw | stgraber, ugg | 16:18 |
kamal | iirc, we fixed and released it promptly, once the problem had been identified | 16:19 |
bjf | kamal, you did such a fantastic job last week ... :-) | 16:20 |
kamal | ... someone else doesn't want a turn? | 16:20 |
bjf | kamal, i'd be happy to do it | 16:21 |
apw | sforshee, anyhow i've written that up in the bug ... sigh ... i guess we need to apply it ... again | 16:21 |
bjf | will have to respin hwe-trusty as well | 16:21 |
kamal | bjf, I'm swamped (with the *other* IPv6 issue, among other things) ... if you want to take this one, that would be great. | 16:22 |
bjf | kamal, not a problem. i'll take it | 16:22 |
bjf | apw, i should just revert the second one and spin right? | 16:22 |
stgraber | kamal: previous bug was reported on the 20th of December, fixed on the 6th of January and released to -updates on the 2nd of February, you and I have a different definition of "promptly" :) | 16:23 |
apw | bjf, yeah that is the more correct version | 16:23 |
apw | stgraber, i don't think we were engaged with it till much later, when we started looking at it, it was quick (for us) from there | 16:23 |
apw | i blame those christmas things | 16:23 |
bjf | stgraber, we would have loved to have gotten to it quicker but were forced to take shutdown days ... so sorry | 16:24 |
stgraber | oh yeah, the whole 20th of December till early January was totally expected due to company shutdown :) It's just that to me a month doesn't really qualify as "promptly" :) | 16:26 |
stgraber | anyway, enough complaining for today :) | 16:26 |
bjf | sforshee, you want to do a quick respin of trusty and hwe-trusty? would be good practice :-) | 16:26 |
sforshee | bjf: well *want* might be a strong word, but I'm willing ;-) | 16:28 |
bjf | sforshee, it's yours to do. | 16:32 |
bjf | sforshee, this should not be an ABI bump | 16:34 |
sforshee | bjf: ack | 16:37 |
=== adam_g_out is now known as adam_g | ||
gchao_ | Hi | 19:19 |
gchao_ | Is there someone who knows about kernel crashes? XD | 19:20 |
apw | gchao_, if you have a kernel crash, file a bug as that will get the requisite info from your machine | 19:21 |
gchao_ | hi apw! thanks for the response | 19:22 |
gchao_ | so filing a bug is standard procedure even if I'm just troubleshooting? | 19:22 |
apw | gchao_, if you want help from outside, a bug is the easiest place to see things like the crash stack, as they are large | 19:24 |
apw | if you have one you could pastebin it too, and someone might have ideas | 19:24 |
apw | as people are on varying timezones it is hard to keep abreast of the whole story if it isn't all in one place and the bug is a good place for that | 19:25 |
gchao_ | The thing is - I don't even get a crash stack (the one under /var/crash/ ?) I was trying to use crash to debug it but nothing is there. A real "crash" never occurs, but it gets stuck in an endless panic loop | 19:26 |
gchao_ | and seems to ignore the magic button. | 19:26 |
gchao_ | magic key* | 19:27 |
apw | gchao_, that is tricky, can you get a photo of the errors perhaps when they start? | 19:33 |
apw | some kind of hint might tell someone enough to help | 19:34 |
pepee | I'd try using netconsole | 19:35 |
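
pepee's netconsole suggestion fits gchao_'s situation, where the machine loops in a panic and nothing reaches /var/crash/: netconsole has the kernel send its log messages as UDP datagrams to another host. A minimal sketch of the receiving side, assuming port 6666 (netconsole's usual default target port; adjust it to whatever the sending machine is configured with). The function names are hypothetical, and the sending side has to be set up separately via the netconsole module parameters described in the kernel documentation.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming log datagrams (port 6666 is an assumption)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, max_messages=None, timeout=None):
    """Print and collect datagrams until max_messages arrive or timeout expires."""
    sock.settimeout(timeout)
    messages = []
    while max_messages is None or len(messages) < max_messages:
        try:
            data, sender = sock.recvfrom(65535)
        except socket.timeout:
            break
        # Kernel messages are plain text; replace undecodable bytes rather than fail.
        line = data.decode("utf-8", errors="replace").rstrip("\n")
        print(f"{sender[0]}: {line}")
        messages.append(line)
    return messages


if __name__ == "__main__":
    # Runs until interrupted; redirect the output to a file to keep the log.
    with open_listener() as listener:
        receive_messages(listener)
```

Run it on a second machine on the same network before triggering the panic, so the first error messages are captured as they are sent.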
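
The point smb and tseliot agree on in the dkms discussion (11:39 to 11:43), that `exec` ends the current script, can be checked with a small experiment. This is a sketch only: the two shell snippets are hypothetical stand-ins, not the contents of /etc/kernel/postinst.d/dkms, and the example assumes a POSIX `sh` and a `false` binary on the PATH.

```python
import subprocess

# Hypothetical stand-ins for a hook script; NOT the real dkms postinst.
WITH_EXEC = """\
echo "before"
exec false
echo "after"
"""

WITHOUT_EXEC = """\
echo "before"
false
echo "after"
"""


def run_script(script):
    """Run a shell snippet with sh and return (exit status, stdout)."""
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    for name, script in (("with exec", WITH_EXEC), ("without exec", WITHOUT_EXEC)):
        status, output = run_script(script)
        print(f"{name}: exit status {status}, output {output!r}")
```

With `exec`, the shell process is replaced by the command, so the line after it never runs and the script's exit status is that of the command. Without `exec`, the script carries on and could inspect the failure itself, which is the kind of handling bug 1427175 asks for in the missing-headers case.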