UNIm95 | apw: Hi. I got kernel panic again. Where should i paste photos? | 08:59 |
---|---|---|
apw | UNIm95, file a bug, and add them to that bug: use 'ubuntu-bug linux' to make the bug | 09:01 |
UNIm95 | apw: what is ubuntu-bug linux? Console program? Website? | 09:01 |
apw | something to run in a terminal yes | 09:01 |
UNIm95 | apw: Damn. I got Launchpad error =) | 09:17 |
UNIm95 | apw: I have submitted this. What should i do next? Wait? | 09:20 |
apw | UNIm95, let us know the LP# here for one | 09:28 |
UNIm95 | apw: What do you mean with LP#? if you mean bug number under launchpad it is #1497184. Link here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1497184 | 09:31 |
ubot5 | Ubuntu bug 1497184 in linux (Ubuntu) "Kernel-panic with 3.13-64 generic kernel" [Undecided,Confirmed] | 09:31 |
apw | yep that | 09:31 |
apw | henrix, ^ looks like a regression in trusty-proposed kernel ... | 09:34 |
apw | UNIm95, does that trace seem consistent in your failures ? | 09:34 |
apw | UNIm95, particularly does that "BUG: ..../skbuff.c:1290" part appear in other ones | 09:35 |
UNIm95 | apw: Not sure. Now I'm still using *-64- kernel and wait for another panic. When it happens i post more photos. | 09:36 |
apw | UNIm95, thanks that is exactly what we need, you are not the only person "feeling" there is an issue in here | 09:36 |
apw | UNIm95, and thanks for 1) testing -proposed and 2) reporting the issues to us | 09:37 |
UNIm95 | apw: Please. It is not a problem for me. I have backups =) | 09:38 |
apw | heh, thanks | 09:38 |
=== apw is now known as apw_ | ||
=== apw_ is now known as cafetiere | ||
=== cafetiere is now known as apw | ||
* smb feels a disturbance in the coffee namespace | 09:47 | |
henrix | smb: ok, i *may* have found something related with this bug. if you're bored, here's what i found: | 09:55 |
henrix | smb: trusty -64 kernel includes commit 738ac1ebb96d ("net: Clone skb before setting peeked flag") | 09:55 |
apw | that sounds suspicious indeed | 09:55 |
henrix | smb: however, there's an upstream commit (not included in trusty yet): a0a2a6602496 ("net: Fix skb_set_peeked use-after-free bug") | 09:56 |
henrix | which fixes the other one | 09:56 |
apw | nnng | 09:56 |
smb | henrix, yeah. I am currently trying to cause it to crash with -64 in a VM. If that succeeds I can have a look with that added | 09:56 |
henrix | smb: cool, thanks | 09:57 |
smb | henrix, but use-after-free sounds a lot like what may happen | 09:57 |
henrix | smb: yeah, the only thing is that none of these functions actually show up in the kernel trace. but since the kernel is tainted, it could have happened before | 09:58 |
henrix | *don't show | 09:58 |
smb | henrix, absolutely. the pain with use-after-free | 09:58 |
henrix | smb: btw, if the issue you're hitting is really due to this commit, it seems to be related with broadcast/multicast ;-) | 10:03 |
henrix | (according to the commit msg) | 10:03 |
smb | henrix, hmmm... so the method to trigger it is actually the dhcp client running... likely... | 10:04 |
henrix | smb: build on-going in gloin:/tmp/kernel-henrix-RVtd9xTj | 11:17 |
henrix | UNIm95: i've commented on the bug with a link to a test kernel | 12:39 |
henrix | UNIm95: The comment points to the possible issue (and the fix) | 12:39 |
UNIm95 | henrix: ok. | 12:39 |
apw | UNIm95, how long did it take to reproduce in general ? | 12:40 |
henrix | apw: smb: btw, looks like this also impacts vivid | 12:40 |
UNIm95 | apw: At home the laptop worked for 4 hours, then slept for 8 hours and, finally, after 2 hours word panic. | 12:42 |
UNIm95 | work* | 12:42 |
UNIm95 | apw: one time I got this after ~30min uptime | 12:44 |
apw | henrix, gah | 12:45 |
UNIm95 | henrix: i will try this kernel later. I need to do my work | 12:45 |
apw | henrix, i've nom'd it to those two for now | 12:45 |
UNIm95 | henrix: i have downloaded this kernel | 12:46 |
apw | UNIm95, let us know how you get on | 12:46 |
UNIm95 | Sure | 12:46 |
henrix | UNIm95: ack, thanks! | 12:46 |
apw | but i am suspicious this is the issue and fix | 12:46 |
UNIm95 | henrix: the only question: "and introduces a use-after-free bug, fixed with upstream commit" | 12:49 |
UNIm95 | Does it mean that this bug appears with high memory load? | 12:49 |
henrix | UNIm95: from these commits description it should occur in the broadcast and multicast packages receive paths, so not really a memory stress issue | 12:50 |
henrix | s/packages/packets | 12:51 |
UNIm95 | henrix: network packets? | 12:51 |
henrix | UNIm95: yep | 12:52 |
UNIm95 | should i emulate DDOS? | 12:52 |
UNIm95 | on my own laptop? | 12:52 |
UNIm95 | Or is it possible to do something like this? | 12:53 |
apw | UNIm95, it is predicted that dhcpclient would tickle the bug | 12:53 |
apw | UNIm95, so setting your dhcp refresh time on your router to 5m might make it a lot more likely to occur | 12:53 |
apw | lots of packets not so likely to trigger it, as they are normally unicast | 12:53 |
UNIm95 | apw: and doesn't matter Wlan or ethernet? | 12:55 |
apw | i don't believe so from the description | 12:55 |
apw | henrix, ^ | 12:55 |
henrix | apw: i... don't think so either, it's in the core network code. anyway, looking at UNIm95 photos, it seems to have occurred while using ethernet (e1000e) | 12:57 |
apw | good point | 12:57 |
henrix | but again, i believe that's irrelevant -- if this is the issue, it's in the core code, so it doesn't matter | 12:57 |
UNIm95 | henrix: apw yesterday at home i used wlan and got only kernel oops that was caught with apport-gtk. | 12:58 |
UNIm95 | do you have access to ubuntu's apport infrastructure? | 12:59 |
apw | UNIm95, the symptoms of this would be pretty random, as it is a use after free which could literally break anything | 12:59 |
apw | UNIm95, we may be able to find it yes | 12:59 |
UNIm95 | look for toshiba tecra laptop a11 | 13:00 |
UNIm95 | It was at evening. 20:00<x<22:00 Berlin time | 13:01 |
smb | apw, henrix, so far not been able to hit it in a more artificial environment but I guess it's random by nature | 13:19 |
apw | tjaalton, hey are you fixing this initramfs-tools thing for the firmware ... i was about to, but i see you have it assigned | 13:43 |
tjaalton | apw: that's when it was a kernel bug, you can have it :) | 13:44 |
apw | tjaalton, ack | 13:44 |
tjaalton | thanks | 13:44 |
psivaa | cking_: hello, it's me again :) | 14:42 |
psivaa | Just was going to disable the memory threshold for NM health check but noticed that it's been passing for a few days. | 14:42 |
psivaa | *memory threshold tests | 14:42 |
psivaa | https://jenkins.qa.ubuntu.com/view/Trusty/view/Smoke%20Testing/job/trusty-desktop-i386-smoke-health-check/ | 14:42 |
psivaa | so i'm wondering if that's OK to let that run for a few more days to see if this has really settled | 14:43 |
cking_ | psivaa, ok, lets see how it runs over the next few days | 14:43 |
psivaa | cking_: ack, thanks | 14:43 |
=== hggdh is now known as hggdh_ | ||
=== hggdh_ is now known as hggdh__ | ||
=== hggdh__ is now known as hggdh | ||
t3hSteve | hey all, is anyone around that could help me with a cpuacct cgroup question/issue/bug? | 15:47 |
apw | t3hSteve, it is eod for me, but it is just best to ask the question and see who responds | 17:20 |
t3hSteve | ok, so basically I notice that on (at least up to) 3.13.0-64, the cpu usage reported in cpuacct.stat is significantly lower than the actual CPU usage as reported by say, top or /proc/<pid>/stat | 17:22 |
t3hSteve | this is on ubuntu 12.04, I seem to recall on 14.04 it was correct, but oddly enough I think it was on the same kernel version | 17:25 |
apw | t3hSteve, hrm, that sounds odd if the kernel version is the same | 17:40 |
t3hSteve | yeah, its very odd =/ | 17:40 |
t3hSteve | looking around bug trackers I cant see any reference to anything like this | 17:40 |
t3hSteve | but its a significant under-reporting from the cgroup | 17:41 |
apw | t3hSteve, sounds like some repeatable test is required, if you could file a bug against the kernel "ubuntu-bug linux", and could detail the two platforms that produce different numbers, and how to get the numbers | 17:41 |
t3hSteve | ex top reports 120% and the cgroup reports .9 | 17:41 |
apw | t3hSteve, then someone might be able to repro it | 17:41 |
t3hSteve | I can do that | 17:41 |
t3hSteve | although we'll see how easy it is to reproduce :P | 17:41 |
apw | t3hSteve, heh .. i know ... things like this normally go away when you try and make them reproducible | 17:42 |
t3hSteve | :P | 17:43 |
t3hSteve | the full context is we're running mesos in AWS | 17:43 |
apw | t3hSteve, drop the LP#number in here when you have done so, so we can find it | 17:43 |
t3hSteve | and the numbers I get out of mesos (which just reads the cgroup) are way different from what I see on the machine itself (top, /proc, etc) | 17:43 |
UNIm95 | henrix: apw damn. without kernel change i have 10 hours uptime without panic. | 18:34 |
t3hSteve | so re: my cgroup issue, it looks like the more threads a process has the more the cpuacct.stats numbers diverge | 19:59 |
t3hSteve | apw: https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1497447 | 20:36 |
ubot5 | Ubuntu bug 1497447 in linux-lts-raring (Ubuntu) "cgroup cpuacct.stats underreports cpu usage" [Undecided,New] | 20:36 |
t3hSteve | I'm not sure if I filed it against the right package? | 20:36 |
old_benz | Hi all, I think the "flo" branch needs to be patched to support newer Nexus 7 devices with an updated eMMC controller, I need to compile a new kernel and patch my boot.img and recovery.img in order to get Ubuntu Touch to install on my Nexus 7 | 20:43 |
old_benz | details are here: https://github.com/ddagunts/UTCWM_N7_patch | 20:43 |
apw | old_benz, if you could file a bug against linux-flo with that information and drop the bug # in here ... | 21:41 |
old_benz | apw: will do, need to make an account | 22:27 |
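
Note on the cpuacct discussion above: a minimal Python sketch of how the two numbers t3hSteve compares (the cgroup's `cpuacct.stat` versus `/proc/<pid>/stat`) could be parsed side by side for a repeatable test. The function names and the sample inputs are hypothetical and not taken from the log or the bug report. It assumes the cgroup v1 `cpuacct.stat` layout (`user N` / `system N`, in USER_HZ ticks) and the `utime`/`stime` fields (14 and 15) of `/proc/<pid>/stat`.

```python
def parse_cpuacct_stat(text):
    """Parse cpuacct.stat content ("user N" / "system N") into a dict of ticks."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def parse_proc_stat_ticks(text):
    """Return (utime, stime) in clock ticks from one /proc/<pid>/stat line."""
    # The comm field (field 2) is wrapped in parentheses and may contain
    # spaces, so split only the part after the last ')'.
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def cgroup_minus_procs(cgroup_stats, per_process_ticks):
    """Cgroup total ticks minus the summed per-process ticks.

    A large negative result would correspond to the under-reporting
    described in the log. per_process_ticks is a list of (utime, stime).
    """
    cgroup_total = cgroup_stats.get("user", 0) + cgroup_stats.get("system", 0)
    proc_total = sum(u + s for u, s in per_process_ticks)
    return cgroup_total - proc_total
```

To use it on a real machine, read the cgroup's `cpuacct.stat` file and the `/proc/<pid>/stat` file of each process in the group, taking both samples as close together in time as possible. The exact cgroup mount path depends on the system and is not given in the log.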