/srv/irclogs.ubuntu.com/2015/09/18/#ubuntu-kernel.txt

UNIm95	apw: Hi. I got kernel panic again. Where should i paste photos?	08:59
apw	UNIm95, file a bug, and add them to that bug: use 'ubuntu-bug linux' to make the bug	09:01
UNIm95	apw: what is ubuntu-bug linux? Console program? Website?	09:01
apw	something to run in a terminal yes	09:01
UNIm95	apw: Damn. I got Launchpad error =)	09:17
UNIm95	apw: I have submitted this. What should i do next? Wait?	09:20
apw	UNIm95, let us know the LP# here for one	09:28
UNIm95	apw: What do you mean with LP#? if you mean bug number under launchpad it is #1497184. Link here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1497184	09:31
ubot5	Ubuntu bug 1497184 in linux (Ubuntu) "Kernel-panic with 3.13-64 generic kernel" [Undecided,Confirmed]	09:31
apw	yep that	09:31
apw	henrix, ^ looks like a regression in trusty-proposed kernel ...	09:34
apw	UNIm95, does that trace seem consistent in your failures ?	09:34
apw	UNIm95, particularly, does that "BUG: ..../skbuff.c:1290" part appear in the other ones	09:35
UNIm95	apw: Not sure. Now I'm still using *-64- kernel and wait for another panic. When it happens i post more photos.	09:36
apw	UNIm95, thanks that is exactly what we need, you are not the only person "feeling" there is an issue in here	09:36
apw	UNIm95, and thanks for 1) testing -proposed and 2) reporting the issues to us	09:37
UNIm95	apw: Please. It is not a problem for me. I have backups =)	09:38
apw	heh, thanks	09:38
=== apw is now known as apw_
=== apw_ is now known as cafetiere
=== cafetiere is now known as apw
* smb feels a disturbance in the coffee namespace		09:47
henrix	smb: ok, i may have found something related with this bug. if you're bored, here's what i found:	09:55
henrix	smb: trusty -64 kernel includes commit 738ac1ebb96d ("net: Clone skb before setting peeked flag")	09:55
apw	that sounds suspicious indeed	09:55
henrix	smb: however, there's an upstream commit (not included in trusty yet): a0a2a6602496 ("net: Fix skb_set_peeked use-after-free bug")	09:56
henrix	which fixes the other one	09:56
apw	nnng	09:56
smb	henrix, yeah. I am currently trying to cause it to crash with -64 in a VM. If that succeeds I can have a look with that added	09:56
henrix	smb: cool, thanks	09:57
smb	henrix, but use-after-free sounds a lot like what may happen	09:57
henrix	smb: yeah, the only thing is that none of these functions actually show up in the kernel trace. but since the kernel is tainted, it could have happened before	09:58
henrix	*don't show	09:58
smb	henrix, absolutely. the pain with use-after-free	09:58
henrix	smb: btw, if the issue you're hitting is really due to this commit, it seems to be related with broadcast/multicast ;-)	10:03
henrix	(according to the commit msg)	10:03
smb	henrix, hmmm... so the method to trigger it is actually the dhcp client running... likely...	10:04
henrix	smb: build on-going in gloin:/tmp/kernel-henrix-RVtd9xTj	11:17
henrix	UNIm95: i've commented on the bug with a link to a test kernel	12:39
henrix	UNIm95: The comment points to the possible issue (and the fix)	12:39
UNIm95	henrix: ok.	12:39
apw	UNIm95, how long did it take to reproduce in general ?	12:40
henrix	apw: smb: btw, looks like this also impacts vivid	12:40
UNIm95	apw: At home laptop worked for 4 hours, than sleep for 8 hours and, finally, after 2 hours word panic.	12:42
UNIm95	work*	12:42
UNIm95	apw: one time I got this after ~30min uptime	12:44
apw	henrix, gah	12:45
UNIm95	henrix: i will try this kernel later. I need to do my work	12:45
apw	henrix, i've nom'd it to those two for now	12:45
UNIm95	henrix: i have downloaded this kernel	12:46
apw	UNIm95, let us know how you get on	12:46
UNIm95	Sure	12:46
henrix	UNIm95: ack, thanks!	12:46
apw	but i am suspicious this is the issue and fix	12:46
UNIm95	henrix: the only question: "and introduces a use-after-free bug, fixed with upstream commit"	12:49
UNIm95	It means than this bug appers with high memory load?	12:49
henrix	UNIm95: from these commits description it should occur in the broadcast and multicast packages receive paths, so not really a memory stress issue	12:50
henrix	s/packages/packets	12:51
UNIm95	henrix: network packets?	12:51
henrix	UNIm95: yep	12:52
UNIm95	should i emulate DDOS?	12:52
UNIm95	on my own laptop?	12:52
UNIm95	Or is possible to make some thing like this.	12:53
apw	UNIm95, it is predicted taht dhcpclient would tickle the bug	12:53
apw	UNIm95, so setting your dhcp refresh time on your router to 5m might make it a lot more likley to occur	12:53
apw	lots of packets not so likley to trigger it, as tehy are normally unicast	12:53
UNIm95	apw: and doesn't matter Wlan or ethernet?	12:55
apw	i don't believe so from the description	12:55
apw	henrix, ^	12:55
henrix	apw: i... don't think so either, it's in the core network code. anyway, looking at UNIm95 photos, it seems to have occurred while using ethernet (e1000e)	12:57
apw	good point	12:57
henrix	but again, i believe that's irrelevant -- if this is the issue, it's in the core code, so it doesn't matter	12:57
UNIm95	henrix: apw yesterday at home i used wlan and got only a kernel oops that was caught with apport-gtk.	12:58
UNIm95	do you have access to ubuntu's apport infrastructure?	12:59
apw	UNIm95, the symptoms of this would be pretty random, as it is a use after free which could literally break anything	12:59
apw	UNIm95, we may be able to find it yes	12:59
UNIm95	look for toshiba tecra laptop a11	13:00
UNIm95	It was at evening. 20:00<x<22:00 Berlin time	13:01
smb	apw, henrix, so far not been able to hit it in a more artificial environment but I guess its random by nature	13:19
apw	tjaalton, hey are you fixing this initramfs-tools thing for the firmware ... i was about to, but i see you have it assigned	13:43
tjaalton	apw: that's when it was a kernel bug, you can have it :)	13:44
apw	tjaalton, ack	13:44
tjaalton	thanks	13:44
psivaa	cking_: hello, it's me again :)	14:42
psivaa	Just was going to disable the memory threshold for NM health check but noticed that it's been passing for a few days.	14:42
psivaa	*memory threshold tests	14:42
psivaa	https://jenkins.qa.ubuntu.com/view/Trusty/view/Smoke%20Testing/job/trusty-desktop-i386-smoke-health-check/	14:42
psivaa	so i'm wondering if that's OK to let that run for a few more days to see if this has really settled	14:43
cking_	psivaa, ok, lets see how it runs over the next few days	14:43
psivaa	cking_: ack, thanks	14:43
=== hggdh is now known as hggdh_
=== hggdh_ is now known as hggdh__
=== hggdh__ is now known as hggdh
t3hSteve	hey all, is anyone around that could help me with a cpuacct cgroup question/issue/bug?	15:47
apw	t3hSteve, it is eod for me, but it is just best to ask the question and see who responds	17:20
t3hSteve	ok, so basically I notice that on (at least up to) 3.13.0-64, the cpu usage reported in cpuacct.stat is significantly lower than the actual CPU usage as reported by say, top or /proc/<pid>/stat	17:22
t3hSteve	this is on ubuntu 12.04, I seem to recall on 14.04 it was correct, but oddly enough I think it was on the same kernel version	17:25
apw	t3hSteve, hrm, that sounds odd if the kernel version is the same	17:40
t3hSteve	yeah, its very odd =/	17:40
t3hSteve	looking around bug trackers I cant see any reference to anything like this	17:40
t3hSteve	but its a significant under-reporting from the cgroup	17:41
apw	t3hSteve, sounds like some repeatable test is required, if you could file a bug against the kernel "ubuntu-bug linux", and could detail the two platforms that produce different numbers, and how to get the numbers	17:41
t3hSteve	ex top reports 120% and the cgroup reports .9	17:41
apw	t3hSteve, then someone might be able to repro it	17:41
t3hSteve	I can do that	17:41
t3hSteve	although we'll see how easy it is to reproduce :P	17:41
apw	t3hSteve, heh .. i know ... things like this normally go away when you try and make them reproducible	17:42
t3hSteve	:P	17:43
t3hSteve	the full context is we're running mesos in AWS	17:43
apw	t3hSteve, drop the LP#number in here when you have done so, so we can find it	17:43
t3hSteve	and the numbers I get out of mesos (which just reads the cgroup) are way different from what I see on the machine itself (top, /proc, etc)	17:43
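
The comparison t3hSteve describes (cgroup cpuacct.stat versus /proc/<pid>/stat) could be made repeatable with something like the sketch below. It is only an illustration, not the method used in the channel: the function names are hypothetical, the sample inputs are invented, and it assumes both files report CPU time in clock ticks (USER_HZ), with cpuacct.stat holding "user" and "system" lines and /proc/<pid>/stat holding utime and stime as fields 14 and 15.

```python
import os


def parse_cpuacct_stat(text):
    """Parse cpuacct.stat contents into a dict such as {'user': 120, 'system': 30}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def parse_proc_stat(text):
    """Return (utime, stime) in clock ticks from /proc/<pid>/stat contents."""
    # The comm field is parenthesised and may contain spaces or ')',
    # so split on the last ')' before tokenising the rest.
    rest = text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 (utime) is rest[11]
    # and field 15 (stime) is rest[12].
    return int(rest[11]), int(rest[12])


def ticks_to_seconds(ticks, hz=None):
    """Convert clock ticks to seconds; hz defaults to the system's SC_CLK_TCK."""
    if hz is None:
        hz = os.sysconf("SC_CLK_TCK")
    return ticks / hz


if __name__ == "__main__":
    # Invented sample data, standing in for the real files.
    cgroup_text = "user 120\nsystem 30\n"
    proc_text = "1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 250 50 0 0 20 0 4 0 999"

    cg = parse_cpuacct_stat(cgroup_text)
    utime, stime = parse_proc_stat(proc_text)
    print("cgroup ticks:", cg["user"] + cg["system"])
    print("/proc ticks: ", utime + stime)
```

On a real machine the two inputs would be read from the cgroup's cpuacct.stat file and from /proc/<pid>/stat for each process in that cgroup, sampled at two points in time so that the deltas can be compared.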
UNIm95	henrix: apw damn. without kernel change i have 10 hours uptime without panic.	18:34
t3hSteve	so re: my cgroup issue, it looks like the more threads a process has the more the cpuacct.stats numbers diverge	19:59
t3hSteve	apw: https://bugs.launchpad.net/ubuntu/+source/linux-lts-raring/+bug/1497447	20:36
ubot5	Ubuntu bug 1497447 in linux-lts-raring (Ubuntu) "cgroup cpuacct.stats underreports cpu usage" [Undecided,New]	20:36
t3hSteve	I'm not sure if I filed it against the right package?	20:36
old_benz	Hi all, I think the "flo" branch needs to be patched to support newer Nexus 7 devices with an updated eMMC controller, I need to compile a new kernel and patch my boot.img and recovery.img in order to get Ubuntu Touch to install on my Nexus 7	20:43
old_benz	details are here: https://github.com/ddagunts/UTCWM_N7_patch	20:43
apw	old_benz, if you could file a bug against linux-flo with that information and drop the bug # in here ...	21:41
old_benz	apw: will do, need to make an account	22:27
