/srv/irclogs.ubuntu.com/2016/10/26/#ubuntu-kernel.txt

him-cesjf	Hello, I am running Kubuntu 16.10 on a Dell laptop. For some reason, the OS hangs/responds very slowly in between while typing or switching tasks or while doing any work. During this hang/slow behaviour, I notice the fan revving in high speed, delay in typing, network disconnection and stuck digital clock seconds counter which restores to normal after the lag subsides but occurs every 10 seconds or so lasting for about 3-4 seconds. Is this	05:25
him-cesjf	a kernel issue? How can I determine what is causing this?	05:25
dznet	i have a problem with the intel 7265 wireless adapter on my laptop. how can i solve this?	11:33
dznet	hello!	11:33
dznet	i use linux mint 18. kernel 4.4.0-45	11:35
ogra_	by asking in a mint channel ?	11:42
him-cesjf	Hello, I am running Kubuntu 16.10 on a Dell laptop. For some reason, the OS hangs/responds very slowly in between while typing or switching tasks or while doing any work. During this hang/slow behaviour, I notice the fan revving in high speed, delay in typing, network disconnection and stuck digital clock seconds counter which restores to normal after the lag subsides but occurs every 10 seconds or so lasting for about 3-4 seconds. Is this	12:41
him-cesjf	a kernel issue? How can I determine what is causing this?	12:41
apw	could be a kernel issue, or a display driver issue, or ... some *stat commands are good for those kinds of things	12:57
cking	him-cesjf, i suggest using forkstat and fnotifystat to see what's up	12:59
him-cesjf	apw: Hi, could you please guide me on how to narrow down the exact cause?	12:59
him-cesjf	Sysinfo for 'TuxStick': Running inside KDE Plasma 5.7.5 on Ubuntu 16.10 (Yakkety Yak) powered by Linux 4.8.0-26-generic, CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz at 2099/2700 MHz, RAM: 7443/7902 MB, Storage: 26/57 GB, 283 procs, 65.76h up	12:59
him-cesjf	cking: Okay, I'll try	12:59
cking	him-cesjf, and maybe cpustat too	13:00
apw	cking, thanks	13:01
him-cesjf	Installing forkstat	13:01
him-cesjf	cking: cpustat log - http://paste.ubuntu.com/23383540/	13:21
him-cesjf	cking: fnotifystat log - http://paste.ubuntu.com/23383534/	13:21
him-cesjf	cking: forstat log - https://paste.ubuntu.com/23383516/	13:21
him-cesjf	forkstat*	13:21
cking	plasmashell is very busy	13:22
cking	you can get more stats on what it is doing using health-check, e.g. sudo health-check -p plasmashell	13:22
him-cesjf	nothing from other two log files?	13:23
him-cesjf	Sure	13:23
cking	him-cesjf, lets drill down on the top offender first	13:23
cking	http://kernel.ubuntu.com/~cking/health-check - sudo apt-get install health-check	13:23
him-cesjf	Sure	13:23
cking	cking, but I'm not going to be around this afternoon, so use that tool and paste the details in this channel and I'll pick it up from there hopefully tomorrow	13:24
cking	him-cesjf, ^	13:24
him-cesjf	Oh, um okay. here is health-check file - http://paste.ubuntu.com/23383576/	13:27
him-cesjf	cking: ^	13:27
cking	him-cesjf, seems to be spinning on a poll(), I'd file a bug against that application and put that information above in the bug report	13:29
him-cesjf	cking: Okay, filing bug report. Thanks for pointing it out!	13:30
cking	28967 plasmashell poll 4589.4564 112804 0 0.0 sec 0.0 sec 0.0 sec	13:30
him-cesjf	Yes, noticed	13:30
cking	this shows that it's spinning on a zero timeout poll and that's bad :-(	13:30
him-cesjf	Anything else possible other than filing bug report to solve it?	13:30
apw	ouch :)	13:31
him-cesjf	cking: Any other possible cause?	13:35
cking	him-cesjf, nothing else looks like the culprit to me	13:43
him-cesjf	Thanks	13:47
zyga	hey, I've reported https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636847	14:11
ubot5`	Ubuntu bug 1636847 in linux (Ubuntu) "unexpectedly large memory usage of mounted snaps" [Undecided,Confirmed]	14:11
zyga	it would be great if someone could look at that and see if the bug is in squashfs itself or in the particular config we use	14:11
apw	zyga, there are some pretty suspicious sources of memory "use" calculations in that	14:14
apw	zyga, anyhow thanks for the heads up will find someone to look at it	14:15
zyga	apw: can you be more specific?	14:15
zyga	apw: memory does quickly run out though, in simple real-world tests oom killer jumps in just a few mounts in	14:15
zyga	apw: (after mounting empty snaps)	14:15
zyga	apw: and my numbers agree with slabtop	14:16
apw	total slabs allocated doesn't tell us how much of them is in use	14:16
apw	just how much memory is in them currently	14:16
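
[Editor's note] apw's distinction between memory held by a slab cache and memory actually in use can be sketched with the per-cache fields of /proc/slabinfo (format 2.1). The sample line, its numbers, and the 4096-byte page size below are made up for illustration and are not taken from zyga's traces; slab_usage is a hypothetical helper, not part of any tool mentioned in this log.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on typical x86 systems


def slab_usage(line, page_size=PAGE_SIZE):
    """Return (name, allocated_bytes, in_use_bytes) for one /proc/slabinfo line."""
    fields = line.split()
    name = fields[0]
    # Columns: active_objs num_objs objsize objperslab pagesperslab
    active_objs, _num_objs, objsize, _objperslab, pagesperslab = map(int, fields[1:6])
    # After "slabdata": active_slabs num_slabs sharedavail
    slabdata = line.split("slabdata")[1].split()
    num_slabs = int(slabdata[1])
    allocated = num_slabs * pagesperslab * page_size  # pages held by the cache
    in_use = active_objs * objsize                    # bytes in live objects
    return name, allocated, in_use


# Illustrative line only: 500 slabs of 8 objects each, a quarter of them live.
SAMPLE = "kmalloc-4096 1000 4000 4096 8 8 : tunables 0 0 0 : slabdata 500 500 0"

name, allocated, in_use = slab_usage(SAMPLE)
print(f"{name}: {allocated} bytes allocated, {in_use} bytes in use")
```

In this made-up case the cache holds four times more memory than its live objects need, which is the gap a slab total alone cannot show.
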
zyga	apw: mounting 5 empty snaps crashes the box on 1GB VM	14:16
zyga	apw: in any case, I'd love feedback on how to improve this	14:17
zyga	apw: the relevant code is https://github.com/zyga/mounted-fs-memory-checker/blob/master/analyze.py#L19	14:17
apw	yep	14:17
him-cesjf	cking: apw: Filed bug report - https://bugs.launchpad.net/ubuntu/+source/plasma-workspace/+bug/1636869 https://bugs.kde.org/show_bug.cgi?id=371712	14:17
ubot5`	Ubuntu bug 1636869 in plasma-workspace (Ubuntu) "Plasmashell polling on zero timeout" [Undecided,New]	14:17
ubot5`	KDE bug 371712 in DataEngines "Plasmashell polling on zero timeout" [Major,Unconfirmed]	14:17
apw	do you have a nice empty snap ?	14:18
zyga	apw: and https://github.com/zyga/mounted-fs-memory-checker/blob/master/analyze.py#L48	14:18
zyga	apw: everything is in that repo	14:18
zyga	apw: just look around	14:18
zyga	apw: I also collected raw traces from various kernels and distros	14:18
zyga	apw: fork that repo and run the overview script, it just churns the data	14:18
zyga	apw: fake payload is in https://github.com/zyga/mounted-fs-memory-checker/tree/master/payload	14:18
apw	zyga, won't be me, will get someone to look at it though	14:19
zyga	apw: if you want to see the numbers this is what I get from the script now: http://paste.ubuntu.com/23383523/	14:19
zyga	apw: thank you, noted	14:19
zyga	apw: and do let them tell me if I counted this incorrectly	14:19
apw	+ ./analyze.py ubuntu 16.04 4.4.0-45-generic 4 size-1m.squashfs.xz.heavy	14:20
apw	zyga, what does the 4 mean	14:20
zyga	apw: for core system	14:20
zyga	apw: four*	14:21
apw	zyga, and you are just mounting it, right ? or are you looking at the contents	14:22
zyga	apw: just mounting	14:22
zyga	apw: the contents is a 0 sized file	14:22
zyga	apw: or in this case, a 1mb file	14:22
zyga	apw: (of urandom data)	14:22
zyga	apw: you can run those traces with the same file in vfat and ext4 for comparison	14:22
zyga	apw: there memory usage doesn't change at all (nearly)	14:23
apw	squashfs, that well tested filesystem :<	14:23
zyga	apw: we're the only distro that uses this set of kernel config options	14:23
zyga	apw: my traces include kernel config from each system	14:24
apw	snaps are literally about the only thing which uses squashfs	14:24
zyga	apw: it is worth looking into two things IMHO: 1) why are single cpu systems using so much more memory as compared to a four-cpu system	14:24
zyga	apw: and is the multi-threaded, per-cpu decompressor worth it (other distros use one single threaded decompressor)	14:25
manjo	rtg, apw any chance you will have 4.9 rc rebased ontop of unstable in the coming weeks ?	14:26
apw	zyga, yeah, well i can intuit why that might trigger that behavior, i assume we are hitting pathological memory allocator behaviour by our memory patterns	14:26
rtg	manjo, some chance	14:26
apw	zyga, and the majority of those pages have just one teensy bit of useful allocation in them	14:26
zyga	apw: all of the memory is in kmalloc-4096 slab	14:26
apw	zyga, can you point me at the config delta, as logically i should give you a test kernel with that changed to confirm it is tha	14:26
apw	tthat	14:26
manjo	rtg, before Nov 11 ?	14:27
zyga	apw: well, not on one line but just grep for squashfs in https://github.com/zyga/mounted-fs-memory-checker/tree/master/traces/ubuntu/16.10/4.8.0-26-generic/ncpus-1 and https://github.com/zyga/mounted-fs-memory-checker/blob/master/traces/fedora/24/4.7.9-200.fc24.x86_64/ncpus-1/kernel.config	14:28
zyga	apw: I can do that if you want to but I'd rather have someone investigate deeer and just run those tests again	14:28
zyga	deeper*	14:28
apw	zyga, so there is no deadline to make any improvement here ...	14:29
zyga	apw: well, as soon as it starts to explode on us :/	14:29
zyga	apw: I think we don't want 130MB per snap	14:29
zyga	on small systems	14:29
apw	right so making it go away is more important than why it is	14:29
zyga	yes	14:29
apw	broken, so if we switch the config you test that	14:29
apw	and if it works we ship that, and find out _why_ later	14:29
zyga	worth a try	14:30
apw	CONFIG_SQUASHFS_DECOMP_SINGLE	14:31
apw	i assume it is those ones you want flipped here	14:31
apw	zyga, also what release are you testing in, so i make a test kernel in the right version	14:32
zyga	apw: this is all focused on snappy so currently that's a xenial kernel	14:32
zyga	apw: correct	14:32
zyga	apw: that seems to be the most plausible one	14:32
apw	i'll flip that _SINGLE and get you some kernels, can you test amd64 i assume so	14:33
apw	zyga, ^	14:33
zyga	I sure can, thanks	14:33
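
[Editor's note] The config delta discussed above comes down to which CONFIG_SQUASHFS_DECOMP_* option is enabled. A minimal sketch for reading it out of a kernel config file follows. The option names are the upstream squashfs Kconfig choices; the sample config text is invented for illustration and is not copied from zyga's traces, and squashfs_decompressor_mode is a hypothetical helper.

```python
import re


def squashfs_decompressor_mode(config_text):
    """Return the enabled CONFIG_SQUASHFS_DECOMP_* option names, sorted."""
    pattern = re.compile(r"^(CONFIG_SQUASHFS_DECOMP_\w+)=y$", re.MULTILINE)
    return sorted(pattern.findall(config_text))


# Invented sample; a real config would be read from e.g. /boot/config-<version>.
SAMPLE_CONFIG = """\
CONFIG_SQUASHFS=m
# CONFIG_SQUASHFS_DECOMP_SINGLE is not set
# CONFIG_SQUASHFS_DECOMP_MULTI is not set
CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU=y
CONFIG_SQUASHFS_XZ=y
"""

enabled = squashfs_decompressor_mode(SAMPLE_CONFIG)
print(enabled)
```

Lines of the form "# ... is not set" are skipped because the pattern only matches options set to y at the start of a line.
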
rtg	manjo, 4.9 won't be released before Nov 11	14:39
manjo	rtg, yes I know it is still rc 2	14:40
apw	zyga, ok ... people.canonical.com/~apw/lp1636847-xenial/ has test kernels with that flipped over ...	15:09
apw	zyga, ping me when you know if that is better	15:10
zyga	apw: thanks, downloading now	15:10
zyga	apw: just ran the numbers again, looking much better	16:13
zyga	apw: the 131 MB/snap is down to 4	16:13
zyga	apw: data pushed back to the repo	16:15
him-cesjf	apw: Curious about polling in terms of software. I know about polling in microprocessor/hardware where it polls to check the status of a device, like in printer. What does polling mean in software, like how we noticed in plasmashell few minutes ago?	16:17
apw	him-cesjf, the poll call is a way to avoid active polling, when done correctly	16:19
apw	zyga, ok, so i think we switch that up in the next sru kernel, and i'll have one of my engineers look at why it is broken in that other seemingly superior mode	16:19
apw	zyga, could you report that in the bug as well for me, helps with sru'ing it	16:20
him-cesjf	apw: I didn't follow why polling was done for plasmashell and what active polling means for it	16:26
him-cesjf	Sorry if I am bothering with basic questions	16:26
zyga	apw: with pleasure, thank you for the kernel	16:26
apw	well poll is normally used for waiting for events from the mouse and the like, well input in general, and responses from the display server. this should be a waiting poll, but if you do it wrong it will return immediately, for instance to tell you that you did it wrong, and then you can get into a cpu consuming loop	16:27
him-cesjf	Yes, but polling is usually for a device/hardware from what I kow taking classes in microprocessor in electronics, why plasmashell requires polling is what I am trying to understand	16:29
him-cesjf	know from*	16:29
apw	it is a concept not related to hardware specifically	16:31
apw	though in general unpredictable events are from hardware/users etc	16:31
apw	in this case though the name is a misnomer, it is used to avoid polling on files	16:31
apw	it specifically is used to "wait for anything to happen to any one of this set of file descriptors"	16:32
apw	and those usually are connected to your devices and display server in this kind of context	16:32
apw	it should sit there quietly and do nothing until something happens, but it showing up in the stats	16:33
apw	implies it is not... and something is wrong in that application	16:33
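
[Editor's note] The difference between a waiting poll and a zero-timeout poll can be sketched in a few lines. This is an illustration only, not plasmashell's code: it polls a pipe that never receives data, once with a timeout of 0 and once with a 50 ms timeout, and counts how many calls fit into the same time window. count_poll_calls and the chosen durations are hypothetical; select.poll is available on Linux and most Unix systems but not on Windows.

```python
import os
import select
import time


def count_poll_calls(timeout_ms, duration_s=0.2):
    """Call poll() in a loop for duration_s seconds and count the calls."""
    read_fd, write_fd = os.pipe()  # nothing is ever written, so no event arrives
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    calls = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            poller.poll(timeout_ms)  # returns an empty list once the timeout expires
            calls += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return calls


spinning = count_poll_calls(timeout_ms=0)   # returns immediately on every call
waiting = count_poll_calls(timeout_ms=50)   # sleeps in the kernel for up to 50 ms
print(f"zero-timeout poll calls: {spinning}")
print(f"50 ms timeout poll calls: {waiting}")
```

The zero-timeout loop makes orders of magnitude more calls in the same interval, which is the kind of call rate that stands out in per-process syscall statistics.
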
zyga	apw: done	16:34
him-cesjf	apw: Still around?	17:30
apw	vaguely	17:30
him-cesjf	https://bugs.kde.org/show_bug.cgi?id=371712 someone replied and thinks the lag due to plasmashell is not a problem	17:32
ubot5`	KDE bug 371712 in DataEngines "Plasmashell polling on zero timeout" [Major,Needsinfo: waitingforinfo]	17:32
him-cesjf	Maybe I didn't explain the problem well?	17:33
him-cesjf	apw^	17:38
apw	it sounds like they have suggested a reasonable course of action	17:40
him-cesjf	Umokay	17:40
him-cesjf	apw: Could you give me a one-line explanation of this line so that I can explain the same in my reply? I am not good at interpreting it.	17:47
him-cesjf	28967 plasmashell poll 4589.4564 112804 0 0.0 sec 0.0 sec 0.0 sec	17:47
apw	just shove it in verbatim, they will know what it means	17:47
apw	if not they should not have asked for it	17:47
him-cesjf	Sure	17:51
Free99	hello everyone, is there a bug where an application loading the CPU causes a full system freeze currently out for kernel 4.4.0-45? I can't seem to find one	21:18
