him-cesjf | Hello, I am running Kubuntu 16.10 on a Dell laptop. For some reason, the OS hangs/responds very slowly in between while typing or switching tasks or while doing any work. During this hang/slow behaviour, I notice the fan revving in high speed, delay in typing, network disconnection and stuck digital clock seconds counter which restores to normal after the lag subsides but occurs every 10 seconds or so lasting for about 3-4 seconds. Is this | 05:25 |
---|---|---|
him-cesjf | a kernel issue? How can I determine what is causing this? | 05:25 |
dznet | i have a problem with the intel 7265 wireless adapter on my laptop. how can i solve this? | 11:33 |
dznet | hello! | 11:33 |
dznet | i use linux mint 18. kernel 4.4.0-45 | 11:35 |
ogra_ | by asking in a mint channel ? | 11:42 |
him-cesjf | Hello, I am running Kubuntu 16.10 on a Dell laptop. For some reason, the OS hangs/responds very slowly in between while typing or switching tasks or while doing any work. During this hang/slow behaviour, I notice the fan revving in high speed, delay in typing, network disconnection and stuck digital clock seconds counter which restores to normal after the lag subsides but occurs every 10 seconds or so lasting for about 3-4 seconds. Is this | 12:41 |
him-cesjf | a kernel issue? How can I determine what is causing this? | 12:41 |
apw | could be a kernel issue, or a display driver issue, or ... some *stat commands are good for those kinds of things | 12:57 |
cking | him-cesjf, i suggest using forkstat and fnotifystat to see what's up | 12:59 |
him-cesjf | apw: Hi, could you please guide me on how to narrow down the exact cause? | 12:59 |
him-cesjf | Sysinfo for 'TuxStick': Running inside KDE Plasma 5.7.5 on Ubuntu 16.10 (Yakkety Yak) powered by Linux 4.8.0-26-generic, CPU: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz at 2099/2700 MHz, RAM: 7443/7902 MB, Storage: 26/57 GB, 283 procs, 65.76h up | 12:59 |
him-cesjf | cking: Okay, I'll try | 12:59 |
cking | him-cesjf, and maybe cpustat too | 13:00 |
apw | cking, thanks | 13:01 |
him-cesjf | Installing forkstat | 13:01 |
him-cesjf | cking: cpustat log - http://paste.ubuntu.com/23383540/ | 13:21 |
him-cesjf | cking: fnotifystat log - http://paste.ubuntu.com/23383534/ | 13:21 |
him-cesjf | cking: forstat log - https://paste.ubuntu.com/23383516/ | 13:21 |
him-cesjf | forkstat* | 13:21 |
cking | plasmashell is very busy | 13:22 |
cking | you can get more stats on what it is doing using health-check, e.g. sudo health-check -p plasmashell | 13:22 |
him-cesjf | nothing from other two log files? | 13:23 |
him-cesjf | Sure | 13:23 |
cking | him-cesjf, let's drill down on the top offender first | 13:23 |
cking | http://kernel.ubuntu.com/~cking/health-check - sudo apt-get install health-check | 13:23 |
him-cesjf | Sure | 13:23 |
cking | cking, but I'm not going to be around this afternoon, so use that tool and paste the details in this channel and I'll pick it up from there hopefully tomorrow | 13:24 |
cking | him-cesjf, ^ | 13:24 |
him-cesjf | Oh, um okay. here is health-check file - http://paste.ubuntu.com/23383576/ | 13:27 |
him-cesjf | cking: ^ | 13:27 |
cking | him-cesjf, seems to be spinning on a poll(), I'd file a bug against that application and put that information above in the bug report | 13:29 |
him-cesjf | cking: Okay, filing bug report. Thanks for pointing it out! | 13:30 |
cking | 28967 plasmashell poll 4589.4564 112804 0 0.0 sec 0.0 sec 0.0 sec | 13:30 |
him-cesjf | Yes, noticed | 13:30 |
cking | this shows that it's spinning on a zero timeout poll and that's bad :-( | 13:30 |
him-cesjf | Anything else possible other than filing bug report to solve it? | 13:30 |
apw | ouch :) | 13:31 |
him-cesjf | cking: Any other possible cause? | 13:35 |
cking | him-cesjf, nothing else looks like the culprit to me | 13:43 |
him-cesjf | Thanks | 13:47 |
zyga | hey, I've reported https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636847 | 14:11 |
ubot5` | Ubuntu bug 1636847 in linux (Ubuntu) "unexpectedly large memory usage of mounted snaps" [Undecided,Confirmed] | 14:11 |
zyga | it would be great if someone could look at that and see if the bug is in squashfs itself or in the particular config we use | 14:11 |
apw | zyga, there are some pretty suspicious sources of memory "use" calculations in that | 14:14 |
apw | zyga, anyhow thanks for the heads up will find someone to look at it | 14:15 |
zyga | apw: can you be more specific? | 14:15 |
zyga | apw: memory does quickly run out though, in simple real-world tests oom killer jumps in just a few mounts in | 14:15 |
zyga | apw: (after mounting empty snaps) | 14:15 |
zyga | apw: and my numbers agree with slabtop | 14:16 |
apw | total slabs allocated doesn't tell us how much of them is in use | 14:16 |
apw | just how much memory is in them currently | 14:16 |
zyga | apw: mounting 5 empty snaps crashes the box on 1GB VM | 14:16 |
zyga | apw: in any case, I'd love feedback on how to improve this | 14:17 |
zyga | apw: the relevant code is https://github.com/zyga/mounted-fs-memory-checker/blob/master/analyze.py#L19 | 14:17 |
apw | yep | 14:17 |
him-cesjf | cking: apw: Filed bug report - https://bugs.launchpad.net/ubuntu/+source/plasma-workspace/+bug/1636869 https://bugs.kde.org/show_bug.cgi?id=371712 | 14:17 |
ubot5` | Ubuntu bug 1636869 in plasma-workspace (Ubuntu) "Plasmashell polling on zero timeout" [Undecided,New] | 14:17 |
ubot5` | KDE bug 371712 in DataEngines "Plasmashell polling on zero timeout" [Major,Unconfirmed] | 14:17 |
apw | do you have a nice empty snap ? | 14:18 |
zyga | apw: and https://github.com/zyga/mounted-fs-memory-checker/blob/master/analyze.py#L48 | 14:18 |
zyga | apw: everything is in that repo | 14:18 |
zyga | apw: just look around | 14:18 |
zyga | apw: I also collected raw traces from various kernels and distros | 14:18 |
zyga | apw: fork that repo and run the overview script, it just churns the data | 14:18 |
zyga | apw: fake payload is in https://github.com/zyga/mounted-fs-memory-checker/tree/master/payload | 14:18 |
apw | zyga, won't be me, will get someone to look at it though | 14:19 |
zyga | apw: if you want to see the numbers this is what I get from the script now: http://paste.ubuntu.com/23383523/ | 14:19 |
zyga | apw: thank you, noted | 14:19 |
zyga | apw: and do have them tell me if I counted this incorrectly | 14:19 |
apw | + ./analyze.py ubuntu 16.04 4.4.0-45-generic 4 size-1m.squashfs.xz.heavy | 14:20 |
apw | zyga, what does the 4 mean | 14:20 |
zyga | apw: for core system | 14:20 |
zyga | apw: four* | 14:21 |
apw | zyga, and you are just mounting it, right ? or are you looking at the contents | 14:22 |
zyga | apw: just mounting | 14:22 |
zyga | apw: the contents is a 0 sized file | 14:22 |
zyga | apw: or in this case, a 1mb file | 14:22 |
zyga | apw: (of urandom data) | 14:22 |
zyga | apw: you can run those traces with the same file in vfat and ext4 for comparison | 14:22 |
zyga | apw: there memory usage doesn't change at all (nearly) | 14:23 |
apw | squashfs, that well tested filesystem :< | 14:23 |
zyga | apw: we're the only distro that uses this set of kernel config options | 14:23 |
zyga | apw: my traces include kernel config from each system | 14:24 |
apw | snaps are literally about the only thing which uses squashfs | 14:24 |
zyga | apw: it is worth looking into two things IMHO: 1) why are single cpu systems using so much more memory as compared to a four-cpu system | 14:24 |
zyga | apw: and is the multi-threaded, per-cpu decompressor worth it (other distros use one single threaded decompressor) | 14:25 |
manjo | rtg, apw any chance you will have 4.9 rc rebased ontop of unstable in the coming weeks ? | 14:26 |
apw | zyga, yeah, well i can intuit why that might trigger that behavior, i assume we are hitting pathological memory allocator behaviour by our memory patterns | 14:26 |
rtg | manjo, some chance | 14:26 |
apw | zyga, and the majority of those pages have just one teensy bit of useful allocation in them | 14:26 |
zyga | apw: all of the memory is in kmalloc-4096 slab | 14:26 |
apw | zyga, can you point me at the config delta, as logically i should give you a test kernel with that changed to confirm it is tha | 14:26 |
apw | tthat | 14:26 |
manjo | rtg, before Nov 11 ? | 14:27 |
zyga | apw: well, not on one line but just grep for squashfs in https://github.com/zyga/mounted-fs-memory-checker/tree/master/traces/ubuntu/16.10/4.8.0-26-generic/ncpus-1 and https://github.com/zyga/mounted-fs-memory-checker/blob/master/traces/fedora/24/4.7.9-200.fc24.x86_64/ncpus-1/kernel.config | 14:28 |
zyga | apw: I can do that if you want to but I'd rather have someone investigate deeer and just run those tests again | 14:28 |
zyga | deeper* | 14:28 |
apw | zyga, so there is no deadline to make any improvement here ... | 14:29 |
zyga | apw: well, as soon as it starts to explode on us :/ | 14:29 |
zyga | apw: I think we don't want 130MB per snap | 14:29 |
zyga | on small systems | 14:29 |
apw | right so making it go away is more important than why it is | 14:29 |
zyga | yes | 14:29 |
apw | broken, so if we switch the config you test that | 14:29 |
apw | and if it works we ship that, and find out _why_ later | 14:29 |
zyga | worth a try | 14:30 |
apw | CONFIG_SQUASHFS_DECOMP_SINGLE | 14:31 |
apw | i assume it is those ones you want flipped here | 14:31 |
apw | zyga, also what release are you testing in, so i make a test kernel in the right version | 14:32 |
zyga | apw: this is all focused on snappy so currently that's a xenial kernel | 14:32 |
zyga | apw: correct | 14:32 |
zyga | apw: that seems to be the most plausible one | 14:32 |
apw | i'll flip that _SINGLE and get you some kernels, can you test amd64 i assume so | 14:33 |
apw | zyga, ^ | 14:33 |
zyga | I sure can, thanks | 14:33 |
rtg | manjo, 4.9 won't be released before Nov 11 | 14:39 |
manjo | rtg, yes I know it is still rc 2 | 14:40 |
apw | zyga, ok ... people.canonical.com/~apw/lp1636847-xenial/ has test kernels with that flipped over ... | 15:09 |
apw | zyga, ping me when you know if that is better | 15:10 |
zyga | apw: thanks, downloading now | 15:10 |
zyga | apw: just ran the numbers again, looking much better | 16:13 |
zyga | apw: the 131 MB/snap is down to 4 | 16:13 |
zyga | apw: data pushed back to the repo | 16:15 |
him-cesjf | apw: Curious about polling in terms of software. I know about polling in microprocessor/hardware where it polls to check the status of a device, like in printer. What does polling mean in software, like how we noticed in plasmashell few minutes ago? | 16:17 |
apw | him-cesjf, the poll call is a way to avoid active polling, when done correctly | 16:19 |
apw | zyga, ok, so i think we switch that up in the next sru kernel, and i'll have one of my engineers look at why it is broken in that other seemingly superior mode | 16:19 |
apw | zyga, could you report that in the bug as well for me, helps with sru'ing it | 16:20 |
him-cesjf | apw: I didn't follow why polling was done for plasmashell and what active polling means for it | 16:26 |
him-cesjf | Sorry if I am bothering with basic questions | 16:26 |
zyga | apw: with pleasure, thank you for the kernel | 16:26 |
apw | well poll is normally used for waiting for events from the mouse and the like, well the input in general, and responses from the display server. this should be a waiting poll, but if you do it wrong it will return immediately, to tell you that you did it wrong for instance, and then you can get into a cpu consuming loop | 16:27 |
him-cesjf | Yes, but polling is usually for a device/hardware from what I kow taking classes in microprocessor in electronics, why plasmashell requires polling is what I am trying to understand | 16:29 |
him-cesjf | know from* | 16:29 |
apw | it is a concept not related to hardware specifically | 16:31 |
apw | though in general unpredictable events are from hardware/users etc | 16:31 |
apw | in this case though the name is a misnomer, it is used to avoid polling on files | 16:31 |
apw | it specifically is used to "wait for anything to happen to any one of this set of file descriptors" | 16:32 |
apw | and those usually are connected to your devices and display server in this kind of context | 16:32 |
apw | it should sit there quietly and do nothing until something happens, but it showing up in the stats | 16:33 |
apw | implies it is not... and something is wrong in that application | 16:33 |
zyga | apw: done | 16:34 |
him-cesjf | apw: Still around? | 17:30 |
apw | vaguely | 17:30 |
him-cesjf | https://bugs.kde.org/show_bug.cgi?id=371712 someone replied and thinks the lag due to plasmashell is not a problem | 17:32 |
ubot5` | KDE bug 371712 in DataEngines "Plasmashell polling on zero timeout" [Major,Needsinfo: waitingforinfo] | 17:32 |
him-cesjf | Maybe I didn't explain the problem well? | 17:33 |
him-cesjf | apw^ | 17:38 |
apw | it sounds like they have suggested a reasonable course of action | 17:40 |
him-cesjf | Umokay | 17:40 |
him-cesjf | apw: Could you give me a one line explanation of this line so that I can explain the same in reply? I am not good at interpreting it. | 17:47 |
him-cesjf | 28967 plasmashell poll 4589.4564 112804 0 0.0 sec 0.0 sec 0.0 sec | 17:47 |
apw | just shove it in verbatim, they will know what it means | 17:47 |
apw | if not they should not have asked for it | 17:47 |
him-cesjf | Sure | 17:51 |
Free99 | hello everyone, is there a bug where an application loading the CPU causes a full system freeze currently out for kernel 4.4.0-45? I can't seem to find one | 21:18 |
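
apw's point at 14:16, that the total slab allocation shows how much memory sits in the slabs but not how much of it is in use, can be illustrated with a rough sketch. It assumes the `/proc/slabinfo` version 2.1 line layout and 4 KiB pages. The sample line is made up for illustration and is not taken from zyga's traces, and `parse_slabinfo_line` is a hypothetical helper, not part of the mounted-fs-memory-checker repository.

```python
from typing import NamedTuple

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as is typical on x86


class SlabUsage(NamedTuple):
    name: str
    allocated_bytes: int  # memory currently held by the cache's slabs
    in_use_bytes: int     # memory covered by active objects only

    @property
    def utilization(self) -> float:
        """Fraction of the allocated slab memory that holds live objects."""
        if self.allocated_bytes == 0:
            return 0.0
        return self.in_use_bytes / self.allocated_bytes


def parse_slabinfo_line(line: str) -> SlabUsage:
    """Parse one data line laid out like /proc/slabinfo version 2.1.

    Layout: name active_objs num_objs objsize objperslab pagesperslab
            : tunables limit batchcount sharedfactor
            : slabdata active_slabs num_slabs sharedavail
    """
    stats, _tunables, slabdata = line.split(":")
    name, active_objs, _num_objs, objsize, _objperslab, pagesperslab = stats.split()
    _label, _active_slabs, num_slabs, _sharedavail = slabdata.split()
    allocated = int(num_slabs) * int(pagesperslab) * PAGE_SIZE
    in_use = int(active_objs) * int(objsize)
    return SlabUsage(name, allocated, in_use)


# Illustrative, made-up line: 100 slabs that each hold a single live object.
SAMPLE = "kmalloc-4096 100 800 4096 8 8 : tunables 0 0 0 : slabdata 100 100 0"

usage = parse_slabinfo_line(SAMPLE)
print(usage.name, usage.allocated_bytes, usage.in_use_bytes, usage.utilization)
```

In the made-up sample every slab carries one live 4096-byte object, so the cache holds eight times more memory than its objects need. That is the kind of pattern apw guesses at when he says most of the pages have "just one teensy bit of useful allocation in them". Reading the real `/proc/slabinfo` generally requires root.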
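
The difference apw describes between a waiting poll and one that returns immediately can be sketched as follows. This is a minimal illustration, not plasmashell's code: it uses Python's `select.poll` on an idle pipe (Linux or another Unix is assumed), and `count_poll_calls` is a hypothetical helper. A zero timeout makes every call return at once, so the loop spins. A non-zero timeout lets the process sleep in the kernel until the descriptor has data or the timeout expires.

```python
import os
import select
import time


def count_poll_calls(timeout_ms: int, duration_s: float = 0.2) -> int:
    """Count how often poll() returns within duration_s on an idle pipe."""
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    calls = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            # Nothing is ever written to the pipe, so poll() only returns
            # because the timeout (given in milliseconds) has expired.
            poller.poll(timeout_ms)
            calls += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return calls


spinning = count_poll_calls(0)   # zero timeout: returns immediately, burns CPU
waiting = count_poll_calls(50)   # 50 ms timeout: sleeps between wake-ups

print(f"zero-timeout poll calls: {spinning}")
print(f"50 ms-timeout poll calls: {waiting}")
```

On an idle descriptor the zero-timeout loop makes far more calls in the same interval than the waiting loop, which matches the very high poll rate cking points out for plasmashell.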