[07:41] <DEvil0000> so nobody here can help me with this kernel issue?
[09:27] <apw> DEvil0000, "this" ?
[10:22] <ogra_> apw, i have a user running CI tests in qemu/kvm for ubuntu on top of a CentOS box ... his VMs fill up /run/udev/data with tons of cgroup entries over time (ls output is at https://bugs.launchpad.net/snapd/+bug/1687507/+attachment/4870568/+files/run_udev_data.log) ... https://bugs.launchpad.net/snapd/+bug/1687507 any idea ? (he runs a hwe kernel in the VMs)
[10:22] <ubot5`> Ubuntu bug 1687507 in snapd "Memory leak (/tmp file system filling up)" [Undecided,New]
[10:28] <apw> ogra_, so he is saying that within the VM we are dropping those files in /run, or are they in his host ?
[10:28] <apw> ogra_, but if they are /run/udev/data files i would assume they are related to systemd in whichever ?
[10:29] <ogra_> apw, they are inside the VM ... he runs his VMs with 200MB ram each and after a week or so they all run out of ram
[10:29] <apw> ogra_, ok so then our systemd seems most likely given the names
[10:29] <ogra_> i would expect udev to only react to device creation here
[10:30] <apw> ogra_, which you do for every snap install, you make a new loop
[10:30] <ogra_> well, thats not loop devices there 
[10:31] <apw> no it is session things, perhaps one for every run of a snap command
[10:32] <ogra_> zyga_, ^^^ ?
[10:32] <zyga_> ogra_: looking
[10:32] <apw> ogra_, but they look like regular files, and they aren't in a directory the kernel makes up out of the ether, they look like actual disk blocks
[10:32] <ogra_> well ... i see bits like: -rw-r--r-- 1 root root     31 May  1 06:08 +cgroup:kmalloc-1024(30177:apt-daily.service
[10:32] <zyga_> ogra_: I saw this bug
[10:32] <ogra_> thats definitely not from snapd
[10:32] <zyga_> ogra_: perhaps the real memory is hidden somewhere else that we don't see
[10:32] <zyga_> aha
[10:33] <zyga_> do you have a log of what is in the slab?
[10:33] <apw> ogra_, no but that doesn't make it a kernel thing.  that is a systemd unit which ran
[10:33] <apw> zyga_, these are actual files in a directory on a ramfs
[10:33] <ogra_> zyga_, we only have the ls output of the udev db atm
[10:33] <apw> the memory is being used storing them
[10:33] <apw> i would suggest whatever is making the sessions is triggering udev to leak an internal file
[10:34] <zyga_> apw: aha, 
[10:34] <zyga_> apw: is that an actual ramfs or did you mean tmpfs?
[10:35] <ogra_> its a tmpfs ... 
[10:35] <ogra_> usually /run has 10% of your ram
[10:35] <apw> zyga_, tmpfs prolly, as it is /run
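The triage the channel is doing by eye can be sketched as a few commands run inside the affected VM. This is a hedged sketch only: the paths come from the bug report's ls output, and the `+cgroup` prefix matches the entry names quoted below.

```shell
# Triage sketch for the affected VM (paths as in LP #1687507):
# how full is the /run tmpfs?
df -h /run
# total number of udev database entries
ls /run/udev/data 2>/dev/null | wc -l
# count just the suspicious "+cgroup:..." entries seen in the bug's ls output
ls /run/udev/data 2>/dev/null | grep -c '^+cgroup' || true
```

A steadily growing `+cgroup` count over days would match the leak described here, since /run is a tmpfs and every entry costs RAM.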
[10:36] <apw> so according to the other logs in the forum, i think there is a session being logged in syslog for every execution of his snap commands
[10:37] <zyga_> apw: what do you mean by session specifically? (I'm not fully aware of how various "sessions" are managed by linux)
[10:37] <ogra_> Apr 28 09:36:50 ci-comp11-dut systemd[1]: Started Session 1817 of user root.
[10:37] <apw> zyga_, that is userspace shit :)
[10:37] <ogra_> you mean that line i guess
[10:38] <apw> ogra_, yeah, that is what it looks like to me
[10:38] <apw> perhaps that is an install of a snap, dunno, he is implying he is doing repeated installs
[10:39] <ogra_> yeah, no wonder if he does CI 
[10:39] <ogra_> (though he should simply kill the VM and use a fresh one to do it right :) )
[10:40] <apw> ogra_, he could, but i would open a systemd task on that bug at least, as it looks to be leaking, or if it is not they might be able to tell us better what the hell those files represent
[10:41] <apw> ogra_, and what the reporter is doing wrong (this is systemd after all)
[10:41] <apw> "you didn't want to do it like _that_"
[10:41] <zyga_> do you guys think there's an actual snapd bug somewhere?
[10:41] <ogra_> zyga_, unlikely given the type of files created there 
[10:41] <apw> zyga_, without knowing what the session represents ... you might ask for one of the file content as well
[10:43] <LocutusOfBorg> [12:42:15] <LocutusOfBorg> hello cjwatson wrt the git/ssh repo cloning issue, I finally found it
[10:43] <LocutusOfBorg> [12:42:16] <LocutusOfBorg> http://kernel.ubuntu.com/git/cking/stress-ng.git/
[10:43] <LocutusOfBorg> [12:42:31] <LocutusOfBorg> this one is git only clonable (ok not really a launchpad host)
[10:43] <LocutusOfBorg> cking, ^^ :)
[10:43] <LocutusOfBorg> do you mind having https too?
[10:43] <apw> LocutusOfBorg, what does that mean, ie which protocols don't work
[10:43] <LocutusOfBorg> git clone https://
[10:43]  * apw wonders how that works ... hmmm
[10:44] <LocutusOfBorg> if you look at the page, it is not even shown as "clonable"
[10:44] <LocutusOfBorg> probably some configuration needs an enable
[10:44] <apw> that host is somewhat of a legacy mess ... no idea if it even can do https
[10:45] <apw> i suspect it does not at all ...
[10:45] <cking> I guess that's the reason
[10:45] <apw> cking, would it make more sense to just mirror that into LP and be happy ?
[10:45] <LocutusOfBorg> :(
[10:45] <LocutusOfBorg> we have git protocol blocked here at work
[10:45] <apw> cking, we could even add it to the primary mirroring list if you care to keep both working
[10:45] <infinity> LocutusOfBorg: Tell the people who run your work network to stop being terrible?
[10:46] <cking> apw, I'd prefer LP to mirror the original git repo if thats possible
[10:46] <apw> infinity, but but they need to record everything you do, oh except https:// cause you can't do anything dodgy with that
[10:46] <apw> cking, i guess it depends where in the namespace it goes to
[10:47] <infinity> cking: Wasn't there a goal to eventually get *all* the master repos off of wani, though?
[10:47] <LocutusOfBorg> infinity, it is the customer's network :) unfortunately security team has a blacklist everything opinion
[10:47] <cking> infinity, yeah, I'm being lazy
[10:47] <apw> LocutusOfBorg, letting https:// out basically means they don't stop anything
[10:47] <cking> I have it mirrored at https://github.com/ColinIanKing/stress-ng too
[10:48] <apw> cking, oh ... heh
[10:48] <LocutusOfBorg> apw, well, somebody was doing reverse ssh over git port :)
[10:48] <LocutusOfBorg> they even blocked ssh on 443
[10:48] <apw> LocutusOfBorg, and they are now doing it over https no doubt
[10:48] <cking> i guess you could download https://github.com/ColinIanKing/stress-ng/archive/master.zip
[10:48] <infinity> LocutusOfBorg: You can't "block ssh on 443".  I mean, you can block the pre-ssh handshake, but you can't stop people from setting up ssl tunnels and then going to town.
[10:49] <apw> cking, he can clone from github if it is a mirror
[10:49] <infinity> LocutusOfBorg: So, indeed, as Andy says, if you allow 443/https, you allow everything.  Just in a way you can't filter/inspect.
[10:49] <LocutusOfBorg> infinity, yes, of course, they just made things harder
[10:49] <LocutusOfBorg> anyhow git over ssh works nicely, and I don't want to hack things on a build server
[10:49] <apw> "harder" for someone who was reverse tunnelling ssh over the git port, is not going to be actually hard
[10:50] <cking> apw, I keep a mirror of all my repos on github because folks in .cn sometimes can't access my repos any other way
[10:50] <apw> cking, great, then LocutusOfBorg should mirror from
[10:50] <apw> github for now, and we can think about where that repo should move to
[10:50] <apw> at our leisure
[10:50] <cking> apw, the "at our leisure" generally means never
[10:51] <apw> cking, no we have cards now
[10:51] <cking> apw, OK, I'll slap it on the card deck
[10:51] <apw> cking, so ... who owns that, is it cking or is it c-k-t or something else
[10:51] <cking> apw, tis me
[10:51] <apw> then i cannot mirror _to_ your LP, but if you switched master to lp:~cking i could mirror it back to kernel, and you could add the symlink there
[10:52] <apw> cking, as an aside does github auto-mirror for you ?
[10:52] <cking> apw, sounds like a plan
[10:53] <cking> apw, I generally push to my repo and mirror on each push
[10:53] <LocutusOfBorg> thanks!
[10:53] <LocutusOfBorg> cking, I'm pretty sure there is some automatic push feature on github
[10:53] <infinity> cking: I'd argue that if you're going to the trouble of moving it, you might want it owned by a stress-ng-hackers team or something instead of cking.  Future proof it a bit.
[10:53] <cking> i suspect there is
[10:53] <LocutusOfBorg> but I never found how does it work
[10:53] <cking> infinity, yeah, sounds like a good idea
[10:54] <apw> infinity, in which case we could add the mirror-bot
[10:54] <apw> to that team until we get the fine-grained permissions we are meant to have
[10:54] <apw> cking, i assume you have an LP project already anyhow
[10:55] <cking> https://launchpad.net/stress-ng
[10:55] <LocutusOfBorg> or maybe just 
[10:55] <LocutusOfBorg> add a hook
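The "push to my repo and mirror on each push" setup cking describes needs no server-side hook at all: giving one remote a second push URL makes every `git push` update both repositories. A minimal local sketch (the two bare repos and the `trunk` branch name are illustrative stand-ins for the primary repo and the GitHub mirror):

```shell
# Sketch: one fetch URL, two push URLs, so a single push updates both repos.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/primary.git"   # stand-in for the primary repo
git init -q --bare "$tmp/mirror.git"    # stand-in for the github mirror
git init -q "$tmp/work"
cd "$tmp/work"
git config user.email "ci@example.com"
git config user.name "ci"
git remote add origin "$tmp/primary.git"
# once any explicit push URL exists, the fetch URL is no longer used for
# pushes, so re-add the primary first, then add the mirror
git remote set-url --add --push origin "$tmp/primary.git"
git remote set-url --add --push origin "$tmp/mirror.git"
echo demo > README
git add README
git commit -q -m "initial commit"
git push -q origin HEAD:trunk
# both bare repos now hold the same commit
git -C "$tmp/primary.git" rev-parse trunk
git -C "$tmp/mirror.git" rev-parse trunk
```

GitHub-side automatic push mirroring (as LocutusOfBorg suspects exists) is a separate service feature; the client-side form above works with any plain git hosts.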
[10:57] <LocutusOfBorg> BTW the network is mostly windows only; I'm the only one with such connection issues, and I don't want to complain to /dev/null network admins, corporate networks can be time consuming
[10:57] <LocutusOfBorg> (I do tethering and live happy for now)
[10:58] <apw> LocutusOfBorg, well it sounds like you have a work-around, use the github mirror as that is https: already
[10:59] <LocutusOfBorg> yes, thanks! I forwarded this helpful message to my colleague
[11:02] <cking> LocutusOfBorg, and let me know if you find any stress-ng issues too
[11:03] <LocutusOfBorg> cking, it is an awesome tool! we use it to stress our custom yocto BSP :)
[11:04] <cking> LocutusOfBorg, thanks, that's good to know.  can you star it on the git hub page :-)
[11:04] <LocutusOfBorg> done :)
[11:04] <LocutusOfBorg> I didn't even star my projects
[11:04] <cking> \o/
[11:05] <LocutusOfBorg> I think I had some feature requests to do for this project, something I feel "hey that would be nice" but I forgot them
[11:05] <cking> LocutusOfBorg, well, I'm very open to new feature requests and/or patches too :-)
[11:07] <LocutusOfBorg> I probably remember the need to test ram overloading and some cpu micro-tests (e.g. stress test only one single core) or something like that
[11:07] <LocutusOfBorg> it happened 1.5 years ago, too much time for my brain to remember
[11:07] <LocutusOfBorg> :)
[11:07] <cking> ah, stress-ng has progressed a lot since then
[11:07] <LocutusOfBorg> sure, I would like to give it a new try :)
[11:11] <LocutusOfBorg> flashbench, this is a tool we had to add because stress-ng was not testing flashes :)
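Both features LocutusOfBorg remembers wanting exist in current stress-ng. A hedged sketch of the two invocations (flags as documented in the stress-ng manual; guarded so it is a no-op where stress-ng is not installed, and the core number, worker counts, and timeouts are illustrative):

```shell
# Sketch: the two cases mentioned above, with current stress-ng flags.
if command -v stress-ng >/dev/null 2>&1; then
    # stress a single core: one CPU stressor pinned to core 0
    stress-ng --cpu 1 --taskset 0 --timeout 5s
    # overload RAM: two VM stressors touching ~80% of available memory
    stress-ng --vm 2 --vm-bytes 80% --timeout 5s
else
    echo "stress-ng not installed"
fi
```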
[11:35] <DEvil0000> 10:10:48 - DEvil0000: I have an issue with my 14.04 and hope to find some help here. Description: https://hastebin.com/raw/fareguzego
[11:35] <DEvil0000> 10:11:10 - DEvil0000: in short: core dumps of 32-bit processes on a 64-bit OS are not correct somehow. so it's about gdb, kernel, ubuntu 14.04/16.04 and packages
[11:37] <DEvil0000> @apw
[11:39] <apw> DEvil0000, hmmm so you are saying 32bit core dumps differ depending whether they are dumped by the lts-backport 4.4 or the native 4.4 from a later release ?
[11:43] <apw> that is somewhat unexpected as the only real difference is normally the compiler used to build it
[11:44] <apw> DEvil0000, you should file a bug against the kernel and add a task for gdb as well i suspect, and drop the number in here
[12:03] <DEvil0000> apw: yes, this is what I see. resulting in incorrect/broken stack trace in gdb later when dumped with the backport kernel
[12:03] <apw> DEvil0000, that is most odd as those kernels are essentially the same other than where they are built
[12:03] <DEvil0000> apw: when I install the native 4.4 packages on my 14.04 it works as expected
[12:04] <apw> that makes little to no sense in reality
[12:04] <apw> but, it is happening, so please file a bug against the kernel with all that detail and how you reproduced it
[12:04] <apw> i assume a simple 32bit proggy which does abort will exhibit this behaviour
[12:05] <DEvil0000> apw: not sure - this seems to be quite likely for my processes with about 30 threads but less likely with fewer threads. hard to find a good example process
[12:07] <DEvil0000> apw: main thread dumps seem to be ok most of the time
[12:07] <apw> hmmm
[12:08] <apw> it is not 100% reproducible.  that is even odder
[12:08] <DEvil0000> is it safe to use the native 4.4 kernel packages instead of the backport ones?
[12:09] <DEvil0000> is maybe the patch level different in some way?
[12:09] <DEvil0000> or maybe a lib needed to compile the kernel?
[12:10] <apw> they are identical as far as i know from a patches applied perspective (on x86 at least)
[12:10] <apw> and they do not use external libraries
[12:10] <apw> safe is a relative term, you won't get updates if you do that
[12:12] <DEvil0000> I have found some reports from others via google having the same issues.
[12:12] <DEvil0000> sometimes with gcore gdb in the title. some for debian.
[12:12] <DEvil0000> updates are not an issue in this case. we have our own pool which is a less-updated ubuntu pool clone
[12:16] <DEvil0000> (but yes I tested it on plain and full updated 14.04)
[12:18] <DEvil0000> maybe it is somehow related to this: http://www.bigeng.io/recovering-redis-data-with-gdb/
[12:18] <DEvil0000> looks similar
[12:25] <DEvil0000> @apw: so you tell me using the native 4.4 should be no issue. why don't you use the same packages for both versions then? maybe even from a common package pool?
[12:25] <apw> i didn't actually say there would be no issues but regardless
[12:26] <apw> the thing is not compiled using tools you have, so out-of-tree drivers cannot be relied on, for one
[12:26] <apw> that is why we build a native version of the kernel for the series it is run in
[12:27] <apw> as the differences between the two are minor, at best, i have no evidence you are not just getting lucky and the problem will come back
[12:29] <apw> DEvil0000, nope there are no code differences between the two, /me checks config
[12:30] <apw> DEvil0000, and the only thing there is how stack-crashing is detected (due to compiler limitations) but that should not be visible in userspace either