[15:50] <backjlack> Hello!
[15:50] <backjlack> Are there any known issues with btrfs on 3.13 on trusty?
[15:51] <backjlack> I've received a report from someone who's seeing memory allocation failures on a system with 24 GB of RAM with dpkg and related operations.
[15:51] <backjlack> http://pastebin.com/raw.php?i=GXaj1iHp
[15:52] <backjlack> Basically, they're not actually running apps which require a lot of memory, but they seem to be running into some kind of problem.
[15:59] <cking> backjlack, are there any special mount options being used?
[16:00] <cking> i've seen issues with 3.13 btrfs on large memory systems with compression being used
[16:00] <backjlack> cking: I'll provide /proc/mounts as soon as I get it, shouldn't take long.
[16:02] <backjlack> sysctl config: http://pastebin.com/raw.php?i=3Guv4KaQ
[16:04] <backjlack> cking: http://pastebin.com/raw.php?i=zNr62crU That's /proc/mounts from that system.
[16:05] <backjlack> Please let me know if there's anything else you'd need or if you happen to have any recommendations.
[16:07] <cking> backjlack, those mount options don't look like the ones I've seen issues with
[16:11] <backjlack> Yeah, I know what you mean. I've had issues with compression on an older 3.10.
[16:13] <cking> backjlack, it does seem that there are a lot of slabs being used for inodes, I make it ~4.65GB
[16:13] <cking> but that's still not all of memory
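A per-cache estimate like the ~4.65GB figure above can be derived from /proc/slabinfo by multiplying each cache's object count by its object size. The following is a minimal sketch of that arithmetic, assuming the slabinfo version 2.x column layout; the function names and the sample numbers are made up for illustration and are not taken from the pastebins in this log.

```python
def slab_usage_bytes(slabinfo_text):
    """Return {cache_name: approximate bytes} from slabinfo 2.x text.

    Approximation: num_objs * objsize, which ignores per-slab padding.
    """
    usage = {}
    for line in slabinfo_text.splitlines():
        # Skip the version banner, the column-header comment and blank lines.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        # Columns: name, active_objs, num_objs, objsize, ...
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def total_matching(usage, substring):
    """Sum the usage of every cache whose name contains `substring`."""
    return sum(size for name, size in usage.items() if substring in name)


# Hypothetical sample data, only to show the expected input shape.
SAMPLE = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
btrfs_inode         1000   2000   1000   16    4 : tunables    0    0    0 : slabdata    125    125      0
inode_cache          500    500    600   13    2 : tunables    0    0    0 : slabdata     39     39      0
dentry               100    300    200   20    1 : tunables    0    0    0 : slabdata     15     15      0
"""

if __name__ == "__main__":
    usage = slab_usage_bytes(SAMPLE)
    print("inode-related slab bytes:", total_matching(usage, "inode"))
```

On a real system the text would come from reading /proc/slabinfo, which usually requires root.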
[16:14] <backjlack> There's btrfs as well.
[16:14] <cking> backjlack, what's the kernel reporting in dmesg?
[16:15] <backjlack> 3.13.0-43
[16:15] <cking> backjlack, I mean, is the kernel reporting any out of memory failures in the kernel log?
[16:16] <cking> backjlack, it does seem heavily loaded on memory, the conntrack messages possibly indicate lots of open connections etc..
[16:16] <backjlack> These were logs I've received from the same system before the upgrade to 3.13.0-43: http://pastebin.com/raw.php?i=rSpQU9dk
[16:18] <backjlack> http://pastebin.com/raw.php?i=rSpQU9dk
[16:22] <backjlack> http://pastebin.com/raw.php?i=X8pggEsp
[16:23] <cking> backjlack, I believe there used to be a vm.zone_reclaim_mode sysctl that could be set to 1 to force cached memory to be reclaimed
[16:24] <cking> not checked on that lately, but it's worth a punt
[16:25] <backjlack> cking: Thanks! I've passed that on.
[16:25] <backjlack> Also, btrfs seems to have some kind of problem.
[16:26] <cking> backjlack, in what way?
[16:26] <cking> (mind you, btrfs is experimental, so it does not surprise me)
[16:26] <backjlack> http://pastebin.com/raw.php?i=X8pggEsp
[16:27] <backjlack> There's a btrfs related stack trace in there.
[16:28] <cking> backjlack, i also suggest setting vm.min_free_kbytes = 242000 (or see what it is currently set at), this may help
[16:29] <backjlack> From what I've been told, these errors were still being encountered, even when that was enabled.
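To see what the two sysctls mentioned above are currently set to, the values can be read from under /proc/sys, where each dot in the sysctl name becomes a path separator. This is a small sketch under that assumption; the helper names are hypothetical, and vm.zone_reclaim_mode may be absent on kernels built without NUMA support.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as 'vm.min_free_kbytes' to its file path."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


if __name__ == "__main__":
    for name in ("vm.min_free_kbytes", "vm.zone_reclaim_mode"):
        try:
            print(name, "=", read_sysctl(name))
        except OSError as err:
            # The file may not exist on this kernel, or may not be readable.
            print(name, "unavailable:", err)
```

The `root` parameter exists only so the helper can be pointed at a test directory instead of the live /proc/sys.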
[16:34] <cking> backjlack, it would be interesting to see how the slabinfo changes over time, to see if there is any leaking or if it is perhaps just running low on memory because you are running a memory intensive system config
[16:34] <backjlack> cking: It's not my system, it's from a user. However, I'm interested in making sure the trusty kernel is stable and doesn't have problems.
[16:35] <backjlack> It's being used by a lot of people and making sure it doesn't have such problems would be great.
[16:35] <backjlack> What would you need? Periodic snapshots of slabinfo?
[16:35] <cking> backjlack, periodic snapshots of slabinfo would be useful, eg. every minute or every 5 mins
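One way to collect the periodic slabinfo snapshots requested above is a small loop that copies /proc/slabinfo to a timestamped file at a fixed interval. The sketch below assumes a one-minute default interval and a hypothetical output directory; the function names and file naming scheme are illustrative only.

```python
import os
import time


def snapshot_name(epoch_seconds):
    """Build a sortable file name from a UTC timestamp."""
    return time.strftime("slabinfo-%Y%m%d-%H%M%S.txt", time.gmtime(epoch_seconds))


def take_snapshot(dest_dir, src="/proc/slabinfo", now=None):
    """Copy the current contents of `src` into a timestamped file; return its path."""
    now = time.time() if now is None else now
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, snapshot_name(now))
    # Read and rewrite the text: /proc files report a size of 0.
    with open(src) as fin, open(dest, "w") as fout:
        fout.write(fin.read())
    return dest


def run(dest_dir, interval_seconds=60, count=None, src="/proc/slabinfo"):
    """Take `count` snapshots (or run until interrupted if count is None)."""
    taken = 0
    while count is None or taken < count:
        take_snapshot(dest_dir, src=src)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval_seconds)


if __name__ == "__main__":
    # Hypothetical destination; reading /proc/slabinfo usually requires root.
    run("/var/tmp/slabinfo-snapshots", interval_seconds=60, count=5)
```

For a 24 hour run at one snapshot per minute, `count=1440` would apply; successive files can then be compared to spot caches that only ever grow.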
[16:36] <cking> as it stands, btrfs being used in production environments when it is "experimental" is a tad worrisome
[16:38] <backjlack> It's a dev environment, but this is important.
[16:38] <backjlack> Such usage catches bugs which are rather difficult to catch with automated fs testing.
[16:38] <cking> sure, I agree, I think a bug should be opened, and we can work through this
[16:39] <backjlack> I've requested snapshotting for slabinfo and will open an issue after I get the slabinfo snapshots over a period of 24 hours.
[16:39] <cking> backjlack, well, I'm working on a thorough thrashing of btrfs and I will be backporting fixes once I've identified the kitten killer issues
[16:39] <backjlack> There are issues even in newer kernels, like 3.14, 3.15 and so on.
[16:39] <cking> 3.19-rc2?
[16:39] <backjlack> I've got stacktraces, but couldn't reproduce so far.
[16:40] <backjlack> I haven't tried that myself or in any dev environment yet. Reproducing the issues is rather difficult.
[16:41] <backjlack> Reproducing a hard btrfs crash, whether it takes down the whole system or just crashes btrfs, can be very difficult.
[16:41] <cking> backjlack, not with my tests, I crash it daily ;-)
[16:41] <backjlack> Is there a repository for these tests?
[16:42] <cking> backjlack, xfs tests, generic and xfs specific ones
[16:42] <cking> I mean "btrfs specific ones"
[16:43] <backjlack> Ah, I see. Ok, I'll try those as well.
[16:43] <cking> and I'm testing against a wide mix of mount options, I've been hammering test configs for ~3+ weeks non-stop and I'm building a matrix of failure points
[16:43] <cking> then I start identifying fixes and backporting them
[16:44] <cking> I'm on the case
[16:50] <jsalisbury> ##
[16:50] <jsalisbury> ## Kernel team meeting in 10 minutes in #ubuntu-meeting
[16:50] <jsalisbury> ##
[16:54] <bjf> apw, LP: #1400289   ...  can you verify ?
[16:57] <bjf> dannf, LP: #1381084  ...  can you verify?
[16:57] <apw> bjf, ack
[16:59] <dannf> cmagina: ^ can you verify LP: #1381084 for bjf?
[16:59] <bjf> arges, there are two for you: LP: #1396235  and  LP: #1401150
[16:59] <arges> bjf: was just looking at those : )
[16:59] <bjf> arges, would you be able to verify those if i let you have modoc for a bit?
[17:00] <jsalisbury> ##
[17:00] <jsalisbury> ## Meeting starting now
[17:00] <jsalisbury> ##
[17:00] <cmagina> dannf, bjf: sure
[17:00] <arges> bjf: i'd be able to verify 1396235, but 1401150 is a perf fix; i could only verify it doesn't break things
[17:01] <bjf> arges, ok, i'll see about freeing up modoc for a bit ... let's shoot for tomorrow if that's ok
[17:01] <arges> bjf: works for me
[17:03] <apw> bjf, utlemming is on the case
[17:04] <bjf> cool
[17:07] <bjf> sforshee, you think we can mark LP: #1275879 as verified?
[17:08] <sforshee> bjf: looking
[17:09] <sforshee> bjf: I think so, we'll be getting that commit via upstream stable either way