[11:16] Hi Folks, I just hit this kernel bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349711
[11:16] Launchpad bug 1349711 in linux (Ubuntu) "Machine lockup in btrfs-transaction" [Undecided,Incomplete]
[11:17] The advise of the BTRFS developers is to update to 3.15 or above
[11:17] s/advise/advice/
[11:17] I tried adding https://launchpad.net/~kernel-ppa/ppa and/or https://launchpad.net/~kernel-ppa/pre-proposed to my trusty system, but both resulted in 404 when doing `apt-get update`
[11:18] What is a good way to get a more recent kernel on 14.04?
[11:29] pwaller, that isn't exactly the most helpful advice
[11:29] apw: ah. Any better suggestions?
[11:30] well they could suggest a fix instead of a blanket upgrade to a newer kernel
[11:30] when utopic releases there will be a 3.16 kernel available in 14.04 but that is not yet ready for production use, nor available in 14.04
[11:30] hm, OK.
[11:31] apw: It was advice from a random on IRC, and I suppose they were trying to help me hit the ground running again.
[11:31] apw: can you think of any ways I can help the bug along or is it best to sit on my hands for now?
[11:31] pwaller, are you seeing any ill effects or only these warnings
[11:31] apw: I've determined that these are warnings
[11:31] apw: they all appear to be relating to caches which can be discarded
[11:32] (according to notes found on mailing lists and the advice from #btrfs)
[11:32] yeah that appears to match my understanding, i don't expect any of these to stop things working
[11:32] the soft lockups all clear before 46s or whatever the second check is
[11:32] apw: a cursory check suggests things are working
[11:32] apw: I can't parse your last statement
[11:33] apw: the soft lockups are happening after >48h running
[11:33] the softlockup warnings all say 22/23s which indicates they did not continue to be locked up, they resolved each individual lockup
[11:33] ah! I misread that.
[11:33] those are serious when they say like 23s, 46s, 90s, 200s, ...
[11:33] gotcha.
[11:34] Except that it is preventing the machine from being useful
[11:34] but I get what you mean.
[11:34] ok so there are ill effects, which are?
[11:34] apw: http and ssh stop responding
[11:35] apw: my interim "fix" was going to be to configure a watchdog which rebooted the system, but maybe that wouldn't work
[11:35] depends on the workload and how full the fs is. Do you rebalance regularly?
[11:35] apw: I guess the BTRFS FS wasn't accepting writes.
[11:35] xnox: stupid question from me: do I need to rebalance if it is just on one block device?
[11:35] xnox: FS is 87% full
[11:35] hmmm, i don't think those warnings are necessarily even related. but anyhow, you should certainly file a bug against linux if its getting hung up
[11:36] and we can suggest some debug kernels to try etc and see if we can figure out if it is fixed in later versions
[11:36] apw: on the kernel tracker?
[11:36] on launchpad, run "ubuntu-bug linux"
[11:36] apw: the link above is on launchpad
[11:36] Oops, no it isn't!
[11:37] Oh, yes - the link I sent is here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349711
[11:37] Launchpad bug 1349711 in linux (Ubuntu) "Machine lockup in btrfs-transaction" [Undecided,Incomplete]
[11:37] apw: so I did already file one there
[11:39] ok as btrfs upstream are suggesting v3.15 I would first suggest you test with a v3.15 mainline kernel;
[11:39] https://wiki.ubuntu.com/Kernel/MainlineBuilds
[11:40] though these are manual installs and do not upgrade automatically; so they are only useful for testing
[11:40] ah, I missed the /mainline of ~kernel-ppa
[11:40] if that works try v3.14, if it does not try v3.16-rc7
[11:40] and report that back to the bug, if that shows it is fixed somewhere already we can start a bisect for the fix
[11:41] apw: it's a bit difficult to bisect if it isn't triggerable, isn't it?
[11:41] very much so, sadly
[11:41] but if we have a bracket, we could at least look at the btrfs changes and see if anything "sticks out"
[11:42] okay
[11:43] apw: we reached more than 12 days uptime before the last problem I think
[11:43] so it is going to be difficult to observe
[11:46] apw: woah. Just looked in the nginx log
[11:46] apw: which is not on the BTRFS drive - it's full of null characters at the point of the fault
[11:47] There are ~2280 null characters
[11:48] apw: and indeed there were no HTTP requests serviced for the duration of the fault (whilst the kernel was reporting 20s soft lockups)
[11:49] afk for lunch, back in <1h.
[12:37] back
[13:16] Does anyone know how I can run apport-collect but check that the output doesn't contain company secrets?
[13:49] Does anyone know where /dev/watchdog comes from? My reading is that it should be there by standard, but I can't find it
[13:51] Ah, have to load softdog
[14:02] smb, did you see that bug 727459 seems to have come back for a user?
[14:02] bug 727459 in linux (Ubuntu Lucid) "TSC is not reliable under Xen on some Intel CPUs" [Medium,Triaged] https://launchpad.net/bugs/727459
[14:03] bjf, I saw some updates to the bug but have not gotten to that, yet
[14:27] bjf, I am very doubtful this is just the same bug, so I asked to open a new report
[14:57] smb, ack
[14:59] **
[15:00] ** Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting
[15:00] **
[15:20] bjf, unless I'm missing something, think your automated script might be faulty https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349883 It asked me to run apport to collect logs because logs were missing, but the logs apport attached were the same as ubuntu-bug attached in the first place. Is there some log that apport didn't pick up it should've?
[15:20] Launchpad bug 1349883 in linux (Ubuntu) "dmesg time wildly incorrect on paravirtual EC2 instances." [Undecided,Confirmed]
[15:23] How do I specify that a kernel module should be loaded? Is it via /etc/modules? Is there a /etc/modules.d equivalent or do I have to edit that one file?
[15:23] Joe_CoT, Maybe the bot acts too quickly. Seems like status changed by apport right after
[15:23] (I want to make a salt state which causes a module to be updated on boot so I don't particularly want to mess with /etc/modules)
[15:23] Its now confirmed so it should be ok.
=== psivaa is now known as psivaa-bbl
[16:36] smb, I think the issue is probably that ubuntu-bug collects the same logs, but doesn't set the tag apport-collected, which the bot cues off of
[16:37] Joe_CoT, That tag seemed to be there by the time I looked.
[16:39] 10:53 I put the bug in with ubuntu-bug, which had all the logs attached. 11:00 the bot replied, saying the logs were missing, and to run apport-collect. 11:10 I ran apport-collect, which attached the same logs, but added the tag apport-collected
[16:40] apport-bug was already there, apport-collected was not
[16:46] Hm, ok. bjf, so I am not sure why this happened ^.
[16:46] Joe_CoT, Anyway I am looking into it
[16:55] ##
[16:55] ## Kernel team meeting in 5 minutes
[16:55] ##
=== jsalisbury changed the topic of #ubuntu-kernel to: Home: https://wiki.ubuntu.com/Kernel/ || Ubuntu Kernel Team Meeting - Tues August 12th, 2014 - 17:00 UTC || If you have a question just ask, and do wait around for an answer! If the question is should I file a bug for something, likely you can assume yes.
[17:12] <_`_> wait that was it, jsalisbury ?
[17:24] rtg: git://kernel.ubuntu.com/ppisati/ubuntu-embedded.git
[17:24] rtg: i'm still working on it, so it's pretty rough right now
[17:24] rtg: but it works
[17:24] rtg: for your mirabox board run as:
[17:25] rtg: sudo ./make_img.sh -b mirabox -d 14.10
[17:26] rtg: dd the .img file that it creates to an sd, pop it in your mirabox and follow the instruction in "mirabox-uboot-env.txt"
[18:03] ppisati, I'll give it a run
[18:05] 3.13 most certainly did not resolve the btrfs sync performance issues :)
[18:05] uh, :(
=== psivaa-bbl is now known as psivaa
[21:34] lol
[21:56] kamal: i'm seeing a build regression w/ trusty master-next that i've bisected to a 3.13.11.5 patch
[21:56] kamal: is there a 3.13.11.y git tree i could use to demonstrate whether or not it is a problem outside of ubuntu?
[23:16] dannf, http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.13.y-queue
[23:17] dannf, what's the problem patch?
[23:17] kamal: a ptrace patch - see my e-mail to kernel-team@
[23:18] kamal: not at all an obvious problem (at least to me), i haven't debugged it yet
[23:18] macro fun no doubt
[23:23] dannf, yeah, its not clear why that patch would cause that build failure
[23:27] dannf, so yeah, if you can (or can't) reproduce the failure with that 3.13.y stable repo, that would be interesting to know
[23:29] dannf, and/or I can try to reproduce it -- what config are you using? -- fwiw, I didn't see any build failure with the few ARM configs I test (tegra, omap2plus, imx_v6_v7).
[23:31] yeah, weird, can't reproduce with pristine
[23:32] but bisecting definitely led to that one, then going back to top of trusty master-next and reverting continued to show that it was the breaker
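
For reference, the reading of the soft-lockup warnings given at 11:33 (reports that stay around 22s or 23s mean each lockup cleared on its own, while reports that grow to 46s, 90s, 200s mean the CPU stayed stuck) can be sketched in Python. This is a minimal sketch and not a tool anyone in the channel used: the function names, the 30-second cutoff, and the sample log lines are assumptions made for illustration, and the pattern assumes the usual "BUG: soft lockup - CPU#N stuck for Ns!" kernel message wording.

```python
import re

# Assumed message shape, e.g.:
#   BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:1234]
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def lockup_durations(log_text):
    """Return {cpu_number: [reported durations in seconds]} in log order."""
    durations = {}
    for match in SOFT_LOCKUP.finditer(log_text):
        cpu = int(match.group(1))
        seconds = int(match.group(2))
        durations.setdefault(cpu, []).append(seconds)
    return durations


def looks_escalating(durations, first_report_max=30):
    """Hypothetical heuristic: a report well beyond the first-report range
    (22s/23s in the log above) suggests the CPU never recovered.
    The 30-second cutoff is an assumption, not a kernel constant."""
    return any(seconds > first_report_max for seconds in durations)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the bug report.
    sample = (
        "BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:1234]\n"
        "BUG: soft lockup - CPU#1 stuck for 23s! [btrfs-transacti:1234]\n"
    )
    for cpu, seen in lockup_durations(sample).items():
        state = "escalating" if looks_escalating(seen) else "cleared each time"
        print(f"CPU#{cpu}: {seen} -> {state}")
```

Feeding it the output of `dmesg` or the relevant part of kern.log would give a quick per-CPU view of whether the warnings stayed in the 22s/23s range.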