pwaller | Hi Folks, I just hit this kernel bug: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349711 | 11:16 |
---|---|---|
ubot5 | Launchpad bug 1349711 in linux (Ubuntu) "Machine lockup in btrfs-transaction" [Undecided,Incomplete] | 11:16 |
pwaller | The advise of the BTRFS developers is to update to 3.15 or above | 11:17 |
pwaller | s/advise/advice/ | 11:17 |
pwaller | I tried adding https://launchpad.net/~kernel-ppa/ppa and/or https://launchpad.net/~kernel-ppa/pre-proposed to my trusty system, but both resulted in 404 when doing `apt-get update` | 11:17 |
pwaller | What is a good way to get a more recent kernel on 14.04? | 11:18 |
apw | pwaller, that isn't exactly the most helpful advice | 11:29 |
pwaller | apw: ah. Any better suggestions? | 11:29 |
apw | well they could suggest a fix instead of a blanket upgrade to a newer kernel | 11:30 |
apw | when utopic releases there will be a 3.16 kernel available in 14.04 but that is not yet ready for production use, nor available in 14.04 | 11:30 |
pwaller | hm, OK. | 11:30 |
pwaller | apw: It was advice from a random on IRC, and I suppose they were trying to help me hit the ground running again. | 11:31 |
pwaller | apw: can you think of any ways I can help the bug along or is it best to sit on my hands for now? | 11:31 |
apw | pwaller, are you seeing any ill effects or only these warnings | 11:31 |
pwaller | apw: I've determined that these are warnings | 11:31 |
pwaller | apw: they all appear to be relating to caches which can be discarded | 11:31 |
pwaller | (according to notes found on mailing lists and the advice from #btrfs) | 11:32 |
apw | yeah that appears to match my understanding, i don't expect any of these to stop things working | 11:32 |
apw | the soft lockups all clear before 46s or whatever the second check is | 11:32 |
pwaller | apw: a cursory check suggests things are working | 11:32 |
pwaller | apw: I can't parse your last statement | 11:32 |
pwaller | apw: the soft lockups are happening after >48h running | 11:33 |
apw | the softlockup warnings all say 22/23s which indicates they did not continue to be locked up, they resolved each individual lockup | 11:33 |
pwaller | ah! I misread that. | 11:33 |
apw | those are serious when they say like 23s, 46s, 90s, 200s, ... | 11:33 |
pwaller | gotcha. | 11:33 |
pwaller | Except that it is preventing the machine from being useful | 11:34 |
pwaller | but I get what you mean. | 11:34 |
apw | ok so there are ill effects, which are? | 11:34 |
pwaller | apw: http and ssh stop responding | 11:34 |
pwaller | apw: my interim "fix" was going to be to configure a watchdog which rebooted the system, but maybe that wouldn't work | 11:35 |
xnox | depends on the workload and how full the fs is. Do you rebalance regularly? | 11:35 |
pwaller | apw: I guess the BTRFS FS wasn't accepting writes. | 11:35 |
pwaller | xnox: stupid question from me: do I need to rebalance if it is just on one block device? | 11:35 |
pwaller | xnox: FS is 87% full | 11:35 |
apw | hmmm, i don't think those warnings are necessarily even related. but anyhow, you should certainly file a bug against linux if it's getting hung up | 11:35 |
apw | and we can suggest some debug kernels to try etc and see if we can figure out if it is fixed in later versions | 11:36 |
pwaller | apw: on the kernel tracker? | 11:36 |
apw | on launchpad, run "ubuntu-bug linux" | 11:36 |
pwaller | apw: the link above is on launchpad | 11:36 |
pwaller | Oops, no it isn't! | 11:36 |
pwaller | Oh, yes - the link I sent is here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349711 | 11:37 |
ubot5 | Launchpad bug 1349711 in linux (Ubuntu) "Machine lockup in btrfs-transaction" [Undecided,Incomplete] | 11:37 |
pwaller | apw: so I did already file one there | 11:37 |
apw | ok as btrfs upstream are suggesting v3.15 I would first suggest you test with a v3.15 mainline kernel; | 11:39 |
apw | https://wiki.ubuntu.com/Kernel/MainlineBuilds | 11:39 |
apw | though these are manual installs and do not upgrade automatically; so they are only useful for testing | 11:40 |
pwaller | ah, I missed the /mainline of ~kernel-ppa | 11:40 |
apw | if that works try v3.14, if it does not try v3.16-rc7 | 11:40 |
apw | and report that back to the bug, if that shows it is fixed somewhere already we can start a bisect for the fix | 11:40 |
pwaller | apw: it's a bit difficult to bisect if it isn't triggerable, isn't it? | 11:41 |
apw | very much so, sadly | 11:41 |
apw | but if we have a bracket, we could at least look at the btrfs changes and see if anything "sticks out" | 11:41 |
pwaller | okay | 11:42 |
pwaller | apw: we reached more than 12 days uptime before the last problem I think | 11:43 |
pwaller | so it is going to be difficult to observe | 11:43 |
pwaller | apw: woah. Just looked in the nginx log | 11:46 |
pwaller | apw: which is not on the BTRFS drive - it's full of null characters at the point of the fault | 11:46 |
pwaller | There are ~2280 null characters | 11:47 |
pwaller | apw: and indeed there were no HTTP requests serviced for the duration of the fault (whilst the kernel was reporting 20s soft lockups) | 11:48 |
pwaller | afk for lunch, back in <1h. | 11:49 |
pwaller | back | 12:37 |
pwaller | Does anyone know how I can run apport-collect but check that the output doesn't contain company secrets? | 13:16 |
pwaller | Does anyone know where /dev/watchdog comes from? My reading is that it should be there by standard, but I can't find it | 13:49 |
pwaller | Ah, have to load softdog | 13:51 |
bjf | smb, did you see that bug 727459 seems to have come back for a user? | 14:02 |
ubot5 | bug 727459 in linux (Ubuntu Lucid) "TSC is not reliable under Xen on some Intel CPUs" [Medium,Triaged] https://launchpad.net/bugs/727459 | 14:02 |
smb | bjf, I saw some updates to the bug but have not gotten to that, yet | 14:03 |
smb | bjf, I am very doubtful this is just the same bug, so I asked to open a new report | 14:27 |
bjf | smb, ack | 14:57 |
jsalisbury | ** | 14:59 |
jsalisbury | ** Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting | 15:00 |
jsalisbury | ** | 15:00 |
Joe_CoT | bjf, unless I'm missing something, think your automated script might be faulty https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1349883 It asked me to run apport to collect logs because logs were missing, but the logs apport attached were the same as ubuntu-bug attached in the first place. Is there some log that apport didn't pick up it should've? | 15:20 |
ubot5 | Launchpad bug 1349883 in linux (Ubuntu) "dmesg time wildly incorrect on paravirtual EC2 instances." [Undecided,Confirmed] | 15:20 |
pwaller | How do I specify that a kernel module should be loaded? Is it via /etc/modules? Is there a /etc/modules.d equivalent or do I have to edit that one file? | 15:23 |
smb | Joe_CoT, Maybe the bot acts too quickly. Seems like status changed by apport right after | 15:23 |
pwaller | (I want to make a salt state which causes a module to be updated on boot so I don't particularly want to mess with /etc/modules) | 15:23 |
smb | It's now confirmed so it should be ok. | 15:23 |
=== psivaa is now known as psivaa-bbl | ||
Joe_CoT | smb, I think the issue is probably that ubuntu-bug collects the same logs, but doesn't set the tag apport-collected, which the bot cues off of | 16:36 |
smb | Joe_CoT, That tag seemed to be there by the time I looked. | 16:37 |
Joe_CoT | 10:53 I put the bug in with ubuntu-bug, which had all the logs attached. 11:00 the bot replied, saying the logs were missing, and to run apport-collect. 11:10 I ran apport-collect, which attached the same logs, but added the tag apport-collected | 16:39 |
Joe_CoT | apport-bug was already there, apport-collected was not | 16:40 |
smb | Hm, ok. bjf, so I am not sure why this happened ^. | 16:46 |
smb | Joe_CoT, Anyway I am looking into it | 16:46 |
jsalisbury | ## | 16:55 |
jsalisbury | ## Kernel team meeting in 5 minutes | 16:55 |
jsalisbury | ## | 16:55 |
=== jsalisbury changed the topic of #ubuntu-kernel to: Home: https://wiki.ubuntu.com/Kernel/ || Ubuntu Kernel Team Meeting - Tues August 12th, 2014 - 17:00 UTC || If you have a question just ask, and do wait around for an answer! If the question is should I file a bug for something, likely you can assume yes. | ||
_`_ | wait that was it, jsalisbury ? | 17:12 |
ppisati | rtg: git://kernel.ubuntu.com/ppisati/ubuntu-embedded.git | 17:24 |
ppisati | rtg: i'm still working on it, so it's pretty rough right now | 17:24 |
ppisati | rtg: but it works | 17:24 |
ppisati | rtg: for your mirabox board run as: | 17:24 |
ppisati | rtg: sudo ./make_img.sh -b mirabox -d 14.10 | 17:25 |
ppisati | rtg: dd the .img file that it creates to an sd, pop it in your mirabox and follow the instructions in "mirabox-uboot-env.txt" | 17:26 |
rtg | ppisati, I'll give it a run | 18:03 |
hallyn | 3.13 most certainly did not resolve the btrfs sync performance issues :) | 18:05 |
hallyn | uh, :( | 18:05 |
=== psivaa-bbl is now known as psivaa | ||
cantstanya | lol | 21:34 |
dannf | kamal: i'm seeing a build regression w/ trusty master-next that i've bisected to a 3.13.11.5 patch | 21:56 |
dannf | kamal: is there a 3.13.11.y git tree i could use to demonstrate whether or not it is a problem outside of ubuntu? | 21:56 |
kamal | dannf, http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.13.y-queue | 23:16 |
kamal | dannf, what's the problem patch? | 23:17 |
dannf | kamal: a ptrace patch - see my e-mail to kernel-team@ | 23:17 |
dannf | kamal: not at all an obvious problem (at least to me), i haven't debugged it yet | 23:18 |
dannf | macro fun no doubt | 23:18 |
kamal | dannf, yeah, it's not clear why that patch would cause that build failure | 23:23 |
kamal | dannf, so yeah, if you can (or can't) reproduce the failure with that 3.13.y stable repo, that would be interesting to know | 23:27 |
kamal | dannf, and/or I can try to reproduce it -- what config are you using? -- fwiw, I didn't see any build failure with the few ARM configs I test (tegra, omap2plus, imx_v6_v7). | 23:29 |
dannf | yeah, weird, can't reproduce with pristine | 23:31 |
dannf | but bisecting definitely led to that one, then going back to top of trusty master-next and reverting continued to show that it was the breaker | 23:32 |
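
apw's point about the soft lockup warnings (a report of 22s or 23s means the CPU recovered, while 46s, 90s and longer means it stayed stuck) can be checked mechanically against a saved kernel log. The following is a minimal sketch, not something from the discussion: it assumes the warnings use the `BUG: soft lockup - CPU#N stuck for Ns!` wording, the sample log lines are invented for illustration, and the 25s cutoff is an assumed threshold rather than a kernel constant.

```python
import re

# Assumed message wording, e.g.
# "BUG: soft lockup - CPU#1 stuck for 22s! [btrfs-transacti:1234]"
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")


def lockup_durations(log_text):
    """Return a (cpu, seconds) pair for every soft lockup line in log_text."""
    return [(int(cpu), int(secs)) for cpu, secs in SOFT_LOCKUP.findall(log_text)]


def escalating(durations, first_report_max=25):
    """Keep only reports longer than a first report (assumed cutoff: 25s)."""
    return [(cpu, secs) for cpu, secs in durations if secs > first_report_max]


# Hypothetical sample lines, invented for illustration only.
sample = """\
[172800.1] BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-transacti:1234]
[172830.4] BUG: soft lockup - CPU#1 stuck for 23s! [kworker/1:2:567]
"""

if __name__ == "__main__":
    found = lockup_durations(sample)
    print("all reports:", found)
    print("escalating reports:", escalating(found))
```

An empty "escalating" list would match the reading given in the channel: each lockup cleared before the second check, even if the machine was unresponsive while it lasted.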