=== lifelike_ is now known as lifelike | ||
=== shengyao_afk is now known as shengyao | ||
=== shengyao is now known as shengyao_afk | ||
=== shuduo_afk is now known as shuduo | ||
apb1963 | ubuntu 12.04.3 3.2.0-57-generic-pae #87-Ubuntu SMP Tue Nov 12 21:57:43 UTC 2013 i686 i686 i386 GNU/Linux CRASH. FREEZE....reboot.... FREEZE....reboot... FREEZE....reboot.... FREEZE | 03:58 |
---|---|---|
apb1963 | :( | 04:01 |
apb1963 | I get about 30 minutes of uptime now... then it freezes. | 04:02 |
apb1963 | I don't see anything exciting in the logs.... but then I don't know what to look for. | 04:03 |
apb1963 | started happening yesterday | 04:04 |
=== shuduo is now known as shuduo_afk | ||
apb1963 | or perhaps this morning... something came down in an update, I don't know what. | 04:05 |
apb1963 | AFAIK everything is current. | 04:06 |
=== shuduo_afk is now known as shuduo | ||
=== shuduo is now known as shuduo_afk | ||
=== shuduo_afk is now known as shuduo | ||
apb1963 | And we have another freeze... and reboot. | 05:23 |
apb1963 | And another freeze.... and reboot | 06:56 |
apw | apb1963, if this is sudden, then you want to try an older kernel and see if it does it there | 09:01 |
smb | apw, Or gfx, whatever or whichever he got | 09:03 |
apw | moin | 09:03 |
smb | apw, moin | 09:03 |
cking | yawn | 09:04 |
smb | fwiw I was running that kernel for a bit but now moved on as I got proposed enabled | 09:04 |
apb1963 | apw Would the update manager have downloaded and installed a new kernel? | 09:13 |
apb1963 | The date here....3.2.0-57-generic-pae #87-Ubuntu SMP Tue Nov 12 21:57:43 UTC 2013 i686 i686 i386 isn't that the date the kernel was compiled? | 09:14 |
smb | yes that is the compile date, but also yes update manager also does download and install kernel packages just like any other | 09:15 |
apb1963 | ok, so .... did something force a download within the last 24 hours? Wouldn't I have gotten that kernel 3 weeks ago? | 09:16 |
apw | you can tell what has been downloaded in the last 24 hours in your apt logs | 09:16 |
apb1963 | cool. let me check that. | 09:16 |
smb | Even longer in /var/log/apt/history.log* | 09:18 |
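To make the "what came down in the last 24 hours" check concrete, here is a minimal Python sketch that reads `/var/log/apt/history.log*` (the path smb mentions) and keeps only recent transactions. It assumes the usual apt layout of blank-line-separated blocks with `Start-Date:`, `Commandline:`, `Install:` and `Upgrade:` fields; the function names are hypothetical, not part of any apt tooling.

```python
import glob
import gzip
from datetime import datetime, timedelta


def parse_apt_history(text):
    """Split apt history text into one dict per transaction block."""
    entries = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(": ")
            if sep:
                fields[key] = value.strip()
        # Blocks without a Start-Date are not transactions; skip them.
        if "Start-Date" in fields:
            entries.append(fields)
    return entries


def recent_entries(entries, now, hours=24):
    """Keep transactions whose Start-Date is within the last `hours`."""
    cutoff = now - timedelta(hours=hours)
    recent = []
    for entry in entries:
        # apt writes two spaces between date and time; normalise to one.
        stamp = " ".join(entry["Start-Date"].split())
        if datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S") >= cutoff:
            recent.append(entry)
    return recent


def read_history_logs(pattern="/var/log/apt/history.log*"):
    """Concatenate the current log and its rotated (possibly gzipped) copies."""
    chunks = []
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as handle:
            chunks.append(handle.read())
    return "\n\n".join(chunks)
```

Calling `recent_entries(parse_apt_history(read_history_logs()), datetime.now())` and printing each entry's `Upgrade` and `Install` fields would show whether a kernel package, or only an application such as sflphone, changed shortly before the freezes began.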
apb1963 | yeah that's what I was looking at... not a lot in there. However, notice the time of my last comment before you guys woke up. | 09:20 |
apb1963 | it's been over 2 hours | 09:20 |
apb1963 | The only thing that really happened in that time is I updated sflphone - an application I use. A softphone. I went into that channel several hours ago... and noticed that the developer mentioned to me he thought he had fixed a bug I reported. I told him of this freeze problem - he didn't respond, but another update came down shortly after... I installed that and now it's been 2 hours since the last freeze. | 09:22 |
apb1963 | That's the same app that was tickling a DBUS bug I reported not too long ago. | 09:23 |
apb1963 | It was crashing my system... so he changed his code... to accommodate. | 09:23 |
apb1963 | So many changes going on it's hard to know what's causing what. | 09:24 |
* apw nods | 09:24 | |
apb1963 | And... as best as I can remember... I thought I updated to the -57 version a few weeks ago... but i'm not 100% sure... hell, it could have been -56 | 09:25 |
apb1963 | I can post the apt log if you wouldn't mind having a look. I'm sure you're more familiar with the various things than I am. | 09:26 |
smb | apb1963, Usually you have at least 2 (usually even more if you don't manually delete them) of the previous kernels | 09:26 |
apb1963 | I haven't deleted anything | 09:26 |
apb1963 | although I'm not sure where to look for them | 09:27 |
smb | If you hit left shift during boot (or modify grub to always show up for a few seconds) you can select them | 09:27 |
smb | ls /boot | 09:27 |
apb1963 | Yeah.... I tried hitting both left & right shift... it just ignores it. | 09:28 |
smb | The timing is... trick | 09:28 |
smb | tricky | 09:28 |
apb1963 | I held it down for the entire boot sequence... it ignored me | 09:28 |
smb | I prefer to modify /etc/default/grub to comment out the two *HIDDEN* lines | 09:28 |
smb | set the other timeout to 10 and run "sudo update-grub" | 09:29 |
apb1963 | heh. I've got the last 10 kernels. | 09:29 |
apb1963 | I will do that | 09:29 |
apb1963 | so... this one...#GRUB_HIDDEN_TIMEOUT=0 | 09:30 |
smb | right and there is another one | 09:31 |
smb | #GRUB_HIDDEN_TIMEOUT=0 | 09:31 |
smb | #GRUB_HIDDEN_TIMEOUT_QUIET=true | 09:31 |
smb | GRUB_TIMEOUT=10 | 09:31 |
apb1963 | GRUB_HIDDEN_TIMEOUT_QUIET=true | 09:31 |
apb1963 | so that was already uncommented | 09:31 |
smb | apb1963, They have to be like I posted above | 09:32 |
smb | Both HIDDEN commented out and the GRUB_TIMEOUT set to whatever wait time you want | 09:32 |
smb | If hidden timeout is true, the hidden timeout value is used (which is 0 by default), which gives you no delay and you have to press shift exactly at the right time between BIOS screen after keyboard is active and before grub starts. | 09:34 |
smb | Not really much time | 09:35 |
apb1963 | go it | 09:35 |
apb1963 | got it | 09:36 |
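The change smb describes to `/etc/default/grub` (comment out both `GRUB_HIDDEN_*` lines, set `GRUB_TIMEOUT`) can be sketched as a small text transformation. This is only an illustration under the assumption that the file uses plain `KEY=value` lines; `show_grub_menu` is a hypothetical helper name, and the result still has to be written back and followed by `sudo update-grub`, as in the chat.

```python
def show_grub_menu(config_text, timeout=10):
    """Return config_text with GRUB_HIDDEN_* commented out and GRUB_TIMEOUT set."""
    out = []
    timeout_seen = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("GRUB_HIDDEN_TIMEOUT"):
            # Matches both GRUB_HIDDEN_TIMEOUT and GRUB_HIDDEN_TIMEOUT_QUIET.
            out.append("#" + stripped)
        elif stripped.startswith("GRUB_TIMEOUT="):
            # Replace the active timeout with the requested value.
            out.append("GRUB_TIMEOUT=%d" % timeout)
            timeout_seen = True
        else:
            # Comments and unrelated settings pass through unchanged.
            out.append(line)
    if not timeout_seen:
        out.append("GRUB_TIMEOUT=%d" % timeout)
    return "\n".join(out) + "\n"
```

The function is idempotent: lines that are already commented out start with `#` and are left alone, so running it twice gives the same file.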
apb1963 | Yeah, I've been experiencing all kinds of weird stuff | 09:37 |
apb1963 | My network takes like 2 minutes to come up | 09:37 |
apb1963 | the network manager can't even see my network - not that I use network manager but still | 09:38 |
apb1963 | I have an approximate 7 second delay on SIP calls. Nobody has a clue what that's all about. | 09:38 |
apb1963 | Stuff crashes randomly | 09:39 |
apb1963 | Various applications | 09:39 |
apb1963 | I get loads over 7 for no apparent reason | 09:39 |
apb1963 | Normal operation is like less than .5 | 09:39 |
apb1963 | I've been reporting bugs left & right in various apps. | 09:40 |
apb1963 | I'm frustrated :/ | 09:40 |
apb1963 | well, at least my machine stopped freezing up for now. | 09:40 |
apw | you should try running development where that kind of experience is the norm | 09:41 |
apb1963 | Yeah well...... this is my "production" machine. And I only have the one. | 09:43 |
apb1963 | This machine was supposed to be an asterisk server.... then my only other machine died of hardware failure... and I was forced to install a desktop on this one... | 09:45 |
apb1963 | So I'm waiting for a hand-me-down from my brother for 3 weeks now, and I don't know if he's ever going to send it. | 09:45 |
apb1963 | In the meantime, I'm trying to get by on just this one machine running everything... | 09:46 |
apb1963 | Don't ever be poor if you can avoid it. | 09:47 |
apb1963 | I can't decide if I should start some of the stuff I had running earlier to see if it triggers the freeze | 09:49 |
apb1963 | About 35 chrome windows, libreoffice with a half dozen or so files open, skype, Kontact, firefox.... | 09:49 |
smb | Well doing one at a time and wait some time in between at least would give some hints | 09:50 |
apb1963 | yeah | 09:50 |
apb1963 | I have a strong suspicion it was sflphone | 09:50 |
=== shuduo is now known as shuduo_afk | ||
apb1963 | how an app could cause a kernel freeze though.... I don't know. | 09:51 |
smb | Is it really a kernel freeze? Sometimes X locking up hard just looks the same | 09:51 |
apb1963 | I was afraid it might be bad RAM... is there a way to test RAM while the machine is booted? | 09:51 |
apb1963 | I couldn't switch to a VT | 09:51 |
apb1963 | When X locks up I can generally switch to a VT | 09:52 |
smb | I had those too in the past. if X takes all the keypresses and ignores them. Hard to say then | 09:52 |
apb1963 | I mean my system gets a high load and brings it to a crawl... but eventually I can get a terminal and run top... shows me a ridiculous load. I start killing a few chrome windows and it bounces back. | 09:53 |
apb1963 | I think flash is causing the load | 09:53 |
apb1963 | but this was different | 09:53 |
smb | Only chance would be a second computer on the network and trying to ping or ssh... If you get top running, I think 'c' sorts by cpu usage, so you would see which process is doing that | 09:53 |
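A note on the key binding: in procps `top` the process list is sorted by CPU by default and `P` re-selects that sort, while `c` toggles the full command line. For a non-interactive check, for example over ssh from a second machine, the sketch below ranks processes from `ps -eo pid,pcpu,comm` output. The helper names are hypothetical, and `ps` reports CPU averaged over each process's lifetime, so this is a rough indicator rather than a live reading.

```python
import subprocess


def busiest(ps_output, count=5):
    """Return (cpu_percent, pid, command) tuples, highest CPU first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) == 3:
            pid, cpu, command = parts
            rows.append((float(cpu), int(pid), command))
    rows.sort(reverse=True)
    return rows[:count]


def snapshot():
    """Capture one process listing; assumes a procps-style ps on Linux."""
    return subprocess.check_output(
        ["ps", "-eo", "pid,pcpu,comm"], universal_newlines=True
    )
```

Running `busiest(snapshot())` during one of the high-load episodes would show whether the browser or its Flash plugin is the process at the top.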
apb1963 | Oh and the very first time this happened I got some kernel messages | 09:54 |
=== shuduo_afk is now known as shuduo | ||
apb1963 | yeah | 09:54 |
apb1963 | I got like a "cut here" message | 09:54 |
apb1963 | I couldn't find it in the log | 09:54 |
apb1963 | but it was a definite kernel crash | 09:54 |
apw | photos are your main plan when that happens in case it isn't in the log | 09:54 |
apb1963 | after the crash it just kept freezing | 09:54 |
apb1963 | photos? | 09:55 |
apw | if you can see it, so can your camera | 09:55 |
apb1963 | oh. yeah well... let me use the "poor" key word again | 09:55 |
apb1963 | sucks to be me | 09:55 |
apb1963 | oh well :) | 09:56 |
smb | Pen and paper? Yeah it is tedious. ;) | 09:56 |
apb1963 | well... I just assumed it would be in the log | 09:56 |
apb1963 | so I didn't even consider that | 09:56 |
smb | Yeah, problem with crashes is that the kernel grinds to a halt. So is the part that writes to disk. | 09:57 |
apb1963 | I don't know why, but I figured if it had time to write to the screen it had time to write to disk. What can I say, I wasn't thinking. | 09:58 |
=== shuduo is now known as shuduo_afk | ||
apb1963 | ok i'll startup skype | 09:59 |
apb1963 | hmmm... might have already been running.... came up too fast | 09:59 |
apb1963 | and now we call libreoffice to the stand.... | 09:59 |
smb | apb1963, If you can afford the time it would be good to wait about those 30m it took to freeze between additional applications | 10:01 |
smb | ok skype was running for that I suppose | 10:02 |
apb1963 | well, I can afford the time - but only because it's 2am and I need sleep | 10:03 |
apb1963 | so if it crashes or freezes, so be it. | 10:04 |
apb1963 | But let's say it does... then what do I do? | 10:04 |
smb | If you see crash messages on the screen, write them down. Then probably repeat with just the last app started | 10:09 |
smb | if you find a suspect, you could slowly go back to older kernel versions to see whether they are the same | 10:10 |
smb | If it looks to be the kernel start filing a bug by running "ubuntu-bug linux" from that machine. | 10:12 |
smb | If it keeps crashing even with older kernels and you find it's a certain application started, you probably have to make the bug report manually. Hm, well try to replace linux with the appname. Not sure this works though | 10:14 |
apb1963 | hmm | 10:15 |
apb1963 | there's also the further complication that I'm using 6 different virtual desktops | 10:17 |
apb1963 | so who knows if that's interacting | 10:18 |
apb1963 | earlier I was sorting my apps into VD's... this last time I didn't bother. | 10:18 |
apb1963 | Would be nice if apps returned to their assigned VD's, but that's another story. | 10:18 |
smb | I could not guarantee that but which VD an app is should not matter for crashing. And no, we cannot help you with that placement problem here. ;) | 10:20 |
apb1963 | k | 10:20 |
apb1963 | KDE thing I guess | 10:20 |
apb1963 | well, thank you for the help! I'm gonna hit the hay before my head hits the keyboard :) | 10:22 |
apb1963 | 'night | 10:22 |
smb | That may help a lot. Good night. :) | 10:23 |
=== gfrog is now known as gfrog_afk | ||
=== ghostcube__ is now known as ghostcube | ||
rtg | henrix, hmm, v3.12.3 broke e1000e. have you heard anything about that ? I'll check that it really works on 3.12.2 first | 13:46 |
henrix | rtg: no, i haven't found any e1000e patch but usually network patches aren't tagged for stable | 13:55 |
henrix | rtg: davem queues them and sends them in a batch | 13:55 |
henrix | rtg: let me check his queue... | 13:55 |
henrix | rtg: a quick look here http://patchwork.ozlabs.org/bundle/davem/stable/?state=* doesn't show anything | 13:57 |
henrix | and there's nothing obvious on 3.12.3 that would break this driver | 13:58 |
bjf | rtg, how does the issue manifest itself? does it just not recognize the nic or does it spew chunks? | 13:58 |
rtg | bjf, henrix: actually, this looks like AA denying dhclient. 3.12 kernel in a precise user space | 14:01 |
rtg | I'll try booting with it disabled. | 14:02 |
rtg | that was it. | 14:06 |
rtg | jjohansen1, can you think of a reason that a 3.12 kernel wouldn't work with a precise apparmor ? its blocking dhclient. | 14:07 |
smb | Its a bit weird my VM guest was ok with dhcp and a ... oh 3.11 kernel ... | 14:09 |
rtg | smb, try a mailine 3.12.3 | 14:10 |
rtg | mainline | 14:10 |
apw | that sounds familiar, it blocking dhclient i mean | 14:11 |
smb | rtg, Yeah maybe later. Right now I use that for fiddling with some lts-s dkms | 14:11 |
apw | wasn't that the error we had on the phones when part of apparmor3 was missing ? | 14:11 |
rtg | apw, maybe | 14:11 |
apw | or if userspace was out of sync with kernel, something like that | 14:11 |
apw | smb, test the 3.13 kernel if that works we can hide from it perhaps | 14:12 |
rtg | apw, a newer kernel should always work on an older user space | 14:12 |
smb | apw, can do in a bit. but I suspect I might be too late then | 14:13 |
apw | rtg, oh of course it should, i mean when the aa3 bits went in there was an order that was bad, i think kernel had to hit before userspace | 14:17 |
apw | so we could have lost something in the kernel and broken ourselves with newer userspace | 14:17 |
rtg | apw, that shouldn't be the case for precise, right ? | 14:17 |
apw | oohhhh, this is lts-backport, erm, it just might, as in the new kernel is enforcing something which was specified but broken in older kernels | 14:18 |
apw | so let's think... i think we had to have updated user profiles _first_ then the new kernel bits for aa3, which might explain this | 14:19 |
rtg | apw, well, it's not really the LTS, it's just the trusty kernel (though it shouldn't make a difference) | 14:19 |
apw | but ... we need jjohansen1 rather than my rather unreliable memory | 14:19 |
apw | right, this is the kernel with aa3 on precise ie old profiles, that might be an issue | 14:19 |
apw | we might have to update the profiles, which will work same on older kernels. but jj would be the one to confirm for sure | 14:20 |
rtg | apw, so I'm running a 3.13 kernel on a precise server | 14:20 |
apw | and it works or does not work | 14:20 |
rtg | works fine | 14:21 |
apw | using dhcp or static network config | 14:21 |
apw | i think it was dbus mediation which was the issue | 14:21 |
rtg | apw, uses dhcp. lemme go get the bit of log where AA was denying the socket request. | 14:22 |
apw | ie nothing wrong with networking, just unable to request networking | 14:22 |
apw | something odd like that, so ethernet might work, and wireless not or ... something | 14:22 |
rtg | Dec 5 23:58:24 gbyte kernel: [ 644.311454] type=1400 audit(1386251904.862:93): apparmor="DENIED" operation="connect" parent=2101 profile="/usr/lib/NetworkManager/nm-dhcp-client.action" name="/run/dbus/system_bus_socket" pid=2103 comm="nm-dhcp-client." requested_mask="r" denied_mask="r" fsuid=0 ouid=0 | 14:23 |
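The denial rtg pasted is a sequence of `key=value` pairs, which makes it easy to pull out the confined profile and the object it was refused. A minimal Python sketch, assuming the field layout seen in that line (`parse_apparmor_denial` is a hypothetical helper, not an AppArmor tool):

```python
import re

# A key, then either a double-quoted value or a run of non-space characters.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_apparmor_denial(line):
    """Return the fields of an audit line if apparmor="DENIED", else None."""
    fields = {key: value.strip('"') for key, value in _FIELD.findall(line)}
    if fields.get("apparmor") != "DENIED":
        return None
    return fields
```

Applied to the line above, the `profile`, `operation` and `name` fields show `nm-dhcp-client.action` being refused a `connect` to `/run/dbus/system_bus_socket`, which matches apw's recollection that the D-Bus access is what gets blocked rather than the network driver.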
rtg | apw, yep, the network driver seems fine | 14:23 |
apw | yeah that'd be my memory of the issue, that the dbus bits get mediated and prevented | 14:24 |
apw | because the profiles specify it wrongly, but the older kernels don't implement it either | 14:25 |
rtg | apw, yeah, its also denying cupsd | 14:25 |
apw | so it all works, and then we fix the kernel and it applies them and crunch | 14:25 |
apw | so i think we will need a profile update for precise before this can work, but as i say | 14:25 |
apw | jjohansen1, is the man to confirm my madness | 14:25 |
apw | or we might decide to switch that bit off for the backport only | 14:25 |
rtg | apw, just pushed Ubuntu-3.12.0-6.14 which we should smoke test just to be sure | 14:29 |
apw | ack | 14:29 |
apw | rtg, i take it you didn't tag it | 14:31 |
apw | (as yet) | 14:31 |
rtg | nope, not until I'm done making changes | 14:31 |
rtg | I generally tag things just before uploading | 14:31 |
apw | makes sense, just confirming i'd not gotten an old one | 14:34 |
apw | rtg, i've just responded to your linux-tools thing, i don't think its quite there ... | 14:40 |
rtg | apw, good, I wondered if I missed anything. I had a splitting headache whilst I was doing that | 14:40 |
=== jdstrand_ is now known as jdstrand | ||
apw | rtg, you know, i wonder if we should be offering them the linux-tools-version[-flavour] and also offering them something based on the linux-image- meta package they have installed perhaps | 16:37 |
rtg | apw, well, the meta package naming isn't totally consistent. I think this at least will give folks an idea that they might have to hunt down the _right_ meta package. | 16:38 |
apw | yeah, i think we should get the -flavour bit right, shall i have a poke and see if i can improve | 16:40 |
rtg | apw, have at it. Frankly, my attitude is that if someone is installing tools, then they likely know what they are doing (given a little nudging) | 16:40 |
apw | rtg, it would be nice to not need that to have versions in there, i wonder if we could fix that somehow | 16:52 |
rtg | apw, do you mean you wish perf was not ABI specific ? | 16:52 |
apw | rtg, hey i also wish that. i meant i wish i could incant at like dpkg to get the list of versions and packages | 16:54 |
apw | so it would work on all releases for ever | 16:54 |
rtg | apw, well, we've only got to worry about one more LTS kernel for 12.04 | 16:55 |
rtg | and we can probably predict the meta package name | 16:55 |
apw | yeah | 16:59 |
* rtg -> lunch | 18:39 | |
hallyn | apw: could you kick off a new build of the trusty -unstable kernel in ppa? | 18:52 |
jjohansen1 | apw, rtg: nothing new in 3.12 over 3.11 that I am aware of, however the aa3 bits in both of those do require some policy updates or dhclient etc will break | 19:06 |
rtg | jjohansen1, which is exactly what is happening | 19:06 |
jjohansen1 | yep | 19:07 |
jjohansen1 | sorry just working through the backscroll | 19:07 |
bjf | cking, i see this every once and a while in testing: "Timed out after 30 seconds doing mkdir() - possible eCryptfs hang" but if i just retry the test it doesn't happen again. do you want to see these? | 20:29 |
tyhicks | bjf: looking at the test, that may be caused by a race condition which would explain why you don't always see it | 20:31 |
bjf | tyhicks, do you want to look at the system when it gets into that state? | 20:32 |
tyhicks | bjf: yeah, that would be helpful | 20:32 |
bjf | tyhicks, ok, next time it happens and you're around i'll let you know. do you have your QA lab vpn creds? | 20:33 |
tyhicks | bjf: I think so, but it has been a while so I'm not 100% sure | 20:33 |
bjf | tyhicks, ack | 20:33 |
tyhicks | bjf: yep, I see it in my network manager menus | 20:33 |
tyhicks | and it connects, so I should be ready when it happens again | 20:34 |
tyhicks | bjf: what kernel version is this happening on? do you know what lower filesystem ecryptfs is mounted on when it happens? | 20:35 |
* rtg -> EOD | 21:15 |