=== lifelike_ is now known as lifelike
=== shengyao_afk is now known as shengyao
=== shengyao is now known as shengyao_afk
=== shuduo_afk is now known as shuduo
[03:58] ubuntu 12.04.3 3.2.0-57-generic-pae #87-Ubuntu SMP Tue Nov 12 21:57:43 UTC 2013 i686 i686 i386 GNU/Linux CRASH. FREEZE....reboot.... FREEZE....reboot... FREEZE....reboot.... FREEZE
[04:01] :(
[04:02] I get about 30 minutes of uptime now... then it freezes.
[04:03] I don't see anything exciting in the logs.... but then I don't know what to look for.
[04:04] started happening yesterday
=== shuduo is now known as shuduo_afk
[04:05] or perhaps this morning... something came down in an update, I don't know what.
[04:06] AFAIK everything is current.
=== shuduo_afk is now known as shuduo
=== shuduo is now known as shuduo_afk
=== shuduo_afk is now known as shuduo
[05:23] And we have another freeze... and reboot.
[06:56] And another freeze.... and reboot
[09:01] apb1963, if this is sudden, then you want to try an older kernel and see if it does it there
[09:03] apw, Or gfx, whatever or whichever he got
[09:03] moin
[09:03] apw, moin
[09:04] yawn
[09:04] fwiw I was running that kernel for a bit but now moved on as I got proposed enabled
[09:13] apw Would the update manager have downloaded and installed a new kernel?
[09:14] The date here....3.2.0-57-generic-pae #87-Ubuntu SMP Tue Nov 12 21:57:43 UTC 2013 i686 i686 i386 isn't that the date the kernel was compiled?
[09:15] yes that is the compile date, but also yes update manager also does download and install kernel packages just like any other
[09:16] ok, so .... did something force a download within the last 24 hours? Wouldn't I have gotten that kernel 3 weeks ago?
[09:16] you can tell what has been downloaded in the last 24 hours in your apt logs
[09:16] cool. let me check that.
[09:18] Even longer in /var/log/apt/history.log*
[09:20] yeah that's what I was looking at... not a lot in there.
However, notice the time of my last comment before you guys woke up.
[09:20] it's been over 2 hours
[09:22] The only thing that really happened in that time is I updated sflphone - an application I use. A softphone. I went into that channel several hours ago... and noticed that the developer mentioned to me he thought he had fixed a bug I reported. I told him of this freeze problem - he didn't respond, but another update came down shortly after... I installed that and now it's been 2 hours since the last freeze.
[09:23] That's the same app that was tickling a DBUS bug I reported not too long ago.
[09:23] It was crashing my system... so he changed his code... to accommodate.
[09:24] So many changes going on it's hard to know what's causing what.
[09:24] * apw nods
[09:25] And... as best as I can remember... I thought I updated to the -57 version a few weeks ago... but i'm not 100% sure... hell, it could have been -56
[09:26] I can post the apt log if you wouldn't mind having a look. I'm sure you're more familiar with the various things than I am.
[09:26] apb1963, Usually you have at least 2 (usually even more if you don't manually delete them) of the previous kernels
[09:26] I haven't deleted anything
[09:27] although I'm not sure where to look for them
[09:27] If you hit left shift during boot (or modify grub to always show up for a few seconds) you can select them
[09:27] ls /boot
[09:28] Yeah.... I tried hitting both left & right shift... it just ignores it.
[09:28] The timing is... trick
[09:28] tricky
[09:28] I held it down for the entire boot sequence... it ignored me
[09:28] I prefer to modify /etc/default/grub to comment out the two *HIDDEN* lines
[09:29] set the other timeout to 10 and run "sudo update-grub"
[09:29] heh. I've got the last 10 kernels.
[09:29] I will do that
[09:30] so...
this one...#GRUB_HIDDEN_TIMEOUT=0
[09:31] right and there is another one
[09:31] #GRUB_HIDDEN_TIMEOUT=0
[09:31] #GRUB_HIDDEN_TIMEOUT_QUIET=true
[09:31] GRUB_TIMEOUT=10
[09:31] GRUB_HIDDEN_TIMEOUT_QUIET=true
[09:31] so that was already uncommented
[09:32] apb1963, They have to be like I posted above
[09:32] Both HIDDEN commented out and the GRUB_TIMEOUT set to whatever wait time you want
[09:34] If hidden timeout is true, the hidden timeout value is used (which is 0 by default), which gives you no delay and you have to press shift exactly at the right time between BIOS screen after keyboard is active and before grub starts.
[09:35] Not really much time
[09:35] go it
[09:36] got it
[09:37] Yeah, I've been experiencing all kinds of weird stuff
[09:37] My network takes like 2 minutes to come up
[09:38] the network manager can't even see my network - not that I use network manager but still
[09:38] I have an approximate 7 second delay on SIP calls. Nobody has a clue what that's all about.
[09:39] Stuff crashes randomly
[09:39] Various applications
[09:39] I get loads over 7 for no apparent reason
[09:39] Normal operation is like less than .5
[09:40] I've been reporting bugs left & right in various apps.
[09:40] I'm frustrated :/
[09:40] well, at least my machine stopped freezing up for now.
[09:41] you should try running development where that kind of experience is the norm
[09:43] Yeah well...... this is my "production" machine. And I only have the one.
[09:45] This machine was supposed to be an asterisk server.... then my only other machine died of hardware failure... and I was forced to install a desktop on this one...
[09:45] So I'm waiting for a hand-me-down from my brother for 3 weeks now, and I don't know if he's ever going to send it.
[09:46] In the meantime, I'm trying to get by on just this one machine running everything...
[09:47] Don't ever be poor if you can avoid it.
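Note on the grub change discussed at 09:28-09:34: the edit (comment out both GRUB_HIDDEN_* lines, set GRUB_TIMEOUT to the wait you want) can be sketched as a small Python filter. This is only an illustration under assumptions: the function name `show_grub_menu` is made up for this note, and it transforms the text of /etc/default/grub rather than editing the file in place.

```python
import re


def show_grub_menu(text: str, timeout: int = 10) -> str:
    """Return /etc/default/grub contents with the boot menu made visible.

    Hypothetical helper (not part of grub): comments out active
    GRUB_HIDDEN_TIMEOUT and GRUB_HIDDEN_TIMEOUT_QUIET lines and sets
    GRUB_TIMEOUT to `timeout` seconds.
    """
    out = []
    saw_timeout = False
    for line in text.splitlines():
        stripped = line.strip()
        if re.match(r"GRUB_HIDDEN_TIMEOUT(_QUIET)?=", stripped):
            # Active HIDDEN setting: comment it out.
            out.append("#" + stripped)
        elif re.match(r"GRUB_TIMEOUT=", stripped):
            # Replace whatever timeout was configured.
            out.append(f"GRUB_TIMEOUT={timeout}")
            saw_timeout = True
        else:
            # Already-commented lines and unrelated settings pass through.
            out.append(line)
    if not saw_timeout:
        out.append(f"GRUB_TIMEOUT={timeout}")
    return "\n".join(out) + "\n"
```

To use it, read /etc/default/grub, pass the text through the function, write the result back as root, and then run "sudo update-grub" as mentioned in the channel; the menu only changes after that last step.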
[09:49] I can't decide if I should start some of the stuff I had running earlier to see if it triggers the freeze
[09:49] About 35 chrome windows, libreoffice with a half dozen or so files open, skype, Kontact, firefox....
[09:50] Well doing one at a time and wait some time in between at least would give some hints
[09:50] yeah
[09:50] I have a strong suspicion it was sflphone
=== shuduo is now known as shuduo_afk
[09:51] how an app could cause a kernel freeze though.... I don't know.
[09:51] Is it really a kernel freeze? Sometimes X locking up hard just looks the same
[09:51] I was afraid it might be bad RAM... is there a way to test RAM while the machine is booted?
[09:51] I couldn't switch to a VT
[09:52] When X locks up I can generally switch to a VT
[09:52] I had those too in the past. if X takes all the keypresses and ignores them. Hard to say then
[09:53] I mean my system gets a high load and brings it to a crawl... but eventually I can get a terminal and run top... shows me a ridiculous load. I start killing a few chrome windows and it bounces back.
[09:53] I think flash is causing the load
[09:53] but this was different
[09:53] Only chance would be a second computer on the network and trying to ping or ssh... If you get top running, I think 'c' sorts by cpu usage, so you would see which process is doing that
[09:54] Oh and the very first time this happened I got some kernel messages
=== shuduo_afk is now known as shuduo
[09:54] yeah
[09:54] I got like a "cut here" message
[09:54] I couldn't find it in the log
[09:54] but it was a definite kernel crash
[09:54] photos are your main plan when that happens in case it isn't in the log
[09:54] after the crash it just kept freezing
[09:55] photos?
[09:55] if you can see it, so can your camera
[09:55] oh. yeah well... let me use the "poor" key word again
[09:55] sucks to be me
[09:56] oh well :)
[09:56] Pen and paper? Yeah it is tedious. ;)
[09:56] well...
I just assumed it would be in the log
[09:56] so I didn't even consider that
[09:57] Yeah, problem with crashes is that the kernel grinds to a halt. So is the part that writes to disk.
[09:58] I don't know why, but I figured if it had time to write to the screen it had time to write to disk. What can I say, I wasn't thinking.
=== shuduo is now known as shuduo_afk
[09:59] ok i'll startup skype
[09:59] hmmm... might have already been running.... came up too fast
[09:59] and now we call libreoffice to the stand....
[10:01] apb1963, If you can afford the time it would be good to wait about those 30m it took to freeze between additional applications
[10:02] ok skype was running for that I suppose
[10:03] well, I can afford the time - but only because it's 2am and I need sleep
[10:04] so if it crashes or freezes, so be it.
[10:04] But let's say it does... then what do I do?
[10:09] If you see crash messages on the screen, write them down. Then probably repeat with just the last app started
[10:10] if you find a suspect, you could slowly go back to older kernel versions to see whether they are the same
[10:12] If it looks to be the kernel start filing a bug by running "ubuntu-bug linux" from that machine.
[10:14] If it keeps crashing even with older kernels and you find it's a certain application started, you probably have to make the bug report manually. Hm, well try to replace linux with the appname. Not sure this works though
[10:15] hmm
[10:17] there's also the further complication that I'm using 6 different virtual desktops
[10:18] so who knows if that's interacting
[10:18] earlier I was sorting my apps into VD's... this last time I didn't bother.
[10:18] Would be nice if apps returned to their assigned VD's, but that's another story.
[10:20] I could not guarantee that but which VD an app is should not matter for crashing. And no, we cannot help you with that placement problem here. ;)
[10:20] k
[10:20] KDE thing I guess
[10:22] well, thank you for the help!
I'm gonna hit the hay before my head hits the keyboard :)
[10:22] 'night
[10:23] That may help a lot. Good night. :)
=== gfrog is now known as gfrog_afk
=== ghostcube__ is now known as ghostcube
[13:46] henrix, hmm, v3.12.3 broke e1000e. have you heard anything about that ? I'll check that it really works on 3.12.2 first
[13:55] rtg: no, i haven't found any e1000e patch but usually network patches aren't tagged for stable
[13:55] rtg: davem queues them and sends them in a batch
[13:55] rtg: let me check his queue...
[13:57] rtg: a quick look here http://patchwork.ozlabs.org/bundle/davem/stable/?state=* doesn't show anything
[13:58] and there's nothing obvious on 3.12.3 that would break this driver
[13:58] rtg, how does the issue manifest itself? does it just not recognize the nic or does it spew chunks?
[14:01] bjf, henrix: actually, this looks like AA denying dhclient. 3.12 kernel in a precise user space
[14:02] I'll try booting with it disabled.
[14:06] that was it.
[14:07] jjohansen1, can you think of a reason that a 3.12 kernel wouldn't work with a precise apparmor ? it's blocking dhclient.
[14:09] It's a bit weird my VM guest was ok with dhcp and a ... oh 3.11 kernel ...
[14:10] smb, try a mailine 3.12.3
[14:10] mainline
[14:11] that sounds familiar, it blocking dhclient i mean
[14:11] rtg, Yeah maybe later. Right now I use that for fiddling with some lts-s dkms
[14:11] wasn't that the error we had on the phones when part of apparmor3 was missing ?
[14:11] apw, maybe
[14:11] or if userspace was out of sync with kernel, something like that
[14:12] smb, test the 3.13 kernel if that works we can hide from it perhaps
[14:12] apw, a newer kernel should always work on an older user space
[14:13] apw, can do in a bit.
but I suspect I might be too late then
[14:17] rtg, oh of course it should, i mean when the aa3 bits went in there was an order that was bad, i think kernel had to hit before userspace
[14:17] so we could have lost something in the kernel and break ourselves with newer userspace
[14:17] apw, that shouldn't be the case for precise, right ?
[14:18] oohhhh, this is lts-backport, erm, it just might, as in the new kernel is enforcing something which was specified but broken in older kernels
[14:19] so let's think... i think we had to have updated user profiles _first_ then the new kernel bits for aa3, which might explain this
[14:19] apw, well, it's not really the LTS, it's just the trusty kernel (though it shouldn't make a difference)
[14:19] but ... we need jjohansen1 rather than my rather unreliable memory
[14:19] right, this is the kernel with aa3 on precise ie old profiles, that might be an issue
[14:20] we might have to update the profiles, which will work same on older kernels. but jj would be the one to confirm for sure
[14:20] apw, so I'm running a 3.13 kernel on a precise server
[14:20] and it works or does not work
[14:21] works fine
[14:21] using dhcp or static network config
[14:21] i think it was dbus mediation which was the issue
[14:22] apw, uses dhcp. lemme go get the bit of log where AA was denying the socket request.
[14:22] ie nothing wrong with networking, just unable to request networking
[14:22] something odd like that, so ethernet might work, and wireless not or ... something
[14:23] Dec 5 23:58:24 gbyte kernel: [ 644.311454] type=1400 audit(1386251904.862:93): apparmor="DENIED" operation="connect" parent=2101 profile="/usr/lib/NetworkManager/nm-dhcp-client.action" name="/run/dbus/system_bus_socket" pid=2103 comm="nm-dhcp-client."
requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[14:23] De
[14:23] apw, yep, the network driver seems fine
[14:24] yeah that'd be my memory of the issue, that the dbus bits get mediated and prevented
[14:25] because the profiles specify it wrongly, but the older kernels don't implement it either
[14:25] apw, yeah, it's also denying cupsd
[14:25] so it all works, and then we fix the kernel and it applies them and crunch
[14:25] so i think we will need a profile update for precise before this can work, but as i say
[14:25] jjohansen1, is the man to confirm my madness
[14:25] or we might decide to switch that bit off for the backport only
[14:29] apw, just pushed Ubuntu-3.12.0-6.14 which we should smoke test just to be sure
[14:29] ack
[14:31] rtg, i take it you didn't tag it
[14:31] (as yet)
[14:31] nope, not until I'm done making changes
[14:31] I generally tag things just before uploading
[14:34] makes sense, just confirming i'd not gotten an old one
[14:40] rtg, i've just responded to your linux-tools thing, i don't think it's quite there ...
[14:40] apw, good, I wondered if I missed anything. I had a splitting headache whilst I was doing that
=== jdstrand_ is now known as jdstrand
[16:37] rtg, you know, i wonder if we should be offering them the linux-tools-version[-flavour] and also offering them something based on the linux-image- meta package they have installed perhaps
[16:38] apw, well, the meta package naming isn't totally consistent. I think this at least will give folks an idea that they might have to hunt down the _right_ meta package.
[16:40] yeah, i think we should get the -flavour bit right, shall i have a poke and see if i can improve
[16:40] apw, have at it.
Frankly, my attitude is that if someone is installing tools, then they likely know what they are doing (given a little nudging)
[16:52] rtg, it would be nice to not need that to have versions in there, i wonder if we could fix that somehow
[16:52] apw, do you mean you wish perf was not ABI specific ?
[16:54] rtg, hey i also wish that. i meant i wish i could incant at like dpkg to get the list of versions and packages
[16:54] so it would work on all releases for ever
[16:55] apw, well, we've only got to worry about one more LTS kernel for 12.04
[16:55] and we can probably predict the meta package name
[16:59] yeah
[18:39] * rtg -> lunch
[18:52] apw: could you kick off a new build of the trusty -unstable kernel in ppa?
[19:06] apw, rtg: nothing new in 3.12 over 3.11 that I am aware of, however the aa3 bits in both of those do require some policy updates or dhclient etc will break
[19:06] jjohansen1, which is exactly what is happening
[19:07] yep
[19:07] sorry just working through the backscroll
[20:29] cking, i see this every once in a while in testing: "Timed out after 30 seconds doing mkdir() - possible eCryptfs hang" but if i just retry the test it doesn't happen again. do you want to see these?
[20:31] bjf: looking at the test, that may be caused by a race condition which would explain why you don't always see it
[20:32] tyhicks, do you want to look at the system when it gets into that state?
[20:32] bjf: yeah, that would be helpful
[20:33] tyhicks, ok, next time it happens and you're around i'll let you know. do you have your QA lab vpn creds?
[20:33] bjf: I think so, but it has been a while so I'm not 100% sure
[20:33] tyhicks, ack
[20:33] bjf: yep, I see it in my network manager menus
[20:34] and it connects, so I should be ready when it happens again
[20:35] bjf: what kernel version is this happening on? do you know what lower filesystem ecryptfs is mounted on when it happens?
[21:15] * rtg -> EOD
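Note on the AppArmor denial pasted at 14:23: the kernel audit message is a run of key=value fields, so the profile being enforced and the socket it was refused can be pulled out mechanically when scanning a log for similar denials (cupsd was mentioned as another one). A minimal sketch in Python; the function name `parse_audit_fields` is made up for this note and it only handles the simple key=value and key="value" forms seen in that message.

```python
import re


def parse_audit_fields(line: str) -> dict:
    """Extract key=value and key="value" pairs from an audit log line.

    Hypothetical helper for illustration; values are returned as strings.
    """
    fields = {}
    for key, quoted, bare in re.findall(r'(\w+)=(?:"([^"]*)"|(\S+))', line):
        # Exactly one of `quoted` / `bare` is filled in for each match.
        fields[key] = quoted if quoted else bare
    return fields


# The denial quoted in the channel, joined back into one line.
denial = (
    'Dec 5 23:58:24 gbyte kernel: [ 644.311454] type=1400 '
    'audit(1386251904.862:93): apparmor="DENIED" operation="connect" '
    'parent=2101 profile="/usr/lib/NetworkManager/nm-dhcp-client.action" '
    'name="/run/dbus/system_bus_socket" pid=2103 comm="nm-dhcp-client." '
    'requested_mask="r" denied_mask="r" fsuid=0 ouid=0'
)
fields = parse_audit_fields(denial)
print(fields["profile"], "->", fields["operation"], fields["name"])
```

Read this way, the message shows the nm-dhcp-client.action profile being denied a connect to the system D-Bus socket, which fits the D-Bus mediation explanation given in the channel rather than a network driver fault.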