/srv/irclogs.ubuntu.com/2012/10/07/#ubuntu-bugs.txt

=== Ursinha is now known as Ursinha-afk
=== Ursinha-afk is now known as Ursinha
=== Ursinha is now known as Ursinha-afk
[09:58] <luc4_mac> Hi! Anyone here who knows what hard lockup on cpu from watchdog means?
=== mitya57_ is now known as mitya57
=== luc4_mac_ is now known as luc4_mac
=== luc4_mac_ is now known as luc4_mac
=== luc4_mac_ is now known as luc4_mac
[12:23] <luc4_mac> Hi! Anyone here who knows what hard lockup on cpu from watchdog means?
[12:28] <penguin42> luc4_mac: vaguely
[12:28] <penguin42> luc4_mac: There is a 'watchdog' timer that goes off regularly, to detect when something has stopped responding (i.e. locked up)
[12:29] <penguin42> luc4_mac: What's the message you got and is it in a vm ?
[12:30] <luc4_mac> penguin42: hi, I suppose you don't remember me. You helped me with a network issue/bug. I'm still experiencing that and I notice in my dmesg that message "watchdog detected hard lockup in cpu 0". I was wondering if that could make my network go down.
[12:31] <penguin42> luc4_mac: The watchdog message is more of a symptom rather than a cause - it says something bad is happening, but not why
[12:31] <luc4_mac> penguin42: it is my understanding it might reboot some kind of processes when the CPU is overloaded.
[12:32] <penguin42> luc4_mac: It's not as simple as overloaded, if there is a lot of stuff running and the CPU is busy you still shouldn't get that
[12:32] <luc4_mac> penguin42: for some reason it seems that my old old system is not using DMA (don't know why either) and results overloaded for long periods.
[12:32] <penguin42> luc4_mac: It only happens if the kernel effectively doesn't get a chance to run for a while and that should never happen
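penguin42's description matches the general idea behind the Linux hard-lockup detector. A periodic timer interrupt increments a per-CPU counter, and a separate check driven by a non-maskable interrupt reports a hard lockup if that counter has not moved since its previous run. A CPU that is merely busy still services the timer, which is why load alone should not trigger the warning. The Python toy below is a minimal sketch of that idea only. The class and method names are hypothetical, and it is not kernel code.

```python
class HardLockupDetector:
    """Toy model (hypothetical names): a periodic timer bumps a counter, and a
    less frequent, higher-priority check complains if the counter is unchanged."""

    def __init__(self):
        self.timer_interrupts = 0   # incremented on every periodic timer tick
        self.saved_interrupts = 0   # snapshot taken by the previous check

    def timer_tick(self):
        # Runs only if the CPU still accepts ordinary interrupts.
        self.timer_interrupts += 1

    def check(self):
        """Return True if a hard lockup would be reported."""
        stuck = self.timer_interrupts == self.saved_interrupts
        self.saved_interrupts = self.timer_interrupts
        return stuck


detector = HardLockupDetector()
for _ in range(5):          # CPU is busy, but the timer keeps firing
    detector.timer_tick()
print(detector.check())     # False
# Simulate a CPU stuck with interrupts off: no timer_tick() calls happen.
print(detector.check())     # True
```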
[12:33] <penguin42> luc4_mac: Post a full dmesg to pastebin?
[12:34] <luc4_mac> penguin42: in that case… I was wondering if my network issue could be related to that and in that case if I should add the information to the bugreport. I rebooted, I'll have to search that if it is still in my logs. I noticed anyway it is very frequent.
[12:34] <penguin42> It really shouldn't happen!
[12:34] <penguin42> luc4_mac: When you say it's not using DMA - on hard drive?
[12:35] <luc4_mac> penguin42: it shouldn't happen that DMA is not used as well, but it seems there are many things not working properly here… yes, the hard drive seems not to be using DMA.
[12:35] <penguin42> what's telling you that?
[12:35] <luc4_mac> penguin42: when accessing the hard drive CPU is in IO wait almost 100%.
[12:36] <luc4_mac> also haparm seems to report that.
[12:36] <luc4_mac> hdparm sorry.
[12:36] <penguin42> luc4_mac: OK, get a full dmesg in a pastebin
[12:36] <penguin42> luc4_mac: the watchdog stuff can happen if the kernel is stuck in a driver for a long time, so if something is going badly wrong with some driver it's less surprising if you're getting a watchdog
=== luc4_mac_ is now known as luc4_mac
[12:44] <luc4_mac> penguin42: Ok, I found one of those warnings: http://pastebin.com/zciQkwya.
[12:45] <penguin42> luc4_mac: I need the full dmesg
[12:46] <luc4_mac> penguin42: I found that in a kern.log file. It was starting with that. Maybe it is better if I wait for it to happen again?
[12:46] <penguin42> luc4_mac: Look, I need the full dmesg to be helpful
[12:48] <luc4_mac> penguin42: do you mean from the boot of the system?
[12:48] <penguin42> luc4_mac: Just run dmesg and put the full output in a pastebin
[12:48] * penguin42 wants to see the stuff where it's detecting and doing stuff with the hardware and disks in particular
[12:49] <luc4_mac> penguin42: yes, I can do that, but it won't include the warning message because the system has not logged it yet.
[12:49] <penguin42> that's ok
[12:52] <luc4_mac> penguin42: entire dmesg at the moment: http://pastebin.com/0AaHcwdd.
[12:56] <penguin42> luc4_mac: OK, so my reading on there is that everything is in DMA at that point
[12:57] <luc4_mac> penguin42: oh… then there should be something else explaining the IO wait...
[12:57] <penguin42> luc4_mac: See next to each of the devices it shows UDMA or MDMA
[13:00] <luc4_mac> penguin42: this is the bug that is still affecting me anyway: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/997767.
[13:00] <ubot2> Ubuntu bug 997767 in linux "10ec:8139 Network connection rtl8139 lost after some hours of inactivity and comes up again on user interaction" [Medium,Confirmed]
[13:00] <luc4_mac> penguin42: I installed Ubuntu again, fresh system. After a month or so, the same is happening again.
[13:01] <penguin42> luc4_mac: Watch that dmesg for anything else; my guess is that after a while you'll get some errors, I'm guessing as a result of a hard drive problem and it'll reset the bus and drop out of dma
[13:02] <penguin42> luc4_mac: The important thing is to find the _first_ bad thing that happens in dmesg
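Finding that first bad line can be partly automated. The sketch below assumes the kernel's usual `[ seconds.micros] message` stamp, which appears in dmesg output and, after a syslog prefix, in kern.log. It walks the lines in order and returns the first one containing a word from a keyword list. That keyword list is a hypothetical starting point, not an authoritative set. The sample lines are taken from later in this conversation.

```python
import re

# Hypothetical keyword list: a starting point only, tune it for your hardware.
SUSPICIOUS = ("error", "lockup", "throttling", "reset", "timeout", "failed", "call trace")

# dmesg prints "[ seconds.micros] message"; kern.log puts a syslog prefix before
# it, so search for the stamp instead of anchoring at the start of the line.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)$")


def first_bad_line(lines):
    """Return (timestamp, message) for the earliest suspicious kernel line, or None."""
    for line in lines:  # kernel log lines are already in chronological order
        m = STAMP.search(line)
        if m is None:
            continue
        message = m.group(2)
        if any(word in message.lower() for word in SUSPICIOUS):
            return float(m.group(1)), message
    return None


sample = [
    "[   86.952063] usb 4-2: USB disconnect, device number 2",
    "[ 3266.264700] sched: RT throttling activated",
]
print(first_bad_line(sample))  # (3266.2647, 'sched: RT throttling activated')
```

On a live system, `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()` could supply the lines (Python 3.7 or later). Some kernels restrict dmesg to privileged users.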
[13:03] <luc4_mac> penguin42: Still anyway hdparm is reporting HDIO_GET_DMA failed: Inappropriate ioctl for device. Is this supposed to happen?
[13:04] <penguin42> no, what exactly is the hdparm command you're giving?
[13:04] <luc4_mac> penguin42: found in the Ubuntu documentation: hdparm -d /dev/sdb2.
[13:06] <penguin42> luc4_mac: That happens for me as well, it wouldn't surprise me if that's no longer supported now that stuff is going via the /dev/sd stuff
[13:06] <luc4_mac> penguin42: ah ok, no problems then.
[13:06] <penguin42> luc4_mac: what about hdparm -I /dev/sdb  ?
[13:07] <penguin42> luc4_mac: Mine has something like   DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6     and I think the * indicates the one in use
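If penguin42's reading is right and the starred entry is the mode in use, the DMA line from `hdparm -I` can be split mechanically. The helper below is hypothetical, a sketch that assumes only the `DMA: mode mode *mode` layout quoted in this log.

```python
def parse_dma_line(line):
    """Split an `hdparm -I` DMA line into (supported_modes, active_mode).

    Assumes the layout quoted in this log, e.g.
    "DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5",
    and treats the starred mode as the one in use (None if nothing is starred).
    """
    label, _, rest = line.partition(":")
    if label.strip() != "DMA":
        raise ValueError(f"not a DMA line: {line!r}")
    modes, active = [], None
    for token in rest.split():
        token = token.rstrip(".")  # tolerate trailing punctuation from a chat paste
        if token.startswith("*"):
            token = token.lstrip("*")
            active = token
        modes.append(token)
    return modes, active


print(parse_dma_line("DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6")[1])  # udma6
```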
[13:08] <luc4_mac> penguin42: can I ask you if it is possible at all that some energy saving is still causing the network shutdown?
[13:09] <penguin42> luc4_mac: Yeh that's a reasonable cause
[13:09] <luc4_mac> penguin42: so, is there a way for me to be certain that no energy saving is applied?
[13:10] <penguin42> luc4_mac: I'm not too sure about the energy saving stuff - there are loads of different things that do it
[13:10] <luc4_mac> penguin42: anyway, months ago you suggested to iterate an ifconfig and check what happens in case of network shutdown. What resulted is that ifconfig is iterated and shows many dropped packets. This doesn't seem energy saving to me...
[13:10] <penguin42> luc4_mac: I know things like 'powertop' help you find out what you can turn on to save energy, perhaps look at the docs for it to see what you can turn off
[13:11] <luc4_mac> penguin42: installing Ubuntu server might be a solution maybe...
[13:11] <penguin42> maybe, maybe not
[13:15] <luc4_mac> penguin42: do you think I'm heading the right way investigating this watchdog warning to solve my network issue? Or do you think that is unrelated?
[13:16] <penguin42> luc4_mac: the watchdog warning is a bit odd, it's possible that it's related, but the backtrace looked more disk related
[13:17] <penguin42> luc4_mac: The important thing is to see whether the watchdog is the 1st bad thing in the logs or whether there is something else first
[13:20] <luc4_mac> penguin42: this system is really really weird… What I just noticed is this: if I transfer via ethernet a large file using samba I get less than 500Kb/s and IO wait over 90%. If I transfer it via ssh, I get more than 10Mbit/s and almost no IO wait...
[13:21] <penguin42> is it a large file full of zeros ?
[13:21] <penguin42> I think ssh compresses by default
[13:22] <luc4_mac> penguin42: no, avi file.
[13:23] <penguin42> hmm ok so that should already be heavily compressed
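The question about zeros matters because compression pays off only on redundant data. OpenSSH leaves compression off unless asked: the `Compression` option in ssh_config defaults to `no`, and `ssh -C` or `scp -C` turns it on. Unless one of those was used, compression would not have been a factor in this transfer. The sketch below uses Python's zlib, the same DEFLATE family that OpenSSH uses when compression is enabled. Random bytes stand in for already-compressed video, which is an assumption.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size; lower means more compressible."""
    return len(zlib.compress(data)) / len(data)


zeros = bytes(1 << 20)                    # 1 MiB of zero bytes
already_compressed = os.urandom(1 << 20)  # assumed stand-in for an .avi payload

print(f"zeros:  {compressed_ratio(zeros):.4f}")
print(f"random: {compressed_ratio(already_compressed):.4f}")
```

The zero-filled buffer shrinks to a tiny fraction of its size. The random buffer comes out marginally larger than the input, which is why compressing an .avi over ssh would gain nothing.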
[13:25] <luc4_mac> penguin42: ah ah, I got it… different partition :-) if I scp from one partition I get that strange behavior!
[13:26] <penguin42> luc4_mac: And that's on one of your disks and the other partition is on a different one?
[13:26] <luc4_mac> penguin42: yes, two different disks I think.
[13:27] <luc4_mac> penguin42: yes, sda* is ok, sdb* is not.
[13:28] <penguin42> luc4_mac: Ok, now do the hdparm -I /dev/sdb   -what does the DMA line show?
[13:28] <luc4_mac> penguin42: so, when transferring from /dev/sda* transfer is fast. From /dev/sdb* I get a system completely overloaded.
[13:29] <luc4_mac> penguin42: the interesting line is this I think: DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5.
[13:29] <penguin42> hmm well that's still happy
[13:30] <penguin42> luc4_mac: Anything new in the dmesg output yet?
[13:30] <luc4_mac> penguin42: the system is so overloaded that even the mouse cursor is not moving.
[13:31] <luc4_mac> penguin42: I'm transferring now overloading the system but dmesg seems to output nothing new.
[13:31] <penguin42> luc4_mac: OK, that shouldn't happen
[13:32] <luc4_mac> penguin42: this also explains why usb transfer was almost stuck. Transferring from that disk is an issue.
[13:33] <penguin42> luc4_mac: Sure there are no new dmesg entries?
[13:33] <luc4_mac> penguin42: last line is: [   86.952063] usb 4-2: USB disconnect, device number 2.
[13:33] <luc4_mac> penguin42: the same as before.
[13:34] <penguin42> luc4_mac: Hmm ok, it's odd; it's possible that the driver/controller really doesn't like slave drives - if there was an actual faulty cable or disk I'd expect to see some retries/errors in the logs
[13:35] <luc4_mac> penguin42: maybe I could plug that differently to the mb…
[13:36] <penguin42> luc4_mac: I'd check the master/slave/cable select jumpers on it and the master on that cable, but also if you could try swapping it to your other ide chain (as the only drive) and seeing if it still gets naff performance - that would isolate whether it's the drive or the channel
[13:37] <luc4_mac> penguin42: doing it now :-(
[13:37] <penguin42> luc4_mac: You might also try running smartctl -a on the drive to see if it's reporting problems, but again if it's actually faulty I'd expect some dmesg content by now
[13:37] <luc4_mac> dmesg output: [ 3266.264700] sched: RT throttling activated
[13:38] <penguin42> luc4_mac: https://lkml.org/lkml/2012/1/13/60   I think the slow disk, that RT throttling and the watchdog are probably related
[13:39] <penguin42> luc4_mac: It's either some faulty hardware or a dodgy via pata driver
[13:40] <luc4_mac> penguin42: the disk might actually be faulty yes. I might be 10 years old.
[13:41] <penguin42> luc4_mac: smartctl -a should tell you if the drive is actually faulty
[13:41] <luc4_mac> ah ah, I meant "it might be 10 years old".
[13:41] <penguin42> luc4_mac: And similarly if you move the drive to be the master alone on your 2nd channel it should help; if the problem goes away then it's unlikely to be the drive
[13:42] <luc4_mac> penguin42: I don't see any information reporting faulty hardware...
[13:42] <penguin42> luc4_mac: Can you pastebin the output of smartctl -a /dev/sdb   ?
[13:44] <luc4_mac> penguin42: yes, here: http://pastebin.com/Rh5k0MEF
[13:45] <penguin42> odd, it says smart support is available but disabled - never seen that before
[13:45] <luc4_mac> penguin42: yes, I see that… but to be sincere I don't know what smart is.
[13:46] <penguin42> luc4_mac: It's a bunch of testing systems internal to the hard drive to detect when they're going wrong
[13:46] <luc4_mac> penguin42: ok, now I know :-) so I should enable it to test.
[13:46] <penguin42> luc4_mac: you could try   smartctl --smart=on /dev/sdb   and then   smartctl -a /dev/sdb
[13:47] <penguin42> luc4_mac: I would, and then there are really 3 types of things; 1) some stats  2) Logs of errors 3) Some full tests you can trigger
[13:48] <penguin42> luc4_mac: Like here's my disk   http://paste.ubuntu.com/1265675/
[13:48] <luc4_mac> penguin42: oooohh… I never stop learning… :-) it is better if I pastebin this :-)
[13:50] <penguin42> luc4_mac: In that all the stats are good, there is 'No errors logged' in the error log, and I've not run any of the actual tests
[13:52] <luc4_mac> penguin42: http://pastebin.com/5pDQsxfi
[13:52] <luc4_mac> penguin42: it seems like we found the issue.
[13:57] <penguin42> luc4_mac: Yeh that error log looks bad, and the pending sectors is a little high; although the reallocated sector is only 1 - sounds like you have a few bad sectors, although I'm surprised it isn't triggering more errors in dmesg - if it actually fails to read the sector it should get an error in dmesg, it might be taking a few goes to get it
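The reallocated and pending sector counts penguin42 refers to appear as rows in the attribute table printed by `smartctl -a`. The sketch below pulls their raw values out of that text. It assumes the usual ten-column table: ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE. The sample rows are illustrative only. The reallocated count of 1 echoes the discussion, but the pending-sector count is made up because the pasted output is not reproduced here.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def smart_raw_values(smartctl_output, names=WATCHED):
    """Map selected SMART attribute names to their RAW_VALUE column."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            raw = fields[9]  # RAW_VALUE; some drives append extra text after it
            values[fields[1]] = int(raw) if raw.isdigit() else raw
    return values


# Illustrative rows only, not the output from this conversation.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""
print(smart_raw_values(sample))  # {'Reallocated_Sector_Ct': 1, 'Current_Pending_Sector': 8}
```

These counts cover the "stats" category penguin42 lists. `smartctl -l error /dev/sdb` shows the error log, and `smartctl -t short /dev/sdb` starts one of the built-in self-tests.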
[13:58] <penguin42> luc4_mac: Looks like the drive is a bit hot as well
[13:59] <luc4_mac> penguin42: shouldn't the bad sectors be ignored and left unused?
[14:00] <penguin42> luc4_mac: Not if you're trying to read data off them
[14:01] <penguin42> luc4_mac: Different drives behave differently; some will give up after a few retries and error it back to the OS (and you'll see it in the logs) some will keep going and just take a heck of a long time to do anything - although it still surprises me that the 1st thing you see is a watchdog/RT error
[14:05] <luc4_mac> penguin42: unfortunately I think this is not related to the network issue right?
[14:06] <penguin42> luc4_mac: It's unlikely
[14:11] <luc4_mac> penguin42: any suggestion how I can guess what is wrong with that?
[14:12] <penguin42> luc4_mac: Not really, you need to find something in some diagnostics which changes between it working and failing
[14:16] <luc4_mac> penguin42: ok, thanks for your help! ;-)
[14:16] <penguin42> np
[14:16] <luc4_mac> penguin42: it is always interesting to discuss with you!
=== luc4_mac_ is now known as luc4_mac
[14:27] <AssociateX> When I click on Dash home my xserver crashes and returns me to the login screen. I've searched but only found one other person with this problem and no solution.
[14:28] <AssociateX> What should I look at?
[14:31] <penguin42> What version of Ubuntu and what graphics card?
[14:32] <AssociateX> hold on. Nvidia 5200
[14:32] <AssociateX> I will need to check, brb
[14:34] <AssociateX> how can I tell which version I have?
[14:36] <penguin42> AssociateX: If you click on the cog at the top right and do about this computer, if it doesn't crash then it should show you the number
[14:38] <AssociateX> 12.04 lts
[14:40] <AssociateX> I'm using the nvidia-173 driver because anything newer will not let flash play with my video card.
[14:43] <AssociateX> geforce fx5200 is the card
[14:44] <penguin42> ok, I don't know the Nvidia stuff, you might want to try #ubuntu-x or #ubuntu
[14:45] <AssociateX> ok, thank you very much for your time though.
[14:46] <AssociateX> how about this, what file should I look at for the error, or what tool would I use? It's been a long time since I've had to use cli.
[14:47] <penguin42> AssociateX: If the X server is crashing then I'd expect to see a backtrace in /var/log/Xorg.0.old or /var/log/Xorg.0
[14:47] <AssociateX> OK, I'm going to go look there. Thank you.
[14:47] <penguin42> AssociateX: Depending what stuff you do with your machine I'd try dropping back to Unity-2d or try the open source Nvidia driver
[14:47] <AssociateX> Still crashes there.
[14:48] <AssociateX> and just on the Dash home button
[14:48] <AssociateX> nothing else
[14:50] <AssociateX> /var/log/Xorg.0.log|less llok clean
[14:50] <AssociateX> looks*
[14:51] <penguin42> try the .old variant
[14:51] <AssociateX> ok
[14:52] <AssociateX> nothing in /var/log/ for X
[14:53] <penguin42> ok, when you say X crashes, what do you actually see?
[14:53] <AssociateX> the screen blinks/flashes, goes black, and then the login screen shows up.
[14:54] <AssociateX> just like you would expect when logging out.
[14:54] <penguin42> sure sounds like an X crash
[14:54] <AssociateX> yes
[14:56] <AssociateX> nothing in /var/log/ for X  <---oops, I wasn't looking correctly. I have some files to look at. brb
[14:58] <AssociateX> Caught signal 11 (Segmentation fault). Server aborting
[14:58] <penguin42> there you go
[14:58] <AssociateX> yeah, I wonder what's causing it.
[14:58] <penguin42> almost certainly a bug in the Nvidia driver
[14:59] <penguin42> it should show you a backtrace
[14:59] <AssociateX> /usr/bin/X (xorg_backtrace+0x37) [0x80a6707]
[15:00] <AssociateX> that's the only thing that shows backtrace in it, I wouldn't know what to do with that though.
[15:00] <AssociateX> I should get lynx up and do a paste bin
[15:00] <penguin42> right but there will be some similar lines below it with different names and numbers - that set of lines is the 'back trace' - put them in a pastebin
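The backtrace penguin42 describes is the run of numbered frame lines that follows the `Backtrace:` marker in the Xorg log. The sketch below pulls those frames out. It assumes the numbered `[ timestamp] N: frame` layout used by X servers of this era, which is an assumption, since the quote above shows only the frame text. The optional `(EE)` tag is also an assumption, added to cover other X server versions. In the sample, only frame 0 comes from this conversation; frame 1 and the surrounding lines are made up for illustration.

```python
import re

# Matches numbered frames such as
# "[  8595.308] 0: /usr/bin/X (xorg_backtrace+0x37) [0x80a6707]".
# The optional "(EE)" tag is an assumption to cover other X server versions.
FRAME = re.compile(r"^\[\s*[\d.]+\]\s*(?:\(EE\)\s*)?(\d+): (.*)$")


def extract_backtrace(log_text):
    """Return [(frame_number, frame_text), ...] after the last 'Backtrace:' marker."""
    lines = log_text.splitlines()
    start = max((i for i, line in enumerate(lines) if "Backtrace:" in line), default=None)
    if start is None:
        return []
    frames = []
    for line in lines[start + 1:]:
        m = FRAME.match(line)
        if m is None:  # the first non-frame line ends the backtrace
            break
        frames.append((int(m.group(1)), m.group(2)))
    return frames


# Illustrative log: frame 1 and the surrounding lines are hypothetical.
sample = """[  8595.308]
Backtrace:
[  8595.308] 0: /usr/bin/X (xorg_backtrace+0x37) [0x80a6707]
[  8595.308] 1: /usr/bin/X (0x8048000+0x5e89a) [0x80a689a]
[  8595.309] Segmentation fault at address 0x0
[  8595.309]
Fatal server error:
[  8595.309] Caught signal 11 (Segmentation fault). Server aborting
"""
for number, text in extract_backtrace(sample):
    print(number, text)
```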
[15:01] <AssociateX> ok, brb
[15:01] <penguin42> AssociateX: Still, you've only got a few options; 1) Use something else that doesn't trigger the crash other than the dash, 2) switch driver
[15:16] <AssociateX> http://paste.ubuntu.com/1265827/
[15:16] <AssociateX> that should be the pastebin
[15:17] <AssociateX> [  8595.308] Warning: Xalloc: requesting unpleasantly large amount of memory: 0 bytes.
[15:18] <AssociateX> what the heck is that?
[15:18] <penguin42> yeh that's weird
[15:19] <AssociateX> yeah, I have been using blackbox instead of unity, but I have kids that would like a regular desktop. Maybe I will just install kde or something.
[15:20] <AssociateX> maybe gnome, kde is pretty big
[15:20] <AssociateX> thank you again for all of your help
[15:21] <penguin42> np
=== maxb_ is now known as maxb
[15:31] * penguin42 looks at bug 1062159 and wonders why someone would crypt one slice of a RAID0
[15:31] <ubot2> Launchpad bug 1062159 in mdadm "Raid is incorrectly determined as DEGRADED preventing boot in 12.04" [Undecided,New] https://launchpad.net/bugs/1062159
[16:44] * penguin42 wonders what one does with a bug like 1056626
[16:44] <penguin42> bug 1056626
[16:44] <ubot2> Launchpad bug 1056626 in gammu "source distributes personal information" [Undecided,New] https://launchpad.net/bugs/1056626
[17:03] <hjd> penguin42: Looks like it has been removed from upstream http://blog.cihar.com/archives/2012/09/27/think-twice-making-your-private-data-public/ Good question, though.
[17:31] <penguin42> hjd: I assume there is someone that should be subscribed for 'please remove' type of things if there is some question of privacy or legals - but I've never found who?
[17:43] * penguin42 has sent a request to bugcontrol asking what the right thing to do is
[17:59] <hggdh> huh?
[18:04] <hggdh> penguin42, hjd: re gammu -- a link to the updated upstream would be nice, but not critical; a patch would be very welcome
[18:07] <hggdh> relating to texlive: is there a Debian bug on this? We should try to keep in sync, mostly if preining is acting on it
[18:08] <penguin42> hggdh: It was a more general question on whether there is anyone/thing that tracks license/legal issues
[18:20] <hggdh> penguin42: as far as I can remember, not specifically. But then, who am I, I never dug into the licence arena. Let's wait, worst scenario we ask in -devel
[18:45] <penguin42> hggdh: Yeh, it just seems some things you come across sound erm dodgy and really should be sorted out
=== thomi is now known as thomj
[23:14] <Laibsch1> how do you document these days that a bug is fixed in ubuntu+1 but needs a backported fix for lucid.  Bug 579958 for example.
[23:14] <ubot2> Launchpad bug 579958 in duplicity "Assertion error "time not moving forward at appropriate pace"" [Medium,Fix released] https://launchpad.net/bugs/579958
=== Laibsch1 is now known as Laibsch
[23:21] <bcurtiswx> Laibsch, hmm. Usually you have a link to "Nominate for Series" but this one doesn't. Lucid is a half year away from EOL, so my first recommendation would be to upgrade to 12.04. If you really need it in Lucid then I'd say make a bug report for a backport request
[23:21] <Laibsch> yeah, that's the one I was looking for as well
[23:21] <Laibsch> hardy is half a year away
[23:21] <Laibsch> lucid still has 2,5 years
[23:22] <Laibsch> I found the button now, you have to look at the bug as registered in Ubuntu not for upstream
[23:22] <Laibsch> and Ubuntu really needs to improve the experience in its stable releases
[23:22] <Laibsch> long-term releases
[23:26] <bcurtiswx> Laibsch, once they get this late in the cycle, they'll just try to get backport requests.
