/srv/irclogs.ubuntu.com/2008/10/17/#ubuntu-kernel.txt

=== TheMuso_ is now known as TheMuso
cathyalim just wondering02:47
cathyalif things break in  ubuntu02:47
cathyaldo we always have to go into CLI to fix things02:47
lifelessdepends how broken it is02:54
lifelessevery operating system I know of has a CLI right at the very bottom :P02:54
lifeless    ^ current02:55
cathyalAnd some you can use without ever having to open it up, others not so much.02:55
AnAntHello, can someone reply to me on those bugs: 283330 & 28145109:44
AnAnt283330 is Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Secure Digital Controller not working properly 09:45
AnAnt281451 is uvesafb (and vesafb) does not support 1280x800 resolution for NVIDIA graphics adapters09:46
amitkAnAnt: uvesafb is no longer default. The next kernel will revert back to vesafb.09:50
AnAntamitk: ok, vesafb has same problem09:56
=== asac_ is now known as asac
=== amitk is now known as amitk-lunch
=== amitk-lunch is now known as amitk
amitkrtg: so it has a physical switch to turn off the radios? and it is set to off?14:00
rtgamitk: it must be one of these slider switches on the front.14:00
amitkright14:02
rtgamitk: I'm booting it to be sure14:02
rtgamitk: yeah, that was it.14:03
rtgamitk: have you _ever_ gotten yours to hang?14:05
amitkrtg: you might be right about 3945 being the red herring. I just got it to hang again, right after sdhci and bt init14:06
amitkthis was only the third time 14:06
rtgamitk: with 3945 blacklisted?14:06
amitknope14:07
amitkunfortunately i just reverted to the distro kernel instead of the instrumented one14:07
Keybuk[context]14:42
Keybukhave been looking at the iwl3945 problem14:42
Keybukon my laptop, I have a kernel with most things compiled in14:42
Keybukonly "drivers" are not, and on my laptop, the only driver is the iwl3945 card14:42
Keybukit still locks up from time-to-time14:42
Keybukand the lock up is during udev module loading, not kernel14:42
Keybukit's a Dell Latitude D420, not a Thinkpad14:43
rtgKeybuk: 32 bit?14:43
amitkKeybuk: another good data point if it isn't a thinkpad14:43
Keybuk32-bit, aye14:43
rtgKeybuk: what happens if you compile in the 3945 ? I'll bet it stops hanging.14:44
KeybukI did not try that :)14:44
rtgI can do that. How do I automate a reboot?14:45
Keybukas in make your laptop continually reboot?14:45
amitkrtg: reboot in rc.local?14:45
Keybukput it in rc.local?14:45
rtgdoes that give udev time to settle?14:45
Keybukudev settles inside its own init script14:45
Keybukif you're seeing the lock up some time after ... then that's a whole big important data point14:46
rtgKeybuk: exactly14:46
Keybukerr14:46
Keybukrewind14:46
Keybukyou're seeing the lock-up after udev's init script has exited?14:46
Keybuklater on in the boot sequence?14:46
rtgKeybuk: mdz commented out the udevadm settle clause in restart on this laptop.14:47
Keybukinteresting14:47
KeybukI must admit, mine is commented out too14:47
Keybukwhen I put it back, the lock up doesn't happen14:47
rtgKeybuk: what does that do?14:48
amitkKeybuk: i would've expected it to be commented out in the start clause14:48
Keybukthen again14:48
KeybukI call settle later on, and the only thing I do in the meantime is activate swap, fsck and mount the root disk14:48
amitk..unless udev gets started in initramfs and then (re)started in rootfs14:49
rtgbut what does 'settle' do? How does it alter the timing?14:50
amitkrtg: man udevadm says it makes sure the udev queue is empty14:50
Keybukrtg: udev listens to kernel uevents14:51
Keybukobviously by the time it's started, it's missed a *huge* number of them14:51
Keybukso it "triggers" them again by walking /sys and writing "add" to all the uevent files14:51
Keybukwhich makes the kernel send them again14:51
Keybukso it ends up with a queue of events it needs to process14:51
rtgso in effect that means it waits for modules to finish loading before proceeding ?14:51
Keybuk"settle" does not exit until the kernel's event seqnum and udev's event seqnum are equal14:51
Keybukerr14:51
Keybukkiiiiinda14:51
Keybukit means that all of the devices that the kernel knew about when "trigger" was run, have at least had first-stage processing completed14:52
Keybukthis may involve loading modules, yes14:52
Keybukand would wait for those modprobe commands to finish14:52
Keybuk*BUT*14:52
Keybukloading those modules may have additional side-effects, such as further probing, further devices or interfaces showing up14:52
Keybukand those may have yet more modules to load14:52
Keybuksettle may not wait for those14:52
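The trigger/settle behaviour Keybuk describes above can be sketched roughly as follows. This is a minimal illustration, not udev's actual implementation: `trigger` simply walks a directory tree and writes "add" to every file named `uevent`, and `settle` polls two sequence-number files until udev has caught up with the kernel. The function names, the polling interval, and the path to udev's own sequence number are assumptions made for the example; only `/sys/kernel/uevent_seqnum` is a path known to exist on real systems.

```python
import os
import time


def trigger(sys_root="/sys"):
    """Write "add" to every uevent file under sys_root so the kernel resends the events.

    Returns the list of uevent files that were written. (Simplified: this walks
    the whole tree rather than selected subdirectories.)
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(sys_root):
        if "uevent" in filenames:
            path = os.path.join(dirpath, "uevent")
            try:
                with open(path, "w") as f:
                    f.write("add\n")
                written.append(path)
            except OSError:
                pass  # not every node is writable; skip those
    return written


def read_seqnum(path):
    """Read a single integer sequence number from a file."""
    with open(path) as f:
        return int(f.read().strip())


def settle(kernel_seq_path, udev_seq_path, timeout=180.0, poll=0.1):
    """Return True once udev's seqnum has reached the kernel's, False on timeout.

    kernel_seq_path would be /sys/kernel/uevent_seqnum on a real system;
    udev_seq_path is a hypothetical file where udev records the last event
    it finished processing.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if read_seqnum(udev_seq_path) >= read_seqnum(kernel_seq_path):
            return True
        time.sleep(poll)
    return False
```

Note that, as in the discussion, this only waits for events the kernel has already emitted. A module loaded while handling one of them can create new devices and new events after `settle` has returned.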
rtghmm, quite asynchronous (as it should be)14:53
Keybukiwl3945 has no firmware?14:55
amitkit does14:55
Keybukweird14:55
KeybukI can't see the call for it14:55
ajuniorexist a solution (patch) for e1000e driver?14:56
rtgit's called ucode in the driver14:56
Keybukoh14:56
KeybukI thought all the firmware got separated out?14:56
rtgKeybuk: iwl3945_read_ucode()14:56
Keybukthere's a /lib/firmware/iwlwifi-3945-1.ucode after all14:56
amitkajunior: fixed in the ubuntu tree. will be in next kernel.14:56
Keybukrtg: right, that uses request_firmware()14:57
Keybukmy point is that I can't *see* that request from userspace14:57
ajuniorTKS14:57
rtgKeybuk: I think that's not relevant. with rf-kill switch enabled the hang still happens.14:58
Keybukwell, firmware loading affects timings that's all14:58
Keybukas in the modprobe will have to request a firmware14:58
Keybukwhich is another udev event14:58
Keybukwhich may affect settle (though shouldn't, because I think the modprobe blocks for it)14:59
rtgKeybuk: agreed. but in this case it doesn't seem to make a difference.14:59
Keybukno...15:01
KeybukI'm just weirded out by the missing request :p15:01
Keybukoh15:02
Keybukno15:02
KeybukI'm just being stupid15:02
KeybukI see it now15:02
KeybukI forgot that firmware now appears as $DEVPATH/firmware/$ID15:02
Keybukrather than /sys/firmware15:02
Keybuk:p15:02
Keybukcould it be related to device renaming perhaps?15:04
rtgKeybuk: you mean eth1 --> wlan0 ?15:05
Keybukswapping eth1 and eth015:05
Keybukand yeah, renaming wlan0 to eth1 :p15:05
rtgKeybuk: that seems like something the nested lock checking would catch.15:05
rtgit's an rtnl lock15:06
amitkthere was a related renaming issue queued for the stable tree that I pulled in: 7b54c00efa87f519ae30f09bdbb11aaf6644605f15:06
Keybukwe don't call ifup on the device unless it's in /e/n/i (it isn't for me) and that's only after the device appears15:06
Keybukcould it be something network manager is doing on the device?15:06
Keybukthough that is quite late in the boot relatively15:07
rtgKeybuk: the hang happens long before X starts, so no NM in play.15:07
KeybukNM starts in userspace first15:08
Keybukrc2/S2815:08
rtghmm, ok15:08
Keybukmy really stripped down boot replicates it, you see15:09
Keybukthat is15:09
Keybuk udevd15:09
Keybuk trigger15:09
rtgamitk: I have that commit in my test kernel.15:09
Keybuk swapon -a -e15:09
Keybuk fsck /15:09
Keybuk mount /15:09
Keybuk update mtab, make tmp directories15:10
Keybuk udev settle15:10
Keybuk start dbus15:10
Keybuk start hal15:10
Keybuk start gdm15:10
Keybuk ifup -a15:10
Keybuk start NM15:10
Keybuk--15:11
rtgKeybuk: can you get the kernel to trigger a stack dump or anything?15:12
rtgmine locks so hard that caps-lock light is wedged.15:13
Keybukno :-/15:14
rtgKeybuk: do you mind getting stuck with this bug? I've gotta send David's laptop back to him pretty soon.15:15
rtgplus, I'm a bit out of my depth.15:15
Keybuk"getting stuck" ?15:16
rtgKeybuk: how else should I phrase it? I'm assigned to it in LP.15:16
KeybukI don't mind helping debug in spare time, but I don't really have enough free time to devote myself to it :-/15:18
rtgKeybuk: I think if we work around it by delaying the i3945 module load, we can side step the issue for now. I believe it _will_ come back to haunt us during our effort to improve boot times.15:19
Keybukdelaying how?15:19
rtginsert a 5 second delay in the modprobe entry.15:19
Keybukit's not clear to me that it's actually racing anything but itself15:20
Keybuksleep 5 would just extend the udevsettle by 5s too :p15:20
rtgKeybuk: I am quite sure it's not an issue with the i3945 driver, at least not directly. Jamie can reproduce it with ipw2200.15:20
Keybukwhat else do you think it is?15:22
Keybuk\o/15:22
Keybukjust got it without "quiet" ;)15:22
AnAntHello, can someone look at those bugs: 283330 & 28145115:23
rtgI've seen the i3945 complete its initialization before hanging. with rf-kill enabled, that's the last thing that happens in that device driver. hangs occur randomly during and after i3945 module loading.15:23
Keybukrf-kill enabled?15:24
rtgKeybuk: the little slider switch  that disables wireless.15:24
Keybukright, that's disabled for me15:25
Keybukie. radio is on15:25
AnAntI have reported a bug similar to 283330 last year (111756), yet I never got any response about it yet15:25
rtgok, the other way around.15:25
Keybukfirst hang was after15:25
Keybukiwl3945 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 1715:26
Keybuksecond hang had two more messages15:26
Keybukiwl3945: Detected Intel Wireless WiFi Link 3945ABG15:26
Keybukiwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels15:26
Keybukso it's not quite happening at the same point each time15:26
rtgKeybuk: I've found that with wireless 'enabled' the hang happens less often. It's about 50/50 with wireless disabled.15:27
rtgand it definitely happens at different points in the boot sequence.15:27
Keybukright15:28
Keybukbut I've eliminated that from my tests15:28
Keybukit is entirely unrelated to any software alongside15:28
KeybukI get the hang with only the modprobe :p15:28
rtgso, you're saying it really _is_ the i3945 driver?15:29
Keybukit looks like it to me15:29
rtgwhat about the ipw2200 data point?15:30
Keybukwhat's the other network card in the Thinkpad?15:30
rtge1000e15:30
Keybuktg3 here15:30
AnAntbdmurray: there you are15:33
Keybukrtg: am trying loading only the 3945 driver in isolation repeatedly15:34
Keybukso far, it doesn't hang15:34
Keybukwhich is interesting15:34
rtgKeybuk: if you're really sure it's i3945, then that lends credence to my suspicion that it's hardware related. Perhaps it's some kind of PCI bus setup issue?15:35
Keybukwell, I'm confused as to what else it could be15:36
rtgThe Hardy driver is quite different. Jamie was unable to reproduce this hang with 2.6.26-5.17-generic, all subsequent releases _did_ hang.15:37
amitkKeybuk: lool has already tried the modprobe/modprobe -r test on the 3945 over 700 times. Couldn't reproduce it.15:37
Keybukamitk: yeah15:38
Keybukthat's downright weird15:38
NgI forget if I asked this before, but is there any display corruption ever?15:40
rtgNg: related to this hang?15:41
Keybukit seems like it only hangs if udev calls modprobe ?!15:41
Ngrtg: yeah15:42
rtgNg: I've never noticed any.15:42
Ngok, 'cos my random boot hang that does have corruption seems to happen shortly after iwl4965 loads15:42
Ngwell, iwlagn now, but whatever15:42
rtgI wonder if this is something we ought to solicit Intel's help with?15:44
Ngerr no wait I'm on crack, it's intel_agp that's implicated in mine. As you were ;)15:45
NCommanderamitk, ping, if you get this before you commit my patch15:51
amitkNCommander: done15:52
rtgKeybuk: would you comment in #263059 why you think it's driver related. I'm not exactly sure how you've come to that conclusion. In the meantime I've gotta get this laptop shipped.15:52
Keybukwell15:52
KeybukI don't get the hang if I don't load this driver ;)15:53
rtgI agree with that, but it's not conclusive.15:53
KeybukI *think* it has something to do with the interface being renamed15:53
Keybukthat always seems to be the last thing that happens before the hang15:54
rtgwell, I have a little time. I'll instrument that code. it's pretty straightforward IIRC.15:55
Keybukand if I disable that rule in udev, I haven't had the hang yet15:55
rtgKeybuk: I don't read udev very well. which rule is that?15:55
Keybukthe persistent-net ones15:55
Keybukyou need to remove the 7x one that has the rule15:56
Keybukand the 6x one to stop the rule coming back15:56
Keybukwhat's ecb(arc4) ?15:56
rtgdunno15:57
elmocrypto stuff15:57
Keybukhmm15:58
rtgKeybuk: which 60* rule are you talking about? Can't see anything net related.15:58
Keybuksorry15:59
Keybuk75-persistent-net-generator.rules15:59
Keybukand 70-persistent-net.rules15:59
rtg75-persistent-net-generator.rules whitelists eth and wlan, so won't they be ignored?16:00
KeybukI just moved the files out of the way :p16:00
rtgare these rules created during install according to installed HW ?16:01
Keybukeach boot16:02
Keybukthough I just ruled that out16:02
Keybukfinally got one to hang without them16:02
Keybukthey were just adding time16:02
rtghmm, dead-end16:02
Keybukthis is baffling16:03
Keybukit did:16:03
Keybuk modprobe tg3 (for eth0)16:03
Keybuk seems to have finished16:03
Keybuk iwl3945 messages16:03
Keybuk Tunable channels blah16:03
Keybuk modprobe ecb(arc4)16:03
Keybuk iwl3945 0000:0c:00.0: PCI INT A disabled16:04
Keybuk *hang*16:04
rtgso, it didn't make it to the device rename.16:04
KeybukI didn't include the device renaming stuff16:04
Keybukso it would never try16:05
rtgby the time you see 'iwl3945 0000:0c:00.0: PCI INT A disabled', the module init is complete.16:05
Keybukdo you get the hang if you disable the e1000 from loading?16:09
rtgKeybuk: lemme try.16:10
rtgKeybuk: still locks up16:13
Keybukooooh16:15
Keybuk"Dazed and confused, but trying to continue"16:15
Keybukthe full error message of that16:21
KeybukUhhuh. NMI received for unknown reason b1 on CPU 0.16:21
KeybukYou have some hardware problem, likely on the PCI bus.16:21
KeybukDazed and confused, but trying to continue.16:22
Keybuk-- 16:22
rtgKeybuk: my feeling exactly16:22
Keybukwhile loading the iwl3945 driver16:22
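The "reason b1" value in that message can be read as a bit mask. The sketch below assumes the conventional layout of the PC NMI status port (0x61), where bit 7 reports a system error such as PCI SERR# or a memory parity error and bit 6 reports an I/O channel check. The function name and the label strings are made up for illustration, and the remaining bits are deliberately left undecoded.

```python
def decode_nmi_reason(reason):
    """Return the NMI source flags set in a port-0x61 style reason byte.

    Assumes bit 7 = system error (SERR# / parity) and bit 6 = I/O channel
    check; other bits are ignored in this sketch.
    """
    flags = []
    if reason & 0x80:
        flags.append("SERR# / parity error (bit 7)")
    if reason & 0x40:
        flags.append("I/O channel check (bit 6)")
    return flags


if __name__ == "__main__":
    # 0xb1 is the value from the log message above.
    print(decode_nmi_reason(0xB1))  # ['SERR# / parity error (bit 7)']
```

Under that assumption, 0xb1 has bit 7 set and bit 6 clear, which fits the kernel's "likely on the PCI bus" hint, though it does not by itself say whether the hardware or the driver is at fault.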
Keybukrtg: you think this is a hardware problem?16:22
rtgit's certainly hardwar'ish16:22
KeybukI tend to doubt that hypothesis16:23
Keybukbecause the hardware in question isn't even one piece, but across multiple types of laptop16:23
Keybukand it's only *this* driver that triggers it16:23
Keybukor, at least, this family of drivers :p16:23
rtgKeybuk: could be the family of PCI interface chips used in these adapters.16:24
Keybukwhy would it only occur now, with 2.6.27 ?16:24
Keybukand why only when loading the iwl* driver alongside another driver16:24
Keybuksounds much more like the driver is failing to correctly lock out the PCI bus to me16:24
rtgKeybuk: what's the other driver? I disabled e1000e and it still happened.16:25
Keybukany pci driver fwict16:25
Keybukboth tg3 and the pcmcia socket adapter seem to do it for me16:25
Keybukbut only when combined with iwl16:25
rtgKeybuk: that kind of makes sense. 16:25
Keybukthat's why the modprobe loop doesn't work16:26
Keybukyou're only loading/unloading iwl394516:26
Keybukbut if you load two drivers at once, you get the hang or crash16:26
rtgwhich 2?16:26
Keybuk(I'm doing while true; do modprobe iwl3945 & modprobe tg3; done)16:26
rtgand that hangs?16:26
Keybukyeah16:26
Keybukand if it doesn't hang, I get that cute error message sometimes ;)16:27
rtgare you unloading them first?16:27
Keybukyes16:27
rtgoutstanding.16:27
rtgfinally, got it to do something.16:27
rtgor you did, rather.16:28
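A rough sketch of the reproduction loop Keybuk describes: unload both modules, then start both modprobe commands before waiting on either so the two probes overlap. The helper names, the iteration count, and the injectable `launch`/`run` parameters are assumptions for illustration; the only real commands used are `modprobe` and `modprobe -r`. Running it for real needs root and, if the bug is present, may hang the machine.

```python
import subprocess


def parallel_load_once(modules, launch=subprocess.Popen, run=subprocess.call):
    """Unload every module, then load them all concurrently.

    Returns the exit status of each modprobe, in the order given.
    """
    # Unload first so each modprobe really re-runs the driver's probe.
    for name in modules:
        run(["modprobe", "-r", name])
    # Start every load before waiting on any of them, so the probes overlap.
    procs = [launch(["modprobe", name]) for name in modules]
    return [p.wait() for p in procs]


def reproduce(modules=("iwl3945", "tg3"), iterations=100):
    """Repeat the parallel load; a hang or NMI message is the 'result'."""
    for i in range(iterations):
        codes = parallel_load_once(modules)
        # Flush so the last completed iteration is visible if the box locks up.
        print(i, codes, flush=True)
```

Loading a single module in a loop, as in the earlier modprobe/modprobe -r test, would not exercise this overlap, which matches the observation that the single-driver loop never hung.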
rtgKeybuk: so, I should look at the difference between Hardy and Intrepid start-up.16:29
Keybukwhat's the difference between the drivers?16:29
rtgKeybuk: in Intrepid the i3945 init is separate from the other iwl drivers. They were more common in Hardy. Beyond that I'll have to study them a bit before I know more.16:30
NCommanderdoes anyone else here work on the lpia kernel beside ScottK and amitk who can answer some questions?16:31
* ScottK declaims all knowledge of anything in the kernel and suspects that maybe NCommander is thinking of persia.16:31
NCommanderwe16:32
NCommanderStevenK16:32
NCommanderDamn autocomplete16:32
NCommander*er16:32
Keybukyeah16:33
Keybuka dozen times now, have the hang just with the modprobe loop with two modules in it16:33
Keybukall I have running is udev (no rename rules or anything - and can't see any side-effects with --debug on)16:33
Keybukeven the filesystem isn't writable, and no swap mapped16:34
rtgKeybuk: get that stuff in the bug report please. It's gonna be key in finding root cause.16:35
Keybukbug# ?16:35
rtgKeybuk: 263059.16:36
rtgKeybuk: one of the major differences that I see right off is that the PCI device is disabled at the end of the probe in Intrepid, whereas it's left enabled in Hardy.16:37
Keybukweird16:43
Keybukifconfig eth1 up16:43
KeybukSIOCSIFFLAGS: No such device16:43
rtgKeybuk: you should boot Hardy and see if you can still reproduce it.16:44
Keybukthis is just with a normal boot trying to get my debugging info out :p16:44
Keybukoh16:45
Keybukhad the kill switch on16:45
Keybukheh heh heh16:45
rtgborked, huh?16:45
KeybukPEBCAK16:45
rtgKeybuk: you mean no lspci?17:11
Keybukno, lspci is still there17:11
Keybuksysfs is still there17:11
Keybukit's even still listed in ifconfig17:11
Keybukbut any ioctl returns -ENODEV17:11
Keybukand the dmesg implies the interrupts are disabled17:12
rtgKeybuk: in fact, the interrupt isn't even assigned if rf-kill is enabled.17:12
Keybukso it's in a state I'm not familiar with ;)17:12
rtgKeybuk: yeah, the last thing the driver does is pci_disable_device()17:13
Keybukbut it enables it later if the kill switch is off?17:14
Keybukor does it not call pci_disable_device() if the kill switch is off?17:14
rtgif wireless is enabled (i.e., kill switch is off), then pci_enable_device() is called at if-up time.17:15
rtgthis is a change in behavior from Hardy.17:15
KeybukI'm wondering whether there's some invalid state going on based on the kill switch status17:16
Keybukand if the switch is off, the invalid state lasts _less_ time than when it's on17:16
Keybukand while in that invalid state, other pci drivers can hang the system17:16
rtgyou can't get this to happen if kill-switch is off?17:16
KeybukI can, just much less so17:16
Keybukwhich to me implies the time window of the hang is less if the switch is off17:16
Keybukwhen it's on, it's much more regular17:16
rtgwell, there is certainly less code in the driver being executed when the switch is on.17:17
trenton_Apologies if I intrude, but have run out of options. I'm trying to get 2.6.27 on hardy is this posible? Without compiling that is.17:42
rtgtrenton_: kernel-ppa17:43
rtghttps://edge.launchpad.net/~kernel-ppa/+archive17:43
trenton_rtg: thanks17:43
