=== TheMuso_ is now known as TheMuso | ||
cathyal | im just wondering | 02:47 |
---|---|---|
cathyal | if things break in ubuntu | 02:47 |
cathyal | do we always have to go into CLI to fix things | 02:47 |
lifeless | depends how broken it is | 02:54 |
lifeless | every operating system I know of has a CLI right at the very bottom :P | 02:54 |
lifeless | ^ current | 02:55 |
cathyal | And some you can use without ever having to open it up, others not so much. | 02:55 |
AnAnt | Hello, can someone reply to me about these bugs: 283330 & 281451 | 09:44 |
AnAnt | 283330 is Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Secure Digital Controller not working properly | 09:45 |
AnAnt | 281451 is uvesafb (and vesafb) does not support 1280x800 resolution for NVIDIA graphics adapters | 09:46 |
amitk | AnAnt: uvesafb is no longer default. The next kernel will revert back to vesafb. | 09:50 |
AnAnt | amitk: ok, vesafb has same problem | 09:56 |
=== asac_ is now known as asac | ||
=== amitk is now known as amitk-lunch | ||
=== amitk-lunch is now known as amitk | ||
amitk | rtg: so it has a physical switch to turn off the radios? and it is set to off? | 14:00 |
rtg | amitk: it must be one of these slider switches on the front. | 14:00 |
amitk | right | 14:02 |
rtg | amitk: I'm booting it to be sure | 14:02 |
rtg | amitk: yeah, that was it. | 14:03 |
rtg | amitk: have you _ever_ gotten yours to hang? | 14:05 |
amitk | rtg: you might be right about 3945 being the red herring. I just got it to hang again, right after sdhci and bt init | 14:06 |
amitk | this was only the third time | 14:06 |
rtg | amitk: with 3945 blacklisted? | 14:06 |
amitk | nope | 14:07 |
amitk | unfortunately i just reverted to the distro kernel instead of the instrumented one | 14:07 |
Keybuk | [context] | 14:42 |
Keybuk | have been looking at the iwl3945 problem | 14:42 |
Keybuk | on my laptop, I have a kernel with most things compiled in | 14:42 |
Keybuk | only "drivers" are not, and on my laptop, the only driver is the iwl3945 card | 14:42 |
Keybuk | it still locks up from time-to-time | 14:42 |
Keybuk | and the lock up is during udev module loading, not kernel | 14:42 |
Keybuk | it's a Dell Latitude D420, not a Thinkpad | 14:43 |
rtg | Keybuk: 32 bit? | 14:43 |
amitk | Keybuk: another good data point if it isn't a thinkpad | 14:43 |
Keybuk | 32-bit, aye | 14:43 |
rtg | Keybuk: what happens if you compile in the 3945 ? I'll bet it stops hanging. | 14:44 |
Keybuk | I did not try that :) | 14:44 |
rtg | I can do that. How do I automate a reboot? | 14:45 |
Keybuk | as in make your laptop continually reboot? | 14:45 |
amitk | rtg: reboot in rc.local? | 14:45 |
Keybuk | put it in rc.local? | 14:45 |
rtg | does that give udev time to settle? | 14:45 |
Keybuk | udev settles inside its own init script | 14:45 |
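The rc.local suggestion above amounts to a reboot loop that runs until the machine hangs. A minimal sketch of one way to do that with a boot counter, so the number of clean boots before a hang is recorded. The counter path, the limit, and the idea of calling this script from rc.local are illustrative assumptions, not something stated in the discussion:

```python
import subprocess

def bump_counter(path):
    """Increment the boot counter stored in `path` and return the new value."""
    try:
        with open(path) as f:
            count = int(f.read().strip() or 0)
    except FileNotFoundError:
        count = 0  # first boot of the test run
    count += 1
    with open(path, "w") as f:
        f.write(f"{count}\n")
    return count

def should_reboot(count, limit):
    """Keep rebooting until `limit` boots have completed."""
    return count < limit

def main(path="/var/tmp/reboot-test.count", limit=100, dry_run=True):
    # `path` and `limit` are hypothetical defaults chosen for illustration.
    count = bump_counter(path)
    if should_reboot(count, limit) and not dry_run:
        # Needs root; rc.local normally runs as root at the end of boot.
        subprocess.run(["reboot"], check=False)
    return count
```

If the machine hangs before rc.local runs, the counter stops increasing, so its last value is the number of boots that completed before the hang.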
Keybuk | if you're seeing the lock up some time after ... then that's a whole big important data point | 14:46 |
rtg | Keybuk: exactly | 14:46 |
Keybuk | err | 14:46 |
Keybuk | rewind | 14:46 |
Keybuk | you're seeing the lock-up after udev's init script has exited? | 14:46 |
Keybuk | later on in the boot sequence? | 14:46 |
rtg | Keybuk: mdz commented out the udevadm settle clause in restart on this laptop. | 14:47 |
Keybuk | interesting | 14:47 |
Keybuk | I must admit, mine is commented out too | 14:47 |
Keybuk | when I put it back, the lock up doesn't happen | 14:47 |
rtg | Keybuk: what does that do? | 14:48 |
amitk | Keybuk: i would've expected it to be commented out in the start clause | 14:48 |
Keybuk | then again | 14:48 |
Keybuk | I call settle later on, and the only thing I do in the meantime is activate swap, fsck and mount the root disk | 14:48 |
amitk | ..unless udev gets started in initramfs and then (re)started in rootfs | 14:49 |
rtg | but what does 'settle' do? How does it alter the timing? | 14:50 |
amitk | rtg: man udevadm says it makes sure the udev queue is empty | 14:50 |
Keybuk | rtg: udev listens to kernel uevents | 14:51 |
Keybuk | obviously by the time it's started, it's missed a *huge* number of them | 14:51 |
Keybuk | so it "triggers" them again by walking /sys and writing "add" to all the uevent files | 14:51 |
Keybuk | which makes the kernel send them again | 14:51 |
Keybuk | so it ends up with a queue of events it needs to process | 14:51 |
rtg | so in effect that means it waits for modules to finish loading before proceeding ? | 14:51 |
Keybuk | "settle" does not exit until the kernel's event seqnum and udev's event seqnum are equal | 14:51 |
Keybuk | err | 14:51 |
Keybuk | kiiiiinda | 14:51 |
Keybuk | it means that all of the devices that the kernel knew about when "trigger" was run, have at least had first-stage processing completed | 14:52 |
Keybuk | this may involve loading modules, yes | 14:52 |
Keybuk | and would wait for those modprobe commands to finish | 14:52 |
Keybuk | *BUT* | 14:52 |
Keybuk | loading those modules may have additional side-effects, such as further probing, further devices or interfaces showing up | 14:52 |
Keybuk | and those may have yet more modules to load | 14:52 |
Keybuk | settle may not wait for those | 14:52 |
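The settle behaviour described above can be illustrated with a toy model. This is not udev's code: the class, its method names, and the event names are invented for illustration, and it assumes settle waits only for the sequence number the kernel had reached when settle was called:

```python
from collections import deque

class ToyUdev:
    """Toy model of kernel uevents and udev's processing queue."""

    def __init__(self):
        self.kernel_seqnum = 0   # last sequence number the kernel emitted
        self.udev_seqnum = 0     # last sequence number udev finished
        self.queue = deque()

    def emit(self, name, side_effects=()):
        """Kernel emits an event; `side_effects` are events that handling it will cause."""
        self.kernel_seqnum += 1
        self.queue.append((self.kernel_seqnum, name, tuple(side_effects)))

    def process_one(self):
        """Handle one queued event, e.g. a modprobe that makes new devices appear."""
        seq, name, side_effects = self.queue.popleft()
        for child in side_effects:
            self.emit(child)     # new devices get later sequence numbers
        self.udev_seqnum = seq
        return name

    def settle(self):
        """Wait only for events the kernel knew about when settle was called."""
        target = self.kernel_seqnum
        handled = []
        while self.udev_seqnum < target:
            handled.append(self.process_one())
        return handled
```

With one event that spawns two more (say a PCI device whose module load produces a firmware request and a network interface), `settle()` returns after the original events are handled while the two follow-up events are still queued, which is the "settle may not wait for those" case.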
rtg | hmm, quite asynchronous (as it should be) | 14:53 |
Keybuk | iwl3945 has no firmware? | 14:55 |
amitk | it does | 14:55 |
Keybuk | weird | 14:55 |
Keybuk | I can't see the call for it | 14:55 |
ajunior | does a solution (patch) exist for the e1000e driver? | 14:56 |
rtg | its called ucode in the driver | 14:56 |
Keybuk | oh | 14:56 |
Keybuk | I thought all the firmware got separated out? | 14:56 |
rtg | Keybuk: iwl3945_read_ucode() | 14:56 |
Keybuk | there's a /lib/firmware/iwlwifi-3945-1.ucode after all | 14:56 |
amitk | ajunior: fixed in the ubuntu tree. will be in next kernel. | 14:56 |
Keybuk | rtg: right, that uses request_firmware() | 14:57 |
Keybuk | my point is that I can't *see* that request from userspace | 14:57 |
ajunior | TKS | 14:57 |
rtg | Keybuk: I think thats not relevant. with rf-kill switch enabled the hang still happens. | 14:58 |
Keybuk | well, firmware loading affects timings that's all | 14:58 |
Keybuk | as in the modprobe will have to request a firmware | 14:58 |
Keybuk | which is another udev event | 14:58 |
Keybuk | which may affect settle (though shouldn't, because I think the modprobe blocks for it) | 14:59 |
rtg | Keybuk: agreed. but in this case it doesn't seem to make a difference. | 14:59 |
Keybuk | no... | 15:01 |
Keybuk | I'm just weirded out by the missing request :p | 15:01 |
Keybuk | oh | 15:02 |
Keybuk | no | 15:02 |
Keybuk | I'm just being stupid | 15:02 |
Keybuk | I see it now | 15:02 |
Keybuk | I forgot that firmware now appears as $DEVPATH/firmware/$ID | 15:02 |
Keybuk | rather than /sys/firmware | 15:02 |
Keybuk | :p | 15:02 |
Keybuk | could it be related to device renaming perhaps? | 15:04 |
rtg | Keybuk: you mean eth1 --> wlan0 ? | 15:05 |
Keybuk | swapping eth1 and eth0 | 15:05 |
Keybuk | and yeah, renaming wlan0 to eth1 :p | 15:05 |
rtg | Keybuk: that seems like something the nested lock checking would catch. | 15:05 |
rtg | its an rtnl lock | 15:06 |
amitk | there was a related renaming issue queued for the stable tree that I pulled in: 7b54c00efa87f519ae30f09bdbb11aaf6644605f | 15:06 |
Keybuk | we don't call ifup on the device unless it's in /e/n/i (it isn't for me) and that's only after the device appears | 15:06 |
Keybuk | could it be something network manager is doing on the device? | 15:06 |
Keybuk | though that is quite late in the boot relatively | 15:07 |
rtg | Keybuk: the hang happens long before X starts, so no NM in play. | 15:07 |
Keybuk | NM starts in userspace first | 15:08 |
Keybuk | rc2/S28 | 15:08 |
rtg | hmm, ok | 15:08 |
Keybuk | my really stripped down boot replicates it, you see | 15:09 |
Keybuk | that is | 15:09 |
Keybuk | udevd | 15:09 |
Keybuk | trigger | 15:09 |
rtg | amitk: I have that commit in my test kernel. | 15:09 |
Keybuk | swapon -a -e | 15:09 |
Keybuk | fsck / | 15:09 |
Keybuk | mount / | 15:09 |
Keybuk | update mtab, make tmp directories | 15:10 |
Keybuk | udev settle | 15:10 |
Keybuk | start dbus | 15:10 |
Keybuk | start hal | 15:10 |
Keybuk | start gdm | 15:10 |
Keybuk | ifup -a | 15:10 |
Keybuk | start NM | 15:10 |
Keybuk | -- | 15:11 |
rtg | Keybuk: can you get the kernel to trigger a stack dump or anything? | 15:12 |
rtg | mine locks so hard that caps-lock light is wedged. | 15:13 |
Keybuk | no :-/ | 15:14 |
rtg | Keybuk: do you mind getting stuck with this bug? I've gotta send David's laptop back to him pretty soon. | 15:15 |
rtg | plus, I'm a bit out of my depth. | 15:15 |
Keybuk | "getting stuck" ? | 15:16 |
rtg | Keybuk: how else should I phrase it? I'm assigned to it in LP. | 15:16 |
Keybuk | I don't mind helping debug in spare time, but I don't really have enough free time to devote myself to it :-/ | 15:18 |
rtg | Keybuk: I think if we work around it by delaying the i3945 module load, we can side step the issue for now. I believe it _will_ come back to haunt us during our effort to improve boot times. | 15:19 |
Keybuk | delaying how? | 15:19 |
rtg | insert a 5 second delay in the modprobe entry. | 15:19 |
Keybuk | it's not clear to me that it's actually racing anything but itself | 15:20 |
Keybuk | sleep 5 would just extend the udevsettle by 5s too :p | 15:20 |
rtg | Keybuk: I am quite sure its not an issue with the i3945 driver, at least not directly. Jamie can reproduce it with ipw2200. | 15:20 |
Keybuk | what else do you think it is? | 15:22 |
Keybuk | \o/ | 15:22 |
Keybuk | just got it without "quiet" ;) | 15:22 |
AnAnt | Hello, can someone look at those bugs: 283330 & 281451 | 15:23 |
rtg | I've seen the i3945 complete its initialization before hanging. with rf-kill enabled, thats the last thing that happens in that device driver. hangs occur randomly during and after i3945 module loading. | 15:23 |
Keybuk | rf-kill enabled? | 15:24 |
rtg | Keybuk: the little slider switch that disables wireless. | 15:24 |
Keybuk | right, that's disabled for me | 15:25 |
Keybuk | ie. radio is on | 15:25 |
AnAnt | I have reported a bug similar to 283330 last year (111756), yet I never got any response about it yet | 15:25 |
rtg | ok, the other way around. | 15:25 |
Keybuk | first hang was after | 15:25 |
Keybuk | iwl3945 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 | 15:26 |
Keybuk | second hang had two more messages | 15:26 |
Keybuk | iwl3945: Detected Intel Wireless WiFi Link 3945ABG | 15:26 |
Keybuk | iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels | 15:26 |
Keybuk | so it's not quite happening at the same point each time | 15:26 |
rtg | Keybuk: I've found that with wireless 'enabled' the hang happens less often. Its about 50/50 with wireless disabled. | 15:27 |
rtg | and it definitely happens at different points in the boot sequence. | 15:27 |
Keybuk | right | 15:28 |
Keybuk | but I've eliminated that from my tests | 15:28 |
Keybuk | it is entirely unrelated to any software alongside | 15:28 |
Keybuk | I get the hang with only the modprobe :p | 15:28 |
rtg | so, you're saying it really _is_ the i3945 driver? | 15:29 |
Keybuk | it looks like it to me | 15:29 |
rtg | what about the ipw2200 data point? | 15:30 |
Keybuk | what's the other network card in the Thinkpad? | 15:30 |
rtg | e1000e | 15:30 |
Keybuk | tg3 here | 15:30 |
AnAnt | bdmurray: there you are | 15:33 |
Keybuk | rtg: am trying loading only the 3945 driver in isolation repeatedly | 15:34 |
Keybuk | so far, it doesn't hang | 15:34 |
Keybuk | which is interesting | 15:34 |
rtg | Keybuk: if you're really sure its i3945, then that lends credence to my suspicion that its hardware related. Perhaps its some kind of PCI bus setup issue? | 15:35 |
Keybuk | well, I'm confused as to what else it could be | 15:36 |
rtg | The Hardy driver is quite different. Jamie was unable to reproduce this hang with 2.6.26-5.17-generic, all subsequent releases _did_ hang. | 15:37 |
amitk | Keybuk: lool has already tried the modprobe/modprobe -r test on the 3945 over 700 times. Couldn't reproduce it. | 15:37 |
Keybuk | amitk: yeah | 15:38 |
Keybuk | that's downright weird | 15:38 |
Ng | I forget if I asked this before, but is there any display corruption ever? | 15:40 |
rtg | Ng: related to this hang? | 15:41 |
Keybuk | it seems like it only hangs if udev calls modprobe ?! | 15:41 |
Ng | rtg: yeah | 15:42 |
rtg | Ng: I've never noticed any. | 15:42 |
Ng | ok, 'cos my random boot hang that does have corruption seems to happen shortly after iwl4965 loads | 15:42 |
Ng | well, iwlagn now, but whatever | 15:42 |
rtg | I wonder if this is something we ought to solicit Intel's help with? | 15:44 |
Ng | err no wait I'm on crack, it's intel_agp that's implicated in mine. As you were ;) | 15:45 |
NCommander | amitk, ping, if you get this before you commit my patch | 15:51 |
amitk | NCommander: done | 15:52 |
rtg | Keybuk: would you comment in #263059 why you think its driver related. I'm not exactly sure how you've come to that conclusion. In the meantime I've gotta get this laptop shipped. | 15:52 |
Keybuk | well | 15:52 |
Keybuk | I don't get the hang if I don't load this driver ;) | 15:53 |
rtg | I agree with that, but its not conclusive. | 15:53 |
Keybuk | I *think* it has something to do with the interface being renamed | 15:53 |
Keybuk | that always seems to be the last thing that happens before the hang | 15:54 |
rtg | well, I have a little time. I'll instrument that code. its pretty straightforward IIRC. | 15:55 |
Keybuk | and if I disable that rule in udev, I haven't had the hang yet | 15:55 |
rtg | Keybuk: I don't read udev very well. which rule is that? | 15:55 |
Keybuk | the persistent-net ones | 15:55 |
Keybuk | you need to remove the 7x one that has the rule | 15:56 |
Keybuk | and the 6x one to stop the rule coming back | 15:56 |
Keybuk | what's ecb(arc4) ? | 15:56 |
rtg | dunno | 15:57 |
elmo | crypto stuff | 15:57 |
Keybuk | hmm | 15:58 |
rtg | Keybuk: which 60* rule are you talking about? Can't see anything net related. | 15:58 |
Keybuk | sorry | 15:59 |
Keybuk | 75-persistent-net-generator.rules | 15:59 |
Keybuk | and 70-persistent-net.rules | 15:59 |
rtg | 75-persistent-net-generator.rules whitelists eth and wlan, so won't they be ignored? | 16:00 |
Keybuk | I just moved the files out of the way :p | 16:00 |
rtg | are these rules created during install according to installed HW ? | 16:01 |
Keybuk | each boot | 16:02 |
Keybuk | though I just ruled that out | 16:02 |
Keybuk | finally got one to hang without them | 16:02 |
Keybuk | they were just adding time | 16:02 |
rtg | hmm, dead-end | 16:02 |
Keybuk | this is baffling | 16:03 |
Keybuk | it did: | 16:03 |
Keybuk | modprobe tg3 (for eth0) | 16:03 |
Keybuk | seems to have finished | 16:03 |
Keybuk | iwl3945 messages | 16:03 |
Keybuk | Tunable channels blah | 16:03 |
Keybuk | modprobe ecb(arc4) | 16:03 |
Keybuk | iwl3945 0000:0c:00.0: PCI INT A disabled | 16:04 |
Keybuk | *hang* | 16:04 |
rtg | so, it didn't make it to the device rename. | 16:04 |
Keybuk | I didn't include the device renaming stuff | 16:04 |
Keybuk | so it would never try | 16:05 |
rtg | by the time you see 'iwl3945 0000:0c:00.0: PCI INT A disabled', the module init is complete. | 16:05 |
Keybuk | do you get the hang if you disable the e1000 from loading? | 16:09 |
rtg | Keybuk: lemme try. | 16:10 |
rtg | Keybuk: still locks up | 16:13 |
Keybuk | ooooh | 16:15 |
Keybuk | "Dazed and confused, but trying to continue" | 16:15 |
Keybuk | the full error message of that | 16:21 |
Keybuk | Uhhuh. NMI received for unknown reason b1 on CPU 0. | 16:21 |
Keybuk | You have some hardware problem, likely on the PCI bus. | 16:21 |
Keybuk | Dazed and confused, but trying to continue. | 16:22 |
Keybuk | -- | 16:22 |
rtg | Keybuk: my feeling exactly | 16:22 |
Keybuk | while loading the iwl3945 driver | 16:22 |
Keybuk | rtg: you think this is a hardware problem? | 16:22 |
rtg | its certainly hardwar'ish | 16:22 |
Keybuk | I tend to doubt that hypothesis | 16:23 |
Keybuk | because the hardware in question isn't even one piece, but across multiple types of laptop | 16:23 |
Keybuk | and its only *this* driver that triggers it | 16:23 |
Keybuk | or, at least, this family of drivers :p | 16:23 |
rtg | Keybuk: could be the family of PCI interface chips used in these adapters. | 16:24 |
Keybuk | why would it only occur now, with 2.6.27 ? | 16:24 |
Keybuk | and why only when loading the iwl* driver alongside another driver | 16:24 |
Keybuk | sounds much more like the driver is failing to correctly lock out the PCI bus to me | 16:24 |
rtg | Keybuk: whats the other driver? I disabled e1000e and it still happened. | 16:25 |
Keybuk | any pci driver fwict | 16:25 |
Keybuk | both tg3 and the pcmcia socket adapter seem to do it for me | 16:25 |
Keybuk | but only when combined with iwl | 16:25 |
rtg | Keybuk: that kind of makes sense. | 16:25 |
Keybuk | that's why the modprobe loop doesn't work | 16:26 |
Keybuk | you're only loading/unloading iwl3945 | 16:26 |
Keybuk | but if you load two drivers at once, you get the hang or crash | 16:26 |
rtg | which 2? | 16:26 |
Keybuk | (I'm doing while true; do modprobe iwl3945 & modprobe tg3; done) | 16:26 |
rtg | and that hangs? | 16:26 |
Keybuk | yeah | 16:26 |
Keybuk | and if it doesn't hang, I get that cute error message sometimes ;) | 16:27 |
rtg | are you unloading them first? | 16:27 |
Keybuk | yes | 16:27 |
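The reproduction described above (unload both modules, then load them at the same time, repeatedly) could be scripted roughly as follows. This is a sketch rather than the exact loop used in the discussion: the function names and the `dry_run` switch are invented here, and unlike the one-line shell loop it waits for both modprobe processes before returning:

```python
import subprocess

def plan_iteration(modules):
    """Return the (unload, load) command lists for one pass over `modules`."""
    unload = [["modprobe", "-r", m] for m in modules]
    load = [["modprobe", m] for m in modules]
    return unload, load

def run_iteration(modules, dry_run=True):
    """Unload each module, then load them all concurrently."""
    unload, load = plan_iteration(modules)
    if dry_run:
        # Only report what would be run; nothing touches the kernel.
        return unload + load
    for cmd in unload:
        subprocess.run(cmd, check=False)   # ignore "module not loaded" errors
    procs = [subprocess.Popen(cmd) for cmd in load]  # start the loads in parallel
    for p in procs:
        p.wait()
    return unload + load
```

Calling `run_iteration(["iwl3945", "tg3"], dry_run=False)` in a loop as root would approximate the test; on an affected machine it is expected to hang hard, so it should only be run where that is acceptable.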
rtg | outstanding. | 16:27 |
rtg | finally, got it to do something. | 16:27 |
rtg | or you did, rather. | 16:28 |
rtg | Keybuk: so, I should look at the difference between Hardy and Intrepid start-up. | 16:29 |
Keybuk | what's the difference between the drivers? | 16:29 |
rtg | Keybuk: in Intrepid the i3945 init is separate from the other iwl drivers. They were more common in Hardy. Beyond that I'll have to study them a bit before I know more. | 16:30 |
NCommander | does anyone else here work on the lpia kernel beside ScottK and amitk who can answer some questions? | 16:31 |
* ScottK disclaims all knowledge of anything in the kernel and suspects that maybe NCommander is thinking of persia. | 16:31 | 
NCommander | we | 16:32 |
NCommander | StevenK | 16:32 |
NCommander | Damn autocomplete | 16:32 |
NCommander | *er | 16:32 |
Keybuk | yeah | 16:33 |
Keybuk | a dozen times now, have the hang just with the modprobe loop with two modules in it | 16:33 |
Keybuk | all I have running is udev (no rename rules or anything - and can't see any side-effects with --debug on) | 16:33 |
Keybuk | even the filesystem isn't writable, and no swap mapped | 16:34 |
rtg | Keybuk: get that stuff in the bug report please. Its gonna be key in finding root cause. | 16:35 |
Keybuk | bug# ? | 16:35 |
rtg | Keybuk: 263059. | 16:36 |
rtg | Keybuk: one of the major differences that I see right off is that the PCI device is disabled at the end of the probe in Intrepid, whereas its left enabled in Hardy. | 16:37 |
Keybuk | weird | 16:43 |
Keybuk | ifconfig eth1 up | 16:43 |
Keybuk | SIOCSIFFLAGS: No such device | 16:43 |
rtg | Keybuk: you should boot Hardy and see if you can still reproduce it. | 16:44 |
Keybuk | this is just with a normal boot trying to get my debugging info out :p | 16:44 |
Keybuk | oh | 16:45 |
Keybuk | had the kill switch on | 16:45 |
Keybuk | heh heh heh | 16:45 |
rtg | borked, huh? | 16:45 |
Keybuk | PEBCAK | 16:45 |
rtg | Keybuk: you mean no lspci? | 17:11 |
Keybuk | no, lspci is still there | 17:11 |
Keybuk | sysfs is still there | 17:11 |
Keybuk | it's even still listed in ifconfig | 17:11 |
Keybuk | but any ioctl return -ENODEV | 17:11 |
Keybuk | and the dmesg implies the interrupts are disabled | 17:12 |
rtg | Keybuk: in fact, the interrupt isn't even assigned if rf-kill is enabled. | 17:12 |
Keybuk | so it's in a state I'm not familiar with ;) | 17:12 |
rtg | Keybuk: yeah, the last thing the driver does is pci_disable_device() | 17:13 |
Keybuk | but it enables it later if the kill switch is off? | 17:14 |
Keybuk | or does it not call pci_disable_device() if the kill switch is off? | 17:14 |
rtg | if wireless is enabled (i.e., kill switch is off), then pci_enable_device() is called at if-up time. | 17:15 |
rtg | this is a change in behavior from Hardy. | 17:15 |
Keybuk | I'm wondering whether there's some invalid state going on based on the kill switch status | 17:16 |
Keybuk | and if the switch is off, the invalid state lasts _less_ time than when it's on | 17:16 |
Keybuk | and while in that invalid state, other pci drivers can hang the system | 17:16 |
rtg | you can't get this to happen if kill-switch is off? | 17:16 |
Keybuk | I can, just much less so | 17:16 |
Keybuk | which to me implies the time window of the hang is less if the switch is off | 17:16 |
Keybuk | when it's on, it's much more regular | 17:16 |
rtg | well, there is certainly less code in the driver being executed when the switch is on. | 17:17 |
trenton_ | Apologies if I intrude, but I have run out of options. I'm trying to get 2.6.27 on hardy. Is this possible? Without compiling, that is. | 17:42 |
rtg | trenton_: kernel-ppa | 17:43 |
rtg | https://edge.launchpad.net/~kernel-ppa/+archive | 17:43 |
trenton_ | rtg: thanks | 17:43 |