[02:47] <cathyal> im just wondering
[02:47] <cathyal> if things break in  ubuntu
[02:47] <cathyal> do we always have to go into CLI to fix things
[02:54] <lifeless> depends how broken it is
[02:54] <lifeless> every operating system I know of has a CLI right at the very bottom :P
[02:55] <lifeless>     ^ current
[02:55] <cathyal> And some you can use without ever having to open it up, others not so much.
[09:44] <AnAnt> Hello, can someone reply to me on these bugs: 283330 & 281451
[09:45] <AnAnt> 283330 is Texas Instruments PCI6411/6421/6611/6621/7411/7421/7611/7621 Secure Digital Controller not working properly 
[09:46] <AnAnt> 281451 is uvesafb (and vesafb) does not support 1280x800 resolution for NVIDIA graphics adapters
[09:50] <amitk> AnAnt: uvesafb is no longer default. The next kernel will revert back to vesafb.
[09:56] <AnAnt> amitk: ok, vesafb has same problem
[14:00] <amitk> rtg: so it has a physical switch to turn off the radios? and it is set to off?
[14:00] <rtg> amitk: it must be one of these slider switches on the front.
[14:02] <amitk> right
[14:02] <rtg> amitk: I'm booting it to be sure
[14:03] <rtg> amitk: yeah, that was it.
[14:05] <rtg> amitk: have you _ever_ gotten yours to hang?
[14:06] <amitk> rtg: you might be right about 3945 being the red herring. I just got it to hang again, right after sdhci and bt init
[14:06] <amitk> this was only the third time 
[14:06] <rtg> amitk: with 3945 blacklisted?
[14:07] <amitk> nope
[14:07] <amitk> unfortunately i just reverted to the distro kernel instead of the instrumented one
[14:42] <Keybuk> [context]
[14:42] <Keybuk> have been looking at the iwl3945 problem
[14:42] <Keybuk> on my laptop, I have a kernel with most things compiled in
[14:42] <Keybuk> only "drivers" are not, and on my laptop, the only driver is the iwl3945 card
[14:42] <Keybuk> it still locks up from time-to-time
[14:42] <Keybuk> and the lock up is during udev module loading, not kernel
[14:43] <Keybuk> it's a Dell Latitude D420, not a Thinkpad
[14:43] <rtg> Keybuk: 32 bit?
[14:43] <amitk> Keybuk: another good data point if it isn't a thinkpad
[14:43] <Keybuk> 32-bit, aye
[14:44] <rtg> Keybuk: what happens if you compile in the 3945 ? I'll bet it stops hanging.
[14:44] <Keybuk> I did not try that :)
[14:45] <rtg> I can do that. How do I automate a reboot?
[14:45] <Keybuk> as in make your laptop continually reboot?
[14:45] <amitk> rtg: reboot in rc.local?
[14:45] <Keybuk> put it in rc.local?
[14:45] <rtg> does that give udev time to settle?
[14:45] <Keybuk> udev settles inside its own init script
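The rc.local suggestion above can be sketched as a small boot counter that reboots the machine until a limit is reached. This is a hypothetical helper, not something anyone in the channel wrote: the counter path, the limit, and the function names are all assumptions, and it would have to be called from rc.local as root.

```python
import subprocess


def next_boot(counter_path, limit):
    """Increment the persisted boot counter.

    Returns (count, should_reboot). The counter survives reboots because
    it lives in a file; a missing or garbled file is treated as zero.
    """
    try:
        with open(counter_path) as f:
            count = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        count = 0
    count += 1
    with open(counter_path, "w") as f:
        f.write(str(count))
    return count, count < limit


def main(counter_path="/var/local/reboot-test.count", limit=50):
    # Hypothetical path and limit; adjust for the machine under test.
    count, again = next_boot(counter_path, limit)
    print(f"boot {count} of {limit}", flush=True)
    if again:
        subprocess.call(["reboot"])
```

If a boot hangs, the counter stays at the number of the last boot that reached rc.local, which gives a rough hang rate once the machine is power-cycled and the loop is allowed to finish.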
[14:46] <Keybuk> if you're seeing the lock up some time after ... then that's a whole big important data point
[14:46] <rtg> Keybuk: exactly
[14:46] <Keybuk> err
[14:46] <Keybuk> rewind
[14:46] <Keybuk> you're seeing the lock-up after udev's init script has exited?
[14:46] <Keybuk> later on in the boot sequence?
[14:47] <rtg> Keybuk: mdz commented out the udevadm settle clause in restart on this laptop.
[14:47] <Keybuk> interesting
[14:47] <Keybuk> I must admit, mine is commented out too
[14:47] <Keybuk> when I put it back, the lock up doesn't happen
[14:48] <rtg> Keybuk: what does that do?
[14:48] <amitk> Keybuk: i would've expected it to be commented out in the start clause
[14:48] <Keybuk> then again
[14:48] <Keybuk> I call settle later on, and the only thing I do in the meantime is activate swap, fsck and mount the root disk
[14:49] <amitk> ..unless udev gets started in initramfs and then (re)started in rootfs
[14:50] <rtg> but what does 'settle' do? How does it alter the timing?
[14:50] <amitk> rtg: man udevadm says it makes sure the udev queue is empty
[14:51] <Keybuk> rtg: udev listens to kernel uevents
[14:51] <Keybuk> obviously by the time it's started, it's missed a *huge* number of them
[14:51] <Keybuk> so it "triggers" them again by walking /sys and writing "add" to all the uevent files
[14:51] <Keybuk> which makes the kernel send them again
[14:51] <Keybuk> so it ends up with a queue of events it needs to process
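The "trigger" step described above (walk /sys, write "add" to every uevent file) can be sketched roughly as follows. This is an illustration of the idea only, not udev's actual implementation; the function name, the `dry_run` flag, and the choice to silently skip unwritable files are assumptions.

```python
import os


def trigger(sys_root="/sys", action="add", dry_run=True):
    """Walk sys_root and write `action` to every file named `uevent`.

    Returns the sorted list of uevent paths found. With dry_run=True
    nothing is written, so the walk can be inspected safely.
    """
    visited = []
    # os.walk does not follow symlinks by default, which matters in /sys
    # because it is full of symlinks pointing back into the device tree.
    for dirpath, _dirnames, filenames in os.walk(sys_root):
        if "uevent" not in filenames:
            continue
        path = os.path.join(dirpath, "uevent")
        visited.append(path)
        if dry_run:
            continue
        try:
            with open(path, "w") as f:
                f.write(action)
        except OSError:
            # Assumption: a file that refuses the write is simply skipped.
            pass
    return sorted(visited)
```

Calling `trigger()` with the defaults only lists the files; `trigger(dry_run=False)` as root would ask the kernel to resend the events, which is what fills the queue mentioned above.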
[14:51] <rtg> so in effect that means it waits for modules to finish loading before proceeding ?
[14:51] <Keybuk> "settle" does not exit until the kernel's event seqnum and udev's event seqnum are equal
[14:51] <Keybuk> err
[14:51] <Keybuk> kiiiiinda
[14:52] <Keybuk> it means that all of the devices that the kernel knew about when "trigger" was run have at least had first-stage processing completed
[14:52] <Keybuk> this may involve loading modules, yes
[14:52] <Keybuk> and would wait for those modprobe commands to finish
[14:52] <Keybuk> *BUT*
[14:52] <Keybuk> loading those modules may have additional side-effects, such as further probing, further devices or interfaces showing up
[14:52] <Keybuk> and those may have yet more modules to load
[14:52] <Keybuk> settle may not wait for those
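The "settle" behaviour described above (do not return until udev's event sequence number has caught up with the kernel's) can be sketched as a polling loop. This is a simplified model, not udevadm's code: the two reader callables, the timeout, and the poll interval are assumptions. The kernel's number is read here from /sys/kernel/uevent_seqnum; how udev's own number is obtained is left to the caller.

```python
import time


def kernel_seqnum(path="/sys/kernel/uevent_seqnum"):
    """Read the kernel's current uevent sequence number."""
    with open(path) as f:
        return int(f.read().strip())


def settle(read_kernel_seqnum, read_udev_seqnum, timeout=30.0, interval=0.1):
    """Poll until udev's seqnum has reached the kernel's seqnum.

    Both readers are callables returning an int. Returns True once udev
    has caught up, False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        # The kernel number is re-read on every pass, so events that
        # arrive while we are waiting are waited for as well.
        if read_udev_seqnum() >= read_kernel_seqnum():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Events that only show up after the loop has returned, for example devices created by a module that finished probing later, are not covered. That matches the caveat above that settle may not wait for those.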
[14:53] <rtg> hmm, quite asynchronous (as it should be)
[14:55] <Keybuk> iwl3945 has no firmware?
[14:55] <amitk> it does
[14:55] <Keybuk> weird
[14:55] <Keybuk> I can't see the call for it
[14:56] <ajunior> exist a solution (patch) for e1000e driver?
[14:56] <rtg> its called ucode in the driver
[14:56] <Keybuk> oh
[14:56] <Keybuk> I thought all the firmware got separated out?
[14:56] <rtg> Keybuk: iwl3945_read_ucode()
[14:56] <Keybuk> there's a /lib/firmware/iwlwifi-3945-1.ucode after all
[14:56] <amitk> ajunior: fixed in the ubuntu tree. will be in next kernel.
[14:57] <Keybuk> rtg: right, that uses request_firmware()
[14:57] <Keybuk> my point is that I can't *see* that request from userspace
[14:57] <ajunior> TKS
[14:58] <rtg> Keybuk: I think thats not relevant. with rf-kill switch enabled the hang still happens.
[14:58] <Keybuk> well, firmware loading affects timings that's all
[14:58] <Keybuk> as in the modprobe will have to request a firmware
[14:58] <Keybuk> which is another udev event
[14:59] <Keybuk> which may affect settle (though shouldn't, because I think the modprobe blocks for it)
[14:59] <rtg> Keybuk: agreed. but in this case it doesn't seem to make a difference.
[15:01] <Keybuk> no...
[15:01] <Keybuk> I'm just weirded out by the missing request :p
[15:02] <Keybuk> oh
[15:02] <Keybuk> no
[15:02] <Keybuk> I'm just being stupid
[15:02] <Keybuk> I see it now
[15:02] <Keybuk> I forgot that firmware now appears as $DEVPATH/firmware/$ID
[15:02] <Keybuk> rather than /sys/firmware
[15:02] <Keybuk> :p
[15:04] <Keybuk> could it be related to device renaming perhaps?
[15:05] <rtg> Keybuk: you mean eth1 --> wlan0 ?
[15:05] <Keybuk> swapping eth1 and eth0
[15:05] <Keybuk> and yeah, renaming wlan0 to eth1 :p
[15:05] <rtg> Keybuk: that seems like something the nested lock checking would catch.
[15:06] <rtg> its an rtnl lock
[15:06] <amitk> there was a related renaming issue queued for the stable tree that I pulled in: 7b54c00efa87f519ae30f09bdbb11aaf6644605f
[15:06] <Keybuk> we don't call ifup on the device unless it's in /e/n/i (it isn't for me) and that's only after the device appears
[15:06] <Keybuk> could it be something network manager is doing on the device?
[15:07] <Keybuk> though that is quite late in the boot relatively
[15:07] <rtg> Keybuk: the hang happens long before X starts, so no NM in play.
[15:08] <Keybuk> NM starts in userspace first
[15:08] <Keybuk> rc2/S28
[15:08] <rtg> hmm, ok
[15:09] <Keybuk> my really stripped down boot replicates it, you see
[15:09] <Keybuk> that is
[15:09] <Keybuk>  udevd
[15:09] <Keybuk>  trigger
[15:09] <rtg> amitk: I have that commit in my test kernel.
[15:09] <Keybuk>  swapon -a -e
[15:09] <Keybuk>  fsck /
[15:09] <Keybuk>  mount /
[15:10] <Keybuk>  update mtab, make tmp directories
[15:10] <Keybuk>  udev settle
[15:10] <Keybuk>  start dbus
[15:10] <Keybuk>  start hal
[15:10] <Keybuk>  start gdm
[15:10] <Keybuk>  ifup -a
[15:10] <Keybuk>  start NM
[15:11] <Keybuk> --
[15:12] <rtg> Keybuk: can you get the kernel to trigger a stack dump or anything?
[15:13] <rtg> mine locks so hard that caps-lock light is wedged.
[15:14] <Keybuk> no :-/
[15:15] <rtg> Keybuk: do you mind getting stuck with this bug? I've gotta send David's laptop back to him pretty soon.
[15:15] <rtg> plus, I'm a bit out of my depth.
[15:16] <Keybuk> "getting stuck" ?
[15:16] <rtg> Keybuk: how else should I phrase it? I'm assigned to it in LP.
[15:18] <Keybuk> I don't mind helping debug in spare time, but I don't really have enough free time to devote myself to it :-/
[15:19] <rtg> Keybuk: I think if we work around it by delaying the i3945 module load, we can side step the issue for now. I believe it _will_ come back to haunt us during our effort to improve boot times.
[15:19] <Keybuk> delaying how?
[15:19] <rtg> insert a 5 second delay in the modprobe entry.
[15:20] <Keybuk> it's not clear to me that it's actually racing anything but itself
[15:20] <Keybuk> sleep 5 would just extend the udevsettle by 5s too :p
[15:20] <rtg> Keybuk: I am quite sure its not an issue with the i3945 driver, at least not directly. Jamie can reproduce it with ipw2200.
[15:22] <Keybuk> what else do you think it is?
[15:22] <Keybuk> \o/
[15:22] <Keybuk> just got it without "quiet" ;)
[15:23] <AnAnt> Hello, can someone look at those bugs: 283330 & 281451
[15:23] <rtg> I've seen the i3945 complete its initialization before hanging. with rf-kill enabled, thats the last thing that happens in that device driver. hangs occur randomly during and after i3945 module loading.
[15:24] <Keybuk> rf-kill enabled?
[15:24] <rtg> Keybuk: the little slider switch  that disables wireless.
[15:25] <Keybuk> right, that's disabled for me
[15:25] <Keybuk> ie. radio is on
[15:25] <AnAnt> I have reported a bug similar to 283330 last year (111756), yet I never got any response about it yet
[15:25] <rtg> ok, the other way around.
[15:25] <Keybuk> first hang was after
[15:26] <Keybuk> iwl3945 0000:0c:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[15:26] <Keybuk> second hang had two more messages
[15:26] <Keybuk> iwl3945: Detected Intel Wireless WiFi Link 3945ABG
[15:26] <Keybuk> iwl3945: Tunable channels: 13 802.11bg, 23 802.11a channels
[15:26] <Keybuk> so it's not quite happening at the same point each time
[15:27] <rtg> Keybuk: I've found that with wireless 'enabled' the hang happens less often. Its about 50/50 with wireless disabled.
[15:27] <rtg> and it definitely happens at different points in the boot sequence.
[15:28] <Keybuk> right
[15:28] <Keybuk> but I've eliminated that from my tests
[15:28] <Keybuk> it is entirely unrelated to any software alongside
[15:28] <Keybuk> I get the hang with only the modprobe :p
[15:29] <rtg> so, you're saying it really _is_ the i3945 driver?
[15:29] <Keybuk> it looks like it to me
[15:30] <rtg> what about the ipw2200 data point?
[15:30] <Keybuk> what's the other network card in the Thinkpad?
[15:30] <rtg> e1000e
[15:30] <Keybuk> tg3 here
[15:33] <AnAnt> bdmurray: there you are
[15:34] <Keybuk> rtg: am trying loading only the 3945 driver in isolation repeatedly
[15:34] <Keybuk> so far, it doesn't hang
[15:34] <Keybuk> which is interesting
[15:35] <rtg> Keybuk: if you're really sure its i3945, then that lends credence to my suspicion that its hardware related. Perhaps its some kind of PCI bus setup issue?
[15:36] <Keybuk> well, I'm confused as to what else it could be
[15:37] <rtg> The Hardy driver is quite different. Jamie was unable to reproduce this hang with 2.6.26-5.17-generic, all subsequent releases _did_ hang.
[15:37] <amitk> Keybuk: lool has already tried the modprobe/modprobe -r test on the 3945 over 700 times. Couldn't reproduce it.
[15:38] <Keybuk> amitk: yeah
[15:38] <Keybuk> that's downright weird
[15:40] <Ng> I forget if I asked this before, but is there any display corruption ever?
[15:41] <rtg> Ng: related to this hang?
[15:41] <Keybuk> it seems like it only hangs if udev calls modprobe ?!
[15:42] <Ng> rtg: yeah
[15:42] <rtg> Ng: I've never noticed any.
[15:42] <Ng> ok, 'cos my random boot hang that does have corruption seems to happen shortly after iwl4965 loads
[15:42] <Ng> well, iwlagn now, but whatever
[15:44] <rtg> I wonder if this is something we ought to solicit Intel's help with?
[15:45] <Ng> err no wait I'm on crack, it's intel_agp that's implicated in mine. As you were ;)
[15:51] <NCommander> amitk, ping, if you get this before you commit my patch
[15:52] <amitk> NCommander: done
[15:52] <rtg> Keybuk: would you comment in #263059 why you think its driver related. I'm not exactly sure how you've come to that conclusion. In the meantime I've gotta get this laptop shipped.
[15:52] <Keybuk> well
[15:53] <Keybuk> I don't get the hang if I don't load this driver ;)
[15:53] <rtg> I agree with that, but its not conclusive.
[15:53] <Keybuk> I *think* it has something to do with the interface being renamed
[15:54] <Keybuk> that always seems to be the last thing that happens before the hang
[15:55] <rtg> well, I have a little time. I'll instrument that code. its pretty straightforward IIRC.
[15:55] <Keybuk> and if I disable that rule in udev, I haven't had the hang yet
[15:55] <rtg> Keybuk: I don't read udev very well. which rule is that?
[15:55] <Keybuk> the persistent-net ones
[15:56] <Keybuk> you need to remove the 7x one that has the rule
[15:56] <Keybuk> and the 6x one to stop the rule coming back
[15:56] <Keybuk> what's ecb(arc4) ?
[15:57] <rtg> dunno
[15:57] <elmo> crypto stuff
[15:58] <Keybuk> hmm
[15:58] <rtg> Keybuk: which 60* rule are you talking about? Can't see anything net related.
[15:59] <Keybuk> sorry
[15:59] <Keybuk> 75-persistent-net-generator.rules
[15:59] <Keybuk> and 70-persistent-net.rules
[16:00] <rtg> 75-persistent-net-generator.rules whitelists eth and wlan, so won't they be ignored?
[16:00] <Keybuk> I just moved the files out of the way :p
[16:01] <rtg> are these rules created during install according to installed HW ?
[16:02] <Keybuk> each boot
[16:02] <Keybuk> though I just ruled that out
[16:02] <Keybuk> finally got one to hang without them
[16:02] <Keybuk> they were just adding time
[16:02] <rtg> hmm, dead-end
[16:03] <Keybuk> this is baffling
[16:03] <Keybuk> it did:
[16:03] <Keybuk>  modprobe tg3 (for eth0)
[16:03] <Keybuk>  seems to have finished
[16:03] <Keybuk>  iwl3945 messages
[16:03] <Keybuk>  Tunable channels blah
[16:03] <Keybuk>  modprobe ecb(arc4)
[16:04] <Keybuk>  iwl3945 0000:0c:00.0: PCI INT A disabled
[16:04] <Keybuk>  *hang*
[16:04] <rtg> so, it didn't make it to the device rename.
[16:04] <Keybuk> I didn't include the device renaming stuff
[16:05] <Keybuk> so it would never try
[16:05] <rtg> by the time you see 'iwl3945 0000:0c:00.0: PCI INT A disabled', the module init is complete.
[16:09] <Keybuk> do you get the hang if you disable the e1000 from loading?
[16:10] <rtg> Keybuk: lemme try.
[16:13] <rtg> Keybuk: still locks up
[16:15] <Keybuk> ooooh
[16:15] <Keybuk> "Dazed and confused, but trying to continue"
[16:21] <Keybuk> the full error message of that
[16:21] <Keybuk> Uhhuh. NMI received for unknown reason b1 on CPU 0.
[16:21] <Keybuk> You have some hardware problem, likely on the PCI bus.
[16:22] <Keybuk> Dazed and confused, but trying to continue.
[16:22] <Keybuk> -- 
[16:22] <rtg> Keybuk: my feeling exactly
[16:22] <Keybuk> while loading the iwl3945 driver
[16:22] <Keybuk> rtg: you think this is a hardware problem?
[16:22] <rtg> it's certainly hardware-ish
[16:23] <Keybuk> I tend to doubt that hypothesis
[16:23] <Keybuk> because the hardware in question isn't even one piece, but across multiple types of laptop
[16:23] <Keybuk> and its only *this* driver that triggers it
[16:23] <Keybuk> or, at least, this family of drivers :p
[16:24] <rtg> Keybuk: could be the family of PCI interface chips used in these adapters.
[16:24] <Keybuk> why would it only occur now, with 2.6.27 ?
[16:24] <Keybuk> and why only when loading the iwl* driver alongside another driver
[16:24] <Keybuk> sounds much more like the driver is failing to correctly lock out the PCI bus to me
[16:25] <rtg> Keybuk: whats the other driver? I disabled e1000e and it still happened.
[16:25] <Keybuk> any pci driver fwict
[16:25] <Keybuk> both tg3 and the pcmcia socket adapter seem to do it for me
[16:25] <Keybuk> but only when combined with iwl
[16:25] <rtg> Keybuk: that kind of makes sense. 
[16:26] <Keybuk> that's why the modprobe loop doesn't work
[16:26] <Keybuk> you're only loading/unloading iwl3945
[16:26] <Keybuk> but if you load two drivers at once, you get the hang or crash
[16:26] <rtg> which 2?
[16:26] <Keybuk> (I'm doing while true; do modprobe iwl3945 & modprobe tg3; done)
[16:26] <rtg> and that hangs?
[16:26] <Keybuk> yeah
[16:27] <Keybuk> and if it doesn't hang, I get that cute error message sometimes ;)
[16:27] <rtg> are you unloading them first?
[16:27] <Keybuk> yes
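The reproduction loop described above (unload both modules, then load them at the same time, repeat) can be sketched like this. It is a hedged reconstruction of the one-liner, not the exact script that was run: the function names, the iteration count, and the injectable `popen` parameter are assumptions added so the sequencing can be checked without touching real modules. Running it for real requires root and may hang the machine, which is the point of the test.

```python
import subprocess


def cycle_commands(modules):
    """Return (unload, load) command lists for one iteration."""
    unload = [["modprobe", "-r", m] for m in modules]
    load = [["modprobe", m] for m in modules]
    return unload, load


def run_cycle(modules, popen=subprocess.Popen):
    """Unload each module in turn, then load all of them concurrently.

    Returns the exit codes of the load commands.
    """
    unload, load = cycle_commands(modules)
    for cmd in unload:
        popen(cmd).wait()  # sequential; a failed unload is ignored
    # Start every load before waiting on any of them, so the module
    # init routines overlap the way they do when udev loads drivers.
    procs = [popen(cmd) for cmd in load]
    return [p.wait() for p in procs]


def main(modules=("iwl3945", "tg3"), iterations=100):
    for i in range(iterations):
        # The last line printed shows which cycle the machine hung in.
        print(f"cycle {i}", flush=True)
        run_cycle(modules)
```

The important difference from the single-module modprobe/modprobe -r loop mentioned earlier is that two PCI drivers are probing at once; loading them one after the other would not exercise the same window.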
[16:27] <rtg> outstanding.
[16:27] <rtg> finally, got it to do something.
[16:28] <rtg> or you did, rather.
[16:29] <rtg> Keybuk: so, I should look at the difference between Hardy and Intrepid start-up.
[16:29] <Keybuk> what's the difference between the drivers?
[16:30] <rtg> Keybuk: in Intrepid the i3945 init is separate from the other iwl drivers. They were more common in Hardy. Beyond that I'll have to study them a bit before I know more.
[16:31] <NCommander> does anyone else here work on the lpia kernel beside ScottK and amitk who can answer some questions?
[16:31]  * ScottK disclaims all knowledge of anything in the kernel and suspects that maybe NCommander is thinking of persia.
[16:32] <NCommander> we
[16:32] <NCommander> StevenK
[16:32] <NCommander> Damn autocomplete
[16:32] <NCommander> *er
[16:33] <Keybuk> yeah
[16:33] <Keybuk> a dozen times now, have the hang just with the modprobe loop with two modules in it
[16:33] <Keybuk> all I have running is udev (no rename rules or anything - and can't see any side-effects with --debug on)
[16:34] <Keybuk> even the filesystem isn't writable, and no swap mapped
[16:35] <rtg> Keybuk: get that stuff in the bug report please. Its gonna be key in finding root cause.
[16:35] <Keybuk> bug# ?
[16:36] <rtg> Keybuk: 263059.
[16:37] <rtg> Keybuk: one of the major differences that I see right off is that the PCI device is disabled at the end of the probe in Intrepid, whereas its left enabled in Hardy.
[16:43] <Keybuk> weird
[16:43] <Keybuk> ifconfig eth1 up
[16:43] <Keybuk> SIOCSIFFLAGS: No such device
[16:44] <rtg> Keybuk: you should boot Hardy and see if you can still reproduce it.
[16:44] <Keybuk> this is just with a normal boot trying to get my debugging info out :p
[16:45] <Keybuk> oh
[16:45] <Keybuk> had the kill switch on
[16:45] <Keybuk> heh heh heh
[16:45] <rtg> borked, huh?
[16:45] <Keybuk> PEBCAK
[17:11] <rtg> Keybuk: you mean no lspci?
[17:11] <Keybuk> no, lspci is still there
[17:11] <Keybuk> sysfs is still there
[17:11] <Keybuk> it's even still listed in ifconfig
[17:11] <Keybuk> but any ioctl returns -ENODEV
[17:12] <Keybuk> and the dmesg implies the interrupts are disabled
[17:12] <rtg> Keybuk: in fact, the interrupt isn't even assigned if rf-kill is enabled.
[17:12] <Keybuk> so it's in a state I'm not familiar with ;)
[17:13] <rtg> Keybuk: yeah, the last thing the driver does is pci_disable_device()
[17:14] <Keybuk> but it enables it later if the kill switch is off?
[17:14] <Keybuk> or does it not call pci_disable_device() if the kill switch is off?
[17:15] <rtg> if wireless is enabled (i.e., kill switch is off), then pci_enable_device() is called at if-up time.
[17:15] <rtg> this is a change in behavior from Hardy.
[17:16] <Keybuk> I'm wondering whether there's some invalid state going on based on the kill switch status
[17:16] <Keybuk> and if the switch is off, the invalid state lasts _less_ time than when it's on
[17:16] <Keybuk> and while in that invalid state, other pci drivers can hang the system
[17:16] <rtg> you can't get this to happen if kill-switch is off?
[17:16] <Keybuk> I can, just much less so
[17:16] <Keybuk> which to me implies the time window of the hang is less if the switch is off
[17:16] <Keybuk> when it's on, it's much more regular
[17:17] <rtg> well, there is certainly less code in the driver being executed when the switch is on.
[17:42] <trenton_> Apologies if I intrude, but I have run out of options. I'm trying to get 2.6.27 on hardy, is this possible? Without compiling, that is.
[17:43] <rtg> trenton_: kernel-ppa
[17:43] <rtg> https://edge.launchpad.net/~kernel-ppa/+archive
[17:43] <trenton_> rtg: thanks