[00:35] hi guys, trying to file a bug, need to figure out which package is appropriate, a little help?
[00:36] eltoozero: what kind of bug
[00:36] bug #420363
[00:36] Malone bug 420363 in gnome-bluetooth "[eee 901][karmic]system freeze when toggling wifi while bluetooth disabled" [Undecided,New] https://launchpad.net/bugs/420363
[00:37] as you can see I originally filed as bluetooth
[00:37] but it happens when wifi is toggled, but only after bt is toggled.
[00:37] there is a video of the freeze attached to the bug.
[00:38] full freeze, black screen, a console cursor and mouse cursor remain. REISUB is ineffective, full shut down required.
[00:39] I was just now directed from #ubuntu-bugs to inquire here, michag suggested I check syslog to see if there is wisdom to be gained there
[00:39] eltoozero: kernel is the linux package
[00:40] yes, what can I do to establish if this is a kernel related issue?
[00:41] I suppose the alternative would be the rt2860, or bluetooth, that's what I need help with.
[00:42] eltoozero: if it is a full freeze it is at least partially kernel related, even if it is a driver
[00:44] jjohansen1, ok cool, what can I do to help?
[00:45] eltoozero: hrmm can you trigger it using rfkill from a vt?
[00:46] that's something to try; for clarification, by vt you mean ctrl-alt f1
[00:47] yes
[00:47] cool, I'd like to see if anything shows up in syslog as well, I'll test that, thank you.
[00:47] doing it from the vt console if rfkill triggers the bug we might get a trace back
[00:48] brb, reproducing freezes is fun. :)
[00:50] hang on now, when I do an rfkill list, I get two bluetooth adapter entries...
[00:50] eeepc-bluetooth: Bluetooth, and hci0: Bluetooth
[00:51] That's normal
[00:52] thank you mjg59, on to the crashing.
[00:54] of course it works fine from vt...
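Note on the two Bluetooth entries in the rfkill listing: the kernel exposes each radio kill switch as a directory under /sys/class/rfkill, so a platform switch (here presumably the eeepc one) and the adapter's own switch (hci0) can both appear with type "bluetooth". The following is only a minimal sketch of reading that listing directly; the function name `list_rfkill` and the `STATE_NAMES` table are hypothetical helpers, and the meaning of the `state` values (0, 1, 2) is an assumption taken from the kernel's rfkill sysfs documentation, not something stated in this log.

```python
from pathlib import Path

# Assumed meaning of the sysfs `state` attribute (per kernel rfkill docs).
STATE_NAMES = {"0": "soft blocked", "1": "unblocked", "2": "hard blocked"}


def list_rfkill(root="/sys/class/rfkill"):
    """Return (name, type, state) tuples for each rfkill switch under root.

    `root` is a parameter so the function can be pointed at a test
    directory; on a real system the default sysfs path would be used.
    """
    switches = []
    base = Path(root)
    if not base.is_dir():
        return switches  # no rfkill support, or not running on Linux
    for entry in sorted(base.iterdir()):
        try:
            name = (entry / "name").read_text().strip()
            kind = (entry / "type").read_text().strip()
            state = (entry / "state").read_text().strip()
        except OSError:
            continue  # skip entries that are missing an attribute
        # Fall back to the raw value if the state code is not in the table.
        switches.append((name, kind, STATE_NAMES.get(state, state)))
    return switches
```

Calling `list_rfkill()` from a VT and printing the tuples should give roughly the same view as `rfkill list`, which makes it easy to compare the switch states before and after pressing the hotkey.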
[00:55] the problem is triggered from the fn-f2 hotkey, at least that's how I've been reproducing it
[00:56] if I can only get it to happen via that method, and not via rfkill, what does that indicate?
[00:56] that rfkill isn't the problem :)
[00:57] so, reproduce the way I know how and check syslog I suppose would be the next logical step?
[00:57] yes
[00:59] though if it is a full freeze I don't expect anything will make it to the log
[01:03] eltoozero: when you tried the console, did you try getting bluetooth into an unknown state via graphical interface and then switching to console and using rfkill
[01:06] jjohansen1, nothing in syslog before the reboot
=== eltoozero_ is now known as eltoozero
[01:07] eltoozero: when you tried the console, did you try getting bluetooth into an unknown state via graphical interface and then switching to console and using rfkill
[01:08] yes
[01:08] unintentionally, it was in the unknown state to begin with
[01:08] the rfkill toggle does not affect the unknown status of the adapter as shown in the indicator applet
[01:09] okay, so the question becomes what is different between using rfkill on the console and the hotkey
[01:09] on top of that I can't always reproduce the unknown status on the adapter, it will freeze even if it is reporting on/off correctly, bluetooth just has to be toggled.
[01:11] eltoozero: yes but it does show something is broken in the driver
[01:14] eltoozero: can you include the command that is invoked by your hotkey
[01:16] jjohansen1: on a different box so I don't drop myself like a fool
[01:17] eltoozero: hrmm, this could also be a bias issue
[01:17] s/bias/bios
[01:17] k, hit me.
[01:17] fn+f2 will kick would be bios/acpi
[01:17] I'm running the most updated bios for my hw, I can disable bt in the bios and when I boot it's recognized as such
[01:18] initially I tried shutting wifi/bt off in the bios, enabling via gui, then shutting off to reproduce the bug
[01:18] everything works until that last wifi disable, but it does disable
[01:18] right, what I mean is it might not be playing nice with the driver
[01:18] access the bios right after the crash, it's disabled.
[01:23] I did find this previously, https://wiki.ubuntu.com/Hotkeys/Troubleshooting
[01:23] eltoozero_: hrmm I found a couple other references to fn-f2 freezing the machine
[01:23] it appears the key is generating the XF86WLAN event
[01:24] rfkill may freeze Eees if you're using one of the broken vendor wireless drivers
[01:24] They don't play nice with the PCI hotplug
[01:25] sounds good, what would be a suggested workaround? locate a more appropriate driver?
[01:25] Which driver are you using?
[01:25] rt2860
[01:25] can I be more specific?
=== eltoozero_ is now known as eltoozero
[01:27] Yeah that's the vendor driver from staging
[01:27] I see in the dmesg it says rt2860sta and "you have been warned"
[01:28] (love it)
[01:28] rt2800pci doesn't seem to have been merged to mainline yet
[01:28] Not sure what the situation is there
[01:30] mjg59: https://lists.ubuntu.com/archives/kernel-team/2009-July/006564.html
[01:31] Yeah
[01:31] So right now you're stuck with the broken driver
[01:31] hey, them's the breaks.
[01:32] But I suspect that's your problem
[01:32] And that if you unload rt2860 you won't be able to generate the hang
[01:32] that would be my bet as well
[01:33] sounds reasonable, I'm sure it's really just a better idea to get a well supported wifi chip.
[01:34] It should be reasonably supported in the not-too-distant future
[01:34] mjg59: where would I go to aid in that goal?
[01:35] Searching linux-wireless may be your best bet
[01:35] The mailing list
[01:36] cool, so the drivers that are included are likely _not_ the manufacturers drivers, because of copyright/firmware, correct?
[01:36] the question being, is it worth it to try their reasonably up-to-date source for the chip
[01:37] The long-run plan is to have a rewritten driver that actually uses the Linux wifi stack
[01:37] Rather than the vendor drivers, which don't integrate cleanly
[01:37] rt2860 is a lightly cleaned up version of the manufacturer driver
[01:37] rt2800pci will be the rewritten Linux one
[01:37] from the rt2800pci page it doesn't seem to indicate my chipset in their support list
[01:38] based on the following Ralink chipsets: rt2400, rt2500, rt2570, rt61 and rt73.
[01:38] but I totally follow you
[01:39] very helpful guys, thanks for all the info and guidance.
[01:39] mjg59, jjohansen1, keep up the good work.
[02:22] mjg59: do you mean that there is a tiny chance that there will be Ralink drivers that support multithreaded/multicore CPUs after all those years? ;)
[02:23] One hopes
[02:26] I've never had real problems to get them work on a P3, but IME anything like a P4, Core/Core2 Duo, etc. will magically work or not work depending on the system's mood today... :-/
[02:26] (sounds like timing-based concurrency problems...)
[05:03] ericm_, ikepanhc are you guys using grub2 in your karmic?
[05:04] i did see any menu for me to select the right kernel for booting
[05:04] cooloney, y
[05:04] cooloney, did or didn't?
[05:04] didnt
[05:04] is that a fresh installation?
[05:05] cooloney: not yet, I am still using grub to boot karmic kernel/filesystem
[05:05] cooloney: you mean you can not find menu.lst?
[05:05] cooloney: grub2 has different format and name
[05:06] ericm_, yup fresh
[05:06] ikepanhc, indeed
[05:06] ikepanhc, it should be the grub.cfg
[05:06] cooloney, that shouldn't be
[05:07] cooloney, my fresh installation here works perfect
[05:07] TheMuso: I notice that a while back you removed keyboard map hooks from initramfs.conf, stating that console-setup takes care of that. now, where is the console-setup hook to copy that keyboard map to initrd?
[05:07] ikepanhc, but when i boot my machine, there is no such menu for me to pick up a right kernel
[05:07] It is automatically generated by /usr/sbin/grub-mkconfig
[05:08] ikepanhc, great, grub2 config is right, but i did not see the menu selection screen
[05:08] did you?
[05:08] cooloney, then what did you see on your boot? And can you even boot into _one_ kernel?
[05:08] cooloney: I am not using it, I still spend most of my time on Jaunty
[05:08] ericm_, just booting into our splash screen, progress bar then enter gdm
[05:09] very quick
[05:09] ikepanhc, got it.
[05:09] i just wanna try an old version kernel. but the grub2 don't let me choose
[05:10] cooloney, ESC doesn't help right?
[05:12] ericm_, just tried ESC, too quick for me, i missed
[05:12] ericm_, booting too quick is a pain for developer, heh
[05:12] cooloney, hammer on your desktop to slow it down
[05:14] cooloney, if you hold the shift key you should get the grub2 menu (so I've been told)
[05:15] bjf-afk, thanks god, let me try. bjf-afk is every where, cheers
[05:20] bjf-afk, i appreciate, shift works,
[05:23] cooloney, from what I can tell *lots* of people are running into the same issue :-)
[05:27] bjf-afk, i have to recode this shift trick into my Zim wiki notes, thanks, -:)
=== ericm_ is now known as ericm-lunch
=== cooloney is now known as cooloney-afk
=== cjwatson_ is now known as cjwatson
[09:19] is anyone familiar with an issue where plugging in an earphone after resuming from suspend/hibernate does not mute the speakers?
[09:48] howdy! what should we do about bug #396286 ? we've narrowed down the exact upstream git commit that broke it and filed a bug upstream in their bugzilla.
[09:48] Malone bug 396286 in linux "2.6.31-generic: kernel panic near the end of initramfs" [High,In progress] https://launchpad.net/bugs/396286
[09:48] or actually, the first commit in a series of subsequent ones that made it difficult to simply revert one git commit.
[09:49] ogasawara helped me track down the exact commit but, at this point, we'd need people who actually know kernel internals to jump in and help LKML fix it.
[09:49] Q-FUNK, Let me have a look at it
[09:50] smb: thanks! :)
[09:58] Q-FUNK, Interesting. Seems atm only the nag-mails from Rafael come in. That patch that seems to break things looks nicely short... I try to sink into that a bit. But for completeness: what kinds of filesystems are involved? Only ext3, ext4, other?
[10:03] ext3
[10:04] well, it's not just one commit. it's the first commit in a long series to implement.. was it ACL... in various filesystems.
[10:04] and indeed, no feedback except automated nag mails from Rafael.
[10:04] Yeah posix acl... thingies
[10:04] or well, Ingo Molnar replied on LKML that he'd rather get the crash dump as plain text rather than a picture and then he left it at that.
[10:06] IIRC, Al Viro, who made those changes, was the one who said that he could not see how adding posix acl to /sys would make the kernel crash at bootup.
[10:06] The interesting part is why does it only blow into your face... Which is always hard if that happens before you have something to write it to... A problem with the pictures often is that you miss the (more valuable) top part. Or parts as often there are more than one
[10:07] AFAIK neither of them bothered actually checking what my most recent snapshot shows.
[10:07] right
[10:07] here, to prevent that, I'm purposely booting with a huge framebuffer. it fits more content on screen.
[10:08] 1280x1024, IIRC
[10:08] yeah, this does not look bad and has the top of it
[10:09] ok, sort of when cleaning an inode up.
[10:09] yup, right from the uname at the start of the crash until the cursor stopped flashing.
[10:10] somewhat tiny fonts, but I use a tripod to mount the camera, so the amount of fuzz is minimized.
[10:10] Its definitely one of the better ones. My eyes do not burn while looking at it. :)
[10:11] and yes, it seems that it buggers on some inode.
[10:12] however, it's unclear to me whether it's an inode on the / filesystem in ext3 or something else.
[10:13] I cannot remember if it was Ingo or Al who suggested that this looks like a crash while mounting /sys
[10:14] Let me spend a bit of time to think about it. It could well be that it is one of the virtual fs'es and maybe some other code that forgets to do things right. Somehow it must be something that is usually not used, otherwise everybody would see it
[10:15] It might be helpful to know what is special on a Geode (never saw one)
[10:15] I could paste the cpuinfo
[10:15] It's basically a 586 with some extra more recent instructions, plus some amd-specific things
[10:16] Its maybe not the cpu only but the system as a whole. Meh, not very specific
[10:17] should I paste fstab also?
[10:18] Wait a bit with that. For the moment I guess there is reasonable evidence. If I think we need more I would ask in the report...
[10:19] Last sysfs file touched was sys/power/resume... thats also good to know
[10:22] ah. acpi issue then?
[10:24] Sorry, it might be somewhere in the report but have you tried with a busting version and acpi=off on the command line?
[10:25] busting version?
[10:26] err, I was trying to say a kernel version that usually does not boot
[10:26] ah, right
[10:26] no, but I could try. just a sec.
[10:29] still crashes
[10:30] noticed the line on top that says BUG: could not handle paging request at *****
[10:30] did something try to request tons of memory and barfed with a buffer overflow?
[10:32] My suspicion would rather be (as it starts with that acl patch) that one of those pointers get set to something not -1 without being a real address and then are tried to get freed on destroy inode... But why needs a bit more meditation
[10:38] ok
[10:41] Hm, actually you could do me a favour and get me the address of the first BUG that happens on the current karmic kernel -10.something. I just need the IP: ... (like __destroy_inode+...) Then I check this more exactly here
[10:41] If you would add that info to the report I can check up later on
[11:01] is this where i can ask kernel related question...?
[12:40] apw: when's a good time to send you a kernel patch?
[12:41] Keybuk, is there a bad time? clearly if you want it in A6, its going to be tight
[12:41] sent to kernel-team
[12:41] it's a fairly critical bug fix
[12:41] and we need to test it first, to get the other half of the bug fix
[12:42] Keybuk, what do you need?
[12:42] a kernel with that bodged on, so you can test?
[12:42] yes please
[12:42] it's a fix for ext4 to not write the "last write time" field of the superblock when the filesystem is mounted read-only
[12:43] that would be clearly against the spirit of read-only
[12:43] indeed
[12:43] (it's replaying the journal from an unclean shutdown)
[12:44] the problem is that it gets that time from the system clock
[12:44] yeah i got hit by that 'your last mount was in the future, full fsck engaged'
[12:44] when the filesystem is mounted read-only on boot (by either the kernel or the initramfs), the system clock contains whatever was in the hardware clock
[12:44] and that's not necessarily UTC
[12:44] right
[12:44] this is the fix ;)
[12:44] why we never hit it before?
[12:44] because Ted has blamed Ubuntu for the apparent clock issues
[12:44] saying we had "buggy init scripts"
[12:45] and he even included a hack in e2fsck that checked whether you were running on Ubuntu, and simply ignored the issue
[12:45] sounds like ted :)
[12:45] I removed the hack, because we *don't* have buggy init scripts, because we spent a long time making sure they were right
[12:45] and then these bugs started showing up again
[12:45] All problems exist in userspace
[12:45] it's refreshing to discover that they are actually kernel bugs in the ext3/4 filesystem code ;)
[12:45] good on you for your being pig headed and stubborn :)
[12:46] the only reason they seemed to affect Ubuntu is because
[12:46] err
[12:46] we have users
[12:46] hehe shhhh
[12:46] everyone will want them
[12:46] The real issue is how widespread having the clock in local time is
[12:46] I suspect that's far more widespread with Ubuntu than most distributions
[12:47] windows defaults to that
[12:47] Quite
[12:47] anyone who dual-boots with windows will have it in local time
[12:47] so any dual boot setup will have it
[12:47] it's probably true that we have more "ordinary users" who do such a thing
[12:47] so we are more vulnerable to it through that
[12:47] whereas other distros have a higher proportion of Linux purists
[12:47] all true. but a bug is a bug, and finding and squashing it is good
[12:47] Keybuk, a PPA needed, or some downloadable .deb's enough for you for this testing
[12:48] and if the latter, what arch is most useful so i can build that first
[12:48] downloadable is fine
[12:48] _i386
[12:48] that point, for people who are east of GMT and who make their clock
[12:48] tick in localtime for Windows bug-for-bug compatibility, and this will
[12:48] nice description
[12:49] Keybuk, that fix only fixes ext4 ...
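Note on the clock problem described above: if the hardware clock holds local time but is read as UTC during early boot, a superblock write made at that point gets stamped ahead of real time for anyone east of GMT, and a later fsck (run after the clock has been corrected) sees a "last write time" in the future. The sketch below is only a simplified model of that arithmetic under those assumptions; the function names and the one-minute `boot_duration` are hypothetical, and it does not reproduce the actual ext4 or e2fsck logic.

```python
from datetime import datetime, timedelta, timezone


def early_boot_clock(true_utc, local_offset):
    """System time before userspace corrects the clock.

    Assumes the hardware clock stores local time (true UTC plus the zone
    offset) and that the kernel treats that value as if it were UTC.
    """
    return true_utc + local_offset


def last_write_in_future(true_utc, local_offset,
                         boot_duration=timedelta(minutes=1)):
    """True if a superblock stamped in early boot looks 'in the future'
    to a check that runs boot_duration later with a corrected clock."""
    stamped = early_boot_clock(true_utc, local_offset)
    corrected_now = true_utc + boot_duration
    return stamped > corrected_now


boot = datetime(2009, 9, 11, 12, 0, tzinfo=timezone.utc)
print(last_write_in_future(boot, timedelta(hours=1)))   # BST-like zone, UTC+1
print(last_write_in_future(boot, timedelta(hours=0)))   # clock already in UTC
print(last_write_in_future(boot, timedelta(hours=-5)))  # zone west of GMT
```

In this model only positive offsets larger than the boot time trigger the condition, which is consistent with the remark in the patch description about people east of GMT.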
[12:49] yes, Ted would like to know whether this is the right fix
[12:49] if it is, he'll fix ext3 too
[12:50] apw: I know, I'm genuinely shocked
[12:50] I fully expected Ted to find *some* way to blame Ubuntu in the description ;)
[12:50] Keybuk, you got a bug for this in LP ? so i can use the number?
[12:51] do you know, we don't
[12:51] let me file one
[12:51] * apw slaps self for not filing one midnight now - 24hours when he hit it
[12:53] * apw waits with bated breath
[12:53] bug #427822
[12:53] Malone bug 427822 in linux "fsck says last write time in future" [Undecided,New] https://launchpad.net/bugs/427822
[12:53] I've deliberately left an e2fsprogs task there since that's where people will look
[12:53] very sensible
[12:53] and close it won't fix there not invalid so it stays searchable
[12:54] * apw fires up the hoover
[12:55] I'll leave it open for now
[12:59] yeah i meant in the long run
[13:06] Keybuk, we likely have to get this done by monday eve to get this in the alpha
[13:06] yes
[13:06] this bug is pretty much hitting everyone
[13:06] so I think we should consider it critical for the alpha
[13:07] ack ... i am sure you would have reason to get it by the freeze anyhow as its bad, but just stating the obvious
[13:07] it takes soooo damn long to get a kernel these days
[13:08] sure, we can hold up alphas easy enough though :p
[13:09] smb: come again?
[13:09] Q-FUNK, yep
[13:09] smb: what was it that you wanted me to add to the bug?
[13:10] smb: address of the first BUG?
[13:10] smb: ah, you mean the crash output? the first line that starts with BUG?
[13:10] Just the address of the unhandled pointer reference with the latest ubuntu kernel
[13:11] so I can have one compiled here with debugging info and check the exact location of the crash
[13:11] right
[13:12] but the line is that which starts with IP: ...
[13:12] directly below of the BUG: line
[13:12] BUG: unable to handle kernel paging request at ffffb4ff
[13:12] this?
[13:13] ah
[13:13] below that which was: IP: [] __destroy_inode+0x4b/0x80
[13:14] Mainly to check that it is still 0x4b and not moved to somewhere slightly off
[13:16] smb: added
[13:18] Q-FUNK, Ok, thanks. Now I only need to have the compile done. Thanks so far
[13:21] Keybuk: It'll hit far fewer after BST ends :p
[13:23] mjg59: :D
[13:33] * apw muses about online fsck, was it btrfs which did that?
[13:36] softupdate FFS does
[13:36] But softupdates are brain meltingly difficult
[13:43] Keybuk, could we not have left the ted bodge in, but made it produce a message or something ... so we don't get hurt by it all the time?
[13:43] you only care to know if you have stopped it occurring after all
[13:43] and the pain level is very high from this thing
[13:45] as an example your test kernels are off the table while my machine is rebooting
[13:45] and its been fsck'ing for 20 mins already
[14:02] apw: karmic is not being released tomorrow
[14:03] I would rather fix this properly and *know* it's fixed
[14:03] rather than bodging out the error just for people running an unreleased version ;)
[14:03] but the fact is just print a vile error when the bodge was applied would give you just the same information and not mean i lose an hour every time i915 breaks
[14:04] no it wouldn't
[14:04] because the vile error isn't printed
[14:04] there's a splash screen in the way
[14:04] heh for you maybe ...
[14:04] if you want to bodge it for yourself, just stick that buggy_init_scripts thing back
[14:04] zelda scott% cat /etc/e2fsck.conf
[14:04] [options]
[14:04] buggy_init_scripts = 1
[14:05] oh it still in there excellent
[14:05] will do that given the massive instability i seem to have today
=== lamont` is now known as lamont
[14:45] Keybuk, that patched kernel is now available at: http://people.canonical.com/~apw/lp427822-karmic/
=== bjf-afk is now known as bjf
[15:58] the EC2 kernel status meeting has been moved to #ubuntu-server and will start at 16:00 UTC (ie. 2 min)
[17:08] smb: thanks! what should I pay attention to when I boot your test kernel?
[17:09] Q-FUNK, As I don't hope much its a double free it likely just runs into the bug again with just a slightly different address.
[17:09] If it boots, then you would see WARNINGs in dmesg...
[17:10] ok
[17:10] I try to make a v2 that juggles around with the structure elements. So if something blindly writes it should then hit the same private pointer as before... but that I have to build
[17:11] lemme grab my camera and take a snapshot of that one.
[17:14] Q-FUNK, No worries about a picture
[17:14] If its still the same bug
[17:28] smb: result pasted
[17:28] * smb looks
[17:29] Ok, I nearly expected that. I am building another one right now and will upload it as soon as it is done.
[17:33] ok
[17:33] I'll wait for that and test it.
[17:50] pgraner: Did the weekly Karmic meeting displace the daily EC2 kernel meeting today?
[17:50] Q-FUNK, Ok, its uploaded. A little warning, as I cheated a bit with the abi number, external modules (including dkms) get confused. So if using nvidia or fglrx for example, those fail to load.
[17:51] erichammond, jjohansen1 bodged the meeting time and is now at the dentist
[17:51] erichammond, i think the EC2 meeting is at the top of the next hour
[17:51] oh perhaps not
[17:52] erichammond, the only significant kernel issues that I'm aware of is the 4gb i386 trap warning. smoser is supposed to try an image with libc6-xen
[17:56] apw, were you able to test 'ext4: Don't update superblock write time when filesystem is read-only' ?
[17:56] rtg, i added the patch to a kernel for Keybuk ...
[17:56] Keybuk, did you manage to test it yet?
[17:56] apw, weren't you suffering from it?
[17:57] we can do it whenever (the meeting) and as far as i am aware the only issue is bug 427288 (4gb i386)
[17:57] Malone bug 427288 in linux "Karmic i386 EC2 kernel emulating unsupported memory accesses" [High,Triaged] https://launchpad.net/bugs/427288
[17:57] i was in the sense that i hit it on reboot yes
[17:57] but when i do hit it it takes 40 mins to recover, and i was in the release meetings :/
[17:57] okeydoke. I was asked to attend the meeting, so I set my alarm a couple hours early :-/
[17:58] Seems like things were covered in the Karmic release meeting which I accidentally stumbled into.
[17:58] erichammond, good. I think we'll skip the ec2 meeting today pending results from smoser
[18:03] smb: Geode has built-in video. no problem with that :)
[18:08] can some one help me how to enable kdb in ubuntu 2.6.28-14-generic
[18:09] smb: victory!
[18:10] Q-FUNK, Unfortunately just partially.
[18:10] smb: whatever you did, it boots. it however hangs up later at the apparmor loading step.
[18:11] smb: well, you've at least succeeded in isolating the issue
[18:11] The corruption might still be in place, but instead of hitting the i_acl pointer and being visible on boot, it would affect the private pointer and might go unnoticed
[18:12] At least it seems that the change seems quite isolated, seems to be only at this one location...
[18:15] Q-FUNK, I wonder whether apparmor failing later is a clue or just coincidence because of the silently modified structure... But I guess that needs a bit more thinking... :)
[18:15] Linux geode 2.6.31-10-generic #32bug396286v2 SMP Fri Sep 11 16:26:56 UTC 2009 i586
[18:15] smb: that, I wouldn't know.
[18:15] brb
[18:26] cjwatson, did you say our grub2 was updated with the potential fix for the blammo?
[18:35] no
[18:35] Robert still seems to be working on it - the Debian bug's been updated
[18:35] I think I'm waiting for it to go upstream at this point
[18:35] sound indeed
[18:35] bah lost the link to the debian bug ... don't have it by any chance do you?
[19:53] ogra: lool: First cut of the freescale kernel rebased to 2.6.31 and all patches up to sdk1.6 applied -> http://people.canonical.com/~amitk/mx51/linux-image-2.6.31-100-imx51_2.6.31-101.8_armel.deb
[19:53] I'll be reordering/cleaning up a little bit, but this is the meat of it.
[19:57] ogra: thanks
[19:58] err amitk, thanks
[19:58] amitk: How did the kernel work for you?
[19:58] amitk: Could you comment on the latest patches? Fixes or new features or...?
[20:47] I get the impression that the upstream kernel people aren't aware that a desktop becomes unusable under any io load?
[21:09] cwillu_at_work: it just depends on your definition of usable and load
[21:09] :)
[21:09] D:
[21:10] but seriously :p
[21:10] I don't understand why my mouse cursor should ever stop moving in response to _disk_ load
[21:10] I thought this was supposed to have been taken care of years ago :p
[21:11] and then I thought that having 4gb of ram to fit my 2gb working set and still have 2gb left over for caching was supposed to make me forget about my troubles
[21:19] It just feels like we've made no gains in interactivity, despite the improving metrics that are intended to reflect that
[21:31] brb (groan if you must :p)
[21:41] Would it make sense to link from https://wiki.ubuntu.com/KernelTeam to https://www.google.com/calendar/hosted/canonical.com/embed?src=50d02kfdekgcjdcpc970hh83f0%40group.calendar.google.com&ctz=America/New_York ?
=== bjf is now known as bjf-afk