=== zz_gouki is now known as gouki
[07:59] hiya
[08:00] can anybody tell me if this looks like a kernel problem or more of a Keybuk-boot problem:
[08:00] the last thing when I boot on the other machine is "ACPI: I/O resource nForce2_smbus [0x1c40-0x1c7f] conflicts with ACPI region SM01 ............. Error probing SMB2"
[08:00] but I get the feeling that that message has nothing to do with the hang on the machine I'm seeing
[08:00] if I boot with bin=/bin/bash the last thing I see is "bash: cannot set terminal process group (-1): Inappropriate ioctl for device" "bash: no job control in this shell"
[08:00] this is the latest karmic with the ubuntu-boot ppa enabled
[08:00] Keybuk: ^ any idea what I could do?
[08:06] dholbach: you mean init=/bin/bash ?
[08:06] jk-: oh yeah, that's right
[08:06] looks like bash is starting up OK, just giving you a warning.
[08:06] ok... so it might actually be something boot-related breaking
[08:06] it may not have correct access to the console device
[08:07] and not the kernel
[08:07] tend to agree, would be suspicious its boot and not the kernel
[08:07] the ACPI smbus thing seems to be unrelated?
[08:07] you may even find it will respond to commands even if it does not look like its prompting
[08:07] yeah, make sure you have /dev/ populated
[08:08] which with init as bash you won't without devtmpfs
[08:11] erm... I think I'm seeing a very very weird thing right now - after booting from a livecd and running fsck and smartctl it works again now (both reported no errors and didn't say anything about fixing something?!)
[08:11] Keybuk: nevermind
[08:12] apw, jk-: thanks guys - I have no idea what "fixed it now"
[08:12] hrm ... computers ...
[08:12] might be a good idea to get a replacement disk :-)
[08:13] heh ... lets hope not
[08:14] at least have it ready and do a backup again :-)
[08:30] hi, with one hardware my jaunty waits long time 'starting to boot' on screen. acpi=off reduces that time but complaint pnpbios. and if acpi=off and pnpbios=off again 'starting to boot' show long time. is there a way around this?
[08:38] thux, Its hard to say something on that. It might be some subsystem that just takes long to initialize. And with the combination that seems to be faster it might not be there at all. I assume you removed the "quiet" from the command line and still there is nothing between the first line and the boot?
[08:38] thux one would need to get the dmesg output from the boot and see if there is an obvious time delay in the timestamps there. may give a clue as to cause
[08:39] ok I try first remove quiet and then put dmesg to pastebin
[08:44] smb, as a heads up, karmic intel graphics are looking like they have hang issues ...
[08:45] apw, I noticed from comments last week. Seems mostly ok on the aa1
[08:45] i am suspicious it only triggers after a suspend/resume
[08:47] I had, not yet, a complete lockup as you had. There is a strange case of the netbook launcher sometimes ignoring the mouse after a while. But that is rather something else
[08:48] apw, What model do you have? 945GME?
[08:50] here is my dmesg http://pastebin.com/m5185a0b0
[08:51] [ 9.224019] pci 0000:00:1d.7: EHCI: BIOS handoff failed (BIOS bug?) 01010001
[08:51] yes
[08:51] This seems to be the problem
[08:52] before that line it waits now
[08:52] Is USB working after this for you?
[08:52] yes
[08:53] If I read this message correctly there is a problem transferring control of usb from the bios to the kernel
[08:54] i have many usb devices plugged only hp laser sometimes not detected
[08:54] If there are settings in the bios to control usb support, you might play around with them. But often you want early usb support for usb keyboards
[08:54] yes i have that
[08:55] does the problem occur if you boot without any usb devices attached
[09:00] smb, i have both 945GME and GM45E's showing the issue
[09:02] apw, Ok, I also try to keep an eye on it. Though I am running mixed stuff, Jaunty userspace with -10.31 kernel and x-org-edgers X
[09:02] heh ... frankenstein
[09:03] :) Ubuntu, Halloween edition ;-)
[09:03] actually thats interesting as these hangs can just as easily be a bad gpu usage from mesa as they can be kernel issues ... so ... a different mesa combination should be an interesting control
[09:03] i like the sounds of that ... Ubuntu, halloween
[09:03] horrific halloween
[09:03] shame we have used up HH
[09:04] Hehe, and its no animal...
[09:04] I should compare the mesa versions against yours. The edgers should in theory be near Karmic, shouldn't they?
[09:04] halloween hyena
[09:05] yeah near, but i'd expect them to be ahead a little
[09:06] Mine are all 7.7.0-git20090911.a79eecb9-0ubuntu0tormod-jaunty
[09:09] there was a setting in bios for usb hand off and it was disabled, i enabled it and now it boots normally, thanks :)
[09:09] wonder why it was default disabled
[09:11] you're welcome. good to hear. Meh would not think about it... Maybe works better with the other OS...
[09:11] ah that i see
=== ogra_ is now known as ogra
[10:08] cking, gnarl, did either of you report the sd card not mounting thingy
[10:08] nope
[10:08] apw, Don't think so... Though not mounting thingy is a bit broad
[10:09] it was steve who hit this issue
[10:10] oh well i've filed 5 today, what's one more
[10:10] if only i knew what the heck was meant to handle it
[10:11] apw, does it only affect sdhc or all or is the reader not detected?
[10:11] normally the reader would issue a udev event for a device getting ready (I think)
[10:17] the card is detected and the devices are all made, the partition table is read, and the p1 device is made
[10:17] its the next step, the mounting it and popping up boxes to show it that don't occur
[10:18] Is /media/bla created?
[10:18] Err well likely not
[10:18] as you say it does not get mounted
[10:20] In theory the partition scanning is also done in udev ... Do you get any error message in dmesg?
[10:22] I believe each partition should also do a uevent, then udev call blkid and create the symlinks /dev/disk/by-... The mount step might be device-kit...
[10:35] smb, no it doesn't get mounted
[10:35] the p1 device does get mounted
[10:36] apw, ??? huh
[10:37] _made_ sorry
[10:37] not mounted
[10:37] there are by-id files as well
[10:39] The id files at least show the partition got through blkid. I just cannot grep the statement about the p1 device. So nothing gets mounted, correct?
[10:43] i meant that it does partition discovery correctly and finds it has a partition and makes mmblk0p1 as well
[10:44] nothing is mounted as a result. manual mount triggers gnome to find piccies etc
[10:44] i've filed it on gnome-volume-manager, bug #429257
[10:44] Malone bug 429257 in gnome-volume-manager "MMC cards no longer auto mount" [Undecided,New] https://launchpad.net/bugs/429257
[10:44] i am a bug filing machine ... 4 today alone
[10:45] apw, Shame, you should fix them. ;-)
[10:45] na, i file them _you_ get to fix them
[10:45] most are these intel graphics hangs ... bah
[10:45] If its Karmic, I can't see it (yet) :)
[10:47] apw, But seriously, I am not completely sure, the automounting thing seemed to be tied to a desktop (so gnome) but (might check up on this later) might have moved to something more generic to work without a running desktop too...
[11:07] smb, yeah figured i file with gnome thingy as thats description included automounting, someone else can move it
[11:49] apw, I'm seeing a lot of messages: "TKIP: RX tkey->key_idx=2 frame keyidx=1 priv=de21d780" - is this new with the latest kernel?
[11:49] cking, hmmm not sure
[11:50] [ 10.393047] lib80211_crypt: registered algorithm 'TKIP'
[11:50] changed your AP setup recently?
[11:50] not getting it here, but that is a WPA2 thing iirc ?
[11:51] yep, WPA2
[11:52] apw, I've rebooted my router today, surely, that's not caused this message?
[11:53] whats your driver here, that message appears to be in the rtl8187se driver
[11:54] looks to always have been in there
[11:54] it's the broadcom wl driver
[11:54] :-(
[11:54] yay
[11:55] nggg
=== sap is now known as csurbhi
[12:58] * amitk welcomes csurbhi to the kernel team
[12:58] Hi csurbhi
=== pgraner` is now known as pgraner
[13:02] Hi !
[13:02] thanks Amit
[13:09] * apw waves to csurbhi
[13:10] hello apw :)
[13:20] csurbhi, so hows it going so far
[13:20] reading up the docs and setting up accounts.. other than that.. decent
[13:20] :)
[13:21] Luckily there is help close. :)
[13:21] yes.. thats right
[13:25] apw, I wonder why we have CONFIG_LIB80211_DEBUG enabled ?
[13:25] csurbhi, howdy
[13:25] okie dokie
[13:26] debian.master/config/config.common.ports:# CONFIG_LIB80211_DEBUG is not set
[13:26] debian.master/config/config.common.ubuntu:# CONFIG_LIB80211_DEBUG is not set
[13:26] rtg ?
[13:27] backports?
[13:27] apw: hmm, then what kernel is pgraner running that he would get the 'CCMP: replay detected' message?
[13:28] updates/compat-wireless-2.6/config.mk:# CONFIG_LIB80211_DEBUG=y
[13:28] rtg ahh we just hit that with cking, a couple of the WL drivers contain the whole damn 80211 stack in them with mods
[13:28] and it may be on there
[13:28] apw: it is indeed enabled in LBM
[13:28] rtg, apw, Or he has LBM installed
[13:29] I have LBM installed
[13:29] i think pete has the cranky wl driver doesn't he?
[13:30] apw, rt73 IIRC
[13:30] apw: actually, rtl8187se
[13:30] cking, which rt driver did you have, it was that rtlxxxse one wasn't it?
[13:31] that definitely has the 80211 in it in toto
[13:31] a staging driver horror show and no mistake
[13:34] a;w
[13:34] apw, I saw this on my wl broadcom driver
[13:34] gibber
[13:35] its everywhere, and spreading :)
[13:38] it's on my HP mini 1000 ;-)
[14:07] apw, how about Keybuk's ext4 patch? 'ext4: Don't update superblock write time when filesystem is read-only'
[14:08] i've not heard back from him that i know of on its efficacy
[14:08] Keybuk, did you get to test that kernel i did?
[14:09] apw: did you ever e-mail it me?
[14:10] i pointed you to it on here ...
[14:10] see, that never works :p
[14:10] * Keybuk doesn't read IRC scrollback
[14:10] ahh ... damn
[14:10] * apw finds it
[14:10] * Keybuk finds his laptop under things
[14:11] http://people.canonical.com/~apw/lp427822-karmic/
[14:11] mine is buried in bugs
[14:11] Keybuk, hmm, you only talk abour east of UTC
[14:11] *about
[14:12] Keybuk, (see #ubuntu-devel, loic-m is surely in france but seems to see a similar thing)
[14:12] ogra: france is very much east of UTC
[14:12] I realise that, historically speaking, they are very unhappy about that
[14:12] and feel that the prime meridian of the world, and thus all timekeeping, should pass through Paris
[14:12] * ogra must have gotten his sense of east and west wrong
[14:13] heh
[14:13] but that was not the consensus of the rest of the world
[14:13] and if they're still clinging to that, we can't help them :p
[14:13] Keybuk, you're poking the hornets nest
[14:13] Keybuk, sorry about the email, i'll remember that for next time
[14:13] rtg: it's a hobby ;)
[14:14] * apw files his 7th bug of the day, sigh
[14:14] apw: if I'm actually paying attention to IRC, it's usually ok - also /msg is always good - I save those and read them back
[14:14] but things-on-a-channel I don't tend to notice if they go off the top of the screen while I'm not looking
[14:14] ahh right, fair enough
[14:15] My use of IRC isn't much different.
[14:15] its a lousy bug tracking mechanism
[14:15] ogra: the recovery-while-read-only bug affects everyone from Britain (in Summer time) through Europe, across Eastern Europe, the Middle East, Asia, Indonesia and Australia and (I think) NZ
[14:15] all true
[14:15] rtg: for me it's a getting-things-done defence
[14:16] IRC is a productivity black hole
[14:16] indeed
[14:16] if I wake up and start reading through things like e-mail or IRC, then I may as well write off being productive in the morning
[14:16] Keybuk, yeah, i always get the sides wrong here :P
[14:16] and I find that people tend to notice when you're back and remind you about things anyway
[14:16] ogra: what's nice is that it's the exact opposite of the init scripts bug
[14:17] which only affected Merkins
[14:17] heh
[14:17] Keybuk, new topic. anything changed recently that I should investigate that slows the boot time on my SSD laptop? It used to be 28 seconds, but has gotten much longer since late last week.
[14:17] rtg: I've noticed some slowdown with recent kernels myself
[14:17] my main laptop seems to take hours to login these days
[14:17] both on SSD and HDD
[14:17] specially now as its hidden the feeling of length is even worse
[14:17] I figured that the IO elevator of the week wasn't working as well as last week's
[14:18] Keybuk, you think its kernel related?
[14:18] rtg: I'm not sure what else has changed
[14:18] we added xsplash
[14:18] osd seems to have lost its mind
[14:18] I'm watching the GDM throbber just sit there and flash at me, no disk activity
[14:19] hmm
[14:19] odd
[14:19] couchdb?
[14:20] dunno what couchdb is
[14:20] silly desktop database stuff they added for Ubuntu One
[14:20] it's quite heavy
[14:20] nnng
[14:20] and seems to sit there for a few minutes
[14:20] isnt it erlang too ?
[14:20] that reminds me it exploded on my other laptop
[14:20] ubuntu one
[14:20] some serious bitching is in order if thats what it is
[14:20] * ogra wonders how fast/slow erlang is compared to other langs
[14:21] can we tell from a bootchart
[14:21] loading another interpreter during boot is surely not speeding up things
[14:21] probably
[14:22] Keybuk, are we using sreadahead by default now, could that be contributing to slowness
[14:22] in the sense we are upgrading a lot, it will be loading old junk?
[14:24] if you got a new apparmor profile it will definitely slow down the next boot after that
[14:24] apw: one of the main reasons we switched to sreadahead is that it reprofiles often
[14:24] it may be simply that you upgrade before every reboot
[14:24] so every boot sreadahead is profiling
[14:24] (rather than assisting)
[14:24] now that i almost always do do yes
[14:24] I've also noticed a strange issue that I think is ext4, but can't place it
[14:25] after people upgraded to my PPA, and rebooted
[14:25] they have a 4-6s "exe" process that seems to be somewhere around the point the root filesystem was mounted
[14:25] it's almost as if, after major upgrades, ext4 has to do lots of work before it can mount the filesystem next time
[14:25] that needs investigation
[14:26] its called 'exe' in the profiles?
[14:26] yes
[14:26] that's what confuses me
[14:27] how does the name get generated for the profiles, from the executable link?
[14:28] interestingly that link is called 'exe', so perhaps its what things get called which are not convertible back to a name
[14:28] i wonder if the chrooting madness and rm madness going on at root mount time leaves us with bad /proc/NNNN/exe links and bad names in the profile
[14:28] ie. it could be anything
[14:28] right, that's what I think
[14:29] that it's something run from the initramfs, just before it's cleaned up, so the link never exists
[14:29] hmmmmm
[14:30] * apw goes for a couple of boots to see if that speeds things up
[14:39] Keybuk, i am getting 1m from bios to X, 1:22 to bios to gdm, 34s gdm to desktop. now this is a HDD, so i'd expect it to be what 30s total?
[14:39] try a second one :)
[14:40] (that was second boot, so i assume sreadahead has done its thing)
[14:40] ogra, that was the second one
[14:40] ah
[14:40] still seems like an insane length of time
[14:40] i see a few added seconds every time i get an updated apparmor profile for an app
[14:40] ouch
[14:41] which i was told is normal now
[14:41] can you get a bootchart of that
[14:41] since apparmor turns the profile textfiles into binary thingies
[14:41] use bootchart=nostop on the kernel command line
[14:41] then once you're a desktop, run sudo /etc/init.d/stop-bootchart start
[14:41] Keybuk, sure
[14:42] yay java ... sigh
[14:51] apw, whats wrong with java ? ... as long as you dont have to touch its code it's usually fine :)
[15:05] Keybuk, ok got it, seems sreadahead runs for a full 90s, that seems unreasonable to me
[15:06] apw: really?
[15:06] when you login do "status sreadahead"
[15:06] what does it say?
[15:06] apw@dm$ status sreadahead
[15:06] sreadahead stop/waiting
[15:08] * apw pushed up the data to rookery
[15:08] if this image is to be believed, then it sustained like 44MB/s for that 90 odd seconds
[15:08] which is like 4GB of poop
[15:09] http://people.canonical.com/~apw/bootchart/
[15:11] does the green line on the bar there show instantaneous throughput? if so its like 100k/s over that time?
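Editor's aside on the two readings of the chart quoted at 15:08 and 15:11. The following is only a rough arithmetic check, assuming a constant rate over the whole 90 seconds and decimal units (1 GB = 1000 MB); the function name is illustrative and not part of bootchart.

```python
def total_megabytes(rate_mb_per_s: float, seconds: float) -> float:
    """Data moved at a constant rate over a period, in megabytes."""
    return rate_mb_per_s * seconds


# Reading 1: the chart really means about 44 MB/s sustained for about 90 s.
high = total_megabytes(44.0, 90.0)

# Reading 2: the green line is instantaneous throughput of about 100 kB/s.
low = total_megabytes(0.1, 90.0)

print(f"44 MB/s for 90 s  -> {high:.0f} MB (about {high / 1000:.1f} GB)")
print(f"100 kB/s for 90 s -> {low:.0f} MB")
```

The two readings differ by a factor of several hundred, which is why it matters which one the chart is actually showing.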
[15:12] * apw has another go
[15:20] Keybuk, ok the second boot was a little better, so lets assume the first one was a reconfigure or something
[15:20] device discovery takes like 20s alone
[15:21] status sreadahead - if that says "stop/waiting" then your boot was assisted by sreadahead
[15:22] if it says "stop/killed" or similar, then your boot was *reprofiled*
[15:22] apw: the -2
[15:22] that shows the exact same I/O weirdness that I see
[15:22] what bit?
[15:22] the entire system spends most of its time in I/O wait
[15:22] http://people.canonical.com/~apw/bootchart/dm-karmic-20090914-2.png
[15:22] the two graphs along the top
[15:23] the first graph, blue is CPU usage, red is I/O wait
[15:23] the second graph, red is disk utilization, the green lines are throughput
[15:23] yep, so that would imply its nailed waiting for disk
[15:23] so that's showing
[15:23] 1. your system is spending almost all of its time in I/O wait (BAD BAD BAD BAD)
[15:23] 2. your disk is continually utilized
[15:23] 3. almost no throughput is being achieved, despite the continuous utilization
[15:23] I'd also point out where this starts
[15:23] so read ahead is sucking?
[15:23] deep inside the initramfs
[15:24] in fact, last time I debugged this, it started as soon as you loaded the disk controller
[15:24] which implies something seriously wrong in the kernel
[15:24] apw, do you still have older kernels installed?
[15:24] * apw would have a bunch of them yes
[15:25] i'll go back to something older and see
[15:25] apw, IIRC -8 worked well.
[15:25] if its the kernel it would imply its more like kernel readahead sucking
[15:25] anyhow ... i'll try -8
[15:26] apw: could be, though again, it starts in the initramfs
[15:26] this is quite important
[15:26] it starts *before* we mount the root filesystem
[15:26] it starts before we even know which block device is the root filesystem
[15:27] could it just be bad stats rather than actual anything
[15:27] enough to throw off sreadahead
[15:28] sreadahead doesn't look at stats
[15:28] but I don't think it is bad stats
[15:29] my Dell laptop has exactly this problem
[15:29] and has had for a while now
[15:29] though my disk light is nailed ... and its rattling like hell
[15:29] this is why you hear me whining continuously about I/O performance
[15:29] *but* other laptops and hardware I have *do not*
[15:29] but there is io perf and io perf
[15:29] mine is a dell
[15:30] I suspect it's controller related
[15:30] a horrible thought
[15:33] -5 is as bad
=== bjf-afk is now known as bjf
[16:29] apw: I have tested your kernel packages, and they are good
[16:52] Keybuk, most excellent
[16:53] Keybuk, i am assuming you will be going back to ted, and we'll get a pair of patches from you when thats ready?
[17:04] Keybuk, do you know what this 'utilisation' on the disk actually is, which stat it is
[17:04] we don't seem to have any form of actual 'how utilised is the disk' number in the numbers collected by bootchart
[17:06] jjohansen, pgraner: Is there an EC2 kernel status meeting now?
[17:07] jjohansen: ^^^^^^^^????
[17:07] its at this time in my calendar
[17:08] (modulo google being pants)
[17:08] ah yes
[17:09] just a sec
[17:09] apw: pants, pants, pants... that gcal is
[17:09] so very true
[17:09] erichammond, lool has approved the linux-ec2 package for main inclusion, so perhaps I can get it built today.
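Editor's aside on the 'utilisation' question raised at 17:04. One plausible source for such a number is the "time spent doing I/Os" counter in /proc/diskstats (the 13th whitespace-separated column, in milliseconds). Whether bootchart uses exactly this column is an assumption here, and the sample lines and function names below are made up for illustration. A minimal sketch of deriving a 0..1 utilisation from two samples:

```python
from typing import Dict

# Index of the "ms spent doing I/O" counter in a /proc/diskstats line
# (columns 0-2 are major, minor and device name).
IO_TICKS_COLUMN = 12


def parse_io_ticks(diskstats_text: str) -> Dict[str, int]:
    """Map device name -> cumulative ms during which any I/O was in flight."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > IO_TICKS_COLUMN:
            ticks[fields[2]] = int(fields[IO_TICKS_COLUMN])
    return ticks


def utilisation(before: Dict[str, int], after: Dict[str, int],
                interval_ms: float) -> Dict[str, float]:
    """Fraction of the interval each device had at least one I/O outstanding."""
    result = {}
    for device, end in after.items():
        if device in before:
            busy_ms = end - before[device]
            result[device] = max(0.0, min(1.0, busy_ms / interval_ms))
    return result


# Two hypothetical samples of one disk, taken 1000 ms apart.
SAMPLE_T0 = "   8       0 sda 100 0 800 50 20 0 160 30 0 400 80"
SAMPLE_T1 = "   8       0 sda 110 0 880 950 20 0 160 30 1 1400 980"

util = utilisation(parse_io_ticks(SAMPLE_T0), parse_io_ticks(SAMPLE_T1), 1000.0)
print(util)
```

In the made-up samples only ten reads (80 sectors) complete during the second, yet the busy-time counter advances by the full 1000 ms, so the computed utilisation is at its maximum. A counter of this kind says whether the device had work outstanding, not how much data it moved, which would be consistent with a chart showing continuous utilisation alongside almost no throughput.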
[17:09] here
[17:10] alright lets begin
[17:10] This is the EC2 kernel status update meeting
[17:11] as far as I am aware the only change since thursday/friday is the state of bug #427288
[17:11] Malone bug 427288 in eglibc "Karmic i386 EC2 kernel emulating unsupported memory accesses" [High,Fix released] https://launchpad.net/bugs/427288
[17:11] which was fixed in glibc
[17:12] smoser: Can a new AMI be created to test this fix?
[17:12] it looks like there is some packaging work to be done as well
[17:12] smoser, have you been able to test this?
[17:13] i have not tested this yet, but will do that today.
[17:13] erichammond, i think publishing a new ami for it doesn't make sense given alpha 6 on thursday
[17:14] smoser: so are we then planning on rolling out a new ami with the karmic kernel
[17:14] minus this glibc fix
[17:15] i'm planning on alpha6 using the karmic kernel
[17:15] jjohansen, also, i opened bug 428692
[17:15] Malone bug 428692 in ubuntu "ec2 kernel needs CONFIG_BLK_DEV_LOOP=y and other config changes" [Medium,Confirmed] https://launchpad.net/bugs/428692
[17:15] smoser: thanks
[17:15] smoser, why _wouldn't_ you use the new glibc? Isn't that what the alphas are for?
[17:16] i did not intend to say i wouldnt
[17:16] i hope to get 427288 "all the way fixed" for alpha6
[17:16] ah, I misinterpreted, no point in rolling out a new ami before alpha6, right?
[17:17] smoser: My opinion is that if we can test a fix, we should do it sooner rather than later. Are we saying that the fix won't be in glibc until alpha6?
[17:18] I'll test it. it can easily be tested just by apt-get update && apt-get install libc6-xen and reboot
[17:19] smoser: nice, thanks.
[17:19] then, i'll make sure that we get libc6-xen into the ami, and that that ami will not get the error. but i dont plan on publishing another ami with all that incorporated prior to alpha6
[17:19] smoser: When you publish an AMI are you building it from scratch with vmbuilder or are you simply uploading and registering the daily image which is automatically created?
[17:20] i much prefer to use the nightly
[17:20] if that weren't sufficient, then i'd re-spin the nightly in the same way, just at a later date
[17:21] smoser: Ok. How often is the code updated which is used to build the nightly? I thought I saw that you pinged soren to manually update it recently.
[17:21] rtg: It was still in NEW when I looked though
[17:21] erichammond, you're correct
[17:22] we do need a long term solution for that.
[17:22] rtg: (Thanks for clarifying the doc question BTW)
[17:22] lool, right. how long will it sit there before getting accepted?
[17:22] rtg: I'm not archive admin; it's up to them
[17:22] lool, so, just find the Monday guy and hassle him?
[17:22] Digging the Ubuntu wiki page
[17:22] https://wiki.ubuntu.com/ArchiveAdministration
[17:22] #
[17:22] Monday: SteveLangasek; JamesWestby
[17:22] rtg: You could I guess
[17:23] rtg: slangasek seems like a good target; he knows about the source package and how it's constructed
[17:23] rtg: I expect that if the copyright was properly updated for the Xen patches it should be ok
[17:24] lool, uh, I didn't see that mentioned anywhere.
[17:24] rtg: Well it's not MIR related, it's NEW related
[17:24] rtg: Source packages have to document their copyright properly before they enter the archive, right?
[17:24] My role was just in approving the move of this package from universe to main
[17:25] lool, AFAIK there are no copyright issues.
[17:25] Since it's built like the other kernel packages which are in main already and since the kernel team will care for it, I dont have a big issue with another one
[17:25] everything is gplv2
[17:25] k
[17:25] apw: yes, I've already sent ted a note
[17:25] (I'm lost, is this still part of the EC2 status?)
[17:26] apw: I think it comes from /proc/diskstats
[17:26] rtg: debian/copyright usually mentions where stuff was taken from, which license it's under and who owns copyright; that said this last bit probably isn't maintainable in the copyright file so I guess you would refer to the list of AUTHORS or something similar or just omit it
[17:26] erichammond, sorry, lool and I were discussing linux-ec2 acceptance.
[17:26] (Is this a meeting?)
[17:26] lool, 'tis
[17:27] apw: bootchart logs /proc/stat, /proc/diskstats and /proc/*/stat
[17:27] all chart information is derived from that
[17:27] jjohansen, are we done?
[17:27] I believe so
[17:28] anyone else?
[17:28] smoser's second bug has not been addressed.
[17:28] Ok /me disappears then, sorry for interrupting
[17:28] bug 428692
[17:28] Malone bug 428692 in ubuntu "ec2 kernel needs CONFIG_BLK_DEV_LOOP=y and other config changes" [Medium,Confirmed] https://launchpad.net/bugs/428692
[17:28] isn't DEV_LOOP=y already?
[17:29] Keybuk, it is for the master branch
[17:29] Keybuk: not in EC2 build currently
[17:29] the bug is really "make the ec2 kernel more like -server"
[17:29] smoser, I see that. I'll see what I can do
[17:29] but explicitly, users have already asked for CONFIG_BLK_DEV_LOOP=y
[17:30] smoser: you also wanted KEXEC :)
[17:30] Keybuk, from what i can see the utilisation is a number of ticks that there was an io on the queue... so that means 1 io has the basic same effect on the table as 10
[17:31] apw: it doesn't seem to be translated into binary though, it appears as an analogue graph
[17:31] jjohansen, yes, but you dont want to support it. so i wasn't going to explicitly ask :)
[17:31] and, most importantly
[17:31] I have machines which *don't* have that I/O graph
[17:31] and they boot much faster, with identical installs
[17:32] smoser: right
[17:32] Keybuk, right if the io is very empty then it would be empty, or if the queue is empty any time in the sampling tick the answer will be less than 1.0, but if its got 1 io for all the time then its 1.0 same as if its 100 io for all the time
[17:32] ie, its pretty much saying if read ahead worked even badly the block should be 'full' utilisation wise
[17:32] all right if there is nothing else, we will call the EC2 kernel meeting adjourned
[17:33] seems a bit pointless
[17:33] This may also be the place and time to discuss bug 429169 as some of the alternatives affect the kernel.
[17:33]