=== zz_gouki is now known as gouki [07:59] hiya [08:00] can anybody tell me if this looks like a kernel problem or more of a Keybuk-boot problem: [08:00] the last thing when I boot on the other machine is "ACPI: I/O resource nForce2_smbus [0x1c40-0x1c7f] conflicts with ACPI region SM01 ............. Error probing SMB2" [08:00] but I get the feeling that that message has nothing to do with the hang on the machine I'm seeing [08:00] if I boot with bin=/bin/bash the last thing I see is "bash: cannot set terminal process group (-1): Inappropriate ioctl for device" "bash: no job control in this shell" [08:00] this is the latest karmic with the ubuntu-boot ppa enabled [08:00] Keybuk: ^ any idea what I could do? [08:06] dholbach: you mean init=/bin/bash ? [08:06] jk-: oh yeah, that's right [08:06] looks like bash is starting up OK, just giving you a warning. [08:06] ok... so it might actually be something boot-related breaking [08:06] it may not have correct access to the console device [08:07] and not the kernel [08:07] tend to agree, would be suspicious it's boot and not the kernel [08:07] the ACPI smbus thing seems to be unrelated? [08:07] you may even find it will respond to commands even if it does not look like it's prompting [08:07] yeah, make sure you have /dev/ populated [08:08] which with init as bash you won't without devtmpfs [08:11] erm... I think I'm seeing a very very weird thing right now - after booting from a livecd and running fsck and smartctl it works again now (both reported no errors and didn't say anything about fixing something?!) [08:11] Keybuk: nevermind [08:12] apw, jk-: thanks guys - I have no idea what "fixed it now" [08:12] hrm ... computers ... [08:12] might be a good idea to get a replacement disk :-) [08:13] heh ... let's hope not [08:14] at least have it ready and do a backup again :-) [08:30] hi, on one machine my jaunty waits a long time at 'starting to boot' on screen. acpi=off reduces that time but gives a pnpbios complaint. 
and if acpi=off and pnpbios=off again 'starting to boot' show long time. is there a way around this? [08:38] thux, Its hard to say something on that. It might be some subsystem that just takes long to initialize. And with the combination that seems to be faster it might not be there at all. I assume you removed the "quiet" from the command line and still there is nothing between the first line and the boot? [08:38] thux one would need to get the dmesg output from the boot and see if there is an obvious time delay in the timestampts there. may give a clue as to cause [08:39] ok I try first remove quiet and then put dmesg to pastebin [08:44] smb, as a heads up, karmic intel graphics are looking like they have hang issues ... [08:45] apw, I noticed from comments last week. Seems mostly ok on the aa1 [08:45] i am suspicious it only triggers after a suspend/resume [08:47] I had, not yet, a complete lockup as you had. There is a strange case of the netbook launcher sometimes ignoring the mouse after a while. But that is rather something else [08:48] apw, What model do you have? 945GME? [08:50] here is my dmesg http://pastebin.com/m5185a0b0 [08:51] [ 9.224019] pci 0000:00:1d.7: EHCI: BIOS handoff failed (BIOS bug?) 01010001 [08:51] yes [08:51] This seems to be the problem [08:52] before that line it waits now [08:52] Is USB working after this for you? [08:52] yes [08:53] If I read this message correctly there is a problem transferring control of usb from the bios to the kernel [08:54] i have many usb devices plugged only hp laser sometimes not detected [08:54] If there are settings in the bios to control usb support, you might play around with them. But often you want early usb support for usb keyboards [08:54] yes i have that [08:55] does the problem occur if you boot without any usb devices attached [09:00] smb, i have both 945GME and GM45E's showing the issue [09:02] apw, Ok, I also try to keep an eye on it. 
Though I am running mixed stuff, Jaunty userspace with -10.31 kernel and x-org-edgers X [09:02] heh ... franenstein [09:03] :) Ubuntu, Halloween edition ;-) [09:03] actually thats interesting as these hangs can just as easily be a bad gpu usage from mesa as they can be kernel issues ... so ... a different mesa combination should be an interesting control [09:03] i like the sounds of that ... Ubuntu, halloween [09:03] horrific halloween [09:03] shame we have used up HH [09:04] Hehe, and its no animal... [09:04] I should compare the mesa versions against yours. The edgers should in theory be near Karmic, shouldnt' they? [09:04] halloween hyena [09:05] yeah near, but i'd expect them to be ahead a little [09:06] Mine are all 7.7.0-git20090911.a79eecb9-0ubuntu0tormod-jaunty [09:09] there was a setting in bios for usb hand off and it was disabled, i enabled it and now it boots normally, thanks :) [09:09] wonder why it was default disabled [09:11] you're welcome. good to hear. Meh would not think about it... Maybe works better with the other OS... [09:11] ah that i see === ogra_ is now known as ogra [10:08] cking, gnarl, did either of you report the sd card not mounting thingy [10:08] nope [10:08] apw, Don't think so... Though not mounting thingy is a bit broad [10:09] it was steve who hit this issue [10:10] oh welll i've filed 5 today, wahts one more [10:10] if only i knew what the heck was meant to handle it [10:11] apw, does it only affect sdhc or all or is the reader not detected? [10:11] normally the reader would issue a udev event for a device getting ready (I think) [10:17] the card is detected and the devices are all made, the partition table is read, and the p1 device is made [10:17] its the next step, the mounting it and popping up boxes to show it that don't occur [10:18] Is /media/bla created? [10:18] Err well likely not [10:18] as you say it does not get mounted [10:20] In theory the partition scanning is also done in udev ... 
Do you get any error message in dmesg? [10:22] I believe each partition should also do a uevent, then udev calls blkid and creates the symlinks /dev/disk/by-... The mount step might be device-kit... [10:35] smb, no it doesn't get mounted [10:35] the p1 device does get mounted [10:36] apw, ??? huh [10:37] _made_ sorry [10:37] not mounted [10:37] there are by-id files as well [10:39] The id files at least show the partition got through blkid. I just cannot grep the statement about the p1 device. So nothing gets mounted, correct? [10:43] i meant that it does partition discovery correctly and finds it has a partition and makes mmcblk0p1 as well [10:44] nothing is mounted as a result. manual mount triggers gnome to find piccies etc [10:44] i've filed it on gnome-volume-manager, bug #429257 [10:44] Malone bug 429257 in gnome-volume-manager "MMC cards no longer auto mount" [Undecided,New] https://launchpad.net/bugs/429257 [10:44] i am a bug filing machine ... 4 today alone [10:45] apw, Shame, you should fix them. ;-) [10:45] na, i file them _you_ get to fix them [10:45] most are these intel graphics hangs ... bah [10:45] If it's Karmic, I can't see it (yet) :) [10:47] apw, But seriously, I am not completely sure, the automounting thing seemed to be tied to a desktop (so gnome) but (might check up on this later) might have moved to something more generic to work without a running desktop too... [11:07] smb, yeah figured i file with gnome thingy as its description included automounting, someone else can move it [11:49] apw, I'm seeing a lot of messages: "TKIP: RX tkey->key_idx=2 frame keyidx=1 priv=de21d780" - is this new with the latest kernel? [11:49] cking, hmmm not sure [11:50] [ 10.393047] lib80211_crypt: registered algorithm 'TKIP' [11:50] changed your AP setup recently? [11:50] not getting it here, but that is a WPA2 thing iirc ? [11:51] yep, WPA2 [11:52] apw, I've rebooted my router today, surely, that's not caused this message? 
[11:53] whats your driver here, that message appears to be in the rtl8187se driver [11:54] looks to always have been in there [11:54] it's the broadcom wl driver [11:54] :-( [11:54] yay [11:55] nggg === sap is now known as csurbhi [12:58] * amitk welcomes csurbhi to the kernel team [12:58] Hi csurbhi === pgraner` is now known as pgraner [13:02] Hi ! [13:02] thanks Amit [13:09] * apw waves to csurbhi [13:10] hello apw :) [13:20] csurbhi, so hows it going so far [13:20] reading up the docs and setting up accounts.. other than that.. descent [13:20] :) [13:21] Luckily there is help close. :) [13:21] yes.. thats right [13:25] apw, I wonder why we have CONFIG_LIB80211_DEBUG enabled ? [13:25] csurbhi, howdy [13:25] okie dokie [13:26] debian.master/config/config.common.ports:# CONFIG_LIB80211_DEBUG is not set [13:26] debian.master/config/config.common.ubuntu:# CONFIG_LIB80211_DEBUG is not set [13:26] rtg ? [13:27] backports? [13:27] apw: hmm, then what kernel is pgraner running that he would get the 'CCMP: replay detected' message? [13:28] updates/compat-wireless-2.6/config.mk:# CONFIG_LIB80211_DEBUG=y [13:28] rtg ahh we jut hit that with cking, a couple of the WL drivers contain the whole damn 80211 stack in them with mods [13:28] and it may be on there [13:28] apw: it is indeed enabled in LBM [13:28] rtg, apw, Or he has LBM installed [13:29] I have LBM installed [13:29] i think pete has the cranky wl driver doesn't he? [13:30] apw, rt73 IIRC [13:30] apw: actually, rtl8187se [13:30] cking, which rt driver did you have, it was that rtlxxxse one wasn;t it? [13:31] that definatly has the 80211 in it en-toto [13:31] a staging driver horror show and no mistake [13:34] a;w [13:34] apw, I saw this on my wl broadcom driver [13:34] gibber [13:35] its everywhere, and spreading :) [13:38] it's on my HP mini 1000 ;-) [14:07] apw, how about Keybuk's ext4 patch? 
'ext4: Don't update superblock write time when filesystem is read-only' [14:08] i've not heard back from him that i know of on its effacacy [14:08] Keybuk, did you get to test that kernel i did? [14:09] apw: did you ever e-mail it me? [14:10] i pointed you to it on here ... [14:10] see, that never works :p [14:10] * Keybuk doesn't read IRC scrollback [14:10] ahh ... damn [14:10] * apw finds it [14:10] * Keybuk finds his laptop under things [14:11] http://people.canonical.com/~apw/lp427822-karmic/ [14:11] mine is burried in bugs [14:11] Keybuk, hmm, you only talk abour east of UTC [14:11] *about [14:12] Keybuk, (see #ubuntu-devel, loic-m is surely in france but seems to see a similar thing) [14:12] ogra: france is very much east of UTC [14:12] I realise that, historically speaking, they are very unhappy about that [14:12] and feel that the prime meridian of the world, and thus all timekeeping, should pass through Paris [14:12] * ogra must have gotten his sense of east and west wrong [14:13] heh [14:13] but that was not the consensus of the rest of the world [14:13] and if they're still clinging to that, we can't help them :p [14:13] Keybuk, you're poking the hornets nest [14:13] Keybuk, sorry about the email, i'll remember that for next time [14:13] rtg: it's a hobby ;) [14:14] * apw files his 7th bug of the day, sigh [14:14] apw: if I'm actually paying attention to IRC, it's usually ok - also /msg is always good - I save those and read them back [14:14] but things-on-a-channel I don't tend to notice if they go off the top of the screen while I'm not looking [14:14] ahh right, fair enough [14:15] My use of IRC isn't much different. 
[14:15] it's a lousy bug tracking mechanism [14:15] ogra: the recovery-while-read-only bug affects everyone from Britain (in Summer time) through Europe, across Eastern Europe, the Middle East, Asia, Indonesia and Australia and (I think) NZ [14:15] all true [14:15] rtg: for me it's a getting-things-done defence [14:16] IRC is a productivity black hole [14:16] indeed [14:16] if I wake up and start reading through things like e-mail or IRC, then I may as well write off being productive in the morning [14:16] Keybuk, yeah, i always get the sides wrong here :P [14:16] and I find that people tend to notice when you're back and remind you about things anyway [14:16] ogra: what's nice is that it's the exact opposite of the init scripts bug [14:17] which only affected Merkins [14:17] heh [14:17] Keybuk, new topic. anything changed recently that I should investigate that slows the boot time on my SSD laptop? It used to be 28 seconds, but has gotten much longer since late last week. [14:17] rtg: I've noticed some slowdown with recent kernels myself [14:17] my main laptop seems to take hours to login these days [14:17] both on SSD and HDD [14:17] especially now as it's hidden the feeling of length is even worse [14:17] I figured that the IO elevator of the week wasn't working as well as last week's [14:18] Keybuk, you think it's kernel related? [14:18] rtg: I'm not sure what else has changed [14:18] we added xsplash [14:18] osd seems to have lost its mind [14:18] I'm watching the GDM throbber just sit there and flash at me, no disk activity [14:19] hmm [14:19] odd [14:19] couchdb? [14:20] dunno what couchdb is [14:20] silly desktop database stuff they added for Ubuntu One [14:20] it's quite heavy [14:20] nnng [14:20] and seems to sit there for a few minutes [14:20] isn't it erlang too ? 
[14:20] that reminds me it exploded on my other laptop [14:20] ubuntu one [14:20] some serious bitching is in order if thats what it is [14:20] * ogra wonders how fast/slow erlang is compared to other langs [14:21] can we tell from a bootchart [14:21] loading another interpreter during boot is surely not speeding up things [14:21] probably [14:22] Keybuk, are we using sreadahead by default now, could that be contributing to slowness [14:22] in the sense we are upgrading a lot, it will be loading old junk? [14:24] if you got a new apparmor profile it will definately slow down the next boot after that [14:24] apw: one of the main reasons we switched to sreadahead is that it reprofiles often [14:24] it may be simply that you upgrade before every reboot [14:24] so every boot sreadahead is profiling [14:24] (rather than assisting) [14:24] now that i almost always do do yes [14:24] I've also noticed a strange issue that I think is ext4, but can't place it [14:25] after people upgraded to my PPA, and rebooted [14:25] they have a 4-6s "exe" process that seems to be somewhere around the point the root filesystem was mounted [14:25] it's almost as if, after major upgrades, ext4 has to do lots of work before it can mount the filesystem next time [14:25] that needs investigation [14:26] its called 'exe' in the profiles? [14:26] yes [14:26] that's what confuses me [14:27] how does the name get generated for the profiles, from the executable link? [14:28] interestingly that link is call 'exe', so perhaps its what things get called which are not convertable back to a name [14:28] i wonder if the chrooting madness and rm madness going on at root mount time leaves us with bad /proc/NNNN/exe links and bad names in the profile [14:28] ie. 
it could be anything [14:28] right, that's what I think [14:29] that it's something run from the initramfs, just before it's cleaned up, so the link never exists [14:29] hmmmmm [14:30] * apw goes for a couple of boots to see if that speeds things up [14:39] Keybuk, i am getting 1m from bios to X, 1:22 from bios to gdm, 34s gdm to desktop. now this is a HDD, so i'd expect it to be what 30s total? [14:39] try a second one :) [14:40] (that was second boot, so i assume sreadahead has done its thing) [14:40] ogra, that was the second one [14:40] ah [14:40] still seems like an insane length of time [14:40] i see a few added seconds every time i get an updated apparmor profile for an app [14:40] ouch [14:41] which i was told is normal now [14:41] can you get a bootchart of that [14:41] since apparmor turns the profile textfiles into binary thingies [14:41] use bootchart=nostop on the kernel command line [14:41] then once you're at a desktop, run sudo /etc/init.d/stop-bootchart start [14:41] Keybuk, sure [14:42] yay java ... sigh [14:51] apw, what's wrong with java ? ... as long as you don't have to touch its code it's usually fine :) [15:05] Keybuk, ok got it, seems sreadahead runs for a full 90s, that seems unreasonable to me [15:06] apw: really? [15:06] when you login do "status sreadahead" [15:06] what does it say? [15:06] apw@dm$ status sreadahead [15:06] sreadahead stop/waiting [15:08] * apw pushed up the data to rookery [15:08] if this image is to be believed, then it sustained like 44MB/s for that 90 odd seconds [15:08] which is like 4GB of poop [15:09] http://people.canonical.com/~apw/bootchart/ [15:11] does the green line on the bar there show instantaneous throughput? if so it's like 100k/s over that time? 
[15:12] * apw has a nother go [15:20] Keybuk, ok the second boot was a little better, so lets assume the first one was a reconfigure or something [15:20] device discovery takes like 20s alone [15:21] status sreadahead - if that says "stop/waiting" then your boot was assisted by sreadahead [15:22] if it says "stop/killed" or similar, then your boot was *reprofiled* [15:22] apw: the -2 [15:22] that shows the exact same I/O weirdness that I see [15:22] what bit? [15:22] the entire system spends most of its time in I/O wait [15:22] http://people.canonical.com/~apw/bootchart/dm-karmic-20090914-2.png [15:22] the two graphs along the top [15:23] the first graph, blue is CPU usage, red is I/O wait [15:23] the second graph, red is disk utilization, the green lines are throughput [15:23] yep, so that would imply its nailed waiting for disk [15:23] so that's showing [15:23] 1. your system is spending almost all of its time in I/O wait (BAD BAD BAD BAD) [15:23] 2. your disk is continually utilized [15:23] 3. almost no throughput is being achieved, despite the continuous utilization [15:23] I'd also point out where this starts [15:23] so read ahead is sucking? [15:23] deep inside the initramfs [15:24] in fact, last time I debugged this, it started as soon as you loaded the disk controller [15:24] which implies something seriously wrong in the kernel [15:24] apw, do you still have older kernels installed? [15:24] * apw would have a bunch of them yes [15:25] i'll go back to something older and see [15:25] apw, IIRC -8 worked well. [15:25] if its the kernel it would imply its more like kernel readahead sucking [15:25] anyhow ... 
i'll try -8 [15:26] apw: could be, though again, it starts in the initramfs [15:26] this is quite important [15:26] it starts *before* we mount the root filesystem [15:26] it starts before we even know which block device is the root filesystem [15:27] could it just be bad stats rather than actual anything [15:27] enough to throw off sreadahead [15:28] sreadahead doesn't look at stats [15:28] but I don't think it is bad stats [15:29] my Dell laptop has exactly this problem [15:29] and has had for a while now [15:29] though my disk light is nailed ... and its rattling like hell [15:29] this is why you hear me whining continuously about I/O performance [15:29] *but* other laptops and hardware I have *do not* [15:29] but there is io perf and io perf [15:29] mine is a dell [15:30] I suspect it's controller related [15:30] a horrible thouught [15:33] -5 is as bad === bjf-afk is now known as bjf [16:29] apw: I have tested your kernel packages, and they are good [16:52] Keybuk, most excellent [16:53] Keybuk, i am assuming you will be going back to ted, and we'll get a pair of patches from you when thats ready? [17:04] Keybuk, do you know waht this 'utilisation' on the disk actually is, which stat it is [17:04] we don't seem to have any form of actual 'how utilised is the disk' number in the numbers collected by bootchart [17:06] jjohansen, pgraner: Is there an EC2 kernel status meeting now? [17:07] jjohansen: ^^^^^^^^???? [17:07] its at this time in my calender [17:08] (modulo google being pants) [17:08] ah yes [17:09] just a sec [17:09] apw: pants, pants, pants... that gcal is [17:09] so very true [17:09] erichammond, lool has approved the linux-ec2 package for main inclusion, so perhaps I can get it built today. 
[17:09] here [17:10] alright lets begin [17:10] This is the EC2 kernel status update meeting [17:11] as far as I am aware the only change since thursday/friday is the state of bug #427288 [17:11] Malone bug 427288 in eglibc "Karmic i386 EC2 kernel emulating unsupported memory accesses" [High,Fix released] https://launchpad.net/bugs/427288 [17:11] which was fixed in glibc [17:12] smoser: Can a new AMI be created to test this fix? [17:12] it looks like there is some packaging work to be done as well [17:12] smoser, have you been able to test this? [17:13] i have not tested this yet, but will do that today. [17:13] erichammond, i think publishing a new ami for it doesn't make sense given alpha 6 on thursday [17:14] smoser: so are we then planning on rolling out a new ami with the karmic kernel [17:14] minus this glibc fix [17:15] i'm planing on alpha6 using the karmic kernel [17:15] jjohansen, also, i opened bug 428692 [17:15] Malone bug 428692 in ubuntu "ec2 kernel needs CONFIG_BLK_DEV_LOOP=y and other config changes" [Medium,Confirmed] https://launchpad.net/bugs/428692 [17:15] smoser: thanks [17:15] smoser, why _wouldn't_ you use the new glibc? Isn't that what the alphas are for? [17:16] i did not intend to say i wouldnt [17:16] i hope to get 427288 "all the way fixed" for alpha6 [17:16] ah, I missed interpreted, no point in rolling out a new ami before alpha6, right? [17:17] smoser: My opinion is that if we can test a fix, we should do it sooner rather than later. Are we saying that the fix won't be in glibc until alpha6? [17:18] I'll test it. it can easily be tested just by apt-get update && apt-get install libc6-xen and reboot [17:19] smoser: nice, thanks. [17:19] then, i'll make sure that we get libc6-xen into the ami, and that that ami will not get the error. 
but i dont plan on publishing another ami with all that incorporated prior to alpha6 [17:19] smoser: When you publish an AMI are you building it from scratch with vmbuilder or are you simply uploading and registering the daily image which is automatically created? [17:20] i much prefer to use the nightly [17:20] if that werent' sufficient, then i'd re-spin the nightly in the same way, just at a later date [17:21] smoser: Ok. How often is the code updated which is used to build the nightly? I thought I saw that you pinged soren to manually update it recently. [17:21] rtg: It was still in NEW when I looked though [17:21] erichammond, you're correct [17:22] we do need a long term solution for that. [17:22] rtg: (Thanks for clarifying the doc question BTW) [17:22] lool, right. how long will it sit there before getting accepted? [17:22] rtg: I'm not archive admin; it's up to them [17:22] lool, so, just find the Monday guy and hassle him? [17:22] Digging the Ubuntu wiki page [17:22] https://wiki.ubuntu.com/ArchiveAdministration [17:22] # [17:22] Monday: SteveLangasek; JamesWestby [17:22] rtg: You could I guess [17:23] rtg: slangasek seems like a good target; he knows about the source package and how it's constructed [17:23] rtg: I expect that if the copyright was properly updated for the Xen patches it should be ok [17:24] lool, uh, I didn't see that mentioned anywhere. [17:24] rtg: Well it's not MIR related, it's NEW related [17:24] rtg: Source packages have to document their copyright properly before they enter the archive, right? [17:24] My role was just in approving the move of this package from universe to main [17:25] lool, AFAIK there are no copyright issues. [17:25] Since it's built like the other kernel packages which are in main already and since the kernel team will care for it, I dont have a big issue with another one [17:25] everything is gplv2 [17:25] k [17:25] apw: yes, I've already sent ted a note [17:25] (I'm lost, is this still part of the EC2 status?) 
[17:26] apw: I think it comes from /proc/diskstats [17:26] rtg: debian/copyright usually mentions where stuff was taken from, which license it's under and who owns copyright; that said this last bit probably isn't maintainable in the copyright file so I guess you would refer to the list of AUTHORS or something similar or just omit it [17:26] erichammond, sorry, lool and I were discussing linux-ec2 acceptance. [17:26] (Is this a meeting?) [17:26] lool, 'tis [17:27] apw: bootchart logs /proc/stat, /proc/diskstats and /proc/*/stat [17:27] all chart information is derived from that [17:27] jjohansen, are we done? [17:27] I believe so [17:28] anyone else? [17:28] smoser's second bug has not been addressed. [17:28] Ok /me disappears then, sorry for interrupting [17:28] bug 428692 [17:28] Malone bug 428692 in ubuntu "ec2 kernel needs CONFIG_BLK_DEV_LOOP=y and other config changes" [Medium,Confirmed] https://launchpad.net/bugs/428692 [17:28] isn't DEV_LOOP=y already? [17:29] Keybuk, it is for the master branch [17:29] Keybuk: not in EC2 build currently [17:29] the bug is really "make the ec2 kernel more like -server" [17:29] smoser, I see that. I'll see what I can do [17:29] but explicitly, users have already asked for CONFIG_BLK_DEV_LOOP=y [17:30] smoser: you also wanted KEXEC :) [17:30] Keybuk, from waht i can see the utilisation is a number of ticks that there was an io on the queue... so that means 1 io has the basic same effect on the table as 10 [17:31] apw: it doesn't seem to be translated into binary though, it appears as an analogue graph [17:31] jjohansen, yes, but you dont want to support it. 
so i wasn't going to explicitly ask :) [17:31] and, most importantly [17:31] I have machines which *don't* have that I/O graph [17:31] and they boot much faster, with identical installs [17:32] smoser: right [17:32] Keybuk, right if the io is very empty then it would be empty, or if the queue is empty any time in the sampling tick the answer will be less than 1.0, but if its got 1 io for all the time then its 1.0 same as if its 100 io for all the time [17:32] ie, its pretty much saying if read ahead worked even badly the block should be 'full' utilisation wise [17:32] all right if there is nothing else, we will call the EC2 kernel meeting adjourned [17:33] seems a bit pointless [17:33] This may also be the place and time to discuss bug 429169 as some of the alternatives affect the kernel. [17:33] Malone bug 429169 in vm-builder "ec2: Include kernel modules in AMIs" [Undecided,New] https://launchpad.net/bugs/429169 [17:33] apw: but the output doesn't support that [17:33] we have graphs which are quite fluid [17:33] suggesting it doesn't behave that way [17:33] suggesting the data doesn't always mean the same perhaps [17:33] http://people.canonical.com/~scott/boot-performance/sam-karmic-20090721-8.png [17:33] for example [17:34] but thats what they are according to the docs [17:34] I tend to disbelieve any documentation of things in /proc [17:34] they're hopelessly out of date [17:34] especially on that graph above [17:34] note how the I/O wait goes *flat* [17:34] well no that graph looks pretty the same, mostly high when read ahead is high [17:34] you don't see that on machines "affected" by this I/O issue [17:34] right [17:35] but the whole point is that on my Dell, the graph goes high when readahead isn't even running [17:35] and remains high after it finished [17:35] yeah i think the graph we have on my machine literlally only says IO isn't going very fast [17:35] amitk, any chance we see a new imx51 kernel before the freeze ? 
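A minimal sketch of the utilisation point made above, assuming the chart's utilisation figure is derived from the io_ticks column of /proc/diskstats (milliseconds during which at least one request was in flight). The function names and the sample numbers are hypothetical; only the /proc/diskstats column layout is the kernel's. It shows how a disk can read as fully utilised while moving almost no data, because one queued request counts the same as a hundred:

```python
# Hypothetical samples of one /proc/diskstats line, taken 1000 ms apart.
# Columns: major minor name reads reads_merged sectors_read ms_reading
#          writes writes_merged sectors_written ms_writing in_flight
#          io_ticks weighted_io_ticks
BEFORE = "   8       0 sda 1000 0 80000 5000 10 0 160 40 0 4000 5040"
AFTER = "   8       0 sda 1025 0 80200 5990 10 0 160 40 1 5000 6040"


def parse_diskstats(text):
    """Map device name -> (sectors_read, sectors_written, io_ticks_ms)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        stats[fields[2]] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats


def utilisation_and_throughput(before, after, interval_ms):
    """Return (utilisation in 0..1, throughput in bytes per second)."""
    sectors = (after[0] - before[0]) + (after[1] - before[1])
    busy_ms = after[2] - before[2]
    # io_ticks only records that the queue was non-empty, not how deep it was.
    utilisation = min(1.0, busy_ms / interval_ms)
    # Sector counts in /proc/diskstats are always in 512-byte units.
    throughput = sectors * 512 / (interval_ms / 1000.0)
    return utilisation, throughput


if __name__ == "__main__":
    before = parse_diskstats(BEFORE)["sda"]
    after = parse_diskstats(AFTER)["sda"]
    util, bps = utilisation_and_throughput(before, after, 1000)
    print("utilisation %.2f, throughput %.0f KiB/s" % (util, bps / 1024))
```

With these made-up samples the disk is busy for the whole second yet transfers only 200 sectors, which is the "continually utilized, almost no throughput" pattern being argued about.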
[17:35] and [17:35] key [17:35] and so the IO wait is high all the time [17:35] *the throughput on the disk is virtually zero* [17:35] that to me suggests that the I/O queue is getting stuck [17:35] that things are waiting for I/O [17:35] and the kernel isn't actually bothering to service them [17:36] since I've seen bugs exactly like this, where the kernel stops doing I/O, I'm inclined to believe that there is a bug here and it's not a charting issue [17:36] jjohansen, smoser: Did I not speak in time to keep the meeting going? [17:36] erichammond, smoser: I think it would be appropriate to take that discussion over to #ubuntu-server [17:36] ok [17:37] (just looking up the bug number) [17:37] https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/336652 [17:37] Malone bug 336652 in linux "Poor system performance under I/O load" [Medium,Triaged] [17:37] * apw notes that the green line is very low agian even in your one ... odd [17:38] jjohansen: Two of the options relate to building things into the kernel and/or ramdisk, but it is quieter over on #ubuntu-server [17:38] erichammond: that is why I suggested it [17:38] apw: on the sam one, the green line is at least trying to lift when there is I/O wait [17:38] what worries me about yours (and my other laptop) is that the green line stays flat even when there is I/O wait [17:39] the green line in mine is the same pattern [17:39] its that there is no gaps in the wait that its not so obvious [17:39] not that that means there isn't an issue, it taking 100years to boot demonstrates that on its own [17:39] no, yours shows a different pattern [17:40] you have red I/O wait almost all the time [17:40] with a flat green line [17:40] what i don't really understand is why sreadahead isn't able to get its shit onto the queues [17:40] that implies waiting for I/O without any disk throughput [17:40] nope it has two distinct lists [17:40] which leans me to I/O queue stall [17:40] lifts one where you first block is, and the second 
where you 51mb/s is [17:40] yes [17:40] but it's otherwise flat [17:40] even though you have I/O wait [17:41] on http://people.canonical.com/~scott/boot-performance/sam-karmic-20090721-8.png [17:41] there is [17:41] a) *far* less I/O wait [17:41] yep, that is cause sreadahead is taking soooo long to do its thing [17:41] in fact, much of the top of the graph is blue, not red [17:41] it's pushing in a dribble of io which accounts for the queue [17:41] and every point there is I/O wait, there is continual lift [17:41] apw: please disregard sreadahead [17:41] the question is why is read ahead not occurring in 2-4 s in mine as in yours [17:41] if you like, comment out the "start on" from /etc/init/sreadahead.conf [17:42] and you'll see the same pattern [17:42] right [17:42] Keybuk, i can't as it's likely the cause of constant wait [17:42] sreadahead being unable to push I/O is a symptom of the bug [17:42] likely yes [17:42] not the bug [17:42] indeed ... [17:42] and if you remove sreadahead, you still see the exact same pattern [17:42] which fits [17:42] because [17:42] as I do keep pointing out [17:42] your problems start several seconds before sreadahead is even started ;) [17:42] sreadahead starts at 5s [17:43] well you have to be careful with that statement, as all you can tell is that there is 1 page on the queue during that period [17:43] but you've already been in large I/O wait for a couple of seconds with no throughput by then [17:43] there shouldn't be any pages on the queue [17:43] we have no idea if that's 1 block or 10000 blocks for a short time [17:43] or there should be disk throughput [17:43] on -2 there is a lift in throughput with the very first spike of red [17:43] it's small but there [17:44] so why is there I/O wait after the lift [17:44] there isn't, the lift drops as does the wait [17:44] ok [17:44] there is a similar width gap then it starts again, lift + wait [17:44] try something for me [17:44] boot without sreadahead [17:44] yep will do that, seems 
most logical to confirm your theory [17:44] how do i do that? [17:44] edit /etc/init/sreadahead.conf [17:44] comment out the "start on" line [17:45] (and i do believe it in principle [17:45] just it's not proven to me yet) and if it's right the debug will be nasty [17:45] I can't find the bug I was looking for though yet [17:45] I had an actual test case where I could stall "dd" on this machine [17:45] launchpad hates us all [17:46] in a way that the kernel had decided that dd was doing no more I/O and could block on read() forever [17:47] that sounds like the sort of thing you see with barriers in the queue [17:47] annoyingly, I have a feeling it was when I was trying to back up my old laptop hard drive [17:47] so I don't even have the logs, because the hard drive has obviously been replaced ;) [17:48] but basically, yes, on this machine I find that any attempt to do large amounts of I/O gets extremely slow [17:48] readahead included [17:48] pitti has a similar laptop, and confirmed my results [17:48] but the XPS 1330 for example *does not* have the same problem [17:54] Keybuk: anything similar to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/392288 [17:54] Malone bug 392288 in linux "dd extremely slow writing to usb key without oflag=dsync" [Medium,Triaged] [17:56] pgraner: I don't think so, on the basis that I was having these problems on jaunty [17:56] (it was during Berlin that I had the dd-stall issue) [17:56] Keybuk: ack [17:57] (I've had the abysmal I/O performance on this laptop for several releases now, it's not a "new" bug [17:58] if I were to guess, I'd say it was edgy or feisty that it started) [18:09] Keybuk, ok ... 
the readahead-less case is faster booting
[18:10] there is wait along the way as predicted, though depending how the driver calcs that stat we might expect that
[18:10] its a poor metric at best
[18:11] of course readahead getting behind would account for the worse with readahead mode too
[18:12] so all pointing to poor io performance full stpo
[18:12] stop
[18:12] yeah
[18:12] don't suppose you could time how long sreadahead takes?
[18:13] time sreadahead --debug --no-fork
[18:13] on my machine it was 90s
[18:13] if i run it once booted it takes like 0
[18:13] ok
[18:13] boot with init=/bin/bash
[18:13] then run it (with those args)
[18:14] * apw tries your incantation
[18:14] * apw notes it running for some time, no IO obvious
[18:14] 15s
[18:14] weird
[18:15] so that took 15s to do nothing?
[18:15] and also notes it complains about not being able to open things
[18:15] sda/queue/read_ahead_kb for one
[18:15] which sounds bad to me
[18:17] Keybuk, that does seem about how it felt yes
[18:19] Keybuk, does your machine (thats slow) have its root in an extended partition?
[18:21] no
[18:21] /dev/sda1
[18:22] Keybuk, i can't get it to do any actual readahead calls in debug mode
[18:26] you need both arguments
[18:27] Keybuk, have both as in --debug --no-fork
[18:27] yes
[18:28] Keybuk, i am worried by mdz's bug #429001 ... that warn on there is triggered by what appears to be ftrace on opens, and if i am reading sreadahead right it turns that on... could it be its failing to turn it off again ...
as he had it on still when he suspend/resumed much later
[18:28] Malone bug 429001 in linux "WARNING: at /build/buildd/linux-2.6.31/kernel/trace/ring_buffer.c:1393 rb_add_time_stamp+0x79/0x200()" [Low,Incomplete] https://launchpad.net/bugs/429001
[18:30] I don't think it should fail to turn it off
[18:30] unless he killed it ;)
[18:31] he is running your boot ppa's so anything possible :)
[18:32] mdz's laptop is a strange beast
[18:32] also sreadahead only turns on ftrace when profiling
[18:32] not in "normal operation"
[18:45] bah it just fooked its own db
[19:10] Keybuk, ok i am finding that running the read ahead on my laptop after dropping caches takes 30s, if i make it single thread not 4 threads it takes 15s
[19:11] right, you have a HDD
[19:12] but I don't see what about HDDs makes that need to happen, when on SSD you have the exact opposite behaviour
[19:12] the threads can out of order the requests badly and make the disk thrash i guess
[19:12] i guess we can actually tell now, there is a rotational flag in the device so we could use that to check
[19:14] Keybuk, /sys/block/sda/queue/rotational seems to tell us
[19:15] as sreadahead is in there already it could check and change behavior
[19:15] that's handy
[19:15] one assumes its cause seek latency is essentially 0 on ssd
[19:16] I actually already have the sreadahead patches to use a single thread ;)
[19:16] but I also found in testing that you then need sreadahead to *block* the boot
[19:16] ie. single thread and other stuff in background = bad
[19:16] (which fits with readahead-list)
[19:16] but then I noticed slow down unless the readahead pack was not sorted ideally for HDD
[19:16] hmmm, thats not hot either is it
[19:16] and I hadn't done the "ideal sort" bit yet
[19:17] this is all a bit of a nightmare
[19:17] just trying a single thread one on my machine
[19:17] but still in parallel
[19:17] I wrote a very long mail about this
[19:17] i know ...
i remember reading it
[19:18] i wonder if letting it run single in parallel but skipping the first few files deliberately would work
[19:18] as that would hold the boot in the sense those required io's would get the boot behind readahead
[19:18] I don't follow
[19:19] still don't follow
[19:19] the risk is if the boot ever gets ahead of where the read ahead is we can make things much worse
[19:19] as its reading stuff we no longer need and already ready
[19:19] still don't follow
[19:19] we never do that
[19:19] sreadahead is the first thing started
[19:20] the problem on HDD isn't that the boot gets ahead
[19:20] it's that we're inherently seeking the disk all over the place
[19:21] back after dinner
[19:21] will try to use that rotational flag and merge my other sreadahead updates to do the foreground stuff
[19:21] see if it makes a difference
[19:25] yeah ... tricky
[19:25] ok from 15 -> 10s if i stop it using low IO priority
[19:29] ogra: yes, I am hoping for an imx51 upload tomorrow.
[19:30] amitk, oh, ok
[19:30] had already discussed it with rtg
[19:31] yeah i saw that last week but couldnt remember the outcome
[19:32] ogra: http://people.canonical.com/~amitk/mx51/linux-image-2.6.31-100-imx51_2.6.31-101.8_armel.deb is a preview if you want to play
[20:29] apw: interesting on the scheduler
[20:30] is that changeable per-device or ?
=== Seeker`_ is now known as Seeker`
[21:48] Keybuk, yep per device on the fly, look for queue/scheduler in the device
[21:52] if we stayed in the foreground, and blasted the readahead list for an HDD before letting the boot continue
[21:52] whilst setting the io priority to realtime, and the scheduler to deadline
[21:52] that might work better?
=== bjf is now known as bjf-afk
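
(Editor's sketch, not part of the log.) The idea discussed above, checking the device's rotational flag and choosing the readahead thread count from it, then looking at queue/scheduler, might look roughly like the following. This is only an illustration under stated assumptions and is not sreadahead's actual code: the function names and the `sysfs_root` parameter are hypothetical, and the thread counts (1 for HDD, 4 otherwise) are simply the figures mentioned in the conversation.

```python
from pathlib import Path


def is_rotational(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel flags the device as rotational (an HDD)."""
    flag = Path(sysfs_root, device, "queue", "rotational").read_text().strip()
    return flag == "1"


def readahead_threads(device: str, sysfs_root: str = "/sys/block") -> int:
    """Pick a worker count: 1 on HDD to limit seeking, 4 otherwise.

    Both numbers are taken from the discussion, not from sreadahead itself.
    """
    return 1 if is_rotational(device, sysfs_root) else 4


def active_scheduler(device: str, sysfs_root: str = "/sys/block") -> str:
    """Return the active I/O scheduler named in queue/scheduler.

    The kernel lists the available schedulers on one line and marks the
    active one with square brackets, e.g. "noop anticipatory deadline [cfq]".
    """
    text = Path(sysfs_root, device, "queue", "scheduler").read_text()
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    # Fall back to the raw value if nothing is bracketed (e.g. "none").
    return text.strip()
```

For example, `readahead_threads("sda")` would read `/sys/block/sda/queue/rotational`. Switching the scheduler on the fly is done by writing a scheduler name to the same `queue/scheduler` file as root; that step is left out here because it changes system state.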