/srv/irclogs.ubuntu.com/2009/09/14/#ubuntu-kernel.txt

=== zz_gouki is now known as gouki
dholbachhiya07:59
dholbachcan anybody tell me if this looks like a kernel problem or more of a Keybuk-boot problem:08:00
dholbach the last thing when I boot on the other machine is "ACPI: I/O resource nForce2_smbus [0x1c40-0x1c7f] conflicts with ACPI region SM01 ............. Error probing SMB2"08:00
dholbach but I get the feeling that that message has nothing to do with the hang on the machine I'm seeing08:00
dholbach if I boot with bin=/bin/bash the last thing I see is "bash: cannot set terminal process group (-1): Inappropriate ioctl for device" "bash: no job control in this shell"08:00
dholbach this is the latest karmic with the ubuntu-boot ppa enabled08:00
dholbach Keybuk: ^ any idea what I could do?08:00
jk-dholbach: you mean init=/bin/bash ?08:06
dholbachjk-: oh yeah, that's right08:06
jk-looks like bash is starting up OK, just giving you a warning.08:06
dholbachok... so it might actually be something boot-related breaking08:06
jk-it may not have correct access to the console device08:06
dholbachand not the kernel08:07
apwtend to agree, would be suspicious its boot and not the kernel08:07
dholbachthe ACPI smbus thing seems to be unrelated?08:07
apwyou may even find it will respond to commands even if it does not look like its prompting08:07
jk-yeah, make sure you have /dev/ populated08:07
apwwhich with init as bash you won't without devtmpfs08:08
dholbacherm... I think I'm seeing a very very weird thing right now - after booting from a livecd and running fsck and smartctl it works again now (both reported no errors and didn't say anything about fixing something?!)08:11
dholbachKeybuk: nevermind08:11
dholbachapw, jk-: thanks guys - I have no idea what "fixed it now"08:12
apwhrm ... computers ...08:12
dholbachmight be a good idea to get a replacement disk :-)08:12
apwheh ... lets hope not08:13
dholbachat least have it ready and do a backup again :-)08:14
thuxhi, on one machine my jaunty sits for a long time at 'starting to boot' on screen. acpi=off reduces that time but then pnpbios complains, and with acpi=off and pnpbios=off 'starting to boot' again shows for a long time. is there a way around this? 08:30
smbthux, Its hard to say something on that. It might be some subsystem that just takes long to initialize. And with the combination that seems to be faster it might not be there at all. I assume you removed the "quiet" from the command line and still there is nothing between the first line and the boot?08:38
apwthux one would need to get the dmesg output from the boot and see if there is an obvious time delay in the timestamps there.  may give a clue as to cause08:38
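[editor's note] The check apw describes, looking for an obvious delay between dmesg timestamps, can be sketched as below. This is a minimal sketch, not something posted in the channel: it assumes kernel timestamps are enabled so each line starts with "[ seconds.micros]", and the function name largest_gaps is hypothetical.

```python
import re
import sys

# Matches the "[    9.224019] message" prefix printed when kernel
# timestamps are enabled (assumption: the log was captured that way).
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gaps(lines, top=3):
    """Return the `top` largest (gap_seconds, timestamp, message) tuples.

    The gap is the time between a message and the previous timestamped
    message, so a large gap points at whatever the kernel was waiting on
    just before that message was printed.
    """
    gaps = []
    prev = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines that carry no timestamp
        now = float(match.group(1))
        if prev is not None:
            gaps.append((now - prev, now, match.group(2)))
        prev = now
    gaps.sort(key=lambda gap: gap[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__":
    # Usage: dmesg | python3 dmesg_gaps.py
    for gap, stamp, message in largest_gaps(sys.stdin):
        print(f"{gap:8.3f}s before [{stamp:.6f}] {message}")
```

Piping dmesg output into the script lists the messages that followed the longest silences, which is where a stalled subsystem would be expected to show up.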
thuxok I try first remove quiet and then put dmesg to pastebin08:39
apwsmb, as a heads up, karmic intel graphics are looking like they have hang issues ...08:44
smbapw, I noticed from comments last week. Seems mostly ok on the aa108:45
apwi am suspicious it only triggers after a suspend/resume08:45
smbI have not yet had a complete lockup like yours. There is a strange case of the netbook launcher sometimes ignoring the mouse after a while. But that is rather something else08:47
smbapw, What model do you have? 945GME?08:48
thuxhere is my dmesg http://pastebin.com/m5185a0b008:50
smb[    9.224019] pci 0000:00:1d.7: EHCI: BIOS handoff failed (BIOS bug?) 0101000108:51
thuxyes08:51
smbThis seems to be the problem08:51
thuxbefore that line it waits now08:52
smbIs USB working after this for you?08:52
thuxyes08:52
smbIf I read this message correctly there is a problem transferring control of usb from the bios to the kernel08:53
thuxi have many usb devices plugged only hp laser sometimes not detected08:54
smbIf there are settings in the bios to control usb support, you might play around with them. But often you want early usb support for usb keyboards08:54
thuxyes i have that08:54
apwdoes the problem occur if you boot without any usb devices attached08:55
apwsmb, i have both 945GME and GM45E's showing the issue09:00
smbapw, Ok, I also try to keep an eye on it. Though I am running mixed stuff, Jaunty userspace with -10.31 kernel and x-org-edgers X09:02
apwheh ... frankenstein09:02
smb:) Ubuntu, Halloween edition ;-)09:03
apwactually thats interesting as these hangs can just as easily be a bad gpu usage from mesa as they can be kernel issues ... so ... a different mesa combination should be an interesting control09:03
apwi like the sounds of that ... Ubuntu, halloween09:03
apwhorrific halloween09:03
apwshame we have used up HH09:03
smbHehe, and its no animal... 09:04
smbI should compare the mesa versions against yours. The edgers should in theory be near Karmic, shouldn't they?09:04
apwhalloween hyena09:04
apwyeah near, but i'd expect them to be ahead a little09:05
smbMine are all 7.7.0-git20090911.a79eecb9-0ubuntu0tormod-jaunty09:06
thuxthere was a setting in bios for usb hand off and it was disabled, i enabled it and now it boots normally, thanks :)09:09
thuxwonder why it was default disabled09:09
smbyou're welcome. good to hear. Meh would not think about it... Maybe works better with the other OS... 09:11
thuxah that i see 09:11
=== ogra_ is now known as ogra
apwcking, gnarl, did either of you report the sd card not mounting thingy10:08
ckingnope10:08
smbapw, Don't think so... Though not mounting thingy is a bit broad10:08
ckingit was steve who hit this issue10:09
apwoh well i've filed 5 today, what's one more10:10
apwif only i knew what the heck was meant to handle it10:10
smbapw, does it only affect sdhc or all or is the reader not detected?10:11
smbnormally the reader would issue a udev event for a device getting ready (I think)10:11
apwthe card is detected and the devices are all made, the partition table is read, and the p1 device is made10:17
apwits the next step, the mounting it and popping up boxes to show it that don't occur10:17
smbIs /media/bla created?10:18
smbErr well likely not10:18
smbas you say it does not get mounted10:18
smbIn theory the partition scanning is also done in udev ... Do you get any error message in dmesg?10:20
smbI believe each partition should also do a uevent, then udev calls blkid and creates the symlinks /dev/disk/by-... The mount step might be device-kit...10:22
apwsmb, no it doesn't get mounted10:35
apwthe p1 device does get mounted10:35
smbapw, ??? huh10:36
apw_made_ sorry10:37
apwnot mounted10:37
apwthere are by-id files as well10:37
smbThe id files at least show the partition got through blkid. I just cannot grep the statement about the p1 device. So nothing gets mounted, correct?10:39
apwi meant that it does partition discovery correctly and finds it has a partition and makes mmcblk0p1 as well10:43
apwnothing is mounted as a result.  manual mount triggers gnome to find piccies etc10:44
apwi've filed it on gnome-volume-manager, bug #42925710:44
ubot3Malone bug 429257 in gnome-volume-manager "MMC cards no longer auto mount" [Undecided,New] https://launchpad.net/bugs/42925710:44
apwi am a bug filing machine ... 4 today alone10:44
smbapw, Shame, you should fix them. ;-)10:45
apwna, i file them _you_ get to fix them10:45
apwmost are these intel graphics hangs ... bah10:45
smbIf it's Karmic, I can't see it (yet) :)10:45
smbapw, But seriously, I am not completely sure, the automounting thing seemed to be tied to a desktop (so gnome) but (might check up on this later) might have moved to something more generic to work without a running desktop too...10:47
apwsmb, yeah figured i'd file it with the gnome thingy as its description included automounting, someone else can move it11:07
ckingapw, I'm seeing a lot of messages: "TKIP: RX tkey->key_idx=2 frame keyidx=1 priv=de21d780" - is this new with the latest kernel?11:49
apwcking, hmmm not sure11:49
apw[   10.393047] lib80211_crypt: registered algorithm 'TKIP'11:50
apwchanged your AP setup recently?11:50
apwnot getting it here, but that is a WPA2 thing iirc ?11:50
ckingyep, WPA211:51
ckingapw, I've rebooted my router today, surely, that's not caused this message?11:52
apwwhats your driver here, that message appears to be in the rtl8187se driver11:53
apwlooks to always have been in there11:54
ckingit's the broadcom wl driver 11:54
cking:-(11:54
apwyay11:54
ckingnggg11:55
=== sap is now known as csurbhi
* amitk welcomes csurbhi to the kernel team12:58
smbHi csurbhi 12:58
=== pgraner` is now known as pgraner
csurbhiHi !13:02
csurbhithanks Amit13:02
* apw waves to csurbhi 13:09
csurbhihello apw :)13:10
apwcsurbhi, so hows it going so far13:20
csurbhireading up the docs and setting up accounts.. other than that.. decent13:20
csurbhi:)13:20
smbLuckily there is help close. :)13:21
csurbhiyes.. thats right13:21
rtgapw, I wonder why we have CONFIG_LIB80211_DEBUG enabled ?13:25
rtgcsurbhi, howdy13:25
csurbhiokie dokie13:25
apwdebian.master/config/config.common.ports:# CONFIG_LIB80211_DEBUG is not set13:26
apwdebian.master/config/config.common.ubuntu:# CONFIG_LIB80211_DEBUG is not set13:26
apwrtg ?13:26
smbbackports?13:27
rtgapw: hmm, then what kernel is pgraner running that he would get the 'CCMP: replay detected' message?13:27
smbupdates/compat-wireless-2.6/config.mk:# CONFIG_LIB80211_DEBUG=y13:28
apwrtg ahh we just hit that with cking, a couple of the WL drivers contain the whole damn 80211 stack in them with mods13:28
apwand it may be on there13:28
rtgapw: it is indeed enabled in LBM13:28
smbrtg, apw, Or he has LBM installed13:28
ckingI have LBM installed13:29
apwi think pete has the cranky wl driver doesn't he?13:29
rtgapw, rt73 IIRC13:30
rtgapw: actually, rtl8187se13:30
apwcking, which rt driver did you have, it was that rtlxxxse one wasn't it?13:30
apwthat definitely has the 80211 in it in toto13:31
apwa staging driver horror show and no mistake13:31
ckinga;w13:34
ckingapw, I saw this on my wl broadcom driver13:34
apwgibber13:34
apwits everywhere, and spreading :)13:35
ckingit's on my HP mini 1000 ;-)13:38
rtgapw, how about Keybuk's ext4 patch? 'ext4: Don't update superblock write time when filesystem is read-only'14:07
apwi've not heard back from him that i know of on its efficacy14:08
apwKeybuk, did you get to test that kernel i did?14:08
Keybukapw: did you ever e-mail it me?14:09
apwi pointed you to it on here ... 14:10
Keybuksee, that never works :p14:10
* Keybuk doesn't read IRC scrollback14:10
apwahh ... damn14:10
* apw finds it14:10
* Keybuk finds his laptop under things14:10
apwhttp://people.canonical.com/~apw/lp427822-karmic/14:11
apwmine is buried in bugs14:11
ograKeybuk, hmm, you only talk abour east of UTC 14:11
ogra*about14:11
ograKeybuk, (see #ubuntu-devel, loic-m is surely in france but seems to see a similar thing)14:12
Keybukogra: france is very much east of UTC14:12
KeybukI realise that, historically speaking, they are very unhappy about that14:12
Keybukand feel that the prime meridian of the world, and thus all timekeeping, should pass through Paris14:12
* ogra must have gotten his sense of east and west wrong 14:12
apwheh14:13
Keybukbut that was not the consensus of the rest of the world14:13
Keybukand if they're still clinging to that, we can't help them :p14:13
rtgKeybuk, you're poking the hornets nest14:13
apwKeybuk, sorry about the email, i'll remember that for next time14:13
Keybukrtg: it's a hobby ;)14:13
* apw files his 7th bug of the day, sigh14:14
Keybukapw: if I'm actually paying attention to IRC, it's usually ok - also /msg is always good - I save those and read them back14:14
Keybukbut things-on-a-channel I don't tend to notice if they go off the top of the screen while I'm not looking14:14
apwahh right, fair enough14:14
rtgMy use of IRC isn't much different.14:15
rtgits a lousy bug tracking mechanism14:15
Keybukogra: the recovery-while-read-only bug affects everyone from Britain (in Summer time) through Europe, across Eastern Europe, the Middle East, Asia, Indonesia and Australia and (I think) NZ14:15
apwall true14:15
Keybukrtg: for me it's a getting-things-done defence14:15
KeybukIRC is a productivity black hole14:16
rtgindeed14:16
Keybukif I wake up and start reading through things like e-mail or IRC, then I may as well write off being productive in the morning14:16
ograKeybuk, yeah, i always get the sides wrong here :P14:16
Keybukand I find that people tend to notice when you're back and remind you about things anyway14:16
Keybukogra: what's nice is that it's the exact opposite of the init scripts bug14:16
Keybukwhich only affected Merkins14:17
ograheh14:17
rtgKeybuk, new topic. anything changed recently that I should investigate that slows the boot time on my SSD laptop? It used to be 28 seconds, but has gotten much longer since late last week.14:17
Keybukrtg: I've noticed some slowdown with recent kernels myself14:17
apwmy main laptop seems to take hours to login these days14:17
Keybukboth on SSD and HDD14:17
apwspecially now as its hidden the feeling of length is even worse14:17
KeybukI figured that the IO elevator of the week wasn't working as well as last week's14:17
rtgKeybuk, you think its kernel related?14:18
Keybukrtg: I'm not sure what else has changed14:18
apwwe added xsplash14:18
apwosd seems to have lost its mind14:18
rtgI'm watching the GDM throbber just sit there and flash at me, no disk activity14:18
Keybukhmm14:19
Keybukodd14:19
Keybukcouchdb?14:19
rtgdunno what couchdb is14:20
Keybuksilly desktop database stuff they added for Ubuntu One14:20
Keybukit's quite heavy14:20
apwnnng14:20
Keybukand seems to sit there for a few minutes14:20
ograisnt it erlang too ?14:20
apwthat reminds me it exploded on my other laptop14:20
apwubuntu one14:20
rtgsome serious bitching is in order if thats what it is14:20
* ogra wonders how fast/slow erlang is compared to other langs14:20
apwcan we tell from a bootchart14:21
ograloading another interpreter during boot is surely not speeding up things14:21
Keybukprobably14:21
apwKeybuk, are we using sreadahead by default now, could that be contributing to slowness14:22
apwin the sense we are upgrading a lot, it will be loading old junk?14:22
ograif you got a new apparmor profile it will definitely slow down the next boot after that14:24
Keybukapw: one of the main reasons we switched to sreadahead is that it reprofiles often14:24
Keybukit may be simply that you upgrade before every reboot14:24
Keybukso every boot sreadahead is profiling14:24
Keybuk(rather than assisting)14:24
apwnow that i almost always do do yes14:24
KeybukI've also noticed a strange issue that I think is ext4, but can't place it14:24
Keybukafter people upgraded to my PPA, and rebooted14:25
Keybukthey have a 4-6s "exe" process that seems to be somewhere around the point the root filesystem was mounted14:25
Keybukit's almost as if, after major upgrades, ext4 has to do lots of work before it can mount the filesystem next time14:25
Keybukthat needs investigation14:25
apwits called 'exe' in the profiles?14:26
Keybukyes14:26
Keybukthat's what confuses me14:26
apwhow does the name get generated for the profiles, from the executable link?14:27
apwinterestingly that link is called 'exe', so perhaps it's what things get called which are not convertible back to a name14:28
apwi wonder if the chrooting madness and rm madness going on at root mount time leaves us with bad /proc/NNNN/exe links and bad names in the profile14:28
apwie. it could be anything14:28
Keybukright, that's what I think14:28
Keybukthat it's something run from the initramfs, just before it's cleaned up, so the link never exists14:29
apwhmmmmm14:29
* apw goes for a couple of boots to see if that speeds things up14:30
apwKeybuk, i am getting 1m from bios to X, 1:22 from bios to gdm, 34s gdm to desktop.  now this is a HDD, so i'd expect it to be what 30s total?14:39
ogratry a second one :)14:39
apw(that was second boot, so i assume sreadahead has done its thing)14:40
apwogra, that was the second one14:40
ograah14:40
apwstill seems like an insane length of time14:40
ograi see a few added seconds every time i get an updated apparmor profile for an app14:40
apwouch14:40
ograwhich i was told is normal now 14:41
Keybukcan you get a bootchart of that14:41
ograsince apparmor turns the profile textfiles into binary thingies14:41
Keybukuse bootchart=nostop on the kernel command line14:41
Keybukthen once you're at a desktop, run sudo /etc/init.d/stop-bootchart start14:41
apwKeybuk, sure14:41
apwyay java ... sigh14:42
ograapw, whats wrong with java ? ... as long as you dont have to touch its code it's usually fine :)14:51
apwKeybuk, ok got it, seems sreadahead runs for a full 90s, that seems unreasonable to me15:05
Keybukapw: really?15:06
Keybukwhen you login do "status sreadahead"15:06
Keybukwhat does it say?15:06
apwapw@dm$ status sreadahead15:06
apwsreadahead stop/waiting15:06
* apw pushed up the data to rookery15:08
apwif this image is to be believed, then it sustained like 44MB/s for that 90 odd seconds15:08
apwwhich is like 4GB of poop15:08
apwhttp://people.canonical.com/~apw/bootchart/15:09
apwdoes the green line on the bar there show instantaneous throughput?  if so it's like 100k/s over that time?15:11
* apw has another go15:12
apwKeybuk, ok the second boot was a little better, so lets assume the first one was a reconfigure or something15:20
apwdevice discovery takes like 20s alone15:20
Keybukstatus sreadahead - if that says "stop/waiting" then your boot was assisted by sreadahead 15:21
Keybukif it says "stop/killed" or similar, then your boot was *reprofiled*15:22
Keybukapw: the -215:22
Keybukthat shows the exact same I/O weirdness that I see15:22
apwwhat bit?15:22
Keybukthe entire system spends most of its time in I/O wait15:22
Keybukhttp://people.canonical.com/~apw/bootchart/dm-karmic-20090914-2.png15:22
Keybukthe two graphs along the top15:22
Keybukthe first graph, blue is CPU usage, red is I/O wait15:23
Keybukthe second graph, red is disk utilization, the green lines are throughput15:23
apwyep, so that would imply its nailed waiting for disk15:23
Keybukso that's showing15:23
Keybuk 1. your system is spending almost all of its time in I/O wait (BAD BAD BAD BAD)15:23
Keybuk 2. your disk is continually utilized15:23
Keybuk 3. almost no throughput is being achieved, despite the continuous utilization15:23
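[editor's note] Point 1 above, time spent in I/O wait, can be measured directly from two snapshots of /proc/stat. This is a minimal sketch under stated assumptions, not the bootchart implementation: it assumes the usual field order on the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal) and the function names are hypothetical.

```python
def cpu_fields(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into tick counts."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two snapshots.

    Only the first eight fields (user..steal) are summed; later guest
    fields, where present, are already counted inside user and nice.
    """
    old, new = cpu_fields(before), cpu_fields(after)
    deltas = [b - a for a, b in zip(old, new)][:8]
    total = sum(deltas)
    if total == 0 or len(deltas) < 5:
        return 0.0
    return deltas[4] / total  # index 4 is iowait


if __name__ == "__main__":
    import time

    with open("/proc/stat") as handle:
        first = handle.read()
    time.sleep(1.0)
    with open("/proc/stat") as handle:
        second = handle.read()
    print(f"iowait over the last second: {iowait_fraction(first, second):.1%}")
```

A value that stays close to 100% while the disk moves almost no data would match the pattern Keybuk describes in the chart.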
KeybukI'd also point out where this starts15:23
apwso read ahead is sucking?15:23
Keybukdeep inside the initramfs15:23
Keybukin fact, last time I debugged this, it started as soon as you loaded the disk controller15:24
Keybukwhich implies something seriously wrong in the kernel15:24
rtgapw, do you still have older kernels installed?15:24
* apw would have a bunch of them yes15:24
apwi'll go back to something older and see15:25
rtgapw, IIRC -8 worked well.15:25
apwif its the kernel it would imply its more like kernel readahead sucking15:25
apwanyhow ... i'll try -815:25
Keybukapw: could be, though again, it starts in the initramfs15:26
Keybukthis is quite important15:26
Keybukit starts *before* we mount the root filesystem15:26
Keybukit starts before we even know which block device is the root filesystem15:26
apwcould it just be bad stats rather than actual anything15:27
apwenough to throw off sreadahead15:27
Keybuksreadahead doesn't look at stats15:28
Keybukbut I don't think it is bad stats15:28
Keybukmy Dell laptop has exactly this problem15:29
Keybukand has had for a while now15:29
apwthough my disk light is nailed ... and its rattling like hell15:29
Keybukthis is why you hear me whining continuously about I/O performance15:29
Keybuk*but* other laptops and hardware I have *do not*15:29
apwbut there is io perf and io perf15:29
apwmine is a dell15:29
KeybukI suspect it's controller related15:30
apwa horrible thouught15:30
apw-5 is as bad15:33
=== bjf-afk is now known as bjf
Keybukapw: I have tested your kernel packages, and they are good16:29
apwKeybuk, most excellent16:52
apwKeybuk, i am assuming you will be going back to ted, and we'll get a pair of patches from you when thats ready?16:53
apwKeybuk, do you know what this 'utilisation' on the disk actually is, which stat it is17:04
apwwe don't seem to have any form of actual 'how utilised is the disk' number in the numbers collected by bootchart17:04
erichammondjjohansen, pgraner: Is there an EC2 kernel status meeting now?17:06
pgranerjjohansen: ^^^^^^^^????17:07
apwits at this time in my calendar17:07
apw(modulo google being pants)17:08
jjohansenah yes17:08
jjohansenjust a sec17:09
pgranerapw: pants, pants, pants... that gcal is17:09
apwso very true17:09
rtgerichammond, lool has approved the linux-ec2 package for main inclusion, so perhaps I can get it built today.17:09
smoserhere17:09
jjohansenalright lets begin17:10
jjohansenThis is the EC2 kernel status update meeting17:10
jjohansenas far as I am aware the only change since thursday/friday is the state of bug #42728817:11
ubot3Malone bug 427288 in eglibc "Karmic i386 EC2 kernel emulating unsupported memory accesses" [High,Fix released] https://launchpad.net/bugs/42728817:11
jjohansenwhich was fixed in glibc17:11
erichammondsmoser: Can a new AMI be created to test this fix?17:12
jjohansenit looks like there is some packaging work to be done as well17:12
rtgsmoser, have you been able to test this?17:12
smoseri have not tested this yet, but will do that today.17:13
smosererichammond, i think publishing a new ami for it doesn't make sense given alpha 6 on thursday17:13
jjohansensmoser: so are we then planning on rolling out a new ami with the karmic kernel17:14
jjohansenminus this glibc fix17:14
smoseri'm planing on alpha6 using the karmic kernel17:15
smoserjjohansen, also, i opened bug 42869217:15
ubot3Malone bug 428692 in ubuntu "ec2 kernel needs CONFIG_BLK_DEV_LOOP=y and other config changes" [Medium,Confirmed] https://launchpad.net/bugs/42869217:15
jjohansensmoser: thanks17:15
rtgsmoser, why _wouldn't_ you use the new glibc? Isn't that what the alphas are for?17:15
smoseri did not intend to say i wouldnt17:16
smoseri hope to get 427288 "all the way fixed" for alpha617:16
jjohansenah, I misinterpreted, no point in rolling out a new ami before alpha6, right?17:16
erichammondsmoser: My opinion is that if we can test a fix, we should do it sooner rather than later.  Are we saying that the fix won't be in glibc until alpha6?17:17
smoserI'll test it. it can easily be tested just by apt-get update && apt-get install libc6-xen and reboot17:18
erichammondsmoser: nice, thanks.17:19
smoserthen, i'll make sure that we get libc6-xen into the ami, and that that ami will not get the error. but i dont plan on publishing another ami with all that incorporated prior to alpha617:19
erichammondsmoser: When you publish an AMI are you building it from scratch with vmbuilder or are you simply uploading and registering the daily image which is automatically created?17:19
smoseri much prefer to use the nightly17:20
smoserif that werent' sufficient, then i'd re-spin the nightly in the same way, just at a later date17:20
erichammondsmoser: Ok.  How often is the code updated which is used to build the nightly?  I thought I saw that you pinged soren to manually update it recently.17:21
loolrtg: It was still in NEW when I looked though17:21
smosererichammond, you're correct17:21
smoserwe do need a long term solution for that.17:22
loolrtg: (Thanks for clarifying the doc question BTW)17:22
rtglool, right. how long will it sit there before getting accepted?17:22
loolrtg: I'm not archive admin; it's up to them17:22
rtglool, so, just find the Monday guy and hassle him?17:22
loolDigging the Ubuntu wiki page17:22
loolhttps://wiki.ubuntu.com/ArchiveAdministration17:22
lool#17:22
loolMonday: SteveLangasek; JamesWestby 17:22
loolrtg: You could I guess17:22
loolrtg: slangasek seems like a good target; he knows about the source package and how it's constructed17:23
loolrtg: I expect that if the copyright was properly updated for the Xen patches it should be ok17:23
rtglool, uh, I didn't see that mentioned anywhere. 17:24
loolrtg: Well it's not MIR related, it's NEW related17:24
loolrtg: Source packages have to document their copyright properly before they enter the archive, right?17:24
loolMy role was just in approving the move of this package from universe to main17:24
rtglool, AFAIK there are no copyright issues.17:25
loolSince it's built like the other kernel packages which are in main already and since the kernel team will care for it, I dont have a big issue with another one17:25
rtgeverything is gplv217:25
loolk17:25
Keybukapw: yes, I've already sent ted a note17:25
erichammond(I'm lost, is this still part of the EC2 status?)17:25
Keybukapw: I think it comes from /proc/diskstats17:26
loolrtg: debian/copyright usually mentions where stuff was taken from, which license it's under and who owns copyright; that said this last bit probably isn't maintainable in the copyright file so I guess you would refer to the list of AUTHORS or something similar or just omit it17:26
rtgerichammond, sorry, lool and I were discussing linux-ec2 acceptance.17:26
lool(Is this a meeting?)17:26
rtglool, 'tis17:26
Keybukapw: bootchart logs /proc/stat, /proc/diskstats and /proc/*/stat17:27
Keybukall chart information is derived from that17:27
rtgjjohansen, are we done?17:27
jjohansenI believe so17:27
jjohansenanyone else?17:28
erichammondsmoser's second bug has not been addressed.17:28
loolOk /me disappears then, sorry for interrupting17:28
erichammondbug 42869217:28
ubot3Malone bug 428692 in ubuntu "ec2 kernel needs CONFIG_BLK_DEV_LOOP=y and other config changes" [Medium,Confirmed] https://launchpad.net/bugs/42869217:28
Keybukisn't DEV_LOOP=y already?17:28
rtgKeybuk, it is for the master branch17:29
jjohansenKeybuk: not in EC2 build currently17:29
smoserthe bug is really "make the ec2 kernel more like -server" 17:29
rtgsmoser, I see that. I'll see what I can do17:29
smoserbut explicitly, users have already asked for CONFIG_BLK_DEV_LOOP=y17:29
jjohansensmoser: you also wanted KEXEC :)17:30
apwKeybuk, from what i can see the utilisation is a number of ticks that there was an io on the queue... so that means 1 io has the basic same effect on the table as 1017:30
Keybukapw: it doesn't seem to be translated into binary though, it appears as an analogue graph17:31
smoserjjohansen, yes, but you dont want to support it. so i wasn't going to explicitly ask :)17:31
Keybukand, most importantly17:31
KeybukI have machines which *don't* have that I/O graph17:31
Keybukand they boot much faster, with identical installs17:31
jjohansensmoser: right17:32
apwKeybuk, right if the io is very empty then it would be empty, or if the queue is empty any time in the sampling tick the answer will be less than 1.0, but if its got 1 io for all the time then its 1.0 same as if its 100 io for all the time17:32
apwie, its pretty much saying if read ahead worked even badly the block should be 'full' utilisation wise17:32
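[editor's note] apw's reading of the utilisation figure can be illustrated with two snapshots of /proc/diskstats. This is a minimal sketch, assuming the chart's utilisation comes from the "time spent doing I/Os" counter (Keybuk only says he thinks it comes from /proc/diskstats); the function names are hypothetical. It shows how a disk can read as fully busy while moving very little data.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def disk_sample(diskstats_text, device):
    """Extract sectors transferred and busy milliseconds for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            return {
                # sectors read + sectors written
                "sectors": int(fields[5]) + int(fields[9]),
                # milliseconds during which at least one I/O was in flight
                "busy_ms": int(fields[12]),
            }
    raise KeyError(device)


def utilisation_and_throughput(before, after, device, interval_s):
    """Return (utilisation 0..1, throughput in MB/s) between two snapshots.

    Utilisation only records whether the queue was non-empty, so one
    outstanding request counts the same as a hundred.
    """
    old = disk_sample(before, device)
    new = disk_sample(after, device)
    busy = (new["busy_ms"] - old["busy_ms"]) / (interval_s * 1000.0)
    moved = (new["sectors"] - old["sectors"]) * SECTOR_BYTES
    return min(busy, 1.0), moved / interval_s / 1e6


if __name__ == "__main__":
    import time

    with open("/proc/diskstats") as handle:
        first = handle.read()
    time.sleep(1.0)
    with open("/proc/diskstats") as handle:
        second = handle.read()
    # "sda" is an assumption; substitute the device under test.
    util, mbps = utilisation_and_throughput(first, second, "sda", 1.0)
    print(f"utilisation {util:.0%}, throughput {mbps:.2f} MB/s")
```

Utilisation near 100% together with throughput near zero is the combination both charts are being argued over: the queue is never empty, yet almost nothing completes.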
jjohansenall right if there is nothing else, we will call the EC2 kernel meeting adjourned17:32
apwseems a bit pointless17:33
erichammondThis may also be the place and time to discuss bug 429169 as some of the alternatives affect the kernel.17:33
ubot3Malone bug 429169 in vm-builder "ec2: Include kernel modules in AMIs" [Undecided,New] https://launchpad.net/bugs/42916917:33
Keybukapw: but the output doesn't support that17:33
Keybukwe have graphs which are quite fluid17:33
Keybuksuggesting it doesn't behave that way17:33
apwsuggesting the data doesn't always mean the same perhaps17:33
Keybukhttp://people.canonical.com/~scott/boot-performance/sam-karmic-20090721-8.png17:33
Keybukfor example17:33
apwbut thats what they are according to the docs17:34
KeybukI tend to disbelieve any documentation of things in /proc17:34
Keybukthey're hopelessly out of date17:34
Keybukespecially on that graph above17:34
Keybuknote how the I/O wait goes *flat*17:34
apwwell no that graph looks pretty the same, mostly high when read ahead is high17:34
Keybukyou don't see that on machines "affected" by this I/O issue17:34
Keybukright17:34
Keybukbut the whole point is that on my Dell, the graph goes high when readahead isn't even running17:35
Keybukand remains high after it finished17:35
apwyeah i think the graph we have on my machine literally only says IO isn't going very fast17:35
ograamitk, any chance we see a new imx51 kernel before the freeze ?17:35
Keybukand17:35
Keybukkey17:35
apwand so the IO wait is high all the time17:35
Keybuk*the throughput on the disk is virtually zero*17:35
Keybukthat to me suggests that the I/O queue is getting stuck17:35
Keybukthat things are waiting for I/O17:35
Keybukand the kernel isn't actually bothering to service them17:35
Keybuksince I've seen bugs exactly like this, where the kernel stops doing I/O, I'm inclined to believe that there is a bug here and it's not a charting issue17:36
erichammondjjohansen, smoser: Did I not speak in time to keep the meeting going?17:36
jjohansenerichammond, smoser: I think it would be appropriate to take that discussion over to #ubuntu-server17:36
smoserok17:36
Keybuk(just looking up the bug number)17:37
Keybukhttps://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/33665217:37
ubot3Malone bug 336652 in linux "Poor system performance under I/O load" [Medium,Triaged] 17:37
* apw notes that the green line is very low again even in your one ... odd17:37
erichammondjjohansen: Two of the options relate to building things into the kernel and/or ramdisk, but it is quieter over on #ubuntu-server17:38
jjohansenerichammond: that is why I suggested it17:38
Keybukapw: on the sam one, the green line is at least trying to lift when there is I/O wait17:38
Keybukwhat worries me about yours (and my other laptop) is that the green line stays flat even when there is I/O wait17:38
apwthe green line in mine is the same pattern17:39
apwits that there is no gaps in the wait that its not so obvious17:39
apwnot that that means there isn't an issue, it taking 100years to boot demonstrates that on its own17:39
Keybukno, yours shows a different pattern17:39
Keybukyou have red I/O wait almost all the time17:40
Keybukwith a flat green line17:40
apwwhat i don't really understand is why sreadahead isn't able to get its shit onto the queues17:40
Keybukthat implies waiting for I/O without any disk throughput17:40
apwnope it has two distinct lifts17:40
Keybukwhich leans me to I/O queue stall17:40
apwlifts one where your first block is, and the second where your 51mb/s is17:40
Keybukyes17:40
Keybukbut it's otherwise flat17:40
Keybukeven though you have I/O wait17:40
Keybukon http://people.canonical.com/~scott/boot-performance/sam-karmic-20090721-8.png17:41
Keybukthere is17:41
Keybuk a) *far* less I/O wait17:41
apwyep, that is cause sreadahead is taking soooo long to do its thing17:41
Keybukin fact, much of the top of the graph is blue, not red17:41
apwits pushing in a dribble of io which accounts for the queue17:41
Keybukand every point there is I/O wait, there is continual lift17:41
Keybukapw: please disregard sreadahead17:41
apwthe question is why is read ahead not occurring in 2-4 s in mine as in yours17:41
Keybukif you like, comment out the "start on" from /etc/init/sreadahead.conf17:41
Keybukand you'll see the same pattern17:42
Keybukright17:42
apwKeybuk, i can't as its likely the cause of constant wait17:42
Keybuksreadahead being unable to push I/O is a symptom of the bug17:42
apwlikely yes17:42
Keybuknot the bug17:42
apwindeed ...17:42
Keybukand if you remove sreadahead, you still see the exact same pattern17:42
Keybukwhich fits17:42
Keybukbecause17:42
Keybukas I do keep pointing out17:42
Keybukyour problems start several seconds before sreadahead is even started ;)17:42
Keybuksreadahead starts at 5s17:42
apwwell you have to be careful with that statement, as all you can tell is that there is 1 page on the queue during that period17:43
Keybukbut you've already been in large I/O wait for a couple of seconds with no throughput by then17:43
Keybukthere shouldn't be any pages on the queue17:43
apwwe have no idea if thats 1 block or 10000 blocks for a short time17:43
Keybukor there should be disk throughput17:43
apwon -2 there is a lift in throughput with the very first spike of red17:43
apwits small but there17:43
Keybukso why is there I/O wait after the lift17:44
apwthere isn't, the lift drops as does the wait17:44
Keybukok17:44
apwthere is a similar width gap then it starts again, lift + wait17:44
Keybuktry something for me17:44
Keybukboot without sreadahead17:44
apwyep will do that, seems most logical to confirm your theory17:44
apwhow do i do that?17:44
Keybukedit /etc/init/sreadahead.conf17:44
Keybukcomment out the "start on" line17:44
apw(and i do believe it in principle17:44
apwjust its not proven to me yet) and if its right the debug will be nasty17:45
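A minimal sketch of the edit Keybuk describes above: commenting out the "start on" stanza of an Upstart job so the job no longer starts automatically. The function name disable_start_on and the sample job text are hypothetical; the real contents of /etc/init/sreadahead.conf are not shown in this log, and the sketch assumes the stanza fits on a single line.

```python
def disable_start_on(job_text: str) -> str:
    """Return job_text with every line that begins a "start on" stanza commented out."""
    out = []
    for line in job_text.splitlines(keepends=True):
        # Compare whole words so that only a real "start on ..." line matches;
        # lines that are already commented out begin with "#" and are left alone.
        if line.split()[:2] == ["start", "on"]:
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical job text, for illustration only.
    sample = 'description "example job"\nstart on startup\nexec /bin/true\n'
    print(disable_start_on(sample), end="")
```

Working on the text rather than the file keeps the sketch easy to try without root; writing the result back to the job file is left to the caller.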
KeybukI can't find the bug I was looking for though yet17:45
KeybukI had an actual test case where I could stall "dd" on this machine17:45
apwlaunchpad hates us all17:45
Keybukin a way that the kernel had decided that dd was doing no more I/O and could block on read() forever17:46
apwthat sounds like the sort of thing you see with barriers in the queue17:47
Keybukannoyingly, I have a feeling it was when I was trying to back up my old laptop hard drive17:47
Keybukso I don't even have the logs, because the hard drive has obviously been replaced ;)17:47
Keybukbut basically, yes, on this machine I find that any attempt to do large amounts of I/O get extremely slow17:48
Keybukreadahead included17:48
Keybukpitti has a similar laptop, and confirmed my results17:48
Keybukbut the XPS 1330 for example *does not* have the same problem17:48
pgranerKeybuk: anything similar to https://bugs.launchpad.net/ubuntu/+source/linux/+bug/39228817:54
ubot3Malone bug 392288 in linux "dd extremely slow writing to usb key without oflag=dsync" [Medium,Triaged] 17:54
Keybukpgraner: I don't think so, on the basis that I was having these problems on jaunty17:56
Keybuk(it was during Berlin that I had the dd-stall issue)17:56
pgranerKeybuk: ack17:56
Keybuk(I've had the abysmal I/O performance on this laptop for several releases now, it's not a "new" bug17:57
Keybuk if I were to guess, I'd say it was edgy or feisty that it started)17:58
apwKeybuk, ok ... the readahead-less case is faster booting18:09
apwthere is wait along the way as predicted, though depending how the driver calcs that stat we might expect that18:10
apwits a poor metric at best18:10
apwof course readahead getting behind would account for the with-readahead mode being worse too18:11
apwso all pointing to poor io performance full stop18:12
Keybukyeah18:12
Keybukdon't suppose you could time how long sreadahead takes?18:12
Keybuktime sreadahead --debug --no-fork18:13
apwon my machine it was 90s18:13
apwif i run it once booted it takes like 018:13
Keybukok18:13
Keybukboot with init=/bin/bash18:13
Keybukthen run it (with those args)18:13
* apw tries your incantation18:14
* apw notes it running for some time, no IO obvious18:14
apw15s18:14
Keybukweird18:14
Keybukso that took 15s to do nothing?18:15
apwand also notes it complains about not being able to open things18:15
apwsda/queue/read_ahead_kb for one18:15
apwwhich sounds bad to me18:15
apwKeybuk, that does seem about how it felt yes18:17
apwKeybuk, does your machine (thats slow) have its root in an extended partition?18:19
Keybukno18:21
Keybuk/dev/sda118:21
apwKeybuk, i can't get it to do any actual readahead calls in debug mode18:22
Keybukyou need both arguments18:26
apwKeybuk, have both as in --debug --no-fork18:27
Keybukyes18:27
apwKeybuk, i am worried by mdz's bug #429001 ... that warn on there is triggered by what appears to be ftrace on opens, and if i am reading sreadahead right it turns that on... could it be its failing to turn it off again ... as he had it on still when he suspend/resumed much later18:28
ubot3Malone bug 429001 in linux "WARNING: at /build/buildd/linux-2.6.31/kernel/trace/ring_buffer.c:1393 rb_add_time_stamp+0x79/0x200()" [Low,Incomplete] https://launchpad.net/bugs/42900118:28
KeybukI don't think it should fail to turn it off18:30
Keybukunless he killed it;)18:30
apwhe is running your boot ppa's so anything possible :)18:31
Keybukmdz's laptop is a strange beast18:32
Keybukalso sreadahead only turns on ftrace when profiling18:32
Keybuknot in "normal operation"18:32
apwbah it just fooked its own db18:45
apwKeybuk, ok i am finding that running the read ahead on my laptop after dropping caches takes 30s, if i make it single thread not 4 threads it takes 15s19:10
Keybukright, you have a HDD19:11
Keybukbut I don't see what about HDDs makes that need to happen, when on SSD you have the exact opposite behaviour19:12
apwthe threads can out of order the requests badly and make the disk thrash i guess19:12
apwi guess we can actually tell now, there is a rotational flag in the device so we could use that to check19:12
apwKeybuk, /sys/block/sda/queue/rotational seems to tell us19:14
apwas sreadahead is in there already it could check and change behavior19:15
Keybukthat's handy19:15
apwone assumes its cause seek latency is essentially 0 on ssd19:15
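The check apw suggests above could look roughly like this: read the rotational flag from sysfs and choose a thread count from it. This is a sketch, not sreadahead's actual code; the names is_rotational and pick_thread_count are hypothetical, and the 1-versus-4 thread split is only taken from the timings mentioned in this conversation.

```python
from pathlib import Path
from typing import Optional


def is_rotational(device: str, sysfs_root: str = "/sys/block") -> Optional[bool]:
    """True for a spinning disk, False for a non-rotational device, None if unknown."""
    flag = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        value = flag.read_text().strip()
    except OSError:
        # The device or the attribute does not exist (or is unreadable).
        return None
    return {"1": True, "0": False}.get(value)


def pick_thread_count(rotational: Optional[bool], parallel_threads: int = 4) -> int:
    """Assumption: one thread on a spinning disk, parallel_threads otherwise
    (including when the flag could not be read)."""
    return 1 if rotational else parallel_threads


if __name__ == "__main__":
    rotational = is_rotational("sda")
    print("rotational:", rotational, "threads:", pick_thread_count(rotational))
```

The sysfs_root parameter exists only so the function can be pointed at a scratch directory when trying it out.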
KeybukI actually already have the sreadahead patches to use a single thread ;)19:16
Keybukbut I also found in testing that you then need sreadahead to *block* the boot19:16
Keybukie. single thread and other stuff in background = bad19:16
Keybuk(which fits with readahead-list)19:16
Keybukbut then I noticed slow down unless the readahead pack was sorted ideally for HDD19:16
apwhmmm, thats not hot either is it19:16
Keybukand I hadn't done the "ideal sort" bit yet19:16
apwthis is all a bit of a nightmare19:17
apwjust trying a single thread one on my machine19:17
apwbut still in parallel19:17
KeybukI wrote a very long mail about this19:17
apwi know ... i remember reading it19:17
apwi wonder if letting it run single in parallel but skipping the first few files deliberately would work19:18
apwas that would hold the boot in the sense those required io's would get the boot behind readahead19:18
KeybukI don't follow19:18
Keybukstill don't follow19:19
apwthe risk is if the boot ever gets ahead of where the read ahead is we can make things much worse19:19
apwas its reading stuff we no longer need and have already read19:19
Keybukstill don't follow19:19
Keybukwe never do that19:19
Keybuksreadahead is the first thing started19:19
Keybukthe problem on HDD isn't that the boot gets ahead19:20
Keybukit's that we're inherently seeking the disk all over the place19:20
Keybukback after dinner19:21
Keybukwill try to use that rotational flag and merge my other sreadahead updates to do the foreground stuff19:21
Keybuksee if it makes a difference19:21
apwyeah ... tricky19:25
apwok from 15 -> 10s if i stop it using low IO priority19:25
amitkogra: yes, I am hoping for an imx51 upload tomorrow.19:29
ograamitk, oh, ok 19:30
amitkhad already discussed it with rtg19:30
ograyeah i saw that last week but couldnt remember the outcome19:31
amitkogra: http://people.canonical.com/~amitk/mx51/linux-image-2.6.31-100-imx51_2.6.31-101.8_armel.deb is a preview if you want to play19:32
Keybukapw: interesting on the scheduler20:29
Keybukis that changeable per-device or ?20:30
=== Seeker`_ is now known as Seeker`
apwKeybuk, yep per device on the fly, look for queue/scheduler in the device21:48
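For reference, the queue/scheduler file apw mentions lists the available I/O schedulers on one line, with the active one in square brackets. A small sketch for reading it follows; parse_schedulers is a hypothetical helper, and the device name sda is just an example.

```python
from typing import List, Optional, Tuple


def parse_schedulers(text: str) -> Tuple[List[str], Optional[str]]:
    """Split a queue/scheduler line into (all scheduler names, active name or None)."""
    names: List[str] = []
    active: Optional[str] = None
    for token in text.split():
        # The kernel marks the scheduler currently in use with [brackets].
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        names.append(token)
    return names, active


if __name__ == "__main__":
    try:
        with open("/sys/block/sda/queue/scheduler") as fh:
            print(parse_schedulers(fh.read()))
    except OSError as exc:
        print("could not read scheduler file:", exc)
```

Changing the scheduler is done by writing one of the listed names back to the same file as root; the sketch only reads it.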
Keybukif we stayed in the foreground, and blasted the readahead list for an HDD before letting the boot continue21:52
Keybukwhilst setting the io priority to realtime, and the scheduler to deadline21:52
Keybukthat might work better?21:52
=== bjf is now known as bjf-afk
