/srv/irclogs.ubuntu.com/2011/02/02/#ubuntu-kernel.txt

=== skaet is now known as skaet_afk
=== smb` is now known as smb
[09:55] <tjaalton> hmm, so x86 suspend issues should be fixed on natty? I still get bug 704550
[09:55] <ubot2> Launchpad bug 704550 in linux "Lenovo Thinkpad X61 fails to resume from suspend" [Undecided,New] https://launchpad.net/bugs/704550
=== _LibertyZero is now known as LibertyZero
[12:35] <lag> apw: It seems we have the right idea already: https://wiki.linaro.org/PackageYourOwnKernel
[12:35] <lag> apw: Thanks for your help :)
[12:38] <apw> lag hehe good
=== herton is now known as herton_lunch
=== tgardner is now known as tgardner-afk
=== skaet_afk is now known as skaet
[15:43] <quup> I compiled the kernel just fine, I get the linux-image and linux-headers-...-generic package, the kernel boots and everything
[15:44] <quup> but I can't install the headers package because: dpkg: dependency problems prevent configuration of linux-headers-2.6.38-1-generic:
[15:44] <quup>  linux-headers-2.6.38-1-generic depends on linux-headers-2.6.38-1; however:
[15:44] <quup>   Package linux-headers-2.6.38-1 is not installed.
[15:44] <quup> how do I get a .deb for that as well?
[15:44] <apw> quup, what did you build ?  binary-generic ?
[15:44] <quup> apw: exactly
[15:44] <apw> you also need to build binary-indep for the common headers
[15:44] <quup> oh ok, thanks!
[15:44] <smb> Or binary-headers
[15:46] <apw> yeah binary-headers looks to do the least
=== Quintasan_ is now known as Quintasan
[15:57] <quup> worked perfectly! thanks :)
=== herton_lunch is now known as herton
[16:50] <stgraber> apw: oops, indeed, that channel is probably a lot better for that kind of discussion :)
[16:51] <apw> stgraber, this is the right time to move
[16:51] <apw> stgraber, so if you could get the whole dmesg in case there is anything before of interest that'd be good
[16:51] <apw> stgraber, i suspect a locking issue from the diagnostic
[16:51] <apw> is it easy for you to test a debug kernel in this setup?
=== JanC_ is now known as JanC
[16:52] <stgraber> yep, as soon as I get the VM re-installed, I'll make a snapshot to make sure I won't trash it again. Then testing will be easy
[16:52] <apw> stgraber, thanks
[16:53] <apw> stgraber, i'll shove some debug in and get something building in the meantime
[16:53] <stgraber> cdimage is a bit slow so I'll need 15min to get a natty desktop image, then I just need to install it, install nbd-server and export some empty file ;)
[16:56] <apw> my build server may be large, but it's not going to be done before you :)
[16:58] <apw> stgraber, 32 or 64 bit ?
=== tgardner-afk is now known as tgardner
[17:00] <stgraber> apw: that one is 32bit
[17:00] <stgraber> though if that's useful for debugging, I had the same issue with 64bit on Monday
[17:00] <apw> stgraber, ok ... building with some lock debug
[17:00] <apw> stgraber, don't care, just don't want to make both for speed reasons
[17:00] <apw> building 32 bit now
[17:20] <apw> stgraber, kernel will be here in about 5 mins ... http://people.canonical.com/~apw/lp711951-natty/
[17:27] <apw> stgraber, ok kernel is there
[17:33] <apw> stgraber, poke me when you have tested it
[17:38] <apw> kees, could we get a mergy merge please, bored of these results
[17:39] <apw> kees, how hard would it be to have a shadow set which are built from our tip, just for our packages, obviously emitted somewhere else
[17:40] <kees> apw: the entire export mechanism is already in the cve tracker. just need to set env vars and type "make" in the toplevel dir.
[17:41] <JFo> <- grabbing lunch
[17:41] <apw> kees, got a recipe you use on people ?  then i can just dup it into mine :)
[17:41] <kees> apw: but I can try to set up an automatic thing
[17:41] <apw> kees, happy to run it myself if it's that easy
[17:42] <apw> i assume all the pre-reqs are on people if you can run it
[17:42] <kees> apw: yeah, see ~ubuntu-security/bin/html-report.sh
[17:42] <kees> apw: /home/ubuntu-security/bin/cron-cve-tracker.sh does the bzr pull before calling html-report.sh
[17:42] <kees> apw: yup
[17:43] <kees> you'll need ~ubuntu-security/.ubuntu-cve-tracker.conf for the repo settings
[17:44] <apw> kees, cool will have a look at it tomorrow
[17:47] <kees> okay
[17:59] <stgraber> apw: just got back from lunch, looking at it now
[18:00] <apw> stgraber, ahh ok
[18:06] <apw> kees, ok that was so easy it's already done :)
[18:07] <apw> smb, sconklin, bjf, this url points to the kernel package etc as per the cve tracker, but showing the status as from the tip of our branch: http://people.canonical.com/~apw/cve/pkg/linux.html
[18:08] <apw> auto updating much like the original
[18:08] <apw> now we are not beholden to security team merges to get team status
[18:09] <sconklin> "a man with two watches never knows what time it is. A man with one is always certain"
[18:09] <sconklin> But nice job. At least we know which one to believe
[18:09] <apw> they just mean different things.  one is how we think we are doing, the other is how security think we are doing
[18:09] <kees> apw: it's so easy to use. :)
[18:09] <sconklin> yeah
[18:10] <apw> kees, to get data out of yes :-p
[18:10] <kees> heh
[18:10] <apw> but we are instant gratification junkies here
[18:10] <kees> apw: you may want to start using scripts/check-syntax too, otherwise the exports might blow up a bit if things aren't sane
[18:11] <apw> kees, tried running it, but it needed unspecified pre-reqs to run and exploded in a heap
[18:11] <kees> hm
=== sforshee is now known as sforshee-lunch
[18:12] <apw> kees, well we'll see it first anyhow, if our own tables explode
[18:12] <kees> apw: now that you have the ~/.ubuntu-cve-tracker.conf, check-syntax should work
[18:28] <stgraber> apw: same issue with the new kernel
[18:28] <apw> stgraber, yeah not expecting anything fixed by it ... but some text in the dmesg
[18:28] <apw> with APW in it
[18:30] <stgraber> http://paste.ubuntu.com/561540/
[18:30] <stgraber> http://paste.ubuntu.com/561541/
[18:30] <stgraber> http://paste.ubuntu.com/561542/
[18:31] <stgraber> first paste is before starting nbd-client, second is after the first nbd-client, third is after the second nbd-client
[18:32] * tgardner --> lunch
[18:32] <stgraber> apw: ^ (not sure how closely you monitor the channel ;))
[18:33] <apw> stgraber, always best to add my nick, i am easily distracted by shiny objects
[18:34] <apw> stgraber, have you got the entire thing as one dmesg?
[18:34] <stgraber> I can cat all of them together ;)
[18:34] <apw> i think it's telling me the story just not sure
[18:34] <stgraber> http://paste.ubuntu.com/561545/
[18:35] <apw> if you are 100% sure it's the complete dmesg that's great
[18:35] <stgraber> I basically did a "dmesg -c" between each so I shouldn't have lost any entry
[18:35] <apw> [   26.659231] APW: taken
[18:35] <apw> [   27.660130] Dev nbd0: unable to read RDB block 8
[18:35] <apw> [   27.660144]  nbd0: unable to read partition table
[18:35] <apw> [   27.660148] nbd0: partition table beyond EOD, truncated
[18:35] <apw> [   36.754723] APW: taking &nbd_mutex
[18:35] <apw> ok ... so that says an error path is not dropping the lock
[18:35] <apw> not the last one but the one before!?!
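
The reading of the dmesg excerpt above can be sketched in Python. This is a minimal sketch, not the tooling used in the channel: it assumes the debug kernel prints only the two line shapes quoted above (`APW: taking <lock>` and `APW: taken`), and `find_stuck_acquire` is a hypothetical helper name. A `taking` that is never followed by a `taken` suggests the caller is still blocked on a lock that an earlier path failed to drop.

```python
import re

# Matches the two debug line shapes quoted in the channel, e.g.
# "[   36.754723] APW: taking &nbd_mutex" and "[   26.659231] APW: taken".
APW_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+APW: (taking|taken)(?:\s+(\S+))?")


def find_stuck_acquire(dmesg_lines):
    """Return (timestamp, lock_name) for a final 'taking' that was never
    followed by 'taken', or None if every acquire attempt completed."""
    pending = None
    for line in dmesg_lines:
        match = APW_LINE.match(line)
        if not match:
            continue  # unrelated kernel message
        stamp, event, lock_name = match.groups()
        if event == "taking":
            pending = (float(stamp), lock_name)
        else:
            # "taken": the pending attempt (if any) got the lock
            pending = None
    return pending


SAMPLE = [
    "[   26.659231] APW: taken",
    "[   27.660130] Dev nbd0: unable to read RDB block 8",
    "[   27.660144]  nbd0: unable to read partition table",
    "[   27.660148] nbd0: partition table beyond EOD, truncated",
    "[   36.754723] APW: taking &nbd_mutex",
]

if __name__ == "__main__":
    print(find_stuck_acquire(SAMPLE))
```

Fed the five lines pasted above, the helper reports the `taking &nbd_mutex` entry as the attempt that never completed.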
[18:40] <apw> stgraber, damn not enough info to diagnose ... will have to spin you another kernel, you about for a bit ?
[18:42] <stgraber> apw: yep
[18:42] <apw> stgraber, ooo might have an idea... what are the userspace tools called ?
[18:43] <stgraber> nbd-client and nbd-server
[18:44] <apw> stgraber, ooo I think I have it
[18:47] <apw> stgraber, ok i have a theory, it might be bunnies
[18:47] <apw> stgraber, building a test kernel with more debug to confirm
[18:47] <stgraber> ok :)
[19:11] <apw> stgraber, ok updated kernel in the same place
[19:11] <stgraber> apw: ok, downloading it now
=== sforshee-lunch is now known as sforshee
[19:16] <stgraber> apw: http://paste.ubuntu.com/561560/
[19:16] <stgraber> apw: http://paste.ubuntu.com/561559/
[19:17] <stgraber> managed to connect to it quite a few times, no hang
[19:17] <apw> stgraber, i assume that is good yes?
[19:17] <apw> stgraber, shame we didn't know about this a little earlier, we could have had this fixed for a2
[19:18] <stgraber> apw: yep, that's perfect. I'm doing a bit more testing on it, mounting the same volume 6 times and trying to play with each mount, then unmount
[19:18] <stgraber> to make sure it doesn't freeze at some point, but it looks great for now
[19:18] <apw> stgraber, i think the fix is clear so i'll push it upstream
[19:19] <stgraber> root@isotest-ltsp:/mnt# nbd-client -d /dev/nbd1
[19:19] <stgraber> Error: Cannot open NBD: Permission denied
[19:19] <stgraber> Please ensure the 'nbd' module is loaded.
[19:19] <apw> stgraber, what triggered that
[19:19] <stgraber> just trying to unmount
[19:19] <stgraber> s/unmount/disconnect/
[19:19] <stgraber> seems like I can connect fine but can't disconnect
[19:19] <apw> hrm, i suspect that that's been in there for a long time
[19:20] <apw> as nothing has changed other than this locking
[19:20] <stgraber> open("/dev/nbd1", O_RDWR|O_LARGEFILE)   = -1 EACCES (Permission denied)
[19:20] <apw> but i suspect it's a different bug
[19:20] <stgraber> that'll still make ltsp fail as I need to connect, check the block device and disconnect every time. If disconnect fails, the check will either be pointless or will crash somehow
[19:21] <apw> well as i say there are exactly 0 changes other than this specific lock change
[19:21] <apw> which only affects ioctl
[19:21] <apw> so i am suspicious it's not new
[19:21] <stgraber> wasn't there in maverick ;) though might be caused by the new client
[19:22] <apw> the EACCES certainly implies the kernel hated you
[19:23] <apw> stgraber, can you look see what nbd devices are listed in /sys
[19:23] <apw> stgraber, anything in dmesg in concert with the open failure ?
[19:24] <stgraber> nope, nothing shows up in dmesg
[19:24] <stgraber> /sys shows all 16 devices
[19:24] <apw> stgraber, and are there any nbd-client thingies running?
[19:25] <apw> and any nbd_threads ?
[19:25] <stgraber> yep, my 7 nbd-client are still there
[19:25] <apw> so you cannot stop any of them ?
[19:25] <stgraber> indeed
[19:25] <stgraber> and they are still working
[19:25] <stgraber> so if I do: nbd-client -d /dev/nbd0
[19:25] <stgraber> I get the error message
[19:26] <stgraber> then if I try to mount it, it works just fine
[19:26] <apw> and it's not mounted
[19:26] <apw> what does lsof /dev/nbd1 say
[19:26] <stgraber> yep, I unmounted all of them before testing the disconnect
[19:26] <stgraber> COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
[19:26] <stgraber> nbd-clien 1395 root    3u   BLK  43,16      0t0 5489 /dev/nbd1
[19:29] <stgraber> same issue using maverick's nbd-client
[19:30] <stgraber> quickly trying to connect on the same nbd server from a maverick VM to make sure it worked fine then
[19:31] <stgraber> root@desktop-maverick01:~# nbd-client -d /dev/nbd1
[19:31] <stgraber> Disconnecting: que, disconnect, sock, done
[19:31] <stgraber> apw: that's with the same userspace on both natty and maverick, so something must have changed somewhere in the kernel
[19:32] <apw> stgraber, yep ... shame we don't test more often
[19:33] <stgraber> Should be possible to make some automatic test of it, that's basically what I did to get a test environment: http://paste.ubuntu.com/561547/
[19:34] <stgraber> then nbd-client -d /dev/nbdX to test the disconnect
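
The automatic test floated above could look roughly like the following. This is a sketch under stated assumptions, not the contents of the paste: it assumes the old-style `nbd-client HOST PORT DEVICE` connect syntax of the 2.9.x client, takes `nbd-client -d DEVICE` from the log, and treats the host, port and device values as placeholders. `cycle_nbd` is a hypothetical name, and the `run` parameter exists only so the loop can be exercised without a real NBD server. A timeout is reported as a failure because the original symptom was a hang.

```python
import subprocess


def cycle_nbd(host, port, device, rounds=5, timeout=30, run=subprocess.run):
    """Connect and disconnect `device` repeatedly.

    Returns None if every round succeeds, otherwise a tuple
    (round_index, step_name, reason) for the first failing step, where
    reason is the exit status or the string "timeout".
    """
    for attempt in range(rounds):
        steps = (
            # Assumed old-style connect syntax: nbd-client HOST PORT DEVICE
            ("connect", ["nbd-client", host, str(port), device]),
            # Disconnect form as used in the channel
            ("disconnect", ["nbd-client", "-d", device]),
        )
        for step, cmd in steps:
            try:
                result = run(cmd, timeout=timeout)
            except subprocess.TimeoutExpired:
                return (attempt, step, "timeout")
            if result.returncode != 0:
                return (attempt, step, result.returncode)
    return None
```

Running it for real needs root, the `nbd` module loaded and a reachable nbd-server, for example `cycle_nbd("server.example", 2000, "/dev/nbd0")` with values adjusted to the test environment.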
[19:38] <kees> anyone have clues on how to further debug kernel-only edid failures? bug 712075
[19:38] <ubot2> Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [Undecided,New] https://launchpad.net/bugs/712075
[19:39] * bjf -> lunch
=== bjf is now known as bjf[afk]
[19:40] <ohsix> kees: ask ickle :]
[19:41] <kees> ohsix: okay, cool
[19:41] <kees> ohsix: where can I find them?
[19:41] <ohsix> a lot are just plain wrong though; but there's a sysfs file to write a working cached copy iirc
[19:41] <ohsix> #intel-gfx
[19:41] <kees> ohsix: the issue with this is that it spontaneously fails. it works most of the time and then in the middle of a running session, bombs out
[19:42] <ohsix> ah
[19:46] <apw> stgraber, are you able to compile the user space tools ?
[19:47] <apw> if so could you try changing the open in disconnect() to be a O_RDONLY open and see if it works then ?
[19:48] <stgraber> grabbing the source now
[19:49] <apw> not sure if it makes sense that it cannot open it RDWR but want to know if that helps
[19:51] <stgraber> root@isotest-ltsp:~/nbd-2.9.16# nbd-client -d /dev/nbd1
[19:51] <stgraber> Disconnecting: que, disconnect, sock, done
[19:52] <stgraber> worked fine
[19:59] <apw> stgraber, hrm
[19:59] <apw> stgraber, are there any normal commands, like status ones, from that client
[20:00] <apw> and if so do they work
[20:04] <stgraber> apw: hmm, now I can disconnect but can't reconnect after that ...
[20:04] <stgraber> I have a confcall just now, will continue debugging after that
[20:04] <apw> try making the other open that's RDWR in the thing RDONLY
[20:04] <apw> pretty sure you shouldn't need to, but suspect it will work
[20:07] <apw> stgraber, do you have a complete strace ?
[20:08] <apw> stgraber, of the successful attach the first time
[20:09] <apw>         if (ioctl(nbd, BLKROSET, (unsigned long) &read_only) < 0)
[20:09] <apw>                 err("Unable to set read-only attribute for device");
[20:09] <apw> stgraber, i think it is doing it to itself ... setting the device read only so even it cannot open it
[20:10] <apw> possibly that is hanging over on last close and it should not, but i think it is right that the disconnect cannot open
[20:10] <stgraber> apw: making all of them read-only works fine :)
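
The behaviour discussed above (an `O_RDWR` open failing with `EACCES` once the device has been marked read-only, while an `O_RDONLY` open still works) can be handled defensively in a client. The sketch below is an illustration only and is not what was changed in nbd-client, where the opens were simply switched to read-only: `open_nbd` is a hypothetical helper that tries read-write first and retries read-only on `EACCES`. The `opener` parameter is an assumption added so the logic can be exercised without a real block device.

```python
import errno
import os


def open_nbd(path, opener=os.open):
    """Open `path` read-write, retrying read-only if the kernel says EACCES.

    Returns (fd, mode_name). Any error other than EACCES is re-raised.
    """
    try:
        return opener(path, os.O_RDWR), "O_RDWR"
    except OSError as exc:
        if exc.errno != errno.EACCES:
            raise
    # The read-write open was refused; a read-only descriptor is still
    # enough to issue ioctls such as a disconnect request.
    return opener(path, os.O_RDONLY), "O_RDONLY"
```

Whether a read-only descriptor is sufficient for every ioctl the client issues is exactly the open question in the channel, so the fallback should be read as a way to probe the behaviour rather than as a fix.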
[20:11] <apw> stgraber, and you can  mount the disk still ?
[20:11] <bryceh> JFo, mind putting kees' bug 712075 on the kernel team's review list?  Gots a patch from upstream.
[20:11] <ubot2> Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [High,Triaged] https://launchpad.net/bugs/712075
[20:12] * JFo opens the bug
[20:12] <apw> bryceh, JFo done
[20:12] <JFo> will do bryceh :)
[20:12] <JFo> ooh, he's quick on the draw that one :)
[20:13] <JFo> thanks apw
[20:13] <apw> stgraber, i am not sure if i know if the userspace was broken or the kernel is now broken
[20:15] <apw> stgraber, will think on that for a bit
[20:17] <apw> stgraber, can u stress test the open close mount unmounts against it with the new tools
[20:17] <apw> and let me know if anything else goes wrong
[20:17] <apw> certainly test you can read/write files on the filesystems inside
[20:19] <bryceh> apw, JFo, thanks
[20:21] <soren> apw: I think 711951 is a dupe.
[20:22] <soren> apw: I've just sent a patch to lkml to fix a locking issue in nbd.
[20:22] <soren> I was looking for the bug reference on launchpad when I found this new one.
[20:24] <stgraber> apw: http://paste.ubuntu.com/561592/
[20:24] <soren> stgraber: ^
[20:25] <soren> https://lkml.org/lkml/2011/2/2/237
[20:25] <stgraber> soren: so both of you fixed it apparently ;)
[20:26] <soren> I've just sent the patch to the kernel-team ml.
[20:27] * jjohansen -> lunch
[20:29] <tgardner> soren, has Paul queued it for upstream submission ?
[20:30] <soren> tgardner: He told me to send it directly to akpm.
[20:30] <soren> tgardner: I suppose that's a "yes".
[20:30] <tgardner> soren: ok, I'll pull it in until it conflicts with the next rc candidate.
[20:37] <soren> tgardner: ta very much.
[20:37] <tgardner> apw, did you already have something on deck for this nbd lockup ?
[20:37] <soren> I guess I should have included this link in my e-mail: https://lkml.org/lkml/2011/1/26/131
[20:37] <tgardner> soren, I'll include it in the commit log
=== bjf[afk] is now known as bjf
[20:49] <apw> tgardner, yes
[20:49] <apw> about to send it out
[20:49] <tgardner> apw, check master-next first, perhaps we're colliding
[20:50] <apw> tgardner, yeah i see that hallyn has a different approach removing it
[20:50] <apw> hrm
[20:50] <apw> tgardner, ahh ok so you pushed it, then i'll wait and do nothing
[20:50] <tgardner> apw, well, they pretty much just ripped it out.
[20:50] <tgardner> the mutex, I mean
[20:50] <apw> tgardner, yeah, their argument is it's not needed.  hard to say
[20:51] <tgardner> apw, guess we'll find out when it goes to akpm
[20:51] <apw> my patch does the opposite, and drops it during the long lived ioctl
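
The difference between holding a mutex across a long-lived call and dropping it for the duration can be shown with a userspace analogy. This is a toy sketch in Python threads, not the kernel patch and not the nbd driver's code: `Device`, `ioctl_holding`, `ioctl_dropping` and `other_caller_gets_lock` are all hypothetical names, and the "long-lived call" is simulated by waiting on an event.

```python
import threading


class Device:
    """Toy stand-in for a device guarded by a single mutex."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.started = threading.Event()  # set once the long call begins
        self.finish = threading.Event()   # set to let the long call return

    def _long_lived_call(self):
        self.started.set()
        self.finish.wait()

    def ioctl_holding(self):
        # Holds the mutex for the whole call: other callers block meanwhile.
        with self.mutex:
            self._long_lived_call()

    def ioctl_dropping(self):
        # Drops the mutex around the long call, then retakes it.
        self.mutex.acquire()
        try:
            self.mutex.release()
            try:
                self._long_lived_call()
            finally:
                self.mutex.acquire()
        finally:
            self.mutex.release()


def other_caller_gets_lock(method_name, timeout=0.2):
    """Run the named method in a thread and report whether a second caller
    can take the mutex while the long call is still in progress."""
    dev = Device()
    worker = threading.Thread(target=getattr(dev, method_name))
    worker.start()
    dev.started.wait()
    got_it = dev.mutex.acquire(timeout=timeout)
    if got_it:
        dev.mutex.release()
    dev.finish.set()
    worker.join()
    return got_it
```

With `ioctl_holding` the second caller times out, which mirrors the hang seen with the second nbd-client; with `ioctl_dropping` it gets the lock. Removing the mutex entirely, the other approach mentioned above, is a separate design choice that this sketch does not model.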
[20:51] <apw> yep
[20:52] <apw> tgardner, so much for that effort :/
[20:52] <apw> such is life
[20:54] <tgardner> some days you bite the bear, some days the bear bites back
[20:54] <apw> yeah ... food then i reckon
[20:58] <bryceh> apw, I'm about to forward bug #711275 upstream - seems a bit false positive-ish - but want to check first if this is something you already know about?
[20:58] <ubot2> Launchpad bug 711275 in xserver-xorg-video-intel "[gm45] GPU lockup (EIR: 0x00000010) during boot - EIR stuck: 0x00000010, masking" [Undecided,New] https://launchpad.net/bugs/711275
[20:58] <bryceh> apw, it appears to me that the GPU is hanging during boot but successfully resetting itself, but this still generates an apport crash report
[21:01] <apw> bryceh, hrm, doesn't that sound a lot like the other one, the one where they recently said 'blacklisting vesafb fixes things'
[21:01] <apw> though this continues rather than breaking
[21:01] <bryceh> yeah
[21:01] <apw> bug #702090
[21:01] <apw> and didn't you send that up already?
[21:01] * apw pokes ubot2
[21:01] <bryceh> apw, yeah you're right
[21:01] <apw> sounds like they are being luckier, perhaps ask them to try the blacklist
[21:01] <bryceh> apw, actually I hadn't sent that one upstream yet, but it does look similarish
[21:01] <ubot2> Launchpad bug 702090 in xserver-xorg-video-intel "i965gm GPU lockup if vesafb is left loaded (EIR: 0x00000010 PGTBL_ER: 0x00000100)" [High,Triaged] https://launchpad.net/bugs/702090
[21:02] <apw> bryceh, so i reckon send one or other up, and test the vesafb thing on the new one
[21:02] <bryceh> or maybe it's bug #686388 which I did revisit
[21:02] <ubot2> Launchpad bug 686388 in xserver-xorg-video-intel "[i965gm] GPU lockup - Invalid GTT entry during Display B Fetch" [Unknown,Confirmed] https://launchpad.net/bugs/686388
[21:02] <bryceh> anyway can't hurt to send another report up, thanks.
[21:02] <apw> what happens when we flip over to the drmfb doesn't bear thinking about
[21:02] <bryceh> o_O
[21:02] <apw> layering violations just isn't in it
[21:04] <apw> we don't give anyone any time to handle the loss of the old driver
[21:04] <apw> even the people with it open get a nasty shock
[21:04] <apw> (plymouth)
[21:08] <bryceh> erf
[21:09] <bryceh> apw, btw I'm also seeing a spate of 'GPU hanging too fast, declaring wedged!' bugs - #710321  711645  711691
[21:10] <bryceh> bug #710321
[21:10] <ubot2> Launchpad bug 710321 in xserver-xorg-video-intel "[i965gm] GPU lockup during login - GPU hanging too fast, declaring wedged!" [Undecided,New] https://launchpad.net/bugs/710321
[21:16] <apw> bryceh, does that mean "i reset it a few times and it broke again and i am giving up"
[21:16] <bryceh> apw, seems like a good guess
[21:17] <bryceh> apw, but this is the first time I've seen that particular syntax in a message, so dunno if it's generic verbiage or something specific
[21:17] <apw> yeah odd indeed
[21:20] <apw> 0x00000a78:      0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:20] <apw> Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:20] <bryceh> huh
[21:20] <bryceh> that looks promising
[21:21] <apw> bryceh, that looks bad, wouldn't that be a mesa injection thing
[21:21] <bryceh> could be
[21:23] <apw> 0x000010b8:      0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:23] <apw> Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:23] <apw> more than one in here too
[21:23] <apw> so if each was breaking it ... that might make sense
[21:24] <bryceh> where'd you spot that Bad length error?
[21:24] <bryceh> oh I see it
[21:24] <apw> in the gpu dump yeah
[21:25] <apw> of course it could be the dump tool that's wrong :/
[21:26] <apw> 0x00014a88:      0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:26] <apw> Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:26] <apw> there are a number in here
[21:26] <bryceh> yeah the other bugs have it too
[21:27] <apw> i suspect we could do with an arsenal script to look through these for errors and put them in as a comment on the bug
[21:28] <bryceh> apw, yeah
[21:28] <bryceh> actually what I'd like to do is modify the gpu hook itself to extract the warnings for us directly
[21:29] <apw> bryceh, a better plan indeed if you can be bothered :)
[21:29] <bryceh> and that's on my todo list, but down under a few other bigger priorities currently
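
A scan of the kind suggested above might start from something like this. It is a minimal sketch, not an existing arsenal script or apport hook: it assumes the dump contains the two line shapes pasted in the channel, in that order (a decoded instruction line, then its `Bad length` warning), and it treats the bracketed pair only as "the range the tool printed" without claiming what the dump tool means by it. `find_bad_lengths` is a hypothetical name.

```python
import re

# "0x00000a78:      0x0a000002: MI_DISPLAY_BUFFER_INFO"
INSTR = re.compile(r"^(0x[0-9a-fA-F]+):\s+(0x[0-9a-fA-F]+):\s+(\S+)")
# "Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]"
BAD_LEN = re.compile(r"^Bad length \((\d+)\) in ([^,\s]+), \[(\d+), (\d+)\]")


def find_bad_lengths(dump_lines):
    """Return one dict per 'Bad length' warning, paired with the offset of
    the most recent decoded line for the same instruction name (or None)."""
    findings = []
    last_offset = {}  # instruction name -> offset of its latest decode line
    for raw in dump_lines:
        line = raw.strip()
        match = INSTR.match(line)
        if match:
            last_offset[match.group(3)] = match.group(1)
            continue
        match = BAD_LEN.match(line)
        if match:
            length, name, low, high = match.groups()
            findings.append({
                "offset": last_offset.get(name),
                "instruction": name,
                "length": int(length),
                "printed_range": (int(low), int(high)),
            })
    return findings
```

The resulting list could then be formatted into a bug comment, or counted and compared against the number of hangs reported in dmesg.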
[21:30] <apw> there seem to be similar quantities of these errors in the ring as there are hangs in the dmesg
[21:30] <apw> so i suspect they are worth investigating
[21:30] <apw> bryceh, is there somewhere i could find a document on the format of the ring to see what it says for the sizes etc
[21:30] <bryceh> apw, btw check out - http://www.bryceharrington.org/X/Reports/ubuntu-x-swat/totals-natty-workqueue.svg
[21:32] <apw> intel is a problem :)
[21:32] <bryceh> apw, yeah probably at http://intellinuxgraphics.org/documentation.html
[21:32] <bryceh> apw, and thus my interest in these gpu bugs today ;-)
[21:32] <apw> heh indeed
[21:33] <bryceh> and http://www.x.org/docs/intel/ looks like it has mirrors of the docs
[21:45] * apw screams about unity being balls
[21:45] * apw cannot cope with the stacking issues
[21:45] <apw> how hard is it to stack windows in the right order
[21:47] <jjohansen> apw: obviously harder than crashing
[22:55] <bjf> apw, jjohansen, look at it this way, it's harder for people to notice kernel bugs when their desktop keeps crashing (or they don't have one)
[22:56] <bjf> apw, jjohansen, there's a pony in there somewhere :-)
[22:56] <apw> nice, a bright side
[23:11] <jjohansen> whee yet another pathological oom killer death
=== sconklin is now known as sconklin-gone
