[09:55] <tjaalton> hmm, so x86 suspend issues should be fixed on natty? I still get bug 704550
[09:55] <ubot2> Launchpad bug 704550 in linux "Lenovo Thinkpad X61 fails to resume from suspend" [Undecided,New] https://launchpad.net/bugs/704550
[12:35] <lag> apw: It seems we have the right idea already: https://wiki.linaro.org/PackageYourOwnKernel
[12:35] <lag> apw: Thanks for your help :)
[12:38] <apw> lag hehe good
[15:43] <quup> I compiled the kernel just fine, I get the linux-image and linux-headers-...-generic package, the kernel boots and everything
[15:44] <quup> but I can't install the headers package because: dpkg: dependency problems prevent configuration of linux-headers-2.6.38-1-generic:
[15:44] <quup>  linux-headers-2.6.38-1-generic depends on linux-headers-2.6.38-1; however:
[15:44] <quup>   Package linux-headers-2.6.38-1 is not installed.
[15:44] <quup> how do I get a .deb for that as well?
[15:44] <apw> quup, what did you build ?  binary-generic ?
[15:44] <quup> apw: exactly
[15:44] <apw> you also need to build binary-indep for the common headers
[15:44] <quup> oh ok, thanks!
[15:44] <smb> Or binary-headers
[15:46] <apw> yeah binary-headers looks to do the least
[15:57] <quup> worked perfectly! thanks :)
[16:50] <stgraber> apw: oops, indeed, that channel is probably a lot better for that kind of discussion :)
[16:51] <apw> stgraber, this is the right time to move
[16:51] <apw> stgraber, so if you could get the whole dmesg in case there is anything before of interest that'd be good
[16:51] <apw> stgraber, i suspect a locking issue from the diagnostic
[16:51] <apw> is it easy for you to test a debug kernel in this setup?
[16:52] <stgraber> yep, as soon as I get the VM re-installed, I'll make a snapshot to make sure I won't trash it again. Then testing will be easy
[16:52] <apw> stgraber, thanks
[16:53] <apw> stgraber, i'll shove some debug in and get something building in the meantime
[16:53] <stgraber> cdimage is a bit slow so I'll need 15min to get a natty desktop image, then I just need to install it, install nbd-server and export some empty file ;)
[16:56] <apw> my build server may be large, but it's not going to be done before you :)
[16:58] <apw> stgraber, 32 or 64 bit ?
[17:00] <stgraber> apw: that one is 32bit
[17:00] <stgraber> though if that's useful for debugging, I had the same issue with 64bit on Monday
[17:00] <apw> stgraber, ok ... building with some lock debug
[17:00] <apw> stgraber, don't care, just don't want to make both for speed reasons
[17:00] <apw> building 32 bit now
[17:20] <apw> stgraber, kernel will be here in about 5 mins ... http://people.canonical.com/~apw/lp711951-natty/
[17:27] <apw> stgraber, ok kernel is there
[17:33] <apw> stgraber, poke me when you have tested it
[17:38] <apw> kees, could we get a mergy merge please, bored of these results
[17:39] <apw> kees, how hard would it be to have a shadow set which are built from our tip, just for our packages, obviously emitted somewhere else
[17:40] <kees> apw: the entire export mechanism is already in the cve tracker. just need to set env vars and type "make" in the toplevel dir.
[17:41] <JFo> <- grabbing lunch
[17:41] <apw> kees, got a recipe you use on people ?  then i can just dup it into mine :)
[17:41] <kees> apw: but I can try to set up an automatic thing
[17:41] <apw> kees, happy to run it myself if it's that easy
[17:42] <apw> i assume all the pre-reqs are on people if you can run it
[17:42] <kees> apw: yeah, see ~ubuntu-security/bin/html-report.sh
[17:42] <kees> apw: /home/ubuntu-security/bin/cron-cve-tracker.sh does the bzr pull before calling html-report.sh
[17:42] <kees> apw: yup
[17:43] <kees> you'll need ~ubuntu-security/.ubuntu-cve-tracker.conf for the repo settings
[17:44] <apw> kees, cool will have a look at it tomorrow
[17:47] <kees> okay
[17:59] <stgraber> apw: just got back from lunch, looking at it now
[18:00] <apw> stgraber, ahh ok
[18:06] <apw> kees, ok that was so easy it's already done :)
[18:07] <apw> smb, sconklin, bjf, this url points to the kernel package etc as per the cve tracker, but showing the status as from the tip of our branch: http://people.canonical.com/~apw/cve/pkg/linux.html
[18:08] <apw> auto updating much like the original
[18:08] <apw> now we are not beholden to security team merges to get team status
[18:09] <sconklin> "a man with two watches never knows what time it is. A man with one is always certain"
[18:09] <sconklin> But nice job. At least we know which one to believe
[18:09] <apw> they just mean different things.  one is how we think we are doing, the other is how security think we are doing
[18:09] <kees> apw: it's so easy to use. :)
[18:09] <sconklin> yeah
[18:10] <apw> kees, to get data out of yes :-p
[18:10] <kees> heh
[18:10] <apw> but we are instant gratification junkies here
[18:10] <kees> apw: you may want to start using scripts/check-syntax too, otherwise the exports might blow up a bit if things aren't sane
[18:11] <apw> kees, tried running it, but it needed unspecified pre-reqs to run and exploded in a heap
[18:11] <kees> hm
[18:12] <apw> kees, well we'll see it first anyhow, if our own tables explode
[18:12] <kees> apw: now that you have the ~/.ubuntu-cve-tracker.conf, check-syntax should work
[18:28] <stgraber> apw: same issue with the new kernel
[18:28] <apw> stgraber, yeah not expecting anything fixed by it ... but some text in the dmesg 
[18:28] <apw> with APW in it
[18:30] <stgraber> http://paste.ubuntu.com/561540/
[18:30] <stgraber> http://paste.ubuntu.com/561541/
[18:30] <stgraber> http://paste.ubuntu.com/561542/
[18:31] <stgraber> first paste is before starting nbd-client, second is after the first nbd-client, third is after the second nbd-client
[18:32]  * tgardner --> lunch
[18:32] <stgraber> apw: ^ (not sure how closely you monitor the channel ;))
[18:33] <apw> stgraber, always best to add my nick, i am easily distracted by shiny objects
[18:34] <apw> stgraber, have you got the entire thing as one dmesg?
[18:34] <stgraber> I can cat all of them together ;)
[18:34] <apw> i think it's telling me the story just not sure
[18:34] <stgraber> http://paste.ubuntu.com/561545/
[18:35] <apw> if you are 100% sure it's the complete dmesg that's great
[18:35] <stgraber> I basically did a "dmesg -c" between each so I shouldn't have lost any entry
[18:35] <apw> [   26.659231] APW: taken
[18:35] <apw> [   27.660130] Dev nbd0: unable to read RDB block 8
[18:35] <apw> [   27.660144]  nbd0: unable to read partition table
[18:35] <apw> [   27.660148] nbd0: partition table beyond EOD, truncated
[18:35] <apw> [   36.754723] APW: taking &nbd_mutex
[18:35] <apw> ok ... so that says an error path is not dropping the lock
[18:35] <apw> not the last one but the one before!?!
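The reasoning above (a final `APW: taking` with no later `APW: taken` means something is stuck waiting on a lock that an earlier path never released) can be checked mechanically. What follows is a minimal Python sketch, assuming the debug kernel prints only the two message forms quoted above; `find_stuck_lock` is a hypothetical helper name, not part of any tool mentioned in this log.

```python
import re

# Matches the two debug messages quoted in the log above, e.g.
#   [   36.754723] APW: taking &nbd_mutex
#   [   26.659231] APW: taken
# That these are the only two debug messages is an assumption.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+APW: (taking|taken)\b\s*(.*)$")


def find_stuck_lock(dmesg_text):
    """Return (timestamp, lock_name) of an unanswered 'taking', or None."""
    pending = None
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # unrelated kernel message
        stamp, event, rest = float(m.group(1)), m.group(2), m.group(3).strip()
        if event == "taking":
            pending = (stamp, rest)  # acquisition attempt started
        else:
            pending = None  # the attempt completed
    return pending


sample = """\
[   26.659231] APW: taken
[   27.660130] Dev nbd0: unable to read RDB block 8
[   27.660144]  nbd0: unable to read partition table
[   27.660148] nbd0: partition table beyond EOD, truncated
[   36.754723] APW: taking &nbd_mutex
"""

print(find_stuck_lock(sample))
```

This only reports that an acquisition never completed; it cannot say which earlier path kept the lock, which is why more debug output was still needed.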
[18:40] <apw> stgraber, damn not enough info to diagnose ... will have to spin you another kernel, you about for a bit ?
[18:42] <stgraber> apw: yep
[18:42] <apw> stgraber, ooo might have an idea... what are the userspace tools called ?
[18:43] <stgraber> nbd-client and nbd-server
[18:44] <apw> stgraber, ooo I think I have it
[18:47] <apw> stgraber, ok i have a theory, it might be bunnies
[18:47] <apw> stgraber, building a test kernel with more debug to confirm
[18:47] <stgraber> ok :)
[19:11] <apw> stgraber, ok updated kernel in the same place
[19:11] <stgraber> apw: ok, downloading it now
[19:16] <stgraber> apw: http://paste.ubuntu.com/561560/
[19:16] <stgraber> apw: http://paste.ubuntu.com/561559/
[19:17] <stgraber> managed to connect to it quite a few times, no hang
[19:17] <apw> stgraber, i assume that is good yes?
[19:17] <apw> stgraber, shame we didn't know about this a little earlier, we could have had this fixed for a2
[19:18] <stgraber> apw: yep, that's perfect. I'm doing a bit more testing on it, mounting the same volume 6 times and trying to play with each mount, then unmount
[19:18] <stgraber> to make sure it doesn't freeze at some point, but it looks great for now
[19:18] <apw> stgraber, i think the fix is clear so i'll push it upstream
[19:19] <stgraber> root@isotest-ltsp:/mnt# nbd-client -d /dev/nbd1
[19:19] <stgraber> Error: Cannot open NBD: Permission denied
[19:19] <stgraber> Please ensure the 'nbd' module is loaded.
[19:19] <apw> stgraber, what triggered that
[19:19] <stgraber> just trying to unmount
[19:19] <stgraber> s/unmount/disconnect/
[19:19] <stgraber> seems like I can connect fine but can't disconnect
[19:19] <apw> hrm, i suspect that that's been in there for a long time
[19:20] <apw> as nothing has changed other than this locking
[19:20] <stgraber> open("/dev/nbd1", O_RDWR|O_LARGEFILE)   = -1 EACCES (Permission denied)
[19:20] <apw> but i suspect its a different bug
[19:20] <stgraber> that'll still make ltsp fail as I need to connect, check the block device and disconnect every time. If disconnect fails, the check will either be pointless or will crash somehow
[19:21] <apw> well as i say there are exactly 0 changes other than this specific lock change
[19:21] <apw> which only affect ioctl
[19:21] <apw> so i am suspicious its not new
[19:21] <stgraber> wasn't there in maverick ;) though might be caused by the new client
[19:22] <apw> the EACCES certainly implies the kernel hated you
[19:23] <apw> stgraber, can you look see what nbd devices are listed in /sys
[19:23] <apw> stgraber, anything in dmesg in concert with the open failure ?
[19:24] <stgraber> nope, nothing shows up in dmesg
[19:24] <stgraber>  /sys shows all 16 devices
[19:24] <apw> stgraber, and are there any nbd-client thingies running?
[19:25] <apw> and any nbd_threads ?
[19:25] <stgraber> yep, my 7 nbd-client are still there
[19:25] <apw> so you cannot stop any of them ?
[19:25] <stgraber> indeed
[19:25] <stgraber> and they are still working
[19:25] <stgraber> so if I do: nbd-client -d /dev/nbd0
[19:25] <stgraber> I get the error message
[19:26] <stgraber> then if I try to mount it, it works just fine
[19:26] <apw> and it's not mounted
[19:26] <apw> what does lsof /dev/nbd1 say
[19:26] <stgraber> yep, I unmounted all of them before testing the disconnect
[19:26] <stgraber> COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
[19:26] <stgraber> nbd-clien 1395 root    3u   BLK  43,16      0t0 5489 /dev/nbd1
[19:29] <stgraber> same issue using maverick's nbd-client
[19:30] <stgraber> quickly trying to connect on the same nbd server from a maverick VM to make sure it worked fine then
[19:31] <stgraber> root@desktop-maverick01:~# nbd-client -d /dev/nbd1
[19:31] <stgraber> Disconnecting: que, disconnect, sock, done
[19:31] <stgraber> apw: that's with the same userspace on both natty and maverick, so something must have changed somewhere in the kernel
[19:32] <apw> stgraber, yep ... shame we don't test more often
[19:33] <stgraber> Should be possible to make some automatic test of it, that's basically what I did to get a test environment: http://paste.ubuntu.com/561547/
[19:34] <stgraber> then nbd-client -d /dev/nbdX to test the disconnect
[19:38] <kees> anyone have clues on how to further debug kernel-only edid failures? bug 712075
[19:38] <ubot2> Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [Undecided,New] https://launchpad.net/bugs/712075
[19:39]  * bjf -> lunch
[19:40] <ohsix> kees: ask ickle :]
[19:41] <kees> ohsix: okay, cool
[19:41] <kees> ohsix: where can I find them?
[19:41] <ohsix> a lot are just plain wrong though; but there's a sysfs file to write a working cached copy iirc
[19:41] <ohsix> #intel-gfx 
[19:41] <kees> ohsix: the issue with this is that it spontaneously fails. it works most of the time and then in the middle of a running session, bombs out
[19:42] <ohsix> ah
[19:46] <apw> stgraber, are you able to compile the user space tools ?
[19:47] <apw> if so could you try changing the open in disconnect() to be an O_RDONLY open and see if it works then ?
[19:48] <stgraber> grabbing the source now
[19:49] <apw> not sure if it makes sense that it cannot open it RDWR but want to know if that helps
[19:51] <stgraber> root@isotest-ltsp:~/nbd-2.9.16# nbd-client -d /dev/nbd1
[19:51] <stgraber> Disconnecting: que, disconnect, sock, done
[19:52] <stgraber> worked fine
[19:59] <apw> stgraber, hrm
[19:59] <apw> stgraber, are there any normal commands, like status ones, from that client
[20:00] <apw> and if so do they work
[20:04] <stgraber> apw: hmm, now I can disconnect but can't reconnect after that ...
[20:04] <stgraber> I have a confcall just now, will continue debugging after that
[20:04] <apw> try making the other open that's RDWR in the thing RDONLY
[20:04] <apw> pretty sure you shouldn't need to, but suspect it will work
[20:07] <apw> stgraber, do you have a complete strace ?
[20:08] <apw> stgraber, of the successful attach the first time
[20:09] <apw>         if (ioctl(nbd, BLKROSET, (unsigned long) &read_only) < 0)
[20:09] <apw>                 err("Unable to set read-only attribute for device");
[20:09] <apw> stgraber, i think it is doing it to itself ... setting the device read only so even it cannot open it
[20:10] <apw> possibly that is hanging over on last close and it should not, but i think it is right that the disconnect cannot open
[20:10] <stgraber> apw: making all of them read-only works fine :)
[20:11] <apw> stgraber, and you can  mount the disk still ?
[20:11] <bryceh> JFo, mind putting kees' bug 712075 on the kernel team's review list?  Gots a patch from upstream.
[20:11] <ubot2> Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [High,Triaged] https://launchpad.net/bugs/712075
[20:12]  * JFo opens the bug
[20:12] <apw> bryceh, JFo done
[20:12] <JFo> will do bryceh :)
[20:12] <JFo> ooh, he's quick on the draw that one :)
[20:13] <JFo> thanks apw
[20:13] <apw> stgraber, i am not sure if i know if the userspace was broken or the kernel is now broken
[20:15] <apw> stgraber, will think on that for a bit
[20:17] <apw> stgraber, can u stress test the open close mount unmounts against it with the new tools
[20:17] <apw> and let me know if anything else goes wrong
[20:17] <apw> certainly test you can read/write files on the filesystems inside
[20:19] <bryceh> apw, JFo, thanks
[20:21] <soren> apw: I think 711951 is a dupe.
[20:22] <soren> apw: I've just sent a patch to lkml to fix a locking issue in nbd.
[20:22] <soren> I was looking for the bug reference on launchpad when I found this new one.
[20:24] <stgraber> apw: http://paste.ubuntu.com/561592/
[20:24] <soren> stgraber: ^
[20:25] <soren> https://lkml.org/lkml/2011/2/2/237
[20:25] <stgraber> soren: so both of you fixed it apparently ;)
[20:26] <soren> I've just sent the patch to the kernel-team ml.
[20:27]  * jjohansen -> lunch
[20:29] <tgardner> soren, has Paul queued it for upstream submission ?
[20:30] <soren> tgardner: He told me to send it directly to akpm.
[20:30] <soren> tgardner: I suppose that's a "yes".
[20:30] <tgardner> soren: ok, I'll pull it in until it conflicts with the next rc candidate.
[20:37] <soren> tgardner: ta very much.
[20:37] <tgardner> apw, did you already have something on deck for this nbd lockup ?
[20:37] <soren> I guess I should have included this link in my e-mail: https://lkml.org/lkml/2011/1/26/131
[20:37] <tgardner> soren, I'll include it in the commit log
[20:49] <apw> tgardner, yes
[20:49] <apw> about to send it out
[20:49] <tgardner> apw, check master-next first, perhaps we're colliding
[20:50] <apw> tgardner, yeah i see that hallyn has a different approach removing it
[20:50] <apw> hrm
[20:50] <apw> tgardner, ahh ok so you pushed it, then i'll wait and do nothing
[20:50] <tgardner> apw, well, they pretty much just ripped it out.
[20:50] <tgardner> the mutex, I mean
[20:50] <apw> tgardner, yeah, their argument is it's not needed.  hard to say
[20:51] <tgardner> apw, guess we'll find out when it goes to akpm
[20:51] <apw> my patch does the opposite, and drops it during the long lived ioctl
[20:51] <apw> yep
[20:52] <apw> tgardner, so much for that effort :/
[20:52] <apw> such is life
[20:54] <tgardner> some days you bite the bear, some days the bear bites back
[20:54] <apw> yeah ... food then i reckon
[20:58] <bryceh> apw, I'm about to forward bug #711275 upstream - seems a bit false positive-ish - but want to check first if this is something you already know about?
[20:58] <ubot2> Launchpad bug 711275 in xserver-xorg-video-intel "[gm45] GPU lockup (EIR: 0x00000010) during boot - EIR stuck: 0x00000010, masking" [Undecided,New] https://launchpad.net/bugs/711275
[20:58] <bryceh> apw, it appears to me that the GPU is hanging during boot but successfully resetting itself, but this still generates an apport crash report
[21:01] <apw> bryceh, hrm, doesn't that sound a lot like the other one, the one where they recently said 'blacklisting vesafb fixes things'
[21:01] <apw> though this continues rather than breaking
[21:01] <bryceh> yeah
[21:01] <apw> bug #702090
[21:01] <apw> and didn't you send that up already?
[21:01]  * apw pokes ubot2 
[21:01] <bryceh> apw, yeah you're right
[21:01] <apw> sounds like they are being luckier, perhaps ask them to try the blacklist 
[21:01] <bryceh> apw, actually I hadn't sent that one upstream yet, but it does look similarish
[21:01] <ubot2> Launchpad bug 702090 in xserver-xorg-video-intel "i965gm GPU lockup if vesafb is left loaded (EIR: 0x00000010 PGTBL_ER: 0x00000100)" [High,Triaged] https://launchpad.net/bugs/702090
[21:02] <apw> bryceh, so i reckon send one or other up, and test the vesafb thing on the new one
[21:02] <bryceh> or maybe it's bug #686388 which I did revisit
[21:02] <ubot2> Launchpad bug 686388 in xserver-xorg-video-intel "[i965gm] GPU lockup - Invalid GTT entry during Display B Fetch" [Unknown,Confirmed] https://launchpad.net/bugs/686388
[21:02] <bryceh> anyway can't hurt to send another report up, thanks.
[21:02] <apw> what happens when we flip over to the drmfb doesn't bear thinking about
[21:02] <bryceh> o_O
[21:02] <apw> layering violations just isn't in it
[21:04] <apw> we don't give anyone any time to handle the loss of the old driver
[21:04] <apw> even the people with it open get a nasty shock
[21:04] <apw> (plymouth)
[21:08] <bryceh> erf
[21:09] <bryceh> apw, btw I'm also seeing a spate of 'GPU hanging too fast, declaring wedged!' bugs - #710321  711645  711691
[21:10] <bryceh>  bug #710321
[21:10] <ubot2> Launchpad bug 710321 in xserver-xorg-video-intel "[i965gm] GPU lockup during login - GPU hanging too fast, declaring wedged!" [Undecided,New] https://launchpad.net/bugs/710321
[21:16] <apw> bryceh, does that mean "i reset it a few times and it broke again and i am giving up"
[21:16] <bryceh> apw, seems like a good guess
[21:17] <bryceh> apw, but this is the first time I've seen that particular syntax in a message, so dunno if it's generic verbiage or something specific
[21:17] <apw> yeah odd indeed
[21:20] <apw> 0x00000a78:      0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:20] <apw> Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:20] <bryceh> huh
[21:20] <bryceh> that looks promising
[21:21] <apw> bryceh, that looks bad, wouldn't that be a mesa injection thing
[21:21] <bryceh> could be
[21:23] <apw> 0x000010b8:      0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:23] <apw> Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:23] <apw> more than one in here too
[21:23] <apw> so if each was breaking it ... that might make sense
[21:24] <bryceh> where'd you spot that Bad length error?
[21:24] <bryceh> oh I see it
[21:24] <apw> in the gpu dump yeah
[21:25] <apw> of course it could be the dump tool that's wrong :/
[21:26] <apw> 0x00014a88:      0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:26] <apw> Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:26] <apw> there are a number in here
[21:26] <bryceh> yeah the other bugs have it too
[21:27] <apw> i suspect we could do with an arsenal script to look through these for errors and put them in as a comment on the bug
[21:28] <bryceh> apw, yeah
[21:28] <bryceh> actually what I'd like to do is modify the gpu hook itself to extract the warnings for us directly
[21:29] <apw> bryceh, a better plan indeed if you can be bothered :)
[21:29] <bryceh> and that's on my todo list, but down under a few other bigger priorities currently
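The extraction idea discussed above (pull the decoder warnings out of a GPU dump and post them as a bug comment) could look roughly like this. It is a minimal Python sketch, not the arsenal script or the apport GPU hook: the function names are hypothetical, the line format is copied from the `Bad length` lines quoted in this log, and reading the trailing `[3, 3]` as the decoder's reported length range is an assumption.

```python
import re
from collections import Counter

# Format taken from the decoder output quoted in the log:
#   Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
BAD_LENGTH_RE = re.compile(r"Bad length \((\d+)\) in (\w+), \[(\d+), (\d+)\]")


def summarize_bad_lengths(dump_text):
    """Count 'Bad length' warnings per (command, length, low, high)."""
    counts = Counter()
    for m in BAD_LENGTH_RE.finditer(dump_text):
        got, cmd, low, high = m.groups()
        counts[(cmd, int(got), int(low), int(high))] += 1
    return counts


def format_comment(counts):
    """Render the counts as plain text suitable for a bug comment."""
    lines = []
    for (cmd, got, low, high), n in sorted(counts.items()):
        lines.append(f"{n}x {cmd}: length {got}, reported range [{low}, {high}]")
    return "\n".join(lines)


sample_dump = """\
0x00000a78:      0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
0x000010b8:      0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
0x00014a88:      0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
"""

print(format_comment(summarize_bad_lengths(sample_dump)))
```

Counting per command makes it easy to compare the number of warnings against the number of hangs seen in dmesg, which is the correlation raised a few lines further down.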
[21:30] <apw> there seem to be similar quantities of these errors in the ring as there are hangs in the dmesg
[21:30] <apw> so i suspect they are worth investigating
[21:30] <apw> bryceh, is there somewhere i could find a document on the format of the ring to see what it says for the sizes etc
[21:30] <bryceh> apw, btw check out - http://www.bryceharrington.org/X/Reports/ubuntu-x-swat/totals-natty-workqueue.svg
[21:32] <apw> intel is a problem :)
[21:32] <bryceh> apw, yeah probably at http://intellinuxgraphics.org/documentation.html
[21:32] <bryceh> apw, and thus my interest in these gpu bugs today ;-)
[21:32] <apw> heh indeed
[21:33] <bryceh> and http://www.x.org/docs/intel/ looks like it has mirrors of the docs
[21:45]  * apw screams about unity being balls
[21:45]  * apw cannot cope with the stacking issues
[21:45] <apw> how hard is it to stack windows in the right order
[21:47] <jjohansen> apw: obviously harder than crashing
[22:55] <bjf> apw, jjohansen, look at it this way, it's harder for people to notice kernel bugs when their desktop keeps crashing (or they don't have one)
[22:56] <bjf> apw, jjohansen, there's a pony in there somewhere :-)
[22:56] <apw> nice, a bright side
[23:11] <jjohansen> whee yet another pathological oom killer death