=== skaet is now known as skaet_afk
=== smb` is now known as smb
[09:55] hmm, so x86 suspend issues should be fixed on natty? I still get bug 704550
[09:55] Launchpad bug 704550 in linux "Lenovo Thinkpad X61 fails to resume from suspend" [Undecided,New] https://launchpad.net/bugs/704550
=== _LibertyZero is now known as LibertyZero
[12:35] apw: It seems we have the right idea already: https://wiki.linaro.org/PackageYourOwnKernel
[12:35] apw: Thanks for your help :)
[12:38] lag hehe good
=== herton is now known as herton_lunch
=== tgardner is now known as tgardner-afk
=== skaet_afk is now known as skaet
[15:43] I compiled the kernel just fine, I get the linux-image and linux-headers-...-generic package, the kernel boots and everything
[15:44] but I can't install the headers package because: dpkg: dependency problems prevent configuration of linux-headers-2.6.38-1-generic:
[15:44] linux-headers-2.6.38-1-generic depends on linux-headers-2.6.38-1; however:
[15:44] Package linux-headers-2.6.38-1 is not installed.
[15:44] how do I get a .deb for that as well?
[15:44] quup, what did you build ? binary-generic ?
[15:44] apw: exactly
[15:44] you also need to build binary-indep for the common headers
[15:44] oh ok, thanks!
[15:44] Or binary-headers
[15:46] yeah binary-headers looks to do the least
=== Quintasan_ is now known as Quintasan
[15:57] worked perfectly! thanks :)
=== herton_lunch is now known as herton
[16:50] apw: oops, indeed, that channel is probably a lot better for that kind of discussion :)
[16:51] stgraber, this is the right time to move
[16:51] stgraber, so if you could get the whole dmesg in case there is anything before of interest that'd be good
[16:51] stgraber, i suspect a locking issue from the diagnostic
[16:51] is it easy for you to test a debug kernel in this setup?
=== JanC_ is now known as JanC
[16:52] yep, as soon as I get the VM re-installed, I'll make a snapshot to make sure I won't trash it again.
Then testing will be easy
[16:52] stgraber, thanks
[16:53] stgraber, i'll shove some debug in and get something building in the meantime
[16:53] cdimage is a bit slow so I'll need 15min to get a natty desktop image, then I just need to install it, install nbd-server and export some empty file ;)
[16:56] my build server may be large, but its not going to be done before you :)
[16:58] stgraber, 32 or 64 bit ?
=== tgardner-afk is now known as tgardner
[17:00] apw: that one is 32bit
[17:00] though if that's useful for debugging, I had the same issue with 64bit on Monday
[17:00] stgraber, ok ... building with some lock debug
[17:00] stgraber, don't care, just don't want to make both for speed reasons
[17:00] building 32 bit now
[17:20] stgraber, kernel will be here in about 5 mins ... http://people.canonical.com/~apw/lp711951-natty/
[17:27] stgraber, ok kernel is there
[17:33] stgraber, poke me when you have tested it
[17:38] kees, could we get a mergy merge please, bored of these results
[17:39] kees, how hard would it be to have a shadow set which are built from our tip, just for our packages, obviously emitted somewhere else
[17:40] apw: the entire export mechanism is already in the cve tracker. just need to set env vars and type "make" in the toplevel dir.
[17:41] <- grabbing lunch
[17:41] kees, got a recipe you use on people ?
then i can just dup it into mine :)
[17:41] apw: but I can try to set up an automatic thing
[17:41] kees, happy to run it myself if its that easy
[17:42] i assume all the pre-reqs are on people if you can run it
[17:42] apw: yeah, see ~ubuntu-security/bin/html-report.sh
[17:42] apw: /home/ubuntu-security/bin/cron-cve-tracker.sh does the bzr pull before calling html-report.sh
[17:42] apw: yup
[17:43] you'll need ~ubuntu-security/.ubuntu-cve-tracker.conf for the repo settings
[17:44] kees, cool will have a look at it tomorrow
[17:47] okay
[17:59] apw: just got back from lunch, looking at it now
[18:00] stgraber, ahh ok
[18:06] kees, ok that was so easy its already done :)
[18:07] smb, sconklin, bjf, this url points to the kernel package etc as per the cve tracker, but showing the status as from the tip of our branch: http://people.canonical.com/~apw/cve/pkg/linux.html
[18:08] auto updating much like the original
[18:08] now we are not beholden to security team merges to get team status
[18:09] "a man with two watches never knows what time it is. A man with one is always certain"
[18:09] But nice job. At least we know which one to believe
[18:09] they just mean different things. one is how we think we are doing, the other is how security think we are doing
[18:09] apw: it's so easy to use. :)
[18:09] yeah
[18:10] kees, to get data out of yes :-p
[18:10] heh
[18:10] but we are instant gratification junkies here
[18:10] apw: you may want to start using scripts/check-syntax too, otherwise the exports might blow up a bit if things aren't sane
[18:11] kees, tried running it, but it needed unspecified pre-reqs to run and exploded in a heap
[18:11] hm
=== sforshee is now known as sforshee-lunch
[18:12] kees, well we'll see it first anyhow, if our own tables explode
[18:12] apw: now that you have the ~/.ubuntu-cve-tracker.conf, check-syntax should work
[18:28] apw: same issue with the new kernel
[18:28] stgraber, yeah not expecting anything fixed by it ...
but some text in the dmesg
[18:28] with APW in it
[18:30] http://paste.ubuntu.com/561540/
[18:30] http://paste.ubuntu.com/561541/
[18:30] http://paste.ubuntu.com/561542/
[18:31] first paste is before starting nbd-client, second is after the first nbd-client, third is after the second nbd-client
[18:32] * tgardner --> lunch
[18:32] apw: ^ (not sure how closely you monitor the channel ;))
[18:33] stgraber, always best to add my nick, i am easily distracted by shiny objects
[18:34] stgraber, have you got the entire thing as one dmesg?
[18:34] I can cat all of them together ;)
[18:34] i think its telling me the story just not sure
[18:34] http://paste.ubuntu.com/561545/
[18:35] if you are 100% sure its the complete dmesg thats great
[18:35] I basically did a "dmesg -c" between each so I shouldn't have lost any entry
[18:35] [ 26.659231] APW: taken
[18:35] [ 27.660130] Dev nbd0: unable to read RDB block 8
[18:35] [ 27.660144] nbd0: unable to read partition table
[18:35] [ 27.660148] nbd0: partition table beyond EOD, truncated
[18:35] [ 36.754723] APW: taking &nbd_mutex
[18:35] ok ... so that says an error path is not dropping the lock
[18:35] not the last one but the one before!?!
[18:40] stgraber, damn not enough info to diagnose ... will have to spin you another kernel, you about for a bit ?
[18:42] apw: yep
[18:42] stgraber, ooo might have an idea... what are the userspace tools called ?
[18:43] nbd-client and nbd-server
[18:44] stgraber, ooo I think I have it
[18:47] stgraber, ok i have a theory, it might be bunnies
[18:47] stgraber, building a test kernel with more debug to confirm
[18:47] ok :)
[19:11] stgraber, ok updated kernel in the same place
[19:11] apw: ok, downloading it now
=== sforshee-lunch is now known as sforshee
[19:16] apw: http://paste.ubuntu.com/561560/
[19:16] apw: http://paste.ubuntu.com/561559/
[19:17] managed to connect to it quite a few times, no hang
[19:17] stgraber, i assume that is good yes?
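The reading of the 18:35 dmesg excerpt above (a `taking` line with no `taken` after it means someone never dropped the lock) can be sketched as a small scan over the debug output. This is only an illustration: the `APW: taking` / `APW: taken` strings are the ones quoted in the log, while the function name `find_stuck_acquire` and the assumption that every successful acquire logs `taken` after its `taking` are hypothetical.

```python
import re

# Matches the debug lines quoted in the log, e.g.
#   "[ 36.754723] APW: taking &nbd_mutex"  and  "[ 26.659231] APW: taken"
LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+APW:\s+(taking|taken)\b")


def find_stuck_acquire(dmesg_text):
    """Return the timestamp of a trailing 'taking' that never got a 'taken', else None.

    Assumption (not confirmed by the log): each successful acquire prints
    'taking' first and 'taken' once the mutex is actually held.
    """
    pending = None
    for line in dmesg_text.splitlines():
        match = LINE.search(line)
        if not match:
            continue  # unrelated kernel message
        stamp, event = float(match.group(1)), match.group(2)
        # A 'taking' opens a pending acquire; a 'taken' closes it.
        pending = stamp if event == "taking" else None
    return pending


# The excerpt pasted at 18:35.
sample = """\
[ 26.659231] APW: taken
[ 27.660130] Dev nbd0: unable to read RDB block 8
[ 27.660144] nbd0: unable to read partition table
[ 27.660148] nbd0: partition table beyond EOD, truncated
[ 36.754723] APW: taking &nbd_mutex
"""
print(find_stuck_acquire(sample))
```

On the pasted excerpt the scan reports the 36.754723 acquire as the one that never completes, which matches the hang seen when the second nbd-client starts.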
[19:17] stgraber, shame we didn't know about this a little earlier, we could have had this fixed for a2
[19:18] apw: yep, that's perfect. I'm doing a bit more testing on it, mounting the same volume 6 times and trying to play with each mount, then unmount
[19:18] to make sure it doesn't freeze at some point, but it looks great for now
[19:18] stgraber, i think the fix is clear so i'll push it upstream
[19:19] root@isotest-ltsp:/mnt# nbd-client -d /dev/nbd1
[19:19] Error: Cannot open NBD: Permission denied
[19:19] Please ensure the 'nbd' module is loaded.
[19:19] stgraber, what triggered that
[19:19] just trying to unmount
[19:19] s/unmount/disconnect/
[19:19] seems like I can connect fine but can't disconnect
[19:19] hrm, i suspect that thats been in there for a long time
[19:20] as nothing has changed other than this locking
[19:20] open("/dev/nbd1", O_RDWR|O_LARGEFILE) = -1 EACCES (Permission denied)
[19:20] but i suspect its a different bug
[19:20] that'll still make ltsp fail as I need to connect, check the block device and disconnect every time. If disconnect fails, the check will either be pointless or will crash somehow
[19:21] well as i say there are exactly 0 changes other than this specific lock change
[19:21] which only affect ioctl
[19:21] so i am suspicious its not new
[19:21] wasn't there in maverick ;) though might be caused by the new client
[19:22] the EACCES certainly implies the kernel hated you
[19:23] stgraber, can you look see what nbd devices are listed in /sys
[19:23] stgraber, anything in dmesg in concert with the open failure ?
[19:24] nope, nothing shows up in dmesg
[19:24] /sys shows all 16 devices
[19:24] stgraber, and are there any nbd-client thingies running?
[19:25] and any nbd_threads ?
[19:25] yep, my 7 nbd-client are still there
[19:25] so you cannot stop any of them ?
[19:25] indeed
[19:25] and they are still working
[19:25] so if I do: nbd-client -d /dev/nbd0
[19:25] I get the error message
[19:26] then if I try to mount it, it works just fine
[19:26] and its not mounted
[19:26] what does lsof /dev/nbd1 say
[19:26] yep, I unmounted all of them before testing the disconnect
[19:26] COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[19:26] nbd-clien 1395 root 3u BLK 43,16 0t0 5489 /dev/nbd1
[19:29] same issue using maverick's nbd-client
[19:30] quickly trying to connect on the same nbd server from a maverick VM to make sure it worked fine then
[19:31] root@desktop-maverick01:~# nbd-client -d /dev/nbd1
[19:31] Disconnecting: que, disconnect, sock, done
[19:31] apw: that's with the same userspace on both natty and maverick, so something must have changed somewhere in the kernel
[19:32] stgraber, yep ... shame we don't test more often
[19:33] Should be possible to make some automatic test of it, that's basically what I did to get a test environment: http://paste.ubuntu.com/561547/
[19:34] then nbd-client -d /dev/nbdX to test the disconnect
[19:38] anyone have clues on how to further debug kernel-only edid failures? bug 712075
[19:38] Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [Undecided,New] https://launchpad.net/bugs/712075
[19:39] * bjf -> lunch
=== bjf is now known as bjf[afk]
[19:40] kees: ask ickle :]
[19:41] ohsix: okay, cool
[19:41] ohsix: where can I find them?
[19:41] a lot are just plain wrong though; but theres a sysfs file to write a working cached copy iirc
[19:41] #intel-gfx
[19:41] ohsix: the issue with this is that it spontaneously fails. it works most of the time and then in the middle of a running session, bombs out
[19:42] ah
[19:46] stgraber, are you able to compile the user space tools ?
[19:47] if so could you try changing the open in disconnect() to be an O_RDONLY open and see if it works then ?
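The experiment suggested at 19:47 (open the device O_RDONLY in the disconnect path instead of O_RDWR) can be sketched in Python rather than by patching nbd-client. This is an assumption-laden illustration, not the client's code: the step order is taken from the "Disconnecting: que, disconnect, sock, done" output quoted above, the ioctl numbers are the `_IO(0xab, nr)` values from `<linux/nbd.h>` as I understand them, and the `disconnect` signature with its injectable `ioctl` parameter is hypothetical.

```python
import fcntl
import os

# ioctl request numbers from <linux/nbd.h>, i.e. _IO(0xab, nr).
NBD_CLEAR_SOCK = 0xAB04
NBD_CLEAR_QUE = 0xAB05
NBD_DISCONNECT = 0xAB08


def disconnect(device, flags=os.O_RDONLY, ioctl=fcntl.ioctl):
    """Open `device` with `flags` and issue the teardown ioctls in the printed order.

    `ioctl` is injectable so the sequence can be exercised without a real
    /dev/nbdX; with the default it calls fcntl.ioctl on the opened descriptor.
    Returns the names of the steps that completed.
    """
    fd = os.open(device, flags)  # the log's failing case used O_RDWR here
    done = []
    try:
        for name, request in (
            ("que", NBD_CLEAR_QUE),
            ("disconnect", NBD_DISCONNECT),
            ("sock", NBD_CLEAR_SOCK),
        ):
            ioctl(fd, request)
            done.append(name)
    finally:
        os.close(fd)
    return done
```

On a test machine something like `disconnect("/dev/nbd1")` as root would be the read-only variant, and `disconnect("/dev/nbd1", flags=os.O_RDWR)` the variant that returned EACCES in this session.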
[19:48] grabbing the source now
[19:49] not sure if it makes sense that it cannot open it RDWR but want to know if that helps
[19:51] root@isotest-ltsp:~/nbd-2.9.16# nbd-client -d /dev/nbd1
[19:51] Disconnecting: que, disconnect, sock, done
[19:52] worked fine
[19:59] stgraber, hrm
[19:59] stgraber, are there any normal commands, like status ones, from that client
[20:00] and if so do they work
[20:04] apw: hmm, now I can disconnect but can't reconnect after that ...
[20:04] I have a confcall just now, will continue debugging after that
[20:04] try making the other open thats RDWR in the thing RDONLY
[20:04] pretty sure you shouldn't need to, but suspect it will work
[20:07] stgraber, do you have a complete strace ?
[20:08] stgraber, of the successful attach the first time
[20:09] if (ioctl(nbd, BLKROSET, (unsigned long) &read_only) < 0)
[20:09] err("Unable to set read-only attribute for device");
[20:09] stgraber, i think it is doing it to itself ... setting the device read only so even it cannot open it
[20:10] possibly that is hanging over on last close and it should not, but i think it is right that the disconnect cannot open
[20:10] apw: making all of them read-only works fine :)
[20:11] stgraber, and you can mount the disk still ?
[20:11] JFo, mind putting kees' bug 712075 on the kernel team's review list? Gots a patch from upstream.
[20:11] Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [High,Triaged] https://launchpad.net/bugs/712075
[20:12] * JFo opens the bug
[20:12] bryceh, JFo done
[20:12] will do bryceh :)
[20:12] ooh, he's quick on the draw that one :)
[20:13] thanks apw
[20:13] stgraber, i am not sure if i know if the userspace was broken or the kernel is now broken
[20:15] stgraber, will think on that for a bit
[20:17] stgraber, can u stress test the open close mount unmounts against it with the new tools
[20:17] and let me know if anything else goes wrong
[20:17] certainly test you can read/write files on the filesystems inside
[20:19] apw, JFo, thanks
[20:21] apw: I think 711951 is a dupe.
[20:22] apw: I've just sent a patch to lkml to fix a locking issue in nbd.
[20:22] I was looking for the bug reference on launchpad when I found this new one.
[20:24] apw: http://paste.ubuntu.com/561592/
[20:24] stgraber: ^
[20:25] https://lkml.org/lkml/2011/2/2/237
[20:25] soren: so both of you fixed it apparently ;)
[20:26] I've just sent the patch to the kernel-team ml.
[20:27] * jjohansen -> lunch
[20:29] soren, has Paul queued it for upstream submission ?
[20:30] tgardner: He told me to send it directly to akpm.
[20:30] tgardner: I suppose that's a "yes".
[20:30] soren: ok, I'll pull it in until it conflicts with the next rc candidate.
[20:37] tgardner: ta very much.
[20:37] apw, did you already have something on deck for this nbd lockup ?
[20:37] I guess I should have included this link in my e-mail: https://lkml.org/lkml/2011/1/26/131
[20:37] soren, I'll include it in the commit log
=== bjf[afk] is now known as bjf
[20:49] tgardner, yes
[20:49] about to send it out
[20:49] apw, check master-next first, perhaps we're colliding
[20:50] tgardner, yeah i see that hallyn has a different approach removing it
[20:50] hrm
[20:50] tgardner, ahh ok so you pushed it, then i'll wait and do nothing
[20:50] apw, well, they pretty much just ripped it out.
[20:50] the mutex, I mean
[20:50] tgardner, yeah, their argument is its not needed. hard to say
[20:51] apw, guess we'll find out when it goes to akpm
[20:51] my patch does the opposite, and drops it during the long lived ioctl
[20:51] yep
[20:52] tgardner, so much for that effort :/
[20:52] such is life
[20:54] some days you bite the bear, some days the bear bites back
[20:54] yeah ... food then i reckon
[20:58] apw, I'm about to forward bug #711275 upstream - seems a bit false positive-ish - but want to check first if this is something you already know about?
[20:58] Launchpad bug 711275 in xserver-xorg-video-intel "[gm45] GPU lockup (EIR: 0x00000010) during boot - EIR stuck: 0x00000010, masking" [Undecided,New] https://launchpad.net/bugs/711275
[20:58] apw, it appears to me that the GPU is hanging during boot but successfully resetting itself, but this still generates an apport crash report
[21:01] bryceh, hrm, doesn't that sound a lot like the other one, the one where they recently said 'blacklisting vesafb fixes things'
[21:01] though this continues rather than breaking
[21:01] yeah
[21:01] bug #702090
[21:01] and didn't you send that up already?
[21:01] * apw pokes ubot2
[21:01] apw, yeah you're right
[21:01] sounds like they are being luckier, perhaps ask them to try the blacklist
[21:01] apw, actually I hadn't sent that one upstream yet, but it does look similarish
[21:01] Launchpad bug 702090 in xserver-xorg-video-intel "i965gm GPU lockup if vesafb is left loaded (EIR: 0x00000010 PGTBL_ER: 0x00000100)" [High,Triaged] https://launchpad.net/bugs/702090
[21:02] bryceh, so i reckon send one or other up, and test the vesafb thing on the new one
[21:02] or maybe it's bug #686388 which I did revisit
[21:02] Launchpad bug 686388 in xserver-xorg-video-intel "[i965gm] GPU lockup - Invalid GTT entry during Display B Fetch" [Unknown,Confirmed] https://launchpad.net/bugs/686388
[21:02] anyway can't hurt to send another report up, thanks.
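The approach described at 20:51 above (keep the mutex but drop it for the duration of the long-lived ioctl) can be illustrated outside the kernel with ordinary threads. This is a generic sketch, not the patch discussed in the channel: the `Device` class, its method names and the use of `threading.Event` to stand in for the blocking ioctl are all hypothetical.

```python
import threading


class Device:
    """Toy stand-in for a device with one mutex guarding its ioctl-like calls."""

    def __init__(self):
        self.lock = threading.Lock()
        self.stop = threading.Event()
        self.log = []

    def do_it(self):
        """Long-lived call: hold the lock only for setup and teardown."""
        with self.lock:
            self.log.append("setup")
        # The lock is released here, so short calls can get in while we block.
        self.stop.wait()
        with self.lock:
            self.log.append("teardown")

    def disconnect(self):
        """Short call that needs the lock; it would hang if do_it kept holding it."""
        with self.lock:
            self.log.append("disconnect")
            self.stop.set()


dev = Device()
worker = threading.Thread(target=dev.do_it)
worker.start()
dev.disconnect()
worker.join(timeout=5)
print(dev.log)
```

If `do_it` instead wrapped the whole `self.stop.wait()` in `with self.lock:`, `disconnect` could never acquire the mutex and both threads would block, which is the shape of the hang seen earlier in the session. Removing the mutex entirely, the other approach mentioned at 20:50, avoids the hang in a different way.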
[21:02] what happens when we flip over to the drmfb doesn't bear thinking about
[21:02] o_O
[21:02] layering violations just isn't in it
[21:04] we don't give anyone any time to handle the loss of the old driver
[21:04] even the people with it open get a nasty shock
[21:04] (plymouth)
[21:08] erf
[21:09] apw, btw I'm also seeing a spate of 'GPU hanging too fast, declaring wedged!' bugs - #710321 711645 711691
[21:10] bug #710321
[21:10] Launchpad bug 710321 in xserver-xorg-video-intel "[i965gm] GPU lockup during login - GPU hanging too fast, declaring wedged!" [Undecided,New] https://launchpad.net/bugs/710321
[21:16] bryceh, does that mean "i reset it a few times and it broke again and i am giving up"
[21:16] apw, seems like a good guess
[21:17] apw, but this is the first time I've seen that particular syntax in a message, so dunno if it's generic verbiage or something specific
[21:17] yeah odd indeed
[21:20] 0x00000a78: 0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:20] Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:20] huh
[21:20] that looks promising
[21:21] bryceh, that looks bad, wouldn't that be a mesa injection thing
[21:21] could be
[21:23] 0x000010b8: 0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:23] Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:23] more than one in here too
[21:23] so if each was breaking it ... that might make sense
[21:24] where'd you spot that Bad length error?
[21:24] oh I see it
[21:24] in the gpu dump yeah
[21:25] of course it could be the dump tool thats wrong :/
[21:26] 0x00014a88: 0x0a000002: MI_DISPLAY_BUFFER_INFO
[21:26] Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
[21:26] there are a number in here
[21:26] yeah the other bugs have it too
[21:27] i suspect we could do with an arsenal script to look through these for errors and put them in as a comment on the bug
[21:28] apw, yeah
[21:28] actually what I'd like to do is modify the gpu hook itself to extract the warnings for us directly
[21:29] bryceh, a better plan indeed if you can be bothered :)
[21:29] and that's on my todo list, but down under a few other bigger priorities currently
[21:30] there seem to be similar quantities of these errors in the ring as there are hangs in the dmesg
[21:30] so i suspect they are worth investigating
[21:30] bryceh, is there somewhere i could find a document on the format of the ring to see what it says for the sizes etc
[21:30] apw, btw check out - http://www.bryceharrington.org/X/Reports/ubuntu-x-swat/totals-natty-workqueue.svg
[21:32] intel is a problem :)
[21:32] apw, yeah probably at http://intellinuxgraphics.org/documentation.html
[21:32] apw, and thus my interest in these gpu bugs today ;-)
[21:32] heh indeed
[21:33] and http://www.x.org/docs/intel/ looks like it has mirrors of the docs
[21:45] * apw screams about unity being balls
[21:45] * apw cannot cope with the stacking issues
[21:45] how hard is it to stack windows in the right order
[21:47] apw: obviously harder than crashing
[22:55] apw, jjohansen, look at it this way, it's harder for people to notice kernel bugs when their desktop keeps crashing (or they don't have one)
[22:56] apw, jjohansen, there's a pony in there somewhere :-)
[22:56] nice, a bright side
[23:11] whee yet another pathological oom killer death
=== sconklin is now known as sconklin-gone
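The idea floated at 21:27 (a script that looks through the GPU dumps for errors so they can be added to the bug as a comment) could start from something like the sketch below. It only covers the one message format quoted in this log; the function name `summarize_bad_lengths` and the wording of the summary line are hypothetical, and the meaning of the bracketed pair is left as reported rather than interpreted.

```python
import re

# Matches lines such as: "Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]"
BAD_LENGTH = re.compile(r"Bad length \((\d+)\) in (\w+), \[(\d+), (\d+)\]")


def summarize_bad_lengths(dump_text):
    """Count 'Bad length' warnings per command and return one summary line each."""
    counts = {}
    for match in BAD_LENGTH.finditer(dump_text):
        key = (
            match.group(2),       # command name
            int(match.group(1)),  # length found in the dump
            int(match.group(3)),  # first value of the bracketed pair
            int(match.group(4)),  # second value of the bracketed pair
        )
        counts[key] = counts.get(key, 0) + 1
    return [
        f"{cmd}: length {got}, reported range [{low}, {high}], seen {n} times"
        for (cmd, got, low, high), n in sorted(counts.items())
    ]


# The three occurrences pasted in the channel between 21:20 and 21:26.
sample = """\
0x00000a78: 0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
0x000010b8: 0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
0x00014a88: 0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
"""
for line in summarize_bad_lengths(sample):
    print(line)
```

Posting the summary to Launchpad is left out on purpose, since the log does not say how the arsenal scripts or the gpu hook would do that.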