=== skaet is now known as skaet_afk | ||
=== smb` is now known as smb | ||
tjaalton | hmm, so x86 suspend issues should be fixed on natty? I still get bug 704550 | 09:55 |
---|---|---|
ubot2 | Launchpad bug 704550 in linux "Lenovo Thinkpad X61 fails to resume from suspend" [Undecided,New] https://launchpad.net/bugs/704550 | 09:55 |
=== _LibertyZero is now known as LibertyZero | ||
lag | apw: It seems we have the right idea already: https://wiki.linaro.org/PackageYourOwnKernel | 12:35 |
lag | apw: Thanks for your help :) | 12:35 |
apw | lag hehe good | 12:38 |
=== herton is now known as herton_lunch | ||
=== tgardner is now known as tgardner-afk | ||
=== skaet_afk is now known as skaet | ||
quup | I compiled the kernel just fine, I get the linux-image and linux-headers-...-generic package, the kernel boots and everything | 15:43 |
quup | but I can't install the headers package because: dpkg: dependency problems prevent configuration of linux-headers-2.6.38-1-generic: | 15:44 |
quup | linux-headers-2.6.38-1-generic depends on linux-headers-2.6.38-1; however: | 15:44 |
quup | Package linux-headers-2.6.38-1 is not installed. | 15:44 |
quup | how do I get a .deb for that as well? | 15:44 |
apw | quup, what did you build ? binary-generic ? | 15:44 |
quup | apw: exactly | 15:44 |
apw | you also need to build binary-indep for the common headers | 15:44 |
quup | oh ok, thanks! | 15:44 |
smb | Or binary-headers | 15:44 |
apw | yeah binary-headers looks to do the least | 15:46 |
=== Quintasan_ is now known as Quintasan | ||
quup | worked perfectly! thanks :) | 15:57 |
=== herton_lunch is now known as herton | ||
stgraber | apw: oops, indeed, that channel is probably a lot better for that kind of discussion :) | 16:50 |
apw | stgraber, this is the right time to move | 16:51 |
apw | stgraber, so if you could get the whole dmesg in case there is anything before of interest that'd be good | 16:51 |
apw | stgraber, i suspect a locking issue from the diagnostic | 16:51 |
apw | is it easy for you to test a debug kernel in this setup? | 16:51 |
=== JanC_ is now known as JanC | ||
stgraber | yep, as soon as I get the VM re-installed, I'll make a snapshot to make sure I won't trash it again. Then testing will be easy | 16:52 |
apw | stgraber, thanks | 16:52 |
apw | stgraber, i'll shove some debug in and get something building in the meantime | 16:53 |
stgraber | cdimage is a bit slow so I'll need 15min to get a natty desktop image, then I just need to install it, install nbd-server and export some empty file ;) | 16:53 |
apw | my build server may be large, but its not going to be done before you :) | 16:56 |
apw | stgraber, 32 or 64 bit ? | 16:58 |
=== tgardner-afk is now known as tgardner | ||
stgraber | apw: that one is 32bit | 17:00 |
stgraber | though if that's useful for debugging, I had the same issue with 64bit on Monday | 17:00 |
apw | stgraber, ok ... building with some lock debug | 17:00 |
apw | stgraber, don't care, just don't want to make both for speed reasons | 17:00 |
apw | building 32 bit now | 17:00 |
apw | stgraber, kernel will be here in about 5 mins ... http://people.canonical.com/~apw/lp711951-natty/ | 17:20 |
apw | stgraber, ok kernel is there | 17:27 |
apw | stgraber, poke me when you have tested it | 17:33 |
apw | kees, could we get a mergy merge please, bored of these results | 17:38 |
apw | kees, how hard would it be to have a shadow set which are built from our tip, just for our packages, obviously emitted somewhere else | 17:39 |
kees | apw: the entire export mechanism is already in the cve tracker. just need to set env vars and type "make" in the toplevel dir. | 17:40 |
JFo | <- grabbing lunch | 17:41 |
apw | kees, got a recipe you use on people ? then i can just dup it into mine :) | 17:41 |
kees | apw: but I can try to set up an automatic thing | 17:41 |
apw | kees, happy to run it myself if its that easy | 17:41 |
apw | i assume all the pre-reqs are on people if you can run it | 17:42 |
kees | apw: yeah, see ~ubuntu-security/bin/html-report.sh | 17:42 |
kees | apw: /home/ubuntu-security/bin/cron-cve-tracker.sh does the bzr pull before calling html-report.sh | 17:42 |
kees | apw: yup | 17:42 |
kees | you'll need ~ubuntu-security/.ubuntu-cve-tracker.conf for the repo settings | 17:43 |
apw | kees, cool will have a look at it tomorrow | 17:44 |
kees | okay | 17:47 |
stgraber | apw: just got back from lunch, looking at it now | 17:59 |
apw | stgraber, ahh ok | 18:00 |
apw | kees, ok that was so easy its already done :) | 18:06 |
apw | smb, sconklin, bjf, this url points to the kernel package etc as per the cve tracker, but showing the status as from the tip of our branch: http://people.canonical.com/~apw/cve/pkg/linux.html | 18:07 |
apw | auto updating much like the original | 18:08 |
apw | now we are not beholden to security team merges to get team status | 18:08 |
sconklin | "a man with two watches never knows what time it is. A man with one is always certain" | 18:09 |
sconklin | But nice job. At least we know which one to believe | 18:09 |
apw | they just mean different things. one is how we think we are doing, the other is how security think we are doing | 18:09 |
kees | apw: it's so easy to use. :) | 18:09 |
sconklin | yeah | 18:09 |
apw | kees, to get data out of yes :-p | 18:10 |
kees | heh | 18:10 |
apw | but we are instant gratification junkies here | 18:10 |
kees | apw: you may want to start using scripts/check-syntax too, otherwise the exports might blow up a bit if things aren't sane | 18:10 |
apw | kees, tried running it, but it needed unspecified pre-reqs to run and exploded in a heap | 18:11 |
kees | hm | 18:11 |
=== sforshee is now known as sforshee-lunch | ||
apw | kees, well we'll see it first anyhow, if our own tables explode | 18:12 |
kees | apw: now that you have the ~/.ubuntu-cve-tracker.conf, check-syntax should work | 18:12 |
stgraber | apw: same issue with the new kernel | 18:28 |
apw | stgraber, yeah not expecting anything fixed by it ... but some text in the dmesg | 18:28 |
apw | with APW in it | 18:28 |
stgraber | http://paste.ubuntu.com/561540/ | 18:30 |
stgraber | http://paste.ubuntu.com/561541/ | 18:30 |
stgraber | http://paste.ubuntu.com/561542/ | 18:30 |
stgraber | first paste is before starting nbd-client, second is after the first nbd-client, third is after the second nbd-client | 18:31 |
* tgardner --> lunch | 18:32 | |
stgraber | apw: ^ (not sure how closely you monitor the channel ;)) | 18:32 |
apw | stgraber, always best to add my nick, i am easily distracted by shiny objects | 18:33 |
apw | stgraber, have you got the entire thing as one dmesg? | 18:34 |
stgraber | I can cat all of them together ;) | 18:34 |
apw | i think it's telling me the story, just not sure | 18:34 |
stgraber | http://paste.ubuntu.com/561545/ | 18:34 |
apw | if you are 100% sure its the complete dmesg thats great | 18:35 |
stgraber | I basically did a "dmesg -c" between each so I shouldn't have lost any entry | 18:35 |
apw | [ 26.659231] APW: taken | 18:35 |
apw | [ 27.660130] Dev nbd0: unable to read RDB block 8 | 18:35 |
apw | [ 27.660144] nbd0: unable to read partition table | 18:35 |
apw | [ 27.660148] nbd0: partition table beyond EOD, truncated | 18:35 |
apw | [ 36.754723] APW: taking &nbd_mutex | 18:35 |
apw | ok ... so that says an error path is not dropping the lock | 18:35 |
apw | not the last one but the one before!?! | 18:35 |
apw | stgraber, damn not enough info to diagnose ... will have to spin you another kernel, you about for a bit ? | 18:40 |
stgraber | apw: yep | 18:42 |
apw | stgraber, ooo might have an idea... what are the userspace tools called ? | 18:42 |
stgraber | nbd-client and nbd-server | 18:43 |
apw | stgraber, ooo I think I have it | 18:44 |
apw | stgraber, ok i have a theory, it might be bunnies | 18:47 |
apw | stgraber, building a test kernel with more debug to confirm | 18:47 |
stgraber | ok :) | 18:47 |
apw | stgraber, ok updated kernel in the same place | 19:11 |
stgraber | apw: ok, downloading it now | 19:11 |
=== sforshee-lunch is now known as sforshee | ||
stgraber | apw: http://paste.ubuntu.com/561560/ | 19:16 |
stgraber | apw: http://paste.ubuntu.com/561559/ | 19:16 |
stgraber | managed to connect to it quite a few times, no hang | 19:17 |
apw | stgraber, i assume that is good yes? | 19:17 |
apw | stgraber, shame we didn't know about this a little earlier, we could have had this fixed for a2 | 19:17 |
stgraber | apw: yep, that's perfect. I'm doing a bit more testing on it, mounting the same volume 6 times and trying to play with each mount, then unmount | 19:18 |
stgraber | to make sure it doesn't freeze at some point, but it looks great for now | 19:18 |
apw | stgraber, i think the fix is clear so i'll push it upstream | 19:18 |
stgraber | root@isotest-ltsp:/mnt# nbd-client -d /dev/nbd1 | 19:19 |
stgraber | Error: Cannot open NBD: Permission denied | 19:19 |
stgraber | Please ensure the 'nbd' module is loaded. | 19:19 |
apw | stgraber, what triggered that | 19:19 |
stgraber | just trying to unmount | 19:19 |
stgraber | s/unmount/disconnect/ | 19:19 |
stgraber | seems like I can connect fine but can't disconnect | 19:19 |
apw | hrm, i suspect that thats been in there for a long time | 19:19 |
apw | as nothing has changed other than this locking | 19:20 |
stgraber | open("/dev/nbd1", O_RDWR|O_LARGEFILE) = -1 EACCES (Permission denied) | 19:20 |
apw | but i suspect its a different bug | 19:20 |
stgraber | that'll still make ltsp fail as I need to connect, check the block device and disconnect every time. If disconnect fails, the check will either be pointless or will crash somehow | 19:20 |
apw | well as i say there are exactly 0 changes other than this specific lock change | 19:21 |
apw | which only affect ioctl | 19:21 |
apw | so i am suspicious its not new | 19:21 |
stgraber | wasn't there in maverick ;) though might be caused by the new client | 19:21 |
apw | the EACCES certainly implies the kernel hated you | 19:22 |
apw | stgraber, can you look see what nbd devices are listed in /sys | 19:23 |
apw | stgraber, anything in dmesg in concert with the open failure ? | 19:23 |
stgraber | nope, nothing shows up in dmesg | 19:24 |
stgraber | /sys shows all 16 devices | 19:24 |
apw | stgraber, and are there any nbd-client thingies running? | 19:24 |
apw | and any nbd_threads ? | 19:25 |
stgraber | yep, my 7 nbd-client are still there | 19:25 |
apw | so you cannot stop any of them ? | 19:25 |
stgraber | indeed | 19:25 |
stgraber | and they are still working | 19:25 |
stgraber | so if I do: nbd-client -d /dev/nbd0 | 19:25 |
stgraber | I get the error message | 19:25 |
stgraber | then if I try to mount it, it works just fine | 19:26 |
apw | and its not mounted | 19:26 |
apw | what does lsof /dev/nbd1 say | 19:26 |
stgraber | yep, I unmounted all of them before testing the disconnect | 19:26 |
stgraber | COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME | 19:26 |
stgraber | nbd-clien 1395 root 3u BLK 43,16 0t0 5489 /dev/nbd1 | 19:26 |
stgraber | same issue using maverick's nbd-client | 19:29 |
stgraber | quickly trying to connect on the same nbd server from a maverick VM to make sure it worked fine then | 19:30 |
stgraber | root@desktop-maverick01:~# nbd-client -d /dev/nbd1 | 19:31 |
stgraber | Disconnecting: que, disconnect, sock, done | 19:31 |
stgraber | apw: that's with the same userspace on both natty and maverick, so something must have changed somewhere in the kernel | 19:31 |
apw | stgraber, yep ... shame we don't test more often | 19:32 |
stgraber | Should be possible to make some automatic test of it, that's basically what I did to get a test environment: http://paste.ubuntu.com/561547/ | 19:33 |
stgraber | then nbd-client -d /dev/nbdX to test the disconnect | 19:34 |
kees | anyone have clues on how to further debug kernel-only edid failures? bug 712075 | 19:38 |
ubot2 | Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [Undecided,New] https://launchpad.net/bugs/712075 | 19:38 |
* bjf -> lunch | 19:39 | |
=== bjf is now known as bjf[afk] | ||
ohsix | kees: ask ickle :] | 19:40 |
kees | ohsix: okay, cool | 19:41 |
kees | ohsix: where can I find them? | 19:41 |
ohsix | a lot are just plain wrong though; but theres a sysfs file to write a working cached copy iirc | 19:41 |
ohsix | #intel-gfx | 19:41 |
kees | ohsix: the issue with this is that it spontaneously fails. it works most of the time and then in the middle of a running session, bombs out | 19:41 |
ohsix | ah | 19:42 |
apw | stgraber, are you able to compile the user space tools ? | 19:46 |
apw | if so could you try changing the open in disconnect() to be an O_RDONLY open and see if it works then ? | 19:47 |
stgraber | grabbing the source now | 19:48 |
apw | not sure if it makes sense that it cannot open it RDWR but want to know if that helps | 19:49 |
stgraber | root@isotest-ltsp:~/nbd-2.9.16# nbd-client -d /dev/nbd1 | 19:51 |
stgraber | Disconnecting: que, disconnect, sock, done | 19:51 |
stgraber | worked fine | 19:52 |
apw | stgraber, hrm | 19:59 |
apw | stgraber, are there any normal commands, like status ones, from that client | 19:59 |
apw | and if so do they work | 20:00 |
stgraber | apw: hmm, now I can disconnect but can't reconnect after that ... | 20:04 |
stgraber | I have a confcall just now, will continue debugging after that | 20:04 |
apw | try making the other open thats RDWR in the thing RDONLY | 20:04 |
apw | pretty sure you shouldn't need to, but suspect it will work | 20:04 |
apw | stgraber, do you have a complete strace ? | 20:07 |
apw | stgraber, of the successful attach the first time | 20:08 |
apw | if (ioctl(nbd, BLKROSET, (unsigned long) &read_only) < 0) | 20:09 |
apw | err("Unable to set read-only attribute for device"); | 20:09 |
apw | stgraber, i think it is doing it to itself ... setting the device read only so even it cannot open it | 20:09 |
apw | possibly that is hanging over on last close and it should not, but i think it is right that the disconnect cannot open | 20:10 |
stgraber | apw: making all of them read-only works fine :) | 20:10 |
apw | stgraber, and you can mount the disk still ? | 20:11 |
bryceh | JFo, mind putting kees' bug 712075 on the kernel team's review list? Gots a patch from upstream. | 20:11 |
ubot2 | Launchpad bug 712075 in linux "[drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid" [High,Triaged] https://launchpad.net/bugs/712075 | 20:11 |
* JFo opens the bug | 20:12 | |
apw | bryceh, JFo done | 20:12 |
JFo | will do bryceh :) | 20:12 |
JFo | ooh, he's quick on the draw that one :) | 20:12 |
JFo | thanks apw | 20:13 |
apw | stgraber, i am not sure if i know if the userspace was broken or the kernel is now broken | 20:13 |
apw | stgraber, will think on that for a bit | 20:15 |
apw | stgraber, can u stress test the open close mount unmounts against it with the new tools | 20:17 |
apw | and let me know if anything else goes wrong | 20:17 |
apw | certainly test you can read/write files on the filesystems inside | 20:17 |
bryceh | apw, JFo, thanks | 20:19 |
soren | apw: I think 711951 is a dupe. | 20:21 |
soren | apw: I've just sent a patch to lkml to fix a locking issue in nbd. | 20:22 |
soren | I was looking for the bug reference on launchpad when I found this new one. | 20:22 |
stgraber | apw: http://paste.ubuntu.com/561592/ | 20:24 |
soren | stgraber: ^ | 20:24 |
soren | https://lkml.org/lkml/2011/2/2/237 | 20:25 |
stgraber | soren: so both of you fixed it apparently ;) | 20:25 |
soren | I've just sent the patch to the kernel-team ml. | 20:26 |
* jjohansen -> lunch | 20:27 | |
tgardner | soren, has Paul queued it for upstream submission ? | 20:29 |
soren | tgardner: He told me to send it directly to akpm. | 20:30 |
soren | tgardner: I suppose that's a "yes". | 20:30 |
tgardner | soren: ok, I'll pull it in until it conflicts with the next rc candidate. | 20:30 |
soren | tgardner: ta very much. | 20:37 |
tgardner | apw, did you already have something on deck for this nbd lockup ? | 20:37 |
soren | I guess I should have included this link in my e-mail: https://lkml.org/lkml/2011/1/26/131 | 20:37 |
tgardner | soren, I'll include it in the commit log | 20:37 |
=== bjf[afk] is now known as bjf | ||
apw | tgardner, yes | 20:49 |
apw | about to send it out | 20:49 |
tgardner | apw, check master-next first, perhaps we're colliding | 20:49 |
apw | tgardner, yeah i see that hallyn has a different approach removing it | 20:50 |
apw | hrm | 20:50 |
apw | tgardner, ahh ok so you pushed it, then i'll wait and do nothing | 20:50 |
tgardner | apw, well, they pretty much just ripped it out. | 20:50 |
tgardner | the mutex, I mean | 20:50 |
apw | tgardner, yeah, their argument is its not needed. hard to say | 20:50 |
tgardner | apw, guess we'll find out when it goes to akpm | 20:51 |
apw | my patch does the opposite, and drops it during the long lived ioctl | 20:51 |
apw | yep | 20:51 |
apw | tgardner, so much for that effort :/ | 20:52 |
apw | such is life | 20:52 |
tgardner | some days you bite the bear, some days the bear bites back | 20:54 |
apw | yeah ... food then i reckon | 20:54 |
bryceh | apw, I'm about to forward bug #711275 upstream - seems a bit false positive-ish - but want to check first if this is something you already know about? | 20:58 |
ubot2 | Launchpad bug 711275 in xserver-xorg-video-intel "[gm45] GPU lockup (EIR: 0x00000010) during boot - EIR stuck: 0x00000010, masking" [Undecided,New] https://launchpad.net/bugs/711275 | 20:58 |
bryceh | apw, it appears to me that the GPU is hanging during boot but successfully resetting itself, but this still generates an apport crash report | 20:58 |
apw | bryceh, hrm, doesn't that sound a lot like the other one, the one where they recently said 'blacklisting vesafb fixes things' | 21:01 |
apw | though this continues rather than breaking | 21:01 |
bryceh | yeah | 21:01 |
apw | bug #702090 | 21:01 |
apw | and didn't you send that up already? | 21:01 |
* apw pokes ubot2 | 21:01 | |
bryceh | apw, yeah you're right | 21:01 |
apw | sounds like they are being luckier, perhaps ask them to try the blacklist | 21:01 |
bryceh | apw, actually I hadn't sent that one upstream yet, but it does look similarish | 21:01 |
ubot2 | Launchpad bug 702090 in xserver-xorg-video-intel "i965gm GPU lockup if vesafb is left loaded (EIR: 0x00000010 PGTBL_ER: 0x00000100)" [High,Triaged] https://launchpad.net/bugs/702090 | 21:01 |
apw | bryceh, so i reckon send one or other up, and test the vesafb thing on the new one | 21:02 |
bryceh | or maybe it's bug #686388 which I did revisit | 21:02 |
ubot2 | Launchpad bug 686388 in xserver-xorg-video-intel "[i965gm] GPU lockup - Invalid GTT entry during Display B Fetch" [Unknown,Confirmed] https://launchpad.net/bugs/686388 | 21:02 |
bryceh | anyway can't hurt to send another report up, thanks. | 21:02 |
apw | what happens when we flip over to the drmfb doesn't bear thinking about | 21:02 |
bryceh | o_O | 21:02 |
apw | layering violations just isn't in it | 21:02 |
apw | we don't give anyone any time to handle the loss of the old driver | 21:04 |
apw | even the people with it open get a nasty shock | 21:04 |
apw | (plymouth) | 21:04 |
bryceh | erf | 21:08 |
bryceh | apw, btw I'm also seeing a spate of 'GPU hanging too fast, declaring wedged!' bugs - #710321 711645 711691 | 21:09 |
bryceh | bug #710321 | 21:10 |
ubot2 | Launchpad bug 710321 in xserver-xorg-video-intel "[i965gm] GPU lockup during login - GPU hanging too fast, declaring wedged!" [Undecided,New] https://launchpad.net/bugs/710321 | 21:10 |
apw | bryceh, does that mean "i reset it a few times and it broke again and i am giving up" | 21:16 |
bryceh | apw, seems like a good guess | 21:16 |
bryceh | apw, but this is the first time I've seen that particular syntax in a message, so dunno if it's generic verbiage or something specific | 21:17 |
apw | yeah odd indeed | 21:17 |
apw | 0x00000a78: 0x0a000002: MI_DISPLAY_BUFFER_INFO | 21:20 |
apw | Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3] | 21:20 |
bryceh | huh | 21:20 |
bryceh | that looks promising | 21:20 |
apw | bryceh, that looks bad, wouldn't that be a mesa injection thing | 21:21 |
bryceh | could be | 21:21 |
apw | 0x000010b8: 0x0a000002: MI_DISPLAY_BUFFER_INFO | 21:23 |
apw | Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3] | 21:23 |
apw | more than one in here too | 21:23 |
apw | so if each was breaking it ... that might make sense | 21:23 |
bryceh | where'd you spot that Bad length error? | 21:24 |
bryceh | oh I see it | 21:24 |
apw | in the gpu dump yeah | 21:24 |
apw | of course it could be the dump tool thats wrong :/ | 21:25 |
apw | 0x00014a88: 0x0a000002: MI_DISPLAY_BUFFER_INFO | 21:26 |
apw | Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3] | 21:26 |
apw | there are a number in here | 21:26 |
bryceh | yeah the other bugs have it too | 21:26 |
apw | i suspect we could do with an arsenal script to look through these for errors and put them in as a comment on the bug | 21:27 |
bryceh | apw, yeah | 21:28 |
bryceh | actually what I'd like to do is modify the gpu hook itself to extract the warnings for us directly | 21:28 |
apw | bryceh, a better plan indeed if you can be bothered :) | 21:29 |
bryceh | and that's on my todo list, but down under a few other bigger priorities currently | 21:29 |
apw | there seem to be similar quantities of these errors in the ring as there are hangs in the dmesg | 21:30 |
apw | so i suspect they are worth investigating | 21:30 |
apw | bryceh, is there somewhere i could find a document on the format of the ring to see what it says for the sizes etc | 21:30 |
bryceh | apw, btw check out - http://www.bryceharrington.org/X/Reports/ubuntu-x-swat/totals-natty-workqueue.svg | 21:30 |
apw | intel is a problem :) | 21:32 |
bryceh | apw, yeah probably at http://intellinuxgraphics.org/documentation.html | 21:32 |
bryceh | apw, and thus my interest in these gpu bugs today ;-) | 21:32 |
apw | heh indeed | 21:32 |
bryceh | and http://www.x.org/docs/intel/ looks like it has mirrors of the docs | 21:33 |
* apw screams about unity being balls | 21:45 | |
* apw cannot cope with the stacking issues | 21:45 | |
apw | how hard is it to stack windows in the right order | 21:45 |
jjohansen | apw: obviously harder than crashing | 21:47 |
bjf | apw, jjohansen, look at it this way, it's harder for people to notice kernel bugs when their desktop keeps crashing (or they don't have one) | 22:55 |
bjf | apw, jjohansen, there's a pony in there somewhere :-) | 22:56 |
apw | nice, a bright side | 22:56 |
jjohansen | whee yet another pathological oom killer death | 23:11 |
=== sconklin is now known as sconklin-gone |
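
The nbd lock diagnosis at 18:35 comes from reading the debug kernel's trace lines: an "APW: taking &nbd_mutex" line with no "APW: taken" after it suggests a task stuck waiting on the mutex. Below is a minimal Python sketch of that check. It assumes only the two message formats quoted in the log; the function name find_blocked_acquires is hypothetical, and the actual debug patch may print other messages that this does not handle.

```python
import re

# Message formats assumed from the lines quoted in the log:
#   [   36.754723] APW: taking &nbd_mutex
#   [   26.659231] APW: taken
TAKING = re.compile(r"\[\s*(\d+\.\d+)\]\s+APW: taking (\S+)")
TAKEN = re.compile(r"\[\s*(\d+\.\d+)\]\s+APW: taken\b")


def find_blocked_acquires(dmesg_text):
    """Return (timestamp, lock) pairs for 'taking' lines never followed by 'taken'."""
    pending = []
    for line in dmesg_text.splitlines():
        match = TAKING.search(line)
        if match:
            pending.append((float(match.group(1)), match.group(2)))
            continue
        if TAKEN.search(line) and pending:
            # Simplifying assumption: acquisitions complete in the order attempted.
            pending.pop(0)
    return pending


sample = """\
[   26.659231] APW: taken
[   27.660130] Dev nbd0: unable to read RDB block 8
[   27.660144] nbd0: unable to read partition table
[   27.660148] nbd0: partition table beyond EOD, truncated
[   36.754723] APW: taking &nbd_mutex
"""

for stamp, lock in find_blocked_acquires(sample):
    print(f"{stamp:.6f}: still waiting for {lock}")
```

Run against a complete dmesg (for example the concatenated pastes), any line it reports is an acquire attempt that never completed, which is the pattern that pointed at an error path holding the lock.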
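
At 21:27 apw suggests a script that looks through the GPU dumps for errors and adds them to the bug as a comment. A rough Python sketch of the scanning half follows, assuming the dump contains "Bad length" lines in exactly the form quoted in the log. The function names are hypothetical, and posting the result to Launchpad is deliberately left out since the log does not say how that would be done.

```python
import re
from collections import Counter

# Warning format assumed from the lines quoted in the log:
#   Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
BAD_LENGTH = re.compile(r"Bad length \((\d+)\) in (\w+), \[(\d+), (\d+)\]")


def summarize_bad_lengths(dump_text):
    """Count 'Bad length' decoder warnings per instruction name."""
    counts = Counter()
    for match in BAD_LENGTH.finditer(dump_text):
        counts[match.group(2)] += 1
    return counts


def format_comment(counts):
    """Build plain text that could be pasted into a bug comment."""
    if not counts:
        return "No 'Bad length' warnings found in the GPU dump."
    lines = ["'Bad length' warnings found in the GPU dump:"]
    for name, number in sorted(counts.items()):
        lines.append(f"  {name}: {number}")
    return "\n".join(lines)


sample_dump = """\
0x00000a78: 0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
0x000010b8: 0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
0x00014a88: 0x0a000002: MI_DISPLAY_BUFFER_INFO
Bad length (4) in MI_DISPLAY_BUFFER_INFO, [3, 3]
"""

print(format_comment(summarize_bad_lengths(sample_dump)))
```

As apw notes at 21:25, the dump tool itself could be wrong, so a count like this is a lead to compare against the number of hangs in dmesg rather than proof of a bad command stream.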