[00:47] hi, i'm having trouble starting X with the radeon driver on oneiric with Sarvatt's x updates: https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/972024
[00:47] Launchpad bug 972024 in xorg (Ubuntu) "RADEONDRIGetVersion failed because of version mismatch (1.17.0 -> 2.10.0)" [Undecided,New]
[00:48] I would appreciate any help you can provide, or hints on how to improve my report
[02:09] hi
[02:09] I have had to turn off 3d as the system freezes regularly
[02:09] I get
[02:09] [ 1934.458109] [drm] nouveau 0000:01:00.0: fail pre-validate sync
[02:09] [ 1934.458114] [drm] nouveau 0000:01:00.0: validate vram_list
[02:09] [ 1934.458238] [drm] nouveau 0000:01:00.0: validate: -16
[02:09] is this a known problem?
[02:11] also seems to be this bug
[02:11] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=663763
[02:11] Debian bug 663763 in xserver-xorg-video-nouveau "xserver-xorg-video-nouveau: X server freeze" [Important,Open]
[02:13] lesshaste: On what chip?
[02:13] 01:00.0 VGA compatible controller: nVidia Corporation NV44 [Quadro NVS 285] (rev a1)
[02:14] RAOF, does that answer the question? I am not sure how to find out the exact chip
[02:15] That's enough, yeah. The other way would be “dmesg | grep nouveau | grep Detected”
[02:15] So, when you asked in #nouveau mumpf suggested that might be fixed by a new libdrm.
[02:16] RAOF, well yes but one in the future
[02:16] it's a pretty fatal bug
[02:16] and 11.10 doesn't have the latest kernel/X
[02:16] so I was hoping there might be interim fixes
[02:19] You can try a newer kernel, but that doesn't look particularly promising.
[02:19] RAOF, ok thanks.. so just a newer kernel, not a new X ?
[02:19] (You can try, for example, the 3.3.1 kernel here http://kernel.ubuntu.com/~kernel-ppa/mainline/ )
[02:19] I am not sure where all the parts of nouveau live
[02:19] Newer X is unlikely to be interesting.
[02:19] so the driver is entirely in the kernel?
[02:20] Much of it is.
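A minimal sketch of pulling the chip family out of the lspci line pasted at 02:13. The function name and the regular expression are illustrative assumptions; the pattern only covers the older "NVxx" naming and would need extending for other family names.

```python
import re

# Assumption: family codes look like "NV" plus two hex digits (NV44, NV4B).
# "NVS" in the marketing name does not match because "S" is not a hex digit.
CHIP_RE = re.compile(r"\b(NV[0-9A-F]{2})\b")


def nouveau_chip_family(lspci_line):
    """Return the chip family code (for example 'NV44'), or None if absent."""
    match = CHIP_RE.search(lspci_line)
    return match.group(1) if match else None


line = ("01:00.0 VGA compatible controller: nVidia Corporation NV44 "
        "[Quadro NVS 285] (rev a1)")
print(nouveau_chip_family(line))  # NV44
```

The family code is usually the detail worth quoting in a bug report, since the board name alone (Quadro NVS 285) does not say which generation of hardware is involved.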
[02:20] ok
[02:20] I vaguely remember this debate :)
[02:20] 3D lives in mesa, which would be another candidate for update.
[02:20] oh well it's definitely a 3d problem
[02:20] it all works fine in 2d
[02:21] epiphany flickers for me
[02:21] im using nvidia
[02:21] lesshaste: But a lockup is always a kernel bug :)
[02:21] RAOF, :P
[02:22] also, xorg-edgers
[02:24] scientes: You're using the binary nvidia driver, and edgers?
[02:24] no i am not
[02:24] im using stock precise all over
[02:25] and epiphany is flickering with h264 video
[02:25] i was just participating in the conf on mainline kernel
[02:26] RAOF, you in?
[02:27] bryceh, he is
[02:27] Indeed.
[02:28] RAOF, been poking at those memory corruption bugs this afternoon, think I got a feel for what's going on
[02:28] RAOF, https://bugs.launchpad.net/ubuntu/+source/xorg-server?field.status:list=NEW&field.status:list=INCOMPLETE_WITH_RESPONSE&field.status:list=CONFIRMED&field.status:list=TRIAGED&field.status:list=INPROGRESS&field.tag=precise&field.tags_combinator=ALL
[02:28] that should get the main ones
[02:28] now, the bugs all have different back traces and seem to occur under a random variety of situations
[02:28] however they have a few things in common
[02:29] a ton are going through __libc_message
[02:29] so, like it could be they're trying to print out an error message, or generate a stacktrace in Xorg.0.log, or even just print out the usual input device spew
[02:30] they hit malloc_printerr(), then __libc_message, and poomph
[02:31] so this is characteristic of Ye Olde buffer overflow or other out-of-place memory write
[02:31] and I'm betting ALL of these bugs are going to be traceable to the same flaw
[02:31] https://launchpadlibrarian.net/93074590/Stacktrace.txt looks like the initial error is in malloc, and the reason why it's going through __libc_message is that libc is abort()ing. That would be entirely consistent with memory corruption due to out-of-place writes.
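The pattern described above (malloc_printerr, then __libc_message, then abort) can be used to group otherwise different-looking backtraces. The following is a rough triage sketch, not a tool anyone in the channel used: the helper names and the sample trace are hypothetical, and traces are assumed to be lists of function names with the innermost frame first.

```python
# Frames glibc adds once its heap consistency check has failed.
ABORT_PATH = ("malloc_printerr", "__libc_message", "abort")

# Allocator and abort internals to skip when looking for the caller.
LIBC_INTERNAL = {"raise", "abort", "__libc_message", "malloc_printerr",
                 "_int_malloc", "_int_free", "malloc", "free"}


def is_heap_check_abort(frames):
    """True if the trace went through glibc's heap-corruption abort path."""
    return all(name in frames for name in ABORT_PATH)


def first_caller(frames):
    """Return the innermost frame that is not a libc internal, or None."""
    for name in frames:
        if name not in LIBC_INTERNAL:
            return name
    return None


# Hypothetical trace, innermost frame first.
trace = ["raise", "abort", "__libc_message", "malloc_printerr",
         "_int_free", "free", "hypothetical_log_writer", "main"]
print(is_heap_check_abort(trace), first_caller(trace))
```

Note that the caller found this way is only where the corruption was detected. The bad write itself may have happened much earlier and somewhere else, which fits the observation that the reports show a random variety of backtraces.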
[02:32] afaict this is not afflicting upstream, so that would suggest it's either the fault of frankenserver, or one of our patches added in precise
[02:32] (or maybe something underneath X, but slangasek didn't think that was likely)
[02:33] it is *possible* we could narrow things down by finding the earliest one of these reports
[02:33] if we were able to reliably reproduce it, then we could try iterating through the patches or some such
[02:34] however while some people are able to get it reliably, there aren't steps identified to reliably reproduce it independently
[02:34] it's going to be the first frankenserver that did it so not much narrowing down, guaranteed :(
[02:34] Other option is in my barrier patch (again!)
[02:34] Sarvatt, yeah I'm afraid of that too. Offhand I think I recall seeing these reports right after we put it in
[02:35] anyway, I'm EOD so wanted to hand that off to you. also hoping maybe you're more clever than me and can figure it out from here
[02:36] now, one bit of good news, it seems that while we have a lot of reports, a large chunk of these occur on shutdown only, so won't be noticeable once we shut off apport
[02:37] but enough are occurring otherwise, that I think this might be our #1 bug in X for the release
[02:38] I'm wondering if it might be something as simple as some struct or memory chunk that changed size between 1.11 and 1.12, but some code is using the wrong size when writing it or freeing it
[02:40] RAOF, one thing you might try is running valgrind with and without your patch, to see if anything turns up in just a static analysis?
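One way to compare a valgrind run with the patch against one without it is to count the error headings in each log and look at what only shows up in the first. This is a rough sketch under stated assumptions: the function names are made up, the sample logs are invented, and an error heading is taken to be the `==PID==` line directly above a stack's first `at 0x...` frame.

```python
import re
from collections import Counter

# Strips the "==PID== " prefix valgrind puts on every line.
PREFIX_RE = re.compile(r"^==\d+== ?(.*)$")


def error_kinds(log_text):
    """Count error headings, i.e. the line just above each 'at 0x...' frame."""
    counts = Counter()
    previous = ""
    for raw in log_text.splitlines():
        match = PREFIX_RE.match(raw)
        if not match:
            continue
        body = match.group(1)
        heading = previous.strip()
        # Skip the "Address 0x... is N bytes inside a block" detail stacks.
        if (body.lstrip().startswith("at 0x") and heading
                and not heading.startswith("Address ")):
            counts[heading] += 1
        previous = body
    return counts


def only_in_first(first_log, second_log):
    """Error kinds seen more often in the first log than in the second."""
    return error_kinds(first_log) - error_kinds(second_log)


# Invented sample logs for illustration only.
with_patch = """==100== Invalid write of size 4
==100==    at 0x4005F0: fill_buffer (demo.c:10)
==100==    by 0x400610: main (demo.c:20)
==100== 
==100== Syscall param ioctl(generic) points to uninitialised byte(s)
==100==    at 0x4E0000: ioctl (syscall-template.S:82)
"""
without_patch = """==200== Syscall param ioctl(generic) points to uninitialised byte(s)
==200==    at 0x4E0000: ioctl (syscall-template.S:82)
"""
print(dict(only_in_first(with_patch, without_patch)))
```

Grouping by heading alone is coarse; two different bugs can both read "Invalid write of size 4", so anything the comparison flags still needs its stack read by hand.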
[02:41] from what i saw from every bug i've filed its mostly, sigsegv in X for the actual bug, signal handler called, trying to print a backtrace for that crash, some input closedown "stuff" is going on past that and trying to write to the log at the same time, memory corruption bug duped to doko's even if its unrelated because its looking at the __libc_message abort higher up
[02:42] aka 100_rethrow_signals.patch causing it most likely
[02:43] yes, some portion of the bugs are double crashes. Crash, then in the segfault handler it's crashing again when trying to print the crash out
[02:44] and yes the signal throw in that patch might be partly to blame; it's always been a bit of a klunker
[02:44] bryceh: Yeah.
[02:45] or I should say, we've had to debug it a few times in the past... we might want to consider disabling that patch for the release, just to mitigate any chance of it contributing to the problem
[02:45] that'd mean we'd need to have people gather crash dumps manually, but that's nothing new
[02:51] RAOF, sorry.. back
[02:51] RAOF, it isn't a lock up exactly.. I mean you can ssh in
[02:51] RAOF, and the mouse pointer moves :)
[02:56] lesshaste: Eh, so it's a GPU lockup then. Still, try a newer kernel.
[02:56] RAOF, k
[02:56] You could also try a Precise livecd; there's a newer mesa on it, but I don't think there's been much work on 3d for the nv4x family.
[02:57] RAOF, ok thanks. I still don't understand if mesa can cause a gpu lockup
[02:57] It can, by submitting an invalid command stream.
[02:58] The kernel would ideally disallow that, but there's all sorts of fun ways to kill the GPU.
[02:58] ah ok
[02:58] thanks
[04:47] meh, desktop hung during the night, ssh works
[04:52] * RAOF suddenly remembers he has a bunch of *tests* that can easily be run under valgrind.
[06:11] hmm, turns out that turning the kvm box on really helps in getting to the session.. :P
[07:23] Heh.
[07:24] Bah. That valgrind session has not been very helpful.
[07:24] Although at least I know that the barrier event overflow code works.
[08:17] hmm, should some part of the system save the brightness setting or not?
[08:17] already happens afaict, at least on kde
[08:17] oh
[08:18] gnome-settings-daemon then :)
[08:23] tjaalton, known feature request, no need to file it
[08:24] seb128: ok, it was filed already so I moved it there..
[08:24] tjaalton, well you can guess that such requests got filed several times ;-)
[08:24] like that's one of those coming back regularly
[08:28] seb128: ok found the master bug
[08:28] tjaalton, I'm not sure what was the current trend upstream, I think some of the upstream people think it shouldn't be an user thing
[08:28] but rather a system one
[08:29] yeah, we probably needs systemd for that ;)
[08:29] -s
[08:30] * seb128 slaps tjaalton
[08:30] tjaalton, you also need systemd to make you coffee right? ;-)
[08:30] seb128: good thing I don't drink that stuff :)
[09:43] ah, downgrading freetype to the oneiric version fixed monospace font size issues..
[09:53] seb128: ooh, your commit fixed that too :) bug 966654 should be duped to the one you just fixed
[09:53] Launchpad bug 966654 in freetype (Ubuntu) "Monospace fonts have too much space between glyphs" [Undecided,New] https://launchpad.net/bugs/966654
[09:53] tjaalton, oh, feel free to close it then ;-)
[09:54] seb128: sure thing, upgrading now and comparing the terminal size to the one with 2.4.4
[09:54] i mailed ubuntu-desktop@ early in the cycle about the regression, but didn't look deeper into what caused the bump in the font size
[09:57] tjaalton, upstream reply seems the old style is wrong
[09:58] tjaalton, upstream just replied to my email
[09:58] "For years, FreeType had this metrics
[09:58] bug, and unfortunately the users got used to the appearance of far too
[09:58] widely spaced lines. What FreeType now returns is what the font
[09:58] designer has had in mind while designing the font, and what can be
[09:58] found in the font."
[09:59] meh
[10:01] tjaalton, well I've no strong opinion on rendering but I want _ to display in gtk when underline markup is set ;-)
[10:02] seb128: yeah, my vote is to keep it this way for lts and fight the issues with upstream for q->?
[10:03] tjaalton, that's sort of what my upload and email to upstream just set up for ;-)
[10:03] i.e revert to avoid regression and let them time to deal with it
[10:03] yup
[10:04] and confirmed that I get the same font as on 11.10, whee
[10:24] yup, total horizontal size of the terminals 344 -> 416 characters :):)
[10:24] of course now i've used this for months, takes some time to get used to the old again
[11:38] bryceh: I've got a valgrind log of one of those X crashes in abort(). From which I can determine three things: (1) the Intel driver *still* does a huge bunch of stuff valgrind considers suspicious on initialisation, (2) in this case the crash is triggered by the [mi] EQ overflow message being written, and (3) there seems to be some valgrind-suspicious ioctls in synaptics initialisation, but I didn't have debugging symbols for that.
[11:50] Given that trigger, and that X under valgrind is hella slow, I should be able to trigger it again with more debugging symbols. Tomorrow, though ☺.
=== yofel_ is now known as yofel
[15:03] ricotz, Sarvatt: could one of you help me with this one by chance? https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/972024
[15:03] Launchpad bug 972024 in xserver-xorg-video-ati (Ubuntu) "RADEONDRIGetVersion failed because of version mismatch (1.17.0 -> 2.10.0)" [Undecided,New]
[15:12] mornau: I just looked into it and apparently ati just needs a rebuild, but the machine i do my uploads from got hosed from the grub update and i'm fixing that so it'll be about an hour until i can upload it
[15:13] ati built against libdrm 2.4.32 when it requires 2.4.33 for KMS support
[15:15] that's an edgers bug?
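The root cause given at 15:13 is a minimum-version requirement that was not met (libdrm 2.4.32 found, 2.4.33 needed). A minimal sketch of that comparison, assuming plain dotted numeric versions; the function names are illustrative, and a real build would normally leave this check to pkg-config.

```python
def parse_version(text):
    """Turn '2.4.33' into (2, 4, 33) so components compare numerically."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(found, required):
    """True if the found version is at least the required one."""
    return parse_version(found) >= parse_version(required)


print(meets_minimum("2.4.32", "2.4.33"))  # False
print(meets_minimum("2.4.33", "2.4.33"))  # True
```

Comparing the raw strings instead would go wrong as soon as a component reaches two digits: as text, "2.4.9" sorts after "2.4.33".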
=== lool- is now known as lool
[15:15] so it silently dropped kms support when it built - checking for LIBDRM_RADEON... no
[15:19] tjaalton: yeah edgers bug and argh this iso needs to download faster so i can fix it :)
[15:23] stupid macs using rEFIt that can't boot a liveusb unless they are created a certain way. you have to make the vfat file system on the usb stick directly, it wont boot off a partition
[15:35] Sarvatt, hmm, which grub update caused this trouble?
[15:35] 20ubuntu1
[15:35] https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/972250
[15:35] Launchpad bug 972250 in grub2 (Ubuntu) "regression from 4k_sectors.patch: "non-sector-aligned data is found in the core file"" [Critical,In progress]
[15:36] ah, just wanted to ask if this is gpt related
[15:37] yup it was, silly me left the laptop unplugged last night and got stuck with the screwed up grub install that was full of errors after the upgrade :)
[15:38] that is bad indeed :\
[15:48] mornau, the fix is on its way
[15:50] thanks guys, you're great
[15:54] btw. should i have added some tag to the bug report to ensure you get notified? (It's a pity that common tags aren't documented on launchpad)
[15:57] mornau, i think you could use something like "[xorg-edgers ppa] ..." as bug title
[15:58] alright, i'll try to think of it next time if it turns out it's likely PPA related.
[16:01] mornau, you probably want to use a newer kernel too to get the kernel-space drm-updates while using edgers
[16:02] I remember there's a PPA for newer kernel builds but i tend to forget its name, and have trouble finding it on a search engine then.
[16:03] the edgers ppa contains a rebuild of the precise kernel
[16:03] currently 3.2.0-20.32 (and 3.2.0-21.34 building)
[16:04] "apt-cache search linux-image" doesn't seem to list it on this oneiric amd64 system
[16:06] https://launchpad.net/~xorg-edgers/+archive/ppa/+sourcepub/2321458/+listing-archive-extra
[16:09] okay, this is not yet built for amd64
[16:10] hmm actually it is, sorry
[16:59] the availability of this updated kernel image might also be a good hint to add to the edgers page on launchpad
[17:10] RAOF, ah interesting findings. If you happen to still be around and can post one of the valgrind logs somewhere, I wouldn't mind perusing it a bit myself.
[17:14] RAOF: building libdrm with valgrind installed should fix most of that
[18:45] so just to report back: linux-image-3.2.0-20.32-generic with libdrm-radeon1 2.4.33+git20120403.43704256-0ubuntu0ricotz~oneiric fixes radeon X output via DRM/DRI.
[18:48] this kernel image looks a little buggy (gives me trouble with apparmor -> libvirtd keeps respawning, and dhcp fails somehow (haven't looked closer into it, yet)) but i guess that's why a newer image is just building now.
[19:39] phew, good thing cairo 1.12 didn't get shoved in at the last second https://lists.debian.org/debian-x/2012/04/msg00076.html
[19:40] exa's all kinds of busted with it
[19:54] heh
=== ajmitch_ is now known as ajmitch
[23:14] bryceh: http://paste2.org/p/1965772 is the valgrind log.
[23:14] Sarvatt: Oooh, if I build libdrm with valgrind installed it fixes a bunch of those ioctl warnings? How?
[23:15] Also, could we build-depend on valgrind for libdrm ☺
[23:18] RAOF, thanks
[23:26] This *is* easy to trigger; firing up a unity session under valgrind and thrashing around on the touchpad will overflow the EQ and trigger. I'll rebuild libdrm to clear *that* noise out and go again.
[23:26] RAOF: http://cgit.freedesktop.org/mesa/drm/log/?qt=grep&q=valgrind
[23:27] Yeah, found it by updating my libdrm git and grepping.
[23:56] RAOF, you might also try comparing a valgrind run with patch 100 commented out
[23:58] I'll give that a whirl. I *think* what will happen is that it'll die in exactly the same place; the rethrow-signals codepath is only being hit once ErrorF has thrown a SIGSEGV anyway.
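For readers unfamiliar with the general "rethrow" idea, here is a minimal sketch of the common pattern: the handler logs, restores the default action, then raises the same signal again so the process still dies by signal and a crash reporter can see it. This is not the contents of 100_rethrow_signals.patch. SIGUSR1 stands in for SIGSEGV, the child script and `run_child` are made up for illustration, and it assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Child program: installs a handler that logs, then rethrows the signal.
CHILD = r'''
import os, signal

def handler(signum, frame):
    # Log with a plain write to stderr; keep the handler as small as possible.
    os.write(2, b"caught signal, rethrowing\n")
    # Restore the default action, then send the same signal again so the
    # process terminates "by signal" rather than exiting normally.
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)
for _ in range(1000):
    pass  # give the interpreter a chance to run the handler
'''


def run_child():
    """Run the child and return (returncode, stderr bytes)."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    code, err = run_child()
    # On POSIX a negative return code means "killed by that signal".
    print(code == -signal.SIGUSR1, err)
```

Python runs handlers from the interpreter loop rather than in true signal context, so this sketch cannot reproduce the double crash discussed in the log, where a C handler trips over already-corrupted state while trying to print the crash.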