/srv/irclogs.ubuntu.com/2012/04/03/#ubuntu-x.txt

[00:47] <mornau> hi, i'm having trouble starting X with the radeon driver on oneiric with Sarvatt's x updates: https://bugs.launchpad.net/ubuntu/+source/xorg/+bug/972024
[00:47] <ubottu> Launchpad bug 972024 in xorg (Ubuntu) "RADEONDRIGetVersion failed because of version mismatch (1.17.0 -> 2.10.0)" [Undecided,New]
[00:48] <mornau> I would appreciate any help you can provide, or hints on how to improve my report
[02:09] <lesshaste> hi
[02:09] <lesshaste> I have had to turn off 3d as the system freezes regularly
[02:09] <lesshaste> I get
[02:09] <lesshaste> [ 1934.458109] [drm] nouveau 0000:01:00.0: fail pre-validate sync
[02:09] <lesshaste> [ 1934.458114] [drm] nouveau 0000:01:00.0: validate vram_list
[02:09] <lesshaste> [ 1934.458238] [drm] nouveau 0000:01:00.0: validate: -16
[02:09] <lesshaste> is this a known problem?
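The trailing `-16` in the validate line looks like a kernel-style negative errno. The log does not confirm this, so it is an assumption. Under that convention, Python's `errno` table can decode it. On Linux, 16 is `EBUSY`. The helper name `decode_kernel_ret` is hypothetical.

```python
import errno
import os


def decode_kernel_ret(ret: int) -> str:
    """Translate a kernel-style return value (0 or -errno) into a readable name.

    Assumes the driver printed a negated errno, which is the usual kernel convention.
    """
    if ret >= 0:
        return "success"
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


print(decode_kernel_ret(-16))  # on Linux, starts with "EBUSY"
```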
[02:11] <lesshaste> also seems to be this bug
[02:11] <lesshaste> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=663763
[02:11] <ubottu> Debian bug 663763 in xserver-xorg-video-nouveau "xserver-xorg-video-nouveau: X server freeze" [Important,Open]
[02:13] <RAOF> lesshaste: On what chip?
[02:13] <lesshaste> 01:00.0 VGA compatible controller: nVidia Corporation NV44 [Quadro NVS 285] (rev a1)
[02:14] <lesshaste> RAOF, does that answer the question?I am not sure how to find out the exact chip
[02:15] <RAOF> That's enough, yeah.  The other way would be “dmesg | grep nouveau | grep Detected”
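RAOF's one-liner keeps only the kernel messages that mention both `nouveau` and `Detected`. Below is a minimal Python equivalent that works on text already captured from `dmesg`. The sample line is made up for illustration, and the exact kernel wording differs between kernel versions.

```python
def nouveau_detected_lines(log_text: str) -> list[str]:
    """Keep lines containing both 'nouveau' and 'Detected'.

    Mirrors `dmesg | grep nouveau | grep Detected` (case-sensitive, like grep).
    """
    return [line for line in log_text.splitlines()
            if "nouveau" in line and "Detected" in line]


# Hypothetical dmesg excerpt, not copied from a real machine.
sample = """\
[    1.234567] [drm] nouveau 0000:01:00.0: Detected an NV40 generation card (0x000000a1)
[ 1934.458109] [drm] nouveau 0000:01:00.0: fail pre-validate sync
"""
for line in nouveau_detected_lines(sample):
    print(line)
```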
[02:15] <RAOF> So, when you asked in #nouveau mumpf suggested that might be fixed by a new libdrm.
[02:16] <lesshaste> RAOF, well yes but one in the future
[02:16] <lesshaste> it's a pretty fatal bug
[02:16] <lesshaste> and 11.10 doesn't have the latest kernel/X
[02:16] <lesshaste> so I was hoping there might be interim fixes
[02:19] <RAOF> You can try a newer kernel, but that doesn't look particularly promising.
[02:19] <lesshaste> RAOF, ok thanks.. so just a newer kernel, not a new X ?
[02:19] <RAOF> (You can try, for example, the 3.3.1 kernel here http://kernel.ubuntu.com/~kernel-ppa/mainline/ )
[02:19] <lesshaste> I am not sure where all the parts of nouveau live
[02:19] <RAOF> Newer X is unlikely to be interesting.
[02:19] <lesshaste> so the driver is entirely in the kernel?
[02:20] <RAOF> Much of it is.
[02:20] <lesshaste> ok
[02:20] <lesshaste> I vaguely remember this debate :)
[02:20] <RAOF> 3D lives in mesa, which would be another candidate for update.
[02:20] <lesshaste> oh well it's definitely  a 3d problem
[02:20] <lesshaste> it all works fine in 2d
[02:21] <scientes> epiphany flickers for me
[02:21] <scientes> im using nvidia
[02:21] <RAOF> lesshaste: But a lockup is always a kernel bug :)
[02:21] <scientes> RAOF, :P
[02:22] <scientes> also, xorg-edgers
[02:24] <RAOF> scientes: You're using the binary nvidia driver, and edgers?
[02:24] <scientes> no i am not
[02:24] <scientes> im using stock precise all over
[02:25] <scientes> and epiphany is flickering with h264 video
[02:25] <scientes> i was just participating in the conf on mainline kernel
[02:26] <bryceh> RAOF, you in?
[02:27] <scientes> bryceh, he is
[02:27] <RAOF> Indeed.
[02:28] <bryceh> RAOF, been poking at those memory corruption bugs this afternoon, think I got a feel for what's going on
[02:28] <bryceh> RAOF, https://bugs.launchpad.net/ubuntu/+source/xorg-server?field.status:list=NEW&field.status:list=INCOMPLETE_WITH_RESPONSE&field.status:list=CONFIRMED&field.status:list=TRIAGED&field.status:list=INPROGRESS&field.tag=precise&field.tags_combinator=ALL
[02:28] <bryceh> that should get the main ones
[02:28] <bryceh> now, the bugs all have different back traces and seem to occur under a random variety of situations
[02:28] <bryceh> however they have a few things in common
[02:29] <bryceh> a ton are going through __libc_message
[02:29] <bryceh> so, like it could be they're trying to print out an error message, or generate a stacktrace in Xorg.0.log, or even just print out the usual input device spew
[02:30] <bryceh> they hit malloc_printerr(), then __libc_message, and poomph
[02:31] <bryceh> so this is characteristic of Ye Olde buffer overflow or other out-of-place memory write
[02:31] <bryceh> and I'm betting ALL of these bugs are going to be traceable to the same flaw
[02:31] <RAOF> https://launchpadlibrarian.net/93074590/Stacktrace.txt looks like the initial error is in malloc, and the reason why it's going through __libc_message is that libc is abort()ing.  That would be entirely consistent with memory corruption due to out-of-place writes.
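bryceh's observation is that many otherwise-unrelated traces share one glibc abort path: `malloc_printerr`, then `__libc_message`. The sketch below is a rough way to bucket retraced reports by that signature. The input shape (a mapping of bug id to trace text), the function names, and the sample traces are all hypothetical. As RAOF notes, a match only means libc detected heap corruption and aborted. It says nothing about where the bad write happened.

```python
# Frames that appear when glibc detects heap corruption and aborts.
ABORT_FRAMES = ("malloc_printerr", "__libc_message")


def looks_like_heap_abort(trace: str) -> bool:
    """True if any glibc heap-corruption abort frame appears in the trace text."""
    return any(frame in trace for frame in ABORT_FRAMES)


def bucket_reports(reports: dict[str, str]) -> dict[str, list[str]]:
    """Split hypothetical {bug_id: trace_text} reports by the abort signature."""
    buckets: dict[str, list[str]] = {"heap-abort": [], "other": []}
    for bug_id, trace in reports.items():
        key = "heap-abort" if looks_like_heap_abort(trace) else "other"
        buckets[key].append(bug_id)
    return buckets


# Made-up traces for illustration only.
example = {
    "100001": "#0 raise\n#1 abort\n#2 __libc_message\n#3 malloc_printerr\n",
    "100002": "#0 some_driver_function\n#1 main\n",
}
print(bucket_reports(example))
```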
[02:32] <bryceh> afaict this is not afflicting upstream, so that would suggest it's either the fault of frankenserver, or one of our patches added in precise
[02:32] <bryceh> (or maybe somehting underneath X, but slangasek didn't think that was likely)
[02:33] <bryceh> it is *possible* we could narrow things down by finding the earliest one of these reports
[02:33] <bryceh> if we were able to reliably reproduce it, then we could try iterating through the patches or some such
[02:34] <bryceh> however while some people are able to get it reliably, there aren't steps identified to reliably reproduce it independently
[02:34] <Sarvatt> it's going to be the first frankenserver that did it so not much narrowing down, guaranteed :(
[02:34] <RAOF> Other option is in my barrier patch (again!)
[02:34] <bryceh> Sarvatt, yeah I'm afraid of that too.  Offhand I think I recall seeing these reports right after we put it in
[02:35] <bryceh> anyway, I'm EOD so wanted to hand that off to you.  also hoping maybe you're more clever than me and can figure it out from here
[02:36] <bryceh> now, one bit of good news, it seems that while we have a lot of reports, a large chunk of these occur on shutdown only, so won't be noticeable once we shut off apport
[02:37] <bryceh> but enough are occurring otherwise, that I think this might be our #1 bug in X for the release
[02:38] <bryceh> I'm wondering if it might be something as simple as a some struct or memory chunk that changed size between 1.11 and 1.12, but some code is using the wrong size when writing it or freeing it
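bryceh's hypothesis is a struct that grew between 1.11 and 1.12 while some code still used the other size. The failure mode can be shown with `ctypes`. The two record layouts below are hypothetical and are not real X server structs. In C, copying a `RecordV2`-sized payload into a `RecordV1` allocation would silently write past the end. This sketch adds an explicit size check so the mismatch surfaces as an error instead.

```python
import ctypes


class RecordV1(ctypes.Structure):
    # Hypothetical "old" layout.
    _fields_ = [("type", ctypes.c_int), ("time", ctypes.c_int)]


class RecordV2(ctypes.Structure):
    # Hypothetical "new" layout: one added field grows the struct.
    _fields_ = [("type", ctypes.c_int), ("time", ctypes.c_int), ("flags", ctypes.c_int)]


def copy_record(dst: ctypes.Structure, src_bytes: bytes) -> None:
    """Copy src_bytes into dst, refusing any write larger than dst."""
    if len(src_bytes) > ctypes.sizeof(dst):
        raise ValueError(
            f"write of {len(src_bytes)} bytes overflows {ctypes.sizeof(dst)}-byte buffer"
        )
    ctypes.memmove(ctypes.addressof(dst), src_bytes, len(src_bytes))


# Code built against the new layout writing into an old-sized allocation:
try:
    copy_record(RecordV1(), bytes(RecordV2()))
except ValueError as exc:
    print(exc)
```

Valgrind's Memcheck, which bryceh suggests next, catches the unchecked C version of this at runtime.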
[02:40] <bryceh> RAOF, one thing you might try is running valgrind with and without your patch, to see if anything turns up in just a static analysis?
[02:41] <Sarvatt> from what i saw from every bug i've filed its mostly, sigsegv in X for the actual bug, signal handler called, trying to print a backtrace for that crash, some input closedown "stuff" is going on past that and trying to write to the log at the same time, memory corruption bug duped to doko's even if its unrelated because its looking at the __libc_message abort higher up
[02:42] <Sarvatt> aka 100_rethrow_signals.patch causing it most likely
[02:43] <bryceh> yes, some portion of the bugs are double crashes.  Crash, then in the segfault handler it's crashing again when trying to print the crash out
[02:44] <bryceh> and yes the signal throw in that patch might be partly to blame; it's always been a bit of a  klunker
[02:44] <RAOF> bryceh: Yeah.
[02:45] <bryceh> or I should say, we've had to debug it a few times in the past...  we might want to consider disabling that patch for the release, just to mitigate any chance of it contributing to the problem
[02:45] <bryceh> that'd mean we'd need to have people gather crash dumps manually, but that's nothing new
[02:51] <lesshaste> RAOF, sorry.. back
[02:51] <lesshaste> RAOF, it isn't a lock up exactly.. I mean you can ssh in
[02:51] <lesshaste> RAOF, and the mouse pointer moves :)
[02:56] <RAOF> lesshaste: Eh, so it's a GPU lockup then.  Still, try a newer kernel.
[02:56] <lesshaste> RAOF, k
[02:56] <RAOF> You could also try a Precise livecd; there's a newer mesa on it, but I don't think there's been much work on 3d for the nv4x family.
[02:57] <lesshaste> RAOF, ok thanks. I still don't understand if mesa can cause a gpu lockup
[02:57] <RAOF> It can, by submitting an invalid command stream.
[02:58] <RAOF> The kernel would ideally disallow that, but there's all sorts of fun ways to kill the GPU.
[02:58] <lesshaste> ah ok
[02:58] <lesshaste> thanks
[04:47] <tjaalton> meh, desktop hung during the night, ssh works
[04:52] * RAOF suddenly remembers he has a bunch of *tests* that can easily be run under valgrind.
[06:11] <tjaalton> hmm, turns out that turning the kvm box on really helps in getting to the session.. :P
[07:23] <RAOF> Heh.
[07:24] <RAOF> Bah.  That valgrind session has not been very helpful.
[07:24] <RAOF> Although at least I know that the barrier event overflow code works.
[08:17] <tjaalton> hmm, should some part of the system save the brightness setting or not?
[08:17] <mlankhorst> already happens afaict, at least on kde
[08:17] <tjaalton> oh
[08:18] <tjaalton> gnome-settings-daemon then :)
[08:23] <seb128> tjaalton, known feature request, no need to file it
[08:24] <tjaalton> seb128: ok, it was filed already so I moved it there..
[08:24] <seb128> tjaalton, well you can guess that such requests got filed several times ;-)
[08:24] <seb128> like that's one of those coming back regularly
[08:28] <tjaalton> seb128: ok found the master bug
[08:28] <seb128> tjaalton, I'm not sure what was the current trend upstream, I think some of the upstream people think it shouldn't be an user thing
[08:28] <seb128> but rather a system one
[08:29] <tjaalton> yeah, we probably needs systemd for that ;)
[08:29] <tjaalton> -s
[08:30] * seb128 slaps tjaalton
[08:30] <seb128> tjaalton, you also need systemd to make you coffee right? ;-)
[08:30] <tjaalton> seb128: good thing I don't drink that stuff :)
[09:43] <tjaalton> ah, downgrading freetype to the oneiric version fixed monospace font size issues..
[09:53] <tjaalton> seb128: ooh, your commit fixed that too :) bug 966654 should be duped to the one you just fixed
[09:53] <ubottu> Launchpad bug 966654 in freetype (Ubuntu) "Monospace fonts have too much space between glyphs" [Undecided,New] https://launchpad.net/bugs/966654
[09:53] <seb128> tjaalton, oh, feel free to close it then ;-)
[09:54] <tjaalton> seb128: sure thing, upgrading now and comparing the terminal size to the one with 2.4.4
[09:54] <tjaalton> i mailed ubuntu-desktop@ early in the cycle about the regression, but didn't look deeper into what caused the bump in the font size
[09:57] <seb128> tjaalton, upstream reply seems the old style is wrong
[09:58] <seb128> tjaalton, upstream just replied to my email
[09:58] <seb128> "For years, FreeType had this metrics
[09:58] <seb128> bug, and unfortunately the users got used to the appearance of far too
[09:58] <seb128> widely spaced lines.  What FreeType now returns is what the font
[09:58] <seb128> designer has had in mind while designing the font, and what can be
[09:58] <seb128> found in the font."
[09:59] <tjaalton> meh
[10:01] <seb128> tjaalton, well I've no strong opinion on rendering but I want _ to display in gtk when underline markup is set ;-)
[10:02] <tjaalton> seb128: yeah, my vote is to keep it this way for lts and fight the issues with upstream for q->?
[10:03] <seb128> tjaalton, that's sort of what my upload and email to upstream just set up for ;-)
[10:03] <seb128> i.e revert to avoid regression and let them time to deal with it
[10:03] <tjaalton> yup
[10:04] <tjaalton> and confirmed that I get the same font as on 11.10, whee
[10:24] <tjaalton> yup, total horizontal size of the terminals 344 -> 416 characters :):)
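tjaalton's before/after column count gives a rough measure of the FreeType metrics change. The sketch assumes the same screen width and terminal layout in both runs. Under that assumption, columns = width / advance, so the ratio of per-glyph advances is the inverse ratio of column counts. Here 344 columns is the unpatched precise FreeType and 416 columns is the oneiric-style metrics after seb128's upload. The function name is illustrative.

```python
def advance_ratio(cols_new_freetype: int, cols_old_freetype: int) -> float:
    """How much wider each glyph is with the newer FreeType, for a fixed pixel width.

    columns = width / advance, so advance_new / advance_old = cols_old / cols_new.
    """
    return cols_old_freetype / cols_new_freetype


print(f"{advance_ratio(344, 416):.3f}")  # 1.209
```

That works out to each monospace glyph being roughly 21% wider with the newer metrics, matching the "too much space between glyphs" report.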
[10:24] <tjaalton> of course now i've used this for months, takes some time to get used to the old again
[11:38] <RAOF> bryceh: I've got a valgrind log of one of those X crashes in abort().  From which I can determine three things: (1) the Intel driver *still* does a huge bunch of stuff valgrind considers suspicious on initialisation, (2) in this case the crash is triggered by the [mi] EQ overflow message being written, and (3) there seems to be some valgrind-suspicious ioctls in synaptics initialisation, but I didn't have debugging symbols for that.
[11:50] <RAOF> Given that trigger, and that X under valgrind is hella slow, I should be able to trigger it again with more debugging symbols.  Tomorrow, though ☺.
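RAOF's trace points at the `[mi] EQ overflow` path: input events arrive faster than the server drains them, the queue fills, and the server logs a message. X running slowly under valgrind makes that much easier to hit. The toy model below assumes a fixed-capacity ring buffer that drops new events when full and counts the drops. The class, its drop policy, and its names are illustrative and are not the X server's `mi` event queue code.

```python
class EventQueue:
    """Toy fixed-capacity ring buffer that drops new events when full."""

    def __init__(self, capacity: int) -> None:
        self._buf = [None] * capacity
        self._head = 0      # next slot to read
        self._tail = 0      # next slot to write
        self._count = 0
        self.dropped = 0    # events discarded because the queue was full

    def enqueue(self, event) -> bool:
        if self._count == len(self._buf):
            # Overflow: this is the point at which a real server would log a
            # warning (the log write that RAOF's trace crashed in).
            self.dropped += 1
            return False
        self._buf[self._tail] = event
        self._tail = (self._tail + 1) % len(self._buf)
        self._count += 1
        return True

    def dequeue(self):
        if self._count == 0:
            return None
        event = self._buf[self._head]
        self._buf[self._head] = None
        self._head = (self._head + 1) % len(self._buf)
        self._count -= 1
        return event


q = EventQueue(2)
q.enqueue("motion-1")
q.enqueue("motion-2")
q.enqueue("motion-3")   # dropped: a slow consumer lets the queue fill up
print(q.dropped)        # 1
```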
=== yofel_ is now known as yofel
[15:03] <mornau> ricotz, Sarvatt: could one of you help me with this one by chance? https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/972024
[15:03] <ubottu> Launchpad bug 972024 in xserver-xorg-video-ati (Ubuntu) "RADEONDRIGetVersion failed because of version mismatch (1.17.0 -> 2.10.0)" [Undecided,New]
[15:12] <Sarvatt> mornau: I just looked into it and apparently ati just needs a rebuild, but the machine i do my uploads from got hosed from the grub update and i'm fixing that so it'll be about an hour until i can upload it
[15:13] <Sarvatt> ati built against libdrm 2.4.32 when it requires 2.4.33 for KMS support
[15:15] <tjaalton> that's an edgers bug?
=== lool- is now known as lool
[15:15] <Sarvatt> so it silently dropped kms support when it built - checking for LIBDRM_RADEON... no
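Sarvatt's diagnosis turns on version ordering. The ati build needs libdrm 2.4.33, and the PPA's libdrm-radeon1 reported later in this log is `2.4.33+git20120403.43704256-0ubuntu0ricotz~oneiric`. Debian packaging orders such strings with dpkg's rules: `~` sorts before everything, even the end of the string, and other punctuation such as a `+git` suffix sorts after it. Below is a minimal Python re-implementation of that ordering, written from dpkg's documented algorithm. It is a sketch for reasoning about the versions, not a replacement for `dpkg --compare-versions`. configure's pkg-config check uses its own comparison rules, so this models only the packaging side.

```python
def _order(c: str) -> int:
    """Sort weight of a single character in dpkg's non-digit comparison."""
    if c == "" or c.isdigit():
        return 0                 # end of string, or start of a digit run
    if c.isalpha():
        return ord(c)
    if c == "~":
        return -1                # '~' sorts before everything
    return ord(c) + 256          # other punctuation sorts after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare one upstream or revision part the way dpkg does."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Non-digit prefix: compare character weights one by one.
        while (i < len(a) and not a[i].isdigit()) or (j < len(b) and not b[j].isdigit()):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        while i < len(a) and j < len(b) and a[i].isdigit() and b[j].isdigit():
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i].isdigit():
            return 1             # a has the longer number
        if j < len(b) and b[j].isdigit():
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split 'epoch:upstream-revision'; epoch defaults to 0, revision to ''."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, dash, revision = rest.rpartition("-")
    if not dash:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Negative if a < b, zero if equal, positive if a > b."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return ea - eb
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)


print(compare_versions("2.4.32", "2.4.33") < 0)  # True: too old for KMS
print(compare_versions("2.4.33+git20120403.43704256-0ubuntu0ricotz~oneiric", "2.4.33") > 0)  # True
```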
[15:19] <Sarvatt> tjaalton: yeah edgers bug and argh this iso needs to download faster so i can fix it :)
[15:23] <Sarvatt> stupid macs using rEFIt that can't boot a liveusb unless they are created a certain way. you have to make the vfat file system on the usb stick directly, it wont boot off a partition
[15:35] <ricotz> Sarvatt, hmm, which grub update caused this trouble?
[15:35] <Sarvatt> 20ubuntu1
[15:35] <Sarvatt> https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/972250
[15:35] <ubottu> Launchpad bug 972250 in grub2 (Ubuntu) "regression from 4k_sectors.patch: "non-sector-aligned data is found in the core file"" [Critical,In progress]
[15:36] <ricotz> ah, just wanted to ask if this is gpt related
[15:37] <Sarvatt> yup it was, silly me left the laptop unplugged last night and got stuck with the screwed up grub install that was full of errors after the upgrade :)
[15:38] <ricotz> that is bad indeed :\
[15:48] <ricotz> mornau, the fix is on its way
[15:50] <mornau> thanks guys, you're great
[15:54] <mornau> btw. should i have added some tag to the bug report to ensure you get notified? (It's a pity that common tags aren't documented on launchpad)
[15:57] <ricotz> mornau, i think you could use something like "[xorg-edgers ppa] ..." as bug title
[15:58] <mornau> alright, i'll try to think of it next time if it turns out it's likely PPA related.
[16:01] <ricotz> mornau, you probably want to use a newer kernel too to get the kernel-space drm-updates while using edgers
[16:02] <mornau> I remember there's a PPA for newer kernel builds but i tend to forget its name, and have trouble finding it on a search engine then.
[16:03] <ricotz> the edgers ppa contains a rebuild of the precise kernel
[16:03] <ricotz> currently 3.2.0-20.32 (and 3.2.0-21.34 building)
[16:04] <mornau> "apt-cache search linux-image" doesn't seem to list it on this oneiric amd64 system
[16:06] <ricotz> https://launchpad.net/~xorg-edgers/+archive/ppa/+sourcepub/2321458/+listing-archive-extra
[16:09] <mornau> okay, this is not yet built for amd64
[16:10] <mornau> hmm actually it is, sorry
[16:59] <mornau> the availability of this updated kernel image might also be a good hint to add to the edgers page on launchpad
[17:10] <bryceh> RAOF, ah interesting findings. If you happen to still be around and can post one of the valgrind logs somewhere, I wouldn't mind perusing it a bit myself.
[17:14] <Sarvatt> RAOF: building libdrm with valgrind installed should fix most of that
[18:45] <mornau> so just to report back: linux-image-3.2.0-20.32-generic with libdrm-radeon1 2.4.33+git20120403.43704256-0ubuntu0ricotz~oneiric fixes radeon X output via DRM/DRI.
[18:48] <mornau> this kernel image looks a little buggy (gives me trouble with apparmor -> libvirtd keeps respawning, and dhcp fails somehow (haven't looked closer into it, yet)) but i guess that's why a newer image is just building now.
[19:39] <Sarvatt> phew, good thing cairo 1.12 didn't get shoved in at the last second https://lists.debian.org/debian-x/2012/04/msg00076.html
[19:40] <Sarvatt> exa's all kinds of busted with it
[19:54] <bryceh> heh
=== ajmitch_ is now known as ajmitch
[23:14] <RAOF> bryceh: http://paste2.org/p/1965772 is the valgrind log.
[23:14] <RAOF> Sarvatt: Oooh, if I build libdrm with valgrind installed it fixes a bunch of those ioctl warnings?  How?
[23:15] <RAOF> Also, could we build-depend on valgrind for libdrm ☺
[23:18] <bryceh> RAOF, thanks
[23:26] <RAOF> This *is* easy to trigger; firing up a unity session under valgrind and thrashing around on the touchpad will overflow the EQ and trigger.  I'll rebuild libdrm to clear *that* noise out and go again.
[23:26] <Sarvatt> RAOF: http://cgit.freedesktop.org/mesa/drm/log/?qt=grep&q=valgrind
[23:27] <RAOF> Yeah, found it by updating my libdrm git and grepping.
[23:56] <bryceh> RAOF, you might also try comparing a valgrind run with patch 100 commented out
[23:58] <RAOF> I'll give that a whirl.  I *think* what will happen is that it'll die in exactly the same place; the rethrow-signals codepath is only being hit once ErrorF has thrown a SIGSEGV anyway.
