Sarvatt | http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0e253fdb3b5739fd8514f617ec582762bcfaea48 | 00:01 |
---|---|---|
Sarvatt | :/ | 00:01 |
Sarvatt | pasted that in the wrong place, sorry | 00:03 |
persia | Good day. I was pointed at the plymouth FTBFS, and it dies in ./configure with "configure:11953: error: Package requirements (libdrm libdrm_intel libdrm_radeon libdrm_nouveau) were not met" for powerpc and armel (the only architectures from the FTBFS list I can test). | 01:49 |
persia | Does anyone know what modules should replace that, or am I in the wrong place (drm and X are linked in my head for some reason) | 01:50 |
Sarvatt | it's libdrm_intel that's the problem for sure; it only builds on amd64/i386 for obvious reasons. libdrm-nouveau/radeon build on all arches. It was added here: http://cgit.freedesktop.org/plymouth/commit/?id=f2048af97dcc862dbbb587a0cc2546ddbdbd2b0c | 02:13 |
persia | Sarvatt: So basically either libdrm-intel needs to be ported to other architectures, or ./configure needs to know not to look for it on those architectures? | 02:14 |
Sarvatt | would have to conditionalize the intel drm renderer build in the plymouth build system I think? | 02:15 |
persia | That avoids porting :) | 02:15 |
persia | And if I wanted to use plymouth with SiS, I'd need to do something like the patch you pasted to enable that? | 02:16 |
Sarvatt | it needs a KMS driver to work that way :( you could use a vesa FB though | 02:18 |
persia | Hrm, that's probably beyond me then :) I do have a powerpc with radeon, so maybe I can test with that to cover just the don't-use-intel-for-sparc/powerpc/armel/ia64 patch. | 02:19 |
persia | Thanks a lot for the guidance. | 02:20 |
Sarvatt | redhat might have done the work already, might be worth poking around their CVS | 02:20 |
Sarvatt | KMS isn't enabled in the ubuntu powerpc kernel last I checked, I build my own for my ibook | 02:21 |
persia | heh. First google result on "RedHat CVS" is an article entitled "CVS is out, Subversion is in" | 02:21 |
Sarvatt | fedora cvs sorry :) | 02:21 |
persia | Sarvatt: It's probably not enabled due to lack of testing. If it works, submit a patch to the ubuntu-kernel team. | 02:22 |
Sarvatt | poking around there now too | 02:22 |
Sarvatt | no luck, they must just not use it on powerpc | 02:24 |
Sarvatt | or they build libdrm_intel on other arches | 02:24 |
persia | Yeah. BuildRequires: pkgconfig(libdrm_intel) | 02:24 |
Sarvatt | I don't think it would really hurt anything to just build libdrm-intel1 on linux-any? | 02:26 |
persia | Let me see if it works on powerpc | 02:26 |
Sarvatt | i mean, libdrm-nouveau and libdrm-radeon are linux-any and would never be used on most other arches | 02:26 |
persia | Why linux-any rather than any? The current list is "amd64 i386 kfreebsd-amd64 kfreebsd-i386" | 02:29 |
persia | Why wouldn't nouveau or radeon be used on other arches? I have a radeon in my powerpc box, and I know several were sold with nVidia cards. | 02:31 |
Sarvatt | oh, whoops :) really though it doesn't make sense for it to build other places since it's limited to integrated graphics, at least the others can come in pci/agp form | 02:31 |
persia | My powerpc radeon certainly looks like integrated graphics (older MacMini) | 02:31 |
persia | Sarvatt: Good call. libdrm-intel indeed seems to compile cleanly for at least powerpc. That seems the easiest way forward (and those people who have intel graphics cards in their sparc/powerpc/armel/ia64 boxes can file bugs if they encounter issues) | 02:46 |
Sarvatt | fedora does build libdrm-intel on powerpc, just looked in their packages | 02:47 |
persia | Excellent: there's precedent, which always helps in convincing someone to upload. | 02:48 |
persia | I have a suspicion that it's a completely untested codepath, due to lack of hardware, but I don't imagine that it matters because of the same lack of hardware. | 02:49 |
Sarvatt | they can't ever have intel graphics in their powerpc/sparc/arm/ia64 is the problem | 02:50 |
Sarvatt | well ia64 maybe but it wouldn't use libdrm-intel | 02:50 |
persia | Well, that just happens to be true today. I do have an ARM box with an Intel processor (but intel suggested an ATI video adaptor back then). | 02:50 |
bjsnider | since when does ubuntu support powerpc? | 02:51 |
persia | bjsnider: Since warty or hoary or something. | 02:51 |
bjsnider | not officially | 02:51 |
bjsnider | it was dropped | 02:51 |
persia | Um, only kinda. Not like hppa was dropped. | 02:52 |
persia | So packages still build, and it still shows up on all the QA reports, and there are still images on cdimage.ubuntu.com and thousands of users. | 02:52 |
Sarvatt | I still use it and appreciate it working still at least on powerpc :D | 02:52 |
persia | And a bunch of people fix powerpc-specific bugs. | 02:52 |
persia | A port only gets dropped if it gets really unpopular or is just broken. | 02:53 |
persia | For instance, hppa had 3 users left when it went away, and all of them agreed. | 02:53 |
Sarvatt | granted hardy was the last release I could ever install from disk on my ibook because of the ide-cd changes | 02:53 |
persia | Sarvatt: Is that a general hardware support issue, or something that could be fixed? | 02:53 |
Sarvatt | nothing 6 hours of dist-upgrades doesn't fix though :D | 02:54 |
Sarvatt | it was over my head when I last looked into it so I'm not sure; it was just a problem with the cd installers, because the drive works fine after it's installed | 02:57 |
Sarvatt | http://ucho.ignum.cz/fedora/linux/development/ppc/os/Packages/libdrm-2.4.6-6.fc11.ppc.rpm is the fedora libdrm I looked at to see libdrm-intel1 was being built on powerpc there though | 03:00 |
persia | Well, on the bright side, once 10.04 is out, it should be a one-step dist-upgrade. | 03:00 |
persia | Anyway, bug #507765 filed to port libdrm & ideally fix plymouth ftbfs. | 03:49 |
ubottu | Launchpad bug 507765 in libdrm "Please build libdrm-intel for ports architectures" [Undecided,New] https://launchpad.net/bugs/507765 | 03:49 |
=== ripps is now known as ripps\|sleep | ||
superm1 | bjsnider, would you mind adding a Conflicts on your nvidia karmic 190 and later packages against the mythtv version in the karmic archive or earlier? the mythbuntu autobuilds PPA is compatible with your stuff, but the stuff karmic launched with isn't, and we get people popping into #ubuntu-mythtv with weird problems that keep coming up and trace back to that | 06:53 |
akaihola | The page https://wiki.ubuntu.com/X/Troubleshooting/Freeze contradicts itself. It claims that a backtrace is a non-symptom, but lists "[mi] EQ overflowing" and X freeze as a relevant problem. A backtrace is included with such log messages. | 07:55 |
akaihola | Also, there are no instructions about what to do in case of a "crash, not freeze" under the subheading "Non-Symptoms". | 07:56 |
akaihola | Otherwise the page is fantastic! I'd like to help improve it further, but I have insufficient knowledge to decide what information is correct. | 08:00 |
akaihola | Alternatively, what is a good place besides IRC to discuss Wiki pages, if I don't want to join the mailing lists? File a bug in Launchpad? | 08:05 |
Ng | so I had a weird thing last night where I plugged my phone into my laptop and something about USB went a little bit mental and started flapping, and X segfaulted | 09:36 |
Ng | going by the kernel messages it seems like all USB devices were appearing and disappearing | 09:36 |
Ng | very fast I mean | 09:37 |
Ng | but whatever the problem with my USB is, that shouldn't make X crash :) | 09:38 |
Ng | apport doesn't seem to have caught anything though, which is a bit weird | 09:41 |
Ng | hmm I wonder if it's https://bugs.freedesktop.org/show_bug.cgi?id=25640 | 09:47 |
ubottu | Freedesktop bug 25640 in Server/general "Reattaching USB keyboard causes double free" [Critical,Assigned] | 09:48 |
Ng | that one sounds a whole bunch more reproducible though | 09:48 |
tseliot | Ng: did you try this patch? http://lists.freedesktop.org/archives/xorg-devel/2010-January/004908.html | 10:03 |
Ng | tseliot: not yet, I don't have as predictable a reproduction scenario as the fedora bug, but the circumstances are similar | 10:06 |
Ng | (something about my USB went a bit strange when I plugged my phone into my USB hub and all the USB devices started flapping) | 10:06 |
Ng | I really hate USB, maybe I should switch to bluetooth keyboards/mice ;) | 10:07 |
seb128 | bug #507239 | 10:09 |
ubottu | Launchpad bug 507239 in xorg-server "Xorg crashed with SIGSEGV in FatalError()" [Medium,New] https://launchpad.net/bugs/507239 | 10:09 |
seb128 | is that something being worked? | 10:09 |
sebner | seb128: heya, can I ask you something (little bit offtopic)? I'm wondering if the split in the indicator session makes sense (again, 1 icon more) + what's the plan for the future. Isn't that wasted effort since Ubuntu 10.10 will ship with Gnome3 and gnome-shell with its own indicator system? | 10:15 |
seb128 | sebner, who said we will ship gnome-shell? | 10:15 |
sebner | seb128: wondering as it's in the archive anyways and will be part(?) of GNOME3? | 10:16 |
seb128 | nothing decided | 10:16 |
seb128 | it's far from ready | 10:16 |
seb128 | and we need something which works on non accelerated videos | 10:16 |
seb128 | netbooks | 10:16 |
seb128 | old computer | 10:16 |
seb128 | remote display | 10:16 |
seb128 | etc | 10:16 |
sebner | sure but I'm pretty sure the old (current) look will be outdated or not maintained at some point in the future and gnome-shell is the future imho | 10:17 |
seb128 | the indicator are new technologies and use dbus | 10:17 |
tseliot | seb128: I'm not sure about that bug. Maybe I can reproduce it here. I think I've experienced something similar | 10:17 |
seb128 | they can be used in gnome-shell if upstream wants | 10:17 |
seb128 | tseliot, it crashes this way when trying to open a guest session there on lucid | 10:18 |
seb128 | intel i965 | 10:18 |
jcristau | seb128: the backtraces on that bug seem fairly useless | 10:18 |
jcristau | like there are symbols for libc but not the server | 10:19 |
sebner | seb128: I never said that the indicator stuff is bad but as you said it's really something like "pushing our stuff into upstream and hoping they'll accept" | 10:19 |
seb128 | sebner, you think doing work and suggesting people try it then is the wrong way? | 10:20 |
seb128 | what would be the right one? | 10:20 |
seb128 | coming with random ideas and waiting for others to do the work? | 10:20 |
sebner | seb128: nah, doing discussion with upstream beforehand | 10:20 |
seb128 | jcristau, do you know what debug package is required for VAuditF? | 10:21 |
sebner | and/or with other distributions so more can benefit | 10:21 |
sebner | Intercompatible ftw! ... | 10:21 |
seb128 | sebner, those were discussed with upstream at boston summit hackfest 2 years ago | 10:21 |
jcristau | seb128: xserver-xorg-core-dbg | 10:21 |
seb128 | jcristau, thanks | 10:21 |
tseliot | I think -dbg packages are likely to interfere with apport | 10:22 |
sebner | seb128: so only a matter of time, you said "if upstream wants" though so it doesn't really seem that fixed | 10:22 |
seb128 | sebner, sometime people disagree on the way | 10:23 |
seb128 | sebner, it doesn't mean we shouldn't experiment | 10:24 |
seb128 | to be honest I'm not convinced by gnome-shell yet | 10:24 |
sebner | seb128: me neither but I have the feeling that sooner or later it will be default so I was just wondering about the ubuntu self-development stuff :) | 10:25 |
seb128 | re | 11:40 |
seb128 | jcristau, | 11:40 |
seb128 | #0 0x080ac9b4 in _XSERVTransClose (ciptr=0x8be8aa8) | 11:40 |
seb128 | at /usr/include/X11/Xtrans/Xtrans.c:930 | 11:40 |
seb128 | #1 0x080a6be4 in CloseWellKnownConnections () at ../../os/connection.c:492 | 11:40 |
seb128 | #2 0x080b0caf in SigAbortServer (signo=11) at ../../os/log.c:426 | 11:40 |
seb128 | #3 0x080b10d1 in FatalSignal (signo=11) at ../../os/log.c:560 | 11:40 |
seb128 | #4 0x080a9aa2 in OsSigHandler (signo=11, sip=0xbfb0f21c, unused=0xbfb0f29c) | 11:40 |
seb128 | at ../../os/osinit.c:157 | 11:40 |
seb128 | #5 <signal handler called> | 11:40 |
seb128 | #6 0x00000000 in ?? () | 11:41 |
seb128 | is that useful in some way? | 11:41 |
jcristau | sort of. it starts at the sigsegv signal handler though, so it doesn't catch the first crash | 11:42 |
seb128 | #6 0x00000000 in ?? () | 11:48 |
seb128 | #7 0x080b13de in FatalError (f=0x81c5206 "no screens found") | 11:48 |
seb128 | at ../../os/log.c:585 | 11:48 |
seb128 | #8 0x08067044 in main (argc=9, argv=0xbfb0f6c4, envp=0xbfb0f6ec) | 11:48 |
seb128 | at ../../dix/main.c:206 | 11:48 |
seb128 | jcristau, ups, forgot to copy that | 11:48 |
seb128 | the 0x00000000 is weird | 11:48 |
jcristau | indeed | 11:49 |
jcristau | tseliot: so, hum, patch 191 is not in git | 11:50 |
jcristau | seb128: os/log.c:585 seems to be AbortServer(); which is defined 100 lines earlier... | 11:52 |
tseliot | jcristau: no, it's not. Shall I add it with -f? .patch files are in .gitignore | 12:04 |
jcristau | tseliot: yes | 12:05 |
tseliot | ok | 12:05 |
jcristau | (or use .diff) | 12:05 |
jcristau | (or kill .gitignore from the debian or ubuntu branch) | 12:06 |
tseliot | ok, pushed | 12:11 |
tseliot | next time I'll use .diff | 12:11 |
jcristau | thanks | 12:16 |
seb128 | jcristau, any suggestion to get extra information? | 12:20 |
seb128 | or to turn the bug about this crash into something useful | 12:20 |
jcristau | seb128: is it reproducible? can you try without the ubuntu patches? | 12:21 |
seb128 | I get it every time I try to open a guest session | 12:21 |
seb128 | but I guess it's nothing specific to the guest session | 12:21 |
seb128 | it's probably when trying to open a second xsession | 12:22 |
jcristau | sounds likely | 12:22 |
seb128 | ubuntu patches for which source? | 12:22 |
jcristau | xorg-server | 12:22 |
seb128 | ok, will try | 12:22 |
seb128 | I guess that takes a bit to build | 12:22 |
jcristau | there's a couple that touch os/log.c | 12:22 |
jcristau | something like 15min on my laptop | 12:22 |
seb128 | I will do a noopt nostrip build | 12:22 |
seb128 | jcristau, which patches do you recommend to comment, all the patches directory or just the one added in ubuntu compared to debian? | 12:39 |
jcristau | seb128: 100 and 160 in particular | 12:46 |
seb128 | ok, that's the one I turned off since they were touching log.c, it's building | 12:47 |
seb128 | also other issue | 12:49 |
seb128 | on the mini10v and lucid the screen flickers every few seconds after suspend | 12:49 |
seb128 | what is useful to debug such issues? | 12:49 |
jcristau | try i915.powersave=0 on the kernel cmdline | 12:51 |
seb128 | ok, no luck getting a debug stacktrace | 13:43 |
seb128 | it's not crashing now | 13:43 |
seb128 | I got the xorg mouse pointer over all vts | 13:43 |
seb128 | and vt7 displaying text | 13:43 |
seb128 | instead of my session | 13:43 |
seb128 | but things are still running | 13:43 |
seb128 | weird... | 13:43 |
tseliot | seb128: as I said, -dbg packages shouldn't work particularly well with apport (if that's your problem) | 13:57 |
seb128 | no it's not | 13:58 |
tseliot | ok | 13:58 |
seb128 | I did use gdb to get the stacktrace I copied before | 13:58 |
tseliot | yes, that should work | 13:58 |
seb128 | and did load the symbols by hand using "symbol-file" | 13:58 |
seb128 | the automatic loading is broken for some reason | 13:58 |
seb128 | now I tried a build without the patches which touch log.c | 13:58 |
seb128 | but it doesn't crash without those, it just destroys the graphical view and displays a vt | 13:59 |
seb128 | but desktop processes are still running | 13:59 |
tseliot | patches 100 and 160? | 13:59 |
seb128 | yes | 14:02 |
tseliot | seb128: what does the Xorg.0.log say if you ssh into that computer? | 14:03 |
seb128 | I don't need to ssh, vts are still working | 14:04 |
seb128 | there is nothing in the xorg logs | 14:04 |
seb128 | no error | 14:04 |
tseliot | I'm a bit concerned about this: FatalError (f=0x81c5206 "no screens found") | 14:05 |
tseliot | which, I must admit, is a bit vague | 14:06 |
tseliot | seb128: does dmesg say anything interesting? | 14:07 |
seb128 | tseliot, no | 14:10 |
seb128 | is opening a guest session working? | 14:10 |
seb128 | is opening a guest session working for you? | 14:11 |
seb128 | is opening a guest session working for you? | 14:11 |
seb128 | ups | 14:11 |
tseliot | seb128: can you reproduce the problem if you disable KMS? | 14:15 |
seb128 | tseliot, I will try later | 14:15 |
seb128 | I'm using this computer now | 14:15 |
tseliot | ok | 14:16 |
tseliot | apw: did you try my fix for X? | 14:18 |
seb128 | tseliot, did you try the guest session? | 14:33 |
tseliot | seb128: no, I didn't | 15:14 |
=== ripps\|sleep is now known as ripps | ||
=== yofel_ is now known as yofel | ||
Sarvatt | do these guest session bugs that we're getting hit with really look like xserver? it just looks like it might be a GDM problem to me, guest sessions trying to spawn a new xserver instance with the old one still active for some reason? | 18:09 |
jcristau | Sarvatt: if the server is segfaulting, that sounds like a server bug :) | 18:10 |
Sarvatt | I get eerily similar gdm logs when I have to start in failsafeX, x segfaults and I get [ 0.000000] (WW) xf86CloseConsole: KDSETMODE failed: Bad file descriptor | 18:12 |
Sarvatt | [ 0.000099] (WW) xf86CloseConsole: VT_GETMODE failed: Bad file descriptor | 18:12 |
Sarvatt | [ 0.000132] (WW) xf86OpenConsole: VT_GETSTATE failed: Bad file descriptor | 18:12 |
Sarvatt | (when I dont have failsafeX set to use fbdev instead of vesa and am using KMS) | 18:12 |
jcristau | so X can't get a fd to the console? | 18:12 |
Sarvatt | Fatal server error: | 18:13 |
Sarvatt | Server is already active for display 0 | 18:13 |
Sarvatt | If this server is no longer running, remove /tmp/.X0-lock | 18:13 |
Sarvatt | and start again. | 18:13 |
jcristau | hmm | 18:13 |
Sarvatt | and it tries to spawn the new one on tty2 when tty7 is still active | 18:13 |
jcristau | why does it try to start the new server as :0? | 18:14 |
Sarvatt | just looked over my logs again, that was happening when fsck on boot was kicking me to failsafe 100% of the time, and i think it was because gdm would start, crash because something (dbus?) wasn't ready and relaunch itself before the other cleanly ended? failsafe ended up on :1 and tty2 when it finally worked. i really can't follow this gdm startup maze | 18:44 |
Sarvatt | it's over my head and I'm just adding noise, it seems to be happening with guest sessions for everyone now though. seen bugs from people with ati, intel and s3 getting it. let's see if it crashes here :) | 18:50 |
Sarvatt | yep it kicks me to a VT with the mouse still working when I do a guest session | 18:56 |
Sarvatt | :0-slave.log says gdm-session-worker[3500]: pam_unix(gdm-autologin:session): session opened for user robert by (uid=0) | 18:56 |
Sarvatt | /etc/gdm/PreSession/Default: 16: initctl: not found | 18:56 |
Sarvatt | seb128: I fixed guest sessions here by changing /etc/gdm/PreSession/Default to call /sbin/initctl -q emit desktop-session-start DISPLAY_MANAGER=gdm instead of just initctl -q emit desktop-session-start DISPLAY_MANAGER=gdm | 19:08 |
Sarvatt | there was just a bunch of changes to the paths being exported around in the sessions, guessing it needed changing after that since /sbin isnt in the path for it anymore | 19:09 |
Sarvatt | of course I cant get back into my logged in session after trying to switch back though, its stuck with the gdm background on the screen. but switching to a guest session works :D | 19:13 |
Sarvatt | this is the commit I was talking about messing with the paths http://git.gnome.org/browse/gdm/commit/?id=e33ee9d9b23c103ac25b6fdb53fe8c074de0de53 | 19:26 |
bryyce | hi Sarvatt | 19:33 |
Sarvatt | heyo! | 19:33 |
bryyce | Sarvatt, think we need that patch added to gdm? | 19:33 |
bryyce | maybe we should point it out to seb128 | 19:34 |
Sarvatt | yeah pinged him there, doubt hardcoding /sbin/ is the right thing to do, just know it works here | 19:36 |
bryyce | Sarvatt, is there a launchpad bug open about this issue? | 19:38 |
Sarvatt | loooots. looks like some people are also getting X crashes, that's probably a different issue (vish?) | 19:40 |
Sarvatt | bug #506510 | 19:41 |
ubottu | Launchpad bug 506510 in xorg-server "Xorg crashed with SIGSEGV in FatalError()" [Medium,New] https://launchpad.net/bugs/506510 | 19:41 |
Sarvatt | bug #507239 | 19:41 |
ubottu | Launchpad bug 507239 in xorg-server "Xorg crashed with SIGSEGV in FatalError()" [Medium,New] https://launchpad.net/bugs/507239 | 19:41 |
vish | i tried with > /sbin/initctl -q emit desktop-session-start DISPLAY_MANAGER=gdm | 19:41 |
vish | but it kept kicking me out ;) | 19:41 |
Sarvatt | oh I changed the path to have /sbin in it too | 19:42 |
Sarvatt | PATH="/usr/bin:/sbin:$PATH" | 19:42 |
* vish tires again ;) | 19:42 | |
vish | tries* | 19:43 |
Sarvatt | http://paste.ubuntu.com/357210/ | 19:43 |
vish | \o/ | 19:44 |
vish | Sarvatt: works | 19:45 |
Sarvatt | sorry about that | 19:45 |
vish | np :) | 19:45 |
vish | Sarvatt: thanks ... | 19:45 |
bryyce | Sarvatt, so I guess the question for us is if X should not be crashing in this situation but should exit with a friendly error message? | 19:45 |
Sarvatt | I have no idea what magic gdm is doing to segfault X like that, it was doing it sometimes before we waited for dbus to have started and when fsck's were going when it tried to start before too.. I only noticed the error in :0-slave.log because of the timestamp, thats not getting attached to the bugs | 19:48 |
bryyce | hrm | 19:49 |
bryyce | yeah the stack traces on 506510 are fairly useless | 19:49 |
Sarvatt | trying to respawn a second server with DISPLAY=:0 when its already taken I guess? | 19:49 |
bryyce | Sarvatt, you are able to reproduce this bug? | 19:50 |
Sarvatt | yeah you can too, just try to log in a guest session on lucid | 19:51 |
Sarvatt | (an up to date one) | 19:51 |
Sarvatt | not machine specific as far as I can see, seen it on ati intel nvidia and s3 | 19:51 |
Sarvatt | and SIS | 19:51 |
vish | bryyce: thanks , was just about to comment on the bug about Sarvatt's fix :) | 19:52 |
Sarvatt | launchpad + tethered EDGE connection dont go well together :D | 19:53 |
bryyce | Sarvatt, ok sounds like the next step would be to get a gdb attached to xorg-server and figure out where it is crashing | 19:53 |
Sarvatt | seb128 did that earlier, have the scrollback handy? | 19:53 |
bryyce | ah | 19:53 |
* bryyce looks | 19:53 | |
Sarvatt | around 6:40AM EST | 19:54 |
bryyce | mm | 19:59 |
bryyce | jcristau suspects patch 100 or 160 | 19:59 |
jcristau | bryyce: it's the 2 patches that touched the code that seemed to crash. might still be something else. | 20:03 |
jcristau | but i thought it was worth testing without that | 20:03 |
bryyce | 160 has been in for a while | 20:03 |
bryyce | 100 is a few weeks old, it might be worth a more detailed review | 20:04 |
Sarvatt | yeah would be nice to find why it happens because it seems easy to make gdm do it :D time to build a server with no ubuntu patches and see if it happens since it's so easily reproducible right now | 20:04 |
seb128 | I don't get a crash on a debug build without those | 20:05 |
seb128 | debug build = noopt nostrip | 20:05 |
bryyce | however, the os/log.c stuff is normal to pass through on crashes so could be something higher up | 20:05 |
seb128 | it doesn't segfault but it still bugs | 20:05 |
seb128 | vt7 goes away and get the xorg mouse pointer over any vt | 20:05 |
Sarvatt | thats what happens here with the stock packages | 20:06 |
seb128 | vt7 goes away = it's in text mode with some fsck lines which seems to come from boot | 20:06 |
Sarvatt | puts me to vt8 and i have a mouse over my vt's | 20:06 |
seb128 | but ps still lists the GNOME processes running | 20:06 |
seb128 | I get no backtrace in Xorg.0.log though | 20:06 |
seb128 | nor error | 20:06 |
Sarvatt | weird that i dont get any errors outside of the segfault message in dmesg, nothing in Xorg.0.log or /var/log/gdm/:0.log (which is just a copy of /var/log/Xorg.0.log) | 20:07 |
bryyce | Sarvatt, is that with or without patch 100? | 20:08 |
Sarvatt | the stock packages in lucid | 20:08 |
bryyce | patch 100 changes how the crash is reported | 20:08 |
Sarvatt | with | 20:08 |
bryyce | hm | 20:08 |
seb128 | without the patch I don't get apport triggering which is expected | 20:08 |
bryyce | with patch 100 should give *more* info not less | 20:08 |
seb128 | but it's not written in Xorg.0.log either | 20:08 |
seb128 | with the patch I get apport triggering | 20:09 |
bryyce | ok, | 20:09 |
seb128 | that's the stacktrace I copied on the chan earlier if you read backlog | 20:09 |
seb128 | earlier being some 9 hours ago | 20:09 |
bryyce | well I do know there are situations where X can crash but no backtrace goes into Xorg.0.log so that's not too unusual | 20:09 |
bryyce | seb128, yeah going through it now | 20:09 |
Sarvatt | maybe its a second xserver trying to start thats actually crashing for me and the first stays up | 20:10 |
bryyce | seb128, you're right that line #6 looks weird. #6 0x00000000 in ?? () | 20:10 |
bryyce | line 585 in log.c is AbortServer(); | 20:11 |
bryyce | my guess there is that AbortServer() calls AbortDDX(), which is a virtual function called differently for different hw's | 20:12 |
* bryyce ponders | 20:12 | |
jcristau | yeah so there was two issues. first is why was it calling FatalError in the first place, and then why FatalError was calling a null function pointer | 20:12 |
bryyce | could it be that in the guest mode case it's using some hw thingee with no defined AbortDDX()? | 20:13 |
jcristau | it's all Xorg afaik.. | 20:14 |
bryyce | ok well as to the question of why it's calling FatalError in the first place, it's this code: | 20:15 |
bryyce | if (screenInfo.numScreens < 1) | 20:15 |
bryyce | FatalError("no screens found"); | 20:15 |
bryyce | so that *should* be just our nice friendly neighborhood "no screens found" error | 20:15 |
bryyce | maybe it's a miscommunication between X and gdm about what screen number to use | 20:16 |
bryyce | next step for that probably would be to examine what args X is getting called with (ps should say) | 20:17 |
bryyce | and compare with what gdm is actually doing | 20:17 |
jcristau | one reason i'm sort of uncomfortable with the os/log.c patches is that this is touching signal handlers, and there's not a lot of stuff you can do safely in signal handlers | 20:17 |
bryyce | Sarvatt / seb128, in the case where you aren't using patch 100, can you confirm whether you are seeing "No screens found" listed in one of the gdm log files? | 20:18 |
Sarvatt | going to take about 30 minutes to build here, speedy atom cpu :) | 20:19 |
jcristau | but we wouldn't get in these signal handlers if stuff worked fine in the first place so that doesn't explain why it fails InitOutput.. | 20:20 |
bryyce | jcristau, well I can certainly say I don't like how much code patch 100 touches. The old patch was a lot more concise. But due to the way signal handling stuff has changed it was not as easy to keep all the code in one place | 20:20 |
bryyce | jcristau, I know messing with the signal handlers is bad but we have to get apport hooked in somehow, and the X signal handling stuff eats the signals by default, which prevents apport from working | 20:21 |
jcristau | ack | 20:21 |
bryyce | jcristau, I would *love* it if we could render this into something worth upstreaming, I hate having to maintain this patch as ubuntu-only | 20:21 |
Sarvatt | would the sudo DISPLAY=:0 xinit& output be useful at all? because thats what I see in gdm's log and what the s3 guy's gdm log showed | 20:23 |
bryyce | Sarvatt, couldn't hurt | 20:24 |
jcristau | gdm is calling xinit? | 20:24 |
Sarvatt | http://paste.ubuntu.com/357228/ | 20:24 |
Sarvatt | http://paste.ubuntu.com/357230/ | 20:25 |
Sarvatt | second one was the GdmLog attached to the savage bug | 20:25 |
* bryyce ews lines 15-17 | 20:26 | |
bryyce | Sarvatt, ok what we need is what the X command line was, to see if it was indeed trying to start two X's on :0 | 20:26 |
bryyce | Sarvatt, and if so, instrument gdm to show its work | 20:27 |
bryyce | jcristau, hmm, looking at seb128's backtrace, it is crashing on this bit of code in CloseWellKnownConnections(): | 20:35 |
bryyce | for (i = 0; i < ListenTransCount; i++) | 20:35 |
bryyce | _XSERVTransClose (ListenTransConns[i]); | 20:35 |
bryyce | any ideas on what this does? | 20:36 |
Sarvatt | hmm cant get gdb to load the symbols for xserver-xorg-core-dbg or -dbgsym | 20:39 |
bryyce | hmm, looking at Xtrans.c, it doesn't have null pointer checking, so if ListenTransConns[i] is NULL, it'd crash. Perhaps that bit of code needs to do some nullptr checking? | 20:39 |
seb128 | Sarvatt, symbol /usr/lib/debug/usr/bin/Xorg | 20:40 |
seb128 | (I'm not really around, just passing in front of computer) | 20:40 |
Sarvatt | thanks seb128, used to load that automatically | 20:40 |
seb128 | right, it's buggy for some reason | 20:41 |
Sarvatt | guess it just plain wont load the dbgsym symbols at all | 20:46 |
Sarvatt | -dbg worked though | 20:46 |
bryyce | Sarvatt, ok I think I understand the crash | 20:47 |
Sarvatt | fix = disable xace/xselinux? :D | 20:50 |
* Sarvatt hides | 20:50 | |
bryyce | Sarvatt, main() notices there are no screens so issues a FatalError(), which is fine | 20:52 |
bryyce | now in FatalError(), it calls AbortServer() (which is just a pass-through to SigAbortServer(0)) | 20:53 |
bryyce | here's where it gets fun | 20:53 |
bryyce | SigAbortServer calls CloseWellKnownConnections() to do some sort of cleanup I don't understand | 20:54 |
bryyce | it continues on with cleanup work | 20:54 |
Sarvatt | ending secure connections thanks to selinux support? | 20:54 |
bryyce | somewhere this triggers signal 11 | 20:55 |
jcristau | Sarvatt: xselinux is disabled in ubuntu.. | 20:55 |
bryyce | so now we find ourselves in OsSigHandler() | 20:55 |
bryyce | so now it hits FatalSignal(11) (fine), and here we get to SigAbortServer(11) | 20:55 |
bryyce | again, it calls CloseWellKnownConnections() | 20:56 |
bryyce | but this time through, the stuff has already been cleared, so a null pointer gets called | 20:56 |
bryyce | *boom* | 20:56 |
bryyce | so... we have three bugs | 20:56 |
bryyce | a) why is X getting called by gdm with :0 twice | 20:56 |
bryyce | b) why is the signal 11 being thrown during an ordinary FatalError() call? | 20:57 |
bryyce | c) why is CloseWellKnownConnections() brittle at being called twice? | 20:57 |
bryyce | c could be fixed by just adding a nullptr check in the Xtrans stuff. dunno if that's warranted | 20:58 |
Sarvatt | just starting /usr/bin/Xorg I get the same segfault, dont even have to specify :0 | 20:58 |
jcristau | Sarvatt: :0 is the default | 20:58 |
bryyce | or actually we could probably handle the nullptr check in CloseWellKnownConnections() | 20:58 |
jcristau | Sarvatt: so if you don't give a display it uses that | 20:58 |
bryyce | anyway, wife says I must eat, so bbiab | 20:58 |
Sarvatt | what's with the trash after ddxSigGiveUp: Closing log? http://paste.ubuntu.com/357244/ | 21:03 |
Sarvatt | yeah same here, wife's waiting for me to make dinner :D | 21:03 |
bryyce | o_O | 21:36 |
bryyce | Sarvatt, aha. Looks like there are some error messages logged after the log is closed. | 21:36 |
bryyce | that can't be good | 21:36 |
bryyce | Sarvatt, boy I bet that's the explanation for the sig 11 throw | 21:37 |
bryyce | totally | 21:39 |
bryyce | ok easy fix | 21:39 |
Sarvatt | woohoo! | 21:40 |
bryyce | hmm, I wonder if this is the same root cause for bugs like lp #508035 | 22:25 |
ubottu | Launchpad bug 508035 in xorg-server "Xorg crashed with SIGSEGV in <signal handler called>()" [Medium,New] https://launchpad.net/bugs/508035 | 22:25 |
=== ubott2 is now known as ubottu |
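
A note on question (c) from bryyce's summary in the log (why CloseWellKnownConnections() is brittle when called twice): the following is a minimal C sketch of a cleanup routine that tolerates being called a second time. It is not the xorg-server or Xtrans code, and it is not the fix that was eventually applied. The names `conn_t`, `MAX_LISTENERS`, `add_listener` and `close_listeners` are hypothetical and exist only for this illustration. The idea is simply to skip NULL slots, clear each slot once it is released, and reset the count, so a second pass finds nothing left to close.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a listening connection; not the Xtrans type. */
typedef struct {
    int fd;
} conn_t;

#define MAX_LISTENERS 4

static conn_t *listeners[MAX_LISTENERS];
static int listener_count = 0;

/* Register a fake listener; returns 0 on success, -1 on failure. */
int add_listener(int fd)
{
    assert(fd >= 0);
    if (listener_count >= MAX_LISTENERS)
        return -1;
    conn_t *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->fd = fd;
    listeners[listener_count++] = c;
    return 0;
}

/*
 * Close every listener. Safe to call more than once: each slot is
 * checked for NULL, cleared after it is freed, and the count is reset,
 * so a second call finds nothing to do.
 * Returns the number of listeners actually closed by this call.
 */
int close_listeners(void)
{
    int closed = 0;
    for (int i = 0; i < listener_count; i++) {
        if (listeners[i] == NULL)
            continue;               /* already closed earlier */
        free(listeners[i]);
        listeners[i] = NULL;        /* never leave a dangling pointer */
        closed++;
    }
    listener_count = 0;
    return closed;
}
```

Calling `close_listeners()` twice in a row closes everything on the first call and returns 0 on the second. The sketch does not address jcristau's separate concern about signal handlers: `free()` is not async-signal-safe, so a real cleanup path reachable from a SIGSEGV handler would need more care than this.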