/srv/irclogs.ubuntu.com/2012/04/06/#ubuntu-x.txt

mlankhorstnoon07:23
cndbryceh, so I'm working on fixing signal-unsafe logging15:34
cndthe issue is that I fear it will make quite a bit of logging broken15:35
cndand I haven't seen any logging break the X server so far, but then again it might only show up as memory corruption...15:35
jcristaui thought you were only allowed X_NONE in signal context, which didn't add the timeout15:39
jcristaumaybe i misremember15:40
jcristauapparently i do...15:41
jcristaus/timeout/timestamp/15:41
mlankhorstheya15:44
cndjcristau, what do you mean?15:45
cndI think there have been some misconceptions about what is actually allowed in signal context15:45
jcristaucnd: yeah i thought people had been careful about that, and the log was ~ just an fwrite().  guess not.15:52
cndjcristau, even fwrite isn't signal safe15:52
cndyou have to use write15:52
cndhttp://linux.die.net/man/7/signal15:52
cndit has a list of signal safe functions15:52
jcristaui guess that can take locks...15:54
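[Editor's note] The point above (stdio calls such as fwrite are not async-signal-safe, while write(2) is) can be sketched roughly as follows. This is only an illustration, not the X server patch: sigsafe_utoa and sigsafe_log are hypothetical names, and the formatting is done by hand because the printf family is not on the signal-safe list either.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: format value as decimal into buf without calling
 * any printf-family function. Returns the number of bytes stored (no
 * terminating NUL), or 0 if buf is too small. */
size_t sigsafe_utoa(unsigned long value, char *buf, size_t size)
{
    char tmp[3 * sizeof(unsigned long)]; /* enough digits for any value */
    size_t n = 0, i;

    /* Produce the digits in reverse order. */
    do {
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0 && n < sizeof(tmp));

    if (n > size)
        return 0;

    /* Copy them back in the right order. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Hypothetical logger that only uses write(2), so it can be called from
 * a signal handler. len is the length of msg in bytes. */
void sigsafe_log(int fd, const char *msg, size_t len, unsigned long value)
{
    char num[3 * sizeof(unsigned long)];
    size_t n = sigsafe_utoa(value, num, sizeof(num));

    if (write(fd, msg, len) < 0)
        return;
    if (write(fd, num, n) < 0)
        return;
    if (write(fd, "\n", 1) < 0)
        return;
}
```

A handler could call something like sigsafe_log(STDERR_FILENO, "dropped events: ", 16, count); the trade-off cnd mentions is visible here, since anything beyond plain strings and integers has to be formatted by hand.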
brycehcnd, great, let me know how I can help17:10
cndbryceh, the question is: do we want to take the patches17:11
cndthat could ultimately make our logging less useful17:11
cndbut for which we can be sure there will be no memory or other corruption17:11
brycehcnd, fewer crashes trumps better logging I should think17:11
cndI think so too, I just wanted to get a second opinion17:11
brycehdepends on the patch and actual effects of course.  I suspect much of the logging we don't really care about that much, and if we lose something that we do, there's probably more than one way to do it17:13
brycehcnd, the one thing is that I'm wondering why it'd be crashing only now; we haven't seen these types of crashes in natty/oneiric afaict17:13
cndbryceh, yeah, I don't think the logging is the real cause of the crashing17:14
cndbut we can't be sure17:14
cndhowever, the logging does prevent us from running X under valgrind17:14
cndunder certain circumstances17:14
* bryceh nods17:15
cndbryceh, my current task for today is:17:16
cnd1. fix logging17:16
cnd2. run valgrind again to try to resolve bug 97401717:16
ubottuLaunchpad bug 974017 in unity-2d (Ubuntu) "Crash when touching trackpad with 10 fingers" [Undecided,New] https://launchpad.net/bugs/97401717:16
cndwhich may be the root cause of some of the memory corruption issues?17:16
brycehhope something turns up!17:18
cndbryceh, is it normal for the archive to be frozen now?17:23
cndI just saw skaet's email on ubuntu-devel17:23
Sarvattcnd: wow, just reproduced it and thats my same bug17:23
Sarvatt[ 11090.523] 3: /usr/bin/X (valuator_mask_set_double+0x0) [0x7fa1c508cb00]17:23
cndSarvatt, yep17:23
Sarvattthat happens when the lid is closed here17:23
brycehcnd, they froze it after beta last release too17:23
cndbryceh, ok17:24
brycehcnd, historically no, it's usually been unfrozen.  But they think this will help ensure a higher quality level at release17:24
cndit seems like if they are going to freeze the archive, they should be putting it in the release schedule methinks17:24
brycehcnd, you can still get stuff in, it just takes an extra layer of review and chance getting kicked out17:24
cndyeah17:24
brycehcnd, showing evidence of thorough testing appears to help minimize those chances17:24
cndbryceh, can you review the patch for bug 975356?18:38
ubottuLaunchpad bug 975356 in xorg-server (Ubuntu) "Logging from signal context is unsafe" [Medium,In progress] https://launchpad.net/bugs/97535618:38
cndthe patch is attached18:38
brycehcnd, on it18:39
brycehcnd, did you check that 100_rethrow_signals.patch is not adding unsafe calls?18:41
cndno, I didn't18:41
brycehok, we'll need to remember to do that.  it might be ok, I don't remember18:42
cndbryceh, it may be easier to review the patches sent to xorg-devel18:44
cndsince they are split up into easier to review commits18:44
brycehok18:45
cndthe only difference is the ubuntu patch includes an extra patch from upstream 1.12 that splits out the logging type string to a separate function18:45
cndit backported without issue18:45
cndhmm... well, valgrind now works properly18:48
cndas in, it won't die18:48
cndbut I don't get any hits18:48
cndno leads as to what the real bug is when putting lots of touches down18:49
cndbryceh, so now I can't get it to crash anymore18:53
cndwhich may mean that the logging in signal context was the real culprit all along!18:53
brycehcnd, ok finished reviewing the patches18:53
brycehcnd, well that would be pretty sweet if it's true18:54
cndbryceh, the result of your review is?18:54
brycehyet still I wonder why we didn't see this behavior previously?18:55
cndbryceh, the only two code paths that I know of that log in signal context are touch-specific18:55
cndso they didn't exist before18:55
brycehcnd, +1, I didn't spot anything erroneous, so sent my reviewed-by to the list.  my knowledge of signal code is sketchy though so dunno how useful that is...18:55
cndoh, I didn't check the list18:55
cndbryceh, if you could comment in the bug too, that would be helpful18:56
brycehokie18:56
cndI am going to go take this to #ubuntu-release18:56
cndto get their sign off18:56
brycehcnd, should we get a bit more testing before we push it into the distro?18:58
cndbryceh, how do you propose we do it?18:58
* bryceh ponders18:58
brycehwell, we've got tons of bug reports.  Might be one or two who can reproduce the bug (or a similar bug) pretty easily and have them test it?19:00
cndyeah19:00
cndbryceh, I'll throw it up in a ppa19:00
brycehI suppose the thing we're more concerned about is regressions.  I could just slap it on a couple machines here and just make sure they boot and basically work19:00
brycehbut yeah if you can ppa it, I'll scare up some testing19:00
cndk19:00
cndbryceh, I've uploaded to ppa:chasedouglas/jupiter19:04
brycehok19:04
brycehwow, that built surprisingly fast19:20
brycehwait dah, looking at the wrong one19:20
brycehcnd, hmm bunch of test failures on the build19:28
cndhmm19:28
cndI must admit I had tests turned off19:29
cndI'll enable them and check19:29
brycehPASS: xfree8619:29
bryceh/bin/bash: line 5: 14823 Segmentation fault      (core dumped) MALLOC_PERTURB_=15 ${dir}$tst19:29
brycehFAIL: touch19:29
bryceh========================================================================19:29
bryceh1 of 8 tests failed19:29
brycehhere's where it started failing:19:29
brycehTesting bytes_to_int32()19:29
brycehTesting pad_to_int3219:29
brycehUnlinking from front.19:29
bryceh[mi] Increasing EQ size to 512 to prevent dropped events.19:29
cndit helps having a quad core hyperthreaded behemoth when compiling the x server :)19:32
brycehfull output http://paste.ubuntu.com/917885/19:34
brycehhmm, output is out of order there19:35
bryceh        config_tests = --disable-unit-tests  ?19:39
brycehcan that be set via DEB_BUILD_OPTIONS?19:39
* bryceh tries 'nocheck'19:40
cndbryceh, it's an issue with the test basically19:40
cndit creates a test device, but doesn't give the device a name19:40
cndso the logging code segfaults when it tries to print the device name19:41
brycehah19:41
cndI've got a fix and am test building now19:46
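[Editor's note] The log does not show cnd's actual fix. As a hedged illustration of the failure described above (a device without a name reaching the code that prints the name), one defensive pattern is to substitute a placeholder before formatting. The function names and the message text below are hypothetical, and snprintf is used only for brevity, so this sketch is for normal (non-signal) context.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: never hand a NULL name to the formatter. */
const char *safe_device_name(const char *name)
{
    return name != NULL ? name : "(unnamed device)";
}

/* Format a message mentioning the device; name may be NULL.
 * Returns what snprintf returns. */
int format_device_message(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "device \"%s\": touch array resized",
                    safe_device_name(name));
}
```

The alternative is to have the test give its device a name; which of the two was done here is not stated in the conversation.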
Sarvattricotz: man that took way too long, here's a refreshed pointer barriers patch for newer xserver http://kernel.ubuntu.com/~sarvatt/patches/500_pointer_barrier_thresholds.diff20:09
Sarvattgonna test it out now20:09
* Sarvatt refreshed an older version of the patch first like an idiot and had to redo it20:10
Sarvattheh go figure, i refreshed it against master not 1.12 branch, have to fix up one hunk20:12
cndbryceh, if this logging is the cause of the corruption, any corruption bugs where we have the X log from the crash should also contain messages from error context, i.e. messages about not finding touches or having to resize the touch array20:12
Sarvattricotz: thank you so freaking much for refreshing our entire patch stack against the coding style changes :P20:18
ricotzSarvatt, except 190 ;)20:20
Sarvattgot it building now to see if i screwed up the barriers patch somewhere20:21
ricotzgood, let me know if it works20:22
ricotzi like to push it to the ppa :P20:22
brycehcnd, ok, so they'd have to involve touch in some fashion20:24
brycehcnd, not sure we have any matchers there but I'll take a deeper look20:24
brycehcnd, the good news is I stuck it on 6 machines and they all still boot at least20:24
brycehunfortunately I did get a SIGABRT on one of them (the serial touchscreen one).  Dunno if it's related though.  Didn't notice the system crash myself, and it hasn't done it again.20:25
Sarvattwell in the past few releases only input related crashes were caught by 100_rethrow_signals so it makes sense20:25
Sarvatts/few/5/20:26
brycehricotz, \o/ !!20:27
Sarvattricotz: bah /usr/bin/install: failed to extend `/home/sarvatt/source/bzr/xorg-pkg-tools/xorg-server/debian/tmp/main/usr/bin/Xorg': No space left on device20:27
Sarvatt 20 minutes into it :P20:28
brycehSarvatt, do you know why it was only catching the input crashes?20:28
brycehSarvatt, doh.  SSD?  ;-)20:28
Sarvattbryceh: we had to disable it a bit because it wasn't working in the karmic timeframe, then pitti did some magic to get it working again and after that only input related crashes triggered it, i never could figure out why20:28
Sarvattinput and proprietary drivers, let me rephrase that :)20:29
brycehyeah20:30
Sarvattyeah SSD, i run on 2gb free space 99% of the time :)20:31
brycehI certainly remember going through it with pitti.20:31
brycehproblem is it's kinda hard to test20:33
brycehwe would just send signals to the server.  probably would be better if we deliberately introduced various kinds of faults, and checked that apport caught them20:34
brycehbut we've never been short on bugs, and people are willing to run gdb (which gives better backtraces anyway), so hasn't been that high on my todo list20:34
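[Editor's note] The "deliberately introduce faults and check they were caught" idea can be sketched in miniature as below. This does not involve apport or the X server: it only forks a child, makes it raise a signal, and reports how the child died. child_death_signal is a hypothetical name.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that raises signo with the default disposition.
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if fork or waitpid failed. */
int child_death_signal(int signo)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        signal(signo, SIG_DFL); /* make sure the fault is fatal */
        raise(signo);
        _exit(0);               /* only reached if the signal was not fatal */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A real test along the lines bryceh describes would additionally need to check that the crash handler produced a report, which is outside this sketch.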
ricotzbryceh, ;)20:35
ricotzSarvatt, will testbuild and push it then20:35
brycehplus signal handling code hurts my little brain20:35
mlankhorstbryceh: Heheh, no longer the case here somehow :-)20:35
* ricotz uses pbuilder on tmpfs ;P < Sarvatt 20:36
Sarvattbut i like to use a web browser, 4GB isnt even enough for chromium20:37
ricotzyeah i am struggling with 8gb here which isnt enough in many cases :\20:37
mlankhorstricotz: Yeah I'm using 16gb atm :x20:38
Sarvattricotz: it builds, ship it!20:39
ricotzSarvatt, done20:47
Sarvattcnd: Warning: attempting to log data in a signal unsafe manner while in signal context. Please update to check inSignalContext and/or use LogMessageVerbSigSafe(). The offending log format message is:21:58
Sarvatt%d: %s (%s+0x%lx) [%p]21:58
cndSarvatt, right, but backtrace printer is running in signal context...21:58
cnds/but/the/21:59
cndhmmm21:59
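[Editor's note] The warning quoted above refers to checking inSignalContext before logging. A minimal sketch of that dispatch pattern, assuming a flag that the signal handler sets on entry and clears on exit, might look like this. The names in_signal_context and log_message are hypothetical stand-ins and are not the X server's own definitions.

```c
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical flag: a handler sets it to 1 on entry and 0 on exit. */
volatile sig_atomic_t in_signal_context = 0;

/* Log msg via write(2) when in signal context, via stdio otherwise.
 * Returns 1 if the signal-safe path was taken, 0 if the stdio path was
 * taken, and -1 on error. */
int log_message(int fd, FILE *stream, const char *msg)
{
    if (in_signal_context) {
        size_t len = 0;

        /* Measure the string by hand to stay within the safe set. */
        while (msg[len] != '\0')
            len++;
        if (write(fd, msg, len) < 0)
            return -1;
        return 1;
    }

    if (fputs(msg, stream) == EOF)
        return -1;
    return 0;
}
```

The backtrace printer cnd mentions is the awkward case: it always runs in signal context, so a format such as "%d: %s (%s+0x%lx) [%p]" would have to go through the signal-safe path.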
Sarvattits awesome its not crashing the server anymore, can finally close my lid :)21:59
cndSarvatt, is it fixed with the patch I added?21:59
Sarvatt[mi] EQ overflow continuing.  %lu events have been dropped.21:59
Sarvattyeah i'm using your newest one http://paste.ubuntu.com/918083/22:00
cndok22:00
cndSarvatt, so based on your testing, the printing in a signal-safe manner fixes things?22:01
cndjust want to be sure22:01
Sarvattit fixes things but the printing is screwed up and its screwing up the mieqEnqueue printing too22:01
cndyeah22:01
Sarvattthats me 10 finger + puppy paw pressing on the touchpad repeatedly in the log22:03
Sarvatt(canonical-tech email humor)22:04
cndheh22:05
Sarvattgoing to dupe all the bugs related to the crash i have on lid close on macs to the 10 finger one, its the same darn thing22:05
cndyay22:06
Sarvattbryceh: if you see [291943.052] 3: /usr/bin/X (valuator_mask_set_double+0x0) [0x7fea24624ab0] in any crash logs its a dupe of 97401722:07
Sarvattmacs are wigging out sending input events constantly from the lid when its closed22:07
Sarvattthere were a few duped to doko's bug22:08
Sarvatthttps://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/933504 going through those now22:09
ubottuLaunchpad bug 933504 in xorg-server (Ubuntu Precise) "Xorg crashed with SIGABRT in __libc_message "double free or corruption (out)" from DeleteInputDeviceRequest" [High,Triaged]22:09
Sarvattcnd: is it known xinput testxi2 stopped working on macs in the past 2 weeks or so?22:11
SarvattBadAtom xerrors22:11
cndSarvatt, I know of it :)22:11
cndhaven't had a chance to look at it22:12
Sarvatttried git xinput but no go, this was working a few weeks ago when i used it to see that input crap was happening from the lid22:12
cndyeah22:12
cndI don't have any idea what might have changed22:12
Sarvattsomething in x-x-i-synaptics no doubt22:13
cndmight be22:14
Sarvatti'll bisect that22:14
Sarvattcnd: surprised you didn't think it would be good to enable the right button area even if it didn't apply to macs :P22:17
Sarvattit was just mac people complaining22:17
cndSarvatt, I think it would be good22:17
cndbut it was a feature that came really late in the cycle22:17
cndif we had another RC or beta release I would have pushed for it22:17
cndand I certainly won't stand in the way if people who are interested want to raise it with the release team22:18
cndI just didn't feel I had the justification to enable it myself22:18
Sarvatti would have totally done it no questions asked, it only affects non apple clickpads which all do need it22:18
Sarvattbut yeah too late now with the freeze :(22:19
brycehSarvatt, ok thanks, valuator_mask_set_double sounds familiar22:19
cndSarvatt, hmm... olli ries has managed to crash X still by drumming on his magic trackpad22:19
Sarvattwith your new one?22:20
cndunfortunately, the backtrace in signal context doesn't help :)22:20
cndyeah22:20
Sarvatthmm wth is the touchpad sending e for22:20
Sarvattdrumming my finger on the pad its going eeeee in irc22:20
cndSarvatt, do you have ginn running, by any chance?22:21
* Sarvatt has 10 finger drumming going on and still hasnt crashed22:21
Sarvattnope22:21
cnddunno then22:22
Sarvatti'm using utouch daily too22:22
Sarvattthe e's stopped22:22
Sarvattis he clicking when hes drumming?22:23
Sarvatt4 minutes of drumming hasnt crashed anything here on bcm597422:23
Sarvattcnd: nothing new is getting sent to my Xorg.0.log which is weird22:24
cndSarvatt, what are you expecting?22:24
Sarvattit stopped at [    34.404] [mi] Increasing EQ size to 512 to prevent dropped events.22:25
Sarvatt[    34.404] [mi] EQ processing has resumed after 249 dropped events.22:25
Sarvatt[    34.404] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.22:25
Sarvattstopped logging totally22:25
Sarvattthat log i pastebinned is still the same, nothing new is getting written to the log22:25
Sarvatt5 minutes of drumming on the touchpad should have sent tons of spam :)22:27
Sarvattyeah i plugged in an external monitor and it didnt log any of the edid probes22:27
Sarvattcnd: ignore me, it did just now, thought i found a bug but nope22:30
cndok22:31
Sarvattheh then X crashed22:32
bryceh%-}22:33
Sarvattit stopped logging input stuff for a good 30 minutes, then plugging in an external and the edid probe writing to the log got it back where 10 fingers on the touchpad would crash it again22:33
Sarvatt10 finger press spams the log like hell with your patches, so i know it stopped logging input related things after that EQ processing message22:34
cndhuh22:37
cndI should add an fsync to the logging function22:37
cndmaybe that'll fix things22:37
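[Editor's note] A rough sketch of what "add an fsync to the logging function" could mean is given below. Whether it would actually explain or fix the missing log output is not established in the conversation. write_all_and_sync is a hypothetical name; both write(2) and fsync(2) are on the POSIX async-signal-safe list.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer to fd, retrying on short writes and EINTR,
 * then ask the kernel to flush the file to storage.
 * Returns 0 on success, -1 on error.
 * Note: a real signal handler should save and restore errno around
 * calls like this. */
int write_all_and_sync(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return fsync(fd);
}
```

Calling fsync on every message is expensive, so in a log that can be spammed as heavily as the one described here it may be worth syncing only for messages written from error paths.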
brycehSarvatt, no dupes for 974017 in xorg or xorg-server with valuator_mask_set_*22:46
Sarvattbryceh: your dupe checker isnt checking for bugs that are already duped to another bug :)22:46
Sarvattevery time i filed it it tried to get duped to dokos bug22:46
brycehSarvatt, hum true22:46
Sarvatthas anyone asked doko if he still hits the bug?22:47
Sarvattits eating up other valid bugs22:47
Sarvatti just asked on the bug22:49
Sarvattdont think apport will dupe to closed bugs22:49
Sarvattwhen apport dupes it removes the useful info so will be good to start fresh if hes not hitting it, his specific bug he was hitting when glibc busted nvidia proprietary drivers22:50
Sarvattwhich was fixed forever ago22:51
brycehSarvatt, sounds good22:53
brycehSarvatt, or we can add a tag to make apport recollect stuff22:53
Sarvattproprietary drivers taking down X, signal handler called, input stack trying to print messages still to the log while its printing the backtrace from the real crash, every bug where input is writing to the log unsafely is getting duped to it22:53
Sarvattbryceh: theres a tag for that?22:54
brycehyep22:56
brycehneeds-retrace I think, lemme check22:57
Sarvatti filed a bug, duped to doko's bug and all good logs removed, unduped it, and it automatically duped it yet again so i gave up and filed a new one and changed the info/deleted the core before it got retraced so it could be permanent22:57
Sarvattpain in the butt22:57
brycehnope, it's apport-request-retrace22:58
brycehhttp://www.piware.de/2011/11/apport-1-90-client-side-duplicate-checking/22:58
bryceh$ ./search-attachments xorg-server ThreadStacktrace.txt valuator_mask_set_22:58
bryceh948792 Confirmed Xorg crashed with SIGSEGV in valuator_mask_set_double22:58
bryceh948697 Confirmed [bcm5974] Xorg crashed with SIGSEGV in valuator_mask_set_double()22:58
brycehso, including dupes I only found those two22:58
brycehboth already properly duped22:59
Sarvattyou knew pinged me about someone who could reproduce it easily and had 2-3 bugs already filed last month22:59
Sarvattthink it was a canonical employee so it showed up on the bug radar management pushes23:00
Sarvattgoogle to the rescue, just duped 3 more23:02
Sarvattof course the master bug is filed against unity when its not a unity problem23:05
