/srv/irclogs.ubuntu.com/2012/04/06/#ubuntu-x.txt

mlankhorst	noon	07:23
cnd	bryceh, so I'm working on fixing signal-unsafe logging	15:34
cnd	the issue is that I fear it will make quite a bit of logging broken	15:35
cnd	and I haven't seen any logging break the X server so far, but then again it might only show up as memory corruption...	15:35
jcristau	i thought you were only allowed X_NONE in signal context, which didn't add the timeout	15:39
jcristau	maybe i misremember	15:40
jcristau	apparently i do...	15:41
jcristau	s/timeout/timestamp/	15:41
mlankhorst	heya	15:44
cnd	jcristau, what do you mean?	15:45
cnd	I think there's been some misconceptions about what is actually allowed in signal context	15:45
jcristau	cnd: yeah i thought people had been careful about that, and the log was ~ just an fwrite(). guess not.	15:52
cnd	jcristau, even fwrite isn't signal safe	15:52
cnd	you have to use write	15:52
cnd	http://linux.die.net/man/7/signal	15:52
cnd	it has a list of signal safe functions	15:52
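A minimal sketch of what cnd means by "use write", assuming a handler that only needs to report a number. snprintf() and the rest of stdio are not on the async-signal-safe list in signal(7), so the digits are formatted by hand. The helper names write_all, format_ulong and install_sigusr1_logger are hypothetical, not X server functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer with write(2), retrying on short writes and EINTR. */
static void write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return;                 /* nothing safe left to do on error */
        }
        buf += n;
        len -= (size_t)n;
    }
}

/* Format v in decimal into out without snprintf(). Returns the number of
 * digits written, or 0 if out is too small (nothing is written then). */
static size_t format_ulong(char *out, size_t outlen, unsigned long v)
{
    char tmp[24];                   /* enough digits for a 64-bit value */
    size_t n = 0;

    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);

    if (n > outlen)
        return 0;
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];    /* digits were produced in reverse */
    return n;
}

/* Example handler: uses only write(2) and the helpers above, no stdio. */
static void on_sigusr1(int sig)
{
    static const char prefix[] = "caught signal ";
    int saved_errno = errno;        /* write(2) may overwrite errno */
    char digits[24];
    size_t n = format_ulong(digits, sizeof digits, (unsigned long)sig);

    write_all(STDERR_FILENO, prefix, sizeof prefix - 1);
    write_all(STDERR_FILENO, digits, n);
    write_all(STDERR_FILENO, "\n", 1);
    errno = saved_errno;
}

static int install_sigusr1_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);      /* memset runs outside signal context */
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Splitting the message into three write() calls keeps the handler free of any buffer-building library calls; the cost is that concurrent writers could interleave between the pieces.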
jcristau	i guess that can take locks...	15:54
bryceh	cnd, great, let me know how I can help	17:10
cnd	bryceh, the question is: do we want to take the patches	17:11
cnd	that could ultimately make our logging less useful	17:11
cnd	but for which we can be sure there will be no memory or other corruption	17:11
bryceh	cnd, fewer crashes trumps better logging I should think	17:11
cnd	I think so too, I just wanted to get a second opinion	17:11
bryceh	depends on the patch and actual effects of course. I suspect much of the logging we don't really care about that much, and if we lose something that we do, there's probably more than one way to do it	17:13
bryceh	cnd, the one thing is that I'm wondering why it'd be crashing only now; we haven't seen these types of crashes in natty/oneiric afaict	17:13
cnd	bryceh, yeah, I don't think the logging is the real cause of the crashing	17:14
cnd	but we can't be sure	17:14
cnd	however, the logging does prevent us from running X under valgrind	17:14
cnd	under certain circumstances	17:14
* bryceh nods		17:15
cnd	bryceh, my current task for today is:	17:16
cnd	1. fix logging	17:16
cnd	2. run valgrind again to try to resolve bug 974017	17:16
ubottu	Launchpad bug 974017 in unity-2d (Ubuntu) "Crash when touching trackpad with 10 fingers" [Undecided,New] https://launchpad.net/bugs/974017	17:16
cnd	which may be the root cause of some of the memory corruption issues?	17:16
bryceh	hope something turns up!	17:18
cnd	bryceh, is it normal for the archive to be frozen now?	17:23
cnd	I just saw skaet's email on ubuntu-devel	17:23
Sarvatt	cnd: wow, just reproduced it and thats my same bug	17:23
Sarvatt	[ 11090.523] 3: /usr/bin/X (valuator_mask_set_double+0x0) [0x7fa1c508cb00]	17:23
cnd	Sarvatt, yep	17:23
Sarvatt	that happens when the lid is closed here	17:23
bryceh	cnd, they froze it after beta last release too	17:23
cnd	bryceh, ok	17:24
bryceh	cnd, historically no, it's usually been unfrozen. But they think this will help ensure a higher quality level at release	17:24
cnd	it seems like if they are going to freeze the archive, they should be putting it in the release schedule methinks	17:24
bryceh	cnd, you can still get stuff in, it just takes an extra layer of review and chance getting kicked out	17:24
cnd	yeah	17:24
bryceh	cnd, showing evidence of thorough testing appears to help minimize those chances	17:24
cnd	bryceh, can you review the patch for bug 975356?	18:38
ubottu	Launchpad bug 975356 in xorg-server (Ubuntu) "Logging from signal context is unsafe" [Medium,In progress] https://launchpad.net/bugs/975356	18:38
cnd	the patch is attached	18:38
bryceh	cnd, on it	18:39
bryceh	cnd, did you check that 100_rethrow_signals.patch is not adding unsafe calls?	18:41
cnd	no, I didn't	18:41
bryceh	ok, we'll need to remember to do that. it might be ok, I don't remember	18:42
cnd	bryceh, it may be easier to review the patches sent to xorg-devel	18:44
cnd	since they are split up into easier to review commits	18:44
bryceh	ok	18:45
cnd	the only difference is the ubuntu patch includes an extra patch from upstream 1.12 that splits out the logging type string to a separate function	18:45
cnd	it backported without issue	18:45
cnd	hmm... well, valgrind now works properly	18:48
cnd	as in, it won't die	18:48
cnd	but I don't get any hits	18:48
cnd	no leads as to what the real bug is when putting lots of touches down	18:49
cnd	bryceh, so now I can't get it to crash anymore	18:53
cnd	which may mean that the logging in signal context was the real culprit all along!	18:53
bryceh	cnd, ok finished reviewing the patches	18:53
bryceh	cnd, well that would be pretty sweet if it's true	18:54
cnd	bryceh, the results of your review are?	18:54
bryceh	yet still I wonder why we didn't see this behavior previously?	18:55
cnd	bryceh, the only two code paths that I know of that log in signal context are touch-specific	18:55
cnd	so they didn't exist before	18:55
bryceh	cnd, +1, I didn't spot anything erroneous, so sent my reviewed-by to the list. my knowledge of signal code is sketchy though so dunno how useful that is...	18:55
cnd	oh, I didn't check the list	18:55
cnd	bryceh, if you could comment in the bug too, that would be helpful	18:56
bryceh	okie	18:56
cnd	I am going to go take this to #ubuntu-release	18:56
cnd	to get their sign off	18:56
bryceh	cnd, should we get a bit more testing before we push it into the distro?	18:58
cnd	bryceh, how do you propose we do it?	18:58
* bryceh ponders		18:58
bryceh	well, we've got tons of bug reports. Might be one or two who can reproduce the bug (or a similar bug) pretty easily and have them test it?	19:00
cnd	yeah	19:00
cnd	bryceh, I'll throw it up in a ppa	19:00
bryceh	I suppose the thing we're more concerned about is regressions. I could just slap it on a couple machines here and just make sure they boot and basically work	19:00
bryceh	but yeah if you can ppa it, I'll scare up some testing	19:00
cnd	k	19:00
cnd	bryceh, I've uploaded to ppa:chasedouglas/jupiter	19:04
bryceh	ok	19:04
bryceh	wow, that built surprisingly fast	19:20
bryceh	wait dah, looking at the wrong one	19:20
bryceh	cnd, hmm bunch of test failures on the build	19:28
cnd	hmm	19:28
cnd	I must admit I had tests turned off	19:29
cnd	I'll enable them and check	19:29
bryceh	PASS: xfree86	19:29
bryceh	/bin/bash: line 5: 14823 Segmentation fault (core dumped) MALLOC_PERTURB_=15 ${dir}$tst	19:29
bryceh	FAIL: touch	19:29
bryceh	========================================================================	19:29
bryceh	1 of 8 tests failed	19:29
bryceh	here's where it started failing:	19:29
bryceh	Testing bytes_to_int32()	19:29
bryceh	Testing pad_to_int32	19:29
bryceh	Unlinking from front.	19:29
bryceh	[mi] Increasing EQ size to 512 to prevent dropped events.	19:29
cnd	it helps having a quad core hyperthreaded behemoth when compiling the x server :)	19:32
bryceh	full output http://paste.ubuntu.com/917885/	19:34
bryceh	hmm, output is out of order there	19:35
bryceh	config_tests = --disable-unit-tests ?	19:39
bryceh	can that be set via DEB_BUILD_OPTIONS?	19:39
* bryceh tries 'nocheck'		19:40
cnd	bryceh, it's an issue with the test basically	19:40
cnd	it creates a test device, but doesn't give the device a name	19:40
cnd	so the logging code segfaults when it tries to print the device name	19:41
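cnd's diagnosis puts the bug in the test (an unnamed device), but the crash itself comes from handing a NULL pointer to a %s-style conversion. As an extra, hypothetical safeguard, a signal-safe formatter can substitute a placeholder for NULL strings, much as glibc's printf prints "(null)". append_str is an illustrative name, not an X server API.

```c
#include <stddef.h>

/* Append s to buf starting at pos, bounded by bufsize, always
 * NUL-terminating when bufsize > 0. A NULL s is printed as "(null)"
 * instead of being dereferenced. Uses no library calls, so it is safe
 * in signal context. Returns the new length of the string in buf. */
static size_t append_str(char *buf, size_t bufsize, size_t pos, const char *s)
{
    if (s == NULL)
        s = "(null)";
    while (*s != '\0' && pos + 1 < bufsize)
        buf[pos++] = *s++;
    if (pos < bufsize)
        buf[pos] = '\0';
    return pos;
}
```

For example, appending "device: " and then a NULL name yields "device: (null)" rather than a segfault; overlong input is truncated to fit the buffer.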
bryceh	ah	19:41
cnd	I've got a fix and am test building now	19:46
Sarvatt	ricotz: man that took way too long, here's a refreshed pointer barriers patch for newer xserver http://kernel.ubuntu.com/~sarvatt/patches/500_pointer_barrier_thresholds.diff	20:09
Sarvatt	gonna test it out now	20:09
* Sarvatt refreshed an older version of the patch first like an idiot and had to redo it		20:10
Sarvatt	heh go figure, i refreshed it against master not 1.12 branch, have to fix up one hunk	20:12
cnd	bryceh, if this logging is the cause of the corruption, any corruption bugs where we have the X log from the crash should also contain messages from error context, i.e. messages about not finding touches or having to resize the touch array	20:12
Sarvatt	ricotz: thank you so freaking much for refreshing our entire patch stack against the coding style changes :P	20:18
ricotz	Sarvatt, except 190 ;)	20:20
Sarvatt	got it building now to see if i screwed up the barriers patch somewhere	20:21
ricotz	good, let me know if it works	20:22
ricotz	i like to push it to the ppa :P	20:22
bryceh	cnd, ok, so they'd have to involve touch in some fashion	20:24
bryceh	cnd, not sure we have any matchers there but I'll take a deeper look	20:24
bryceh	cnd, the good news is I stuck it on 6 machines and they all still boot at least	20:24
bryceh	unfortunately I did get a SIGABRT on one of them (the serial touchscreen one). Dunno if it's related though. Didn't notice the system crash myself, and it hasn't done it again.	20:25
Sarvatt	well in the past few releases only input related crashes were caught by 100_rethrow_signals so it makes sense	20:25
Sarvatt	s/few/5/	20:26
bryceh	ricotz, \o/ !!	20:27
Sarvatt	ricotz: bah /usr/bin/install: failed to extend `/home/sarvatt/source/bzr/xorg-pkg-tools/xorg-server/debian/tmp/main/usr/bin/Xorg': No space left on device	20:27
Sarvatt	20 minutes into it :P	20:28
bryceh	Sarvatt, do you know why it was only catching the input crashes?	20:28
bryceh	Sarvatt, doh. SSD? ;-)	20:28
Sarvatt	bryceh: we had to disable it a bit because it wasn't working in the karmic timeframe, then pitti did some magic to get it working again and after that only input related crashes triggered it, i never could figure out why	20:28
Sarvatt	input and proprietary drivers, let me rephrase that :)	20:29
bryceh	yeah	20:30
Sarvatt	yeah SSD, i run on 2gb free space 99% of the time :)	20:31
bryceh	I certainly remember going through it with pitti.	20:31
bryceh	problem is it's kinda hard to test	20:33
bryceh	we would just send signals to the server. probably would be better if we deliberately introduced various kinds of faults, and checked that apport caught them	20:34
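bryceh's idea of deliberately injecting faults can be sketched as a small fork-based test. It assumes 100_rethrow_signals.patch follows the common pattern of restoring the default action and re-raising, so the process dies from the original signal; the log does not confirm that. Apport itself is not exercised here: the test only checks the precondition a core-dump-based crash catcher needs, namely that the process terminates with the original signal instead of exiting normally. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed "rethrow" handler: log with write(2), restore the default
 * action, and re-raise. The re-raised signal stays pending (it is blocked
 * while the handler runs) and kills the process once the handler returns. */
static void rethrow_handler(int sig)
{
    static const char msg[] = "fatal signal, rethrowing\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);

    (void)r;
    signal(sig, SIG_DFL);           /* signal() is async-signal-safe */
    raise(sig);                     /* raise() is async-signal-safe */
}

/* Run a deliberate fault in a child with the handler installed. Returns
 * the signal that killed the child, 0 if it exited normally, -1 on error. */
static int signal_that_killed_child(int sig)
{
    pid_t pid = fork();
    int status;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = { .rlim_cur = 0, .rlim_max = 0 };
        struct sigaction sa;

        (void)setrlimit(RLIMIT_CORE, &no_core);  /* no core files from tests */
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = rethrow_handler;
        sigemptyset(&sa.sa_mask);
        sigaction(sig, &sa, NULL);
        raise(sig);                 /* the injected fault */
        _exit(0);                   /* reached only if the signal was swallowed */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

A handler that logged and then called _exit() would make this return 0, which is exactly the kind of silent breakage that is hard to notice by just sending signals to a running server.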
bryceh	but we've never been short on bugs, and people are willing to run gdb (which gives better backtraces anyway), so hasn't been that high on my todo list	20:34
ricotz	bryceh, ;)	20:35
ricotz	Sarvatt, will testbuild and push it then	20:35
bryceh	plus signal handling code hurts my little brain	20:35
mlankhorst	bryceh: Heheh, no longer the case here somehow :-)	20:35
* ricotz uses pbuilder on tmpfs ;P < Sarvatt		20:36
Sarvatt	but i like to use a web browser, 4GB isnt even enough for chromium	20:37
ricotz	yeah i am struggling with 8gb here which isnt enough in many cases :\	20:37
mlankhorst	ricotz: Yeah I'm using 16gb atm :x	20:38
Sarvatt	ricotz: it builds, ship it!	20:39
ricotz	Sarvatt, done	20:47
Sarvatt	cnd: Warning: attempting to log data in a signal unsafe manner while in signal context. Please update to check inSignalContext and/or use LogMessageVerbSigSafe(). The offending log format message is:	21:58
Sarvatt	%d: %s (%s+0x%lx) [%p]	21:58
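The warning points at the pattern: handlers set a flag, ordinary logging checks it, and only a dedicated signal-safe entry point may format in signal context. A minimal sketch of that check, assuming a single-threaded process; in_signal_context, log_message and install_handler are hypothetical names, and the X server's real inSignalContext and LogMessageVerbSigSafe() may differ in detail.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Stand-in for an inSignalContext-style flag. A plain flag (not a
 * counter) is used, so nested handlers are not handled correctly. */
static volatile sig_atomic_t in_signal_context = 0;
static volatile sig_atomic_t last_result = 0;

/* Write a NUL-terminated string with write(2); no strlen() or stdio. */
static void write_str(int fd, const char *s)
{
    size_t len = 0;
    ssize_t r;

    while (s[len] != '\0')
        len++;
    r = write(fd, s, len);
    (void)r;                        /* best effort: nothing to do on failure */
}

/* printf-style logger. In signal context it refuses to format, because
 * vfprintf() is not async-signal-safe, and instead emits a warning plus
 * the raw, unexpanded format string. Returns 0 if formatted, -1 if not. */
static int log_message(const char *fmt, ...)
{
    va_list ap;

    if (in_signal_context) {
        write_str(STDERR_FILENO,
                  "Warning: unsafe log call in signal context, format was: ");
        write_str(STDERR_FILENO, fmt);
        write_str(STDERR_FILENO, "\n");
        return -1;
    }
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 0;
}

static void on_sigusr1(int sig)
{
    int saved_errno = errno;

    in_signal_context = 1;
    last_result = log_message("caught signal %d\n", sig);
    in_signal_context = 0;
    errno = saved_errno;
}

static int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

Printing the raw format string is why the warning quoted above shows "%d: %s (%s+0x%lx) [%p]" instead of an expanded backtrace line: the check prevents the unsafe call but loses the arguments.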
cnd	Sarvatt, right, but backtrace printer is running in signal context...	21:58
cnd	s/but/the/	21:59
cnd	hmmm	21:59
Sarvatt	its awesome its not crashing the server anymore, can finally close my lid :)	21:59
cnd	Sarvatt, is it fixed with the patch I added?	21:59
Sarvatt	[mi] EQ overflow continuing. %lu events have been dropped.	21:59
Sarvatt	yeah i'm using your newest one http://paste.ubuntu.com/918083/	22:00
cnd	ok	22:00
cnd	Sarvatt, so based on your testing, the printing in a signal-safe manner fixes things?	22:01
cnd	just want to be sure	22:01
Sarvatt	it fixes things but the printing is screwed up and its screwing up the mieqEnqueue printing too	22:01
cnd	yeah	22:01
Sarvatt	thats me 10 finger + puppy paw pressing on the touchpad repeatedly in the log	22:03
Sarvatt	(canonical-tech email humor)	22:04
cnd	heh	22:05
Sarvatt	going to dupe all the bugs related to the crash i have on lid close on macs to the 10 finger one, its the same darn thing	22:05
cnd	yay	22:06
Sarvatt	bryceh: if you see [291943.052] 3: /usr/bin/X (valuator_mask_set_double+0x0) [0x7fea24624ab0] in any crash logs its a dupe of 974017	22:07
Sarvatt	macs are wigging out sending input events constantly from the lid when its closed	22:07
Sarvatt	there were a few duped to doko's bug	22:08
Sarvatt	https://bugs.launchpad.net/ubuntu/+source/xorg-server/+bug/933504 going through those now	22:09
ubottu	Launchpad bug 933504 in xorg-server (Ubuntu Precise) "Xorg crashed with SIGABRT in __libc_message "double free or corruption (out)" from DeleteInputDeviceRequest" [High,Triaged]	22:09
Sarvatt	cnd: is it known xinput testxi2 stopped working on macs in the past 2 weeks or so?	22:11
Sarvatt	BadAtom xerrors	22:11
cnd	Sarvatt, I know of it :)	22:11
cnd	haven't had a chance to look at it	22:12
Sarvatt	tried git xinput but no go, this was working a few weeks ago when i used it to see that input crap was happening from the lid	22:12
cnd	yeah	22:12
cnd	I don't have any idea what might have changed	22:12
Sarvatt	something in x-x-i-synaptics no doubt	22:13
cnd	might be	22:14
Sarvatt	i'll bisect that	22:14
Sarvatt	cnd: surprised you didn't think it would be good to enable the right button area even if it didn't apply to macs :P	22:17
Sarvatt	it was just mac people complaining	22:17
cnd	Sarvatt, I think it would be good	22:17
cnd	but it was a feature that came really late in the cycle	22:17
cnd	if we had another RC or beta release I would have pushed for it	22:17
cnd	and I certainly won't stand in the way if people who are interested want to raise it with the release team	22:18
cnd	I just didn't feel I had the justification to enable it myself	22:18
Sarvatt	i would have totally done it no questions asked, it only affects non apple clickpads which all do need it	22:18
Sarvatt	but yeah too late now with the freeze :(	22:19
bryceh	Sarvatt, ok thanks, valuator_mask_set_double sounds familiar	22:19
cnd	Sarvatt, hmm... olli ries has managed to crash X still by drumming on his magic trackpad	22:19
Sarvatt	with your new one?	22:20
cnd	unfortunately, the backtrace in signal context doesn't help :)	22:20
cnd	yeah	22:20
Sarvatt	hmm wth is the touchpad sending e for	22:20
Sarvatt	drumming my finger on the pad its going eeeee in irc	22:20
cnd	Sarvatt, do you have ginn running, by any chance?	22:21
* Sarvatt has 10 finger drumming going on and still hasnt crashed		22:21
Sarvatt	nope	22:21
cnd	dunno then	22:22
Sarvatt	i'm using utouch daily too	22:22
Sarvatt	the e's stopped	22:22
Sarvatt	is he clicking when hes drumming?	22:23
Sarvatt	4 minutes of drumming hasnt crashed anything here on bcm5974	22:23
Sarvatt	cnd: nothing new is getting sent to my Xorg.0.log which is weird	22:24
cnd	Sarvatt, what are you expecting?	22:24
Sarvatt	it stopped at [ 34.404] [mi] Increasing EQ size to 512 to prevent dropped events.	22:25
Sarvatt	[ 34.404] [mi] EQ processing has resumed after 249 dropped events.	22:25
Sarvatt	[ 34.404] [mi] This may be caused my a misbehaving driver monopolizing the server's resources.	22:25
Sarvatt	stopped logging totally	22:25
Sarvatt	that log i pastebinned is still the same, nothing new is getting written to the log	22:25
Sarvatt	5 minutes of drumming on the touchpad should have sent tons of spam :)	22:27
Sarvatt	yeah i plugged in an external monitor and it didnt log any of the edid probes	22:27
Sarvatt	cnd: ignore me, it did just now, thought i found a bug but nope	22:30
cnd	ok	22:31
Sarvatt	heh then X crashed	22:32
bryceh	%-}	22:33
Sarvatt	it stopped logging input stuff for a good 30 minutes, then plugging in an external and the edid probe writing to the log got it back where 10 fingers on the touchpad would crash it again	22:33
Sarvatt	10 finger press spams the log like hell with your patches, so i know it stopped logging input related things after that EQ processing message	22:34
cnd	huh	22:37
cnd	I should add an fsync to the logging function	22:37
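cnd's idea can be sketched as follows, assuming the log is a plain file descriptor rather than a stdio FILE. Both write() and fsync() are on the async-signal-safe list. Note that fsync() only forces data to stable storage; bytes written with write() are already visible to other readers of the file through the page cache, so fsync() mainly helps when the whole machine goes down, and whether it would explain the missing log lines is untested here. open_log and log_append_sync are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open (and truncate) a log file for appending. */
static int open_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);
}

/* Append len bytes and ask the kernel to flush them to storage. A signal
 * handler calling this should still save and restore errno around it.
 * Returns 0 on success, -1 on error. */
static int log_append_sync(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    /* fsync() fails with EINVAL or EROFS on special files such as pipes
     * and sockets; treat that as "nothing to flush", not as an error. */
    if (fsync(fd) != 0 && errno != EINVAL && errno != EROFS)
        return -1;
    return 0;
}
```

Calling fsync() on every message is expensive for a log that is being spammed by touch events, so a real implementation would probably limit it to the fatal-error path.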
cnd	maybe that'll fix things	22:37
bryceh	Sarvatt, no dupes for 974017 in xorg or xorg-server with valuator_mask_set_*	22:46
Sarvatt	bryceh: your dupe checker isnt checking for bugs that are already duped to another bug :)	22:46
Sarvatt	every time i filed it it tried to get duped to dokos bug	22:46
bryceh	Sarvatt, hum true	22:46
Sarvatt	has anyone asked doko if he still hits the bug?	22:47
Sarvatt	its eating up other valid bugs	22:47
Sarvatt	i just asked on the bug	22:49
Sarvatt	dont think apport will dupe to closed bugs	22:49
Sarvatt	when apport dupes it removes the useful info so will be good to start fresh if hes not hitting it, his specific bug he was hitting when glibc busted nvidia proprietary drivers	22:50
Sarvatt	which was fixed forever ago	22:51
bryceh	Sarvatt, sounds good	22:53
bryceh	Sarvatt, or we can add a tag to make apport recollect stuff	22:53
Sarvatt	proprietary drivers taking down X, signal handler called, input stack trying to print messages still to the log while its printing the backtrace from the real crash, every bug where input is writing to the log unsafely is getting duped to it	22:53
Sarvatt	bryceh: theres a tag for that?	22:54
bryceh	yep	22:56
bryceh	needs-retrace I think, lemme check	22:57
Sarvatt	i filed a bug, duped to doko's bug and all good logs removed, unduped it, and it automatically duped it yet again so i gave up and filed a new one and changed the info/deleted the core before it got retraced so it could be permanent	22:57
Sarvatt	pain in the butt	22:57
bryceh	nope, it's apport-request-retrace	22:58
bryceh	http://www.piware.de/2011/11/apport-1-90-client-side-duplicate-checking/	22:58
bryceh	$ ./search-attachments xorg-server ThreadStacktrace.txt valuator_mask_set_	22:58
bryceh	948792 Confirmed Xorg crashed with SIGSEGV in valuator_mask_set_double	22:58
bryceh	948697 Confirmed [bcm5974] Xorg crashed with SIGSEGV in valuator_mask_set_double()	22:58
bryceh	so, including dupes I only found those two	22:58
bryceh	both already properly duped	22:59
Sarvatt	you knew pinged me about someone who could reproduce it easily and had 2-3 bugs already filed last month	22:59
Sarvatt	think it was a canonical employee so it showed up on the bug radar management pushes	23:00
Sarvatt	google to the rescue, just duped 3 more	23:02
Sarvatt	of course the master bug is filed against unity when its not a unity problem	23:05