bregma | well, I'm going to try again: desktop Unity 8 has been dead in the water for a while, segfaulting in the Intel driver on uninitialized buffers when the Mir server does a glClear() | 00:02 |
---|---|---|
bregma | currently logged as https://bugs.launchpad.net/ubuntu/+source/unity8-desktop-session/+bug/1336854 | 00:03 |
ubot5 | Ubuntu bug 1336854 in unity8-desktop-session (Ubuntu) "Unity 8 fails to start, segfault in i965_dri.so" [Undecided,New] | 00:03 |
bregma | with an attached stack trace | 00:03 |
bregma | I'm looking for suggestions on what may have changed in the stack in the last week or two, and/or suggestions on how to chase down more information to help narrow the cause down | 00:03 |
RAOF | bregma: Yeah, I noticed that yesterday. | 00:03 |
RAOF | It's going to be a mesa problem. | 00:04 |
bregma | unity-system-compositor continues to run fine, so it could be something wacky higher in the stack, it's under a lot of churn lately too | 00:04 |
RAOF | usc is fine because it's not using the Mir EGL platform. | 00:05 |
RAOF | I suspect that you first saw this on the 23rd? | 00:05 |
RAOF | That's when Maarten uploaded the new mesa. | 00:05 |
bregma | sounds suspicious | 00:05 |
bregma | if I can perhaps revert to an older version, that would point the finger | 00:06 |
RAOF | Give it a whirl. | 00:06 |
RAOF | I'll poke around in the mesa code. | 00:06 |
bregma | FTR, reverting libgl1-mesa-dri to 10.1.3-0ubuntu0.1 fixes the segfault | 00:34 |
RAOF | Thanks. I'm installing a debug build, so it shouldn't be long before working out what's wrong and fixing it. | 00:42 |
RAOF | Ah! There's the problem. | 01:34 |
RAOF | bregma: Mesa 10 | 01:54 |
RAOF | bregma: Mesa 10.2.1-2ubuntu3 fixes your problem. Enjoy! | 01:55 |
duflu | racarr: What's a prompt session? | 02:57 |
racarr | duflu: trust sessions | 03:07 |
racarr | ;) | 03:07 |
racarr | https://wiki.ubuntu.com/Security/TrustStoreAndSessions | 03:07 |
racarr | sort of explains how the name prompt session | 03:08 |
racarr | comes about | 03:08 |
duflu | racarr: OK then... | 03:09 |
RAOF | Oh, whoops. We appear to call all manner of un-signal-safe functions in the emergency cleanup signal handler. | 06:25 |
RAOF | Including allocations! | 06:25 |
duflu | RAOF: Sounds a bit silly. Incidentally I mentioned we shouldn't ever do such a thing in a branch that's pending... https://code.launchpad.net/~vanvugt/mir/fatal-error/+merge/219471 | 06:48 |
RAOF | Yeah, reviewing that is what brought me to look at the emergency cleanup bit. | 06:48 |
duflu | RAOF: Well the latest emergency cleanup stuff I don't think I reviewed either | 06:49 |
duflu | The plate was piled too high in May | 06:49 |
duflu | Hmm Chrome in Trusty has annoyingly bad tearing (diagonal/triangles)... I wonder if that's Chrome or Compiz | 06:51 |
duflu | If it's Compiz then of course we don't care any more | 06:52 |
RAOF | I think it's Chrome; it started with the Aura drop, IIRC. | 06:53 |
* duflu looks up Aura | 06:53 | |
duflu | Oh, nice one Google... they really should be able to do better than that | 06:54 |
alf_ | RAOF: duflu: where are we doing allocations in the emergency cleanup handlers? | 07:12 |
duflu | No idea | 07:12 |
* duflu points to RAOF | 07:12 | |
RAOF | alf_: You assign to a vector; that's at least potentially doing allocation. | 07:13 |
duflu | Oh that will do it | 07:13 |
RAOF | Also, pthreads isn't threadsafe, right? | 07:13 |
duflu | Depending on the method... | 07:13 |
duflu | RAOF: Umm, what? | 07:13 |
RAOF | Ahem. | 07:13 |
RAOF | Signalsafe | 07:13 |
RAOF | 'cause we also lock a mutex in there. | 07:13 |
duflu | RAOF: It never used to be... Signals could arrive in _any_ thread but should arrive in just one. So unpredictable but safe I think | 07:14 |
RAOF | Right, but what happens if pthread_mutex_lock gets interrupted by a signal? | 07:14 |
duflu | If you want to determine the thread to use then pthread_kill | 07:14 |
alf_ | RAOF: duflu: at vector, right, that's easy to fix I will take a look | 07:15 |
duflu | RAOF: "If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread resumes waiting for the mutex as if it was not interrupted." | 07:15 |
duflu | [http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html] | 07:15 |
RAOF | I'm not sure that covers my concern. | 07:16 |
RAOF | But signal-safety documentation isn't the best :) | 07:16 |
duflu | RAOF: Well that's just the spec. Everyone has their own implementation | 07:17 |
RAOF | No, I mean that it's not clear that documentation covers the case I'm concerned about. | 07:17 |
alf_ | RAOF: "The mutex functions are not async-signal safe. What this means is that they should not be called from a signal handler. In particular, calling pthread_mutex_lock or pthread_mutex_unlock from a signal handler may deadlock the calling thread." | 07:17 |
alf_ | RAOF: :) | 07:17 |
RAOF | alf_: Ding! | 07:18 |
duflu | Certainly, locking in signal handlers is also bad | 07:18 |
duflu | Then again without such bugs I would have received more core files in my life time than I have | 07:18 |
RAOF | Basically, the set of things you can _safely_ do in a signal handler is poke at memory addresses :) | 07:18 |
RAOF | </hyperbole> | 07:19 |
duflu | I remember plenty of stack traces from customers showing code hanging in a signal handler after a crash :/ | 07:19 |
RAOF | Yup. | 07:20 |
RAOF | It's disturbingly easy to deadlock in one. | 07:20 |
alf_ | duflu: RAOF: We can drop the locks, making it clear that you are not to call the emergency cleanup while concurrently adding a handler | 07:21 |
alf_ | duflu: RAOF: plus the emergency cleanup is a best effort cleanup | 07:21 |
duflu | alf_: If we can guarantee the signal handler is only called once then I guess reduced locking is OK | 07:22 |
RAOF | Well, the signal handler isn't reentrant. | 07:22 |
RAOF | (By default) | 07:22 |
RAOF | But we should ensure that we don't do anything that's highly likely to deadlock there :) | 07:22 |
duflu | Also easy to do by accident though... if someone else asks the signal() function for the existing handler (yours) and calls it | 07:23 |
RAOF | ?! | 07:23 |
RAOF | Why on earth would they do that? | 07:23 |
alf_ | duflu: RAOF: and if we get rid of the locking requirement we might as well drop the vector copy | 07:24 |
duflu | RAOF: 3rd party libraries or general dealing with bad APIs where you need a signal handler | 07:24 |
RAOF | duflu: Oh, as in wrapping a signal handler? | 07:24 |
duflu | RAOF: I think that's one case | 07:25 |
duflu | alf_: If the vector never gets too big then an array is fine | 07:25 |
RAOF | alf_: Yup. We could, of course, make it a signal-safe data structure, but given that it's basically write-once I think we can happily read without locking. | 07:26 |
duflu | If it's write-once and that's guaranteed to happen before the read there is no race. And Helgrind etc will be happy without locking | 07:28 |
alf_ | duflu: RAOF: it says "The mutex functions are not *async-signal* safe", but the signals we handle with emergency cleanup are synchronous | 07:32 |
RAOF | Unless someone sends us SIGsomething :) | 07:34 |
duflu | Heh, bypass must be awesome if it's still doing my head in a year later | 07:35 |
=== doko_ is now known as doko | ||
=== vila_ is now known as vila | ||
=== alan_g is now known as alan_g|lunch | ||
=== alan_g|lunch is now known as alan_g | ||
=== pete-woods is now known as pete-woods-lunch | ||
=== alan_g is now known as alan_g|tea | ||
=== alan_g|tea is now known as alan_g | ||
=== greyback_ is now known as greyback|post | ||
=== pete-woods-lunch is now known as pete-woods | ||
=== greyback|post is now known as greyback | ||
=== chihchun is now known as chihchun_afk | ||
=== dandrader is now known as dandrader|lunch | ||
=== alan_g is now known as alan_g|EOD | ||
=== dandrader|lunch is now known as dandrader | ||
dobey | who would be best to bug to maybe get a crash in libmirclient on application exit on the phone fixed asap today? | 18:09 |
AlbertA | dobey: bug #? | 18:10 |
AlbertA | dobey: which image# ? | 18:10 |
AlbertA | racarr: can you take a look at https://code.launchpad.net/~albaguirre/unity-system-compositor/no-inactivity-handling-desktop/+merge/225537 | 18:11 |
AlbertA | racarr: it's fairly small :) | 18:11 |
dobey | AlbertA: on 111, but looks like it's been happening for a little while. i haven't filed a bug yet, the reports on errors.u.c failed to retrace, so there's no "create a bug" link for them. trying to determine the best way to file the bug | 18:11 |
=== renato_ is now known as Guest58058 | ||
AlbertA | dobey: so any application exiting crashes? | 18:12 |
dobey | should i just file it with the top of the stack trace and a link to the errors? | 18:12 |
AlbertA | dobey: what's the fastest way to reproduce? | 18:12 |
dobey | AlbertA: any using the mir backend of qt/qml afaict. open clock app, wait a few seconds, and close it, and there should be a crash report in /var/crash/ | 18:12 |
dobey | https://errors.ubuntu.com/problem/6552ba4342afeb93d20e22711ac36f655cd885d8 | 18:13 |
dobey | or go to online accounts, then hit back and should result in a crash report too | 18:14 |
AlbertA | dobey: I couldn't access that link | 18:16 |
AlbertA | dobey: but I'll take a look, if you can submit a bug # that would be great | 18:16 |
dobey | AlbertA: sure. should i just copy/paste the top of the stack trace in the bug? | 18:17 |
AlbertA | dobey: sure | 18:17 |
dobey | ok will do | 18:18 |
racarr | AlbertA: Sure | 18:21 |
dobey | AlbertA: https://bugs.launchpad.net/ubuntu/+source/mir/+bug/1337481 | 18:28 |
ubot5 | Ubuntu bug 1337481 in mir (Ubuntu) "Crash in libmirclient on app exit on phone" [Undecided,New] | 18:28 |
=== dandrader is now known as dandrader|afk | ||
racarr | dandrader|afk: greyback: https://code.launchpad.net/~unity-team/platform-api/devel-for-qtmircompositor/+merge/225320 https://code.launchpad.net/~unity-team/platform-api/devel-for-qtmircompositor/+merge/225320 | 18:59 |
racarr | err whoops | 18:59 |
racarr | but what is up with line 427 | 18:59 |
racarr | It looks like, state_before_hiding is being used to save the state across minimize state changes | 19:01 |
racarr | but why initialize to MAXIMIZED? There must be a default or currentstate | 19:01 |
greyback | racarr: indeed. I can't say. Hope dandrader|afk can reply | 19:01 |
racarr | anyway good besides that | 19:02 |
racarr | if he doesnt come back soon ill leave some comments | 19:03 |
racarr | on launchpad | 19:03 |
greyback | racarr: please do, and I'll try to fix (daniel off on hols at eod today) | 19:03 |
racarr | :) sounds nice | 19:04 |
racarr | racarr off on holidays in 51 days | 19:04 |
racarr | lol | 19:04 |
=== dandrader|afk is now known as dandrader | ||
dandrader | racarr, right, I'm just initializing it to something | 21:43 |
dandrader | racarr, the first time you call show(), the state will be set to this value | 21:44 |
dandrader | racarr, which is what we want on phablet anyway | 21:46 |
dandrader | racarr, "restored" windows make no sense there | 21:46 |
dandrader | racarr, and the "beautiful" papi API is not expressive enough at the moment for the user to set the mir surface states properly. eg: "papi::window::show() == mir::surface::set_state(maximized) or mir::surface::set_state(restored)?" | 21:48 |
dandrader | racarr, so I don't bother with this sad situation and just let it go maximized, which is what we want | 21:48 |
dandrader | and now I'm repeating myself... | 21:48 |
dandrader | racarr, so if you want to fix things you would have to rewrite that papi api to be a 1-to-1 mapping of mir | 21:49 |
dandrader | racarr, which is a waste of developer time IMHO | 21:49 |
dandrader | and would also require changing papi users: ie, qtubuntu | 21:51 |
=== mterry is now known as 18VAAOQDS | ||
=== 18VAAOQDS is now known as mterry |
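
The signal-handler discussion above (07:12 to 07:34) settles on dropping the mutex and the vector copy, because locking and allocating are not async-signal-safe. A minimal sketch of that idea follows. Every name in it is hypothetical and it is not Mir's actual emergency cleanup code. It assumes handlers are registered once during start-up, from a single thread, before the signal handler is installed, and that `std::atomic<std::size_t>` is lock-free on the target platform.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <signal.h>

// Hypothetical names throughout; this is an illustration, not Mir's API.
// Plain function pointers avoid std::function, which may allocate.
using CleanupFn = void (*)();

constexpr std::size_t max_cleanup_handlers = 8;

// Fixed-size storage: neither registration nor the signal path allocates.
static CleanupFn cleanup_handlers[max_cleanup_handlers];
// Assumed to be lock-free, which holds on mainstream platforms.
static std::atomic<std::size_t> num_cleanup_handlers{0};

// Call only during start-up, from one thread, before installing the
// signal handler. Returns false when the table is full.
bool add_cleanup_handler(CleanupFn handler)
{
    assert(handler != nullptr);
    std::size_t const n = num_cleanup_handlers.load(std::memory_order_relaxed);
    if (n == max_cleanup_handlers)
        return false;
    cleanup_handlers[n] = handler;
    // Publish the new count only after the slot has been written.
    num_cleanup_handlers.store(n + 1, std::memory_order_release);
    return true;
}

// Takes no locks and performs no allocation. The registered functions
// must themselves stick to async-signal-safe operations.
void run_cleanup_handlers()
{
    std::size_t const n = num_cleanup_handlers.load(std::memory_order_acquire);
    for (std::size_t i = 0; i < n; ++i)
        cleanup_handlers[i]();
}

extern "C" void fatal_signal_handler(int sig)
{
    run_cleanup_handlers();
    // SA_RESETHAND (see below) has already restored the default action,
    // so re-raising terminates the process and can still produce a core file.
    raise(sig);
}

// Installs the one-shot handler for a single signal number.
bool install_emergency_cleanup(int sig)
{
    struct sigaction sa{};
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(sig, &sa, nullptr) == 0;
}
```

Registration order is the whole contract here: call `add_cleanup_handler` for each cleanup step first, then `install_emergency_cleanup` for each fatal signal. That is the "write-once, then read without locking" arrangement duflu and RAOF describe, and it stays valid even when someone sends the signal asynchronously.
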