| Nick | Message | Time |
|---|---|---|
| bregma | well, I'm going to try again: desktop Unity 8 has been dead in the water for a while, segfaulting in the Intel driver on uninitialized buffers when the Mir server does a glClear() | 00:02 |
| bregma | currently logged as https://bugs.launchpad.net/ubuntu/+source/unity8-desktop-session/+bug/1336854 | 00:03 |
| ubot5 | Ubuntu bug 1336854 in unity8-desktop-session (Ubuntu) "Unity 8 fails to start, segfault in i965_dri.so" [Undecided,New] | 00:03 |
| bregma | with an attached stack trace | 00:03 |
| bregma | I'm looking for suggestions on what may have changed in the stack in the last week or two, and/or suggestions on how to chase down more information to help narrow the cause down | 00:03 |
| RAOF | bregma: Yeah, I noticed that yesterday. | 00:03 |
| RAOF | It's going to be a mesa problem. | 00:04 |
| bregma | unity-system-compositor continues to run fine, so it could be something wacky higher in the stack, it's under a lot of churn lately too | 00:04 |
| RAOF | usc is fine because it's not using the Mir EGL platform. | 00:05 |
| RAOF | I suspect that you first saw this on the 23rd? | 00:05 |
| RAOF | That's when Maarten uploaded the new mesa. | 00:05 |
| bregma | sounds suspicious | 00:05 |
| bregma | if I can perhaps revert to an older version, that would point the finger | 00:06 |
| RAOF | Give it a whirl. | 00:06 |
| RAOF | I'll poke around in the mesa code. | 00:06 |
| bregma | FTR, reverting libgl1-mesa-dri to 10.1.3-0ubuntu0.1 fixes the segfault | 00:34 |
| RAOF | Thanks. I'm installing a debug build, so it shouldn't be long before working out what's wrong and fixing it. | 00:42 |
| RAOF | Ah! There's the problem. | 01:34 |
| RAOF | bregma: Mesa 10 | 01:54 |
| RAOF | bregma: Mesa 10.2.1-2ubuntu3 fixes your problem. Enjoy! | 01:55 |
| duflu | racarr: What's a prompt session? | 02:57 |
| racarr | duflu: trust sessions | 03:07 |
| racarr | ;) | 03:07 |
| racarr | https://wiki.ubuntu.com/Security/TrustStoreAndSessions | 03:07 |
| racarr | sort of explains how the name prompt session | 03:08 |
| racarr | comes about | 03:08 |
| duflu | racarr: OK then... | 03:09 |
| RAOF | Oh, whoops. We appear to call all manner of un-signal-safe functions in the emergency cleanup signal handler. | 06:25 |
| RAOF | Including allocations! | 06:25 |
| duflu | RAOF: Sounds a bit silly. Incidentally I mentioned we shouldn't ever do such a thing in a branch that's pending... https://code.launchpad.net/~vanvugt/mir/fatal-error/+merge/219471 | 06:48 |
| RAOF | Yeah, reviewing that is what brought me to look at the emergency cleanup bit. | 06:48 |
| duflu | RAOF: Well the latest emergency cleanup stuff I don't think I reviewed either | 06:49 |
| duflu | The plate was piled too high in May | 06:49 |
| duflu | Hmm Chrome in Trusty has annoyingly bad tearing (diagonal/triangles)... I wonder if that's Chrome or Compiz | 06:51 |
| duflu | If it's Compiz then of course we don't care any more | 06:52 |
| RAOF | I think it's Chrome; it started with the Aura drop, IIRC. | 06:53 |
| * duflu | looks up Aura | 06:53 |
| duflu | Oh, nice one Google... they really should be able to do better than that | 06:54 |
| alf_ | RAOF: duflu: where are we doing allocations in the emergency cleanup handlers? | 07:12 |
| duflu | No idea | 07:12 |
| * duflu | points to RAOF | 07:12 |
| RAOF | alf_: You assign to a vector; that's at least potentially doing allocation. | 07:13 |
| duflu | Oh that will do it | 07:13 |
| RAOF | Also, pthreads isn't threadsafe, right? | 07:13 |
| duflu | Depending on the method... | 07:13 |
| duflu | RAOF: Umm, what? | 07:13 |
| RAOF | Ahem. | 07:13 |
| RAOF | Signalsafe | 07:13 |
| RAOF | 'cause we also lock a mutex in there. | 07:13 |
| duflu | RAOF: It never used to be... Signals could arrive in _any_ thread but should arrive in just one. So unpredictable but safe I think | 07:14 |
| RAOF | Right, but what happens if pthread_lock gets interrupted by a signal? | 07:14 |
| duflu | If you want to determine the thread to use then pthread_kill | 07:14 |
| alf_ | RAOF: duflu: at vector, right, that's easy to fix I will take a look | 07:15 |
| duflu | RAOF: "If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread resumes waiting for the mutex as if it was not interrupted." | 07:15 |
| duflu | [http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html] | 07:15 |
| RAOF | I'm not sure that covers my concern. | 07:16 |
| RAOF | But signal-safety documentation isn't the best :) | 07:16 |
| duflu | RAOF: Well that's just the spec. Everyone has their own implementation | 07:17 |
| RAOF | No, I mean that it's not clear that documentation covers the case I'm concerned about. | 07:17 |
| alf_ | RAOF: "The mutex functions are not async-signal safe. What this means is that they should not be called from a signal handler. In particular, calling pthread_mutex_lock or pthread_mutex_unlock from a signal handler may deadlock the calling thread." | 07:17 |
| alf_ | RAOF: :) | 07:17 |
| RAOF | alf_: Ding! | 07:18 |
| duflu | Certainly, locking in signal handlers is also bad | 07:18 |
| duflu | Then again without such bugs I would have received more core files in my lifetime than I have | 07:18 |
| RAOF | Basically, the set of things you can _safely_ do in a signal handler is poke at memory addresses :) | 07:18 |
| RAOF | </hyperbole> | 07:19 |
| duflu | I remember plenty of stack traces from customers showing code hanging in a signal handler after a crash :/ | 07:19 |
| RAOF | Yup. | 07:20 |
| RAOF | It's disturbingly easy to deadlock in one. | 07:20 |
| alf_ | duflu: RAOF: We can drop the locks, making it clear that you are not to call the emergency cleanup while concurrently adding a handler | 07:21 |
| alf_ | duflu: RAOF: plus the emergency cleanup is a best effort cleanup | 07:21 |
| duflu | alf_: If we can guarantee the signal handler is only called once then I guess reduced locking is OK | 07:22 |
| RAOF | Well, the signal handler isn't reentrant. | 07:22 |
| RAOF | (By default) | 07:22 |
| RAOF | But we should ensure that we don't do anything that's highly likely to deadlock there :) | 07:22 |
| duflu | Also easy to do by accident though... if someone else asks the signal() function for the existing handler (yours) and calls it | 07:23 |
| RAOF | ?! | 07:23 |
| RAOF | Why on earth would they do that? | 07:23 |
| alf_ | duflu: RAOF: and if we get rid of the locking requirement we might as well drop the vector copy | 07:24 |
| duflu | RAOF: 3rd party libraries or general dealing with bad APIs where you need a signal handler | 07:24 |
| RAOF | duflu: Oh, as in wrapping a signal handler? | 07:24 |
| duflu | RAOF: I think that's one case | 07:25 |
| duflu | alf_: If the vector never gets too big then an array is fine | 07:25 |
| RAOF | alf_: Yup. We could, of course, make it a signal-safe data structure, but given that it's basically write-once I think we can happily read without locking. | 07:26 |
| duflu | If it's write-once and that's guaranteed to happen before the read there is no race. And Helgrind etc will be happy without locking | 07:28 |
| alf_ | duflu: RAOF: it says "The mutex functions are not *async-signal* safe", but the signals we handle with emergency cleanup are synchronous | 07:32 |
| RAOF | Unless someone sends us SIGsomething :) | 07:34 |
| duflu | Heh, bypass must be awesome if it's still doing my head in a year later | 07:35 |
| === doko_ is now known as doko | ||
| === vila_ is now known as vila | ||
| === alan_g is now known as alan_g|lunch | ||
| === alan_g|lunch is now known as alan_g | ||
| === pete-woods is now known as pete-woods-lunch | ||
| === alan_g is now known as alan_g|tea | ||
| === alan_g|tea is now known as alan_g | ||
| === greyback_ is now known as greyback|post | ||
| === pete-woods-lunch is now known as pete-woods | ||
| === greyback|post is now known as greyback | ||
| === chihchun is now known as chihchun_afk | ||
| === dandrader is now known as dandrader|lunch | ||
| === alan_g is now known as alan_g|EOD | ||
| === dandrader|lunch is now known as dandrader | ||
| dobey | who would be best to bug to maybe get a crash in libmirclient on application exit on the phone fixed asap today? | 18:09 |
| AlbertA | dobey: bug #? | 18:10 |
| AlbertA | dobey: which image# ? | 18:10 |
| AlbertA | racarr: can you take a look at https://code.launchpad.net/~albaguirre/unity-system-compositor/no-inactivity-handling-desktop/+merge/225537 | 18:11 |
| AlbertA | racarr: it's fairly small :) | 18:11 |
| dobey | AlbertA: on 111, but looks like it's been happening for a little while. i haven't filed a bug yet, the reports on errors.u.c failed to retrace, so there's no "create a bug" link for them. trying to determine the best way to file the bug | 18:11 |
| === renato_ is now known as Guest58058 | ||
| AlbertA | dobey: so any application exiting crashes? | 18:12 |
| dobey | should i just file it with the top of the stack trace and a link to the errors? | 18:12 |
| AlbertA | dobey: what's the fastest way to reproduce? | 18:12 |
| dobey | AlbertA: any using the mir backend of qt/qml afaict. open clock app, wait a few seconds, and close it, and there should be a crash report in /var/crash/ | 18:12 |
| dobey | https://errors.ubuntu.com/problem/6552ba4342afeb93d20e22711ac36f655cd885d8 | 18:13 |
| dobey | or go to online accounts, then hit back and that should result in a crash report too | 18:14 |
| AlbertA | dobey: I couldn't access that link | 18:16 |
| AlbertA | dobey: but I'll take a look, if you can submit a bug # that would be great | 18:16 |
| dobey | AlbertA: sure. should i just copy/paste the top of the stack trace in the bug? | 18:17 |
| AlbertA | dobey: sure | 18:17 |
| dobey | ok will do | 18:18 |
| racarr | AlbertA: Sure | 18:21 |
| dobey | AlbertA: https://bugs.launchpad.net/ubuntu/+source/mir/+bug/1337481 | 18:28 |
| ubot5 | Ubuntu bug 1337481 in mir (Ubuntu) "Crash in libmirclient on app exit on phone" [Undecided,New] | 18:28 |
| === dandrader is now known as dandrader|afk | ||
| racarr | dandrader|afk: greyback: https://code.launchpad.net/~unity-team/platform-api/devel-for-qtmircompositor/+merge/225320 | 18:59 |
| racarr | err whoops | 18:59 |
| racarr | but what is up with line 427 | 18:59 |
| racarr | It looks like, state_before_hiding is being used to save the state across minimize state changes | 19:01 |
| racarr | but why initialize to MAXIMIZED? There must be a default or currentstate | 19:01 |
| greyback | racarr: indeed. I can't say. Hope dandrader|afk can reply | 19:01 |
| racarr | anyway good besides that | 19:02 |
| racarr | if he doesn't come back soon I'll leave some comments | 19:03 |
| racarr | on launchpad | 19:03 |
| greyback | racarr: please do, and I'll try to fix (daniel off on hols at eod today) | 19:03 |
| racarr | :) sounds nice | 19:04 |
| racarr | racarr off on holidays in 51 days | 19:04 |
| racarr | lol | 19:04 |
| === dandrader|afk is now known as dandrader | ||
| dandrader | racarr, right, I'm just initializing it to something | 21:43 |
| dandrader | racarr, the first time you call show(), the state will be set to this value | 21:44 |
| dandrader | racarr, which is what we want on phablet anyway | 21:46 |
| dandrader | racarr, "restored" windows make no sense there | 21:46 |
| dandrader | racarr, and the "beautiful" papi API is not expressive enough at the moment for the user to set the mir surface states properly. eg: "papi::window::show() == mir::surface::set_state(maximized) or mir::surface::set_state(restored)?" | 21:48 |
| dandrader | racarr, so I don't bother with this sad situation and just let it go maximized, which is what we want | 21:48 |
| dandrader | and now I'm repeating myself... | 21:48 |
| dandrader | racarr, so if you want to fix this you would have to rewrite that papi api to be a 1-to-1 mapping of mir | 21:49 |
| dandrader | racarr, which is a waste of developer time IMHO | 21:49 |
| dandrader | and would also require changing papi users: ie, qtubuntu | 21:51 |
| === mterry is now known as 18VAAOQDS | ||
| === 18VAAOQDS is now known as mterry | ||
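
From 07:12 to 07:28 the emergency-cleanup fix converged on dropping both the mutex and the vector copy, on the grounds that the handler list is effectively written once at startup. A minimal C++ sketch of that idea follows. It assumes a fixed upper bound on the number of handlers and that all registration finishes before any signal can arrive; `add_cleanup_handler`, `run_emergency_cleanup` and `max_cleanup_handlers` are hypothetical names, not Mir's API.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical emergency-cleanup registry. Assumption: every handler is
// registered from ordinary code during startup, before any signal that runs
// the cleanup can arrive, so the signal-handler side never allocates or locks.
using CleanupFn = void (*)();

constexpr std::size_t max_cleanup_handlers = 16;  // assumed fixed upper bound
std::array<CleanupFn, max_cleanup_handlers> cleanup_handlers{};
std::atomic<std::size_t> cleanup_count{0};

// Ordinary (non-signal) context only. Not safe against concurrent
// registration from several threads; that matches the write-once assumption.
bool add_cleanup_handler(CleanupFn fn)
{
    assert(fn != nullptr);
    std::size_t const n = cleanup_count.load(std::memory_order_relaxed);
    if (n == max_cleanup_handlers)
        return false;  // full: report failure instead of reallocating
    cleanup_handlers[n] = fn;
    // Release: the slot is written before the larger count becomes visible.
    cleanup_count.store(n + 1, std::memory_order_release);
    return true;
}

// Intended to be called from the signal handler: no allocation, no mutex.
// Relies on std::atomic<std::size_t> being lock-free on the target.
void run_emergency_cleanup()
{
    std::size_t const n = cleanup_count.load(std::memory_order_acquire);
    for (std::size_t i = 0; i != n; ++i)
        cleanup_handlers[i]();
}
```

The fixed array is the design choice that matters: registration can fail once the array is full, but reading it never allocates, which is what assigning to a `std::vector` inside the handler could not guarantee.
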
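
On the deadlock concern (07:17 to 07:22): rather than taking a mutex, a handler can use a lock-free flag to make sure the cleanup runs at most once, and restrict itself to functions on the POSIX async-signal-safe list such as `write()`. The sketch below assumes that "run once, ignore later signals" is the desired policy; `emergency_handler` and `install_emergency_handler` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#include <unistd.h>

// Hypothetical once-only emergency handler.
std::atomic_flag cleanup_started = ATOMIC_FLAG_INIT;
volatile sig_atomic_t cleanup_runs = 0;

extern "C" void emergency_handler(int)
{
    // std::atomic_flag is guaranteed lock-free, so test_and_set() is usable
    // here, unlike pthread_mutex_lock(), which may deadlock in a handler.
    if (cleanup_started.test_and_set())
        return;  // cleanup already started: do nothing
    cleanup_runs = cleanup_runs + 1;
    // write(2) is async-signal-safe; printf() and friends are not.
    static char const msg[] = "emergency cleanup: running\n";
    ssize_t const written = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)written;
}

void install_emergency_handler(int signo)
{
    struct sigaction sa{};
    sa.sa_handler = emergency_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;  // no SA_NODEFER: signo stays blocked while the handler runs
    int const rc = sigaction(signo, &sa, nullptr);
    assert(rc == 0);
    (void)rc;
}
```

Leaving `SA_NODEFER` unset keeps the same signal blocked while the handler runs, which is the default non-reentrancy RAOF mentions at 07:22; the flag additionally covers a second signal (a different signal number, or one delivered to another thread) arriving while cleanup is in progress.
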
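
duflu's 07:23 scenario, where other code asks for the currently installed handler and calls it, is ordinary handler chaining. With `sigaction()` the previous disposition comes back through the third argument, so a wrapper can do its own work and then forward to it. A sketch with hypothetical names; it deliberately does not emulate `SIG_DFL`.

```cpp
#include <cassert>
#include <signal.h>

// Disposition that was installed before ours (zero-initialised global).
struct sigaction previous_action;
volatile sig_atomic_t chained_handler_ran = 0;

extern "C" void chaining_handler(int signo, siginfo_t* info, void* context)
{
    chained_handler_ran = 1;  // our own (signal-safe) work goes here
    // Forward to whatever handler was there before.
    if (previous_action.sa_flags & SA_SIGINFO)
    {
        if (previous_action.sa_sigaction != nullptr)
            previous_action.sa_sigaction(signo, info, context);
    }
    else if (previous_action.sa_handler != SIG_DFL &&
             previous_action.sa_handler != SIG_IGN)
    {
        previous_action.sa_handler(signo);
    }
    // SIG_DFL / SIG_IGN are not emulated in this sketch.
}

void install_chaining_handler(int signo)
{
    struct sigaction sa{};
    sa.sa_sigaction = chaining_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    // The third argument receives the handler we are replacing.
    int const rc = sigaction(signo, &sa, &previous_action);
    assert(rc == 0);
    (void)rc;
}
```

This is also why a handler can run in a context its author did not expect: whoever chains to it decides when it is called, so keeping the body signal-safe matters even though the signal itself is not reentrant by default.
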
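
On alf_'s point that the signals handled by emergency cleanup are synchronous, and RAOF's reply that someone can still send one (07:32 to 07:34): with `SA_SIGINFO` the handler can inspect `si_code`. The sketch assumes Linux, where non-positive codes (`SI_USER`, `SI_QUEUE`, `SI_TKILL`, ...) mean the signal came from a call such as `kill()`, `sigqueue()` or `tgkill()`, while a genuine fault carries a positive code such as `SEGV_MAPERR`. The names are hypothetical.

```cpp
#include <cassert>
#include <signal.h>

volatile sig_atomic_t signal_seen = 0;
volatile sig_atomic_t sent_from_user_space = 0;

extern "C" void classifying_handler(int, siginfo_t* info, void*)
{
    signal_seen = 1;
    // Linux: si_code <= 0 means a kill()/sigqueue()/tgkill()-style call sent
    // the signal; a real fault gives a positive code (SEGV_MAPERR, ...).
    sent_from_user_space = info->si_code <= 0;
}

void install_classifying_handler(int signo)
{
    struct sigaction sa{};
    sa.sa_sigaction = classifying_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    int const rc = sigaction(signo, &sa, nullptr);
    assert(rc == 0);
    (void)rc;
}
```

The check only classifies the signal; a handler for a real fault still must not simply return, because the faulting instruction would execute again.
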
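
dandrader's explanation (21:43 to 21:48) of why `state_before_hiding` starts out as maximized can be modelled in a few lines. This is a hypothetical sketch: `Window`, `SurfaceState` and the member names are illustrative and are not the platform-api or Mir types.

```cpp
#include <cassert>

// Illustrative surface states, not Mir's enum.
enum class SurfaceState { restored, minimized, maximized, fullscreen };

class Window
{
public:
    SurfaceState state() const { return state_; }

    void set_state(SurfaceState s) { state_ = s; }

    void hide()
    {
        if (state_ != SurfaceState::minimized)
            state_before_hiding_ = state_;  // remember what show() restores
        state_ = SurfaceState::minimized;
    }

    void show()
    {
        assert(state_before_hiding_ != SurfaceState::minimized);
        // On the first show() this is still the initial value below, so a
        // new window comes up maximized, which is what the phone wants.
        if (state_ == SurfaceState::minimized)
            state_ = state_before_hiding_;
    }

private:
    SurfaceState state_ = SurfaceState::minimized;  // assumed: hidden until first show()
    SurfaceState state_before_hiding_ = SurfaceState::maximized;
};
```

Because the saved state only changes when a visible window is hidden, the initial value acts as the default for the first `show()`, which matches the behaviour dandrader describes for phablet.
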