[00:02] well, I'm going to try again: desktop Unity 8 has been dead in the water for a while, segfaulting in the Intel driver on uninitialized buffers when the Mir server does a glClear()
[00:03] currently logged as https://bugs.launchpad.net/ubuntu/+source/unity8-desktop-session/+bug/1336854
[00:03] Ubuntu bug 1336854 in unity8-desktop-session (Ubuntu) "Unity 8 fails to start, segfault in i965_dri.so" [Undecided,New]
[00:03] with an attached stack trace
[00:03] I'm looking for suggestions on what may have changed in the stack in the last week or two, and/or suggestions on how to chase down more information to help narrow the cause down
[00:03] bregma: Yeah, I noticed that yesterday.
[00:04] It's going to be a mesa problem.
[00:04] unity-system-compositor continues to run fine, so it could be something wacky higher in the stack, it's under a lot of churn lately too
[00:05] usc is fine because it's not using the Mir EGL platform.
[00:05] I suspect that you first saw this on the 23rd?
[00:05] That's when Maarten uploaded the new mesa.
[00:05] sounds suspicious
[00:06] if I can perhaps revert to an older version, that would point the finger
[00:06] Give it a whirl.
[00:06] I'll poke around in the mesa code.
[00:34] FTR, reverting libgl1-mesa-dri to 10.1.3-0ubuntu0.1 fixes the segfault
[00:42] Thanks. I'm installing a debug build, so it shouldn't be long before working out what's wrong and fixing it.
[01:34] Ah! There's the problem.
[01:54] bregma: Mesa 10
[01:55] bregma: Mesa 10.2.1-2ubuntu3 fixes your problem. Enjoy!
[02:57] racarr: What's a prompt session?
[03:07] duflu: trust sessions
[03:07] ;)
[03:07] https://wiki.ubuntu.com/Security/TrustStoreAndSessions
[03:08] sort of explains how the name prompt session
[03:08] comes about
[03:09] racarr: OK then...
[06:25] Oh, whoops. We appear to call all manner of un-signal-safe functions in the emergency cleanup signal handler.
[06:25] Including allocations!
[06:48] RAOF: Sounds a bit silly.
Incidentally I mentioned we shouldn't ever do such a thing in a branch that's pending... https://code.launchpad.net/~vanvugt/mir/fatal-error/+merge/219471
[06:48] Yeah, reviewing that is what brought me to look at the emergency cleanup bit.
[06:49] RAOF: Well the latest emergency cleanup stuff I don't think I reviewed either
[06:49] The plate was piled too high in May
[06:51] Hmm Chrome in Trusty has annoyingly bad tearing (diagonal/triangles)... I wonder if that's Chrome or Compiz
[06:52] If it's Compiz then of course we don't care any more
[06:53] I think it's Chrome; it started with the Aura drop, IIRC.
[06:53] * duflu looks up Aura
[06:54] Oh, nice one Google... they really should be able to do better than that
[07:12] RAOF: duflu: where are we doing allocations in the emergency cleanup handlers?
[07:12] No idea
[07:12] * duflu points to RAOF
[07:13] alf_: You assign to a vector; that's at least potentially doing allocation.
[07:13] Oh that will do it
[07:13] Also, pthreads isn't threadsafe, right?
[07:13] Depending on the method...
[07:13] RAOF: Umm, what?
[07:13] Ahem.
[07:13] Signalsafe
[07:13] 'cause we also lock a mutex in there.
[07:14] RAOF: It never used to be... Signals could arrive in _any_ thread but should arrive in just one. So unpredictable but safe I think
[07:14] Right, but what happens if pthread_lock gets interrupted by a signal?
[07:14] If you want to determine the thread to use then pthread_kill
[07:15] RAOF: duflu: at vector, right, that's easy to fix I will take a look
[07:15] RAOF: "If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread resumes waiting for the mutex as if it was not interrupted."
[07:15] [http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html]
[07:16] I'm not sure that covers my concern.
[07:16] But signal-safety documentation isn't the best :)
[07:17] RAOF: Well that's just the spec.
Everyone has their own implementation
[07:17] No, I mean that it's not clear that documentation covers the case I'm concerned about.
[07:17] RAOF: "The mutex functions are not async-signal safe. What this means is that they should not be called from a signal handler. In particular, calling pthread_mutex_lock or pthread_mutex_unlock from a signal handler may deadlock the calling thread."
[07:17] RAOF: :)
[07:18] alf_: Ding!
[07:18] Certainly, locking in signal handlers is also bad
[07:18] Then again without such bugs I would have received more core files in my life time than I have
[07:18] Basically, the set of things you can _safely_ do in a signal handler is poke at memory addresses :)
[07:19] I remember plenty of stack traces from customers showing code hanging in a signal handler after a crash :/
[07:20] Yup.
[07:20] It's disturbingly easy to deadlock in one.
[07:21] duflu: RAOF: We can drop the locks, making it clear that you are not to call the emergency cleanup while concurrently adding a handler
[07:21] duflu: RAOF: plus the emergency cleanup is a best effort cleanup
[07:22] alf_: If we can guarantee the signal handler is only called once then I guess reduced locking is OK
[07:22] Well, the signal handler isn't reentrant.
[07:22] (By default)
[07:22] But we should ensure that we don't do anything that's highly likely to deadlock there :)
[07:23] Also easy to do by accident though... if someone else asks the signal() function for the existing handler (yours) and calls it
[07:23] ?!
[07:23] Why on earth would they do that?
[07:24] duflu: RAOF: and if we get rid of the locking requirement we might as well drop the vector copy
[07:24] RAOF: 3rd party libraries or general dealing with bad APIs where you need a signal handler
[07:24] duflu: Oh, as in wrapping a signal handler?
[07:25] RAOF: I think that's one case
[07:25] alf_: If the vector never gets too big then an array is fine
[07:26] alf_: Yup.
We could, of course, make it a signal-safe data structure, but given that it's basically write-once I think we can happily read without locking.
[07:28] If it's write-once and that's guaranteed to happen before the read there is no race. And Helgrind etc will be happy without locking
[07:32] duflu: RAOF: it says "The mutex functions are not *async-signal* safe", but the signals we handle with emergency cleanup are synchronous
[07:34] Unless someone sends us SIGsomething :)
[07:35] Heh, bypass must be awesome if it's still doing my head in a year later
=== doko_ is now known as doko
=== vila_ is now known as vila
=== alan_g is now known as alan_g|lunch
=== alan_g|lunch is now known as alan_g
=== pete-woods is now known as pete-woods-lunch
=== alan_g is now known as alan_g|tea
=== alan_g|tea is now known as alan_g
=== greyback_ is now known as greyback|post
=== pete-woods-lunch is now known as pete-woods
=== greyback|post is now known as greyback
=== chihchun is now known as chihchun_afk
=== dandrader is now known as dandrader|lunch
=== alan_g is now known as alan_g|EOD
=== dandrader|lunch is now known as dandrader
[18:09] who would be best to bug to maybe get a crash in libmirclient on application exit on the phone fixed asap today?
[18:10] dobey: bug #?
[18:10] dobey: which image# ?
[18:11] racarr: can you take a look at https://code.launchpad.net/~albaguirre/unity-system-compositor/no-inactivity-handling-desktop/+merge/225537
[18:11] racarr: it's fairly small :)
[18:11] AlbertA: on 111, but looks like it's been happening for a little while. i haven't filed a bug yet, the reports on errors.u.c failed to retrace, so there's no "create a bug" link for them. trying to determine the best way to file the bug
=== renato_ is now known as Guest58058
[18:12] dobey: so any application exiting crashes?
[18:12] should i just file it with the top of the stack trace and a link to the errors?
[18:12] dobey: what's the fastest way to reproduce?
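Editor's sketch for the emergency-cleanup thread (07:12 to 07:34): one way the outcome discussed there (no mutex, no vector copy, a small write-once array read without locking) might look. Every name here (add_cleanup_handler, run_emergency_cleanup, max_handlers, fatal_signal_handler) is hypothetical and not taken from Mir. It assumes handlers are registered from a single thread before the signal handler is installed, that std::atomic<std::size_t> is lock-free on the target, and that each registered function is itself async-signal-safe.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstddef>

// Hypothetical names throughout; this is not Mir's emergency-cleanup code.
using CleanupFn = void (*)();            // plain function pointer: nothing to allocate

constexpr std::size_t max_handlers = 8;  // assumption: the list stays small

std::array<CleanupFn, max_handlers> handlers{};  // fixed storage, never reallocated
std::atomic<std::size_t> handler_count{0};

// Call only during start-up, from one thread, before install_signal_handlers().
// Returns false when the fixed-size table is full.
bool add_cleanup_handler(CleanupFn fn)
{
    assert(fn != nullptr);
    std::size_t const n = handler_count.load(std::memory_order_relaxed);
    if (n >= max_handlers)
        return false;
    handlers[n] = fn;                                       // write the slot first...
    handler_count.store(n + 1, std::memory_order_release);  // ...then publish it
    return true;
}

// Best effort: no locks, no allocation, no copy of the handler list.
void run_emergency_cleanup()
{
    std::size_t const n = handler_count.load(std::memory_order_acquire);
    for (std::size_t i = 0; i < n; ++i)
        handlers[i]();
}

extern "C" void fatal_signal_handler(int sig)
{
    run_emergency_cleanup();
    std::signal(sig, SIG_DFL);  // restore the default action
    std::raise(sig);            // re-raise so the process still dies as it would have
}

void install_signal_handlers()
{
    std::signal(SIGSEGV, fatal_signal_handler);
    std::signal(SIGABRT, fatal_signal_handler);
}
```

The trade-off matches what was said at 07:21: registration is no longer safe to run concurrently with the cleanup, which is acceptable only because the cleanup is best effort and the table is filled once at start-up.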
[18:12] AlbertA: any using the mir backend of qt/qml afaict. open clock app, wait a few seconds, and close it, and there should be a crash report in /var/crash/
[18:13] https://errors.ubuntu.com/problem/6552ba4342afeb93d20e22711ac36f655cd885d8
[18:14] or go to online accounts, then hit back and should result in a crash report too
[18:16] dobey: I couldn't access that link
[18:16] dobey: but I'll take a look, if you can submit a bug # that would be great
[18:17] AlbertA: sure. should i just copy/paste the top of the stack trace in the bug?
[18:17] dobey: sure
[18:18] ok will do
[18:21] AlbertA: Sure
[18:28] AlbertA: https://bugs.launchpad.net/ubuntu/+source/mir/+bug/1337481
[18:28] Ubuntu bug 1337481 in mir (Ubuntu) "Crash in libmirclient on app exit on phone" [Undecided,New]
=== dandrader is now known as dandrader|afk
[18:59] dandrader|afk: greyback: https://code.launchpad.net/~unity-team/platform-api/devel-for-qtmircompositor/+merge/225320https://code.launchpad.net/~unity-team/platform-api/devel-for-qtmircompositor/+merge/225320
[18:59] err whoops
[18:59] but what is up with line 427
[19:01] It looks like, state_before_hiding is being used to save the state across minimize state changes
[19:01] but why initialize to MAXIMIZED? There must be a default or currentstate
[19:01] racarr: indeed. I can't say.
Hope dandrader|afk can reply
[19:02] anyway good besides that
[19:03] if he doesn't come back soon I'll leave some comments
[19:03] on launchpad
[19:03] racarr: please do, and I'll try to fix (daniel off on hols at eod today)
[19:04] :) sounds nice
[19:04] racarr off on holidays in 51 days
[19:04] lol
=== dandrader|afk is now known as dandrader
[21:43] racarr, right, I'm just initializing it to something
[21:44] racarr, the first time you call show(), the state will be set to this value
[21:46] racarr, which is what we want on phablet anyway
[21:46] racarr, "restored" windows make no sense there
[21:48] racarr, and the "beautiful" papi API is not expressive enough at the moment for the user to set the mir surface states properly. eg: "papi::window::show() == mir::surface::set_state(maximized) or mir::surface::set_state(restored)?"
[21:48] racarr, so I don't bother with this sad situation and just let it go maximized, which is what we want
[21:48] and now I'm repeating myself...
[21:49] racarr, so if you want to fix things you would have to rewrite that papi api to be a 1-to-1 mapping of mir
[21:49] racarr, which is a waste of developer time IMHO
[21:51] and would also require changing papi users: ie, qtubuntu
=== mterry is now known as 18VAAOQDS
=== 18VAAOQDS is now known as mterry
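
Editor's sketch for the state_before_hiding exchange (19:01 and 21:43 to 21:51): a minimal model of the behaviour described, where show() takes no state argument and therefore restores whatever was remembered, and the remembered value starts out as maximized. The Window class, the SurfaceState enum and the assumption that a window starts hidden are all hypothetical; this is not the platform-api or Mir code under review.

```cpp
#include <cassert>

// Hypothetical model; names do not come from platform-api or Mir.
enum class SurfaceState { restored, minimized, maximized };

class Window
{
public:
    // Stand-in for a show() call that carries no state: it can only go back
    // to the remembered state, so the very first show() yields the initial
    // value of state_before_hiding_ (maximized).
    void show() { state_ = state_before_hiding_; }

    void hide()
    {
        if (state_ != SurfaceState::minimized)
            state_before_hiding_ = state_;  // remember what to come back to
        // Invariant: the remembered state is never "minimized".
        assert(state_before_hiding_ != SurfaceState::minimized);
        state_ = SurfaceState::minimized;
    }

    void set_state(SurfaceState s)
    {
        if (s == SurfaceState::minimized)
            hide();
        else
            state_ = s;
    }

    SurfaceState state() const { return state_; }

private:
    SurfaceState state_ = SurfaceState::minimized;  // assumption: starts hidden
    SurfaceState state_before_hiding_ = SurfaceState::maximized;
};
```

With this model, a fresh window that is shown comes up maximized, while a window that was explicitly set to restored, hidden and shown again comes back restored. The initial value only matters for the first show(), which is the point made at 21:44.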