[00:02] <bregma> well, I'm going to try again: desktop Unity 8 has been dead in the water for a while, segfaulting in the Intel driver on uninitialized buffers when the Mir server does a glClear()
[00:03] <bregma> currently logged as https://bugs.launchpad.net/ubuntu/+source/unity8-desktop-session/+bug/1336854
[00:03] <bregma> with an attached stack trace
[00:03] <bregma> I'm looking for suggestions on what may have changed in the stack in the last week or two, and/or suggestions on how to chase down more information to help narrow the cause down
[00:03] <RAOF> bregma: Yeah, I noticed that yesterday.
[00:04] <RAOF> It's going to be a mesa problem.
[00:04] <bregma> unity-system-compositor continues to run fine, so it could be something wacky higher in the stack, it's under a lot of churn lately too
[00:05] <RAOF> usc is fine because it's not using the Mir EGL platform.
[00:05] <RAOF> I suspect that you first saw this on the 23rd?
[00:05] <RAOF> That's when Maarten uploaded the new mesa.
[00:05] <bregma> sounds suspicious
[00:06] <bregma> if I can perhaps revert to an older version, that would point the finger
[00:06] <RAOF> Give it a whirl.
[00:06] <RAOF> I'll poke around in the mesa code.
[00:34] <bregma> FTR, reverting libgl1-mesa-dri to 10.1.3-0ubuntu0.1 fixes the segfault
[00:42] <RAOF> Thanks. I'm installing a debug build, so it shouldn't be long before working out what's wrong and fixing it.
[01:34] <RAOF> Ah! There's the problem.
[01:54] <RAOF> bregma: Mesa 10
[01:55] <RAOF> bregma: Mesa 10.2.1-2ubuntu3 fixes your problem. Enjoy!
[02:57] <duflu> racarr: What's a prompt session?
[03:07] <racarr> duflu: trust sessions
[03:07] <racarr> ;)
[03:07] <racarr> https://wiki.ubuntu.com/Security/TrustStoreAndSessions
[03:08] <racarr> sort of explains how the name prompt session
[03:08] <racarr> comes about
[03:09] <duflu> racarr: OK then...
[06:25] <RAOF> Oh, whoops. We appear to call all manner of un-signal-safe functions in the emergency cleanup signal handler.
[06:25] <RAOF> Including allocations!
[06:48] <duflu> RAOF: Sounds a bit silly. Incidentally I mentioned we shouldn't ever do such a thing in a branch that's pending... https://code.launchpad.net/~vanvugt/mir/fatal-error/+merge/219471
[06:48] <RAOF> Yeah, reviewing that is what brought me to look at the emergency cleanup bit.
[06:49] <duflu> RAOF: Well the latest emergency cleanup stuff I don't think I reviewed either
[06:49] <duflu> The plate was piled too high in May
[06:51] <duflu> Hmm Chrome in Trusty has annoyingly bad tearing (diagonal/triangles)... I wonder if that's Chrome or Compiz
[06:52] <duflu> If it's Compiz then of course we don't care any more
[06:53] <RAOF> I think it's Chrome; it started with the Aura drop, IIRC.
[06:53]  * duflu looks up Aura
[06:54] <duflu> Oh, nice one Google... they really should be able to do better than that
[07:12] <alf_> RAOF: duflu: where are we doing allocations in the emergency cleanup handlers?
[07:12] <duflu> No idea
[07:12]  * duflu points to RAOF
[07:13] <RAOF> alf_: You assign to a vector; that's at least potentially doing allocation.
[07:13] <duflu> Oh that will do it
[07:13] <RAOF> Also, pthreads isn't threadsafe, right?
[07:13] <duflu> Depending on the method...
[07:13] <duflu> RAOF: Umm, what?
[07:13] <RAOF> Ahem.
[07:13] <RAOF> Signalsafe
[07:13] <RAOF> 'cause we also lock a mutex in there.
[07:14] <duflu> RAOF: It never used to be... Signals could arrive in _any_ thread but should arrive in just one. So unpredictable but safe I think
[07:14] <RAOF> Right, but what happens if pthread_mutex_lock gets interrupted by a signal?
[07:14] <duflu> If you want to determine the thread to use then pthread_kill
[07:15] <alf_> RAOF: duflu: at vector, right, that's easy to fix I will take a look
[07:15] <duflu> RAOF: "If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread resumes waiting for the mutex as if it was not interrupted."
[07:15] <duflu> [http://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_mutex_lock.html]
[07:16] <RAOF> I'm not sure that covers my concern.
[07:16] <RAOF> But signal-safety documentation isn't the best :)
[07:17] <duflu> RAOF: Well that's just the spec. Everyone has their own implementation
[07:17] <RAOF> No, I mean that it's not clear that documentation covers the case I'm concerned about.
[07:17] <alf_> RAOF: "The mutex functions are not async-signal safe. What this means is that they should not be called from a signal handler. In particular, calling pthread_mutex_lock or pthread_mutex_unlock from a signal handler may deadlock the calling thread."
[07:17] <alf_> RAOF: :)
[07:18] <RAOF> alf_: Ding!
[07:18] <duflu> Certainly, locking in signal handlers is also bad
[07:18] <duflu> Then again without such bugs I would have received more core files in my life time than I have
[07:18] <RAOF> Basically, the set of things you can _safely_ do in a signal handler is poke at memory addresses :)
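A minimal sketch of the "poke at memory addresses" style of handler RAOF describes, assuming a POSIX system. The names `g_signal_seen`, `note_signal`, `install_flag_handler` and `last_signal_seen` are hypothetical and are not taken from Mir.

```cpp
#include <signal.h>

// Hypothetical flag. volatile sig_atomic_t is the type the C and C++
// standards allow a signal handler to write safely.
static volatile sig_atomic_t g_signal_seen = 0;

// The handler only stores to memory: no allocation, no locks, no stdio.
extern "C" void note_signal(int signum)
{
    g_signal_seen = signum;
}

// Installs the handler for one signal; returns true on success.
bool install_flag_handler(int signum)
{
    struct sigaction sa = {};
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signum, &sa, nullptr) == 0;
}

// Polled from normal (non-handler) code, where real cleanup is safe to run.
int last_signal_seen()
{
    return g_signal_seen;
}
```

This only fits signals the process can survive long enough to poll the flag. For a fatal signal such as SIGSEGV the cleanup has to happen inside the handler, which is why the rest of the discussion is about what that code may touch.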

[07:19] <duflu> I remember plenty of stack traces from customers showing code hanging in a signal handler after a crash :/
[07:20] <RAOF> Yup.
[07:20] <RAOF> It's disturbingly easy to deadlock in one.
[07:21] <alf_> duflu: RAOF: We can drop the locks, making it clear that you are not to call the emergency cleanup while concurrently adding a handler
[07:21] <alf_> duflu: RAOF: plus the emergency cleanup is a best effort cleanup
[07:22] <duflu> alf_: If we can guarantee the signal handler is only called once then I guess reduced locking is OK
[07:22] <RAOF> Well, the signal handler isn't reentrant.
[07:22] <RAOF> (By default)
[07:22] <RAOF> But we should ensure that we don't do anything that's highly likely to deadlock there :)
[07:23] <duflu> Also easy to do by accident though... if someone else asks the signal() function for the existing handler (yours) and calls it
[07:23] <RAOF> ?!
[07:23] <RAOF> Why on earth would they do that?
[07:24] <alf_> duflu: RAOF: and if we get rid of the locking requirement we might as well drop the vector copy
[07:24] <duflu> RAOF: 3rd party libraries or general dealing with bad APIs where you need a signal handler
[07:24] <RAOF> duflu: Oh, as in wrapping a signal handler?
[07:25] <duflu> RAOF: I think that's one case
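The wrapping case duflu mentions could look roughly like this: a later caller installs its own handler, keeps the previous one, and calls it from inside the new handler. This is a sketch assuming POSIX `sigaction`; `g_previous`, `g_wrapper_calls`, `wrapping_handler` and `install_wrapper` are hypothetical names.

```cpp
#include <signal.h>

// Hypothetical storage for whatever handler was installed before the wrapper.
static struct sigaction g_previous;
static volatile sig_atomic_t g_wrapper_calls = 0;

extern "C" void wrapping_handler(int signum)
{
    g_wrapper_calls = g_wrapper_calls + 1;

    // Chain to the earlier handler only if it is a plain function handler,
    // not SIG_DFL, SIG_IGN, or a three-argument SA_SIGINFO handler.
    if (!(g_previous.sa_flags & SA_SIGINFO) &&
        g_previous.sa_handler != SIG_DFL &&
        g_previous.sa_handler != SIG_IGN)
    {
        g_previous.sa_handler(signum);
    }
}

// Installs the wrapper and remembers the handler it replaced.
bool install_wrapper(int signum)
{
    struct sigaction sa = {};
    sa.sa_handler = wrapping_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signum, &sa, &g_previous) == 0;
}
```

In this arrangement the earlier handler runs as an ordinary function call inside the wrapper, so it can be entered through a path its author never installed, which is the accidental re-entry duflu is worried about.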
[07:25] <duflu> alf_: If the vector never gets too big then an array is fine
[07:26] <RAOF> alf_: Yup. We could, of course, make it a signal-safe data structure, but given that it's basically write-once I think we can happily read without locking.
[07:28] <duflu> If it's write-once and that's guaranteed to happen before the read there is no race. And Helgrind etc will be happy without locking
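A sketch of the write-once, lock-free table being discussed here, assuming a small fixed upper bound on the number of handlers. It is not Mir's actual emergency cleanup code; `CleanupFn`, `max_cleanups`, `add_cleanup`, `run_cleanups` and `install_cleanup_handler` are all hypothetical names.

```cpp
#include <signal.h>
#include <cstddef>

using CleanupFn = void (*)();

constexpr std::size_t max_cleanups = 8;       // assumed small, fixed bound
static CleanupFn g_cleanups[max_cleanups];    // static storage, never allocates
static std::size_t g_cleanup_count = 0;

// Call only during startup, before the signal handler is installed.
// Returns false if the table is already full.
bool add_cleanup(CleanupFn fn)
{
    if (g_cleanup_count == max_cleanups)
        return false;
    g_cleanups[g_cleanup_count++] = fn;
    return true;
}

// Reads the table without locking or copying. This is race-free only
// because every write happened before the handler could run.
extern "C" void run_cleanups(int /*signum*/)
{
    for (std::size_t i = 0; i < g_cleanup_count; ++i)
        g_cleanups[i]();
}

// Installs run_cleanups for one signal; returns true on success.
bool install_cleanup_handler(int signum)
{
    struct sigaction sa = {};
    sa.sa_handler = run_cleanups;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signum, &sa, nullptr) == 0;
}
```

The table itself needs no mutex and no vector copy, but each registered function still has to restrict itself to async-signal-safe operations, since it runs inside the handler.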
[07:32] <alf_> duflu: RAOF: it says "The mutex functions are not *async-signal* safe", but the signals we handle with emergency cleanup are synchronous
[07:34] <RAOF> Unless someone sends us SIGsomething :)
[07:35] <duflu> Heh, bypass must be awesome if it's still doing my head in a year later
[18:09] <dobey> who would be best to bug to maybe get a crash in libmirclient on application exit on the phone fixed asap today?
[18:10] <AlbertA> dobey: bug #?
[18:10] <AlbertA> dobey: which image# ?
[18:11] <AlbertA> racarr: can you take a look at https://code.launchpad.net/~albaguirre/unity-system-compositor/no-inactivity-handling-desktop/+merge/225537
[18:11] <AlbertA> racarr: it's fairly small :)
[18:11] <dobey> AlbertA: on 111, but looks like it's been happening for a little while. i haven't filed a bug yet, the reports on errors.u.c failed to retrace, so there's no "create a bug" link for them. trying to determine the best way to file the bug
[18:12] <AlbertA> dobey: so any application exiting crashes?
[18:12] <dobey> should i just file it with the top of the stack trace and a link to the errors?
[18:12] <AlbertA> dobey: what's the fastest way to reproduce?
[18:12] <dobey> AlbertA: any using the mir backend of qt/qml afaict. open clock app, wait a few seconds, and close it, and there should be a crash report in /var/crash/
[18:13] <dobey> https://errors.ubuntu.com/problem/6552ba4342afeb93d20e22711ac36f655cd885d8
[18:14] <dobey> or go to online accounts, then hit back and should result in a crash report too
[18:16] <AlbertA> dobey: I couldn't access that link
[18:16] <AlbertA> dobey: but I'll take a look, if you can submit a bug # that would be great
[18:17] <dobey> AlbertA: sure. should i just copy/paste the top of the stack trace in the bug?
[18:17] <AlbertA> dobey: sure
[18:18] <dobey> ok will do
[18:21] <racarr> AlbertA: Sure
[18:28] <dobey> AlbertA: https://bugs.launchpad.net/ubuntu/+source/mir/+bug/1337481
[18:59] <racarr> dandrader|afk: greyback: https://code.launchpad.net/~unity-team/platform-api/devel-for-qtmircompositor/+merge/225320
[18:59] <racarr> err whoops
[18:59] <racarr> but what is up with line 427
[19:01] <racarr> It looks like, state_before_hiding is being used to save the state across minimize state changes
[19:01] <racarr> but why initialize to MAXIMIZED? There must be a default or currentstate
[19:01] <greyback> racarr: indeed. I can't say. Hope dandrader|afk can reply
[19:02] <racarr> anyway good besides that
[19:03] <racarr> if he doesnt come back soon ill leave some comments
[19:03] <racarr> on launchpad
[19:03] <greyback> racarr: please do, and I'll try to fix (daniel off on hols at eod today)
[19:04] <racarr> :) sounds nice
[19:04] <racarr> racarr off on holidays in 51 days
[19:04] <racarr> lol
[21:43] <dandrader> racarr, right, I'm just initializing it to something
[21:44] <dandrader> racarr, the first time you call show(), the state will be set to this value
[21:46] <dandrader> racarr, which is what we want on phablet anyway
[21:46] <dandrader> racarr, "restored" windows make no sense there
[21:48] <dandrader> racarr, and the "beautiful" papi API is not expressive enough at the moment for the user to set the mir surface states properly. e.g.: "papi::window::show() == mir::surface::set_state(maximized) or mir::surface::set_state(restored)?"
[21:48] <dandrader> racarr, so I don't bother with this sad situation and just let it go maximized, which is what we want
[21:48] <dandrader> and now I'm repeating myself...
[21:49] <dandrader> racarr, so if you want to fix things you would have to rewrite that papi api to be a 1-to-1 mapping of mir
[21:49] <dandrader> racarr, which is a waste of developer time IMHO
[21:51] <dandrader> and would also require changing papi users: ie, qtubuntu