[00:39] too many threads.....too many mutexes.... [00:42] so I think a combination of anpok_ branch and just avoiding canceling the timer in TimeoutFrameDroppingPolicy [00:43] with appropiate checks of pending_swaps...should do it... [00:49] AlbertA: I think one extra mutex should make everything work right :) [00:51] RAOF: heh...I think we also need an extra thread :) === chihchun is now known as chihchun_afk === alf is now known as 6JTAAQ1J9 === kgunn is now known as Guest2511 [02:37] Guest2511: Welcome to Mir :) [02:38] its just me :) kgunn [02:38] night o/ === chihchun_afk is now known as chihchun [04:34] Is there really no way to tell before you try writing how much data you'll need to write to a socket before it'll block? === 6JTAAQ1J9 is now known as alf_ === chihchun is now known as chihchun_afk === chihchun_afk is now known as chihchun [08:23] anpok_: I know you've a wider solution in the pipe, but if you're happy with this can we top-approve to get a fix landed? https://code.launchpad.net/~vanvugt/mir/fix-1339700-alarm/+merge/226252 [09:46] alan_g: http://paste.ubuntu.com/7779262/ , it's not a problem in the observer code per se, it's a race in our session shutdown code [09:47] alan_g: if when closing a session we focus another session that has a different display configuration, we queue a display configuration change in the action loop that eventually needs to a acquire a write lock to remove surface observers when shutting the compositor down [09:47] alf_: should I take your word for it or do you need a second set of eyes? [09:48] alan_g: thanks, the problem is clear to me so need to take up your time [09:48] alan_g: ...so no need... [09:50] alan_g: ... meanwhile we continue tearing down the session object which triggers a surface remove observer (under read lock) and which eventually wants to unregister an input fd from the main loop [09:54] alan_g: that's just FYI, you can ignore the above, just expressing the issue so that it becomes even clearer in mind [09:54] alf_: understood. And it will help me with the review [10:52] alan_g: camako_: FYI, https://bugs.launchpad.net/mir/+bug/1340669 [10:52] Ubuntu bug 1340669 in Mir "Intermittent deadlock when swithcing to session with custom display configuration while closing other session" [High,New] [10:53] alf_, hmmm this is different from the other deadlocks? [10:54] camako_: yes [10:54] camako_: (not fixed/affected by Alan's branch) [10:55] alf_, ack... just as severe, and need to be included soon in a release, I guess. [10:56] camako_: right, I marked it "high" instead of "critical" since it only affects XMir session at the moment, but feel free to upgrade [10:58] camako_: unfortunately, although the problem is clear, the solution hasn't become clear yet [10:59] alf_, finding the bug is half of the work to fix it :-)... [11:02] writing the test that demonstrates the bug is half of the work too [11:02] ...so fixing it must be trivial [11:04] camako_: any problems with the nested_lifecycle_events MP or just been busy? [11:05] alan_g, just back to it now.. should update soon [11:05] cool === chihchun is now known as chihchun_afk [11:54] alan_g: alf_: hey what about going all single threaded? [11:55] alf_: what's the status of MesaDisplayTest? I thought there was a recent "intermittent failure" bug but can't seem to find it [11:55] anpok_: where's the fun in that? [11:55] I believe we should copy whatever the renderers need to composite a scene each frame, and have the rest happen in a single thread [11:56] alan_g: there is none! but maybe when we get bored after a few months without a race or deadlock we would happily switch back? [11:57] alan_g: https://bugs.launchpad.net/mir/+bug/1336671 , waiting for fixed umockdev package to land in the archive (haven't checked today, perhaps it already did) [11:57] Ubuntu bug 1336671 in Mir "Intermittent mir_unit_tests.MesaDisplayTest.drm_device_change_event_triggers_handler test failure: libumockdev isn't thread safe" [Medium,In progress] [12:00] anpok_: all the threads we have are there for a reason (performance), I am all for a single threaded structure, but I doubt we could get decent performance this way [12:02] alf_: thanks. [12:08] alf_: you mean both the ipc handling threads and the renderer threads? === alan_g is now known as alan_g|lunch [12:09] anpok_: and input [12:10] ah right I forgot about input [12:10] at least i try to :) [12:10] * alf_ is reminded once again that "Writing correct shut-down code is hard" http://books.google.gr/books?id=_i6bDeoCQzsC&lpg=PT411&ots=eo2Pxn2b06&dq=writing%20correct%20shutdown%20code%20is%20hard%20martin&pg=PT411#v=onepage&q&f=false [12:12] at least I know have an acceptance/regression test for the deadlock... [12:25] doodle? [12:30] alan_g, just pushed the fixed version of nested_lc [13:00] alan_g|lunch ^^ [13:12] kgunn: hi! === alan_g|lunch is now known as alan_g [13:15] sil2100: hey [13:15] camako_: looking [13:15] * kgunn realizes this is going to be a 3 coffee day [13:15] see i've included all the "e"s so i've already had 2 [13:15] kgunn: how's the progress on bug LP: #1339700 ? As I see many merges there and got a bit confused ;) [13:15] Launchpad bug 1339700 in Mir "[regression] Device locks randomly on welcome screen" [High,In progress] https://launchpad.net/bugs/1339700 [13:16] sil2100: sorry...i meant to send a mail last night [13:16] sil2100: lemme see one email in my pile real quick [13:16] re this [13:16] Ok :) [13:16] Thanks [13:18] sil2100: ok, bottom line, we thot we had a fix, got to talking realized we needed to do a little surgery vs band-aiding.... [13:18] sil2100: so we backed up, had lots of discussion on irc & launchpad.... [13:18] sil2100: i think we'll consolidate those thots into one patch today, test, promote on our devel, then propose to trunk in an isolated fashion... [13:19] e.g. we'll do a 0.4.1mir with this bug fix only [13:19] kgunn: that's a good plan [13:19] camako_: ^ did i speak right ? [13:19] kgunn: would be best to land this isolated, without pulling in all the other, risky bits [13:19] Especially that we're trying to get a completely green image ;) [13:19] kgunn, yeah sounds right [13:21] kgunn, camako_: thanks guys :) [13:21] camako_: oh, and let me finish up my check-up on the lp:mir request you had yesterday [13:22] sil2100.. sure.. take ur time [13:43] camako: another iteration of fixes [13:45] alan_g ack [14:11] * kdub_ is debating adding a mir::Fd type [14:13] kdub_: do it [14:15] yay, will help me fix this flummoxing resource problem where I have to dance around pod-int-fds [14:21] * kdub_ wishes gmock was better with movable / not copyable types [14:24] * alan_g too === alan_g is now known as alan_g|tea === alan_g|tea is now known as alan_g === chihchun_afk is now known as chihchun === chihchun is now known as chihchun_afk [16:42] If I could just std::lock all the three mutexes involved....we would be golden... [16:49] AlbertA: assuming there is a canonical order to the locking [16:51] alan_g: right... it would mean maybe the Alarm would need to get a reference to the mutex it's handler uses [16:51] so the handler can lock it in the same order [16:54] Sounds messy - a mutex should only be protecting data in the associated object [16:54] maybe we could make it part of the handler interface [16:54] a lock method... [16:55] then you have an opportunity to implement the same locking order [16:55] and optional if the handler does not need to deal with such a thing [16:57] Hmm. It is too late in the week for me to think about that. [17:01] its still too brittle though, prone to error, because as soon as you break the lock hierarchy ( inadvertently) you would still deadlock [17:01] very possible with the levels of indirection we have === alan_g is now known as alan_g|EOW === renato is now known as Guest85133 === Guest85133 is now known as renato___ [20:12] my head hurts....but I think a simple solution is just to restore cancel to the way it was...and just make the destructor of AlarmImpl [20:13] use the synchronous wait for callback to complete version [21:15] kgunn: so I'm confident that https://code.launchpad.net/~mir-team/mir/fix-1339700-take2/+merge/226534 [21:15] kgunn: with Daniel's just landed change should address the deadlock without sideeffects [21:16] kgunn: so this should not break abi: https://code.launchpad.net/~mir-team/mir/fix-1339700-take2-in-0.4/+merge/226537 [21:24] AlbertA: you are awesome [21:24] thanks [21:24] racarr__: ^ can you give that a quick review as well ? [21:25] kdub_: ^ if you're gonna be on for a little [21:25] AlbertA: so i assume you tested on n4 & n10 ? [22:00] kgunn: yes