[05:24] <RAOF> Bah! Why does my USB controller suddenly drop dead?
[05:24] <RAOF> I _like_ my external keyboard, damnit!
[05:30] <duflu> RAOF: Snap. Me too, on random systems
[06:41] <anpok_> it works
[06:42] <anpok_> now i need to add mir platform support.. hmm
[08:19] <duflu> Hmm why is bzr push suddenly so slow this week?
[08:19] <duflu> My upload speed is unchanged
[08:30] <duflu> camako, alan_g: Priority regression fix needs review: https://code.launchpad.net/~vanvugt/mir/fix-1339700-alarm/+merge/226252
[08:31] <alan_g> duflu: looking...
[08:32] <duflu> I think the safety of dropping the lock early is easy to verify on inspection. Then you just have to be convinced that it's the same issue as shown in the stack traces
[08:34] <duflu> Agh, stupid slow uploads. What's wrong with the pipe?
[08:35] <duflu> Hmm, actually it looks like it's uploading at theoretically max speed. So bzr has somehow been slowed down by the complexity of our branches?..
[08:36] <alan_g> duflu: I'm not convinced by "*easy* to verify on inspection" - if the cb can be removed between the unlock and invoking the copy then the called back code may touch objects that no longer exist.
[08:38] <duflu> alan_g: You're assuming the callback touches its own alarm object. That's actually quite hard to do, and even if you did, you would be re-entering the alarm's mutex (which is not recursive), resulting in a crash/deadlock
[08:39] <duflu> Actually C++ says "undefined" behaviour, which is sometimes crash or deadlock
[08:41] <duflu> alan_g: Or rather, the callback is only made within the lifetime of the code that owns the alarm. So you do have confidence in when/how the callback will be made
[08:41] <alan_g> duflu: I'm saying that it isn't "easy to verify" that nothing (on any thread) can decide to cancel the alarm and destroy the handler between the unlock and the invoke.
[08:42] <alan_g> It may well be *possible* to verify
[08:42] <alan_g> Am looking...
[08:44] <duflu> alan_g: OK, I'm not sure now either. The right answer is for objects to never have internal locking except to protect threads they themselves have created. But that's a larger architectural change
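The exchange above is about copying the callback under the lock, dropping the lock, and only then invoking the copy. A minimal sketch of that pattern follows, assuming a hypothetical `SketchAlarm` class (not Mir's actual alarm implementation). It shows both duflu's point (the callback can touch its own alarm without re-entering a non-recursive mutex) and alan_g's concern (there is a window between the unlock and the invoke).

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical stand-in for an alarm; names and layout are assumptions.
class SketchAlarm
{
public:
    explicit SketchAlarm(std::function<void()> cb) : callback{std::move(cb)} {}

    // Called when the timeout expires.
    void fire()
    {
        std::function<void()> copy;
        {
            std::lock_guard<std::mutex> lock{guard};
            if (cancelled)
                return;
            copy = callback;  // copy while the lock is held
        }                     // lock released here
        // Window alan_g describes: another thread may call cancel() now and
        // then destroy objects the callback refers to; the copy still runs.
        copy();
    }

    void cancel()
    {
        std::lock_guard<std::mutex> lock{guard};
        cancelled = true;
    }

private:
    std::mutex guard;  // not recursive
    bool cancelled{false};
    std::function<void()> callback;
};
```

Because `guard` is not held during `copy()`, a callback that calls `cancel()` on its own alarm does not self-deadlock. The sketch does nothing to close the window between the unlock and the invoke; that still has to be ruled out by how the owner manages lifetimes.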
[08:50] <alf_> camako: alan_g: duflu: top-approving https://code.launchpad.net/~thomas-voss/mir/explicit-gcc-version/+merge/226140 unless someone objects soon
[08:50] <duflu> alf_: *shrug*
[08:50] <alan_g> alf_: no objection
[09:05] <alf_> @duflu's "Hmm why is bzr push suddenly so slow this week?" -> something changed and bzr is uploading an awful lot even for small branches: just uploaded 40M for a 6K diff :/
[09:06] <alan_g> alf_: I *think* it is because we lost lp:mir and the uploads are diffs against that
[09:07] <alf_> alan_g: hmm, so lp is not using stacked branches for our new branches...
[09:08] <alan_g> alf_: it is a guess based on the delays seen when we diverge from lp:mir not on research into how bzr works
[09:10] <alan_g> alf_: camako as duflu has gone can we have another opinion on https://code.launchpad.net/~mir-team/mir/fix-1339700/+merge/226233 with a view to top-approving?
[09:10] <camako> alan_g, sure...
[09:12] <alf_> camako: ^^ and we also need to fix the bzr upload issue... we can't be pushing 40M for each new branch
[09:13] <camako> alf_, okay
[09:19] <greyback> I think you can use bzr push --stacked-on=lp:something for now. But not having lp:mir is extremely confusing!
[09:20] <alan_g> greyback: thanks, that makes sense.
[09:23] <anpok_> alan_g, alf, camako: I have a third solution
[09:24] <anpok_> but not ready yet
[09:24] <alan_g> anpok_: solution to which discussion?
[09:24] <anpok_> deadlock
[09:25] <alan_g> anpok_: lp:1339700?
[09:25] <anpok_> yes
[09:25] <anpok_> calling timer callback without a lock, and ensuring sequential ordering of timer callback execution and eventual canceling/reconfiguration
[09:26] <anpok_> haven't worked on it since I've been doing the qxl mesa/kernel stuff
[09:32] <anpok_> https://code.launchpad.net/~andreas-pokorny/mir/synchronous-cancel-of-alarms/+merge/224530
[09:32] <anpok_> just updated it
[09:45] <alf_> anpok_: Can't we make main_loop_thread atomic<> instead of locking it? As it is, the code may deadlock if e.g. a synchronous action from execute() calls AsioMainLoop::stop()
[09:48] <anpok_> hmmm
[09:49] <anpok_> hm
[09:50] <anpok_> ok then i would reset the main_loop_thread before stopping the io service
[09:50] <anpok_> to ensure that nobody queues in further handlers/actions that might not get executed
[09:51] <anpok_> the lock is more about the stop() than about the reset
[09:55] <alf_> anpok_: so, to make sure I understand correctly:
[09:57] <alf_> anpok_: 150+ if (data->state.compare_exchange_strong(expected_state, mir::time::Alarm::triggered)) , is what guards us from an alarm event that was enqueued asynchronously after cancel() was called?
[09:58] <alf_> anpok_: e.g. we call cancel and enqueue a cancellation handler, but meanwhile the alarm gets triggered and enqueues an alarm handler
[10:05] <anpok_> the cancel op is first
[10:05] <anpok_> it will remove the strong ref from the timer object
[10:05] <anpok_> when the alarm handler is called it will fail in getting a shared_ptr
[10:05] <davmor2> Hey guys, I'm ready to start testing silo 006, the qt comp, only I've been informed that the latest version didn't build. Is there anyone that can double check that before I waste my time trying to install and test it please?
[10:07] <alan_g> greyback: were you dealing with the silo? ^
[10:08] <davmor2> greyback: looking at it, it might have built for arm and just failed elsewhere but I just wanted to be sure
[10:08] <anpok_> alf_: i would love to get rid of the thread id mutex - if we could guarantee that any outstanding operation is executed during stop - and an attempt to restart during the stop procedure is avoided (<-why did I think this is necessary?)
[10:09] <anpok_> during or before stop completes
[10:14] <alf_> anpok_: "the cancel op is first, it will remove the strong ref from the timer object", where does it do that?
[10:16] <anpok_> oh you are right
[10:17] <anpok_> I have seen too many versions of that part
[10:17] <anpok_> it just changes the state
[10:18] <anpok_> so cas will fail and no handler will be executed
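The guard discussed above can be sketched as a small state machine: cancel only changes the state, so the compare-and-swap in an already-enqueued alarm handler fails and the callback is skipped. This is a sketch under assumptions: `SketchState`, `SketchAlarmData`, `sketch_cancel_state` and `sketch_handle_timeout` are hypothetical names, and using a CAS inside cancel itself (rather than a plain store) is an assumption, not something the branch is confirmed to do.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical states, loosely modelled on the pending/cancelled/triggered
// states mentioned in the discussion.
enum class SketchState { pending, cancelled, triggered };

struct SketchAlarmData
{
    std::atomic<SketchState> state{SketchState::pending};
    std::function<void()> callback;
};

// Cancel only flips the state. Returns true if it won the race against
// the handler, false if the alarm already triggered (or was cancelled).
inline bool sketch_cancel_state(SketchAlarmData& data)
{
    auto expected = SketchState::pending;
    return data.state.compare_exchange_strong(expected, SketchState::cancelled);
}

// The handler the timer enqueued. It runs the callback only if the state
// is still pending at the moment it executes.
inline bool sketch_handle_timeout(SketchAlarmData& data)
{
    auto expected = SketchState::pending;
    if (!data.state.compare_exchange_strong(expected, SketchState::triggered))
        return false;  // cancelled in the meantime: skip the callback
    data.callback();
    return true;
}
```

Whichever of the two operations performs its CAS first wins, so exactly one of "callback ran" and "cancel succeeded" holds for a given pending alarm.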
[10:20] <greyback> davmor2: silo is rebuilding, a recent unity8 release means silo6 is a little out of date
[10:21] <davmor2> greyback: awesome thanks,  Didn't want to waste half a day to realise I hadn't actually tested what needed testing :)
[10:22] <greyback> davmor2: I'd love your testing feedback at some stage however. Can I ping you when silo ready?
[10:24] <davmor2> greyback: yeap sure I'm setup today for just testing this silo on manta mako and flo
[10:29] <greyback> davmor2: magic. Silo6 does work currently, but you have to carefully specify packages with this http://pastebin.ubuntu.com/7774465/ - you might be better off waiting for the rebuild tho
[10:30] <davmor2> greyback: I can wait there is other stuff I need to get on with too :)
[10:31] <greyback> davmor2: ack
[11:08] <davmor2> greyback: hmm silobot tells me that the packages are built now ;)
[11:09] <greyback> davmor2: huh, it didn't ping me
[11:10] <davmor2> greyback: see #ubuntu-ci-eng
[11:10]  * greyback doubts his irc client now
[11:10] <greyback> aha there we are
[11:10] <davmor2> haha
[11:56] <alf_> greyback: Trying out QtComp on N4, works well. Some notes that I am not sure if they are problems:
[11:57] <greyback> alf_: great, please share!
[11:58] <alf_> greyback: The icons in the launcher seem strange, at least slightly different from what I remember with previous unity8. e.g. some icons have the bottom left corner cut off
[11:58] <alf_> greyback: launcher => the bar you swipe in from the left
[11:58] <greyback> mzanetti: is that the new design^^
[11:59] <mzanetti> greyback: alf_: yes it is. It indicates which icons are pinned to the launcher
[11:59] <mzanetti> recent (unpinned) ones won't have the corner clipped
[11:59] <mzanetti> and yes, the whole launcher got a new design :)
[11:59] <alf_> greyback: mzanetti: also the ubuntu/home icon is now a full rectangle?
[12:00] <greyback> mzanetti: how are you doing that corner clip?
[12:00] <mzanetti> yeah... I'm not yet used to that either...
[12:00] <mzanetti> greyback: shadereffect
[12:00] <greyback> mzanetti: ok
[12:00] <alf_> greyback: I also only see the apps scope, is this normal, or have I messed up something?
[12:01] <mzanetti> alf_: are you?
[12:01] <mzanetti> alf_: note, scopes have a new header too :)
[12:01] <mzanetti> alf_: tried swiping left/right?
[12:01] <alf_> mzanetti: nothing happens
[12:01] <mzanetti> hmm... ok... *should* work
[12:01] <greyback> actually same here
[12:01] <alf_> mzanetti: I see the new header with search icon on the right
[12:02] <mzanetti> is this QtComp?
[12:02] <greyback> yeah
[12:02] <mzanetti> it was working here before when I tried the merge. lemme check
[12:05] <alf_> greyback: mzanetti: not related to qtcomp, more of a design issue, but I find the various different ways that you go "back" distracting
[12:06] <alf_> greyback: mzanetti: not much consistency there
[12:06] <mzanetti> alf_: hmm... example?
[12:06] <mzanetti> alf_: afaik we only have the one back button at the upper left corner (except for apps that haven't been updated yet)
[12:07] <mzanetti> which shouldn't be many any more
[12:07] <mzanetti> gallery app is one of them
[12:08] <alf_> mzanetti: right, that's the one I was thinking of
[12:08] <mzanetti> alf_: yeah... that's gallery app being outdated
[12:09] <alf_> mzanetti: but on a related note, the upper left corner is not very easy to reach if you are holding the phone with one hand (the right)
[12:10] <mzanetti> alf_: I agree. there has been a discussion on the ubuntu-phone mailing list
[12:10] <mzanetti> alf_: seems that's the only place where users don't fail to find it
[12:10] <mzanetti> don't ask me why
[12:11] <mzanetti> alf_: ogra_ even dropped his phone multiple times because of this :D
[12:13] <alf_> mzanetti: since I am play testing... have there been any discussions to reduce applications start up times, perhaps preload some common ones (e.g. waiting 2-3 seconds for the dialer to come up is painful)
[12:14] <ogra_> mzanetti, yeah !
[12:14] <mzanetti> alf_: there has been a discussion although I don't know the state/outcome.
[12:14] <ogra_> there wasnt any
[12:14] <ogra_> design said "this is how we designed it" ... no further discussion happened
[12:15] <mzanetti> right, for the back button. yep, that's what it was mostly
[12:15] <mzanetti> for the preloading though I think architects had some thoughts about it
[12:23] <anpok_> well, we should look what takes so much time - i.e. it could be something trivial like shader compilation..
[12:25] <mzanetti> lots of it is QML compilation
[12:25] <mzanetti> there have been thoughts about precompiling QML
[14:43] <AlbertA> alf_: so there is one scenario in TimeoutFrameDroppingPolicy
[14:43] <AlbertA> alf_: where an alarm can be rescheduled after it was cancelled
[14:44] <AlbertA> alf_: if Thread A calling swap_not_blocking is pre-empted right after if (pending_swaps++ == 0)
[14:45] <AlbertA> alf_: and Thread B calls swap_unblocked, cancels alarm and decrements pending_swaps
[14:45] <AlbertA> then Thread A will reschedule the timer, which can potentially lead to the assert(pending_swaps.load() > 0); triggering
[14:46] <AlbertA> but I think converting that assert into just a return should cover that...
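The interleaving AlbertA describes can be replayed deterministically in a single thread by splitting the "blocking" path into its two halves. The sketch below assumes a hypothetical `SketchPolicy` with invented method names (`begin_swap`, `arm_alarm`, `on_timeout`); it is not the real TimeoutFrameDroppingPolicy, and the plain `bool` for the alarm is only a stand-in suitable for this single-threaded replay.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified model of the policy's counters.
struct SketchPolicy
{
    std::atomic<unsigned int> pending_swaps{0};
    bool alarm_armed{false};       // stand-in for the real alarm object
    unsigned int frames_dropped{0};

    // First half of the blocking path: true means "I must arm the alarm".
    bool begin_swap() { return pending_swaps++ == 0; }

    // Second half; Thread A may be pre-empted before reaching this.
    void arm_alarm() { alarm_armed = true; }

    // Thread B: cancel the alarm and account for the completed swap.
    void swap_unblocked()
    {
        alarm_armed = false;
        --pending_swaps;
    }

    // Alarm handler. The original asserts pending_swaps > 0 here; the
    // suggestion in the discussion is to return quietly instead.
    void on_timeout()
    {
        alarm_armed = false;
        if (pending_swaps.load() == 0)
            return;                // spurious timeout: nothing to drop
        ++frames_dropped;
        --pending_swaps;
    }
};
```

Replaying the race as `begin_swap()`, `swap_unblocked()`, `arm_alarm()`, `on_timeout()` reaches the handler with `pending_swaps == 0`. With the early return the handler is a no-op in that case instead of tripping an assertion.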
[14:52] <dobey> AlbertA: for #1337481 i wonder if just doing a no-change rebuild in the archive might fix it? (though of course it won't prevent it from possibly recurring in the future)
[14:56] <AlbertA> dobey: it's something I wanted to try for sure...we are about to spin 0.4.1 though...so perhaps that would cover it
[14:56] <AlbertA> dobey: I hate these heisenbugs...
[14:58] <davmor2> kgunn: osk is playing up in Qtcomp, trying to put my finger on why. Also I only see the apps scope, I don't seem to be able to change
[14:58] <dobey> AlbertA: yeah, i know what you mean
[14:59] <greyback> davmor2: apps scope bug is our bug, we have a fix on the way
[14:59] <davmor2> greyback: right nice
[14:59] <greyback> davmor2: osk bug?
[15:03] <davmor2> greyback: what I've discovered is that osk will randomly stop working in some apps. Like messaging app, I now can't get it to raise. The text box goes white but the cursor doesn't appear in the text field
[15:06] <kgunn> davmor2: might want to make doubly sure its qtcomp...it was acting up like hell on N7 for me in the virgin image (y'day)
[15:07] <kgunn> messaging and phone app very very wonky specifically on n7 virgin image also
[16:43] <davmor2> kgunn: okay so everything I can test seems to be working,  Only the keyboard issue that I've hit.  I'll start digging into that and see if I can get anything useful from logs etc.  I would need the fix for scopes to land to actually continue testing though.  Apps only currently :)
[16:44] <kgunn> davmor2: thanks for testing!
[17:26] <popey> kgunn: looks like latest mir broke the music app (again). Unplug phone, start music app, press play, let phone go dark, it doesn't continue to the next track, but does when you wake the phone.
[17:26] <popey> bug 1292306 related
[17:27] <kgunn> popey: are you playing local music ?
[17:27] <popey> yes
[17:27] <kgunn> hmmm....
[17:27] <popey> ahayzen: music dev discovered and I confirmed it
[17:27] <kgunn> mir hadn't changed since image 110
[17:28] <kgunn> popey: are you sure its mir  ^
[17:28] <popey> well it felt like *that* bug ☻
[17:28] <ahayzen> kgunn, how can we tell if it is/isn't mir?
[17:28] <kgunn> popey: and we sure hadn't touched that particular part of the mechanism (....so....much....pain)
[17:29] <popey> heh, i hear you!
[17:29] <kgunn> ahayzen: when did it start happening ?
[17:29] <kgunn> i assume this gets tested every image
[17:29] <popey> not sure it does ⍨
[17:29] <ahayzen> kgunn, 'recently' ... no we don't have any automated testing on this :/
[17:29] <popey> i have it on the devel image
[17:32] <kgunn> popey: i do know that jhaddop and the boys/girls were changing stuff in their area related to this....but not sure what...i can go retro and test an image to see if it was mir...but nothings changed since 110
[17:32] <kgunn> btw i need to run
[17:33] <popey> ok, thanks kgunn
[17:33] <ahayzen> thanks kgunn
[17:33] <popey> ahayzen: let's get a new bug filed for this, and get it on the radar
[17:33]  * popey moves to -ci-eng
[17:33] <ahayzen> popey, agreed
[17:37] <anpok> AlbertA: regarding the deadlock
[17:37] <anpok> display needs to be off?
[17:38] <anpok> to experience it
[17:38] <AlbertA> anpok: it needs to be off, and then you need to hit the power key again to start the compositor
[17:38] <anpok> oh seems like I just experienced a different problem
[17:38] <AlbertA> so if you get lucky
[17:38] <AlbertA> the timeout will be executing as the compositor starts and calls swap_unblocked
[17:38] <AlbertA> which will deadlock
[17:39] <AlbertA> anpok: oh yea?
[17:39] <anpok> n10 with qtcomp branch .. freezes
[17:39] <AlbertA> just in normal use?
[17:39] <anpok> for a few seconds then continues
[17:39] <anpok> hmm during animations
[17:40] <anpok> from app to shell
[17:40] <anpok> or on application startup
[17:40] <AlbertA> anpok: I see....
[17:42] <anpok> hm this is new .. n10 was working extremely fluid yesterday
[18:30] <greyback> anpok: yeah, I'm seeing it now too. First time I've ever experienced that, wtf is making everything just block?
[18:34] <greyback> hmm, wonder if the snapshotting is to blame
[18:35] <greyback> I see blocking in libdbus which I didn't expect either
[20:18] <AlbertA> anpok: so your branch https://code.launchpad.net/~andreas-pokorny/mir/synchronous-cancel-of-alarms/+merge/224530
[20:19] <AlbertA> anpok: would not resolve the deadlock for https://bugs.launchpad.net/mir/+bug/1339700
[20:19] <AlbertA> anpok: since cancel is synchronous
[20:21] <AlbertA> i.e. Thread A (the one executing the ServerActionQueue) may be executing the alarm handler for TimeoutFrameDroppingPolicy
[20:22] <AlbertA> the policy callback then will try to acquire BufferQueue::guard
[20:23] <AlbertA> let's say there's thread B, calling BufferQueue::compositor_release
[20:23] <AlbertA> which owns BufferQueue::guard
[20:23] <AlbertA> and attempts to cancel the alarm (due to framedrop_policy->swap_unblocked();)
[20:25] <AlbertA> which will wait indefinitely since it's synchronous and won't get executed until AsioMainLoop::process_server_actions returns from the alarm handler
[20:25] <AlbertA> so deadlock...
[20:31] <AlbertA> anpok: but I think if we expose the async_cancel api, we can make use of that to avoid this condition. The alarm handler in TimeoutFrameDroppingPolicy
[20:32] <AlbertA> can deal with spurious calls....
[20:32] <AlbertA> maybe....I need to think about it some more....
[20:50] <anpok> AlbertA: hm but it wont use the queue in that case
[20:51] <AlbertA> anpok: ? Thread B? but that's the compositor thread
[20:51] <anpok> ah ok I have to read that again
[20:55] <anpok> AlbertA: yes you are right this was an attempt to keep up the synchronous api
[20:56] <anpok> but it seems that is the actual mistake
[20:56] <AlbertA> anpok: so the reason for queuing them up for server action queue,
[20:57] <AlbertA> is due to timer.cancel() not guaranteeing that there will be no more handlers invoked?
[20:57] <anpok> yes
[20:57] <AlbertA> ok
[20:58] <anpok> timer.cancel tries to provide a synchronous api without the guarantees
[20:58] <anpok> i.e. cancel may return, and may destroy other related objects, previously referenced by the timer callback, and another thread is scheduled and executes the timer
[20:59] <anpok> i.e. happened in ~TimeoutFrameDroppingPolicy
[21:00] <anpok> greyback: yes there are unity8 logs about snapshotting
[21:00] <greyback> anpok: actually I think it is a problem with dbus
[21:00] <anpok> but only in the app -> phone shell switch cases..
[21:01] <greyback> anpok: connecting with strace, unity8 is continually polling for something, and when it tries to send a dbus message, blocks for 25seconds before timing out (then tries again I think)
[21:01] <greyback> dbus messages are sent for app focus changes
[21:02] <anpok> and that inside the rendering thread?
[21:02] <anpok> or the event thread block rendering again?
[21:02] <greyback> anpok: event thread blocks
[21:02] <anpok> yay!
[21:02] <greyback> why dbus is failing I don't understand
[21:03] <anpok> was there a recent change?
[21:03] <greyback> but I see in my unity8 log that it crashed the first time when trying to connect to the dbus socket
[21:03] <anpok> my n4 image is a bit older than the n10 image
[21:03] <anpok> only see it there
[21:03] <greyback> no relevant recent change actually
[21:03] <greyback> I only see this on N10, not N4/7
[21:03] <greyback> perhaps a race somewhere
[21:08] <anpok> AlbertA: i am not sure - there are a few synchronisation points like destructors - there we need to have synchronous behavior. Apart from that queuing or working with completion handlers seems simpler.
[21:10] <AlbertA> also it looks like the users of alarm need to protect it externally
[21:10] <AlbertA> i.e. like if one thread is doing reschedule_in and another trying to cancel
[21:11] <AlbertA> well not in the traditional sense I suppose
[21:11] <AlbertA> alarm state itself will be fine
[21:17] <AlbertA> anpok: I think the branch looks fine, except for line 244
[21:17] <AlbertA> data = std::make_shared<InternalState>(data->callback);
[21:17] <AlbertA> USC will update the timer repeatedly to reset the inactivity timer during motion events
[21:17] <AlbertA> I'm concerned about the overhead
[21:18] <kgunn> i love it when i type reboot in the wrong window
[21:18] <AlbertA> kgunn: I hate it...:)
[21:18] <kgunn> so annoying
[21:18] <AlbertA> I've gotten used to adb shell reboot instead.... <= workaround
[21:19] <kgunn> totally...
[21:19] <kgunn> i had a moment of weakness :)
[21:19] <anpok> AlbertA: wait until your system exposes an adb service that can be used from your phone
[21:19] <AlbertA> anpok: ha
[21:21] <anpok> AlbertA: hm could be replaced by a different mechanism
[21:21] <anpok> maybe a configuration counter? to differ between two pending states?
[21:21] <anpok> and the action inside the queue stores the expected pending counter?
[21:25] <AlbertA> like adding a pending_state ?
[21:27] <anpok> hm there already is?
[21:27] <anpok> i meant something to detect inside the queue whether the alarm object is out of sync with the currently executed action
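One way to read the "configuration counter" idea: each reconfiguration bumps a counter on the alarm, each queued action remembers the counter value it was created for, and an action that finds a different value when it finally runs knows it is stale. The sketch below is an assumption about how that could look; `CountedAlarm`, `ActionQueue` and the `sketch_*` functions are hypothetical names and not part of the branch under review.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical alarm that tracks how often it has been reconfigured.
struct CountedAlarm
{
    std::atomic<unsigned int> generation{0};
    unsigned int fired{0};
};

// Stand-in for the server action queue.
using ActionQueue = std::deque<std::function<void()>>;

// Reconfigure: bump the counter and enqueue an action that stores the
// value it expects to see. The alarm must outlive the queued action.
inline void sketch_reschedule(CountedAlarm& alarm, ActionQueue& queue)
{
    unsigned int const expected = ++alarm.generation;
    queue.push_back([&alarm, expected] {
        if (alarm.generation.load() != expected)
            return;  // alarm was reconfigured or cancelled since: stale
        ++alarm.fired;
    });
}

// Cancel: bumping the counter invalidates every action already queued.
inline void sketch_cancel_pending(CountedAlarm& alarm)
{
    ++alarm.generation;
}

// Run everything currently in the queue, in order.
inline void sketch_drain(ActionQueue& queue)
{
    while (!queue.empty())
    {
        auto action = std::move(queue.front());
        queue.pop_front();
        action();
    }
}
```

Two reschedules followed by a drain fire the alarm once, because the first queued action sees a newer counter value and returns. Whether this is cheaper in practice than allocating a fresh state object per reschedule would need measuring.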
[21:59] <AlbertA> anpok: so the only reason I see for data being a shared_ptr is for lifetime issues...which should be now addressed with the synchronous cancel no?
[21:59] <AlbertA> do we really need auto data = possible_data.lock();
[22:01] <AlbertA> anpok: I mean basically this comment
[22:01] <AlbertA> http://paste.ubuntu.com/7777346/