[00:06] <ricmm> hi
[00:06] <ricmm> anyone around?
[00:17] <RAOF> ricmm: Sure; best to actually ask your question, though.
[00:20] <ricmm> RAOF: seeing this with latest Mir in the phone image that runs unity-mir
[00:20] <ricmm> http://pastebin.ubuntu.com/5938197/
[00:20] <ricmm> catching a SIGBUS there
[00:20] <ricmm> any ideas?
[00:20] <ricmm> examples run fine
[00:24] <RAOF> Hm. Nothing obviously springs to mind.
[00:24] <RAOF> Except that SIGBUS is one of those fun errors that's only ever going to occur on arm
[00:24] <kdub_> oh hmm
[00:25] <RAOF> Oh, hey, arm guy!
[00:26] <kdub_> ricmm, we moved that code between libraries recently, perhaps something isn't updated
[00:26] <kdub_> i'll try it, one minute
[00:27] <ricmm> thanks kevin
[00:35] <ricmm> kdub_: any clues?
[00:40] <kdub_> ricmm, i'm not seeing any problems with the example programs i have
[00:40] <kdub_> not set up for the unity-mir though on this device at the moment
[00:42] <ricmm> can you think of any API changes that might've triggered this?
[00:44] <kdub_> no :/
[00:44] <kdub_> ricmm, actually, yes!
[00:45] <ricmm> :)
[00:45] <kdub_> we changed the way that the display size is handed out, perhaps there's something amiss there
[00:45] <kdub_> it should be backward-compatible though...
[00:46] <kdub_> but if something is wrong, maybe it's trying to make a surface that's a nonsense size
[00:48] <ricmm> lemme catch the call to create_surface and see the params
[00:51] <ricmm> kdub_: http://pastebin.ubuntu.com/5938258/
[00:52] <ricmm> size looks fine, it has been working like this before
[00:52] <ricmm> depth 0 means anything?
[00:52] <kdub_> right, that looks fine
[00:54] <kdub_> depth 0 is just the stack ordering position, its ok
[00:57] <ricmm> what changed with the size passing?
[00:58] <kdub_> that was a client side api change, this is an internal client, so what i mentioned probably won't affect anything
[05:20] <duflu> RAOF: ping
[05:20] <RAOF> duflu: pong
[05:20] <RAOF> Would you quite like to talk bypass?
[05:20] <duflu> RAOF: I've removed the explicit (and slightly hacky) workaround fix for the input lag bug... but suspect it might still be fixed by that branch. Can you retest?
[05:20] <duflu> RAOF: No bypass this week :(
[05:21] <RAOF> duflu: Still the "switch" branch?
[05:21] <duflu> RAOF: Yes.
[05:21] <RAOF> I'll give it a try. It may take me a little while to get to that.
[05:22] <duflu> RAOF: Actually, maybe wait. I might try to do an independent fix for that bug
[05:23] <RAOF> Excellent. The best kind of testing :)
[05:28] <duflu> RAOF: I just realized, even if my branch eliminates the cause of the bug under double buffering, it introduces triple buffering which is expected to cause the same bug. I think we need to fix the compositor...
[05:29] <RAOF> ok
[05:48] <RAOF> Ok. Who changed the mir_connection_get_display_info ABI on me?
[05:52]  * duflu points to multimonitor guys
[06:01] <RAOF> Ahem.
[06:01] <RAOF> Yes, that commit did indeed break mir_connection_get_display_info
[06:04] <RAOF> Grr. I wonder what's easier. Fixing mir_connection_get_display_info, or switching to the new API.
[06:04] <tvoss_> good morning :)
[06:04] <duflu> RAOF: Robert was right when he said "everyone forgets to bump ABIs"
[06:04] <duflu> Morning tvoss_
[06:05] <tvoss_> duflu, good morning
[06:05] <RAOF> duflu: This isn't actually a deliberate ABI break. This is just a buggy implementation of the deprecated function.
[06:09]  * RAOF 's symbol file branch wouldn't have helped here
[06:16] <RAOF> Ok. Whoever wrote the mir_connection_get_display_info was too trusting :)
[06:17] <RAOF> Ok. I'm going to go and collect something; back in 20 minutes or so. Then I'll fix this.
[06:19] <tvoss_> RAOF, see you :)
[06:19] <tvoss_> duflu, on the switch branch: Alf mentioned that it might break the mm use-case he is working on
[06:19] <tvoss_> did you have a chance to look into that?
[06:19] <duflu> tvoss_: Yes I have a fix/workaround in mind.
[06:20] <duflu> ... but still think we need a more robust solution later, as discussed with alf a while back
[06:20] <tvoss_> duflu, okay, what would be your proposal to move forward?
[06:21] <duflu> tvoss_: I'll do a fix/workaround to work with the existing mm logic today, I hope
[06:21] <duflu> Just saying I think the MM assumptions are wrong/unsafe
[06:21] <tvoss_> duflu, existing as in what alf and kdub are landing right now?
[06:22] <duflu> tvoss_: Yes I've discussed it with alf already
[06:22] <tvoss_> duflu, ack. I will grab some breakfast and be back after that
[06:22] <alf__> duflu: What's unsafe about the MM assumptions?
[06:24] <duflu> alf__: The assumption that each monitor will compositor_acquire at roughly the same time. I think we need better guarantees that it only happens once per vsync
[06:24] <duflu> But I will do a workaround in the switch branch, which should work in theory
[06:25] <alf__> duflu: the problem is that we don't have a single vsync, we have as many vsyncs as outputs
[06:26] <duflu> alf__: Yes, I know. I can probably defer the discussion if I can workaround in switch before my EOD
[06:27] <alf__> duflu: ok, I would be interested in any sane alternatives, but I don't have any in mind
[06:27] <duflu> It will be a blocker for bypass though. Which is why I've changed it...
[06:28] <alf__> duflu: I guess that indicates that it needs to be configurable. At this point it doesn't seem that we can have a one size fits all swapper.
[06:29] <duflu> alf__: I've had a few ideas. But am too rushed to revisit today. I'll try and give you a workaround in the switch branch tho
[06:29] <tvoss_> duflu, alf__ we should tackle that together on Monday morning.
[06:35] <duflu> alf__: The squashed render_surfaces issue seems to be triggered by my lazy allocation optimization. render_surfaces' RenderResourcesBufferInitializer is making the poor assumption about when it is executed, and interacting with the wrong GL context
[06:41] <duflu> alf__: So I'm wondering if I should disable the optimization or leave the render_surfaces bug for fixing separately?
[06:44] <alf__> duflu: what resource are you allocating lazily?
[06:45] <duflu> alf__: The buffers
[06:47] <duflu> alf__: Aha! Never mind. I have a simple fix
[06:58] <racarr> Strange jenkins error on no-input-for-hidden-surfaces
[06:58] <racarr> is what racarr would be wondering about if he were awake ;)
[07:00] <duflu> racarr, welcome to Friday
[07:02] <RAOF> alf__: Hey, so mir_connection_get_display_info is broken. What should I be fixing - the false assumption that config->displays[0] is always valid if num_displays > 1, or making that assumption true?
[07:07] <alf__> RAOF: the false assumption that config->displays[0] is always valid (if by valid you mean connected and used)
[07:07] <RAOF> alf__: Good, that's what I'm in the process of doing.
[07:09] <alf__> RAOF: hmm, config->displays should be config->outputs...
[07:10] <RAOF> Quite true.
[07:10] <alf__> RAOF: Do you want to wait a moment so I can fix that?
[07:11] <RAOF> Ok
[07:11] <alf__> RAOF: so you don't have to make another change soon, but whatever is more convenient for you
[07:11] <RAOF> Well, this is blocking me now, and the update will be trivial.
[07:14] <alf__> RAOF: ok, then feel free to change it now and I will ping you for an update (perhaps we need to wait until Monday to push the s/display/output/ change so we don't break everything)
[07:14] <RAOF> Ideally we'd land this at the same time.
[07:14] <RAOF> Actually, no.
[07:14] <RAOF> We're breaking ABI, so your change bumps SONAME regardless.
[07:14] <RAOF> Monday is fine.
[07:15] <alf__> RAOF: if I push the change today will it break the packages?
[07:15] <RAOF> Not any more than the packages are currently broken.
[07:16] <alf__> RAOF: (I assume so, since you won't be able to update the mesa/x again until next week)
[07:19] <alf__> RAOF: ok, either Kevin or I will have an MP ready for you. When you feel like it next week approve it and make the Mesa/X changes at your convenience.
[07:19] <RAOF> It actually doesn't change X or Mesa (at this point), so we kindof could get away with silently breaking ABI.
[07:20] <alf__> RAOF: ok, where is the breakage then?
[07:20] <RAOF> mir_connection_get_display_info segfaults
[07:20] <RAOF> Because its implementation assumes that config->displays[0] is always valid.
[07:21] <alf__> RAOF: sorry, I misunderstood the whole discussion, looking
[07:25] <alf__> RAOF: ok, feel free to make the change and we can rename afterwards (hopefully later today)
[07:25] <RAOF> Hm. We've got no tests for mir_connection_get_display_info.
[07:26] <alf__> RAOF: I am sure we had... I guess they were removed when introducing the new API
[07:26] <RAOF> Indeed.
[07:27] <RAOF> didrocks: Good morning.
[07:27] <RAOF> How goes IoM?
[07:28] <didrocks> hey RAOF!
[07:28] <didrocks> RAOF: foggy? ;)
[07:28] <didrocks> how is the other side of the planet?
[07:28] <RAOF> Darkening!
[07:35] <RAOF> alf__: https://code.launchpad.net/~raof/mir/fix-mir_connection_get_display_info/+merge/178216 is there for your viewing pleasure.
[07:38] <alf__> RAOF: a more reliable test would be to check that (.used == 1 and .connected == 1 and .current_mode < .num_modes)
[07:38] <didrocks> RAOF: btw, any luck with testing xmir/xorg and upload that to distro?
[07:38] <didrocks> (or still building?)
[07:40] <RAOF> didrocks: See conversation above. The libmirclient1 that landed this morning broke xmir.
[07:40] <RAOF> *Apart* from that, everything seems to be good to go.
[07:40] <RAOF> :)
[07:41] <didrocks> RAOF: ahah, ok, so once your MP merged, I need to repush mir to distro
[07:41] <didrocks> then you push all the crack
[07:41] <didrocks> we promote
[07:41] <didrocks> and then u-s-c
[07:49] <duflu> alf__: For future-proofing can we not test booleans against "1" ? :)
[07:49] <duflu> *"can we please not"
[07:52] <alf__> duflu: sure :)
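The check discussed above (do not trust the first entry, pick an output that is used, connected, and has a valid current mode) could look roughly like the sketch below. `OutputSketch` and `first_usable_output` are hypothetical names invented for illustration. Only the field names `.used`, `.connected`, `.current_mode` and `.num_modes` come from the chat, and the real Mir structures and the actual fix in the merge proposal may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one entry of the display configuration.
// Field names follow the chat; the layout is an assumption.
struct OutputSketch
{
    bool used;
    bool connected;
    unsigned current_mode;
    unsigned num_modes;
};

// Returns the index of the first output that is safe to report,
// or -1 if no output qualifies (hypothetical helper).
inline int first_usable_output(std::vector<OutputSketch> const& outputs)
{
    for (std::size_t i = 0; i != outputs.size(); ++i)
    {
        auto const& output = outputs[i];
        // Booleans are tested for truth rather than compared against 1.
        if (output.used && output.connected &&
            output.current_mode < output.num_modes)
        {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A caller would then fill in the legacy display info from the selected entry, and report an empty or invalid result when the helper returns -1 instead of dereferencing entry 0.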
[07:52] <RAOF> alf__: Thanks
[07:53] <robert_ancell> RAOF, hey, can you do a quick triage on bug 1206744
[07:53] <RAOF> Bah! What's happened to radeon?
[07:54] <alf__> RAOF: hmm, test failure about uninitialized variable
[07:55] <RAOF> Hm, quite true.
[07:55] <RAOF> Why didn't that error locally?
[08:04] <RAOF> Evening Zoë time.
[08:22] <duflu> Back in 20ish
[08:33] <alan_g> hikiko: you've not addressed https://code.launchpad.net/~hikiko/mir/mir.nested-mode-connection/+merge/178042/comments/401241 - should be a 5 minute job?
[08:38] <hikiko> I've fixed it already alan_g :)
[08:38] <hikiko> pushing!
[08:38] <hikiko> oh no
[08:39] <alan_g> alf__: ^ - want to re-review?
[08:39] <hikiko> I was referring to another comment
[08:39] <hikiko> sorry
[08:39] <hikiko> you mean to replace MirConnection* with unique_ptr?
[08:39] <hikiko> https://code.launchpad.net/~hikiko/mir/mir.nested-mode-connection/+merge/178042/comments/401765
[08:40] <hikiko> I am pushing this fix
[08:43] <alan_g> hikiko: it is better to resolve "Needs Fixing" comments than "Comment" comments.
[08:44] <hikiko> yes I pushed alex's changes except the MirConnection*
[08:44] <hikiko> do we really need this?
[08:45] <hikiko> (because we use it once)
[08:45] <hikiko> and never again
[08:46] <alan_g> hikiko: because we throw in the constructor the destructor never executes. So the connection is leaked.
[08:46] <hikiko> oh
[08:47] <hikiko> i see
[08:47] <alan_g> unique_ptr isn't IMO the cleanest solution - I'd write an RAII class
[08:47] <hikiko> sorry if I sound too naive but I don't know what a RAII class is
[08:48] <hikiko> http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
[08:48] <hikiko> is it this?^
[08:48] <alf__> hikiko: yes
[08:50] <hikiko> cool :) I'll read it (I've never used it before)
[09:09]  * duflu is not a fan of RAII. Most notably because it confuses object destruction for resource close. If you for example "delete my_file", that does not delete or destroy the file like destruction usually implies
[09:14]  * alan_g Likes RAII - but thinks it should be called "use destructors for paired operations"
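The leak alan_g describes, and the RAII class he suggests, can be sketched as follows. This is a minimal illustration under stated assumptions: `FakeConnection`, `fake_connect`, `fake_release`, `ConnectionHandle` and `NestedPlatformSketch` are all hypothetical names, not the Mir client API and not the code in the merge proposal. The point it shows is that when a constructor throws, the enclosing destructor does not run, but fully constructed members are still destroyed.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for a C-style connect/release API.
struct FakeConnection { bool valid; };

// Counts connections that have been opened but not yet released.
inline int& open_connections()
{
    static int count = 0;
    return count;
}

inline FakeConnection* fake_connect(bool valid)
{
    ++open_connections();
    return new FakeConnection{valid};
}

inline void fake_release(FakeConnection* connection)
{
    --open_connections();
    delete connection;
}

// RAII wrapper: connect in the constructor, release in the destructor.
class ConnectionHandle
{
public:
    explicit ConnectionHandle(bool valid) : connection{fake_connect(valid)} {}
    ~ConnectionHandle() { fake_release(connection); }

    ConnectionHandle(ConnectionHandle const&) = delete;
    ConnectionHandle& operator=(ConnectionHandle const&) = delete;

    // Pointer-like access so existing call sites keep working.
    FakeConnection* operator->() const { return connection; }
    FakeConnection& operator*() const { return *connection; }

private:
    FakeConnection* connection;
};

// An owner whose constructor may throw after the connection exists.
class NestedPlatformSketch
{
public:
    explicit NestedPlatformSketch(bool valid) : connection{valid}
    {
        // ~NestedPlatformSketch will not run if this throws, but the
        // already constructed member `connection` is still destroyed,
        // so the connection is released.
        if (!connection->valid)
            throw std::runtime_error("connection is not valid");
    }

private:
    ConnectionHandle connection;
};
```

With a raw pointer member released only in the owner's destructor, the throwing path would leave `open_connections()` at 1. With the handle as a member it returns to 0.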
[09:20] <tsdgeos> guys
[09:20] <tsdgeos> i've come across
[09:20] <tsdgeos> class ForwardingInternalSurface : public mg::InternalSurface
[09:20] <tsdgeos> with the comment
[09:20] <tsdgeos>  // TODO this ought to be provided by the library
[09:20] <alan_g> tsdgeos: ack I wrote that
[09:20] <tsdgeos> I agree, since i think i need something similar to make http://bazaar.launchpad.net/~phablet-team/platform-api/mir-packaging/ compile again
[09:21] <tsdgeos> so i copy that class in there or?
[09:24] <alan_g> tsdgeos: you could as a workaround. I'll be fixing that in a few days (sorry didn't know of your dependency)
[09:24] <tsdgeos> ok
[09:24] <tsdgeos> will do
[09:53] <RAOF> alf__: MP should be fixed; review at your leisure.
[09:53] <alf__> RAOF: great, thanks
[10:00] <duflu> RAOF: I proposed a new, different, fix for the lag bug, if you wish to continue hacking into the night :)
[10:01]  * duflu -> dinner
[10:17] <RAOF> didrocks: Hey, it's ok for me to upload the stack before that mir fix lands, isn't it? It'll be blocked in -proposed until everything works?
[10:18] <didrocks> RAOF: it will be blocked in -proposed until someone put mir in main
[10:19] <didrocks> RAOF: which can happen TBH during the weekend
[10:19] <didrocks> (low chance though)
[10:19] <didrocks> RAOF: can you bump the build-dep?
[10:19] <didrocks> that way, we'll be sure that we'll need a new Mir version
[10:19] <didrocks> (even if it's artificial)
[10:19] <didrocks> wdyt?
[10:19] <RAOF> I could do that, I guess?
[10:20] <RAOF> Actually - unity-system-compositor *does* have auto tests, right?
[10:20] <RAOF> The bug only occurs under u-s-c, and is highly likely to be caught by *any* u-s-c testing
[10:21] <didrocks> RAOF: those tests are integration tests
[10:21] <didrocks> RAOF: they can't run autopilot
[10:22] <RAOF> So those tests aren't run before promotion from -proposed?
[10:22] <didrocks> RAOF: exactly
[10:24] <RAOF> Ok; build-dep bump it is.
[10:26] <RAOF> Fire in the hole!
[10:51] <arsson> Are there still two cursors?
[10:52] <RAOF> Nope.
[10:53] <RAOF> So, now that everything's uploaded to proposed, time to disappear for the weekend.
[10:53] <RAOF> What could possibly go wrong!
[11:10] <tvoss_> RAOF, ping
[12:55] <robert_ancell> alf__, you are working on a patch for alt+ctrl+backspace in the examples?
[12:56] <alf__> robert_ancell: yes
[12:56] <robert_ancell> alf__, ok, I was working on that too - I'll stop then
[12:56] <alf__> robert_ancell: basically, your branch adapted to mir::examples::ServerConfiguration
[12:57] <robert_ancell> alf__, ok
[12:57] <robert_ancell> alf__, I just rebased so it uses mir::examples::ServerConfiguration, but I didn't remove the duplication yet
[12:58] <alf__> robert_ancell: well, then you are farther ahead than I am :)
[12:58] <robert_ancell> I was stuck on std::initializer_list - how to have one build in filter in the config but also support other filters that the examples require
[12:58] <robert_ancell> built in filter rather
[12:59] <alf__> robert_ancell: yes, we need to get rid of initializer_list and replace it with a mutable container (e.g. vector) or perhaps some other abstraction
[12:59] <robert_ancell> yes :)
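One way to read the suggestion above, replacing the fixed `std::initializer_list` with a mutable container, is sketched below. All names here (`KeyEventSketch`, `FilterSketch`, `FilterChainSketch`) are hypothetical and the event representation is invented for illustration. The real Mir filter interfaces are different, and alf__ also leaves open that some other abstraction might be preferable.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type: just a key combination by name.
struct KeyEventSketch { std::string name; };

// Hypothetical filter: returns true when the event was consumed.
using FilterSketch = std::function<bool(KeyEventSketch const&)>;

class FilterChainSketch
{
public:
    // The built-in filter is always installed first.
    FilterChainSketch()
    {
        filters.push_back([](KeyEventSketch const& event)
            { return event.name == "alt+ctrl+backspace"; });
    }

    // Examples can add their own filters afterwards, which a fixed
    // initializer_list would not allow.
    void append(FilterSketch filter) { filters.push_back(std::move(filter)); }

    // Offers the event to each filter in order until one consumes it.
    bool handle(KeyEventSketch const& event) const
    {
        for (auto const& filter : filters)
        {
            if (filter(event))
                return true;
        }
        return false;
    }

    std::size_t size() const { return filters.size(); }

private:
    std::vector<FilterSketch> filters;
};
```

The design choice is that the configuration owns the container and decides the built-in entry, while each example only appends.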
[13:00] <robert_ancell> alf__, ok, should I stop and leave you to make the merge then?
[13:01] <alf__> robert_ancell: I am not sure now, since you are further ahead than I am. I just declared intention of doing it, haven't really started yet
[13:01] <robert_ancell> alf__, ok, I'll continue then - I'll be off on Monday after I fly back from IOM so feel free to finish it off if you have plans there
[13:02] <alf__> robert_ancell: ok, sounds good then
[13:17] <thomi> racarr: any progress with the stress test / thread race fix?
[14:23] <alan_g> hikiko: how are you doing?
[14:24] <hikiko> alan_g,
[14:24] <hikiko> I did it
[14:25] <hikiko> it wasn't that simple
[14:25] <hikiko> I had to overload * as well
[14:25] <hikiko> to keep using the same methods
[14:25] <hikiko> but it seems to work :D
[14:25] <hikiko> I am testing and I will push in a moment
[14:26] <alan_g> Cool, I have an automated test for it: lp:~alan-griffiths/mir/sketch-for-nested-mir-tests
[14:27] <hikiko> thanks a lot! :D I will merge it
[14:27] <alan_g> you merge that and run bin/acceptance-tests --gtest_filter=TestNestedMir*
[14:44] <hikiko> alan_g, I got some ok and passed
[14:44] <hikiko> can I push the merge?
[14:44] <alan_g> hikiko: please do
[18:43] <racarr> :(  just merged trunk in to client focus notifications again and lots of test failures
[18:49] <kdub_> trunk looks ok to me (rev920), aside from the already known problem tests
[18:53] <racarr> kdub_: Yeah it's just some weird interaction
[18:53] <racarr> probably having to do with some redone test fixture and stubs vs mocks and bla bla bla, still investigating
[18:53] <racarr> it's kind of cryptic somehow
[18:53] <racarr> broken pipe, connection refused, etc.
[19:01] <racarr> The thing is ApplicationMediatorReport session_connect and session_create_surface fail
[19:01] <racarr> but, session_disconnect
[19:01] <racarr> which contains session_create_surface as a literal textual subset
[19:01] <racarr> passes
[19:01] <racarr> I think something funny is going on with test_protobuf_client
[19:07] <racarr> maybe the event sink is trying to send events after the client has already destroyed its surface
[19:07] <racarr> i.e. done->Run(); client destroys surface quickly
[19:07] <racarr> err
[19:07] <racarr> client disconnects
[19:07] <racarr> then the event sink is unclogged
[19:07] <racarr> and the socket is invalid
[19:07] <racarr> and I guess
[19:07] <racarr> an exception isn't being handled
[19:07] <racarr> due to some unrelated change
[19:13] <racarr> nah keeping the client alive doesn't fix it so it's something else
[19:18] <racarr> it may be happening when trying to send the focus lost message on close_session
[19:18] <racarr> need to investigate the ownership of the MessageSender
[19:19] <racarr> i.e. perhaps EventSender can not take a strong reference
[19:25] <racarr> I see. It can only happen in the ~SessionMediator Session closed without disconnect code path
[19:25] <racarr> and even then only sometimes
[19:30] <racarr> so, it seems like the client process finishes successfully
[19:31] <racarr> and dies without disconnect then the server gets sigterm by the testing process manager
[19:31] <racarr> and shuts down, ~SessionMediator is reached
[19:31] <racarr> which sees that disconnect was never called and calls SessionManager->close_session...
[19:32] <racarr> so, then SessionManager::close_session tries to clear the focus
[19:32] <racarr> the client which is being closed now, never closed its surface either
[19:32] <racarr> so, the focus mechanism sees that said surface last had focus
[19:32] <racarr> and delivers an unfocused event
[19:33] <racarr> eventually ending in SocketMessenger throwing because of broken pipe
[19:33] <racarr> Almost surely, event_sender taking a weak reference to the message sender will fix it
[19:33] <racarr> but I'm concerned in that I do not understand why it didn't happen before
[19:33] <racarr> kdub_: Any ideas perhaps?
[19:36] <kdub_> hmm
[19:37] <kdub_> so a client is outliving the server?
[19:38] <racarr> kdub_: No, the client is shutting down, without calling disconnect or release_surface
[19:38] <racarr> so then the testing_process_manager kills the server
[19:38] <racarr> and in ~SessionMediator when close_session is called
[19:38] <racarr> eventually the surface is unfocused
[19:38] <racarr> causing things to try and send an event over the wire
[19:38] <racarr> and unhandled broken pipe exception
[19:42] <kdub_> i'd hope there's a better way to handle that than a proxy
[19:43] <kdub_> haven't thought too deeply about how yet though
[19:43] <racarr> kdub_: Mm. there are a bunch of ways to make the test pass but it feels like
[19:43] <racarr> something is wrong
[19:43] <racarr> kdub_: One thing is I think
[19:44] <racarr> EventSender can take std::weak_ptr<MessageSender> perhaps?
[19:46] <kdub_> i suppose
[19:46] <racarr> but it seems
[19:46] <kdub_> that was the proxying though
[19:46] <racarr> also not correct XD
[19:46] <racarr> yeah
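The `std::weak_ptr<MessageSender>` idea floated above can be sketched as follows. `MessageSenderSketch` and `EventSenderSketch` are hypothetical simplifications, not the Mir classes, and whether this is enough in practice depends on who actually owns the sender, which the chat itself treats as an open question.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical transport: records what would go over the wire.
struct MessageSenderSketch
{
    std::vector<std::string> sent;
    void send(std::string const& body) { sent.push_back(body); }
};

// Hypothetical event sender holding only a weak reference, so it
// neither keeps the transport alive nor writes to one that is gone.
class EventSenderSketch
{
public:
    explicit EventSenderSketch(std::weak_ptr<MessageSenderSketch> transport)
        : transport{std::move(transport)}
    {
    }

    // Returns true if the event was delivered, false if the transport
    // has already been destroyed and the event was dropped.
    bool send_event(std::string const& event)
    {
        if (auto const live = transport.lock())
        {
            live->send(event);
            return true;
        }
        return false;
    }

private:
    std::weak_ptr<MessageSenderSketch> transport;
};
```

Note that this only covers the case where the sender object has been destroyed. A sender that is still alive but whose socket is already broken would still throw on `send`, so the exception path needs handling separately.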
[19:48] <racarr> kdub_: This is a total diversion but I saw a crazy idea yesterday from john carmack on handling mutable state in multithreaded C++ game logic
[19:49] <racarr> first, you use a separate garbage collected heap for all your mutable data i.e. your non asset data.
[19:49] <racarr> and basically you write all your code, kind of in the style we are using for multithreading except with two crazy differences
[19:50] <racarr> 1. Rather than any of your actors or whatever actually updating state, they return partially applied functions to do such, and then the main frame loop
[19:50] <racarr> sorts out how to apply such
[19:50] <racarr> but then the also clever bit is
[19:51] <racarr> well how do you actually enforce this, if you are handing out pointers still
[19:51] <racarr> so the objects can look at each other
[19:51] <racarr> so the idea is during the garbage collection step (which you do every frame), you copy the entire mutable heap
[19:51] <racarr> and all your objects that have pointers to other objects
[19:52] <racarr> you update with a pointer to the heap from the last frame
[19:52] <racarr> which you then mprotect
[19:52] <racarr> so basically each of your actors, can have "real" enforced purity
[19:52] <racarr> and you have a small bit of monadic code that applies all these functions they create
[19:52] <racarr> :p not really an architecture for mir but I thought it was really fascinating
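A heavily reduced sketch of the scheme racarr describes: actors only read the previous frame's state and return functions that describe their updates, and the frame loop applies those to a fresh copy. This is an illustration under assumptions, not Carmack's design. All names are hypothetical, a plain struct copy stands in for the garbage-collected heap, and `const` stands in for the `mprotect` enforcement.

```cpp
#include <functional>
#include <vector>

// Hypothetical mutable game state.
struct WorldSketch
{
    int score = 0;
    int ticks = 0;
};

// An update is a deferred mutation of the next frame's state.
using UpdateSketch = std::function<void(WorldSketch&)>;

// An actor may only read the previous frame and return an update.
using ActorSketch = std::function<UpdateSketch(WorldSketch const&)>;

// Runs one frame: collect every actor's update against the read-only
// previous state, then apply them in order to a copy of that state.
inline WorldSketch run_frame(WorldSketch const& previous,
                             std::vector<ActorSketch> const& actors)
{
    std::vector<UpdateSketch> pending;
    for (auto const& actor : actors)
        pending.push_back(actor(previous));

    WorldSketch next = previous;
    for (auto const& apply : pending)
        apply(next);
    return next;
}
```

Because every actor sees the same previous frame, the order in which actors run does not affect what they observe. Only the order in which the frame loop applies the returned updates matters.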
[20:54] <racarr> Back from lunch, Ill produce a reduced test case. so this doesn't muddle client focus notifications
[20:55] <racarr> i.e....events_sent_during_disconnect_do_not_crash_server
[21:30] <racarr> got a reduced test but the ownership doesn't work like I thought it did so just weak_ptr isn't enough
[21:53]  * kdub_ forgot to mention... if i go dark, its because the internet company cut me off early
[21:53] <kdub_> transferring my service over the weekend
[22:15] <racarr> the connected sessions abstraction is making this hard to solve
[22:27] <racarr> sigh I tried having the EventSender use connected_sessions to verify that
[22:27] <racarr> the session was still around
[22:27] <racarr> but this whole process of disconnect happens during the
[22:28] <racarr> connected_sessions.remove call
[22:28] <racarr> so of course that is a deadlock
[22:29] <racarr> part of the reason this code is so difficult is socket session killing itself
[22:29] <racarr> by calling connected_sessions.remove