[00:00] well
[00:00] fcntl(O_NONBLOCK...blabla
[00:00] can't fail immediately after socketpair succeeds
[00:00] so this must be from the second InputChannel constructor...due to an architectural bug now we
[00:00] end up creating two input channels
[00:00] at different times
[00:00] so there is a race between the first one closing and the second one being created
[00:00] tricky
[00:00] that...could happen *thinks*
[00:00] Yes could happen!
[00:01] It is either fixed in rebuild-input-targeting
[00:01] *thinks*
[00:01] sounds likely, given that this is a multi-threaded test
[00:01] ok it's very possible
[00:01] it's fixed in rebuild-input-targeting
[00:01] it's easy enough to reproduce, anyway
[00:01] if not it will definitely be fixed in this follow up branch I was planning on doing to get rid of the duplicate InputChannels
[00:01] racarr: that an unmerged branch?
[00:01] yes
[00:01] so I can fix it tomorrow
[00:02] ok
[00:02] In the meantime, I'll just disable input, and move on to rendering stuff
[00:03] you mean finding the next bug ;)
[00:04] I just found a race in surface states that can end up hosing the socket session XD
[00:04] yeah :)
[00:09] Ok client-focus-notifications is mostly finished, I just need to rework some of the SessionMediator test fixtures.
[00:09] I'm also thinking a lot about input acceptance tests...
[00:10] it's frustratingly difficult to express really simple scenarios like
[00:10] one client opens, another client opens, once the second client has opened we send some input, then the second client closes and we send some more input to see that the first client gets it
[00:11] becomes a strange exercise in interprocess expectation synchronization every time
[00:11] so I am trying to come up with...a better fixture, because there are a good dozen (easily) meaningful input acceptance tests that we could put in place
[00:12] This focus notification test fixture comes closer to it but not quite
[00:12] However.
This day is already an hour too long and I want to go rock climbing before I get evening sleepy
[00:12] so bye for now, will try and come back to finish off and submit client-focus-notifications :)
[00:45] Today the part of Chris will be played by a sleep-deprived misanthrope.
[01:47] RAOF, did you do that work on lightdm set defaults? I can do it now if you want
[01:49] robert_ancell: I have not done that yet, thanks.
[02:02] RAOF, there, it will cost you a review :) https://code.launchpad.net/~robert-ancell/lightdm/set-defaults-seat-type/+merge/166420
[02:10] robert_ancell: That's probably a one-approve-to-merge review, right?
[02:11] pretty much
[02:11] normally with lightdm it's a zero-approve-to-merge :)
[02:13] Hah. I can't approve it anyway :)
[02:24] thomi, hmm, what's the CI complaining about here? https://jenkins.qa.ubuntu.com/job/lightdm-raring-amd64-ci/29/console
[02:24] Looks like it might be trying to merge in the packaging twice?
[02:24] * thomi looks
[02:25] that does look borked. just checking the job config
[02:31] robert_ancell: problem was a config error in the cupstream tool. Have manually edited jenkins config, and re-run that CI job again. Will propose the fix for the config, so Francis can merge it & re-deploy jenkins jobs tomorrow
[02:31] thomi, ok, thanks!
[02:31] no worries
[02:34] robert_ancell: also, lightdm & unity-system-compositor are now being dput'd to the ppa after every mir build
[02:39] yay!
[02:55] thomi, hmm, autolanding now failing? https://code.launchpad.net/~robert-ancell/lightdm/set-defaults-seat-type/+merge/166420
[02:57] robert_ancell: interesting failure :-/
[03:52] RAOF, does this look correct? https://code.launchpad.net/~robert-ancell/mir/revert-drm-auth-magic-removal/+merge/166432
[03:56] robert_ancell: Yes.
[04:21] RAOF, I don't get the --trees arg to bzr init-repo - do you use it?
[04:22] robert_ancell: You mean --no-trees? No, I don't.
[04:22] RAOF, according to http://wiki.bazaar.canonical.com/SharedRepositoryTutorial --no-trees is the default I think
[04:23] But that's because I want a working tree for my branches. If I were just serving branches out over ssh I'd want --no-trees.
[04:23] working tree = directory with files in it to edit/build?
[04:24] Right.
[04:24] hmm, looks like --trees is the default when trying all cases
[04:24] --no-trees is very similar to git's branches, except they're non-hidden directories rather than files in a hidden directory.
[04:24] Yeah, --trees is totally the default.
[04:25] oh, --no-trees means "all branches in one directory"?
[04:35] No, it means ‘just the branch metadata in the directory’
[04:37] You can turn this into “all branches (effectively) in one working tree” by use of lightweight checkouts and switching.
[04:59] robert_ancell: that lightdm issue seems to be fixed. In the end it was related to the MBS rebuild changes we made
[06:14] TIL that “auto fn(int foo) -> int { return foo; }” is a valid C++11 function definition.
[06:15] For all the functional programming immigrants, I guess :)
[07:20] RAOF, +1 :)
[08:13] alf__: can you check https://code.launchpad.net/~robertcarr/mir/rebuild-input-targeting/+merge/165712 - the more eyes the better for this one
[08:17] hikiko: ping
[08:24] alan_g, :)
[08:24] hi
[08:24] I just merged your branch
[08:25] I thought approving is enough for jenkins to merge it :/ sorry :)
[08:36] hikiko: Only to trunk - your branch is your own. ;)
[08:49] alan_g: @rebuild-input-targeting, sure, @customisable-DefaultFramebufferFactory, I think jenkins is having network problems, and is scheduled to shut down, so autolanding will probably fail again :/
[08:51] alf__: ack (otp)
=== pete-woods1 is now known as pete-woods
=== alan_g is now known as alan_g|tea
=== alan_g|tea is now known as alan_g
=== mmrazik is now known as mmrazik|afk
=== mmrazik|afk is now known as mmrazik
[14:52] Morning
[14:54] racarr: hi!
[14:54] racarr: heads up: jenkins is offline
[14:54] racarr: actually it is online now but network is problematic
[14:57] :(
[14:57] racarr: mornin'
[14:58] racarr: i noticed the platform-api bulk changes got merged
[14:59] Afternoon
[15:03] status, dreaming up ways to break my swapper switcher... might put it in pre-review now just to get some air on the MP
[15:07] kgunn: Cool I'll update my branch today
[15:08] ugh
[15:08] 3 days of climbing in a row -> painful typing
[15:08] rocks?
[15:10] https://code.launchpad.net/~robertcarr/mir/client-focus-notifications/+merge/166440 exists
[15:11] kgunn: Fake rocks. I found a climbing gym like 3 blocks away
[15:11] cool
[15:11] It's pretty fun as a form of exercise, they make little "puzzles"
[15:11] where you can only climb using holds of certain colors, etc, and have to get from one point to another
[15:12] and some of it is just strength/flexibility
[15:12] but the puzzles also have solutions...like "Oh I have to use this funny reach with my right hand to get this hold from underneath then it's easy!"
[15:12] *babble*
[15:12] Fun stuff
[15:25] racarr: i like stuff like that
[15:26] :)
[15:47] racarr: rock climbing is great, I used to do a lot outside on real rock when I was younger. Advice: don't over-do it, especially in the beginning. Muscles can get into shape much faster than tendons etc, so you start feeling stronger and want more, but it's easy to get strain injuries (e.g. tendonitis).
[15:51] alf__: Ah. That's good advice
[15:52] Thanks :)
[15:52] katie: I am ok for our meeting today but could be 1-2 minutes late if the line at the coffee shop is long
[15:52] brb
[16:02] back
[16:03] racarr, ok
[16:03] racarr, i was a bit late too!
=== mmrazik is now known as mmrazik|afk
[16:27] kdub_: ping
[16:28] alan_g, pong
=== greyback is now known as greyback|food
=== alan_g is now known as alan_g|life
[16:59] Met with katie. Worked through some of the details for the tiled surface state (various maximized states).
Talked about how minimized isn't really a state (i.e. a minimized window still shows in the maximized state it was in in the alt tab preview), it's just hidden
[16:59] Realized that the snapping constraints (i.e. anchoring) for the tiled state already apply to all windows
[16:59] so really the tiled state just means it had a previous floating size that it will be restored to when untiled
[17:00] We also talked about calling Surface States
[17:00] Surface Modes to be more explicit
[17:00] because it's unclear, why (for example)
[17:00] focus isn't a surface state
[17:01] i.e. if you asked someone who hadn't read the design documents, talk about the state of this surface
=== greyback|food is now known as greyback
[17:20] Sorry internet failure!
[17:21] end of sentence: "Talk about the state of this surface" the first things that come to mind are like
[17:21] oh it's open, it's on top, it has keyboard focus, it has this size
[17:21] not necessarily, "Vertically maximized"
[17:26] what's going on with https://bugs.launchpad.net/mir/+bug/1183327
[17:26] Launchpad bug 1183327 in Mir "Stress tests cause server to crash" [Medium,Fix committed]
[17:26] I thought we determined last night it was in the input channel stuff then as I was going to sleep got some email about
[17:26] a bug being assigned to me
[17:26] and now I can't find it
[17:30] https://bugs.launchpad.net/mir/+bug/1185589 :)
[17:30] Launchpad bug 1185589 in Mir "Mir server crahes when allocating & freeing surfaces from multiple threads" [High,Triaged]
[17:30] racarr: agreed that minimized is more a flag on a surface. We want to maintain the surface geometry while minimized so it restores correctly
[17:32] greyback: Mm.
[17:34] racarr: could minimized be considered a special tiled state (i.e. docked to nothing)?
[17:36] greyback: That's what I was just thinking :)
[17:36] it has most in common with the tiled states, i.e. it snaps back
[17:36] yep
[17:36] but I think.
actually with minimized it might be easier to just model it
[17:36] separately
[17:37] i.e. rather than maintain/restore the geometry
[17:37] the surface never goes anywhere (there is no meaningful minimized geometry change imo...)
[17:37] so we just say it
[17:37] 's hidden
[17:37] that is probably easier
[17:37] Something I have started to think about lately
[17:37] with surface types/roles and surface states/modes
[17:38] is that tight coupling shows up the same way it does in code
[17:38] in "design languages"
[17:39] well, not exactly the same way XD
[17:39] but I think in general, rather than try and do the programmer thing (i.e. look for grand abstractions over the various design patterns)
[17:40] there is value to modelling the concepts separately and more explicitly, even when they can be grouped
[17:40] because then we end up with more "interchangeable concepts" XD
[17:40] which is part of the difficulty, in trying to make the same words about surface management
[17:40] apply to the phone/tablet/desktop
[17:41] True that.
[17:44] well having not done a display manager/window manager before, I'm not sure what's the best approach. I guess start simple :)
[17:46] don't worry having done a window manager before I'm pretty sure all it did was fill me with lots of really strong opinions that have nothing to do with reality :p
[17:48] :D
[17:50] Merging rebuild-input-targeting
[17:56] Took down my computer with the stress test lol
[17:58] Oh I see
[17:58] input_registrar->input_surface_opened needs to be inside
[17:58] the lock_guard in ms::SurfaceStack::create_surface
[17:59] but i wanted to merge the input_registrar and the input_factory anyway so might take the time to fix it that way
[17:59] the race is the surface is destroyed in between the call to surfaces.push_back(surface) and the input registration
[17:59] causing the input registration to fail on the closed fd
[17:59] shouldn't locking higher up solve this?
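Note: the race described at [17:58]-[17:59] can be sketched roughly as follows. This is a minimal illustration, not Mir's actual code: Surface, InputRegistrar and SurfaceStack here are simplified, hypothetical stand-ins that only borrow the names used in the log, and exercise_stack() is an invented helper. The point it shows is that registering input before the lock_guard is released leaves no window in which destroy_surface can run between the push_back and the registration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins. These are NOT Mir's real classes.
struct Surface
{
    int id;
};

struct InputRegistrar
{
    // In this sketch both methods are only called while SurfaceStack
    // holds its mutex, so the set needs no lock of its own.
    void input_surface_opened(Surface const& s) { registered.insert(s.id); }
    void input_surface_closed(Surface const& s) { registered.erase(s.id); }
    std::set<int> registered;
};

class SurfaceStack
{
public:
    explicit SurfaceStack(InputRegistrar& registrar) : input_registrar(registrar) {}

    std::shared_ptr<Surface> create_surface(int id)
    {
        auto surface = std::make_shared<Surface>(Surface{id});
        std::lock_guard<std::mutex> lg(guard);
        surfaces.push_back(surface);
        // Still under the lock: a concurrent destroy_surface cannot run
        // between the push_back above and this registration.
        input_registrar.input_surface_opened(*surface);
        return surface;
    }

    void destroy_surface(std::shared_ptr<Surface> const& surface)
    {
        std::lock_guard<std::mutex> lg(guard);
        auto it = std::find(surfaces.begin(), surfaces.end(), surface);
        if (it == surfaces.end())
            return;
        input_registrar.input_surface_closed(*surface);
        surfaces.erase(it);
    }

    std::size_t surface_count()
    {
        std::lock_guard<std::mutex> lg(guard);
        return surfaces.size();
    }

private:
    std::mutex guard;
    std::vector<std::shared_ptr<Surface>> surfaces;
    InputRegistrar& input_registrar;
};

// Usage: returns the number of surfaces left after one create/destroy pair.
std::size_t exercise_stack()
{
    InputRegistrar registrar;
    SurfaceStack stack(registrar);
    auto surface = stack.create_surface(7);
    assert(registrar.registered.count(7) == 1);
    stack.destroy_surface(surface);
    assert(registrar.registered.empty());
    return stack.surface_count();
}
```

One caveat with this shape: calling out to another component while holding the stack's mutex couples the two locks, so if the registrar ever takes a lock of its own and calls back into the stack, lock ordering has to be considered.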
[18:00] this means that mf::SessionMediator::destroy_surface is being called from another thread before
[18:00] mf::SessionMediator::create_surface ever returns.
[18:00] It seems like we should lock the SessionMediator right, to guarantee
[18:00] in order processing of messages
[18:00] does that even guarantee that
[18:01] no lol
[18:01] In between reading the event and calling the appropriate method on the SessionMediator
[18:02] another thread could read another message (true? y/n...I believe y)
[18:02] locking the session mediator, and handling messages out of order again
[18:02] locking the session mediator seems kind of reasonable though
[18:08] or is all this impossible
[18:08] because how does the client cause destroy_surface to be called until it has replied to
[18:08] until it has received a response to*
[18:08] create_surface
[18:09] hmm hmm hmm
[18:10] Could the IDs be getting mixed up somehow
[18:11] oh man
[18:11] I can use an lttng trace
[18:11] to see what is happening
[18:11] :O
[18:20] trying to use lttv but it's not working :(
[18:24] all the messages look in a sensible order
[18:25] going to "fix" this scenario even though I can't understand how it would happen and see if another bug exhibits
[18:31] It does by locking up my system :)
[18:37] perhaps a race in the communicator with associating the mediator...
[18:50] Almost sure this is two races...or like a race and a memory error...or...
=== greyback is now known as greyback|away
[19:10] I ruled out everything I could think of that wasn't memory corruption
[19:10] then I tried to run mir demo server (without even the stress tests) under electric-fence
[19:10] and it hung my entire system
[19:10] so i had to reboot and then i tried again
[19:10] with the same results XD
[19:12] ok whatever is going on.
it seems to maybe not actually have to do with the race in this dual input channel creation
[19:12] but that's still a weird issue that can be solved by deleting code (After redoing some interfaces)
[19:13] so going to spend some time on that, and then see how this stress test issue remanifests
[19:45] lunch
[20:07] back :)
[20:51] I am not friends with this bug
[20:51] actually right now I am not friends with the difficulty of debugging and frequency of system restarts required
[20:51] the bug is kinda neat
=== greyback|away is now known as greyback
[21:02] Interesting
[21:02] I moved it to an exception from the InputRegistrar
[21:02] requesting window handle for an unregistered surface
[21:02] I can't make gdb work without hanging my system -.-
[21:09] robert_ancell: so i was reading this
[21:09] http://blog.mecheye.net/2012/06/the-linux-graphics-stack/#rendering-stack
[21:10] which was stellar btw...
[21:10] but he said "With the new dumb ioctls in place, it is recommended to use those and not libkms."
[21:11] are the "dumb ioctls" still considered kms?
[21:13] not sure
[21:14] kgunn, asking in #phablet, they do expect to use the whole lightdm/unity-system-compositor/unity-greeter/unity stack for the phone - does that match your expectations?
[21:15] morning all
[21:15] racarr, ping
[21:15] robert_ancell: Pong!
[21:15] racarr, hangout?
[21:16] * kgunn reading the #phablet scrollback
[21:20] robert_ancell: yes...matches my expectation
[21:21] kgunn, no, I just wasn't sure that was what they'd decided on
[22:07] thomi, do you know why mesa is -jenkins71 for raring and -jenkins1 for saucy in https://launchpad.net/~mir-team/+archive/staging/+packages?
[22:07] * thomi looks
[22:08] It causes apt to downgrade the packages from raring->saucy, which I *think* is fine
[22:09] robert_ancell: yeah, it's because each series is controlled by a separate jenkins job, and it's the jenkins job build number that's used in the package version number
[22:09] robert_ancell: It was uploaded on the same date though, so I'm 99% sure it's the same package contents
[22:09] any way to manually bump the number up so they match?
[22:10] I may be able to finesse the numbers in the jenkins job, let me take a look
[22:10] also, any chance of getting jenkins to build xorg like it is for mesa?
[22:14] robert_ancell: ok, I've patched the build script, so it will shortly upload a new -jenkins71 package for saucy. However, it's kind of fragile. As soon as the build for one series passes and the other series fails the versions will get out of sync again
[22:14] but at least people upgrading from raring->saucy won't get their mesa downgraded anymore
[22:14] robert_ancell: how is xorg currently built?
=== greyback is now known as greyback|away
[22:15] robert_ancell: it looks like RAOF manually dputs it?
[22:17] thomi, yes, I think so
[22:17] thomi, will the same out-of-sync problem happen for the other jobs?
[22:18] robert_ancell: no, these jobs are special since we're not using the autolanding infrastructure, since the source is external
[22:18] i.e.- github
[22:19] yeah
[22:19] if launchpad still mirrored git repos :)
[22:19] oh, it stopped doing that?
[22:38] thomi: Can you trigger the crash with the stress test
[22:38] and input on
[22:38] even if you don't
[22:38] move the cursor at all
[22:38] I think I might have fixed part of it?
[22:39] racarr: in mir trunk?
[22:39] Sure
[22:39] now if I don't thrash the cursor it's all fine
[22:39] if I thrash the cursor
[22:39] I'm not actually sure that it's not a deadlock
[22:39] instead of a crash
[22:39] because I'm never able to interact with my system again without rebooting
[22:39] XD
[22:39] racarr: I never touch the cursor
[22:39] hmm
[22:40] racarr: in fact, moving the cursor on my laptop doesn't move the arrow in mir_demo_server... is it supposed to?
[22:40] well I think that part of it I found then, and there is another part...when you thrash the cursor over thrashing clients
[22:40] thomi: If you have permissions on /dev/input*
[22:40] input/*
[22:41] racarr: I see.
[22:46] racarr: the server no longer crashes, but it still has problems. It seems to get in a state where some call the client is making blocks forever, and the client prints out:
[22:46] ERROR: Invocation failed: id: 0 method_name: connect error: Broken pipe
[22:46] ERROR: Header receipt failed: error: End of file
[22:46] right before hanging
[22:47] that's possible
[22:47] I'm a few fixes ahead of trunk atm in my branch
[22:47] "fixes"
[22:47] I don't know which of the three "races" I fixed
[22:47] racarr: heh, ok
[22:47] stopped the problem yet XD
[22:48] racarr: if you put your branch somewhere public I can run the tests for you.. in about 10 minutes time
[23:01] oh
[23:01] I think I found the deadlock
[23:01] a deadlock
[23:02] ...is this going to come down to moving one line :(
[23:02] well and adding another one somewhere else
[23:03] we should install a kill handler
[23:03] all the way down in the input stack so as long as the InputReader thread is running
[23:03] even if the SurfaceStack is deadlocked
[23:03] you can kill mir
[23:05] hmm no hard to solve the lock
[23:05] racarr, well, the kill handler should be in the shell
[23:05] robert_ancell: but I mean it should be installed lower than EventFilter
[23:05] which is the only mechanism we provide now
[23:05] why not do it in the event filter?
[23:06] because the EventFilter is run from the dispatcher thread
[23:06] and interacts with all the other threads
[23:06] then don't deadlock that one :)
[23:06] well, ideally right.
[23:06] ;)
[23:16] we need a cleanup tuesday again
[23:17] kdub_: Yes!
[23:18] racarr, i went to add a function to the surface, and found that i want to clean up these interfaces a bit while i'm here :)
[23:23] do it :) it needsss it
[23:26] Hey!
[23:26] I really fixed it that time
[23:26] Either that or I am having an extraordinary run of race condition luck
[23:26] that would be cruel
[23:27] the final deadlock was between the input registrar, the dispatcher and the surface stack
[23:27] I think it's resolved but it changed what was a really simple (albeit not working) locking pattern
[23:27] into...a locking dance
[23:27] which is never a nice thing to maintain
[23:27] so I need to think about it more
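Note: the log does not say how the final deadlock was resolved. As a generic illustration only, one common way to keep a multi-lock "dance" maintainable is to acquire all the needed mutexes together with std::lock, which avoids deadlock regardless of the order in which callers name them. Component, touch_both and run_opposite_order_threads below are hypothetical names invented for this sketch; they do not correspond to Mir's input registrar, dispatcher or surface stack.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical component with its own mutex; not a Mir class.
struct Component
{
    std::mutex guard;
    int touches = 0;
};

// Locks both components without risking deadlock, whatever order the
// caller passes them in. a and b must be distinct objects.
void touch_both(Component& a, Component& b)
{
    std::lock(a.guard, b.guard);  // deadlock-avoiding acquisition of both
    std::lock_guard<std::mutex> la(a.guard, std::adopt_lock);
    std::lock_guard<std::mutex> lb(b.guard, std::adopt_lock);
    ++a.touches;
    ++b.touches;
}

// Two threads name the components in opposite orders. With naive nested
// lock_guards this pattern could deadlock; with std::lock it completes.
int run_opposite_order_threads(int iterations)
{
    Component registrar;
    Component stack;
    std::thread t1([&] {
        for (int i = 0; i < iterations; ++i)
            touch_both(registrar, stack);
    });
    std::thread t2([&] {
        for (int i = 0; i < iterations; ++i)
            touch_both(stack, registrar);
    });
    t1.join();
    t2.join();
    assert(registrar.touches == stack.touches);
    return registrar.touches + stack.touches;
}
```

Each call to touch_both increments both counters once, so two threads running N iterations each leave a combined total of 4 * N. Whether this technique fits the actual Mir code depends on whether all the locks can be taken at a single point, which the log does not establish.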