[00:00] RAOF: hey, I was just going to ask if you had any indication from ickle about that patch rejection, other than what was in the git log and NEWS file
[00:00] mhall119: No, I've not.
[00:00] I really feel sorry for him, sounds like he's in an unpleasant spot
[00:01] I don't know; I don't have a feel for Intel-internal politics.
[00:52] robert_ancell: https://code.launchpad.net/~raof/mir/prebump-abi-for-lifecycle-cookie/+merge/184703 ?
[00:57] RAOF, cool. I have the u-s-c change in lp:~robert-ancell/unity-system-compositor/app-lifecycle - still feels a bit clunky though
[00:57] I tried out mir last night (installed unity-system-compositor , open source radeon driver), and the whole display had small black stripes (say, 5px normal, 5x black, 5px normal... across the whole thing)
[00:57] is this known, or should I re-repro and file a bug?
[00:58] RAOF, so the logic for XMir needs to be, don't grab input devices on startup until you get lifecycle state set to "mir_lifecycle_state_resumed". Drop them on "mir_lifecycle_state_will_suspend"
[00:58] jrr, just file a bug and we'll mark it duplicate if there is already one
[00:59] alright, will do
[00:59] robert_ancell: And then tell Mir that it's done, right?
[00:59] RAOF, yes, once we can do that
[01:00] RAOF, and if I don't hear from you in a reasonable amount of time I just drop your connection :)
[01:00] :)
[01:01] RAOF, how can I build a local XMir to test this?
[01:01] ./autogen.sh --prefix=$HOME/.local --enable-xmir && make -j9 && make install
[01:02] RAOF, have you made the switch from surface focus to app lifecycle on master?
[01:02] No, I haven't yet.
[01:02] * robert_ancell -> lunch
[01:04] * RAOF → coffee
[01:08] RAOF: no, thats not how the application model is defined
[01:09] applications are given a long timeframe to save their state
[01:09] theres no calling back into the server to signal state saving completion
[01:09] its a hard policy scenario, the client has no say in the time/resource constraints associated with its lifecycle
[01:11] My reading of the application lifecycle doc was that the on_application_about_to_stop callback would “3. Wait for timeout or completion then SIGSTOP the process.”
[01:12] Which implies to me that the unity API needs some way to signal completion?
[01:12] Not that *client code* would be calling it; we don't expect clients to use libmirclient directly anyway.
[01:14] ricmm: ^^^
[01:18] RAOF, nope, the client just receives the signal, and is granted a grace period for serialization
[01:18] a.k.a. state preservation
[01:19] RAOF: if the document says that then it is wrong, will need to update it
[01:19] ricmm, taking an AI to update
[01:19] thanks
[01:20] ricmm, might be later my day, though
[01:20] google docs are a bit difficult in this place
[01:21] no worries, about time to sleep over here
[01:21] RAOF: consider that 3. item broken, tvoss will update document, applications only get signaled of an about_to_suspend() transition
[01:21] or a resumed() transition if they are previously-stopped applications
[01:22] otherwise they start from 0, its up to applications to restore their state from their serialization targets
[01:23] thnks for taking an interest in clarifying the design guidelines
[01:23] * ricmm -> beer/sleep
[01:44] Hm.
[01:44] I'd prefer if 3. *wasn't* broken, and I'm not sure why we're adamant that it is.
[02:03] RAOF, what's the document link?
[02:03] robert_ancell: https://docs.google.com/a/canonical.com/document/d/186nT03Jyu_d-GMyJ--8Qp83o1Ey-O1EsWZtwrPGE2TQ/edit
[02:09] RAOF, hmm, I'm not seeing how this can be completed without the shell being signalled when a client has completed
[02:13] robert_ancell: I think they're just going to not bother with the "until completed"
[02:13] robert_ancell: So instead it's ‘SIGSTOP after $TIMEOUT’
[02:14] Rather than SIGSTOP after $TIMEOUT or completion, whichever happens first.
[02:14] Which I think is the trivially better behaviour, so I'm not sure why they're adamant that it should not be that way.
[02:15] RAOF, right, but the component that does the killing is the shell right? Which means it has to know when completion occurs to do the killing. Unless the platform API has a separate thread that does the killing if the function call doesn't return in a sufficient time
[02:15] I'm not sure how this is all implemented
[02:15] robert_ancell: No, it can just assume that completion has occurred by $TIMEOUT.
[02:16] ie: the shell sends the "about_to_suspend" signal, and then in 10 seconds SIGSTOPs the client.
[02:17] RAOF, oh, I guess the process will quit before 10s if it is successful, so the "complete" signal is the process termination
[02:17] RAOF, in our case where we don't actually want the process to quit, it doesn't work so well
[02:17] Oh, no.
[02:17] The process doesn't quit.
[02:17] or in the case of the shell, it might not be the process termination but the Mir client connection quitting
[02:17] It just has no completion signal.
[02:18] RAOF, so the shell has 0 idea if the process is actually suspended?
[02:18] Correct
[02:18] ah
[02:18] Well, the shell knows, because it's sent SIGSTOP
[02:18] It has no idea if the process has *saved its state*
[02:19] Which I think is silly.
[02:19] yes
[02:19] So I'm going to continue hacking on the cookie branch and then land it when they come to their senses.
[02:19] :)
[02:19] I don't see any reason not to send the signal back to the shell
[02:19] Right.
[02:20] The apps won't ever know
[02:20] The shell doesn't have to *wait* for the signal
[02:20] and it will be make debugging a lot easier
[02:20] Yes.
[02:20] rather than just guessing if an app actually suspended
[02:20] And, hell, we'll be able to send SIGSTOP *sooner*
[02:21] RAOF, does SIGSTOP free up any resources except for CPU?
[02:21] And give interesting metrics, like ‘your app took 5s to suspend, which is getting close to the 10s timeout…’
[02:21] robert_ancell: No
[02:21] I suppose it allows the kernel to page out all the memory for that app
[02:21] Yeah.
[02:21] But the kernel could page out all but one page of the app anyway.
[02:22] (Assuming a reasonable app that's blocked when mir_swap_buffers is blocked)
[02:22] and all apps will be like that :)
=== chihchun_afk is now known as chihchun
[03:41] RAOF: consider it a no-op to the display server itself
[03:42] if you want I can rewrite in a way that the Mir protocol itself needs not understand what the passed message means
[03:43] I think it makes perfect sense for the Mir protocol to understand?
[03:43] I'm not sure why having a ‘Yo! I've finished this thing you asked me to do’ callback is a bad idea.
[03:43] well I believe saying "until they come to our senses" is out of place
[03:43] its irrelevant, this was planned ages ago
[03:43] It doesn't prevent us from having a hard timeout.
[03:44] for functionality scheduled to land for 13.10
[03:44] and which has been in place since a long time ago
[03:44] Mir being the new player in the game
[03:45] For a little context: Robert and I would like for this callback to exist, because it's really useful for unity-system-compositor to know when XMir's done handling the "please suspend" message.
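[editor's note] The behaviour RAOF and robert_ancell argue for above ("SIGSTOP after $TIMEOUT or completion, whichever happens first", with the shell free to ignore the completion event) can be sketched roughly as follows. This is a minimal Python illustration, not Mir or platform-api code: `suspend_client`, `notify_about_to_suspend` and `stop_process` are hypothetical names, and the 10-second default is simply the figure mentioned in the chat.

```python
import threading
import time


def suspend_client(notify_about_to_suspend, stop_process, timeout=10.0):
    """Ask a client to save its state, then stop it on completion or timeout.

    notify_about_to_suspend(done): hypothetical hook that delivers the
        "about to suspend" signal; the client may call done() once it has
        finished saving its state.
    stop_process(): hypothetical hook standing in for kill(pid, SIGSTOP).
    Returns (completed, seconds_waited).
    """
    finished = threading.Event()
    start = time.monotonic()
    notify_about_to_suspend(finished.set)
    # Wait for the completion callback, but never longer than the hard timeout.
    completed = finished.wait(timeout)
    waited = time.monotonic() - start
    stop_process()  # the process is stopped either way
    return completed, waited
```

With a completion callback the stop can happen well before the timeout, and `seconds_waited` gives the kind of "your app took 5s to suspend" metric mentioned above. Without one, the same function degrades to the plain "stop after $TIMEOUT" policy.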
[03:45] I dont think you need to care about the timeout or not, the lifecycle policy itself is up to the shell to implement
[03:45] not the display server
[03:45] *policy*
[03:46] But the shell is a part of the display server. Not the display server policy, certainly, but the callback gives the shell a mechanism for better policy.
[03:47] hmmm
[03:47] I'm sorry but the display server is a library that the shell implements
[03:47] the display server imposing non-display-server policy on the shell is wrong
[03:47] But this isn't imposing policy?
[03:47] yes, the shell and therefore the system can decide what policy to implement regarding application management
[03:47] the display server != application manager
[03:48] If the shell wants to ignore the completion event that's well within its rights?
[03:48] the shell *defines* the existance of an event at all
[03:48] maybe the mistake was exposing it Mir in terms of defined semantics
[03:49] it should've just been an opaque message passing over a bus
[03:49] An extensible Mir protocol. Yeah, that would have been a good idea :)
[03:50] sure, but this is what we have due to lack of assigned man power
[03:50] if it doesnt fit the XMir world, extend it
[03:50] and make it fit, without breaking the touch world
[03:50] Sure, that's what I intend to do.
[03:51] great, but dont extend the lifecycle states, feel free to add extra stuff to the protobuf message itself
[03:51] because the first interferes with the defined model
[03:52] But the cleanest way to do that is to add a completion event to the lifecycle callback added the client passes to libmirclient.
[03:52] That doesn't extend the lifecycle states, nor does it interfere with the model at all.
[03:52] that is not going to happen before post-October planning
[03:52] it demands extending a model that has been in place for many months now
[03:52] What?
[03:52] No it doesn't!
[03:52] and it is not something that will be considered for development 4 weeks before delivery date
[03:53] I don't see how it extends the lifecycle model?
[03:53] well first of all I dont see why the *display* *server* needs to care about an application having saved its state
[03:54] that souns like session/application management
[03:54] am I wrong to think that?
[03:55] Not particularly, but session/application management is also handled through Mir.
[03:55] *if* you are doing some sort of session management in Mir to support XMir itself then thats XMir's problem (which only runs legacy applications, afaik)
=== chihchun is now known as chihchun_afk
[03:55] This is true.
=== chihchun_afk is now known as chihchun
[04:28] ricmm: So, http://bazaar.launchpad.net/~raof/platform-api/update-for-lifecycle-cookie/revision/149 is what this would look like, platform-api side.
=== duflu is now known as duflu|away
[05:39] ricmm, the session management is u-s-c managing its children (i.e. XMir), not the applications running underneath those (i.e. the X clients). It's the same logic as in the shell - when an application is no longer visible, then it should be triggered to suspend. In this case, when you switch sessions the old session is asked to "suspend" and it stops reading input (and potentially could do more if it wanted)
[05:40] gtg, bye all
[05:41] RAOF: Hi! Any thoughts on https://lists.ubuntu.com/archives/mir-devel/2013-September/000376.html ?
[05:44] alf_: Not really, no.
[05:44] Although... hm.
[05:44] It's possible that you're running afoul of the name caching that i965 does?
[05:49] RAOF: ?
[05:50] In intel_context.c, intel_process_dri2_buffer
=== duflu|away is now known as duflu
[06:04] RAOF: ah, yes, we were trying various things in there with Marteen yesterday but unfortunately didn't fix the problem
[06:04] Ah, ok.
[06:05] RAOF: (not to say that the problem isn't there, perhaps we didn't try the right fix :))
[06:05] ☺
[06:06] RAOF: I am just saying that we haven't exhausted the investigation of what could be going wrong in intel_process_dri2_buffer with PRIME fds
[06:09] RAOF: does xmir copy the entire root window (as it lives in gpu memory) to a mir surface (which also lives on the gpu)?
[06:09] or does it copy and blit each subwindow directly
[06:09] Entire root window
[06:09] smspillaz: Damage rects only
[06:09] ah, that makes sense
[06:10] yep +1
[06:10] I'd like to subwindow it, actually.
[06:10] But that's a discussion for next week.
[06:10] RAOF: that would probably make sense for the noncomposited case
[06:10] smspillaz: At least the intel DDX seems to loop through the rectangles and take as little as possible each frame
[06:11] duflu: The other DDXen are similar, but they copy the bounding-box of the damage
[06:11] smspillaz: Indeed.
[06:11] Close (and efficient) enough
[06:11] RAOF: In fact, possibly _better_ for cache performance
[06:11] Indeed
[06:12] RAOF: I guess that's why some parts of xmir had to live in the ddx
[06:12] Doesn't Intel gate on the number of rectangles, though, and do the bounding box given sufficiently many rectangles?
[06:12] because the 2d accel parts are all different wherever you look
[06:12] smspillaz: Right
[06:12] RAOF: Not AFAIK... it copies the rects as they're given
[06:12] cool
[06:13] I was just explaining to someone why xmir stuff had to live in the drivers and wanted to make sure I understood the code correctly
[06:13] Speaking of which, I must set up a saucy nouveau today. To figure out which bugs are truly common
[06:13] There's a long-term intent to do a generic xf86-video-mir using Mir's EGL platform and Glamor, but that's a lot more effort and likely to be less performant.
[06:15] duflu: @https://lists.ubuntu.com/archives/mir-devel/2013-September/000379.html, I think the OP is using fglrx?
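[editor's note] The difference between the two copy strategies discussed above (copying each damage rectangle versus copying the bounding box of the damage) can be illustrated with a small sketch. This is not taken from any DDX; the `(x, y, w, h)` rectangle layout and the function names are assumptions made for illustration only.

```python
def bounding_box(rects):
    """Smallest (x, y, w, h) rectangle covering every rectangle in rects."""
    x1 = min(x for x, y, w, h in rects)
    y1 = min(y for x, y, w, h in rects)
    x2 = max(x + w for x, y, w, h in rects)
    y2 = max(y + h for x, y, w, h in rects)
    return (x1, y1, x2 - x1, y2 - y1)


def pixels_copied(rects, use_bounding_box):
    """Pixels transferred per frame under each strategy.

    Overlapping rectangles are counted twice in the per-rectangle case,
    which is what a naive loop over the damage list would also do.
    """
    if use_bounding_box:
        _, _, w, h = bounding_box(rects)
        return w * h
    return sum(w * h for _, _, w, h in rects)


# Two small damaged areas in opposite corners of a window.
damage = [(0, 0, 10, 10), (100, 100, 10, 10)]
per_rect = pixels_copied(damage, use_bounding_box=False)
boxed = pixels_copied(damage, use_bounding_box=True)
```

For widely separated damage the bounding box covers far more pixels, while for clustered damage a single larger copy may be cheaper than many small ones, which is the trade-off touched on in the chat.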
[06:16] RAOF: as I thought
[06:16] RAOF: just to confirm, xwayland works by forcing all windows ot live on the cpu right?
[06:16] or at least the root window
[06:17] erm
[06:17] normal windwos actually
[06:17] its rootless
[06:17] (so that they can be used as a wl_buffer via shm)
[06:17] No, XWayland also has DDX patches
[06:17] alf_: Possibly a good point, but not true for the existing reporters of that bug
[06:18] duflu: sure, just for this particular instance
[06:18] RAOF: ah okay
[06:19] RAOF: I wonder what's to us from having ddx patches which all they do is "copy damaged bit from PixmapPtr to fd however you like"
[06:19] smspillaz: It's just that there's exactly one maintained xwayland DDX patch - intel. I wrote the ati and nouveau patches a year ago, and they're somewhat out of date.
[06:19] alf_: Unless it _is_ possible to start Mir with radeon while fglrx kmod is loaded?
[06:20] smspillaz: Well, xwayland *doesn't* copy from the PixmapPtr; it shares the backing BO with weston, and submits damage rects.
[06:21] duflu: no idea if it's possible...
[06:21] RAOF: oh weird
[06:21] smspillaz: Nah, it's perfectly sensible for a client-allocated model.
[06:22] I guess that makes sense actually
[06:22] because xserver was allocating anyways
[06:22] I wonder how hard it would be to beat the xserver into accepting some foreign buffer
[06:22] probably quite hard
[06:26] No, pretty easy actually.
[06:26] If we single-buffered in Mir I could totally do that for XMir.
[06:26] RAOF: ah, right
[06:29] RAOF: BTW, single buffering in SwitchingBundle recently became impossible in order to simplify the logic. But I could add it back in easily enough
[06:29] I don't think we particularly want to use single-buffering.
[06:30] True
[06:36] That's a new one. Unity panel takes up a quarter of the screen height
[06:47] RAOF: How do I disable i915 and let nvidia/optimus rule?
[06:52] duflu: In your bios?
[06:52] RAOF: No such option. Either intel only, or both intel+nv with intel given control of all outputs except VGA :(
[06:53] duflu: You *may* have luck with /sys/kernel/debug/vgaswitcheroo
[06:53] But it's also possible that the nvidia card is *only* hooked up to VGA
[07:01] RAOF: It certainly looks like nvidia only talks to the VGA port. That's quite annoying and unexpected
[07:02] Not entirely unexpected. Hardware muxes cost money.
[07:02] And suck a bit anyway
[07:02] No wonder it cost me so little :/
[07:02] (Unless you do fancy things, like Apple do/did)
[07:02] Great. Then I still have only intel hardware for saucy/xmir testing.
[07:03] Except for the vga port? :)
[07:04] RAOF: I particularly needed dual monitor testing
[07:04] Hm. Less useful.
[07:13] Oookay. Perhaps I need a second desktop and prerequisite electrical upgrades to the house :/
[07:15] alf_: oh btw are you sure it didn't help things? if so can you please do a strace of the failing process?
[07:15] with patches applied
[07:16] I only care about the failing instance, maybe it will show a clue of what's wrong
[07:16] mlankhorst: ok, just be sure (because I am applying the patches to a local tree), I only care about the intel changes from the diff, right?
[07:17] what other changes are there in that diff?
[07:22] various other bits here and there e.g. in the gallium state tracker
[07:22] oh just some whitespace fixups
[07:24] mlankhorst: @populating region.name, if we are dealing with prime fd buffers, won't all incoming buffer only have the .fd field populated? Setting the region flink name, won't help us avoid recreating the region. Am I missing something?
[07:26] why wouldn't it?
[07:27] if 2 buffers ar equal the flink name would be the same
[07:28] but.. mayb3e stracea will help find the issue
[07:29] RAOF: Did you (or someone) hack xserver-xorg-video-intel to fix initial mode selection?
[07:29] It's *different* now
[07:31] mlankhorst: sure, but the incoming buffers don't have the .name field set, just the .fd field, so they will always compare unequal to the region. Unless, that is, the buffer information is also updated somehow?
[07:31] alf_: oh like that..
[07:33] alf_: ugh how could that be the case for dri buffers? o.O
[07:34] oh right, mir backend is mapped to dri2
[07:35] bah, I'll need to think about it some more.. grr:P
[07:37] duflu: Hm, not deliberately? :)
[07:39] RAOF: Oh, one definite change seems to be that the intel DDX no longer accepts NullRegion (now crashes)
[07:39] Yeah. We should never be passing in NullRegion, though.
[07:39] Are we?
[07:42] dun dun duuuun
[07:42] mlankhorst: that sounds ominous :)
[07:44] RAOF: No, but I may need to as a workaround. Unless I can figure out how to fix the intel code :P
[07:46] RAOF: Did you investigate https://bugs.freedesktop.org/show_bug.cgi?id=68969 ?
[07:46] Freedesktop bug 68969 in Driver/intel "xf86-video-intel 2.99.901 + XMir + multimonitors = all displays black" [Normal,Resolved: notourbug]
[07:47] duflu: I did have a look, but didn't get as far as reproducing.
[07:50] duflu: I pulled the latest xmir patch from ickle's branch into the latest Ubuntu package, though, so it's some other change in the tree breaking it.
[07:56] RAOF: https://github.com/RAOF/mesa/pull/4
[07:58] alf_: Hm. What frees dri2_surf in that case?
[08:00] RAOF: dri2_surf is just a cast of surf to dri2_egl_surface, they are the same thing
[08:00] ...urgh. Quite true!
[08:01] alf_: hm i have a fix for i965, i think
[08:01] * alf_ is excited...
[08:03] http://paste.debian.net/37818/
[08:05] no idea if it works though or if the fd is correct ;;p
[08:06] mlankhorst: thanks, will check
[08:26] RAOF: It *looks* like the intel DDX is tracking its own damage per-pixmap, and XMir's multi-monitor optimization of only submitting outputs/pixmaps when dirty is confusing it. Any ideas? I'm going round in circles
[08:33] mlankhorst: no luck, bug still occurs? Do you want me to get an strace?
[08:33] definitely
[08:38] mlankhorst: btw, I tried another experiment: I also pass the GEM name with the incoming buffer (name provided by Mir), but use the fd to create the buffer, and setting the name in the region manually with flink (like the previous patches). I still get the bug, which indicates that the core problem may not be (only) in intel_process_dri2_buffer()
[08:39] alf_: fun :p
[08:52] alf_: oops, can you set singlesample_mt->region->handle = region->handle in intel_miptree_create_for_dri2_buffer ?
[08:52] mlankhorst: sure
[09:00] it would appear I missed that part on importing bo's, so it was still 0 ;)
[09:24] mlankhorst: https://github.com/afrantzis/mesa/tree/egl-platform-mir-egl-image-i965-experiment
[09:24] mlankhorst: (https://github.com/afrantzis/mesa.git branch egl-platform-mir-egl-image-i965-experiment)
[09:27] alf_: do you close the original gbm bo afterwards?
[09:27] or at any point
[09:28] mlankhorst: the BOs are closed only when the Mir surface is destroyed
[09:29] what about the pixmap created with dri2_create_image_khr_pixmap
[09:29] do you ever close that one?
[09:30] mlankhorst: also when the surfaces are destroyed (the bo and respective EGL images are created lazily when the compositor/clients needs them the first time).
[09:30] ok but this definitely looks wrong here..
[09:31] + dri2_img->dri_image =
[09:31] + dri2_dpy->image->createImageFromName(dri2_dpy->dri_screen,
[09:31] + width,
[09:31] + height,
[09:31] + format,
[09:31] + flink_arg.name,
[09:31] + stride / 4,
[09:31] + NULL);
[09:31] mlankhorst: ok, what is wrong with it?
[09:32] you can't use FLINK internally
[09:34] you'd need to use the dupimage call, assuming it works
[09:34] mlankhorst: it doesn't...
[09:35] what happens when you try?
[09:35] mlankhorst: the call succeeds but I still get errors further down when rendering, even when using GEM names
[09:36] mlankhorst: let me paste...
[09:36] alf_: yes probably, but it's more correct than flinking
[09:39] mlankhorst: here is the strace output with USE_DUP 1 , http://paste.ubuntu.com/6087181/
[09:40] still more correct
[09:40] mlankhorst: btw, why can't I FLINK? Note that this is still happening at the EGL platform level, outside any driver specific context.
[09:40] alf_: because there is no refcounting in drm
[09:40] if you close 1 handle, everything is invalid. userspace has to explicitly keep track themselves
[09:42] mlankhorst: I am only getting the global name, I am not opening/closing anything
[09:42] alf_: you are creating a new representation of the bo through createImage.
[09:45] alf_: but regardless in this case the problem didn't change.. can you add extra traces to libdrm/intel to the GEM_CLOSE calls?
[09:48] mlankhorst: I guess that is drm_intel_gem_bo_free(), sure
[09:50] with indication of which handle closed, I'm not good enough to understand that part from the ioctl numbers yet :P
[10:01] mlankhorst: http://paste.ubuntu.com/6087259/, with USE_DUP = 1
[10:09] I give up
=== dandrader is now known as dandrader|afk
[10:11] alf_: what do close messages look like?
[10:12] mlankhorst: "drm_intel_gem_bo_free: ..."
[10:13] alf_: because I see some handles that are being re-created after close
[10:13] however due to flushing they may not be killed right away after destroying
[10:17] alf_: I think mir needs to be smarter, and cache the fd's. check if it's seen them before or not and if so re-use them..
[10:17] or better yet
[10:17] allocate them on the client side and give them to mir for use
[10:20] mlankhorst: so instead of sending the Prime FD every time, send it once and then send an another id back and forth?
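[editor's note] mlankhorst's point above, that closing one handle invalidates it for every user and that "userspace has to explicitly keep track themselves", amounts to keeping a reference count per handle and issuing the real close only when the last user lets go. A minimal sketch of that bookkeeping follows; `HandleTable` and `close_handle` are hypothetical names, and nothing here calls into libdrm or the kernel.

```python
class HandleTable:
    """Userspace reference counts for handles that the lower layer does not refcount."""

    def __init__(self, close_handle):
        # close_handle is a stand-in for the real close operation
        # (for example a GEM_CLOSE ioctl); it is invoked once per handle.
        self._close_handle = close_handle
        self._refs = {}

    def acquire(self, handle):
        """Record one more user of handle and return it."""
        self._refs[handle] = self._refs.get(handle, 0) + 1
        return handle

    def release(self, handle):
        """Drop one user; close the handle only when nobody is left."""
        remaining = self._refs[handle] - 1
        if remaining:
            self._refs[handle] = remaining
        else:
            del self._refs[handle]
            self._close_handle(handle)


closed = []
table = HandleTable(closed.append)
table.acquire(7)   # first representation of the buffer
table.acquire(7)   # second representation of the same buffer
table.release(7)   # still in use elsewhere, nothing is closed yet
```

In this sketch a second `release(7)` is what finally triggers the close, so a component that imported the same buffer twice cannot pull it out from under itself.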
[10:20] alf_: no, keep the fd cached on the client side, don't close it
=== dandrader|afk is now known as dandrader
[10:30] mlankhorst: hmm, I think we are already doing this in the client
[10:36] alf_: I may have missed the context, but we have two client processes in the chain - nested-mir and the application. Both get passed the fd
[10:39] Yeah, we cache fds.
[10:43] alan_g: This is about buffer fds used by the final application. (As you know) nested-mir creates the surface/buffers itself.
[10:44] alf_: Ok then. Sorry for the noise
=== chihchun is now known as chihchun_afk
=== hikiko is now known as hikiko|lunch
=== alan_g is now known as alan_g|lunch
=== hikiko|lunch is now known as hikiko
=== dandrader is now known as dandrader|lunch
[12:17] RAOF: you and ricmm all ok ?
[12:17] :)
=== dandrader|lunch is now known as dandrader
=== alan_g|lunch is now known as alan_g
[13:15] alf_: Are you OK with the updated https://code.launchpad.net/~alan-griffiths/mir/spike-nested-input/+merge/184351?
[13:20] alan_g: approved
[13:20] alf_: thanks
[13:48] alf_: anyway the final application is messing up here, nothing the nested mir could do would cause -ENOENT here unless they share the drm fd.. :P
=== jono is now known as Guest88427
=== mzanetti is now known as mzanetti|meeting
[14:07] mlankhorst: the final application gets the drm fd so it can use it with mesa
[14:07] mlankhorst: and handles everything through it (and libmirclient)
[14:08] mlankhorst: through it == mesa
[14:15] alf_: yes, but is the same fd shared between mesa and nested mir?
=== alan_g is now known as alan_g|tea
[14:21] mlankhorst: yes (of course not with the same fd number, but the same underlying file)
[14:24] mlankhorst: host mir, nested mir and clients/mesa use dup()-ed fds that point to the same drm file instance
=== mzanetti|meeting is now known as mzanetti
[14:30] didrocks: ping
[14:31] kgunn: pong
[14:31] didrocks: hey...been a while, hope time off was good
[14:31] didrocks: just wanting to know...are you ok with resolution here
[14:31] https://bugs.launchpad.net/xmir/+bug/1221209
[14:31] Launchpad bug 1221209 in unity-system-compositor (Ubuntu) "need to establish upgrade action when xmir becomes default" [Undecided,New]
[14:31] kgunn: was excellent, thanks! In a sprint in Boston this week (so investigating some part of holidays to travel ;))
[14:31] * didrocks looks
[14:32] didrocks: basically...robert recommending reboot on xmir distro roll out
[14:33] kgunn: yeah, it sounds good (and the right way to implement it)
[14:33] didrocks: cool...just wanted to make sure we're ok (before the 11th hour gets here :)
[14:33] kgunn: but I think people won't reboot every 4 hours for now (as we release every 4 hours)
[14:33] kgunn: so, it comes back to the discussion about ABI stability, is it coming so that we can remove the "hack" for forcing rebuilds in both u-s-c and unity-mir?
[14:34] seems I was disconnected…
[14:34] 10:33:11 didrocks | kgunn: yeah, it sounds good (and the right way to implement it)
[14:34] 10:33:29 didrocks | kgunn: but I think people won't reboot every 4 hours for now (as we release every 4 hours)
[14:34] 10:33:54 didrocks | kgunn: so, it comes back to the discussion about ABI stability, is it coming so that we can remove the "hack" for forcing
[14:34] | rebuilds in both u-s-c and unity-mir?
[14:34] kgunn: ^
[14:35] didrocks: do you recall if there is a bug for that hack ? if not...i'll log one...and tag it for "make-xmir-default"
[14:35] https://bugs.launchpad.net/xmir/+bugs?field.tag=make-xmir-default
[14:35] kgunn: indeed, let me log it as a bug, one sec
=== alan_g|tea is now known as alan_g
[14:35] didrocks: oh..thanks...yeah, i think its a seperate issue from the reboot discussion
[14:37] kgunn: it's linked as because of this, it means that potentially, we force every 4 hours people to reboot
[14:38] as we rebuild u-s-c as soon as there is a commit in Mir
[14:38] didrocks: true...i see the link...just saying, it deserves its own bug
[14:39] kgunn: https://bugs.launchpad.net/xmir/+bug/1223393
[14:39] Launchpad bug 1223393 in XMir "ABI stability of libmirserver" [Undecided,New]
[14:39] let me add the tag
[14:40] kgunn: TBH, I think ABI stability needs to happen in the incoming 2 weeks
[14:40] seeing how far we are, we need to try to stabilize now
[14:40] didrocks: no doubt
=== dandrader is now known as dandrader|afk
=== dandrader|afk is now known as dandrader
=== tkamppeter__ is now known as tkamppeter
[15:12] bye
[15:19] alf_: ARRRRRRRGHHHHHH
[15:19] ARGHH
[15:19] bad alf_
[15:19] bad!
[15:19] mlankhorst: ?
[15:20] alf_: if any client closes the bo, they will be closed for all clients..
[15:20] you can't dup the drm fd for that reason
[15:23] mlankhorst: but open()-ing a new drm_fd works?
[15:23] yes
[15:23] mlankhorst: and possibly sending it over a unix socket?
[15:24] it will, but the dup is why it fails
[15:24] if the nested mir closes their bo it was closed on the other one too :P
[15:24] causing the -ENOENT out of nowhere
[15:26] mlankhorst: ok, I will try to de-dup() and see if that helps. Note, though, that I don't think we are explicitly closing anything during rendering...
[15:26] maybe, but it will at least nail the issue down to the process causing it
[15:29] mlankhorst: ah, sorry for the false alarm, we actually fixed that long ago... we drmOpen() a new fd and send it to clients :/
[15:33] mlankhorst: but hmm...
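[editor's note] The dup()-versus-open() distinction mlankhorst draws above comes down to POSIX semantics: a dup()-ed descriptor refers to the same open file description as the original, whereas a fresh open() creates an independent one. The sketch below shows this with an ordinary temporary file and its file offset. It is only an analogy: no DRM device is involved, and the claim that per-file GEM handle state is shared in the same way is the chat's explanation, not something this code verifies.

```python
import os
import tempfile


def offsets_after_read():
    """Read through one fd and report the offsets seen by a dup and by a new open."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)
        dup_fd = os.dup(fd)                  # same open file description as fd
        new_fd = os.open(path, os.O_RDONLY)  # independent open file description
        os.read(fd, 4)                       # advance the offset through fd only
        dup_offset = os.lseek(dup_fd, 0, os.SEEK_CUR)
        new_offset = os.lseek(new_fd, 0, os.SEEK_CUR)
        for descriptor in (fd, dup_fd, new_fd):
            os.close(descriptor)
        return dup_offset, new_offset
    finally:
        os.unlink(path)


dup_offset, new_offset = offsets_after_read()
```

The dup()-ed descriptor sees the read performed through the original (offset 4), while the separately opened descriptor is unaffected (offset 0). State attached to the open file, rather than to the descriptor number, is shared across dup() in the same way.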
[15:35] mlankhorst: I actually see a dup() in NestedMir... I will get rid of this and check what is going on
[16:31] mlankhorst: removed stray dup() in nested mir, no change :/
[16:48] kgunn: nested input works on android drivers. (still needs more test coverage, but sort of there) - https://code.launchpad.net/~alan-griffiths/mir/missing-links-to-wire-up-input/+merge/184824
[16:49] alan_g, yay
[16:50] kdub: let's forget about this mesa stack. ;)
[16:51] alan_g: wow!...i guess input was easier than render
[16:51] awesome!
[16:52] kgunn: all the input problems were in code we "own"
[16:52] alan_g, we could table it! http://translate.google.com/#es/en/mesa
[16:53] kdub: sadly, in the UK "table it" means something different.
[16:53] need to find an idiom translator now...
[16:54] http://en.wikipedia.org/wiki/Table_(parliamentary_procedure)
[16:55] Anyway, a good point for...
=== alan_g is now known as alan_g|EOD
[16:59] kdub: so technically mterry could take your android updates+ alan_g|EOD 's input and try to integrate greeter against it
[17:00] kgunn, sure, it would be easier if its the ubuntu touch shell/greeter (qml apps?) as opposed to lightdm and xmir though
[17:38] I think I've come up with a way to rebuild message processor + socket session + session mediator
[17:38] that will fix this DPMS IPC issue but it's kind of invasive
[17:39] wondering if I should run with it, or refocus on perhaps doing DPMS for XMir via some out of channel API for the time being
[17:40] racarr, whats the plan?
[17:41] kdub: So the essentialy difficulty is now, that the message reading loop runs like
[17:42] Step -1: Begin read in constructor
[17:42] Step 0: Respond to asynbc read
[17:42] Step 1: Allow the message processor and session mediator to fully handle the message (i.e. then say block on Surface::advance_buffer
[17:43] racarr, why is that? I thought we tried to avoid blocking event loops and be as parallel as possible?
[17:43] STep 2: Schedule the next asynchronous read.
[17:43] tvoss_: No, the way it's architected now we can't read a second message from a client until a first is fully processed [17:43] racarr, ? [17:44] tvoss_: Because things are written as in Steps 0-2 there, we don't [17:44] read a second message until we have finished processing the first one completely [17:44] which we use to keep messages in order [17:44] the problem is not all messages are in the same "channel" [17:44] racarr, can't we have a special case on the client side though... "if you are the client that turned off the screen, you'll get an error if you try to swapbuffers before you turn the screen on" [17:44] (or did we already consider that?) [17:44] and the particular problem is, say you use DPMS to turn off the screen, then call advance buffer [17:45] racarr, which would be fine, too, if the message handling just handles stuff like dpms asynchronously [17:45] and you are [17:45] perpetually blocked on advance buffer [17:45] so you can never turn [17:45] the screen back on [17:45] kdub: It could always be racy I think (because other people can turn on/off the screen...but it's a possibility) [17:46] racarr, even the server could turn on the screen, but then it would just be an event sent to clients [17:46] kdub, I would rather want to avoid such a hacky appraoch [17:46] kdub: The plan I was developing, was instead to read messages as fast as possible, then the SessionMediator uses a thread pool [17:46] to perform the actual operations and returns futures or whatever to the message processor [17:46] the SessionMediator can use, multiple locks [17:46] to enforce the different channels [17:46] i.e. there is a [17:46] SessionMediator::display_configuration_lock [17:46] and SessionMediator::surface_channel_lock [17:47] so, you can't execute resize_surface while swap is still executing [17:47] but you can reconfigure the display, or say receive a display reconfiguration event [17:48] you know messages within a "sequence" i.e. 
alll protected by the surface channel lock [17:48] will all be in order, because you don't read the next message [17:48] until you have actually received the std::future from the SessionMediator [17:48] at which point you know the lock from that channel is held [17:49] I am pretty sure it works, but am nervous about doing it because it changes the entire server side threading model basically [17:49] and who knows what that does [17:49] I mean I guess hypothetically I should [17:49] racarr, yeah, thats why i keep thinking [17:49] but I don't seem to be confident about that :p [17:49] 'there's got to be some smarts we could put into the client side' [17:50] I think it's always a race on the client. [17:51] also this may show up other places [17:51] i.e. two surface clients [17:51] if you have to wait until the server responds that you have swapped one buffer before you can begin swapping the next [17:51] its a big change, because client requests go from in-order to out of order [17:51] or rather two (or more channels) and we have to do the sync logic in the server at some point [17:52] kdub: No, that's the thing with still not reading the next message until the SessionMediator takes some lock for you [17:52] it's just there become seperate channels, i.e. display-configuration, and surface [17:52] racarr, well, thats the sync i was talking about :) taking the lock [17:53] but messages in the surface channel will still be processed in the order they are sent [17:53] ah yeah [17:53] yeah [17:53] and it's not super trivial with all the thread pools and such [17:54] with swapbuffers, we currently say 'this might block!' [17:55] we could just say like, 'after calling 'block server/turn screen off' the only thing you can do is some subset of the server functionality' [17:55] I think we understand that to mean though, that mir_client_swap_buffers_sync might not return immediately [17:55] not that you can't call any mir_client_ functions [17:56] Mm. 
I guess we could say that, especially perhaps to XMir [17:56] but it seems really difficult if you are writing a multithreaded client [17:56] well, the renderthread just knows the swap could wait an arbitrarily long time [17:57] it doesn't have to know why [17:57] and I worry we will end up with bugs that literally mean the screen turns off and can't come back on :p [17:57] ? [17:57] if xmir accidentally swaps after turning off DPMS [17:57] the swap thread will wait infinitely [17:57] racarr, if it has two threads [17:57] a render thread, and display management thread, its not a problem [17:58] You mean, it's not a problem as long as the client [17:58] never calls swap_buffers after turning off the display? [17:58] I am just not sure it's a good idea to put that requirement on the client when the failure case is [17:58] restart the entire session [17:59] right, as long as the client remembers the well-known "swapbuffers can wait a really long time sometimes" [17:59] ? How does that help you? [17:59] It will wait forever [17:59] there's no way for the system to recover [17:59] even if the client uses [17:59] async swap buffers [18:00] and the client itself isn't blocking [18:00] it can still never turn the display back on. [18:00] ah, the 'server thread is blocked' problem [18:00] Yes :( [18:00] still coming back into the problem :) [18:01] haha no worries, it's confusing, I didn't realize this is what was happening on GBM [18:01] ...for a long time haha [18:02] well, if the client sends 'display off', then 'swapbuffers' [18:02] we know when we receive the swapbuffers command [18:02] that we cannot guarantee that we will ever be able to service the command [18:02] so perhaps an error?
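The per-channel idea discussed above (surface requests stay in order, display configuration is serviced independently, so a blocked swap can no longer stop the screen from being turned back on) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Mir's code: `Channel`, `SessionMediatorSketch`, `set_display_power`, `swap_buffers` and `demo_screen_recovers` are hypothetical names. Instead of the two mutexes named in the discussion, the sketch chains futures per channel, because a `std::mutex` may not be unlocked by a thread other than the one that locked it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>

// Hypothetical: one ordered "channel". Each submitted task waits for the
// previously submitted task on the same channel before it runs.
class Channel {
public:
    std::shared_future<void> submit(std::function<void()> work) {
        std::lock_guard<std::mutex> guard(mutex_);
        std::shared_future<void> previous = tail_;
        tail_ = std::async(std::launch::async, [previous, work] {
            if (previous.valid()) previous.wait();  // keep in-channel order
            work();
        }).share();
        return tail_;
    }

private:
    std::mutex mutex_;
    std::shared_future<void> tail_;
};

// Hypothetical stand-in for the mediator: two independent channels.
class SessionMediatorSketch {
public:
    // Display channel: DPMS-like on/off request.
    std::shared_future<void> set_display_power(bool on) {
        return display_channel_.submit([this, on] {
            {
                std::lock_guard<std::mutex> guard(state_mutex_);
                display_on_ = on;
            }
            display_changed_.notify_all();
        });
    }

    // Surface channel: a swap blocks for as long as the display is off.
    std::shared_future<void> swap_buffers() {
        return surface_channel_.submit([this] {
            std::unique_lock<std::mutex> lock(state_mutex_);
            display_changed_.wait(lock, [this] { return display_on_; });
            assert(display_on_);
            ++swaps_completed_;
        });
    }

    int swaps_completed() {
        std::lock_guard<std::mutex> guard(state_mutex_);
        return swaps_completed_;
    }

private:
    std::mutex state_mutex_;
    std::condition_variable display_changed_;
    bool display_on_ = true;
    int swaps_completed_ = 0;
    // Declared last so they are destroyed first: their destructors wait
    // for outstanding tasks, which still use the state members above.
    Channel surface_channel_;
    Channel display_channel_;
};

// The scenario from the discussion: display off, swap, display on.
bool demo_screen_recovers() {
    SessionMediatorSketch mediator;
    mediator.set_display_power(false).wait();
    std::shared_future<void> swap = mediator.swap_buffers();
    bool blocked = swap.wait_for(std::chrono::milliseconds(50)) ==
                   std::future_status::timeout;
    mediator.set_display_power(true).wait();  // not stuck behind the swap
    swap.wait();
    return blocked && mediator.swaps_completed() == 1;
}
```

Calling `demo_screen_recovers()` from a `main` (built with thread support, for example `-pthread`) exercises the case where the swap stays blocked while the display is off and completes once the display channel turns it back on. A real server would also need a thread pool and error paths, which this sketch leaves out.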
[18:04] kdub: I've been thinking about an error yeah [18:04] the thing is how does the client know when to call swap buffers again [18:04] the client then has to like [18:04] call swap buffers, if error, see if the error was because the display was off [18:04] when it turns the screen back on, or sees that the screen has been turned back on [18:04] if so, update some flag so we watch the display configuration for the display to come back on to start our render loop [18:05] the thing is there will be apps and stuff too who will try and swap buffers while the screen is off, not just the client who has to turn the screen back on [18:05] so it has to be a reasonable behavior for them too [18:05] well, those can block [18:05] normally [18:05] unless we throw an error :p [18:06] in which case they have to decide what to do with it, which can't just be call swap_buffers again because they need to be sleeping while [18:06] the screen is off [18:06] just the client that is turning the screen on and off is the problem one that we have to give an error to [18:06] so i think that approach would work, but there's also the reworking of the session mediator [18:07] hmm yeah maybe just errors to the one client... [18:07] because, although i sorta like it the way it is :) if we have a new scenario we might have to improve it to fit the new scenario [18:07] I think it could be improved in general some [18:08] like... 
on new message, make a std::async to service the request [18:08] I think you should be able to process messages from the same client [18:08] about different surfaces [18:08] concurrently [18:08] (as an example of a general improvement which has kind of similar requirements to this) [18:09] racarr, right [18:09] there's a few other ones i'm sure where that would be beneficial [18:12] racarr, i'm starting to lean more towards improving the server so that a new message is handled in a future [18:14] kdub: Yes I think it should work...I'm worried it might be too big to finish this week though [18:14] lots of test redoing, etc. [18:14] I just had an interesting idea for a "solution" [18:15] if you swap buffers while you turned off the display [18:15] you turn it back on XD [18:15] racarr, it is a fair amount of test reworking and restructuring [18:15] and then xmir tries to do the right thing [18:15] but if it ever fails its not catastrophic [18:15] i.e. the session mediator implicitly turns it on for you [18:15] racarr, not a bad solution :) [18:19] kdub: Mm. [18:19] Ok thanks for talking through it with me :D I'm going to work on other stuff for a few hours [18:19] and then revisit and choose an approach [18:20] maybe wait to sync with Alan in the morning, I bet he has an opinion :D [18:21] racarr, no problem... 
if you want, we could call a hangout on it for tomorrow morning [18:23] kdub: Mm good idea, ill make sure to be up early enough and we can just do it after the standup [18:59] alf_: meh new strace then :P [18:59] with the dup removed still === seb128_ is now known as seb128 [21:03] mornin robert_ancell [21:03] kgunn, hello [21:14] robert_ancell: curious, any joy on fixin the sec bug/vt input shotgun style [21:15] kgunn, we had a disagreement with ricmm/tvoss over using the app lifecycle api, so we need to do some more convincing there [21:15] robert_ancell: we're supposed to be default on the 19th....i fear we're running out of runway [21:15] kgunn, very much so [21:16] kgunn, I think we're just going to have to make the API change - it shouldn't affect the shell team [21:19] didrocks, still around? [21:19] robert_ancell: yeah, I'm in Boston right now in a sprint [21:20] didrocks, oh, cool. All the autolanding is off for mir right? How do we push through releases now? [21:20] robert_ancell: we just workarounded a dns issue that we are having for the last 3 days [21:20] (an infra issue) [21:20] when I say just, it's really *just* [21:21] didrocks, oh, they weren't intentionally blocked? [21:21] so at least, we'll have build for dailies [21:21] 2 issues :) [21:21] 1. intentionally block [21:21] 2. dns issues making things even not building for the past 3 days [21:21] count 2 as fixed [21:21] for 1, we need to ensure the current image is fine [21:21] and then doing the unity8+mir transition [21:22] is what is in mir trunk for that goal? [21:22] (the thing you do want to release?) [21:22] sorry, but on holidays and then, just back on this sprint, so quite lost on all the things that happened :) [21:25] robert_ancell: still around? [21:26] didrocks, yep [21:26] does my question make sense? 
[21:27] didrocks, mir trunk should still be going into saucy, otherwise we wont get any critical fixes or features for the phone [21:29] brb, need to restart modem [21:32] blew our 100G cap, just upgraded to 200G. The 64k throttled speed is so unusable... [21:33] robert_ancell_: ok, let's try to get mir, u-s-c, unity-mir and qtubuntu rebuilt for making the transition [21:33] didrocks, the mirclient3 transition? === robert_ancell_ is now known as robert_ancell [21:34] robert_ancell_: https://bugs.launchpad.net/bugs/1218381 is happening on saucy, with additional black vertical lines (there is a bug for the black lines) [21:34] Launchpad bug 1218381 in XMir "saucy has horizontal distortion on ati" [High,Incomplete] [21:35] robotfuel, ta [21:45] robert_ancell: urgh, transition? [21:45] robert_ancell: no transition right now please [21:46] asac tries to get unity-mir on [21:47] :) [21:47] yay!! [21:47] we all try to get it in [21:47] not me :) [23:25] Transition all the things! [23:29] ok [23:32] robert_ancell: Incidentally, http://bazaar.launchpad.net/~raof/platform-api/update-for-lifecycle-cookie/revision/149 is the platform-api update for our proposed change. [23:33] RAOF, yes, saw that [23:33] RAOF, any luck convincing ricmm? [23:33] Not really. [23:34] RAOF, ricmm, I was also thinking, if the correct behaviour when switching sessions is that the hidden session should suspend its apps the only way this can occur is if the system compositor can notify the shell. So it makes sense in the Unity 8 case as well as the XMir case [23:34] RAOF, we should code it up without the cookie for now. That's at least an improvement on the current case. It will have the input overlap issue potentially [23:35] Yeah. I'll just push the branch so you can test.