[00:37] <RAOF> thomi: Yup?
[00:38] <thomi> RAOF: hey, maybe you can help me - I think the mir client library is leaking file handles when running multiple connections from different threads. I *think* it's leaking /dev/dri/card0, but I'm not sure where that's opened in the ode to do any debugging
[00:39] <RAOF> thomi: That's entirely plausible.
[00:41] <thomi> I'm just trying to get some more evidence to support my theory that I've got the correct FD
[00:43] <RAOF> You could do that with strace -e file, right?
[00:44] <thomi> ahh, good idea, I've been taking lsof snapshots over time
[00:46] <RAOF> Yeah. Dump the strace output to a file, parse it with something, and track the open fd.
[01:25] <thomi> RAOF: so on th client side, I don't see any open() calls while the client is spitting out errors, which makes me think that perhaps the client is just reporting a server-side error.
[01:26] <thomi> RAOF:  but running strace on mir_demo_server doesn't seem to show me anything interesting either
[01:26] <thomi> the error I get is one of either:
[01:26] <thomi> Could not connect to server, error is: connect: Too many open files
[01:26] <thomi> or:
[01:26] <thomi> Could not connect to server, error is: eventfd_select_interrupter: Too many open files
[01:27] <RAOF> That does sound a lot like a fd leak, doesn,t it.
[01:27] <thomi> yeah
[01:28] <thomi> if I slow down the client app, it takes longer to appear, so it's soemthing that happens within a connect & disconnect
[01:28] <thomi> is valgrind something that might help in situations like this?
[01:29] <RAOF> It *can* trace fd leaks, yes.
[01:29] <RAOF> Wow. Compiz on llvmpipe is much better in mesa master.
[01:29] <RAOF> I didn't notice that I don't have HW accel.
[01:29] <thomi> heh
[01:30] <thomi> will try valgrind, but I suspect I'm missing something
[01:30] <thomi> it seems like strace is missing some open calls, since lsof shows more files open thatn strace shows open() calls for
[01:30] <RAOF> If it doesn't happen when slowing down the client then maybe it's *not* a leak.
[01:30] <RAOF> Maybe you're just trying to open too many files? :)
[01:30] <thomi> RAOF: no, it does happen, it just takes longer
[01:30] <RAOF> Ah, fair enough.
[01:31] <thomi> RAOF: so my point was that it's linked to the speed at which I run the stress test
[01:31] <RAOF> I was trying to see if glxinfo open()s /dev/dri/card0, but llvmpipe has foiled that brilliant plan.
[01:31] <thomi> haha
[01:33] <thomi> hmmm. when I run mir_demo_server under valgrind, everything works, probably because it's so bloody slow :-/
[01:55] <RAOF> :)
[02:12] <thomi> ugh, I hate trying to track down these sorts of issues
[02:47] <smspillaz> RAOF: compiz and unity or just compiz?
[02:47] <RAOF> smspillaz: Compiz and Unity.
[02:47] <smspillaz> hm, neat
[07:17] <RAOF> thomi: OOH!
[07:17] <thomi> RAOF: ?
[07:18] <RAOF> thomi: I think I know where your fd leak is.
[07:18] <thomi> oh good
[07:18] <thomi> easily fixable?
[07:18] <RAOF> I need to check the stress-test code.
[07:18] <RAOF> But - do you close() the platform fd?
[07:18] <thomi> all it does is a connect & then release
[07:19] <RAOF> Right. So you get the platform package, which contains a drm fd, and then don't close it.
[07:19] <thomi> hmmmm...
[07:19] <thomi> seems like a mistake lots of people are going to make. Can we not make it RAII?
[07:20] <alf_> RAOF: thomi: Could be, although in that case I would argue that it should be closed on release
[07:20] <thomi> alf_: +1
[07:20] <RAOF> alf_: Maybe?
[07:20] <thomi> the thing is: I didn't do anything to get the platform package
[07:20] <thomi> all I do is a connect
[07:20] <RAOF> Correct. You get it automatically on connect.
[07:21] <thomi> right, so if you're going to allocate it automatically, I think you should also deallocate it automatically
[07:21] <thomi> otherwise, how do I know that I need to deallocate it?
[07:21] <RAOF> Well, you don't, unless you're deliberately writing stress-tests :)
[07:22] <alf_> RAOF: what's your concern about deallocating automatically?
[07:22] <RAOF> alf_: Silently closing file descriptors out from under clients smells a bit to me.
[07:23] <RAOF> But since we already do it for the prime fds in buffers, I guess it's got the virtue of consistency.
[07:24] <tvoss> RAOF, alf_ why not make it a policy and default to atexit, while clients have a way to say: please close it asap
[07:25] <alf_> RAOF: If it turns out to be an issue, we could also give copies of the fds to the user when asking for the platform package, and tell them that they need to close it themselves there, and we can still close our own fds internally
[07:25] <thomi> RAOF: surely it should be closed when I release the connection anyway?
[07:26] <RAOF> thomi: Why? You might want it, and it's not in any way bound up to the connection.
[07:26] <RAOF> alf_: I think I prefer that, and would like to also do that for the buffer package.
[07:26] <thomi> I mean, if it's created automatically as a side effect of calling connect(), I'd expect it to be released as a side effect of calling release()
[07:27] <thomi> anyway, at the very least the documentation for the connect() call needs to be updated with the fact that you need to close certain FDs
[07:27] <RAOF> To client code it doesn't look like it's created automatically; it looks like they get it as a result of get_platform_package().
[07:27] <alf_> tvoss: we could, although I find that giving copies is more versatile and straightforward
[07:28] <RAOF> I'd support a policy of only ever handing out dup()'d fds from *get_*_package(), and managing the lifetimes of our internal fds.
[07:28] <tvoss> alf_, yup :)
[07:28] <tvoss> RAOF, +1
[07:29] <RAOF> That will mean that we eat up double the fd space, but it's not like we're consuming a vast range of it now anyway :)
[07:29] <thomi> RAOF: hang on, I never call get_platform_package()
[07:29] <RAOF> thomi: Yeah. That *doesn't* create the platform package, merely accesses it. But from the client code's perspective, that's where it's created.
[07:29] <thomi> right
[07:30] <RAOF> Which is why dup()ing in that call and handling our own fd lifecycles internally makes sense.
[07:31] <alf_> thomi: RAOF: perhaps even better, we should hand a release function to the user when they get the platform/buffer packages, so they will just need to call that, don't care about explicitly closing etc
[07:31] <alf_> tvoss: ^^
[07:32] <thomi> alf_: that would work for me
[07:32] <RAOF> alf_: I don't know; client code that deliberately accesses the platform/buffer packages needs to know the platform-specific nature of that package anyway.
[07:33] <tvoss> alf_, okay, so they receive responsibility and power at the same time
[07:33] <RAOF> I think that if dup() is sufficiently costless then dup() + automatic internal management is a better approach.
[07:35] <alf_> RAOF: I am not arguing against dup. I am saying that the client doesn't need to know the details of how to release the platform package. The FD they get may be dup()ed, maybe not, and the same things goes for other resources contained in the platform/buffer packages.
[07:37] <alf_> RAOF: tvoss: Perhaps an other approach would be, to not dup the FDs, just say that if you need it beyond connection lifetime just make a dup yourself.
[07:38] <alf_> RAOF: tvoss: This doesn't consume more fd space unless needed by the client.
[07:38] <tvoss> alf_, I like that idea. RAOF: thoughts?
[07:38] <tvoss> alf_, it requires the client to express intent quite clearly
[07:38] <tvoss> and our policy is strict and straightforward
[07:39] <RAOF> I'm a fan of the API being able to respond consistently, and I think dup() makes that easier.
[07:39] <alf_> RAOF: what's inconsistend about saying we are giving access to resources but we own them?
[07:39] <RAOF> Otherwise, for example if a client accesses the platform buffer and close()s the drm fd, subsequent calls to get_platform_package() will appear to succeed but return a garbage platform.
[07:41] <alf_> RAOF: so it's mostly about not allowing the client to mess up?
[07:41] <RAOF> Yeah.,
[07:42] <RAOF> Where it's low-cost, I think it makes sense to have a forgiving API.
[07:42] <RAOF> (With an implicit assumption that dup() is low-cost)
[07:45] <mlankhorst> it usually is, it just adds a duplicate to the process' fd list
[07:53] <RAOF> Yeah, it didn't seem like something that would be high-cost.
[07:53] <RAOF> And even if we double-up on fds we're only using about 10 or so, which is a negligible fraction of the 1024 limit(ish).
[07:57] <thomi> ok guys, it's getting late here. Can someone post a msg to the ML with whatever the outcome of this is please? I'll fix up the stress tests in the morning
[08:22] <hikiko> hmm hi, alan_g is not working today?
[08:22] <hikiko> alf_, ping? :)
[08:23] <alf_> hikiko: it's a public holiday for US and UK today
[08:23] <hikiko> :)
[08:23] <hikiko> I have a question
[08:23] <hikiko> why do we use protected destructors in display.h?
[08:24] <hikiko> is it a design pattern?
[08:26] <hikiko> I guess we only want to destroy the object at release time
[08:27] <hikiko> ?
[08:28] <hikiko> sorry :p i don't guess anything :)
[08:31] <alf_> hikiko: Protected destructors in display don't allow destroying a derived *Display object when using a pointer to a base class. This basically enforces the use of a shared_ptr for ownership when upcasting, and that's the reason it is done this way. It's not necessary to do that, though. You can just make the virtual destructor public.
[08:36] <hikiko> :)
[08:36] <hikiko> thanks alf_ !
[10:31] <hikiko> alf_, could you please review this: lp:~hikiko/mir/mir.fix-missing-virtual-destructors if you have some time later? +question: shall I push my future branches as lp:~mir-team/...?
[10:34] <alf_> hikiko: whichever way you prefer is fine
[10:35] <hikiko> :)
[20:14] <thomi> morning
[23:25] <RAOF> Yo!