/srv/irclogs.ubuntu.com/2013/05/27/#ubuntu-mir.txt

RAOF	thomi: Yup?	00:37
thomi	RAOF: hey, maybe you can help me - I think the mir client library is leaking file handles when running multiple connections from different threads. I think it's leaking /dev/dri/card0, but I'm not sure where that's opened in the ode to do any debugging	00:38
RAOF	thomi: That's entirely plausible.	00:39
thomi	I'm just trying to get some more evidence to support my theory that I've got the correct FD	00:41
RAOF	You could do that with strace -e file, right?	00:43
thomi	ahh, good idea, I've been taking lsof snapshots over time	00:44
RAOF	Yeah. Dump the strace output to a file, parse it with something, and track the open fd.	00:46
thomi	RAOF: so on th client side, I don't see any open() calls while the client is spitting out errors, which makes me think that perhaps the client is just reporting a server-side error.	01:25
thomi	RAOF: but running strace on mir_demo_server doesn't seem to show me anything interesting either	01:26
thomi	the error I get is one of either:	01:26
thomi	Could not connect to server, error is: connect: Too many open files	01:26
thomi	or:	01:26
thomi	Could not connect to server, error is: eventfd_select_interrupter: Too many open files	01:26
RAOF	That does sound a lot like a fd leak, doesn,t it.	01:27
thomi	yeah	01:27
thomi	if I slow down the client app, it takes longer to appear, so it's soemthing that happens within a connect & disconnect	01:28
thomi	is valgrind something that might help in situations like this?	01:28
RAOF	It can trace fd leaks, yes.	01:29
RAOF	Wow. Compiz on llvmpipe is much better in mesa master.	01:29
RAOF	I didn't notice that I don't have HW accel.	01:29
thomi	heh	01:29
thomi	will try valgrind, but I suspect I'm missing something	01:30
thomi	it seems like strace is missing some open calls, since lsof shows more files open thatn strace shows open() calls for	01:30
RAOF	If it doesn't happen when slowing down the client then maybe it's not a leak.	01:30
RAOF	Maybe you're just trying to open too many files? :)	01:30
thomi	RAOF: no, it does happen, it just takes longer	01:30
RAOF	Ah, fair enough.	01:30
thomi	RAOF: so my point was that it's linked to the speed at which I run the stress test	01:31
RAOF	I was trying to see if glxinfo open()s /dev/dri/card0, but llvmpipe has foiled that brilliant plan.	01:31
thomi	haha	01:31
thomi	hmmm. when I run mir_demo_server under valgrind, everything works, probably because it's so bloody slow :-/	01:33
RAOF	:)	01:55
thomi	ugh, I hate trying to track down these sorts of issues	02:12
smspillaz	RAOF: compiz and unity or just compiz?	02:47
RAOF	smspillaz: Compiz and Unity.	02:47
smspillaz	hm, neat	02:47
=== Mirv_ is now known as Mirv
RAOF	thomi: OOH!	07:17
thomi	RAOF: ?	07:17
RAOF	thomi: I think I know where your fd leak is.	07:18
thomi	oh good	07:18
thomi	easily fixable?	07:18
RAOF	I need to check the stress-test code.	07:18
RAOF	But - do you close() the platform fd?	07:18
thomi	all it does is a connect & then release	07:18
RAOF	Right. So you get the platform package, which contains a drm fd, and then don't close it.	07:19
thomi	hmmmm...	07:19
thomi	seems like a mistake lots of people are going to make. Can we not make it RAII?	07:19
alf_	RAOF: thomi: Could be, although in that case I would argue that it should be closed on release	07:20
thomi	alf_: +1	07:20
RAOF	alf_: Maybe?	07:20
thomi	the thing is: I didn't do anything to get the platform package	07:20
thomi	all I do is a connect	07:20
RAOF	Correct. You get it automatically on connect.	07:20
thomi	right, so if you're going to allocate it automatically, I think you should also deallocate it automatically	07:21
thomi	otherwise, how do I know that I need to deallocate it?	07:21
RAOF	Well, you don't, unless you're deliberately writing stress-tests :)	07:21
alf_	RAOF: what's your concern about deallocating automatically?	07:22
RAOF	alf_: Silently closing file descriptors out from under clients smells a bit to me.	07:22
RAOF	But since we already do it for the prime fds in buffers, I guess it's got the virtue of consistency.	07:23
tvoss	RAOF, alf_ why not make it a policy and default to atexit, while clients have a way to say: please close it asap	07:24
alf_	RAOF: If it turns out to be an issue, we could also give copies of the fds to the user when asking for the platform package, and tell them that they need to close it themselves there, and we can still close our own fds internally	07:25
thomi	RAOF: surely it should be closed when I release the connection anyway?	07:25
RAOF	thomi: Why? You might want it, and it's not in any way bound up to the connection.	07:26
RAOF	alf_: I think I prefer that, and would like to also do that for the buffer package.	07:26
thomi	I mean, if it's created automatically as a side effect of calling connect(), I'd expect it to be released as a side effect of calling release()	07:26
thomi	anyway, at the very least the documentation for the connect() call needs to be updated with the fact that you need to close certain FDs	07:27
RAOF	To client code it doesn't look like it's created automatically; it looks like they get it as a result of get_platform_package().	07:27
alf_	tvoss: we could, although I find that giving copies is more versatile and straightforward	07:27
RAOF	I'd support a policy of only ever handing out dup()'d fds from get__package(), and managing the lifetimes of our internal fds.	07:28
tvoss	alf_, yup :)	07:28
tvoss	RAOF, +1	07:28
RAOF	That will mean that we eat up double the fd space, but it's not like we're consuming a vast range of it now anyway :)	07:29
thomi	RAOF: hang on, I never call get_platform_package()	07:29
RAOF	thomi: Yeah. That doesn't create the platform package, merely accesses it. But from the client code's perspective, that's where it's created.	07:29
thomi	right	07:29
RAOF	Which is why dup()ing in that call and handling our own fd lifecycles internally makes sense.	07:30
alf_	thomi: RAOF: perhaps even better, we should hand a release function to the user when they get the platform/buffer packages, so they will just need to call that, don't care about explicitly closing etc	07:31
alf_	tvoss: ^^	07:31
thomi	alf_: that would work for me	07:32
RAOF	alf_: I don't know; client code that deliberately accesses the platform/buffer packages needs to know the platform-specific nature of that package anyway.	07:32
tvoss	alf_, okay, so they receive responsibility and power at the same time	07:33
RAOF	I think that if dup() is sufficiently costless then dup() + automatic internal management is a better approach.	07:33
alf_	RAOF: I am not arguing against dup. I am saying that the client doesn't need to know the details of how to release the platform package. The FD they get may be dup()ed, maybe not, and the same things goes for other resources contained in the platform/buffer packages.	07:35
alf_	RAOF: tvoss: Perhaps an other approach would be, to not dup the FDs, just say that if you need it beyond connection lifetime just make a dup yourself.	07:37
alf_	RAOF: tvoss: This doesn't consume more fd space unless needed by the client.	07:38
tvoss	alf_, I like that idea. RAOF: thoughts?	07:38
tvoss	alf_, it requires the client to express intent quite clearly	07:38
tvoss	and our policy is strict and straightforward	07:38
RAOF	I'm a fan of the API being able to respond consistently, and I think dup() makes that easier.	07:39
alf_	RAOF: what's inconsistend about saying we are giving access to resources but we own them?	07:39
RAOF	Otherwise, for example if a client accesses the platform buffer and close()s the drm fd, subsequent calls to get_platform_package() will appear to succeed but return a garbage platform.	07:39
alf_	RAOF: so it's mostly about not allowing the client to mess up?	07:41
RAOF	Yeah.,	07:41
RAOF	Where it's low-cost, I think it makes sense to have a forgiving API.	07:42
RAOF	(With an implicit assumption that dup() is low-cost)	07:42
mlankhorst	it usually is, it just adds a duplicate to the process' fd list	07:45
RAOF	Yeah, it didn't seem like something that would be high-cost.	07:53
RAOF	And even if we double-up on fds we're only using about 10 or so, which is a negligible fraction of the 1024 limit(ish).	07:53
thomi	ok guys, it's getting late here. Can someone post a msg to the ML with whatever the outcome of this is please? I'll fix up the stress tests in the morning	07:57
hikiko	hmm hi, alan_g is not working today?	08:22
hikiko	alf_, ping? :)	08:22
alf_	hikiko: it's a public holiday for US and UK today	08:23
hikiko	:)	08:23
hikiko	I have a question	08:23
hikiko	why do we use protected destructors in display.h?	08:23
hikiko	is it a design pattern?	08:24
hikiko	I guess we only want to destroy the object at release time	08:26
hikiko	?	08:27
hikiko	sorry :p i don't guess anything :)	08:28
alf_	hikiko: Protected destructors in display don't allow destroying a derived *Display object when using a pointer to a base class. This basically enforces the use of a shared_ptr for ownership when upcasting, and that's the reason it is done this way. It's not necessary to do that, though. You can just make the virtual destructor public.	08:31
hikiko	:)	08:36
hikiko	thanks alf_ !	08:36
hikiko	alf_, could you please review this: lp:~hikiko/mir/mir.fix-missing-virtual-destructors if you have some time later? +question: shall I push my future branches as lp:~mir-team/...?	10:31
alf_	hikiko: whichever way you prefer is fine	10:34
hikiko	:)	10:35
=== tvoss is now known as tvoss\|lunch
=== mzanetti is now known as mzanetti\|lunch
=== tvoss\|lunch is now known as tvoss
=== mzanetti\|lunch is now known as mzanetti
=== mmrazik is now known as mmrazik\|afk
=== francisco is now known as Guest85729
=== mmrazik\|afk is now known as mmrazik
=== mmrazik is now known as mmrazik\|afk
=== greyback is now known as greyback\|food
=== tvoss is now known as tvoss\|test
=== tvoss\|test is now known as tvoss
=== greyback\|food is now known as greyback
=== francisco is now known as Guest35949
thomi	morning	20:14
RAOF	Yo!	23:25

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!