[12:41] <darius12_> hi, upstart's testsuite hangs in nih_child_poll() (with signal from traced child)
[12:42] <darius12_> I left it for ~15 minutes before I killed it
[12:42] <ion_> keybuk: Btw, did you notice this?
[12:42] <ion_> 2008-01-25 17:02:41 < ion_> keybuk: An idea occurred to me: how about disallowing nih_ref when the object has a parent (so that it always has a reference count of 1, and nih_unref would be used just like nih_free is now, not called directly on the object)? That way reference counting could *optionally* be used with parentless objects.
[12:42] <ion_> 2008-01-25 17:03:54 < ion_> Children would be released whenever their parent is released, and the refcounting may be used with the parent when appropriate just by using nih_ref.
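A toy model of ion_'s proposed rule, as a sketch only. Every name here (obj, obj_new, obj_ref, obj_unref, live_objects) is hypothetical; this is not libnih's nih_alloc API or implementation. It just shows the two behaviours described: an extra reference is refused on an object that has a parent, and dropping the last reference on any object frees it together with all of its descendants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy object; NOT libnih's real allocator. */
typedef struct obj {
    struct obj *parent;
    struct obj *first_child;
    struct obj *next_sibling;
    int refs;
} obj;

static int live_objects = 0;   /* number of objects currently allocated */

obj *obj_new(obj *parent) {
    obj *o = calloc(1, sizeof *o);
    if (!o)
        return NULL;
    o->parent = parent;
    o->refs = 1;
    if (parent) {              /* link into the parent's list of children */
        o->next_sibling = parent->first_child;
        parent->first_child = o;
    }
    live_objects++;
    return o;
}

/* Proposed rule: extra references only on parentless objects. */
bool obj_ref(obj *o) {
    if (o->parent)
        return false;          /* a child's count stays at exactly 1 */
    o->refs++;
    return true;
}

/* Free an object and, recursively, everything parented to it. */
static void obj_free_tree(obj *o) {
    obj *c = o->first_child;
    while (c) {
        obj *next = c->next_sibling;
        obj_free_tree(c);
        c = next;
    }
    free(o);
    live_objects--;
}

static void unlink_child(obj *o) {
    obj **pp = &o->parent->first_child;
    while (*pp != o)
        pp = &(*pp)->next_sibling;
    *pp = o->next_sibling;
}

/* Drop one reference; at zero the object and its children go away.
 * For a child this behaves exactly like a plain free. */
void obj_unref(obj *o) {
    if (--o->refs > 0)
        return;
    if (o->parent)
        unlink_child(o);
    obj_free_tree(o);
}
```

With this rule, code that only ever uses parented objects sees the old free-like semantics, and shared ownership is opt-in at the root of a tree.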
[12:46] <Keybuk> darius12_: kernel version?
[12:46] <Keybuk> ion_: that wouldn't seem unreasonable
[12:46] <Keybuk> ion_: would that let you do what you want?
[12:46] <darius12_> 2.6.18 from Debian etch
[12:46] <ion_> keybuk: Yes
[12:46] <darius12_> I suspected the kernel as well
[12:47] <Keybuk> darius12_: you need a newer kernel (assuming you're talking about trunk's test suite)
[12:47] <darius12_> so I've been building 2.6.24 to try with this too
[12:47] <darius12_> ah, thanks :-)
[12:47] <Keybuk> the test suite manages to find a couple of kernel bugs ;)
[12:47] <Keybuk> in fact, that particular one it's hanging on, got a CVE assigned to it
[12:48] <darius12_> yes, I'm talking about trunk's latest test suite; I noticed you mentioning the usefulness of upstart as a tool for uncovering kernel bugs :-)
[12:50] <Keybuk> the test suite uses waitid(WNOWAIT) so it can ensure that there are no race conditions
[12:50] <Keybuk> i.e. if it kills a child, it waits without reaping for the child to die, so then when it calls nih_child_poll() it knows *that* function's call to wait() will work
[12:51] <Keybuk> it turns out that waitid(WNOWAIT) isn't used much, so it had some issues
[12:51] <Keybuk> especially when combined with ptrace
[12:51] <darius12_> I see
[12:52] <darius12_> I'm a bit scared about upstart using ptrace
[12:53] <darius12_> Especially if there is a bit of flux in this area when/if utrace gets merged
[12:53] <darius12_> I'm OK with strace breaking for a while but pid #1 is a different story ...
[12:56] <darius12_> Then again, if upstart manages to get into Fedora it will get testing combined with utrace, so bugs will probably get fixed before utrace becomes mainstream
[12:59] <Keybuk> I haven't heard of utrace before
[12:59] <darius12_> It is Roland McGrath's heroic effort to reimplement ptrace in the kernel
[13:00] <darius12_> (the kernel-userspace interface of course stays the same)
[13:01] <darius12_> It is quite nice, e.g., with it you can implement policies such as "on segfault, just freeze the thread and wait for me to attach a debugger instead of dumping core"
[13:02] <darius12_> very easily, in a few lines of code
[13:02] <ion_> Neat
[13:03] <darius12_> and it gets even neater when you want to implement dtrace-like monitoring etc
[13:03] <darius12_> It is already in the Fedora kernel and it is slowly getting merged upstream
[13:16] <Keybuk> if the interface is the same, there shouldn't be any problems there?
[13:20] <darius12_> just an increased probability of new bugs :-)
[13:20] <darius12_> ptrace is pretty hard to get right
[13:22] <Keybuk> reimplementations always do
[13:22] <Keybuk> happily we have a test suite ;)
[13:24] <darius12_> The test suite will give you reassurance that the bug is not yours.
[13:25] <Keybuk> it also helps us fix the kernel bug too
[13:25] <Keybuk> since it's a small (10-20 lines of code) test case for the bug
[13:25] <darius12_> yep, this is also true :-)
[13:27] <darius12_> maybe upstart's testsuite should be added to test.kernel.org
[13:33] <Keybuk> :-)
[13:45] <darius12_> Is there a way to declare in upstart that bash wants to be killed with SIGHUP?
[13:48] <Keybuk> it does?
[13:48] <Keybuk> (not yet)
[13:48] <darius12_> yes, otherwise it doesn't save its history
[13:48] <darius12_> which is very annoying
[13:50] <Keybuk> you spawn a login shell with Upstart? :)
[13:51] <Keybuk> don't you usually spawn getty or login for that?
[13:51] <darius12_> yes, but how about KDE Konsole sessions?
[13:52] <darius12_> bbiaw
[13:54] <Keybuk> those get killed by other means
[13:54] <Keybuk> Upstart doesn't send them a signal directly since it doesn't know about them
[13:55] <Keybuk> they might be in a process group of something that Upstart does know about
[13:55] <Keybuk> or they might get the signal from something else that is being killed itself
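A sketch of the process-group case Keybuk mentions, using only standard POSIX calls: a supervisor can reach a process it never forked (here a grandchild) by calling kill() with the negated process-group ID. The function name group_kill_reaches_grandchild is hypothetical, SIGKILL is used only to keep the demonstration deterministic, and the pipe is just a way to observe the grandchild's death, since read() returns EOF only once every holder of the write end has exited. This is not how Upstart itself is implemented.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if signalling the child's process group also killed a
 * grandchild whose PID the caller never learned, else 0. */
int group_kill_reaches_grandchild(void) {
    int fds[2];
    if (pipe(fds) < 0)
        return 0;

    pid_t child = fork();
    if (child < 0)
        return 0;
    if (child == 0) {
        setpgid(0, 0);                 /* child leads a new process group */
        close(fds[0]);
        pid_t gc = fork();             /* grandchild inherits group and pipe */
        if (gc < 0)
            _exit(1);
        if (gc == 0 && write(fds[1], "g", 1) != 1)
            _exit(1);                  /* grandchild reports that it is up */
        for (;;)
            pause();                   /* both keep the write end open */
    }
    setpgid(child, child);             /* set from the parent too, avoids a race */
    close(fds[1]);

    char c;
    if (read(fds[0], &c, 1) != 1)      /* wait until the grandchild exists */
        return 0;

    if (kill(-child, SIGKILL) < 0)     /* negative PID: whole process group */
        return 0;
    int status;
    if (waitpid(child, &status, 0) != child)
        return 0;

    /* EOF means every write end is closed: the grandchild is gone too. */
    ssize_t n = read(fds[0], &c, 1);
    close(fds[0]);
    return n == 0 && WIFSIGNALED(status);
}
```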
[15:01] <darius12_> back 
[15:02] <darius12_> you are right. login sends SIGHUP to its child login shell when it gets a SIGTERM
[15:02] <darius12_> This didn't happen when I was testing it ages ago (it seems it has been fixed since 2003)
[15:03] <darius12_> So there is no problem
[15:04] <darius12_> for KDE, I guess that when it doesn't work, Konsole is to blame
[17:02] <darius12_> Keybuk: with kernel 2.6.24 I get 3 testsuite failures:
[17:03] <darius12_> nih_dir_walk(), nih_main_write_pidfile() and nih_watch_new()
[17:04] <Keybuk> which failures?
[17:05] <darius12_> http://pastebin.ca/895177
[17:17] <Keybuk> err, ok
[17:17] <Keybuk> how kooky
[17:20] <Keybuk> things that should fail are not failing
[17:39] <darius12_> which kernel version are you testing upstart with?
[17:40] <Keybuk> ubuntu hardy stock
[17:41] <darius12_> it looks like it is 2.6.24-rc8 based
[17:41] <darius12_> strange