[12:41] hi, upstart's testsuite hangs in nih_child_poll() (with signal from traced child)
[12:42] I left it for ~ 15 minutes before I killed it
[12:42] keybuk: Btw, did you notice this?
[12:42] 2008-01-25 17:02:41 < ion_> keybuk: An idea occurred to me: how about disallowing nih_ref when the object has a parent (so that it always has a reference count of 1 and nih_unref would be used just like nih_free is now – not called directly on the object), thus reference counting could *optionally* be used with parentless objects?
[12:42] 2008-01-25 17:03:54 < ion_> Children would be released whenever their parent is released, and the refcounting may be used with the parent when appropriate just by using nih_ref.
[12:46] darius12_: kernel version?
[12:46] ion_: that wouldn't seem unreasonable
[12:46] ion_: would that let you do what you want?
[12:46] 2.6.18 from debian etch
[12:46] keybuk: Yes
[12:46] I suspected the kernel as well
[12:47] darius12_: you need a newer kernel (assuming you're talking about trunk's test suite)
[12:47] so I've been building 2.6.24 to try with this too
[12:47] ah, thanks :-)
[12:47] the test suite manages to find a couple of kernel bugs ;)
[12:47] in fact, that particular one it's hanging on got a CVE assigned to it
[12:48] yes, I'm talking about trunk's latest testsuite; I noticed you mentioning the usefulness of upstart as a kernel bug uncovering tool :-)
[12:50] the test suite uses waitid(W_NOWAIT) so it can ensure that there are no race conditions
[12:50] i.e. if it kills a child, it waits without reaping for the child to die, so when it then calls nih_child_poll() it knows *that* function's call to wait() will work
[12:51] it turns out that waitid(W_NOWAIT) isn't used much, so it had some issues
[12:51] especially when combined with ptrace
[12:51] I see
[12:52] I'm a bit scared about upstart using ptrace
[12:53] Especially if there is a bit of flux in this area when/if utrace gets merged
[12:53] I'm ok with strace breaking for a while but pid #1 is a different story ...
[12:56] Then again, if upstart manages to get into fedora it will get testing combined with utrace, so bugs will probably get fixed before utrace becomes mainstream
[12:59] I haven't heard of utrace before
[12:59] It is Roland McGrath's heroic effort to reimplement ptrace in the kernel
[13:00] (the kernel-userspace interface of course stays the same)
[13:01] It is quite nice, e.g., with it you can implement policies such as "on segfault, just freeze the thread and wait for me to attach a debugger instead of dumping core"
[13:02] very easily, in a few lines of code
[13:02] Neat
[13:03] and it gets even neater when you want to implement dtrace-like monitoring etc
[13:03] It is already in the fedora kernel and it is slowly getting merged upstream
[13:16] if the interface is the same, there shouldn't be any problems there?
[13:20] just increased probability for new bugs :-)
[13:20] ptrace is pretty hard to get right
[13:22] reimplementations always do
[13:22] happily we have a test suite ;)
[13:24] The test suite will give you reassurance that the bug is not yours.
[13:25] it helps us fix the kernel bug too
[13:25] since it's a small (10-20 lines of code) test case for the bug
[13:25] yep, this is also true :-)
[13:27] maybe upstart's testsuite should be added to test.kernel.org
[13:33] :-)
[13:45] Is there a way to declare in upstart that bash wants to be killed with SIGHUP?
[13:48] it does?
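A minimal C sketch of the ownership rule ion_ proposed (2008-01-25 17:02): an object that has a parent keeps a reference count of exactly 1 and refuses extra references, so unref on it acts like a plain free; only parentless objects may take extra references; and releasing a parent releases its whole subtree. The `obj_*` functions and the `freed_count` counter are hypothetical illustrations, not libnih's nih_ref/nih_unref API, and the sketch has no destructors or thread safety.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object header; illustrative only, not libnih's API. */
typedef struct obj {
	struct obj *parent;       /* NULL for a parentless (refcountable) object */
	struct obj *first_child;  /* head of a singly linked child list */
	struct obj *next_sibling;
	int         refs;         /* stays 1 for any object that has a parent */
} obj;

static int freed_count = 0;   /* counts frees so the behaviour is observable */

obj *obj_new(obj *parent)
{
	obj *o = calloc(1, sizeof *o);
	if (!o)
		return NULL;
	o->parent = parent;
	o->refs = 1;
	if (parent) {             /* link at the head of the parent's child list */
		o->next_sibling = parent->first_child;
		parent->first_child = o;
	}
	return o;
}

/* Free an object and, recursively, every child it owns. */
static void obj_free_tree(obj *o)
{
	obj *c = o->first_child;
	while (c) {
		obj *next = c->next_sibling;
		obj_free_tree(c);
		c = next;
	}
	free(o);
	freed_count++;
}

/* Extra references are refused on parented objects: the parent owns them. */
int obj_ref(obj *o)
{
	if (o->parent)
		return -1;
	o->refs++;
	return 0;
}

/* Drop a reference; at zero, detach from the parent (if any) and free the subtree. */
void obj_unref(obj *o)
{
	if (--o->refs > 0)
		return;
	if (o->parent) {
		obj **link = &o->parent->first_child;
		while (*link != o)
			link = &(*link)->next_sibling;
		*link = o->next_sibling;
	}
	obj_free_tree(o);
}
```

With this rule the parent/child tree stays the only owner of parented objects, while a parentless root can be shared by several holders and is released, with all of its children, when the last one calls `obj_unref()`.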
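A minimal sketch of the race-free pattern described at 12:50, assuming a POSIX system such as Linux with glibc (the flag is spelled `WNOWAIT` in `<sys/wait.h>`). `kill_and_wait_unreaped()` and `hypothetical_child_poll()` are illustrative names; the latter is only a stand-in for nih_child_poll(), not its real implementation. Because `waitid()` with `WNOWAIT` returns only after the child has exited but leaves it as a zombie, the following non-blocking reap is guaranteed to find it instead of racing the child's death.

```c
#define _XOPEN_SOURCE 700        /* exposes waitid(), WNOWAIT, CLD_KILLED */
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send `sig` to `pid`, then block until the child has died WITHOUT reaping
 * it: WNOWAIT leaves it a zombie, so it stays waitable for a later call.
 * Returns 0 if the child was killed by `sig`, -1 otherwise. */
static int kill_and_wait_unreaped(pid_t pid, int sig)
{
	siginfo_t info;

	if (kill(pid, sig) < 0)
		return -1;

	memset(&info, 0, sizeof info);
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
		return -1;

	return (info.si_code == CLD_KILLED && info.si_status == sig) ? 0 : -1;
}

/* Hypothetical stand-in for a poll routine like nih_child_poll(): reap one
 * child without blocking. Returns its pid, 0 if none has exited, -1 on error. */
static pid_t hypothetical_child_poll(int *status)
{
	return waitpid(-1, status, WNOHANG);
}
```

Without the `WNOWAIT` step, the `WNOHANG` poll could run before the killed child has actually exited and return 0, which is the race the test suite avoids.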
[13:48] (not yet)
[13:48] yes, otherwise it doesn't save its history
[13:48] which is very annoying
[13:50] you spawn a login shell with Upstart? :)
[13:51] don't you usually spawn getty or login for that?
[13:51] yes, but how about kde konsole sessions?
[13:52] bbiaw
[13:54] those get killed by other means
[13:54] Upstart doesn't send them a signal directly since it doesn't know about them
[13:55] they might be in a process group of something that Upstart does know about
[13:55] or they might get the signal from something else that is being killed itself
[15:01] back
[15:02] you are right. Login sends SIGHUP to its child login shell when it gets a SIGTERM
[15:02] This didn't happen when I was testing it ages ago (it seems it has been fixed since 2003)
[15:03] So there is no problem
[15:04] for kde, I guess when it doesn't work, konsole is to blame
=== kyle__ is now known as kylem
[17:02] Keybuk: with kernel 2.6.24 I get 3 testsuite failures:
[17:03] nih_dir_walk(), nih_main_write_pidfile() and nih_watch_new()
[17:04] which failures?
[17:05] http://pastebin.ca/895177
[17:17] err, ok
[17:17] how kooky
[17:20] things that should fail are not failing
[17:39] which kernel version are you testing upstart with?
[17:40] ubuntu hardy stock
[17:41] it looks like it is 2.6.24-rc8 based
[17:41] strange
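A sketch of the behaviour described at 15:02, where login passes a SIGTERM on to its login shell as SIGHUP so that bash gets the chance to save its history. It assumes the supervisor forked the shell itself; `supervise_forwarding_hup()` is a hypothetical name, not login's or Upstart's code. Blocking SIGTERM and SIGCHLD and collecting them with `sigwaitinfo()` avoids the lost-wakeup race of testing a flag set by a signal handler and then sleeping in a blocking `waitpid()`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for `child` to exit; each SIGTERM received meanwhile is forwarded
 * to it as SIGHUP (the way login treats its login shell).
 * Returns the child's wait status, or -1 on error. */
static int supervise_forwarding_hup(pid_t child)
{
	sigset_t set, old;
	int status = -1;

	/* Block both signals so they stay pending until sigwaitinfo() takes
	 * them; this closes the window between checking and sleeping. */
	sigemptyset(&set);
	sigaddset(&set, SIGTERM);
	sigaddset(&set, SIGCHLD);
	if (sigprocmask(SIG_BLOCK, &set, &old) < 0)
		return -1;

	for (;;) {
		pid_t r = waitpid(child, &status, WNOHANG);
		if (r == child)
			break;                  /* child has exited: done */
		if (r < 0) {
			status = -1;
			break;
		}
		/* Child still running: sleep until SIGTERM or SIGCHLD arrives. */
		if (sigwaitinfo(&set, NULL) == SIGTERM)
			kill(child, SIGHUP);
	}

	sigprocmask(SIG_SETMASK, &old, NULL);
	return status;
}
```

The loop re-checks the child with `WNOHANG` after every wakeup, so a SIGCHLD from an unrelated child only costs one extra iteration.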