darius12_ | hi, upstart's testsuite hangs in nih_child_poll() (with signal from traced child) | 12:41 |
---|---|---|
darius12_ | I left it for ~ 15 minutes before I killed it | 12:42 |
ion_ | keybuk: Btw, did you notice this? | 12:42 |
ion_ | 2008-01-25 17:02:41 < ion_> keybuk: An idea occurred to me: how about disallowing nih_ref when the object has a parent (so that it always has a reference count of 1 and nih_unref would be used just like nih_free is now – not called directly on the object), thus reference counting could *optionally* be used with parentless objects? | 12:42 |
ion_ | 2008-01-25 17:03:54 < ion_> Children would be released whenever their parent is released, and the refcounting may be used with the parent when appropriate just by using nih_ref. | 12:42 |
Keybuk | darius12_: kernel version? | 12:46 |
Keybuk | ion_: that wouldn't seem unreasonable | 12:46 |
Keybuk | ion_: would that let you do what you want? | 12:46 |
darius12_ | 2.6.18 from Debian etch | 12:46 |
ion_ | keybuk: Yes | 12:46 |
darius12_ | I suspected the kernel as well | 12:46 |
Keybuk | darius12_: you need a newer kernel (assuming you're talking about trunk's test suite) | 12:47 |
darius12_ | so I've been building 2.6.24 to try with this too | 12:47 |
darius12_ | ah, thanks :-) | 12:47 |
Keybuk | the test suite manages to find a couple of kernel bugs ;) | 12:47 |
Keybuk | in fact, that particular one it's hanging on, got a CVE assigned to it | 12:47 |
darius12_ | yes, I'm talking about trunk's latest testsuite, I noticed you mentioning the usefulness of upstart as a kernel bug uncovering tool :-) | 12:48 |
Keybuk | the test suite uses waitid(WNOWAIT) so it can ensure that there are no race conditions | 12:50 |
Keybuk | ie. if it kills a child, it waits without reaping for the child to die, so then when it calls nih_child_poll() it knows *that* function's call to wait() will work | 12:50 |
Keybuk | it turns out that waitid(WNOWAIT) isn't used much, so it had some issues | 12:51 |
Keybuk | especially when combined with ptrace | 12:51 |
darius12_ | I see | 12:51 |
darius12_ | I'm a bit scared about upstart using ptrace | 12:52 |
darius12_ | Especially if there is a bit of flux in this area when/if utrace gets merged | 12:53 |
darius12_ | I'm ok with strace breaking for a while but pid #1 is a different story ... | 12:53 |
darius12_ | Then again, if upstart manages to get into Fedora it will get testing combined with utrace so bugs will probably get fixed before utrace becomes mainstream | 12:56 |
Keybuk | I haven't heard of utrace before | 12:59 |
darius12_ | It is Roland McGrath's heroic effort to reimplement ptrace in the kernel | 12:59 |
darius12_ | (the kernel-userspace interface of course stays the same) | 13:00 |
darius12_ | It is quite nice, e.g., with it you can implement policies such as "on segfault, just freeze the thread and wait for me to attach a debugger instead of dumping core" | 13:01 |
darius12_ | very easily in a few lines of code | 13:02 |
ion_ | Neat | 13:02 |
darius12_ | and it gets even neater when you want to implement dtrace-like monitoring etc | 13:03 |
darius12_ | It is already in the Fedora kernel and it is slowly getting merged upstream | 13:03 |
Keybuk | if the interface is the same, there shouldn't be any problems there? | 13:16 |
darius12_ | just increased probability for new bugs :-) | 13:20 |
darius12_ | ptrace is pretty hard to get right | 13:20 |
Keybuk | reimplementations always do | 13:22 |
Keybuk | happily we have a test suite ;) | 13:22 |
darius12_ | The test suite will give you reassurance that the bug is not yours. | 13:24 |
Keybuk | it also helps us fix the kernel bug too | 13:25 |
Keybuk | since it's a small (10-20 lines of code) test case for the bug | 13:25 |
darius12_ | yep, this is also true :-) | 13:25 |
darius12_ | maybe upstart's testsuite should be added to test.kernel.org | 13:27 |
Keybuk | :-) | 13:33 |
darius12_ | Is there a way to declare in upstart that bash wants to be killed with SIGHUP? | 13:45 |
Keybuk | it does? | 13:48 |
Keybuk | (not yet) | 13:48 |
darius12_ | yes, otherwise it doesn't save its history | 13:48 |
darius12_ | which is very annoying | 13:48 |
Keybuk | you spawn a login shell with Upstart? :) | 13:50 |
Keybuk | don't you usually spawn getty or login for that? | 13:51 |
darius12_ | yes, but how about kde konsole sessions? | 13:51 |
darius12_ | bbiaw | 13:52 |
Keybuk | those get killed by other means | 13:54 |
Keybuk | Upstart doesn't send them a signal directly since it doesn't know about them | 13:54 |
Keybuk | they might be in a process group of something that Upstart does know about | 13:55 |
Keybuk | or they might get the signal from something else that is being killed itself | 13:55 |
darius12_ | back | 15:01 |
darius12_ | you are right. Login sends SIGHUP to its child login shell when it gets a SIGTERM | 15:02 |
darius12_ | This didn't happen when I was testing it ages ago (it seems it has been fixed since 2003) | 15:02 |
darius12_ | So there is no problem | 15:03 |
darius12_ | for kde I guess when it doesn't work, konsole is to blame | 15:04 |
=== kyle__ is now known as kylem | ||
darius12_ | Keybuk: with kernel 2.6.24 I get 3 testsuite failures: | 17:02 |
darius12_ | nih_dir_walk() nih_main_write_pidfile() and nih_watch_new() | 17:03 |
Keybuk | which failures? | 17:04 |
darius12_ | http://pastebin.ca/895177 | 17:05 |
Keybuk | err, ok | 17:17 |
Keybuk | how kooky | 17:17 |
Keybuk | things that should fail are not failing | 17:20 |
darius12_ | which kernel version are you testing upstart with? | 17:39 |
Keybuk | ubuntu hardy stock | 17:40 |
darius12_ | it looks like it is 2.6.24-rc8 based | 17:41 |
darius12_ | strange | 17:41 |
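
The "wait without reaping" trick Keybuk describes at 12:50 can be sketched roughly as follows. This is a minimal illustration in Python rather than the C used by the actual test suite, and it assumes a Linux system where `os.waitid` is available. The function names `kill_and_wait_without_reaping` and `demo` are hypothetical, and SIGKILL is chosen here only to keep the sketch deterministic; the log does not say which signal the tests send.

```python
import os
import signal
import time


def kill_and_wait_without_reaping(pid: int) -> None:
    """Kill a child, then block until it is dead but leave it unreaped."""
    os.kill(pid, signal.SIGKILL)
    # WEXITED: wait for termination. WNOWAIT: leave the child in a
    # waitable (zombie) state so that a later wait call can still collect it.
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)


def demo():
    pid = os.fork()
    if pid == 0:
        # Child: idle until the parent kills it.
        time.sleep(60)
        os._exit(0)

    kill_and_wait_without_reaping(pid)

    # The child is known to be dead at this point, so the "real" wait
    # (standing in for the one inside nih_child_poll()) cannot race:
    # even a non-blocking wait finds the child immediately.
    reaped_pid, status = os.waitpid(pid, os.WNOHANG)
    return pid, reaped_pid, status


if __name__ == "__main__":
    child, reaped, status = demo()
    print(child == reaped, os.WIFSIGNALED(status), os.WTERMSIG(status))
```

The point of the design is ordering: the first wait only observes the death, so the code under test still gets to do the reaping itself, and the test never has to sleep or poll to find out whether the child has exited yet.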