[13:19] <rjbell4> Keybuk: I was wondering if you could elaborate, as ion mentioned yesterday.
[13:19] <Keybuk> rjbell4: what's up?
[13:19] <rjbell4> I asked the following yesterday: Is there any support in upstart for monitoring multiple child processes?  I've found "expect fork" and "expect daemon", which seem to monitor a single child or grandchild, but what if there are several child processes, and if any of them fail then I want to take action?
[13:21] <Keybuk> that's planned
[13:21] <Keybuk> though obviously within certain limits
[13:21] <Keybuk> since you'd need to know beforehand how many children to expect
[13:21] <Keybuk> how long it typically takes for that number of children to appear
[13:21] <Keybuk> and wouldn't want to die if they double-fork()d along the way
[13:21] <Keybuk> etc.
[13:22] <ion> rjbell4: Please describe your exact use case.
[13:22] <rjbell4> Keybuk: Okay, but not yet implemented, I gather?
[13:23] <ion> When I said 0.10 yesterday, I meant the-next-major-version-still-in-planning.
[13:23] <Keybuk> correct
[13:23] <Keybuk> rjbell4: but without more details, I'm not sure that what's planned will actually do what you want
[13:24] <rjbell4> ion: We have replaced init with upstart on our product, and one of the services we run forks a few different processes to do related-but-separate tasks.  If any of those processes fails (segfaults, whatever), the whole thing should be restarted.
[13:24] <Keybuk> how would you tell Upstart, in advance, what those related-but-separate tasks were?
[13:25] <Keybuk> what can Upstart use to distinguish them?
[13:25] <Keybuk> does the parent process that spawned them remain, or does that terminate?
[13:25] <Keybuk> do any of the processes daemonise, or fork?
[13:26] <rjbell4> I was actually intrigued by the "read-a-pid-from-a-file" approach, as I thought if the service had support, that might be a cheap way to support telling Upstart which processes to monitor.  But that support appears to have been removed.
[13:26] <Keybuk> are the spawned processes the same executable, or are they exec() of other executables?
[13:26] <Keybuk> rjbell4: right, it's not very reliable
[13:26] <rjbell4> It might suffice to tell upstart to monitor the 3 children of the process that it kicks off, rather than just looking for 1 with "expect fork".  I'm not positive it would work, but I think it might.
[13:27] <Keybuk> can you answer the other questions above?
[13:28] <rjbell4> Keybuk: In this case, I think the parent terminates, and the children keep running.  I'd have to check to be certain, but I believe that's correct.  It's possible that they *should* daemonize (by forking again), but don't.
[13:29] <ion> If one of the children dies, what should happen to the other children?
[13:29] <Keybuk> the problem there is that you're spawning more than three new processes then
[13:30] <Keybuk> so how does Upstart know it's terminating because it's daemonising
[13:30] <Keybuk> or terminating because of an error?
[13:30] <rjbell4> ion: I suspect that should be handled however a "stop" would normally be handled
[13:32] <rjbell4> Keybuk: How is that different from 'expect daemon'?  For example, couldn't there be an 'expect 3 daemons'-like functionality?  I might be missing the problem.
[13:33] <rjbell4> Oh, BTW, good job with upstart.  I'm actually considering what some colleagues have done with Upstart and wondering if it couldn't be done better / easier.
[13:33] <Keybuk> because that just follows one line of processes
[13:33] <Keybuk> you need the mechanism to follow multiple children
[13:33] <Keybuk> and know when it's reached the stable end
[13:33] <Keybuk> I'm thinking, from the information you've given, that it's not a hard problem
[13:33] <Keybuk> I assume that if these children were to exit(0) that'd be ok?
[13:36] <rjbell4> Keybuk: I'd actually expect that to happen only as a result of a stop event.  If they terminated any earlier it should be with an error.  Terminating earlier with a 0 exit status "should never happen", so I wouldn't presume to describe what the result should be.
[13:37] <rjbell4> Keybuk: re: hard problem.  I suspect it's just extra work to track things.  For the "expect 3 daemons" case, you trace the primary process until it forks, then continue to trace the child process as it forks 3 times, each time adding the child process to the list of processes to monitor for that service.
[13:39] <rjbell4> I suppose an argument could be made that this crosses some line and something else should be monitoring more complex models like this, but Upstart just seems to be in the right place to do this.
[13:39] <Keybuk> no, you're not following
[13:40] <Keybuk> the problem is how do you keep count of the forks
[13:40] <Keybuk> a - b - c
[13:40] <Keybuk>      +- d
[13:40] <Keybuk>      +- e
[13:40] <Keybuk>      +- f
[13:40] <Keybuk> that's 5 forks, not 3 ;)
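To make Keybuk's count concrete, here is a minimal Python sketch that records each process's parent for the tree above and counts fork() calls. `PARENTS`, `count_forks` and `forks_by` are illustrative names, not anything in Upstart. Neither the total (5) nor b's own forks (4) is the 3 that the hypothetical "expect 3 daemons" from the discussion would be waiting for, which is why a counter would need to know whose forks to count.

```python
# Parent of each process in Keybuk's example tree (a is the process Upstart spawned).
PARENTS = {"b": "a", "c": "b", "d": "b", "e": "b", "f": "b"}


def count_forks(parents):
    """Every process except the root was created by exactly one fork()."""
    return len(parents)


def forks_by(parents, pid):
    """Number of fork() calls made by one particular process."""
    return sum(1 for parent in parents.values() if parent == pid)


print(count_forks(PARENTS))    # 5: one fork by a, four by b
print(forks_by(PARENTS, "b"))  # 4
```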
[13:43] <ion> How about this: keep track of *all* children. Whenever one of them exits with a zero exit status and it wasn’t the last process, just remove it from the child list and continue as usual. If one of them dies with a non-zero exit status or due to a signal, consider that an error and kill the other processes (if any). When the last process exits with a zero exit status, consider that a non-error exit.
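ion's proposal can be sketched as a small state machine. This is a hypothetical Python model, not Upstart's implementation: `JobProcesses`, `forked` and `exited` are made-up names, new children are assumed to be reported by some tracing mechanism, and "normal" means exit status 0 as in ion's description.

```python
class JobProcesses:
    """Hypothetical bookkeeping for ion's proposal: a job tracks *all* of its
    processes instead of a single main process."""

    def __init__(self, pids):
        self.running = set(pids)

    def forked(self, pid):
        # A newly reported child of any tracked process joins the job.
        self.running.add(pid)

    def exited(self, pid, status=0, signalled=False):
        """Record one exit; return ("running", []), ("error", pids_to_kill)
        or ("stopped", [])."""
        self.running.discard(pid)
        if signalled or status != 0:
            # Non-zero exit or death by signal: fail the job, kill the rest.
            victims = sorted(self.running)
            self.running.clear()
            return ("error", victims)
        if self.running:
            # Zero exit but other processes remain: just forget this one.
            return ("running", [])
        # The last process exited with status zero: a non-error exit.
        return ("stopped", [])
```

For example, with processes 101, 102 and 103, a clean exit of 102 leaves the job running, and a later crash of 101 returns `("error", [103])`, meaning 103 is to be killed.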
[13:43] <Keybuk> ion: that's exactly what I was thinking ;-)
[13:44] <rjbell4> Keybuk: Right, you're not just keeping track of the forks, but which process forks.  You ptrace a until it forks, then you continue to trace b until it forks 3 times (in this case, for c and d and e, but not f, because you only told Upstart to trace three children)
[13:44] <Keybuk> ion: that fits with the general model I was going for with netlink
[13:44] <rjbell4> Keybuk, ion: Sounds reasonable, but since the mechanism is ptrace, that unfortunately rules out running a debugging on any of those processes, right?
[13:44] <Keybuk> rather than having "the main process" we're in the "running" state with N processes
[13:44] <ion> Substitute zero exit status with the value in “normal exit”
[13:44] <rjbell4> ^debugging^debugger
[13:45] <Keybuk> rjbell4: the mechanism is not going to be ptrace for long
[13:45] <Keybuk> ptrace doesn't work
[13:45] <Keybuk> ion: right - normal exit usually only implies 0 if it's a task
[13:45] <rjbell4> Keybuk: Oh, well that probably colors my responses then.
[13:45] <Keybuk> but it makes sense that normal exit implies 0 if it's a task *or* there are other processes still running
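The refinement above (use the "normal exit" values instead of zero, and let 0 count as normal for a task or while other processes remain) could look like the following Python predicate. `is_normal_exit` and its parameters are hypothetical, and signal names in "normal exit" are deliberately left out of the sketch.

```python
def is_normal_exit(status, normal_exit, is_task, others_running):
    """Hypothetical predicate for the rule discussed above.

    normal_exit: the exit statuses listed in the job's "normal exit" stanza.
    Any listed status is always normal; status 0 is additionally treated as
    normal for a task, or while other processes of the job are still running.
    """
    if status in normal_exit:
        return True
    return status == 0 and (is_task or others_running)
```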
[13:46] <rjbell4> Keybuk: You guys know what you are doing better than I do (obviously); I'm just providing feedback from a consumer. :-)
[13:46] <Keybuk> rjbell4: I think we can definitely make this work for you
[13:46] <rjbell4> Keybuk: Out of curiosity, what's the new mechanism planned to be?
[13:47] <Keybuk> rjbell4: using a Linux feature called the "proc connector"
[13:47] <Keybuk> it's a netlink socket from which you receive messages for all fork(), exec(), setsid(), setuid(), etc. calls
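For readers who haven't used the proc connector, a minimal Python sketch of subscribing and decoding events. The constants and struct layouts are transcribed from the Linux uapi headers (<linux/netlink.h>, <linux/connector.h>, <linux/cn_proc.h>) as understood here; `subscribe` and `parse_event` are illustrative names, not Upstart's code, and listening requires CAP_NET_ADMIN (normally root).

```python
import socket
import struct

# Constants from <linux/netlink.h>, <linux/connector.h> and <linux/cn_proc.h>.
NETLINK_CONNECTOR = 11
NLMSG_DONE = 3
CN_IDX_PROC = 1
CN_VAL_PROC = 1
PROC_CN_MCAST_LISTEN = 1
PROC_EVENT_FORK = 0x00000001
PROC_EVENT_EXEC = 0x00000002
PROC_EVENT_SID = 0x00000080
PROC_EVENT_EXIT = 0x80000000

NLMSGHDR = struct.Struct("=IHHII")   # len, type, flags, seq, pid
CN_MSG = struct.Struct("=IIIIHH")    # id.idx, id.val, seq, ack, len, flags
EVENT_HDR = struct.Struct("=IIQ")    # what, cpu, timestamp_ns


def subscribe():
    """Open a proc connector socket and ask for events. Needs CAP_NET_ADMIN."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_CONNECTOR)
    sock.bind((0, CN_IDX_PROC))  # port id 0 lets the kernel pick one
    payload = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    body = CN_MSG.pack(CN_IDX_PROC, CN_VAL_PROC, 0, 0, len(payload), 0) + payload
    header = NLMSGHDR.pack(NLMSGHDR.size + len(body), NLMSG_DONE, 0, 0, 0)
    sock.send(header + body)
    return sock


def parse_event(data):
    """Decode one proc connector netlink message into a short description."""
    offset = NLMSGHDR.size + CN_MSG.size   # proc_event follows both headers
    what, _cpu, _timestamp = EVENT_HDR.unpack_from(data, offset)
    fields = offset + EVENT_HDR.size       # start of the event_data union
    if what == PROC_EVENT_FORK:
        parent, _ptgid, child, _ctgid = struct.unpack_from("=iiii", data, fields)
        return "fork %d -> %d" % (parent, child)
    if what == PROC_EVENT_EXEC:
        (pid,) = struct.unpack_from("=i", data, fields)
        return "exec %d" % pid
    if what == PROC_EVENT_SID:
        (pid,) = struct.unpack_from("=i", data, fields)
        return "setsid %d" % pid
    if what == PROC_EVENT_EXIT:
        # code is the raw exit code reported by the kernel, not decoded here.
        pid, _tgid, code, _sig = struct.unpack_from("=iiII", data, fields)
        return "exit %d code=%d" % (pid, code)
    return "event 0x%x" % what
```

Run as root, calling `sock = subscribe()` and then `print(parse_event(sock.recv(4096)))` in a loop should print one line per event. The sketch assumes one event per datagram and does not decode setuid() or the other event types.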
[13:49] <rjbell4> Keybuk: Interesting, thanks for the info.
[13:50] <Keybuk> ptrace has an annoying race condition
[13:50] <Keybuk> when you get the TRAP for fork(), this actually happens in the parent *after* the child process is spawned
[13:50] <Keybuk> the child STOPs of course
[13:51] <Keybuk> but Upstart ignores that, because it didn't know the pid
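The ordering problem can be modelled without ptrace at all. In this hypothetical Python sketch, notifications are replayed in the order a tracer might receive them: the new child's initial stop can be reported before the parent's fork TRAP that names it. In this model a tracer that drops stops from unknown pids (the behaviour described above) never resumes the child, while one that parks them until the fork event arrives does. The event tuples and `process_events` are illustrative, not a real ptrace API.

```python
def process_events(events, park_unknown_stops):
    """Replay tracer notifications in arrival order (hypothetical model).

    events: ("fork", parent, child) when the parent's fork TRAP is seen,
            ("stop", pid) when a new child's initial stop is seen.
    Returns the set of child pids the tracer ends up resuming.
    """
    known, parked, resumed = set(), set(), set()
    for event in events:
        if event[0] == "fork":
            child = event[2]
            known.add(child)
            if child in parked:
                # Its stop arrived first; now that we know it, resume it.
                parked.discard(child)
                resumed.add(child)
        elif event[0] == "stop":
            pid = event[1]
            if pid in known:
                resumed.add(pid)
            elif park_unknown_stops:
                parked.add(pid)  # remember it until the fork event names it
            # Otherwise the stop is ignored and the child stays stopped.
    return resumed
```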
[13:57] <ion> Say, Upstart receives a message from the proc connector saying “process 1234 exited with status 1”. While Upstart begins to process that message, 1234’s child 1235 has already exited with exit status 1, and a new, unrelated process happens to start with pid 1235. Upstart happily kills the new 1235 and then continues to read further messages from the proc connector (“process 1235 exited with status 1” etc). Is this remotely possible? (I haven’t looked at how proc ...
[13:57] <ion> ... connector behaves yet.)
[14:09] <ion> keybuk: Highlight
[14:53] <Keybuk> hmm
[14:53] <Keybuk> ah
[14:53] <Keybuk> no, you're forgetting one key detail
[14:53] <Keybuk> pids aren't reused until you wait() and clean them up
[14:53] <Keybuk> if the process is a direct child of Upstart, or a daemon that has been reparented to Upstart
[14:53] <Keybuk> (pulls up the notes he made about this on his iPhone)
[14:54] <Keybuk> right
[14:54] <Keybuk> receiving a "process 1234 exited" from the proc connector *before* the SIGCHLD means we store a flag
[14:54] <Keybuk> likewise receiving a SIGCHLD for process 1234 before we see the proc connector entry means we store the flag
[14:54] <Keybuk> then on the opposite one, we actually take action
[14:55] <Keybuk> in other words, a child of init is not considered dead until we've been told about it *and* seen the body
[14:55] <Keybuk> at that point, all of its children are implicitly reparented to init
[14:55] <Keybuk> so again, init would receive SIGCHLD on them as well as the proc connector event
[14:55] <Keybuk> so provided we wait for the notification and the body, we're safe
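The "told about it *and* seen the body" rule amounts to a per-pid flag that is set by whichever notification arrives first and acted on by the second. A minimal Python sketch under that reading; `ReapTracker` and its methods are hypothetical names, and real code would also call wait() on the SIGCHLD side. Keybuk's argument is that, since a pid is not reused until it has been reaped, pairing the two notifications this way is safe.

```python
class ReapTracker:
    """Hypothetical model: a child of init only counts as dead once both the
    proc connector exit event and the SIGCHLD/wait() have been seen."""

    def __init__(self):
        self.half_seen = {}  # pid -> which notification arrived first

    def _note(self, pid, source):
        first = self.half_seen.get(pid)
        if first is None:
            self.half_seen[pid] = source  # store a flag, take no action yet
            return False
        if first == source:
            return False  # a duplicate of the same notification
        del self.half_seen[pid]
        return True  # both seen: now it is safe to act on this pid

    def proc_connector_exit(self, pid):
        return self._note(pid, "connector")

    def sigchld_reaped(self, pid):
        return self._note(pid, "sigchld")
```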
[14:56] <Keybuk> now there's a race as you say, if the children aren't reparented
[14:56] <Keybuk> if the tree is
[14:56] <Keybuk> a
[14:56] <Keybuk>  `-b
[14:56] <Keybuk>  `-c
[14:56] <Keybuk> a is our child, we get SIGCHLD
[14:56] <Keybuk> but b isn't, it's a's child
[15:08] <Keybuk> in that case, I'm not sure that it's up to upstart to supervise b
[15:08] <Keybuk> that's b's job ;)
[15:09] <Keybuk> err, a's job
[15:09] <Keybuk> but if a dies, proc connector means we know about b and c
[15:09] <Keybuk> so we know they're reparented to us
[15:09] <Keybuk> so Upstart *will* supervise them both
[15:10] <sadmac2> Keybuk: is it possible to get information about b from proc connector before a dies?
[15:10] <Keybuk> sadmac2: yes, proc connector tells us everything
[15:10] <Keybuk> we will know that a forked b
[15:11] <Keybuk> the exception is if either b or c call setsid(), to put themselves out of a's session
[15:12] <Keybuk> the only reason a process would call setsid() is if it *needs* to be the leader of a session
[15:12] <Keybuk> because it wishes to control a tty
[15:12] <Keybuk> e.g. ssh
[15:12] <Keybuk> in which case, we deliberately abandon it
[15:13] <Keybuk> because when a (sshd) dies, we don't want to just supervise b and c (ssh-login scott) in their place
[15:13] <Keybuk> we want to consider a dying a bad thing
[15:13] <Keybuk> and we don't want to kill b or c either
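Keybuk's description of what happens when a job's process dies can be sketched as a pure function over what the proc connector has reported. In this hypothetical Python helper, `parents` maps pid to parent pid as learned from fork events and `setsid_pids` holds processes seen calling setsid(). Children that stayed in the session are kept under supervision once the kernel reparents them to init, while those that left it are abandoned, neither supervised nor killed. Deciding whether the death itself counts as a failure is left out.

```python
def on_main_process_death(dead, parents, setsid_pids):
    """Hypothetical helper: decide what happens to a dead process's children.

    parents: pid -> parent pid, built from proc connector fork events.
    setsid_pids: pids seen calling setsid() (e.g. an sshd login session).
    Returns (supervise, abandon) as sorted lists of pids.
    """
    children = sorted(pid for pid, parent in parents.items() if parent == dead)
    supervise = [pid for pid in children if pid not in setsid_pids]
    abandon = [pid for pid in children if pid in setsid_pids]
    return supervise, abandon
```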
[15:15] <sadmac2> Keybuk: yes, I was happy when you figured out those bits
[15:16] <sadmac2> Keybuk: here's a question: how do things like gnome-session work in upstart-of-tomorrow? We don't really want another per-user session manager duplicating most of our effort, do we?
[15:17] <Keybuk> it's quite easy to just have upstart be the session manager
[15:17] <Keybuk> the question turns out to be whether we want pid #1 to be that session manager,
[15:17] <Keybuk> whether we want a middle-man session manager running as the user,
[15:17] <sadmac2> Keybuk: my thought was gnome-session isn't a process anymore. It's just a taskless state that all the session services depend on
[15:17] <Keybuk>   (but everything still gets reparented to #1 anyway)
[15:17] <Keybuk> or whether we actually want a mini-init for user sessions
[15:17] <Keybuk> such that any user session daemon actually gets reparented to the user session manager
[15:17] <Keybuk> and make the process trees look pretty
[15:18] <sadmac2> Keybuk: can we actually manipulate the reparenting behavior right now?
[15:19] <Keybuk> not without a patch Lennart sent to lkml
[15:19] <Keybuk> http://lkml.org/lkml/2009/5/28/430
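Mainline Linux later (3.4) gained a prctl() option, PR_SET_CHILD_SUBREAPER, that lets a process such as a per-user session manager receive orphaned descendants instead of pid 1. Whether it matches the patch linked above is not claimed here. A minimal Python sketch using ctypes, assuming Linux 3.4 or later; `become_subreaper` and `parent_after_double_fork` are illustrative names.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>; requires Linux 3.4+


def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def parent_after_double_fork():
    """Daemonise a grandchild by double-forking; return its new parent's pid."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        try:
            middle_pid = os.getpid()
            if os.fork() == 0:
                # Grandchild: wait until the middle process has exited.
                while os.getppid() == middle_pid:
                    time.sleep(0.01)
                os.write(write_fd, ("%d %d" % (os.getppid(), os.getpid())).encode())
        finally:
            os._exit(0)  # both the middle process and the grandchild end here
    os.close(write_fd)
    os.waitpid(middle, 0)
    new_parent, grandchild = map(int, os.read(read_fd, 64).split())
    os.close(read_fd)
    if new_parent == os.getpid():
        os.waitpid(grandchild, 0)  # it is our child now, so we reap it
    return new_parent
```

Calling `become_subreaper()` and then `parent_after_double_fork()` should return the caller's own pid; without the prctl() call the orphaned grandchild would normally be reparented to pid 1 instead. The subreaper attribute stays set for the rest of the process's life.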
[15:20] <ion> keybuk: Alright
[15:20]  * sadmac2 now cringes every time exit.c is patched
[15:21] <Keybuk> lol, why?
[15:21] <sadmac2> Keybuk: it's like playing Jenga
[15:21] <Keybuk> it's just the kernel
[15:22] <sadmac2> I really do have to find time to just rewrite that whole file
[15:22] <Keybuk> one of the simpler parts really
[15:22] <ion> http://www.youtube.com/watch?v=F9BmTmMEOhQ
[15:23] <sadmac2> Keybuk: it's particularly poorly written IMHO. if (some return value) { // we have a spinlock } else { //we gave up the spinlock a while ago } is never a good thing to see
[15:24] <sadmac2> ugly code is worse than broken code. Bugs are easier to fix than hideous code.
[15:30] <sadmac2> Keybuk: so in order to take 0.6 in Fedora, we need to not regress on the state transfer thing.
[15:31] <Keybuk> does the patch apply?
[15:31] <sadmac2> Keybuk: and since 0.6 is hopefully forward compatible, that means we need a solution that you've had some architectural say in.
[15:33] <sadmac2> Keybuk: the patch might nearly apply (I'd imagine it needs some heavy reworking) but if you're going to do it differently when you do it, it's probably best that we do it your way.
[15:35] <ion> keybuk: Btw, now that jobs are in /etc/init without any 0.6 namespacing, how do you plan to have 0.10 handle 0.6 jobs cleanly? :-) I’m still advocating a separate parser for 0.6 jobs that outputs 0.10 job objects.
[15:36] <sadmac2> Keybuk: what's the last thing you got before you dropped?
[15:36] <Keybuk> ion: there's not that much difference in the syntax
[15:36] <Keybuk> sadmac2: can't remember, what was the last thing I said? :)
[15:36] <sadmac2> Keybuk: does the patch apply?
[15:36] <Keybuk> haven't tried
[15:36] <sadmac2> Keybuk: no, you were asking me
[15:36] <sadmac2> :)
[15:37] <sadmac2> 10:35 < sadmac2> Keybuk: so in order to take 0.6 in Fedora, we need to not regress on the state transfer thing.
[15:37] <sadmac2> 10:35 < Keybuk> does the patch apply?
[15:37] <sadmac2> 10:35 < sadmac2> Keybuk: and since 0.6 is hopefully forward compatible, that means we need a solution that you've had some architectural say in.
[15:37] <sadmac2> 10:38 < sadmac2> Keybuk: the patch might nearly apply (I'd imagine it needs some heavy reworking) but if you're going to do it differently when you do it, it's probably best that we do it your way.
[15:37] <sadmac2> Keybuk: ^^ that's what led up to you dropping
[15:59] <Keybuk> sadmac2: the problem is I don't have a preferred way of doing it yet
[15:59] <Keybuk> and I glanced through your patch, and I don't see how it can possibly work
[15:59] <Keybuk> you don't transfer most of the state
[15:59] <sadmac2> Keybuk: it's enough. It does depend on the configs lining up.
[16:00] <sadmac2> Keybuk: it fixed our broken TTYs anyway
[16:00] <Keybuk> it isn't enough though
[16:00] <Keybuk> what if a job is mid-starting?
[16:00] <sadmac2> Keybuk: bear in mind that I didn't write this. Just reformatted it :)
[16:01] <sadmac2> Keybuk: what are we missing for that? We have the state and the pid. The new process should get the signal and move it forward.
[16:01] <Keybuk> the event queue?
[16:01] <Keybuk> the attached events?
[16:02] <Keybuk> if there's a "start" command running, you don't transfer the D-Bus message structure from one instance to the other
[16:02] <Keybuk> (let alone the D-Bus connections)
[16:02] <sadmac2> Keybuk: yes. that is true...
[16:04] <sadmac2> Keybuk: wouldn't it be better to just stop taking input and flush all those out?
[16:04] <sadmac2> well, the way blocking is done now you still need the event queue
[16:05] <sadmac2> no that won't work yet..
[16:05] <Keybuk> the problem is you need to be in a state where all services are running or waiting
[16:05] <Keybuk> and all tasks are waiting
[16:05] <Keybuk> it may not be possible to be in that state
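The precondition Keybuk describes can be written as a simple check. This is a hypothetical Python sketch: the `Job` fields, the state strings and the `pending_events` list (standing in for the event queue mentioned above) are assumptions, and it says nothing about transferring D-Bus connections or in-flight "start" commands. If the check can never become true, waiting for a quiet moment is not a complete answer, which is the point being made.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    is_task: bool
    state: str  # e.g. "waiting", "starting", "running", "stopping"


def safe_to_reexec(jobs, pending_events):
    """Hypothetical check: no queued events, every service running or
    waiting, and every task waiting."""
    if pending_events:
        return False
    for job in jobs:
        allowed = {"waiting"} if job.is_task else {"running", "waiting"}
        if job.state not in allowed:
            return False
    return True
```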
[16:06] <sadmac2> yeah. I saw it captured the other states but didn't look at what it needed to advance out of them...
[16:06] <sadmac2> time to stop trusting patches from strangers
[16:10] <sadmac2> Keybuk: wb
[16:14] <ion> Instead of /etc/init/dbus-reconnect.conf, why not do telinit q in /etc/init.d/dbus, and then in /etc/init/dbus.conf whenever it’s moved over?
[16:15] <Keybuk> ion: this was a simpler hack