[15:26] "start on started rc2" doesn't really do what I want.
[15:49] I think what I want to do would require the rc2 job to emit an event like "rc2-finished". This is different from "stopped rc2" because that happens regardless of whether or not rc2 was killed due to switching runlevels or rc2 finished normally (and we're in runlevel 2).
[15:49] Am I thinking about this wrong?
[15:50] Can I do conditional "start on " directives?
[15:50] Like "start on runlevel 2 and stopped rc2"?
[15:50] That doesn't seem like it would work because it's not an event that happens at a specific time...
[15:53] fbond: I believe the stopped event has an environment variable (I think it's RESULT) that says whether it was a success or a failure
[15:53] fbond: so start on stopped rc2 RESULT=ok
[15:54] sadmac2: My only concern is a race condition.
[15:55] fbond: what race condition?
[15:55] If we go into runlevel 2 and then switch to runlevel 3 before rc2 has totally finished, is it not at all possible that some odd timing (rc2 finishes just after we enter runlevel 3) will lead to me receiving the events out of order?
[15:56] Let me be more explicit:
[15:56] I want a job with:
[15:56] start on stopped rc2 RESULT=ok
[15:56] stop on runlevel [!2]
[15:56] This will make sure the job doesn't get started until after the rc2 init scripts have finished.
[15:57] those two lines should do it
[15:57] But what if rc2 init scripts finish normally just after runlevel 3 is entered but before upstart kills rc2?
[15:57] Is that not possible?
[15:58] If that happened, my job would be running in runlevel 3.
[15:58] (Sorry if I'm making this overly complicated. I'm just wondering what does upstart do to prevent this?)
[15:59] I don't know that that can happen
[16:00] I think it can...
[16:00] but I'm not certain now...
[16:00] Because runlevel 3 is an event, so it happens before upstart kills rc2.
[16:00] That leaves a brief window for things to go wrong (unless I'm missing something).
[16:02] fbond: upstart is single-threaded internally, so I think that the normal exit of rc2 and the runlevel event would be serialized
[16:02] fbond: if the runlevel event happened first, upstart would register rc2 as "killed" even though it had terminated successfully
[16:02] I think
[16:02] sadmac2: The runlevel event *must* happen first, right?
[16:02] Because rc2 has "stop on runlevel [!2]"
[16:03] sadmac2: Okay, is upstart smart enough to say "rc2 failed because I would have killed it"?
[16:03] Time to look at the source I guess...
[16:03] fbond: it's more dumb enough I think :)
[16:04] sadmac2: ;)
[16:05] Any idea where in the source I ought to look?
[16:05] This is hard to grep for :)
[16:06] fbond: init/job.c will be helpful
[16:06] sadmac2: thanks
[16:08] sadmac2: Doesn't look good.
[16:09] job_child_reaper appears to set the job status as failed based on process exit status.
[16:09] That doesn't take into account what upstart may have wanted to do to the process. ;)
[16:10] fbond: but that's fine
[16:10] It is?
[16:10] fbond: because the runlevel event hasn't been processed at all yet
[16:10] fbond: so runlevel 3 hasn't been received by anything
[16:10] I don't see how.
[16:10] fbond: upstart can only be doing one thing at a time
[16:11] You are saying the event has been emitted but not finished processing?
[16:11] fbond: not even started processing.
[16:11] That can't be true.
[16:11] Because stopping rc2 happens in response to the runlevel 3 event.
[16:11] Right?
[16:11] fbond: that's a different case though :)
[16:11] I'm not sure I follow you. I've been talking about the same case the whole time. ;)
[16:13] sadmac2: Do we agree that there is a race in that particular case?
[16:13] fbond: for your race condition, either the child is reaped before the runlevel event is processed (in which case it appears rc2 finished before runlevel was ever emitted) OR the runlevel event is processed before the child is reaped (in which case upstart tries to kill it anyway and just registers a failure)
[16:14] sadmac2: There is a third case.
[16:14] The process dies just before upstart tries to kill it.
[16:14] The failure isn't registered because the process exited normally.
[16:15] fbond: upstart should call it a failure if it explicitly killed something
[16:15] fbond: and I think it won't even wait for the reap result.
[16:16] sadmac2: That's not what the code I'm looking at says (unless I'm missing something).
[16:16] fbond: I have to be somewhere. I'll be back a bit later
[16:16] If you reviewed the code and told me I was wrong I would be exceedingly pleased.
[16:16] sadmac2: Okay, no worries.
[16:24] fbond: ah, looks like I have more time than I thought
[16:25] sadmac2: Good for you. ;)
[16:26] sadmac2: BTW, I'm looking at upstart source from Hardy (0.3.9-2).
[16:26] fbond: ohh
[16:26] fbond: I was looking at 0.5.0
[16:26] hold on...
[16:44] fbond: ok, the RESULT variable is new in 0.5.0
[16:44] so that won't work :)
[16:45] sadmac2: Ah. So I should just write an init script, I guess.
[16:45] fbond: that's an option
[16:45] probably the best option
[16:46] sadmac2: Okay, KISS, I guess.
[16:46] heh
[16:46] sadmac2: Too bad, though.
[16:46] I just want upstart to take over already.
[16:46] Can't use it to replace an init script until all of the init scripts are gone.
[16:46] That's a bummer.
[16:48] sadmac2: I think that still leaves an obvious use case for upstart that is not well supported ... ?
[16:48] sadmac2: But I won't be needing it for my current project.
[16:48] sadmac2: Thanks for your help!
[16:49] fbond: np
[23:19] fbond: isn't that a contradiction?
[23:21] fbond: I was reading the backlog. I see you found something
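The three orderings debated at 16:13-16:15 can be sketched as a toy Python model. This is a hypothetical simulation, not upstart's code. It assumes a single-threaded loop that handles queued items strictly one at a time (as sadmac2 describes), a hypothetical job named myjob carrying the two stanzas fbond proposes, and a hypothetical trust_kill flag standing in for sadmac2's suggestion that a job upstart explicitly signalled should count as failed. With trust_kill=False the result depends only on the exit status, which approximates how fbond reads job_child_reaper in 0.3.9 (where, per the log, RESULT does not exist yet; the model borrows the 0.5.0 stanza anyway).

```python
from dataclasses import dataclass, field


@dataclass
class ToyInit:
    """Hypothetical single-threaded init: one queued item is handled at a time."""
    trust_kill: bool = False      # hypothetical fix: a job init signalled counts as failed
    runlevel: str = "2"
    rc2_running: bool = True
    rc2_signalled: bool = False
    myjob_running: bool = False   # "myjob" is a hypothetical job name
    log: list = field(default_factory=list)

    def runlevel_event(self, level: str) -> None:
        self.runlevel = level
        self.log.append(f"runlevel {level}")
        if level != "2":
            # rc2 has "stop on runlevel [!2]": signal it if still tracked as running.
            if self.rc2_running:
                self.rc2_signalled = True
                self.log.append("signal rc2")
            # myjob has "stop on runlevel [!2]" as well.
            if self.myjob_running:
                self.myjob_running = False
                self.log.append("stop myjob")

    def reap_rc2(self, exited_cleanly: bool) -> None:
        self.rc2_running = False
        # With trust_kill=False, only the exit status decides the result.
        failed = not exited_cleanly or (self.trust_kill and self.rc2_signalled)
        result = "failed" if failed else "ok"
        self.log.append(f"stopped rc2 RESULT={result}")
        # myjob has "start on stopped rc2 RESULT=ok".
        if result == "ok":
            self.myjob_running = True
            self.log.append("start myjob")


def run(queue, trust_kill=False):
    """Process queued items strictly in order, never two at once."""
    init = ToyInit(trust_kill=trust_kill)
    for kind, arg in queue:
        if kind == "runlevel":
            init.runlevel_event(arg)
        else:
            init.reap_rc2(arg)
    return init


# Case 1: rc2 is reaped before the runlevel event is processed.
case1 = run([("reap", True), ("runlevel", "3")])
# Case 2: the runlevel event is processed first and the signal kills rc2.
case2 = run([("runlevel", "3"), ("reap", False)])
# Case 3: runlevel processed first, but rc2 had already exited cleanly
# before the signal arrived, so the reaper sees a normal exit.
case3 = run([("runlevel", "3"), ("reap", True)])
# Same ordering as case 3, with the hypothetical trust_kill fix enabled.
case3_fixed = run([("runlevel", "3"), ("reap", True)], trust_kill=True)

for name, init in [("case1", case1), ("case2", case2),
                   ("case3", case3), ("case3_fixed", case3_fixed)]:
    print(name, "runlevel", init.runlevel, "myjob running:", init.myjob_running)
```

In this model only case 3 ends with myjob running in runlevel 3, which is the window fbond describes; enabling trust_kill closes it. The model says nothing about which behaviour any real upstart release actually has.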