=== Amaranth [n=travis@ubuntu/member/amaranth] has joined #upstart
=== theCore [n=alex@modemcable229.181-131-66.mc.videotron.ca] has joined #upstart
=== theCore [i=alex@gateway/tor/x-609a03f7c1992d97] has joined #upstart
=== theCore [n=alex@modemcable229.181-131-66.mc.videotron.ca] has joined #upstart
=== theCore [n=alex@modemcable229.181-131-66.mc.videotron.ca] has joined #upstart
=== Amaranth [n=travis@ubuntu/member/amaranth] has joined #upstart
=== theCore [n=alex@modemcable229.181-131-66.mc.videotron.ca] has joined #upstart
=== theCore [n=alex@modemcable229.181-131-66.mc.videotron.ca] has joined #upstart
=== theCore [n=alex@modemcable229.181-131-66.mc.videotron.ca] has joined #upstart
=== wasabi__ [n=wasabi@cpe-76-184-95-8.tx.res.rr.com] has joined #upstart
=== theCore [n=alex@modemcable229.181-131-66.mc.videotron.ca] has joined #upstart
=== Kamping_Kaiser [n=kgoetz@easyubuntu/docteam/kgoetz] has joined #upstart
=== Amaranth [n=travis@ubuntu/member/amaranth] has joined #upstart
=== Kamping_Kaiser [n=kgoetz@easyubuntu/docteam/kgoetz] has joined #upstart
=== Keybuk [n=scott@quest.netsplit.com] has joined #upstart
=== Md [i=md@freenode/staff/md] has joined #upstart
=== Md [i=md@freenode/staff/md] has joined #upstart
=== mbiebl [n=michael@e180120209.adsl.alicedsl.de] has joined #upstart
[11:17] strange
[11:17] is it normal that my karma in launchpad goes up without me doing anything?
[11:19] yeh, LP is whacko
[11:19] heh
=== AlexExtreme disappear again
[11:28] heh
[11:34] <_ion> It would be easy to make a script that creates a graphviz graph like the one in the wiki of the current system, btw. It would just need a purely informational stanza about events a job may emit, e.g. fhs_check could contain 'emits fhs-filesystem-mounted fhs-filesystem-unmounted'.
[11:35] yeah
[11:36] <_ion> I'll probably write that script later.
[11:36] didn't notice you here Keybuk
[11:38] ... I have always been here ...
[11:38] ... oh ...
[11:38] <_ion> after the complex-event-config spec is final.
[11:41] Kamping_Kaiser: actually, that's a lie, I can't back that up
[11:52] _ion: http://upstart.ubuntu.com/wiki/Emits
[12:02] <_ion> keybuk: Nice, thanks.
[12:14] <_ion> keybuk: Hm, will all that information (the "start/stop on" and the "emits" events) about current jobs be exposed via libupstart? I could just write the bindings for libupstart and use them.
[12:15] I think so
[12:15] not sure which works best
[12:15] have a tool ask upstart for job relationships
[12:15] or parse the configuration files itself
[12:18] <_ion> Yeah, that's what i'm pondering. :-)
[12:18] I think the former, that way what you see is always consistent with what upstart knows
[12:18] and allows for future registry of jobs from other sources (e.g. autostart files)
[12:18] <_ion> Yeah.
=== Kamping_Kaiser [n=kgoetz@easyubuntu/docteam/kgoetz] has left #upstart ["More]
=== _ion [i=johan@kiviniemi.name] has joined #upstart
[03:09] morning folks
[03:09] heyhey
[03:10] where did we leave off yesterday?
[03:10] Oh yes! Yeah. You have a real job. It just seems like it's too good to be real. ;)
[03:10] it has its downsides, like any other
[03:14] I'm just going through the specs tidying them up and marking those we're pretty much agreed on as approved
[03:24] well, that was a conversation killer :p
[03:31] Heh.
[03:37] so, I was thinking on last night's conversation
[03:37] I'm convinced that the eight-stage state machine is correct, though the names could do with a little tweaking
[03:38] however I think I agree that there should not be eight corresponding events
[03:38] the main reason being that the paths through the state machine aren't circular
[03:38] e.g. you could have a start event, but a failure along the way, which could cause you to go straight to stopped
[03:38] so anything with "from start FOO to stop FOO" wouldn't actually get stopped
[03:39] and I think that inconsistency is bad
[03:39] going through all the examples, I couldn't find anything that really needed that level of complexity
[03:39] so I propose keeping the state machine *but* reducing the events
[03:39] in fact, there should be just two pairs
[03:39] "start"/"starting" = goal changed to start, process begun
[03:40] "stop"/"stopping" = goal changed to stop, process begun
[03:40] "started" = process of starting finished
[03:40] "stopped" = process of stopping finished
[03:40] these events would be emitted at the most appropriate places in the state machine
[03:40] I agree wholeheartedly.
[03:40] and would always go
[03:40] starting -> started
[03:40] or
[03:40] starting -> stopping -> stopped
[03:40] or stopping -> stopped
[03:40] or stopping -> starting
[03:40] etc.
[03:40] so they make logical sense
[03:40] Yes.
[03:41] if you get a starting event, you ALWAYS get at least stopping
[03:41] Makes sense.
[03:41] and if you get a stopping event, you ALWAYS get at least starting following
[03:41] So, if pre-stop fails, it'll emit starting again?
[03:42] right
[03:42] goal changes back
[03:42] But not actually traverse through pre-start and post-start.
[03:42] yup
[03:42] Super.
[03:42] this also gives us an interesting idea
[03:42] the started/stopped events are just "now at rest" events
[03:42] Yup.
[03:42] but the starting/stopping events are "goal being changed"
[03:43] we could delay any action on those two events until they've been hancled
[03:43] uh, handled
=== j_ack [n=rudi@p508D8B8E.dip0.t-ipconnect.de] has joined #upstart
[03:43] so:
[03:43] start ssh => sets goal to start => emits "starting ssh" => waits for event to finish => changes state to pre-start
[03:43] likewise for stopping
[03:44] stop ssh => sets goal to stop => emits "stopping ssh" => waits for event to finish => changes state to pre-stop
[03:44] so for the more usual example:
[03:44] tomcat could be started on starting apache; which means that apache won't be started until apache is actually running
[03:44] uh tomcat is actually running
[03:45] Yes. That makes sense for apache/tomcat.
[03:45] network-manager could be stopped on stopping dbus; which means that dbus won't be stopped until network-manager has been stopped
[03:45] And that also makes perfect sense.
[03:45] are there any examples of something you would run "on starting" or "on stopping" which _shouldn't_ delay the named job?
[03:45] And in both cases the definition of the dependency is described from the POV of the depending service.
[03:45] Say that again?
[03:46] Ahh I see.
[03:46] I'm unsure.
[03:47] I can't see it.
[03:47] I think the most likely case is desiring a service to run on started.
[03:48] Which leaves the emitting job at rest, done, so blocking doesn't matter.
[03:49] And I can't imagine any circumstances where that isn't good enough.
[03:49] <_ion> I agree with having only starting/stopping/started/stopped emitted.
[03:50] <_ion> And with "on starting" and "on stopping" causing the emitting job to wait.
[03:51] I think the only desire to have something start but not block "starting" is simply because it's reactionary, but independent. It doesn't actually care about the state of the first job.
[03:51] Thus making it more async.
[03:52] And I would probably say that that could be handled better. Don't make your job block.
[03:52] Let it start quick.
[03:52] <_ion> Can you think of a use case for having "on starting" without blocking?
[03:52] how would you define that, though?
[03:52] Define it? By getting through pre-start FAST.
[03:53] *confused*
[03:53] Don't just spin in pre-start.
[03:53] Well, the blocking is only waiting for the dependent job to be started
[03:53] the only use cases I can think of for "on starting" do suggest blocking
[03:53] Which is waiting from child stopped->started
[03:53] yeah
[03:54] I can't imagine that being slow enough to be unsatisfactory justify
[03:54] s/justify//
[03:59] I guess the only potential bottleneck I can think of is that if somebody wants to have their job run before apache runs, but their job is waiting on some other external resource before it starts.
[03:59] And they represent that waiting by spinning in pre-start =)
[03:59] I'd argue that's a bug anyway :)
[03:59] And that seems pretty broken anyways.
[03:59] Yeah
[04:02] running to work now
[04:03] Are we susceptible to any sort of feedback loop with this?
[04:03] Or deadlocking
[04:04] you can be deadlocked by two jobs
[04:04] foo: start on starting bar
[04:04] <_ion> A "start on starting B" and B "start on starting A"?
[04:04] bar: start on starting foo
[04:05] actually, no you wouldn't
[04:05] <_ion> That could be easily detected, couldn't it?
[04:05] because the "starting bar" event wouldn't affect foo, because its goal is already set
[04:05] bar would start immediately
[04:05] and thus foo would too
[04:06] <_ion> Ah, nice.
=== mbiebl [n=michael@e180120209.adsl.alicedsl.de] has joined #upstart
=== Amaranth [n=travis@ubuntu/member/amaranth] has joined #upstart
[04:47] hmm
[04:47] what to do about respawn?
[04:47] would you want an event for that?
[04:48] should it be when it notices the job has died, or when it has successfully restarted it?
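The blocking "starting" event and the foo/bar case from the discussion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not upstart's implementation: the `Job` and `Init` classes, the `start_on_starting` field and the `events` list are hypothetical names, and all job scripts are assumed to succeed instantly.

```python
class Job:
    """Hypothetical job record; start_on_starting lists the jobs whose
    "starting" event should start this job."""

    def __init__(self, name, start_on_starting=()):
        self.name = name
        self.start_on_starting = set(start_on_starting)
        self.goal = "stop"


class Init:
    """Toy model of the blocking "starting" event (an assumption-laden sketch)."""

    def __init__(self, jobs):
        self.jobs = {job.name: job for job in jobs}
        self.events = []

    def start(self, name):
        job = self.jobs[name]
        if job.goal == "start":
            # Goal already set: the event does not affect this job,
            # which is why the mutual foo/bar case does not deadlock.
            return
        job.goal = "start"
        self.events.append(f"starting {name}")
        # Block here until every job started by the event has finished starting.
        for other in self.jobs.values():
            if name in other.start_on_starting:
                self.start(other.name)
        self.events.append(f"started {name}")


if __name__ == "__main__":
    init = Init([Job("apache"), Job("tomcat", ["apache"])])
    init.start("apache")
    print(init.events)
```

In this model tomcat reaches "started" before apache does, and two jobs that each start on the other's "starting" event both come up instead of waiting on each other.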
[04:53] back
[04:53] <_ion> died foo 42 exit value
[04:53] <_ion> stopped foo
[04:53] <_ion> starting foo
[04:53] <_ion> started foo
[04:53] <_ion> ?
[04:53] Nice.
[04:54] I don't think an event is required for respawn specifically as much as "died"
[04:54] And I *love* the idea of including the exit code in the event args
[04:54] That's slick.
[04:54] failed?
[04:54] we have that already
[04:54] Yeah.
[04:54] my only concern there is that it'd cause any service using starting or stopping to get restarted
[04:54] "jobname failed 1"
[04:54] but maybe that's right too
[04:55] Hmm.
[04:55] actually, it'd only do it for anything on stopping
[04:55] so if dbus died, all the dbus services would get restarted
[04:55] That is most certainly appropriate.
[04:55] At least in that case.
[04:56] I suspect if something really needs to be stopped on stopping, and block the parent, that does imply that the connection is critical
[04:56] I think failed is definitely a bug, as in, something which should never happen. But when it does, it's best to be conservative.
[04:56] so does imply that respawning one means to respawn the other
[04:56] When it fails, run the post-stop scripts, which would clean up on service stop.
[04:56] Then cycle back through to start.
[04:57] Also, the post-stop script, and in fact the pre-start script will have access to the event that caused it. Including args.
[04:57] [ "$1" == "failed" ]
[04:57] yeah
[04:58] ok, so another topic
[04:58] There are cases where we have known buggy software, that people want to run, but we know is going to crash.
[04:58] (nscd)
[04:58] (from the same spec)
[04:58] <_ion> Hmm, if foo: "start on stopping bar" and bar fails, foo should be stopped. Should there be some magic, or should there be a dummy "stopping bar" event right after the "failed bar" event?
[04:59] I think I lean towards having a dummy event.
[04:59] <_ion> Me too.
[04:59] You know, we might not even have a failed event.
[05:00] <_ion> Whoops, i meant foo: "stop on stopping bar"
=== Keybuk thinks about "start on failed self" again
[05:00] Are we going to use job names as events or start/stop ?
[05:00] We should decide that once and for all. ;)
[05:00] wasabi: right, so let's discuss that now
[05:01] there's a good reason (imo) to make events have a _different_ namespace to jobs
[05:01] the example event is the block-device-added event
[05:01] we can write things like /etc/event.d/mount-filesystem with "on block-device-added"
[05:01] but someone is going to try and write /etc/event.d/block-device-added
[05:01] this has several effects
[05:01] Hmm.
[05:02] 1) it doesn't get triggered, when they probably expect it to -- no matter, they can add "on block-device-added" in it
[05:02] Weird events being mixed.
[05:02] 2) so the block-device-added hda event from udev now starts the block-device-added job
[05:02] You'd get block-device-added events with data that was not in fact a block device name
[05:02] 3) which emits the block-device-added starting event
[05:02] and poor mount-filesystem would get both
[05:02] You win. =)
[05:02] this is a real world problem
[05:03] someone (me) shipped /etc/event.d/control-alt-delete in Ubuntu
[05:03] Heh.
[05:03] which is why the event got renamed
[05:03] so I think in general we want to keep the namespace separate
[05:03] and I think it's never useful to say "on JOBNAME" and get started whenever that job changes state
[05:04] The only reason I was thinking the other way was for our reusable complex event stuff.
[05:04] but it's potentially useful to have "on STATE" (on failed) and get started whenever any job reaches that state
[05:04] I don't think there's much semantic difference between
[05:04] /etc/event.d/some-reusable-state-machine
[05:04] from started apache until stopping apache
[05:04] and
[05:04] from apache started until apache stopping
[05:04] which is the only real difference
[05:04] Yeah. Agreed.
=== _ion totally agrees with Keybuk
[05:04] the reusable complex event stuff defines states which have the various events
[05:04] I'm on board now. ;)
[05:04] not a single up/down event
[05:05] the cute thing about that is you can define pre-start and post-stop scripts for your reusable state machines
[05:05] Yeah.
[05:05] of course, this all means that /etc/event.d isn't the right name anymore
[05:05] it's a directory that defines jobs, services, tasks and states
[05:06] But not events. ;)
[05:06] I agree that /etc/event.d/FOO implies that FOO is an event, and the content of the file is what happens on it
[05:06] Explicitly not events, yet we call it event.d hehe
[05:06] so I think we should rename that directory
[05:06] <_ion> TBH, i was never very fond of the name "event.d".
[05:07] Well, we need a name which better describes { jobs, services, tasks and states }
[05:08] http://upstart.ubuntu.com/wiki/JobEvents
[05:08] ^ suitable for approval?
[05:08] reading
[05:08] <_ion> /etc/upstart would be quite obvious, but perhaps that would be better preserved for possible configuration of upstart itself, such as the profiles spec.
[05:09] I'm curious how internally you will deal with respawning.
[05:09] wasabi_: I'm edging closer and closer to dropping it entirely from the internal state
[05:09] You have the concept of a goal, which is essentially an enum set on a job, and the job works towards that state.
[05:09] Ahh.
[05:09] I think that respawn just indicates that the job doesn't change goal when it dies
[05:09] But in the case of restart, your goal is started, but you have to instruct the machine to cycle to that.
[05:09] we can still kick it into stopping, and let it come back as if it had been manually restarted
[05:10] True.
[05:10] so it'll go stopping -> starting -> started
[05:10] actually, it'll go failed -> stopping -> starting -> started
[05:10] I sometimes wonder whether someone with far more forethought than I wrote it :p
[05:10] You'd kick it into post-stop, basically.
[05:10] right
[05:10] <_ion> "Events emitted as part of a job state change are currently named after the job, with a suffix indicating the new state."
[05:10] And it would just cycle back around naturally.
[05:10] <_ion> Isn't that the other way around?
[05:10] _ion: apache/started
[05:11] <_ion> Sorry, i missed the word "currently" and misinterpreted the sentence.
[05:12] I usually try in the summary to first say how things are, then say what the spec proposes
[05:15] <_ion> "it'll go failed -> stopping -> starting -> started" Shouldn't "stopped" be there after "stopping"?
[05:16] err
[05:16] yes
[05:19] <_ion> Have i understood this correctly:
[05:21] <_ion> When admin stops a job: "stopping foo", pre-stop script, kill process, post-stop script, "stopped foo"
[05:22] <_ion> When a process exits by itself: "failed foo exitval" IF it failed, "stopping foo", pre-stop script, post-stop script, "stopped foo"
[05:22] yes
[05:23] It might be worth thinking about removing "failed", and replacing it with a third argument to "stopping".
[05:23] I might be totally offbase there.
[05:23] <_ion> Then "foo" would need to know what exit value "bar" has when it fails or doesn't fail.
[05:24] If it's set to respawn, any exit code is a failure.
[05:24] Even 0.
[05:24] At least, that's how I understand it.
[05:24] Maybe I'm wrong there too.
[05:25] correct
[05:25] or any value in normalexit
[05:26] Consider this. A piece of software, such as a java application server (which I can attest operates this way), which has a built in shutdown/startup interface.
[05:26] So, it's possible to shutdown this server using its built in web interface. This does not instruct upstart to in fact shut it down.
[05:27] It could, but any such patches to do so would have to be written.
[05:27] Instead, it terminates normally with exit 0.
[05:27] Even though you want it to respawn on crash.
[05:27] So, basically, you would desire upstart to "respawn on !0"
[05:28] right
[05:28] respawn
[05:28] normal exit 0
[05:28] (this works today)
[05:28] Ahh. Cool.
[05:29] So, that would result in upstart running post-stop, and emitting stopping and stopped.
[05:29] yes
[05:29] Nice.
=== mbiebl [n=michael@e180120209.adsl.alicedsl.de] has joined #upstart
[06:47] wasabi: what if a post-start script fails?
[06:47] remind me
[06:49] I think your description says it kills the process
=== Amaranth [n=travis@ubuntu/member/amaranth] has joined #upstart
=== j_ack [n=rudi@p508D8B8E.dip0.t-ipconnect.de] has joined #upstart
=== treetop [n=treetop@88-212-90-121.vl20-cph.dhcp.clearwire.dk] has joined #upstart
=== j_ack [n=rudi@p508D8B8E.dip0.t-ipconnect.de] has joined #upstart
[08:16] I think if post-start fails, it would proceed to "stopped" along normal paths.
[08:17] Hmm. Guess not.
[08:17] Guess it would skip to stop (bypassing pre-stop)
[08:17] kill the process, then post-stop.
[08:51] what about if the pre-stop fails?
[09:07] which is kinda interesting
[09:07] assuming we issue the stopping event first, and hold on that, then things will need to be restarted
[09:08] so you'd have to go stopping -> starting -> started again
[09:08] without actually doing anything :p
[09:08] alternatively one could wait to issue the stopping event until after pre-stop, but that negates the pre-stop being used to instruct the server to shut down safely
[09:25] pre-stop wouldn't be used to instruct it to shut down.
[09:25] It would be used to determine if it should actually shutdown.
[09:25] Maybe that's what we're missing?
[09:25] So, it isn't in fact going to stop until pre-stop says it can.
[09:26] the trouble with that is that you've already issued the stopping event, haven't you?
[09:26] or do you wait until pre-stop says you can?
[09:26] I can see both uses for the script
[09:28] Hmmm. Also makes me wonder if we in fact need a "start"/"stop" script separately from exec.
[09:29] pre-stop i think is an idea we had simply to prevent stopping.
[09:29] I'm not even sure that is even necessary.
[09:29] I'd say, if there is some actual shell script that is run that initiates the stop, it might be separate from all of this, and a companion with "exec"
[09:29] exec foo
[09:30] stop script
[09:30] do something to kill foo
[09:30] end script
[09:33] pre-start;start;post-start, pre-stop;stop;post-stop, where start and stop can run at the same time. Each can be expressed by "name {exec|script}"
[09:34] So, exec, logically, becomes "start exec".
[09:34] Just like we would have "post-start exec", or various other uses.
[09:34] * all works fine: starting, started, stopping, stopped
[09:34] * pre-start fails: starting, failed, stopping, stopped
[09:34] * post-start fails: starting, failed, stopping, stopped
[09:34] * binary fails: starting, failed, stopping, stopped
[09:34] * binary respawns: starting, started, failed, stopping, starting, started
[09:35] start/stop are runnable at the same time. while start is still going, we are at rest.
[09:35] those are the event sequences I've come up with so far
[09:35] probably:
[09:35] * binary terminates badly: starting, started, failed, stopping, stopped
[09:35] as well
[09:36] I think there are a number of different things represented there which need to be thought about separately.
[09:36] 1) emitted events
[09:36] 2) "status" of job
[09:36] those are simply emitted events
[09:36] 3) internal status of job used to maintain state machine
[09:37] Okay
[09:37] trying to work out what works best
[09:37] 2), the queryable statuses I see are "starting", "started", "stopping", "stopped". initctl can get the status of a job and return only those 4.
[09:37] do those event sequences make sense?
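The event sequences listed above can be written down as data and checked against the ordering rules mentioned in the log. The following Python sketch is only an illustration: the `SEQUENCES` table copies the six cases from the chat, while the `is_consistent` helper and the exact rules it checks (every sequence opens with "starting", "failed" is followed by "stopping", "stopping" is followed by "stopped" or "starting") are assumptions distilled from the conversation, not part of any spec.

```python
# Event sequences as listed in the discussion.
SEQUENCES = {
    "all works fine": ["starting", "started", "stopping", "stopped"],
    "pre-start fails": ["starting", "failed", "stopping", "stopped"],
    "post-start fails": ["starting", "failed", "stopping", "stopped"],
    "binary fails": ["starting", "failed", "stopping", "stopped"],
    "binary respawns": ["starting", "started", "failed", "stopping",
                        "starting", "started"],
    "binary terminates badly": ["starting", "started", "failed",
                                "stopping", "stopped"],
}


def is_consistent(sequence):
    """Check the assumed ordering rules on one emitted-event sequence."""
    if not sequence or sequence[0] != "starting":
        return False
    for current, following in zip(sequence, sequence[1:]):
        if current == "failed" and following != "stopping":
            return False
        if current == "stopping" and following not in ("stopped", "starting"):
            return False
    return True


if __name__ == "__main__":
    for case, sequence in SEQUENCES.items():
        print(case, is_consistent(sequence))
```

Under these assumed rules the respawn case is consistent even though it never emits "stopped", which is the oddity the conversation goes on to discuss.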
[09:37] 4) executable steps
[09:37] Yes I believe so.
[09:38] the respawn one is the odd one, as it never reaches "stopped"
[09:38] it's stopping, then it's starting again
[09:38] Yeah.
[09:38] Might emit stopped anyways.
[09:38] I think that makes sense, if a job is "stop on stopped foo", and foo respawns, it doesn't get restarted; where a job that's "stop on stopping foo" does
[09:38] Since another job may be watching for it.
[09:38] the problem is working out where to emit events
[09:39] You could emit stopped, but there would be no point at which you could query and return "stopped"
[09:39] emitting stopped there is damned hard
[09:39] Just let the mainloop run twice.
[09:39] no such thing
[09:39] I mean, you don't let it do that, you iterate and set it to stopped, and run the normal stopped code, which emits the event, and then it runs again immediately after and progresses.
[09:40] once it hit stopped, nothing would get it back out again
[09:40] Sure, it's at stopped, but the goal isn't stopped.
[09:40] yes
[09:40] So the next loop would send it to starting again
[09:40] but what would pick that up?
[09:40] you'd need something in the main loop checking every single job
[09:40] which is damned inefficient
[09:40] Ahh.
[09:41] Well, I suspect you will end up with an idle bit someplace.
[09:41] right now, it flows because each change causes the next
[09:41] I'm trying to drop the idle bit :)
[09:41] if goal != state idle = false;
[09:41] hmm. i see.
[09:41] idle = (goal == state) actually.
[09:42] Then always "process" jobs where !idle, after work, reset idle again.
[09:42] 4) executable steps. these are basically PIDs that are being tracked.
[09:43] 4) pre-start, start, post-start, pre-stop, stop, post-stop
[09:43] They can be expressed as "name [exec|script] ..."
[09:43] start exec foo, start script\n foo\nend script
[09:44] pre-start exec, pre-start script. Same stuff.
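The "idle = (goal == state)" suggestion above can be sketched as a loop that only touches jobs whose state differs from their goal. This is a minimal Python sketch under loud assumptions: goals are modelled as the rest statuses "started" and "stopped" so they can be compared directly with the four queryable statuses, and the `NEXT` transition table, the `Job` dataclass and `run_until_idle` are hypothetical names, not upstart code.

```python
from dataclasses import dataclass

# Assumed transitions: (goal, current status) -> next status.
NEXT = {
    ("started", "stopped"): "starting",
    ("started", "starting"): "started",
    ("started", "stopping"): "starting",  # respawn: stopping, then starting again
    ("stopped", "started"): "stopping",
    ("stopped", "starting"): "stopping",
    ("stopped", "stopping"): "stopped",
}


@dataclass
class Job:
    name: str
    goal: str = "stopped"
    state: str = "stopped"

    @property
    def idle(self):
        # The idle bit from the log: nothing to do once the goal is reached.
        return self.goal == self.state


def run_until_idle(jobs):
    """Process only non-idle jobs until every job is at rest; return the trace."""
    trace = []
    while True:
        busy = [job for job in jobs if not job.idle]
        if not busy:
            return trace
        for job in busy:
            job.state = NEXT[(job.goal, job.state)]
            trace.append((job.name, job.state))


if __name__ == "__main__":
    ssh = Job("ssh", goal="started")
    print(run_until_idle([ssh]))
```

The sketch still scans the job list on each pass, which is the cost objected to in the log; a real implementation would presumably keep a queue of non-idle jobs instead.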
[09:44] http://people.ubuntu.com/~scott/states.png
[09:48] (reload)
[09:48] (reload again)
[09:48] follow the green lines while the goal is start
[09:48] follow the red lines while the goal is stop
[09:49] These are events?
[09:49] states
[09:49] Didn't think pre-start and such would be states.
[09:49] why not?
[09:49] they represent discrete points in the lifecycle
[09:50] okay i see
[09:52] starting is where we fork the process
[09:52] stopping is where we kill it
[09:52] we emit the "started" event when we hit "running"
[09:52] and the "stopped" event when we hit "waiting"
[09:52] where to emit the starting and stopping events is the tricky one
[09:52] I might say we emit the stopped event when we finish post-stop.
[09:52] right now I have them being emitted when the goal changes, so completely outside that diagram
[09:53] doesn't work if you don't have a post-stop
[09:53] and we emit the started event when we finish post-start, to keep it simple.
[09:53] The machine would always progress through post-stop, but if there was no post-stop PID to run/track, it would just be done.
[09:53] And on completion, it emits event.
[09:53] it'd enter post-stop, but then you'd need some kind of "ok, what state was I just in?" thing
[09:53] Not really, always, at the end of post-stop, emit stopped event.
[09:54] Regardless of what happened previously.
[09:54] no such thing as "end of"
[09:54] You'd want to emit stopped on a respawn.
[09:54] we define an end of a state by changing to a different one
[09:54] see, I'm not sure we do
[09:54] Sure there is. End of is just the last step in the small machine known as "post-stop"
[09:54] why does respawn need to say the job is stopped?
[09:54] it never really is
[09:54] Functionally it isn't available, or has gone up and down.
[09:54] And thus you would have to reconnect.
[09:54] nm/dbus.
[09:55] right, but I think anything that connects will be hanging on the started and stopping events
[09:55] NOT the starting or stopped events
[09:55] because those implicitly don't allow connection
[09:55] nm could do from (dbus started until dbus stopping) or (dbus started until dbus failed)
[09:55] but that would be a bit obnoxious.
[09:55] you'd always get stopping
[09:55] Hmm. the blue line skips over it
[09:56] you're confusing a state with an event
[09:56] that's probably my fault :p
[09:56] those stopping and starting are just the interim states, nothing to do with the events
[09:56] I haven't renamed them yet
[09:56] that diagram should have events hanging off of each step.
[09:56] heh
[09:56] that doesn't work if we don't have an event for each step :p
[09:56] Okay, so stopped would not be emitted... so in that case:
[09:56] the point is that we're having events to indicate which way round we're going, *not* where we are
[09:57] from (dbus starting until dbus stopped) or (dbus starting until dbus failed)
[09:57] Since stopped isn't emitted when failed is.
[09:57] But the condition is functionally the same.
[09:57] ah
[09:57] from starting dbus until stopped dbus
[09:57] ie no service running, can't connect to it.
[09:57] ^ I don't need to connect to dbus, just be around when it is (and it should wait for me to be around)
[09:57] dbus is a bad example there.
[09:57] It was just in my buffer and I didn't want to retype.
[09:57] better example
[09:58] from starting apache until stopped apache
[09:58] Yeah.
[09:58] apache needs me to be started first, and needs to be stopped before I am
[09:58] the counter-example is
[09:58] from started dbus until stopping dbus
[09:58] I need dbus to be around, and I don't want dbus to be stopped until I am
[09:58] {...dbus...{...nm...}...}
[09:59] our first example wouldn't get restarted if apache respawned (no need, apache depends on it, not the other way around)
[09:59] our second example would get restarted whenever dbus respawned
[09:59] {...tomcat...{...apache...}...}
[10:00] Yeah I think you're right. If tomcat can run without apache, then it doesn't matter if it knows that apache just died.
[10:00] Because apache will just come back.
[10:00] But the situation described is still right. tomcat will always be running while apache is.
[10:00] from started foo until stopped foo -- I suspect this will be rare; it implies that foo must start first, but you don't really need foo, as it's ok for it to stop
[10:01] from starting foo until stopping foo -- odd, foo has to wait for you to start, and wait for you to stop -- some kind of strange hold; this works in the respawn example though
[10:01] {foo {bar }foo ?bar
[10:02] Here's an interesting thing.
[10:02] dbus will be considered started, and nm will launch.
[10:03] But dbus might not have actually opened its socket and been completely prepared.
[10:03] that's what the post-start script is for, and why the started event isn't emitted until that finishes
[10:03] And hence post-start can hold nm from starting until dbus is sure it's up.
[10:03] And that is just totally kick ass.
[10:03] Easy race elimination.
[10:04] pre-stop is the tricky one
[10:04] do we emit stopping before pre-stop, and wait for other jobs to finish first?
[10:04] Yeah, interesting.
[10:04] or do we run pre-stop first, and then only emit stopping after - waiting for other jobs to finish before killing the process?
[10:06] I think that's probably best.
[10:09] which means the stopping event is tricky
[10:15] 1) execute and wait for pre-stop 2) if going to stop emit and wait for stopping, else reset back to started 3) execute and wait for stop (procedure to terminate process) 4) execute and wait for post-stop
[10:22] http://people.ubuntu.com/~scott/states.png
[10:22] reload again
[10:22] follow the green lines when the goal is start
[10:22] follow the red lines when the goal is stop
[10:22] failure of a step sets the goal to stop
[10:23] note:
[10:23] running can terminate normally, and change the goal to stop (the red line out marked "normal")
[10:23] this goes to pre-stop
[10:24] running can terminate badly, and change the goal to stop (the red line out marked "failed")
[10:24] err
[10:24] sorry
[10:24] the one marked failed is supposed to be marked "terminated"
[10:24] the normal one is a stop request or event
[10:24] the idea there is that you skip pre-stop and "kill process" if the process died
[10:25] there's a red and green line in that direction, because respawn doesn't change the goal
[10:25] pre-stop can change the goal back to start, so there's a green line back to running again
[10:42] does that look right?
[10:45] still parsing
[10:45] between writing an application deployment lifecycle policy and procedures document for the audit guy. blah.
[10:47] Is there a way in dotty to make a big grouping of pieces of this?
[10:47] I think so, yes
[10:47] Like, off to the right side, a big line which says what parts are in a group. stopped, starting, started, stopped.
[10:48] what would be in those groups?
[10:48] stopped would be "waiting"
[10:48] ?
[10:48] starting would be [emit starting, pre-start, exec process, post-start]
[10:49] started would be [emit-started, running, pre-stop]
[10:49] stopping would be the rest. get it?
[10:49] I'll see what I can make it do :)
[10:49] What I mean, is representing the queryable state of the job at every given point.
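The grouping proposed above, where each internal step of the state machine maps to one of four queryable statuses, can be sketched as a lookup table. The "stopped", "starting" and "started" groups follow the chat; the contents of the "stopping" group are only described as "the rest", so the three steps listed for it here are an assumption, as are the hyphenated step names and the `query_status` helper.

```python
# Queryable status -> internal steps it covers.
STATUS_GROUPS = {
    "stopped": ["waiting"],
    "starting": ["emit-starting", "pre-start", "exec-process", "post-start"],
    "started": ["emit-started", "running", "pre-stop"],
    # "the rest" in the log; these three step names are assumed.
    "stopping": ["emit-stopping", "kill-process", "post-stop"],
}

# Inverted index: internal step -> queryable status.
STATUS_OF = {
    step: status
    for status, steps in STATUS_GROUPS.items()
    for step in steps
}


def query_status(step):
    """Return the status a status query would report for an internal step."""
    return STATUS_OF[step]


if __name__ == "__main__":
    for step in ("waiting", "post-start", "pre-stop", "post-stop"):
        print(step, "->", query_status(step))
```

With this grouping a job sitting in pre-stop still reports "started", matching the idea that it is not going to stop until pre-stop says it can.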
[11:00] reload
[11:00] though dot mangled that a bit :p
[11:08] right bed, will mull on that a bit
[11:08] nite
=== mbiebl [n=michael@e180095208.adsl.alicedsl.de] has joined #upstart