[01:33] <styol> thanks again xnox, you rock
[15:32] <dstokes> hi guys. what's the proper way to execute something when an app crashes but hasn't reached the respawn limit? post-stop is also run when the process is manually stopped, so that's not working for me. is there an event i can have another task listen to?
[15:34] <dstokes> like, maybe i can check $PROCESS on the 'stopped' event?
[16:10] <dstokes> basically looking for `start on respawned <job>`
[16:47] <dstokes> guise?
[17:47] <styol> dstokes: I'm a newbie so I may not have the proper guidance available, but you might be able to `initctl emit your-event-name` within your stop routine and then `start on your-event-name`
[17:49] <dstokes> styol: afaik there is no way to detect a process failure / respawn in stop routines. only difference in a respawn is that pre-stop isn't run
[17:53] <styol> dstokes: ah I see what you're saying
[17:54] <dstokes> easy to detect when the process stops, not easy to make a distinction btwn `stop <job>` and process crashing
[17:54] <styol> dstokes: this post suggests PROCESS=respawn might be able to be used https://bugs.launchpad.net/upstart/+bug/716802
[17:55] <dstokes> styol: PROCESS=respawn indicates a process that reached its respawn limit. it's triggered once after n unsuccessful respawn attempts. i'm currently using a task to monitor those failures.
[17:56] <dstokes> goal now is to detect when apps crash but successfully restart, for debugging purposes
[22:33] <xnox> styol: dstokes: the events emitted are - stopped/failed JOB='test' INSTANCE='' RESULT='failed' PROCESS='respawn'
[22:33] <xnox> styol: which is emitted after e.g. the main process segfaulted.
[22:34] <xnox> styol: after that there will be new starting/started events from respawning.
[22:34] <xnox> wait no, that one is when the respawn limit is reached
[22:34] <styol> ah i see
[22:34] <styol> dstokes: ping
[22:35] <dstokes> yup
[22:36] <dstokes> respawn does not represent a respawn, but reaching the respawn limit
[22:36] <xnox> styol: dstokes: for the intermediate failures one gets: started/failed JOB='test' INSTANCE=''
[22:36] <xnox> styol: dstokes: let me paste the log of events
[22:37] <dstokes> job failure is indicative of the upstart job failing, not the managed process right? pretty sure i tested that case..
[22:46] <xnox> dstokes: so without respawn the event i see is - stopped JOB=test EXIT_STATUS=1 PROCESS=main RESULT=failed
[22:46] <xnox> dstokes: or e.g. - stopped JOB=test RESULT=failed PROCESS=main EXIT_SIGNAL=FPE
[22:46] <xnox> (floating point exception, core-dump)
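The stopped-event variables quoted above are enough to tell the cases apart in a listening task. A minimal POSIX sh sketch follows; `classify_stop` and its output labels are hypothetical names, and it assumes (as observed in this log, not from Upstart documentation) that `PROCESS=respawn` only appears once the respawn limit is reached, so the crash branches only fire for a job without a `respawn` stanza.

```sh
#!/bin/sh
# Hypothetical helper for a task such as "start on stopped test".
# Upstart passes the triggering event's variables (RESULT, PROCESS,
# EXIT_STATUS, EXIT_SIGNAL) into the task's environment.
# Assumption: PROCESS=respawn means "respawn limit reached", as observed
# in this log; with a respawn stanza that is the only failure you will see.

classify_stop() {
    result="${RESULT:-}"
    process="${PROCESS:-}"
    if [ "$result" != "failed" ]; then
        echo "clean-stop"                  # e.g. a manual "stop test"
    elif [ "$process" = "respawn" ]; then
        echo "respawn-limit"               # no exit details in this case
    elif [ -n "${EXIT_SIGNAL:-}" ]; then
        echo "crashed signal=$EXIT_SIGNAL process=$process"
    elif [ -n "${EXIT_STATUS:-}" ]; then
        echo "crashed status=$EXIT_STATUS process=$process"
    else
        echo "failed process=$process"
    fi
}
```

With xnox's kill -8 example, `RESULT=failed PROCESS=main EXIT_SIGNAL=FPE` would be reported as `crashed signal=FPE process=main`.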
[22:47] <xnox> dstokes: but with respawn one gets less information. You could leave out respawn, and instead have a second job - e.g. monitor.conf which is "start on stopped mainjob" which then can act appropriately and do "$ start --no-wait mainjob" to "respawn" and/or do other clean-up things
[22:48] <dstokes> i'm only seeing that event after respawn limit with - stopped JOB=test PROCESS=respawn RESULT=failed
[22:48] <xnox> dstokes: looks ugly, and i think it's a bug that /less/ info is passed.
[22:48] <dstokes> maybe i need to check my exit code..
[22:48] <dstokes> xnox: then you have to hack together your own respawn limit right?
[22:48] <xnox> dstokes: "so without respawn the event i see is" as in if the main job _does not have "respawn" stanza_ I  see more info in the failed events.
[22:48] <xnox> dstokes: yeap.
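For the approach xnox suggests (drop the `respawn` stanza and let a second job restart the main job), the respawn limit has to be rebuilt by hand. The sketch below is one possible policy, not Upstart's own implementation: it allows at most LIMIT restarts in a sliding INTERVAL-second window and keeps timestamps in a state file. `should_respawn`, its arguments and the state file path are all hypothetical.

```sh
#!/bin/sh
# Hand-rolled respawn limit for a job WITHOUT a "respawn" stanza.
# A hypothetical monitor task ("start on stopped mainjob RESULT=failed")
# calls should_respawn and only then runs "start --no-wait mainjob".
# Policy (an assumption, loosely modelled on "respawn limit COUNT INTERVAL"):
# allow at most LIMIT restarts within the last INTERVAL seconds.

# Usage: should_respawn STATE_FILE NOW LIMIT INTERVAL
# Returns 0 (and records NOW) if a restart is allowed, 1 if the limit is hit.
should_respawn() {
    state_file=$1 now=$2 limit=$3 interval=$4
    tmp="$state_file.tmp"
    recent=0
    : > "$tmp"
    # Keep only restart timestamps that are still inside the window.
    if [ -f "$state_file" ]; then
        while read -r ts; do
            if [ $((now - ts)) -lt "$interval" ]; then
                echo "$ts" >> "$tmp"
                recent=$((recent + 1))
            fi
        done < "$state_file"
    fi
    if [ "$recent" -ge "$limit" ]; then
        mv "$tmp" "$state_file"
        return 1
    fi
    echo "$now" >> "$tmp"
    mv "$tmp" "$state_file"
    return 0
}
```

Inside the monitor's script section it might be called as `should_respawn /run/mainjob.respawns "$(date +%s)" 10 5 && start --no-wait mainjob` (10 and 5 are example values only). Because the monitor is started by each `stopped mainjob` event, it also sees `EXIT_STATUS`/`EXIT_SIGNAL` for every crash, which is the extra information dstokes is after.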
[22:49] <xnox> i think it's a bug that we don't emit /why/ we failed.
[22:49] <dstokes> xnox: sry, main job _does_ have respawn stanza, along with limit
[22:49] <dstokes> i'm only seeing stopped and stopping emitted at the end of several respawn attempts (when the limit is reached)
[22:49] <xnox> dstokes: respawn -> only one failed event, when respawn limit reached without info; without respawn -> more details as to why main process failed.
[22:50] <dstokes> xnox: i see. so that's just the way it is ;)
[22:50] <dstokes> to work around it, i should be writing my own respawn task
[22:51] <dstokes> often times a process will fail, then startup successfully. those are the cases i'm after so i can debug why it failed before the logs fill up etc
[22:51] <xnox> dstokes: it is weird. I'll open a bug about it, but that probably won't help much, as it will take a while for such a fix to be created.
[22:51] <dstokes> right, thx for your help anyway. happy to at least confirm that it's not misconfiguration on my end
[22:51] <xnox> dstokes: oh, i see. I wonder if you can just crank up upstart logging to get that.
[22:51] <xnox> dstokes: so do you actually want to run any automated scripts/jobs upon failures? or are you simply trying to collect data?
[22:52] <xnox> dstokes: after $ initctl log-level debug (the most verbose level)
[22:52] <dstokes> the ideal scenario is: set up the main job to respawn a process when it fails, and set up a secondary task to curl when the process is respawned successfully (curl for notification)
[22:53] <xnox> dstokes: I see - http://paste.ubuntu.com/7232773/
[22:53] <dstokes> i suppose i could watch the log, but that's a little more involved than i want to get ;)
[22:55] <xnox> dstokes: i think we can meet your requirements!
[23:02] <dstokes> xnox++
[23:03] <xnox> dstokes: one sec, testing.
[23:05] <dstokes> for context: main job http://paste.ubuntu.com/7232809/
[23:06] <dstokes> and associated task: http://paste.ubuntu.com/7232813/
[23:06] <dstokes> alrdy have the respawn limit task working properly. notifies me when a process fails to respawn (after limit)
[23:10] <xnox> dstokes: so my "main" job simply does "main() { pause(); }"
[23:10] <xnox> dstokes: that's the process, and then externally i send FPE (kill -8) to it.
[23:10] <xnox> $ cat /etc/init/test.conf 
[23:10] <xnox> respawn
[23:10] <xnox> exec /tmp/a.out
[23:10] <xnox> that's main job.
[23:10] <xnox> and here is my "monitor"
[23:10] <xnox> $ cat /etc/init/monitor.conf 
[23:10] <xnox> start on stopping test
[23:10] <xnox> stop on stopped test
[23:10] <xnox> script
[23:10] <xnox> 	sleep 2; echo "Main job respawned successfully"
[23:10] <xnox> end script
[23:10] <xnox> ..
[23:11] <xnox> dstokes: so when the job under test fails (stopping test), monitor kicks in. If the job is being stopped normally or has reached its respawn limit, the monitor will be stopped.
[23:12] <xnox> dstokes: however, if respawn succeeds and the job stays alive for 2 seconds, then in /var/log/upstart/monitor.log i get a notification that respawn was successful.
[23:12] <xnox> dstokes: but the sleep 2 needs to be adjusted. You could do better without a sleep
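Combining xnox's monitor with dstokes' goal of a curl notification, the monitor's `script` section could source something like the following. `GRACE_SECONDS`, `JOB_NAME`, `NOTIFY_URL` (a placeholder endpoint) and the helper names are hypothetical. Like the `sleep 2` in the paste, the grace period is only a heuristic: intermediate crashes do not emit `stopped test`, so a job that keeps crashing but has not yet hit its limit could still be reported as healthy.

```sh
#!/bin/sh
# Hypothetical body for the monitor job's "script ... end script" section,
# extending xnox's "sleep 2; echo ..." with dstokes' curl notification.
# GRACE_SECONDS, JOB_NAME and NOTIFY_URL are made-up settings.
GRACE_SECONDS="${GRACE_SECONDS:-2}"
JOB_NAME="${JOB_NAME:-test}"
NOTIFY_URL="${NOTIFY_URL:-http://example.invalid/respawn-notify}"  # placeholder

# Build the notification text (separate so it can be checked offline).
respawn_message() {
    printf '%s respawned successfully on %s\n' "$1" "$2"
}

# If the job is stopped for good during the grace period (normal stop or
# respawn limit), "stop on stopped test" terminates this script before it
# reaches the notification.
wait_then_notify() {
    sleep "$GRACE_SECONDS"
    msg=$(respawn_message "$JOB_NAME" "$(hostname)")
    echo "$msg"    # lands in /var/log/upstart/monitor.log
    curl -fsS --data "$msg" "$NOTIFY_URL" || echo "notification failed" >&2
}
```

The job file's script section then reduces to sourcing this file and calling `wait_then_notify`.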
[23:13] <dstokes> clever..
[23:13] <xnox> dstokes: e.g. "stop on stopped test or started test"
[23:13] <xnox> dstokes: and then instead of script, you'd have a post-stop script -> which checks the stop event environment.
[23:13] <xnox> dstokes: if the reason for getting stopped is "started test", it means a successful respawn happened.
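The sleep-free variant xnox outlines can be sketched as a helper for the monitor's `post-stop` script, with the monitor declaring `stop on stopped test or started test` and no main process. It assumes Upstart sets `UPSTART_STOP_EVENTS` to the name of the event that stopped the job in pre-stop and post-stop; `stop_reason` and its labels are hypothetical. As xnox concludes a few minutes later, this alone is not enough: `started test` presumably fires as soon as the respawned process is running, so an immediate second crash would still look like a successful respawn.

```sh
#!/bin/sh
# Hypothetical helper for the monitor's post-stop script, for a job with
#   start on stopping test
#   stop on stopped test or started test
# Assumption: UPSTART_STOP_EVENTS holds the name(s) of the stopping event(s).

# Usage: stop_reason "$UPSTART_STOP_EVENTS"
stop_reason() {
    case " $1 " in
        *" started "*) echo "respawned" ;;     # test came back up
        *" stopped "*) echo "job-stopped" ;;   # normal stop or respawn limit
        *) echo "unknown" ;;                   # e.g. "stop monitor" by hand
    esac
}
```

In the post-stop script, wrap the check in an `if`, e.g. `if [ "$(stop_reason "${UPSTART_STOP_EVENTS:-}")" = respawned ]; then ...; fi`, so a non-matching result does not make the script exit non-zero under `sh -e`.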
[23:14] <xnox> let me code that.
[23:21] <xnox> dstokes: nah, it needs a sleep / appropriate matching for the respawn limit nonetheless.