[17:51] <nepthar> Hey upstart community! I've got a question about non-zero exit codes in upstart. Typing out info now...
[17:53] <nepthar> I have one upstart job which runs a web server, and another upstart job which I want to act as a crash reporter. The goal of the crash reporter is to examine the exit status of my web server and if it crashed or failed, shoot off an e-mail or something. I've managed to get the crash reporter job to run each time the web server is stopped, but the actual info that comes along with the 'stopping' event shows that the web server exited successfully, even when I
[17:53] <nepthar> induce a horrible crash.
[17:54] <nepthar> I've just put up a sample of my config files in paste bin, along with snippets from the corresponding log files
[17:54] <nepthar> http://pastebin.com/0ZbzNhzs
[17:55] <nepthar> Am I somehow incorrectly handling the failure situation?
[19:01] <nepthar> For completeness, I've done more digging and discovered that setting the respawn flag causes non-zero exit status to NOT be reported in the 'stopping' event that is emitted. Question posed here: https://answers.launchpad.net/upstart/+question/205836
[21:39] <SpamapS> nepthar: interesting idea
[21:39] <nepthar> oh hey, thanks. Not sure if you saw (my IRC client crashed), but I posted a question about it on launchpad
[21:40] <nepthar> my current workaround is to simply remove the respawn flag from my web service job
[21:40] <nepthar> and the crash reporter now looks like this:
[21:42] <nepthar> start on stopped RESULT=failed JOB!=xxx-crash-reporter
[21:42] <nepthar> task
[21:42] <nepthar> console log
[21:42] <nepthar> script
[21:42] <nepthar> 	case $JOB in
[21:42] <nepthar> 		xxx-*)
[21:42] <nepthar> 			sleep 2
[21:42] <nepthar> 			echo -n "Catching stopped job [$JOB] "
[21:42] <nepthar> 			date
[21:42] <nepthar> 			logfile="/var/log/upstart/${JOB}.log"
[21:42] <nepthar> 			desc_file="$(tempfile)"
[21:42] <nepthar> 			echo "..." >> "$desc_file"
[21:42] <nepthar> 			tail -n "$LOG_LINES" "$logfile" >> "$desc_file"
[21:42] <nepthar> 			echo "Submitting notification..."
[21:42] <nepthar> 			curl --data-urlencode "application=${JOB}" \
[21:42] <nepthar> 			 	--data-urlencode "even=stopped" \
[21:42] <nepthar> 			 	--data-urlencode "description@$desc_file" "$NOTIFY_SERVER"
[21:42] <nepthar> 			echo "Restarting job..."
[21:42] <nepthar> 			initctl start "${JOB}" || true
[21:42] <nepthar> 			echo "..done."
[21:42] <nepthar> 			;;
[21:42] <nepthar> 		*)
[21:42] <nepthar> 			;;
[21:42] <nepthar> 	esac
[21:43] <nepthar> 	
[21:43] <nepthar> end script
[21:43] <nepthar> Ah… REALLY sorry about that…
[21:45] <SpamapS> nepthar: apt-get install pastebinit :)
[21:45] <nepthar> SpamapS: see http://pastebin.com/McTqEG2U
[21:45] <nepthar> thanks :)
[21:46] <SpamapS> nepthar: realistically, your job can take over for respawn..
[21:46] <SpamapS> nepthar: not ideal, but it's a workaround
[21:46] <nepthar> SpamapS: exactly. It takes over the respawn process, but I love the respawn limit features built into upstart
[22:11] <JanC> why not do the crash-reporting from the webserver job?
[22:27] <SpamapS> indeed, it will have more insight if the webserver is its own child and in its own process group
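A minimal sketch of what JanC and SpamapS suggest, assuming the web server runs in the foreground and the job's script section launches it through a small wrapper. The names run_and_report and report_crash are hypothetical and not part of Upstart; report_crash stands in for whatever notification is wanted, such as the curl call in the crash-reporter job above. The wrapper exits with the child's own status, so Upstart should still see the failure and the respawn stanza can stay in the web server job.

```shell
# Hypothetical helper: replace the body with a real notification,
# e.g. the curl call from the crash-reporter job.
report_crash() {
    # $1 = job name, $2 = exit status of the child
    echo "job=$1 status=$2"
}

# Run a command as a child, report a non-zero exit, and hand the
# same status back to the caller.
run_and_report() {
    job_name="$1"
    shift
    status=0
    # "|| status=$?" records the failure without aborting the script
    # under "sh -e", which is how Upstart runs script sections.
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        report_crash "$job_name" "$status"
    fi
    return "$status"
}
```

Inside the web server job this would be called from the script section with the job name followed by the server's command line. By shell convention a status above 128 means the child was killed by a signal (128 plus the signal number), which lets the reporter tell a crash from an ordinary error exit.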