/srv/irclogs.ubuntu.com/2014/04/10/#upstart.txt

=== hatchetation_ is now known as hatchetation
=== balkamos_ is now known as balkamos
styol	thanks again xnox, you rock	01:33
=== SpamapS_ is now known as SpamapS
=== PaulePan1er is now known as PaulePanter
dstokes	hi guys. what's the proper way to execute something when an app crashes but hasn't reached the respawn limit? post-stop is also run when the process is manually stopped so that's not working for me. is there an event i can have another task listen to?	15:32
dstokes	like, maybe i can check $PROCESS on the 'stopped' event?	15:34
dstokes	basically looking for `start on respawned <job>`	16:10
dstokes	guise?	16:47
styol	dstokes: I'm a newbie so I may not have the proper guidance available, but you might be able to `initctl emit your-event-name` within your stop routine and then `start on your-event-name`	17:47
dstokes	styol: afaik there is no way to detect a process failure / respawn in stop routines. only difference in a respawn is that pre-stop isn't run	17:49
styol	dstokes: ah I see what you're saying	17:53
dstokes	easy to detect when the process stops, not easy to make a distinction btwn `stop <job>` and process crashing	17:54
styol	dstokes: this post suggests PROCESS=respawn might be able to be used https://bugs.launchpad.net/upstart/+bug/716802	17:54
dstokes	styol: process=respawn indicates a process that reached its respawn limit. it's triggered once after n unsuccessful respawn attempts. i'm currently using a task to monitor those failures.	17:55
dstokes	goal now is to detect when apps crash but successfully restart, for debugging purposes	17:56
xnox	styol: dstokes: the events emitted are - stopped/failed JOB='test' INSTANCE='' RESULT='failed' PROCESS='respawn'	22:33
xnox	styol: which is emitted after e.g. the main process segfaulted.	22:33
xnox	styol: after that there will be new starting/started events from respawning.	22:34
xnox	wait no, that one is when the respawn limit is reached	22:34
styol	ah i see	22:34
styol	dstokes: ping	22:34
dstokes	yup	22:35
dstokes	respawn does not represent a respawn, but reaching the respawn limit	22:36
xnox	styol: dstokes: for the intermediate failures one gets: started/failed JOB='test' INSTANCE=''	22:36
xnox	styol: dstokes: let me paste the log of events	22:36
dstokes	job failure is indicative of the upstart job failing, not the managed process right? pretty sure i tested that case..	22:37
xnox	dstokes: so without respawn the event i see is - stopped JOB=test EXIT_STATUS=1 PROCESS=main RESULT=failed	22:46
xnox	dstokes: or e.g. - stopped JOB=test RESULT=failed PROCESS=main EXIT_SIGNAL=FPE	22:46
xnox	(floating point exception, core-dump)	22:46
xnox	dstokes: but with respawn one gets less information. You could leave out respawn, and instead have a second job - e.g. monitor.conf which is "start on stopped mainjob" which then can act appropriately and do "$ start --no-wait mainjob" to "respawn" and/or do other clean-up things	22:47
dstokes	i'm only seeing that event after respawn limit with - stopped JOB=test PROCESS=respawn RESULT=failed	22:48
xnox	dstokes: looks ugly, and i think it's a bug that /less/ info is passed.	22:48
dstokes	maybe i need to check my exit code..	22:48
dstokes	xnox: then you have to hack together your own respawn limit right?	22:48
xnox	dstokes: "so without respawn the event i see is" as in if the main job _does not have "respawn" stanza_ I see more info in the failed events.	22:48
xnox	dstokes: yeap.	22:48
xnox	i think it's bug that we don't emit as to /why/ we failed.	22:49
dstokes	xnox: sry, main job _does_ have respawn stanza, along with limit	22:49
dstokes	i'm only seeing stopped and stopping emitted at the end of several respawn attempts (when the limit is reached)	22:49
xnox	dstokes: respawn -> only one failed event, when respawn limit reached without info; without respawn -> more details as to why main process failed.	22:49
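
The event variables quoted above (RESULT, PROCESS, EXIT_STATUS, EXIT_SIGNAL) can be turned into a readable reason with a few lines of shell. This is only a sketch: `describe_stop` is a hypothetical helper name, not part of Upstart, and it assumes a cleanly stopped job reports a RESULT other than `failed`.

```shell
#!/bin/sh
# describe_stop RESULT PROCESS EXIT_STATUS EXIT_SIGNAL
# Hypothetical helper: summarises the variables of a "stopped" event
# as they appear in the log above. Pass empty strings for unset values.
describe_stop() {
    result=$1
    process=$2
    exit_status=$3
    exit_signal=$4

    if [ "$result" != "failed" ]; then
        echo "clean stop"
    elif [ "$process" = "respawn" ]; then
        # With the respawn stanza this is the only failure reported.
        echo "respawn limit reached"
    elif [ -n "$exit_signal" ]; then
        echo "$process killed by signal $exit_signal"
    elif [ -n "$exit_status" ]; then
        echo "$process exited with status $exit_status"
    else
        echo "$process failed, no detail"
    fi
}
```

Inside a job that uses `start on stopped test`, it could be called as `describe_stop "$RESULT" "$PROCESS" "$EXIT_STATUS" "$EXIT_SIGNAL"`, assuming the event's variables are exported into that job's environment.
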
dstokes	xnox: i see. so that's just the way it is ;)	22:50
dstokes	to work around it, i should be writing my own respawn task	22:50
dstokes	often times a process will fail, then startup successfully. those are the cases i'm after so i can debug why it failed before the logs fill up etc	22:51
xnox	dstokes: it is weird. I'll open a bug about it, but probably will not help much as it will take a while for such a fix to be created.	22:51
dstokes	right, thx for your help anyway. happy to at least confirm that it's not misconfiguration on my end	22:51
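
A minimal sketch of the "write your own respawn task" workaround discussed above. The file name and the job name `mainjob` are placeholders, the main job is assumed to have no `respawn` stanza, and the event variables are assumed to be exported into the task's environment. It does not implement a respawn limit; that would still have to be hand-rolled.

```
# /etc/init/mainjob-monitor.conf  (hypothetical file and job names)
# Assumes mainjob.conf has NO "respawn" stanza, so the "stopped" event
# carries PROCESS plus EXIT_STATUS or EXIT_SIGNAL.
start on stopped JOB=mainjob RESULT=failed PROCESS=main
task

script
    # Record why the main process died before restarting it.
    logger -t mainjob-monitor \
        "mainjob failed: status=${EXIT_STATUS:-none} signal=${EXIT_SIGNAL:-none}"
    # Restart without blocking this task; no respawn limit is enforced here.
    start --no-wait mainjob
end script
```

A manual `stop mainjob` should not trigger this task, because the match on RESULT=failed only fires when the job failed.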
xnox	dstokes: oh, i see. I wonder if you can just crank up upstart logging to get that.	22:51
xnox	dstokes: so do you actually want to run any automated scripts/jobs upon failures? or are you simply trying to collect data?	22:51
xnox	dstokes: after $ initctl log-level debug (the most debugging)	22:52
dstokes	the ideal scenario is: set up main job to respawn a process when it fails, set up secondary task to curl when the process is respawned successfully (curl for notification)	22:52
xnox	dstokes: I see - http://paste.ubuntu.com/7232773/	22:53
dstokes	i suppose i could watch the log, but that's a little more involved than i want to get ;)	22:53
xnox	dstokes: i think we can meet your requirements!	22:55
dstokes	xnox++	23:02
xnox	dstokes: one sec, testing.	23:03
dstokes	for context: main job http://paste.ubuntu.com/7232809/	23:05
dstokes	and associated task: http://paste.ubuntu.com/7232813/	23:06
dstokes	alrdy have the respawn limit task working properly. notifies me when a process fails to respawn (after limit)	23:06
xnox	dstokes: so my "main" job simply does "int main() { pause(); }"	23:10
xnox	dstokes: that's the process, and then externally i send FPE (kill -8) to it.	23:10
xnox	$ cat /etc/init/test.conf	23:10
xnox	respawn	23:10
xnox	exec /tmp/a.out	23:10
xnox	that's main job.	23:10
xnox	and here is my "monitor"	23:10
xnox	$ cat /etc/init/monitor.conf	23:10
xnox	start on stopping test	23:10
xnox	stop on stopped test	23:10
xnox	script	23:10
xnox	sleep 2; echo "Main job respawned successfully"	23:10
xnox	end script	23:10
xnox	..	23:10
xnox	dstokes: so when job under test fails (stopping test) monitor kicks in. If the job is getting normally stopped or reached respawn limit, the monitor will be stopped.	23:11
xnox	dstokes: however, if respawn succeeds and the job stays alive for 2 seconds then in /var/log/upstart/monitor.log i get notification that respawn was successful.	23:12
xnox	dstokes: but the "sleep 2" needs to be adjusted. You could do better without a sleep	23:12
dstokes	clever..	23:13
xnox	dstokes: e.g. "stop on stopped test or started test"	23:13
xnox	dstokes: and then instead of script, you'd have a post-stop script -> which checks the stop event environment.	23:13
xnox	dstokes: if the reason for getting stopped is "started test" it means a successful respawn happened.	23:13
xnox	let me code that.	23:14
xnox	dstokes: nah, needs sleep / appropriate matching for respawn limit none-the-less.	23:21
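
Putting the sleep-based monitor together with the curl notification dstokes asked for might look like the sketch below. The URL is a placeholder, the POST fields are invented for illustration, and the 2-second grace period is the value from xnox's test and would need tuning against the main job's respawn limit.

```
# /etc/init/monitor.conf  (xnox's sleep-based monitor, with curl swapped in for echo)
start on stopping test
stop on stopped test

script
    # Grace period: if test is stopped for good (manual stop or respawn
    # limit reached), "stopped test" stops this job before the sleep ends.
    sleep 2
    # Hypothetical endpoint and fields; replace with a real notification URL.
    curl -fsS -X POST -d "job=test&event=respawned" \
        https://example.invalid/notify || true
end script
```

The trailing `|| true` keeps a failed notification from marking the monitor job itself as failed.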
