[17:27] <frew> does anyone know if there is a way to detect a restart?  That is, I would like to run a command when a service is restarted automatically, but not when it is started via stop then start
[17:30] <frew> I sorta expect the start on reason or something to be in an env var?
[17:32] <chras> frew: yeah
[17:32] <chras> you can do this
[17:32] <chras> sec, lemme look for my code which does this
[17:32] <frew> awesome, thanks
[17:32] <frew> oh is it UPSTART_EVENTS ?
[17:33]  * frew *just* found http://upstart.ubuntu.com/cookbook/#standard-environment-variables
[17:33]  * frew would guess a "respawn" in UPSTART_EVENTS means it crashed
[17:34] <chras> yeah you can check on that
[17:35] <chras> so when you do an initctl restart you want another event to fire?
[17:35] <chras> or when you kill the running process, and upstart restarts it automatically
[17:35] <chras> or both?
[17:35] <frew> no, really all I want, if possible, is that if it crashes (or if I killed it directly)
[17:35] <frew> so the latter (I think)
[17:36] <frew> basically if it restarted due to a crash
[17:36] <chras> right, k, so i have an event which does this
[17:36] <chras> sec lemme pastebin it
[17:36] <frew> thanks
[17:37] <chras> https://pastebin.mozilla.org/8870501
[17:37] <chras> but, when i test it, it doesnt trigger on an initctl restart $process
[17:37] <frew> which is the goal here.
[17:38] <frew> huh, so are scripts bash?
[17:38] <chras> more or less
[17:38] <chras> with a set -e
[17:38] <frew> ok
[17:38] <frew> I assumed they'd be sh
[17:38] <chras> well my sh is linked to bash
[17:38] <frew> mine's dash
[17:38] <chras> i dont use dash
[17:38] <frew> gotcha.
[17:38] <chras> dash does ... unexpected things
[17:38] <frew> heh
[17:39] <frew> so you install this upstart config
[17:39] <frew> and it captures basically all events?
[17:40] <frew> (hence the JOB!=self_help, which is what this is)
[17:41] <chras> yeah im having trouble getting it to trigger on a restart though
[17:42] <frew> restart meaning a crash?
[17:42] <chras> my thing triggers on a crash
[17:42] <chras> just not an 'initctl restart'
[17:42] <frew> ok that's how I want it
[17:42] <frew> yeah
[17:42] <frew> that's what I want.
[17:42] <frew> if someone manually restarts, I think I'm ok with that not being logged especially
[17:42] <frew> though it would be nice to know as well
[17:42] <chras> i made this event back when i had heartbeat dying unexpectedly
[17:43] <chras> and then i had a config file which took actions based on what happened
[17:43] <chras> ie
[17:43] <chras> heartbeat XCPU echo "nas_self_help: heartbeat died unexpectedly with XCPU Signal, restarting" | wall -n
[17:43] <chras> so heartbeat was getting a XCPU kill flag for whatever reason (ended up being a bug in heartbeat debug)
[17:43] <chras> and then id wall it out for instance
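The config format chras describes ("job, signal, then the command to run") could be looked up with something like the sketch below. The function name `lookup_action` and the one-entry-per-line layout are assumptions made for illustration; the real config lives in the pastebin and is not reproduced here.

```shell
# Hypothetical helper: print the action configured for a job/signal pair.
# Assumed config layout, one entry per line: <job> <signal> <command...>
lookup_action() {
    job=$1
    signal=$2
    config=$3
    while read -r key sig rest; do
        # Skip blank lines and comments.
        case $key in ''|'#'*) continue ;; esac
        if [ "$key" = "$job" ] && [ "$sig" = "$signal" ]; then
            printf '%s\n' "$rest"
            return 0
        fi
    done < "$config"
    # No matching entry.
    return 1
}
```

A watchdog script could call this with the stopped job's name and exit signal, and only act when a matching line exists.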
[17:44] <frew> right my thought here is to either page or create an issue or something
[17:44] <frew> not sure
[17:44] <chras> and it does trigger on a stop / kill / etc
[17:44] <chras> but not sure about restart atm
[17:44] <frew> ok well I need to get this installed and I'll see how well it works
[17:45] <frew> so GCONFIG and PCONFIG (currently) do nothing
[17:45] <frew> so key signal rest is like
[17:45] <frew> who did what
[17:46] <frew> what they did
[17:46] <frew> and other info?
[17:46] <frew> or something?
[17:46] <frew> oh sorry I am reading this and learning more
[17:46] <frew> I should talk to my rubber duck before asking here
[17:47] <chras> well those are just configs that i use
[17:47] <frew> right
[17:47] <chras> i can paste those in the thing, sec
[17:47] <frew> thanks that'd be helpful
[17:48] <chras> https://pastebin.mozilla.org/8870503
[17:49] <chras> i guess the main thing i wanted you to originally see was the 'start on stopped JOB=${some_job}'
[17:50] <chras> and then have a task which does whatever you want in a script stanza
[17:50] <frew> right
[17:50] <frew> well and having a single task that does it is very elegant
[17:51] <frew> I was gonna have a script I put in all my scripts
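A minimal sketch of the single-task approach discussed here, assuming a job file named `watchdog.conf` (the name is illustrative; the actual pastebin contents are not reproduced). The variables `JOB`, `INSTANCE`, `PROCESS`, `RESULT`, `EXIT_STATUS` and `EXIT_SIGNAL` are the ones Upstart documents for the `stopped` event.

```
# /etc/init/watchdog.conf  (file and job name are illustrative)
description "log whenever another job stops"

# Fire on every job's stopped event except our own.
start on stopped JOB!=watchdog

# One instance per stopped job, so events for different jobs don't collide.
instance $JOB

# Run once and exit; there is no long-running process to supervise.
task

script
    logger -t watchdog "caught job[$JOB] instance[$INSTANCE] process[$PROCESS] result[$RESULT] status[$EXIT_STATUS] signal[$EXIT_SIGNAL]"
end script
```

Because of the `instance $JOB` line, starting this by hand needs the variable supplied, for example `initctl start watchdog JOB=test`.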
[18:09] <chras> frew: yeah i cant get it working in the context of an 'initctl restart'
[18:11] <chras> as a workaround you could do an initctl emit in your pre-stop
[18:27] <chras> yeah a restart never emits a 'stopping' or 'stopped' event
[18:31] <frew> chras: fwiw the z thing is not really needed anymore iirc
[18:31] <chras> frew: ug. i got it
[18:31] <chras> its definitely a bug
[18:31] <frew> yeah
[18:31] <chras> if you have a pre-stop script
[18:32] <chras> then the job goes running -> pre-stop -> start
[18:32] <chras> and never emits a stopping/stopped
[18:32] <frew> you think the maintainers of upstart would fix it?
[18:32] <frew> I was telling my boss about this and he was a little hesitant since upstart is destined for the void
[18:32] <frew> but so is everything so I'm gonna keep going
[18:32] <chras> if you DONT have a pre-stop script, then it works as intended
[18:32] <frew> interesting.
[18:32] <chras> it MIGHT be this bug, https://bugs.launchpad.net/upstart/+bug/703800
[18:33] <frew> btw logger ~~ syslog?
[18:33] <frew> I'd say even if it is that bug, that bug is 4y old and will probably never get fixed
[18:33] <chras> right
[18:34] <frew> I think I'm gonna make an "ANY" job too.
[18:34] <frew> at least so I can get started with this reasonably.
[18:36] <chras> anyways, sorry couldnt be more help
[18:37] <frew> no way this is great
[18:37] <frew> you 100% solved my issue and added more
[18:38] <chras> great
[20:15] <frew> chras: actually I guess I maybe misunderstand you; I don't get logging or anything for a respawn (crash and restart)
[20:15] <frew> is that what you were saying is broken?
[20:17] <chras> hi
[20:18] <chras> k the quick summary was
[20:18] <chras> IF you have a pre-stop, then an initctl restart does running -> pre-stop -> start , and skips doing a stopping, or stopped event
[20:18] <frew> ok that's what I thought you were saying.
[20:19] <frew> this is neither a start, stop, nor restart, but a service that starts and crashes after about 3s, consistently
[20:19] <frew> and I get no events
[20:20] <chras> so your event fails to start at all
[20:20] <chras> and thats what you're trying to catch ?
[20:21] <frew> well I mean, it never forks, so as long as upstart knows, it worked correctly for 3s
[20:21] <frew> and then crashed
[20:21] <frew> but yes, I'm trying to catch an error causing a respawn, likely a compile error
[20:21] <chras> right. ok, i think i understand
[20:21] <frew> though it could have been running for three hours
[20:21] <chras> so your daemon forks, and upstart isnt detecting that it died?
[20:21] <frew> it would still be the same situation
[20:21] <frew> no
[20:21] <frew> it never forks
[20:21] <frew> it just exec's
[20:21] <chras> ah
[20:22] <frew> hence upstart not knowing it's ready
[20:22] <chras> does your upstart throw an error like
[20:23] <chras> main process ($pid) terminated with status $exit?
[20:23] <frew> if it would place an error in the syslog, no
[20:23] <frew> not sure where else to look
[20:23] <chras> check dmesg
[20:23] <chras> do you 'console log' ?
[20:24] <frew> I don't know?
[20:24] <chras> mind putting your event on pastebin?
[20:24] <frew> yeah it's super basic
[20:25] <frew> https://gist.github.com/frioux/0e4e4c1dc0b02f82a9a1a327ae55c248
[20:25] <chras> right k
[20:26] <chras> i think you want to add a 'console log' to that
[20:26] <chras> so you can get some minimal logging
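For reference, a job with `console log` added might look like the sketch below. The job name and `exec` path are hypothetical, since the gist is not reproduced here. With `console log` (available from Upstart 1.4), the job's stdout and stderr are written to `/var/log/upstart/<job>.log`.

```
# /etc/init/reaper.conf  (job name and path are hypothetical)
description "foreground reaper process"

# Restart the main process if it dies.
respawn

# Capture stdout/stderr in /var/log/upstart/reaper.log.
console log

exec /usr/local/bin/reaper
```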
[20:26] <frew> ok
[20:26] <frew> (fwiw I know why it's crashing, I just want to be able to detect better in the future)
[20:27] <chras> and you're trying to throw another event when that one fails to start?
[20:27] <frew> well, I just want to find out, somehow
[20:28] <frew> originally I was gonna write a pre-start that looked at 
[20:28] <frew> UPSTART_EVENTS
[20:28] <frew> and if respawn was in there, do a thing
[20:28] <frew> but your solution was a lot more elegant
[20:29] <chras> so, when i do this
[20:32] <chras> https://pastebin.mozilla.org/8870543
[20:32] <chras> when my main process ends with a non 0 exit code (which i assume is your failure)
[20:32] <chras> it still triggers my self_help
[20:32] <frew> surely it is
[20:33] <frew> although... it's probably exiting 255
[20:33] <frew> any chance upstart ignores high errors?
[20:33] <chras> should be anything non-zero
[20:33] <frew> ok
[20:33] <chras> unless you told it to ignore certain exit codes
[20:33] <frew> I don't even know how I'd do that?
[20:33] <frew> where would that be done
[20:33] <chras> its just a main-config command
[20:34] <chras> sec
[20:34] <chras> lemme find it
[20:35] <chras> http://upstart.ubuntu.com/cookbook/#normal-exit
[20:35] <frew> well I showed you the whole file
[20:35] <frew> unless you can globally do it, I didn't do that
[20:36] <chras> yeah so itll be non-zero exit codes
[20:36] <chras> will cause it to fail
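For context, this is roughly what "ignoring certain exit codes" would look like in a job file. The codes listed are only an example; nothing in the conversation suggests frew's job contains such a stanza.

```
# Illustrative fragment only: treat exit status 0 and 255 as normal,
# so Upstart does not respawn the job when it exits with either code.
respawn
normal exit 0 255
```

Without a `normal exit` stanza, any non-zero exit counts as a failure, which matches what chras says above.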
[20:37] <chras> so my self_help event isnt triggering on your thing exiting weirdly?
[20:38] <frew> correct
[20:38] <frew> I have to admit though
[20:38] <chras> does your reaper fork ?
[20:38] <chras> or is it a foreground process
[20:38] <frew> I did clean up your self_help thing to be shorter and simpler
[20:38] <frew> no it doesn't fork
[20:38] <frew> it's foreground
[20:39] <frew> lemme paste the changes I made
[20:39] <chras> did you run an initctl reload-configuration?
[20:39] <frew> they shouldn't harm anything
[20:39] <chras> to get the new event
[20:39] <frew> yes
[20:39] <chras> does an initctl start self_help work?
[20:39] <frew> well I get output about other services
[20:39] <frew> but let's see
[20:39] <chras> ah interesting so you know it IS working, just not with your reaper?
[20:40] <frew> oh wait
[20:40] <frew> wtf
[20:40] <frew> initctl: Unknown parameter: JOB
[20:40] <frew> does that mean anything to you?
[20:40] <chras> yeah
[20:40] <chras> initctl start self_help JOB=test
[20:40] <chras> thats just the instancing stuff triggering
[20:41] <chras> normally JOB is emitted automatically by other upstart events
[20:41] <frew> initctl: Job failed to start
[20:41] <frew> right
[20:41] <frew> no obvious reason why it's failing
[20:41] <chras> hm, pastebin it ?
[20:41] <frew> the modified self_help or what?
[20:41] <chras> yea
[20:41] <frew> k
[20:42] <frew> https://gist.github.com/frioux/953fe2f4b950c2bef58521f0ca0e0ad9
[20:42] <frew> should be effectively the same, but work with any POSIX shell (and only allow one extra config file)
[20:42] <frew> (and I renamed it watchdog)
[20:43] <chras> right
[20:45] <chras> you're using dash right?
[20:45] <frew> I wrote it with dash in mind, since we are moving to dash soon, but it's still bash now (12.04)
[20:46] <frew> oh no
[20:46] <chras> its just a syntax error for me
[20:46] <frew> 12.04 is dash
[20:46] <frew> oh?
[20:46]  * frew eyeballs
[20:47] <chras> ./foo.script: 24: ./foo.script: Syntax error: "(" unexpected (expecting "then")
[20:47] <frew> I wonder why that's wrong
[20:47] <frew> it's supposed to work
[20:47] <frew> `(expression)  True if expression is true.` from dash(1)
[20:48] <chras> yeah it looks right
[20:48] <chras> sec, seeing if i can fix
[20:48] <frew> can repro fwiw
[20:49] <frew> fwiw the ()'s are not needed, since -o binds more tightly than -a
[20:49] <frew> but I thought it made it easier to read
[20:50] <frew> dash -c '[ ( -f /bin/dash ) ] && echo 1'
[20:51] <frew> I think this must be a bug in dash.
[20:52] <chras> dash does unexpected things :P
[20:52] <frew> I never claimed it was perfect
[20:53] <frew> I just can't switch what /bin/sh is
[20:54] <frew> ok I'll just leave out the () for now
[20:54] <chras> yeah i cant use ()'s in dash for whatever reason
[21:06] <frew> oh derp
[21:06] <frew> dash -c '[ \( -f /bin/dash \) ] && echo true'
[21:07] <frew> gotta escape them for some reason
[21:08] <frew> must be a history expansion or something.
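The escaping is needed because `(` and `)` are shell grouping operators: unescaped, the shell parses them itself and they never reach `[` as arguments. The sketch below shows the escaped form in a small function (`is_usable_shell` is a made-up name for illustration). Note also that in POSIX `test`, `-a` has higher precedence than `-o`, so grouping an `-o` expression does change the meaning.

```shell
# Parentheses must be escaped (or quoted) to be passed to [ as literals.
is_usable_shell() {
    # True if $1 is a regular file or a symlink, and is also executable.
    if [ \( -f "$1" -o -L "$1" \) -a -x "$1" ]; then
        echo true
    else
        echo false
    fi
}

is_usable_shell /bin/sh
```

POSIX marks `-a`, `-o` and parentheses inside `test` as obsolescent, so a more portable spelling of the same check is `{ [ -f "$1" ] || [ -L "$1" ]; } && [ -x "$1" ]`.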
[21:09] <frew> nonetheless I do see this: `self_help(pt-heartbeat-update-mysql): caught job[pt-heartbeat-update-mysql] instance[] process[respawn] result[failed] status[] signal[]` in syslog, but nothing about my reaper
[21:20] <chras> hm
[21:20] <frew> trying a different route, but 90% sure that upstart is just not creating an event for that
[21:20] <frew> I don't see where it would in the lifecycle anyway, fwiw
[21:21] <chras> you can try bumping the debug up with log-priority debug
[21:21] <chras> and see where its dying
[21:23] <frew> hm.
[21:23] <frew> I have a theory.
[21:23]  * frew "simplifies"
[21:28] <frew> yeah, it catches start and stop, but not respawn
[21:29] <chras> respawn, when the main process dies?
[21:29] <frew> yeah
[21:29] <frew> respawn is an even that's not real, to be clear
[21:29] <frew> I thought it might be
[21:29] <frew> s/even/event
[21:30] <frew> I sorta think consuming events won't work for this, since there isn't one for what I'm thinking of
[21:32] <chras> lemme try here
[21:32] <chras> k so
[21:33] <chras> we're triggering self_help/watchdog on the upstart event 'stopped'
[21:33] <chras> after something is completely stopped
[21:33] <chras> for it to also hit your respawn, you need to change that to be 'start on stopping ...'
[21:34] <chras> i should make that change for my own needs as well i guess
[21:37] <chras> with it only being set to 'stopped'
[21:38] <chras> it will wait for upstart to give up on respawning before its triggered
[21:38] <chras> like i suppose, its the difference between if your job fails 10 times in a row, and upstart gives up on it
[21:38] <chras> getting 10 emails, or 1
[21:38] <chras> if your watchdog sends email out at least.
[21:51] <chras> oo, interesting though
[21:52] <chras> the respawn attempts have a result[ok] instead of a result[failed]
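A sketch of the change chras describes, assuming the watchdog job is named `watchdog` (an illustrative name). Only the `start on` line differs from a `stopped`-based version. As noted above, respawn attempts were observed to arrive with `RESULT=ok`, so the handler cannot rely on `RESULT=failed` alone to spot a crash.

```
# Illustrative fragment: trigger on 'stopping' so each respawn attempt is
# seen, rather than only the final 'stopped' after Upstart gives up.
start on stopping JOB!=watchdog
instance $JOB
task

script
    logger -t watchdog "stopping job[$JOB] process[$PROCESS] result[$RESULT] status[$EXIT_STATUS] signal[$EXIT_SIGNAL]"
end script
```

One thing to keep in mind: as far as I understand Upstart's event model, `stopping` is a blocking event, so the job being stopped waits until this task finishes. Keeping the script short avoids delaying respawns.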