[17:27] does anyone know if there is a way to detect a restart? That is, I would like to run a command when a service is restarted automatically, but not when it is started via stop then start
[17:30] I sorta expect the start on reason or something to be in an env var?
[17:32] frew: yeah
[17:32] you can do this
[17:32] sec, lemme look for my code which does this
[17:32] awesome, thanks
[17:32] oh is it UPSTART_EVENTS?
[17:33] * frew *just* found http://upstart.ubuntu.com/cookbook/#standard-environment-variables
[17:33] * frew would guess a "respawn" in UPSTART_EVENTS means it crashed
[17:34] yeah you can check on that
[17:35] so when you do an initctl restart you want another event to fire?
[17:35] or when you kill the running process, and upstart restarts it automatically
[17:35] or both?
[17:35] no, really all I want, if possible, is that if it crashes (or if I killed it directly)
[17:35] so the latter (I think)
[17:36] basically if it restarted due to a crash
[17:36] right, k, so i have an event which does this
[17:36] sec lemme pastebin it
[17:36] thanks
[17:37] https://pastebin.mozilla.org/8870501
[17:37] but, when i test it, it doesn't trigger on an initctl restart $process
[17:37] which is the goal here.
[17:38] huh, so are scripts bash?
[17:38] more or less
[17:38] with a set -e
[17:38] ok
[17:38] I assumed they'd be sh
[17:38] well my sh is linked to bash
[17:38] mine's dash
[17:38] i don't use dash
[17:38] gotcha.
[17:38] dash does ... unexpected things
[17:38] heh
[17:39] so you install this upstart config
[17:39] and it captures basically all events?
[17:40] (hence the JOB!=self_help, which is what this is)
[17:41] yeah I'm having trouble getting it to trigger on a restart though
[17:42] restart meaning a crash?
[17:42] my thing triggers on a crash
[17:42] just not an 'initctl restart'
[17:42] ok that's how I want it
[17:42] yeah
[17:42] that's what I want.
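The self_help approach works because upstart starts the task on every `stopped` event (`start on stopped JOB!=self_help`) and hands it details about the job that stopped in environment variables: upstart-events(7) documents JOB, INSTANCE and RESULT (`ok` or `failed`), plus PROCESS, EXIT_STATUS and EXIT_SIGNAL when RESULT is `failed`. Below is a minimal POSIX sh sketch of a script body built on those variables. The function name `describe_stop` and the fallback tag `self_help` are hypothetical, and the message layout simply mirrors the self_help syslog line quoted later in this log.

```shell
#!/bin/sh
# Sketch of a self_help/watchdog script body (function name is hypothetical).
# Assumes upstart has exported the stopped job's details: JOB, INSTANCE,
# RESULT, PROCESS, EXIT_STATUS and EXIT_SIGNAL (the last three only on failure).

describe_stop() {
    # One summary line, in the same shape as the self_help syslog message.
    printf '%s(%s): caught job[%s] instance[%s] process[%s] result[%s] status[%s] signal[%s]\n' \
        "${UPSTART_JOB:-self_help}" "$JOB" "$JOB" "${INSTANCE-}" "${PROCESS-}" \
        "$RESULT" "${EXIT_STATUS-}" "${EXIT_SIGNAL-}"
    # Succeed only when upstart reports the job as failed, so a caller
    # can decide whether to page or open an issue.
    [ "$RESULT" = failed ]
}
```

In the job's script stanza this could be used as `describe_stop | logger -t self_help` to log every stop. Upstart runs script stanzas with `sh -e`, so when the return value matters, call the function from an `if` rather than on its own line.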
[17:42] if someone manually restarts, I think I'm ok with that not being logged especially
[17:42] though it would be nice to know as well
[17:42] i made this event back when i had heartbeat dying unexpectedly
[17:43] and then i had a config file which took actions based on what happened
[17:43] ie
[17:43] heartbeat XCPU echo "nas_self_help: heartbeat died unexpectedly with XCPU Signal, restarting" | wall -n
[17:43] so heartbeat was getting an XCPU kill flag for whatever reason (ended up being a bug in heartbeat debug)
[17:43] and then I'd wall it out for instance
[17:44] right my thought here is to either page or create an issue or something
[17:44] not sure
[17:44] and it does trigger on a stop / kill / etc
[17:44] but not sure about restart atm
[17:44] ok well I need to get this installed and I'll see how well it works
[17:45] so GCONFIG and PCONFIG (currently) do nothing
[17:45] so key signal rest is like
[17:45] who did what
[17:46] what they did
[17:46] and other info?
[17:46] or something?
[17:46] oh sorry I am reading this and learning more
[17:46] I should talk to my rubber duck before asking here
[17:47] well those are just configs that i use
[17:47] right
[17:47] i can paste those in the thing, sec
[17:47] thanks that'd be helpful
[17:48] https://pastebin.mozilla.org/8870503
[17:49] i guess the main thing i wanted you to originally see was the 'start on stopped JOB=${some_job}'
[17:50] and then have a task which does whatever you want in a script stanza
[17:50] right
[17:50] well and having a single task that does it is very elegant
[17:51] I was gonna have a script I put in all my scripts
[18:09] frew: yeah i can't get it working in the context of an 'initctl restart'
[18:11] as a workaround you could do an initctl emit in your pre-stop
[18:27] yeah a restart never emits a 'stopping' or 'stopped' event
[18:31] chras: fwiw the z thing is not really needed anymore iirc
[18:31] frew: ug. i got it
[18:31] it's definitely a bug
[18:31] yeah
[18:31] if you have a pre-stop script
[18:32] then the job goes running -> pre-stop -> start
[18:32] and never emits a stopping/stopped
[18:32] you think the maintainers of upstart would fix it?
[18:32] I was telling my boss about this and he was a little hesitant since upstart is destined for the void
[18:32] but so is everything so I'm gonna keep going
[18:32] if you DON'T have a pre-stop script, then it works as intended
[18:32] interesting.
[18:32] it MIGHT be this bug, https://bugs.launchpad.net/upstart/+bug/703800
[18:33] btw logger ~~ syslog?
[18:33] I'd say even if it is that bug, that bug is 4y old and will probably never get fixed
[18:33] right
[18:34] I think I'm gonna make an "ANY" job too.
[18:34] at least so I can get started with this reasonably.
[18:36] anyways, sorry couldn't be more help
[18:37] no way, this is great
[18:37] you 100% solved my issue and added more
[18:38] great
[20:15] chras: actually I guess I maybe misunderstand you; I don't get logging or anything for a respawn (crash and restart)
[20:15] is that what you were saying is broken?
[20:17] hi
[20:18] k the quick summary was
[20:18] IF you have a pre-stop, then an initctl restart does start -> pre-stop -> start, and skips emitting a stopping or stopped event
[20:18] ok that's what I thought you were saying.
[20:19] this is neither a start, stop, nor restart, but a service that starts and crashes after about 3s, consistently
[20:19] and I get no events
[20:20] so your event fails to start at all
[20:20] and that's what you're trying to catch?
[20:21] well I mean, it never forks, so as long as upstart knows, it worked correctly for 3s
[20:21] and then crashed
[20:21] but yes, I'm trying to catch an error causing a respawn, likely a compile error
[20:21] right. ok, i think i understand
[20:21] though it could have been running for three hours
[20:21] so your daemon forks, and upstart isn't detecting that it died?
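For the restart case, the workaround suggested at 18:11 (emit an event from the job's own pre-stop) sidesteps the running -> pre-stop -> start path in which no stopping or stopped event fires. Below is a minimal sketch of such a pre-stop body. The event name `watchdog-pre-stop` and the function name are hypothetical, and the `INITCTL` override exists only so the function can be exercised without upstart. The watchdog job would also need the event in its start condition, for example `start on stopped JOB!=self_help or watchdog-pre-stop`; passing JOB keeps it compatible with a watchdog that is instanced on `$JOB`.

```shell
#!/bin/sh
# Sketch of a pre-stop body that announces the stop itself, because restarting
# a job that has a pre-stop never emits stopping/stopped (the bug in this log).
# Event name "watchdog-pre-stop" is hypothetical; INITCTL is overridable for testing.

announce_pre_stop() {
    # --no-wait: don't block the pre-stop while the watchdog task runs.
    ${INITCTL:-initctl} emit --no-wait watchdog-pre-stop \
        JOB="${UPSTART_JOB:-}" INSTANCE="${UPSTART_INSTANCE:-}"
}
```

One caveat: pre-stop also runs on a plain `initctl stop`, so the watchdog will see this event for deliberate stops as well, not only restarts.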
[20:21] it would still be the same situation
[20:21] no
[20:21] it never forks
[20:21] it just exec's
[20:21] ah
[20:22] hence upstart not knowing it's ready
[20:22] does your upstart throw an error like
[20:23] main process ($pid) terminated with status $exit?
[20:23] if it would place an error in the syslog, no
[20:23] not sure where else to look
[20:23] check dmesg
[20:23] do you 'console log'?
[20:24] I don't know?
[20:24] mind putting your event on pastebin?
[20:24] yeah it's super basic
[20:25] https://gist.github.com/frioux/0e4e4c1dc0b02f82a9a1a327ae55c248
[20:25] right k
[20:26] i think you want to add a 'console log' to that
[20:26] so you can get some minimal logging
[20:26] ok
[20:26] (fwiw I know why it's crashing, I just want to be able to detect it better in the future)
[20:27] and you're trying to throw another event when that one fails to start?
[20:27] well, I just want to find out, somehow
[20:28] originally I was gonna write a pre-start that looked at
[20:28] UPSTART_EVENTS
[20:28] and if respawn was in there, do a thing
[20:28] but your solution was a lot more elegant
[20:29] so, when i do this
[20:32] https://pastebin.mozilla.org/8870543
[20:32] when my main process ends with a non-zero exit code (which i assume is your failure)
[20:32] it still triggers my self_help
[20:32] surely it is
[20:33] although... it's probably exiting 255
[20:33] any chance upstart ignores high errors?
[20:33] should be anything non-zero
[20:33] ok
[20:33] unless you told it to ignore certain exit codes
[20:33] I don't even know how I'd do that?
[20:33] where would that be done?
[20:33] it's just a main-config command
[20:34] sec
[20:34] lemme find it
[20:35] http://upstart.ubuntu.com/cookbook/#normal-exit
[20:35] well I showed you the whole file
[20:35] unless you can globally do it, I didn't do that
[20:36] yeah so it'll be non-zero exit codes
[20:36] will cause it to fail
[20:37] so my self_help event isn't triggering on your thing exiting weirdly?
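When no event fires for a crash, the kernel log is still a fallback: the chat asks whether upstart reports `main process ($pid) terminated with status $exit` and suggests checking dmesg. Below is a minimal sketch that pulls job names and exit statuses out of such lines. It assumes the full message looks like `init: JOB main process (PID) terminated with status N`, i.e. the fragment quoted at 20:23 with an `init: JOB` prefix; the exact prefix can vary by syslog setup, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print "JOB STATUS" for each upstart main-process failure found in a
# log (files given as arguments, or stdin, e.g. `dmesg | crashed_jobs`).
# Assumed line shape: "... init: JOB main process (PID) terminated with status N"
# Function name is hypothetical; only single-word job names are matched.

crashed_jobs() {
    # BRE: \( \) capture groups; plain ( ) match the literal parentheses.
    sed -n 's/.*init: \([^ ]*\) main process ([0-9]*) terminated with status \([0-9]*\).*/\1 \2/p' "$@"
}
```

Usage might be `dmesg | crashed_jobs` or `crashed_jobs /var/log/syslog`. This reports after the fact rather than triggering anything, so it suits a periodic check more than an event handler.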
[20:38] correct
[20:38] I have to admit though
[20:38] does your reaper fork?
[20:38] or is it a foreground process
[20:38] I did clean up your self_help thing to be shorter and simpler
[20:38] no it doesn't fork
[20:38] it's foreground
[20:39] lemme paste the changes I made
[20:39] did you run an initctl reload-configuration?
[20:39] they shouldn't harm anything
[20:39] to get the new event
[20:39] yes
[20:39] does an initctl start self_help work?
[20:39] well I get output about other services
[20:39] but let's see
[20:39] ah interesting so you know it IS working, just not with your reaper?
[20:40] oh wait
[20:40] wtf
[20:40] initctl: Unknown parameter: JOB
[20:40] does that mean anything to you?
[20:40] yeah
[20:40] initctl start self_help JOB=test
[20:40] that's just the instancing stuff triggering
[20:41] normally JOB is emitted automatically by other upstart events
[20:41] initctl: Job failed to start
[20:41] right
[20:41] no obvious reason why it's failing
[20:41] hm, pastebin it?
[20:41] the modified self_help or what?
[20:41] yea
[20:41] k
[20:42] https://gist.github.com/frioux/953fe2f4b950c2bef58521f0ca0e0ad9
[20:42] should be effectively the same, but work with any POSIX shell (and only allow one extra config file)
[20:42] (and I renamed it watchdog)
[20:43] right
[20:45] you're using dash right?
[20:45] I wrote it with dash in mind, since we are moving to dash soon, but it's still bash now (12.04)
[20:46] oh no
[20:46] it's just a syntax error for me
[20:46] 12.04 is dash
[20:46] oh?
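Since `initctl start self_help JOB=test` only reported `Job failed to start`, it can help to exercise the script body outside upstart first. Below is a minimal sketch, assuming the body has been saved to its own file (the function name is hypothetical). `sh -n` parses without executing, which surfaces errors such as dash's `Syntax error: "(" unexpected` with a line number; the second step runs the file the way upstart runs script stanzas (`sh -e`), with fake event variables passed as arguments.

```shell
#!/bin/sh
# Sketch: exercise a watchdog script body outside upstart.
# Usage: try_script FILE [VAR=value ...]   (function name is hypothetical)

try_script() {
    file=$1
    shift
    # Parse only, no execution; reports syntax errors with line numbers.
    sh -n "$file" || return
    # Run it like upstart runs script stanzas (sh -e), with fake event
    # variables such as JOB=test RESULT=failed PROCESS=main EXIT_STATUS=255.
    env "$@" sh -e "$file"
}
```

For example, `try_script ./watchdog.script JOB=test RESULT=failed PROCESS=main` (hypothetical file name). Replacing `sh` with `dash` checks against dash specifically when /bin/sh points at bash.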
[20:46] * frew eyeballs
[20:47] ./foo.script: 24: ./foo.script: Syntax error: "(" unexpected (expecting "then")
[20:47] I wonder why that's wrong
[20:47] it's supposed to work
[20:47] `(expression) True if expression is true.` from dash(1)
[20:48] yeah it looks right
[20:48] sec, seeing if i can fix
[20:48] can repro fwiw
[20:49] fwiw the ()'s are not needed, since -o binds more tightly than -a
[20:49] but I thought it made it easier to read
[20:50] dash -c '[ ( -f /bin/dash ) ] && echo 1'
[20:51] I think this must be a bug in dash.
[20:52] dash does unexpected things :P
[20:52] I never claimed it was perfect
[20:53] I just can't switch what /bin/sh is
[20:54] ok I'll just leave out the () for now
[20:54] yeah i can't use ()'s in dash for whatever reason
[21:06] oh derp
[21:06] dash -c '[ \( -f /bin/dash \) ] && echo true'
[21:07] gotta escape them for some reason
[21:08] must be a history expansion or something.
[21:09] nonetheless I do see this: `self_help(pt-heartbeat-update-mysql): caught job[pt-heartbeat-update-mysql] instance[] process[respawn] result[failed] status[] signal[]` in syslog, but nothing about my reaper
[21:20] hm
[21:20] trying a different route, but 90% sure that upstart is just not creating an event for that
[21:20] I don't see where it would in the lifecycle anyway, fwiw
[21:21] you can try bumping the debug up with log-priority debug
[21:21] and see where it's dying
[21:23] hm.
[21:23] I have a theory.
[21:23] * frew "simplifies"
[21:28] yeah, it catches start and stop, but not respawn
[21:29] respawn, when the main process dies?
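The escaping at 21:06 is needed because `(` and `)` are shell operators: the shell tries to parse them as grouping syntax before `[` ever runs, so they must be quoted or backslash-escaped to reach `test` as ordinary arguments (bash rejects the unescaped form too). Also worth knowing: POSIX gives `-a` higher precedence than `-o` and marks both operators obsolescent, so chaining separate `[ ]` tests with `&&` and `||` avoids both pitfalls. A small sketch with hypothetical function names, writing the same check both ways:

```shell
#!/bin/sh
# Sketch: "(is a regular file or a directory) and readable", written two ways.
# Function names are hypothetical.

# 1. Single test: the parentheses are escaped so the shell passes them
#    to [ as arguments instead of parsing them as a subshell.
readable_file_or_dir_test() {
    [ \( -f "$1" -o -d "$1" \) -a -r "$1" ]
}

# 2. Portable alternative: separate [ ] commands combined by the shell,
#    grouped with { ...; } (a brace group, so no subshell is created).
readable_file_or_dir_chain() {
    { [ -f "$1" ] || [ -d "$1" ]; } && [ -r "$1" ]
}
```

The second form does not depend on how `test` parses operator precedence, which is why it is generally preferred in scripts meant to run under both dash and bash.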
[21:29] yeah
[21:29] respawn is an even that's not real, to be clear
[21:29] I thought it might be
[21:29] s/even/event
[21:30] I sorta think consuming events won't work for this, since there isn't one for what I'm thinking of
[21:32] lemme try here
[21:32] k so
[21:33] we're triggering self_help/watchdog on the upstart event 'stopped'
[21:33] after something is completely stopped
[21:33] for it to also hit your respawn, you need to change that to be 'start on stopping ...'
[21:34] i should make that change for my own needs as well i guess
[21:37] with it only being set to 'stopped'
[21:38] it will wait for upstart to give up on respawning before it's triggered
[21:38] like i suppose, it's the difference between if your job fails 10 times in a row, and upstart gives up on it
[21:38] getting 10 emails, or 1
[21:38] if your watchdog sends email out at least.
[21:51] oo, interesting though
[21:52] the respawn attempts have a result[ok] instead of a result[failed]
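Switching the watchdog to `start on stopping ...` makes it fire on every respawn attempt, which is the "10 emails, or 1" trade-off from 21:38. Because those attempts reportedly carry result[ok] (21:52), filtering on RESULT=failed alone would drop them again. If the watchdog mails or pages, one option is to keep per-attempt triggering but rate-limit alerts per job. Below is a minimal sketch using a timestamp file per job; the function name, state directory and window are hypothetical choices, not anything upstart provides, and `date +%s` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Sketch: allow at most one alert per job per WINDOW seconds.
# Usage: should_alert JOB WINDOW STATE_DIR   (all names hypothetical)
# Returns 0 (and records the time) if an alert should go out, 1 otherwise.

should_alert() {
    job=$1 window=$2 dir=$3
    stamp="$dir/$job.last-alert"
    now=$(date +%s)
    mkdir -p "$dir"
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp")
        # Still inside the quiet window: suppress this alert.
        if [ $((now - last)) -lt "$window" ]; then
            return 1
        fi
    fi
    echo "$now" > "$stamp"
    return 0
}
```

Inside the watchdog's script stanza this might read `if should_alert "$JOB" 600 /var/lib/watchdog; then ...send the email...; fi`, where the 600-second window and directory are placeholders.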