=== j4m3s__ is now known as j4m3s_
=== Kiall is now known as zz_Kiall
=== zz_Kiall is now known as Kiall
[18:52] is there a way to 'unstick' the upstart system on rhel6 without rebooting? stop: Job has already been stopped: elasticsearch - fair enough. when i start it, though, it hangs indefinitely. i've had this happen before when developing scripts, and the 'fix' is to reboot it and start with a good script.
[18:52] rhel6 upstart = upstart-0.6.5-12.el6.x86_64 . old, i know.
[18:56] i know my scripts are right now because if i copy elasticsearch.conf to elasticsearch2.conf it starts and stops correctly
[19:12] astrostl: stuck how?
[19:12] i ended up rebooting to clear it, but
[19:13] astrostl: can you paste the output of 'status elasticsearch' and the job.conf ?
[19:13] stuck as in 'start elasticsearch' does nothing
[19:13] hangs indefinitely
[19:13] astrostl: hangs indefinitely sounds like a problem with the .conf file
[19:13] as verified with the copy, it isn't.
[19:13] http://pastebin.com/6tmJ4tau
[19:14] (there is a leading s in the actual file)
[19:14] weird, that should return as soon as su is executed
[19:14] 'start elasticsearch' - hangs forever
[19:14] btw, using su has some problems
[19:14] cp elasticsearch.conf elasticsearch2.conf && 'start elasticsearch' - works perfectly
[19:14] it opens a pam session
[19:14] this isn't an su thing
[19:14] agreed, but you should be aware of that
[19:15] astrostl: status elasticsearch shows what?
[19:15] if i ctrl-c, says it's running
[19:15] if i stop, hangs indefinitely again
[19:15] if i ctrl-c that, then status, says stopped
[19:15] "says its running" is a bit vague
[19:15] can you paste the full output?
[19:15] i ended up rebooting to clear it
[19:16] example: elasticsearch start/running, process 1753
[19:16] astrostl: got syslogs for around that time?
[19:16] yes, they have nothing of note
[19:16] astrostl: should be something like 'init: ....'
[19:17] i watched messages live, it reports nothing at all when it's in "stuck" mode
[19:17] not on start, not on stop
[19:17] astrostl: I've only ever seen start hang forever when there's a really long post-start or expect fork where the main process never forks
[19:17] 'kill -HUP 1' doesn't resolve it either
[19:17] HUP'ing init is definitely not advised
[19:17] as i said twice, i ended up *REBOOTING*
[19:18] relative to that, a HUP on init is not significant in my view
[19:18] I understand that. Trying to prep you for the next time.
[19:19] astrostl: HUP doesn't do what I thought it did... so ignore that warning. :p
[19:19] init is designed to take a hup for reloads (e.g. inittab changes)
[19:20] astrostl: ok so your question, how do I unstick a job, is hard to answer without some extra logs..
[19:20] astrostl: if you expect it to happen again, perhaps raise log priority with 'initctl log-priority info'
[19:20] i've had this happen 2-3 times during upstart script development
[19:21] basic pattern: make an upstart script, start it, oops, try to stop, hangs indefinitely, now we're screwed
[19:22] certain conditions from failed script starts seem to put that *NAME* in a hosed state
[19:23] yes there is one well known way to do that
[19:23] fixing it won't do - fixing it and *RENAMING* (or rebooting) does
[19:23] notably, bug #406397
[19:23] is there a well-known way to undo that, aside from rebooting?
[19:23] https://bugs.launchpad.net/upstart/+bug/406397
[19:24] astrostl: if its that bug, what has happened is upstart has lost track of the pid it thinks it should be tracking...
[19:24] astrostl: the way to know if you've hit that problem is if 'status $jobname' shows a pid that does not exist
[19:24] astrostl: the way to fix it w/o reboot is to exhaust the pid space so it does exist, then upstart will kill it
[19:24] that sounds like the problem exactly
[19:25] lol @ the solution - but that's exactly what i need to know. thx!
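The check described at 19:24 ('status $jobname' shows a pid that does not exist) can be scripted. What follows is only a rough sketch, not anything posted in the channel: the function names are made up for illustration, and it assumes the status line has the same shape as the example pasted at 19:16 ("elasticsearch start/running, process 1753"). It only detects the condition; it does not attempt the pid-space workaround.

```python
import errno
import os
import re

# Assumption: the tracked pid appears as "process <pid>", as in the
# example line "elasticsearch start/running, process 1753".
_PID_RE = re.compile(r"\bprocess (\d+)\b")


def parse_tracked_pid(status_line):
    """Return the pid named in a status line, or None if none is shown."""
    match = _PID_RE.search(status_line)
    return int(match.group(1)) if match else None


def pid_exists(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the pid
    except OSError as err:
        if err.errno == errno.ESRCH:  # no such process
            return False
        if err.errno == errno.EPERM:  # process exists but belongs to another user
            return True
        raise
    return True


def tracks_missing_pid(status_line):
    """True if the status line names a pid that no longer exists."""
    pid = parse_tracked_pid(status_line)
    return pid is not None and not pid_exists(pid)
```

To use it, paste the output of 'status elasticsearch' into tracks_missing_pid(); a True result would match the symptom described for bug #406397, while False means either no pid is shown or the pid is still alive.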
[19:25] * dluna had the exact same problem a couple of weeks ago
[19:28] yeah, I'm hoping a fix can be worked out in the next Ubuntu dev cycle, but I doubt that will land in any RHEL release any time soon with systemd looming
[19:28] astrostl: please mark yourself as being affected by that bug.. it helps us figure out what to work on next
[19:29] that bug, by far, has the highest "heat"
[19:30] will do, although i'm less optimistic that rhel will notice or care given how far back they are from the prod version of upstart
[19:31] upvoted, subscribed
[20:32] thx again, cya