[18:52] <astrostl> is there a way to 'unstick' the upstart system on rhel6 without rebooting?  stop: Job has already been stopped: elasticsearch - fair enough.  when i start it, though, it hangs indefinitely.  i've had this happen before when developing scripts, and the 'fix' is to reboot it and start with a good script.
[18:52] <astrostl> rhel6 upstart = upstart-0.6.5-12.el6.x86_64 .  old, i know.
[18:56] <astrostl> i know my scripts are right now because if i copy elasticsearch.conf to elasticsearch2.conf it starts and stops correctly
[19:12] <SpamapS> astrostl: stuck how?
[19:12] <astrostl> i ended up rebooting to clear it, but
[19:13] <SpamapS> astrostl: can you paste the output of 'status elasticsearch' and the job.conf ?
[19:13] <astrostl> stuck as in 'start elasticsearch' does nothing
[19:13] <astrostl> hangs indefinitely
[19:13] <SpamapS> astrostl: hangs indefinitely sounds like a problem with the .conf file
[19:13] <astrostl> as verified with the copy, it isn't.
[19:13] <astrostl> http://pastebin.com/6tmJ4tau
[19:14] <astrostl> (there is a leading s in the actual file)
[19:14] <SpamapS> weird, that should return as soon as su is executed
[19:14] <astrostl> 'start elasticsearch' - hangs forever
[19:14] <SpamapS> btw, using su has some problems
[19:14] <astrostl> cp elasticsearch.conf elasticsearch2.conf && 'start elasticsearch2' - works perfectly
[19:14] <SpamapS> it opens a pam session
[19:14] <astrostl> this isn't an su thing
[19:14] <SpamapS> agreed, but you should be aware of that
[19:15] <SpamapS> astrostl: status elasticsearch shows what?
[19:15] <astrostl> if i ctrl-c, says it's running
[19:15] <astrostl> if i stop, hangs indefinitely again
[19:15] <astrostl> if i ctrl-c that, then status, says stopped
[19:15] <SpamapS> "says it's running" is a bit vague
[19:15] <SpamapS> can you paste the full output?
[19:15] <astrostl> i ended up rebooting to clear it
[19:16] <astrostl> example: elasticsearch start/running, process 1753
[19:16] <SpamapS> astrostl: got syslogs for around that time?
[19:16] <astrostl> yes, they have nothing of note
[19:16] <SpamapS> astrostl: should be something like 'init: ....'
[19:17] <astrostl> i watched messages live, it reports nothing at all when it's in "stuck" mode
[19:17] <astrostl> not on start, not on stop
[19:17] <SpamapS> astrostl: I've only ever seen start hang forever when there's a really long post-start or expect fork where the main process never forks
[19:17] <astrostl> 'kill -HUP 1' doesn't resolve it either
[19:17] <SpamapS> HUP'ing init is definitely not advised
[19:17] <astrostl> as i said twice, i ended up *REBOOTING*
[19:18] <astrostl> relative to that, a HUP on init is not significant in my view
[19:18] <SpamapS> I understand that. Trying to prep you for the next time.
[19:19] <SpamapS> astrostl: HUP doesn't do what I thought it did... so ignore that warning. :p
[19:19] <astrostl> init is designed to take a hup for reloads (e.g. inittab changes)
[19:20] <SpamapS> astrostl: ok so your question, how do I unstick a job, is hard to answer without some extra logs..
[19:20] <SpamapS> astrostl: if you expect it to happen again, perhaps raise log priority with 'initctl log-priority info'
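[editor's sketch] One way to act on the log-priority suggestion is to raise the priority, reproduce the hang, and then look only at the lines init wrote to syslog. The helper name `init_lines` is hypothetical, and the default path /var/log/messages is an assumption based on astrostl's mention of watching "messages"; adjust it for your system.

```shell
#!/bin/sh
# Sketch only: init_lines is a hypothetical helper, not an upstart command.

# Print the lines that the init daemon wrote to a syslog file.
# Upstart tags its messages with "init:" (see SpamapS's hint above).
# The default path is an assumption; pass another file as $1 if needed.
init_lines() {
    grep ' init: ' "${1:-/var/log/messages}"
}
```

A possible session, run as root: `initctl log-priority info`, then reproduce the hang with `start elasticsearch`, then `init_lines | tail` in another terminal. Setting the priority back afterwards (upstart's default is believed to be `message`) keeps the log quiet again.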
[19:20] <astrostl> i've had this happen 2-3 times during upstart script development
[19:21] <astrostl> basic pattern: make an upstart script, start it, oops, try to stop, hangs indefinitely, now we're screwed
[19:22] <astrostl> certain conditions from failed script starts seem to put that *NAME* in a hosed state
[19:23] <SpamapS> yes there is one well known way to do that
[19:23] <astrostl> fixing it won't do - fixing it and *RENAMING* (or rebooting) does
[19:23] <SpamapS> notably, bug #406397
[19:23] <astrostl> is there a well-known way to undo that, aside from rebooting?
[19:23] <SpamapS> https://bugs.launchpad.net/upstart/+bug/406397
[19:24] <SpamapS> astrostl: if it's that bug, what has happened is upstart has lost track of the pid it thinks it should be tracking...
[19:24] <SpamapS> astrostl: the way to know if you've hit that problem is if 'status $jobname' shows a pid that does not exist
[19:24] <SpamapS> astrostl: the way to fix it w/o reboot is to exhaust the pid space so it does exist, then upstart will kill it
[19:24] <astrostl> that sounds like the problem exactly
[19:25] <astrostl> lol @ the solution - but that's exactly what i need to know.  thx!
[19:25]  * dluna had the exact same problem a couple of weeks ago
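[editor's sketch] The check SpamapS describes (does 'status $jobname' show a pid that does not exist?) can be scripted roughly as follows. The function names are hypothetical, the status line format is taken from astrostl's example ("elasticsearch start/running, process 1753"), and the /proc lookup assumes Linux. This only diagnoses the stale pid; it does not attempt the pid-space workaround.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical, not upstart commands.

# Pull the pid out of a status line such as
#   "elasticsearch start/running, process 1753"
# Prints nothing when the line carries no pid.
status_pid() {
    printf '%s\n' "$1" | sed -n 's/.*process \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if a process with the given pid exists.
# Linux-specific assumption: every live pid has a /proc/<pid> directory.
pid_exists() {
    [ -n "$1" ] && [ -d "/proc/$1" ]
}

# Report whether the pid upstart is tracking for a job is real or stale.
check_job() {
    line=$(status "$1") || return 2   # needs upstart's 'status' command
    pid=$(status_pid "$line")
    if [ -z "$pid" ]; then
        echo "$1: no pid tracked"
    elif pid_exists "$pid"; then
        echo "$1: pid $pid exists"
    else
        echo "$1: pid $pid does not exist (the symptom described for bug #406397)"
    fi
}
```

On an affected box, `check_job elasticsearch` would be the call to try. If it reports a pid that does not exist, the job is probably in the state discussed above.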
[19:28] <SpamapS> yeah, I'm hoping a fix can be worked out in the next Ubuntu dev cycle, but I doubt that will land in any RHEL release any time soon with systemd looming
[19:28] <SpamapS> astrostl: please mark yourself as being affected by that bug.. it helps us figure out what to work on next
[19:29] <SpamapS> that bug, by far, has the highest "heat"
[19:30] <astrostl> will do, although i'm less optimistic that rhel will notice or care given how far back they are from the prod version of upstart
[19:31] <astrostl> upvoted, subscribed
[20:32] <astrostl> thx again, cya