=== j4m3s__ is now known as j4m3s_ | ||
=== Kiall is now known as zz_Kiall | ||
=== zz_Kiall is now known as Kiall | ||
astrostl | is there a way to 'unstick' the upstart system on rhel6 without rebooting? stop: Job has already been stopped: elasticsearch - fair enough. when i start it, though, it hangs indefinitely. i've had this happen before when developing scripts, and the 'fix' is to reboot it and start with a good script. | 18:52 |
---|---|---|
astrostl | rhel6 upstart = upstart-0.6.5-12.el6.x86_64 . old, i know. | 18:52 |
astrostl | i know my scripts are right now because if i copy elasticsearch.conf to elasticsearch2.conf it starts and stops correctly | 18:56 |
SpamapS | astrostl: stuck how? | 19:12 |
astrostl | i ended up rebooting to clear it, but | 19:12 |
SpamapS | astrostl: can you paste the output of 'status elasticsearch' and the job.conf ? | 19:13 |
astrostl | stuck as in 'start elasticsearch' does nothing | 19:13 |
astrostl | hangs indefinitely | 19:13 |
SpamapS | astrostl: hangs indefinitely sounds like a problem with the .conf file | 19:13 |
astrostl | as verified with the copy, it isn't. | 19:13 |
astrostl | http://pastebin.com/6tmJ4tau | 19:13 |
astrostl | (there is a leading s in the actual file) | 19:14 |
SpamapS | weird, that should return as soon as su is executed | 19:14 |
astrostl | 'start elasticsearch' - hangs forever | 19:14 |
SpamapS | btw, using su has some problems | 19:14 |
astrostl | cp elasticsearch.conf elasticsearch2.conf && 'start elasticsearch2' - works perfectly | 19:14 |
SpamapS | it opens a pam session | 19:14 |
astrostl | this isn't an su thing | 19:14 |
SpamapS | agreed, but you should be aware of that | 19:14 |
SpamapS | astrostl: status elasticsearch shows what? | 19:15 |
astrostl | if i ctrl-c, says it's running | 19:15 |
astrostl | if i stop, hangs indefinitely again | 19:15 |
astrostl | if i ctrl-c that, then status, says stopped | 19:15 |
SpamapS | "says its running" is a bit vague | 19:15 |
SpamapS | can you paste the full output? | 19:15 |
astrostl | i ended up rebooting to clear it | 19:15 |
astrostl | example: elasticsearch start/running, process 1753 | 19:16 |
SpamapS | astrostl: got syslogs for around that time? | 19:16 |
astrostl | yes, they have nothing of note | 19:16 |
SpamapS | astrostl: should be something like 'init: ....' | 19:16 |
astrostl | i watched messages live, it reports nothing at all when it's in "stuck" mode | 19:17 |
astrostl | not on start, not on stop | 19:17 |
SpamapS | astrostl: I've only ever seen start hang forever when there's a really long post-start or expect fork where the main process never forks | 19:17 |
astrostl | 'kill -HUP 1' doesn't resolve it either | 19:17 |
SpamapS | HUP'ing init is definitely not advised | 19:17 |
astrostl | as i said twice, i ended up *REBOOTING* | 19:17 |
astrostl | relative to that, a HUP on init is not significant in my view | 19:18 |
SpamapS | I understand that. Trying to prep you for the next time. | 19:18 |
SpamapS | astrostl: HUP doesn't do what I thought it did... so ignore that warning. :p | 19:19 |
astrostl | init is designed to take a hup for reloads (e.g. inittab changes) | 19:19 |
SpamapS | astrostl: ok so your question, how do I unstick a job, is hard to answer without some extra logs.. | 19:20 |
SpamapS | astrostl: if you expect it to happen again, perhaps raise log priority with 'initctl log-priority info' | 19:20 |
astrostl | i've had this happen 2-3 times during upstart script development | 19:20 |
astrostl | basic pattern: make an upstart script, start it, oops, try to stop, hangs indefinitely, now we're screwed | 19:21 |
astrostl | certain conditions from failed script starts seem to put that *NAME* in a hosed state | 19:22 |
SpamapS | yes there is one well known way to do that | 19:23 |
astrostl | fixing it won't do - fixing it and *RENAMING* (or rebooting) does | 19:23 |
SpamapS | notably, bug #406397 | 19:23 |
astrostl | is there a well-known way to undo that, aside from rebooting? | 19:23 |
SpamapS | https://bugs.launchpad.net/upstart/+bug/406397 | 19:23 |
SpamapS | astrostl: if it's that bug, what has happened is upstart has lost track of the pid it thinks it should be tracking... | 19:24 |
SpamapS | astrostl: the way to know if you've hit that problem is if 'status $jobname' shows a pid that does not exist | 19:24 |
SpamapS | astrostl: the way to fix it w/o reboot is to exhaust the pid space so it does exist, then upstart will kill it | 19:24 |
astrostl | that sounds like the problem exactly | 19:24 |
astrostl | lol @ the solution - but that's exactly what i need to know. thx! | 19:25 |
* dluna had the exact same problem a couple of weeks ago | 19:25 | |
SpamapS | yeah, I'm hoping a fix can be worked out in the next Ubuntu dev cycle, but I doubt that will land in any RHEL release any time soon with systemd looming | 19:28 |
SpamapS | astrostl: please mark yourself as being affected by that bug.. it helps us figure out what to work on next | 19:28 |
SpamapS | that bug, by far, has the highest "heat" | 19:29 |
astrostl | will do, although i'm less optimistic that rhel will notice or care given how far back they are from the prod version of upstart | 19:30 |
astrostl | upvoted, subscribed | 19:31 |
astrostl | thx again, cya | 20:32 |
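
Editor's note: the diagnostic SpamapS describes at 19:24 (the job is probably hit by bug #406397 if 'status $jobname' shows a pid that does not exist) can be sketched roughly as below. This is only an illustration, not anything posted in the channel: the function names are hypothetical, the status format is assumed to match the line astrostl quoted at 19:16 ("elasticsearch start/running, process 1753"), and the existence check assumes a POSIX system where signal 0 can be used to probe a pid.

```python
import os
import re

# Assumed format, taken from the line quoted in the log:
# "elasticsearch start/running, process 1753"
_STATUS_PID = re.compile(r"process (\d+)")


def parse_status_pid(status_line):
    """Return the pid named in a `status JOB` output line, or None if absent."""
    match = _STATUS_PID.search(status_line)
    return int(match.group(1)) if match else None


def pid_exists(pid):
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def job_looks_stuck(status_line):
    """True when status reports a pid that no longer exists on the system."""
    pid = parse_status_pid(status_line)
    return pid is not None and not pid_exists(pid)
```

A possible use is to capture the output of 'status elasticsearch' and pass that line to job_looks_stuck(). A True result only suggests the symptom described in the log; it does not confirm the bug.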