[02:57] <twb> For lucid, is this the right way to do multiple instances?  http://paste.debian.net/144039/
[03:44] <twb> How the hell do you get a list of all active instances of a job
[03:44] <broder> twb: initctl list will print out all active instances
[03:47] <twb> Ah, OK, and then grep my job name out of that
[03:49] <twb> Yeah initctl list | sed -rn 's/^network-interface \((.*)\).*/\1/p'
[03:50] <twb> initctl list | sed -rn 's/^network-interface \((.*)\).*/\1/p' | while read url; do stop streamripper URL=$url; done
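The same idea as a reusable shell function. This is a minimal sketch that assumes `initctl list` prints instance lines as `JOB (INSTANCE) goal/state`, as in the gunicorn output later in this log. The function name `list_instances` is hypothetical, and job names are assumed to contain no regex metacharacters.

```shell
#!/bin/sh
# Print the instance names of one Upstart job, reading `initctl list` output on stdin.
# Assumes instance lines look like: "JOB (INSTANCE) goal/state[, process PID]".
list_instances() {
    job=$1
    # -n: print nothing by default; the trailing p prints only lines where the
    # substitution matched, i.e. lines for this job that carry an instance name.
    sed -n "s/^$job (\(.*\)) .*/\1/p"
}

# Example (job "streamripper" with instances keyed on URL, as in the log):
#   initctl list | list_instances streamripper |
#       while read -r url; do stop streamripper URL="$url"; done
```

Using `-n` with the `p` flag means unrelated jobs (and jobs whose names merely start with the same prefix) are silently skipped.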
[10:15] <lericson> Guys, seriously -- is "Job failed to start" the best diagnostics message I can get?
[10:15] <lericson> I'll tell you what, it does not suffice.
[10:17] <lericson> Also, can I get the instance identifier (i.e. for `instance $X/$Y`, I want a variable containing "$X/$Y", potentially named $INSTANCE)? I would TIAS, but it's impossible to diagnose errors.
[10:18] <ion> As init(5) says, UPSTART_INSTANCE.
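If a job declared with `instance $X/$Y` needs the two parts back from `UPSTART_INSTANCE` inside a script section, plain POSIX parameter expansion is enough. A sketch only: the function name `split_instance` is hypothetical, and it assumes the first field (`$X`) contains no `/`.

```shell
#!/bin/sh
# Split an instance name built as "$X/$Y" back into its two parts.
# Assumes the first field contains no "/"; the second field may.
split_instance() {
    inst=$1
    first=${inst%%/*}   # everything before the first "/"
    rest=${inst#*/}     # everything after the first "/"
    printf '%s\n%s\n' "$first" "$rest"
}

# Inside a job's script section one would pass the variable Upstart sets:
#   split_instance "$UPSTART_INSTANCE"
```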
[10:18] <ion> Anything in syslog? If not, make your scripts log something.
[10:18] <lericson> Nov  9 10:12:02 localhost init: gunicorn (julkort/julkort:app) pre-start process (32452) terminated with status 1
[10:18] <lericson> It says
[10:19] <lericson> Which is a little better I agree, but I want the stderr output
[10:19] <ion> pre-start script
[10:19] <ion>   exec >>/tmp/foo 2>&1
[10:19] <ion>   …
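ion's `exec >>/tmp/foo 2>&1` line works because `exec` with only redirections rebinds stdout and stderr for the rest of the shell, so every later command in the script writes to the file. The sketch below runs the redirect in a subshell so the caller's own output is untouched; `log_demo` and its log path argument are hypothetical.

```shell
#!/bin/sh
# Demonstrate the pre-start logging trick: `exec` with only redirections
# rebinds stdout and stderr for the rest of the (sub)shell.
log_demo() {
    logfile=$1
    (
        exec >>"$logfile" 2>&1
        echo "pre-start: running checks"            # stdout -> $logfile
        echo "pre-start: config file missing" >&2   # stderr -> $logfile too
    )
}
```

In a real job the `exec` line would go first inside `pre-start script ... end script`, without the subshell, so the stderr output that syslog only summarized as "terminated with status 1" lands in the file.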
[10:20] <jhunt> lericson: http://upstart.ubuntu.com/cookbook/#debugging
[10:23] <jhunt> lericson: Have you checked your job with init-checkconf (http://upstart.ubuntu.com/cookbook/#init-checkconf)? Also, you may be affected by "set -e". Read this snippet for details: http://upstart.ubuntu.com/cookbook/#debugging-a-script-which-appears-to-be-behaving-oddly
[10:27] <lericson> Ah, thanks -- seems my pre-start script wasn't to its liking, will look into later.
[10:27] <lericson> But now I get this weird behavior, initctl list says:
[10:27] <lericson> gunicorn (julkort/julkort:app) start/running
[10:27] <lericson> Yet the process is dead, and the syslog says "main process ended, respawning" twice with one second in between
[10:27] <lericson> I'd expect it to respawn it 10 times within 15 seconds?
[10:28] <lericson> Or does it somehow consider it running?
[10:28] <lericson> (It would seem to be the case given the list output.)
[10:28] <ion> Add similar logging to the main process and see why it dies immediately.
[10:29] <lericson> jhunt: And I don't have init-checkconf on this server, no idea how to get it either.
[10:29] <jhunt> lericson: sounds like you're running lucid?
[10:29] <lericson> ion: It dies immediately because I told it to, but I want to know why Upstart isn't doing as advertised (respawning is a major point)
[10:29] <lericson> jhunt: Yes, LTS
[10:29] <jhunt> lericson: do this then - http://upstart.ubuntu.com/cookbook/#older-versions-of-upstart
[10:31] <lericson> Thank you! I give you two 5/5 in friendliness, expertise and professionalism-- feels like customer support.
[10:32] <jhunt> lericson: I'd guess you hadn't told it to respawn. If you have a service you want to respawn, you need to add the "respawn" stanza to your .conf file (see "man 5 init" for details).
[10:32] <lericson> I like the idea of Upstart by the way, I was a Gentooist for the longest time but I think Upstart > OpenRC
[10:32] <lericson> I did, jhunt
[10:32] <ion> Please paste your job definition and the relevant lines from syslog.
[10:32] <lericson> Relevant might be that the service daemonizes itself (I do have expect fork)
[10:33] <ion> Are you sure the service forks exactly once?
[10:33] <jhunt> lericson: when you say "daemonizes", do you mean that? If so, you need to specify "expect daemon" (2 forks) rather than "expect fork" (1 fork)
[10:34] <jhunt> lericson: this is a known problem - if you don't know how many times your app forks and mis-specify it, Upstart is unable to "track" the pid so you can get odd behaviour. We intend to fix this issue for the next LTS (it's a difficult problem to address).
[10:34] <lericson> http://pb.lericson.se/p/SBiMze/
[10:34] <lericson> I know, jhunt -- I haven't read gunicorn's source lately, so I don't know, but I would guess they don't daemonize properly.
[10:35] <lericson> But you're saying this is a symptom of it losing the PID?
[10:36] <jhunt> lericson: quite possibly. I'd recommend running "strace -fFv -o /tmp/strace.log /usr/bin/gunicorn ..." and grepping for fork/clone calls in /tmp/strace.log.
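jhunt's strace check can be scripted. A minimal sketch, assuming the log was written with `strace -f -o /tmp/strace.log`, so each line reads `PID syscall(...) = result`; the function name `count_forks` is hypothetical. It counts every process-creating call, including any worker processes the daemon spawns and threads (which are also created with `clone`), so treat the number as an upper bound and inspect the matching lines before choosing between "expect fork" and "expect daemon".

```shell
#!/bin/sh
# Count fork-like system calls in an strace log written with `strace -f -o FILE`.
# Matches fork(, vfork(, clone( and clone3( when they start a syscall name.
count_forks() {
    log=$1
    # grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps the 0.
    grep -cE '(^|[[:space:]])(v?fork|clone3?)\(' "$log" || true
}

# To look at the calls themselves (threads show CLONE_THREAD in their flags):
#   grep -E '(v?fork|clone3?)\(' /tmp/strace.log
```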
[10:36] <lericson> Also by the way, my pre-start script seems to check out fine.
[10:37] <ion> It may be simpler just to tell the main process not to daemonize and drop the “expect” stanza.
[10:37] <lericson> Well, interestingly http://gunicorn.org/faq.html#gunicorn-fails-to-start-with-upstart
[10:38] <jhunt> lericson: also, your "stop on" looks wrong - there is no standard "shutdown" event on Lucid. You generally specify "stop on runlevel [016]" (ie stop on halt/single-user mode/reboot).
[10:38] <lericson> Oh, I ripped that off of mysql
[10:39] <lericson> No, I didn't.
[10:39] <lericson> I got it somewhere anyway
[10:45] <lericson> So now initctl stop blah just sits there, apparently doing nothing.
[10:45] <lericson> syslog is empty
[10:46] <ion> I take it initctl status says “stop/running” for the job?
[10:46] <lericson> ^C'd it and it says gunicorn (julkort/julkort:app) stop/killed, process 32572
[10:46] <ion> ah, stop/killed indeed.
[10:46] <ion> Another symptom of lying to Upstart about the forking behavior of the main process. :-) Run workaround-upstart-snafu 32572. http://heh.fi/tmp/workaround-upstart-snafu
[10:50] <jhunt> I'd recommend: (1) killing the pid of your gunicorn process manually, then (2) retrying the "stop". Then, (3) comment out "respawn" and "expect fork" and start the job. If the reported pid is correct, gunicorn isn't forking at all and you can then just re-enable the respawn.
[10:50] <jhunt> If the pid is wrong, stop the job again (kill + stop), and try adding "expect daemon".
[10:51] <lericson> ion: It forks until it gets the same pid and then dies? :|
[10:51] <ion> yeah
[10:51] <jhunt> The maximum number of forks an app will do is 2 (there is no benefit doing more)
[10:51] <lericson> jhunt: It's dead already, not even a zombie process
[10:52] <lericson> Out of curiosity, how does it track forks?
[10:53] <jhunt> so does "gunicorn --daemon" coupled with "expect daemon" work for you?
[10:53] <jhunt> lericson: it uses ptrace
[10:53] <lericson> jhunt: I don't know, I can't even get it to start again.
[10:53] <lericson> Haven't tried ion's workaround (no ruby)
[10:54] <jhunt> lericson: We will be making some improvements to the tracking this cycle and may start using cgroups at a later stage. However, 99% of apps can be handled via ptrace.
[10:55] <ion> When status says stop/killed with a nonexistent process, Upstart is in a confused state, perpetually expecting to receive a SIGCHLD for a process with that PID. The workaround provides such a process. :-P
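The confused state ion describes (a job stuck in stop/killed on a PID that no longer exists) can be detected from a status line. A sketch, assuming the `initctl status` format seen above (`JOB (INSTANCE) goal/state, process PID`); the helper names are hypothetical. `kill -0` only probes for existence without sending a signal, but it can also fail with a permission error for another user's process, so run the check as root.

```shell
#!/bin/sh
# Extract the PID from an `initctl status` line such as
#   gunicorn (julkort/julkort:app) stop/killed, process 32572
# Prints nothing if the line carries no "process PID" part.
tracked_pid() {
    printf '%s\n' "$1" | sed -n 's/.*, process \([0-9][0-9]*\).*/\1/p'
}

# Succeeds if a process with that PID exists (signal 0 is never delivered).
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Report whether Upstart is tracking a PID that is already gone.
check_stuck() {
    pid=$(tracked_pid "$1")
    if [ -n "$pid" ] && ! pid_alive "$pid"; then
        echo "stuck: tracking dead pid $pid"
    fi
}

# Usage: check_stuck "$(initctl status gunicorn INSTANCE=...)"
```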
[10:56] <ion> jhunt: cgroups, huh? Didn’t Keybuk come up with a proc connector implementation much superior to a cgroups implementation? There’s even a working prototype.
[10:56] <lericson> Can't I restart the daemon? >_>
[10:56] <jhunt> ion: Keybuk and I discussed both options recently. The proc connector has issues.
[10:57] <ion> ok
[10:57] <jhunt> ion: specifically wrt containerized environments
[10:57] <ion> Is there a fix for the cgroups issues?
[10:58] <jhunt> lericson: unfortunately, if the pid being reported by upstart no longer exists, nominally you'll have to reboot to clear Upstart's knowledge of that job. If you're on a dev box, just copy your .conf file to a new name and work with that newly named job in the meantime.
[10:59] <ion> (or use workaround-upstart-snafu)
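jhunt's dev-box workaround (copy the .conf to a new name) as a small helper. A minimal sketch, assuming jobs live as `NAME.conf` in one directory (normally /etc/init; it is a parameter here so the helper can be tried on a scratch directory). `clone_job` and the example names are hypothetical. Depending on the Upstart version, the new file is either picked up automatically or after `initctl reload-configuration`.

```shell
#!/bin/sh
# Copy JOBDIR/OLD.conf to JOBDIR/NEW.conf so work can continue under a fresh
# job name while Upstart still holds stale state for OLD.
clone_job() {
    jobdir=$1 old=$2 new=$3
    src="$jobdir/$old.conf"
    dst="$jobdir/$new.conf"
    if [ ! -f "$src" ]; then
        echo "no such job file: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    cp "$src" "$dst"
}

# Example (hypothetical names, run as root):
#   clone_job /etc/init gunicorn gunicorn-dev
```

Refusing to overwrite keeps a second run from clobbering a copy that has already been edited.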
[11:02] <lericson> So quite predictably, running a tool that is essentially a fork bomb fork bombed the machine
[11:02] <ion> It’s not a fork bomb.
[11:02] <jhunt> lericson: seems that gunicorn doesn't fork at all by default.
[11:03] <lericson> jhunt: No, but it wants to
[11:03] <jhunt> lericson: hence, you don't need "expect" at all.
[11:03] <lericson> As linked above, their FAQ explicitly says "use --daemon"
[11:04] <jhunt> lericson: have you checked your gunicorn config file (daemon=False|True)?
[11:04] <ion> The script has no more than three processes at any time.
[11:04] <lericson> jhunt: I did, and the reason Upstart got confused is me
[11:04] <lericson> However I'm more focused on fixing the issue than placing blame :-)
[11:05] <jhunt> lericson: so, any joy with copying the .conf file and modifying it?
[11:06] <lericson> I'm going with that approach yeah, thanks for the tip-- other question: $UPSTART_INSTANCE expands to $UPSTART_INSTANCE (or rather doesn't expand at all) in my exec line
[11:08] <lericson> Ah, mystery is solved; I had expect fork but Gunicorn does proper daemonization https://github.com/benoitc/gunicorn/blob/master/gunicorn/util.py#L280
[11:10] <jhunt> lericson: I thought we'd already come to that conclusion :)
[11:11] <jhunt> so you should be able to run gunicorn without --daemon and without the "expect" stanza, or with --daemon and "expect daemon". If one/both of these work, I'd appreciate it if you could let the gunicorn people know so they can update their faq (and maybe provide a "debian/gunicorn.upstart" in their distribution files)
[11:13] <lericson> jhunt: I'll be contributing this upstart script to gunicorn most likely, and yes it does work as intended with --daemon and expect daemon
[11:13] <lericson> Which seems pretty obvious now that I say it, but hey
[11:14] <jhunt> lericson: great - thanks very much!!
[11:15] <lericson> BTW, OpenRC has this /etc/init.d/foo zap command which is extremely useful in situations like these as I am sure you're aware, is this avoided on a philosophical basis or just not implemented?
[11:15] <jhunt> jhunt: I'll update the cookbook with a section on how to identify which stanzas you need for daemons of various types, hopefully this week...
[11:17] <jhunt> lericson: well, historically "both", but we do intend to resolve this issue in the current cycle. Since Upstart now supports user jobs, being able to kill a mis-specified user job is particularly important as user jobs can be used as a testing ground for system jobs.
[11:28] <lericson> ISTM a lot of things can make Upstart get stuck, nginx has the same issue.
[11:38] <lericson> So the same freeze effect happens if the post-stop script fails?
[11:39] <lericson> Hey, nevermind
[11:40] <lericson> Turns out it doesn't freeze
[11:44] <jhunt> lericson: it can be tricky to write a new .conf file, particularly when it isn't obvious what the daemon is actually doing. However, once Upstart knows how to establish the pid of your job, you're in good hands :)
[11:48] <lericson> (-:
[11:49] <lericson> It occurred to me that you can't really expand variables in the `exec` line, is this true?
[11:50] <lericson> Or rather they need to be isolated somehow?
[11:58] <jhunt> lericson: expansion should work fine. Maybe you've somehow got some unicode dollar/underscores in the original version of your .conf file?
[13:06] <lericson> jhunt: It was in the env lines
[13:13] <jhunt> lericson: aha!
[13:40] <lericson> So how does one restart a service with Upstart?
[13:41] <lericson> initctl restart <job>
[13:41] <lericson> Ok, thanks.
[13:41] <lericson> Does this first stop the job and then start it again, or how does it work?
[13:44] <jY> yes that's how it works
[14:15] <lericson> So, is there a way to add a reload event? I was thinking along the lines of initctl emit reload JOB=nginx
[14:16] <lericson> Hmmmm
[14:33] <SpamapS> lericson: restart stops and starts the job without reloading the job config
[14:33] <SpamapS> lericson: initctl reload will tell upstart to send SIGHUP to the tracked process
[14:43] <lericson> SpamapS: The daemon I'm writing the script for cycles its PID on HUP.
[14:44] <lericson> So I have a task without an exec or script section, only post-start and post-stop. It is meant to start a set of services on boot
[14:45] <lericson> The problem is that it does nothing.
[14:45] <lericson> In fact starting it just makes initctl stop
[15:00] <SpamapS> lericson: yeah, you have to use stop/start for that
[15:01] <SpamapS> I actually don't like restart for most operations. :-/
[15:01] <lericson> I think it's a mistake to assume that all daemons reload on SIGHUP without changing PID, and a mistake to assume that all daemons are best restarted via stop + start.
[15:02] <codebeaker> lericson: I think there's a missing paradigm [:start, :stop, :restart (stop, something else?, :start), :reload]
[15:03] <codebeaker> ^ would fit the way daemons actually need to work
[15:04] <lericson> on reload exec nginx -s reload
[15:05] <lericson> make it go!
[15:07] <codebeaker> ^ lericson ?
[15:07] <codebeaker> is that valid "on reload …………"
[15:07] <codebeaker> ?
[15:10] <lericson> Nope, but that's what I want.
[15:12] <codebeaker> ^ right :)
[15:14] <SpamapS> You can change the kill signal now... but I don't think you can change the reload signal. :-P
[15:15] <SpamapS> lericson: it's not meant to do all service orchestration for you. It's meant to facilitate starting and stopping of services. You can write a script to do complex interactions like that.
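Following SpamapS's suggestion, the per-daemon reload lericson wanted ("on reload exec nginx -s reload", which is not valid Upstart syntax) can live in a small wrapper script instead. A sketch only: `reload_job`, the `DRY_RUN` switch and the job-to-command table are hypothetical. nginx gets its own `nginx -s reload`, and every other job falls back to `initctl reload`, which sends SIGHUP to the tracked process.

```shell
#!/bin/sh
# Reload a service the way that particular service wants to be reloaded.
# Set DRY_RUN=1 to print the chosen command instead of running it.
reload_job() {
    job=$1
    case $job in
        nginx)
            # nginx's own reload command
            set -- nginx -s reload ;;
        *)
            # Upstart's default: SIGHUP to the tracked process
            set -- initctl reload "$job" ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Keeping the table in a script means a daemon whose PID changes on HUP can get a stop + start entry without Upstart needing to know about it.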
[15:16] <SpamapS> I agree that assuming behaviors of tracked daemons causes as many problems as it solves.
[15:16] <codebeaker> SpamapS: HUP certainly wasn't designed historically to tell a process to RELOAD
[15:23] <SpamapS> indeed, it just became a convention of nearly every daemon ;)
[15:24] <SpamapS> since daemons, by definition, do not have a terminal, "the terminal hung up" isn't meaningful for them in its original context
[15:31] <codebeaker> that's a point I hadn't considered
[15:31] <codebeaker> but, what's stopping people inventing new signals ?
[15:31] <codebeaker> (theoretically?)
[15:42] <lericson> Standardization across platforms
[15:59] <codebeaker> sure, but upstart introduced an arbitrary signaling interface, did it not ?
[15:59] <codebeaker> or, "Eventing" rather