/srv/irclogs.ubuntu.com/2012/08/29/#upstart.txt

=== bradleyayers_ is now known as bradleyayers
[12:07] <mitsuhiko> does anyone have any ideas why upstart sometimes is unable to pick up changes?
[12:07] <mitsuhiko> how does one go about debugging upstart?
[12:29] <mitsuhiko> does this ring a bell to anyone? https://gist.github.com/5f9061af79bb8b38d240
[12:29] <mitsuhiko> service start just hangs
[12:31] <mitsuhiko> i can confirm it does not even manage to start the executable
[12:31] <mitsuhiko> god knows what it waitpids for
[12:32] <mitsuhiko> i replaced the config file with one that just logs something
[12:32] <mitsuhiko> and it still does not do anything
[12:33] <mitsuhiko> force initctl reload-configuration does not do anything either
[12:34] <mitsuhiko> what the fucking fuck. i just renamed the service and that fixes it
[12:44] <jodh> mitsuhiko: is 'salt-master' a daemon by any chance? If you haven't already done so, I'd recommend reading http://upstart.ubuntu.com/cookbook/#precepts-for-creating-a-job-configuration-file.
[12:45] <jodh> mitsuhiko: also, if you are creating this .conf from scratch, don't add respawn until you are convinced the job is working as you expect (http://upstart.ubuntu.com/cookbook/#respawn).
[12:45] <jodh> mitsuhiko: also, you can remove all that redirection and either specify "console none" or just remove it and have Upstart auto-log any output to /var/log/upstart/<job>.log.
[12:46] <jodh> mitsuhiko: another point - you don't need the script/end-script - just use 'exec'.
[12:47] <jodh> mitsuhiko: to understand what is going on with your original job, run 'initctl status <job>'. I suspect salt-master is forking but you haven't told Upstart that it does that (respawn stanza), which will lead to interesting results.
[12:47] <jodh> mitsuhiko: oops - I meant 'expect' stanza, not 'respawn' above.
[12:59] <mitsuhiko> jodh: the file works just fine
[12:59] <mitsuhiko> i renamed it and it works
[13:00] <mitsuhiko> so something in upstart is broken
[13:00] <mitsuhiko> also the file worked for two months without changes
[13:00] <mitsuhiko> jodh: the script/end-script was there for debugging purposes
[13:00] <mitsuhiko> before it was expect daemon and salt-master -d as only exec line
[13:01] <jodh> mitsuhiko: Upstart is not broken. The problem I think is that your original unrenamed job is not working as you expect, and since Upstart was unable to track the PID of the initial job, you were not able to start it (as Upstart thought it was still running).
[13:02] <mitsuhiko> jodh: so how do i fix the unrenamed job?
[13:02] <mitsuhiko> and no, the script was always correct
[13:02] <mitsuhiko> i verified by spawning instances of all permutations of scripts i tried under the new name
[13:02] <mitsuhiko> all of them work
[13:02] <jodh> mitsuhiko: well, if you had 'expect daemon' and tried to start the job but salt-master does *not* fork, that too will confuse Upstart, hence your problem.
[13:02] <mitsuhiko> jodh: as i said, all permutations work
[13:03] <mitsuhiko> i tried expect daemon with -d and no expect with the redirected script block
[13:03] <mitsuhiko> both work fine as expected
[13:03] <mitsuhiko> my salt-master.conf file still does not start
[13:03] <mitsuhiko> the same file as salt-wtf.conf does start
[13:03] <jodh> mitsuhiko: and you verified that the PID Upstart reported in 'status job' reflected the real pid?
[13:03] <mitsuhiko> jodh: the service is not running
[13:03] <mitsuhiko> it *does not start*
[13:03] <mitsuhiko> it hangs
[13:03] <mitsuhiko> as shown by the strace above
[13:04] <jodh> mitsuhiko: what does 'status job' show?
[13:04] <mitsuhiko> $ status salt-master
[13:04] <mitsuhiko> salt-master stop/killed, process 7240
[13:04] <jodh> mitsuhiko: and does pid 7240 exist on your system?
[13:04] <mitsuhiko> no
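
The check jodh asks for here (does the PID in the 'status' output still exist?) can be scripted. What follows is only a minimal sketch, not part of Upstart: the helper names parse_status and pid_exists are made up for illustration, and the regular expression assumes the simple "<job> <goal>/<state>, process <pid>" form quoted above, so it does not handle instance jobs or multi-process output.

```python
import os
import re


def parse_status(line):
    """Split a line such as 'salt-master stop/killed, process 7240'.

    Hypothetical helper: returns (job, goal, state, pid), with pid set to
    None when the status line reports no process.
    """
    m = re.fullmatch(r"(\S+) (\w+)/([\w-]+)(?:, process (\d+))?", line.strip())
    if m is None:
        raise ValueError("unrecognised status line: %r" % line)
    job, goal, state, pid = m.groups()
    return job, goal, state, int(pid) if pid else None


def pid_exists(pid):
    """Return True if a process with this PID currently exists.

    Signal 0 delivers nothing; the kernel only performs the existence and
    permission checks.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


if __name__ == "__main__":
    job, goal, state, pid = parse_status("salt-master stop/killed, process 7240")
    print(job, goal, state, pid, pid_exists(pid))
```

A job whose status names a PID that pid_exists reports as missing matches the stuck state described in this conversation.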
[13:05] <mitsuhiko> it might have at one point but it's hard to say because upstart is completely unable to do anything with salt-master.conf at this point
[13:05] <mitsuhiko> i cannot start it
[13:05] <mitsuhiko> i cannot stop it
[13:05] <mitsuhiko> yet if i rename it to salt-wtf.conf i can start and stop it properly
[13:05] <mitsuhiko> initctl reload-configuration does not help
[13:05] <jodh> mitsuhiko: right, so I think you or someone else has at some point changed the original conf file and attempted to start that job. Upstart is unable to track the pid as 'expect' probably wasn't specified correctly. Hence, you cannot start that job as it's in a bad state. Copying the job file creates a brand new job, so is not encumbered by that problem ;)
[13:06] <mitsuhiko> jodh: trust me, there was nothing wrong with the file in the first place
[13:06] <mitsuhiko> jodh: but assuming it was, how do you let upstart forget about the job?
[13:07] <jodh> mitsuhiko: unfortunately, currently you can't without using gross hacks (or rebooting): the whole point is that Upstart is supposed to be supervising your services so it should not be possible to say "just forget about this one". That said, we are considering adding a feature as this does catch folks out occasionally.
[13:08] <mitsuhiko> and you're telling me that is not an upstart bug
[13:08] <mitsuhiko> seriously. i fix this by rebooting?
[13:08] <jodh> mitsuhiko: ultimately, building up a new .conf file step-by-step should allow you to be assured the job is behaving correctly such that you'd never get into that scenario.
[13:08] <mitsuhiko> jodh: seriously. that file was never wrong
[13:09] <mitsuhiko> feel free to mistrust me on that one, but it was always correct
[13:09] <mitsuhiko> how do i know? because we used this for two months
[13:09] <mitsuhiko> what changed? i did a service stop, upgrade on salt, service start
[13:09] <mitsuhiko> stopped functioning
[13:09] <jodh> mitsuhiko: I admit, it's not ideal, but I've explained the rationale. Patches welcome of course ;-)
[13:09] <jodh> mitsuhiko: but did you test the job with stop/start/restart followed by killing the PID to see how it behaves on respawn?
[13:09] <mitsuhiko> jodh: it does not start
[13:09] <mitsuhiko> there is no pid to kill
[13:10] <mitsuhiko> it does not start
[13:10] <mitsuhiko> and in case you have not noticed it, i am kinda pissed off right now because i have been debugging this problem for two hours by now
[13:10] <mitsuhiko> and now it turns out to be a bug in upstart with the only solution being a … restart?
[13:11] <mitsuhiko> there is also zero information that upstart gives even at highest debug levels
[13:11] <mitsuhiko> i wonder if telinit c would fix it, but i am too scared to try that
[13:11] <jodh> mitsuhiko: I don't know the full details of what has happened on your system. However, the limitation in Upstart is that it is not currently possible to rectify a problem caused by job misconfiguration.
[13:11] <mitsuhiko> there was no job misconfiguration
[13:12] <mitsuhiko> jodh: what would the gross hack be?
[13:12] <mitsuhiko> i'd much rather not reboot that machine
[13:14] <jodh> mitsuhiko: it's not something I would consider - it involves exhausting the PID namespace until you get back to the PID shown in 'status <job>' to allow that (pid) to be stopped.
[14:02] <mitsuhiko> jodh: exhausting does not work
[14:02] <mitsuhiko> it did not even try to issue a signal
[14:05] <mitsuhiko> readlink("/proc/7340/root", "/", 4096)  = 1
[14:05] <mitsuhiko> wat
[14:16] <mitsuhiko> ah yes.  recorded the wrong pid
[14:16] <mitsuhiko> 7340 != 7240.  Not sure why upstart showed the wrong one on the status message
[14:16] <mitsuhiko> i suppose what happened is that on salt upgrade the process did not daemonize properly and upstart responded badly to it.
[14:16] <mitsuhiko> race?
[22:59] <freerobby> Can anybody explain why this works in a shell, but when I start a process via torquebox upstart, it doesn't honor the system ulimits? https://gist.github.com/7fff364c0bca20c27aa5
[22:59] <freerobby> https://gist.github.com/6b22f5e1742f2bd63a75
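
freerobby's question is not answered in this log. One way to see which limits a process actually received is to read /proc/<pid>/limits on Linux and compare the job's values with those of an interactive shell. The following is a hedged diagnostic sketch only: parse_limits and read_limits are illustrative names, and it assumes the usual column layout of that file (limit name, soft limit, hard limit, optional units).

```python
import re

# One row of /proc/<pid>/limits: name, soft limit, hard limit, optional units.
_ROW = re.compile(r"(.*?)\s{2,}(\S+)\s+(\S+)(?:\s+(\S+))?")


def parse_limits(text):
    """Map each limit name to its (soft, hard) values, kept as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        m = _ROW.fullmatch(line.strip())
        if m:
            limits[m.group(1)] = (m.group(2), m.group(3))
    return limits


def read_limits(pid="self"):
    """Read the limits of a running process (Linux only)."""
    with open("/proc/%s/limits" % pid) as fh:
        return parse_limits(fh.read())


if __name__ == "__main__":
    # Limits of the current process; pass a job's PID to inspect that job.
    for name, (soft, hard) in sorted(read_limits().items()):
        print("%-25s soft=%s hard=%s" % (name, soft, hard))
```

Running this once from the shell and once with the PID reported by 'status <job>' shows whether the two environments differ, for example in "Max open files".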
