/srv/irclogs.ubuntu.com/2009/10/27/#upstart.txt

PovAddicthttp://lwn.net/Articles/351013/ this is absolutely awesome02:17
mbtHi.  I'm having issues with Upstart (as found in Ubuntu Karmic) successfully starting a system when run in an LXC container. I'm hoping someone may have an idea of what I might try to do to fix it.03:29
mbtstrace says it's hanging on a read from a pipe pretty early into startup, just after trying (and failing) to open a connection to the /var/run/dbus/system_bus_socket UNIX socket.03:32
sadmac_tokyombt: you should probably try in an ubuntu channel first04:56
JanCI doubt "LXC containers" are something officially supported by Ubuntu?04:59
mbtLXC support is rather new to Ubuntu, I was unable to find anyone who seemed to know what they were let alone what could be an issue.04:59
mbtThat said, as I have no clue what's going on, I'm kind of stuck. Currently reading through the upstart sources to see if that will turn on any light bulbs for me.05:00
JanCmight be useful to ask LXC people too05:01
mbtI have no clue where to even find them.05:02
JanCprobably a permission issue or something05:02
JanCmbt: they've sent patches to the kernel...  ;-)05:03
mbtlol05:03
JanCLKML must have their email address  ;-)05:04
mbtlol, that's true.  Something tells me that would take longer than trying to solve by brute force, though.05:06
mbtThe last time I emailed any kernel maintainer it took weeks and I got brushed off because it couldn't have been a software fault in the kernel.05:07
sadmac_tokyombt: who?05:07
mbtThe maintainers for a USB->Parallel Printer cable driver.  (Driver would eat a character of output to the printer and require the printer to signal offline/online before printing, weird).05:10
mbtI don't remember who it was at this point, though, that was a few years ago.05:10
sadmac_tokyombt: the trick is to figure out the sociable ones and email the closest one to what your issue is05:11
sadmac_tokyombt: Ingo is good, Andrew Morton is good....05:11
JanCI guess the language of your mail mathers too  ;-)05:14
JanCmatters05:14
sadmac_tokyoJanC: If you mean literal language: English.05:14
sadmac_tokyokernel development occurs in English.05:14
sadmac_tokyoNo exceptions05:14
PovAddict'your sh***y code is f***ing broken'05:15
JanCsadmac_tokyo: more like PovAddict's example won't help  ;)05:15
sadmac_tokyo"Good day, kind sir. It is my humble regret to inform you that your sh***y code is f***ing broken."05:17
JanCsadmac_tokyo: feel free to try it05:19
mbtThat said, I can't find anything that points to the kernel so I wouldn't have a clue what to even seek out.05:22
mbtThis same host kernel is running several other systems, including older (by about 70 days) Karmic systems.  Just an up-to-date Karmic guest doesn't work.  I don't have any way to test upstream components to try to narrow it down further.05:22
PovAddictI once commented on some disgusting code duplication (adding a feature by copy/paste/minor replace), and the project BDFL replied 'if you have better code, let's see it; otherwise shut up'05:23
mbtThere is some more info at bug 461638 in launchpad.05:23
mbtRather, bug 46143805:23
sadmac_tokyoPovAddict: he's right. if you know exactly what's wrong, why not fix it?05:24
mbtThat's the type of thing I'd send a patch for.05:24
PovAddictmbt: I'd start by comparing package versions, and trying downgrading candidates to the same version as your old-but-working system05:24
PovAddictcandidate = package that might possibly be the cause or related05:25
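[Editor's sketch] One way to do the comparison PovAddict suggests, assuming a "name version" package listing has been saved from each guest with dpkg-query. The file names working.txt and broken.txt and the helper name changed_packages are made up for illustration and are not from the conversation.

```shell
#!/bin/sh
# On each guest, save a "name version" listing first, for example:
#   dpkg-query -W -f='${Package} ${Version}\n' > working.txt   (old, working guest)
#   dpkg-query -W -f='${Package} ${Version}\n' > broken.txt    (updated, failing guest)

# changed_packages OLD NEW
# Hypothetical helper: prints packages that are new in NEW or whose
# version differs from OLD. Packages removed since OLD are not reported.
changed_packages() {
    awk 'NR == FNR { old[$1] = $2; next }                 # first file: remember versions
         !($1 in old) { print $1, "(new)", $2; next }     # only in the second file
         old[$1] != $2 { print $1, old[$1], "->", $2 }    # version changed
    ' "$1" "$2"
}
```

Running `changed_packages working.txt broken.txt` would then print one line per candidate package, which narrows down what to try downgrading first.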
mbtPovAddict, I don't know how likely that is to work; there are a whole slew of changes around the way things went in.  There's new packages for things like mountall and I don't think rolling back is going to be straightforward.05:25
mbtOf course, that was also my fault for not snapshotting the bloody FS first.  I always do that. This time, I forgot to.  :(05:26
PovAddictlet's start with the obvious05:26
PovAddictis it same kernel and same upstart on working and non-working systems?05:27
mbtSame kernel (Linux Containers/LXC is like BSD Jail).  Host hasn't changed configuration and works with the latest stuff; so upstart on the host works, but not in the container.  At least, I think it's upstart, that's what's failing to read something else.05:28
mbtThough, the "something else" part I haven't figured out (yet).05:29
Keybukmbt: the fact that you can strace Upstart scares me10:26
ion:-)13:29
mbtApparently, while strace will work, ltrace does not.15:05
* mbt shrugs15:05
Keybukheh15:34
Keybukstrace isn't supposed to work on init15:34
PovAddictwhy not?15:35
Keybukbecause ptrace() relies on the process you are tracing being your child15:36
Keybukthis means that you become the parent of init15:36
Keybukwhich means that strace becomes init15:36
Keybukthis is obviously wrong15:36
PovAddictyou can attach to an existing process too15:37
PovAddictand in his case, probably upstart is inside the container and strace is outside... or something15:38
Keybukyes15:38
Keybukand when you attach to the existing process15:38
Keybukyou become its parent15:38
Keybukyou get SIGCHLD if it does, not its real parent15:38
Keybuket.c15:38
Keybukit dies, etc.15:38
mbtKeybuk, it can work on init in a container, because in a container, init has PID 1, but outside the container it has a parent.16:45
mbtKeybuk, see http://pastebin.com/d55fbfbbb -- those have non-1 PIDs, but as far as they are concerned, they are PID 1 inside of the container.16:46
mbtThe problem with ltrace is documented as a bug in its man page, so that's pretty much a dead end for me.16:47
mbtAnd it's not like I can install sysvinit, because Karmic doesn't have it apparently.16:48
mbtSo I can't even verify that the problem is what I think it is.16:48
PovAddicthttp://packages.ubuntu.com/search?keywords=sysvinit right, it's gone in karmic16:49
mbtYeah, so I'm feeling rather stuck at the moment.16:50
mbtI wonder....16:51
mbtIf I take a core dump of the failing Upstart process, would anyone know how to extract anything useful about the state of the process from it?16:51
mbtI've attached one to https://bugs.launchpad.net/ubuntu/+source/upstart/+bug/461438 though I don't know how useful it will be.16:56
Keybukyes, absolutely16:57
Keybukdid Upstart core dump?16:57
Keybukhonestly though, you're not really describing your problem16:58
Keybukon the bug you're talking about mountall outputting errors16:58
Keybukbut here you're talking about upstart core dumps16:58
mbtI'm trying to figure out *what* the problem is.  At first, I thought it was mountall.  Now it's looking like it's upstart.  The problem is the container won't start.16:58
Keybukdo you get a pid 1 in the container?16:59
mbtAnd no, I used gcore to get the process memory.16:59
mbtYes, that coredump is taken from the container's PID 1.  That's the *only* process that is running in the container.16:59
Keybukso the container is clearly starting16:59
mbtAlright, the container shell starts, but the stuff in the container does not.17:00
Keybukif no other processes are running, how do you get error messages from them?17:00
Keybukbecause you reported the bug with error messages from something upstart had run17:00
Keybukso stuff in the container *CLEARLY IS* running17:00
mbtYes, I did.  It was running mountall.  To see if that was the problem, I replaced mountall with a script that returns 0.17:00
Keybukdid the script return 0?17:00
Keybukdid Upstart see the script return 0 and report that it exited normally?17:00
mbtNow, if you look at the 4th comment on the bug you see where Upstart is hanging.17:00
mbtMountall is the last thing it does, and it goes no further.  It sees that it exits normally.17:01
mbtAfter mountall is done, upstart hangs forever, blocked on either a read or a select() call.  The data I have collected there seems to be inconsistent.17:01
mbtSo not only am I confused and not sure where the problem's root cause is (the only thing I *do* know is that it's related to a change in the container; Karmic as it was installed in the container 70 days ago worked just fine; Karmic as was updated 2 days ago does not).17:02
mbtBut I can't seem to collect enough data to figure out *why* it's hanging.17:02
mbtAll I know is that it is.17:03
mbtAnd that all Upstart does is start mountall, hostname, and hwclock; nothing else is getting run, and init is the only process that is left running when those are completed.17:03
Keybukright17:05
Keybukbut that can just mean that Upstart is done17:05
Keybukdid mountall work?17:05
Keybukdo you see events being emitted by mountall?17:05
mbtAlright, let's back up just a second here.17:05
mbtComment 4 on the bug shows all there is to show on Upstart's early startup.  From there, nothing else in the container that is configured to start, does.  Upstart isn't finished because the system never finishes booting; no gettys are spawned, no avahi-daemon, no sshd, nothing.  When the container is fully booted, it should have about 20 processes running, including a web server.17:07
mbtThe last thing to happen in that is mountall exits normally (according to upstart) and it changes state from post-stop to waiting.17:08
mbtNow, the container has nothing to mount (all of that is done by lxc-start).17:08
Keybukand what does mountall say?17:10
mbtWhat do you mean? It is a script that returns 0.17:10
mbtNow, here's something else:17:10
mbtI disabled the mountall, hwclock and hostname .conf files in /etc/init --- and now, Upstart is doing nothing.17:10
mbtIt says:17:10
mbtroot@spicerack:~# exec /sbin/init -v17:11
mbtLoading configuration from /etc/init.conf17:11
mbtLoading configuration from /etc/init17:11
mbtinit: Handling startup event17:11
mbtAnd hangs.17:11
mbtAttaching to it with gdb (from outside the container) shows that it's blocking on a call to select().17:11
Keybukerr17:12
Keybukwhy is mountall a script that returns zero17:12
Keybukmountall is a C binary17:12
mbtYou're not paying attention to what I've said.  12 minutes ago, right here, in this conversation, I said that I replaced it with a script that returns success to see if that was the culprit.17:12
Keybukwell, that won't work17:12
mbtAs everything in the container is already mounted when lxc-start forks and exec's /sbin/init, mountall has nothing to do in the container in the first place.17:13
Keybukyes it does17:13
Keybukit has to send the events that the rest of the system is waiting for17:13
Keybukotherwise nothing else will start17:13
Keybukrather than an exit 017:13
Keybukwhy not?17:13
Keybuk  initctl emit virtual-filesystems17:13
Keybuk  initctl emit local-filesystems17:13
Keybuk  initctl emit remote-filesystems17:13
Keybuk  initctl emit filesystem17:13
Keybuk  exit 017:13
Keybukyou may want || true on the end of those17:14
mbtInteresting.  So the bug *is* in mountall?17:14
Keybukor maybe -n17:14
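[Editor's sketch] Keybuk's stand-in script with both of his follow-up suggestions applied, assuming "-n" refers to the --no-wait option of initctl emit. The INITCTL variable and the function name emit_filesystem_events are additions for illustration, not part of Upstart or mountall.

```shell
#!/bin/sh
# Hypothetical stand-in for mountall in a container where lxc-start has
# already mounted everything. It mounts nothing; it only emits the
# events that the other jobs are waiting for.

# Assumption for illustration: INITCTL lets the command be swapped out
# (for example with echo) to dry-run the script outside Upstart.
INITCTL="${INITCTL:-initctl}"

emit_filesystem_events() {
    for event in virtual-filesystems local-filesystems \
                 remote-filesystems filesystem; do
        # -n: do not wait for the event to finish being handled.
        # || true: carry on even if a single emit fails.
        $INITCTL emit -n "$event" || true
    done
}
```

The replacement script would end by calling `emit_filesystem_events` followed by `exit 0`. Setting `INITCTL=echo` beforehand prints the four emit commands instead of running them.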
Keybukno, not at all17:14
Keybukyou're not running mountall17:14
mbtI was.17:14
Keybukyou replaced it with a shell script17:14
mbtAnd it was still doing nothing.17:14
mbtThat's why I replaced it with the shell script, to see if the system *still* did nothing.17:14
mbtLet me swap back and see if that has any effect.17:14
mbtOkay, swapping back, I still hang, though the output of exec /sbin/init -v has changed:  https://bugs.edge.launchpad.net/ubuntu/+source/mountall/+bug/461438/comments/717:16
mbtSo mountall is sending events just fine, but upstart is still failing to proceed with the boot process.17:17
Keybukwhat does mountall output?17:17
Keybukmountall hasn't sent the virtual-filesystems event yet17:18
Keybuklet alone the filesystem event17:18
Keybukso nothing will start yet17:18
mbtWhat do you mean, what does mountall output?  Everything that is output is there on the link I just pasted.  That's it, in its entirety.17:18
mbtI don't see any event names in the output so I can't tell what's being fired or not.17:19
mbtIf I use that script you put here a few minutes ago, though, things seem to work.17:21
mbtWell sort-of.  I get some things started.17:22
Keybukok17:22
Keybuktry mountall --debug17:22
mbtBefore execing init, or from the script?17:23
Keybukfrom the mountall script17:26
mbtIf I do that, it spawns over and over and over again, like a giant fork-bomb.17:28
Keybukreally?17:29
mbtYep.  I had to kill the container forcibly.17:29
* Keybuk doesn't see respawn in mountall.conf17:29
mbtI don't have one in mine, either.  I haven't modded any of the *.conf files in /etc/init.17:29
mbtBut there were 3000 processes when I killed the container.17:30
mbtAt this point, I have the container running using the initctl statements to fake mountall's presence.17:32
mbtI'm going to create a new container to continue debugging in, since I need this one to be up and running.17:32
mbtI'll bbiaf, need to switch back to my regular freenode profile.17:33
mbtBack.17:33
=== robbiew is now known as robbiew-afk
