[10:47] <delkin> Hello. I am trying to create an upstart script that runs on `starting hostname`. When I simulate this with `start hostname` my script runs well, but when I restart the machine it doesn't... does anyone have a hint on this issue?
[10:48] <delkin> the objective is to change the hostname of the machine to be equal to the mac address.
[10:50] <jodh> delkin: what happens on restart? Do you get any errors in /var/log/upstart/$job.log ?
[10:51] <delkin> jodh: I didn't know about that file. Thanks. I will have a look
[10:54] <jodh> delkin: you might want to take a look at http://upstart.ubuntu.com/cookbook/#debugging. An alternative approach btw is to create an /etc/init/hostname.override that contains your custom 'exec hostname ...' command which will replace the default in /etc/init/hostname.conf.
[10:55] <jodh> delkin: that would also be easier since otherwise your new job will have to set the hostname, then call 'stop hostname || true' to ensure that the real hostname job doesn't run and change the hostname again.
[10:56] <delkin> jodh: I see. Thanks a lot. I also believe I will have a tough time changing the hostname. It seems that before the hostname service runs the file system is read-only, so I cannot update /etc/hostname.
[10:57] <delkin> jodh: can you suggest the best moment (triggered by which service) at which I should try to change the /etc/hostname file?
[11:02] <jodh> delkin: yes, the hostname job does run before the fs is writeable, but that doesn't stop you from changing the hostname - just 'exec hostname -b myhostname'. You can create another job that specifies 'start on filesystem' that essentially calls 'exec hostname > /etc/hostname'. However, you do have the problem that anything that reads the hostname directly from /etc/hostname between those jobs running will
[11:02] <jodh> have the incorrect value :(
[11:06] <delkin> jodh: I will try to set that new upstart script to `start on (started filesystem and starting static-network-up)`. Something like this, i guess
[11:19] <delkin> jodh: I'm afraid that all the network services run while the file system is still read-only. This is not good :\
[11:28] <xnox> delkin: "start on remote-filesystem" should be enough, no?
[11:43] <delkin> xnox: :( nope. Looks like the hostname is set before that.
[11:44] <xnox> delkin: well, hostname is and can be set before filesystems are RW. So you can't have both, e.g. you cannot block setting hostname until filesystems are RW.
[11:44] <xnox> delkin: i am not sure i understand your issue then.
[11:47] <delkin> xnox: I am trying to change the hostname during boot to have the mac address in it. Every machine with this configuration would have its mac address as hostname.
[11:48] <xnox> delkin: create a task which fires upon udev event which adds a network interface and local-filesystem, echo desired hostname into /etc/hostname; start --no-wait hostname.
[11:48] <xnox> delkin: this will update the hostname dynamically on each boot, based on first network interface that comes up.
[11:49] <delkin> xnox: I believe that at some point the networking services will look at the /etc/hostname file to set the hostname and announce it to the rest of the network. I am trying to find when I can set /etc/hostname (the sweet spot). I have realized that if it is too early the file system is read-only. If it is too late, the networking services have already checked /etc/hostname, so it doesn't matter what I put there anymore
[11:49] <delkin> xnox: I will try that
[11:50] <xnox> delkin: after filesystems are RW, after you update /etc/hostname, you need to run "start --no-wait hostname" which will update hostname (e.g. upon ssh connection new hostname will be visible)
[11:55] <delkin> xnox: so, it will basically override the old hostname, right? I am afraid that it can take minutes before I can ssh using the new hostname. I will give it a try. Thanks!
[11:56] <xnox> delkin: huh? fix your dns server to assign correct hostnames =)
[11:57] <delkin> xnox: the problem is that the DNS takes time to propagate, i think
[11:58] <delkin> xnox: that's why I wanted to do the changes before the network services even started
[11:59] <delkin> xnox: i'm just afraid that at that point the file system is still read only
[11:59] <xnox> delkin: yes, they are but e.g. ssh server is not up yet, so it doesn't matter.
[12:03] <delkin> xnox: the trouble is that the old hostname is published to the DNS when the network services start. Even if I change the hostname before the ssh server, the DNS still thinks that my machine is named with the old hostname
[14:58] <delkin> xnox: I found that if I run `ifdown -a` and then `ifup -a`, it triggers an update in the DNS server.
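A minimal sketch of the "hostname equal to the MAC address" step discussed above, written in Python for illustration only. The interface name `eth0`, the `host-` prefix, and the helper names `read_mac` and `hostname_from_mac` are assumptions that do not appear in the log; colons are stripped because they are not valid in a hostname. The resulting string is what a job would pass to `hostname -b` or echo into /etc/hostname once the file system is writable.

```python
import re


def hostname_from_mac(mac: str, prefix: str = "host-") -> str:
    """Turn 'aa:bb:cc:dd:ee:ff' into a hostname-safe label (hypothetical helper)."""
    # Drop separators (':' or '-') and normalise to lower case.
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return prefix + hex_digits


def read_mac(iface: str = "eth0") -> str:
    """Read the MAC address of iface from sysfs (Linux only; 'eth0' is an assumption)."""
    with open(f"/sys/class/net/{iface}/address") as f:
        return f.read().strip()


if __name__ == "__main__":
    # Prints the derived hostname; it does not change any system state.
    print(hostname_from_mac(read_mac()))
```

Which interface counts as "the first one that comes up" is left open here, as it is in the conversation.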
[17:33] <SpamapS> hey I have a box in production whose init is overwhelmed ..
[17:33] <SpamapS> root         1 91.1  0.0  37772 13384 ?        Rs   11:02 359:42 /sbin/init
[17:34] <SpamapS> not responding to initctl
[17:34] <SpamapS> slangasek: ^^ any ideas?
[17:34] <SpamapS> jodh: ^^ You?
[17:34] <SpamapS> ahh
[17:34] <SpamapS> pipe(0x7fff759353d0)                    = -1 EMFILE (Too many open files)
[17:34] <slangasek> wow
[17:34] <slangasek> yuck
[17:35] <SpamapS> slangasek: caused by bug 1300885
[17:35] <slangasek> ok
[17:35] <SpamapS> slangasek: basically.. OpenStack with neutron creates a network-interface job per tap interface
[17:36] <SpamapS> and on this box, they're trying, and failing, so fast that upstart can't stop them anymore
[17:36] <slangasek> SpamapS: I guess you can forcibly adjust the file limit through /proc to temporarily restore it
[17:36] <SpamapS> I'm killing processes now to try and get enough room to finish
[17:36] <slangasek> ok
[17:39] <SpamapS> slangasek: raising it just raised the nr open :(
[17:39] <SpamapS> # cat /proc/sys/fs/file-nr
[17:39] <SpamapS> 2688	0	10000000
[17:42] <SpamapS> oh n/m
[17:48] <SpamapS> root@ci-overcloud-novacompute6-sq4g5z35pfzt:/proc/1# echo -n "Max open files=unlimited:unlimited" > limits
[17:48] <SpamapS> -su: echo: write error: Invalid argument
[17:48] <SpamapS> slangasek: ^^ any ideas?
[17:49] <slangasek> mm, I don't know the syntax for updating limits offhand
[17:59] <SpamapS> slangasek: ok so I used the minimal program here: http://linux.die.net/man/2/prlimit
[17:59] <SpamapS> slangasek: worked
[18:00] <SpamapS> slangasek: seems init is subjecting itself to the soft limit of 1024 .. given the logging.. that seems like something it should raise on its own.
[18:01] <SpamapS> or refuse to log more things
[18:01] <SpamapS> slangasek: shall I report as a bug? Seems worthy of a fix even for trusty.
[18:17] <slangasek> SpamapS: yeah, I'd say it's bugworthy
[18:18] <SpamapS> slangasek: ok, will add an Ubuntu task to the bug we're tracking in our project. Thanks.
[18:18] <SpamapS> err, an upstart task
[18:21] <SpamapS> ahh looks like jodh is already looking at that bug.
[18:22] <SpamapS> slangasek: https://bugs.launchpad.net/upstart/+bug/1300663 for reference.
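For reference, a rough sketch of the kind of limit change described at 17:59. SpamapS used the minimal C program from the prlimit(2) man page; the version below swaps in Python's `resource.prlimit` (Linux only, Python 3.4+) and is not the program that was actually run. The helper name `raise_nofile_soft_limit` is hypothetical. It raises the soft "Max open files" limit of a process to its existing hard limit.

```python
import resource


def raise_nofile_soft_limit(pid: int = 0) -> tuple:
    """Raise the soft RLIMIT_NOFILE of pid to its hard limit.

    pid 0 means the calling process. Returns the resulting (soft, hard) pair.
    """
    # With no limits argument, prlimit() only reads the current values.
    soft, hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    if soft != hard:
        # Leave the hard limit untouched; only lift the soft limit up to it.
        resource.prlimit(pid, resource.RLIMIT_NOFILE, (hard, hard))
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Demonstrates the call on the current process only.
    print(raise_nofile_soft_limit())
```

Pointing this at another process such as PID 1 requires root, and it only buys headroom; it does not address the jobs that are exhausting the descriptors.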
[18:49] <xnox> SpamapS: if you know the rogue job/jobs you can set "console none" on them via an override file, thus upstart will not log anything for them... still a workaround though.
[18:49] <SpamapS> xnox: yes that's what we're doing now for all of our compute nodes
[18:50] <xnox> "console output" would work too, and spew to the console, if that's collected / managed.
[18:50] <SpamapS> tap interfaces will come and go at a high rate for nova-compute
[18:50] <SpamapS> xnox: no thanks
[18:50] <xnox> =))))
[18:51] <SpamapS> the real problem, ifdown squawking on ignored interfaces, is fixed, but I'd rather not be overwhelmed on console or logfiles if some other bug comes up
[19:30] <stgraber> SpamapS: haha, yeah, I got bored of ifdown being a bit too verbose a couple weeks back when doing lxc stress testing, at some point I was wondering what was taking 2GB of my /var/log ;)
[19:31] <stgraber> every run of my test script was basically creating around 10000 veth pairs, so triggering the job around 20000 times, that pretty quickly made a mess in /var/log/upstart ;)
[19:47] <SpamapS> After thinking for the last hour.. I wonder if 'console output' would make more sense for network-interface
[20:03] <stgraber> it sure would make debugging much harder
[20:03] <stgraber> at least for me
[20:04] <stgraber> currently when something is wrong with networking, I ask for a tarball of /var/log/upstart + /etc/network/interfaces and between the two I can pretty much always figure out what happened. If things were just printing to the console, we'd lose that and as everything is event based, chances are that the console output would be pretty unreadable should something actually go wrong.