/srv/irclogs.ubuntu.com/2021/01/29/#cloud-init.txt

=== jmcgnh_ is now known as jmcgnh
=== jmcgnh is now known as jmcgnh_
=== jmcgnh_ is now known as jmcgnh
=== hjensas|afk is now known as hjensas
march0	Hello, I have a race between cloud-init and unattended-upgrades on Ubuntu 20.04.1 LTS. Basically, my cloud-init user data is a MIME multipart of several YAML parts with repo_update: true and a package list, and sometimes unattended-upgrades kicks in and grabs the /var/lib/dpkg/lock-frontend lock.	08:30
march0	and ofc, cloudinit fails	08:30
march0	Is this known behavior, or should I file a bug? So far I haven't found an existing bug or related documentation.	08:51
tribaal	march0: this sounds similar to https://bugs.launchpad.net/cloud-init/+bug/1827204	09:14
ubot5	Ubuntu bug 1827204 in cloud-init "Doesn't run unattended-upgrades on first boot by default" [High,Triaged]	09:14
march0	It's similar, but in my case I would expect cloud-init to detect unattended-upgrades (which seems to start at random, as the issue is not easily reproducible) and wait for the dpkg lock to be released.	09:27
march0	because runcmd depends on the packages statement, so the provisioning fails completely	09:28
Odd_Bloke	march0: Hmm, I'm honestly surprised we don't already have a bug filed for this, but I can't find one. A bug report (via https://bugs.launchpad.net/cloud-init/+filebug) would be great!	14:51
kryl	hi	16:20
powersj	question: does vendor data take priority over written cloud-config in /etc/cloud/cloud.cfg.d?	17:00
powersj	i.e. I modify an image to create a user in /etc/cloud/cloud.cfg.d, but then vendor data comes in with its own cloud-config and defines its own users	17:00
march0	Odd_Bloke, this is very similar to https://unix.stackexchange.com/questions/315502/how-to-disable-apt-daily-service-on-ubuntu-cloud-vm-image/474024 & https://github.com/systemd/systemd/issues/5659, but it completely breaks package installation on fresh Ubuntu at a rate of about 7% (estimated)	17:15
march0	I usually have to start around 30 EC2 instances to get 1 or 2 failures	17:15
Odd_Bloke	march0: Right, which is why I'm surprised we don't have a bug already. :) Both the apt-daily services have a randomised wait time from boot, so it makes sense that you'd only see a proportion fail (those where $random_wait is less than the time your config takes to apply).	17:20
Odd_Bloke	We do have Before=apt-daily.service, I wonder if we're missing Before=apt-daily-upgrade.service?	17:21
Odd_Bloke	Hmm, though apt-daily-upgrade.service does have After=apt-daily.service (on my groovy machine, at least).	17:21
march0	I've been trying some workarounds with bootcmd but no luck so far. It's quite hard to wait for or stop unattended-upgrades, even in cloud-init's early stages	17:25
Odd_Bloke	march0: So I wouldn't expect you to need to work around this issue, we've already fixed it once: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1693361. If you file a bug and attach the output of `cloud-init collect-logs` on an affected system, we can dig into what's happening on your system, and figure out the root cause. :)	17:44
ubot5	Ubuntu bug 1693361 in cloud-init (Ubuntu Artful) "cloud-init sometimes fails on dpkg lock due to concurrent apt-daily.service execution" [Medium,Fix released]	17:44
Odd_Bloke	(I did not search for apt-daily previously; I _knew_ I'd seen a bug for this.)	17:45
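
The "wait for the dpkg lock to be released" behaviour march0 describes could look roughly like the sketch below. It is only an illustration, not what cloud-init does: the function names `lock_is_held` and `wait_for_lock_release` are hypothetical, and it assumes the lock on /var/lib/dpkg/lock-frontend is a POSIX fcntl lock that can be probed with a non-blocking lock attempt (which needs write access to the file, so root in practice). Probing and then installing is itself racy, since another process can take the lock between the check and the install, so this narrows the window rather than closing it.

```python
import errno
import fcntl
import os
import time


def lock_is_held(path):
    """Return True if another process holds a POSIX lock on `path`."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking attempt: fails at once if someone else has the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        # We only wanted to probe, so give the lock straight back.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def wait_for_lock_release(path, timeout=300.0, interval=1.0):
    """Poll until `path` is unlocked; return False if `timeout` expires first."""
    deadline = time.monotonic() + timeout
    while lock_is_held(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would run something like `wait_for_lock_release("/var/lib/dpkg/lock-frontend")` before the package step and treat a False result as a provisioning failure.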
Odd_Bloke	falcojr: So you're certainly not wrong that SSHing into LXDs is much slower than exec'ing, but a large part of that is due to this one sleep: https://github.com/canonical/pycloudlib/blob/1ac9d4c82fdfd5cb1407f70b8a2b17e02953569d/pycloudlib/lxd/instance.py#L85	20:30
Odd_Bloke	Unless LXD is incredibly fast, we'll miss finding an IP first time around, so this basically guarantees a 20s sleep before attempting to SSH into LXDs.	20:31
falcojr	sleep 20??? That's...a bit much	20:31
Odd_Bloke	Yeah, if I drop that to a 1, then it's more like 6s to run the first command via SSH.	20:33
falcojr	also, the original "_wait_for_cloudinit" that was probably still around when they reported long wait times used increasing sleep times	20:36
falcojr	there's also a ssh connect sleep that waits 10 seconds before trying again	20:37
rharper	Could you exec into the instance to find out if networking is up?	20:39
Odd_Bloke	Yeah, I tried that one first and it had no effect: the 20s sleep happens before it, so by the time we're trying to SSH connect, SSH has been up for (20 - 6)s.	20:39
Odd_Bloke	rharper: The context here is we're trying to decide what the appropriate default access method for LXD is: our existing understanding is that SSHing was much slower, so we were leaning towards `exec` for pragmatic reasons (we run these tests all the time, so saving 15s per test run will add up very fast). However, I then noticed that we were, suspiciously, taking 21-22s to SSH in every time, so went digging.	20:41
rharper	huh	20:42
Odd_Bloke	I suspect some of these timeout values are more sensible in the context of LXD VMs, but they're applied to containers too.	20:44
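
The fixed 20s sleep discussed above costs a full interval whenever the first check misses. A short-interval polling loop with an overall timeout avoids that. The sketch below is a generic illustration and not pycloudlib's code: `poll_until` and the `probe` callable are hypothetical names, and the `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def poll_until(probe, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `probe` every `interval` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = clock() + timeout
    while True:
        result = probe()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("probe did not succeed within %ss" % timeout)
        # Short sleeps keep the wasted time after readiness under `interval`.
        sleep(interval)
```

With `interval=1.0`, an address that appears two seconds after launch is picked up on roughly the third probe, where a single 20s sleep would have delayed the SSH attempt by the full 20s.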
