/srv/irclogs.ubuntu.com/2021/10/13/#cloud-init.txt

[15:20] <gellertp> Hi, we're running EC2 instances on CentOS 7 and noticed that the cloud-init stage 'init-local' is suspiciously taking >50s in the EC2LocalDatasource step. Does anyone have any idea what could be causing this?
[15:21] <gellertp> In particular, the following log lines in /var/log/cloud-init.log were suspicious:
[15:21] <gellertp> 2021-10-06 13:42:55,162 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 40.044 seconds
[15:21] <gellertp> 2021-10-06 13:43:05,174 - util.py[DEBUG]: Resolving URL: http://instance-data.:8773 took 10.011 seconds
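These two lookups account for essentially all of the >50s delay (40.044 s + 10.011 s = 50.055 s). A minimal sketch for pulling such timings out of a log, assuming the `Resolving URL: ... took N seconds` wording quoted above; `resolve_timings` is a hypothetical helper, not part of cloud-init:

```python
import re
from typing import Iterable, List, Tuple

# Matches util.py DEBUG lines like the two quoted above, e.g.
# "... util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 40.044 seconds"
RESOLVE_RE = re.compile(r"Resolving URL: (\S+) took ([0-9.]+) seconds")


def resolve_timings(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (url, seconds) for every 'Resolving URL' line found."""
    timings = []
    for line in lines:
        match = RESOLVE_RE.search(line)
        if match:
            timings.append((match.group(1), float(match.group(2))))
    return timings


sample = [
    "2021-10-06 13:42:55,162 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 40.044 seconds",
    "2021-10-06 13:43:05,174 - util.py[DEBUG]: Resolving URL: http://instance-data.:8773 took 10.011 seconds",
]
timings = resolve_timings(sample)
total = sum(seconds for _, seconds in timings)
print(f"{len(timings)} lookups, {total:.3f} s total")
```

On a real instance, pass `open("/var/log/cloud-init.log")` instead of the sample list.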
[15:48] <minimal> gellertp: "resolving" sounds strange for the first URL, as it's an IP address
[15:52] <rharper> That sounds vaguely familiar: Red Hat saw an issue in their build where the /etc/resolv.conf file was left in the cloud image and included bogus DNS entries
[15:52] <rharper> aw... gellertp is gone
[15:58] <minimal> rharper: I think the use of "resolving" in the debug message is misleading, as util.py uses urllib to parse the URL but doesn't seem to actually do any DNS/hosts lookups (from a cursory look at urllib's parse.py)
[15:59] <rharper> lemme get the details... it was something deep
[16:13] <rharper> minimal: I'm not finding it in my logs, but the source of the issue was that CentOS/RHEL cloud images ended up with an /etc/resolv.conf that had bogus 10.X nameservers. The is_resolvable_url() check in the EC2 datasource attempts to resolve "bogus" domain names on purpose; normally this isn't an issue on systems during the local stage, as we've not *yet* applied network config, but in these images, having the bad nameserver entry meant
[16:13] <rharper> cloud-init waited for those DNS requests to the bogus servers to time out
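A rough sketch of why a stale resolv.conf stalls this kind of check, assuming it boils down to `socket.getaddrinfo()` calls through the system resolver. `check_url`, `_resolves` and the probe name are hypothetical stand-ins, not cloud-init's `is_resolvable_url()`: every lookup that reaches an unreachable nameserver blocks until the resolver gives up, so a deliberate "bogus" probe plus the real lookup add up quickly.

```python
import socket
import time
from typing import Iterable, Tuple
from urllib.parse import urlparse

# Hypothetical probe: names under the reserved ".invalid" TLD (RFC 6761)
# should never resolve, so a healthy resolver answers them quickly.
DEFAULT_PROBES = ("cloud-init-probe.invalid.",)


def _resolves(name: str) -> bool:
    """True if getaddrinfo() returns at least one address for `name`."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except (socket.gaierror, UnicodeError):
        return False


def check_url(url: str, probes: Iterable[str] = DEFAULT_PROBES) -> Tuple[bool, bool, float]:
    """Return (host_resolvable, dns_redirected, elapsed_seconds) for `url`.

    Every lookup goes through the system resolver, so bogus nameservers in
    /etc/resolv.conf make each one wait for the resolver timeout.
    """
    start = time.monotonic()
    # Build the full list first so every probe runs (any() alone would stop
    # early); a "bogus" name that resolves suggests DNS redirection.
    redirected = any([_resolves(probe) for probe in probes])
    host = urlparse(url).hostname  # e.g. "instance-data." for the 2nd URL
    resolvable = host is not None and _resolves(host)
    return resolvable, redirected, time.monotonic() - start
```

On an affected image, `check_url("http://instance-data.:8773")` would be expected to report an elapsed time in the tens of seconds; on a healthy one it should return almost immediately.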
[16:13] <rharper> https://opendev.org/openstack/kayobe/commit/9c1d085d2e52396d05397afb0f658224bda0087c this is old, but it represents the issue; something related to how the cloud images get built
[16:13] <ubottu> Commit 9c1d085 in openstack/kayobe "Workaround issue in CentOS cloud images with resolv.conf"
[16:14] <rharper> it has happened off and on over the years of building images
[16:15] <rharper> oh, and I see gellertp mentions CentOS 7 now, so likely that. Sometimes it can be triggered if you're customizing an image, "boot it up in a VM", and don't know to clear out certain files. Ubuntu images use /etc/resolv.conf as a symlink into /run, so it's always ephemeral
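A minimal sketch for checking an image for the problem described here, using only the Python standard library. `nameservers` and `describe_resolv_conf` are hypothetical helpers: a symlink (as on Ubuntu, where it points into /run) is ephemeral, while a regular file with leftover `nameserver` lines is the kind of file that can get baked into a customized image.

```python
import os
from typing import List


def nameservers(text: str) -> List[str]:
    """Return the addresses from 'nameserver' lines of resolv.conf text."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Comment lines start with '#' or ';', so their first field never
        # equals "nameserver" and they are skipped along with blank lines.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def describe_resolv_conf(path: str = "/etc/resolv.conf") -> str:
    """Report whether `path` is a symlink or a regular (possibly baked-in) file."""
    if os.path.islink(path):
        return f"{path} -> {os.readlink(path)} (symlink)"
    if not os.path.exists(path):
        return f"{path} missing"
    with open(path, encoding="utf-8") as handle:
        servers = nameservers(handle.read())
    return f"{path} is a regular file; nameservers: {servers or 'none'}"
```

Running `print(describe_resolv_conf())` inside a mounted or booted image before sealing it is one way to spot stale 10.X entries.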
[16:15] <minimal> rharper: why would it try to resolve 169.254.169.254, though?
[16:16] <minimal> i.e. an IP address
[16:16] <rharper> it's part of the url_handler.py logic; I'm not sure we parse the URL with the intent of avoiding IPs
[16:20] <rharper> ah, util.is_resolvable()
[16:21] <rharper> for each of the metadata URLs in the datasource, it checks whether it's resolvable. I don't quite have the history of why we do the resolvable check on the IP, but the second URL in the list of metadata URLs to try is a hostname, and it would fail the same way
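One way to skip a lookup for an IP-literal URL is to test the host with Python's `ipaddress` module first. This is a hypothetical sketch (`needs_dns` is not a cloud-init function), and as noted here it would not help with the second, hostname-based URL, which still has to go through the resolver.

```python
import ipaddress
from urllib.parse import urlparse


def needs_dns(url: str) -> bool:
    """True only if the URL's host is a name rather than an IP literal."""
    host = urlparse(url).hostname  # brackets around IPv6 literals are stripped
    if host is None:
        return False  # nothing to resolve
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # not an IPv4/IPv6 literal, so a lookup is required
    return False
```

With this guard, `http://169.254.169.254` would not touch DNS at all, while `http://instance-data.:8773` still would.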
[16:24] <minimal> I remember the good old days when a domain could NOT begin with a number, so detecting an IP address was easy :-)
[16:24] <rharper> heh
[16:33] <minimal> eeek! Although the RFCs appear to say that a fully numeric domain name such as 1.2.3.4. is not permitted (as a TLD cannot begin with a number), I'm guessing that 1.2.3.4 (no trailing dot) still has to be resolved, since going through the search path it could be the valid name 1.2.3.4.mycompany.com
[16:35] <rharper> the code adds the trailing dot IIRC
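The log line quoted at 15:21 does show `http://instance-data.:8773` with a trailing dot. A hypothetical sketch of that normalization (`absolute_name` is not a cloud-init function), assuming the usual resolver behavior that a name ending in `.` is treated as fully qualified, so the resolv.conf `search` suffixes are not tried:

```python
import ipaddress


def absolute_name(host: str) -> str:
    """Return `host` as a fully qualified name ending in '.'.

    IP literals are returned unchanged, since they are not looked up in DNS.
    """
    try:
        ipaddress.ip_address(host)
        return host
    except ValueError:
        pass  # not an IP literal; treat it as a DNS name
    return host if host.endswith(".") else host + "."
```

For example, `absolute_name("instance-data")` gives `"instance-data."`, which a resolver should not expand to `instance-data.mycompany.com`.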
[17:05] <holmanb> minimal: regarding resolving the URL - https://cloudinit.readthedocs.io/en/latest/topics/datasources/ec2.html
=== mamercad46 is now known as mamercad
[18:09] <akutz> Howdy. I just saw a bug come through on VMware's internal tracker that the cloud-init v21.3 update in Photon is causing SSH key regeneration on existing hosts. Do we know of any issues with 21.3 related to SSH key generation?
[18:09] <rharper> akutz: there was the recent merge of a drop-in conf to prevent a race between cloud-init and sshd-keygen@.service
[18:10] <rharper> if Photon is a RHEL derivative, it may be that race
[18:10] <akutz> Photon is homegrown, but does use dnf.
[18:10] <rharper> https://github.com/canonical/cloud-init/pull/1028
[18:10] <ubottu> Pull 1028 in canonical/cloud-init "Add sshd-keygen disable drop-in conf" [Merged]
[18:11] <rharper> well, in particular if they use an RH-based sshd package, it will have sshd-keygen@.service enabled (unless they disable it in the build)
[18:11] <akutz> Thank you for the quick response, rharper!
[18:13] <akutz> Yep, they have the sshd-keygen service - https://github.com/vmware/photon/blob/3.0/SPECS/openssh/sshd-keygen.service
[18:30] <rharper> cool, hopefully that PR gives them something to check; without it, the best case was that keygen would run first and make keys, cloud-init would delete them and regenerate, and then you're good. Across reboots, we've not seen any issues that I know of
[18:30] <rharper> https://bugs.launchpad.net/bugs/1946644
[18:30] <ubottu> Launchpad bug 1946644 in cloud-init "After restart cloud-init reconfigured the machine hostname and ssh keypairs" [High, Confirmed]
[18:30] <rharper> akutz: ^
[18:30] <rharper> maybe that one, if not the keygen
[18:31] <akutz> Ah, thanks!
[21:15] <akutz> There's nothing in the cloud-init v21.3 upgrade that we're aware of that would cause a "cloud-init cleanup", including wiping out any indication of previous boots, right?

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!