[10:43] Hello
[10:44] [ 13.512947] cloud-init[477]: Cloud-init v. 23.2.1-0ubuntu0~20.04.2 running 'init-local' at Fri, 04 Aug 2023 09:56:19 +0000. Up 13.23 seconds.
[10:44] [ 13.515664] cloud-init[477]: 2023-08-04 09:56:20,232 - url_helper.py[WARNING]: Exception(s) [UrlError('Failed to parse: http://169.254.169.254/latest/api/token'), UrlError('Failed to parse: http://[fd00:ec2::254]/latest/api/token')] during request to http://[fd00:ec2::254]/latest/api/token, raising last exception
[10:44] How can I fix this error?
[10:44] It happens every time I restart my Ubuntu AWS virtual machine.
[10:45] that looks like a perfectly good URL to me
[11:00] Yes, but it seems my vm has problems connecting to it at startup
[11:00] datasource:
[11:00] Ec2:
[11:00] metadata_urls: [ 'http://169.254.169.254:80', 'http://instance-data:8773' ]
[11:00] timeout: 5 # (defaults to 50 seconds)
[11:00] max_wait: 10 # (defaults to 120 seconds)
[11:01] Can I add this to the /etc/cloud/cloud.cfg file to skip this huge default timeout of 120s ?
[11:03] [ 5.461936] cloud-init[869]: 2022-04-09 03:53:54,863 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/latest/api/token' failed [0/120s]: request error [Failed to parse: http://169.254.169.254/latest/api/token]
[11:03] meena
[11:07] why is it failing to parse these? that doesn't make sense
[11:15] [ OK ] Reached target System Time Synchronized.
[11:15] [ OK ] Finished Load AppArmor pro…s managed internally by snapd.
[11:15] [ 6.557320] cloud-init[479]: Cloud-init v. 23.1.2-0ubuntu0~20.04.2 running 'init-local' at Fri, 04 Aug 2023 09:46:41 +0000. Up 6.27 seconds.
[11:15] [ 6.560078] cloud-init[479]: 2023-08-04 09:46:42,205 - url_helper.py[WARNING]: Exception(s) [UrlError('Failed to parse: http://169.254.169.254/latest/api/token'), UrlError('Failed to parse: http://[fd00:ec2::254]/latest/api/token')] during request to http://[fd00:ec2::254]/latest/api/token, raising last exception
[11:15] [ 6.565378] cloud-init[479]: 2023-08-04 09:46:42,205 - url_helper.py[WARNING]: Calling 'None' failed [0/120s]: request error [Failed to parse: http://[fd00:ec2::254]/latest/api/token]
[11:15] [ 7.709959] cloud-init[479]: 2023-08-04 09:46:43,358 - url_helper.py[WARNING]: Exception(s) [UrlError('Failed to parse: http://169.254.169.254/latest/api/token'), UrlError('Failed to parse: http://[fd00:ec2::254]/latest/api/token')] during request to http://[fd00:ec2::254]/latest/api/token, raising last exception
[11:16] The URL is ok, there's a parsing error, but there aren't many details about it
[11:16] Maybe something related to IPv6 ?
[11:17] And it keeps retrying for 2 x 120 seconds
[11:17] BrunoADuarte: please do not paste logs in the channel, use something like pastebin
[11:17] BrunoADuarte: is this running on AWS?
[11:17] ok, sorry
[11:18] yes
[11:19] why are you specifying the urls in the cloud-init config? the EC2 datasource should have the metadata urls builtin
[11:26] do you have disable_ec2_metadata set in your user-data?
[13:03] minimal so earlier you asked if my VM was on AWS. Yes it is... here's the last cloud-init startup log I got from it, the error occurred several times
[13:03] https://pastebin.com/UyN27ytg
[13:04] The first error appears on line #65
[13:09] the fetches for [fd00:ec2::254] will fail as no IPv6 is set up
[13:09] I assume these logs/errors are NOT from the 1st boot after this VM has been created. Is this correct?
[13:11] correct.
[13:11] This is a 3 year old AWS Ubuntu instance that I always upgrade when there are new updates to it.
[13:11] did you notice this in the logs: 2023-08-04 09:48:46,043 - DataSourceEc2.py[WARNING]: IMDS's HTTP endpoint is probably disabled
[13:15] no... I never messed with the IMDS
[13:16] have you tried wget/curl to it from the CLI?
[13:16] no...
[13:17] then try that to see what happens?
[13:22] wget http://169.254.169.254/latest/meta-data/
[13:22] --2023-08-04 13:19:24-- http://169.254.169.254/latest/meta-data/
[13:22] Connecting to 169.254.169.254:80... connected.
[13:22] ....
[13:22] Saving to: ‘index.html’
[13:22] ....
[13:22] 2023-08-04 13:19:24 (45.3 MB/s) - ‘index.html’ saved [320/320]
[13:22] ok, and what's the current IP address of the VM?
[13:23] but it fails for IPv6
[13:23] Connecting to [fd00:ec2::254]:80... failed: Network is unreachable.
[13:24] inet 172.31.72.80 netmask 255.255.240.0 broadcast 172.31.79.255
[13:24] inet6 fe80::1486:10ff:fe06:d17b prefixlen 64 scopeid 0x20
[13:24] is IPv6 actually configured for the VM? I believe not
[13:24] inet6 fe80::1486:10ff:fe06:d17b prefixlen 64 scopeid 0x20
[13:24] ok, that's a link-local IPv6 address, not sure if metadata server is reachable via that
[13:26] on the AWS page there's no IPv6 assigned
[13:26] *aws console page
[13:28] not relevant, cloud-init only uses temp IPv4 (no IPv6) config to contact metadata server, and as cloud-init was unable to contact metadata server (for network config) it therefore only set up the IPv4 fallback config - IPv4 via DHCP, no IPv6 config
[13:28] s/uses/used/
[13:28] line 40
[13:29] the version of cloud-init you're using only uses dhclient/IPv4 (newer releases use dhcpc for both IPv4 & IPv6)
[13:32] can you run an "ip r" to show the routing table
[13:34] default via 172.31.64.1 dev ens5 proto dhcp src 172.31.72.80 metric 100
[13:34] 172.31.64.0/20 dev ens5 proto kernel scope link src 172.31.72.80
[13:34] 172.31.64.1 dev ens5 proto dhcp scope link src 172.31.72.80 metric 100
[13:40] ok, that basically matches the routes cloud-init "manually" added before trying to contact metadata server, lines 54 & 55
[13:41] correct
[13:41] you have the datasource urls defined in /etc/cloud/cloud.cfg or a file in /etc/cloud/cloud.cfg.d/ ?
[13:41] no... the lines are commented
[13:42] datasource:
[13:42] Ec2:
[13:42] metadata_urls: [ 'http://169.254.169.254:80', 'http://instance-data:8773' ]
[13:42] timeout: 5
[13:42] max_wait: 10
[13:42] I thought of adding it like this, to it gives up after 10s
[13:43] *so
[13:43] they don't appear commented out there
[13:44] this is what I want to add...
[13:44] the actual lines are
[13:44] # Ec2:
[13:44] # metadata_urls: [ 'blah.com' ]
[13:44] # timeout: 5 # (defaults to 50 seconds)
[13:45] # max_wait: 10 # (defaults to 120 seconds)
[13:45] *the default lines
[13:47] this is a VPC instance rather than a "Classic" (non-VPC) instance, right?
[13:49] correct
[14:00] the "Failed to parse" error is coming from the urllib python library I think
[14:00] which cloud-init uses
[14:06] yes, from "url_helper.py" it seems
[14:06] I mean it almost seems like urllib is where the issue is. The urls look ok to me from the logs
[14:07] you said you upgraded, could something have gone wrong with that?
[14:12] I'm not sure, the machine works fine and I don't remember any broken updates... maybe I can force the update of this library?
[14:12] TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 'REDACTED'"` && curl -H "
[14:12] X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/
[14:12] this command works perfectly from the cli
[14:13] it's the same request that fails with the url_helper lib
[14:13] yeah, it looks to me like cloud-init never actually requests the url, the "parse" error happens just before then
[14:13] I'm out of ideas, perhaps someone else in channel has some?
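For comparison with the curl command above, here is a minimal Python sketch of the same two IMDSv2 requests, built with the standard library's urllib.request. It only constructs the requests and does not send them, so it can be inspected away from the instance. The function names and the TTL of 60 seconds are illustrative assumptions (the TTL in the pasted command was redacted), and this is not cloud-init's own code.

```python
from urllib.request import Request

IMDS_BASE = "http://169.254.169.254"  # IPv4 metadata endpoint seen in the logs


def build_token_request(ttl_seconds, base=IMDS_BASE):
    """Build (but do not send) the PUT request that asks for a session token."""
    return Request(
        base + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(token, base=IMDS_BASE):
    """Build (but do not send) the GET request that presents the token."""
    return Request(
        base + "/latest/meta-data/",
        headers={"X-aws-ec2-metadata-token": token},
    )


if __name__ == "__main__":
    # 60 is a hypothetical TTL chosen only for illustration.
    req = build_token_request(60)
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

On the instance itself, each request could then be passed to urllib.request.urlopen with a short timeout to see whether the failure reproduces outside cloud-init.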
[14:23] I ran out of ideas hours ago
[14:23] I would start an ipython shell and look at what the url_helper does step by step
[14:24] only do this if you're desperate
[14:38] meena , can I just run "sudo dpkg-reconfigure cloud-init" and leave only "NoCloud" enabled ?
[15:06] BrunoADuarte: if you're in AWS, why would you have anything other than AWS enabled as source?
[16:21] looking at the config schema I see various entries marked as "required: [ "abc" ]" but there is nothing on the docs website to highlight those things that are required
[17:47] meena because the AWS EC2 source isn't working.
[17:51] BrunoADuarte: have you tried deploying a new VM instance to see if that works?
[17:58] It will probably work, otherwise I believe there would be a lot of issues reported, and I could not find anything related. It's a specific problem with my machine. It makes the boot take up to 5 minutes.
[17:59] yeah that was what I was thinking to narrow down, that in general it works and that it is something specific to that VM instance
[18:00] it would also give you something to compare with, i.e. the contents of /etc/cloud/ with a working machine on the same OS version, same user-data etc
[20:04] trying to catch up on BrunoADuarte's conversation here. the first strange thing with the logs is line 11 telling us that DataSourceNone was detected on this machine prior to the most recent boot logs from 23.1.2. I'd be curious what was failing before this boot of 23.1.2 as it may shed light on the problem.
[20:07] The 2nd strange thing is the log "Calling None failed...", which tells us that at the point of failure the url_reader_cb never successfully returned a valid url (due to the urlparse error).. yet urllib.urlparse is happy with that url as is.
[20:07] python3 -c 'from urllib.parse import quote, urlparse, urlunparse; print(urlparse("http://[fd00:ec2::254]/latest/api/token"))'
[20:07] ParseResult(scheme='http', netloc='[fd00:ec2::254]', path='/latest/api/token', params='', query='', fragment='')
[20:25] blackboxsw: yeah it's all very strange
[20:26] BrunoADuarte: if you get a chance to file an issue with cloud-init and attach your full cloud-init logs (obtained via cloud-init collect-logs) plus the output of your configured network (ip r; ip -6 r; ip addr) that'd help debug a bit async on this, and with a bit more data. I'm wondering how the upgrade path could hit this flavor of issue.
[20:28] blackboxsw: he filed #4316 but didn't attach logs to it
[20:28] But, there is obviously something else going on here as the DataSourceEc2 was not previously detected. And, it does look like the `url_helper.dual_stack` implementation may need to better handle exceptions in ipv4 vs ipv6 stacks when errors occur as it looks like it loops indefinitely.
[20:28] ahh thx.
[20:28] IPv6 uses link-local address to reach metadata server?
[20:30] yep it does https://aws.amazon.com/blogs/networking-and-content-delivery/introducing-ipv6-only-subnets-and-ec2-instances/
[20:30] and same with the typical openstack/ec2 ipv4 too . 169.254.169.254
[20:31] theory being that the IMDS is on the private subnet locally accessible to the vm/instance being launched and won't need external routing to get to that addr
[20:32] yup. Of course some other providers in future could implement IPv6 access in a completely different way :-(
[20:33] LL should allow cloud-init to avoid costly ephemeral dhcp client setup in init-local boot stage if we have enough data to understand a unique link local address we can assign pre-network setup.
[20:33] Sure, I'll export the cloud-init logs and attach there.
[20:33] BrunoADuarte: many thanks I was commenting on that issue with that request now.
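The 20:28 concern about retries that never end when both stacks fail can be illustrated with a small sketch. This is not cloud-init's `url_helper.dual_stack` implementation. It is a hypothetical sequential helper (`try_endpoints`, taking a caller-supplied `fetch` function) that tries each endpoint at most once, records the error per endpoint, and raises one summary error instead of looping. `fake_fetch` is a stand-in that imitates the behaviour seen on this VM, where only the IPv4 endpoint is reachable.

```python
from urllib.parse import urlparse

ENDPOINTS = [
    "http://169.254.169.254/latest/api/token",
    "http://[fd00:ec2::254]/latest/api/token",
]


class AllEndpointsFailed(Exception):
    """Raised once every endpoint has been tried and none succeeded."""

    def __init__(self, errors):
        self.errors = errors  # list of (url, exception) pairs
        detail = "; ".join(f"{url}: {exc!r}" for url, exc in errors)
        super().__init__("all endpoints failed: " + detail)


def try_endpoints(fetch, urls):
    """Call fetch(url) for each URL once; return the first success."""
    errors = []
    for url in urls:
        try:
            return url, fetch(url)
        except Exception as exc:  # keep the per-endpoint error for the report
            errors.append((url, exc))
    raise AllEndpointsFailed(errors)


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP call: only IPv4 'answers'."""
    if urlparse(url).hostname == "169.254.169.254":
        return "token"
    raise OSError("Network is unreachable")


if __name__ == "__main__":
    print(try_endpoints(fake_fetch, ENDPOINTS))
```

Because the loop is bounded by the length of the URL list, a failure on both stacks surfaces immediately with both causes attached, rather than being retried until max_wait expires.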
[20:40] blackboxsw: hmm, the IPv6 metadata server address is a ULA address, so wondering if AWS's RA announces a default route to that via the link-local gateway fe80:ec2::1, otherwise how would VM get to it
[20:57] yeah, not certain offhand how that's sorted.
[21:07] blackboxsw: actually I'm wrong, that AWS article doesn't make sense as fe80:ec2::1 is *not* a link-local address (as LL are fe80 followed by 56 "0" bits) and LL addresses are /64 so I'm confused how with LL only a machine can get to outside the LL /64
[21:27] Sorry, I think I got my wires crossed here but EC2 ipv6 IMDS looks like it is **not** a link local address yet the default ipv6 config on aws instances is that systemd.networkd is configured LinkLocalAddressing=ipv6... Which gives typical instances LL ipv6 addresses `$ python3 -c 'from ipaddress import IPv6Address; print(IPv6Address("fe80::14:2aff:fe09:57b9").is_link_local); print(IPv6Address("fd00:ec2::254").is_link_local)'
[21:27] True
[21:27] False
[21:27] `
[21:37] blackboxsw: that was my point, how to get to a ULA metadata server from a LL address without a route? (which would require an RA announcing such a route and the distro to be configured to see RA announcements)
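The address question in the last few messages can be checked with Python's standard ipaddress module. The sketch below is illustrative only, and `classify` is a made-up helper name. Note that `IPv6Address.is_link_local` tests membership in the whole fe80::/10 block, whereas the stricter reading used in the discussion (zero bits after the prefix up to the /64 boundary) corresponds to fe80::/64, so both are shown side by side.

```python
from ipaddress import IPv6Address, IPv6Network

LINK_LOCAL_BLOCK = IPv6Network("fe80::/10")   # what is_link_local tests
LINK_LOCAL_STRICT = IPv6Network("fe80::/64")  # prefix plus zero bits up to /64
UNIQUE_LOCAL = IPv6Network("fc00::/7")        # ULA range (RFC 4193)


def classify(text):
    """Report which of the three ranges an IPv6 address falls into."""
    addr = IPv6Address(text)
    return {
        "link_local_block": addr in LINK_LOCAL_BLOCK,
        "link_local_strict": addr in LINK_LOCAL_STRICT,
        "unique_local": addr in UNIQUE_LOCAL,
    }


if __name__ == "__main__":
    # Addresses taken from the conversation above.
    for text in ("fe80::1486:10ff:fe06:d17b", "fe80:ec2::1", "fd00:ec2::254"):
        print(text, classify(text))
```

Under these definitions the VM's own address is link-local by both tests, fe80:ec2::1 falls inside fe80::/10 but outside fe80::/64, and fd00:ec2::254 is a unique local address, which is why reaching it needs a route rather than on-link delivery.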