[15:54] hi, I'm not familiar with cloud init so I apologize in advance if I'm missing something basic. :) We're currently migrating systems from Amazon Linux 2 to Amazon Linux 2023. However, I've run into a problem where the data payload I define in terraform on the autoscaling group to be passed to cloud init on startup no longer gets created or executed when an instance built on its AMI is brought online. I suspect it's a configuration difference between them wit
[15:58] any pointers appreciated!
[16:16] JoBbZ: not sure what you mean. How exactly do you expect an autoscaling group to relate to cloud-init? An ASG is AWS infrastructure, not anything to do with a VM's OS that cloud-init would configure
[16:40] via https://registry.terraform.io/providers/hashicorp/template/latest/docs/data-sources/cloudinit_config
[16:41] and sorry, I misspoke.. it's part of the launch template
[16:41] not ASG :)
[16:49] (although the launch template is included in the ASG definition, but point being, that's the flow above)
[16:55] JoBbZ: ok, so that's just specifying the user-data to provide to the instance
[16:55] so what exactly is the problem?
[16:56] have you checked the cloud-init logs on a new instance to verify it sees the specified user-data?
[17:03] to check cloud-init user-data on the system in question `sudo cloud-init query userdata`
[17:03] thx minimal for all the discussion fielding :)
[17:07] could do with some help for a testcase I'm writing. The c-i module calls "which" twice but I'm not sure how to check the result of the *2nd* call to "which" (I have cloudinit.subp.which mocked)
[17:10] Doing m_which.return_value = "xyz" matches for the 1st "which" call
[17:10] Doing m_which.return_value = None matches for the 3rd situation (neither "which" call gives any result)
[17:14] blackboxsw: will do that.. ty.
[17:15] minimal: yeah so I don't know a lot about this... I just know that my script 'ansible.cfg' ends up existing on the old systems for cloud init to execute, and doesn't on the new ones, but the terraform side is unchanged and I ran with debug logging in terraform to ensure it's all generated correctly
[17:16] ok, I see a base64 encoded? result on both servers with that query
[17:16] JoBbZ: so either looking at /var/log/cloud-init.log (if debugging is enabled) or running the command that blackboxsw suggested will show you the user-data that cloud-init is using.
[17:20] minimal: yes, the command comes back with encoded and/or encrypted data...
=== NightMonkey_ is now known as NightMonkey
[17:21] I've read through /var/log/cloud-init.log, but I don't see where the user data bit comes into play exactly. Basically the user data is supposed to create an executable called ansible.cfg, that exists in /var/lib/cloud/instance/scripts/ on the old AL2 systems, but it isn't created on the new AL2023 systems
[17:24] ok, grepping for userdata in the log I see
[17:24] what is the content-type of the user-data you are specifying in the terraform file and what are the contents of that user-data given to terraform?
[17:25] [WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata: 'b'H4sIAAA...
[17:25] and that string matches what's coming back from my query about user data
[17:25] and that message does *not* exist on the other system
[17:26] so... guessing that's the issue?
[17:26] can you show the relevant portions of your terraform file?
[17:29] https://dpaste.org/taWaN
[17:30] and I can get the debug logs from terraform one sec
[17:31] JoBbZ: where is the part where you create an instance and set a user_data_base64 value for it?
[17:35] you can see that here in the debug log: https://dpaste.org/u8tmL
[17:35] in the rendered: section
[17:36] The value in the rendered: section is what I get back from the command `sudo cloud-init query userdata`
[17:36] as well
[17:37] JoBbZ: I don't see any reference to an "aws_instance" resource to create a VM and pass it user-data
[17:39] it's part of a launch template... so it gets put into the system when it comes up... regardless, as I noted, the user data that is rendered there is what cloud init is showing on both systems
[17:40] it's just not having any effect on the AL2023 system, and there's the warning about (text/x-not-multipart) userdata not being handled on the AL2023 system
[17:40] which is not the case on the AL2 system
[17:40] and the 2 systems are running which cloud-init version(s)?
[17:41] 19.3-46 amazon2.0.1 on the AL2 system
[17:41] 22.2.2 on the AL2023 system
[17:45] could be a difference in behaviour between those 2 versions
[17:46] looking at current code, text/x-not-multipart is used when "a message is not multipart and it doesn't contain its own content-type"
[17:47] however you have specified "text/x-shellscript" as content-type
[17:50] ok.. I figured it must be a difference between the two releases as well, but not sure how/why
[17:52] JoBbZ: have you tried *not* base64 encoding it to see if that makes any difference?
[17:53] I've not, I can definitely try
[18:01] actually... got something..
[18:02] in /var/lib/cloud/instance/ there is a user-data.txt.i file
[18:02] on the old systems, it's seen as multipart/mixed with a boundary
[18:02] on the new systems, it has that, but then it's followed by Content-Type: text/x-not-multipart
[18:02] hmm, but if you're only providing a single text/x-shellscript then why would it be treated as multipart?
[18:03] so seems to be some sort of issue there with handling the generated MIME data in 22.2.2 that didn't exist in 19.3-46
[18:03] on the old systems I mean
[18:03] no, it's multipart... shell script and a #cloud-config as well
[18:03] so on the old systems, what are the content-types for each of the parts?
[18:04] you can see both parts in this: https://dpaste.org/u8tmL
[18:04] first part is the #cloud-config part, with content type text/cloud-config
[18:04] ah........not sure if that is valid, I thought for user-data you EITHER provided cloud-config OR a shell script, not both
[18:04] second part is the script, with content type text/x-shellscript
[18:04] no, you can combine them
[18:04] that's documented :)
[18:04] it's just not extracting correctly with 22.2.2 for some reason hm
[18:06] https://cloudinit.readthedocs.io/en/latest/explanation/format.html#user-data-formats
[18:06] "User data that will be acted upon by cloud-init must be in one of the following types."
[18:06] at first glance the "in one of" implies ONLY one type
[18:07] nah it says both can be used https://cloudinit.readthedocs.io/en/latest/explanation/format.html#mime-multi-part-archive
[18:07] "For example, both a user data script and a cloud-config type could be specified."
[18:07] which is exactly what I'm doing
[18:07] :)
[18:07] ah, further down, "For example, both a user data script and a cloud-config type could be specified."
[18:08] right, just saw that
[18:08] so something's changed in mime handling between the two releases I guess
[18:08] I'll have to dig into that
[18:09] just wondering tho if you need to specify "text/cloud-config" when using multipart
[18:09] "Begins with: #cloud-config or Content-Type: text/cloud-config when using a MIME archive."
[18:10] ah, you are specifying that
[18:10] yup :/
[18:10] and again works fine with the older cloud init version... everything seems to match the docs
[18:13] hm.. so I just took /var/lib/cloud/instance/user-data.txt
[18:14] base64 -d user-data.txt > file.gz
[18:14] then
[18:14] gunzip file.gz
[18:14] and I get exactly what it is supposed to contain
[18:14] it appears that the difference is that the newer version fails to gunzip it
[18:17] try not gzip-ing the user-data in terraform and see if that changes things?
[18:18] yeah, going to do that next
[18:18] I'm only 4k, well below the 16k gzip requirement
[18:20] I'm not seeing any multipart content-type in your Terraform doc. I'd assume it should be present next to "base64_encode": true and "gzip": true
[18:21] terraform is supposed to do that automatically
[18:22] terraform creates the final mime document. everything on the terraform side is the same between both systems, it's cloud init that's different between them
[18:23] I guess I'm just assuming it would appear in the JSON you provided (which includes stuff like "rendered" value)
[18:23] sure understood :)
[18:23] I'm just wondering if current code is "stricter" in terms of MIME spec
[18:23] maybe, but then I'd expect a lot of people's existing stuff to break
[18:24] I'm on a very recent version of terraform as well (1.4.6)
[18:24] wonder if amazon patched cloud init and broke it, always possible too
[18:24] or redhat
[18:24] (since AL2023 is based on fedora/redhat)
[18:26] so when you did the base64 decode and gunzip do you see a MIME document?
[18:27] yep
[18:27] I see exactly what it should be
[18:29] so it has a multipart/mixed content type?
[18:29] yes
[18:29] I found a bug report for this issue
[18:29] c-i bug?
[18:30] yeah
[18:30] same exact issue I'm seeing
[18:30] what's the number?
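For anyone reproducing the manual check above (`base64 -d`, then `gunzip`, then inspecting the MIME document), the following is a minimal Python sketch of the same steps. It is not cloud-init's own code path. The `SAMPLE` document and the helper names `decode_user_data` and `part_types` are hypothetical stand-ins; the real input would be the contents of /var/lib/cloud/instance/user-data.txt. Note that the `H4sIAAA` prefix quoted in the warning is what a gzip header (bytes 0x1f 0x8b) looks like after base64 encoding.

```python
import base64
import gzip
from email import message_from_bytes


def decode_user_data(raw: bytes) -> bytes:
    """Undo base64 and then gzip, like `base64 -d` followed by `gunzip`."""
    data = base64.b64decode(raw)
    # A gzip stream starts with the magic bytes 0x1f 0x8b.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)
    return data


def part_types(mime_bytes: bytes) -> list[str]:
    """Return the content type of every non-container part of a MIME document."""
    msg = message_from_bytes(mime_bytes)
    return [p.get_content_type() for p in msg.walk() if not p.is_multipart()]


# Hypothetical two-part document standing in for Terraform's rendered output.
SAMPLE = (
    b'Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"\n'
    b"MIME-Version: 1.0\n\n"
    b"--MIMEBOUNDARY\n"
    b"Content-Type: text/cloud-config\n\n"
    b"#cloud-config\npackages: []\n"
    b"--MIMEBOUNDARY\n"
    b"Content-Type: text/x-shellscript\n\n"
    b"#!/bin/sh\necho hello\n"
    b"--MIMEBOUNDARY--\n"
)

# Encode the sample the way the launch template does (gzip, then base64).
ENCODED = base64.b64encode(gzip.compress(SAMPLE))

print(part_types(decode_user_data(ENCODED)))
```

If the decoded document lists both `text/cloud-config` and `text/x-shellscript`, the payload itself is intact and the problem lies in how the consuming side unpacks it.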
[18:31] github.com/canonical/cloud-init/issues/3712
[18:31] -ubottu:#cloud-init- Issue 3712 in canonical/cloud-init "gzipped and base64 encoded user-data leads to failure" [Closed]
[18:31] which is interesting, because it says fixed in 20.3
[18:31] and I'm on 22.2.2
[18:31] but same exact problem
[18:34] I thought you'd tried without base64 and gzip?
[18:34] I'm trying w/o gzip now
[18:34] takes an hour to rebuild
[18:34] so I'll know more in about 40 minutes if removing gzip resolves it
[18:43] good luck!
[18:45] Hello community, what format is the "Fingerprint (sha256)" in cloud-init logs shown e.g. in "Authorized keys from /home/ubuntu/.ssh/authorized_keys for user ubuntu" ? I need to compare that "Fingerprint (sha256)" with fingerprints of my local public-private key pair. How can I compare them?
[18:50] user1001: isn't it in sha256 format? ;-)
[18:55] user1001: you use the relevant ssh tool to display your public key's fingerprint
[18:59] yeah, if I generate from my public key I get "SHA256:SH8HjfHJdpu3MiAEpTuF4YrLNislS6GD+meUh11z44g" or "MD5:2e:00:ca:99:c4:6f:fb:78:2f:26:b4:f9:c4:84:83:78", but in the cloudinit logs I see a much longer fingerprint: "48:7f:07:8d:f1:c9:76:9b:b7:32:20:04:a5:3b:85:e1:8a:cb:36:2b:25:4b:a1:83:fa:67:94:87:5d:73:e3:88" (according to logs should be
[18:59] "Fingerprint (sha256)" )
[19:22] user1001: on which version of cloud-init are you seeing this in the logs? I only see it on the console (but not logs) with 22.2
[19:22] oops, s/22.2/23.2/
[19:29] well just turning off gzip didn't do it... turning off base64 too
[19:29] @minimal Cloud-init v. 22.2-0ubuntu1~22.04.3
[19:30] user1001: ok. I think this behaviour was changed to not log it. Anyway, what exactly is the underlying problem you are having?
[19:35] minimal I am booting a jammy-server-cloudimg-amd64 image in OpenStack, and OpenStack injects a ssh key into "/home/ubuntu/.ssh/authorized_keys", but I cannot make a ssh connection - so I try to verify that the ssh key was injected correctly (already tried everything else)
[19:42] cat /home/ubuntu/.ssh/authorized_keys ?
[19:45] I was off by a month for the summit
[19:45] whew
[19:46] meena: panicking about travel plans? ;-)
[19:49] minimal: i can't travel until my partner is fully licenced. but i would like to prepare stuff
[19:52] it would be nice if that was the case next month… but that's not something i can actually bank on
[19:53] minimal I have no access to the VM, as OpenStack only booted the source image so I can start installing Ubuntu, but I can't, because normal login does not work, and ssh access cannot verify my ssh key...
[20:36] minimal: OK, it works after disabling both base64 encoding and gzip
[20:36] I've filed a bug with cloud-init
[21:49] if someone stumbles on this problem (xkcd.com/979 ;-) solution: the cloud-init log for the authorized_keys file is produced in cc_ssh_authkey_fingerprints.py by the function _gen_fingerprint(), which generates the long "Fingerprint (sha256)" for the public ssh key (the first function argument), so you can generate and compare the fingerprint of
[21:49] your local public ssh key with the fingerprint reported in cloud-init log
=== not_phunyguy is now known as phunyguy
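The two fingerprints quoted earlier in the log appear to be the same SHA-256 digest in two encodings: ssh-keygen prints it as unpadded base64, while the cloud-init log prints it as colon-separated hex. The sketch below converts one form to the other. The helper names are hypothetical and this is not cloud-init's `_gen_fingerprint()` itself; `fingerprint_from_public_key` additionally assumes a plain `type base64-blob comment` key line with no leading options.

```python
import base64
import hashlib


def colon_hex(digest: bytes) -> str:
    """Format raw digest bytes as colon-separated lowercase hex pairs."""
    return ":".join(f"{b:02x}" for b in digest)


def openssh_to_colon_hex(fingerprint: str) -> str:
    """Convert an ssh-keygen style 'SHA256:<base64>' fingerprint to colon hex."""
    b64 = fingerprint.split(":", 1)[1] if fingerprint.startswith("SHA256:") else fingerprint
    # ssh-keygen drops the trailing '=' padding, so restore it before decoding.
    padded = b64 + "=" * (-len(b64) % 4)
    return colon_hex(base64.b64decode(padded))


def fingerprint_from_public_key(line: str) -> str:
    """Hash the decoded key blob (second field) of a public key line.

    Assumption: the line has the form '<type> <base64-blob> [comment]'.
    """
    blob = base64.b64decode(line.split()[1])
    return colon_hex(hashlib.sha256(blob).digest())


# The ssh-keygen fingerprint quoted in the log above.
print(openssh_to_colon_hex("SHA256:SH8HjfHJdpu3MiAEpTuF4YrLNislS6GD+meUh11z44g"))
```

Run on the ssh-keygen value from the log, the conversion reproduces the long colon-separated value reported by cloud-init, so in that case the injected key matched the local key.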