caribou | Hello aciba, I just saw your comment on GH#5523 and added a reply. Let me know if you want to discuss it further | 11:33 |
---|---|---|
aciba | caribou: thanks for the ping, I think it is okay for the moment! | 15:42 |
hwrd | hello, I'm trying to figure out why my cloud-init script didn't fully run on Amazon Linux 2023. It's getting set up via Terraform, and so I can't find the config file on the machine | 17:30 |
hwrd | it's not in /etc/cloud/cloud.cfg.d/, nor does /var/lib/cloud/instance/user-data.txt exit | 17:31 |
hwrd | exist* | 17:31 |
hwrd | I'm able to `ssh` into the machine, which means part of my cloudinit script ran (to create my user and set authorized_keys), but the end of the script didn't create directories I expected | 17:32 |
hwrd | looks like it's just my runcmd that's not working | 18:01 |
blackboxsw | @hwrd sorry for the delay here. You can check quickly if cloudinit has errors given one of the following commands: `cloud-init status --format-yaml` (will show you warnings or errors with potential user-data or scripts). And `sudo cloud-init query userdata` (to see the user-data that cloud-init was provided by your cloud at launch). | 19:43 |
blackboxsw | @hwrd and finally: `sudo cloud-init schema --system --annotate` (to tell you about invalid user-data schema) | 19:43 |
blackboxsw | hwrd: If there were errors in your runcmd script, some of the stdout stderr may be redirected to /var/log/cloud-init-output.log so watch for script errors there | 19:43 |
blackboxsw | typo correction: `cloud-init status --format=yaml` | 19:44 |
hwrd | yeah so this `sudo cloud-init schema --system --annotate` tells me that `var/lib/cloud/instance/user-data.txt` doesn't exist, which is true. | 19:54 |
hwrd | and `cloud-init status --format=yaml` tells me `/usr/bin/cloud-init: error: unrecognized arguments: --format=yaml` | 19:55 |
hwrd | without `--format=yaml` it just says `status: done` | 19:55 |
hwrd | I think it may be this https://www.virtualthoughts.co.uk/2023/01/18/debugging-cloud-init-not-executing-runcmd-commands/ | 19:55 |
hwrd | I've rebuilt my packer image, about to stand up an instance now. | 19:56 |
hwrd | what's puzzling to me is, I don't know where my config is on the machine, if it's not at `/var/lib/cloud/instance/user-data.txt`. In fact `/var/lib/cloud/instance` doesn't even exist | 19:57 |
hwrd | `sudo cloud-init query userdata` returns empty blackboxsw | 20:03 |
hwrd | `sudo cloud-init schema --system --annotate` tells me `FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/cloud/instance/user-data.txt'` | 20:04 |
hwrd | right, so cloud-init-config.service wasn't the issue. | 20:05 |
hwrd | `/var/log/cloud-init-output.log` doesn't have any of my custom cloudinit config. the only thing that stands out is, `2024-07-19 19:58:07,522 - schema.py[WARNING]: Invalid cloud-config provided: Please run 'sudo cloud-init schema --system' to see the schema errors.`, which as I mentioned, just tells me that `FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/cloud/instance/user-data.txt'` | 20:07 |
hwrd | but, it looks like my cloudinit IS running, as all the files are there... but nothing in `runcmd` is running... but `bootcmd` is. | 20:07 |
minimal | hwrd: which DataSource are you using? | 20:09 |
hwrd | minimal: what does DataSource mean? | 20:09 |
blackboxsw | given that you have no runcmd userdata provided to the current launch of the machine, cloud-init optimizes and ignores certain modules because no current user-data has a runcmd key that requires cloud-init to interact | 20:09 |
blackboxsw | datasource == target cloud platform | 20:09 |
minimal | where cloud-init gets meta-data/network-config/user-data from | 20:09 |
hwrd | AWS. I'm upgrading our AMI from Amazon Linux 2 to Amazon Linux 2023 | 20:09 |
hwrd | no code has changed for userdata | 20:10 |
minimal | AMI? ok, so likely using Ec2 DataSource | 20:10 |
minimal | assuming you're running in AWS rather than running Amazon Linux on a local hypervisor | 20:10 |
hwrd | probably. I'm doing it thru https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance#user_data | 20:11 |
hwrd | this exact same user_data works on Amazon Linux 2... but doesn't seem to on Amazon Linux 2023 | 20:12 |
hwrd | it looks like everything besides `runcmd` is being run... but I don't see any traces of anything being run in the logs | 20:12 |
blackboxsw | hwrd: so the reason you see bootcmd run but not runcmd is because bootcmd module is special and runs PER_ALWAYS, runcmd only runs when user-data containing a `runcmd:` key is presented to the instance at first launch. When cloud-init doesn't see certain top-level config keys in user-data it'll skip running that module because there's nothing to do and you see logs like the following: | 20:12 |
blackboxsw | 2024-07-18 12:11:17,344 - modules.py[INFO]: Skipping modules 'wireguard,snap,ubuntu_autoinstall,keyboard,apt_pipelining,ubuntu_pro,ntp,timezone,disable_ec2_metadata,runcmd' because no applicable config is provided. | 20:12 |
minimal | can you clarify if you're running in AWS rather than running Amazon Linux on a local hypervisor? | 20:13 |
hwrd | blackboxsw my user-data does contain `runcmd` though. | 20:13 |
hwrd | minimal: yes, running in AWS | 20:13 |
blackboxsw | hwrd: I'm confused as you also said above "`sudo cloud-init query userdata` returns empty blackboxsw" which on the target system shows there was no user-data provided to the VM launched in ec2 | 20:14 |
hwrd | blackboxsw that's confusing to me too | 20:14 |
hwrd | because | 20:14 |
blackboxsw | I think you are providing user-data to terraform, builds, but not user-data to the target platform you are deploying? | 20:15 |
minimal | "this exact same user_data works on Amazon Linux 2... but doesn't seem to on Amazon Linux 2023" - from memory AL2 contained an old and heavily modified (by AWS) cloud-init whereas AL2023 is "closer" (i.e. less modified) to mainstream cloud-init | 20:15 |
blackboxsw | ahh, hrm | 20:15 |
hwrd | blackboxsw it's getting there somehow, but not in the normal locations... | 20:15 |
hwrd | i.e, it's not at /var/lib/cloud/instance | 20:16 |
hwrd | for example, I have `write_files: - path: /usr/local/sbin/ec2-hostname.sh` | 20:16 |
hwrd | and that file exists on the instance | 20:16 |
hwrd | and all my `users:` are there too | 20:17 |
hwrd | but I can't find where the userdata file is on the machine. | 20:17 |
hwrd | might userdata be somewhere besides /var/lib/cloud/instance/user-data.txt? | 20:18 |
hwrd | minimal are you sure that's not the other way around? cuz I can't find anything in the canonical locations on AL2023 | 20:18 |
minimal | cloud-init.log should have an entry like "util.py[DEBUG]: Writing to /var/lib/cloud/instances/< uuid >/user-data.txt" | 20:20 |
hwrd | lemme check | 20:20 |
minimal | as /var/lib/cloud/instance is generally a softlink to a /var/lib/cloud/instances/< uuid > directory | 20:21 |
hwrd | minimal: nothing https://gist.github.com/hahuang65/53cd8a9531b5110b84a8524f6cc6e149 | 20:22 |
hwrd | weird | 20:23 |
hwrd | so /var/lib/cloud/instances/i-051c0d43e54061fd9/ only has a directory `sem` | 20:23 |
minimal | DataSourceNone... | 20:23 |
minimal | rather than Ec2 | 20:23 |
minimal | "Used fallback datasource" | 20:24 |
hwrd | what would be fallback datasource? | 20:24 |
hwrd | from my perspective, the userdata is DEFINITELY running | 20:25 |
hwrd | `+ '[' '!' -f /root/cloud-init-ebs-mounted ']'` this line, is from my userdata | 20:25 |
hwrd | all my scripts and files are in place. Users and their `.ssh/authorized_keys` are all there | 20:26 |
blackboxsw | whoa, cloud-init 22.2.2 that is ooooold. | 20:26 |
hwrd | blackboxsw maybe I need to update it. weird that's what AL2023 ships with. | 20:26 |
blackboxsw | ok so there will definitely be some feature differences in this image when compared to tip of main. | 20:26 |
hwrd | https://docs.aws.amazon.com/linux/al2023/ug/al2023-ami-kvm-image.html | 20:27 |
blackboxsw | yeah different distro downstreams grab upstream releases of cloud-init at different paces. And some of those downstreams have custom patches that may prevent you from grabbing latest | 20:27 |
minimal | blackboxsw: it is AL 2023 though ;-) | 20:27 |
hwrd | 22.2.2.. that's what it ships with. | 20:27 |
blackboxsw | right :) | 20:27 |
minimal | 23.2.2 is from July 2023 | 20:27 |
blackboxsw | 22 is a year prior, but yeah close enough. the version isn't really the problem here it's something in how the config is being handled. and that fallback datasource indicated it couldn't connect to Ec2 properly | 20:28 |
hwrd | lol so, Cloud-init v. 19.3-46.amzn2.0.1 works with my userdata | 20:28 |
blackboxsw | and fell back to a basic/default config | 20:28 |
minimal | I'm assuming that the c-i version in AL2 didn't have the same level of schema validation | 20:29 |
hwrd | blackboxsw that's what it looks like from the logs... but it is DEFINITELY running my userdata | 20:29 |
hwrd | minimal I don't think the schema validation is the issue. it's not even finding the user data file to validate | 20:29 |
minimal | hwrd: so how then is it running your bootcmd (in the user-data) if it cannot find the user-data? | 20:30 |
blackboxsw | minimal: bootcmd runs always on every boot regardless of user-data present | 20:30 |
hwrd | minimal: no idea... but shouldn't schema validation fail the entire file? | 20:30 |
hwrd | my question right now is... where the hell is my user-data.txt file | 20:30 |
minimal | blackboxsw: I thought the bootcmd was running the specific bootcmd values specified in his user-data | 20:31 |
hwrd | `cloud-init` command expects it to be at `/var/lib/cloud` | 20:31 |
blackboxsw | ahh sorry missed that part | 20:31 |
hwrd | minimal blackboxsw yes, bootcmd is running the specific bootcmd values in my user-data | 20:31 |
minimal | hwrd: so did you check inside /var/lib/cloud/instances/ to see if uuid dir exists? | 20:31 |
hwrd | yeah, lemme send that over | 20:31 |
blackboxsw | hwrd: probably best if you can paste your full cloud-init log somewhere. Double check it doesn't have passwords exposed in logs | 20:32 |
blackboxsw | https://paste.opendev.org/ or something | 20:32 |
hwrd | blackboxsw I did. it's here https://gist.github.com/hahuang65/53cd8a9531b5110b84a8524f6cc6e149 | 20:33 |
blackboxsw | hwrd: I thought that was a snippet. your paste starts with `Cloud-init v. 22.2.2 running 'init' at Fri, 19 Jul 2024 19:01:56 +0000. Up 8.67 seconds.` which is the second boot stage of cloud-init | 20:33 |
minimal | blackboxsw: looks like the cloud-init doesn't have debug enabled | 20:34 |
hwrd | oh that's `cloud-init-output.log` | 20:34 |
hwrd | do you want `cloud-init.log`? | 20:34 |
minimal | ah, yes that's what we're after | 20:34 |
hwrd | k | 20:34 |
blackboxsw | correct. `cloud-init analyze show` should also hopefully confirm that your env has run 4 separate boot stages | 20:34 |
blackboxsw | if any of those 4 stages is skipped, cloud-init won't run all your config, or won't properly detect the datasource(ec2) | 20:35 |
hwrd | https://gist.github.com/hahuang65/84aea524581f5dd5f6787d4ead755197 | 20:35 |
minimal | blackboxsw: <joke> don't you have some Windows boxes to fix? ;-) | 20:35 |
blackboxsw | tell me about it | 20:35 |
blackboxsw | #pay_no_attention_to_the_reviewer :) | 20:35 |
hwrd | here's the analyze https://gist.github.com/hahuang65/0f89e91f528717ee10da7c995b8e8db1 | 20:36 |
blackboxsw | hwrd: again that paste starts with only `Cloud-init running 'init'` which makes me think we are skipping early detection stage. though I do see Cloud-init v. 22.2.2 running 'modules:config' and `final` and minimal's comment of not being in DEBUG log levels hurts us here as we really can't see much at all | 20:37 |
hwrd | bizarre... idk how to get those earlier stages | 20:38 |
blackboxsw | hwrd: are you sure that's /var/log/cloud-init.log ???? that really looks like cloud-init-output.log to me | 20:38 |
minimal | yeah he mentioned that, am waiting for cloud-init.log | 20:38 |
hwrd | https://gist.github.com/hahuang65/84aea524581f5dd5f6787d4ead755197 this should be `cloud-init.log` | 20:39 |
minimal | also "analyze" output mentions DataSourceEc2, not DataSourceNone | 20:39 |
hwrd | `sudo head /var/log/cloud-init.log` `2024-07-19 19:01:56,185 - util.py[DEBUG]: Cloud-init v. 22.2.2 running 'init' at Fri, 19 Jul 2024 19:01:56 +0000. Up 8.67 seconds.` `2024-07-19 19:01:56,186 - main.py[DEBUG]: No kernel command line url found.` `2024-07-19 19:01:56,186 - main.py[DEBUG]: Closing stdin.` | 20:40 |
blackboxsw | analyze also mentions runcmd too. | 20:40 |
hwrd | pretty bizarre. | 20:40 |
blackboxsw | 2024-07-19 19:01:58,686 - cc_runcmd.py[DEBUG]: Skipping module named runcmd, no 'runcmd' key in configuration | 20:41 |
blackboxsw | yet user-data confirmed empty `2024-07-19 19:01:56,436 - url_helper.py[DEBUG]: Read from http://169.254.169.254:80/2021-03-23/user-data (200, 0b) after 1 attempts` | 20:42 |
blackboxsw | 0b | 20:42 |
hwrd | but what configuration is it reading | 20:42 |
hwrd | if it's doing all the other stuff in my user-data | 20:42 |
blackboxsw | yeah strange | 20:43 |
hwrd | so /var/lib/cloud/instances/ has i-051c0d43e54061fd9 iid-datasource-none | 20:43 |
hwrd | both only have `sem` directories. no `user-data.txt` | 20:43 |
minimal | blackboxsw: wasn't there an issue in the past to do with user-data from some metadata servers (multi-part? compressed?) where it wasn't recognised | 20:45 |
hwrd | that sounds like it COULD be it | 20:46 |
hwrd | or even this https://github.com/amazonlinux/amazon-linux-2023/issues/401 | 20:46 |
-ubottu:#cloud-init- | Issue 401 in amazonlinux/amazon-linux-2023 "[Bug] - Custom cloud init hack for userdata is broken for AL2023" [Closed] | 20:46 |
minimal | c-i 23.3 "Ec2: support decoding double base64 encoded user-data" | 20:47 |
hwrd | yeh in Terraform, I've got ` base64_encode = true` | 20:47 |
hwrd | lemme try without it | 20:48 |
minimal | #4276 | 20:48 |
minimal | hwrd: that's the c-i fix for amazonlinux 401 | 20:49 |
hwrd | yeah I'm using the absolute latest amazonlinux 2023 | 20:49 |
hwrd | but worth a shot without base64 | 20:49 |
hwrd | this is the last thing I can try for now. gotta go make dinner for the little ones. | 20:51 |
blackboxsw | ahh right, forgot about that PR/condition | 20:51 |
hwrd | looks to be the same issue. | 20:52 |
hwrd | dammit | 20:52 |
hwrd | yeh still `2024-07-19 19:01:56,436 - url_helper.py[DEBUG]: Read from http://169.254.169.254:80/2021-03-23/user-data (200, 0b) after 1 attempts` | 20:54 |
blackboxsw | yeah, I need to disappear too. thx minimal for the recall there on double compression which would have affected cloud-init 22.2 in that AL image being launched . since I'm not as familiar with terraform, nor amazonlinux setup, this makes things a bit more challenging to reason about. I'm still bugged by lack of an 'init-local' boot stage there too I would expect to see an init service trying to run that stage | 20:54 |
blackboxsw | one other thing in logs. `2024-07-19 19:01:56,853 - util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-008c9917e10c667dc/user-data.txt.i - wb: [600] 308 bytes` | 20:54 |
hwrd | blackboxsw minimal thanks so much for your help so far. I'll be back to poke at it | 20:55 |
blackboxsw | I'm seeing non-zero user-data.txt.i file written. I'm curious what that userdata first line is | 20:55 |
minimal | hwrd: that "0b" (but 200 HTTP status code) seems to mean the user-data is just not there on the AWS's metadata server | 20:55 |
hwrd | hrm | 20:55 |
blackboxsw | if it's not `#cloud-config` cloud-init would ignore it | 20:55 |
blackboxsw | and when processing it takes user-data.txt.i and writes out an empty user-data.txt file in that directory | 20:56 |
hwrd | `sudo cat /var/lib/cloud/instances/i-008c9917e10c667dc/user-data.txt` `cat: /var/lib/cloud/instances/i-008c9917e10c667dc/user-data.txt: No such file or directory` | 20:56 |
hwrd | doesnt exit | 20:56 |
hwrd | exist* | 20:56 |
blackboxsw | wth :/ | 20:56 |
hwrd | lol right? | 20:56 |
minimal | blackboxsw: I assume the "0b" actually means "zero bytes retrieved" rather than "zero *valid* bytes retrieved" | 20:56 |
blackboxsw | -> stepping away (not to be confused with a rage quit ;) | 20:56 |
blackboxsw | minimal: correct that's num bytes read/written | 20:56 |
hwrd | blackboxsw :) | 20:57 |
hwrd | I'll be back. thanks again guys | 20:57 |
minimal | so if the metadata server doesn't provide any user-data to cloud-init then I'm not sure how it's a cloud-init problem | 20:57 |
blackboxsw | so it got non-empty, did something with it and didn't like format so userdata processing wrote out an empty user-data.txt file in that particular case | 20:57 |
blackboxsw | minimal: yes, or if the user-data provided by packer wasn't the right format for some reason and cloud-init silently ignored it, that's the only potential prob I see for cloud-init. So if possible can hwrd paste later a clean/safe version of the original user-data provided? | 20:58 |
blackboxsw | or on the target system with cloud-init run `cloud-init schema -c <your_user_data> --annotate` | 20:59 |
blackboxsw | it should tell you if the raw user-data being provided is bogus too for some reason (though cloud-init 22.2 may not have great schema support for --annotate) | 20:59 |
blackboxsw | <- steps away | 21:00 |
minimal | or was the user-data not in place on the IMDS in time? At 19:01:56 it is 0 bytes retrieved from IMDS, at 19:58:06 (2nd boot) it is 5326 bytes! | 21:00 |
=== blackboxsw is now known as blackboxsw_away | ||
minimal | some async behaviour regarding VM creation and IMDS data population? | 21:00 |
hwrd | but for cloud-init schema -c <your_user_data> --annotate... I dont know what <your_user_data> is.. that file doesnt exit | 21:02 |
hwrd | exist | 21:02 |
blackboxsw_away | hwrd: I mean cut-n-paste it into a file on a system with cloud-init installed | 21:03 |
blackboxsw_away | cloud-init -c my-file.yaml --annotate | 21:03 |
blackboxsw_away | cloud-init schema -c my-file.yaml --annotate | 21:03 |
hwrd | ah... hrm... I'm actually not sure how terraform stitches all the parts together. I need to figure out if Terraform will output the entire file for me. | 21:03 |
hwrd | or I can steal it from another machine. | 21:03 |
blackboxsw_away | or `lxc launch ubuntu-daily:noble test-n` to launch an ubuntu system which will have newer cloud-init installed | 21:04 |
blackboxsw_away | and could run cmds there with your file (if the AL virtual machine doesn't give you SSH) | 21:04 |
hwrd | ah it's base64 on this other system. I'll be back tonight to untangle more | 21:07 |
hwrd | wait no... it's not... ...why is `user-data.txt` some binary format? | 21:08 |
minimal | the main question is why is user-data present on IMDS at 19:58:06 but not at 19:01:56 ? | 21:09 |
blackboxsw_away | yep | 21:14 |
blackboxsw_away | and whatever that fmt is it wasn't digested by cloud-init 22.2 (because PR 4276 landed in cloud-init 23.3 I think) to deal with that double compressed binary | 21:14 |
hwrd | then the double compressed isn't the issue then. this user-data was running on version 19 (Amazon Linux 2) | 21:19 |
hwrd | minimal: it seems to have read it from IMDS the second go around... but didn't save it anywhere. | 21:20 |
minimal | hwrd: well generally the "important" cloud-init stuff happens on 1st boot | 21:55 |
minimal | so user-data not being available then is not good | 21:55 |
hwrd | hrm I gotta figure out why then | 21:59 |
hwrd | but actually if I curl it after... it's still empty. wonder why | 21:59 |
minimal | don't you need a "security" token to curl it? | 22:04 |
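
On the closing question: when an instance enforces IMDSv2, a plain request to the metadata service is rejected unless it carries a session token, which may explain an "empty" result from a bare `curl`. The following is a minimal Python sketch, not something posted in the channel, of fetching user-data with a token and then labelling what came back (empty, `#cloud-config`, gzip, or one or more base64 layers, the cases raised in the discussion). The helper names `token_request`, `user_data_request`, `fetch_user_data` and `describe` are hypothetical, and the `/latest/` path is an assumption; the log above shows cloud-init using a dated API version instead.

```python
import base64
import binascii
import gzip
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254"  # link-local metadata address seen in the log
GZIP_MAGIC = b"\x1f\x8b"


def token_request(ttl_seconds: int = 60) -> urllib.request.Request:
    """Build the IMDSv2 session-token request (a PUT carrying a TTL header)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def user_data_request(token: str) -> urllib.request.Request:
    """Build the user-data GET that presents the session token."""
    return urllib.request.Request(
        f"{IMDS}/latest/user-data",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_user_data(timeout: float = 2.0) -> bytes:
    """Fetch raw user-data from inside an EC2 instance (needs network access)."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    try:
        with urllib.request.urlopen(user_data_request(token), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: treat "no user-data set" as empty
            return b""
        raise


def describe(payload: bytes, max_layers: int = 4) -> str:
    """Peel gzip/base64 layers and report what the payload looks like."""
    layers = []
    data = payload
    for _ in range(max_layers):
        if not data.strip():
            return " -> ".join(layers + ["empty"])
        if data[:2] == GZIP_MAGIC:
            data = gzip.decompress(data)
            layers.append("gzip")
            continue
        if data.startswith(b"#cloud-config"):
            return " -> ".join(layers + ["cloud-config"])
        if data.startswith(b"#!"):
            return " -> ".join(layers + ["script"])
        try:
            # Drop line wraps before strict decoding.
            data = base64.b64decode(b"".join(data.split()), validate=True)
        except (binascii.Error, ValueError):
            return " -> ".join(layers + ["unrecognized"])
        layers.append("base64")
    return " -> ".join(layers + ["gave up"])
```

Run on the instance, `print(describe(fetch_user_data()))` would give a label such as `base64 -> base64 -> cloud-config` for a doubly encoded payload, or `empty` when the metadata service really has nothing to hand out. The classifier only inspects leading bytes, so it is a rough triage aid rather than a reimplementation of cloud-init's own user-data handling.
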