/srv/irclogs.ubuntu.com/2024/07/22/#cloud-init.txt

=== blackboxsw_away is now known as blackboxsw
hwrdwelp... back at it... I kicked up a brand new instance and it looks like it grabbed the user-data first try, but it's still ignoring the runcmd https://gist.github.com/hahuang65/a9b042587c33709b330d8841ecbe6e4b 16:58
hwrdblackboxsw minimal don't know if you guys are around, but I trawled thru the above log ^, and shortened it to what I thought were the relevant bits for user-data https://gist.github.com/hahuang65/4ea7f1b36930131ed4c89631c48f5015 17:13
hwrdlooks like it SHOULD be working properly...17:13
minimal"Skipping module named runcmd, no 'runcmd' key in configuration"17:14
hwrdBUT also.. it says it writes the user-data... but17:14
hwrdcat: /var/lib/cloud/instances/i-0492894c085df3346/user-data.txt: No such file or directory17:14
hwrd>_>17:14
minimalwhat is the contents of your user-data?17:15
hwrdoh right... so how do I get that, from an instance that works? I see on this other instance that the `/var/lib/cloud/instances/<id>/user-data.txt` is some encoded format17:16
hwrdotherwise, I have to get it out of terraform, which stitches a bunch of yaml files together.17:16
minimalfrom however you "launched" the Vm and provided user-data at launch time17:16
hwrdminimal these are the 3 parts that Terraform merges together into my user-data https://gist.github.com/hahuang65/3a3ddf7a378ce854bab8b3546aa71c5f 17:22
hwrdI can't easily get to the end result... but I know this is working on Amazon Linux 2, with cloudinit v19.x.x17:22
minimalit is not just "runcmd", there is no sign of it creating the "hhhuang" user either17:27
hwrdminimal yes but...17:28
hwrd[hhhuang@ip-172-16-52-97 ~]$ whoamihhhuang17:28
minimalor doing write_files17:28
hwrdwhoops, newline didn't paste17:28
hwrdbut the user IS there... and so are all the writefiles.17:28
hwrdfor example17:29
minimalhang on, you have "users:" in base.yaml, rails.yaml, and users.yaml - are you sure this is merged correctly?17:29
minimaland merged by "what"? terraform?17:29
hwrdmerged by terraform17:29
minimalwell the logfile does NOT show the users_groups (which creates users specified in cloud-config user-data) module being called17:30
hwrdminimal https://registry.terraform.io/providers/hashicorp/cloudinit/2.2.0/docs/data-sources/cloudinit_config 17:30
hwrdminimal yeah, I understand. so something is jacked. the log doesn't seem to show that user-data exists... but something is running the user-data... but I also can't find a trace of the user-data on the filesystem.17:31
hwrdI do have a machine that I have the `user-data.txt` as cloudinit sees it and writes it down... but it's encoded, and I'm not sure how to decode it.17:32
hwrdbut my gut tells me I need to figure out why this new machine isnt logging the user-data, but is running parts of it any wy.17:32
hwrdanyway*17:33
minimalencoded? /var/lib/cloud/instance/user-data.txt should show the actual user-data used by cloud-init, not any "encoded" version of it AFAIK17:33
hwrdif I do `sudo head /var/lib/cloud/instance/user-data.txt`17:35
hwrdI get this17:35
hwrdminimal https://imgur.com/a/WgttfMf 17:35
minimalI don't tend to use imgur as it wants to load lots of JS from 101 "random" places...17:39
hwrdsorry, any good place to upload a screenshot minimal17:43
minimaldunno17:45
minimalanyway looking at /var/lib/cloud/instance/user-data.txt on a VM here it starts with "#cloud-config" as that is exactly what the user-data provided to cloud-init started with17:46
hwrdoh, is it because this is gzipped and base64'd17:46
minimalwhereas the user-data.txt.i file begins with "Content-Type: multipart/mixed"17:47
hwrdah, yeah, I have to `mv user-data.txt{,.gz}` and then `gunzip user-data.txt.gz` and now I can read it17:49
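The unwrapping hwrd describes here (base64 and/or gzip around the user-data) can be sketched in Python. This is only an illustration: `decode_user_data` is a hypothetical helper, not part of cloud-init, and it assumes the file holds at most one base64 layer around at most one gzip layer.

```python
import base64
import binascii
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_user_data(raw: bytes) -> bytes:
    """Peel off an optional base64 layer, then an optional gzip layer.

    Hypothetical helper for inspecting a user-data.txt by hand.
    """
    data = raw
    try:
        # Drop whitespace so line-wrapped base64 still validates.
        data = base64.b64decode(b"".join(raw.split()), validate=True)
    except (binascii.Error, ValueError):
        data = raw  # not base64, keep the bytes unchanged
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data
```

Plain cloud-config passes through untouched because `#` and `:` are not in the base64 alphabet. A short, purely alphanumeric payload could be mistaken for base64, so treat the result as a hint and not as proof of how the data was encoded.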
hwrdlemme paste that17:49
minimalso what does the user-data.txt.i file contain?17:52
hwrdminimal https://gist.github.com/hahuang65/03fa989b8dbfb527c5552b995c87841c < that's user-data.txt17:53
hwrdlemme get the .i17:53
hwrdoh the .i is pretty similar. do you want the paste17:54
hwrdI had to remove some sensitive data/sections out of the file17:54
minimalif you notice the user-data.txt is not "plain", it is a 3-part document with each of your original YAML files unmerged17:56
minimalthis looks more like what I'd expect for a user-data.txt.i file17:56
minimalperhaps blackboxsw has some ideas...17:56
hwrdwould that cause cloudinit to not write the file down in the first place?17:56
hwrdcuz on the busted instance, there's no `/var/lib/cloud/instance` directory at all17:57
minimalI notice each of the 3 parts has Content-Type: text/plain, whereas when I look at a (single part) user-data.txt file here it has Content-type: text/cloud-config17:58
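To compare the parts the way minimal does here, a multipart user-data document can be walked with Python's standard `email` package. This is a rough sketch: `describe_parts` and `SAMPLE` are made up for illustration, and the sample is not hwrd's actual user-data.

```python
from email import message_from_string

# Illustrative two-part document: one part labelled text/plain,
# one labelled text/cloud-config.
SAMPLE = (
    'Content-Type: multipart/mixed; boundary="BOUND"\n'
    "MIME-Version: 1.0\n"
    "\n"
    "--BOUND\n"
    "Content-Type: text/plain\n"
    "\n"
    "#cloud-config\n"
    "runcmd:\n"
    "  - echo hi\n"
    "--BOUND\n"
    "Content-Type: text/cloud-config\n"
    "\n"
    "#cloud-config\n"
    "users: []\n"
    "--BOUND--\n"
)


def describe_parts(user_data: str) -> list[tuple[str, str]]:
    """Return (content type, first payload line) for each leaf MIME part."""
    msg = message_from_string(user_data)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart/mixed container itself
        payload = part.get_payload(decode=True) or b""
        lines = payload.decode("utf-8", errors="replace").lstrip().splitlines()
        parts.append((part.get_content_type(), lines[0] if lines else ""))
    return parts


if __name__ == "__main__":
    for ctype, first_line in describe_parts(SAMPLE):
        print(ctype, "|", first_line)
```

Running this against the decoded user-data.txt from the working instance would show at a glance which parts are labelled text/plain and whether each one starts with `#cloud-config`.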
hwrdhrmmm.17:59
hwrdI WONDER if I need to update the terraform provider.17:59
minimalI have not to-date used user-data merging so can't help regarding that18:01
hwrdhrm, I'm guessing this isnt it. I went from 2.3.2 to 2.3.4 18:02
hwrdwow someone is having the same issue as I am https://stackoverflow.com/questions/78769521/cloud-init-fails-on-amazon-linux-2023-but-works-on-amazonlinux2 18:05
minimalthat sounds like it may be related to the issue I mentioned the other day18:10
hwrdI'm still confused why nothing shows up in the logs, and yet parts of my config are being run.18:20
hwrdhrm... why does `cloud-init.log` have entries from multiple dates... if I'm tearing down the EC2 instance in between runs?18:29
hwrdminimal you think it's worth trying that user-data.txt file from the working instance on the new instance?18:36
minimalthe file you provided only has entries for 22nd July18:36
hwrdminimal I cut out the above dates18:36
hwrdthere were more lines above18:36
minimalthat makes no sense at all, unless the AMI already had a non-empty cloud-init.log file18:37
hwrdoh yeah that's gotta be what it is. Packer fires up a new instance to make the AMI18:38
minimalPacker? you mean this is NOT an official AWS AmazonLinux3 AMI?18:38
hwrdhah, I guess I should have mentioned that. Yeah it's NOT an official AMI18:39
minimalwell then all bets are off18:39
hwrdI should try with the official one.18:39
minimalPacker modifies a *running* VM18:39
minimalcloud-init is designed to run on 1st boot of a VM18:39
minimalcorrection, to do *most* of its work on 1st boot18:40
blackboxswbah, sorry I thought you said terraform to start? ok so w/ packer you've created a custom  AMI that happened to run your user creation or user-data on first boot during the packer AMI image build. So, that's why your user exists, then the way you are trying to redeploy that dirty AMI (which had run cloud-init once already) through terraform is not providing the user-data somehow to the ec2 instance via the terraform launch?18:41
minimalso if you're using Packer then you need to ensure that you tidy up the VM to remove any cloud-init "state" before saving it as an AMI for later use18:41
hwrdblackboxsw nope, sorry for the confusion. Everything I sent you guys was post-AMI-creation. I'm not (intentionally) running any cloudinit when I build the packer AMI.18:43
blackboxswif you are creating a golden image via some tooling that you wish to reboot and have cloud-init use that AMI and run as if it was first boot, you'd need to run `sudo cloud-init clean --logs --machine-id`  in that AMI before snapshotting that AMI. You'd probably also likely want to remove the custom user you created too unless that's an artifact you want in all VMs launched with that AMI18:43
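Before snapshotting, it may help to confirm that no cloud-init state is left behind. The sketch below is a hypothetical check, not a cloud-init feature: `leftover_cloud_init_state` is an invented name, the first two paths come from this discussion, and `/var/log/cloud-init.log` is assumed to be the log location.

```python
from pathlib import Path

# Paths relative to the image root. The log path is an assumption
# (the usual default); the other two appear earlier in this log.
STATE_PATHS = (
    "var/lib/cloud/instance",
    "var/lib/cloud/instances",
    "var/log/cloud-init.log",
)


def has_state(path: Path) -> bool:
    """True if the path holds evidence of an earlier cloud-init run."""
    if path.is_symlink():
        return True  # e.g. /var/lib/cloud/instance -> instances/<id>
    if path.is_dir():
        return any(path.iterdir())
    if path.is_file():
        return path.stat().st_size > 0
    return False


def leftover_cloud_init_state(root: str = "/") -> list[str]:
    """List the known state paths under root that still carry state."""
    return [
        str(Path(root) / rel)
        for rel in STATE_PATHS
        if has_state(Path(root) / rel)
    ]
```

An empty list from `leftover_cloud_init_state()` at the end of the image build would suggest the image is clean; any entries are candidates for the cleanup described above.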
minimalhwrd: why are you modifying the original AMI in the first place?18:43
blackboxswhwrd: I get that, it's packer that runs cloud-init to customize the AMI under the hood.18:43
hwrdminimal mostly to compile Ruby, so that instances created by Terraform don't take 30 minutes to bootstrap18:44
hwrdI'll try the cloud-init clean18:45
hwrdthe full list of things I do in Packer is... install dev libs (for compiling Ruby), install tools we want on every instance (zsh, datadog, mosh, ssm-agent, codedeploy-agent), compile Ruby18:47
hwrdthen, when Terraform fires up an instance, we have it populate scripts specific to that instance, and set up users18:48
blackboxswI presume given the previous discussion though, that cloud-init clean is only going to drop previous logs from your system, you are still going to be in a state on "first boot" of your custom AMI image in terraform that user-data content being presented isn't being seen/processed by cloud-init on the new VMs 'first boot'. But, at least you'll have a clean state and logs to better determine the source of the problem.18:49
hwrdoh I think the issue with my `runcmd` is that I can't do it when I make the AMI. My `runcmd` sets up a bunch of dirs, and it needs the EBS volume to be attached.18:49
hwrdblackboxsw I think you're right.18:50
hwrdit's just a bit confounding that this all worked on the AL2 AMI. Never expected to hit this hurdle when my original task was to update our AMIs to AL202318:50
blackboxswI still think this may be related to the user-data format or compression not being understood by cloud-init and so it may be ignoring that user-data content on final AMI launch in terraform.18:51
hwrdthought I was going to mainly be fighting with dependencies.18:51
hwrdblackboxsw you may be right, but how do you explain that most of my user-data is run?18:51
hwrdeven though, there's no trace of it running18:52
minimalhwrd: isn't the point that the user-data is actions when Packer is run, not later when the revised AMI is used?18:52
minimals/actions/actioned/18:53
hwrdminimal not if there are certain dependencies that aren't met until a concrete instance is required?18:53
minimalI don't understand18:54
hwrdwhen Packer is run, an AMI is built. I then want to boot up instances using that AMI, which will run user-data for the instance, and not the AMI18:55
minimalI mean if the user is created when Packer runs then that user will be present when the AMI is later run and so it doesn't matter that cloud-init then doesn't create the user as it was previously created during Packer run18:55
blackboxswhwrd: correct per minimal's comment, I believe packer is going to trigger a cloud-init run during AMI/image creation (which is creating your default user etc), so that's when your user-data is being consumed. When you are then launching images in terraform referencing your custom AMI, I'm guessing cloud-init had already run once during that AMI creation which created that default user.18:55
hwrdfor example, I dont have a hostname for the AMI, but will for the instance18:55
hwrdI cant link EBS volumes for an AMI if they're going to be detached again18:55
hwrdright, I understand what you guys are saying about how cloudinit is intended for18:56
hwrdbut regardless of the intentions... there is a technical reason why it's no longer working for me. whether or not that was an intended "fix" from cloudinit's perspective, I dont know18:57
hwrdAll I know is, it was working on AL2 with cloudinit 19... and now on AL2023 and cloudinit 22... it's not. My process has remained the same18:58
hwrdbut I guess if my end result is the same (I can upgrade our systems to AL2023), it doesn't matter how I get there18:58
hwrdare you guys suggesting, I move all my user-data to when Packer is creating the AMI?18:59
hwrdreally, everything is working, except this: https://gist.github.com/hahuang65/c104af2ea2644dc69889db8015f90b72 19:01
hwrdbut, like I said, `/data/` requires my final EBS volume to be attached and mounted19:01
hwrdand there's something funky with codedeploy-agent and ASGs that doesn't work well, unless I specifically disable it in the AMI, and start it with `runcmd`.19:02
hwrdI'm sorry if what I'm saying sounds totally whack-o19:08
hwrdyeh I think if this doesnt work out today and tomorrow, I'm gonna migrate the `runcmd`s to codedeploy hooks.19:16
hwrdruncmd runs once... and bootcmd runs every time?19:18
hwrdblackboxsw `/usr/bin/cloud-init: error: unrecognized arguments: --machine-id` is that a new flag, beyond v22?19:35
blackboxswhwrd: because the packer image build -> terraform reuse of that AMI involved multiple boots where cloud-init is involved. I'm guessing that the way user-data is being provided via terraform in the VM launch using your custom AMI is what is somehow presenting user-data in a way that cloud-init in AL2023 doesn't like which results in redacting all user-data (or skipping processing it).   19:35
hwrdI can see that... but it ISN'T skipping processing it. only the `runcmd` portion... for some reason.19:36
blackboxswhwrd: oops yes, I think on AL2023 you can `echo "uninitialized" > /etc/machine-id` in absence of that --machine-id setting19:36
blackboxswis it possible that whatever mime part is adding the `runcmd:` section it is being ignored? `sudo cloud-init query userdata` on your VM that was booted was showing no content.19:37
hwrdblackboxsw I have no idea. yeah `cloud-init query userdata` shows no content cuz `/var/lib/cloud/instance/user-data.txt` doesn't exist19:38
hwrdis it worth schlepping over `user-data.txt` from another machine to the problematic machine, and run that somehow?19:39
blackboxswGiven that this appears to be a bug affecting others in the wild (given your stackoverflow link) it might be worth filing a bug against cloud-init with the attached full logs (via cloud-init collect-logs) and representing how you created the AMI in packer and launched the AMI in terraform. then we can see how terraform config is providing user-data and what the clean boot logs are on the terraform deployed system.19:40
blackboxswhttps://github.com/canonical/cloud-init/issues/new/choose 19:40
blackboxswThe fact that no /var/lib/cloud/instance symlink exists is a pointer to a problem in datasource detection I believe19:41
hwrdyes, but in my logs...`2024-07-22 18:04:54,076 - util.py[DEBUG]: Creating symbolic link from '/var/lib/cloud/instance' => '/var/lib/cloud/instances/i-0d45a58160fcd4eea'` 19:42
hwrdand19:42
hwrd`2024-07-22 18:04:54,268 - util.py[DEBUG]: Writing to /var/lib/cloud/instances/i-0d45a58160fcd4eea/user-data.txt - wb: [600] 12342 bytes`19:42
hwrdis that bizarre or what?19:42
hwrdI'll file an issue for sure19:43
hwrdblackboxsw is this weird: Failed collecting file(s) due to error:[('/run/cloud-init/cloud-id', '/tmp/tmpuclmpz4q/cloud-init-logs-2024-07-22/run/cloud-init/cloud-id', "[Errno 2] No such file or directory: '/run/cloud-init/cloud-id'")]19:45
hwrdphew... took a bit, but here's the issue blackboxsw https://github.com/canonical/cloud-init/issues/5533 20:15
-ubottu:#cloud-init- Issue 5533 in canonical/cloud-init "Previously working configuration on Amazon Linux 2 no longer works on Amazon Linux 2023" [Open]20:15
hwrdah yes, so if I copy the user-data from the working machine to the broken machine, and run `cloud-init schema --system --annotate`, it tells me `# E1: File None needs to begin with "#cloud-config"`20:23
hwrdwhereas cloudinit on the working machine (v19.x.x), `cloud-init schema` isn't even a commn.20:23
hwrdcommand*20:23
hwrdstill doesn't explain why parts of my user-data were run though.20:26
esvhey folks, over the weekend I tried to convince a VM(cloud image) built with ssh_pwauth false to set it to true, haven't been able to.20:27
esvnormally I would just edit /etc/cloud/cloud.cfg and set ssh_pwauth to true and be done with it, but after doing a "cloud-init clean --logs --seed" , the setting is just ignored20:46
esvok, here we go... the VM is using: /bin/cloud-init 23.1.1-11.el8_9.120:47
esv    original /etc/cloud/cloud.cfg: https://bpa.st/2HDQ  ; cloud-init query --all : https://bpa.st/QDMQ    20:47
esvso, I guess the question here is, is there a way to alter the behavior or is it cooked for the life of the deployed server?20:49
minimalhwrd: I said earlier that it is NOT the case that "parts of my user-data were run though" from the cloud-init.log output you provided, none of your user-data appears to be run THEN22:39
minimalit may have been run earlier (i.e. when you used Packer)22:39
hwrdthat shouldn't be the case. I agree that it appears that none of it ran. however packer doesn't have access to the user data code, so there's no chance it was run with packer. 22:40
hwrdthe user-data code only exists in the Terraform repository 22:40
minimalhwrd: sigh, when you run a VM using Packer then cloud-init will be started during *that* boot22:41
hwrdright22:41
minimalwhich "is not good"22:41
minimalas then, unless you tidy things up, when you launch a VM using Terraform then cloud-init may think this is not "at boot"22:42
hwrdI can boot up the packer built AMI and see that none of the user-data stuff was run22:42
minimals/at/1st22:42
minimalwell if the cloud-init log does NOT show it creating etc then somehow they were otherwise created22:42
hwrdmy user doesn't even exist. I can't ssh in unless I use the shared key22:42
minimals/etc/users etc/22:43
hwrdor, is it possible there's some other process running cloudinit and wiping it? 22:43
minimalyou tell us, it's your AMI/VM22:43
hwrdcuz I see in cloudinit logs that it's writing out the user-data file. but when I look, it's gone. 22:43
hwrdhow can I check? 22:43
minimalI don't know, I'm only commenting on the logs you provided which appear to show cloud-init basically doing nothing with user-data22:44
hwrdyeah I'm pretty suspicious about the logs22:44
hwrdI'm not sure what to believe 22:45
minimalthe logs show cloud-init complaining about the user-data schema22:45
hwrdthe logs also show cloudinit saying it's writing out a file, but that file is non existent22:46
minimalthough that is a warning, rather than an error22:46
minimalyou earlier said there were older logs entries. Have you tried tidying up cloud-init in Packer and then testing the "new" AMI again?22:47
hwrdoh yeah so I ran my user-data from the other instance thru cloud-init schema and it did complain that it didn't start with the cloudinit comment22:47
minimalwhich other instance?22:47
hwrdminimal: nope cuz AWS didn't let me SSH in during packer build. I gotta try to rebuild the AMI again22:47
hwrdthe older Amazon Linux 2 instance, the one that everything is working on22:48
minimal"cloudinit comment"? you mean "#cloud-config"?22:48
hwrdyeah, sorry typing from a phone 22:48
minimaltechnically that has *always* been a requirement for valid cloud-config user-data YAML22:49
hwrdhttps://github.com/canonical/cloud-init/issues/5533#issuecomment-2243753663 22:49
-ubottu:#cloud-init- Issue 5533 in canonical/cloud-init "Previously working configuration on Amazon Linux 2 no longer works on Amazon Linux 2023" [Open]22:49
hwrdsee the comment I linked. 22:49
minimalit is just that older versions of cloud-init didn't do a validation of the provided user-data against the schema whereas new versions do22:49
hwrdgot it22:51
hwrdso then Terraform has been doing it wrong. 22:51
hwrdwill cloudinit refuse to run it if it's not valid? 22:51
minimalso your user-data was *always* "wrong" (i.e. syntactically invalid)22:51
minimalI already pointed out the message in the logs is a WARNING, not an ERROR22:51
hwrdright22:52
hwrdbut whys it warning... when there's no user-data? 22:52
hwrdthere are so many little weird inconsistencies here22:52
minimalsigh22:52
minimalbasically I would ignore EVERYTHING until you test a VM creation where there are no cloud-init log entries dated from before you created the VM22:53
minimalas the earlier timestamped logs signal to me that things are not right22:54
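One way to apply minimal's test is to scan cloud-init.log for entries timestamped before the instance was launched. The sketch below assumes the line format quoted earlier in this log (`2024-07-22 18:04:54,076 - util.py[DEBUG]: ...`) and that the launch time is expressed in the same time zone as the log. `stale_entries` is a hypothetical helper.

```python
import re
from datetime import datetime

# Leading timestamp of a cloud-init.log line, e.g.
# "2024-07-22 18:04:54,076 - util.py[DEBUG]: ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - ")


def stale_entries(lines, launched_at: datetime) -> list[str]:
    """Return log lines whose timestamp is earlier than launched_at.

    Lines without a leading timestamp (tracebacks, continuations)
    are ignored.
    """
    stale = []
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        if stamp < launched_at:
            stale.append(line)
    return stale
```

Any non-empty result on a freshly launched instance would mean the AMI shipped with a cloud-init.log from the image build, which is the situation minimal is warning about.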
hwrdokay, I'll try that22:55
minimalalso remember the other day when I pointed out there was an AWS related bug ("ec2: Support double encoded userdata" #4276) whose fix is NOT present in the cloud-init version you're using? This may or may not be a factor in your problems (the format of the /var/lib/cloud/user-data.txt you mentioned earlier makes me wonder)22:57
hwrdah yeah, so that didn't pan out. I disabled gzip and base64 encoding. same issue22:58
hwrdI'll come back after I get that fresh AMI build 22:59
