[14:16] <beantaxi> Hi all. Is this channel for cloud-init development only? Or is it ok for general cloud-init questions?
[14:17] <rharper> beantaxi: always ask away ... someone might be able to help
[14:19] <beantaxi> Haha thanks! I'm _very_ new to cloud-init, though I'm very happy my EC2 startup actually is following an open standard. Anyway, I've excitedly moved to launch template & user data based startup, since why not just use that instead of learning terraform or what have you.
[14:19] <rharper> heh, cool.  welcome
[14:20] <beantaxi> Trouble is, it appears (perhaps misleadingly) that my userdata is not being executed till completion. Almost as though at some point, cloud-init says "ok, you've had long enough" and kills my script and decides to finish booting.
[14:20] <beantaxi> I'd be very surprised if that's what's happening, so I'm trying to dig in and get some more detail
[14:22] <beantaxi> For example, my user data is basically a bunch of apt installs, then a few mounts + writes to fstab, then some downloads from S3 and some systemctl enables. But I keep getting these no good instances, because eg in cloud-init-output.log it appears to die somewhere in the middle, eg after my first mount
[14:23] <rharper> right, typically we look at the cloud-init logs;  if you can get into your system; then cloud-init collect-logs will create a tarball of cloud-init logs and state ..  it will package up /var/log/cloud-init* /run/cloud-init*  and include user-data, so if it's sensitive, you can edit those out and just paste a cloud-init.log;
[14:23] <rharper> beantaxi: is your script run via runcmd:  in user-data ?
[14:27] <beantaxi> I'm not sure about runcmd. But everything's in userdata. I have a launch template, where the base image is just EC2's Ubuntu 18.04 Server, and the userdata is my base64 encoded script
[14:31] <beantaxi> Ultimately I was able to 'fix' my instances, by uploading and running the script by hand, from a sudo -i shell. There only seems to be an issue during startup.
[14:31] <rharper> ok, so you should be able to find your decoded script in /var/lib/cloud/instance/scripts/
[14:31] <rharper> I'd first confirm it looks the way you expect decoded;
[14:31] <beantaxi> I've fired up a new instance, so I can grab the logs with collect-logs as you described. Thanks for that! That sounds useful. And thanks for the decoded script path! That'll be a great next step.
[14:32] <rharper> second, you can try to re-run it like cloud-init would with:    cloud-init --debug single --name cc_scripts_user --frequency=always ;    cloud-init will call run-parts on that scripts dir;
[14:32] <beantaxi> Yesssss that sounds perfect
[14:33] <rharper> and lastly, if you use a #!/bin/bash -x   for your shebang in your script, then you can see the execution tracing output in /var/log/cloud-init-output.txt
[14:34] <beantaxi> That's the one thing I've actually done from the beginning. is it /var/log/cloud-init-output.txt or .log?
[14:36] <beantaxi> Backstory - a buddy has started a new job, with runaway k8s issues. k8s for everything. Unsurprisingly nothing works, and no one knows how it's even supposed to work. I told him 'have you looked at cloud-init? I think that's 99% of what you need.' So I'm hoping to demonstrate that (and perhaps get a little contract out of it.)
[14:37] <rharper>  /var/log/cloud-init-output.log
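
The troubleshooting steps rharper lists above can be collected into one small helper. This is only a sketch: the function name `debug_user_script` is made up for illustration, the commands and paths are the ones quoted in the chat, and it assumes you are root on the instance (for example in a `sudo -i` shell).

```shell
#!/bin/bash
# Hypothetical helper; run as root on the affected instance.
debug_user_script() {
    # 1. Bundle cloud-init logs and state into a tarball in the current directory.
    cloud-init collect-logs

    # 2. List the decoded user-data script(s) so they can be checked by eye.
    ls -l /var/lib/cloud/instance/scripts/

    # 3. Re-run the user scripts the way cloud-init would, with debug output.
    cloud-init --debug single --name cc_scripts_user --frequency=always

    # 4. Show the tail of the captured output (includes the `bash -x` trace
    #    if the script's shebang is `#!/bin/bash -x`).
    tail -n 50 /var/log/cloud-init-output.log
}
```

Calling `debug_user_script` on a misbehaving instance should leave the log tarball next to you and print the last lines of the script output.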
[14:38] <beantaxi> Ok good. That's what I've been looking at. It's unclear what its relation is, to what AWS makes available in the console for 'Get System Log', but I presume that's some very AWS specific stuff going on.
[14:38] <rharper> beantaxi: speaking of k8s and cloud-init,  https://bugs.launchpad.net/cloud-init/+bug/1888822
[14:39] <rharper> this was just worked on last week; and it had to do with some k8s bootstrapping of secret-user-data ...  may not be related but figured I'd pass it along in case that was the issue
[14:42] <beantaxi> Thanks! It was a good read. Among other things, demonstrates people are successfully using cloud-init for much more elaborate scenarios than mine.
[14:43] <beantaxi> I was a little afraid my issue was 'don't use cloud-init for anything over a dozen lines or so; that's not what cloud-init is for'
[14:55] <rharper> beantaxi: hehe, no there are some very elaborate and long scripts to setup hosts with cloud-init;
[15:05] <beantaxi> Murphy's Law: I just built a new image, and then launched a new VM from the new image, and both came up flawlessly. And I'd terminated the bad guys so I couldn't run the above scripts. But those are fantastic to have for future use.
[15:06] <rharper> =)
[15:08] <beantaxi> Actually in looking at my successful run, I notice I have an rsync in there, to sync a local disk up from a volume, and perhaps that's not really part of 'system startup'.
[15:09] <beantaxi> Do you guys have a recommendation, on whether to put that in a separate cloud-init step to run on start, or to use systemd, or other?
[15:15] <rharper> cloud-init will run every boot, not every cloud-init operation runs every boot;  you can create a script which cloud-init will run every boot, or only once or once-per-instance;
[15:16] <rharper> cloud-init can run things quite early ( a boot hook) ; user-scripts/runcmd typically run fairly late (by design, after networking is up and users created, files written, etc)
[15:16] <rharper> so it really depends on when you need to run the rsync; how often, etc.
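
One way to get the "run every boot" behaviour described above is to drop an executable script into cloud-init's per-boot scripts directory. The sketch below stages such a script locally; the file name `sync-data.sh` and the rsync source and destination paths are placeholders, not taken from the chat.

```shell
#!/bin/bash
# Stage a hypothetical per-boot script in the current directory.
cat > sync-data.sh <<'EOF'
#!/bin/bash -x
# Placeholder paths: sync a local disk from an attached volume.
rsync -a /mnt/volume/ /srv/local-disk/
EOF
chmod +x sync-data.sh

# On the instance, install it where cloud-init looks for per-boot scripts:
#   sudo install -m 0755 sync-data.sh /var/lib/cloud/scripts/per-boot/
# The sibling directories per-instance/ and per-once/ give the other two
# frequencies mentioned above.
```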
[15:20] <beantaxi> That's actually how I found out about cloud-init. I wanted something to run on every VM start, not just VM creation, and I came across an AWS thing saying I could use a multi-part MIME file etc etc.
[15:21] <beantaxi> I'm new to systemd as well, so I'm musing if I want to go the multi-part MIME route or the systemd route. I'm happy to know any technical pros/cons if there's more to it than personal taste.
[15:22] <rharper> cloud-init only runs during boot; so after the bootup is finished, it's not active;  of course with a systemd unit you can start/restart it trivially;  having cloud-init re-run a script is also doable but likely more overhead of spinning up cloud-init to exec a script;  if it's meant to run more frequently than boot; I'd probably use write_files to create a systemd unit with my program being called from that
[15:26] <beantaxi> Ah - this is the bit where you use cloud-init / cloud-config directives, instead of pure bash
[15:28] <rharper> Right, write_files, and runcmd, you could use write_files to write out both your unit and the script, and runcmd to invoke the script and the service if you like
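
A minimal sketch of that write_files plus runcmd approach, written out as a cloud-config file. The unit name `sync-data.service`, the script path and the rsync paths are all hypothetical; only `write_files` and `runcmd` come from the discussion.

```shell
#!/bin/bash
# Write a hypothetical cloud-config that installs a script and a systemd unit.
cat > my-cloud-config.cfg <<'EOF'
#cloud-config
write_files:
  - path: /usr/local/bin/sync-data.sh
    permissions: '0755'
    content: |
      #!/bin/bash -x
      rsync -a /mnt/volume/ /srv/local-disk/
  - path: /etc/systemd/system/sync-data.service
    permissions: '0644'
    content: |
      [Unit]
      Description=Sync local disk from volume (hypothetical example)

      [Service]
      Type=oneshot
      ExecStart=/usr/local/bin/sync-data.sh

      [Install]
      WantedBy=multi-user.target
runcmd:
  - systemctl daemon-reload
  - systemctl enable sync-data.service
  - systemctl start --no-block sync-data.service
EOF
```

Because the unit is enabled, systemd starts it on later boots by itself, and `systemctl restart sync-data.service` re-runs the sync on demand without involving cloud-init.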
[15:35] <beantaxi> I saw that ... I was a bit hesitant to learn that, instead of just writing bash, largely because I wasn't sure how I'd troubleshoot my cloud-config or see exactly what was going on. Of course I'd get the benefit of any error checking etc -- all the stuff I _should_ be doing but probably am not.
[15:36] <beantaxi> What's the implementation of write_files etc ... is it all little python functions?
[15:41] <rharper> almost all of cloud-init is written in Python;  the syntax for the user-data input is YAML, we have examples on our docs page, https://cloudinit.readthedocs.io/en/latest/topics/modules.html#write-files
[15:42] <rharper> for debugging/troubleshooting, we typically use LXD to run a system container with user-data attached to it;  that's faster than launching an image (if you don't have a dev setup with lxd, you can launch an ubuntu instance and use lxd from there)
[15:44] <beantaxi> That lxd maneuver sounds incredibly helpful. Every time I need to debug an image startup issue I lose half a day, just waiting for VMs to start.
[15:44] <rharper> alternatively, if you deploy into an instance, you can test your configs with:  cloud-init --debug --file my-cloud-config.cfg single --name cc_write_files --frequency=always;  write your cloud-config that you want to test into the file and then repeatedly call cloud-init single , the --frequency=always means it will always execute that module
[15:45] <rharper> yeah, lxd part is nice;  we do something like :  lxc init ubuntu-daily:bionic b1;   lxc config set b1 user.user-data "$(cat my-user-data.cfg)"; lxc start b2;
[15:45] <rharper> s/b2/b1;
[15:45] <rharper> then you can lxc exec b1 bash;  and run cloud-init status --wait (this blocks until all of cloud-init is done);  and check your results;
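
The LXD loop rharper describes can be wrapped in a small function so it is easy to repeat. This is a sketch under assumptions: the function name `test_user_data` is made up, the image alias and config key are the ones quoted in the chat, and it presumes LXD is already installed and initialised on the host.

```shell
#!/bin/bash
# Hypothetical helper: boot a throwaway container with user-data attached.
# Usage: test_user_data <container-name> <user-data-file>
test_user_data() {
    name="$1"
    user_data_file="$2"

    # Create the container, attach the user-data, then boot it.
    lxc init ubuntu-daily:bionic "$name" &&
    lxc config set "$name" user.user-data "$(cat "$user_data_file")" &&
    lxc start "$name" &&
    # Block until cloud-init has finished inside the container.
    lxc exec "$name" -- cloud-init status --wait

    # Show the end of the output log, whether or not cloud-init reported errors.
    lxc exec "$name" -- tail -n 20 /var/log/cloud-init-output.log
}
```

For example, `test_user_data b1 my-user-data.cfg` boots container `b1`; `lxc delete --force b1` removes it again before the next attempt.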
[15:47] <beantaxi> lxd has seemed like black magic to me for quite some time. It's been bugging me, but I've never had a 'way in' to demystify it and actually use it as a productivity tool. This sounds perfect.
[18:40] <beantaxi> So, I've been musing if cloud-init could be used to deploy containers on separate AWS regions, or even across cloud providers.
[18:41] <beantaxi> And now, in a LXD youtube I'm watching from 2015, this guy talks about using LXD to migrate cloud_init _running containers_ from host to host. Wow.