[02:46] <prologic> Hi all 👋 I'm having a lot of difficulty trying to run a simple script once per instance (i.e. cloud-init-per once) via cloud-init's runcmd module. It looks like (but I can't prove it) that cloud-init silently kills my script for running too long?
[02:46] <prologic> Is this at all the case? I can find no documentation on this.
[02:46] <prologic> My script does: wait for an ip to come up (from dhcp), excluding a few interfaces we don't care about; once that ip is known, reconfigure some pre-installed software (from the packer image) and restart some services.
[02:47] <prologic> Alternative question; Am I abusing cloud-init here?
[11:52] <stevenm> Odd_Bloke, so I read ... what you wrote
[11:54] <stevenm> That's it really - they were words
[11:54] <stevenm> They did at least enter my brain
[11:55] <stevenm> I really hate the word cloud
[12:13] <meena> prologic: how long is that script running for when you run it without cloud-init? and, why can't you do that configuration via cloud-init's Network… things… netplan lets you do fairly complex configuration scenarios, and that's basically what the v2 network config format is
[12:14] <prologic> Oh I see so I am abusing cloud-init's runcmd(s)
[12:15] <prologic> Can you point me to where I can read more about this network / netplan stuff?
[12:15] <prologic> As for how long: well, as long as it takes dhcp to assign an ip to the interface I'm interested in
[12:15] <prologic> so not long
[12:16] <prologic> multiple seconds I guess, I can't get it to work in cloud-init via runcmd so 🤷‍♂️
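The kind of wait-for-an-address script prologic describes can be sketched in shell. This is a minimal, hedged sketch: the excluded interface names, the 60-second budget, and the use of iproute2's `ip` command are illustrative assumptions, not anyone's actual setup.

```shell
# Wait (with a timeout) until some interface other than the excluded ones
# holds an IPv4 address, then print that address.
exclude='^(lo|docker0)$'   # illustrative exclusion list

first_ip() {
  # Read "ifname address" pairs from stdin; print the first address whose
  # interface does not match the exclusion regex.
  awk -v ex="$1" '$1 !~ ex { print $2; exit }'
}

wait_for_ip() {
  i=0
  while [ "$i" -lt 60 ]; do
    addr=$(ip -4 -o addr show 2>/dev/null \
             | awk '{ sub(/\/.*/, "", $4); print $2, $4 }' \
             | first_ip "$exclude")
    if [ -n "$addr" ]; then
      printf '%s\n' "$addr"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A runcmd entry could then do something like `addr=$(wait_for_ip) || exit 1` before reconfiguring the software that needs the address.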
[12:45] <unix_> prologic, https://paste.unix-comp-airnet.net/paste/5eHuuzBg#2jnV-G+fjpy5l+uT8P5IhV9vp9mxvzyr8vtdcTzD+Xc
[12:46] <unix_> i do the same on a smartos system
[12:46] <unix_> host: smartos
[12:46] <unix_> vm: centos
[12:46] <unix_> "user-script" section
[12:48] <prologic> I don't understand what you're showing me
[12:48] <prologic> this just looks like you're configuring the network
[12:48] <prologic> I'm trying to configure a piece of (questionable) software _after_ the network is up
[12:48] <prologic> or more precisely after a particular interface has an address I know is routable
[13:52] <austinK> Good day, is anyone able to help answer some questions?
[14:58] <Odd_Bloke> falcojr: https://github.com/canonical/cloud-init/pull/829 is now un-WIP'd and ready for full review.
[14:59] <Odd_Bloke> stevenm: Clouds, clouds everywhere, nor any drop to drink.
[15:00] <Odd_Bloke> prologic: I'd be surprised if `runcmd` was being killed due to timing, but something strange may be going on.  Could you pastebin cloud-init.log from an affected instance?
[15:15] <stevenm> Odd_Bloke, :)
[15:16] <stevenm> We (and when I say we.. I mean who I work for) don't really want to have an ongoing 'connection' with the software our customers choose to run inside the VMs we host for them - at all really :)
[15:17] <stevenm> So I was just hoping to use cloud-init ready images and cloud-init (via Proxmox VE) to just pre-seed certain things (e.g. to get the network working, that's mostly it) on first time boot *only*
[15:26] <Odd_Bloke> stevenm: Do you (and when I say you.. ;) have an image capture story for your users?  By which I mean: can they launch a VM, then capture its filesystem and launch new VMs from that captured image?
[15:28] <stevenm> I don't think our users care about having that functionality
[15:47] <Odd_Bloke> stevenm: Right, but if it's available and they were to use it and you've completely disabled cloud-init, then their new VMs will behave very unexpectedly (and insecurely: SSH host keys will not be rotated, for example).  I don't know enough about Proxmox to know if such a capability is available by default.
[15:48] <Odd_Bloke> stevenm: But, also, cloud-init does only perform most of its actions on first boot, so I'm not exactly sure what issue you're seeing that we're trying to address here. :)
[15:53] <stevenm> I was hoping that this would just be a channel (in the form of a virtual CD-ROM drive) to communicate certain first-time setup information only
[15:53] <stevenm> And that is it.
[15:53] <stevenm> No reliance on cloud-init support from the hypervisor afterwards
[15:54] <stevenm> I'd rather give customers a blank VM and some space for them to upload their own ISOs
[15:54] <stevenm> We want that little involvement in what they run inside the VMs
[15:54] <Odd_Bloke> I would expect Proxmox to handle that for you: cloud-init will use DMI data to determine if the instance ID has changed.
[15:54] <stevenm> stop calling them "instances" :P
[15:55] <stevenm> The customer can run BeOS in them for all I care :P  They're VMs - plain and simple.
[15:56] <Odd_Bloke> You can have BeOS cloud instances, so I'm not sure what your point there is. :p
[15:57] <Odd_Bloke> But, sure, cloud-init will use DMI data to determine if the VM ID has changed. :)
[15:58] <stevenm> Here is my Windows 95 Cloud Instance...
[15:58] <stevenm> https://i.snipboard.io/H8mKTq.jpg
[15:58] <stevenm> Who knew they were ahead of their time.
[15:58] <Odd_Bloke> I'd call that a cloud image, not an instance. ;)
[15:58] <stevenm> Nah I want cloud-init OUT OF IT :) Well... after that initial first time setup anyway.
[15:59] <stevenm> So maybe this isn't for us.
[15:59] <Odd_Bloke> Maybe: will users be able to upload and use their own images?
[15:59] <stevenm> Certainly ISOs... not sure about anything else.
[15:59] <Odd_Bloke> (Most VM images are built with cloud-init included already.)
[16:01] <stevenm> Personally I don't mind if they upload disk images or indeed anything else like templates (cloud-init ready or not)
[16:03] <stevenm> and apparently the customer-facing front end we were going to buy in... supports it too
[16:03] <stevenm> (cloud-init)
[16:14] <blackboxsw> stevenm: meena Odd_Bloke, I'm probably missing the point here (and the risk of cloud-init not fixing the network on a machine/image that has changed across reboots), but running cloud-init only once could be done by providing "#cloud-config\nruncmd: [touch /etc/cloud/cloud-init.disabled]" with the initial userdata or as config in /etc/cloud/cloud.cfg.d/, which would get you a system that does its cloud-init thing exactly once.
[16:15] <blackboxsw> that system though would be static and never reprovision again as long as  /etc/cloud/cloud-init.disabled exists, so the image would likely be fragile if moved from one network to another
[16:15] <blackboxsw> again, just a drive by comment without full context
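Rendered as user-data, blackboxsw's one-shot trick would look roughly like the following. The sentinel path is the one quoted above; treat the rest as a sketch rather than a tested recipe:

```yaml
#cloud-config
# Let cloud-init run its modules on first boot, then create the sentinel
# file that stops cloud-init from running on any subsequent boot.
runcmd:
  - [touch, /etc/cloud/cloud-init.disabled]
```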
[17:12] <Odd_Bloke> AnhVoMSFT: We're chatting about testing for the upcoming SRU and we were wondering if https://github.com/canonical/cloud-init/pull/709 is something that we can reproduce as regular users, or if that's only an issue in internal deployments?
[18:45] <xscori> Odd_Bloke well, it used to work fine until 18.5. Our current RHEL ssh keys are where we expect them (i.e. what we specified in /etc/ssh/sshd_config for "authorizedkeysfile"). It just seems to be broken afterwards. Here is my bug report on it: https://bugs.launchpad.net/cloud-init/+bug/1917817
[18:46] <xscori> I actually tried runcmd: [bin/cp, /home/%u/.ssh/authorized_keys, /etc/ssh/auth_keys/%u] thinking I could copy the file once it is inserted, but that did not seem to work either; I could not tell why from the logs.
[19:54] <meena> xscori: why would runcmd know how to resolve %u?
[20:02] <xscori> you mean it does not?
[20:05] <xscori> meena I was looking at the code base and saw https://github.com/canonical/cloud-init/blob/master/cloudinit/ssh_util.py#L237 I assumed runcmd would understand %u and resolve it as well.
[20:06] <meena> I'm fairly certain it does not
[20:06] <meena> what would be the context of %u?
[20:06] <Odd_Bloke> xscori: %u is templating that sshd uses when reading its configuration, so we mirror that in our SSH handling.  By using runcmd, you'd be circumventing our SSH handling entirely (because it doesn't do what you want) and so you'd need to handle it yourself.
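Since `%u` is expanded by sshd when it reads its configuration (and mirrored by cloud-init's SSH handling) but not by runcmd, a shell command has to do the substitution itself. A minimal sketch, assuming the `/etc/ssh/auth_keys/%u` pattern from this discussion:

```shell
# sshd expands %u per user when reading AuthorizedKeysFile; runcmd does not,
# so substitute it explicitly. POSIX sh + sed, no bashisms.
pattern='/etc/ssh/auth_keys/%u'

expand_keyfile() {
  printf '%s\n' "$pattern" | sed "s/%u/$1/"
}

expand_keyfile ec2-user   # prints /etc/ssh/auth_keys/ec2-user
```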
[20:06] <xscori> well, that would explain the reason it did not work :)
[20:07] <Odd_Bloke> (That said, I'm looking into why you might have seen this regress.)
[20:07] <xscori> oh, thank you very much!
[20:08] <meena> wait, this used to work???
[20:09] <xscori> yes
[20:10] <xscori> we have production systems running with 18.5 and it inserts the ssh keys correctly into the file specified in sshd_config > authorizedkeysfile
[20:11] <meena> which users? how??
[20:12] <xscori> I mean, we launch an EC2 instance, say create a new key-pair and attach it to the instance, then we ssh into the instance, and when we check the ssh keys, we see them not under /home/ec2-user but under /etc/ssh/auth_keys/ec2-user
[20:12] <Odd_Bloke> We've definitely had some changes in this area since 18.5.
[20:12] <xscori> the default user for rhel is cloud-user, I think, but we switch that to be 'ec2-user'
[20:13] <Odd_Bloke> https://github.com/canonical/cloud-init/commit/f1094b1a539044c0193165a41501480de0f8df14 was between 18.5 and 19.4, so is the most likely culprit.
[20:13] <xscori> yeah, I looked, a lot of changes actually... and unfortunately could not figure out what broke
[20:13] <xscori> yes, I saw that
[20:14] <xscori> I did a diff on that commit...and got lost :)
[20:14] <Odd_Bloke> https://github.com/canonical/cloud-init/commit/b0e73814db4027dba0b7dc0282e295b7f653325c landed in 20.4 and was intended to handle a bug in that previous one (perhaps this bug?) but was implemented in a way that opened up a vulnerability, so was reverted in 20.4.1.
[20:15] <xscori> I am not sure, I cloned the repo and looked at the ssh related changes since 18.5 in git history and change logs, I just could not understand the logic of the code to follow up
[20:15] <Odd_Bloke> The one thought I have, though, is that IIRC otubo brought these upstream from the Red Hat packaging, so it's possible that the Red Hat packages you're using have these changes even if they weren't upstream for that version.
[20:17] <xscori> idk... I thought about filing a case with rhel, but thought they might simply point me back to cloud-init devs, so started there instead
[20:17] <Odd_Bloke> Yeah, not trying to palm you back off on them (yet ;), just thinking aloud.
[20:17] <xscori> we have enterprise support with them, so I can definitely open a case with them if it is something they did
[20:18] <xscori> sure, I appreciate your time
[20:18] <Odd_Bloke> So I think the problem is probably https://github.com/canonical/cloud-init/commit/f1094b1a539044c0193165a41501480de0f8df14#diff-8978d79f04e525de3011b92f7b141a7bd6dae4b6d0a70f9b9ea923bbd1451a43L239
[20:19] <Odd_Bloke> That went from returning `auth_key_fn` to returning `default_authorizedkeys_file`.
[20:19] <xscori> yes, and I manually tried to simulate the situation
[20:19] <Odd_Bloke> Which is consistent with what you've described, I think.
[20:20] <xscori> for example 'extract_authorized_keys' func returned ['/home/ec2-user/.ssh/authorized_keys', {}] when I provided the second param to it '/etc/ssh/sshd_config', which is a CONSTANT anyway
[20:20] <xscori> I did not expect that
[20:20] <Odd_Bloke> And, indeed, the (since-reverted) fix moved that to returning `auth_key_fns[0]`.
[20:20] <xscori> but I also did not know what the return values should be
[20:21] <xscori> that second {}, I thought, should be /etc/ssh/auth_keys/ec2-user to match the experience we have with 18.5
[20:22] <Odd_Bloke> It should return ("/path/to/store/the/keys/in", ["list", "of", "the", "keys", "to write there", "which will look more like", "ssh-rsa AAAAAAAAA....etc"])
[20:22] <xscori> again, I did not understand the whole logic, so... was not sure what I was seeing is unexpected
[20:22] <xscori> oh
[20:22] <xscori> in that case, it is definitely returning the wrong thing
[20:23] <xscori> it is reading sshd_config, so not sure why it does not use it once it sees the authorizedkeysfile specified
[20:23] <Odd_Bloke> Yeah, that's the bug, from the line I linked to.
[20:24] <Odd_Bloke> It unconditionally returns `default_authorizedkeys_file` which is `default_authorizedkeys_file = os.path.join(ssh_dir, 'authorized_keys')` (and ssh_dir is `os.path.join(pw_ent.pw_dir, '.ssh')`).
[20:25] <Odd_Bloke> And so it disregards the setting, hence the regression.
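The setting being disregarded is the AuthorizedKeysFile directive in sshd_config. For anyone wanting to verify what their config actually specifies, a small sketch (sshd_config directive names are case-insensitive; this only reads the first value of the first matching directive):

```shell
# Print the first AuthorizedKeysFile value from an sshd_config-style file.
get_authorized_keys_file() {
  awk 'tolower($1) == "authorizedkeysfile" { print $2; exit }' "$1"
}
```

For example, `get_authorized_keys_file /etc/ssh/sshd_config` would print `/etc/ssh/auth_keys/%u` on the systems described above.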
[20:26] <xscori> I see
[20:29] <xscori> oops, did not notice the channel kicked me out
[20:30] <Odd_Bloke> No worries, you didn't miss anything.
[20:30] <xscori> :)  ok
[20:31] <xscori> so, this is not a rhel issue and needs to be fixed in the source....hmm, even if it is fixed, then it will need to make its way to rhel, to an rpm etc.:(  oh boy....
[20:32] <xscori> Odd_Bloke does any workaround pop into your mind?
[20:33] <xscori> I guess copying might be one, or convincing our cybersecurity team to temporarily allow ssh keys in /home/{user}/.ssh
[20:33] <xscori> none too pretty
[20:35] <Odd_Bloke> xscori: Unfortunately, you're correct.  Your best bet is likely to iterate over users (in a runcmd) and move the keys, as you suggest.
[20:35] <Odd_Bloke> Apologies that we don't have a better answer. :/
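The workaround suggested here could look roughly like the following. This is a hedged sketch, not a tested recipe: the key directory, home-directory base, file mode, and user list are assumptions to adapt. Per the later discussion, sshd picks up new authorized keys without a restart, so a plain copy from runcmd should suffice.

```shell
# Copy each user's cloud-init-written authorized_keys file into the
# directory that sshd_config's AuthorizedKeysFile actually points at.
# Intended to be run as root, e.g. from a runcmd entry.
move_keys() {
  homebase="$1"   # e.g. /home
  keydir="$2"     # e.g. /etc/ssh/auth_keys
  shift 2
  mkdir -p "$keydir"
  for user in "$@"; do
    src="$homebase/$user/.ssh/authorized_keys"
    if [ -f "$src" ]; then
      cp "$src" "$keydir/$user"
      chmod 0644 "$keydir/$user"
    fi
  done
}
```

Usage from runcmd might look like `move_keys /home /etc/ssh/auth_keys ec2-user`.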
[20:35] <xscori> ok, I guess I need to learn about runcmd and figure out how to do that
[20:36] <xscori> it is what it is. cloud-init is awesome, and bugs are a fact of life, I mean code :)
[20:36] <Odd_Bloke> :)
[20:40] <xscori> ok, thinking about this a bit... I read somewhere in the documentation, I guess, that the ssh module is run in a boot stage before sshd_config is read and the sshd service is started
[20:40] <xscori> is that b/c keys have to be in place before the service starts or is it ok to insert the keys at any point? b/c runcmd is running at a later stage, right?
[20:41] <xscori> again thinking aloud, if that's true, copying the keys alone won't do it, I will have to recycle the service as well?
[21:07] <Odd_Bloke> xscori: IIRC, you need to have _host_ keys in place before SSH starts.  I think it's just an implementation detail that we write authorized keys at the same time: it's certainly the case that sshd will pick up new authorized keys without a restart.
[21:08] <Odd_Bloke> I think you would have a window between SSH coming up and keys being installed that isn't present in the default configuration.
[21:10] <Odd_Bloke> rharper: I'm planning on landing https://github.com/canonical/cloud-init/pull/829 Monday morning so we can kick off an SRU process; LMK if I haven't addressed your concerns sufficiently. :)
[21:41] <xscori> good to know, thank you! @o
[21:42] <xscori> gawd! I meant,  Odd_Bloke
[22:04] <rharper> Odd_Bloke: +1 on landing
[22:57] <Odd_Bloke> rharper: Thanks!
[22:57] <Odd_Bloke> Have a good weekend!
[22:58] <rharper> Odd_Bloke: Thanks , you as well
[23:19] <beantaxi> Do folks here tend to stick to US business hours?
[23:22] <rharper> beantaxi: I think there may be some Euro timezone but mostly US
[23:22] <rharper> if you leave your client up, it's worth just asking any time, we'll usually reply when we can
[23:28] <beantaxi> Thanks! I think in the past I've thought weekends were a good time to e.g. get a PR done and dusted, but it looks like business hours actually works better
[23:29] <beantaxi> So does that imply folks work on cloud-init fulltime? Or just that they tend to do it while at work
[23:31] <rharper> there's a cloud-init team at Canonical, and many cloud partners also have folks who work cloud-init upstream;  and community folks maintaining distros or OS specific sections
[23:32] <beantaxi> That makes sense
[23:44] <beantaxi> How frowned upon is it to say "hey can you look at my PR?" Speaking purely from antsiness rather than urgency
[23:45] <beantaxi> The Real Falcon was good enough to give it quite a bit of attention today, and I'm afraid I've got my very last change in just after he's taken off. There's no harm at all in it waiting till Monday if that's how it is.
[23:45] <beantaxi> But life's too short to never be annoying.
[23:48] <falcojr> beantaxi: yeah, I probably won't get to it again until Monday
[23:49] <falcojr> You can feel free to ask for a review anytime here, but sometimes it takes us a while to get around to it just because of other competing priorities
[23:55] <beantaxi> falcojr: That makes sense. It's literally my first contribution to anything this size, and my excitement for it is truly laughable.
[23:55] <rharper> beantaxi: IMO, it's always OK to ping for reviews on your PR (either on the PR itself or here in irc)