[00:11] https://github.com/canonical/cloud-init/issues/4288 - this also hit the focal version 23.2.1-0ubuntu0~20.04.2 -- is there a lp bug (or github issue) to track the fix into focal/jammy?
[00:11] -ubottu:#cloud-init- Issue 4288 in canonical/cloud-init "v23.2.1 failing on ubuntu 22.04 with local variable 'ds_name' referenced before assignment" [Closed]
[00:22] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2028784
[00:22] -ubottu:#cloud-init- Launchpad bug 2028784 in cloud-init (Ubuntu Mantic) "Variable 'ds_name' referenced before assignment" [Undecided, New]
[02:11] rharper: fix has already been released upstream. SRU will follow asap but will still need some time
[02:37] falcojr: thanks; I was looking further into how we saw it but the cloud-images didn't; I *think* if ds-identify returns something it detects, then the importer code isn't run; we have a non-ds-id datasource which we enable via cloud.cfg in the image; this definitely hit the issue, but major clouds/datasources won't see it. I was hoping the commit or even the SRU bug could cover what code paths trigger it.
[02:46] rharper: Good point, I'll update the bug with the context. Datasources can be specified with a "DataSource" in front of them. So the datasource list could contain either "LXD" or "DataSourceLXD" and either one should be recognized. The "DataSource" prefix isn't common. It's not used at all upstream or in stock ubuntu images, and isn't used on any of
[02:46] the major clouds either. We integration tested on the big 3 (plus a few more) and didn't see any issues then.
[08:33] blackboxsw any ideas? https://github.com/canonical/cloud-init/pull/4291
[08:33] -ubottu:#cloud-init- Pull 4291 in canonical/cloud-init "[RFC] NM renderer: set default IPv6 addr-gen-mode for all interfaces to eui64" [Open]
[08:33] falcojr
[10:29] ani: unfortunately it's been a busy week of hotfixes. Realistically we probably won't get around to reviewing most PRs until sometime early next week
=== rharper_ is now known as rharper
=== orndorffgrant5 is now known as orndorffgrant
[19:17] Hi. I'm using ubuntu ova templates 22.04 and cloud-init started acting differently since around the beginning of June. I'm creating a template and then I'm cloning a virtual machine. The problem is that cloud-init starts both init-local and init stages before rebooting, instead of just init-local. That exposes the ssh server and so terraform immediately tries to connect to it. After creating the template, I'm resetting cloud-init so that
[19:17] I can provision more specific stuff again with cloud-init. Is there any way I could force cloud-init to change its behaviour while creating the template and not start the init (network) stage so early (before reboot)?
[19:28] effendy[m]: are you using Ubuntu's subiquity to install Ubuntu?
[19:30] This is a cloudimage, the OS is already installed, as far as I understand.
[19:31] It just reads the cloud-init configuration, I'm not giving it any parameters in grub or anything like that (like I used to with the 'classical' installation when I created templates with packer)
[19:32] so you boot the image once and modify things to create a template?
[19:33] It boots once and then it restarts automatically. This is the behaviour that I see with cloud images (at least with ova).
[19:33] I expect that would depend on the user-data being provided
[19:34] So I inject the cloud-init configuration from the very beginning. It boots once, then it restarts.
[19:34] In the cloud-init configuration I have a trick to tell it to stop the ssh service only once.
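(Editor's sketch for the [02:46] datasource discussion above: this is the kind of datasource_list setting involved. The drop-in path and the "None" fallback are illustrative, not taken from the conversation; the transcript only states that both spellings should be accepted and that the prefixed form is uncommon.)

    # Illustrative drop-in, e.g. /etc/cloud/cloud.cfg.d/90-datasource.cfg (example path)
    # Both spellings are meant to be equivalent; the "DataSource"-prefixed
    # form is the uncommon one discussed around issue 4288.
    datasource_list: [ LXD, None ]
    # datasource_list: [ DataSourceLXD, None ]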
[19:34] unless there is vendor-data or similar triggering a reboot
[19:35] Would that trigger a reboot?
[19:35] #cloud-config\nbootcmd:\n - [ cloud-init-per, once, systemctl, stop, sshd ]\nusers:\n - default\n - name: packer\n groups: sudo\n lock-passwd: false\n ssh_authorized_keys:\n - ssh-ed25519 ssh-public-key\n sudo: ['ALL=(ALL) NOPASSWD:ALL']\n shell: /bin/bash\n inactive: false\n passwd: $6$HASH/
[19:35] This is basic.
[19:36] I was referring to something like "power_state:\n mode: reboot\n"
[19:36] I'm injecting this with vapp (this is how it works with vsphere).
[19:36] with vsphere and ova.
[19:36] I'm not doing that anywhere.
[19:36] so what is doing the reboot?
[19:36] I've no idea.
[19:36] I would think it's cloud-init.
[19:36] The reason for that is that cloud-init changed its behaviour in June in the cloudimage (ova) template.
[19:36] only if it is told to do so, like with the config example I pasted
[19:36] not template, image.
[19:37] Right, but why would cloud-init run only one stage until June this year (before rebooting) and now it's running two stages before rebooting? The second stage presupposes starting the ssh server.
[19:38] have you looked at /var/log/cloud-init.log to see what is happening and why?
[19:38] I can assure you that I'm not running any commands that reboot the image. I think cloud-init does that, for example, for stuff like adding disk partitions maybe and thereabouts.
[19:39] nope, c-i doesn't reboot to add disk partitions
[19:39] Yes, I have, the discussion is here: https://github.com/canonical/cloud-init/issues/4188
[19:39] -ubottu:#cloud-init- Issue 4188 in canonical/cloud-init "cloud-init changes behaviour with ubuntu cloud image ova starting from version 20230602" [Closed]
[19:39] the only conclusion is what I said about the different stages.
[19:39] Ok, it was a supposition.
[19:40] if cloud-init is rebooting then you will see the power_state module being run in the cloud-init.log
[19:40] yeah, I'm not seeing it.
[19:41] so then it is unlikely that cloud-init is doing the reboot
[19:42] So you mean it is atypical that a cloudimage would reboot once?
[19:42] looking at that (closed) issue, several people commented that the version of cloud-init before and after is the same
[19:42] and that therefore they assumed any change in behaviour was due to something else in the "environment"
[19:43] Yeah, well, as I said there, I initially started the conversation on an Ubuntu channel.
[19:43] But I at least expected to be pointed in the right direction, if they cannot do anything about it, given that this is still canonical.
[19:44] And I obviously thought about cloud-init anyway, given how tightly connected to it the Ubuntu cloudimages are.
[19:47] In any case, as I said, I'm not bothered as much by the reboot (although it would be nice not to have it at all), but about cloud-init changing its behaviour, which is then related to Ubuntu only, but not to cloud-init. Although they're both tied to canonical.
[19:52] where is the logfile from when you say it worked differently earlier?
[19:53] It's right there in the thread: https://github.com/canonical/cloud-init/issues/4188
[19:53] -ubottu:#cloud-init- Issue 4188 in canonical/cloud-init "cloud-init changes behaviour with ubuntu cloud image ova starting from version 20230602" [Closed]
[19:54] I wouldn't be able to (easily) reproduce this, I've deleted the image.
[19:54] I'm trying to figure out what reboots the system in the meantime.
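(Editor's note: the user-data pasted at [19:35], re-wrapped here for readability. The ssh key and password hash are placeholders as in the paste. One assumption is added: cloud-init-per takes a name argument between the frequency and the command, so a hypothetical "stop-sshd" name is inserted, which the original one-liner omitted.)

    #cloud-config
    bootcmd:
      # cloud-init-per <frequency> <name> <command...>; "stop-sshd" is an
      # added name argument, not present in the original paste.
      - [ cloud-init-per, once, stop-sshd, systemctl, stop, sshd ]
    users:
      - default
      - name: packer
        groups: sudo
        lock-passwd: false
        ssh_authorized_keys:
          - ssh-ed25519 ssh-public-key    # placeholder public key
        sudo: ['ALL=(ALL) NOPASSWD:ALL']
        shell: /bin/bash
        inactive: false
        passwd: $6$HASH/                  # placeholder password hash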
[19:54] (I deleted the image because I thought I was out of the woods, but I didn't think it through and I thought I wouldn't have issues with the second stage - terraform)
[19:57] I've found a very similar issue here: https://www.reddit.com/r/Proxmox/comments/11zwxq4/reboot_after_cloudinit/
[19:57] in one of the logs in that Issue I see "Invalid cloud-config provided: Please run 'sudo cloud-init schema --system' to see the schema errors.", though this is a warning, not an error
[19:57] This is a different context where proxmox is being used. So the reboot is typical of the Ubuntu cloudimages.
[19:57] I don't understand what you mean
[19:58] Ah, no, this is misleading. The user says that a reboot is needed in order to start ssh (never mind).
[19:58] what user?
[19:58] in the link, never mind, you can ignore it.
[20:00] as I said previously, cloud-init does *not* reboot a machine unless it is configured to do so
[20:20] If I don't run cloud-init clean in the first stage (when I create the template), so that cloud-init won't run again, and then, after I clone the VM, I see that the VM isn't restarting, that still doesn't necessarily mean that cloud-init triggered it, right?
[20:27] Or what about filesystem resizing done by cloud-init? Wouldn't that require a reboot?
[20:28] I know it doesn't normally if you do it yourself manually, but maybe cloud-init is acting differently.
[21:19] Not re-enabling cloud-init when creating the template does indeed stop the reboot when I clone the template.
[21:19] I'm still at a loss to understand how the situation is explained to me as if it were worlds apart from cloud-init :)
[21:39] effendy[m]: what do you mean by "re-enabling cloud-init"?
[21:40] also what do you mean by "the reboot when I clone the template"? A reboot of the VM running the template? or a reboot of a VM created using the created template?
[21:40] I mean not doing cloud-init clean
[21:40] A reboot of a VM created from the template.
[21:41] if you are creating a "template" (i.e. an image to be used later) then you need to clean cloud-init so that cloud-init can run from a "fresh" situation
[21:42] Yeah, I'm not sure if you read what I said.
[21:42] I do that normally.
[21:42] But I've just tested it without doing the clean. And the VM created from the template isn't rebooting anymore, now that cloud-init isn't running anymore.
[21:43] why is cloud-init not running anymore? you said you didn't do a clean, you didn't say you disabled cloud-init
[21:44] I'm not seeing the systemd cloud-init anymore if I don't do the clean. In any case, when I don't do it, cloud-init will read (at least this is my impression) parts of the configuration only once.
[21:45] the networking part seems to be read though, and for that I add the disable-network-whatever file (when it needs to be disabled - in other contexts)
[21:45] if you don't do a clean then info regarding the previous run will be kept - which is not what you want for a template to create VMs from
[21:46] Yes, I guess so. As I said, the point was to test how it behaves.
[21:47] cloud-init isn't running anymore - the vm isn't rebooting anymore.
[21:47] the tasks that run for a "fresh" instance won't run, as without a "clean" cloud-init will not see a machine as being a fresh instance
[21:47] Yeah, I got that...
[21:47] I'm not sure how to explain more clearly that I was just testing it.
[21:47] And that I normally run the clean.
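(Editor's sketch for the point made at [19:40] and [20:00]: roughly the only user-data that makes cloud-init itself reboot a machine is the power_state module, which would then show up in cloud-init.log during the final stage. The message and delay values below are illustrative; nothing like this appears in the user's configuration.)

    #cloud-config
    power_state:
      mode: reboot          # also accepts poweroff / halt
      delay: now            # or e.g. "+5" to wait five minutes
      message: rebooting after first-boot provisioning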
[21:48] The point was to see a connection between cloud-init and the reboots. And there obviously is. I just don't understand how and to what extent this is related to something specific to the ubuntu cloud-image.
[21:48] ok, and to reinforce the point, the only thing in cloud-init that would trigger reboots is the power_state module, which would only cause reboots if the configuration told it to do so
[21:50] Right, but aren't there any points of reference, such as concrete images, where cloud-init runs? I mean, when you run cloud-init, there is a clear context. I don't understand why the ubuntu images seem so foreign.
[21:50] as if I were doing something weird.
[21:50] I can't speak for Ubuntu, I'm looking at this from a cloud-init perspective
[21:51] So if you test cloud-init, it's only as a generic abstract layer? Not actually tested on linux distros?
[21:51] you are asking me specifically?
[21:51] I don't work for Canonical
[21:51] For instance, sure :)
[21:51] But also generally.
[21:51] I'm the cloud-init maintainer for Alpine Linux
[21:52] I see, so you're testing it only on alpine then.
[21:52] me personally, yes
[21:52] and it doesn't reboot at the end of configuration unless I give it user-data telling it to do so
[21:56] cloud-init is typically tested on distros by the maintainers of those distros
[21:57] Yes, I see. Maybe there's a specific version for Ubuntu, I don't know... but you (and the others) say that as long as the version is the same, it's the same cloud-init. However cloud-init does behave differently, that I know. The versions before June vs those afterwards.
[21:57] though cloud-init itself has a set of automated testcases that cover some aspects of different distros
[21:57] Maybe the only difference is that in the cloudimages before June it would reboot faster, before cloud-init would start the init stage :D
[21:58] But I guess you'd normally see issues there, as in interrupted processes.
[21:58] perhaps you need to work out what is triggering any reboots
[21:58] Yeah, I was thinking about that. I'm not sure how though.
[21:58] auditd?
[21:59] There's nothing in syslog, I just see services/processes reacting to sigterm signals.
[21:59] you said you don't have logs from earlier when things worked...
[21:59] I do. In the github thread.
[21:59] I said that I couldn't easily recreate them now, because I'd need the cloudimage from before June.
[22:00] which I've deleted. I guess I could get it from somewhere, I would just need to work at it a little bit.
[22:00] This thread, they're there: https://github.com/canonical/cloud-init/issues/4188
[22:00] -ubottu:#cloud-init- Issue 4188 in canonical/cloud-init "cloud-init changes behaviour with ubuntu cloud image ova starting from version 20230602" [Closed]
[22:00] I looked at the tarfiles in the github issue but despite the 2 tarfiles having different dates the cloud-init.log files inside them have entries from the same date
[22:01] Because I tested them in parallel, if you're referring to the timestamps.
[22:01] I mean in the same period of time.
[22:01] But the images were different.
[22:02] One from 24.05.2023 and the other from 02.06.2023. That's got nothing to do with when the systems booted.
[22:03] blackboxsw also saw the difference.
[22:04] are these logs from the VM creating the templates? or from VMs booting using the created templates?
[22:08] From the VMs that were created from the templates. So the latter. So after running clean.
[22:08] Ah, no!
[22:08] They're from the initial VM, run with packer, yes. So the VM that is creating the template.
[22:08] well the cloud-init logs show things like packer user being created
[22:08] (the context is packer, so yes, it's the first stage)
[22:09] Yeah, I've remembered.
[22:09] so either the pre-existing logs weren't removed when creating the template, or else the log is from the VM building the template
[22:09] As I said, it's from the VM building the template :)
[22:10] but you said the problem is not with creating the templates but later when they're being used?
[22:10] in which case logs from then would be what is needed
[22:10] Good catch :) I said that because I found the solution with packer (stopping ssh temporarily with bootcmd).
[22:10] But the behaviour is identical in both cases.
[22:10] but the logs won't be...
[22:11] And with terraform it becomes much more difficult to circumvent the SSH problem.
[22:11] by "both cases" I meant in the case of packer and in the case of terraform
[22:11] so in the case of creating the template and in the case of creating the virtual machine from the template.
[22:11] you're getting BOTH the template creation VM and the VM running using the templates rebooting?
[22:12] Yes.
[22:12] that wasn't clear to me
[22:12] They both reboot once after the first boot.
[22:12] But as I said previously, if I don't run the clean command when creating the template, the virtual machine (created from the template) won't reboot, because cloud-init doesn't seem to be running at all (systemd services aren't there).
[23:08] I see in the logs that as soon as the second cloud-init run finishes the final module, the rebooting process starts. Now there might be something external to cloud-init that watches cloud-init, but I'm not sure.
[23:09] Jul 27 22:52:06 omni-consul-0 cloud-init[1051]: Cloud-init v. 23.2.1-0ubuntu0~22.04.1 running 'modules:final' at Thu, 27 Jul 2023 22:52:06 +0000. Up 47.58 seconds.
[23:09] Jul 27 22:52:06 omni-consul-0 systemd[1]: Reloading.
[23:09] Jul 27 22:52:07 omni-consul-0 systemd[1]: Removed slice Slice /system/modprobe.
[23:09] That's the start of the rebooting process.