[09:22] hello guys
[09:22] how is scripts-per-instance determined?
[09:23] how does cloud-init know about the first boot?
[09:28] Xat`, the instance-id is read; if that changes, it is considered a new instance
[09:32] powersj: I am testing with a local instance on vbox, how is this implemented?
[09:35] On a cloud provider, I guess the instance-id is retrieved from instance metadata. How does it work with vbox or vmware?
[09:35] wait a min, maybe cloud-init provides it
[09:35] let me query the metadata url
[09:36] ok no it does not
[09:38] powersj: nvm, I'm gonna read about instance metadata with cloud-init ;)
[09:40] I changed the value in /run/cloud-init/.instance-id, then did a reboot, but the script in scripts-per-instance was not executed
=== cpaelzer__ is now known as cpaelzer
[15:53] blackboxsw: You have some changes requested on https://github.com/canonical/cloud-init/pull/70 if you want to take another look
[16:31] Odd_Bloke: otubo: https://github.com/canonical/cloud-init/pull/70 looks good. Was there a Launchpad bug related to this commit set?
[16:32] I've approved pull 70, just didn't squash merge yet in case we forgot to correlate to a Launchpad (or Red Hat) bug
[16:38] ahh yes there was
[16:38] https://bugs.launchpad.net/cloud-init/+bug/1781781
[16:38] Ubuntu bug 1781781 in curtin "/swap.img w/fallocate has holes" [Medium,Confirmed]
[16:38] ok I'll tie that bug to the squashed commit message
[17:21] ohh interesting Odd_Bloke, paride: on an azure vm where I've configured ipv6 and ipv4 on nic0 and only ipv4 on nic1, IMDS is showing/allocating ipv6 addresses to nic1.
[17:21] cloud-init query ds.meta_data.imds.network | pastebinit
[17:21] http://paste.ubuntu.com/p/T2CSCQTRVC/
[17:22] I think we may have a minor issue to file for clarity with the azure folks.
[17:31] Yeah, sounds like some clarification is necessary.
[17:45] i expected a non-zero exit code if the events log says the container died
[17:45] uhh wrong channel
[18:02] any chance of passing the cloud-init config via the `-fw_cfg` option of qemu? kind of like how Fedora CoreOS does its ignition config? `--qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=/path/to/example.ign"` https://docs.fedoraproject.org/en-US/fedora-coreos/getting-started/#_launching_with_qemu
[18:50] cyberpear: Using the firmware configuration like that isn't supported, so the two options I would suggest are using the kernel cmdline or a NoCloud metadata drive. Both of those options are documented at https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html#datasource-nocloud
[18:51] cyberpear: (You can also file a feature request using the bug link in the topic, if you'd like. :)
[19:53] blackboxsw: https://github.com/cloud-init/ubuntu-sru/pull/87 <-- for your review; in particular, review of the verification script before I start running it for all releases would be appreciated :)
[21:16] Odd_Bloke: looks good. The testing you are doing there is probably a bit deeper than needed, as we could have used `lxc exec test-$SERIES -- cloud-init devel net-convert --output-kind=netplan --directory /out.d --network-data=network.yaml --distro ubuntu` and validated the output instead of having to set up an lxc and override configs. If you wanted to exercise the whole system instead, it is definitely more thorough to set up the lxc network on launch, and your test is valid.
[21:24] comment and pointer added to https://github.com/cloud-init/ubuntu-sru/pull/87, take what you will.
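A minimal sketch of the first-boot check powersj describes at 09:28 may help here (this is not cloud-init's actual code; the cache path shown is the conventional location under /var/lib/cloud). cloud-init compares the instance-id reported by the datasource with the one it cached on the previous boot; /run/cloud-init/.instance-id is a runtime copy rewritten each boot, which is why editing it and rebooting at 09:40 did not re-run scripts-per-instance.

```python
# Sketch only, not cloud-init's real implementation of first-boot detection.
import os

CACHED_ID = "/var/lib/cloud/data/instance-id"  # persisted across reboots


def is_new_instance(datasource_id: str) -> bool:
    """True when per-instance config (e.g. scripts-per-instance) should run again."""
    if not os.path.exists(CACHED_ID):
        return True  # nothing cached yet: first boot ever
    with open(CACHED_ID) as f:
        cached = f.read().strip()
    return cached != datasource_id
```

On a NoCloud/vbox-style setup the datasource id comes from the seed's meta-data (`instance-id: ...`), so bumping that value, or wiping the cached state with `sudo cloud-init clean`, is what actually retriggers the per-instance scripts.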
[21:32] Hi @blackboxsw, it's been a while, I've been in between teams. I noticed azurecloud integration has an issue with the function _wait_for_system(self, wait_for_cloud_init) in the base instance.py class. This function is called after the vm is booted and tries to ssh in and run a script. When I remove that function there are no ssh issues, but right now ssh'ing is 50/50. Can you help me look into this?
[21:33] hi ahosmanMSFT.
[21:34] is that failing due to timeout?
[21:34] Yes
[21:36] When I remove that function it's a 100% success
[21:38] so on a test run that did fail, you'd probably want to pass --preserve-instance and see if ssh connectivity came up sometime later, after the default boot_timeout you have set for azure, which is 300 seconds.
[21:40] on a test system that is retained (and exhibited the timeout failure) I'd be curious to see cloud-init analyze blame, to see if cloud-init was spending an inordinate amount of time setting up
[21:40] I did some tests and ssh connectivity is available; I think it has to do with either the scripts or something else in that function
[21:40] if cloud-init setup on Azure is < 30 seconds, then the issue is somehow that the initial ssh connection to the vm is timing out without connecting
[21:40] hmm I haven't run blame on the system
[21:42] I presume it has nothing to do with ssh'ing itself, but with the wait part, because it ssh's immediately when that function is removed
[21:42] ahosmanMSFT: also, what that 'script' waits for is for a systemd-enabled system to report either `systemctl is-system-running == 'running' or 'degraded'`
[21:42] so checking `systemctl is-system-running` on the system will tell you what state it is in
[21:43] ahosmanMSFT: and a `systemd-analyze blame` on the timed-out system will also tell you where the boot process spent most of its time
[21:54] Ok, I'll try that and let you know. Got a meeting soon though.
[22:04] blackboxsw: https://github.com/canonical/cloud-init/pull/185 <-- very small CI fix/change
[22:14] blackboxsw: And https://github.com/cloud-init/ubuntu-sru/pull/88
[22:20] Do cloud tests run on every PR? I know they run every night @blackboxsw
[22:32] We run a subset of the lxd tests for each PR.
[22:32] But the full LXD test suite and the non-LXD test suites only run nightly.
[22:34] how about azure/ec2
[22:37] As I said, the non-LXD test suites only run nightly. :)
[22:37] ok, that makes sense since those tests would consume more time
[22:50] more time and more $$ spinning up instances on the clouds :)
[22:51] Odd_Bloke: what about contextlib vs contextlib2?
[23:01] ahosmanMSFT: did you know the azure instance type which exhibits byte-swapping behavior?
[23:01] I'm trying to validate that your fix resolves the issue w/ incorrectly seeing a 'new' instance-id across boots
[23:01] as that fix is part of this SRU
[23:02] It was on all Azure gen2 VMs when switching nodes on azure
[23:02] thanks ahosmanMSFT
[23:15] @blackboxsw I'm witnessing some weird behavior in azurecloud/image.py: two different if statements, one executes and one doesn't, yet they both have the same self._img_instance value of NONE, can you verify this? This is why images aren't launching on azurecloud integration tests. https://paste.ubuntu.com/p/VW8SH8QXsj/
[23:19] meena: I haven't looked at it yet, but I assume it can go?
[23:25] ahosmanMSFT: is self._img_instance the string "NONE" instead of the python value of None?
[23:25] that would trigger one path to run, and the other not
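A two-line illustration of the distinction being raised here, assuming the two branches are a truthiness check and an `is None` check (the attribute name and paste contents are not reproduced; this only shows the Python semantics):

```python
# The string "NONE" is truthy and is not None, so guards that look similar
# take different paths:
for value in ("NONE", None):
    print(repr(value), "| bool:", bool(value), "| is None:", value is None)
# 'NONE' | bool: True | is None: False
# None | bool: False | is None: True
```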
[23:27] They both have the same value, you can see where it's initialized in azurecloud/image.py's __init__
[23:31] ahosmanMSFT: your LOG.debug("self._img_instance: %s" % self._img_instance) is down below self.platform.create_instance( and self._img_instance.start(wait=True, wait_for_cloud_init=True)
[23:32] so it's one of those two that isn't completing without error (which is why your logs don't show LOG.debug("self._img_instance: %s" % self._img_instance))
[23:32] so the logic paths are properly followed, just something bogus happening in the create_instance or instance.start() calls, right?
[23:34] ahosmanMSFT: is there a specific cloud_test name that typically fails for you when things do fail?
[23:36] blackboxsw: it doesn't fail individual tests, but when running multiple tests it fails to create a clean image for the rest of the tests because it fails to create a snapshot
[23:37] ok, will run a suite and see if I can get it to fail for me
[23:38] won't be able to kick that off though until I'm done with the current SRU verification on Azure specifically (as I don't want to collide w/ my manual test runs in the same account)
[23:39] I just have one more manual SRU test to run (I had hit a configuration problem, as I said in email). But I *think* I've worked around it by creating a load balancer for the moment.
[23:40] blackboxsw: Thanks, I'll keep hacking at it too
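For the 23:31-23:32 point, a hedged sketch of how to confirm it: log before the two calls and let the exception surface, since a LOG.debug placed after create_instance()/start() never runs when either call raises. Only `self.platform.create_instance(` and `self._img_instance.start(wait=True, wait_for_cloud_init=True)` come from the log; the class name, constructor, and call arguments below are placeholders, not the actual tests/cloud_tests code.

```python
import logging

LOG = logging.getLogger(__name__)


class AzureCloudImageSketch:
    """Placeholder standing in for the image helper discussed above."""

    def __init__(self, platform):
        self.platform = platform
        self._img_instance = None

    def _instance(self, *args, **kwargs):
        # Log before the calls that can raise; a debug line placed after them
        # never executes when create_instance()/start() blows up.
        LOG.debug("about to create instance, _img_instance=%s", self._img_instance)
        try:
            self._img_instance = self.platform.create_instance(*args, **kwargs)
            self._img_instance.start(wait=True, wait_for_cloud_init=True)
        except Exception:
            LOG.exception("create_instance/start failed")  # shows the real error
            raise
        LOG.debug("instance started, _img_instance=%s", self._img_instance)
        return self._img_instance
```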