Nick | Message | Time |
---|---|---|
Xat` | hello guys | 09:22 |
Xat` | how is scripts-per-instance determined? | 09:22 |
Xat` | how does cloud-init know about the first boot? | 09:23 |
powersj | Xat`, the instance-id is read, if that changes it is considered a new instance | 09:28 |
Xat` | powersj: I am testing with a local instance on vbox, how is this implemented? | 09:32 |
Xat` | On a cloud provider, I guess the instance-id is retrieved from the instance metadata. How does it work with vbox or vmware? | 09:35 |
Xat` | wait a min, maybe cloud-init provides it | 09:35 |
Xat` | let me query the metadata url | 09:35 |
Xat` | ok no it does not | 09:36 |
Xat` | powersj: nvm, I'm going to read about instance metadata with cloud-init ;) | 09:38 |
Xat` | I changed the value in /run/cloud-init/.instance-id, then rebooted, but the script in scripts-per-instance was not executed | 09:40 |
=== | cpaelzer__ is now known as cpaelzer | |
Odd_Bloke | blackboxsw: You have some changes requested on https://github.com/canonical/cloud-init/pull/70 if you want to take another look | 15:53 |
blackboxsw | Odd_Bloke: otubo. https://github.com/canonical/cloud-init/pull/70 looks good. was there a Launchpad bug related to this commit set? | 16:31 |
blackboxsw | I've approved pull 70, just didn't squash merge yet in case we forgot to correlate it to a Launchpad (or Red Hat) bug | 16:32 |
blackboxsw | ahh yes there was | 16:38 |
blackboxsw | https://bugs.launchpad.net/cloud-init/+bug/1781781 | 16:38 |
ubot5 | Ubuntu bug 1781781 in curtin "/swap.img w/fallocate has holes" [Medium,Confirmed] | 16:38 |
blackboxsw | ok I'll tie that bug to the squashed commit message | 16:38 |
blackboxsw | ohh interesting Odd_Bloke paride, on an azure vm where I've config'd ipv6 and ipv4 on nic0 and only ipv4 on nic1. IMDS is showing/allocating ipv6 addresses to nic1. | 17:21 |
blackboxsw | cloud-init query ds.meta_data.imds.network | pastebinit | 17:21 |
blackboxsw | http://paste.ubuntu.com/p/T2CSCQTRVC/ | 17:21 |
blackboxsw | I think we may have a minor issue to file for clarity with the azure folks. | 17:22 |
Odd_Bloke | Yeah, sounds like some clarification is necessary. | 17:31 |
akik | i expected a non-zero exit code if the events log says container die | 17:45 |
akik | uhh wrong channel | 17:45 |
cyberpear | any chance of passing the cloud-init config via `-fw_cfg` option of qemu? kind of like how Fedora CoreOS does its ignition config? `--qemu-commandline="-fw_cfg name=opt/com.coreos/config,file=/path/to/example.ign"` https://docs.fedoraproject.org/en-US/fedora-coreos/getting-started/#_launching_with_qemu | 18:02 |
Odd_Bloke | cyberpear: Using the firmware configuration like that isn't supported, so the two options I would suggest are using the kernel cmdline or a NoCloud metadata drive. Both of those options are documented at https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html#datasource-nocloud | 18:50 |
Odd_Bloke | cyberpear: (You can also file a feature request using the bug link in the topic, if you'd like. :) | 18:51 |
Odd_Bloke | blackboxsw: https://github.com/cloud-init/ubuntu-sru/pull/87 <-- for your review; in particular, review of the verification script before I start running it for all releases would be appreciated :) | 19:53 |
blackboxsw | Odd_Bloke: looks good. The testing you are doing there is probably a bit deeper than needed, as we could have used `lxc exec test-$SERIES -- cloud-init devel net-convert --output-kind=netplan --directory /out.d --network-data=network.yaml --distro ubuntu` and validated the output results instead of having to set up an lxc and override configs. Did you want to exercise the whole system instead? It is definitely | 21:16 |
blackboxsw | more thorough to set up the lxc network on launch, and your test is valid | 21:16 |
blackboxsw | comment and pointer added https://github.com/cloud-init/ubuntu-sru/pull/87 take what you will. | 21:24 |
ahosmanMSFT | Hi @blackboxsw, it's been a while; I've been in between teams. I noticed azurecloud integration has an issue with the function _wait_for_system(self, wait_for_cloud_init) in the base instance.py class. This function is called after the VM is booted and tries to ssh in and run a script. When that function is removed there are no ssh issues, but for now ssh'ing succeeds only about 50% of the time. Can you help me look into this? | 21:32 |
blackboxsw | hi ahosmanMSFT. | 21:33 |
blackboxsw | is that failing due to timeout? | 21:34 |
ahosmanMSFT | Yes | 21:34 |
ahosmanMSFT | When I remove that function it's a 100% success | 21:36 |
blackboxsw | so on a test run that did fail you'd probably want to pass --preserve-instance and see if ssh connectivity came up sometime later after the default boot_timeout that you have set for azure which is 300 seconds. | 21:38 |
blackboxsw | on a test system that is retained (and exhibited the timeout failure) I'd be curious to see cloud-init analyze blame to see if cloud-init was spending an inordinate amount of time setting up | 21:40 |
ahosmanMSFT | I did some tests and ssh connectivity is available; I think it has to do with either the scripts or something else in that function | 21:40 |
blackboxsw | if cloud-init setup on Azure is < 30 seconds, then the issue is somehow that the initial ssh connection to the vm is timing out without connecting | 21:40 |
ahosmanMSFT | hmm I haven't run blame on the system | 21:40 |
ahosmanMSFT | I presume it has nothing to do with ssh'ing itself, but with the wait part, because it ssh's immediately when that function is removed | 21:42 |
blackboxsw | ahosmanMSFT: also, what that 'script' waits for is a systemd-enabled system reporting either `running` or `degraded` from `systemctl is-system-running` | 21:42 |
blackboxsw | so checking `systemctl is-system-running` on the system will tell you what state it is in | 21:42 |
blackboxsw | ahosmanMSFT: and a `systemd-analyze blame` on the timedout system will also tell you where the boot process spent most of it's time | 21:43 |
blackboxsw | *its* | 21:43 |
ahosmanMSFT | Ok, I'll try that and let you know. Got a meeting soon though. | 21:54 |
Odd_Bloke | blackboxsw: https://github.com/canonical/cloud-init/pull/185 <-- very small CI fix/change | 22:04 |
Odd_Bloke | blackboxsw: And https://github.com/cloud-init/ubuntu-sru/pull/88 | 22:14 |
ahosmanMSFT | Do cloud tests run on every PR? I know they run every night @blackboxsw | 22:20 |
Odd_Bloke | We run a subset of the lxd tests for each PR. | 22:32 |
Odd_Bloke | But the full LXD test suite and the non-LXD test suites only run nightly. | 22:32 |
ahosmanMSFT | how about azure/ec2 | 22:34 |
Odd_Bloke | As I said, the non-LXD test suites only run nightly. :) | 22:37 |
ahosmanMSFT | ok, that makes sense since those tests would consume more time | 22:37 |
blackboxsw | more time and more $$ spinning up instances on the clouds :) | 22:50 |
meena | Odd_Bloke: what about contextlib vs contextlib2? | 22:51 |
blackboxsw | ahosmanMSFT: do you know the azure instance type which exhibits byte-swapping behavior? | 23:01 |
blackboxsw | I'm trying to validate that your fix resolves the issue w/ incorrectly seeing 'new' instance-id across boots | 23:01 |
blackboxsw | as that fix is part of this SRU | 23:01 |
ahosmanMSFT | It was on all Azure gen2 VMs when switching nodes on azure | 23:02 |
blackboxsw | thanks ahosmanMSFT | 23:02 |
ahosmanMSFT | @blackboxsw I'm witnessing some weird behavior in azurecloud/image.py: of two different if statements, one executes and one doesn't, yet they both see the same self._img_instance value of NONE. Can you verify this? This is why images aren't launching in the azurecloud integration tests. https://paste.ubuntu.com/p/VW8SH8QXsj/ | 23:15 |
Odd_Bloke | meena: I haven't looked at it yet, but I assume it can go? | 23:19 |
blackboxsw | ahosmanMSFT: is self._img_instance the string "NONE" instead of the python value of None? | 23:25 |
blackboxsw | that would trigger one path to run, and the other not | 23:25 |
ahosmanMSFT | They both have the same value; you can see where it's initialized in azurecloud/image.py.__init__ | 23:27 |
blackboxsw | ahosmanMSFT: your LOG.debug("self._img_instance: %s" % self._img_instance) is down below self.platform.create_instance(...) and self._img_instance.start(wait=True, wait_for_cloud_init=True) | 23:31 |
blackboxsw | so it's one of those two that isn't completing without error (which is why your logs don't show LOG.debug("self._img_instance: %s" % self._img_instance) | 23:32 |
blackboxsw | so the logic paths are properly followed. just something bogus happening in the create_instance or instance.start() calls right | 23:32 |
blackboxsw | ahosmanMSFT: is there a specific cloud_test name that typically fails for you when things do fail? | 23:34 |
ahosmanMSFT | blackboxsw it doesn't fail on individual tests, but when running multiple tests it fails to create a clean image for the rest of the tests because it fails to create a snapshot | 23:36 |
blackboxsw | ok will run a suite and see if I can get it to fail for me | 23:37 |
blackboxsw | won't be able to kick that off though until I'm done with current SRU verification on Azure specifically (as I don't want to collide w/ my manual test runs in the same account) | 23:38 |
blackboxsw | I just have one more manual SRU test to run (I had hit a configuration problem as I sent in email). But I *think* I've worked around it by creating a load balancer for the moment. | 23:39 |
ahosmanMSFT | blackboxsw: Thanks, I’ll keep hacking at it too | 23:40 |
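
powersj's answer at 09:28 amounts to comparing the instance-id the datasource reports on this boot with the one cached from an earlier boot, while per-instance work is gated by a marker stored under that instance-id. The minimal Python sketch below illustrates the idea under an assumed, hypothetical cache layout; `is_new_instance`, `run_per_instance`, and the file paths are illustrative only and are not cloud-init's API. Under this model, editing a cached copy of the id (as tried at 09:40) would presumably not rerun anything if the datasource reports the same id again on the next boot. On a local VirtualBox VM using NoCloud, that id is the `instance-id` in the meta-data.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# Hypothetical layout loosely modeled on a cloud-init-style cache directory;
# real paths and file names may differ between cloud-init versions.


def is_new_instance(current_id: str, cache_dir: Path) -> bool:
    """Return True if current_id differs from the id cached on a previous boot."""
    prev_file = cache_dir / "data" / "instance-id"
    previous: Optional[str] = (
        prev_file.read_text().strip() if prev_file.exists() else None
    )
    prev_file.parent.mkdir(parents=True, exist_ok=True)
    prev_file.write_text(current_id + "\n")  # remember this boot's id
    return previous != current_id


def run_per_instance(current_id: str, cache_dir: Path,
                     action: Callable[[], None]) -> bool:
    """Run action at most once per instance-id, using a marker file as a semaphore."""
    marker = cache_dir / "instances" / current_id / "sem" / "scripts_per_instance"
    if marker.exists():
        return False  # already ran for this instance-id
    action()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = Path(d)
        for boot_id in ("iid-a", "iid-a", "iid-b"):  # three simulated boots
            new = is_new_instance(boot_id, cache)
            ran = run_per_instance(boot_id, cache, lambda: None)
            print(boot_id, "new instance:", new, "per-instance ran:", ran)
```

Keying the marker on the instance-id, rather than on a single "already booted" flag, is what lets a changed id (for example, a VM cloned from an image) trigger per-instance work again.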
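
At 21:42 blackboxsw explains that the test harness waits until `systemctl is-system-running` reports `running` or `degraded`. The following is a hedged Python sketch of such a polling loop. It is not the cloud_tests `_wait_for_system` implementation. `wait_for_system`, `systemctl_state`, and the 300-second default (borrowed from the Azure boot_timeout mentioned at 21:38) are assumptions. In the real harness the command runs on the instance over SSH, so here the state query is passed in as `get_state`. That also means the loop can be exercised without systemd.

```python
import subprocess
import time
from typing import Callable

READY_STATES = {"running", "degraded"}


def systemctl_state() -> str:
    """Return the state printed by `systemctl is-system-running` (e.g. 'starting')."""
    # The command may exit non-zero for states other than "running",
    # so don't raise on the exit code; just read stdout.
    result = subprocess.run(
        ["systemctl", "is-system-running"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def wait_for_system(get_state: Callable[[], str] = systemctl_state,
                    timeout: float = 300.0,
                    interval: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the state is running/degraded; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in READY_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"system still {state!r} after {timeout}s")
        sleep(interval)
```

If a run times out while SSH itself works, checking which state the loop last saw (for example, stuck in `starting`) is a quick way to tell a slow boot apart from a harness problem.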
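
To see which NICs Azure IMDS reports IPv6 addresses for, you could filter JSON like the output of `cloud-init query ds.meta_data.imds.network` from 17:21 with a short script. The sketch below makes two assumptions. First, it assumes the query prints JSON. Second, it assumes the Azure IMDS network layout: an `interface` list whose entries carry `macAddress` and `ipv6` → `ipAddress` → `privateIpAddress`. `nics_with_ipv6` is a hypothetical helper. The sample values are invented and are not taken from the paste.

```python
import json


def nics_with_ipv6(network: dict) -> dict:
    """Map each interface's MAC address to the IPv6 private addresses IMDS lists for it."""
    result = {}
    for nic in network.get("interface", []):
        ipv6 = nic.get("ipv6") or {}
        addrs = [a.get("privateIpAddress") for a in ipv6.get("ipAddress") or []]
        addrs = [a for a in addrs if a]  # drop empty entries
        if addrs:
            result[nic.get("macAddress", "?")] = addrs
    return result


# Invented sample: IPv6 expected only on the first NIC.
SAMPLE = json.loads("""
{"interface": [
  {"macAddress": "000D3A000001",
   "ipv4": {"ipAddress": [{"privateIpAddress": "10.0.0.4", "publicIpAddress": ""}]},
   "ipv6": {"ipAddress": [{"privateIpAddress": "fd00::4"}]}},
  {"macAddress": "000D3A000002",
   "ipv4": {"ipAddress": [{"privateIpAddress": "10.0.1.4", "publicIpAddress": ""}]},
   "ipv6": {"ipAddress": []}}
]}
""")

print(nics_with_ipv6(SAMPLE))
```

In the situation described at 17:21, the second MAC would also appear in the result even though only the first NIC was configured for IPv6.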
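
blackboxsw's question at 23:25 hinges on a Python subtlety. A string such as `"None"` or `"NONE"` and the `None` singleton behave differently in conditionals, yet a `%s`-formatted log line renders `None` and `"None"` identically. The small illustration below uses `branches`, a hypothetical helper that does not come from azurecloud/image.py. Logging with `%r` (or `LOG.debug("self._img_instance: %r", self._img_instance)`) makes the difference visible.

```python
def branches(value) -> dict:
    """Report which of two common 'is it unset?' checks fire for value."""
    return {
        "is None": value is None,  # identity check against the None singleton
        "falsy": not value,        # any non-empty string is truthy
    }


for v in (None, "None", "NONE"):
    # %s hides the difference between None and "None"; %r shows the quotes.
    print("%s | %r | %s" % (v, v, branches(v)))
```

Only the first value satisfies either check. An `if self._img_instance is None:` branch and an `if not self._img_instance:` branch would therefore both skip a string value, while `%s` logs all three values as plain text.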