[15:09] <gjolly> hey, we are seeing a weird issue on Azure with Hirsute instances and Accelerated Networking (https://bugs.launchpad.net/cloud-init/+bug/1919177). Cloud-init shows this error "Failed to read /var/lib/dhcp/dhclient.eth0.leases" and the user's public key is not set up. It only happens sometimes (roughly 1 in 5 instances).
[15:13] <gjolly> AnhVoMSFT: powersj suggested you might be able to help us with this. Maybe something changed recently that is related to accelerated networking and Mellanox devices. The thing is that we are only able to reproduce it with Ubuntu Hirsute instances.
[15:58] <AnhVoMSFT> gjolly - let me take a look at the bug reported on launchpad
[16:02] <AnhVoMSFT> Looking at the log, I can explain the differences in the good vs bad case that you pointed out in the log (about the error message coming from azure.py vs DataSourceAzure.py). The one you see from azure.py is coming from the report_diagnostic event while the other one is a direct debug log from DataSourceAzure.py. It's a red herring, the issue is still a networking issue
[16:28] <gjolly> AnhVoMSFT: thanks! When you say it's networking, do you mean it's more of a kernel issue and less something related to cloud-init?
[18:51] <AnhVoMSFT> gjolly cloud-init was not able to communicate with the platform (Azure). From the log, it was able to communicate with the metadata server earlier, but the later call to wireserver failed (even though metadata server and wireserver use different IP addresses, traffic actually flows through wireserver in both cases). This means something happened between the metadata server call and the wireserver call
[18:52] <AnhVoMSFT> since this happens only on Hirsute, it somewhat rules out a platform issue