=== cpaelzer__ is now known as cpaelzer
=== hjensas is now known as hjensas|afk
[12:30] Hello everyone, has anyone reported issues with the cloud-init cloud-config & cloud-final services being blocked by snapd in recent Ubuntu cloud images?
[12:31] All I can find is a recent discussion on linuxcontainers.org reporting similar issues
[12:36] well, it looks like it's more of a snapd problem than a cloud-init one
=== hjensas|afk is now known as hjensas
[12:50] ok, it IS snapd's fault; removing the package releases the cloud-config & cloud-final jobs
[12:50] is there a snapd IRC channel around?
[12:57] FYI: https://bugs.launchpad.net/snapd/+bug/1874249
[12:57] Ubuntu bug 1874249 in snapd "snapd service never completes on boot off focal cloudimg" [Undecided,New]
[13:33] caribou: I haven't seen that particular failure. BTW, it looks like you have truncated lines in your journal output there, which might make it harder for the snapd folks to debug.
[13:35] I just got it today with the new cloudimg
[13:35] ok, I'll check the logs & add a new set
[17:55] Odd_Bloke: I see the same issue here that I'm seeing over on the ua-client repo: no Travis links to jobs that are in progress. https://github.com/canonical/cloud-init/pull/323 Have you noticed this before?
[17:56] I'm also finding on ua-client that even completed Travis jobs are not firing a status response back to the source PR, so it remains unmergeable
[17:56] It happens from time to time, yeah.
[17:57] Migrating to travis-ci.com should also fix this, I believe.
[17:57] (Because AIUI .org uses an older, now-deprecated GitHub API.)
[17:59] similar to intermittent problems like this, I think: https://github.com/travis-ci/travis-ci/issues/7363
[17:59] gotcha
[18:00] yeah, I can see your Travis run has completed successfully https://travis-ci.org/github/canonical/cloud-init/builds/678274876 but there's no status update on your PR yet https://github.com/canonical/cloud-init/pull/323
[18:00] Yep.
[18:03] https://www.githubstatus.com/ github issue: Update - We have implemented a fix and are processing a backlog of notifications.
[18:03] Apr 22, 01:26 UTC
[18:03] Travis reported they were operational after that.
[18:04] Oh, not after that, but I think notifications are probably user-facing notifications?
[18:04] https://www.traviscistatus.com/incidents/bj882gcyxh9v corresponded to https://www.githubstatus.com/incidents/dsf2qtzh4jpz
[20:06] if I want cloud-init to write the netplan YAML file into /run/netplan, where is the right place to change the netplan_path?
[20:07] I changed it in the datasource's __init__ (distro.renderer_configs['netplan']['netplan_path']) but I don't think it's being picked up
[20:27] AnhVoMSFT: I don't know off the top of my head. What are you trying to achieve by doing this?
[20:30] cloud-init, once disabled/removed, leaves behind the /etc/netplan/50-...yml configuration file, which has a MAC address hardcoded in it. This causes problems for customers who snapshot the VHD and want to boot it up as a separate VM.
[20:31] since network configuration is re-generated upon every boot on Azure anyway, it makes more sense to write the netplan configuration file in /run, where it does not persist across boots
[20:32] I'm trying to change the path of the netplan config from within the datasource so that it writes to /run/netplan instead
[20:34] Shouldn't cloud-init run on those VMs and regenerate the correct configuration (with the appropriate MACs for that VM)?
[20:38] A couple of scenarios where that does not work: 1) customers have already disabled/removed cloud-init; 2) in some scenarios, the metadata source isn't available when booting these VHDs
[20:42] So I changed the distro's renderer_config that was passed to the datasource, but when I print it out from distros/__init__.py's _supported_write_network_config, it does not seem like the change was picked up
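
For context, the override being attempted might look roughly like the sketch below, inside the Azure datasource's __init__. This is a minimal sketch, not the actual patch: the 'netplan'/'netplan_path' keys follow cloud-init's Debian/Ubuntu renderer_configs layout, the /run target path is the change proposed in the discussion, and whether a mutation made this early is honored depends on when the distro reads its renderer config, which is exactly what is in question here.

    # Minimal sketch (assumption, not the actual patch): mutate the
    # distro's netplan renderer config from within the datasource.
    from cloudinit import sources

    class DataSourceAzure(sources.DataSource):
        def __init__(self, sys_cfg, distro, paths):
            super(DataSourceAzure, self).__init__(sys_cfg, distro, paths)
            netplan_cfg = self.distro.renderer_configs.get("netplan")
            if netplan_cfg is not None:
                # Point the netplan renderer at /run so the generated
                # file does not persist across reboots (network config
                # is regenerated on every boot on Azure anyway).
                netplan_cfg["netplan_path"] = "/run/netplan/50-cloud-init.yaml"
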
[20:51] And what would happen to an instance that was rebooted and had cloud-init fail for some reason? I think it would fall off the network if its networking config was all in /run?
[20:56] That is a good point. I think it would depend on when/where cloud-init fails. Let me think about it a bit
[21:00] In fact, (1) is a case where storing the network config in /run would fail too, isn't it? `apt remove cloud-init; reboot` -> no network config
[21:02] indeed. Would writing a netplan file into /etc/netplan without a MAC address, then writing one with the MAC address into /run, work? thinking out loud
[21:03] although that is probably as good as not writing the MAC address into /etc in the first place
[21:05] AnhVoMSFT: This feels quite complex, and I'm worried that we will miss/forget stuff if we discuss it in IRC. Do you think you could file a bug for it so that we can make sure we all understand the requirements/problem statement?
[21:06] let me see if we have an existing bug on it
[21:07] we did talk about this with Ryan and Josh in one of our sync meetings, and at the time the /run approach seemed reasonable, but you pointed out a pretty big gap
=== mutantturkey is now known as old_joe
[21:07] I guess the main problem is that cloud-init is leaving behind the netplan file with a hardcoded MAC address in it
=== old_joe is now known as mutantturkey
[21:08] Well, it's "leaving it behind" so that it can apply network configuration correctly on the next boot, so it's not entirely a "problem". :)
[21:09] I think what I meant was when it gets removed/uninstalled, etc...
[21:09] but the problem isn't so much leaving it behind; the problem is that it hardcodes the MAC address, which can become stale if there isn't an entity that updates it
[21:12] Right, but I think hardcoding the MAC address is the correct thing to do in the general case. Because if we don't, then on future boots interfaces can potentially be presented to userspace with different names (this can happen due to races in the kernel, so it's not platform-specific, or it can be the platform presenting them at different PCI addresses), and we'll apply incorrect configuration.
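
To make the problem concrete: the file cloud-init leaves behind on a single-NIC Azure instance looks roughly like the illustrative example below (the MAC address is a placeholder). Because the stanza matches on the hardcoded MAC, a VHD restored onto a VM whose NIC has a different MAC has no matching config, and the VM comes up without network.

    # /etc/netplan/50-cloud-init.yaml -- illustrative shape only;
    # the MAC address below is a placeholder
    network:
        version: 2
        ethernets:
            eth0:
                dhcp4: true
                match:
                    macaddress: 00:0d:3a:aa:bb:cc
                set-name: eth0
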
[21:13] (Do you already have a deprovisioning process that these customers are expected to follow? Could that be expanded to include a step which calls cloud-init somehow?)
[21:14] if there is only one NIC, there's no need to hardcode the MAC. Or do we still need to hardcode it?
[21:14] the trouble is the backup/restore scenario, where the customer takes a snapshot or backs up the OS disk, then later restores it (as a different VM)
[21:15] although backup/restore might not be as big of a problem if they provision it again as a normal VM, because cloud-init will run and perform network config
[21:15] And boots of those restored VMs don't run cloud-init?
[21:15] Aha, we raced on the question and answer there. :)
[21:15] only if they attach the OS disk as a specialized VM (which is the only way to boot up from a VHD today)
[21:16] so there are some limitations of the platform there: when attaching a disk as a specialized VHD, no provisioning information is made available, and cloud-init fails at some point early on and doesn't really do network config, if I remember correctly
[21:17] (it would fail to find the Azure datasource, because there's no provisioning ISO attached)
[21:18] To answer a slightly earlier question: we wouldn't need to hardcode the MAC if we were sure there would only _ever_ be one NIC. But instances could have NICs attached, or disk images could be restored to systems with multiple NICs, so we can't assume that.
[21:19] (Obviously the restore case would break with a hardcoded MAC, so perhaps that wasn't the best example. Still, the attach case is valid.)
[21:19] yeah, customers can add a new NIC, reboot, and probably lose network :-)
[21:20] actually in that case no, because when they reboot they will get a new config with 2 NICs and we'll write the network config correctly (hopefully)
[21:21] Right, this would be the case where cloud-init had been disabled, I guess.
[21:22] Instance booted with a single NIC, cloud-init persists the MAC address, cloud-init is removed, NIC added, reboot -> the cloud-init-generated config will still reliably apply to the original NIC
[21:22] (Right?)
[21:22] right
[21:22] this is tricky...
[21:23] Agreed.
[21:23] :p
[21:23] let me look into the scenario where we boot up a VHD with no provisioning ISO attached
[21:24] Yeah, this definitely feels like we need to understand the exact requirements driving the change, because that could make a substantial difference to the solution.
[21:24] perhaps we can do something there
[21:25] yeah, we have these support cases from backup/restore customers who now fail to boot up VMs due to the MAC address in netplan. I will take a closer look and perhaps file a bug with better details so we can discuss
[21:25] OK, cool, thank you!
[21:26] thanks Odd_Bloke
=== tds1 is now known as tds
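
One possibility raised above is folding a cloud-init step into the deprovisioning process customers run before snapshotting. A hypothetical sketch of such a step follows; `cloud-init clean --logs` and `waagent -deprovision+user` are real commands, but this exact sequence (including removing the generated netplan file by hand) is only an assumption about what such a process might look like, and the machine would carry no persistent network config until cloud-init regenerates it on the next provisioned boot.

    # Hypothetical deprovisioning step before snapshotting the VHD;
    # this sequence is an assumption, not an established procedure.
    sudo cloud-init clean --logs                  # reset cloud-init state so it re-runs on next boot
    sudo rm -f /etc/netplan/50-cloud-init.yaml    # drop the generated config with the hardcoded MAC
    sudo waagent -deprovision+user                # Azure agent deprovisioning
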