[17:04] <blackboxsw> holmanb: thanks for the spec review for cloud-init adding "autoinstall:" support through #cloud-config schema. So, the plan is for cloud-init to source the supplemental subiquity "autoinstall:" static JSON schema and validate that during installer/ephemeral boot if present.
[17:04] <blackboxsw> holmanb: So, let's discuss your open question in the spec: "why would we want cloud-init to fall back to a static opaque schema which only validates "autoinstall:" as a top-level object if no subiquity static schema is present?"
[17:07] <blackboxsw> holmanb: I admit I was thinking about this only in terms of simple JSON schema validation via 3rd party tools which use schemastore.org, such as https://www.jsonschemavalidator.net/ and vscode. If someone creates a cloud-config.yaml file in their repository in vscode, or via jsonschemavalidator.net, there will not be a supplemental schema anywhere which would permit an "autoinstall:" key in #cloud-config and those YAML config keys will get
[17:07] <blackboxsw> rejected by local validation
[17:08] <blackboxsw> holmanb: I admit though, that this opaque fallback presents two problems for local autoinstall: schema validation via vscode or schemavalidator:
[17:09] <blackboxsw> 1. opaque local schema validation of autoinstall could potentially "validate" any invalid autoinstall sub-keys which would error out when officially trying to launch via a live-installer environment
[17:15] <blackboxsw> 2. any opaque or full static copy of subiquity autoinstall schema that cloud-init bakes into our static schema definitions will be static and likely be out of date with latest subiquity json schema unless we dynamically $ref the upstream subiquity JSON schema from https://github.com/canonical/subiquity/blob/main/autoinstall-schema.json
[17:16] <blackboxsw> And maybe our best course of action is to either use a local subiquity packaged schema file if present, or source a remote $ref on https://github.com/canonical/subiquity/blob/main/autoinstall-schema.json as fallback instead of cloud-init creating its own opaque local schema
[17:22] <holmanb> @blackboxsw: +1 thanks for bringing the conversation here. I see the "fallback to opaque validation" as an unwanted silent failure for both editor autocompletion usage, as well as in the case of validation for subiquity.
[17:24] <holmanb> @blackboxsw: If the schema isn't properly sourced, I think I want some proverbial flashing neon lights indicating that either my editor config isn't right, or that subiquity is riding a unicycle down the freeway without training wheels.
[17:25] <blackboxsw> holmanb: yes that makes sense. It's a disservice to provide either a stale snapshot or opaque representation of that schema for local/external tool schema validation in that case.
[17:26] <holmanb> @blackboxsw: I must also admit I don't fully understand refs. This bit from the docs particularly makes me feel like my intuition is best left behind: "Generally, implementations don’t make HTTP requests (https://) or read from the file system (file://) to fetch schemas. Instead, they provide a way to load schemas into an internal schema database."
[17:27] <holmanb> link: https://json-schema.org/understanding-json-schema/structuring.html#schema-identification
[17:32] <blackboxsw> Right, I think the implementation may be different depending on whether it's python JSON schema validation vs golang vs something else. From my limited understanding here, we have defined a top-level base URI via our $id tag here https://github.com/canonical/cloud-init/blob/main/cloudinit/config/schemas/versions.schema.cloud-config.json#L3 and from that top-level base URI we have potentially a number of relative URLs we can source.
[17:34] <holmanb> @blackboxsw: Thanks, perhaps that's just an implementation detail that isn't really necessary for this discussion.
[17:34] <blackboxsw> But, because of limitations in using relative $refs in the schema draft 04 python implementation, we actually don't use the relative URLs https://github.com/canonical/cloud-init/blob/main/cloudinit/config/schemas/versions.schema.cloud-config.json#L14
[17:35] <blackboxsw> we instead use the full URI and pull that upstream file down to validate latest schema version for cloud-init.
[17:36] <holmanb> @blackboxsw: I think I agree that 'use the local autoinstall schema by default, fallback to network if none provided' generally makes sense. This will give you the packaged version that matches the OS if subiquity is installed, and the most recent published version otherwise.
[17:38] <blackboxsw> so anyone's vscode (and jsonvalidator) should be validating current local user-data/cloud-config.yaml against our upstream versioned schema. What this means though, because json validation implementations are sourcing our upstream static JSON files, is that schema version deprecation needs to support older schema versions in https://github.com/canonical/cloud-init/blob/main/cloudinit/config/schemas/versions.schema.cloud-config.json
[17:39] <blackboxsw> >> This will give you the packaged version that matches the OS if subiquity is installed, and the most recent published version otherwise.   Ok good. we'll start with that and iterate in a PR in further discussion on this point as we marinate on that option and potential pitfalls :)
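The "local schema first, remote $ref otherwise" idea agreed above could be sketched roughly as follows. This is only an illustration: the function name and the local path are hypothetical, not anything cloud-init or subiquity actually ships, and the URL is simply the one quoted in the discussion (a real fetch would likely need the raw file URL).

```python
import json
from pathlib import Path

# Assumption: hypothetical install location for a packaged subiquity schema.
LOCAL_SCHEMA = Path("/usr/share/subiquity/autoinstall-schema.json")
# URL quoted in the discussion; used here only as a $ref target string.
REMOTE_SCHEMA = (
    "https://github.com/canonical/subiquity/blob/main/autoinstall-schema.json"
)


def pick_autoinstall_schema(
    local: Path = LOCAL_SCHEMA, remote: str = REMOTE_SCHEMA
) -> dict:
    """Prefer the packaged schema file; otherwise reference upstream.

    There is deliberately no opaque "accept any object" fallback, so a
    missing schema is never silently treated as valid.
    """
    if local.is_file():
        return {"source": "local", "schema": json.loads(local.read_text())}
    return {"source": "remote", "schema": {"$ref": remote}}
```

The returned "source" field is there so a caller can log loudly which schema was used, in the spirit of the "flashing neon lights" comment.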
[17:48] <holmanb> @blackboxsw: Sounds good. Related, I think we may need to change how we've been handling the schema in main for release versions. Perhaps this was intended to be handled with the 22.3 release?
[17:49] <holmanb> Since our versions.schema.cloud-config.json currently points to upstream main
[17:49] <holmanb> I think we'll want to revert the v1 schema for 22.2 back to the latest 22.2 upstream release, add a "tip" schema for ongoing development, and update versions.schema.cloud-config.json with each release with a new file.
[17:50] <holmanb> blackboxsw: does that make sense?
[17:52] <blackboxsw> @holmanb I think that is a gap that we need to address long-term. But, maybe we can fold some of that foundational work into the subiquity/autoinstall schema roadmap item. As it is currently we should be able to continue to reference "old" versions, and cloud-init schema is not strict as far as permitting additionalProperties, so we have some short runway to iron out the best release-specific schema hardening.
[17:54] <blackboxsw> "continue to reference "old" versions": by that I mean our versions.schema.cloud-config.json file does represent old versions even if tip of main moves on to something incompatible. Our problem to address is how to ensure stable releases of cloud-init only honor the schema version they support.
[18:05] <holmanb> @blackboxsw: I think that referencing old versions can be accomplished in the method I described.  The version enum in versions.schema.cloud-config.json should probably have a default value of the current release in the tagged release, and on main the default would be the "tip" schema file. 
[18:06] <holmanb> I agree this isn't really urgent at the moment, although anyone using schema validation in an editor is currently getting prompts that are the tip of main, which will likely include non-valid values at any given point in time.
[18:17] <blackboxsw> +1  on that holmanb 
[18:19] <holmanb> @blackboxsw:  and maybe the resolution to this issue is as simple as s|main|22\.2| on this line https://github.com/canonical/cloud-init/blob/4970f3b9f5a4c3f48e445ef7bcc2c73bd924315e/cloudinit/config/schemas/versions.schema.cloud-config.json#L14
[18:39] <themachine> I'm trying to use network config v1 with NoCloud source and overall it's worked fine but I'm now attempting to assign a MAC address with "mac_address:" and upon doing so the rest of the network configuration never takes place. I've recreated this on Debian 11 and Centos Stream 8 cloud images so far. My network-config file I'm currently testing is: https://termbin.com/6vzb
[18:40] <themachine> with MAC_ADDRESS getting replaced with a randomly generated but valid MAC
[18:42] <themachine> If anyone has a suggestion as to what the cause is I'm all ears. I didn't find anything of help in the cloud-init.log on the VM.
[18:59] <minimal> themachine: what do you mean assign a MAC address? the hypervisor will define whatever MAC a NIC has
[19:00] <minimal> any mac address specified is used for matching interfaces, not for setting a mac address
[19:01] <themachine> minimal: at least with qemu/kvm and libvirt a mac is randomly generated for the guest
[19:01] <minimal> right, but that's nothing to do with cloud-init
[19:02] <minimal> you do not need to specify any mac address in network-config unless there is more than 1 NIC and you are trying to *match* specified settings with a particular NIC
[19:03] <themachine> I am manually assigning a mac at the hypervisor level but when I use mac_address: in my network config to match that mac the rest of the networking is not configured.
[19:04] <themachine> I was attempting to use mac matching to avoid having to know the name of the interface in the guest.
[19:05] <minimal> which version of cloud-init? I believe the entry name is "macaddress" (without "_") in some versions of cloud-init
[19:05] <minimal> themachine: so you are setting up more than 1 NIC for the VM?
[19:05] <themachine> 20.4.1 on this debian 11 guest
[19:06] <minimal> themachine: so you are setting up more than 1 NIC for the VM?
[19:06] <themachine> I am not
[19:06] <minimal> so then don't specify any mac in the network-config then
[19:07] <minimal> as I said it's basically used when there is more than 1 NIC present
[19:08] <themachine> Would you have a suggestion then on how to assign networking if I don't know the "name:" of the interface? My problem I'm running into is that some cloud images use the predictable naming convention while others (such as centos) use eth0 still.
[19:08] <minimal> the v1 docs say "name: <desired device name>"
[19:08] <minimal> so it is the *desired* name, not the current name
[19:13] <minimal> themachine: ^^^
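For reference, the shape of a version 1 network config using "name:" as the desired device name and "mac_address:" as the match could look like the sketch below. The MAC address and the DHCP subnet are placeholders chosen for illustration, and this only shows the documented layout; it does not explain the behaviour themachine is reporting.

```python
import json

# Assumption: placeholder MAC and subnet, for illustration only.
network_config = {
    "version": 1,
    "config": [
        {
            "type": "physical",
            "name": "eth0",  # the *desired* device name
            "mac_address": "52:54:00:12:34:56",  # used to match the NIC
            "subnets": [{"type": "dhcp"}],
        }
    ],
}

# JSON output is also parseable as YAML, so it can serve as a
# network-config file for experiments.
print(json.dumps(network_config, indent=2))
```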
[19:14] <themachine> Yes I'm working with that now but it's not behaving as I would expect it to. This may be due to my own lack of understanding of system network configuration.
[19:16] <themachine> I just removed the mac_address from my network-config and built a new centos guest. I see that in sysconfig/network-scripts a script with the appropriate configuration is created but that isn't active on boot and thus the rest of the cloud-init configuration (such as update and package installation) fails
[19:16] <minimal> so what does the cloud-init.log show?
[19:17] <minimal> is the network config being written? (Centos is sysconfig isn't it?)
[19:17] <themachine> Yes it would appear that the network configuration is written but then not used
[19:18] <themachine> I provided "enp1s0" as the interface name but on boot eth0 is the only interface presented by 'ip link show'
[19:18] <themachine> I'm looking at the cloud-init.log now
[19:19] <minimal> so is network written for enp1s0 or for eth0?
[19:19] <themachine> enp1s0
[19:20] <minimal> what value are you specifying for "name:" in the network-config?
[19:20] <themachine> enp1s0
[19:20] <minimal> ok, so that sounds correct
[19:20] <blackboxsw> passing through comment, on my centos9 lxc instance with cloud-init enabled: grep eth /var/log/cloud-init.log https://paste.opendev.org/show/bjfmE5zMmOveaSAoJ3ow/
[19:20] <themachine> I agree. What I don't understand now is why enp1s0 isn't being used.
[19:21] <minimal> well that sounds like a Centos issue ;-) if the sysconfig for enp1s0 is setup then I'd expect Centos to use it
[19:22] <minimal> but the log you just pasted references eth0 - which is not what you said
[19:22] <themachine> that wasn't me
[19:22] <minimal> doh, didn't notice it was blackboxsw lol
[19:23] <blackboxsw> heh, sorry drive-by comment as I was poking at what I had locally
[19:24] <minimal> themachine: you won't see similar log entries to those blackboxsw posted unless debug is enabled for cloud-init.log
[19:28] <blackboxsw> themachine: +1 on that. generally what you want to see is what cloud-init processed for network config via logs like:  `egrep 'applying net config names|Applying network|rename|render|Writing to /etc' /var/log/cloud-init.log`
[19:30] <blackboxsw> that should give you an understanding of what cloud-init saw from network config, whether it renamed network interfaces, which network config backend cloud-init determined as "renderer" of choice and ultimately what /etc network config file it wrote.
[19:30] <blackboxsw> journalctl -b 0 should also have NIC discovery and rename information from the kernel and cloud-init which should also give breadcrumbs for whether device renames happened and cloud-init missed them or something strange like that
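The egrep above can also be expressed as a small Python filter, which is handy when the log has been copied off the VM. This is a sketch only: the pattern is the one quoted in the chat, while the function name is hypothetical.

```python
import re
from typing import Iterable, List

# Same alternation as the egrep pattern quoted in the chat.
PATTERN = re.compile(
    r"applying net config names|Applying network|rename|render|Writing to /etc"
)


def network_lines(lines: Iterable[str]) -> List[str]:
    """Return only the log lines that match the network-related pattern."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    try:
        with open("/var/log/cloud-init.log", errors="replace") as log:
            for line in network_lines(log):
                print(line)
    except OSError as exc:
        # The log may be absent or unreadable outside a cloud-init guest.
        print(f"could not read log: {exc}")
```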
[19:37] <blackboxsw> also there is funkiness I recall on systems with both network-manager and sysconfig that could result in cloud-init emitting sysconfig files, but network-manager trying to control the device or vice-versa https://bugs.launchpad.net/cloud-init/+bug/1894837
[19:37] <themachine> Thank you for your advice. :)
[19:38] <themachine> I definitely see now that all the network configuration is being written appropriately but something else is happening that I apparently don't yet understand.
[19:44] <minimal> blackboxsw: remember we talked last week about what looked like a v1->netplan bug re: nameserver. Did a bug get raised?
[19:46] <themachine> For the sake of comparison I switched back to Debian 11 and I changed the "name:" of the interface to eth0 because I know the actual adapter name in the system will be named enp1s0.
[19:47] <minimal> themachine: you changed the "name: eth0" entry to "name: enp1s0" you mean?
[19:47] <themachine> Upon building the guest I see in /etc/network/interfaces.d/ there is 50-cloud-init which has the appropriate network config present but 'ip link show' indicates the only adapter is enp1s0
[19:47] <themachine> minimal: No, i set "name: eth0"
[19:48] <minimal> ok
[19:49] <themachine> and attempting to start networking with 'systemctl start networking' results in an error in the journal indicating 'Cannot find device eth0'
[19:52] <blackboxsw> minimal: for shame. I happily went off to PTO for the US holiday and forgot to file the bug from Thursday. Will file it now.  `cloud-init devel net-convert --network-data=netv1.yaml --kind=yaml -D ubuntu -O netplan -d outputdir` etc.
[19:55] <themachine> So my experience so far is that if 'name:' doesn't match the actual interface name assigned by the kernel then networking cannot start
[20:01] <themachine> If I also use "mac_address:" to match the MAC then the interface is renamed to whatever I define for "name:" but no network configuration is ever written at all
[20:02] <minimal> blackboxsw: I'm looking into another issue but not sure if it lies with Alpine or with cloud-init - using NoCloud seed-url where the downloaded meta-data contains network-config info, a /e/n/i is written with the proper config but the "ifup" executed by cloud-init has no effect, the DHCP-allocated IP setup for the seed request to be possible remains - if I manually do a "ifup -f" afterwards I do get the desired network config but the DHCP address also 
[20:02] <minimal> remains. It's going to take me a bit of time to dig further
[20:06] <themachine> I need to go eat some lunch but if anyone has additional input regarding what I'm seeing/doing I'll gratefully read it in a bit.
[20:06] <minimal> themachine: sounds like a bug
[20:08] <blackboxsw> minimal: from the looks of things regarding the nameserver issue on Thursday, netplan code doesn't seem to track any global nameservers at all unless they live under a specific dns_nameserver key within each interface. I'm trying to dig through the history of the global nameserver discussion related to this to see if it's a bug due to accidental omission or intentional
[20:11] <minimal> blackboxsw: I'd expect "global" nameservers to always work, it's the per-interface specification that I wouldn't be sure that all distros support
[20:12] <blackboxsw> I mean a dns_nameserver under each subnet https://github.com/canonical/cloud-init/blob/main/cloudinit/net/netplan.py#L122-L123. checking my facts/expectations now.
[20:25] <themachine> minimal: well at least I'm not crazy then
[20:29] <blackboxsw> minimal: found the behavior with nameservers being ignored https://github.com/canonical/cloud-init/blob/main/cloudinit/net/netplan.py#L421-L422 so we don't apply the global DNS if a subnet already has a specific nameserver documented, or if it doesn't have a defined address. For dhcp* type it doesn't make sense.
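The rule described above can be modelled in a few lines. This is a simplified illustration of the behaviour as stated in the chat, not the code in netplan.py; the function name is hypothetical and the subnet dict layout (with "dns_nameservers" and "address" keys) is assumed from network config version 1.

```python
from typing import Dict, List


def effective_nameservers(subnet: Dict, global_dns: List[str]) -> List[str]:
    """Model of the described rule for when global DNS applies to a subnet."""
    own = subnet.get("dns_nameservers", [])
    if own:
        # A subnet with its own nameservers keeps them; global DNS is skipped.
        return list(own)
    if "address" not in subnet:
        # No defined address (e.g. dhcp or ipv6_slaac): global DNS is skipped.
        return []
    return list(global_dns)
```

Under this model an ipv6_slaac subnet without its own nameservers ends up with none, which matches the case discussed in the following messages.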
[20:29] <minimal> themachine: you're using Debian Stable which has quite an old version of cloud-init. Is Centos Stream 8 using a more modern version?
[20:30] <minimal> blackboxsw: right, as for DHCP you'd expect a DHCP option in the response to specify DNS servers?
[20:32] <blackboxsw> minimal: in dhcp type we'd expect DHCP to give DNS. I misread the subnet type in the prior bug, it was ipv6_slaac, which I think doesn't support DNS directives? If that's the case, then we need to permit global "nameservers" in these types of subnets
[20:33] <themachine> centos-stream8 has 21.1-11
[20:34] <minimal> themachine: a bit more recent
[20:34] <minimal> blackboxsw: for IPv6 RA can specify DNS servers
[20:34] <themachine> The same behavior is present when I do this exact thing. Using "mac_address:" matching results in the interface being renamed to my supplied "name:" but no network config is written
[20:36] <themachine> and centos-stream9 has 22.1-3
[20:36] <minimal> blackboxsw: DHCPv6 can also indicate DNS servers
[20:36] <minimal> blackboxsw: I don't think SLAAC can indicate DNS servers
[20:38] <blackboxsw> I was admittedly peeking over this link FWIW  https://www.networkacademy.io/ccna/ipv6/stateless-dhcpv6
[20:41] <blackboxsw> minimal: so doesn't that then imply that we don't want to apply global nameservers to ipv6_slaac type devices? I'll open the bug with context just so we can close out historically with the decision there related to iandk's initial IRC query https://irclogs.ubuntu.com/2022/06/30/%23cloud-init.html
[20:46] <blackboxsw> https://bugs.launchpad.net/cloud-init/+bug/1980877 for what it's worth
[20:48] <minimal> blackboxsw: IPv6 is so complicated that I'm not sure what I'm really suggesting ;-)
[20:49] <minimal> but from a general IPv4/IPv6 perspective I would have assumed that DNS would be set up globally unless there are multiple interfaces or that network-config specifies interface-specific DNS settings
[21:00] <blackboxsw> right, but who knows with complexity introduced by security-minded network folks and whether global DNS should apply as default behavior for subnet types.  I think for this specific case, the version: 1 config could present a specific dns_nameservers: in the slaac subnet definition. https://bugs.launchpad.net/cloud-init/+bug/1980877/comments/1
[21:01] <blackboxsw> marking it low priority as there is a workaround in network-config version 1 that makes this "work", although maybe not ideal
[21:10] <minimal> blackboxsw: not c-i specific but in the past on Unix/Linux in general I put DNS servers in /etc/resolv.conf, I've never specified them in an interface-specific location
[21:13] <minimal> in a multi-interface machine (whether IPv4/v6/or mixed) a DNS name would be resolved to an IP address and then that address would be used to make a routing decision as to which interface to send traffic via