[14:32] <OneBit> Hello, I want to install an ubuntu-server via cloud-init autoinstall. I have two disks and I want RAID 1. Is this possible?
[15:55] <dalurka> OneBit: yes
[17:55] <akutz> Is there a known issue where MAC addresses in metadata need to be case-specific for a given renderer? We found that when our provided MACs are lower-case, networking fails on RHEL. Now, this would be an older version of CI without several of the more recent patches, including one I did in network_state.py that does switch to case-insensitive MAC matching, but I am not sure if that actually addressed the issue or if it is even the same issue.
[17:55] <akutz>  Regardless, I think CI should 1) accept MACs in either case and 2) compare them in a case-insensitive manner and 3) output the value in the format required by the renderer.
[17:57] <minimal> akutz: do the MAC addresses have a leading zero? are they quoted?
[17:58] <akutz> They will always have the correct number of digits, as for quoted, not sure. Regardless, I've not seen the quoting affect it in the past. Is that possibly renderer specific?
[17:59] <akutz> Just verified that we do not quote the value.
[17:59] <minimal> akutz: I had an issue in the past with network-config that required quoting the MACs, because thanks to YAML "smarts" they could otherwise be misinterpreted
[17:59] <akutz> Ack. Interesting.
[17:59] <akutz> Thank you.
[18:00] <akutz> But wouldn't that be specific to CI and not the distro? I guess since it's older CI on RHEL then Ubuntu/Photon may not be subject to the same issue since their CI will be newer.
[18:01] <minimal> akutz: here's my related MR from a couple of years ago: https://github.com/canonical/cloud-init/pull/623
[18:01] <akutz> minimal: cool, thank you. Here's the one to which I was referring up above -- https://github.com/canonical/cloud-init/pull/1327
[18:04] <minimal> akutz: there is also https://github.com/canonical/cloud-init/pull/1135, where the docs were updated to state "Letters must be lowercase"
[18:05] <akutz> Thanks again minimal -- we're going to try adding the quotes to see if that solves the issue. If it does, and it works for RHEL, then it would mean that there's some change in CI where uppercase worked on newer versions of CI but not older.
[18:06] <minimal> the quotes issue is only really a factor if the mac address contains no letters as then the YAML library treats it as a Base60 number instead
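The base-60 point above can be checked with a small sketch. It assumes the YAML 1.1 sexagesimal integer pattern as it is commonly implemented by YAML 1.1 parsers; the regex is an approximation written for illustration, the helper names are hypothetical, and the MAC values are made up.

```python
import re

# Approximation of the YAML 1.1 base-60 ("sexagesimal") integer pattern.
# Assumption: real parsers may differ in detail; this is only a sketch.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+")


def looks_like_base60(text: str) -> bool:
    """Return True if an unquoted scalar would match the base-60 pattern."""
    return SEXAGESIMAL_INT.fullmatch(text) is not None


def base60_value(text: str) -> int:
    """Compute the integer a base-60 reading of 'a:b:c' would produce."""
    value = 0
    for part in text.split(":"):
        value = value * 60 + int(part)
    return value


# A MAC made only of digits (each group below 60) is at risk when unquoted.
print(looks_like_base60("52:54:00:12:34:56"))
# Any hex letter makes the scalar an ordinary string again.
print(looks_like_base60("52:54:00:ab:34:56"))
```

Quoting the value in the YAML sidesteps the question entirely, since a quoted scalar is always read as a string.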
[18:06] <akutz> Ack.
[18:07] <akutz> So it turns out my engineer built CentOS 8 with CI 22.1. I didn't know she didn't use a stock image. That means she likely hit the change to requiring lcase MACs, and once we start testing against newer ubuntu and Photon images that use 22.1 and have that patch from 12/21, we'd hit it there too.
[18:10] <holmanb> akutz: yeah there was an old bug about the mac case that was worked on last cycle, but I guess that's already been brought up - the "built with 22.1" explanation makes sense
[18:11] <akutz> Do you happen to have that bug ID handy? The issue linked was just the doc change.
[18:12] <akutz> I found https://bugs.launchpad.net/cloud-init/+bug/1876941, but I don't see any code that addresses this. The solution is just to doc it?
[18:13] <akutz> (not judging, just asking)
[18:13] <akutz> Ah, found https://bugs.launchpad.net/cloud-init/+bug/1876363
[18:14] <holmanb> akutz: the first one is the one I was referring to "worked on last cycle" - I think we discussed and agreed that supporting both cases should happen
[18:14] <akutz> I may take on this bug, but I'm concerned about back-compat and the level of testing to ensure that it's handled appropriately everywhere.
[18:15] <holmanb> akutz: if you want to take it on, you're welcome to :)
[18:15] <holmanb> can help with ensuring test coverage when it comes time for review
[18:16] <akutz> Heh, I know no one would stop me per se, I'm just not sure I'm the right person for it. The LP has a specific location where this needs to be addressed (wait_for_phys_devices). I actually addressed this issue in the sysconfig renderer earlier this year as well. That's at least *one* location the LP didn't mention -- I'm sure there are others.
[18:17] <akutz> I think the first step will be to find out whether each renderer has specific requirements regarding MACs to ensure that any changes to those in CI does not affect how they are emitted for their respective renderers.
[18:18] <holmanb> akutz: that or you could just convert both cases to the case currently in use (assuming that works as expected)
[18:18] <holmanb> which would probably be less work than auditing various renderers :)
[18:19] <akutz> Yep. My concern is that the renderer may not actually know which case should be used and is dependent upon having the correct case passed all the way through via the metadata.
[18:20] <akutz> So there are two methods here -- normalize all macs or update all comparisons or dict lookups to be case insensitive. 
[18:21] <holmanb> akutz: agreed
[18:22] <akutz> And I guess my first thought is normalization is generally easier because there are only so many places where the value is inputted and outputted. That's true for comparison and lookup as well, but those will be trickier to spot I think. Anyway, that's what prompted my comment about renderers.
[18:25] <akutz> Huh, in netinfo.py the REs for matching MACs for various distros all use [\da-f], effectively asserting the system tools that report MAC info always do so in lcase.
[18:29] <holmanb> akutz: not sure if it helps, but here is a pointer to some mac regex I submitted to a different project: https://github.com/bee-san/pyWhat/blob/main/pywhat/Data/regex.json#L1479
[18:30] <holmanb> it supports either case
[18:31] <akutz> holmanb: thanks! I guess my point was more that CI seems to already internally expect MACs to be lcase in certain situations. And more importantly those situations involve the distro command used to produce the network info data. The fact that these distros all report MACs in lcase is strong evidence that they should be normalized as such. 
[18:32] <akutz> It honestly makes me wonder if there shouldn't just be a function that processes metadata prior to processing and updates all MACs to be lcase.
[18:32] <holmanb> akutz: ah, I see what you're saying
[18:33] <holmanb> yeah, agreed, normalize to lower case on input is how I'd tackle this
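The "normalize to lower case on input" idea could look roughly like the sketch below. This is not cloud-init code: `normalize_macs` and `MAC_RE` are hypothetical names, the metadata layout is invented for illustration, and the sketch assumes MACs appear as six colon-separated hex octets.

```python
import re

# Six colon-separated hex octets, accepted in either case.
MAC_RE = re.compile(r"^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$")


def normalize_macs(data):
    """Return a copy of data with every MAC-like string lowercased.

    Dict keys are normalized too, so later dict lookups by MAC
    do not depend on the case used in the metadata.
    """
    if isinstance(data, dict):
        return {normalize_macs(k): normalize_macs(v) for k, v in data.items()}
    if isinstance(data, list):
        return [normalize_macs(item) for item in data]
    if isinstance(data, str) and MAC_RE.match(data):
        return data.lower()
    return data


# Hypothetical metadata with an upper-case MAC.
metadata = {
    "network": {
        "ethernets": {
            "nic0": {"match": {"macaddress": "00:50:56:AB:CD:EF"}},
        }
    }
}
print(normalize_macs(metadata))
```

Running this once, before any comparison or lookup happens, keeps the change in one place instead of touching every comparison site. It does not answer the renderer question raised above: if a renderer needs a specific case on output, that would still have to be handled where the value is emitted.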
[21:17] <chiluk> Hey folks.  Does anyone have an example of a cloud.cfg that can actually set up dns nameserver, and search domains on 20.04 in aws?   The resolv_conf module doesn't work, and I can't seem to get a functional incantation of https://cloudinit.readthedocs.io/en/latest/topics/network-config.html
[21:19] <chiluk> Really it just feels like there are many ways to skin this cat (forcibly setting /etc/resolv.conf, systemd-resolved, netplan, network-manager, ifcfg). What's the recommended way to do this?
[21:32] <holmanb> chiluk: Hello! Could you expand on "resolv_conf doesn't work"? I seem to recall systemd-resolved wiping out /etc/resolv.conf on $OTHER_DISTRIBUTION, but I can't speak for 20.04 off the top of my head.
[21:34] <holmanb> chiluk: I would probably start with configuring dns along with other network configs via: https://cloudinit.readthedocs.io/en/latest/topics/network-config-format-v2.html, but I don't have an AWS/20.04 config lying around, sorry
[21:37] <chiluk> holmanb: yeah the resolv_conf cloud-init module is not resulting in a modified /etc/resolv.conf.  I haven't figured out if the module is failing or if resolv.conf is getting overwritten by systemd-resolved ..
[21:37] <holmanb> minimal: Would you happen to know offhand whether cloud-init analyze works on Alpine?
[21:38] <chiluk> holmanb: as for network-config-format-v2, I couldn't seem to make that work either, and I was hoping someone had a functional cloud.cfg I could copy.
[21:38] <minimal> holmanb: one sec whilst I check on a VM I have handy...
[21:40] <minimal> blame/show/dump work, boot fails with "You must be running a Kernel Telemetry supported distro."
[21:44] <holmanb> minimal: Interesting, thanks! 
[21:47]  * holmanb needs to figure out why lxc wasn't working for an alpine image
[21:48] <minimal> holmanb: have never tried lxc with Alpine cloud-init, it's on my "to test" list
[21:49] <holmanb> minimal: I use lxc for checking various distro things, makes it really easy to test things quick
[21:50] <minimal> yeah I just need to figure out how to create an LXC container inside a chroot, that's how my script currently builds VM disk images
[21:50] <minimal> or else take a different approach for LXC
[21:51] <minimal> holmanb: is there a particular lxc template you're using for Alpine?
[21:51] <holmanb> minimal: I was trying to pull and run from the canonical image server: https://us.lxd.images.canonical.com/
[21:51] <holmanb> minimal: I don't build them myself
[21:52] <holmanb> looks like they have templates/build scripts/jenkins info on that page ^^
[21:53] <minimal> holmanb: this seems to be it: https://github.com/lxc/lxc-ci/blob/master/images/alpine.yaml
[21:53] <minimal> I think I looked at that once briefly
[21:56] <holmanb> yeah that looks like it
[21:57] <minimal> holmanb: which Alpine release did you use?
[21:57] <chiluk> Maybe I should ask a different question.  What is the best way to debug a failing cloud.cfg?  Is there a cloud-init command that forces it to rerun the network configuration?
[21:57] <chiluk> basically I want to boot a node ... modify cloud.cfg, and run something so that cloud-init attempts to use or analyze cloud.cfg
[21:58] <minimal> chiluk: have you tried enabling debugging for cloud-init? then the /var/log/cloud-init.log file would show a lot more detail as it processes network-config and user-data
[21:58] <holmanb> minimal: the cloud one: lxc init images:alpine/cloud
[21:58] <chiluk> well the problem is that I can't log in with my broken cloud.cfg
[22:01] <minimal> holmanb: did you write the user-data yourself or get it from the lxc people? e.g. from that template they seem to install openssh rather than openssh-pam which will cause problems if the user-data locks a user-account
[22:02] <holmanb> chiluk: there's a newer "pre-test-flight" tool for validating that your cloud-config is valid - that will check for obvious errors like keys that aren't valid even though yaml is valid
[22:02] <chiluk> cool I'll check that out.
[22:02] <chiluk> is there a way to force cloud-init to run as if it was running for a first-boot?
[22:02] <holmanb> chiluk: invoked via cloud-init devel schema -c ./config.yml on a running cloud-init system, or you could grab the cloud-init repo and run it locally setting the pythonpath
[22:03] <holmanb> chiluk: yes, to re-run use `cloud-init clean --reboot`, which will reset and reboot
[22:03] <holmanb> chiluk: or `cloud-init clean` followed by a reboot
[22:03] <chiluk> holmanb: will that reboot the node though?
[22:03] <chiluk> my issue is that I'm borking networking ..
[22:04] <holmanb> chiluk: yes it will
[22:04] <chiluk> is there a way to do that without the reboot?
[22:04] <holmanb> chiluk: have you tested your config locally in a container/vm?
[22:04] <chiluk> that's an idea... I guess I could probably just configure serial console in aws.
[22:05] <chiluk> alright I have some things to try .. thanks holmanb.
[22:05] <holmanb> chiluk: you _could_, by running the cloud-init "local" stage
[22:05] <chiluk> how does one do this?
[22:06] <holmanb> chiluk: I would highly recommend iterating in a local container/vm (where networking can be borked and you can debug) until your config works
[22:06] <holmanb> but...
[22:07] <chiluk> alright.
[22:08] <holmanb> chiluk: honestly I don't know if I would recommend running individual stages of cloud-init without a reboot
[22:09] <holmanb> likely to get results you don't expect, and there's four separate systemd services that run at particular parts of the systemd boot order
[22:09] <holmanb> err, minimal: I don't think we're talking about the same thing
[22:10] <minimal> holmanb: I thought you were creating a cloud-init enabled Alpine LXC container
[22:10] <holmanb> minimal: I'm just trying to pull down the image from the server to run an alpine container - I haven't managed to pull an image, so I haven't even gotten to cloud-config
[22:11] <holmanb> minimal: the canonical server has pre-built cloud-init containers of various distros
[22:12] <minimal> right: so you're trying to use this one? alpine	edge	amd64	cloud	20220426_13:00	YES (2.0 and up)	YES (2.0 and up)	YES	YES
[22:12] <minimal> for 3.15?
[22:12] <holmanb> the first one 
[22:12] <minimal> oops, sorry, Edge
[22:12] <minimal> the first one? 3.12?
[22:13] <minimal> the first Alpine version that had cloud-init packaged was 3.13
[22:14] <holmanb> minimal: ahh, it was user error
[22:14] <holmanb> edge
[22:14] <minimal> ok, edge is the "bleeding edge" of Alpine :-)
[22:15] <holmanb> minimal: perfect :)
[22:15] <holmanb> minimal: broken and new is my favorite
[22:15] <minimal> so if you can't even pull down the image then is that not a tooling issue?
[22:15] <holmanb> minimal: yeah you asking which version made me realize I was just invoking lxc wrong
[22:17] <minimal> holmanb: so I was assuming those Canonical images were based on the upstream lxc templates, in which I pointed out a potential issue with the alpine cloud one & cloud-init
[22:18] <minimal> that behaviour of openssh (if PAM not enabled) and locked users
[22:18] <holmanb> minimal: ahh, because openssh is not used on alpine?
[22:19] <minimal> because in Alpine the package openssh-server is OpenSSH compiled without PAM and package openssh-server-pam is it compiled with PAM support
[22:20] <minimal> actually I'll have to check as they're actually specifying "openssh": https://github.com/lxc/lxc-ci/blob/master/images/alpine.yaml#L331
[22:20] <minimal> so "openssh" depends on "openssh-server"
[22:23] <holmanb> minimal: ahh, I see
[22:23] <minimal> long-story-short: in Linux a locked account is one where password access is blocked (acct is not disabled and so SSH-key-based access should still work), however upstream OpenSSH treats locked as completely locked (inc. SSH-key-based-access) *UNLESS* PAM is enabled
[22:23] <minimal> doesn't affect most distros as they only ship a single OpenSSH server package with PAM support
[22:24] <minimal> for Alpine though there's 3 different packages for OpenSSH server (3rd is with Kerberos support)
[22:24] <holmanb> minimal: sure, makes sense
[22:25] <minimal> I discussed this about 2 months ago on here with either blackboxsw or smoser as I wanted to figure out some sort of cloud-init way to handle this (e.g. adding a "disabled" option to "users:" section)
[22:26] <minimal> where "disabled" wouldn't lock the account (which puts "!" in password field) but rather put something else like "*" in there instead
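The "!" versus "*" distinction can be sketched as a small classifier over the password field of a shadow entry. This only mirrors the behaviour described in the conversation: the function names are hypothetical, and the rule that a non-PAM sshd refuses accounts whose field starts with "!" is taken from the discussion above as an assumption, not checked against any particular OpenSSH build.

```python
def classify_shadow_field(field: str) -> str:
    """Classify the password field of a shadow entry (illustrative only)."""
    if field == "":
        return "empty"
    if field.startswith("!"):
        # What locking an account produces: "!" in the password field.
        return "locked"
    if field.startswith("*"):
        # No usable password, but not marked as locked.
        return "no-password"
    return "hashed"


def key_login_allowed_without_pam(field: str) -> bool:
    """Assumption from the discussion: without PAM, only "!" blocks key login."""
    return classify_shadow_field(field) != "locked"


for sample in ("!", "!$6$examplehash", "*", "$6$examplehash"):
    print(repr(sample), classify_shadow_field(sample),
          key_login_allowed_without_pam(sample))
```

Under that assumption, a "disabled" option that writes "*" would block password logins while leaving SSH-key-based access working even on an sshd built without PAM.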
[22:30] <minimal> holmanb: so have you now managed to pull down the image ok?
[22:31] <holmanb> minimal: I have, thanks
[22:31] <minimal> does it boot ok?
[22:32] <holmanb> minimal: it does
[22:32] <holmanb> one error, datasource related
[22:32] <holmanb> __init__.py[ERROR]: Could not import DataSourceVMware. Does the DataSource exist and is it importable?
[22:32] <minimal> I didn't see anything in the template about tweaking cloud.cfg's datasources list so I'm guessing it's using the default list
[22:34] <minimal> holmanb: hmm, it is in the package: https://pkgs.alpinelinux.org/contents?file=*VM*&path=&name=cloud-init&branch=edge&repo=community&arch=x86_64
[22:35] <minimal> I would have assumed the LXC people would have tweaked the cloud.cfg to only try the LXC DataSource (for "performance" reasons) rather than go through a list of various DS
[22:36] <holmanb> minimal: /shrug
[22:36] <holmanb> I just use it 
[22:37]  * holmanb needs to eat food now
[22:37] <minimal> actually there is no LXC DS in c-i, only an LXD DS - I assume this handles both LXC and LXD
[22:38] <holmanb> Thanks for the help minimal!
[22:39] <minimal> holmanb: no problem. If it's trying to use NoCloud for LXC then I suspect the LXC image has some other problems (i.e. they didn't follow the notes in my README.Alpine file for the cloud-init package)
[22:41] <minimal> holmanb: light reading here if you're interested ;-) https://git.alpinelinux.org/aports/tree/community/cloud-init/README.Alpine