=== falcojr5 is now known as falcojr
[17:09] [42]: i finally got around to spending a bit more time debugging my cloud-init network issues on debian
[17:09] [42]: so i'm trying to use netplan rendering with systemd-networkd
[17:11] [42]: as i also have ifupdown installed i'm trying to force the renderer to netplan via "network: renderers: ['netplan']" in /etc/cloud/cloud.cfg
[17:13] [42]: with renderer netplan set i get https://paste.debian.net/plainh/f8dfa149 - without it it renders as expected into /etc/network/interfaces.d
[17:14] [42]: example config: https://paste.debian.net/plain/1178087
[17:14] minimal: what's your network-config.yaml contents? Is it a v1 or v2 config?
[17:14] [42]: cloud-init 20.2 on debian buster
[17:15] [42]: fwiw both don't work
[17:15] [42]: i'm trying with a simplified v1 config right now generated by proxmox but my custom v2 config yields the same result
[17:16] minimal: tried turning on debug for cloud-init? might give more info on what it's doing just before the error
[17:24] minimal: where's it getting the network config from? Which DataSource are you using? doesn't look like NoCloud as no sign of it mounting fs with the YAML files
[17:24] minimal: did you put the config straight into the cloud.cfg file?
[17:24] [42]: it's NoCloud
[17:25] minimal: strange, i'd expect to see lines where it runs blkid to find a FS with label 'cidata' before mounting it.
[17:25] minimal: is this not the 1st boot for this machine?
[17:26] [42]: 2nd iirc but i've been running `cloud-init clean` and `cloud-init --debug init`
[17:27] [42]: https://paste.debian.net/plainh/1ee03e90 grabbed the full log files
[17:28] minimal: right, full log has what I expected for NoCloud:
[17:28] minimal: 2020-12-23 17:25:25,057 - DataSourceNoCloud.py[DEBUG]: Attempting to use data from /dev/sr0
[17:30] minimal: and same config worked before, you just forced netplan render now?
[17:31] [42]: i haven't used this with cloud-init before but if i don't force netplan renderer i get no error and it creates a file in /etc/network/interfaces.d with the expected contents
[17:34] minimal: haven't used netplan myself either. Wasn't sure if it was supported by c-i on Debian.
[17:35] minimal: the log line: "RuntimeError: Unknown network config version: None" shows it's very confused about the value of version. Might need to try adding some debug prints to the code around the error to figure out what's going on
[17:35] [42]: which file should i be looking at for that?
[17:36] [42]: nvm it's in the traceback
[17:37] minimal: checking cloudinit/distros/debian.py I do see netplan mentioned in there so I guess it is supported on Debian
[17:38] [42]: added a debug statement for the netcfg section
[17:38] [42]: so it considers {'renderers': ['netplan']} to be the entirety of my network config
[17:38] minimal: netplan and network-config v2 are basically the same thing. Is the error the same when supplying v2 config?
[17:39] [42]: it doesn't even see my network config
[17:39] [42]: only takes the network block from /etc/cloud/cloud.cfg
[17:39] [42]: which is supposed to set the renderer
[17:41] minimal: 2020-12-23 17:25:25,058 - util.py[DEBUG]: Reading from /mnt//network-config (quiet=False)
[17:41] minimal: 2020-12-23 17:25:25,058 - util.py[DEBUG]: Read 318 bytes from /mnt//network-config
[17:41] minimal: that's it reading your YAML once ISO is mounted
[17:42] [42]: cloud_init.net tries version = netcfg.get('version')
[17:42] [42]: but netcfg is literally {'renderers': ['netplan']}
[17:42] minimal: try looking at /run/cloud-init/instance-data.json, it should (from memory) contain what it's read from the ISO
[17:42] [42]: i added a debug print there
[17:43] [42]: in def extract_physdevs(netcfg)
[17:43] minimal: yeah it seems like the YAML is being somehow "lost" after being loaded
[17:46] [42]: there's nothing network related in instance-data.json
[17:46] minimal: strange as your log has this:
[17:46] minimal: 2020-12-23 17:25:25,064 - handlers.py[DEBUG]: finish: init-network/search-NoCloudNet: SUCCESS: found network data from DataSourceNoCloudNet
[17:47] minimal: and that's just logged after it writes stuff to /run/cloud-init/instance-data.json
[17:48] [42]: ah it's in -sensitive.json
[17:49] minimal: ah, I didn't suggest that as I thought only password and the like in there
[17:49] minimal: so it's loaded it ok
[17:49] minimal: it's somehow getting "lost" after that
[17:49] [42]: that also just has the renderer defined
[17:49] [42]: so it's lost before that
[17:52] [42]: > 2020-12-23 17:51:37,447 - stages.py[INFO]: loaded network config from system_cfg
[17:53] [42]: if i understand it correctly it's not loading from the datasource but just system config
[17:54] minimal: just checked a VM here which uses ISO - sorry, those files don't contain the network data.
[17:54] minimal: am comparing your logs with the ones here
[17:57] [42]: self.datasource.network_config does contain the config in _find_networking_config
[17:59] minimal: your logs refer to DataSourceNoCloudNet whereas mine refer to DataSourceNoCloud
[17:59] [42]: it tries cmdline, initramfs, system_cfg
[17:59] [42]: in this order
[17:59] [42]: and system_cfg returns
[18:01] [42]: it reads the order ('cmdline', 'initramfs', 'system_cfg', 'ds') from self.datasource.network_config_sources
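The lookup order quoted above can be sketched as follows. This is a simplified illustration, not cloud-init's actual code: the function names `find_networking_config` and `config_version` and the `ds` example config are assumptions made for the sketch. It shows why a root-level `network:` block in cloud.cfg wins over the datasource config, and why the winner then fails the version check.

```python
from typing import Dict, Optional, Tuple

# Order quoted in the log; everything else here is a simplified assumption.
NETWORK_CONFIG_SOURCES: Tuple[str, ...] = ("cmdline", "initramfs", "system_cfg", "ds")


def find_networking_config(available: Dict[str, Optional[dict]]) -> Tuple[str, dict]:
    """Return (source_name, config) for the first source that has any config."""
    for name in NETWORK_CONFIG_SOURCES:
        cfg = available.get(name)
        if cfg:  # first non-empty source wins; nothing is merged
            return name, cfg
    raise LookupError("no network config found")


def config_version(netcfg: dict) -> int:
    """Mimic the version check that produced the error seen in the traceback."""
    version = netcfg.get("version")
    if version not in (1, 2):
        raise RuntimeError(f"Unknown network config version: {version}")
    return version


if __name__ == "__main__":
    sources = {
        "cmdline": None,
        "initramfs": None,
        # root-level `network:` block from /etc/cloud/cloud.cfg
        "system_cfg": {"renderers": ["netplan"]},
        # hypothetical v2 config delivered by the NoCloud datasource
        "ds": {"version": 2, "ethernets": {"ens18": {"dhcp4": True}}},
    }
    name, cfg = find_networking_config(sources)
    print(name, cfg)
    try:
        config_version(cfg)
    except RuntimeError as err:
        print(err)
```

With the `system_cfg` entry removed from `sources`, the same lookup falls through to `ds` and the version check passes.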
[18:02] minimal: where does your YAML come from exactly? your logs mention sr0 but I don't see any mount/umount logged
[18:02] [42]: i manually mounted it before already
[18:02] minimal: it's mounted via /etc/fstab?
[18:03] minimal: you're manually running cloud-init on a running system?
[18:03] [42]: `cloud-init clean && cloud-init --debug init`
[18:03] minimal: ah ok, normally during 1st time boot I'd expect to see c-i mount the cidata FS
[18:06] [42]: class DataSource(metaclass=abc.ABCMeta) defaults to preferring NetworkConfigSource.system_cfg over NetworkConfigSource.ds
[18:06] [42]: and i guess it just doesn't merge network config but can only replace instead?
[18:06] minimal: it's obviously crashing as the data structure that should contain the network config is empty
[18:06] minimal: where's that class?
[18:09] minimal: your /etc/cloud/cloud.cfg contains: datasource_list: [ 'NoCloud' ] ?
[18:10] minimal: well actually I guess [ 'NoCloud', 'None' ]
[18:11] minimal: and you are manually running the 4 init scripts in sequence?
[18:12] [42]: i'm not manually running individual init scripts, i'm running `cloud-init init` which as far as i understand should take care of that for me?
[18:13] minimal: there's a sequence of 4 scripts run during boot
[18:15] minimal: cloud-init-local -> cloud-init -> cloud-config -> cloud-final
[18:15] [42]: datasource_list is not actually present in cloud.cfg
[18:15] minimal: I think there's a built-in default DS list, not sure
[18:16] minimal: but the sequence of running those scripts during boot is important. As you're on Debian I assume there will be systemd equivalent service files
[18:17] minimal: cloud-init-local is run early to do things like bring up temporary networking (i.e. on AWS to talk to Metadata Service) and for NoCloud to fetch YAML config with network info
[18:17] minimal: so that networking is up-and-running before cloud-init script runs next
[18:24] minimal: remember cloud-init is designed to be run *during* system boot, not manually (unless you're debugging it) ;-)
[18:25] [42]: (which i am)
[18:26] minimal: that link I posted shows the sequence of scripts
[18:26] minimal: so you need to run those manually when testing
[18:26] minimal: after first doing a "cloud-init clean"
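For manual testing, the four boot stages can be replayed in order after a clean. The sketch below is a minimal helper under stated assumptions: the mapping of stage names to `cloud-init` subcommands reflects the commonly used CLI (`init --local`, `init`, `modules --mode=config`, `modules --mode=final`) and should be checked against the installed version, and `run_stages` is a hypothetical name. It defaults to a dry run that only returns the command lines.

```python
import shlex
import subprocess
from typing import List

# Stage order from the log: cloud-init-local -> cloud-init -> cloud-config -> cloud-final,
# preceded by a clean. The CLI mapping is an assumption to verify locally.
STAGE_COMMANDS: List[List[str]] = [
    ["cloud-init", "clean"],
    ["cloud-init", "init", "--local"],
    ["cloud-init", "init"],
    ["cloud-init", "modules", "--mode=config"],
    ["cloud-init", "modules", "--mode=final"],
]


def run_stages(dry_run: bool = True) -> List[str]:
    """Return the command lines in order; execute them only when dry_run is False."""
    rendered = []
    for cmd in STAGE_COMMANDS:
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing stage
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    for line in run_stages(dry_run=True):
        print(line)
```

Running it with `dry_run=False` needs root and will reconfigure the machine, so it is only meant for a disposable test VM.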
[18:27] [42]: i may have just found the issue
[18:27] [42]: give me a few min to test
[18:49] [42]: so the main issue was having the network block at root level in cloud.cfg
[18:50] [42]: i'm still working on remaining issues but it should have been under system_info
[18:50] [42]: i'm getting a netplan config now
[18:59] minimal: what was the cloud.cfg issue?
[18:59] minimal: where you specified the renderer?
[19:00] [42]: it shouldn't be on root level
[19:00] [42]: because then it conflicts with the network cfg i'm passing in the ds
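The two cloud.cfg layouts discussed here can be written out as the Python dicts they parse to. This is only an illustration: the helper names `shadows_datasource_network` and `renderer_priority` are hypothetical, and the `system_info: network: renderers:` nesting is the layout the fix above points to.

```python
# Layout that caused the problem: `network:` at the root of cloud.cfg is treated
# as a complete network config and takes the place of the datasource config.
BROKEN_CLOUD_CFG = {
    "network": {"renderers": ["netplan"]},
}

# Layout after the fix: the renderer preference sits under system_info,
# so no root-level network config exists to compete with the datasource.
FIXED_CLOUD_CFG = {
    "system_info": {
        "network": {"renderers": ["netplan"]},
    },
}


def shadows_datasource_network(cloud_cfg: dict) -> bool:
    """True if cloud.cfg carries a root-level network block (hypothetical check)."""
    return bool(cloud_cfg.get("network"))


def renderer_priority(cloud_cfg: dict) -> list:
    """Return the renderer list configured under system_info, or an empty list."""
    return cloud_cfg.get("system_info", {}).get("network", {}).get("renderers", [])


if __name__ == "__main__":
    for label, cfg in (("broken", BROKEN_CLOUD_CFG), ("fixed", FIXED_CLOUD_CFG)):
        print(label, shadows_datasource_network(cfg), renderer_priority(cfg))
```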
[19:00] [42]: network is coming up now but for some reason it's stuck for a minute here: [  *** ] A start job is running for Raise network interfaces (18s / 5min 1s)
[19:01] [42]: oh nvm i was too fast on that
[19:01] [42]: after rebuilding it's borked again
[19:01] [42]: hm, last attempt it used eth0, now it's back to ens18
[19:02] minimal: yupe, in system_info is also the distro value - that's used to select the relevant distro-specific Python file which defines which renderers are used automatically by a distro (i.e. debian.py has 'eni' and 'netplan' in that order)
[19:04] minimal: if your network has mac address specified then it can/should use that to rename an interface
[19:04] minimal: with debug turned on the cloud-init.log will show if it decides to rename
[19:04] [42]: i was trying to avoid having to manually put the mac in there as well :)
[19:05] minimal: you could use the Linux kernel cmdline option to disable the "new style" interface naming
[19:07] minimal: that will force the use of old style eth0, eth1, etc form
[19:08] [42]: i don't mind the new names
[19:08] [42]: they're still predictable
[19:08] minimal: BTW the v1 network-config is more limited than v2, I recommend using it
[19:08] [42]: actually even better predictable without having to know the mac
[19:08] minimal: v2 that is...
[19:08] [42]: yeah i'm already using v2
[19:08] [42]: my network config requires that anyways
[19:08] [42]: (also reason why i use netplan)
[19:08] minimal: there are some things I can't do using v1
[19:09] [42]: my network cfg for example has different source ips for different routes
[19:09] minimal: v2 will work with /etc/network/interfaces
[19:10] minimal: hmm, having had to look at source IPs, what's the v2 format for specifying that? routes are just dest net, via, and metric from memory
[19:12] [42]: e.g. {"to":"","via":"","from":""}
[19:13] minimal: 'from' isn't mentioned in the cloud-init docs
[19:13] [42]: and then in the default route i'd have the vms public ip
[19:13] [42]: no it's netplan config
[19:13] [42]: so i'm not sure if it would render correctly for ifupdown
[19:13] [42]: even though it's not documented in ci it's passed to netplan
[19:14] minimal: but you're providing a v2 network config, not a netplan for cloud-init to convert to netplan
[19:15] minimal: hmm, it could "break" in future - unless you submit a PR to get it officially documented
[19:15] [42]: the entire body of the network: block is passed to netplan
[19:15] [42]: so i'm practically giving it a netplan config
[19:17] minimal: true, but it's not guaranteed to always work - either it's intended cloud-init behaviour but someone forgot to document it or it's unintended behaviour and therefore could change at any time
[19:17] [42]: https://paste.debian.net/plainh/40e5318a for some reason there's a dhcpclient running in this image
[19:18] [42]: and that seems to be what's blocking that long
[19:18] minimal: I have to find some time to raise a MR/PR for a minor fix in cloud-init /etc/init/network renderer for static routes
[19:20] minimal: isn't the dhclient started by systemd? or does it handle that itself natively these days?
[19:21] [42]: i don't have a dhclient service
[19:22] [42]: as in dhclient.service does not exist
[19:22] [42]: (nor does anything dhc*)
[19:24] [42]: it seems that dhclient is started before systemd-networkd logs ens18: Configured
[19:27] minimal: NetworkManager or something like that?
[19:28] [42]: that's it
[19:30] [42]: https://bugs.launchpad.net/cloud-init/+bug/1909138 should solve the documentation issue
[19:30] ubot5`: Ubuntu bug 1909138 in cloud-init "cloud-init should officially support routes with source ip" [Undecided,New]
[19:33] minimal: I guess the underlying issue is that if there's no way to map from v2 to eni & netplan & other (RedHat/Suse) whether it would be accepted as an optional entry or not
[19:34] [42]: > Dec 23 19:03:42 debian cloud-ifupdown-helper[285]: Generated configuration for ens18
[19:35] [42]: so that's
[19:35] [42]: being generated via /etc/network/cloud-ifupdown-helper
[19:36] minimal: Actually I have 2 issues regarding static routes still to raise (both for /etc/network/interfaces): (1) with both IPv4 and IPv6 static routes the renderer puts them together in the IPv6 interface definition rather than separately in the IPv4 and IPv6 sections, and (2) I want to modify the renderer to use "ip" rather than "route" if it's installed locally - using "ip" might also by chance mean that source routing could be specified
[19:36] [42]: i guess i'll adjust my patch script to nuke /etc/udev/rules.d/75-cloud-ifupdown.rules
[19:36] [42]: then that should be fixed
[19:36] [42]: does it technically make a difference if the route is set in v4 vs v6 section?
[19:37] [42]: https://bugs.launchpad.net/cloud-init/+bug/925145 would be your second issue
[19:37] ubot5`: Ubuntu bug 925145 in Fedora "Use ip instead of ifconfig and route" [Medium,Confirmed]
[19:38] minimal: yeah, I meant I was intending to raise an MR to actually implement it
[19:40] minimal: yeah mixed v4/v6 static - well in theory an ifup (or equivalent) could successfully bring up IPv4 on an interface but not IPv6 and then the IPv4 static routes would be missing............ not a big issue, more of a minor irritation
[19:40] minimal: I'm the Alpine cloud-init maintainer :-) We use /etc/network/interfaces
[19:41] [42]: i knew they were separated blocks but i didn't know it internally has independent states for v4 and v6
[19:42] [42]: alpine - fits your nick :D
[19:44] minimal: with /e/n/i each "iface" section is a separate stanza and so logically self contained
[19:45] [42]: > [   18.966169] cloud-init[487]: Cloud-init v. 20.2 running 'modules:final' at Wed, 23 Dec 2020 19:45:35 +0000. Up 17.67 seconds.
[19:46] minimal: all sorted?
[19:46] minimal: building your own Debian disk images?
[19:46] [42]: not really
[19:46] [42]: patching prebuilt ones
[19:47] [42]: reorder partitions, install netplan, make it use systemd-networkd and systemd-resolved
[19:47] [42]: and a second patched image that uses btrfs instead of ext4
[19:47] minimal: so you're creating a franken-Ubuntu in other words? ;-)
[19:48] [42]: still a regular debian :P
[19:48] minimal: I build my own images, have a nice Ansible playbook for cranking out Virtual, Physical, and RPI images
[19:49] minimal: need to find time to get back to doing the Debian ones
[19:49] [42]: i was thinking about it but this way i already have the cloud-optimized configuration
[19:50] [42]: and i don't have to fiddle with the debootstrap or whatever stuff and build a disk image from that
[19:50] minimal: yeah I'm using cloud-init even for physical machines - small partition with the YAML config so just DD the image onto a box and edit the static IPs etc and boot
[19:51] minimal: I guess it depends on how stripped down / tailored you want it to be
[19:54] minimal: looks fine. Did you figure out where the 75-cloud-ifupdown.rules came from?
[19:55] [42]: it's part of debian's cloud image
[19:57] minimal: right. You've taken the tweak-it approach, I've taken the 1001 Flavours approach (do you want LWN Y/N? encryption Y/N? physical or virtual? etc) lol
[19:58] [42]: i've also used preseeds in the past
[19:58] [42]: but they're more painful to integrate when not using dhcp than cloud-init
[19:58] minimal: me too, I've been a Debian user for a very long time :-)
[20:00] minimal: I'm building ready-to-rock-out-of-the-box images to avoid need to run Ansible/Chef/Salt/Puppet once up for typical configuration - so locked down SSHd config, basic services etc already done.
[20:02] [42]: i only have a very limited base config but it'd probably be a good idea to include that in my base image too now that i'm patching them anyways
[20:04] minimal: when you start thinking about central syslogging, SSHd, Prometheus node-exporter, disabling unrequired kernel modules, etc it's a never-ending source of effort
[20:05] [42]: one of the sshd_config lockdowns i like the most is limiting server keys to ed25519 and disabling all others - most login attempts already fail like this: Unable to negotiate with port 32646: no matching host key type found.
[20:06] minimal: yupe, have done that already :-) Still need to raise an Alpine MR to change the init.d script to stop it creating all (missing) types at startup
[20:08] minimal: what about installing/running rng? for hardware random (or virtio-rng for VMs)
[20:08] [42]: --args "-device virtio-rng-pci"
[20:10] minimal: no machines or VMs without hardware random that need haveged or jitter-entropy via rngd?
[20:10] [42]: all of my personal vms with virtio rng passthru
[20:11] minimal: modifying your SSHd initial startup to wait if entropy is too low before creating the host key?
[20:11] [42]: don't think i've done that yet
[20:12] [42]: but that should only apply on fresh machines
[20:12] [42]: so they should normally have enough entropy
[20:12] minimal: true, I use virtio-rng too, I guess as I'm using a mix of physical and VMs and some physical boxes don't have hw rng that I've catered for varying scenarios
[20:12] [42]: i don't create vms that often
[20:12] minimal: they *should* have enough entropy but safer to check (and wait) before creating keys
[20:13] minimal: thank you for visiting the "1001 things to do to harden your server channel" ;-)
[20:16] [42]: how do you block the ssh key generation until enough entropy is ensured?
[20:16] minimal: cat /proc/sys/kernel/random/entropy_avail
[20:17] minimal: and decide based on the value
[20:17] [42]: so you modify the hostkey generation script?
[20:17] minimal: below 1000 is not great
[20:18] minimal: on this laptop for example I'm seeing 3800 now - HW rng + haveged + jitterentropy-rngd
[20:19] minimal: modify the init.d script - already had to do that to stop it creating more than just ED25519 keys
[20:19] [42]: well it doesn't hurt to have the extra keys if they're not referenced but yeah :D
[20:20] [42]: cleaner without
[20:20] minimal: init.d script (or systemd service) runs ssh-keygen so just wrap that
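The "check entropy, then wrap ssh-keygen" idea could look roughly like the sketch below. It is an assumption-laden illustration, not the init.d script mentioned in the log: the function names, the 60 second timeout and the key path are hypothetical, and the threshold of 1000 is simply the figure quoted above. On newer kernels the value reported in entropy_avail can top out well below 1000, so the threshold may need adjusting.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable

ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path: Path = ENTROPY_PATH) -> int:
    """Read the kernel's current entropy estimate."""
    return int(path.read_text().strip())


def wait_for_entropy(
    threshold: int = 1000,            # figure quoted in the log, not a standard
    timeout: float = 60.0,            # hypothetical upper bound on the wait
    poll: float = 1.0,
    reader: Callable[[], int] = read_entropy,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the estimate reaches threshold; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if reader() >= threshold:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll)


def generate_ed25519_host_key(key_path: str = "/etc/ssh/ssh_host_ed25519_key") -> None:
    """Create only an ED25519 host key with an empty passphrase."""
    subprocess.run(
        ["ssh-keygen", "-q", "-t", "ed25519", "-N", "", "-f", key_path],
        check=True,
    )


if __name__ == "__main__":
    if wait_for_entropy():
        print("entropy ok, host key generation could run now")
    else:
        print("entropy still low after timeout")
```

The `reader` and `sleep` parameters exist so the wait loop can be exercised without touching /proc or actually sleeping.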
[20:21] minimal: well if the other keys are there there's a risk sshd config, now or in future, could allow them to be used.......whereas if they don't exist they can't be used.
[20:22] [42]: doesn't sshd_config default to all supported types unless you specify one or more in which case only the specified ones are used?
[20:23] [42]: ssh_genkeytypes in cloud-init allows specifying which host keys are generated with that
[20:24] minimal: yeah. My point was if you unintentionally changed sshd_config (i.e. my mistake) you could end up re-enabling one of the other key types
[20:24] [42]: good point
[20:25] minimal: whereas if the other keys don't exist then such a mistake in config doesn't matter
[20:25] [42]: e.g. when a new dist version of a config is shipped
[20:25] minimal: indeed.......you end up with a default config file with lots enabled
[20:26] minimal: though that's really a separate issue of whether you're running anything (e.g. InSpec or Ansible) to check/enforce secure configs
[20:27] minimal: ssh_genkeytypes: yes, that's not the issue - the usual sshd init.d/systemd file at startup will still create any "missing" hostkeys
[20:28] minimal: so even if c-i creates just the one the sshd service will create the rest
[20:28] [42]: not in the cloud-init debian image :)
[20:28] minimal: in yours or stock? haven't checked their cloud images for a while
[20:29] [42]: i didn't patch it out
[20:30] minimal: ok, don't have sshd on this Debian box, must check another to see. Anyway as I'm working on 2+ distros it's a general thing to address/ensure
[20:31] [42]: interestingly /usr/lib/systemd/system/cloud-init.service:Wants=sshd-keygen.service
[20:31] [42]: but sshd-keygen.service doesn't exist
[20:31] [42]: funnily that's enough to enable sshd-keygen.service in systemctl bash-completion
[20:33] minimal: oh, word of advice, if you specify MAC addresses in network config it's best to quote them
[20:33] [42]: i remember that due to yaml being funny
[20:34] [42]: but i don't remember what exactly was the issue
[20:34] minimal: yes, I raised a MR a couple of months ago to get the c-i docs fixed regarding this and it opened up a whole can of worms...
[20:35] [42]: i really like yaml but some things are just... special
[20:35] [42]: (others call me crazy for liking yaml)
[20:35] minimal: the c-i YAML configs are YAML 1.1
[20:36] minimal: however there's no version declared in the docs anywhere (same issue applies to netplan as it's basically network-config v2)
[20:36] [42]: i do actually have another project where i'm using mac addresses
[20:37] [42]: this reminds me to double check if they're quoted there :D
[20:37] minimal: now there's a type in YAML 1.1 called a Sexagesimal which is Base 60
[20:37] [42]: and they're not :(
[20:37] minimal: that is *not* present in YAML 1.2
[20:38] [42]: yeah my other project uses pyyaml which is yaml 1.1
[20:38] [42]: and *surprise* the macs are not currently quoted
[20:38] minimal: if you have a MAC address that is completely numeric (no 'a-f' present) it can possibly be mistaken for a Base 60 number
[20:39] minimal: so for someone using cloud-init in VMs where any MAC address is "made up" (and the typical KVM prefix is all numeric) it's a potential problem - which I hit
[20:40] minimal: to avoid the problem you need to quote any values that can be misinterpreted as Base 60
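The Base 60 pitfall can be demonstrated without a YAML library. The sketch below re-implements the sexagesimal integer pattern from the YAML 1.1 int type using only the standard library; it is an illustration of the rule, not PyYAML's API, and the function names and the example MAC addresses are made up for the purpose. An unquoted scalar that matches the pattern would be read as an integer instead of a string.

```python
import re

# Sexagesimal integer form from the YAML 1.1 int type:
# an optional sign, a first group starting with 1-9, then ":NN" groups of 0-59.
SEXAGESIMAL_INT = re.compile(r"^[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+$")


def looks_sexagesimal(scalar: str) -> bool:
    """True if an unquoted scalar would match the YAML 1.1 base 60 int form."""
    return SEXAGESIMAL_INT.match(scalar) is not None


def sexagesimal_value(scalar: str) -> int:
    """Compute the integer a YAML 1.1 parser would produce for such a scalar."""
    sign = -1 if scalar.startswith("-") else 1
    digits = scalar.lstrip("+-").replace("_", "")
    value = 0
    for part in digits.split(":"):
        value = value * 60 + int(part)
    return sign * value


if __name__ == "__main__":
    # Example addresses only; the first is all numeric, the second is not.
    for mac in ("52:54:00:12:34:56", "52:54:00:ab:cd:ef"):
        if looks_sexagesimal(mac):
            print(mac, "would load as int", sexagesimal_value(mac))
        else:
            print(mac, "stays a string")
    print(sexagesimal_value("1:30"))  # → 90
```

Per the pattern, only addresses whose first octet does not start with 0 and whose remaining octets are all numeric and at most 59 are affected, which is why quoting every MAC is the simpler rule.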
[20:40] [42]: in my other project i'll just switch to a yaml 1.2 parser
[20:40] [42]: although almost all macs in there are from actual physical devices
[20:41] minimal: so as part of my MR to fix the docs I also "fixed" their testcases to quote all MACs for uniformity - and some tests failed
[20:42] minimal: as in their test framework some of the test YAML with quoted values were read into Python and stubs to create resultant netplan YAML which the parsing code didn't quote (as for those values not required) - so test result vs expected result mismatch
[20:44] minimal: pyyaml (which c-i uses) *does* recognise the "%YAML" directive with different values, however it doesn't appear to convert data accordingly
[20:46] minimal: all fun and games
[20:47] minimal: I've hit several corner cases with c-i as I guess I use it in non-typical ways
