[07:25] <StyXman> schopin: ack
[15:40] <Guest9223> Hi. I'm having some issues with a Hetzner server: after the apt update && apt upgrade && reboot netplan doesn't set the default gateway anymore -> I can only access it via a KVM. Any ideas what I can do or how to debug this? I compared the config to another server and I don't see any differences. Actually I never touched the default netplan config
[15:40] <Guest9223> from Hetzner until now ;-)
[15:42] <Guest9223> That command brought the machine back online (the "onlink" was required, otherwise it gave me an error): sudo ip route add default via 1.2.3.4 (gateway IP) dev enp7s0 onlink
[15:43] <Guest9223> I started systemd-networkd in debug mode according to the docs here and the machine was disconnected and the default gateway is gone again: https://wiki.ubuntu.com/DebuggingSystemd
[15:44] <Guest9223> Now I'm connected via the KVM and have no idea how to fix that and what causes that behavior...
[15:44] <slyon> can you show your config from /etc/netplan/*.yaml?
[15:45] <Guest9223> slyon: sure... give me a minute
[15:49] <Guest9223> slyon: here: https://pastebin.com/HmnUcHVL
[15:49] <Guest9223> It also matches the Hetzner docs here: https://docs.hetzner.com/de/robot/dedicated-server/network/net-config-debian-ubuntu/
[15:52] <slyon> Thank you, could you please also show the output of `networkctl status enp7s0` just to see if that is passed through correctly
[15:52] <Guest9223> sure, one sec
[15:52] <slyon> and maybe "route" and "route -6"
[15:53] <slyon> err. "ip route" and "ip -6 route" that is
[15:54] <Guest9223> route is empty, route -6 says "/proc/net/ipv6_route: No such file or directory \ INET6 (IPv6) not configured in this system"
[15:55] <Guest9223> networkctl isn't so easy, I can't copy & paste anything and it includes public IPs... but I see "State: routable (failed)"
[15:57] <slyon> the route output is interesting. "route" being empty should not be the case, as there is a default route defined in netplan, and also some ipv6 addresses and a gateway, so it should exist
[15:57] <Guest9223> ip route and ip -6 route are both empty
[15:58] <slyon> networkctl: failed, is interesting, too. in this case you should check your journalctl -u systemd-networkd (you put that already into debug mode which is good)
[15:58] <Guest9223> Somehow it fails to set that route... I did it manually last time and the machine was online immediately again until I started systemd-networkd in debug mode
[15:59] <Guest9223> It's Ubuntu 20.04 HWE (Hetzner installimage) btw and everything patched as of yesterday.
[15:59] <Guest9223> lemme try journalctl...
[16:01] <Guest9223> Btw. when I run /lib/systemd/systemd-networkd it shows "Enumeration completed \ enp7s0: Could not set address: Operation not supported \ enp7s0: Failed"
[16:06] <slyon> that's probably the problem.. but I'm not sure what "Operation not supported" means. are there any more hints in journalctl?
[16:06] <slyon> you didn't put any custom networkd configuration or systemd override, right?
[16:07] <Guest9223> In journalctl I see that the error message first appeared yesterday after the reboot... So it seems that something is broken
[16:08] <Guest9223> Nope, I didn't touch that file until today when I started to investigate it but still I didn't modify anything. I was running netplan generate and apply but nothing (to my knowledge) should have modified anything here
[16:08] <Guest9223> And WireGuard was set up which I disabled, just in case
[16:09] <Guest9223> I saw that netplan was updated end of March... I would almost say that it broke something -> I have a few other servers but I don't want to reboot them now until this is solved because they are in production ;-)
[16:12] <slyon> to me it looks like netplan is doing its job. but networkd is somehow failing to communicate with the kernel... so it might be a kernel or systemd-networkd issue :-/ did you get updates for those packages as well?
[16:12] <slyon> could you try booting an older kernel image?
[16:13] <Guest9223> Here is a screenshot of journalctl -> the upper part before the empty line is the last successful one: https://imgur.com/BfCgsOL
[16:14] <slyon> could you maybe paste a (redacted) version of /run/systemd/network/10-netplan-enp7s0.network ?
[16:14] <Guest9223> I've updated everything available yesterday, ~24h ago. I also checked for new updates today after setting the route manually -> nothing new
[16:14] <Guest9223> sure, one sec
[16:14] <slyon> I could try to apply that on a 20.04 machine and see if it produces the same issue..
[16:17] <Guest9223> https://imgur.com/7UH7TW9
[16:18] <Guest9223> The IP addresses match the ones from /etc/netplan/01... and I also couldn't see any difference on another server (besides different IP addresses, ofc)
[16:19] <ddstreet> Guest9223 your ip address has netmask /32 so it can't reach your gateway, of course
[16:19] <ddstreet> if you actually want the /32 netmask for some reason then you need to add a specific route to the gateway before actually adding the default route
[16:19] <Guest9223> Regarding kernel updates: I just used apt update && apt upgrade, nothing was installed manually, no modules etc.
[16:21] <ddstreet> but you almost certainly don't actually want a /32 subnet
[16:22] <Guest9223> ddstreet: well it is mentioned and explained like this in the Hetzner docs: https://docs.hetzner.com/de/robot/dedicated-server/network/net-config-debian-ubuntu/ + another server has the same netmask and that one is working fine. So maybe something was changed here with a recent update and the Hetzner docs are outdated?
[16:22] <ddstreet> ok it's fine to do that - i.e. the 'point to point' stuff they talk about - but you do have to define the route to the gateway
[16:22] <Guest9223> What I mean: they set/recommend it like that, I think to protect from (accidental or malicious) IP changes and duplicate IP addresses with other customers - that's how I understand their docs
[16:23] <ddstreet> that's what the 'on-link' does, it says 'ok kernel i know you have no route to this ip, but don't worry it really is on your link local so just pretend you can route to it'
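The point about the /32 netmask can be checked with a few lines of Python. This is only a sketch using the standard `ipaddress` module; the addresses are documentation-range placeholders (RFC 5737), not the ones from the pastebin, and the helper name `gateway_is_on_subnet` is made up for illustration. With a /32 the interface's subnet contains only its own address, so the gateway is never "on the subnet" and the kernel needs the on-link hint (or an explicit host route) to accept the default route.

```python
import ipaddress


def gateway_is_on_subnet(interface_cidr: str, gateway: str) -> bool:
    """Return True if the gateway lies inside the interface's own subnet."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(gateway) in iface.network


# Placeholder addresses, chosen only for illustration.
print(gateway_is_on_subnet("192.0.2.10/32", "192.0.2.1"))  # False: needs on-link
print(gateway_is_on_subnet("192.0.2.10/24", "192.0.2.1"))  # True: directly reachable
```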
[16:24] <ddstreet> if systemd isn't properly handling the setting there might be a systemd bug
[16:24] <Guest9223> Yeah that explains why it errored out without the "onlink" parameter and it was working for a few months like that -> I think January 24th 2022 or so
[16:24] <ddstreet> ah
[16:24] <ddstreet> Guest9223 in your .network file the param is misspelled
[16:24] <ddstreet> it's GatewayOnLink not GatewayOnlink
[16:24] <ddstreet> change the L capitalization and it should work ok
[16:24] <ddstreet> not sure if netplan is using the wrong cap?
[16:25] <Guest9223> Lemme check...
[16:26] <Guest9223> Is it enough to edit the .network file? Because I get the same error and on another (not yet rebooted) server it is also lowercase
[16:26] <ddstreet> no, it looks like there's a bug in netplan where it's using the wrong spelling
[16:27] <Guest9223> So something is case sensitive and not recognizing the key you mean?
[16:28] <ddstreet> until that's fixed, i guess you could add a drop-in file, e.g. if the netplan-generated file is named '10-netplan-enp7s0.network' then create a file '/etc/systemd/network/10-netplan-enp7s0.network.d/override.conf' and make its content:
[16:28] <ddstreet> [Route]
[16:28] <ddstreet> GatewayOnLink=true
[16:28] <ddstreet> (just those lines)
[16:28] <ddstreet> then i think if you reboot it should work
[16:28] <slyon> ddstreet: netplan is rendering it as "GatewayOnlink=true".. but that didn't change since 2018... and it was actually changed from GatewayOnLink -> GatewayOnlink in that commit d419c7b8
[16:28] <ddstreet> ah ok that's interesting then
[16:29] <Guest92> Got disconnected... did my GitHub message make it?
[16:31] <slyon> Guest92: no, i don't think so
[16:32] <Guest92> https://github.com/canonical/netplan/search?q=GatewayOnLink shows GatewayOnlink only, lowercase
[16:32] <slyon> netplan is rendering it as "GatewayOnlink=true".. but that didn't change since 2018... and it was actually changed from GatewayOnLink -> GatewayOnlink in that commit d419c7b8
[16:32] <ddstreet> Guest92 i'm wrong, as slyon said either spelling is ok
[16:33] <ddstreet> so ignore me, might be the kernel as he said :)
[16:33] <Guest92> Seems that systemd-networkd is accepting both: https://github.com/systemd/systemd/search?q=GatewayOnLink
[16:33] <slyon> I got to drop now. I think this problem is most probably related to networkd or kernel (as the error message told us: "Enumeration completed \ enp7s0: Could not set address: Operation not supported \ enp7s0: Failed")
[16:33] <Guest92> That one is correct: https://github.com/systemd/systemd/blob/2afb2f4a9d6a497dfbe1983fbe1bac297a8dc52b/src/network/networkd-route.c#L2348
[16:34] <Guest92> Hm... I'm not a kernel dev... any idea what to do now? ;-)
[16:34] <Guest92> I can set the route manually via ip route add... but that lasts only until the next reboot
[16:39] <ddstreet> slyon fyi systemd is only keeping the 'Onlink' spelling for backwards compat, the correct usage is 'GatewayOnLink' per upstream commit 9cb8c5593443d24c19e40bfd4fc06d672f8c554c
[16:41] <Guest92> Ok, so a little bug discovered (it should be GatewayOnLink in the .network file) but that probably isn't the issue here? :-)
[16:42] <ddstreet> right it's definitely not the issue for you
[16:44] <ddstreet> can you share the output of 'ip a' after it fails? it's unable to set your static ip, which is strange
[16:51] <ddstreet> and check your kernel logs, e.g. 'journalctl -b -k'
[16:51] <Guest92> One sec... I enabled IPv6 again in /etc/default/ufw and enabled more logging for systemd-networkd
[16:51] <Guest92> Just rebooting and seeing what will happen now...
[16:54] <Guest92> https://imgur.com/lhVCC4R
[16:54] <Guest92> Still dead after reboot...
[17:00] <ddstreet> but you still get the 'Could not set address' error?
[17:00] <ddstreet> the address looks set
[17:02] <Guest92> I see some ACPI errors in the journalctl -b -k output and the last line is "enp7s0... Link is Up, Full Duplex" etc.
[17:02] <ddstreet> no errors from networkd?
[17:03] <Guest92> Nope... but still the same error in journalctl -u systemd-networkd
[17:03] <Guest92> But much more debug output now... shall I upload that? I can set the route manually to get the machine online
[17:03] <ddstreet> sure
[17:10] <Guest92> There we go: https://pastebin.com/8KUYndZw
[17:10] <Guest92> I just replaced the first 3 parts of the IP, the last part is the original
[17:18] <ddstreet> Guest92 do you have ipv6 disabled?
[17:18] <ddstreet> like, in the kernel cmdline?
[17:19] <Guest92> AFAIR I disabled it only via /etc/default/ufw but not sure... how to check that?
[17:19] <ddstreet> well first check /proc/cmdline
[17:20] <ddstreet> to make sure you didn't disable ipv6 globally
[17:20] <Guest92> Yeah, looks disabled: BOOT_IMAGE=/vmlinuz-5.13.0-40-generic root=/dev/mapper/vg0-root ro ipv6.disable=1 nomodeset consoleblank=0
[17:21] <Guest92> Ah I remember... seems I disabled it via grub: GRUB_CMDLINE_LINUX="ipv6.disable=1"
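The /proc/cmdline check above could be scripted roughly like this. It is a minimal sketch, not a tool anyone in the channel used: the function names are hypothetical, and it only looks for the exact `ipv6.disable=1` token shown in the pasted command line (IPv6 can also be turned off in other ways, e.g. via sysctl, which this does not detect).

```python
def ipv6_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if the kernel command line contains the token ipv6.disable=1."""
    return "ipv6.disable=1" in cmdline.split()


def read_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's command line (Linux only)."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# The command line as pasted in the log.
sample = (
    "BOOT_IMAGE=/vmlinuz-5.13.0-40-generic root=/dev/mapper/vg0-root "
    "ro ipv6.disable=1 nomodeset consoleblank=0"
)
print(ipv6_disabled_on_cmdline(sample))  # True
```

On a live machine, `ipv6_disabled_on_cmdline(read_kernel_cmdline())` would run the same check against the booted kernel.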
[17:21] <ddstreet> ok...and you're telling networkd to set up an ipv6 address?
[17:22] <Guest92> Ok, lemme check... I'll enable it and reboot...
[17:26] <Guest92> Holy moly! That was the problem! ;-)
[17:27] <ddstreet> yeah, hard for networkd to add the ipv6 addr you asked it to add, when ipv6 is disabled :)
[17:27] <Guest92> Besides that mistake on my part... shouldn't it set up IPv4 at least or somehow detect that? I found another issue here with a similar case: https://github.com/systemd/systemd/issues/12656
[17:28] <Guest92> And... do you guys accept coffee or donations somehow? ;-)
[17:28] <ddstreet> networkd doesn't do 'partially configured', if part of the interface setup fails it's considered 'failed'
[17:29] <ddstreet> so since setting (one of) the addresses failed, networkd didn't bother to continue with the next step, adding the route for the interface
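Since one rejected address is enough to fail the whole interface, a pre-reboot sanity check might compare the configured addresses against the kernel's IPv6 state. The sketch below is an illustration only and is not how networkd works internally: the function name and the address list are assumptions (documentation-range placeholders), and the `ipv6_disabled` flag would have to come from a check such as looking for `ipv6.disable=1` in /proc/cmdline.

```python
import ipaddress


def unsupported_addresses(addresses, ipv6_disabled):
    """List configured addresses whose family the kernel would reject."""
    if not ipv6_disabled:
        return []
    return [a for a in addresses if ipaddress.ip_interface(a).version == 6]


# Placeholder addresses standing in for the static addresses in the netplan YAML.
configured = ["192.0.2.10/32", "2001:db8::2/64"]
problems = unsupported_addresses(configured, ipv6_disabled=True)
print(problems)  # ['2001:db8::2/64']
```

A non-empty result means the configuration asks for something the kernel cannot provide, which in the case above left the interface in the "failed" state without a default route.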
[17:34] <Guest92> I see... BIG BIG BIG thank you for helping me figure this out! :-)