/srv/irclogs.ubuntu.com/2021/11/04/#cloud-init.txt

=== jbg is now known as jbg_
akutzfalcojr blackboxsw: Are either of you working on the PR for https://bugs.launchpad.net/cloud-init/+bug/1949407? I know it has an associated patch, but there are no tests. This is biting the Kubernetes image builder as well. I was going to whip up a PR with a test to validate the patch. If y'all already are, please let me know. Looks like this will need to wait until 21.5 anyway though.17:03
ubottuLaunchpad bug 1949407 in cloud-init "crash in cloud-init when using set-name on networkd renderer" [Undecided, New]17:03
falcojrakutz: we are not...thanks for jumping on that!17:06
akutzfalcojr: PR is open https://github.com/canonical/cloud-init/pull/110018:31
ubottuPull 1100 in canonical/cloud-init "Fix for set-name bug in networkd renderer" [Open]18:31
anankeI'm trying to develop/test/debug networking setup with cloud-init. Is there a way I can force re-running of the cloud-init's network configuration? I've modified user-data and restarted the ec2 instance, but I don't see changes I expected19:27
akutzananke: If you're doing dev work you could write/update unit tests for ec2 and use mocks?19:31
akutzOtherwise you could run "cloud-init clean" per https://cloudinit.readthedocs.io/en/latest/topics/cli.html19:31
anankeit's not actual dev work, just trying to construct a working 'network' section that could be passed to an ec2 instance via user-data and configure available network interfaces19:34
akutzYou *could* play around with that by modifying https://github.com/canonical/cloud-init/blob/main/tests/unittests/test_datasource/test_ec2.py19:35
anankeI've tried 'cloud-init clean' by itself, didn't seem to do much. 'cloud-init clean -r' on the other hand seems to be reconfiguring19:35
anankeI'm now wondering why the config doesn't seem to take effect19:36
akutzAnd then run "make clean_pyc && PYTHONPATH="$(pwd)" python3 -m pytest -v tests/unittests/test_datasource/test_ec2.py" to execute the tests19:36
akutzWell it also depends on the version of cloud-init I suppose.19:37
anankefeatures seem to support it, I'm using v1 while the host's cloud-init supports both v1 and v2. logs seem to indicate that the config was accepted, but I don't see any action19:38
akutzWell, I cannot say I'm an expert at ec2's datasource, so I may not be much help19:39
akutzI suggest trying the unit test file as it would be a quick way to validate your config19:39
akutzBut you cannot pass network config via userdata if that is what you are trying19:39
akutz(you said v1 and v2, and those are frequently used to describe the network config version)19:40
anankeargh, that may explain it. that's unfortunate, because it would solve a metric ton of problems for us if we could pass network config via userdata19:41
anankehold on, the docs seem to indicate that it should be possible, e.g.: 'For example, OpenStack may provide network config in the MetaData Service.'19:41
anankeyet a few sentences later I see 'User-data cannot change an instance’s network configuration.'19:42
anankeI'm a bit baffled by that, seems contradictory19:43
akutzMetaData service is not userdata though. Do you have a link?19:44
anankehttps://cloudinit.readthedocs.io/en/latest/topics/network-config.html19:45
akutzTake a look at that test file I sent you. It shows what comes via the metadata.19:45
anankeare you referring to https://github.com/canonical/cloud-init/blob/main/tests/unittests/test_datasource/test_ec2.py ?19:46
akutzThe "MetaData Service" varies based on datasource provider, but it refers to how the cloud platform injects "metadata" into the guest in order to do things like bring up networking. It's strictly *not* userdata. So yeah, you cannot typically provide the metadata yourself unless the cloud platform is designed that way.19:46
akutzAnd yes I am19:46
anankeI see, thank you19:47
akutzYou could provide userdata that uses the runcmd module to reconfigure the network to your liking I suppose, but remember that the runcmd module executes per instance, not per boot, so the networking may not be persistent unless you write files to ensure it is.19:47
minimalananke: isn't the network config coming from EC2's own side based on how you have created the VM (number of interfaces, which VPC for each interface, etc)19:48
anankehmm, I'll have to figure out what's possible then, in lieu of being able to provide the network config via user-data19:48
akutzYou should take a look at https://cloudinit.readthedocs.io/en/latest/topics/datasources/ec2.html to see how the datasource works19:48
akutzBut minimal is right -- the metadata for ec2 instances is determined based on how you've configured the instance itself.19:48
anankeminimal: essentially, yes, but it seems different distributions treat the same network setup differently19:49
anankeand to be specific: centos 7 does NOT bring up additional network interfaces19:49
minimalif you create a VM with a single interface on a private VPC then the AWS network config via their metadata server will show the interface with an IP from the VPC's defined subnet, etc19:49
akutzWhereas the VMware datasource (https://cloudinit.readthedocs.io/en/latest/topics/datasources/vmware.html), because we have no metadata service, is designed to have the metadata be injectable by the user (or operator). Think of it this way -- EC2 and *most* cloud platforms PULL their metadata. VMware *pushes* its metadata because there is no location from which to pull it. 19:49
akutzananke: if CentOS 7 is a supported AMI it *should* work...19:50
anankeminimal: my concern is the second network interface. under the same conditions (new ec2 instance, same subnet, two NICs attached at creation time), we observe different behavior. amazon linux 2 will run dhcp on all interfaces, same thing for debian & ubuntu, but not centos719:51
akutzHowever, I don't know what version of cloud-init it is using19:51
akutzYes, different distros behave differently ananke19:51
minimalnot familiar with CentOS but the problem I guess you are facing is that typically only the first interface is brought up and configured by some distros - that's one reason why hotplug functionality was added in recent cloud-init releases19:51
akutzSee https://github.com/canonical/cloud-init/tree/main/cloudinit/distros19:51
akutzCentOS's datasource literally just inherits RHELs19:51
minimalvarious distros are using their own specific ways to handle secondary address on an interface and multiple interfaces on AWS19:51
minimale.g. Amazon Linux has a ec2-net-tools package (from memory) for dealing with that19:52
anankeit appears it's cloud-init 19.4, this is the latest and greatest official centos 7 ami19:52
akutzAnd given how old CentOS 7 is, it's likely any fix for this isn't part of the cloud-init on CentOS 7.19:52
akutzHere's the distro source for RHEL/CentOS for Cloud-Init 19.4 https://github.com/canonical/cloud-init/blob/ubuntu/19.4-56-g06e324ff-0ubuntu1/cloudinit/distros/rhel.py19:52
akutzsmoser is one of the authors of this distro source, so he might know.19:53
minimalananke: here's what Amazon Linux uses: https://github.com/aws/amazon-ec2-net-utils19:54
akutzananke -- have you seen https://www.internetstaff.com/multiple-ec2-network-interfaces-on-red-hat-centos-7/ 19:54
akutzI also found this https://serverfault.com/questions/826607/server-not-accessible-on-eth1-additional-network-interface-centos-7-on-aws-ec219:55
anankeakutz: haven't seen it yet, but it seems to confirm my experience19:55
akutzYeah, this looks more and more to be a distro issue and perhaps the version of cloud-init inside that distro. minimal, do you know if this is something that might work in a later version of CI on CentOS if ananke built a custom AMI?19:56
minimalakutz: yes the bit "If you’re not running Amazon Linux with the built in network interface management tools, adding multiple ENIs on the same subnet can be a confusing experience." is referring to lack of ec2-net-utils or equivalent19:56
akutzAck19:57
anankethe reason I was looking at cloud-init, and hoping that network config for _additional_ interfaces could be passed via user-data, is because we have a lot of custom AMIs for many distros. We're looking at introducing more complex network setups, where the systems will be managed/accessed via secondary network interface19:57
minimalit's one reason why cloud-init 21.3 & 21.4 started adding "hotplug" support (currently only for OpenStack and EC2)19:57
anankeif we could leverage cloud-init, it would simplify the process by a metric ton, especially if we could inject additional routes for that interface19:57
akutzSo the Kubernetes image builder has support for CentOS 7 (https://github.com/kubernetes-sigs/image-builder/tree/master/images/capi). I will ping that group to see if they have addressed the issue somehow manually.19:57
minimalananke: yes additional interface info can be passed if, for example, you are using the NoCloud Data Source (like I use for physical machines). EC2 and some other cloud providers are different - *they* provide the network config to the relevant c-i data source19:58
anankeright now we may have to look at rebuilding all of the AMIs, but still using cloud-init to perform this network config task, just tacking to our existing cloud-init template19:58
anankeakutz: thank you19:59
minimalif you were to "curl" the right url on the Ec2 machine for the metadata server network info you will see what is supposed to be used. I guess Centos just doesn't have anything in place to make use of that19:59
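A minimal sketch of the "curl the metadata server" suggestion, written in Python rather than curl. It assumes the standard EC2 instance metadata endpoint at 169.254.169.254 with IMDSv2 session tokens; the function names (`imds_token`, `imds_get`, `interface_summary`) are hypothetical helpers, not part of cloud-init, and the set of keys read per interface is just an illustrative selection.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_token(ttl_seconds=60, timeout=2):
    """Request an IMDSv2 session token (PUT /latest/api/token)."""
    req = urllib.request.Request(
        IMDS + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


def imds_get(path, token, timeout=2):
    """Fetch one path below /latest/meta-data/ using a session token."""
    req = urllib.request.Request(
        IMDS + "/meta-data/" + path,
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()


def interface_summary(get):
    """Collect a few per-NIC facts.

    `get` is any callable that maps a metadata path to its text, so the
    function can be exercised without network access.
    """
    base = "network/interfaces/macs/"
    summary = {}
    for entry in get(base).split():
        mac = entry.rstrip("/")  # the listing returns each MAC with a trailing slash
        summary[mac] = {
            key: get(base + mac + "/" + key)
            for key in ("device-number", "local-ipv4s", "subnet-ipv4-cidr-block")
        }
    return summary
```

On an instance, something like `token = imds_token()` followed by `interface_summary(lambda p: imds_get(p, token))` should list every attached NIC, including secondary ones the distro never brought up. Comparing that with `ip addr` shows which interfaces the platform advertises but the guest ignores.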
akutzThey do build a CentOS 7 AMI - https://github.com/kubernetes-sigs/image-builder/blob/master/images/capi/Makefile#L57019:59
anankeminimal: that's a good point, I can see what the aws metadata service shows20:00
akutzDo ya'll know what the AWS SSM agent is?20:01
minimalananke: there are 2 issues: initial (1st boot) setting up interfaces and, secondly, dealing with later dynamic changes in interfaces (i.e. you add/remove other interfaces later)20:01
anankeakutz: yes20:01
akutzWhat is it?20:01
anankeakutz: it's an agent that can be installed on a given instance, to provide almost 'out-of-band' like management for a given system20:02
minimala way to do some machine management instead of using SSH to connect to them20:02
minimalEC2 originally did not provide a console for VMs so I guess SSM is what they expected people to use instead20:03
anankeminimal: I'm not concerned with changes after the fact, though it brings up an interesting point: if network information is present at boot time, why does cloud-init on centos 7 bring up only _first_ interface?20:03
rharperananke: it depends on the platform and release of OS (and the cloud-init within it). 20:04
anankerharper: 19.4 in this case, and there is no trace of 'network' config in /etc/cloud*20:05
minimalananke: as I basically said earlier - *all* distros originally only typically brought up a single interface via DHCP on EC2. When AWS did their own distro they wrote ec2-net-tools to handle more interfaces and more IPs per interface. Other distros then started to provide equivalent functionality. I use and maintain cloud-init on Alpine Linux, I'm in the process of working out how to do the same on that distro20:05
rharperand what platform? ec2, openstack?  azure ? 20:05
anankerharper: ec220:06
anankeminimal: I see. I haven't tried earlier versions of ubuntu (such as 18 or 16), but I was hoping centos 7 by now would have a fairly robust cloud-init setup20:06
rharperso multi-nic bringup on ec2 was introduced in commit 6600c642af3817fe5e0170cb7b4eeac4be3c60eb (Date: Wed Mar 18 13:33:37 2020 -0600) 20:07
rharperso, that's not in 19.4 20:07
rharpercentos7 is python2 based, and 19.4 is the last python2.x release for cloud-init 20:07
anankerharper: ahh, thank you. that would explain the default behavior20:08
rharperand in general,  we try not to change existing behavior on older releases; 20:08
anankeand ec2-net-tools likely explains why amazon linux 2 has it, on their cloud-init 19.3-4420:08
rharperso even if the code in cloud-init in the OS *can* do something, it may be gated or disabled so that an upgrade of cloud-init in the image doesn't break existing behavior 20:08
anankethat was the confusing part, why amazon linux 2 worked just fine20:09
rharperananke: yes, they've had their own form of hotplug/extra-nic scripts for some time 20:09
anankeso now if I can just figure out what kind of magic centos7/amazon linux 2 leverage to respond to traffic on the same interface it comes in on, despite default routing, I'll be set20:10
anankegood news is that the same network config section stored as a cloud-init config works. so we'll be rebuilding the images, but thankfully that's automated20:17
rharperananke: the ec2 net-tools package does setup some nice routing tables;  that allows different nics to have their own routing table entry 20:39
anankerharper: I may have to explore it in depth. We're running into odd routing issues, which seem to be distro specific, haven't figured out what controls them20:44
rharperyeah, so on the amazon linux instances, you should be able to use:  ip rule list  to see the extra tables installed for secondary nics; 20:47
rharperthen ip route show table NNN ,   the "default" table name is "main"    20:47
rharperhttps://github.com/aws/amazon-ec2-net-utils/blob/master/ec2net-functions  has the interesting code 20:48
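A rough sketch of how the `ip rule list` comparison could be scripted across hosts. It parses the usual `PRIORITY: from SELECTOR ... lookup TABLE` line format and reports tables beyond the three built-in ones. The sample text is made up for illustration (the address 100.64.1.5 and table 10001 are assumptions, not output captured from any host in this log), and `parse_ip_rules`, `extra_tables`, and `live_rules` are hypothetical helper names.

```python
import re
import subprocess

# Matches lines such as "32765:\tfrom 100.64.1.5 lookup 10001"
RULE_RE = re.compile(r"^(\d+):\s+from\s+(\S+)\s.*?\blookup\s+(\S+)")

# Illustrative output only; real hosts will differ.
SAMPLE = (
    "0:\tfrom all lookup local\n"
    "32765:\tfrom 100.64.1.5 lookup 10001\n"
    "32766:\tfrom all lookup main\n"
    "32767:\tfrom all lookup default\n"
)


def parse_ip_rules(text):
    """Return (priority, source selector, table) for each rule line."""
    rules = []
    for line in text.splitlines():
        match = RULE_RE.match(line.strip())
        if match:
            rules.append((int(match.group(1)), match.group(2), match.group(3)))
    return rules


def extra_tables(rules):
    """Tables referenced by rules other than local, main and default."""
    builtin = {"local", "main", "default"}
    return sorted({table for _, _, table in rules if table not in builtin})


def live_rules():
    """Run `ip rule list` on the current host (requires iproute2)."""
    result = subprocess.run(
        ["ip", "rule", "list"], capture_output=True, text=True, check=True
    )
    return parse_ip_rules(result.stdout)
```

With the sample above, `extra_tables(parse_ip_rules(SAMPLE))` yields `['10001']`; each such table can then be inspected with `ip route show table 10001`. An empty list on one distro and a non-empty one on another would point at policy routing as the behavioral difference.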
anankethanks! this is helpful. centos7 and amazon linux 2 don't seem to differ much, but they clearly do some kind of magic that ubuntu seems to be lacking20:49
rharperwhich release of ubuntu? 20:49
ananke2020:49
anankewhile beyond the scope of this channel, the problem is fairly peculiar, and I'm fairly baffled by what's going on20:50
rharperI know we do route metrics with multi-nic instances,  so I would expect things to be OK, but we might be missing something  20:50
rharperif you're seeing an Ubuntu 20.04 routing issue when you bring up multiple nics in ec2 I would suggest filing a bug in launchpad to see if either the config cloud-init generates (or netplan applies) is incorrect 20:51
anankehere's the issue: dual-homed EC2 instance, primary (eth0) on subnet A (10.0.0.0/8), secondary (eth1) on subnet B (100.64.0.0/10). default route is associated with the gateway for subnet A20:52
anankerharper: funny enough, ubuntu behaves how I would expect it to, but I'm seeking the magic that amazon linux 2 & centos 7 have :)20:53
rharperyeah, the plan for netplan and multi-home is to use VRFs to route traffic back out the interface it came in20:54
rharperthere's an open bug/feat-request for netplan for sometime...    https://bugs.launchpad.net/netplan/+bug/1773522  20:55
ubottuLaunchpad bug 1773522 in netplan "[RFE] add support for VRF devices" [Wishlist, Confirmed]20:55
anankesubnet B is connected to subnet C (192.168.0.0/16). bastion host sits on subnet C and communicates to the host that sits on subnet A & B. Packets leave bastion with source address of 192.168.x.x and arrive on eth1 (100.64.x.x) on a given host20:56
anankeso here's the 64k question: what happens to replies? linux by default will reply to packets based on the defined routes. since there is no explicit route for 192.168.0.0/16, ubuntu replies on the interface associated with the default route: eth0, which is on subnet 10.0.0.0/8, and replies go to ether20:57
anankesomehow amazon linux 2, and now centos 7 after quick testing, seem to reply to the same packet on the interface it came from - eth1. what's baffling is that there is no route whatsoever for 192.168.0.0/1620:58
anankerharper: thanks, I'll have to take a closer look at it20:59
rharperyeah, I think each interface has its own default route (but in a different routing table) so IIUC, tables are given a priority, so the lookup for a packet will check the non-default tables before using "main" 21:00
rharper^ on ec2/centos7 if it's using the ec2-net-utils; and on Ubuntu, we only mark routes with metrics 21:00
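To make the two behaviors concrete, here is a toy model of Linux route selection using ananke's subnets. It is a deliberate simplification (no `local` table, no metrics, no fwmark selectors), and the host addresses 100.64.1.5 and 192.168.1.10, the table name 10001, and the function names are assumptions for illustration. Whether CentOS 7 or ec2-net-utils installs exactly these rules is not confirmed in this log.

```python
import ipaddress


def longest_prefix(table, dst):
    """Pick the device of the most specific route in `table` that covers dst."""
    dst = ipaddress.ip_address(dst)
    best = None
    for prefix, dev in table:
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None


def egress(rules, tables, src, dst):
    """Walk rules in priority order; the first table with a matching route wins."""
    src = ipaddress.ip_address(src)
    for _priority, selector, table in sorted(rules):
        if selector != "all" and src not in ipaddress.ip_network(selector):
            continue  # rule does not apply to this source address
        dev = longest_prefix(tables.get(table, []), dst)
        if dev is not None:
            return dev
    return None


# Main table as described: default via eth0, plus the two connected subnets.
MAIN = [("0.0.0.0/0", "eth0"), ("10.0.0.0/8", "eth0"), ("100.64.0.0/10", "eth1")]

# Routes in main only: a reply from eth1's address follows the default route.
only_main = [(32766, "all", "main")]
print(egress(only_main, {"main": MAIN}, "100.64.1.5", "192.168.1.10"))  # eth0

# Policy routing: a source-based rule sends eth1's replies to their own table.
with_policy = [(32765, "100.64.1.5/32", "10001"), (32766, "all", "main")]
tables = {"main": MAIN, "10001": [("0.0.0.0/0", "eth1")]}
print(egress(with_policy, tables, "100.64.1.5", "192.168.1.10"))  # eth1
print(egress(with_policy, tables, "10.0.0.5", "192.168.1.10"))    # eth0
```

The second case shows why no explicit route for 192.168.0.0/16 is needed: the source-address rule is consulted before `main`, and the per-interface table carries its own default route.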
anankecentos7 doesn't have ec2-net-utils21:01
rharperinteresting;  I wonder if there's some dhclient route script magic in the cloud image though 21:01
rharperI'd definitely run ip rule list on those ama/cent hosts to see what shows up vs ubuntu 21:01
anankeand I wish the answer would be so simple, but there is absolutely nothing in the dhcp lease that would indicate routes to those other subnets21:02
anankebut yes, I'll go with a fine comb over the ip rule list21:02
anankedebian 10 is another problem altogether, with two interfaces it sets the default route to the _last_ interface that was brought up: eth1 in this case21:03
anankeso while that 'fixes' access to the vm, it breaks everything else21:04
rharperheh21:05
anankebut it's 5pm, and that's a problem for tomorrow :) thanks everybody21:06
rharpero/ 21:06
holmanbananke: late to the convo, but I'll  add a +1 to rharper's suggestion to take a look at ip rule. iirc linux has 3 route tables by default, and you can do some pretty fancy things using host-based routing21:11
holmanbthere is a cloud-init ticket for supporting host-based routing21:11
rharperholmanb: nice, is that different than the VRF one for netplan ? 21:12
holmanbit is, I'll drop a link21:12
holmanbs/host-based/policy based/21:13
holmanbhttps://bugs.launchpad.net/cloud-init/+bug/180729721:13
ubottuLaunchpad bug 1807297 in cloud-init "Add support to configure policy based routing" [High, Triaged]21:13
rharperawesome 21:16
holmanbif there is time this release I might try to tackle it21:18
blackboxswahh, pulling my head out of the ground. TIL about manipulating the lxd image metadata to hack/prefer LXD over NoCloud datasources. I'll write up a howto. This also gives us the chance to pre-test incoming image metadata changes easily and/or directly download daily images if, for example, simplestreams data isn't being published for one reason or another.21:26
blackboxswI can now properly validate falcojr's SRU as LXD datasource works TM on latest Ubuntu jammy release.21:26
holmanb@blackboxsw nice!21:29
holmanbananke: also, it looks like cloud-init's network module is disabled on amazon linux, that may  (check 21:30
holmanb(hit enter too early)21:30
holmanbananke: that may describe the behavior difference between amazon linux and ubuntu (check /etc/cloud/cloud.cfg)21:31
