[17:03] <akutz> falcojr blackboxsw: Are either of you working on the PR for https://bugs.launchpad.net/cloud-init/+bug/1949407? I know it has an associated patch, but there are no tests. This is biting the Kubernetes image builder as well. I was going to whip up a PR with a test to validate the patch. If y'all already are, please let me know. Looks like this will need to wait until 21.5 anyway though.
[17:06] <falcojr> akutz: we are not...thanks for jumping on that!
[18:31] <akutz> falcojr: PR is open https://github.com/canonical/cloud-init/pull/1100
[19:27] <ananke> I'm trying to develop/test/debug networking setup with cloud-init. Is there a way I can force re-running of the cloud-init's network configuration? I've modified user-data and restarted the ec2 instance, but I don't see changes I expected
[19:31] <akutz> ananke: If you're doing dev work you could write/update unit tests for ec2 and use mocks?
[19:31] <akutz> Otherwise you could run "cloud-init clean" per https://cloudinit.readthedocs.io/en/latest/topics/cli.html
[19:34] <ananke> it's not actual dev work, just trying to construct a working 'network' section that could be passed to an ec2 instance via user-data and configure available network interfaces
[19:35] <akutz> You *could* play around with that by modifying https://github.com/canonical/cloud-init/blob/main/tests/unittests/test_datasource/test_ec2.py
[19:35] <ananke> I've tried 'cloud-init clean' by itself, didn't seem to do much. 'cloud-init clean -r' on the other hand seems to be reconfiguring
[19:36] <ananke> I'm now wondering why the config doesn't seem to take effect
[19:36] <akutz> And then run "make clean_pyc && PYTHONPATH="$(pwd)" python3 -m pytest -v tests/unittests/test_datasource/test_ec2.py" to execute the tests
[19:37] <akutz> Well it also depends on the version of cloud-init I suppose.
[19:38] <ananke> features seem to support it, I'm using v1 while the host's cloud-init supports both v1 and v2. logs seem to indicate that the config was accepted, but I don't see any action
[19:39] <akutz> Well, I cannot say I'm an expert at ec2's datasource, so I may not be much help
[19:39] <akutz> I suggest trying the unit test file as it would be a quick way to validate your config
[19:39] <akutz> But you cannot pass network config via userdata if that is what you are trying
[19:40] <akutz> (you said v1 and v2, and those are frequently used to describe the network config version)
[19:41] <ananke> argh, that may explain it. that's unfortunate, because it would solve a metric ton of problems for us if we could pass network config via userdata
[19:41] <ananke> hold on, the docs seem to indicate that it should be possible, eg: 'For example, OpenStack may provide network config in the MetaData Service.'
[19:42] <ananke> yet a few sentences later I see 'User-data cannot change an instance’s network configuration.'
[19:43] <ananke> I'm a bit baffled by that, seems contradictory
[19:44] <akutz> MetaData service is not userdata though. Do you have a link?
[19:45] <ananke> https://cloudinit.readthedocs.io/en/latest/topics/network-config.html
[19:45] <akutz> Take a look at that test file I sent you. It shows what comes via the metadata.
[19:46] <ananke> are you referring to https://github.com/canonical/cloud-init/blob/main/tests/unittests/test_datasource/test_ec2.py ?
[19:46] <akutz> The "MetaData Service" varies based on datasource provider, but it refers to how the cloud platform injects "metadata" into the guest in order to do things like bring up networking. It's strictly *not* userdata. So yeah, you cannot typically provide the metadata yourself unless the cloud platform is designed that way.
[19:46] <akutz> And yes I am
[19:47] <ananke> I see, thank you
[19:47] <akutz> You could provide userdata that uses the runcmd module to reconfigure the network to your liking I suppose, but remember that the runcmd module executes per instance, not per boot, so the networking may not be persistent unless you write files to ensure it is.
[19:48] <minimal> ananke: isn't the network config coming from EC2's own side based on how you have created the VM (number of interfaces, which VPC for each interface, etc)
[19:48] <ananke> hmm, I'll have to figure out what's possible then, in lieu of being able to provide the network config via user-data
[19:48] <akutz> You should take a look at https://cloudinit.readthedocs.io/en/latest/topics/datasources/ec2.html to see how the datasource works
[19:48] <akutz> But minimal is right -- the metadata for ec2 instances is determined based on how you've configured the instance itself.
[19:49] <ananke> minimal: essentially, yes, but it seems different distributions treat the same network setup differently
[19:49] <ananke> and to be specific: centos 7 does NOT bring up additional network interfaces
[19:49] <minimal> if you create a VM with a single interface on a private VPC then the AWS network config via their metadata server will show the interface with an IP from the VPC's defined subnet, etc
[19:49] <akutz> Whereas the VMware datasource (https://cloudinit.readthedocs.io/en/latest/topics/datasources/vmware.html), because we have no metadata service, is designed to have the metadata be injectable by the user (or operator). Think of it this way -- EC2 and *most* cloud platforms PULL their metadata. VMware *pushes* its metadata because there is no location from which to pull it. 
[19:50] <akutz> ananke: if CentOS 7 is a supported AMI it *should* work...
[19:51] <ananke> minimal: my concern is the second network interface. under the same conditions (new ec2 instance, same subnet, two NICs attached at creation time), we observe different behavior. amazon linux 2 will run dhcp on all interfaces, same thing for debian & ubuntu, but not centos7
[19:51] <akutz> However, I don't know what version of cloud-init it is using
[19:51] <akutz> Yes, different distros behave differently ananke
[19:51] <minimal> not familiar with Centos but the problem I guess you are facing is that typically only the first interface is brought up and configured by some distros - that's one reason why hotplug functionality was added in recent cloud-init releases
[19:51] <akutz> See https://github.com/canonical/cloud-init/tree/main/cloudinit/distros
[19:51] <akutz> CentOS's distro class literally just inherits RHEL's
[19:51] <minimal> various distros are using their own specific ways to handle secondary address on an interface and multiple interfaces on AWS
[19:52] <minimal> e.g. Amazon Linux has a ec2-net-tools package (from memory) for dealing with that
[19:52] <ananke> it appears it's cloud-init 19.4, this is the latest and greatest official centos 7 ami
[19:52] <akutz> And given how old CentOS 7 is, it's likely any fix for this isn't part of the cloud-init on CentOS 7.
[19:52] <akutz> Here's the distro source for RHEL/CentOS for Cloud-Init 19.4 https://github.com/canonical/cloud-init/blob/ubuntu/19.4-56-g06e324ff-0ubuntu1/cloudinit/distros/rhel.py
[19:53] <akutz> smoser is one of the authors of this distro source, so he might know.
[19:54] <minimal> ananke: here's what Amazon Linux uses: https://github.com/aws/amazon-ec2-net-utils
[19:54] <akutz> ananke -- have you seen https://www.internetstaff.com/multiple-ec2-network-interfaces-on-red-hat-centos-7/ 
[19:55] <akutz> I also found this https://serverfault.com/questions/826607/server-not-accessible-on-eth1-additional-network-interface-centos-7-on-aws-ec2
[19:55] <ananke> akutz: haven't seen it yet, but it seems to confirm my experience
[19:56] <akutz> Yeah, this looks more and more to be a distro issue and perhaps the version of cloud-init inside that distro. minimal, do you know if this is something that might work in a later version of CI on CentOS if ananke built a custom AMI?
[19:56] <minimal> akutz: yes the bit "If you’re not running Amazon Linux with the built in network interface management tools, adding multiple ENIs on the same subnet can be a confusing experience." is referring to lack of ec2-net-utils or equivalent
[19:57] <akutz> Ack
[19:57] <ananke> the reason I was looking at cloud-init, and hoping that network config for _additional_ interfaces could be passed via user-data, is because we have a lot of custom AMIs for many distros. We're looking at introducing more complex network setups, where the systems will be managed/accessed via secondary network interface
[19:57] <minimal> it's one reason why cloud-init 21.3 & 21.4 started adding "hotplug" support (currently only for Openstack and Ec2)
[19:57] <ananke> if we could leverage cloud-init, it would simplify the process by a metric ton, especially if we could inject additional routes for that interface
[19:57] <akutz> So the Kubernetes image builder has support for CentOS 7 (https://github.com/kubernetes-sigs/image-builder/tree/master/images/capi). I will ping that group to see if they have addressed the issue somehow manually.
[19:58] <minimal> ananke: yes additional interface info can be passed if, for example, you are using the NoCloud Data Source (like I use for physical machines). Ec2 and some other cloud providers are different - *they* provide the network config to the relevant c-i data source
[19:58] <ananke> right now we may have to look at rebuilding all of the AMIs, but still using cloud-init to perform this network config task, just tacking to our existing cloud-init template
[19:59] <ananke> akutz: thank you
[19:59] <minimal> if you were to "curl" the right url on the Ec2 machine for the metadata server network info you will see what is supposed to be used. I guess Centos just doesn't have anything in place to make use of that
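A minimal sketch of the metadata lookup minimal suggests, in Python instead of curl. It assumes the standard EC2 instance metadata endpoint at 169.254.169.254 and the `network/interfaces/macs/` tree; the helper names (`imds_token`, `imds_get`, `list_interfaces`) are made up for illustration, and only two per-interface keys are read. The fetch function is passed in so the parsing can be tried without being on an EC2 instance.

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"

def imds_token(ttl=60, timeout=2):
    """Request an IMDSv2 session token (needed when IMDSv2 is enforced)."""
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()

def imds_get(path, token=None, timeout=2):
    """Fetch one metadata path, optionally sending an IMDSv2 token."""
    req = urllib.request.Request(f"{IMDS}/{path}")
    if token:
        req.add_header("X-aws-ec2-metadata-token", token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode()

def list_interfaces(get=imds_get):
    """Return {mac: {...}} for every interface the metadata service reports."""
    base = "meta-data/network/interfaces/macs"
    # The listing has one MAC per line, each with a trailing slash.
    macs = [m.rstrip("/") for m in get(f"{base}/").split()]
    return {
        mac: {
            "device-number": get(f"{base}/{mac}/device-number"),
            "local-ipv4s": get(f"{base}/{mac}/local-ipv4s").split(),
        }
        for mac in macs
    }
```

On an instance this could be called as `list_interfaces(lambda p: imds_get(p, token=imds_token()))`; an instance with two NICs attached should report two MACs here even if the distro only brought up the first one.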
[19:59] <akutz> They do build a CentOS 7 AMI - https://github.com/kubernetes-sigs/image-builder/blob/master/images/capi/Makefile#L570
[20:00] <ananke> minimal: that's a good point, I can see what the aws metadata service shows
[20:01] <akutz> Do y'all know what the AWS SSM agent is?
[20:01] <minimal> ananke: there are 2 issues: initial (1st boot) setting up interfaces and, secondly, dealing with later dynamic changes in interfaces (i.e. you add/remove other interfaces later)
[20:01] <ananke> akutz: yes
[20:01] <akutz> What is it?
[20:02] <ananke> akutz: it's an agent that can be installed on a given instance, to provide almost 'out-of-band' like management for a given system
[20:02] <minimal> a way to do some machine management instead of using SSH to connect to them
[20:03] <minimal> EC2 originally did not provide a console for VMs so I guess SSM is what they expected people to use instead
[20:03] <ananke> minimal: I'm not concerned with changes after the fact, though it brings up an interesting point: if network information is present at boot time, why does cloud-init on centos 7 bring up only _first_ interface?
[20:04] <rharper> ananke: it depends on the platform and release of OS (and the cloud-init within it).
[20:05] <ananke> rharper: 19.4 in this case, and there is no trace of 'network' config in /etc/cloud*
[20:05] <minimal> ananke: as I basically said earlier - *all* distros originally typically only brought up a single interface via DHCP on EC2. When AWS did their own distro they wrote ec2-net-utils to handle more interfaces and more IPs per interface. Other distros then started to provide equivalent functionality. I use and maintain cloud-init on Alpine Linux, I'm in the process of working out how to do the same on that distro
[20:05] <rharper> and what platform? ec2, openstack?  azure ? 
[20:06] <ananke> rharper: ec2
[20:06] <ananke> minimal: I see. I haven't tried earlier versions of ubuntu (such as 18 or 16), but I was hoping centos 7 by now would have a fairly robust cloud-init setup
[20:07] <rharper> so multi-nic bringup on ec2 was introduced  in, Date:   Wed Mar 18 13:33:37 2020 -0600 commit 6600c642af3817fe5e0170cb7b4eeac4be3c60eb 
[20:07] <rharper> so, that's not in 19.4 
[20:07] <rharper> centos7 is python2 based, and 19.4 is the last python2.x release for cloud-init 
[20:08] <ananke> rharper: ahh, thank you. that would explain the default behavior
[20:08] <rharper> and in general,  we try not to change existing behavior on older releases; 
[20:08] <ananke> and ec2-net-utils likely explains why amazon linux 2 has it, on their cloud-init 19.3-44
[20:08] <rharper> so even if the code in cloud-init in the OS *can* do something, it may be gated or disabled so that an upgrade of cloud-init in the image doesn't break existing behavior
[20:09] <ananke> that was the confusing part, why amazon linux 2 worked just fine
[20:09] <rharper> ananke: yes, they've had their own form of hotplug/extra-nic scripts for some time
[20:10] <ananke> so now I just have to figure out what kind of magic centos7/amazon linux 2 leverage to respond to traffic on the same interface it comes on, despite default routing, I'll be set
[20:17] <ananke> good news is that the same network config section stored as a cloud-init config works. so we'll be rebuilding the images, but thankfully that's automated
[20:39] <rharper> ananke: the ec2-net-utils package does set up some nice routing tables; that allows different nics to have their own routing table entry
[20:44] <ananke> rharper: I may have to explore it in depth. We're running into some odd routing issues, which seem to be distro specific, haven't figured out what controls them
[20:47] <rharper> yeah, so on the amazon linux instances, you should be able to use:  ip rule list  to see the extra tables installed for secondary nics; 
[20:47] <rharper> then ip route show table NNN ,   the "default" table name is "main"    
[20:48] <rharper> https://github.com/aws/amazon-ec2-net-utils/blob/master/ec2net-functions  has the interesting code 
[20:49] <ananke> thanks! this is helpful. centos7 and amazon linux 2 don't seem to differ much, but they clearly do some kind of magic that ubuntu seems to be lacking
[20:49] <rharper> which release of ubuntu? 
[20:49] <ananke> 20
[20:50] <ananke> while beyond the scope of this channel, the problem is fairly peculiar, and I'm fairly baffled by what's going on
[20:50] <rharper> I know we do route metrics with multi-nic instances,  so I would expect things to be OK, but we might be missing something  
[20:51] <rharper> if you're seeing an Ubuntu 20.04 routing issue when you bring up multiple nics in ec2 I would suggest filing a bug in launchpad to see if either the config cloud-init generates (or netplan applies) is incorrect 
[20:52] <ananke> here's the issue: dual-homed EC2 instance, primary (eth0) on subnet A (10.0.0.0/8), secondary (eth1) on subnet B (100.64.0.0/10). default route is associated with the gateway for subnet A
[20:53] <ananke> rharper: funny enough, ubuntu behaves how I would expect it to, but I'm seeking the magic that amazon linux 2 & centos 7 have :)
[20:54] <rharper> yeah, the plan for netplan and multi-home is to use VRFs to route traffic back out the interface it came in
[20:55] <rharper> there's been an open bug/feat-request for netplan for some time...    https://bugs.launchpad.net/netplan/+bug/1773522
[20:56] <ananke> subnet B is connected to subnet C (192.168.0.0/16). bastion host sits on subnet C and communicates to the host that sits on subnet A & B. Packets leave bastion with source address of 192.168.x.x and arrive on eth1 (100.64.x.x) on a given host
[20:57] <ananke> so here's the 64k question: what happens to replies? linux by default will reply to packets based on the defined routes. since there is no explicit route for 192.168.0.0/16, ubuntu replies on the interface associated with the default route: eth0, which is on subnet 10.0.0.0/8, and replies go to ether
[20:58] <ananke> somehow amazon linux 2, and now centos 7 after quick testing, seem to reply to the same packet on the interface it came from - eth1. what's baffling is that there is no route whatsoever for 192.168.0.0/16
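A rough sketch of the lookup ananke describes on Ubuntu, assuming only the main routing table is consulted. The table entries mirror the subnets mentioned above; the `lookup` helper and the sample addresses are hypothetical, not taken from any real host.

```python
import ipaddress

# Hypothetical main routing table for the dual-homed host described above.
# Each entry is (destination network, outgoing interface).
MAIN_TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),      # default route via subnet A's gateway
    (ipaddress.ip_network("10.0.0.0/8"), "eth0"),     # connected route, subnet A
    (ipaddress.ip_network("100.64.0.0/10"), "eth1"),  # connected route, subnet B
]

def lookup(table, dst):
    """Return the interface of the longest-prefix match for dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, dev) for net, dev in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# A reply to the bastion (192.168.x.x) only matches the default route,
# so it leaves via eth0 even though the request arrived on eth1.
print(lookup(MAIN_TABLE, "192.168.1.10"))
# A destination inside subnet B matches the more specific connected route.
print(lookup(MAIN_TABLE, "100.64.0.5"))
```

With a destination-only lookup like this, nothing about the reply records which interface the request came in on, which is why the reply follows the default route.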
[20:59] <ananke> rharper: thanks, I'll have to take a closer look at it
[21:00] <rharper> yeah, I think each interface has its own default route (but in a different routing table) so IIUC, tables are given a priority, so the lookup for a packet will check the non-default tables before using "main"  ;; 
[21:00] <rharper> ^ on ec2/centos7 if its using the ec2-net-utils;  and on Ubuntu, we only mark routes with metrics 
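A simplified model of the per-interface tables rharper describes, assuming one source-based rule per secondary NIC that is checked before "main". The table number 10001, the rule priorities, and the `route` helper are illustrative assumptions; real kernel rules support more selectors than a source prefix.

```python
import ipaddress

def net(cidr):
    return ipaddress.ip_network(cidr)

def lookup(table, dst):
    """Longest-prefix match within a single routing table."""
    addr = ipaddress.ip_address(dst)
    matches = [(n, dev) for n, dev in table if addr in n]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

# Hypothetical tables: "main" plus one extra table for the secondary NIC.
TABLES = {
    "main": [
        (net("0.0.0.0/0"), "eth0"),
        (net("10.0.0.0/8"), "eth0"),
        (net("100.64.0.0/10"), "eth1"),
    ],
    "10001": [
        (net("0.0.0.0/0"), "eth1"),  # eth1's own default route
    ],
}

# Rules are (priority, source selector or None for "from all", table name).
# Lower priority numbers are evaluated first.
RULES = [
    (10001, net("100.64.0.0/10"), "10001"),
    (32766, None, "main"),
]

def route(src, dst):
    """Walk the rules in priority order and return the first table hit."""
    src_addr = ipaddress.ip_address(src)
    for _prio, selector, table in sorted(RULES, key=lambda r: r[0]):
        if selector is not None and src_addr not in selector:
            continue
        dev = lookup(TABLES[table], dst)
        if dev is not None:
            return dev
    return None

# Reply sourced from eth1's address goes back out eth1.
print(route("100.64.0.5", "192.168.1.10"))
# Traffic sourced from eth0's address still uses main's default route.
print(route("10.0.0.5", "192.168.1.10"))
```

In this model the reply to the bastion is sourced from the 100.64.x.x address, so the source rule selects the extra table and the packet leaves on eth1 without any explicit route for 192.168.0.0/16.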
[21:01] <ananke> centos7 doesn't have ec2-net-utils
[21:01] <rharper> interesting;  I wonder if there's some dhclient route script magic in the cloud image though 
[21:01] <rharper> I'd definitely run ip rule list on those ama/cent hosts to see what shows up vs ubuntu
[21:02] <ananke> and I wish the answer would be so simple, but there is absolutely nothing in the dhcp lease that would indicate routes to those other subnets
[21:02] <ananke> but yes, I'll go with a fine comb over the ip rule list
[21:03] <ananke> debian 10 is another problem altogether, with two interfaces it sets the default route to the _last_ interface that was brought up: eth1 in this case
[21:04] <ananke> so while that 'fixes' access to the vm, it breaks everything else
[21:05] <rharper> heh
[21:06] <ananke> but it's 5pm, and that's a problem for tomorrow :) thanks everybody
[21:06] <rharper> o/ 
[21:11] <holmanb> ananke: late to the convo, but I'll  add a +1 to rharper's suggestion to take a look at ip rule. iirc linux has 3 route tables by default, and you can do some pretty fancy things using host-based routing
[21:11] <holmanb> there is a cloud-init ticket for supporting host-based routing
[21:12] <rharper> holmanb: nice, is that different than the VRF one for netplan ? 
[21:12] <holmanb> it is, I'll drop a link
[21:13] <holmanb> s/host-based/policy based/
[21:13] <holmanb> https://bugs.launchpad.net/cloud-init/+bug/1807297
[21:16] <rharper> awesome 
[21:18] <holmanb> if there is time this release I might try to tackle it
[21:26] <blackboxsw> ahh, pulling my head out of the ground. TIL manipulating the lxd image metadata to hack/prefer LXD over NoCloud datasources. I'll write up a howto. This also gives us the chance to pre-test incoming image metadata changes easily and/or direct download daily images if, for example, simplestreams data isn't being published for one reason or another.
[21:26] <blackboxsw> I can now properly validate falcojr's SRU as LXD datasource works TM on latest Ubuntu jammy release.
[21:29] <holmanb> @blackboxsw nice!
[21:30] <holmanb> ananke: also, it looks like cloud-init's network module is disabled on amazon linux, that may  (check 
[21:30] <holmanb> (hit enter too early)
[21:31] <holmanb> ananke: that may describe the behavior difference between amazon linux and ubuntu (check /etc/cloud/cloud.cfg)