Odd_Bloke | smoser: I've added some comments on https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1603222; your thoughts would be much appreciated. :) | 06:12 |
---|---|---|
smoser | Odd_Bloke, so i think that at this point more of the stuff is in cloud-init, right ? | 13:48 |
smoser | i think what you're saying is that cloud-init in xenial does not depend on udev rules from walinux agent | 13:48 |
smoser | so i'd prefer if we fix something to have that be consistent between trusty -> xenial + | 13:49 |
Odd_Bloke | smoser: The problem is orthogonal to udev rules, really. | 13:58 |
Odd_Bloke | smoser: One thing I wasn't sure about is how introducing the udev rules to existing instances would affect them. | 13:58 |
Odd_Bloke | But both sets would apply, so I think it would be fine... | 13:58 |
smoser | right. they both should set up symlinks. | 13:59 |
smoser | why is it orthogonal? | 13:59 |
=== rangerpbzzzz is now known as rangerpb | ||
Odd_Bloke | Because the problem is that they don't use the udev rules properly; it doesn't matter which set of udev rules are there. | 15:05 |
Odd_Bloke | But, yeah, I'm happy to backport the udev rules if that seems safe. | 15:06 |
Odd_Bloke | I was just thinking in terms of minimising the backport diff. | 15:06 |
mgagne | smoser: will test your fix for bond right now | 16:44 |
mgagne | however I believe 3.2) described in bug still isn't fixed. | 16:44 |
smoser | mgagne, looking | 18:14 |
mgagne | in a meeting but so far, only the auto stanza on bond0 looks to be missing | 18:18 |
smoser | mgagne, i'm not sure it's required. | 18:19 |
mgagne | it is as far as I know, I added it and it worked | 18:19 |
smoser | i'll compare the network config against other stuff we have examples of (curtin vmtest) where we actually verify | 18:19 |
mgagne | will test after my meeting | 18:19 |
smoser | thanks mgagne | 18:19 |
smoser | rharper, look at https://bugs.launchpad.net/cloud-init/+bug/1605749 and have a think. | 18:52 |
rharper | smoser: is that your bonding fix ? | 18:53 |
rharper | yeah; I read the branch earlier, my initial thought was that we'd generically want to "Resolve links" at render time | 18:53 |
rharper | rather than be bond-specific | 18:53 |
rharper | the mechanism to use link['id'] as the interface name key is common for all types; that is in the network state we only have link/ids and then at runtime we'd do a replacement of link-id with get_name_from_macaddr(link_to_mac[link-id]) sort of lookup | 18:55 |
smoser | rharper, well, do other things have referenced links ? | 18:55 |
rharper | any of the combined types | 18:55 |
rharper | bridges, bonds, and vlans | 18:55 |
smoser | rharper, well, if we want, its easy enough now with the same generic mechanism | 18:56 |
rharper | w.r.t auto on bond0; that is needed; there's a bug related to not getting auto on stanzas without network config (the subnet has the 'control' value and default) | 18:57 |
rharper | so a bond with no subnets, but then vlans on top misses the 'auto bondX' line; I've found this in curtin since Friday, working on a fix there; | 18:57 |
smoser | ok. | 18:57 |
smoser | then the 2 things are | 18:57 |
smoser | a.) auto bond [necessary] | 18:57 |
smoser | b.) resolve links generically | 18:58 |
smoser | where b is not strictly necessary at this time... | 18:58 |
smoser | can you give another example of where it is? | 18:58 |
rharper | a) bonds default to auto, unless a subnet with 'control' says otherwise | 18:58 |
rharper | for b) just replace the vlan_raw_interface value from bond.X to interface0 | 18:59 |
rharper | the underlying device, if its type is 'physical', will refer to another "links" element which may not have a 'name' key set | 18:59 |
rharper | think, eth0.123; the eth0 would be the link.id; and that may not be the name of the device, like bond_interfaces contains link.ids | 19:00 |
smoser | where did you come up with the string 'vlan_raw_interface' ? | 19:06 |
smoser | i dont see that anywhere. | 19:06 |
rharper | that's the eni name | 19:06 |
rharper | sorry, vlan_link is the network_data.json field | 19:07 |
rharper | in cloudinit/sources/helpers/openstack.py:593 | 19:07 |
rharper | we generate a nic name based on the vlan_link and vlan_id, which is OK since vlan is a constructed interface, but the vlan_link points to the underlying device (this is a link.id) and needs to be replaced | 19:08 |
smoser | rharper, ok. i can add a test for vlan on that also. | 19:12 |
rharper | if you need a config yaml, I have one | 19:12 |
smoser | http://paste.ubuntu.com/23059320/ | 19:14 |
smoser | i think that is what you were saying | 19:15 |
smoser | that shows the error. | 19:15 |
smoser | in 'eth0.602', does 'eth0' actually matter ? | 19:15 |
rharper | for our internal state no | 19:17 |
rharper | but it's typical shorthand for underlying device . vlan_id | 19:17 |
rharper | so, the vlan scripts in ifupdown split on . and call vconfig with the first segment and pass the second as the vlan_id | 19:17 |
rharper | you can instead say iface vlan1 and underneath specify the vlan_raw_device (eth0), and vlan_id (123) | 19:18 |
smoser | well that sucks | 19:23 |
smoser | harder to get at that | 19:25 |
rharper | no, we just need to tag the elements of state that use a link.id | 19:27 |
rharper | and when we're rendering the interface, do an id lookup by mac | 19:28 |
rharper | we already repeat the vlan_id as an interface attribute | 19:28 |
rharper | we can set the vlan interface name to vlan{index} instead of {link_id}.{vlan_id} | 19:28 |
rharper | which is what we're doing now; | 19:28 |
smoser | well, https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name is updated to now have a vlan test case that i think renders correctly | 19:46 |
smoser | i dont love the mechanism, but seems to work | 19:46 |
rharper | looking | 19:51 |
smoser | still missing the auto up change, assuming youre looking at that. | 19:53 |
rharper | yeah, that's a one-liner in eni.py | 19:54 |
rharper | basically in the case that we don't have a subnet configured, if we have 'bond-masters' or 'bond-slaves' we emit an auto $iface | 19:55 |
rharper | I need to think a bit more | 19:55 |
rharper | as it's a general issue for interfaces that don't include a subnet config since 'control' is a property of a subnet | 19:55 |
smoser | yeah. | 19:56 |
mgagne | smoser: so I patched on my side to get auto added. It now fails as described in the bug description in 3.1) | 19:56 |
smoser | mgagne, well, how are you running that ? | 19:57 |
mgagne | smoser: running what? | 19:57 |
mgagne | smoser: install cloud-init from repo, apply patches found in your branch, build image, upload image, boot | 19:58 |
mgagne | http://imgur.com/yA3eslq | 20:00 |
rocket | I am wondering if I can use cloud-init in my home lab running with vmware fusion. Is there a metadata server I could run locally thats fairly simple/lightweight? | 20:09 |
smoser | rocket, there is some vmware support, but i'm not familiar enough with their product line to know if 'fusion' supports it. | 20:14 |
rocket | I was just hoping to start up a pythonic based webserver or something .. or should I be looking at creating my own that produces the yaml files I am seeing in documentation? | 20:15 |
rocket | I just didn't know what was required for a really simple setup | 20:16 |
rocket | I *think* I just need random hostnames and point that towards a saltmaster etc.. | 20:16 |
smoser | rocket, theres two things that provide difficulty i think | 20:19 |
smoser | a.) you need data per instance-id ... each instance needs to somehow get different data | 20:19 |
smoser | b.) you have to tell cloud-init where the metadata service is. | 20:19 |
smoser | you could mock the ec2 one by mocking 169.254.169.254 and plumbing that network in | 20:20 |
rharper | smoser: 3.1 is another run-time variant; bonds inherit mac address of the slaves ; when we're doing the lookup, we can filter by type (we only need to look up names by macs of 'physical' devices) | 20:20 |
smoser | rharper, right. but he's getting a stack trace there | 20:23 |
smoser | which is interesting and i can't reproduce. | 20:23 |
mgagne | smoser: we are not yet at 3.2, we are still stuck at 3.1 | 20:23 |
mgagne | smoser: how are you testing? do you have access to an openstack cloud? | 20:24 |
smoser | python3 NotADirectory and FileNotFound are obnoxious | 20:24 |
rharper | smoser: we need to not try to look up the bond mac address; it can be called *whatever* bond{index} ; only the bond_interfaces lists of link_ids need to be resolved (and actually) we need to check the type of the links to see if they're physical, otherwise we can ignore the mac lookup | 20:25 |
smoser | mgagne, well, i do, yes, but was not focused there yet, and i dont have an openstack cloud that would ask me to bind like that | 20:25 |
mgagne | this line is problematic: https://git.launchpad.net/cloud-init/tree/cloudinit/net/__init__.py#n99 | 20:25 |
mgagne | it makes the assumption that all devices found in this folder are a real device and file is a directory | 20:26 |
mgagne | this is why this line fails: https://git.launchpad.net/cloud-init/tree/cloudinit/net/__init__.py#n350 | 20:26 |
mgagne | but could be related to python3 as you said | 20:26 |
smoser | mgagne, well, it doesnt really make the assumption | 20:31 |
smoser | it accepts an OSError and a IOError and does the right thing | 20:31 |
mgagne | I'm not sure why one would list those devices and only filter them later | 20:32 |
mgagne | rharper: I'm not sure why cloud-init tries to configure the network a second time. The 2nd time it is run, the slaves' mac addresses might be updated and no longer match the ones found in config-drive. | 20:42 |
rharper | smoser: it does the right thing (read_sys_net) however, the interfaces_by_mac does not like 'bonding_masters' file and throws exception; this prevents creating the mac_to_ifname; we can handle the NotADirectoryError and continue | 20:42 |
smoser | rharper, we are trying to handle that. | 20:43 |
smoser | thats the thing | 20:43 |
smoser | NotADirectoryError is an OSError | 20:43 |
rharper | interesting | 20:43 |
rharper | when I test it, it's not handled | 20:43 |
smoser | but apparently does not have errno = 2 | 20:43 |
smoser | where do you test this ? | 20:43 |
rharper | xenial vm | 20:43 |
rharper | with bond added | 20:43 |
rharper | if you're on diglett you can ssh into the vm | 20:44 |
rharper | smoser: ssh ubuntu@192.168.122.178 | 20:44 |
rharper | this is not with your branch, so if you've updated it's just what's in xenial (cloud-init level) | 20:45 |
mgagne | current code is testing for ENOENT, not ENOTDIR | 20:45 |
smoser | probably need ENOTDIR | 20:45 |
smoser | yeah. | 20:45 |
rharper | http://paste.ubuntu.com/23059495/ | 20:45 |
smoser | rharper, http://paste.ubuntu.com/23059501/ | 20:48 |
smoser | obnoxious | 20:49 |
smoser | so open("/sys/class/net/bonding_masters/address") throws a NotADirectoryError with a errno of 20 | 20:49 |
mgagne | try with an existing file and append a filename to it and try opening it | 20:49 |
smoser | mgagne, right. thats it. ok. thank you | 20:50 |
rharper | yeah; it's the full path that included not-a-dir-element | 20:51 |
smoser | http://paste.ubuntu.com/23059511/ | 20:52 |
rharper | y | 20:52 |
smoser | there is a question in my mind if there could be 2 nics with the same address | 20:53 |
smoser | wow | 20:54 |
smoser | $ cat /sys/class/net/bond0/address | 20:54 |
smoser | 52:54:00:f2:5a:35 | 20:54 |
smoser | $ cat /sys/class/net/ens3/address | 20:54 |
smoser | 52:54:00:b2:5a:27 | 20:54 |
smoser | $ cat /sys/class/net/ens5/address | 20:55 |
smoser | cat: /sys/class/net/ens5/address: No such file or directory | 20:55 |
smoser | so the answer is that you can't have 2, but if this were to run after a bond were set up, we'd get the bond as the device with that mac | 20:55 |
smoser | which is odd | 20:55 |
mgagne | root@localhost:/sys/class/net# cat bond0/address | 20:55 |
mgagne | 0c:c4:7a:34:6e:3c | 20:55 |
mgagne | root@localhost:/sys/class/net# cat eno1/address | 20:55 |
mgagne | 0c:c4:7a:34:6e:3c | 20:55 |
mgagne | root@localhost:/sys/class/net# cat eno2/address | 20:55 |
mgagne | 0c:c4:7a:34:6e:3c | 20:55 |
rharper | as I mentioned before; for naming, we can ignore non-physical devices; | 20:56 |
rharper | bonds/bridges/vlans have various configs that inherit mac of underlying devices; we really want to know the physical nic and mac pairing | 20:57 |
smoser | that is odd. | 20:57 |
smoser | well, rharper we dont *always* want the physical nic. we could put a bond on two vlans | 20:58 |
smoser | i think though, that the code i have in that tree is actually right. | 20:58 |
rharper | vlan names are arbitrary | 20:58 |
rharper | as are bond names | 20:58 |
smoser | sure. but if we're looking to get a mapping of mac to interface name, then the path is valid. | 20:58 |
rharper | that is, we can always set them | 20:59 |
smoser | but i think the code is doing the right thing at this point. | 20:59 |
smoser | as it looks through the links and sets up the name, and only overwrites it with what it found in /sys if it does not yet have a mac from the links table. | 20:59 |
mgagne | you can find the original mac address in /sys/class/net/<ifname>/bonding_slave/perm_hwaddr | 20:59 |
rharper | to configure the bond or vlan correctly, we only need to lookup link_ids of physical devices; | 20:59 |
rharper | and emit those names in the config | 21:00 |
smoser | oh wait. it doesnt do that. but it should. | 21:00 |
smoser | yeah, its ok as it is right now i think. | 21:02 |
smoser | mgagne, so i think what i just pushed will fix all but the 'auto' | 21:03 |
smoser | i have to run now. | 21:03 |
mgagne | https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name ? | 21:04 |
smoser | right | 21:04 |
mgagne | no, this won't fix the auto | 21:04 |
smoser | the one 2dee860 should fix the NotADirectory | 21:04 |
smoser | right | 21:04 |
smoser | it should fix all *but* the auto | 21:04 |
mgagne | http://paste.openstack.org/show/557628/ | 21:04 |
mgagne | and the point 3.2) | 21:05 |
rharper | mgagne: yeah | 21:05 |
mgagne | I'm stuck at 3.2 since last week | 21:05 |
mgagne | everything else has been fixed on my side since | 21:05 |
mgagne | maybe not in the way you would have done it | 21:05 |
rharper | when you say later, do you mean subsequent boots ? | 21:05 |
mgagne | let me check | 21:06 |
mgagne | "3.2) Once 3.1) is fixed, configuration fails again later" | 21:06 |
mgagne | Once 3.1 is fixed, cloud-init will fail again at a different place further down | 21:06 |
mgagne | will reword the description | 21:06 |
smoser | i dont understand 3.2 failure though | 21:07 |
smoser | as when this runs, the bond should not be set up yet | 21:07 |
mgagne | cloud-init looks to be rerun twice. Since bond is already configured at this time, mac isn't found and crash | 21:07 |
mgagne | it is | 21:07 |
smoser | it should not run the network config step on the second time through if it successfully ran it on the first. | 21:08 |
smoser | ie, cloud-init local should set up networking before anything is allowed to come up | 21:08 |
smoser | and then cloud-init should come up and see its not first instance boot (cloud-init local was) and not run that section | 21:08 |
mgagne | well, I didn't see a failure in dsmode=local and cloud-init still tries to (re)configure network anyway | 21:08 |
mgagne | I can rebuild with latest patches and pull out logs | 21:09 |
smoser | mgagne, sure. i do agree if it ran twice the second one could well fail | 21:09 |
smoser | i do have to run now. | 21:09 |
mgagne | the ENOTDIR error is caused by this second run | 21:10 |
rharper | we should debug why you get a second run, but certainly the 'bonding_masters' file is only added after a bond is configured | 21:13 |
mgagne | image is rebuilding with all patches, gonna take a while before it builds and then a baremetal is booted | 21:19 |
mgagne | maybe I can pull the logs from the current baremetal, it ran twice anyway | 21:20 |
mgagne | rharper: http://paste.openstack.org/show/GHjpr6jMq1uxoCtRl922/ | 21:24 |
rharper | k | 21:24 |
mgagne | "Execution continuing, no previous run detected that would allow us to stop early." | 21:25 |
rharper | is config provided in two places? or ConfigDrive only? | 21:25 |
mgagne | we only have configdrive in place | 21:26 |
mgagne | "no-net is written by upstart cloud-init-nonet when network failed" I'm not sure xenial still has upstart | 21:26 |
mgagne | the file /var/lib/cloud/data/no-net does not exist on the machine | 21:27 |
rharper | ok | 21:28 |
rharper | yeah, when init runs, it reads the data; that somehow is definitely re-running net config | 21:28 |
mgagne | I don't understand the logic here... if no-net exists, it's because network config failed. The logic makes it so network config in dsmode=net will STOP if network previously failed. in our case, it sure didn't fail... so it will try to configure it again? o_O | 21:31 |
mgagne | I only found occurrences of no-net in cmd/main.py and upstart config file so I'm not sure what to think here | 21:32 |
rharper | so, dsmode=net, IIUC, is for things like AWS where the datasource is over the network | 21:32 |
rharper | in that case, cloud-init has to bring up something (fallback networking, dhcp on an interface) and attempt to find a datasource over the networking | 21:33 |
rharper | after acquiring a datasource, networking config may be included, in which case, we'd need to update the network config (override the fallback generated one) | 21:33 |
mgagne | I have those ds loaded: NoCloud, ConfigDrive, OVF, MAAS | 21:35 |
mgagne | Should I remove all but ConfigDrive? | 21:35 |
rharper | in the ConfigDrive case, it's not via network but local file | 21:35 |
rharper | you can configure them off but it will only select one | 21:35 |
mgagne | so I think DataSourceConfigDrive supports both local and net. | 21:37 |
mgagne | and there is no flag to prevent it from running twice. But it could also be by design but didn't plan for bonding support and all its side effects | 21:38 |
rharper | I think I can see it applied twice; it was only with bonding configs that we trip up | 21:38 |
mgagne | yea | 21:38 |
mgagne | I think there was a lot of logic that didn't account for bonding being configured and enabled at that time | 21:38 |
mgagne | like mac address changing | 21:39 |
rharper | yeah; I don't think we want to apply whole-sale net config twice; | 21:39 |
mgagne | but this *could* be a valid scenario (I don't know yet how) | 21:39 |
mgagne | like boot with APIPA address in dsmode=local and later get an IP with metadata service or whatever. | 21:40 |
mgagne | I just don't want to bulldozer my way to make bonding work and break use cases I didn't know existed | 21:41 |
rharper | yeah, so init [net] mode does re-read the json data and attempts to create network_state which invokes the openstack conversion, which fails when the initial state of the system is already configured | 21:41 |
rharper | I don't think you're breaking anything; smoser or harlowja will have to help me understand why they get parsed twice | 21:42 |
mgagne | only because it can't find the link mac address (after ENOTDIR is fixed of course) | 21:42 |
rharper | sure; but in general, I'd like to know why we convert it twice (no need, it was already rendered into the instance_id object IIUC) | 21:43 |
rharper | so, if it didn't fail converting due to the ENOTDIR, then it's attached to the stage object and you see: | 21:43 |
rharper | stages.py[DEBUG]: not a new instance. network config is not applied. | 21:43 |
rharper | you | 21:43 |
rharper | yours never gets that far; maybe the rebuild will | 21:44 |
* rharper steps out for a bit | 21:45 | |
=== rangerpb is now known as rangerpbzzzz | ||
mgagne | rharper: ok I fixed the last issue. baremetal is now booting fine | 22:27 |
mgagne | rharper: all patches: http://paste.ubuntu.com/23059836/ | 22:28 |
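
The ENOENT/ENOTDIR exchange above (20:25 to 20:52) can be sketched roughly as follows. This is not the code from the bond_name branch or from cloudinit/net/__init__.py; `read_mac` and `interfaces_by_mac` here are hypothetical stand-ins written only to illustrate the point smoser and mgagne arrive at: opening `<entry>/address` under /sys/class/net raises an OSError with ENOTDIR (not ENOENT) when `<entry>` is a regular file such as `bonding_masters`, so both errno values need to be tolerated.

```python
import errno
import os


def read_mac(sys_net_dir, devname):
    """Return the MAC stored in <sys_net_dir>/<devname>/address, or None.

    Hypothetical helper for illustration, not cloud-init's actual API.
    """
    path = os.path.join(sys_net_dir, devname, "address")
    try:
        with open(path) as fp:
            return fp.read().strip().lower()
    except OSError as e:
        # ENOENT: the entry is a directory without an 'address' file.
        # ENOTDIR: the entry itself is a regular file (e.g. bonding_masters),
        # so the path contains a non-directory element.
        if e.errno in (errno.ENOENT, errno.ENOTDIR):
            return None
        raise


def interfaces_by_mac(sys_net_dir="/sys/class/net"):
    """Map MAC address -> device name, skipping entries with no address."""
    mac_to_name = {}
    for devname in sorted(os.listdir(sys_net_dir)):
        mac = read_mac(sys_net_dir, devname)
        if mac is not None:
            mac_to_name[mac] = devname
    return mac_to_name
```

Note that this sketch does nothing about the duplicate-MAC case mgagne shows at 20:55 (bond0, eno1 and eno2 all reporting the same address); with a plain dict the last name simply wins, which is one reason rharper suggests restricting the lookup to physical devices.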
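
rharper's proposal (18:55, 20:25, 20:59) is to resolve link ids generically: only physical links are looked up by MAC, while bond and vlan names are arbitrary and can be constructed. A minimal sketch of that idea, under loud assumptions: the link dicts below are a simplified, hypothetical shape (keys `id`, `type`, `mac`, `bond_links`, `vlan_link`, `vlan_id`), not the exact network_data.json schema, and `resolve_link_names` is not a cloud-init function.

```python
def resolve_link_names(links, mac_to_name):
    """Return {link_id: interface_name} for a list of simplified link dicts.

    Hypothetical sketch: physical links are resolved by MAC; bonds and vlans
    get constructed names, since their MACs may be inherited from slaves.
    """
    names = {}

    # Pass 1: physical links take whatever name the system gave that MAC.
    for link in links:
        if link["type"] == "physical":
            names[link["id"]] = mac_to_name[link["mac"].lower()]

    # Pass 2: bonds are constructed devices, so the name is ours to choose.
    bond_index = 0
    for link in links:
        if link["type"] == "bond":
            names[link["id"]] = "bond%d" % bond_index
            bond_index += 1

    # Pass 3: vlans use the <underlying device>.<vlan_id> shorthand, with the
    # underlying link id replaced by its resolved name.
    for link in links:
        if link["type"] == "vlan":
            names[link["id"]] = "%s.%d" % (
                names[link["vlan_link"]], link["vlan_id"])

    return names


def render_bond_slaves(link, names):
    """Translate a bond's list of link ids into real interface names."""
    return [names[link_id] for link_id in link["bond_links"]]
```

For example, two physical links with MACs mapping to eno1 and eno2, a bond over both, and a vlan with id 602 on the bond would resolve to eno1, eno2, bond0 and bond0.602. The sketch does not handle a vlan stacked on another vlan, and it assumes every physical MAC is present in `mac_to_name`.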