Odd_Blokesmoser: I've added some comments on https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1603222; your thoughts would be much appreciated. :)06:12
smoserOdd_Bloke, so i think that at this point more of the stuff is in cloud-init, right ?13:48
smoseri think what you're saying is that cloud-init in xenial does not depend on udev rules from walinux agent13:48
smoserso i'd prefer if we fix something to have that be consistent between trusty -> xenial +13:49
Odd_Blokesmoser: The problem is orthogonal to udev rules, really.13:58
Odd_Blokesmoser: One thing I wasn't sure about is how introducing the udev rules to existing instances would affect them.13:58
Odd_BlokeBut both sets would apply, so I think it would be fine...13:58
smoserright. they both should set up symlinks.13:59
smoserwhy is it orthogonal?13:59
=== rangerpbzzzz is now known as rangerpb
Odd_BlokeBecause the problem is that they don't use the udev rules properly; it doesn't matter which set of udev rules are there.15:05
Odd_BlokeBut, yeah, I'm happy to backport the udev rules if that seems safe.15:06
Odd_BlokeI was just thinking in terms of minimising the backport diff.15:06
mgagnesmoser: will test your fix for bond right now16:44
mgagnehowever I believe 3.2) described in bug still isn't fixed.16:44
smosermgagne, looking18:14
mgagnein a meeting but so far, only the auto stanza on bond0 looks to be missing18:18
smosermgagne, i'm not sure it's required.18:19
mgagneit is as far as I know, I added it and it worked18:19
smoseri'll compare the network config against other stuff we have examples of (curtin vmtest) where we actually verify18:19
mgagnewill test after my meeting18:19
smoserthanks mgagne18:19
smoserrharper, look at https://bugs.launchpad.net/cloud-init/+bug/1605749 and have a think.18:52
rharpersmoser: is that your bonding fix ?18:53
rharperyeah; I read the branch earlier, my initial thought was that we'd generically want to "Resolve links" at render time18:53
rharperrather than be bond specific18:53
rharperthe mechanism to use link['id'] as the interface name key is common for all types; that is in the network state we only have link/ids and then at runtime we'd do a  replacement of link-id with get_name_from_macaddr(link_to_mac[link-id]) sort of lookup18:55
smoserrharper, well, do other things have referenced links ?18:55
rharperany of the combined types18:55
rharperbridges, bonds, and vlans18:55
smoserrharper, well, if we want, its easy enough now with the same generic mechanism18:56
rharperw.r.t auto on bond0; that is needed; there's a bug related not getting auto on stanzas without network config (the subnet has the 'control' value and default)18:57
rharperso a bond with no subnets, but then vlans on top misses the 'auto bondX' line;  I've found this in curtin since Friday, working on a fix there;18:57
smoserthen the 2 things are18:57
smosera.) auto bond [necessary]18:57
smoserb.) resolve links generically18:58
smoserwhere b is not strictly necessary at this time...18:58
smosercan you give another example of where it is?18:58
rharpera) bonds default to auto, unless a subnet with 'control' says otherwise18:58
rharperfor b) just replace the vlan_raw_interface value from bond.X to interface018:59
rharperthe underlying device, if it's type 'physical', will refer to another "links" element which may not have a 'name' key set18:59
rharperthink, eth0.123;  the eth0 would be the link.id; and that may not be the name of the device, like bond_interfaces contains link.ids19:00
smoserwhere did you come up with the string 'vlan_raw_interface' ?19:06
smoseri dont see that anywhere.19:06
rharperthat's the eni name19:06
rharpersorry, vlan_link is the network_data.json field19:07
rharperin cloudinit/sources/helpers/openstack.py:59319:07
rharperwe generate a nic name based on the vlan_link and vlan_id, which is OK since vlan is a constructed interface, but the vlan_link points to the underlying device (this is a link.id) and needs to be replaced19:08
smoserrharper, ok. i can add a test for vlan on that also.19:12
rharperif you need a config yaml, I have one19:12
smoseri think that is what you were saying19:15
smoserthat shows the error.19:15
smoserin 'eth0.602', does 'eth0' actually matter ?19:15
rharperfor our internal state no19:17
rharperbut it's typical shorthand for underlying device . vlan_id19:17
rharperso, the vlan scripts in ifupdown split on . and call vconfig with the first segment and pass the second as the vlan_id19:17
rharperyou can instead  say iface vlan1 and underneath specify the vlan_raw_device (eth0), and vlan_id (123)19:18
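The `<device>.<vlan_id>` shorthand described above can be illustrated with a small sketch. This is not the ifupdown script itself; `split_vlan_name` is a hypothetical helper, and splitting on the last dot is an assumption made here for illustration.

```python
def split_vlan_name(ifname):
    """Split 'eth0.123' into ('eth0', 123); return None for other names.

    Hypothetical helper: names such as 'vlan1' carry no raw device, so the
    raw device and vlan id would have to be given explicitly instead.
    """
    raw, sep, vid = ifname.rpartition(".")
    if not sep or not raw or not vid.isdigit():
        return None
    return raw, int(vid)


shorthand = split_vlan_name("eth0.123")
explicit = split_vlan_name("vlan1")
```

With the shorthand form, the first segment is only a convention for naming the underlying device; with a name like `vlan1` nothing can be derived from the name.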
smoserwell that sucks19:23
smoserharder to get at that19:25
rharperno, we just need to tag the elements of state that use a link.id19:27
rharperand when we're rendering the interface, do an id lookup by mac19:28
rharperwe already repeat the vlan_id as an interface attribute19:28
rharperwe can set the vlan interface name to vlan{index} instead of {link_id}.{vlan_id}19:28
rharperwhich is what we're doing now;19:28
smoserwell, https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name is updated to now have a vlan test case that i think renders correctly19:46
smoseri dont love the mechanism, but seems to work19:46
smoserstill missing the auto up change, assuming youre looking at that.19:53
rharperyeah, that's a one-liner in eni.py19:54
rharperbasically in the case that we don't have a subnet configured, if we have 'bond-masters' or 'bond-slaves' we emit an auto $iface19:55
rharperI need to think a bit more19:55
rharperas it's a general issue for interfaces that don't include a subnet config since 'control' is a property of a subnet19:55
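The rule rharper describes can be sketched roughly as follows. This is not the code in eni.py; `wants_auto`, the dict layout, and the key names `bond-master` and `bond-slaves` are assumptions used only to show the decision.

```python
def wants_auto(iface):
    """Decide whether an 'auto <iface>' line should be emitted.

    Hypothetical sketch: 'control' lives on each subnet, so an interface
    without subnets has nothing to say 'auto' unless we special-case bonds.
    """
    subnets = iface.get("subnets") or []
    if subnets:
        # a subnet defaults to control=auto unless it says otherwise
        return any(s.get("control", "auto") == "auto" for s in subnets)
    # no subnets: fall back to "is this part of a bond?"
    return any(key in iface for key in ("bond-master", "bond-slaves"))


bond_no_subnet = wants_auto({"name": "bond0", "bond-slaves": "none"})
plain_no_subnet = wants_auto({"name": "eth5"})
manual_subnet = wants_auto({"name": "eth0", "subnets": [{"control": "manual"}]})
```

As noted in the discussion, this only covers bonds; other interfaces without a subnet config would still need a general answer.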
mgagnesmoser: so I patched on my side to get auto added. It now fails as described in the bug description in 3.1)19:56
smosermgagne, well, how are you running that ?19:57
mgagnesmoser: running what?19:57
mgagnesmoser: install cloud-init from repo, apply patches found in your branch, build image, upload image, boot19:58
rocketI am wondering if I can use cloud-init in my home lab running with vmware fusion.  Is there a metadata server I could run locally thats fairly simple/lightweight?20:09
smoserrocket, there is some vmware support, but i'm not familiar enough with their product line to know if 'fusion' supports it.20:14
rocketI was just hoping to start up a pythonic based webserver or something .. or should I be looking at creating my own that produces the yaml files I am seeing in documentation?20:15
rocketI just didn't know what was required for a really simple setup20:16
rocketI *think* I just need random hostnames and point that towards a saltmaster etc..20:16
smoserrocket, theres two things that provide difficulty i think20:19
smosera.) you need data per instance-id ... each instance needs to somehow get different data20:19
smoserb.) you have to tell cloud-init where the metadata service is.20:19
smoseryou could mock the ec2 one by mocking and plumbing that network in20:20
rharpersmoser: 3.1 is another run-time variant;  bonds inherit mac address of the slaves ;  when we're doing the lookup, we can filter by type (we only need to look up names by macs of 'physical' devices)20:20
smoserrharper, right. but he's getting a stack trace there20:23
smoserwhich is interesting and i can't reproduce.20:23
mgagnesmoser: we are not yet at 3.2, we are still stuck at 3.120:23
mgagnesmoser: how are you testing? do you have access to an openstack cloud?20:24
smoserpython3 NotADirectory and FileNotFound are obnoxious20:24
rharpersmoser: we need to not try to look up the bond mac address; it can be called *whatever* bond{index} ; only the bond_interfaces lists of link_ids need to be resolved (and actually) we need to check the type of the links to see if they're physical, otherwise we can ignore the mac lookup20:25
smosermgagne, well, i do, yes, but was not focused there yet, and i dont have an openstack cloud that would ask me to bind like that20:25
mgagnethis line is problematic: https://git.launchpad.net/cloud-init/tree/cloudinit/net/__init__.py#n9920:25
mgagneit makes the assumption that all devices found in this folder are a real device and file is a directory20:26
mgagnethis is why this line fails: https://git.launchpad.net/cloud-init/tree/cloudinit/net/__init__.py#n35020:26
mgagnebut could be related to python3 as you said20:26
smosermgagne, well, it doesnt really make the assumption20:31
smoserit accepts an OSError and a IOError and does the right thing20:31
mgagneI'm not sure why one would list those devices and only filter them later20:32
mgagnerharper: I'm not sure why cloud-init tries to configure the network a second time. The 2nd time it is run, the slaves' mac addresses might be updated and no longer match the ones found in config-drive.20:42
rharpersmoser: it does the right thing (read_sys_net) however, the interfaces_by_mac does not like 'bonding_masters' file and throws exception; this prevents creating the mac_to_ifname;  we can handle the NotADirectoryError and continue20:42
smoserrharper, we are trying to handle that.20:43
smoserthats the thing20:43
smoserNotADirectoryError is an OSError20:43
rharperwhen I test it, it's not handled20:43
smoserbut apparently does not have errno = 220:43
smoserwhere do you test this ?20:43
rharperxenial vm20:43
rharperwith bond added20:43
rharperif you're on diglett you can ssh into the vm20:44
rharpersmoser: ssh ubuntu@
rharperthis is not with your branch, so if you've updated it's just what's in xenial (cloud-init level)20:45
mgagnecurrent code is testing for ENOENT, not ENOTDIR20:45
smoserprobably need ENOTDIR20:45
smoserrharper, http://paste.ubuntu.com/23059501/20:48
smoserso open("/sys/class/net/bonding_masters/address") throws a NotADirectoryError with a errno of 2020:49
mgagnetry with an existing file and append a filename to it and try opening it20:49
smosermgagne, right. thats it. ok. thank you20:50
rharperyeah; it's the full path that included not-a-dir-element20:51
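The ENOENT versus ENOTDIR point can be reproduced without a bond. The sketch below uses a temporary directory as a stand-in for /sys/class/net; `read_address` is a hypothetical helper, not the cloud-init function.

```python
import errno
import os
import tempfile


def read_address(devdir):
    """Return the MAC in <devdir>/address, or None if it cannot exist."""
    try:
        with open(os.path.join(devdir, "address")) as fp:
            return fp.read().strip()
    except OSError as e:
        # ENOENT: entry is missing; ENOTDIR: a path component is a plain file
        if e.errno in (errno.ENOENT, errno.ENOTDIR):
            return None
        raise


raised = None
with tempfile.TemporaryDirectory() as root:
    # one device directory and one plain file, like 'bonding_masters'
    os.mkdir(os.path.join(root, "ens3"))
    with open(os.path.join(root, "ens3", "address"), "w") as fp:
        fp.write("52:54:00:12:34:56\n")
    with open(os.path.join(root, "bonding_masters"), "w") as fp:
        fp.write("bond0\n")

    try:
        open(os.path.join(root, "bonding_masters", "address"))
    except OSError as e:
        raised = (type(e).__name__, e.errno == errno.ENOTDIR)

    macs = {name: read_address(os.path.join(root, name))
            for name in sorted(os.listdir(root))}
```

Checking only for `errno.ENOENT` would let the `NotADirectoryError` escape, which matches the stack trace discussed above.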
smoserthere is a question in my mind if there could be 2 nics with the same address20:53
smoser$ cat /sys/class/net/bond0/address20:54
smoser$ cat /sys/class/net/ens3/address20:54
smoser$ cat /sys/class/net/ens5/address20:55
smosercat: /sys/class/net/ens5/address: No such file or directory20:55
smoserso the answer is that you can't have 2, but if this were to run after a bond were set up, we'd get the bond as the device with that mac20:55
smoserwhich is odd20:55
mgagneroot@localhost:/sys/class/net# cat bond0/address20:55
mgagneroot@localhost:/sys/class/net# cat eno1/address20:55
mgagneroot@localhost:/sys/class/net# cat eno2/address20:55
rharperas I mentioned before; for naming, we can ignore non-physical devices;20:56
rharperbonds/bridges/vlans have various configs that inherit mac of underlying devices;  we really want to know the physical nic and mac pairing20:57
smoserthat is odd.20:57
smoserwell, rharper we dont *always* want the physical nic. we could put a bond on two vlans20:58
smoseri think though, that the code i have in that tree is actually right.20:58
rharpervlan names are arbitrary20:58
rharperas are bond names20:58
smosersure. but if we're looking to get a mapping of mac to interface name, then the path is valid.20:58
rharperthat is, we can always set them20:59
smoserbut i think the code is doing the right thing at this point.20:59
smoseras it looks through the links and sets up the name, and only overwrites it with what it found in /sys if it does not yet have a mac from the links table.20:59
mgagneyou can find the original mac address in /sys/class/net/<ifname>/bonding_slave/perm_hwaddr20:59
rharperto configure the bond or vlan correctly, we only need to lookup link_ids of physical devices;20:59
rharperand emit those names in the config21:00
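rharper's suggestion (resolve only the link ids of physical devices, then emit those names) might look like the sketch below. The dict keys `id`, `type`, `mac` and `bond_interfaces`, and both function names, are simplified assumptions rather than the real network_data.json schema or cloud-init code.

```python
def resolve_link_names(links, mac_to_name):
    """Map link id -> interface name, for physical links only."""
    names = {}
    for link in links:
        if link.get("type") != "physical":
            # bond and vlan names are arbitrary, and a bond inherits a
            # slave's MAC, so it is skipped in the MAC lookup
            continue
        mac = link["mac"].lower()
        if mac in mac_to_name:
            names[link["id"]] = mac_to_name[mac]
    return names


def resolve_bond_members(bond, names):
    """Replace link ids in a bond's member list with interface names."""
    return [names.get(link_id, link_id) for link_id in bond["bond_interfaces"]]


links = [
    {"id": "tap-a", "type": "physical", "mac": "AA:BB:CC:00:00:01"},
    {"id": "tap-b", "type": "physical", "mac": "aa:bb:cc:00:00:02"},
    {"id": "bond0", "type": "bond", "mac": "aa:bb:cc:00:00:01",
     "bond_interfaces": ["tap-a", "tap-b"]},
]
# hypothetical result of scanning the system for MAC -> name
mac_to_name = {"aa:bb:cc:00:00:01": "eno1", "aa:bb:cc:00:00:02": "eno2"}

names = resolve_link_names(links, mac_to_name)
members = resolve_bond_members(links[2], names)
```

Note that the bond shares a MAC with its first slave in this sample data, so filtering by type is what keeps the lookup unambiguous.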
smoseroh wait. it doesnt do that. but it should.21:00
smoseryeah, its ok as it is right now i think.21:02
smosermgagne, so i think what i just pushed will fix all but the 'auto'21:03
smoseri have to run now.21:03
mgagnehttps://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name ?21:04
mgagneno, this won't fix the auto21:04
smoserthe one 2dee860 should fix the NotADirectory21:04
smoserit should fix all *but* the auto21:04
mgagneand the point 3.2)21:05
rharpermgagne: yeah21:05
mgagneI'm stuck at 3.2 since last week21:05
mgagneeverything else has been fixed on my side since21:05
mgagnemaybe not in the way you would have done it21:05
rharperwhen you say later, do you mean subsequent boots ?21:05
mgagnelet me check21:06
mgagne"3.2) Once 3.1) is fixed, configuration fails again later"21:06
mgagneOnce 3.1 is fixed, cloud-init will fail again at a different place further down21:06
mgagnewill reword the description21:06
smoseri dont understand 3.2 failure though21:07
smoseras when this runs, the bond should not be set up yet21:07
mgagnecloud-init looks to be rerun twice. Since bond is already configured at this time, mac isn't found and crash21:07
mgagneit is21:07
smoserit should not run the network config step on the second time through if it successfully ran it on the first.21:08
smoserie, cloud-init local should set up networking before anything is allowed to come up21:08
smoserand then cloud-init should come up and see its not first instance boot (cloud-init local was) and not run that section21:08
mgagnewell, I didn't see failure in dsmode=local and cloud-init still try to (re)configure network anyway21:08
mgagneI can rebuild with latest patches and pull out logs21:09
smosermgagne, sure. i do agree if it ran twice the second one could well fail21:09
smoseri do have to run now.21:09
mgagnethe ENOTDIR error is caused by this second run21:10
rharperwe should debug why you get a second run, but certainly the 'bonding_masters' file is only added after a bond is configured21:13
mgagneimage is rebuilding with all patches, gonna take a while before it builds and then a baremetal is booted21:19
mgagnemaybe I can pull the logs from the current baremetal, it ran twice anyway21:20
mgagnerharper: http://paste.openstack.org/show/GHjpr6jMq1uxoCtRl922/21:24
mgagne"Execution continuing, no previous run detected that would allow us to stop early."21:25
rharperis config provided in two places? or ConfigDrive only?21:25
mgagnewe only have configdrive in place21:26
mgagne"no-net is written by upstart cloud-init-nonet when network failed" I'm not sure xenial still has upstart21:26
mgagnethe file /var/lib/cloud/data/no-net does not exist on the machine21:27
rharperyeah, when init runs, it reads the data; that somehow is definitely re-running net config21:28
mgagneI don't understand the logic here... if no-net exists, it's because network config failed. The logic makes it so network config in dsmode=net will STOP if network previously failed. in our case, it sure didn't fail... so it will try to configure it again? o_O21:31
mgagneI only found occurrences of no-net in cmd/main.py and the upstart config file so I'm not sure what to think here21:32
rharperso, dsmode=net, IIUC, is for things like AWS where the datasource is over the network21:32
rharperin that case, cloud-init has to bring up something (fallback networking, dhcp on an interface) and attempt to find a datasource over the networking21:33
rharperafter acquiring a datasource, networking config may be included, in which case, we'd need to update the network config (override the fallback generated one)21:33
mgagneI have those ds loaded: NoCloud, ConfigDrive, OVF, MAAS21:35
mgagneShould I remove all but ConfigDrive?21:35
rharperin the ConfigDrive case, it's not via network but local file21:35
rharperyou can configure them off but it will only select one21:35
mgagneso I think DataSourceConfigDrive supports both local and net.21:37
mgagneand there is no flag to prevent it from running twice. But it could also be by design, just without planning for bonding support and all its side effects21:38
rharperI think I can see it applied twice; it was only with bonding configs that we trip up21:38
mgagneI think there was a lot of logic that didn't account for bonding being configured and enabled at that time21:38
mgagnelike mac address changing21:39
rharperyeah; I don't think we want to apply whole-sale net config twice;21:39
mgagnebut this *could* be a valid scenario (I don't know yet how)21:39
mgagnelike boot with APIPA address in dsmode=local and later get an IP with metadata service or whatever.21:40
mgagneI just don't want to bulldozer my way to make bonding work and break use cases I didn't know existed21:41
rharperyeah, so init [net] mode does re-read the json data and attempts to create network_state which invokes the openstack conversion, which fails when the initial state of the system is already configured21:41
rharperI don't think you're breaking anything;  smoser or harlowja will have to help me understand why they get parsed twice21:42
mgagneonly because it can't find the link mac address (after ENOTDIR is fixed of course)21:42
rharpersure; but in general, I'd like to know why we convert it twice (no need, it was already rendered into the instance_id object IIUC)21:43
rharperso, if it didn't fail converting due to the ENOTDIR, then it's attached to the stage object and you see:21:43
rharperstages.py[DEBUG]: not a new instance. network config is not applied.21:43
rharperyours never gets that far; maybe the rebuild will21:44
* rharper steps out for a bit 21:45
=== rangerpb is now known as rangerpbzzzz
mgagnerharper: ok I fixed the last issue. baremetal is now booting fine22:27
mgagnerharper: all patches: http://paste.ubuntu.com/23059836/22:28
