/srv/irclogs.ubuntu.com/2016/08/22/#cloud-init.txt

[02:15] <prometheanfire> smoser: what's the difference between _write_network, _write_network_config, _bring_up_interfaces and _bring_up_interface?
[03:38] <prometheanfire> smoser: I have gentoo networking working, at least for dhcp, probably for static as well
[03:39] <prometheanfire> smoser: last two commits here https://github.com/prometheanfire/cloud-init
[03:46] <prometheanfire> only 'error' that happens doesn't seem to hurt anything
[03:46] <prometheanfire> 2016-08-22 03:44:21,495 - __init__.py[WARNING]: apply_network_config is not currently implemented for distribution '<class 'cloudinit.distros.gentoo.Distro'>'.  Attempting to use apply_network
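The warning above describes a fallback: the distro class has no implementation of the newer network-config hook, so the older path is tried instead. Here is a minimal sketch of that pattern. Every class and method name below is hypothetical, and this is not cloud-init's actual code; the conversion step is only a placeholder.

```python
import logging

LOG = logging.getLogger(__name__)


class BaseDistro:
    """Hypothetical stand-in for a distro base class (not cloud-init's code)."""

    def _write_network_config(self, netconfig):
        # New path: distros that have not been ported leave this unimplemented.
        raise NotImplementedError()

    def _write_network(self, settings):
        # Legacy path: consumes pre-rendered settings text.
        raise NotImplementedError()

    def apply_network_config(self, netconfig):
        try:
            return self._write_network_config(netconfig)
        except NotImplementedError:
            LOG.warning("apply_network_config is not implemented for %s; "
                        "falling back to the legacy path", type(self).__name__)
            return self._write_network(self._to_legacy_settings(netconfig))

    @staticmethod
    def _to_legacy_settings(netconfig):
        # Placeholder conversion; the real fallback's conversion is not shown here.
        names = [c["name"] for c in netconfig["config"] if "name" in c]
        return "# legacy settings for: " + ", ".join(names)


class LegacyDistro(BaseDistro):
    """Implements only the legacy hook, like the class that logged the warning."""

    def _write_network(self, settings):
        return ("legacy", settings)


class PortedDistro(BaseDistro):
    """Implements the new hook, e.g. by handing the config to a renderer."""

    def _write_network_config(self, netconfig):
        return ("renderer", len(netconfig["config"]))


if __name__ == "__main__":
    cfg = {"version": 1, "config": [{"type": "physical", "name": "eth0",
                                      "subnets": [{"type": "dhcp"}]}]}
    print(LegacyDistro().apply_network_config(cfg))
    print(PortedDistro().apply_network_config(cfg))
```

Because the warning is only logged and the legacy path still runs, a distro that implements only the old hook keeps working. That matches the "doesn't seem to hurt anything" observation.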
=== Takumo is now known as Jagmilleurs
=== Jagmilleurs is now known as Takumo
[13:34] <smoser> prometheanfire, here now. reading your comments.
[13:34] <smoser> you gave the system bonding config ?
[13:34] <prometheanfire> smoser: hi
[13:34] <prometheanfire> smoser: no, the kernel just supports it
[13:34] <prometheanfire> and this breaks things
[13:35] <smoser> hm..
[13:35] <prometheanfire> because bonding_masters exists as a file within /sys/class/net
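Entries for real interfaces under /sys/class/net are (symlinks to) directories. When the bonding driver is present, it adds a regular file named `bonding_masters` next to them. One way to avoid treating that file as a NIC is to keep only directory entries. The following is a sketch under that assumption, using a hypothetical helper `list_net_devices`. It is not the fix in smoser's `bond_name` branch, whose contents are not shown in this log.

```python
import os


def list_net_devices(sys_class_net="/sys/class/net"):
    """Return interface names under sys_class_net, skipping plain files.

    Interface entries there are (symlinks to) directories, while the bonding
    driver adds a regular file called 'bonding_masters'.
    """
    devices = []
    for name in sorted(os.listdir(sys_class_net)):
        # os.path.isdir follows symlinks, so symlinked device dirs are kept.
        if os.path.isdir(os.path.join(sys_class_net, name)):
            devices.append(name)
    return devices


if __name__ == "__main__":
    print(list_net_devices())
```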
[13:35] <smoser> well, there is another fix for that, that i do need to get in. but i didn't believe that it was as simple as you say.
[13:35] <smoser> the ubuntu kernels *do* support bonding
[13:35] <prometheanfire> well, for us the module is loaded static
[13:35] <smoser> probably it's a module though, and likely it isn't loaded when that runs.
[13:36] <prometheanfire> maybe that's why we hit it
[13:36] <smoser> "module loaded static"
[13:36] <smoser> ?
[13:36] <smoser> you mean builtin
[13:36] <prometheanfire> ya
[13:36] <smoser> :)
[13:36] <prometheanfire> static kernel :D
[13:36] <prometheanfire> it's how I (personally) compile my kernels
[13:36] <smoser> ok.
[13:36] <prometheanfire> easy to ship
[13:37] <prometheanfire> this is the same code used in glean if it helps
[13:37] <smoser> glean ?
[13:38] <prometheanfire> simple-init?
[13:38] <prometheanfire> same thing
[13:38] <prometheanfire> https://github.com/openstack-infra/glean/
[13:38] <prometheanfire> https://github.com/openstack-infra/glean/blob/master/glean/cmd.py#L686
[13:45] <smoser> prometheanfire, so https://git.launchpad.net/~smoser/cloud-init/log/?h=bond_name has the fix for bond_master stuff
[13:46] <smoser> and wrt glean, i can't just take that, as the license is not compatible. cloud-init requires signing the CLA for contributions.
[13:46] <prometheanfire> right
[13:46] <smoser> :-(
[13:46] <prometheanfire> about the code
[13:46] <prometheanfire> I wrote it
[13:47] <prometheanfire> so I should be able to contribute it to both projects
[13:47] <prometheanfire> I'd think at least
[13:47] <smoser> oh. yeah, then you can.
[13:47] <smoser> yes.
[13:47] <prometheanfire> :D
[13:48] <smoser> the other comment is that '_write_network' is kind of the "legacy" mechanism
[13:48] <prometheanfire> in any case we should probably ignore the other interfaces, but if your bonding update code works then that solves the immediate problem
[13:50] <prometheanfire> so, should I convert to _write_networks?
[13:51] <smoser> we want "_write_network_config", and using a Renderer like ubuntu/debian and rhel do
[13:53] <smoser> unfortunately i think you're going to want more doc / info on the "network state" :)
[13:53] <prometheanfire> is netconfig the same thing as settings?
[13:54] <smoser> no. it's more like what is described at http://people.canonical.com/~rharper/curtin/topics/networking.html
[13:54] <smoser> and ubuntu and rhel basically load that thing into a 'network_state' and then render from it.
[13:54] <prometheanfire> ya, that's doable
[13:54] <prometheanfire> a usable datastructure
[13:55] <smoser> the one thing that i can tell you is that there are unit tests that you can easily run to poke around at what is happening
[13:55] <smoser> (and please do contribute unit tests for your new code)
[13:55] <prometheanfire> shouldn't need unit tests for that, but ok :P
[13:55] <smoser> prometheanfire, thank you. i'm really happy to have your contributions
[13:55] <smoser> well, for rendering... you can have quite complex state
[13:56] <prometheanfire> yarp
[13:56] <prometheanfire> at the moment I'll be using this patch on 0.7.7
[13:56] <prometheanfire> or will soon
[13:57] <prometheanfire> but I've tested it at least and it fulfills the simple use case
[13:57] <prometheanfire> and even better, it actually works
[13:57] <prometheanfire> current code is broken for networking on gentoo, it lays down debian style configs
[13:59] <prometheanfire> right now the code to use the new method looks overcomplicated
[14:04] <smoser> prometheanfire, yeah. i'm not really happy with it either. note though that it supports much more complex config
[14:05] <smoser> multiple ips per nic, ipv4 ipv6, bond, vlan..
[14:05] <prometheanfire> it does
[14:05] <smoser> so yes, it is more complicated
[14:05] <prometheanfire> I just don't even know where to start
[14:05] <prometheanfire> it's that hairy
[14:05] <smoser> yeah.
[14:07] <smoser> i'd suggest looking at rhel, and how that works.
[14:08] <smoser> and then just working from unit tests to get something sane.
[14:08] <smoser> her... i'll try to get some scaffolding in place for you
[14:08] <smoser> s/her/here/
[14:08] <prometheanfire> current code actually started from arch
[14:09] <smoser> what would you suggest as the name for the network config style ?
[14:09] <prometheanfire> we have a name, sec
[14:09] <smoser> ie, 'eni', 'sysconfig' ...
[14:10] <prometheanfire> https://wiki.gentoo.org/wiki/Netifrc
[14:10] <prometheanfire> netifrc
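netifrc reads shell-style variables from /etc/conf.d/net, for example `config_eth0="dhcp"` or `routes_eth0="default via 192.0.2.1"`. This is a rough sketch of what a netifrc renderer might emit. It handles only physical interfaces with DHCP or CIDR-style static subnets. The helpers `render_netifrc` and `netifrc_var` are hypothetical, and they read the version-1 dict directly. A real cloud-init renderer would work from the NetworkState instead, as smoser describes at 14:18.

```python
def netifrc_var(prefix, ifname):
    # netifrc variable names use '_' in place of characters such as '.' and '-'
    safe = "".join(c if c.isalnum() else "_" for c in ifname)
    return f"{prefix}_{safe}"


def render_netifrc(netconfig):
    """Render version-1 'physical' entries as /etc/conf.d/net-style lines."""
    lines = []
    for entry in netconfig["config"]:
        if entry.get("type") != "physical":
            continue  # bonds, vlans, routes, nameservers: out of scope here
        addrs, routes = [], []
        for subnet in entry.get("subnets", []):
            if subnet["type"] in ("dhcp", "dhcp4"):
                addrs.append("dhcp")
            elif subnet["type"] == "static":
                addrs.append(subnet["address"])  # assumes CIDR form, e.g. 192.0.2.10/24
                if "gateway" in subnet:
                    routes.append("default via " + subnet["gateway"])
        name = entry["name"]
        if addrs:
            # Several values go on separate lines inside one quoted string.
            joined = "\n".join(addrs)
            lines.append(f'{netifrc_var("config", name)}="{joined}"')
        if routes:
            joined = "\n".join(routes)
            lines.append(f'{netifrc_var("routes", name)}="{joined}"')
    return "\n".join(lines) + "\n"
```

Fed the DHCP example that prometheanfire pastes at 14:17, this produces the single line `config_eth0="dhcp"`.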
[14:11] <prometheanfire> looking at rhel's stuff in net/sysconfig
[14:11] <smoser> k. thanks..
[14:17] <prometheanfire> ok, this is an example of the netconfig
[14:17] <prometheanfire> {'version': 1, 'config': [{'name': 'eth0', 'subnets': [{'type': 'dhcp'}], 'mac_address': 'fa:16:3e:00:10:ee', 'type': 'physical'}]}
[14:18] <smoser> right. that's netconfig, then it gets loaded into 'NetworkState', and the renderers actually take the NetworkState.. a middle ground
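To illustrate the "middle ground" idea, this sketch normalizes the raw version-1 dict once into per-interface records and has a renderer consume only those records. `InterfaceState`, `load_state` and `render_summary` are hypothetical names. They do not reflect the fields or methods of cloud-init's actual NetworkState class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InterfaceState:
    name: str
    kind: str
    mac_address: Optional[str] = None
    subnets: List[dict] = field(default_factory=list)


def load_state(netconfig: dict) -> Dict[str, InterfaceState]:
    """Normalize a version-1 config into per-interface records (hypothetical)."""
    if netconfig.get("version") != 1:
        raise ValueError("only version 1 is handled in this sketch")
    state = {}
    for entry in netconfig["config"]:
        if "name" not in entry:
            continue  # e.g. 'nameserver' entries carry no interface name
        mac = entry.get("mac_address")
        state[entry["name"]] = InterfaceState(
            name=entry["name"],
            kind=entry["type"],
            mac_address=mac.lower() if mac else None,
            subnets=list(entry.get("subnets", [])),
        )
    return state


def render_summary(state: Dict[str, InterfaceState]) -> List[str]:
    # A toy "renderer": it only sees the normalized state, never the raw dict.
    return [f"{s.name} ({s.kind}): {', '.join(sn['type'] for sn in s.subnets)}"
            for s in state.values()]
```

The design point is that parsing and validation happen in one place. Each renderer (eni, sysconfig, netifrc) then only has to turn the same normalized records into its own file format.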
[14:18] <prometheanfire> right
[14:18] <smoser> give me 10 more minutes, and i'll try to hand you a unit test thing to fill in
[14:18] <prometheanfire> cool
[14:19] <prometheanfire> have been looking at sysconfig.py, looks like it's just a longer version of my if stuff
[14:20] <prometheanfire> does netconfig allow you to have vlans in your bonds or bonds in your vlans, etc?
[14:23] <prometheanfire> oh, it's listed explicitly
[14:23] <prometheanfire> neat
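As I understand the version-1 format in the curtin doc linked at 13:54, stacking is expressed by name. A bond lists its members in `bond_interfaces`, and a vlan points at its parent with `vlan_link`. A renderer or bring-up step therefore needs parents handled before children. This sketch derives that order with Python's `graphlib` (3.9+). The example config and `bring_up_order` are illustrative and not taken from cloud-init.

```python
from graphlib import TopologicalSorter


def dependencies(entry):
    """Return the interfaces an entry is built on (version-1 key names)."""
    kind = entry["type"]
    if kind == "bond":
        return entry.get("bond_interfaces", [])
    if kind == "vlan":
        return [entry["vlan_link"]]
    if kind == "bridge":
        return entry.get("bridge_interfaces", [])
    return []


def bring_up_order(netconfig):
    # Map each named device to the devices it depends on, then sort.
    graph = {e["name"]: dependencies(e) for e in netconfig["config"] if "name" in e}
    return list(TopologicalSorter(graph).static_order())


example = {"version": 1, "config": [
    {"type": "physical", "name": "eth0"},
    {"type": "physical", "name": "eth1"},
    {"type": "bond", "name": "bond0", "bond_interfaces": ["eth0", "eth1"],
     "params": {"bond-mode": "active-backup"}},
    {"type": "vlan", "name": "bond0.602", "vlan_link": "bond0", "vlan_id": 602},
]}
```

A vlan on a bond, as in the bond0.602 setup discussed later in this log, comes out after bond0, which in turn comes out after both of its slaves.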
[14:32] <smoser> http://paste.ubuntu.com/23078466/
[14:33] <smoser> prometheanfire, ^ that should at least allow you to easily poke through the code and see your results easily.
[14:33] <smoser> i did not add the _write_network_config to gentoo distro but i'm guessing you can figure that out.
[14:34] <smoser> one thing to point out, the renderer should be idempotent. whatever network config is there, the one provided is what the system *should* do
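One reading of "idempotent" here is that the renderer's output depends only on the config it is given, never on what is already on disk. Rendering therefore replaces the file wholesale instead of merging into it. This is a minimal sketch of that reading, with a toy netifrc-style body. The helper `render_to_file` is hypothetical.

```python
import os
import tempfile


def render_to_file(netconfig, path):
    """Write the complete config derived from netconfig, replacing the file.

    Nothing from the existing file is read or merged, so running this twice
    with the same input leaves the same result.
    """
    lines = []
    for entry in netconfig["config"]:
        for subnet in entry.get("subnets", []):
            if subnet["type"] == "dhcp":
                lines.append(f'config_{entry["name"]}="dhcp"')
    content = "\n".join(lines) + "\n"
    # Write a temp file in the same directory, then atomically replace.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(content)
    os.chmod(tmp, 0o644)  # mkstemp creates files as 0600
    os.replace(tmp, path)
    return content
```

Stale entries left by an earlier boot (or by an admin) disappear on the next render. This matches "the one provided is what the system *should* do".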
[14:37] <prometheanfire> ya, that's the easy part
[14:37] <prometheanfire> I've already started on it
[15:31] <smoser> mgagne_, i'd appreciate your thoughts on https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/303563
[15:31] <smoser> ideally i'd like to have that merged today, you mentioned it had some issues still i think
[15:34] <smoser> i think you still say 3.2 is busted.
[16:06] <mgagne_> smoser: I will test the whole patch set. Is it complete yet?
[16:06] <mgagne_> smoser: 3.2 got fixed with the bonding_slaves detection
[16:08] <smoser> mgagne_, then i think it is complete ...
[16:08] <smoser> pending your test :)
[16:08] <smoser> and my fuzzy memory
[16:08] <mgagne_> smoser: for some reason, cloud-init configures network twice. At this point, I don't care about the reason, it's beyond my expertise. So detecting bonding should do the job.
[16:08] <smoser> it configures networking once.
[16:08] <mgagne_> ok, will launch a build with your patches against 0.7.7~bzr1256-0ubuntu1~16.04.1
[16:08] <smoser> it renames devices twice.
[16:08] <mgagne_> smoser: It goes through the network json config parsing twice at least
[16:08] <smoser> i'm pretty sure that's the case, and i have to think again about why we rename network devices after we've configured.
[16:15] <mgagne_> so I will be applying this patch http://paste.ubuntu.com/23078862/ verbatim to cloud-init 0.7.7~bzr1256-0ubuntu1~16.04.1 and will boot on a baremetal with bonding+vlans to validate.
[16:19] <Tim_> hi
[17:03] <prometheanfire> thanks
[17:09] <smoser> prometheanfire, merged your branch
[17:09] <prometheanfire> yarp
[17:09] <prometheanfire> will rebase my integration branch when your stuff merges
[17:23] <prometheanfire> I'll see if I can test your patch
[17:55] <mgagne_> smoser: so I tested the patches. Somehow I managed to reproduce an intermittent bug my coworker had. Default gateway fails to configure and server doesn't ping.
[17:55] <smoser> hmm... i think maybe rharper might know something.
[17:55] <smoser> mgagne_, xenial, right ?
[17:56] <mgagne_> yes
[17:56] <smoser> ifupdown is very "fun".
[17:56] <mgagne_> smoser: is there anything I can look at? default gw is configured in post-up with route add || true
[17:56] <mgagne_> so I'm not sure how I'm supposed to debug that
[17:56] <smoser> and i know that rharper has been doing some hair pulling over bonds recently.
[17:56] <smoser> so you can get into it, mgagne ?
[17:57] <mgagne_> so we never supported ubuntu 16.04, even without bonding yet so I'm not sure if it's a known issue with bonding or cloud-init.
[17:57] <mgagne_> any logs I can pull to help debug?
[18:10] <prometheanfire> smoser: reviewed your patch, worksforme
[18:11] <smoser> mgagne_, grab /var/log/cloud-init.log
[18:12] <prometheanfire> also, closed the other merge request
[18:13] <smoser> mgagne_, can you get to it while it's up?
[18:13] <smoser> or are you just able to shut down and collect files
[18:13] <mgagne_> yea, will get the logs after my meeting =)
[18:13] <mgagne_> will get the whole /var/log if needed
[18:14] <smoser> if you can get it while it's up, please get output of
[18:14] <smoser> ifconfig -a
[18:14] <smoser> and anything else you might find useful
[18:14] <smoser> systemctl status
[18:14] <smoser> would be good
[20:06] <mgagne_> if post-up fails, will the output be logged?
[20:09] <rharper> mgagne_: any stderr from the ifup will be captured in the networking.service log, so you should see something in systemctl status -l networking
[20:09] <mgagne_> ok, I ran that command and didn't see anything
[20:09] <rharper> mgagne_: from your gist though, the devices and routes all came up as configured
[20:09] <mgagne_> will rerun
[20:09] <mgagne_> to make sure
[20:09] <mgagne_> check cloud-init-output.log, you won't see the default gw
[20:09] <mgagne_> only link local routes
[20:11] <rharper> in your gist, do you have the etc/network/interfaces.d/50-cloud-init.cfg file ?
[20:11] <smoser> rharper, interfaces-cloud-init.txt
[20:11] <mgagne_> yes
[20:11] <smoser> https://gist.github.com/mgagne/fbc1b05412f41426f2e248acd5efad14#file-interfaces-cloud-init-txt
[20:12] <rharper> smoser: ah right
[20:12] <mgagne_> added systemctl status -l networking output
[20:12] <mgagne_> so I suspect that *maybe* the default route is added but something removes it later?
[20:12] <mgagne_> or will || true hide the failure?
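On the `|| true` question: in a POSIX shell, `cmd || true` always exits 0, so the hook's exit status no longer reveals a failure. Anything the failing command prints to stderr is still emitted, though, which is why rharper points at the networking.service log. This sketch assumes the post-up line is handed to `/bin/sh`. A harmless failing `ls` stands in for `route add`, which needs root. `run_post_up` is a hypothetical helper.

```python
import subprocess


def run_post_up(command):
    """Run a hook line through sh and return (exit status, stderr text)."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Without '|| true' the failure shows up in the exit status.
    print(run_post_up("ls /nonexistent-path-for-demo"))
    # With '|| true' the status is 0, but the error text is still on stderr.
    print(run_post_up("ls /nonexistent-path-for-demo || true"))
```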
[20:13] <rharper> mgagne_: in your gist, the bond0.602 default gw is the one you expect ?  (the launchpad bug had other config)
[20:13] <mgagne_> yes
[20:14] <mgagne_> I thought it would be configured with the gateway stanza but ¯\_(ツ)_/¯
[20:17] <smoser> Odd_Bloke, around ?
[20:18] <smoser> test_exception_fetching_fabric_data_doesnt_propagate
[20:18] <smoser> why would i not want that to propagate?
[20:21] <mgagne_> rharper: is there anything I can do to help debug? I don't mind rebuilding an image with debug config or whatever.
[20:21] <rharper> mgagne_: a plain route -n would be nice
[20:21] <rharper> and the original network_data.json;
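`route -n` prints the kernel's main IPv4 routing table. The same table is exposed in /proc/net/route, which is easy to check from a script. There, a default route has Destination and Mask both `00000000`, and addresses are hex in host byte order. This sketch pulls out default routes. `default_routes` is a hypothetical helper, and the sample in the usage test uses made-up addresses, not data from mgagne_'s gist.

```python
import socket
import struct


def default_routes(proc_net_route_text):
    """Return (iface, gateway) pairs for IPv4 default routes in /proc/net/route text."""
    routes = []
    lines = proc_net_route_text.strip().splitlines()
    for line in lines[1:]:  # first line is the column header
        fields = line.split()
        iface, dest_hex, gw_hex, mask_hex = fields[0], fields[1], fields[2], fields[7]
        if dest_hex == "00000000" and mask_hex == "00000000":
            # The kernel prints the raw 32-bit value in host byte order;
            # packing it natively ('=I') recovers the network-order bytes.
            gw = socket.inet_ntoa(struct.pack("=I", int(gw_hex, 16)))
            routes.append((iface, gw))
    return routes


if __name__ == "__main__":
    with open("/proc/net/route") as f:
        print(default_routes(f.read()))
```

An empty list at the moment cloud-init dumps its network info, but a populated one later, would fit the timing theory discussed below.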
[20:22] <mgagne_> rharper: I added the default route already so I can SSH and pull logs
[20:22] <rharper> it's applying some routes; I just can't see why it wouldn't do the post up
[20:23] <mgagne_> rharper: added to gist
[20:24] <mgagne_> rharper: I will try to reboot a 2nd server which didn't have the issue and see if I can reproduce after multiple reboots
[20:24] <rharper> ok
[20:24] <mgagne_> rharper: let me know if you would prefer to get SSH access for further debug, this can be done
[20:24] <rharper> ok
[20:30] <smoser> rangerpbzzzz, around ?
[20:30] <smoser> wonder if it's ok if i open a bug and assign it to you.
[20:34] <rharper> mgagne_: so I can recreate the case where the cloud-init-output does not contain the default route; but post-up on bond0.602 does run and work; maybe we could add a cloud-init final command to run route -n so we can see that? in xenial, cloud-init writes the files and networking.service is doing an ifup -a (which will bring up any non-physical devices; the physical devices with bond-master will create the bond0 and
[20:34] <rharper> enslave them) and then the ifup -a will trigger an ifup on bond0.602 and bond0.612; they'll run and run the post-up which should add the default gw you need;
[20:35] <mgagne_> rharper: so you think that: cloud-init runs route -n, doesn't see default gw at this point in time but later route should be configured by ifup?
[20:35] <rharper> I'm not entirely certain but in my recreate, the output info doesn't show the bond.vlan route; but when I log in and run route, it's fully up
[20:35] <mgagne_> because it's true that the /32 route doesn't show in route output. this means something is running later to add routes.
[20:36] <rharper> I don't know when it runs to collect the network status
[20:36] <rharper> but possibly too soon or some other reason
[20:36] <rharper> not sure if smoser has more details
[20:36] <mgagne_> could it be the slave links aren't fully up and therefore routes aren't applied yet since it's in post-up?
[20:36] <rharper> yes
[20:36] <rharper> slaves can take some time
[20:37] <rharper> bonding scripts will wait up to 600 seconds for a bond to join
[20:37] <rharper> err, 60 seconds
[20:37] <smoser> hm.
[20:37] <rharper> (60 * 0.1)
[20:37] <mgagne_> 6 seconds? :D
[20:37] <smoser> cloud-init writes network status during 'init' stage
[20:37] <rharper> *sigh* 600 * 0.1
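rharper's corrected figure is 600 polls at 0.1 s each, which gives a ceiling of 600 × 0.1 s = 60 s. The following is a generic sketch of such a bounded poll. It could be used, for instance, to wait until a default route appears before collecting network status. `wait_for` and its parameters are hypothetical and are not taken from the bonding scripts.

```python
import time


def wait_for(condition, attempts=600, interval=0.1, sleep=time.sleep):
    """Poll condition() up to `attempts` times, sleeping `interval` s after each miss.

    With the defaults the total sleep is at most 600 * 0.1 = 60 seconds.
    Returns True as soon as condition() is true, False if it never is.
    """
    for _ in range(attempts):
        if condition():
            return True
        sleep(interval)
    return False
```

The `sleep` parameter is injectable only so the loop can be exercised without real delays.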
[20:37] <smoser> which should be after static networking is up
[20:37] <smoser> so if that runs before all of the 'ifup' stuff is finished, then that is a bug
[20:39] <smoser> # systemctl cat cloud-init.service | grep networking
[20:39] <smoser> After=cloud-init-local.service networking.service
[20:39] <rharper> smoser: no, I can see the route info now; I was looking at the top of the -output file before I added the config; I definitely see the default routes running, but this is a VM versus baremetal;
[20:39] <smoser> Requires=networking.service
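The two lines smoser greps out mean that cloud-init.service is ordered after networking.service (`After=`) and pulls it in (`Requires=`). Neither directive waits for anything beyond networking.service finishing its start job. To check such dependencies from a script, a small parser over `systemctl cat` output is enough. `unit_dependencies` is a hypothetical helper, and the sample in the test is abbreviated to the lines shown in this log.

```python
def unit_dependencies(unit_text, keys=("After", "Before", "Requires", "Wants")):
    """Collect dependency directives from systemd unit text (e.g. `systemctl cat`)."""
    deps = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue  # skip blanks, comments and [Section] headers
        key, _, value = line.partition("=")
        key = key.strip()
        if key in keys:
            values = value.split()  # values are space-separated unit names
            if not values:
                deps[key] = []  # an empty assignment resets the list in systemd
            else:
                deps.setdefault(key, []).extend(values)
    return deps
```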
[20:39] <mgagne_> but Net device info shows the interface as up
[20:39] <mgagne_> I don't know where it takes the status but it means slaves are up too?
[20:40] <rharper> http://paste.ubuntu.com/23079501/
[20:40] <rharper> it should look like that
[20:40] <rharper> I only added the one bond with default route (and used active-backup on a second nic in a VM); but the output should look similar in number of routes
[20:42] <rharper> but it is odd that during the dump of the route in mgagne_'s case, there is no nic message from kernel about being up;
[20:43] <rharper> the info table runs at Up 48.87, but the nic up message isn't until 57
[20:44] <mgagne_> so I'm rebooting in a loop to try to reproduce the problem and so far, no luck
[20:44] <mgagne_> coworker says that if you reboot a node with the problem, gw is configured properly and your issue is fixed.
[20:44] <rharper> yeah, the switch delay
[20:45] <mgagne_> so I'm wondering if it's something cloud-init does at that time
[20:45] <mgagne_> which isn't done in next boot
[20:46] <mgagne_> like renaming an interface
[20:46] <rharper> you can force cloud-init to re-run by nuking /var/lib/cloud/* in the instance before rebooting
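rharper's suggestion amounts to `rm -rf /var/lib/cloud/*`: empty the directory but keep the directory itself. This is a Python equivalent for scripting repeated test boots. `clear_directory` is a hypothetical helper. Unlike the shell glob, it also removes dotfiles. It is destructive, so use it only on a throwaway test instance.

```python
import os
import shutil


def clear_directory(path):
    """Remove everything inside path but keep path itself (like `rm -rf path/*`)."""
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)  # real subdirectory: remove recursively
        else:
            os.remove(full)  # file or symlink: remove the entry only


if __name__ == "__main__":
    clear_directory("/var/lib/cloud")  # destructive: test instances only
```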
[20:46] <rharper> renaming happens on each boot
[20:46] <rharper> or an attempt
[20:46] <mgagne_> by cloud-init?
[20:46] <rharper> yes
[20:46] <mgagne_> ok, will nuke /var/lib/cloud/* on another node I have
[20:47] <mgagne_> and reboot forever
[20:47] <prometheanfire> that's how I test as well
[20:57] * rharper steps away for a bit
[22:30] <mgagne_> so I rebooted and rebuilt 10+ times and I can't reproduce
[22:30] <mgagne_> it looks to be a very unlucky race condition

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!