[11:50] mgagne, hm..
[11:50] your 'is_bonding_slave' idea is fine, but i really don't understand why it should be needed.
[11:51] you mentioned upstart, are you using upstart somewhere ?
[11:56] you don't want to mess with dsmode really.
[11:57] in current trunk, dsmode=local would allow you to make init_modules run earlier (without access to network)
=== rangerpbzzzz is now known as rangerpb
[15:44] mgagne, around ?
[15:56] smoser: I am now
[15:58] hey.
[15:58] so, are you using upstart ?
[15:58] smoser: I'm not using upstart, I was reading source code and commenting about it. cloud-init is running twice as per the bug description. Running it a second time fails because cloud-init doesn't expect bonding to be configured at that point; in fact, all code and tests were done without bonding support, so a lot of assumptions were made which aren't true anymore.
[15:59] I'm booting on ubuntu 16.04, it's systemd afaik
[15:59] cloud-init does run twice for sure, but only the first time should set the networking up.
[15:59] oh. but we rename on every boot, so maybe we're doing that twice.
[15:59] hm..
[15:59] ok, well that's not the case on my side, bugs 3.1) and 3.2) were caused by this double network config run
[16:00] i'll have a look in a bit. your is_bonding_slave change seems to make sense.
[16:00] no no, I boot ONCE and it fails, I'm not even testing reboot at this point
[16:00] right.
[16:01] because of this mac/link/device mapping, the 2nd run fails because of how bonding behaves: it changes the mac of the bonding slaves, hence the added logic for is_bonding_slave.
[16:02] I didn't do extensive tests, just boot, ping, ssh (with sshkey) and check hostname
[16:02] right
[18:02] smoser u ever get a chance to look over https://code.launchpad.net/~harlowja/cloud-init/+git/cloud-init/+merge/302609
[18:02] it's the future!
[18:02] ha
[18:02] harlowja, i've not looked at it yet.
[18:03] np
[18:04] so ..
[18:04] Instead of looking in a very specific location for
[18:04] cloudinit config modules (which, for those adding their
[18:04] own modules, makes it hard to do without patching that
[18:04] location), instead use entrypoints and register all
[18:04] current cloudinit config modules by default with that
[18:04] new entrypoint (and use that same entrypoint namespace
[18:04] for later finding needed modules).
[18:04] --
[18:05] how does registering the entry points help "those adding their own modules"
[18:06] rharper, what shall i do for mgagne's auto-bringup of bonds?
[18:06] did you have work on that that i didn't see ?
[18:06] smoser: the fix is what i had
[18:07] but in general, we need to think about v4 vs v6
[18:07] what fix ?
[18:07] in eni.py
[18:07] i didn't see, sorry.
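
(mgagne's actual patches are in the paste linked below; purely as a hedged illustration, the is_bonding_slave guard mentioned at [16:01] could be as simple as a sysfs check — the helper name follows the discussion, but this sketch is an assumption, not the submitted patch:)

    import os

    def is_bonding_slave(devname):
        # Once a device is enslaved to a bond, the kernel exposes a
        # 'bonding_slave' directory under its sysfs entry and rewrites
        # its MAC to the bond's MAC, so MAC-based device renaming on a
        # second cloud-init run must skip such devices.
        return os.path.exists('/sys/class/net/%s/bonding_slave' % devname)
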
[18:07] he posted patches, basically adds the "if 'bond-master' or 'bond-slaves' in iface, then emit auto" logic
[18:07] smoser so they still need to add an entry to cloud.cfg (either at packaging time, or at userdata/runtime)
[18:07] smoser: rharper: all patches: http://paste.ubuntu.com/23059836/
[18:07] i didn't go down the path of discovering and creating the cloud-[init,config,final] sections of that config
[18:08] because though i could, it's, umm, non-trivial :-P
[18:08] and likely requires more metadata on modules to define their ordering (not via cloud.cfg at that point)
[18:08] we probably should instead check if iface['type'] in ['bond', 'vlan'] and possibly 'bridge' ;
[18:09] rharper, so you're just assuming all bonds (or vlans or bridges) are then 'auto'
[18:09] so that kind of stuff seems like a larger change, vs just attempting to find modules that are already defined in cloud.cfg via entrypoints (leaving the change to be just a different way to find modules)
[18:09] smoser: we default to auto if an interface has a subnet
[18:09] in this case, it's a bond with no subnets
[18:10] as it's being assembled but not with a subnet;
[18:10] i.e., those default to 'auto' while others (even with 'subnets') default to non-auto
[18:10] we do default to auto if a subnet ?
[18:10] no we always default to auto unless 'control' is set in the subnet
[18:10] yes
[18:11] hm.. you're saying that is true after your change or before
[18:11] there are a few known cases where config explicitly wants subnet + control=manual (aka iscsiroot)
[18:11] if iface has a subnet, control=auto for the iface/index pair
[18:11] if you do not include any subnet, then no auto (except for bond-slaves)
[18:12] that really should be any interface with a nested config (master/slave); I'm pretty sure
[18:12] if iface has a subnet and no control=
[18:12] then control is set to auto
[18:12] for iscsiroot, we specify control: manual
[18:12] overriding the default;
[18:15] right, so you're not actually checking for bond-master.
[18:15] you're just turning auto on
[18:16] no, we check for bond-master in the case of an iface with no subnets
[18:16] and then auto it, *if* it's a slave (slaves point to their master with the bond-master key)
[18:17] but, if the bond master itself (bond0) doesn't configure a subnet, it doesn't get an auto
[18:18] I suspect the code in ifupdown/if-pre-up.d/ifenslave could be fixed to raise the bond master independent of whether it's marked auto or not; but it currently does *not* bring up the master unless it's listed in allow-auto (or marked auto)
[18:18] if bond0 doesn't come up then the rest of the config won't succeed (we time out waiting on bond0 to be created via the slave ifup hook)
[18:19] a bond-specific solution/workaround is to also mark the bond master (indicated by the bond-slaves key in iface) as auto;
[18:20] that might be enough, but I'd like to test/check bridges without subnets and vlans without subnets to see if we generally need to mark non-subnet interfaces with auto by default; that is, I don't yet know of a config where we want a manual bond/vlan/bridge
[18:56] ok. for now i'm good with the fix as you all had it.
[18:58] it is kind of weird and possibly wrong that we are renaming devices in the 'init' stage (in addition to init-local).
[18:59] harlowja, what i don't understand is how you are making the problem of adding a config thing any easier.
[18:59] the cc_foo.py can now be placed in some additional directory ?
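
(a rough sketch of the 'auto' decision rharper describes above, assuming a simplified iface dict; the helper name is invented here, and the real cloudinit/net/eni.py renderer is more involved:)

    def _iface_wants_auto(iface):
        # An iface with subnets defaults to auto unless a subnet carries
        # an explicit control override (e.g. control: manual for iscsiroot).
        subnets = iface.get('subnets', [])
        if subnets:
            return all(s.get('control', 'auto') == 'auto' for s in subnets)
        # No subnets: ifupdown only raises interfaces marked auto, so a
        # bond slave ('bond-master' key) or the bond master itself
        # ('bond-slaves' key) must still be marked auto, or bond0 never
        # gets created and the slave ifup hooks time out.
        return 'bond-master' in iface or 'bond-slaves' in iface
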
[18:59] smoser i can put the config modules in my own library, expose a named entrypoint, then just update cloud.cfg to reference that module
[19:00] so cc_blahblah no longer needs to be patched into cloud-init
[19:00] how do you expose a named entry point ?
[19:00] the same way the modification to cloud-init's setup.py there does it
[19:01] so a library would just need to add an entry_points entry (like the one in that setup.py) in its own setup.py
[19:03] so in said library's setup.py there would be an entry like
[19:03] entry_points={
[19:03]     'cloud.config': [
[19:03]         'my_thing = my_thing.my_cloud_handler',
[19:03]     ],
[19:03] },
[19:03] so when cloudinit looks for a way to call 'my_thing' (assuming it's in a cloud.cfg listing somewhere) then it can go out and try to find it (and load this library to get at it)
[19:04] (or if nobody registered that module, then die as usual)
[19:16] harlowja, so..
[19:16] http://paste.ubuntu.com/23062532/
[19:16] that is what i don't like about entry points
[19:16] takes ~0.01s to bring up python, 0.03s to bring up python3, on a reasonably current SSD
[19:17] (with '0' as the first arg)
[19:17] importing pkg_resources takes 0.3 seconds on python, and 0.25-ish on python3
[19:17] it does look like it caches stuff, as 10 runs take about the same as 1
[19:18] i'm guessing that python3 is faster in my test only because i have fewer entry points or packages installed on the system in python3 compared to python2
[19:18] so it's doing less work.
[19:20] this is also embarrassing:
[19:20] http://paste.ubuntu.com/23062545/
[19:20] and it needs fixing
[19:20] but i'm somewhat hesitant to add something like that.
[19:31] so that's just because u imported 'pkg_resources' ?
[19:32] the pkg_resources import takes quite some time (~0.1 seconds)
[19:32] enumerating some non-existent namespace takes 0.2 seconds
[19:32] obviously very scientific data there.
[19:33] :-P
[19:33] i should have done a -1
[19:33] let's re-do that paste
[19:36] http://paste.ubuntu.com/23062572/
[19:36] there. -1 is just the cost of bringing up python
[19:37] fiddle
[19:37] http://paste.ubuntu.com/23062577/
[19:37] there ^
[19:37] -1 is the cost of python
[19:38] 0 is the cost of importing pkg_resources
[19:38] 1 is the cost of one call to 'iter_entry_points'
[19:38] 10 is the cost of 10 calls
[19:38] k
[19:38] with revised my.py at http://paste.ubuntu.com/23062581/
[19:38] seems like they need to better optimize that entrypoint 'catalog' lol
[19:38] yeah, it is stat crazy
[19:39] those openstack cli programs do that.
[19:39] right
[19:39] they do cache well
[19:39] since 10 runs take basically nothing more than 1
[19:39] but assuming an entrypoint catalog existed in core python, then i'd assume that stuff wouldn't take forever
[19:39] aka a tiny sqlite db
[19:39] lol
[19:40] wonder why such a thing doesn't exist
[19:40] yeah, but i think the entry points are stuffed in that egg-info, right ?
[19:40] that's how those are loaded ?
[19:40] so python goes looking in any possible directory in sys.path for an egg-info file or something and then goes reading it and such.
[19:40] that's one location for it, but u'd think that pip could update a sqlite db or something
[19:41] i wonder if the python community is working on anything like that
[19:41] seems pretty obvious to do that
[19:42] then X people wouldn't be making their own entrypoint-thing due to this
[19:43] not too long ago i had a spinning disk
[19:43] (more embarrassment)
[19:43] what's that crap
[19:43] ha
[19:44] and running 'nova' on it took like 3 seconds to load.
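
(the paste links above have since expired; purely as a hedged reconstruction from the description — -1 being bare interpreter startup, 0 just the pkg_resources import, N doing N scans — the my.py being timed was presumably something like this, with the namespace name a guess:)

    import sys

    count = int(sys.argv[1])
    if count >= 0:
        import pkg_resources  # this import alone is what the '0' case measures
        for _ in range(count):
            # any namespace, even a non-existent one, triggers the
            # filesystem scan of every egg-info on sys.path
            list(pkg_resources.iter_entry_points('cloud.config'))

(timed externally, e.g. "time python my.py 10" vs "time python3 my.py 10")
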
[19:44] nova as in the cli tool, not the service :)
[19:44] :-P
[19:48] so ya, the other option is that we make our own loader slightly more advanced
[19:48] so that, say, in cloud.cfg u could have fully specified modules + functions
[19:48] then i could have an entry like
[19:48] godaddy_ci.handlers:basic_handler
[19:49] though that starts to just make our own entrypoint-like thing :-/
[19:54] ok. one more thing.. http://paste.ubuntu.com/23062607/
[19:54] for my reference mostly.
[19:54] lol
[19:54] that just runs it with strace too, and counts stats or opens
[19:54] nice
[19:54] stats or opens: 2561
[19:54] lol
[19:54] ya, idk why they aren't backing that crap with sqlite
[19:54] afaik entrypoints are all 'static'
[19:55] in that they are all defined by packaging (in setup.py or other)
[19:55] seems dumb to rescan the filesystem to find them
[19:56] it'd seem like a win for most of python if it wasn't so scan-happy
[19:56] though of course any change to do that would probably hit the people that will say it's all in cache and such and blah blah
[19:59] and it gets into the question of: make our own thing, or just work with the python stuff
[20:06] so i had a start of my own thing
[20:06] that took a list of directories
[20:06] and would look in those.
[20:06] cloud-init needs lots of performance improvements for sure
[20:08] why not at that point just explicitly name full modules in cloud.cfg ?
[20:08] i'd rather not make our own full entrypoint thing :(
[20:09] or at least, try to talk to python-devs, asking what's a solution (is there any, is a sqlite db possible, or a static file that everyone updates, or ...)
[20:22] ok... i'm just saying this out loud for my own logs and such.
[20:22] http://paste.ubuntu.com/23062681/
[20:22] that is a bzr revno to git hash mapping that seems correct for right now.
[20:30] woah
[20:30] ha
[20:54] so a coworker tested the "fixed" cloud-init and hit some form of race condition: the gateway and routes weren't properly configured; rebooting fixed the issue.
=== rangerpb is now known as rangerpbzzzz
[21:17] will do more tests tomorrow
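
(for reference, the "explicitly name full modules in cloud.cfg" idea from [19:48]/[20:08] could look roughly like this; the function name and fallback path here are illustrative only, not the actual cloud-init loader:)

    import importlib

    def find_module(name):
        # 'pkg.mod:func' entries resolve by direct import, with no
        # entry-point catalog scan at all
        if ':' in name:
            mod_name, func_name = name.split(':', 1)
            return getattr(importlib.import_module(mod_name), func_name)
        # otherwise fall back to the conventional cc_<name> module
        return importlib.import_module('cloudinit.config.cc_' + name)
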