[11:50] <smoser> mgagne, hm..
[11:50] <smoser> your 'is_bonding_slave' idea is fine, but i really don't understand why it should be needed.
[11:51] <smoser> you mentioned upstart, are you using upstart somewhere ?
[11:56] <smoser> you don't want to mess with dsmode, really.
[11:57] <smoser> in current trunk, dsmode=local would allow you to make init_modules run earlier (without access to network)
[15:44] <smoser> mgagne, around ?
[15:56] <mgagne> smoser: I am now
[15:58] <smoser> hey.
[15:58] <smoser> so, are you using upstart ?
[15:58] <mgagne> smoser: I'm not using upstart, I was reading the source code and commenting about it. cloud-init is running twice as per the bug description. Running it a second time fails because cloud-init doesn't expect bonding to be configured at that point; in fact, all code and tests were done without bonding support, so a lot of assumptions were made which aren't true anymore.
[15:59] <mgagne> I'm booting on ubuntu 16.04, it's systemd afaik
[15:59] <smoser> cloud-init does run twice for sure, but only the first time should set the networking up.
[15:59] <smoser> oh. but we rename on every boot, so maybe we're doing that twice.
[15:59] <smoser> hm..
[15:59] <mgagne> ok, well that's not the case on my side; bugs 3.1) and 3.2) were caused by this double network config run
[16:00] <smoser> i'll have a look in a bit. your is_bonding_slave change seems to make sense.
[16:00] <mgagne> no no, I boot ONCE and it fails, I'm not even testing reboot at this point
[16:00] <smoser> right.
[16:01] <mgagne> because of this mac/link/device mapping, the 2nd run fails because of how bonding behaves: it changes the mac of the bonding slaves, hence the added logic for is_bonding_slave.
[16:02] <mgagne> I didn't do extensive tests, just boot, ping, ssh (with sshkey) and check hostname
[16:02] <smoser> right
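
A rough sketch of the check mgagne describes (not his actual patch): under Linux, an enslaved NIC gains a 'master' symlink in sysfs, and the bond rewrites the slave's MAC, which is what breaks MAC-based renaming on a later run.

```python
import os

def is_bonding_slave(devname):
    # An enslaved interface exposes a 'master' symlink in sysfs pointing
    # at the bond; its MAC has been rewritten to the bond's, so any
    # MAC-to-name mapping must skip it on a second pass.
    return os.path.exists('/sys/class/net/%s/master' % devname)
```
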
[18:02] <harlowja> smoser u ever get a chance to look over https://code.launchpad.net/~harlowja/cloud-init/+git/cloud-init/+merge/302609
[18:02] <harlowja> its the future!
[18:02] <harlowja> ha
[18:02] <smoser> harlowja, i've not looked at it. looking now.
[18:03] <harlowja> np
[18:04] <smoser> so ..
[18:04] <smoser> Instead of looking in a very specific location for
[18:04] <smoser> cloudinit config modules (which makes it hard for those
[18:04] <smoser> adding their own modules to do so without patching that
[18:04] <smoser> location), use entrypoints and register all
[18:04] <smoser> current cloudinit config modules by default with that
[18:04] <smoser> new entrypoint (and use that same entrypoint namespace
[18:04] <smoser> for later finding needed modules).
[18:04] <smoser> --
[18:05] <smoser> how does registering the entry points help "those adding their own modules"
[18:06] <smoser> rharper, what shall i do for mgagne's auto-bringup of bond.
[18:06] <smoser> did you have work on that that i didn't see ?
[18:06] <rharper> smoser: the fix is what i had
[18:07] <rharper> but in general, we need to think about v4 vs v6
[18:07] <smoser> what fix ?
[18:07] <rharper> in eni.py
[18:07] <smoser> i didn't see, sorry.
[18:07] <rharper> he posted patches; basically adds: if 'bond-master' or 'bond-slaves' in iface, then emit auto
[18:07] <harlowja> smoser  so they still need to add an entry to cloud.cfg (either at packaging time, or at userdata/runtime)
[18:07] <rharper> smoser: <mgagne> rharper: all patches: http://paste.ubuntu.com/23059836/
[18:07] <harlowja> i didn't go down the path of discovering and creating the cloud[init,config,final] sections of that config
[18:08] <harlowja> because though i could, it's, umm, non-trivial :-P
[18:08] <harlowja> and likely requires more metadata on modules to define their ordering (not via cloud.cfg at that point)
[18:08] <rharper> we probably should instead check if iface['type'] in ['bond', 'vlan'] and possibly 'bridge' ;
[18:09] <smoser> rharper, so you're just assuming all bonds (or vlans or bridges) then are 'auto'
[18:09] <harlowja> so that kind of stuff seems like a larger change, vs just attempting to find modules that are already defined in cloud.cfg via entrypoints (leaving the change to be just a different way to find modules)
[18:09] <rharper> smoser: we default to auto if an interface has a subnet
[18:09] <rharper> in this case, it's a bond with no subnets
[18:10] <rharper> as it's being assembled but not with a subnet;
[18:10] <smoser> ie, those default to 'auto' while others (even with 'subnets') default to non-auto
[18:10] <smoser> we do default to auto if a subnet ?
[18:10] <rharper> no we always default to auto unless 'control' is set in subnet
[18:10] <rharper> yes
[18:11] <smoser> hm.. you're saying that is true after your change, or before?
[18:11] <rharper> there are a few known cases where config explicitly wants subnet + control=manual (aka iscsiroot)
[18:11] <rharper> if iface has subnet, control=auto for the iface/index pair
[18:11] <rharper> if you do not include any subnet, then no auto (except for bond-slaves)
[18:12] <rharper> that really should be any interface with a nested config (master/slave); I'm pretty sure
[18:12] <smoser> if iface has subnet and no control=
[18:12] <rharper> then control is set to auto
[18:12] <rharper> for iscsiroot, we specify control: manual
[18:12] <rharper> override the default;
[18:15] <smoser> right, so you're not actually checking for bond-master.
[18:15] <smoser> you're just turning auto on
[18:16] <rharper> no, we check for bond-master in the case if iface with no subnets
[18:16] <rharper> and then auto it, *if* it's a slave (slaves point to their master with bond-master key)
[18:17] <rharper> but, if the bond master itself (bond0) doesn't configure a subnet, it doesn't get an auto
[18:18] <rharper> I suspect the code in ifupdown/if-pre-up.d/ifenslave could be fixed to raise the bond master independent of whether it's marked auto or not; but it currently does *not* bring up the master unless listed in allow-auto (or marked auto)
[18:18] <rharper> if bond0 doesn't come up then the rest of the config won't succeed (we timeout waiting on bond0 to be created via slave ifup hook)
[18:19] <rharper> a bond-specific solution/workaround is to also include the bond master (indicated by key bond-slaves in iface) to be marked auto;
[18:20] <rharper> that might be enough, but I'd like to test/check bridges without subnets and vlans without subnets to see if we generally need to mark non-subnet interfaces with auto by default;  that is, I don't yet know of a config where we want a manual bond/vlan/bridge
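
The rule rharper lays out above, sketched as a single helper (not the actual eni.py code; the iface/subnet dict shapes are assumed from the conversation):

```python
def iface_is_auto(iface):
    # Default: an interface with a subnet comes up 'auto' unless the
    # subnet explicitly sets control (e.g. control: manual for iscsiroot,
    # where ifupdown must leave the device alone).
    for subnet in iface.get('subnets', []):
        if subnet.get('control', 'auto') == 'auto':
            return True
    # The bond workaround: slaves carry 'bond-master', the master carries
    # 'bond-slaves'; neither may have subnets of its own, yet both must be
    # marked auto or ifenslave never assembles the bond (see the timeout
    # rharper mentions above).
    return 'bond-master' in iface or 'bond-slaves' in iface
```
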
[18:56] <smoser> ok. for now i'm good with the fix as you all had.
[18:58] <smoser> it is kind of weird and possibly wrong that we are renaming devices in the 'init' stage (in addition to init-local).
[18:59] <smoser> harlowja, what i don't understand is how you are making the problem of adding a config thing any easier.
[18:59] <smoser> the cc_foo.py can now be placed in some additional directory ?
[18:59] <harlowja> smoser i can put the config modules in my own library, expose a named entrypoint, then just update cloud.cfg to reference that module
[19:00] <harlowja> so cc_blahblah no longer needs to be patched into cloud-init
[19:00] <smoser> how do you expose a named entry point ?
[19:00] <harlowja> same way as the modification to cloud-init's setup.py there
[19:01] <harlowja> so a library would just need to add an entry_points entry (like in that setup.py) in their own module
[19:03] <harlowja> so in said library's setup.py there would be an entry like
[19:03] <harlowja> entry_points={
[19:03] <harlowja>     'cloud.config': [
[19:03] <harlowja>         'my_thing = my_thing.my_cloud_handler',
[19:03] <harlowja>     ],
[19:03] <harlowja> },
[19:03] <harlowja> so when cloudinit looks for a way to call 'my_thing' (assuming it's in a cloud.cfg listing somewhere) then it can go out and try to find it (and load this library to get at it)
[19:04] <harlowja> (or if nobody registered that module, then die as usual)
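
The consumer side harlowja is describing would look roughly like this (a sketch, not the code in the merge proposal; 'cloud.config' is the namespace from the setup.py fragment above):

```python
import pkg_resources

def find_config_module(name, namespace='cloud.config'):
    # Scan the shared entry point namespace for a config module that
    # some installed package registered under this name.
    for ep in pkg_resources.iter_entry_points(namespace, name=name):
        return ep.load()
    # Nobody registered it: die as usual.
    raise ImportError("no entry point %r in namespace %r" % (name, namespace))
```
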
[19:16] <smoser> harlowja, so..
[19:16] <smoser> http://paste.ubuntu.com/23062532/
[19:16] <smoser> that is what i don't like about entry points
[19:16] <smoser> takes ~0.01s to bring up python, 0.03s to bring up python3 on a reasonably current SSD
[19:17] <smoser> (with '0' as first arg)
[19:17] <smoser> importing the pkg_resources takes 0.3 seconds on python, and 0.25-ish on python3
[19:17] <smoser> it does look like it caches stuff as 10 runs take about the same as 1
[19:18] <smoser> i'm guessing that python3 is faster in my test only because i have fewer entry points or packages installed on the system in python3 compared to python2
[19:18] <smoser> so it's doing less work.
[19:20] <smoser> this is also embarrassing:
[19:20] <smoser>  http://paste.ubuntu.com/23062545/
[19:20] <smoser> and it needs fixing
[19:20] <smoser> but i'm somewhat hesitant to add something like that.
[19:31] <harlowja> so that's just because u imported 'pkg_resources' ?
[19:32] <smoser> the pkg_resources import takes quite some time (~.1 seconds)
[19:32] <smoser> the enumerating of some non-existent namespace takes .2 seconds
[19:32] <smoser> obviously very scientific data there.
[19:33] <harlowja> :-P
[19:33] <smoser> i should have done a -1
[19:33] <smoser> lets re-do that paste
[19:36] <smoser>  http://paste.ubuntu.com/23062572/
[19:36] <smoser> there. -1 is just cost of bringing up python
[19:37] <smoser> fiddle
[19:37] <smoser> http://paste.ubuntu.com/23062577/
[19:37] <smoser> there ^
[19:37] <smoser> -1 is cost of python
[19:38] <smoser>  0 is cost of import pkg_resources
[19:38] <smoser>  1 is cost of one call to 'iter_entry_points'
[19:38] <smoser> 10 is cost of 10 calls
[19:38] <harlowja> k
[19:38] <smoser> with revised my.py at http://paste.ubuntu.com/23062581/
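
That paste has since expired; going by the description above (-1 = interpreter startup only, 0 = plus the import, N >= 1 = plus N calls), the revised my.py was presumably something like:

```python
#!/usr/bin/env python
# Timing probe, run as: time python my.py N
import sys

count = int(sys.argv[1])
if count < 0:
    # -1: measure nothing beyond interpreter startup
    sys.exit(0)

import pkg_resources  # 0: include the cost of this import

for _ in range(count):
    # N calls: even enumerating a non-existent namespace forces a scan
    # of every installed distribution on sys.path (hence all the stats).
    list(pkg_resources.iter_entry_points('cloud.config'))
```
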
[19:38] <harlowja> seems like they need to better optimize that entrypoint 'catalog' lol
[19:38] <smoser> yeah, it is stat crazy
[19:39] <smoser> those openstack cli programs do that.
[19:39] <harlowja> right
[19:39] <smoser> they do cache well
[19:39] <smoser> since 10 runs takes basically nothing more than 1
[19:39] <harlowja> but assuming an entrypoint catalog existed in core python, then i'd assume that stuff wouldn't take forever
[19:39] <harlowja> aka a tiny sqlite db
[19:39] <harlowja> lol
[19:40] <harlowja> wonder why such a thing doesn't exist
[19:40] <smoser> yeah, but i think the entry points are stuffed in that egg-info, right ?
[19:40] <smoser> that's how those are loaded ?
[19:40] <smoser> so python goes looking in any possible directory in sys.path for an egg-info file or something and then goes reading it and such.
[19:40] <harlowja> that's one location for it, but u'd think that pip could update a sqlite db or something
[19:41] <harlowja> i wonder if the python community is working on anything like that
[19:41] <harlowja> seems pretty obvious to do that
[19:42] <harlowja> then X people wouldn't be making their own entrypoint-thing due to this
[19:43] <smoser> not too long ago i had a spinning disk
[19:43] <smoser> (more embarrassment)
[19:43] <harlowja> what's that crap
[19:43] <harlowja> ha
[19:44] <smoser> and running 'nova' on it took like 3 seconds to load.
[19:44] <smoser> nova as in the cli tool, not the service :)
[19:44] <harlowja> :-P
[19:48] <harlowja> so ya, the other option is that we make our own loader slightly more advanced
[19:48] <harlowja> so that say in cloud.cfg u could have fully specified modules + functions
[19:48] <harlowja> then i could have a entry like
[19:48] <harlowja> godaddy_ci.handlers:basic_handler
[19:49] <harlowja> though that starts to just make our own entrypoint-like thing :-/
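
Resolving such a 'module:function' spec takes only a few lines of importlib, which is why it feels like reimplementing entry points (a sketch; godaddy_ci is harlowja's example package, not a real dependency):

```python
import importlib

def load_spec(spec):
    # spec looks like 'godaddy_ci.handlers:basic_handler' -- a module
    # path, optionally followed by an attribute after the colon.
    modname, _, attr = spec.partition(':')
    module = importlib.import_module(modname)
    return getattr(module, attr) if attr else module
```
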
[19:54] <smoser> ok. one more thing.. http://paste.ubuntu.com/23062607/
[19:54] <smoser> for my reference mostly.
[19:54] <harlowja> lol
[19:54] <smoser> that just runs it with strace too, and counts stats or opens
[19:54] <harlowja> nice
[19:54] <harlowja> stats or opens: 2561
[19:54] <harlowja> lol
[19:54] <harlowja> ya, idk why they aren't backing that crap via sqlite
[19:54] <harlowja> afaik entrypoints are all 'static'
[19:55] <harlowja> in that they are all defined by packaging (in setup.py or other)
[19:55] <harlowja> seems dumb to rescan the filesystem to find them
[19:56] <harlowja> it'd seem like a win for most of python if it wasn't so scan happy
[19:56] <harlowja> though of course any change to do that would probably hit the people that will say it's all in cache and such and blah blah
[19:59] <harlowja> and it gets into the question of: make our own thing, or just work with the python stuff
[20:06] <smoser> so i had a start of my own thing
[20:06] <smoser> that took a list of directories
[20:06] <smoser> and would look in those.
[20:06] <smoser> cloud-init needs lots of performance improvements for sure
[20:08] <harlowja> why not at that point just explicitly name full modules in cloud.cfg ?
[20:08] <harlowja> i'd rather not make our own full entrypoint thing :(
[20:09] <harlowja> or just at least, try to talk to python-devs, asking what's a solution (is there any, is sqlite db possible, or a static file that everyone updates or ...)
[20:22] <smoser> ok... i'm just saying this out loud for my own logs and such.
[20:22] <smoser> http://paste.ubuntu.com/23062681/
[20:22] <smoser> that is a bzr revno to git hash mapping that seems correct for right now.
[20:30] <harlowja> woah
[20:30] <harlowja> ha
[20:54] <mgagne> so a coworker tested the "fixed" cloud-init and hit some form of race condition: the gateway and routes weren't properly configured. rebooting fixed the issue.
[21:17] <mgagne> will do more tests tomorrow