[18:02] <r-daneel> smoser, so I was wondering if I had anything more to do for patch https://bugs.launchpad.net/cloud-init/+bug/1275098
[18:02] <smoser> hey, r-daneel 
[18:03] <smoser> so the reason it languished is probably
[18:03] <smoser> that i just didn't have focused time to think about it.
[18:03] <smoser> its very non-trivial.
[18:04] <r-daneel> you mean the logic is non-trivial ?
[18:04] <smoser> in that bouncing network adapters early in boot feels strange.
[18:05] <smoser> and may have unintended side effects.
[18:06] <smoser> i suspect the reason the interfaces came up was because they were "left over" from a previous instance ? ie, in a "capture" ?
[18:07] <r-daneel> well, I experienced only 2 cases: 1. I have no proper networking, so bring_down/bring_up has no side-effect, 2. I have wrong IP info, so  I anyway will break networking by replacing the IP
[18:07] <r-daneel> the case arises when we clone a volume
[18:07] <r-daneel> and then try booting a new VM from the clone
[18:08] <r-daneel> the OS has already an IP (the previous - wrong - one) and has it configured before cloud-init runs
[18:08] <r-daneel> of course, cloud-init does all that is needed in the config files
[18:08] <r-daneel> but it's too late
[18:08] <r-daneel> as interfaces are set up
[18:09] <r-daneel> now doing an ifup, either adds the new IP or fails totally
[18:09] <harlowja_> smoser u alive!
[18:09] <smoser> yeah, i dont know why my bip proxy didn't join here.
[18:10] <smoser> now it will.
[18:12] <smoser> r-daneel, so i'm admittedly not all that knowledgeable about how boot works on centos and cloud-init.
[18:12] <smoser> but in ubuntu, networking comes up in parallel to the local datasource.
[18:12] <r-daneel> smoser, as far as I could understand from the code, we end up calling the _bring_up_interface() method
[18:12] <smoser> if thats the case on centos (or even if its not, because i want to solve this correctly),
[18:13] <smoser> then the 'ifdown' could fail with "interface not up"
[18:13] <smoser> or between cloud-init taking it down and then back up, the OS could bring it up.
[18:13] <r-daneel> smoser, ifdown failing does not seem to be a real issue to me
[18:13] <smoser> and the cloud-init's "ifup" would fail
[18:14] <smoser> r-daneel, it may not seem to be an issue.
[18:14] <smoser> but you can't blindly ignore it.
[18:14] <r-daneel> maybe try/catch that call and pass in case of failure 6
[18:14] <r-daneel> ?
[18:15] <smoser> failure 6 ?
[18:16] <smoser> basically you have a fairly straightforward failure path, with a fairly straightforward workaround.
[18:16] <smoser> simply remove non 'eth0' (or all) interfaces before you "capture" and snapshot.
[18:16] <r-daneel> (sorry '6' is the lower case for '?' on my keyboard :p)
[18:16] <smoser> but fixing it by just going willy nilly with 'ifdown && ifup' seems to be racy
[18:17] <smoser> and i'd rather have a guaranteed failure with a straightforward workaround
[18:17] <r-daneel> smoser, this is a volume clone, the OS is not aware of being 'frozen'
[18:17] <smoser> than sometimes-it-doesnt-work situation
[18:17] <smoser> r-daneel, understood.
[18:17] <smoser> but you could easily "prep" before "capture".
[18:18] <smoser> generally, cloud-init has made you not have to "prep" (clean). and i've wanted to make that always "just work"
[18:18] <smoser> but there are some weird cases, where I don't know what the right behavior is.
[18:19] <r-daneel> well, I understand that proper 'prepping' would be the right thing to do
[18:20] <r-daneel> for instance debian/ubuntu have that udev file you'd need to cleanup 
[18:21] <smoser> debian/ubuntu should not have udev files
[18:21] <smoser> unless you're using an odd MAC range
[18:21] <r-daneel> but obviously, when we boot on a clone, the only issue we get is with IP configuration 
[18:22] <smoser> yeah. i do understand this is an issue. and i'd like to have it fixed properly.
[18:23] <r-daneel> if we try to do that very cleanly, we should check if there is a pre-existing setup or at least check if the network is set up as expected by our freshly installed config
[18:23] <smoser> its really hard in ubuntu, and i suspect it might be in centos too (if not now, then it might be later with move to systemd)
[18:23] <r-daneel> we could then only ifdown if we really already have a wrong setup
[18:24] <smoser> the ordering of boot is just very much not guaranteed
[18:24] <r-daneel> when cloud-init does ifup it already assumes an ordering
[18:25] <r-daneel> it assumes no interface is set, and that no one will fiddle with it
[18:25] <smoser> well, sort of.
[18:25] <r-daneel> even worse, on centos it adds the IP to any existing config
[18:25] <r-daneel> not enforcing the setup
[18:25] <smoser> if there was no interface configuration, for ethX, and cloud-init writes an interface configuration for ethX , at least in current ubuntu (and i think centos) nothing is magically going to bring it up
[18:25] <smoser> so that case is safe to assert "it was not up"
[18:26] <smoser> ubuntu/debian 'ifdown' is terribly annoying.
[18:27] <r-daneel> but cloud-init happily overwrites the existing config files
[18:27] <smoser> if you remove ethX from /etc/network/interfaces and then ifdown an interface that was already up, it says "not configured"
[18:27] <smoser> r-daneel, yeah, agreed. its not perfect now.
[18:27] <r-daneel> if the interface is non-existent, not being able to ifdown it has no effect at worst
[18:28] <r-daneel> and in ubuntu/debian the call is ifup ALL, not event per interface
[18:28] <r-daneel> s/event/even/
[18:29] <smoser> the sanest thing to me seems to be to block any network events from occurring while you're going looking for "local" data sources
[18:30] <r-daneel> again, ifdown on a non-setup interface or one that has wrong info wouldn't bother me, ... if I do changes by hand then reboot, ifdown will fail on shutdown and OS doesn't care
[18:30] <smoser> then, correctly reading "existing config" and merging it with config from config drive.
[18:30] <r-daneel> merging is ok, but you're likely to conflict with what is set
[18:30] <smoser> but ignoring things that might fail sometimes is not helpful to anyone.
[18:31] <smoser> well, if you conflict and you're blocking all networking coming up, then you set it right.
[18:31] <smoser> ie, config-drive would "win".
[18:31] <smoser> (possibly allowing for cloud.cfg configuration that modifies that behavior)
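The merge rule smoser describes (read the existing config, merge it with the config-drive data, and let config-drive win on conflict unless configuration says otherwise) could be sketched roughly as below. This is only an illustration, not cloud-init code: the function name `merge_interface_config`, the `prefer` switch, and the plain dict layout are all assumptions made for the example.

```python
def merge_interface_config(existing, config_drive, prefer="config-drive"):
    """Merge two {interface_name: settings_dict} mappings.

    Hypothetical helper: when both sources set the same key for an
    interface, the source named by `prefer` wins. Keys set by only one
    source are always kept.
    """
    if prefer not in ("config-drive", "existing"):
        raise ValueError("prefer must be 'config-drive' or 'existing'")

    if prefer == "config-drive":
        winner, loser = config_drive, existing
    else:
        winner, loser = existing, config_drive

    merged = {}
    for name in sorted(set(existing) | set(config_drive)):
        settings = dict(loser.get(name, {}))   # start from the losing side
        settings.update(winner.get(name, {}))  # the winning side overrides
        merged[name] = settings
    return merged
```

With a stale address left over from a cloned volume, the config-drive address replaces it while unrelated settings (for example an MTU) survive; passing `prefer="existing"` stands in for the cloud.cfg override mentioned above.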
[18:32] <smoser> do you know if you could block all networking from coming up on centos ?
[18:32] <r-daneel> but the cloud-init service in the OS comes after the OS's own script for network setup, as it seems
[18:32] <smoser> i suspect i can do it on ubuntu, pretty sure you can do it fairly easily on sysvinit.
[18:32] <r-daneel> so how will you prevent the OS from setting the network info ?
[18:33] <smoser> there are 2 cloud-init services
[18:33] <smoser> local and 'init'.
[18:33] <smoser> the local can read only local datasources
[18:33] <r-daneel> ok, let me check ordering ...
[18:33] <smoser> and cannot presume networking, but can set up networking.
[18:33] <smoser> then... there is the other thing that adds a twist here.
[18:34] <smoser> in reality, i suspect that config drive is not long for this world.
[18:34] <r-daneel> hmm ? configdrive will be removed 6
[18:34] <r-daneel> ?
[18:34] <smoser> its static nature is just too limiting. i suspect in time to come, config-drive will disappear as the metadata service is able to provide dynamic data.
[18:35] <smoser> are you familiar with how hot-plug of network interfaces works on amazon ?
[18:35] <smoser> its really nice.
[18:35] <smoser> interface added, causes udev rules to fire that create and name the interface.
[18:35] <smoser> those same rules say "oh, look, i'm on EC2, and that means the metadata service might tell me what IP address I should get".
[18:36] <smoser> (this doesn't work on ubuntu but on Amazon Linux AMI it does)
[18:36] <r-daneel> ok, I see
[18:36] <smoser> https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1153626
[18:37] <smoser> that is a much more sane way of doing things
[18:37] <smoser> and going that route would still mean we potentially have the issue that you were seeing (already existing config)
[18:38] <smoser> r-daneel, i'm not opposed to getting this working better.
[18:38] <smoser> and don't mean to sound stand-offish
[18:38] <smoser> its just not as easy as it might first appear.
[18:39] <r-daneel> I see your point, we use configdrive because of inherent reduction of complexity and security
[18:40] <smoser> i don't really know that anything is more secure.
[18:41] <smoser> if your host network is compromised, i think you have significant issues.
[18:41] <smoser> and as for complexity, i think that openstack networking generally being difficult to get right is what led to people wanting config drive (and its initial popularity), but at least in my recent experience, that is much better sorted now.
[18:42] <smoser> ie, you dont have "no route to 169.254.169.254" issues so much.
[18:45] <r-daneel> exposing a common service to all tenants seems much more risky than giving access to a file on the host. Someone escaping his VM is a much bigger problem but less likely to happen
[18:46] <alexpilotti> smoser: hi there
[18:47] <smoser> r-daneel, but you've already exposed dozens of common services to your guests ;)
[18:47] <r-daneel> smoser, such as ?
[18:47] <smoser> neutron-api, nova-api, dhcp, dns
[18:48] <r-daneel> smoser, these are control planes not the VM infrastructure
[18:48] <smoser> dhcp and dns are on vm infrastructure
[18:48] <r-daneel> we have no dhcp and dns is external
[18:49] <smoser> i dont know. if you can't securely route traffic from one vm to an endpoint specific to that vm, then you cant actually do networking securely between 2 vms of a single tenant.
[18:50] <smoser> so yes, i agree, it is more complex, but its not complexity that you can actually do without i think in the end.
[18:50] <smoser> alexpilotti, whats up?
[18:51] <alexpilotti> smoser: waiting for an answer on which is the minimum / recommended cloud-init version with MaaS metadata support :-)
[18:52] <smoser> oh. sorry.
[18:53] <smoser> our 12.04 images use 0.7.0-0ubuntu2
[18:53] <smoser> so thats surely known-working
[18:53] <alexpilotti> smoser: great, so with 0.7.4 we are good 
[18:55] <r-daneel> smoser, when trying to keep things as secure and non-complex as possible, we found it useful to use configdrive. I agree that at some point in time we may need more flexibility and may walk the metadata-server way. For now I find it useful to have the choice between statically set information in configdrive and dynamic setup through metadata-server.
[18:56] <smoser> r-daneel, and you're certainly not alone in that decision :)
[18:57] <alexpilotti> smoser: tx, doing some tests
[19:01] <r-daneel> smoser, so back to our initial topic ;) as far as I remember, the 'local' and 'init' phases both ran after the OS had finished setting the IPs. Am I wrong ? 
[19:04] <smoser> r-daneel, on ubuntu, local happens [possibly] in parallel with any 'auto' interfaces.
[19:04] <smoser> '[possibly]' is very complex. 
[19:05] <smoser> but for almost all cases i can think of they're in parallel, and nothing forces them to be serialized.
[19:05] <smoser> i do not know about centos.
[19:05] <smoser> sysvinit makes it easy to do these sorts of things :)
[19:05] <r-daneel> centos service ordering is S10network, S50cloud-init-local, S51cloud-init, S52cloud-config, S53cloud-final
[19:11] <r-daneel> for ubuntu, I did not fully check. experience showed (in the logs) that we were always doing ifup on an already  up interface and ifup refused to override
[19:11] <smoser> r-daneel, yeah, so on centos, it should be possible to just put cloud-init-local before S10
[19:12] <smoser> the one thing that that breaks, which i dont think is a real issue is network mounted filesystems (ie, /usr/ on nfs)
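The ordering point can be checked with a small sketch. It assumes the usual SysV convention that start links named `SNNname` run in sorted order of their names; the helper names `start_order` and `runs_before` are made up for illustration, and the `S09` priority in the usage note is a hypothetical value, not one taken from the discussion.

```python
import re

# Matches SysV start links such as "S10network": S, digits, service name.
_START_LINK = re.compile(r"^S(\d+)(.+)$")


def start_order(links):
    """Return service names in the order their start links would run.

    Assumption: links are executed in lexical order of the link name,
    which for two-digit priorities matches numeric order.
    """
    ordered = []
    for link in sorted(links):
        match = _START_LINK.match(link)
        if match:  # ignore anything that is not a start link
            ordered.append(match.group(2))
    return ordered


def runs_before(links, first, second):
    """True if service `first` starts before service `second`."""
    order = start_order(links)
    return order.index(first) < order.index(second)
```

With the links r-daneel lists (`S10network`, `S50cloud-init-local`, `S51cloud-init`, `S52cloud-config`, `S53cloud-final`), `runs_before(links, "cloud-init-local", "network")` is false. Renaming the local link to something like `S09cloud-init-local` would make it true, which is the change smoser suggests.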
[19:13] <r-daneel> if you have the wrong IP, our setup prevents you from communicating (anti spoofing); if it could, we'd mess things up because of IP collision
[19:16] <r-daneel> smoser, then we still have no better solution for ubuntu
[19:17] <r-daneel> smoser, would it be more acceptable to do the ifdown/ifup cycle only on failure of the initial ifup ?
[19:18] <smoser> maybe, yeah.
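A minimal sketch of that compromise, assuming `ifup` and `ifdown` signal failure through a non-zero exit status (which may not hold on every distro; r-daneel notes elsewhere in this log that ifup on Ubuntu refused to override an interface that was already up). The function `bring_up_interface` and the injectable `run` callable are hypothetical names for this example, not the existing `_bring_up_interface()` method.

```python
import logging
import subprocess

LOG = logging.getLogger(__name__)


def _run(cmd):
    """Run a command; True if it exited with status 0.

    Raises FileNotFoundError if the command is not installed.
    """
    return subprocess.run(cmd, check=False).returncode == 0


def bring_up_interface(device, run=_run):
    """Try ifup first; cycle ifdown/ifup only if that first attempt fails.

    `run` is injectable so the logic can be exercised without touching
    real interfaces. Returns True if the interface ended up brought up.
    """
    if run(["ifup", device]):
        return True  # common case: no bouncing at all

    LOG.warning("ifup %s failed, trying one ifdown/ifup cycle", device)
    if not run(["ifdown", device]):
        # Not ignored silently: the failure is logged before retrying.
        LOG.warning("ifdown %s failed, retrying ifup anyway", device)
    return run(["ifup", device])
```

The bounce only happens after a failure has been observed, so a healthy boot never takes an interface down, and an `ifdown` failure is logged rather than swallowed. It does not remove the race smoser describes, where the OS brings the interface up between the two calls.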
[19:19] <smoser> i'm guessing one way or another we can force cloud-init local to run before and block networking coming up
[19:19] <smoser> but it will probably be tricky
[19:20] <smoser> (since as it is, networking comes up on udev events)
[19:20] <smoser> (network-interface-added)
[19:21] <smoser> sorry.. net-device-added
[19:22] <r-daneel> so for those relying on udev, cloud-init should already be hooked-in to 'know' what to do
[19:24] <smoser> yeah. it's complex though... cloud-init explicitly actually emits the net-device-added when it's inside a container
[19:24] <smoser> (as lxc instances don't get those events)
[19:32] <r-daneel> smoser, would it help to implement that ifdown/ifup in the platform specific files ? maybe only on ifup failure ? I understand that we're trying to march toward a future-proof solution but this will require a lot more code-diving on my part :)
[19:33] <smoser> i dont mind if ifup/down is in 'distro'.
[19:33] <smoser> err.. in per-distro code.
[19:39] <r-daneel> ok, I'll try to come up with distro-specific code. Will get back to you for a review once done :) 
[19:40] <r-daneel> smoser, thank you for your help