[11:45] <kholkina> hi all! I have a question regarding user-data update. I have updated the user-data via openstack nova and now I see the new value when I check it via http://169.254.169.254/2009-04-04/user-data. Also I added [scripts_users, always] in the config file. But it still doesn't work on reboot. When I delete obj.pkl manually and reboot it works fine. But I think that's not the best way
[13:47] <niluje> smoser: <3
[13:48] <niluje> just noticed you merged the scaleway datasource for xenial in December
[13:48] <niluje> thanks for the xmas gift :)
[14:40] <smoser> niluje: \o/
[15:10] <niluje> "caribou" joined us on monday :p
[19:10] <dojordan> @smoser, can you take a look at my latest iteration? I proposed and implemented a solution that works post xenial: https://code.launchpad.net/~dojordan/cloud-init/+git/cloud-init/+merge/334341
[19:18] <ajorg> I found what I think is an interesting (bug?) behavior...
[19:19] <ajorg> If you are using NoCloud (via an ISO attached to a VM) and you remove the ISO, the next time you boot it will decide you're using None instead of NoCloud and it will re-initialize.
[19:20] <ajorg> I'm not sure how to improve this behavior, but I wonder if using the DMI UUID or serial for both None and NoCloud would fix it.
[19:21] <ajorg> I suspect similar behavior would be shown if you denied access to the instance metadata on an EC2 instance early enough that cloud-init couldn't read it.
[19:22] <smoser> ajorg: that is correct.
[19:22] <smoser> yeah, we could use the dmi uuid (as some other clouds do).  and keep that.
[19:23] <smoser> there is also 'manual_cache_clean' that you can set
[19:23] <smoser> and then you can remove the disk.
[19:23] <ajorg> If we use the same ID for both NoCloud and None will it detect that cloud-init has already run even though it has changed clouds?
[19:25] <ajorg> A note that says "please don't remove the ISO or this will happen" in the docs would also be a good start.
[20:29] <smoser> blackboxsw:
[20:29] <smoser> https://git.launchpad.net/~dojordan/cloud-init/commit/?id=7f23e5c4808a9c647cd4d5277625a723a58b132b
[20:29] <smoser> "on ubuntu release > xenial we rely on systemd-networkd"
[20:29] <smoser> do you know if that is correct ?
[20:31] <blackboxsw> reading
[20:34] <blackboxsw> smoser: for artful and bionic I know that's true... not sure about zesty. I don't recall, but I'll spin one up now
[20:34] <blackboxsw> I left nearly the same log message on dropping ifupdown Azure stuff in my commit
[20:34] <blackboxsw> "    In artful and bionic ifupdown package is no longer installed in default
[20:34] <blackboxsw>     cloud images. As such, Azure can't use those tools to bounce the network
[20:34] <blackboxsw>     informing DDNS about hostname changes."
[20:35] <smoser> blackboxsw: i just commented there
[20:35] <smoser> mwhudson says that you and i are blowing smoke
[21:09] <blackboxsw> " the data there is new and will get documented more as we go.
[21:09] <blackboxsw> That function will make its way back to 16.04 in a cloud-init SRU in the next month or so." .... smoser I'll make a card for that.
[21:09] <blackboxsw> I should have shoved something into RTD when we added instance-data.json, but we can get it in soon
[21:10] <blackboxsw> I kinda figured we needed to do it with the jinja-template handling too
[21:11] <dojordan> @blackboxsw unrelated, but do you know where the SendHostname=true flag lives?
[21:12] <blackboxsw> smoser: yeah you pointed me at his comment there that systemd isn't handling publishing dhcp info on hostname change..... I thought I ran on azure artful and bionic and the update did happen, but I don't know what triggered it. hrm.
[21:12] <dojordan> I actually can't find that flag anywhere actually being set
[21:14] <dojordan> Im looking at an artful azure VM now, and I checked /lib/systemd/network, /run/systemd/network, /etc/systemd/network
[21:15] <blackboxsw> dojordan: cloud-init doesn't set it, I only read the docs on it as default behavior if unset. I can see in systemd during dhcp client config refresh they check for said flag... but I can't confirm whether something sets that up.
[21:15]  * blackboxsw needs to look at this again it seems. my recall is dusty on what I originally tested/saw on artful and bionic. I'll spin up azure vms for testing now
[21:16] <dojordan> Gotcha, ill enable the flag and look for a dhcp request
[21:24] <dojordan> what's the policy in cloud-init about restarting another service? Manual testing seems to confirm what michael said in bug 1739516 that changing the hostname doesn't appear to actually send a DHCP request
[21:25] <smoser> dojordan: well if we have to bounce it, we can
[21:25] <blackboxsw> dojordan: so systemd-networkd docs say the SendHostname=true needs to be in the [DHCP] section of the configs. I see that DHCP section in /run/systemd/network/10-netplan-eth0.network.... but I'm guessing we could provide a /etc/systemd/network/11-something.network  file containing [DHCP]\nSendHostname=true  if needed? (shot in the dark as I'm working off docs at the moment)
[21:25] <smoser> as we have before
[21:25] <smoser> it'd be nicer if you could bounce *that* interface
[21:25] <smoser> rather than restarting the service
[21:26] <smoser> i'm somewhat concerned about restarting the service during boot, but we will see.
[21:26] <blackboxsw> that's just me RTFMing though. poking at an azure instance in a min here to confirm
[21:27] <blackboxsw> smoser: rharper do we know if netplan configs allow you to post arbitrary dhcp config options, or just dhcp4: on dhcp6: on etc
[21:27] <blackboxsw> s/on/true
[21:28]  * blackboxsw doesn't see anything that looks like it at http://people.canonical.com/~mtrudel/netplan/
[21:28] <blackboxsw> not sure if that's the 'canonical' source for netplan docs though
[21:28] <rharper> blackboxsw: netplan does not expose arbitrary dhcp options; I think accept-ra is the exception at this time
[21:29] <smoser> what was the flag we were interested in ?
[21:29] <rharper> SendHostname; but I do wonder if we can figure out what's going on before we start exposing these sorts of things in the top-level yaml;
[21:30] <rharper> is this some sort of dynamic dns update based on hostname from a dhclient request ? or is it doing something else with this specific mechanism
[21:30] <dojordan> I think the flag is simply whether or not to send the hostname when sending a dhcp request, but not actually sending dhcp every time the hostname is changed
[21:31] <cyphermox> smoser, rharper: SendHostname is default in systemd.
[21:31] <rharper> it's just sent when the client attempts to obtain (or renew) a lease IIUC
[21:32] <cyphermox> (so is UseHostname=, which is meant to use the hostname that DHCP hands out)
[21:32] <dojordan> gotcha, so we won't be able to rely on that to force a new dhcp
[21:32] <cyphermox> what is this about?
[21:33] <dojordan> basically we (azure) need a way to force a dhcp request to acquire a new IP in certain circumstances
[21:33] <dojordan> we used to use ifdown/ifup but post xenial we no longer have those binaries installed by default
[21:34] <cyphermox> dojordan: wouldn't the "right way" be to have the DHCP server send a DHCPNAK?
[21:34] <cyphermox> assuming that works "out of line"
[21:37] <dojordan> or the other right way would be to use leases less than 2^32 - 1 seconds... but unfortunately in our stack the dhcp server is not aware of when to send the DHCPNAK
[21:37] <cyphermox> right
[21:38] <cyphermox> short leases obviate the issue of "we need to force the client to re-configure now", at the cost of more packets happening on the network
[21:38] <cyphermox> whereas DHCPNAK or doing stuff on the client means there needs to be code that decides it's time to reconfigure outside of the lease expiry time
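For reference, a minimal sketch of the lease-timer arithmetic behind the short-lease argument. It assumes the RFC 2131 defaults (T1 at 0.5 and T2 at 0.875 of the lease duration) and the reserved lease value 0xffffffff meaning "infinity". The name `renewal_timers` is hypothetical and is not part of cloud-init or systemd-networkd.

```python
from typing import Optional, Tuple

# RFC 2131: a lease time of 0xffffffff (2^32 - 1) represents an infinite lease.
INFINITE_LEASE = 0xFFFFFFFF


def renewal_timers(lease_seconds: int) -> Optional[Tuple[float, float]]:
    """Return (T1, T2) in seconds, or None when the lease never expires.

    T1 is when the client starts renewing with its original server,
    T2 is when it falls back to rebinding with any server.
    """
    if lease_seconds == INFINITE_LEASE:
        # With an infinite lease the client has no reason to contact
        # the server again, so it never learns about new configuration.
        return None
    return (0.5 * lease_seconds, 0.875 * lease_seconds)
```

With a one-hour lease the client would start renewing after 30 minutes; with an infinite lease nothing prompts it to ask again, which is why something on the client side has to decide when to re-run DHCP.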
[21:40] <cyphermox> is the client or infrastructure more likely to know what set of circumstances the client should do DHCP again?
[21:40] <dojordan> client
[21:41] <cyphermox> (my guess is the server should always be authoritative, but I don't know of your use case)
[21:41] <cyphermox> ah, interesting
[21:41] <dojordan> i guess we could just run dhclient -r?
[21:41] <dojordan> i can elaborate on the scenario:
[21:41] <cyphermox> dojordan: dhclient isn't what's doing DHCP when using systemd-networkd.
[21:41] <dojordan> oh right... and systemd-network doesn't expose the same api?
[21:42] <cyphermox> I don't think it exposes that, but I'm not sure :)
[21:43] <dojordan> thinking outside the box, can we delete the dhcp lease files? does that retrigger it?
[21:43] <smoser> dojordan: this might become easier (and more easily use dhclient sandboxed) if we get to running in local time.
[21:44] <dojordan> sorry didn't quite catch that
[21:44] <nazarewk> hlelo
[21:44] <smoser> oh. hm.
[21:44] <nazarewk> *hello
[21:44] <smoser> wait.
[21:44] <nazarewk> how does this project relate to CoreOS cloud-init and RancherOS cloud-init?
[21:45] <smoser> now i'm confused.
[21:45] <cyphermox> dojordan: I don't think it would trigger it, but you might be able to just restart systemd-networkd without adverse consequences
[21:45] <smoser> nazarewk: no. coreos is not here.
[21:45]  * smoser googles rancheros
[21:45] <nazarewk> smoser: i'm researching cloud operating systems
[21:45] <dojordan> that was my thought too, just wanted to make sure it would be safe
[21:45] <smoser> dojordan: we run azure datasource at local
[21:46] <smoser> meaning proper system configured networking isnt up yet.
[21:46] <nazarewk> after going through coreos (deprecated cloud-init), RancherOS (forked from CoreOS cloud-init) i stumbled upon configuring Project Atomic with cloud-init
[21:46] <smoser> now i'm confused.
[21:46] <nazarewk> and saw that is some wide standard
[21:46] <cyphermox> smoser: again?
[21:46] <cyphermox> ;)
[21:46] <nazarewk> any idea how those 3 (or 2 i guess) relate to each other?
[21:47] <smoser> well, project atomic to my knowledge uses this cloud-init
[21:47] <smoser> (as contributors from redhat have added that support)
[21:48] <nazarewk> i already know that, i am more interested in how cloud-init came to be
[21:48] <nazarewk> i can't find any info on where it came from
[21:49] <nazarewk> ok looks like i found this
[21:49] <nazarewk> https://github.com/coreos/coreos-cloudinit#configuration-with-cloud-config
[21:51] <cyphermox> dojordan: so, I guess what you would need to do depends on what the circumstances are in which a client needs to restart DHCP, if it's because new information was received from the datasource, maybe the best is for cloud-init to restart systemd-networkd when it finds out, if systemd-networkd is even running
[21:52] <cyphermox> OTOH, if it's potentially hours after boot, then some agent would need to do it, if it's safe.
[21:52] <dojordan> so the reason this matters is if we hit our instance metadata service with a stale IP, it doesn't recognize it and the client will throw an exception
[21:52] <dojordan> so we catch it and bounce the nic (on xenial)
[21:53] <cyphermox> so this isn't actually doing any real DHCP?
[21:53] <cyphermox> otherwise you'd have the lease times to tell you when to renew, so you theoretically never have a stale IP
[21:53] <dojordan> our leases are infinite
[21:54] <dojordan> but it is doing real DHCP in the sense we are getting a new ip,dns,subnet, etc
[21:54] <dojordan> the reason it could be stale is our platform is moving a vm from one vnet to another
[21:54] <cyphermox> yes, but I mean even if the IP belongs to you forever, a short lease will have you ping the DHCP server periodically and possibly get new information
[21:54] <cyphermox> ah
[21:57] <dojordan> @smoser, not entirely sure when systemd-networkd is running, but I have confirmed in testing that in my current PR we do successfully hit the instance metadata server in our data source. So I guess some parts of the networking stack are setup?
[22:01] <smoser> dojordan: right.
[22:01] <smoser> thats what is confusing to me
[22:01] <smoser> i think you are actually ending up "bouncing" the network that wasnt up yet
[22:02] <smoser> but not sure. i'm looking at that.
[22:05] <rharper> what stage is cloud-init at when it's time to bounce? has cloud-init net already run? is that what we're trying to determine ?
[22:05] <dojordan> yeah, this is during _get_data within the AzureDataSource, so if it is in local then network has yet to run
[22:07] <rharper> then it won't yet be up; yeah
[22:07] <smoser> right
[22:07] <rharper> and then the resume *shouldn't* need to kick dhcp since it never leased anything anyhow; except how are we polling metadata service to know when it's time to come up again ?
[22:08] <smoser> right
[22:08] <smoser> i think it came up because we bounced it
[22:09] <smoser> so it ran 'ifdown; ifup' and it came up. since the ifdown didn't do anything
[22:09] <smoser> thats my hypothesis
[22:09] <smoser> on xenial
[22:09] <dojordan> i think you're right
[22:10] <dojordan> so theoretically it wouldn't work > xenial as there is no ifup
[22:10] <rharper> w.r.t the systemd path;  if we're in the same boat; I think an ip link set down on the interface that needs to bounce may be enough for systemd-networkd to allow a restart to kick DHCP again
[22:10] <rharper> that's something testable
[22:11] <dojordan> yes i am actually testing that later this week or early next
[22:11] <dojordan> waiting for the networking team...
[22:11] <dojordan> but they claimed changing the link state of the interface (removing and re adding the nic from our hypervisor) didn't trigger dhcp in the vm.
[22:12] <smoser> dojordan: since we're running at local time frame, we dont have to "bounce"
[22:12] <smoser> and we can use something more like the ec2 datasource
[22:12] <dojordan> sounds good
[22:12] <rharper> dojordan: that make sense as it never DHCP'd in the first place;
[22:12] <smoser> which does a dhclient, gets its network info, then we can use the 'EphemeralIPV4Network' context manager
[22:12] <smoser> to hit the datasource, and timeout and do it all again
[22:13] <dojordan> beauty
[22:13] <dojordan> i like it
[22:13] <rharper> but what remains is, while it's sleeping, how can it know when it's time to wake-up  if we don't have a network interface up to poll a URL ?
[22:13] <dojordan> we bring up a temporary one
[22:13] <smoser> rharper: right now, if you're looking at his branch it runs
[22:13] <smoser> self.bounce_network_with_azure_hostname
[22:13] <smoser> in _reprovision
[22:13] <smoser> and that does
[22:13] <dojordan> (just to confirm) the ec2 ephemeralipv4network hits the dhcp server?
[22:13] <smoser>  sh -c 'ifdown eth0; ifup eth0'
[22:13] <smoser> which ends up bringing it up the first time through
[22:14] <smoser> and relying on "stale" networking configuration to do so
[22:14] <smoser> dojordan: look at the ec2 code you should be able to see.
[22:14] <rharper> but, the ifup eth0 isn't sufficient in local; there's no eth0 network config to indicate it needs to DHCP
[22:14] <rharper> at least in > xenial
[22:15] <rharper> or at least I'm not sure we write out the network config before it starts "sleep waiting"
[22:15] <smoser> this hunk in _get_data
[22:15] <smoser> http://paste.ubuntu.com/26368525/
[22:15] <smoser> we'd just do
[22:16] <rharper> oh, EC2 does, but not in Azure at this time
[22:16] <smoser> with net.EphemeralIPv4Network(**net_params):
[22:16] <smoser> right
[22:16] <smoser> so i'm saying something like:
[22:16] <smoser> while True:
[22:17] <smoser>     dhcp_leases = dhcp.maybe_perform_dhcp_discovery(self.fallback_interface)
[22:17] <smoser>     if not dhcp_leases:
[22:17] <smoser>         something_bad
[22:17] <smoser>     with net.Ephem....
[22:17] <smoser>         hit MD service
[22:17] <smoser>         on happy path break
[22:18] <smoser> that is terrible i realize, but i think maybe explains it.
[22:18] <rharper> right;  the net.Eph dhcp stuff; how did we work around requiring dhclient ? or do we keep that present for Artful + Bionic ?
[22:19] <smoser> we require dhclient
[22:20] <blackboxsw> nazarewk: I believe cloud-init originated here, with these folks, originally a Canonical internal product that garnered broad adoption and was picked up and supported by other OSes and clouds. :) https://www.podcastinit.com/cloud-init-with-scott-moser-episode-126/
[22:20] <blackboxsw> :)
[22:25] <blackboxsw> smoser: per the suggestion about the poll_imds looping.... right we should be able to call maybe_perform_dhcp_discovery which attempts dhclient queries using the fallback nic, and returns an empty list if dhclient doesn't exist or can't get an ip address
[22:26] <blackboxsw> your pastebin explains what we do currently in ec2 which could apply here in azure too https://paste.ubuntu.com/26368525/
[22:27] <smoser> i hope dojordan follows. i think blackboxsw and rharper do, but i have to run.
[22:27] <smoser> i'll look in tomorrow on it a bit dojordan
[22:27] <blackboxsw> dojordan: (just to confirm) the ec2 ephemeralipv4network hits the dhcp server?.... Ephemeral ipv4 network uses the response for maybe_perform_dhcp_discovery
[22:30] <blackboxsw> you provide it with a params interface, ip, prefix_or_mask broadcast and router all of which maybe_perform_dhcp_discovery returns
[22:30] <blackboxsw> you provide it with the params (interface, ip, prefix_or_mask broadcast and router) all of which maybe_perform_dhcp_discovery returns
[22:33] <blackboxsw> the specific use (which could be nearly the same for Azure) is here https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceEc2.py?id=78372f16d2711812793196aa8003ad51693ca472#n105
[22:34] <blackboxsw> though within the with net.EphemeralIPv4Network context you could do your polling of IMDS
[22:35] <blackboxsw> as that EphemeralIPv4Network context manager serves only to temporarily bring up a network interface to allow you to hit an external URL. Then it tears that interface back down
[22:37] <blackboxsw> so it basically performs whatever static network setup is required (including routes) on a given interface  for your context and then any setup it needed to perform it tears down upon __exit__
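The setup-on-enter, teardown-on-exit behaviour described above can be illustrated with a toy context manager. This is only a sketch of the pattern, not cloud-init's EphemeralIPv4Network: the class name `EphemeralNetworkSketch`, its parameters, and the `run` callback are all hypothetical, and the `ip` commands are recorded as strings rather than executed.

```python
from typing import Callable, List, Optional


class EphemeralNetworkSketch:
    """Hypothetical stand-in showing the pattern; not cloud-init's class."""

    def __init__(self, interface: str, ip: str, prefix: int,
                 router: Optional[str] = None,
                 run: Optional[Callable[[str], None]] = None) -> None:
        self.interface = interface
        self.ip = ip
        self.prefix = prefix
        self.router = router
        self.run = run or print  # assumption: a command runner is injected
        self._cleanup: List[str] = []

    def _do(self, cmd: str, undo: str) -> None:
        # Run a setup step and remember how to reverse it.
        self.run(cmd)
        self._cleanup.append(undo)

    def __enter__(self) -> "EphemeralNetworkSketch":
        cidr = f"{self.ip}/{self.prefix}"
        dev = self.interface
        self._do(f"ip addr add {cidr} dev {dev}",
                 undo=f"ip addr del {cidr} dev {dev}")
        self._do(f"ip link set dev {dev} up",
                 undo=f"ip link set dev {dev} down")
        if self.router:
            self._do(f"ip route add default via {self.router} dev {dev}",
                     undo=f"ip route del default dev {dev}")
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Undo in reverse order, and only what __enter__ actually set up.
        while self._cleanup:
            self.run(self._cleanup.pop())


log: List[str] = []
with EphemeralNetworkSketch("eth0", "10.0.0.4", 24,
                            router="10.0.0.1", run=log.append):
    log.append("poll IMDS here")
```

Because each setup step registers its own undo command, the teardown only reverses what was actually configured, even if the body of the `with` block raises.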
[22:39] <dojordan> yeah I will play around with it this afternoon, I might make a modification to add retry support though
[22:39] <blackboxsw> I'm +1 on retries, I like those more than while Trues :)
[22:51] <dojordan> haha yeah a little scary typing that
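A bounded-retry version of the loop discussed above might look roughly like the following. This is a sketch under stated assumptions, not the code that landed in cloud-init: `poll_with_retries` and its `discover`, `ephemeral_network` and `fetch_metadata` callables are hypothetical stand-ins for the DHCP discovery call, the ephemeral network context manager and the IMDS request, and using the last returned lease is an assumption as well.

```python
import time
from contextlib import contextmanager
from typing import Callable, ContextManager, List, Optional


def poll_with_retries(discover: Callable[[], List[dict]],
                      ephemeral_network: Callable[[dict], ContextManager],
                      fetch_metadata: Callable[[], Optional[dict]],
                      max_attempts: int = 5,
                      delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Try DHCP plus a metadata fetch up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        leases = discover()  # empty list means DHCP discovery failed
        if leases:
            # Assumption: the most recent lease is the last in the list.
            with ephemeral_network(leases[-1]):
                data = fetch_metadata()
            if data is not None:
                return data  # happy path
        if attempt < max_attempts:
            sleep(delay)
    raise RuntimeError(f"no metadata after {max_attempts} attempts")


@contextmanager
def fake_network(lease: dict):
    # Hypothetical no-op network context used only for this demo.
    yield


answers = iter([None, {"status": "ready"}])
result = poll_with_retries(
    discover=lambda: [{"interface": "eth0"}],
    ephemeral_network=fake_network,
    fetch_metadata=lambda: next(answers),
    sleep=lambda seconds: None,
)
```

Passing the three steps in as callables keeps the loop testable without a real network, and the attempt cap avoids the open-ended `while True`.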