/srv/irclogs.ubuntu.com/2018/01/11/#cloud-init.txt

kholkina	hi all! I have a question regarding user-data update. I have updated the user-data via openstack nova and now I see the new value when I check it via http://169.254.169.254/2009-04-04/user-data. Also I added [scripts_users, always] in the config file. But it still doesn't work on reboot. When I delete obj.pkl manually and reboot it works fine. But I think that's not the best way	11:45
niluje	smoser: <3	13:47
niluje	just noticed you merged the scaleway datasource for xenial on December	13:48
niluje	thanks for the xmas gift :)	13:48
smoser	niluje: \o/	14:40
niluje	"caribou" joined us on monday :p	15:10
dojordan	@smoser, can you take a look at my latest iteration? I proposed and implemented a solution that works post xenial: https://code.launchpad.net/~dojordan/cloud-init/+git/cloud-init/+merge/334341	19:10
ajorg	I found what I think is an interesting (bug?) behavior...	19:18
ajorg	If you are using NoCloud (via an ISO attached to a VM) and you remove the ISO, the next time you boot it will decide you're using None instead of NoCloud and it will re-initialize.	19:19
ajorg	I'm not sure how to improve this behavior, but I wonder if using the DMI UUID or serial for both None and NoCloud would fix it.	19:20
ajorg	I suspect similar behavior would be shown if you denied access to the instance metadata on an EC2 instance early enough that cloud-init couldn't read it.	19:21
smoser	ajorg: that is correct.	19:22
smoser	yeah, we could use the dmi uuid (as some other clouds do). and keep that.	19:22
smoser	there is also 'manual_cache_clean' that you can set	19:23
smoser	and then you can remove the disk.	19:23
ajorg	If we use the same ID for both NoCloud and None will it detect that cloud-init has already run even though it has changed clouds?	19:23
ajorg	A note that says "please don't remove the ISO or this will happen" in the docs would also be a good start.	19:25
smoser	blackboxsw:	20:29
smoser	https://git.launchpad.net/~dojordan/cloud-init/commit/?id=7f23e5c4808a9c647cd4d5277625a723a58b132b	20:29
smoser	"on ubuntu release > xenial we rely on systemd-networkd"	20:29
smoser	do you know if that is correct ?	20:29
blackboxsw	reading	20:31
blackboxsw	smoser: for artful and bionic I know that's true... not sure about zesty. I don't recall, but I'll spin one up now	20:34
blackboxsw	I left nearly the same log message on dropping ifupdown Azure stuff in my commit	20:34
blackboxsw	" In artful and bionic ifupdown package is no longer installed in default	20:34
blackboxsw	cloud images. As such, Azure can't use those tools to bounce the network	20:34
blackboxsw	informing DDNS about hostname changes."	20:34
smoser	blackboxsw: i just commented there	20:35
smoser	mwhudson says that you and i are blowing smoke	20:35
blackboxsw	" the data there is new and will get documented more as we go.	21:09
blackboxsw	That function will make its way back to 16.04 in a cloud-init SRU in the next month or so." .... smoser I'll make a card for that.	21:09
blackboxsw	I should have shoved something into RTD when we added instance-data.json, but we can get it in soon	21:09
blackboxsw	I kinda figured we needed to do it with the jinja-template handling too	21:10
dojordan	@blackboxsw unrelated, but do you know where the SendHostname=true flag lives?	21:11
blackboxsw	smoser: yeah you pointed me at his comment there that systemd isn't handling publishing dhcp info on hostname change..... I thought I ran on azure artful and bionic and the update did happen, but I don't know what triggered it. hrm.	21:12
dojordan	I actually can't find that flag anywhere actually being set	21:12
dojordan	I'm looking at an artful azure VM now, and I checked /lib/systemd/network, /run/systemd/network, /etc/systemd/network	21:14
blackboxsw	dojordan: cloud-init doesn't set it, I only read the docs on it as default behavior if unset. I can see in systemd during dhcp client config refresh they check for said flag... but I can't confirm whether something sets that up.	21:15
* blackboxsw needs to look at this again it seems. my recall is dusty on what I originally tested/saw on artful and bionic. I'll spin up azure vms for testing now		21:15
dojordan	Gotcha, I'll enable the flag and look for a dhcp request	21:16
dojordan	what's the policy in cloud-init about restarting another service? Manual testing seems to confirm what michael said in bug 1739516 that changing the hostname doesn't appear to actually send a DHCP request	21:24
ubot5	bug 1739516 in cloud-init "networking comes up before hostname is set" [Medium,Confirmed] https://launchpad.net/bugs/1739516	21:24
smoser	dojordan: well if we have to bounce it, we can	21:25
blackboxsw	dojordan: so systemd-networkd docs say the SendHostname=true needs to be in the [DHCP] section of the configs. I see that DHCP section in /run/systemd/network/10-netplan-eth0.network.... but I'm guessing we could provide a /etc/systemd/network/11-something.network file containing [DHCP]\nSendHostname=true if needed? (shot in the dark as I'm working off docs at the moment)	21:25
smoser	as we have before	21:25
smoser	it'd be nicer if you could bounce that interface	21:25
smoser	rather than restarting the service	21:25
smoser	i'm somewhat concerned about restarting the service during boot, but we will see.	21:26
blackboxsw	that's just me RTFMing though. poking at an azure instance in a min here to confirm	21:26
blackboxsw	smoser: rharper do we know if netplan configs allow you to post arbitrary dhcp config options, or just dhcp4: on dhcp6: on etc	21:27
blackboxsw	s/on/true	21:27
* blackboxsw doesn't see anything that looks like it at http://people.canonical.com/~mtrudel/netplan/		21:28
blackboxsw	not sure if that's the 'canonical' source for netplan docs though	21:28
rharper	blackboxsw: netplan does not expose arbitrary dhcp options; I think accept-ra is the exception at this time	21:28
smoser	what was the flag we were interested in ?	21:29
rharper	SendHostname; but I do wonder if we can figure out what's going on before we start exposing these sorts of things in the top-level yaml;	21:29
rharper	is this some sort of dynamic dns update based on hostname from a dhclient request ? or is it doing something else with this specific mechanism	21:30
dojordan	I think the flag is simply whether or not to send the hostname when sending a dhcp request, but not actually sending dhcp every time the hostname is changed	21:30
cyphermox	smoser, rharper: SendHostname is default in systemd.	21:31
rharper	it's just sent when the client attempts to obtain (or renew) a lease IIUC	21:31
cyphermox	(so is UseHostname=, which is meant to use the hostname that DHCP hands out)	21:32
dojordan	gotcha, so we won't be able to rely on that to force a new dhcp	21:32
cyphermox	what is this about?	21:32
dojordan	basically we (azure) need a way to force a dhcp request to acquire a new IP in certain circumstances	21:33
dojordan	we used to use ifdown/ifup but post xenial we no longer have those binaries installed by default	21:33
cyphermox	dojordan: wouldn't the "right way" be to have the DHCP server send a DHCPNAK?	21:34
cyphermox	assuming that works "out of line"	21:34
dojordan	or the other right way would be to use leases less than 2^32 - 1 seconds... but unfortunately in our stack the dhcp server is not aware of when to send the DHCPNAK	21:37
cyphermox	right	21:37
cyphermox	short leases obviate the issue of "we need to force the client to re-configure now", at the cost of more packets happening on the network	21:38
cyphermox	whereas DHCPNAK or doing stuff on the client means there needs to be code that decides it's time to reconfigure outside of the lease expiry time	21:38
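The trade-off between short and infinite leases can be made concrete with a little arithmetic. This is a minimal sketch, assuming the default timers from RFC 2131 (renew at 50% of the lease, rebind at 87.5%) and the protocol's encoding of an infinite lease as 0xffffffff seconds. The function name `renewal_times` is hypothetical and comes from neither cloud-init nor Azure.

```python
# Sketch only: timer defaults are the RFC 2131 ones, not values read from
# any Azure or cloud-init code.
INFINITE_LEASE = 0xFFFFFFFF  # 2**32 - 1 seconds means "never expires"


def renewal_times(lease_seconds):
    """Return (T1, T2) in seconds, or None for an infinite lease."""
    if lease_seconds == INFINITE_LEASE:
        # With an infinite lease the client never renews on its own.
        return None
    t1 = lease_seconds * 0.5    # client starts renewing with its server
    t2 = lease_seconds * 0.875  # client starts rebinding with any server
    return (t1, t2)


if __name__ == "__main__":
    for lease in (600, 86400, INFINITE_LEASE):
        print(lease, renewal_times(lease))
```

With a 10 minute lease the client would contact the server again after 5 minutes, whereas with an infinite lease something outside the lease timers has to decide when to redo DHCP, which is the situation dojordan describes.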
cyphermox	is the client or infrastructure more likely to know what set of circumstances the client should do DHCP again?	21:40
dojordan	client	21:40
cyphermox	(my guess is the server should always be authoritative, but I don't know of your use case)	21:41
cyphermox	ah, interesting	21:41
dojordan	i guess we could just run dhclient -r?	21:41
dojordan	i can elaborate on the scenario:	21:41
cyphermox	dojordan: dhclient isn't what's doing DHCP when using systemd-networkd.	21:41
dojordan	oh right... and systemd-networkd doesn't expose the same api?	21:41
cyphermox	I don't think it exposes that, but I'm not sure :)	21:42
dojordan	thinking outside the box, can we delete the dhcp lease files? does that retrigger it?	21:43
smoser	dojordan: this might become easier (and more easily use dhclient sandboxed) if we get to running in local time.	21:43
dojordan	sorry didn't quite catch that	21:44
nazarewk	hlelo	21:44
smoser	oh. hm.	21:44
nazarewk	*hello	21:44
smoser	wait.	21:44
nazarewk	how does this project relate to CoreOS cloud-init and RancherOS cloud-init?	21:44
smoser	now i'm confused.	21:45
cyphermox	dojordan: I don't think it would trigger it, but you might be able to just restart systemd-networkd without adverse consequences	21:45
smoser	nazarewk: no. coreos is not here.	21:45
* smoser googles rancheros		21:45
nazarewk	smoser: i'm researching cloud operating systems	21:45
dojordan	that was my thought too, just wanted to make sure it would be safe	21:45
smoser	dojordan: we run azure datasource at local	21:45
smoser	meaning proper system configured networking isn't up yet.	21:46
nazarewk	after going through coreos (deprecated cloud-init), RancherOS (forked from CoreOS cloud-init) i stumbled upon configuring Project Atomic with cloud-init	21:46
smoser	now i'm confused.	21:46
nazarewk	and saw that is some wide standard	21:46
cyphermox	smoser: again?	21:46
cyphermox	;)	21:46
nazarewk	any idea how those 3 (or 2 i guess) relate to each other?	21:46
smoser	well, project atomic to my knowledge uses this cloud-init	21:47
smoser	(as contributors from redhat have added that support)	21:47
nazarewk	i already know that, i am more interested in how cloud-init came to be	21:48
nazarewk	i can't find any info on where it came from	21:48
nazarewk	ok looks like i found this	21:49
nazarewk	https://github.com/coreos/coreos-cloudinit#configuration-with-cloud-config	21:49
cyphermox	dojordan: so, I guess what you would need to do depends on what the circumstances are in which a client needs to restart DHCP, if it's because new information was received from the datasource, maybe the best is for cloud-init to restart systemd-networkd when it finds out, if systemd-networkd is even running	21:51
cyphermox	OTOH, if it's potentially hours after boot, then some agent would need to do it, if it's safe.	21:52
dojordan	so the reason this matters is if we hit our instance metadata service with a stale IP, it doesn't recognize it and the client will throw an exception	21:52
dojordan	so we catch it and bounce the nic (on xenial)	21:52
cyphermox	so this isn't actually doing any real DHCP?	21:53
cyphermox	otherwise you'd have the lease times to tell you when to renew, so you theoretically never have a stale IP	21:53
dojordan	our leases are infinite	21:53
dojordan	but it is doing real DHCP in the sense we are getting a new ip,dns,subnet, etc	21:54
dojordan	the reason it could be stale is our platform is moving a vm from one vnet to another	21:54
cyphermox	yes, but I mean even if the IP belongs to you forever, a short lease will have you ping the DHCP server periodically and possibly get new information	21:54
cyphermox	ah	21:54
dojordan	@smoser, not entirely sure when systemd-networkd is running, but I have confirmed in testing that in my current PR we do successfully hit the instance metadata server in our data source. So I guess some parts of the networking stack are setup?	21:57
smoser	dojordan: right.	22:01
smoser	thats what is confusing to me	22:01
smoser	i think you are actually ending up "bouncing" the network that wasnt up yet	22:01
smoser	but not sure. i'm looking at that.	22:02
rharper	what stage is cloud-init at when it's time to bounce? has cloud-init net already run? is that what we're trying to determine ?	22:05
dojordan	yeah, this is during _get_data within the AzureDataSource, so if it is in local then network has yet to run	22:05
rharper	then it won't yet be up; yeah	22:07
smoser	right	22:07
rharper	and then the resume shouldn't need to kick dhcp since it never leased anything anyhow; except how are we polling metadata service to know when it's time to come up again ?	22:07
smoser	right	22:08
smoser	i think it came up because we bounced it	22:08
smoser	so it ran 'ifdown; ifup' and it came up. since the ifdown didn't do anything	22:09
smoser	thats my hypothesis	22:09
smoser	on xenial	22:09
dojordan	i think you're right	22:09
dojordan	so theoretically it wouldn't work > xenial as there is no ifup	22:10
rharper	w.r.t the systemd path; if we're in the same boat; I think an ip link set down on the interface that needs to bounce may be enough for systemd-networkd to allow a restart to kick DHCP again	22:10
rharper	that's something testable	22:10
dojordan	yes i am actually testing that later this week or early next	22:11
dojordan	waiting for the networking team...	22:11
dojordan	but they claimed changing the link state of the interface (removing and re adding the nic from our hypervisor) didn't trigger dhcp in the vm.	22:11
smoser	dojordan: since we're running at local time frame, we dont have to "bounce"	22:12
smoser	and we can use something more like the ec2 datasource	22:12
dojordan	sounds good	22:12
rharper	dojordan: that makes sense as it never DHCP'd in the first place;	22:12
smoser	which does a dhclient, gets its network info, then we can use the 'EphemeralIPV4Network' context manager	22:12
smoser	to hit the datasource, and timeout and do it all again	22:12
dojordan	beauty	22:13
dojordan	i like it	22:13
rharper	but what remains is, while it's sleeping, how can it know when it's time to wake-up if we don't have a network interface up to poll a URL ?	22:13
dojordan	we bring up a temporary one	22:13
smoser	rharper: right now, if you're looking at his branch it runs	22:13
smoser	self.bounce_network_with_azure_hostname	22:13
smoser	in _reprovision	22:13
smoser	and that does	22:13
dojordan	(just to confirm) the ec2 ephemeralipv4network hits the dhcp server?	22:13
smoser	sh -c 'ifdown eth0; ifup eth0'	22:13
smoser	which ends up bringing it up the first time through	22:13
smoser	and relying on "stale" networking configuration to do so	22:14
smoser	dojordan: look at the ec2 code you should be able to see.	22:14
rharper	but, the ifup eth0 isn't sufficient in local; there's no eth0 network config to indicate it needs to DHCP	22:14
rharper	at least in > xenial	22:14
rharper	or at least I'm not sure we write out the network config before it starts "sleep waiting"	22:15
smoser	this hunk in _get_data	22:15
smoser	http://paste.ubuntu.com/26368525/	22:15
smoser	we'd just do	22:15
rharper	oh, EC2 does, but not in Azure at this time	22:16
smoser	with net.EphemeralIPv4Network(**net_params):	22:16
smoser	right	22:16
smoser	so i'm saying something like:	22:16
smoser	while True:	22:16
smoser	    dhcp_leases = dhcp.maybe_perform_dhcp_discovery(self.fallback_interface)	22:17
smoser	    if not dhcp_leases:	22:17
smoser	        something_bad	22:17
smoser	    with net.Ephem....	22:17
smoser	        hit MD service	22:17
smoser	        on happy path break	22:17
smoser	that is terrible i realize, but i think maybe explains it.	22:18
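The loop smoser outlines could be sketched roughly as follows, with a bounded retry count in place of `while True`. This is not cloud-init code: `poll_metadata` and its three callable parameters are hypothetical stand-ins for DHCP discovery, the ephemeral network context manager, and the metadata request. Using the last lease in the list and treating `OSError` as "unreachable" are assumptions made for the sketch.

```python
import time


def poll_metadata(discover_leases, ephemeral_network, fetch_metadata,
                  max_attempts=5, delay=1.0, sleep=time.sleep):
    """Retry DHCP plus metadata fetch until success or attempts run out.

    All three callables are hypothetical stand-ins:
      discover_leases()        -> list of lease dicts (empty on failure)
      ephemeral_network(lease) -> context manager that brings the NIC up
      fetch_metadata()         -> metadata, or raises OSError if unreachable
    """
    for attempt in range(1, max_attempts + 1):
        leases = discover_leases()
        if leases:
            # Assumption: the last lease is the most recent one.
            # The network only exists inside the with block.
            with ephemeral_network(leases[-1]):
                try:
                    return fetch_metadata()  # happy path
                except OSError:
                    pass  # stale address or service not ready yet
        if attempt < max_attempts:
            sleep(delay)
    raise RuntimeError(
        "metadata service unreachable after %d attempts" % max_attempts)
```

Because DHCP discovery is repeated on every attempt, a VM that has been moved to another vnet would pick up a fresh address on the next pass instead of reusing a stale one.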
rharper	right; the net.Eph dhcp stuff; how did we work around requiring dhclient? or do we keep that present for Artful + Bionic?	22:18
smoser	we require dhclient	22:19
blackboxsw	nazarewk: I believe cloud-init originated here, with these folks, originally a Canonical internal product that garnered broad adoption and was picked up and supported by other OSes and clouds. :) https://www.podcastinit.com/cloud-init-with-scott-moser-episode-126/	22:20
blackboxsw	:)	22:20
blackboxsw	smoser: per the suggestion about the poll_imds looping.... right we should be able to call maybe_perform_dhcp_discovery which attempts dhclient queries using the fallback nic. and returns an empty list if dhclient doesn't exist or can't get an ip address	22:25
blackboxsw	your pastebin explains what we do currently in ec2 which could apply here in azure too https://paste.ubuntu.com/26368525/	22:26
smoser	i hope dojordan follows. i think blackboxsw and rharper do, but i have to run.	22:27
smoser	i'll look in tomorrow on it a bit dojordan	22:27
blackboxsw	dojordan: (just to confirm) the ec2 ephemeralipv4network hits the dhcp server?.... Ephemeral ipv4 network uses the response for maybe_perform_dhcp_discovery	22:27
blackboxsw	you provide it with a params interface, ip, prefix_or_mask broadcast and router all of which maybe_perform_dhcp_discovery returns	22:30
blackboxsw	you provide it with the params (interface, ip, prefix_or_mask broadcast and router) all of which maybe_perform_dhcp_discovery returns	22:30
blackboxsw	the specific use (which could be nearly the same for Azure) is here https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceEc2.py?id=78372f16d2711812793196aa8003ad51693ca472#n105	22:33
blackboxsw	though within the with net.EphemeralIPv4Network context you could do your polling of IMDS	22:34
blackboxsw	as that EphemeralIPv4Network context manager serves only to temporarily bring up a network interface to allow you to hit an external URL. Then it tears that interface back down	22:35
blackboxsw	so it basically performs whatever static network setup is required (including routes) on a given interface for your context and then any setup it needed to perform it tears down upon __exit__	22:37
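The set-up-then-undo behaviour blackboxsw describes can be illustrated with a toy context manager. This is a sketch of the pattern only, not the real `EphemeralIPv4Network` implementation: the class name `EphemeralSetup`, the `steps` pairs and the injected `run` callable are all hypothetical, and the commands in the usage note are just example strings.

```python
class EphemeralSetup:
    """Toy model: run setup steps, then undo only the ones that applied."""

    def __init__(self, steps, run):
        # steps: list of (setup_cmd, teardown_cmd) pairs.
        # run: callable that executes a single command.
        self.steps = steps
        self.run = run
        self.cleanup = []

    def __enter__(self):
        try:
            for setup_cmd, teardown_cmd in self.steps:
                self.run(setup_cmd)
                # Put the undo step at the front so that teardown
                # happens in reverse order of setup.
                self.cleanup.insert(0, teardown_cmd)
        except Exception:
            # Setup failed part way: undo what was already applied.
            self.__exit__(None, None, None)
            raise
        return self

    def __exit__(self, exc_type, exc, tb):
        for cmd in self.cleanup:
            self.run(cmd)
        self.cleanup = []
        return False  # never swallow exceptions from the with body
```

Used as `with EphemeralSetup(steps, run): ...`, the body (for example polling IMDS) sees the interface configured, and on exit each recorded teardown command runs in reverse order, whether or not the body raised.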
dojordan	yeah I will play around with it this afternoon, I might make a modification to add retry support though	22:39
blackboxsw	I'm +1 on retries, I like those more than while Trues :)	22:39
dojordan	haha yeah a little scary typing that	22:51
