/srv/irclogs.ubuntu.com/2018/01/11/#cloud-init.txt

kholkinahi all! I have a question about updating user-data. I updated the user-data via OpenStack Nova, and I now see the new value when I check http://169.254.169.254/2009-04-04/user-data. I also added [scripts_users, always] to the config file, but the new user-data still isn't applied on reboot. When I delete obj.pkl manually and reboot, it works fine, but I don't think that's the best way11:45
nilujesmoser: <313:47
nilujejust noticed you merged the scaleway datasource for xenial on December13:48
nilujethanks for the xmas gift :)13:48
smoserniluje: \o/14:40
niluje"caribou" joined us on Monday :p15:10
dojordan@smoser, can you take a look at my latest iteration? I proposed and implemented a solution that works post xenial: https://code.launchpad.net/~dojordan/cloud-init/+git/cloud-init/+merge/33434119:10
ajorgI found what I think is an interesting (bug?) behavior...19:18
ajorgIf you are using NoCloud (via an ISO attached to a VM) and you remove the ISO, the next time you boot it will decide you're using None instead of NoCloud and it will re-initialize.19:19
ajorgI'm not sure how to improve this behavior, but I wonder if using the DMI UUID or serial for both None and NoCloud would fix it.19:20
ajorgI suspect similar behavior would be shown if you denied access to the instance metadata on an EC2 instance early enough that cloud-init couldn't read it.19:21
smoserajorg: that is correct.19:22
smoseryeah, we could use the dmi uuid (as some other clouds do), and keep that.19:22
smoserthere is also 'manual_cache_clean' that you can set19:23
smoserand then you can remove the disk.19:23
ajorgIf we use the same ID for both NoCloud and None will it detect that cloud-init has already run even though it has changed clouds?19:23
ajorgA note that says "please don't remove the ISO or this will happen" in the docs would also be a good start.19:25
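A minimal sketch of ajorg's DMI-UUID idea, assuming both NoCloud and None would derive their instance-id from the DMI system UUID. `stable_instance_id`, `is_new_instance` and the `iid-dmi-` prefix are hypothetical names, not cloud-init's API. If the id survives removal of the ISO, the cached id still matches and per-instance configuration is not re-run:

```python
from pathlib import Path
from typing import Optional

# Linux sysfs path of the SMBIOS/DMI system UUID (usually readable only by root).
DMI_UUID_PATH = Path("/sys/class/dmi/id/product_uuid")


def stable_instance_id(dmi_path: Path = DMI_UUID_PATH) -> Optional[str]:
    """Hypothetical: build an instance-id from the DMI UUID.

    Returns None when the UUID cannot be read, so a caller could fall back
    to the datasource's usual instance-id.
    """
    try:
        raw = dmi_path.read_text().strip()
    except OSError:
        return None
    # Normalise case: some firmware reports the UUID in upper case.
    return "iid-dmi-" + raw.lower() if raw else None


def is_new_instance(cached_id: Optional[str], current_id: Optional[str]) -> bool:
    """Per-instance work re-runs only when the id differs from the cached one."""
    return cached_id != current_id
```

With an id like this, switching from NoCloud to None after the ISO is gone would no longer look like a new instance, since neither datasource's id depends on the ISO being present.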
smoserblackboxsw:20:29
smoserhttps://git.launchpad.net/~dojordan/cloud-init/commit/?id=7f23e5c4808a9c647cd4d5277625a723a58b132b20:29
smoser"on ubuntu release > xenial we rely on systemd-networkd"20:29
smoserdo you know if that is correct ?20:29
blackboxswreading20:31
blackboxswsmoser: for artful and bionic I know that's true... not sure about zesty. I don't recall, but I'll spin one up now20:34
blackboxswI left nearly the same log message on dropping ifupdown Azure stuff in my commit20:34
blackboxsw"    In artful and bionic ifupdown package is no longer installed in default20:34
blackboxsw    cloud images. As such, Azure can't use those tools to bounce the network20:34
blackboxsw    informing DDNS about hostname changes."20:34
smoserblackboxsw: i just commented there20:35
smosermwhudson says that you and i are blowing smoke20:35
blackboxsw" the data there is new and will get documented more as we go.21:09
blackboxswThat function will make its way back to 16.04 in a cloud-init SRU in the next month or so." .... smoser I'll make a card for that.21:09
blackboxswI should have shoved something into RTD when we added instance-data.json, but we can get it in soon21:09
blackboxswI kinda figured we needed to do it with the jinja-template handling too21:10
dojordan@blackboxsw unrelated, but do you know where the SendHostname=true flag lives?21:11
blackboxswsmoser: yeah you pointed me at his comment there that systemd isn't handling publishing dhcp info on hostname change..... I thought I ran on azure artful and bionic and the update did happen, but I don't know what triggered it. hrm.21:12
dojordanI can't find that flag actually being set anywhere21:12
dojordanIm looking at an artful azure VM now, and I checked /lib/systemd/network, /run/systemd/network, /etc/systemd/network21:14
blackboxswdojordan: cloud-init doesn't set it, I only read the docs on it as default behavior if unset. I can see in systemd during dhcp client config refresh they check for said flag... but I can't confirm whether something sets that up.21:15
* blackboxsw needs to look at this again it seems. my recall is dusty on what I originally tested/saw on artful and bionic. I'll spin up azure vms for testing now21:15
dojordanGotcha, I'll enable the flag and look for a DHCP request21:16
dojordanwhat's the policy in cloud-init about restarting another service? Manual testing seems to confirm what michael said in bug 1739516 that changing the hostname doesn't appear to actually send a DHCP request21:24
ubot5bug 1739516 in cloud-init "networking comes up before hostname is set" [Medium,Confirmed] https://launchpad.net/bugs/173951621:24
smoserdojordan: well if we have to bounce it, we can21:25
blackboxswdojordan: so the systemd-networkd docs say SendHostname=true needs to be in the [DHCP] section of the config. I see that [DHCP] section in /run/systemd/network/10-netplan-eth0.network... but I'm guessing we could provide a /etc/systemd/network/11-something.network file containing [DHCP]\nSendHostname=true if needed? (shot in the dark, as I'm working off docs at the moment)21:25
smoseras we have before21:25
smoserit'd be nicer if you could bounce *that* interface21:25
smoserrather than restarting the service21:25
smoseri'm somewhat concerned about restarting the service during boot, but we will see.21:26
blackboxswthat's just me RTFMing though. poking at an azure instance in a min here to confirm21:26
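A small sketch of how such a [DHCP] setting could be rendered, with one adjustment to blackboxsw's guess: systemd.network(5) applies only the first .network file (in lexical order) that matches an interface, so a separate 11-something.network would be ignored next to 10-netplan-eth0.network. The same man page documents drop-in directories named after the existing file as the way to add keys to it. The helper name and the send-hostname.conf file name are hypothetical, and as cyphermox points out in this log, SendHostname already defaults to true:

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple, Union


def dhcp_dropin(network_file: str,
                options: Dict[str, Union[str, bool]]) -> Tuple[str, str]:
    """Return (path, contents) for a hypothetical [DHCP] drop-in.

    Assumes networkd looks for "<name>.network.d/*.conf" in all of its
    config directories, so a drop-in under /etc extends a .network file
    generated under /run.
    """
    name = PurePosixPath(network_file).name
    path = PurePosixPath("/etc/systemd/network") / f"{name}.d" / "send-hostname.conf"
    lines = ["[DHCP]"]
    for key, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # systemd boolean syntax
        lines.append(f"{key}={value}")
    return str(path), "\n".join(lines) + "\n"
```

For the netplan-generated file above, this yields a path ending in 10-netplan-eth0.network.d/send-hostname.conf with the body "[DHCP]" followed by "SendHostname=true".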
blackboxswsmoser, rharper: do we know if netplan configs allow you to set arbitrary DHCP config options, or just dhcp4: true, dhcp6: true, etc.?21:27
* blackboxsw doesn't see anything that looks like it at http://people.canonical.com/~mtrudel/netplan/21:28
blackboxswnot sure if that's the 'canonical' source for netplan docs though21:28
rharperblackboxsw: netplan does not expose arbitrary dhcp options; I think accept-ra is the exception at this time21:28
smoserwhat was the flag we were interested in ?21:29
rharperSendHostname; but I do wonder if we can figure out what's going on before we start exposing these sorts of things in the top-level yaml;21:29
rharperis this some sort of dynamic dns update based on hostname from a dhclient request ? or is it doing something else with this specific mechanism21:30
dojordanI think the flag is simply whether or not to send the hostname when sending a DHCP request; it doesn't actually trigger a DHCP request every time the hostname is changed21:30
cyphermoxsmoser, rharper: SendHostname is default in systemd.21:31
rharperit's just sent when the client attempts to obtain (or renew) a lease IIUC21:31
cyphermox(so is UseHostname=, which is meant to use the hostname that DHCP hands out)21:32
dojordangotcha, so we won't be able to rely on that to force a new dhcp21:32
cyphermoxwhat is this about?21:32
dojordanbasically we (azure) need a way to force a dhcp request to acquire a new IP in certain circumstances21:33
dojordanwe used to use ifdown/ifup but post xenial we no longer have those binaries installed by default21:33
cyphermoxdojordan: wouldn't the "right way" be to have the DHCP server send a DHCPNAK?21:34
cyphermoxassuming that works "out of line"21:34
dojordanor the other right way would be to use leases shorter than 2^32 - 1 seconds (i.e. non-infinite leases)... but unfortunately in our stack the DHCP server doesn't know when to send the DHCPNAK21:37
cyphermoxright21:37
cyphermoxshort leases obviate the issue of "we need to force the client to re-configure now", at the cost of more packets happening on the network21:38
cyphermoxwhereas DHCPNAK or doing stuff on the client means there needs to be code that decides it's time to reconfigure outside of the lease expiry time21:38
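To put numbers on the trade-off cyphermox describes: RFC 2131 reserves a lease time of 0xffffffff (2^32 - 1 seconds) for "infinity", and has the client default its renew (T1) and rebind (T2) timers to 0.5 and 0.875 of the lease time unless the server sends explicit values. A small sketch (the function name is made up) showing why a short lease makes the client come back on its own while an infinite one never does:

```python
from typing import Optional, Tuple

# RFC 2131: a lease time of 0xffffffff means "infinite".
INFINITE_LEASE = 0xFFFFFFFF


def renewal_times(lease_seconds: int) -> Optional[Tuple[float, float]]:
    """Default client timers from RFC 2131 section 4.4.5.

    T1 (renew with the original server) = 0.5 * lease
    T2 (rebind with any server)         = 0.875 * lease
    Returns None for an infinite lease: the client has no timer that
    would send it back to the server.
    """
    if lease_seconds == INFINITE_LEASE:
        return None
    return 0.5 * lease_seconds, 0.875 * lease_seconds
```

For a one-hour lease, `renewal_times(3600)` gives `(1800.0, 3150.0)`, so the client would pick up a changed network within about 30 minutes at the cost of periodic renew traffic.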
cyphermoxis the client or infrastructure more likely to know what set of circumstances the client should do DHCP again?21:40
dojordanclient21:40
cyphermox(my guess is the server should always be authoritative, but I don't know of your use case)21:41
cyphermoxah, interesting21:41
dojordani guess we could just run dhclient -r?21:41
dojordani can elaborate on the scenario:21:41
cyphermoxdojordan: dhclient isn't what's doing DHCP when using systemd-networkd.21:41
dojordanoh right... and systemd-networkd doesn't expose the same API?21:41
cyphermoxI don't think it exposes that, but I'm not sure :)21:42
dojordanthinking outside the box, can we delete the dhcp lease files? does that retrigger it?21:43
smoserdojordan: this might become easier (and more easily use dhclient sandboxed) if we get to running in local time.21:43
dojordansorry didn't quite catch that21:44
nazarewkhello21:44
smoseroh. hm.21:44
smoserwait.21:44
nazarewkhow does this project relate to CoreOS cloud-init and RancherOS cloud-init?21:44
smosernow i'm confused.21:45
cyphermoxdojordan: I don't think it would trigger it, but you might be able to just restart systemd-networkd without adverse consequences21:45
smosernazarewk: no. coreos is not here.21:45
* smoser googles rancheros21:45
nazarewksmoser: i'm researching cloud operating systems21:45
dojordanthat was my thought too, just wanted to make sure it would be safe21:45
smoserdojordan: we run azure datasource at local21:45
smosermeaning proper system configured networking isnt up yet.21:46
nazarewkafter going through coreos (deprecated cloud-init), RancherOS (forked from CoreOS cloud-init) i stumbled upon configuring Project Atomic with cloud-init21:46
smosernow i'm confused.21:46
nazarewkand saw that it seems to be some kind of widely used standard21:46
cyphermoxsmoser: again?21:46
cyphermox;)21:46
nazarewkany idea how those 3 (or 2 i guess) relate to each other?21:46
smoserwell, project atomic to my knowledge uses this cloud-init21:47
smoser(as contributors from redhat have added that support)21:47
nazarewki already know that, i am more interested in how cloud-init came to be21:48
nazarewki can't find any info on where it came from21:48
nazarewkok looks like i found this21:49
nazarewkhttps://github.com/coreos/coreos-cloudinit#configuration-with-cloud-config21:49
cyphermoxdojordan: so, I guess what you would need to do depends on what the circumstances are in which a client needs to restart DHCP, if it's because new information was received from the datasource, maybe the best is for cloud-init to restart systemd-networkd when it finds out, if systemd-networkd is even running21:51
cyphermoxOTOH, if it's potentially hours after boot, then some agent would need to do it, if it's safe.21:52
dojordanso the reason this matters is if we hit our instance metadata service with a stale IP, it doesn't recognize it and the client will throw an exception21:52
dojordanso we catch it and bounce the nic (on xenial)21:52
cyphermoxso this isn't actually doing any real DHCP?21:53
cyphermoxotherwise you'd have the lease times to tell you when to renew, so you theoretically never have a stale IP21:53
dojordanour leases are infinite21:53
dojordanbut it is doing real DHCP in the sense that we are getting a new IP, DNS, subnet, etc.21:54
dojordanthe reason it could be stale is our platform is moving a vm from one vnet to another21:54
cyphermoxyes, but I mean even if the IP belongs to you forever, a short lease will have you ping the DHCP server periodically and possibly get new information21:54
cyphermoxah21:54
dojordan@smoser, I'm not entirely sure when systemd-networkd is running, but I have confirmed in testing that with my current PR we do successfully hit the instance metadata server in our datasource. So I guess some parts of the networking stack are set up?21:57
smoserdojordan: right.22:01
smoserthats what is confusing to me22:01
smoseri think you are actually ending up "bouncing" the network that wasn't up yet22:01
smoserbut not sure. i'm looking at that.22:02
rharperwhat stage is cloud-init at when it's time to bounce? has cloud-init net already run? is that what we're trying to determine ?22:05
dojordanyeah, this is during _get_data within the AzureDataSource, so if it is in local then network has yet to run22:05
rharperthen it won't yet be up; yeah22:07
smoserright22:07
rharperand then the resume *shouldn't* need to kick DHCP since it never leased anything anyhow; except how are we polling the metadata service to know when it's time to come up again?22:07
smoserright22:08
smoseri think it came up because we bounced it22:08
smoserso it ran 'ifdown; ifup' and it came up, since the ifdown didn't do anything22:09
smoserthats my hypothesis22:09
smoseron xenial22:09
dojordani think you're right22:09
dojordanso theoretically it wouldn't work > xenial as there is no ifup22:10
rharperw.r.t. the systemd path, if we're in the same boat, I think an ip link set down on the interface that needs to bounce may be enough for systemd-networkd to allow a restart to kick DHCP again22:10
rharperthat's something testable22:10
dojordanyes i am actually testing that later this week or early next22:11
dojordanwaiting for the networking team...22:11
dojordanbut they claimed changing the link state of the interface (removing and re adding the nic from our hypervisor) didn't trigger dhcp in the vm.22:11
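rharper's idea expressed as commands, as a minimal sketch: take the link down and bring it back up with iproute2, then watch whether systemd-networkd starts a new DHCP exchange. Whether it actually does is the open question (dojordan's networking team saw no DHCP when the NIC was removed and re-added at the hypervisor), and the real commands need root. The helper names are hypothetical, and the runner is injectable so the sequence can be dry-run:

```python
import subprocess
from typing import Callable, List, Sequence


def link_bounce_commands(ifname: str) -> List[List[str]]:
    """iproute2 commands that take a link down and bring it back up."""
    return [
        ["ip", "link", "set", "dev", ifname, "down"],
        ["ip", "link", "set", "dev", ifname, "up"],
    ]


def _run_checked(argv: Sequence[str]) -> None:
    # Raises CalledProcessError if ip(8) fails (e.g. when not run as root).
    subprocess.run(list(argv), check=True)


def bounce_link(ifname: str,
                run: Callable[[Sequence[str]], None] = _run_checked) -> None:
    """Run the down/up pair; pass a different `run` to dry-run or log."""
    for argv in link_bounce_commands(ifname):
        run(argv)
```

`bounce_link("eth0", run=print)` only prints the two commands, which is a safe way to check the sequence before trying it on a test VM.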
smoserdojordan: since we're running in the local timeframe, we don't have to "bounce"22:12
smoserand we can use something more like the ec2 datasource22:12
dojordansounds good22:12
rharperdojordan: that makes sense as it never DHCP'd in the first place;22:12
smoserwhich does a dhclient, gets its network info, then we can use the 'EphemeralIPV4Network' context manager22:12
smoserto hit the datasource, and timeout and do it all again22:12
dojordanbeauty22:13
dojordani like it22:13
rharperbut what remains is, while it's sleeping, how can it know when it's time to wake-up  if we don't have a network interface up to poll a URL ?22:13
dojordanwe bring up a temporary one22:13
smoserrharper: right now, if you're looking at his branch, it runs22:13
smoserself.bounce_network_with_azure_hostname22:13
smoserin _reprovision22:13
smoserand that does22:13
dojordan(just to confirm) the ec2 ephemeralipv4network hits the dhcp server?22:13
smoser sh -c 'ifdown eth0; ifup eth0'22:13
smoserwhich ends up bringing it up the first time through22:13
smoserand relying on "stale" networking configuration to do so22:14
smoserdojordan: look at the ec2 code, you should be able to see.22:14
rharperbut, the ifup eth0 isn't sufficient in local; there's no eth0 network config to indicate it needs to DHCP22:14
rharperat least in > xenial22:14
rharperor at least I'm not sure we write out the network config before it starts "sleep waiting"22:15
smoserthis hunk in _get_data22:15
smoserhttp://paste.ubuntu.com/26368525/22:15
smoserwe'd just do22:15
rharperoh, EC2 does, but not in Azure at this time22:16
smoserwith net.EphemeralIpv4Network(**net_params):22:16
smoserright22:16
smoserso i'm saying something like:22:16
smoserwhile True:22:16
smoser    dhcp_leases = dhcp.maybe_perform_dhcp_discovery(self.fallback_interface)22:17
smoser    if not dhcp_leases:22:17
smoser        something_bad22:17
smoser    with net.Ephem....22:17
smoser        hit MD service22:17
smoser        on happy path break22:17
smoserthat is terrible, i realize, but i think it maybe explains it.22:18
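A runnable transcription of smoser's pseudocode, as a sketch only: `discover`, `ephemeral_network` and `fetch` are hypothetical stand-ins for dhcp.maybe_perform_dhcp_discovery(self.fallback_interface), net.EphemeralIPv4Network(**net_params) and the metadata request, and `NoDhcpLease` stands in for "something_bad". It assumes a failed request raises OSError (urllib's URLError is a subclass):

```python
from typing import Callable, ContextManager, Dict, List, TypeVar

T = TypeVar("T")
Lease = Dict[str, str]


class NoDhcpLease(Exception):
    """Hypothetical stand-in for "something_bad" in the pseudocode."""


def poll_metadata(discover: Callable[[], List[Lease]],
                  ephemeral_network: Callable[[Lease], ContextManager[None]],
                  fetch: Callable[[], T]) -> T:
    while True:
        leases = discover()
        if not leases:
            raise NoDhcpLease("DHCP discovery returned no leases")
        # Temporarily configure the newest lease, try the metadata service,
        # and let the context manager tear the address down on the way out.
        with ephemeral_network(leases[-1]):
            try:
                return fetch()  # happy path: leaves the loop
            except OSError:
                pass  # e.g. stale address rejected: run DHCP again
```

Returning from inside the `with` still runs the context manager's exit, so the temporary address is removed on both the happy path and the retry path.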
rharperright; the net.Eph dhcp stuff: how did we work around requiring dhclient? or do we keep that present for Artful + Bionic?22:18
smoserwe require dhclient22:19
blackboxswnazarewk: I believe cloud-init originated here, with these folks, originally as a Canonical internal product that garnered broad adoption and was picked up and supported by other OSes and clouds. :) https://www.podcastinit.com/cloud-init-with-scott-moser-episode-126/22:20
blackboxsw:)22:20
blackboxswsmoser: per the suggestion about the poll_imds looping.... right, we should be able to call maybe_perform_dhcp_discovery, which attempts dhclient queries using the fallback nic and returns an empty list if dhclient doesn't exist or can't get an IP address22:25
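The contract blackboxsw describes (an empty list rather than an exception when dhclient is missing or gets no address) is what lets a polling loop treat "no leases" as a single case. A hedged sketch of that guard, where `run_dhclient` is a hypothetical callable that runs the client and parses its leases; this is not cloud-init's implementation:

```python
import shutil
from typing import Callable, Dict, List, Optional

Lease = Dict[str, str]


def discover_or_empty(interface: str,
                      run_dhclient: Callable[[str, str], List[Lease]],
                      which: Callable[[str], Optional[str]] = shutil.which,
                      ) -> List[Lease]:
    """Return parsed leases, or [] if dhclient is absent or fails."""
    dhclient = which("dhclient")
    if dhclient is None:
        return []  # e.g. an image without isc-dhcp-client installed
    try:
        return run_dhclient(dhclient, interface) or []
    except OSError:
        return []  # could not run the client or read its lease file
```

Injecting `which` keeps the "dhclient not installed" branch testable on a machine that does have it.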
blackboxswyour pastebin explains what we do currently in ec2, which could apply here in azure too https://paste.ubuntu.com/26368525/22:26
smoseri hope dojordan follows. i think blackboxsw and rharper do, but i have to run.22:27
smoseri'll look in tomorrow on it a bit dojordan22:27
blackboxswdojordan: (just to confirm) the ec2 ephemeralipv4network hits the dhcp server?.... Ephemeral ipv4 network uses the response for maybe_perform_dhcp_discovery22:27
blackboxswyou provide it with the params (interface, ip, prefix_or_mask, broadcast and router), all of which maybe_perform_dhcp_discovery returns22:30
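A hedged sketch of that mapping, assuming each lease is a dict keyed by dhclient's lease-file option names (fixed-address, subnet-mask, broadcast-address, routers). The helper is hypothetical; only the five target parameter names come from the conversation:

```python
from typing import Dict, Optional

Lease = Dict[str, str]


def lease_to_net_params(lease: Lease,
                        fallback_interface: Optional[str] = None
                        ) -> Dict[str, Optional[str]]:
    """Translate one dhclient lease into EphemeralIPv4Network-style kwargs."""
    return {
        "interface": lease.get("interface", fallback_interface),
        "ip": lease.get("fixed-address"),
        "prefix_or_mask": lease.get("subnet-mask"),
        "broadcast": lease.get("broadcast-address"),
        "router": lease.get("routers"),  # may hold several comma-separated routers
    }
```

A caller would then unpack the result as keyword arguments, in the spirit of `with net.EphemeralIPv4Network(**net_params):`.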
blackboxswthe specific use (which could be nearly the same for Azure) is here https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceEc2.py?id=78372f16d2711812793196aa8003ad51693ca472#n10522:33
blackboxswthough within the with net.EphemeralIPv4Network context you could do your polling of IMDS22:34
blackboxswas that EphemeralIPv4Network context manager serves only to temporarily bring up a network interface to allow you to hit an external URL. Then it tears that interface back down22:35
blackboxswso it basically performs whatever static network setup is required (including routes) on a given interface  for your context and then any setup it needed to perform it tears down upon __exit__22:37
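The set-up/tear-down pattern blackboxsw describes can be sketched with contextlib. This is a hypothetical re-implementation of the idea, not cloud-init's code: it records each step it configured and undoes exactly those steps in reverse in a `finally` clause, so teardown also happens when the body raises. Commands go through a required `run` callable, which needs root if it really invokes ip(8):

```python
import ipaddress
from contextlib import contextmanager
from typing import Callable, Iterator, List, Optional, Sequence


@contextmanager
def ephemeral_ipv4(interface: str, ip: str, prefix_or_mask: str,
                   broadcast: str, router: Optional[str] = None, *,
                   run: Callable[[Sequence[str]], object]) -> Iterator[None]:
    # Accepts "24" or "255.255.255.0" and normalises to "10.0.0.4/24".
    cidr = str(ipaddress.IPv4Interface(f"{ip}/{prefix_or_mask}"))
    undo: List[List[str]] = []  # teardown commands for steps that succeeded
    try:
        run(["ip", "-4", "addr", "add", cidr, "broadcast", broadcast, "dev", interface])
        undo.append(["ip", "-4", "addr", "del", cidr, "dev", interface])
        if router:
            run(["ip", "-4", "route", "add", "default", "via", router, "dev", interface])
            undo.append(["ip", "-4", "route", "del", "default", "via", router, "dev", interface])
        yield
    finally:
        for argv in reversed(undo):
            run(argv)
```

Passing `run=print` gives a dry run that shows the add, route and matching delete commands around the body of the `with` block.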
dojordanyeah I will play around with it this afternoon, I might make a modification to add retry support though22:39
blackboxswI'm +1 on retries, I like those more than while Trues :)22:39
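For the retry support dojordan mentions, a bounded helper with exponential back-off could replace the while True. `retry` and its parameters are hypothetical, and `sleep` is injectable so the behaviour can be tested without waiting. Wrapping one DHCP-plus-metadata attempt in it caps how long the datasource keeps trying:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T], attempts: int = 5, delay: float = 1.0,
          backoff: float = 2.0,
          retry_on: Tuple[Type[BaseException], ...] = (OSError,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func until it succeeds or `attempts` runs out, then re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")
```

With the defaults, a call that keeps failing sleeps 1, 2, 4 and 8 seconds between its five attempts before giving up, instead of looping forever.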
dojordanhaha yeah a little scary typing that22:51
