kholkina | hi all! I have a question regarding user-data update. I have updated the user-data via openstack nova and now I see the new value when I check it via http://169.254.169.254/2009-04-04/user-data. Also I added [scripts_users, always] in the config file. But it still doesn't work on reboot. When I delete obj.pkl manually and reboot it works fine. But I think that's not the best way | 11:45 |
---|---|---|
niluje | smoser: <3 | 13:47 |
niluje | just noticed you merged the scaleway datasource for xenial in December | 13:48 |
niluje | thanks for the xmas gift :) | 13:48 |
smoser | niluje: \o/ | 14:40 |
niluje | "caribou" joined us on monday :p | 15:10 |
dojordan | @smoser, can you take a look at my latest iteration? I proposed and implemented a solution that works post xenial: https://code.launchpad.net/~dojordan/cloud-init/+git/cloud-init/+merge/334341 | 19:10 |
ajorg | I found what I think is an interesting (bug?) behavior... | 19:18 |
ajorg | If you are using NoCloud (via an ISO attached to a VM) and you remove the ISO, the next time you boot it will decide you're using None instead of NoCloud and it will re-initialize. | 19:19 |
ajorg | I'm not sure how to improve this behavior, but I wonder if using the DMI UUID or serial for both None and NoCloud would fix it. | 19:20 |
ajorg | I suspect similar behavior would be shown if you denied access to the instance metadata on an EC2 instance early enough that cloud-init couldn't read it. | 19:21 |
smoser | ajorg: that is correct. | 19:22 |
smoser | yeah, we could use the dmi uuid (as some other clouds do). and keep that. | 19:22 |
smoser | there is also 'manual_cache_clean' that you can set | 19:23 |
smoser | and then you can remove the disk. | 19:23 |
ajorg | If we use the same ID for both NoCloud and None will it detect that cloud-init has already run even though it has changed clouds? | 19:23 |
ajorg | A note that says "please don't remove the ISO or this will happen" in the docs would also be a good start. | 19:25 |
smoser | blackboxsw: | 20:29 |
smoser | https://git.launchpad.net/~dojordan/cloud-init/commit/?id=7f23e5c4808a9c647cd4d5277625a723a58b132b | 20:29 |
smoser | "on ubuntu release > xenial we rely on systemd-networkd" | 20:29 |
smoser | do you know if that is correct ? | 20:29 |
blackboxsw | reading | 20:31 |
blackboxsw | smoser: for artful and bionic I know that's true... not sure about zesty. I don't recall, but I'll spin one up now | 20:34 |
blackboxsw | I left nearly the same log message on dropping ifupdown Azure stuff in my commit | 20:34 |
blackboxsw | " In artful and bionic ifupdown package is no longer installed in default | 20:34 |
blackboxsw | cloud images. As such, Azure can't use those tools to bounce the network | 20:34 |
blackboxsw | informing DDNS about hostname changes." | 20:34 |
smoser | blackboxsw: i just commented there | 20:35 |
smoser | mwhudson says that you and i are blowing smoke | 20:35 |
blackboxsw | " the data there is new and will get documented more as we go. | 21:09 |
blackboxsw | That function will make its way back to 16.04 in a cloud-init SRU in the next month or so." .... smoser I'll make a card for that. | 21:09 |
blackboxsw | I should have shoved something into RTD when we added instance-data.json, but we can get it in soon | 21:09 |
blackboxsw | I kinda figured we needed to do it with the jinja-template handling too | 21:10 |
dojordan | @blackboxsw unrelated, but do you know where the SendHostname=true flag lives? | 21:11 |
blackboxsw | smoser: yeah you pointed me at his comment there that systemd isn't handling publishing dhcp info on hostname change..... I thought I ran on azure artful and bionic and the update did happen, but I don't know what triggered it. hrm. | 21:12 |
dojordan | I actually can't find that flag anywhere actually being set | 21:12 |
dojordan | Im looking at an artful azure VM now, and I checked /lib/systemd/network, /run/systemd/network, /etc/systemd/network | 21:14 |
blackboxsw | dojordan: cloud-init doesn't set it, I only read the docs on it as default behavior if unset. I can see in systemd during dhcp client config refresh they check for said flag... but I can't confirm whether something sets that up. | 21:15 |
* blackboxsw needs to look at this again it seems. my recall is dusty on what I originally tested/saw on artful and bionic. I'll spin up azure vms for testing now | 21:15 | |
dojordan | Gotcha, ill enable the flag and look for a dhcp request | 21:16 |
dojordan | what's the policy in cloud-init about restarting another service? Manual testing seems to confirm what michael said in bug 1739516 that changing the hostname doesn't appear to actually send a DHCP request | 21:24 |
ubot5 | bug 1739516 in cloud-init "networking comes up before hostname is set" [Medium,Confirmed] https://launchpad.net/bugs/1739516 | 21:24 |
smoser | dojordan: well if we have to bounce it, we can | 21:25 |
blackboxsw | dojordan: so systemd-networkd docs say the SendHostname=true needs to be in the [DHCP] section of the configs. I see that DHCP section in /run/systemd/network/10-netplan-eth0.network.... but I'm guessing we could provide a /etc/systemd/network/11-something.network file containing [DHCP]\nSendHostname=true if needed? (shot in the dark as I'm working off docs at the moment) | 21:25 |
smoser | as we have before | 21:25 |
smoser | it'd be nicer if you could bounce *that* interface | 21:25 |
smoser | rather than restarting the service | 21:25 |
smoser | i'm somewhat concerned about restarting the service during boot, but we will see. | 21:26 |
blackboxsw | that's just me RTFMing though. poking at an azure instance in a min here to confirm | 21:26 |
blackboxsw | smoser: rharper do we know if netplan configs allow you to post arbitrary dhcp config options, or just dhcp4: on dhcp6: on etc | 21:27 |
blackboxsw | s/on/true | 21:27 |
* blackboxsw doesn't see anything that looks like it at http://people.canonical.com/~mtrudel/netplan/ | 21:28 | |
blackboxsw | not sure if that's the 'canonical' source for netplan docs though | 21:28 |
rharper | blackboxsw: netplan does not expose arbitrary dhcp options; I think accept-ra is the exception at this time | 21:28 |
smoser | what was the flag we were interested in ? | 21:29 |
rharper | SendHostname; but I do wonder if we can figure out what's going on before we start exposing these sorts of things in the top-level yaml; | 21:29 |
rharper | is this some sort of dynamic dns update based on hostname from a dhclient request ? or is it doing something else with this specific mechanism | 21:30 |
dojordan | I think the flag is simply whether or not to send the hostname when sending a dhcp request, but not actually sending dhcp every time the hostname is changed | 21:30 |
cyphermox | smoser, rharper: SendHostname is default in systemd. | 21:31 |
rharper | it's just sent when the client attempts to obtain (or renew) a lease IIUC | 21:31 |
cyphermox | (so is UseHostname=, which is meant to use the hostname that DHCP hands out) | 21:32 |
dojordan | gotcha, so we won't be able to rely on that to force a new dhcp | 21:32 |
cyphermox | what is this about? | 21:32 |
dojordan | basically we (azure) need a way to force a dhcp request to acquire a new IP in certain circumstances | 21:33 |
dojordan | we used to use ifdown/ifup but post xenial we no longer have those binaries installed by default | 21:33 |
cyphermox | dojordan: wouldn't the "right way" be to have the DHCP server send a DHCPNAK? | 21:34 |
cyphermox | assuming that works "out of line" | 21:34 |
dojordan | or the other right way would be to use leases less than 2^23 - 1 seconds... but unfortunately in our stack the dhcp server is not aware of when to send the DHCPNAK | 21:37 |
cyphermox | right | 21:37 |
cyphermox | short leases obviate the issue of "we need to force the client to re-configure now", at the cost of more packets happening on the network | 21:38 |
cyphermox | whereas DHCPNAK or doing stuff on the client means there needs to be code that decides it's time to reconfigure outside of the lease expiry time | 21:38 |
cyphermox | is the client or infrastructure more likely to know what set of circumstances the client should do DHCP again? | 21:40 |
dojordan | client | 21:40 |
cyphermox | (my guess is the server should always be authoritative, but I don't know of your use case) | 21:41 |
cyphermox | ah, interesting | 21:41 |
dojordan | i guess we could just run dhclient -r? | 21:41 |
dojordan | i can elaborate on the scenario: | 21:41 |
cyphermox | dojordan: dhclient isn't what's doing DHCP when using systemd-networkd. | 21:41 |
dojordan | oh right... and systemd-network doesn't expose the same api? | 21:41 |
cyphermox | I don't think it exposes that, but I'm not sure :) | 21:42 |
dojordan | thinking outside the box, can we delete the dhcp lease files? does that retrigger it? | 21:43 |
smoser | dojordan: this might become easier (and more easily use dhclient sandboxed) if we get to running in local time. | 21:43 |
dojordan | sorry didn't quite catch that | 21:44 |
nazarewk | hlelo | 21:44 |
smoser | oh. hm. | 21:44 |
nazarewk | *hello | 21:44 |
smoser | wait. | 21:44 |
nazarewk | how does this project relate to CoreOS cloud-init and RancherOS cloud-init? | 21:44 |
smoser | now i'm confused. | 21:45 |
cyphermox | dojordan: I don't think it would trigger it, but you might be able to just restart systemd-networkd without adverse consequences | 21:45 |
smoser | nazarewk: no. coreos is not here. | 21:45 |
* smoser googles rancheros | 21:45 | |
nazarewk | smoser: i'm researching cloud operating systems | 21:45 |
dojordan | that was my thought too, just wanted to make sure it would be safe | 21:45 |
smoser | dojordan: we run azure datasource at local | 21:45 |
smoser | meaning proper system configured networking isnt up yet. | 21:46 |
nazarewk | after going through coreos (deprecated cloud-init), RancherOS (forked from CoreOS cloud-init) i stumbled upon configuring Project Atomic with cloud-init | 21:46 |
smoser | now i'm confused. | 21:46 |
nazarewk | and saw that is some wide standard | 21:46 |
cyphermox | smoser: again? | 21:46 |
cyphermox | ;) | 21:46 |
nazarewk | any idea how those 3 (or 2 i guess) relate to each other? | 21:46 |
smoser | well, project atomic to my knowledge uses this cloud-init | 21:47 |
smoser | (as contributors from redhat have added that support) | 21:47 |
nazarewk | i already know that, i am more interested in how cloud-init came to be | 21:48 |
nazarewk | i can't find any info on where it came from | 21:48 |
nazarewk | ok looks like i found this | 21:49 |
nazarewk | https://github.com/coreos/coreos-cloudinit#configuration-with-cloud-config | 21:49 |
cyphermox | dojordan: so, I guess what you would need to do depends on what the circumstances are in which a client needs to restart DHCP, if it's because new information was received from the datasource, maybe the best is for cloud-init to restart systemd-networkd when it finds out, if systemd-networkd is even running | 21:51 |
cyphermox | OTOH, if it's potentially hours after boot, then some agent would need to do it, if it's safe. | 21:52 |
dojordan | so the reason this matters is if we hit our instance metadata service with a stale IP, it doesn't recognize it and the client will throw an exception | 21:52 |
dojordan | so we catch it and bounce the nic (on xenial) | 21:52 |
cyphermox | so this isn't actually doing any real DHCP? | 21:53 |
cyphermox | otherwise you'd have the lease times to tell you when to renew, so you theoretically never have a stale IP | 21:53 |
dojordan | our leases are infinite | 21:53 |
dojordan | but it is doing real DHCP in the sense we are getting a new ip,dns,subnet, etc | 21:54 |
dojordan | the reason it could be stale is our platform is moving a vm from one vnet to another | 21:54 |
cyphermox | yes, but I mean even if the IP belongs to you forever, a short lease will have you ping the DHCP server periodically and possibly get new information | 21:54 |
cyphermox | ah | 21:54 |
dojordan | @smoser, not entirely sure when systemd-networkd is running, but I have confirmed in testing that in my current PR we do successfully hit the instance metadata server in our data source. So I guess some parts of the networking stack are setup? | 21:57 |
smoser | dojordan: right. | 22:01 |
smoser | thats what is confusing to me | 22:01 |
smoser | i think you are actually ending up "bouncing" the network that wasnt up yet | 22:01 |
smoser | but not sure. i'm looking at that. | 22:02 |
rharper | what stage is cloud-init at when it's time to bounce? has cloud-init net already run? is that what we're trying to determine ? | 22:05 |
dojordan | yeah, this is during _get_data within the AzureDataSource, so if it is in local then network has yet to run | 22:05 |
rharper | then it won't yet be up; yeah | 22:07 |
smoser | right | 22:07 |
rharper | and then the resume *shouldn't* need to kick dhcp since it never leased anything anyhow; except how are we polling metadata service to know when it's time to come up again ? | 22:07 |
smoser | right | 22:08 |
smoser | i think it came up because we bounced it | 22:08 |
smoser | so it ran 'ifdown; ifup' and it came up. since the ifdown didnt do anything | 22:09 |
smoser | thats my hypothesis | 22:09 |
smoser | on xenial | 22:09 |
dojordan | i think you're right | 22:09 |
dojordan | so theoretically it wouldn't work > xenial as there is no ifup | 22:10 |
rharper | w.r.t the systemd path; if we're in the same boat; I think an ip link set down on the interface that needs to bounce may be enough for systemd-networkd to allow a restart to kick DHCP again | 22:10 |
rharper | that's something testable | 22:10 |
dojordan | yes i am actually testing that later this week or early next | 22:11 |
dojordan | waiting for the networking team... | 22:11 |
dojordan | but they claimed changing the link state of the interface (removing and re adding the nic from our hypervisor) didn't trigger dhcp in the vm. | 22:11 |
smoser | dojordan: since we're running at local time frame, we dont have to "bounce" | 22:12 |
smoser | and we can use something more like the ec2 datasource | 22:12 |
dojordan | sounds good | 22:12 |
rharper | dojordan: that makes sense as it never DHCP'd in the first place; | 22:12 |
smoser | which does a dhclient, gets its network info, then we can use the 'EphemeralIPv4Network' context manager | 22:12 |
smoser | to hit the datasource, and timeout and do it all again | 22:12 |
dojordan | beauty | 22:13 |
dojordan | i like it | 22:13 |
rharper | but what remains is, while it's sleeping, how can it know when it's time to wake-up if we don't have a network interface up to poll a URL ? | 22:13 |
dojordan | we bring up a temporary one | 22:13 |
smoser | rharper: right now, if you're looking at his branch it runs | 22:13 |
smoser | self.bounce_network_with_azure_hostname | 22:13 |
smoser | in _reprovision | 22:13 |
smoser | and that does | 22:13 |
dojordan | (just to confirm) the ec2 ephemeralipv4network hits the dhcp server? | 22:13 |
smoser | sh -c 'ifdown eth0; ifup eth0' | 22:13 |
smoser | which ends up bringing it up the first time through | 22:13 |
smoser | and relying on "stale" networking configuration to do so | 22:14 |
smoser | dojordan: look at the ec2 code you should be able to see. | 22:14 |
rharper | but, the ifup eth0 isn't sufficient in local; there's no eth0 network config to indicate it needs to DHCP | 22:14 |
rharper | at least in > xenial | 22:14 |
rharper | or at least I'm not sure we write out the network config for it starts "sleep waiting" | 22:15 |
smoser | this hunk in _get_data | 22:15 |
smoser | http://paste.ubuntu.com/26368525/ | 22:15 |
smoser | we'd just do | 22:15 |
rharper | oh, EC2 does, but not in Azure at this time | 22:16 |
smoser | with net.EphemeralIPv4Network(**net_params): | 22:16 |
smoser | right | 22:16 |
smoser | so i'm saying something like: | 22:16 |
smoser | while True: | 22:16 |
smoser | dhcp_leases = dhcp.maybe_perform_dhcp_discovery(self.fallback_interface) | 22:17 |
smoser | if not dhcp_leases: | 22:17 |
smoser | something_bad | 22:17 |
smoser | with net.Ephem.... | 22:17 |
smoser | hit MD service | 22:17 |
smoser | on happy path break | 22:17 |
smoser | that is terrible i realize, but i think maybe explains it. | 22:18 |
rharper | right; the net.Eph dhcp stuff; how did we work around requiring dhclient ? or do we keep that present for Artful + Bionic ? | 22:18 |
smoser | we require dhclient | 22:19 |
blackboxsw | nazarewk: I believe cloud-init originated here, with these folks, originally a Canonical internal product that garnered broad adoption and was picked up and supported by other OSes and clouds. :) https://www.podcastinit.com/cloud-init-with-scott-moser-episode-126/ | 22:20 |
blackboxsw | :) | 22:20 |
blackboxsw | smoser: per the suggestion about the poll_imds looping.... right we should be able to call maybe_perform_dhcp_discovery which attempts dhclient queries using the fallback nic and returns an empty list if dhclient doesn't exist or can't get an ip address | 22:25 |
blackboxsw | your pastebin explains what we do currently in ec2 which could apply here in azure too https://paste.ubuntu.com/26368525/ | 22:26 |
smoser | i hope dojordan follows. i think blackboxsw and rharper do, but i have to run. | 22:27 |
smoser | i'll look in tomorrow on it a bit dojordan | 22:27 |
blackboxsw | dojordan: (just to confirm) the ec2 ephemeralipv4network hits the dhcp server?.... Ephemeral ipv4 network uses the response for maybe_perform_dhcp_discovery | 22:27 |
blackboxsw | you provide it with a params interface, ip, prefix_or_mask broadcast and router all of which maybe_perform_dhcp_discovery returns | 22:30 |
blackboxsw | you provide it with the params (interface, ip, prefix_or_mask broadcast and router) all of which maybe_perform_dhcp_discovery returns | 22:30 |
blackboxsw | the specific use (which could be nearly the same for Azure) is here https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceEc2.py?id=78372f16d2711812793196aa8003ad51693ca472#n105 | 22:33 |
blackboxsw | though within the with net.EphemeralIPv4Network context you could do your polling of IMDS | 22:34 |
blackboxsw | as that EphemeralIPv4Network context manager serves only to temporarily bring up a network interface to allow you to hit an external URL. Then it tears that interface back down | 22:35 |
blackboxsw | so it basically performs whatever static network setup is required (including routes) on a given interface for your context and then any setup it needed to perform it tears down upon __exit__ | 22:37 |
dojordan | yeah I will play around with it this afternoon, I might make a modification to add retry support though | 22:39 |
blackboxsw | I'm +1 on retries, I like those more than while Trues :) | 22:39 |
dojordan | haha yeah a little scary typing that | 22:51 |
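
The loop sketched in the discussion (DHCP discovery, temporary network, metadata fetch, retry) can be written with a bounded retry count instead of `while True`. This is only a minimal sketch under stated assumptions, not the Azure datasource code: `poll_metadata` and its three callables are hypothetical stand-ins for `dhcp.maybe_perform_dhcp_discovery`, `net.EphemeralIPv4Network` and the IMDS request, injected so the example runs without cloud-init installed. The lease fields, the choice of the last lease, and the use of `OSError` for a failed fetch are also assumptions.

```python
import time
from contextlib import contextmanager


def poll_metadata(discover_leases, ephemeral_network, fetch_metadata,
                  max_attempts=5, delay=0.0):
    """Retry DHCP discovery, temporary network setup and a metadata fetch.

    All three callables are hypothetical stand-ins for the cloud-init
    helpers named in the discussion; none of them is the real API.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        leases = discover_leases()
        if not leases:
            # Discovery returned nothing (no dhclient, or no address).
            last_error = RuntimeError("no DHCP lease obtained")
        else:
            try:
                # The network only exists inside this block and is torn
                # down on exit, whether the fetch succeeds or fails.
                with ephemeral_network(leases[-1]):
                    return fetch_metadata()
            except OSError as exc:
                last_error = exc
        if attempt < max_attempts:
            time.sleep(delay)
    raise RuntimeError(
        "metadata service unreachable after %d attempts" % max_attempts
    ) from last_error


# Fakes that simulate one failed discovery and one failed fetch.
events = []


@contextmanager
def fake_network(lease):
    events.append("up " + lease["interface"])
    try:
        yield
    finally:
        events.append("down " + lease["interface"])


lease_results = iter([[], [{"interface": "eth0"}], [{"interface": "eth0"}]])
fetch_results = iter([OSError("stale address"), {"status": "ready"}])


def fake_discover():
    return next(lease_results)


def fake_fetch():
    outcome = next(fetch_results)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = poll_metadata(fake_discover, fake_network, fake_fetch)
```

Bounding the attempts keeps a boot from hanging forever if the metadata service never answers, and repeating the DHCP discovery on every attempt is what lets the client pick up a new address after the VM has moved.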