=== vrubiolo1 is now known as vrubiolo
[15:05] hi
[15:07] I'm trying to write to root home directory ~/ from userdata (/bin/sh) in ec2 instance(AWS), but I'm getting error because directory not exists, I found that HOME env variable is empty
[15:07] any idea?
[15:10] shaykeren: Instead of using ~, you could just use /root/?
[15:20] sure, but I would like to know why it is working when my userdata runs with #!/bin/bash and also the HOME env var is not empty in that case,
[15:21] why is it behave differently?
[15:22] sounds like the issue with shell, not cloud init. is it /bin/sh from busybox or some other trimmed shell?
[15:24] im using ubuntu ami
[15:25] shaykeren: Can you pastebin your userdata, please?
[15:25] they may be using dash, who knows how that behaves
[15:28] wait..
[15:35] userdata:
[15:35] #!/bin/shecho test>~/testecho $HOMEecho "[cs18][$(date +"%T.%3N")] start of chronos user-data";echo "[cs18][$(date +"%T.%3N")] end of chronos user-data";
[15:36] cloud-init-output:
[15:36] Cloud-init v. 18.3-9-g2e62cb8a-0ubuntu1~16.04.2 running 'modules:config' at Thu, 12 Mar 2020 15:34:10 +0000. Up 25.08 seconds./var/lib/cloud/instance/scripts/part-001: 2: /var/lib/cloud/instance/scripts/part-001: cannot create ~/test: Directory nonexistent[cs18][15:34:12.364] start of chronos user-data[cs18][15:34:12.366] end of chronos user-data
[15:39] I also saw that in case the userdata has #!/bin/bash it is executed as /bin/bash [part-001 path] and incase of #!/bin/sh it is executed as /bin/bash[part-001 path]
[15:40] I also figured that if Ill run this script manually (inside the ec2 instance) /bin/sh[part-001 path] it is running ok without any issue
[15:43] shaykeren: Sorry, when I say pastebin I mean something like https://paste.ubuntu.com/. Could you paste that all there and then post the link in here?
[15:43] (It's much easier to figure out what's going on when newlines are intact.
:)
[15:46] sure my fault
[15:46] https://paste.ubuntu.com/p/Sxrf9727n7/
[15:48] cloud-init-output logs: https://paste.ubuntu.com/p/37sK5ggYTb/
[15:49] shaykeren: OK, so that's the failing case, right? What does the passing case look like?
[15:50] changing the #!/bin/sh to #!/bin/bash in the userdata
[16:03] shaykeren: And what does the output look like, if you could paste that too?
[16:04] ok let me run it with /bin/bash
[16:12] cloud-init-output - https://paste.ubuntu.com/p/TbdhvB7gkb/ , file was created under home directory of the root user
[16:13] userdata - https://paste.ubuntu.com/p/jV9WBTGBGM/
[16:14] in both cases the HOME env variable was empty but in case of #!/bin/bash the file created
[16:17] shaykeren: So I'm not sure why there's a difference in behaviour there, but I don't believe that cloud-init treats the two files differently. So you're probably seeing differences in behaviour between bash and dash with the environment that cloud-init executes them in.
[16:19] shaykeren: If you want to dig into this more, then please file a bug at https://bugs.launchpad.net/cloud-init/+filebug after reproducing on a more recent version of cloud-init (using the latest image in EC2 would do the trick).
[16:19] And attach the output of `cloud-init collect-logs` on the /bin/sh instance, too.
[16:19] it is weird because if I run it manually it is run with no issue
[16:20] Yeah, it'll be something to do with the execution environment that cloud-init uses.
[16:20] If you run it from a logged-in shell then there'll be a lot more environment variables around, for example.
[16:21] And cloud-init also runs early enough in boot that some stuff may not yet be on disk, which can sometimes affect behaviour. (I doubt it in this case, but you never know.)
[16:21] ok ill collect the logs and upload it
[16:22] Thanks!
[16:22] :)
[17:42] Odd_Bloke: this is a very thoughtful concern you have about ordering https://github.com/canonical/cloud-init/pull/114#discussion_r391222054 .
What I think this means is that cloud-init in network_config only (always) adds route information to the first static IP address listed on an interface. Which means that we could currently be adding a normalized route to an IPv6 address (internal net_config subnet) xenial
[17:42] and on an IPV4 interface on Bionic+ due to python key dict ordering difference right?
[17:42] sorry that's a weighty question out of left field I realize
[17:43] I'm trying to wrap my head around where/if this is a bug/shortcoming in existing cloud-init net_config translation for interfaces with multiple static addresses on a single interface
[17:51] ahh I see now. that for loop on routes is actually adding all routes ipv4 and ipv6 to the first additional static subnet that we configure on an interface.
[17:54] testsimple_render_bond_v2_input_netplan seems to be the only test that exercises this route adding
[17:57] ahh and ok I get it. we are ok on ordering changes because cloud-init renders netplan with routes on the interface, not the specific address to which we are attaching all those routes.
[17:59] here, https://github.com/canonical/cloud-init/blob/master/tests/unittests/test_net.py#L2061-L2099
[18:01] ok routes in initial yaml-v2, get assigned to first IPv4 or ipv6 in internal network_confing addresses list, and then gets bubbled up to netplan config output under network.bonds.bond0.routes instead of hanging off under network.bonds.bond0.addresses[0] as our internal code may have implied. I'll add a comment about this in the code so we don't have to dig into this again next time
[18:02] blackboxsw, https://github.com/canonical/cloud-init/pull/62 you can merge the patch. It's all good here.
[18:05] excellent Goneri, will you take a followup work item PR for *BSD to sort the package_command('upgrade') call?
[18:06] Yes, I think, but first, I will rebase OpenBSD branch.
[18:07] +1 no rush on that, just checking whether you agree that is something that should eventually be tackled
[18:07] I would also like to take a look at the Ephemeral DHCP thing, it's a bit of a pain in the neck
[18:08] unlike Linux, a BSD image should come with (mostly) zero packages. So it's less critical.
[18:08] the base system is not part of a packaging system
[18:17] Goneri: does the description of your PR now look acceptable? I'll be using that for the squash merge commit message https://github.com/canonical/cloud-init/pull/62
[18:18] if you have any changes/corrections to the PR description text. please update and let me know when you are done reviewing it
[18:19] yeah, it's fine. I've updated the list of the OS used during the tests
[18:20] could you avoid a squash merge? I tried to isolate each patch as much as I could.
[18:20] (well, ignore what I just said, the result is not that great. /me goes hide)
[18:29] heh Goneri project-wise we squash merge the world in cloud-init
[18:29] Roger that.
[18:29] which is why generally we want to keep PRs smaller and more concise
[18:29] ... where possible
[18:31] Yeah, I know the feeling :-)
[18:37] merged Goneri thanks again for that work
[18:46] \o/ I've started the branch 1 year ago + 4 days :-) \o/
[18:47] thanks all, I feel emotional now!
[18:48] heh holy moly
[18:49] should've landed that on the anniversary 4 days ago
[18:49] !
[18:49] it may be worth an update to https://cloudinit.readthedocs.io/en/latest/topics/availability.html#distributions
[18:49] to add NetBSD in there too
[18:51] rharper: Odd_Bloke https://github.com/canonical/cloud-init/pull/114/files#r391823711 reality check for me. I think dropping public-ipv4 check in ec2 still allows us to properly setup dhcp on primary/fallback nic always.
[19:01] blackboxsw: without looking, what's the benefit to dropping it? what sort of LOC or execution time are we saving ?
[19:01] rharper: I don't think there is a really a benefit, probably just a risk
[19:02] yeah one key lookup isn't going to break anything. so we can leave that logic in place
[19:02] or reproduce it in the new v2 config
[19:04] rharper: ec2 also doesn't actually add public IP configuration details to the instance at all. a work item that we should discuss
[19:04] at some point.
[19:05] blackboxsw: please file a bug for that with details, that way it gets on the backlog
[19:05] currently we rely on dhcp and any secondary local-ipv4s or ipv6s. no additional config added for reading ec2's public_ipv4s
[19:06] rharper: will do
[19:06] AFAIK, they only publish public DNS names
[19:06] rharper: https://cloudinit.readthedocs.io/en/latest/topics/instancedata.html#example-output shows an example
[19:06] and this PR 114 shows the updates as well with the new EC2 API version
[19:15] https://bugs.launchpad.net/cloud-init/+bug/1867197
[19:15] Ubuntu bug 1867197 in cloud-init "ec2: add support for rendering public IP addresses in network config" [Undecided,New]
[19:19] blackboxsw, https://github.com/canonical/cloud-init/pull/250
[19:32] Goneri: Congrats on the branch landing! \o/
[19:33] blackboxsw, what's left on your ec2 secondary nics branch?
[19:34] powers working it actively right now. ahh and rharper my question about public-ipv4s handling is moot in this branch as that config in the network_config v1 version of Ec2 only setup dhcp: true if we were fallback_nic, had local-ipv4s or had public-ipv4s metadata values.. in the new v2 network config for Ec2 per PR #114 we are setting dhcp4: true on all nics... I'm looking at trying to reconstitute only adding
[19:34] dhcp4: true if fallback_nic public-ipv4s or local-ipv4s
[19:35] powersj: I think the only thing left it resolving what I dropped regarding public-ipv4s reading from the metadata.
[19:35] as least so we have no risk of regression, even though I believe it doesn't regress anything in it's current state, best to be sure
[19:36] blackboxsw: I don't understand the previous logic; we always want to dhcp4 on primary nic; and optionally add static ips if present
[19:36] IIUC, that logic came before we parsed network config from IMDS, no ?
[19:37] rharper: the release ec2 network_config would have added network-config information that didn't enable dhcp4 at all on a device if nic is not primary and doesn't have public or local ipv4 addrs
[19:37] yes, correct
[19:37] the *released* ec2 network_config
[19:37] yes, I see that
[19:37] the network_config in PR 114, was setting up dhcp4 on all nics
[19:37] ah, and you're fixing that
[19:37] regardless of secondary nic with no local-ipv4 addrs
[19:37] right
[19:38] I think that's a gap in the switch that I left
[19:38] yes
[19:38] so logic should be this for ec2:
[19:38] we dhcp *only* on primary; and for all nics (primary included) if there are secondary ips, add them (v4 or v6)
[19:38] if fallback_nic(primary nic): dhcp4: true
[19:39] if scondary nic: dhcp4 true only if local-ipv4s or public-ipv4s is present
[19:39] wait
[19:39] yep stating case 3 then will wait
[19:39] I've never seen ec2 say you can dhcp on secondary nics
[19:40] 3: if any nic and len(local-ipv4s) > 1 or len(ipv6s) > 1 then add those secondary IPs to nic config
[19:40] rharper: I just tested that on an ec2 instance with 2 nics
[19:41] dhcp on both gets you the proper matching ipv4 addrs and routes that secondary config would have setup statically
[19:41] but maybe that's unsupported?
[19:41] but it is not required
[19:41] don't we already have the private ips ?
[19:41] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-instance-addressing.html
[19:41] says, yes you can dhcp on interfaces with private IP
[19:42] I'm not seeing them say dhcp on secondary interfaces should work; though I believe it did work for you
[19:43] right "We allocate private IPv4 addresses to instances using DHCP"
[19:44] but, what about secondary nics with only public ip addresses (is that even a thing?)
[19:44] public ips are 'elastic ips'
[19:44] I doubt those are allocated via DHCP
[19:44] and I'm not sure you're going to get additional private ips on the same interface
[19:45] if we had nic2, only public-ipv4s, no local-ipv4s, would dhcp work. I think not as there is no private IP allocated to that nic
[19:45] can you confirm that your secondary nic dhcp response included more than one IP ?
[19:45] you always get a private ip (local-ipv4) no ?
[19:45] rharper: yeah will setup one now
[19:45] thats how you communicate nic to nic internally from instance to instance
[19:45] I think we always get a private IP on any attached interface
[19:45] right
[19:45] yes
[19:46] so I think my comment about "public-ipv4s" existing and no local-ipv4s is *not* a thing (not a viable vm network config)
[19:46] so, yes you can DHCP on all nics if we wanted; I'm just wondering if that's useful vs. just assiging static ips
[19:46] I think a prerequisite of having the attached nic is that it *must* have a local-ipv4s addr
[19:47] rharper: right, probably not as useful. we could avoid doing that. though if ec2 instruments custom dhcp options we'd miss out due to our static ip config on secondary nic
[19:47] well, let's see what dhcp response shows up on secondary nics; I suspect it's the same as the primary
[19:47] but if we add dhcp, on all nics ec2 vms also pay the cost of that dhcp roundtrip right
[19:47] sounds good.
will setup the instance now
[19:47] we can check what ec2net-utils does as well
[19:48] and check with Fred
[19:49] so powersj the branch is close, I think we are circling the drain on final implementation. it doesn't involve a whole lot of work either way and I'd like to see it landed if we can today or tomorrow so we can get it into the CPC image pipeline
[19:49] for focal
[19:50] blackboxsw, how do you and rharper close on the remaining work?
[19:50] for this current ec2 branch?
[19:50] yes
[19:51] powersj: it's not going to happen in the short term
[19:51] we need to step back and confirm how we want to do it
[19:51] I think I have to spin up an instance, we need to check dhcp config output and confirm we aren't missing something interesting by using static addresses?
[19:51] let's look at the AmazonLinux package; even see what a multi-nic multi-ip AmazonLinux instance looks like
[19:52] and then, I'd check with Fred to see if that's optimal, or if there are better things to do; and then we can make a decision
[19:52] now, if we wanted to "land it today" I'd keep it with dhcp on primary, and then *static ips only* if present on all other interfaces present
[19:53] ok fair. so sample configs on multi-nic ubuntu and multi-nic amazonlinux and suggest what's best in email to fred/ahnvo ?
[19:53] blackboxsw: once we enable DHCP on all interfaces, we certainly need to do route-metric again
[19:53] just fred
[19:53] rharper: why do we need route-metric if ec2 isn't using classless routes in dhcp?
[19:53] Ahn doesn't care about Ec2 networking I bet =)
[19:53] if it has a gateway
[19:53] you don't want it to clobber primary route
[19:54] ahh I thought it was only gateway, plus classless static routes in dhcp that caused this concern
[19:54] gateway is a route
[19:54] the default
[19:54] classless are additional routes (which may include a default route as well) in which you ignore the gateway value,
[19:55] we know they don't currently put in a classless static route set; but there may be a GATEWAY= value in which case we still need route metrics to ensure that we don't route packets meant for the internet out of the secondary interfaces
[19:55] ok so short term potential of only enabling dhcp on primary interface and static for all the other nics, would that get us into an upgrade pickle if we went dhcp after discussion with Fred?
[19:56] it may not have a GATEWAY value, but it could show up (accidental or on purpose) so it's best to put a metric on secondary interfaes
[19:56] yeah, unless we render network on every boot
[19:56] which we should discuss, with Fred; I blieve we already do each boot but on Ec2 classic only
[19:59] rharper: I think we also have CPC image magic that invalidates the cache on cloud images.
but can confirm
[19:59] so that'd be rendering network every boot, everywhere but that may also be limited to a specific ubuntu series
[20:01] I don't think so
[20:01] on Azure we added a dropin to rm the obj.pkl
[20:02] only on ec2-classic do we render every boot; due to MAC address on nic changes between stops/stars
[20:02] on vpc, all is fine
[20:02] this reminds me of wanting a table on datasource capabilities (check_instance_id, network_config, update_event tpes)
[20:05] rharper: https://paste.ubuntu.com/p/ZyFKkxPDCB/
[20:06] so I'm on a vpc instance (non-classic) and I see cache invalid for Ec2Local ds detection across simple reboots
[20:06] # Non-VPC (aka Classic) Ec2 instances need to rewrite the
[20:06] # network config file every boot due to MAC address change.
[20:06] if self.is_classic_instance():
[20:06] self.update_events['network'].add(EventType.BOOT)
[20:07] so I don't know what's going on in the image but I do know what code we wrote
[20:07] right agreed on what that code does. there is just some drop in magic at play here I think in ec2 images
[20:07] and we don't persist object.pkl on ec2
[20:07] as it doesn't implement check_instance_id()
[20:08] I've wrapped myself around the axle at the moment on this. first I'll get that multi-nic instance up with PR 114 so we can dissect the dhcp response from networkd
[20:49] blackboxsw: Odd_Bloke: interested in your thoughts on this: https://github.com/canonical/cloud-init/pull/238#issuecomment-598408582
[20:49] when you have time to look
[20:50] rharper: sorry here's dhcp info on dual-nic ec2 vm https://pastebin.ubuntu.com/p/tg2ZhZ3V6Z/
[20:50] ROUTER=172.31.32.1
[20:50] that will put in a default route
[20:50] so, we defintely want a dhcp-route-metric
[20:50] if we go with dhcp on secondary interfaces
[20:51] so metrics required in this case. ok
[20:51] route-metric rather.
[20:51] dhcpX-override: {'route-metric': NNN}'
[20:51] and /me just removed it.
sorry a more concerted discuss was in order yesterday or the day before to make sure I was gong down the right path.
[20:51] hehe
[20:51] blackboxsw: also, on your unittests there was a bunch of mac.lower() after you had capitalized on of the MAC values; what was that about ?
[20:53] rharper: I think that was earlier me making sure we exercised some of the internal logic in cloud-init which I know lower()'s the mac addr we get from IMDS. I should have instead just added a specific unit test that validated uppercase and lowercase macs result in same rendered net config
[20:53] blackboxsw: ah, ok, yeah; less splash damage to other tests
[20:53] yeah, and clear documentation of the intent
[20:54] rharper: ok, so what path do we want to go on for focal for ec2 secondary nics do you think?
[20:54] static addr setup on secondary nics?
[20:54] as it stands currently, it looks like published cloud-init on bionic only configs primary nic even on dual-nic boxes
[20:54] let's look at ec2utils and see what AmazonLinux does; if they dhcp + additional private ips; then I think we do the same
[21:08] hrm, I see a bunch of primary actions (on eth0 only) https://github.com/aws/ec2-net-utils/blob/master/ec2net-functions
[21:08] checking around for stuff handling 2nd nic
[21:10] which calls plug_interface for all interfaces and only activate_primary on each hotplug add
[21:10] https://github.com/aws/ec2-net-utils/blob/master/ec2ifup
[21:11] yeah seems in all cases rewrite_primary gets called which noops on !eth0
[21:11] no, it ignores eht0
[21:12] that's code for all other interfaces
[21:12] no ?
[21:12] so they do dhcp on non-eth0
[21:12] and then ensure the rules for non-eth0 don't clobber eth0 (which Im sure they already dhcp on eth0 )
[21:13] blackboxsw: so, to me, that's equivalent to what we're suggesting now; dhcp on all interfaces, add secondary ips on all interfaces that have them, and ensure non-primary interfaces have a route-metric (which is what they do with the route table bits)
[21:13] hahah right complete misread
[21:14] ok will reconstitute the route-metric bits.
[21:15] RTABLE=${INTERFACE#eth}
[21:15] let RTABLE+=10000
[21:15] right
[21:15] 10000 offset per nic
[21:15] o k
[21:15] so they add route-metric on nic >= 1
[21:15] is there a base route metric I wonder
[21:16] on eth0
[21:16] which is what we do on Azure
[21:16] ah, its not a metric
[21:16] its a different routing table altogether
[21:16] but it has the same effect in that the primary route table is consulted *first* before looking at higher value tables
[21:16] rharper: I was curious if different routing table name/id is equivalent to different metric
[21:16] right
[21:16] ij
[21:16] ok
[21:21] rharper: so plan of attack for cloud-init ec2 multi-nic, multi-ip
[21:21] https://hackmd.io/rBjW9rjPRg6LYydxOgW8cQ?sync=&type=
[21:23] right, we dhcp6 as well if interface has ipv6 right? so the logic is the same for static ipv6 secondary ips as v4
[21:26] right. yes rharper
[21:26] cool
[21:26] dhcp6 only active ipv6s values
[21:27] ok I can correct this branch as I think all it needs to route-metrics at the moment
[21:29] excellent
[21:29] Do we need a doc update to the Ec2 Datasource docs w.r.t network configuration ?
[21:29] if not, I think it would be a good add to describe what we plan to configure in the multi-nic, multi-ip scenarios (v4, v6 and mixed)
[21:30] rharper: can add that to the datasource since we are touching it w.r.t. secondary_nic config option
[21:30] makes sense
[21:31] rharper: so route-metric: 0 for eth0?
[21:31] or route-metric: 100
[21:31] no, we did 100
[21:31] for eth0
[21:31] right ok
[21:31] just wanted to confirm
[21:31] (index + 1) * 100
[21:31] same as azure IIRC
[21:31] agreed
=== ananke_ is now known as ananke
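
Editor's sketch of the plan the channel converged on for EC2 multi-nic, multi-ip: DHCP on every nic, a route-metric of (index + 1) * 100 so the primary nic keeps the preferred default route, dhcp6 only when the nic has ipv6s, and any addresses beyond the first added statically. This is not cloud-init's implementation. The function names, the argument shapes, and the assumption that addresses arrive already in CIDR form are all hypothetical; only the netplan v2 keys (match, dhcp4, dhcp6, dhcp4-overrides, dhcp6-overrides, addresses) are taken as given.

```python
def route_metric(nic_index):
    """Metric from the discussion: (index + 1) * 100, so eth0 gets 100."""
    return (nic_index + 1) * 100


def render_nic(nic_index, mac, local_ipv4s=(), ipv6s=()):
    """Build a netplan-v2-style dict for one nic (hypothetical helper).

    local_ipv4s and ipv6s are assumed to be lists of CIDR strings in
    metadata order; the first entry of each is expected to come from DHCP.
    """
    metric = route_metric(nic_index)
    cfg = {
        # MACs are lowercased so upper/lowercase metadata renders the same.
        "match": {"macaddress": mac.lower()},
        "dhcp4": True,
        "dhcp4-overrides": {"route-metric": metric},
        # dhcp6 only when the interface actually has IPv6 addresses.
        "dhcp6": bool(ipv6s),
    }
    if ipv6s:
        cfg["dhcp6-overrides"] = {"route-metric": metric}
    # Secondary addresses (everything after the first) are configured statically.
    static = list(local_ipv4s[1:]) + list(ipv6s[1:])
    if static:
        cfg["addresses"] = static
    return cfg


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(render_nic(0, "0A:1B:2C:3D:4E:5F", ["172.31.32.10/20"]))
    print(render_nic(1, "0a:1b:2c:3d:4e:60",
                     ["172.31.32.20/20", "172.31.32.21/20"]))
```

With these inputs the second nic gets route-metric 200 and one static address, while the first gets route-metric 100 and no static addresses, which matches the "same as azure" numbering mentioned above.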