[12:40] <SVP93> Hello all, I need help with cloud-init
[12:41] <SVP93> I have a custom application packaged into a deb file, and I am trying to install it during late-commands.
[12:41] <SVP93> Through the console, I see it gets installed. However, after a restart following the successful installation, the installed package is missing.
[12:42] <SVP93> I am trying with the Ubuntu 20.04 live CD ISO image.
[12:43] <SVP93> Can anyone help me solve this custom package installation issue?
[17:11] <akutz> Hi @falcojr, this is Andrew from VMware. I think I figured out my issue re: IRC. I thought y'all were still on Freenode :) 
[17:12] <falcojr> akutz: ooph, yeah...there was a bunch of freenode drama so we moved over here
[17:12] <falcojr> sorry if you've been speaking into the void
[17:13] <akutz> That's fine. I spent an hour and change trying to just get a valid SSO account and never succeeded. I saw the repo pointed to Libera, but I thought it was just a web client for IRC. Then I noticed it was a server in Textual and had the epiphany :) 
[17:15] <akutz> Anyway, I was just checking to see if there is anything else I can or need to do re: https://github.com/canonical/cloud-init/pull/953. I'm not trying to rush anyone (Chad specifically), but I also wanted to make sure there was not anything I could be doing async. Thanks!
[18:10] <falcojr> akutz: not at this time no. I think we just need to get the re-review done
[18:10] <akutz> falcojr: Ack, thanks.
[18:11] <akutz> Need any help with that? I think it looks pretty good... ;) Seriously, thank y'all again for making your way through such a large PR as quickly as you have.
[18:12] <falcojr> you're welcome...it'll be great to have the datasource in cloud-init proper
[18:14] <akutz> As I said in the PR, it will be great to stop seeing the same issue in my repo every other week, "When will this be in CI proper?" :) 
[18:14] <akutz> I think it will also be nice to align all existing VMware subplatforms under a single DS instead of under the OVF DS. 
[18:15] <akutz> That will thankfully make it easy for those DSs to inherit the same post-transport features of the GuestInfo DS.
[18:15] <falcojr> yeah, definitely
[18:16] <akutz> falcojr: FWIW, I added a deprecation notice to https://github.com/vmware/cloud-init-vmware-guestinfo last week since it seems like we're on a fairly consistent pace to merging this DS. 
[18:17] <falcojr> good idea
[18:18] <akutz> It won't be a huge issue until there's a CI release with the new DS, but at the very minimum people could start pulling the files from the CI repo proper.
[18:19] <blackboxsw> akutz: thanks for finding us again. I'm glad to see you join here so we can fast track any back and forth conversations on landing this. I'm reviewing your latest today
[18:20] <akutz> blackboxsw: Heh, that's not why I joined, no, of course not... :) Look, it's ridiculous for me to expect y'all to merge this quickly if I won't even *try* to meet you half-way and use your preferred approach to async-communication outside of GitHub.
[18:22] <akutz> The fact is, I've been one of the largest proponents of cloud-init at VMware for three years since I joined in 2018. The problem is my interests and my responsibilities only align on occasion. My hope is that with this DS in CI proper we will have more opportunities to invest more resources (including myself) into CI.
[18:23] <akutz> Earlier this year I was named as the Solutions Architect for VM Service (https://core.vmware.com/blog/introducing-virtual-machine-provisioning-kubernetes-vm-service), and I've been exploring other platforms and how they align on specific guest customization engines. I helped create https://github.com/kubernetes-sigs/image-builder, and it was based on cloud-init. You can connect the dots :)
[18:25] <akutz> I've been in talks with the team responsible for Cloud-Init Prep at VMware (the Guest Tools team), and they're excited about DataSourceVMware, and I've rallied an effort with them to relocate their recent work in the OVF DS to the new VMware DS. So a *huge* thanks to you for pointing out the sub-platform bit to me!
[18:25] <blackboxsw> totally understood. this is a great improvement over the OVF datasource usage. My comments are more nits and paper cuts that have been a bit of an annoyance in OVF maintenance. Your approach in this VMware datasource is way simpler and much easier to comprehend/maintain.
[18:27] <blackboxsw> and I definitely recognize your efforts there, and was surprised/pleased that those efforts have come to a PR for this. Every time I opened the OVF datasource for previous merges, I would know the review would take a lot more time than other DSes because of re-learning and wading through a sea of very long methods which are hard to follow
[18:28] <akutz> Oh, my goals re: the OVF DS were not at all the result of anything you said or did. I had taken a more passive approach to merging this DS into CI proper over the last 18 months because of the work on CI Prep and wanting to respect that there was a team dedicated to this at VMware. However, with continued calls to merge the GuestInfo DS, and realizing there was no "DataSourceVMware", it occurred to me there was a very simple path to merging the GuestInfo DS
[18:28] <akutz>  while maintaining support for both transport types. That's when I engaged the CI Prep team and we all figured out this path forward.
[18:28] <blackboxsw> Most of the lift for VMware-related "cost" in reviews has also typically come from lack of access to VMware products to validate, test, and re-educate ourselves about the deploy lifecycle
[18:28] <akutz> > Most of the lift for VMware-related "cost" in reviews has also typically come from lack of access to VMware products to validate, test, and re-educate ourselves about the deploy lifecycle
[18:28] <akutz> Totally fair. That's why I love that it's possible to test/validate the GuestInfo transport with Fusion on the desktop :) 
[18:29] <blackboxsw> +1 definitely
[18:29] <akutz> And huge shoutout to falcojr for turning me onto the @mockpatcher magic. That was amazing.
[18:29] <akutz> I made a lot of changes to the DS for merge, and with the new unit tests I have immense confidence that everything will continue to work as expected. 
[18:40] <akutz> re: the OVF DS -- I cannot speak to why it was designed that way. My guess would be to keep the actual DS free of as much platform specifics as possible. But as you said, it makes chasing things down a PITA. Still, it *should* make it that much easier to move things into the new DS. Then, over time, move them into the DS directly as well to make walking the code easier. Still, with that much code, the walk is always going to be long. I think I will
[18:40] <akutz>  walk it myself and create an ASCII/PlantUML map for future explorers :)
[20:53] <akutz> blackboxsw mentioned he liked a comment I left in response to one of his reviews at https://github.com/canonical/cloud-init/pull/953/files#r682669808.
[20:53] <akutz> > FWIW, I've explained to others that I view CI and its datasources like an NES or SNES. CI is the console, and the datasources are the cartridges. Some datasources, like some cartridges, have extra functionality, i.e. specialized chips, that enhance the overall user experience. With respect to cloud-init, I always assumed this was on purpose.
[20:53] <akutz> Perhaps my impression is mistaken? I thought this would be a good discussion to have and am interested in what others think. Thanks!
[20:57] <blackboxsw> I think we generally get into that mode where each datasource/cloud-platform can own its own implementation and begins to define features/extensions that are specific to their needs.
[20:59] <blackboxsw> Ultimately there is also a tension: we try to generalize any "goodness" or feature improvements that are broadly applicable as best we can, to make sure 1. we don't have to maintain 30 different copies of slightly different functionality in unique datasources 2. we improve the features and stability of cloud-init globally where possible in "the console" or shared utility modules to garner more adoption and more contributions
[21:03] <akutz> Pretty much what I was thinking as well. Datasources are a good way to expose new functionality and limit that exposure at first. But when there is cause, it's also desirable to move the "goodness" into common areas.
[21:05] <blackboxsw> akutz: I'm trying to compare the detection logic you have in ds-identify in your PR vs the python _get_data detection, as generally we want the python version of the code to reflect what ds-identify finds using bash. I note that your shell version detects env vars first, but python _get_data does it in a different order
[21:07] <blackboxsw> I don't **think** it's a problem... but trying to wrap my head around it. if we have env vars and rpctool, we can never detect data from vmware-rpctool, right?
[21:08] <akutz> blackboxsw: Good catch. I wish we could centralize the detection logic somehow to avoid mistakes like this. Regardless, the reason it's less important in *this* case is because those two are two sides of the same coin. The env var is almost a sub platform of a sub platform. The env var support was to introduce the GuestInfo DS to containers. It's not a "first-class" transport like guest info or OVF. Still, there's no reason to flip the order. I will
[21:08] <akutz>  rebase the PR real quick to update the _get_data function to be the same order as the ds-ident code.
[21:11] <blackboxsw> +1, ok, so with that logic flipped in python we could still override images which contain vmware-rpctool with env vars if needed for some strange reason.
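The detection-order fix being agreed on here can be sketched in a few lines of Python: the env-var transport is consulted first, then vmware-rpctool, mirroring ds-identify. This is only an illustrative sketch, not the PR code; the function name and the VMX_GUESTINFO variable name are assumptions.

```python
import os
import shutil


def get_data_access_method():
    """Return the first transport that can supply data, or None.

    Sketch of the agreed ordering: env vars first (matching ds-identify),
    then the guestinfo transport via vmware-rpctool.
    """
    # 1. Env vars win, so containers (the env-var use case) can override
    #    images that also ship vmware-rpctool.
    if os.environ.get("VMX_GUESTINFO"):
        return "envvar"
    # 2. Otherwise fall back to the guestinfo transport.
    if shutil.which("vmware-rpctool"):
        return "guestinfo"
    return None
```

With this ordering, an image that ships vmware-rpctool can still be steered to the env-var transport by setting the variable, which is the override blackboxsw describes.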
[21:12] <blackboxsw> akutz: also one more thing: it seems ds-identify relies strictly on DI_VIRT == "vmware". we probably need a comparable sanity check in python _get_data with if "vmware" != dmi.read_dmi_data("system-product-name"): return False, right? or do you expect that we still "can" detect the VMware datasource on platforms without system-product-name == "vmware"?
[21:13] <akutz> I'm fixing this now by the way
[21:13] <akutz> re: the order
[21:13] <blackboxsw> +1
[21:14] <akutz> blackboxsw: the only reason I added the DI_VIRT bit in ds-identify was because it was there. I didn't do that when the file was external. I can check to see if I can do that in the _get_data function as well. Still, I thought ds-identify was a little more exhaustive than the other?
[21:15] <akutz> blackboxsw: here's the diff for the order fix - https://gist.github.com/akutz/40825fc02b11d14072706e5c26dc80a4
[21:18] <akutz> blackboxsw: It doesn't appear the Azure DS checks DI_VIRT in DataSourceAzure either? I'm not trying to get around this by pointing to another DS, but simply asking what the preferred approach is here. Shouldn't there be a common call to get that info from the DS side?
[21:20] <blackboxsw> akutz: no worries, I don't think you're trying to get out of it. I'm trying to recall the real reason as my recall on this is fuzzy. ds-identify tries to quickly whittle down the number of potential datasources which "could" be a match on a system to either a single match, or a list of "maybes". If the remaining list of potential DS matches is 1, then the python code will only attempt to source and run the only datasource 
[21:20] <blackboxsw> ds-identify told us about
[21:21] <akutz> Yeah, I'm unfortunately too familiar with the ds-identify logic. It's how I learned about systemd-generators and why cloud-init did not seem to be installed on an Ubuntu host where I was developing the original DS :)
[21:22] <blackboxsw> I believe some systems have ds-identify disabled... but I need to recheck that "fact". If we don't have ds-identify running on boot in those envs, then the python code I believe will try to run through a list of all potential datasources in order, and if we have a discrepancy in that detection logic, environments where ds-identify is not operational **could** identify a different datasource than the python-only environments
[21:22] <akutz> So even the other DS, IBM, that uses DI_VIRT, does not do a one-to-one match with how they check the virt platform in the IBM DS. Seems to me there is not a standard mechanism here. I'd be tempted to just remove the DI_VIRT check from the new DS's check function in ds-identify...
[21:22] <blackboxsw> so in your PRs case (and looks like Azure too). Notice OVF does perform this check in both ds-id and DataSourceOVF._get_data.
[21:23] <blackboxsw> sorry <enter> before I reviewed this comment ^.
[21:23] <akutz> OVF does, yes. And so maybe the new DS will get it for free when some of that code moves over. Still, I'm leaning towards removing it from the current PR because it may impact local testing with Fusion. I haven't verified yet if the platform is correctly set on those systems.
[21:23] <akutz> Doing that now
[21:24] <blackboxsw> +1 on removing it from ds-identify for the time being
[21:24] <akutz> Ack
[21:25] <blackboxsw> right, because if ds-id doesn't exist on some platform, you could "return True" in DataSourceVMware._get_data even if "vmware" != read_dmi_data("system-product-name"). But a ds-id system would say no, I didn't detect VMware as an option
[21:26] <akutz> Yep
[21:27] <blackboxsw> +1 on https://gist.github.com/akutz/40825fc02b11d14072706e5c26dc80a4
[21:28] <akutz> $ sudo systemd-detect-virt
[21:28] <akutz> vmware
[21:28] <akutz> (that's from Fusion)
[21:28] <blackboxsw> excellent, ok so maybe we could remain strict if you want
[21:28] <akutz> so if we do want to put it back at some point in the new dscheck_VMware function, it will still be backwards-compat
[21:29] <akutz> blackboxsw: this is what I have right now. I haven't pushed it - https://gist.github.com/akutz/0e0b2fbaba26e42dc85adbb45030ff12
[21:29] <akutz> I'd rather work with y'all to figure out how to port this same logic into the DS code in a common way. Basically port the ds-identify file's "detect_virt" function into CI python so it's a field set at the base DS level.
[21:30] <blackboxsw> right, sounds like it. as it is now, systems without sudo systemd-detect-virt == "vmware" will spend more time looking for 
[21:30] <blackboxsw>  presence of "vmware_has_rpctool"   but this is a minor time cost (and VMware detection is last in the discovery list too, so no impact to other datasources)
[21:30] <blackboxsw> ...reading
[21:31] <akutz> I'm fine putting the platform check into ds-identify for now and following up with a comparable check in the DS itself at a later date. Then again, as you said, the rpctool check isn't that expensive either, and VMware is last in the list.
[21:34] <akutz> I'm going to put the check in ds-identify and will add it later to the datasource. Does that work for you? I want to make sure we replicate the "detect_virt" function in Python before I add the check into the DS.
[21:34] <akutz> Right now it seems other DSs do their own thing to account for the fact that detect_virt isn't a thing in the Python side.
[21:36] <akutz> Although the current detect_virt logic in ds-identify does rely on systemd being present, unless on FreeBSD
[21:36] <akutz> Hmmm. How about we omit the check for now and figure this out post-merge?
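The detect_virt port floated above could start as a thin wrapper over systemd-detect-virt. A minimal sketch (the function name is assumed, and it inherits the gap for non-systemd distros such as Alpine and Gentoo that is raised later in the log):

```python
import shutil
import subprocess


def detect_virt():
    """Best-effort Python port of ds-identify's detect_virt.

    Returns the virtualization type reported by systemd-detect-virt
    (e.g. "vmware"), or None if it cannot be determined.
    """
    tool = shutil.which("systemd-detect-virt")
    if tool is None:
        # No systemd (and no FreeBSD fallback sketched here), so we
        # simply cannot tell -- the same limitation as the shell version.
        return None
    result = subprocess.run([tool], capture_output=True, text=True)
    # systemd-detect-virt prints "none" and exits non-zero on bare metal.
    out = result.stdout.strip()
    return out if out and out != "none" else None
```

Making this a field on the base DataSource class, as proposed, would let every datasource share one implementation instead of each doing its own thing.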
[21:37] <blackboxsw> akutz: what does this show on Fusion  python3 -c 'from cloudinit.dmi import read_dmi_data; print(read_dmi_data("system-product-name"))'
[21:37] <akutz> VMware7,1
[21:38] <akutz> So VMware is in there.
[21:38] <akutz> Is that field somewhere in the sys-info available to the DSs?
[21:38] <blackboxsw> hrm ok, well that'd break I think then unless we use "vmware" in system_type.lower() or system_type.lower().startswith("vmware")
[21:38] <blackboxsw> the DS can call dmi.read_dmi_data("system-product-name") like OVF currently does for that check
[21:39] <akutz> Ack
[21:39] <akutz> I don't mind doing that check then
[21:39] <akutz> Let me go ahead and add that. 
[21:39] <blackboxsw> here is what I was thinking https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOVF.py#L89
[21:40] <blackboxsw> it's best to be strict, but not completely necessary for now if you want to avoid it until the next iteration. it's just good if ds-id and the python DataSourceVMware behave equivalently.
[21:41] <blackboxsw> so either add something in both, or drop from both probably
[21:43] <blackboxsw> sure enough https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceOVF.py#L99   'vmware' in system_type.lower():
[21:46] <akutz> Yep, that's what I copied
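The DataSourceOVF check being copied reduces to a substring match. A self-contained sketch (the helper name is made up; in the datasource the input would come from dmi.read_dmi_data("system-product-name")):

```python
def platform_reports_vmware(system_type):
    """Substring match on the DMI system-product-name, as in DataSourceOVF.

    Fusion reports "VMware7,1", so strict equality against "vmware" would
    break there; hence lower() plus a substring test.
    """
    if system_type is None:
        return False
    return "vmware" in system_type.lower()
```

This is why the earlier suggestion of `"vmware" != dmi.read_dmi_data("system-product-name")` would have broken on Fusion, while the substring form works for both "VMware7,1" and plain "vmware".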
[21:46] <akutz> I'm trying to figure out what to mock to make sure the tests pass. Do you know if the data is read via a file?
[21:48] <blackboxsw> akutz: a sysfs file, but that'd be hard.
[21:48] <blackboxsw> might want to just @mock.path("dmi.read_dmi_data") .... I'm checking now 
[21:48] <blackboxsw> also https://github.com/canonical/cloud-init/pull/953/files#r685543970
[21:48] <akutz> ah, it's "/sys/class/dmi/id/product_name"
[21:49] <akutz> Shouldn't be hard to mock
[21:49] <blackboxsw> ohhh you'd mock util.load_file or something
[21:49] <blackboxsw> +1. yeah
[21:50] <blackboxsw> also just wondering about the strict version dependency on netifaces. that might be a blocker to us releasing DataSourceVMware  support on earlier Ubuntu series.
[21:50] <blackboxsw> Bionic and Focal 
[21:50] <blackboxsw> python3-netifaces | 0.10.4-0.1build4 | bionic 
[21:50] <blackboxsw> we can cross that bridge if we have to, and limit the VMware datasource to  python3-netifaces | 0.10.9-0.2       | hirsute  and later.
[21:51] <blackboxsw> but I'm hoping the version diff doesn't matter
[21:52] <minimal> so what happens with detect_virt on a non-systemd Linux machine? (e.g. Alpine, which I maintain, or Gentoo)?
[21:52] <akutz> minimal: Today detect-virt depends on systemd or FreeBSD
[21:53] <akutz> But I didn't introduce that -- it's just the way it's written.
[21:53] <akutz> blackboxsw: Earlier versions of netifaces should likely work as well. I just picked the version that was there when I started using it.
[21:54] <akutz> but just like I started dropping all Py2 support, I was only looking forward with dependencies as well.
[21:54] <akutz> (following y'all's lead WRT Python)
[21:54] <akutz> I'll drop the version to 0.10.4 after looking at the release notes from the project
[21:55] <blackboxsw> +1 either way, we can tweak that in a separate PR later too
[21:55] <akutz> Yeah, it looks like we should stick with 0.10.9. At least 0.10.5! https://github.com/al45tair/netifaces/blob/master/CHANGELOG
[21:56] <akutz> In 0.10.5 they fixed "Respect interface priorities when determining default gateway"
[21:56] <akutz> That's a huge thing for the DS
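For context on why that changelog entry matters: the datasource leans on netifaces for default-gateway discovery, roughly like the guarded sketch below (the helper name is illustrative, and the import is made optional since netifaces may not be installed everywhere):

```python
try:
    import netifaces
except ImportError:  # optional dependency in this sketch
    netifaces = None


def get_default_gateway_ipv4():
    """Return the IPv4 default gateway address as a string, or None.

    Uses netifaces.gateways(); before netifaces 0.10.5, interface
    priorities were ignored when picking the default gateway, which is
    the fix referenced above and why the minimum version matters.
    """
    if netifaces is None:
        return None
    default = netifaces.gateways().get("default", {})
    entry = default.get(netifaces.AF_INET)  # (gateway_ip, interface) tuple
    return entry[0] if entry else None
```

On a multi-homed guest, an older netifaces could hand back a gateway from the wrong interface, so pinning at 0.10.5+ (or 0.10.9 as discussed) is the safer floor.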
[21:56] <akutz> I recommend focusing post-merge on a plan to move away from netifaces or how to backport specific things?
[21:56] <blackboxsw> +1 right
[21:56] <akutz> I'm fine with the DS not being in earlier Ubuntu versions for now
[22:16] <akutz> blackboxsw: https://gist.github.com/akutz/d7bc0d5b68dd32451fd2d152bff83f13
[22:16] <akutz> I'm going to go ahead and rebase the PR with the above diff that includes the virt type check for all non-env var transports
[22:20] <akutz> Okay, the PR is rebased with the above patch to now check for the platform in both places.
[22:23] <akutz> blackboxsw: Saw your comment on the diff -- it *should* pass the tests. I've been running them locally ahead of pushing changes. 
[22:23] <akutz> With "make clean_pyc && PYTHONPATH="$(pwd)" python3 -m pytest -v tests/unittests/test_datasource/test_vmware.py tests/unittests/test_ds_identify.py"
[22:26] <blackboxsw> +1 excellent. I just mean I think we can land it if the continuous integration jobs pass on it. via `tox -p auto` I think
[22:26] <blackboxsw> takes into account pyflakes etc
[22:27] <blackboxsw> pylint  pytest  etc.
[22:31] <akutz> I check that stuff locally as well. Well, I run black on it. The only thing it doesn't catch are unnecessary imports. But I check those manually.
[22:32] <akutz> Not saying to not wait on CI. Just saying I expect it to pass :)
[22:32] <akutz> (it's totally going to fail now)
[22:37] <blackboxsw> PR looking good, final integration test running. expect we'll be able to land before morn. have to step away