[10:10] <catphish> not strictly a cloud-init question, but i'm trying to encrypt my cloud-init data with AES, but a clean ubuntu box doesn't seem to have python module "Crypto", i'm not actually very familiar with python, should it have an AES library built in? or am i not going to be able to do this?
[11:14] <catphish> looks like i can solve this by piping the data to the openssl CLI, it's a little painful but effective
[11:14] <catphish> openssl enc -d -base64 -aes256 -K 4e3077305271704834626963785a62785a567a6245436f546353753145457376 -iv 47a8326ba88ae1eed21fa74266f4363a <<< HY6jY3tYIHFCbvnsZnQBCSjUXg9Jw8c5oS6utnW0H03M6u/E6rclSk1iIqN41byS
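For context, the matching encryption side of that command (a sketch; the key and IV here are the throwaway example values from the decrypt line above, and in practice they would be delivered out of band, e.g. via the BIOS) would look like:

```shell
# Hypothetical round trip: encrypt a payload the way the hypervisor side
# might, then decrypt it the way the instance side does above.
KEY=4e3077305271704834626963785a62785a567a6245436f546353753145457376
IV=47a8326ba88ae1eed21fa74266f4363a

# -aes256 is shorthand for aes-256-cbc; -K/-iv take raw hex, so no KDF is used
CIPHERTEXT=$(echo -n 'hello cloud-init' | openssl enc -base64 -aes256 -K "$KEY" -iv "$IV")
PLAINTEXT=$(openssl enc -d -base64 -aes256 -K "$KEY" -iv "$IV" <<< "$CIPHERTEXT")
echo "$PLAINTEXT"
```

Note that raw CBC with a fixed IV provides no integrity; an authenticated mode (or TLS, as discussed below) avoids that class of problem.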
[14:52] <Odd_Bloke> catphish: What cloud-init data are you referring to?  (And how does the symmetric key get into the instance?)
[14:54] <AnhVoMSFT> @Odd_Bloke Can you review this PR when you have a chance? https://github.com/canonical/cloud-init/pull/591 there's a comment from Ryan that needs your feedback. I did test upgrade scenario and found no issue (even when not deleting the obj.pkl)
[15:18] <catphish> Odd_Bloke: my network data source sends data AES256 encrypted, the symmetric key is in the BIOS
[15:23] <catphish> this seemed a better idea than any other method of ensuring that only the correct VM can obtain and use its data
[15:23] <catphish> (that i could think of)
[15:25] <catphish> though another reasonable option is to use the PSK in the BIOS as a password, and use that to request the data over TLS, i was just struggling to work out what CA I'd use
[15:28] <catphish> i was trying to keep it largely free from any external requirements, requiring a public certificate signed by my org's domain name, and then just sending a password from the BIOS over the TLS to request the plain data might be sufficient though
[15:28] <catphish> i'm really just trying to ensure it's maintenance free
[15:32] <catphish> tldr: trying to design a secure way that my VM can trust and fetch cloud-init data from its hypervisor securely, with a minimum of need to hardcode any kind of key into the module
[15:32] <AnhVoMSFT> @rharper do you have a suggestion for how to write the unit test you suggested? Generating the list of interfaces for renaming comes from stages and not the datasource
[15:37] <AnhVoMSFT> stages does reset the distro object that comes into the datasource (this caught me by surprise, which is why i had to move the blacklist instantiation into _get_data instead of the datasource's __init__)
[15:37] <rharper> AnhVoMSFT: stages does make the change, but it invokes methods on ds object
[15:38] <AnhVoMSFT> right, but how do I write the kind of unit test that would basically go through the flow of stages invoking the DS's init, then resetting the distro, then calling _get_data? That would be a more useful test
[15:38] <AnhVoMSFT> because later if someone either changes where stages resets the distro object, or moves where DSAzure sets the blacklist, we want the test to fail so that it gets paid attention to
[15:40] <AnhVoMSFT> whatever datasource does to distro in __init__ is completely irrelevant as stages will reset distro to None and supply a new distro instance later
[15:41] <rharper> I'm not sure what reset you mean?   the distro object is created and then passed to the Datasource constructor, on every boot; is that what you meant by reset ?
[15:43] <AnhVoMSFT> https://github.com/canonical/cloud-init/blob/5bf287f430b97860bf746e61b83ff53b834592d0/cloudinit/stages.py#L265
[15:46] <AnhVoMSFT> so the distro object there gets reset to None, and a new object will be created the next time it's accessed
[15:46] <rharper> that resets the stages object; in particular, it's called without reset_ds=True, so it's only modifying the stage's copy of _distro, _paths and _cfg;  the self.datasource is the real DS that's used
[15:46] <AnhVoMSFT> right, but stages.distro is the one that is passed to networking
[15:47] <AnhVoMSFT> so if the datasource is to influence the distro object (by setting the blacklist_drivers), it needs to do it in the right place. It can't assume that the same distro object exists throughout boots
[15:47] <rharper> no, it uses stages.distro, not stages._distro
[15:48] <AnhVoMSFT> stages.distro is a @property that would return stages._distro if instantiated, otherwise it would instantiate a new one, is that not the right assumption?
[15:48] <rharper> AnhVoMSFT: I see your issue;  it makes a new class on each access
[15:48] <rharper> distros.fetch()
[15:49] <AnhVoMSFT> is it possible to write a unit test for such interaction (that would essentially need two boots, I think?)
[15:49] <rharper> so state can't be stored in the instantiated distro ... so how is this working then?  or does the blacklist get reapplied as well? (you moved it to _get_data())
[15:50] <rharper> yeah, you just call stages.Init() just like it's called in cmd/main.py
[15:50] <AnhVoMSFT> the blacklist gets reapplied every time _get_data is called
[15:50] <rharper> right, to deal with the distro getting recreated each stage
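The lazy-property pattern being discussed can be sketched as follows (a simplified illustration, not cloud-init's actual `stages.Init` code): state stashed on the returned object is lost once the cached reference is reset, which is why the datasource must reapply the blacklist on every `_get_data` call.

```python
class Distro:
    """Stand-in for a distro object that a datasource can annotate."""
    def __init__(self):
        self.blacklist_drivers = None

class Init:
    """Simplified sketch of a stages-like holder with a lazy distro property."""
    def __init__(self):
        self._distro = None

    @property
    def distro(self):
        # Instantiate on first access; after a reset to None, the next
        # access builds a brand-new object, discarding any state the
        # datasource stashed on the old one.
        if self._distro is None:
            self._distro = Distro()
        return self._distro

    def reset(self):
        self._distro = None

init = Init()
init.distro.blacklist_drivers = ["mlx4_core"]  # datasource sets state
init.reset()                                   # stages resets its copy
print(init.distro.blacklist_drivers)           # None: the state is gone
```

This is the behavior rharper points at with "it makes a new class on each access": the property itself is cheap, but any instance state must be reapplied after a reset.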
[15:52] <AnhVoMSFT> actually as long as I write a test that would invoke stages' distro.networking.wait_for_physdevs(netcfg) and come out with the right netcfg that should be good
[15:53] <AnhVoMSFT> https://github.com/canonical/cloud-init/blob/5bf287f430b97860bf746e61b83ff53b834592d0/cloudinit/stages.py#L699
[15:53] <rharper> you don't have to use stages; but yes;
[15:53] <rharper> you can create a DS, passing it a distro object;  1) confirm that ds.distro.networking.blacklist_drivers matches what's in DatasourceAzure;
[15:53] <AnhVoMSFT> that makes sense
[15:54] <rharper> and 2)  mock cloudinit.net.get_devices (or whatever the main call is) and verify it's called with the correct blacklist_drivers values
[15:54] <AnhVoMSFT> need to invoke the DS' _get_data, then verify the blacklist
[15:54] <rharper> yep
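A sketch of the kind of test being outlined above. The class names (`FakeDistro`, `FakeDataSource`) and the blacklist value are illustrative stand-ins, not the real cloud-init API; the point is only the flow: call `_get_data`, swap in a fresh distro as stages would, call `_get_data` again, and assert the blacklist survives.

```python
import unittest

BLACKLIST_DRIVERS = ["mlx4_core", "mlx5_core"]  # illustrative value

class FakeNetworking:
    def __init__(self):
        self.blacklist_drivers = None

class FakeDistro:
    def __init__(self):
        self.networking = FakeNetworking()

class FakeDataSource:
    """Stand-in datasource: pushes the blacklist inside _get_data."""
    def __init__(self, distro):
        self.distro = distro

    def _get_data(self):
        # Reapplied on every call, since stages may hand us a fresh distro.
        self.distro.networking.blacklist_drivers = BLACKLIST_DRIVERS
        return True

class TestBlacklist(unittest.TestCase):
    def test_blacklist_reapplied_to_new_distro(self):
        ds = FakeDataSource(FakeDistro())
        ds._get_data()
        self.assertEqual(ds.distro.networking.blacklist_drivers,
                         BLACKLIST_DRIVERS)
        # Simulate stages recreating the distro between stages/boots
        ds.distro = FakeDistro()
        ds._get_data()
        self.assertEqual(ds.distro.networking.blacklist_drivers,
                         BLACKLIST_DRIVERS)
```

The second assertion is the one that would fail if a future change moved the blacklist back into `__init__`, which is exactly the regression AnhVoMSFT wants caught.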
[15:54] <AnhVoMSFT> cool, let me work on that
[15:54] <AnhVoMSFT> thanks Ryan
[15:56] <AnhVoMSFT> would you want me to keep the unit test I wrote for net.get_interfaces, or do we not really need that test at all?
[15:57] <rharper> I don't think we need it; there's an existing blacklist filter test
[15:57] <AnhVoMSFT> ok
[15:58] <Odd_Bloke> catphish: Aha, OK, I'd forgotten you were working on a datasource.  I'm not aware of any other DS doing something like this, so I don't have any particularly useful advice to give.  Using a key from the BIOS sounds like a reasonable path forward; I would suggest shelling out to `openssl` for now, only because that's going to be more readily available than a Python library, I expect.
[16:00] <catphish> Odd_Bloke: that was my conclusion so far, thank you, though i will look into using SSL and doing the encryption at that layer instead, rather than encrypting the data itself
[16:17] <Odd_Bloke> rharper: You are correct that `obj.distro.networking` is not present after upgrade to a cloud-init version where we would expect it to be present; I think making `Distro.networking` a property which instantiates `self.networking_cls` (which is a _class_ attribute, so not stored in the pickle and so read from the class definition in the current code) as needed will do the trick.
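The fix Odd_Bloke describes relies on the fact that class attributes and properties live on the class, not in the pickle, so an old `obj.pkl` picks them up from the new code after upgrade. A minimal illustration (toy classes, not the actual cloud-init `Distro`):

```python
import pickle

class Networking:
    pass

class Distro:
    pass  # "old" version: no networking support at all

blob = pickle.dumps(Distro())  # the obj.pkl written before upgrade

class Distro:  # redefinition simulates the upgraded code
    networking_cls = Networking  # class attribute: never stored in the pickle

    @property
    def networking(self):
        # Lazily instantiate, so even an object unpickled from the old
        # version gets a working .networking after upgrade.
        if getattr(self, "_networking", None) is None:
            self._networking = self.networking_cls()
        return self._networking

# "Upgrade": pickle looks the class up by name at load time, so the old
# blob is rehydrated against the new class definition.
old_obj = pickle.loads(blob)
print(isinstance(old_obj.networking, Networking))  # True
```

Because pickle only stores instance state and resolves the class by name on load, the property and `networking_cls` come from the upgraded class definition, which is why making `Distro.networking` a property closes the post-upgrade gap.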
[16:20] <rharper> Odd_Bloke: ok; even though Azure resets the pkl object on each boot, immediately after upgrade, if users run cloud-init commands which would load the object, I think we'd see issues
[16:23] <Odd_Bloke> Yeah, we'd also see the issue on non-Azure platforms.
[16:23] <AnhVoMSFT> why did I not run into any issue when I tested the upgrade? (I explicitly removed the code that would delete the obj.pkl)
[16:23] <Odd_Bloke> That's a good question.
[16:25] <AnhVoMSFT> is it because _get_data was actually not called at all, so the code wouldn't run?
[16:25] <rharper> you'd need to do something like: cloud-init init  to force it to run again
[16:25] <rharper> we've had users manually run those commands after upgrade (which isn't needed, but shouldn't break things)
[16:26] <AnhVoMSFT> I did upgrade, then remove the systemd unit responsible for deleting the obj.pkl, then reboot
[16:26] <AnhVoMSFT> so in effect that should have called cloud-init init again. I think when obj.pkl existed, the _get_data method isn't invoked on reboot, so the access to the .networking object was never executed
[16:28] <rharper> yes
[16:30] <Odd_Bloke> A minimal reproducer using LXD: https://paste.ubuntu.com/p/g5Vp62h2g3/
[16:33] <rharper> Odd_Bloke: yeah, AnhVoMSFT your test upgraded from an instance which already had the refactor done;
[16:34] <AnhVoMSFT> ah I see - so technically this issue already existed before my change?
[16:50] <Odd_Bloke> Yeah, I think we have a latent bug there, though IIUC it's unlikely to be triggered in normal usage ATM.
[17:11] <Odd_Bloke> I have a commit locally that should address the missing .networking issue; I'm off on Monday so I'll do a bit more testing on Tuesday then propose it.