catphish | not strictly a cloud-init question, but i'm trying to encrypt my cloud-init data with AES, but a clean ubuntu box doesn't seem to have python module "Crypto", i'm not actually very familiar with python, should it have an AES library built in? or am i not going to be able to do this? | 10:10 |
---|---|---|
catphish | looks like i can solve this by piping the data to the openssl CLI, it's a little painful but effective | 11:14 |
catphish | openssl enc -d -base64 -aes256 -K 4e3077305271704834626963785a62785a567a6245436f546353753145457376 -iv 47a8326ba88ae1eed21fa74266f4363a <<< HY6jY3tYIHFCbvnsZnQBCSjUXg9Jw8c5oS6utnW0H03M6u/E6rclSk1iIqN41byS | 11:14 |
=== vrubiolo1 is now known as vrubiolo | ||
Odd_Bloke | catphish: What cloud-init data are you referring to? (And how does the symmetric key get into the instance?) | 14:52 |
AnhVoMSFT | @Odd_Bloke Can you review this PR when you have a chance? https://github.com/canonical/cloud-init/pull/591 there's a comment from Ryan that needs your feedback. I did test upgrade scenario and found no issue (even when not deleting the obj.pkl) | 14:54 |
catphish | Odd_Bloke: my network data source sends data AES256 encrypted, the symmetric key is in the BIOS | 15:18 |
catphish | this seemed a better idea than any other method of ensuring that only the correct VM can obtain and use its data | 15:23 |
catphish | (that i could think of) | 15:23 |
catphish | though another reasonable option is to use the PSK in the BIOS as a password, and use that to request the data over TLS, i was just struggling to work out what CA I'd use | 15:25 |
catphish | i was trying to keep it largely free from any external requirements, requiring a public certificate signed by my org's domain name, and then just sending a password from the BIOS over the TLS to request the plain data might be sufficient though | 15:28 |
catphish | i'm really just trying to ensure it's maintenance free | 15:28 |
catphish | tldr: trying to design a secure way that my VM can trust and fetch cloud-init data from its hypervisor securely, with a minimum of need to hardcode any kind of key into the module | 15:32 |
AnhVoMSFT | @rharper do you have suggestion for how to write the unit test that you suggested? Generating the list of interfaces for renaming come from stages and not the datasource | 15:34 |
AnhVoMSFT | stages do reset the distro object that comes into datasource (this caught me by surprise, which was why i had to move the blacklist instantiation into _get_data and not in datasource's _init) | 15:37 |
rharper | AnhVoMSFT: stages does make the change, but it invokes methods on ds object | 15:37 |
AnhVoMSFT | right, but how do I make the kind of unit tests that would basically go through the flow of stages invoking DS's init, then reset the distro, then calling _get_data. That would be a more useful test | 15:38 |
AnhVoMSFT | because later if someone either changes stages' resetting the distro object in a different place, or move where the DSAzure is setting the blacklist, we want the test to fail so that it can be paid attention to | 15:38 |
AnhVoMSFT | whatever datasource does to distro in __init__ is completely irrelevant as stages will reset distro to None and supply a new distro instance later | 15:40 |
rharper | I'm not sure what reset you mean? the distro object is created and the passed to the Datasource constructor; on every boot; is that what you meant by reset ? | 15:41 |
AnhVoMSFT | https://github.com/canonical/cloud-init/blob/5bf287f430b97860bf746e61b83ff53b834592d0/cloudinit/stages.py#L265 | 15:43 |
AnhVoMSFT | so the distro object there gets reset to None, a new object will be created when it's accessed again the next time | 15:46 |
rharper | that resets the stages object; in particular, it's called without reset_ds=True, so it's only modifying the stage's copy of _distro _path and _cfg; the self.datasource is the real DS that's used | 15:46 |
AnhVoMSFT | right, but stages.distro is the one that is passed to networking | 15:46 |
AnhVoMSFT | so if the datasource is to influence the distro object (by setting the blacklist_drivers), it needs to do it in the right place. It can't assume that the same distro object exists throughout boots | 15:47 |
rharper | no, it uses stages.distro, not stages._distro | 15:47 |
AnhVoMSFT | stages.distro is a @property that would return stages._distro if instantiated, otherwise it would create instantiate a new one, is that not the right assumption? | 15:48 |
rharper | AnhVoMSFT: I see your issue; it makes a new class on each access | 15:48 |
rharper | distros.fetch() | 15:48 |
AnhVoMSFT | is it possible to write a unit test for such interaction (that would essentially need two boots, I think?) | 15:49 |
rharper | so state can;t be stored in the instantiated distro ... so how is this working then ? or does the blacklist get reapplied as well (you moved it to _get_ds() | 15:49 |
rharper | yeah, you just call stages.Init() just like it's called in cmd/main.py | 15:50 |
AnhVoMSFT | the blacklist gets reapplied every time _get_data is called | 15:50 |
rharper | right, to deal with the distro getting recreated each stage | 15:50 |
AnhVoMSFT | actually as long as I write a test that would invoke stages' distro.networking.wait_for_physdevs(netcfg) and come out with the right netcfg that should be good | 15:52 |
AnhVoMSFT | https://github.com/canonical/cloud-init/blob/5bf287f430b97860bf746e61b83ff53b834592d0/cloudinit/stages.py#L699 | 15:53 |
rharper | you don't ahve to use stages; but yes; | 15:53 |
rharper | you can create a DS, passing it a distro object; 1) confirm that ds.distro.networking.blacklist_drivers matches what's in DatasourceAzure; | 15:53 |
AnhVoMSFT | that makes sense | 15:53 |
rharper | and 2) mocking cloudinit.net.get_devices (or whatever the main call) is called with the correct blacklist_drivers values | 15:54 |
AnhVoMSFT | need to invoke the DS' _get_data, then verify the blacklist | 15:54 |
rharper | yep | 15:54 |
AnhVoMSFT | cool, let me work on that | 15:54 |
AnhVoMSFT | thanks Ryan | 15:54 |
AnhVoMSFT | would you want me to keep the unit test I wrote for net.get_interfaces or we don't really need that test at all | 15:56 |
rharper | I don't think we need it; there's an existing blacklist filter test | 15:57 |
AnhVoMSFT | ok | 15:57 |
Odd_Bloke | catphish: Aha, OK, I'd forgotten you were working on a datasource. I'm not aware of any other DS doing something like this, so I don't have any particularly useful advice to give. Using a key from the BIOS sounds like a reasonable path forward; I would suggest shelling out to `openssl` for now, only because that's going to be more readily available than a Python library, I expect. | 15:58 |
catphish | Odd_Bloke: that was my conclusion so far, thank you, though i will look into using SSL and doing the encryption at that layer instead, rather than encrypting the data itself | 16:00 |
Odd_Bloke | rharper: You are correct that `obj.distro.networking` is not present after upgrade to a cloud-init version where we would expect it to be present; I think making `Distro.networking` a property which instantiates `self.networking_cls` (which is a _class_ attribute, so not stored in the pickle and so read from the class definition in the current code) as needed will do the trick. | 16:17 |
rharper | Odd_Bloke: ok; even though azure resets the pkl object on each boot; immediately after upgrade; if users run cloud-init commands which would load the object; I think we'd see issues | 16:20 |
Odd_Bloke | Yeah, we'd also see the issue on non-Azure platforms. | 16:23 |
AnhVoMSFT | why did I not run into any issue when I tested upgrade (I explicitly removed the code that would delete the obj.pkl) | 16:23 |
Odd_Bloke | That's a good question. | 16:23 |
AnhVoMSFT | is it because _get_data was actually not called at all, so the code wouldn't run | 16:25 |
rharper | you'd need to do something like: cloud-init init to force it to run again | 16:25 |
rharper | we've had users manually run those commands after upgrade (which isn't needed, but shouldn't break things) | 16:25 |
AnhVoMSFT | I did upgrade, then remove the systemd unit responsible for deleting the obj.pkl, then reboot | 16:26 |
AnhVoMSFT | so in effect that should have called cloud-init init again. I think when obj.pkl existed, the _get_data method isnt invoked on reboot, so the access to .networking object was not executed | 16:26 |
rharper | yes | 16:28 |
Odd_Bloke | A minimal reproducer using LXD: https://paste.ubuntu.com/p/g5Vp62h2g3/ | 16:30 |
rharper | Odd_Bloke: yeah, AnhVoMSFT your test upgraded from an instance which already had the refactor done; | 16:33 |
AnhVoMSFT | ah I see - so technically this issue already existed before my change? | 16:34 |
Odd_Bloke | Yeah, I think we have a latent bug there, though IIUC it's unlikely to be triggered in normal usage ATM. | 16:50 |
Odd_Bloke | I have a commit locally that should address the missing .networking issue; I'm off on Monday so I'll do a bit more testing on Tuesday then propose it. | 17:11 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!