[09:38] Merged stackforge/cloud-init: Use an explicit absolute import for importing the logging module https://review.openstack.org/210035
[09:49] claudiupopa: Could you workflow +1 https://review.openstack.org/#/c/202743/ ?
[09:49] Oh, is it short a +2, actually?
[09:50] Is Scott happy with it?
[09:51] claudiupopa: I think so.
[09:51] claudiupopa: And I think we said that I'd push forward with main stuff.
[09:52] Cool.
[09:52] Then I'm happy with it as it is.
[09:52] +1ed
[09:52] For workflow.
[09:52] By the way, could you take a look again at the plugin patch?
[09:52] I don't have tests, but I'd appreciate a comment regarding the direction.
[09:55] Merged stackforge/cloud-init: add cloud-init main https://review.openstack.org/202743
[09:58] claudiupopa: So with parallel discovery, we'd still load the code from the disk serially?
[09:59] Good question. I think it depends on the iterator's flavour.
[09:59] Right now the loading is serial.
[10:02] claudiupopa: Should filtering by name be a strategy?
[10:02] It could be.
[10:03] claudiupopa: We don't actually have anywhere calling get_data_source with a list of strategies yet, right?
[10:04] Yep.
[10:05] claudiupopa: How would a FilterByNamesStrategy be created?
[10:07] Writing an example right now.
[10:07] Thanks!
[10:08] Something like this: http://paste.openstack.org/show/412159/
[10:09] Although _names should be passed somehow to the strategy.
[10:09] Yeah, that was the bit I couldn't quite work out.
[10:09] The strategies could be instantiated, and have a method that does the filtering?
[10:10] You mean a separate method?
[10:10] One for loading the data sources and another one for filtering?
[10:11] Mm, the idea is to combine multiple of them to do the filtering, since trying to see if a data source is available or not is still considered a filtering operation.
[10:12] I could instantiate them beforehand, in get_data_source.
[10:13] And I could pass names only to the FilteringByNameStrategy.
[10:13] So BaseSearchStrategy.__init__ wouldn't take any parameters by default, and search_data_source would become search_data_sources().
[10:13] And you'd pass the return of that in to the next search_data_sources.
[10:14] (Rather than in to the constructor of the next strategy, as you do now.)
[10:14] Oh, that could work.
[10:15] So I think you would instantiate them in get_data_source, yeah.
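A rough sketch of the chaining the discussion above converges on. Only BaseSearchStrategy and FilterByNamesStrategy are named in the conversation; the method bodies, the name-matching rule, and get_data_source's shape are hypothetical illustrations, not the code under review:

    # Hypothetical sketch: each strategy is instantiated beforehand, and
    # the output of one search_data_sources() call is fed to the next
    # (rather than to the next strategy's constructor).
    class BaseSearchStrategy:
        """Base strategy; __init__ takes no parameters by default."""

        def search_data_sources(self, data_sources):
            # Takes an iterable of candidate data sources and returns
            # the ones that survive this strategy's filtering.
            raise NotImplementedError

    class FilterByNamesStrategy(BaseSearchStrategy):
        """Keep only the data sources whose names were requested."""

        def __init__(self, names):
            self._names = set(names)

        def search_data_sources(self, data_sources):
            return [ds for ds in data_sources
                    if ds.__class__.__name__ in self._names]

    def get_data_source(data_sources, strategies):
        # Chain the strategies; availability checks and name filtering
        # are both just filtering operations in this model.
        for strategy in strategies:
            data_sources = strategy.search_data_sources(data_sources)
        return next(iter(data_sources), None)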
[11:22] Hi. How can I run a cloud-init script on an already-installed instance? I've found that I have to trick cloud-init into thinking this is a fresh boot, but I can't work out where I should place my cloud-init file.
[11:23] trueneu: Why do you want to run cloud-init, rather than just running a shell script etc.?
[11:24] It's in a neat cloud-config form, and it failed to execute at boot somehow, so I need to re-do it.
[11:31] Claudiu Popa proposed stackforge/cloud-init: Add an API for loading a data source https://review.openstack.org/209520
[12:02] Odd_Bloke, harlowja, or claudiupopa: your thoughts on my https://code.launchpad.net/~smoser/cloud-init/trunk.reporting/+merge/266578 (0.7) would be appreciated.
[12:07] smoser: Are registry and reporting copy-paste backports from 2.0?
[12:15] smoser: Oh, no, there's a WebHookHandler in there?
[12:15] smoser: Still don't know why you aren't getting stuff into 2.0 so we can do a copy-paste backport.
[12:15] Rather than doing a copy-paste backport, a change, and then a forward-port.
[12:16] Copy & paste + imports + http://bazaar.launchpad.net/~smoser/cloud-init/trunk.reporting/revision/1155
[12:16] And the WebHookHandler.
[12:17] Odd_Bloke, because of the timeline, is all.
[12:20] And now that I think about it, I think that code in that one doesn't work.
[12:21] The goal of the change there is to re-initialize if different.
[12:21] But I think the check there is comparing a dict to a class.
[12:52] Daniel Watkins proposed stackforge/cloud-init: Fix running cloud-init with no arguments on Python 3. https://review.openstack.org/210381
[13:00] smoser: claudiupopa: Minor fix to main. ^
[13:01] Why doesn't parsed have the func attribute?
[13:01] Because it didn't have a subcommand.
[13:01] claudiupopa: It's a bug in Python 3, I think.
[13:02] Maybe you can set_defaults on func to get it to call help?
[13:10] smoser: That works on Python 3, but not on Python 2.
[13:55] smoser: claudiupopa: So that change gives us consistent behaviour on Python 2 and 3.
[13:56] smoser: claudiupopa: Getting Python 2 to do something different would mean pre-empting the parser, because just parsing the arguments is what throws up the error.
[13:57] I see.
[13:57] Then it seems fine to me.
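For context, a minimal sketch of the argparse behaviour being discussed (the parser setup here is illustrative, not cloud-init's actual shell code, and the hasattr guard is one possible fix rather than necessarily the one in the review): on Python 3, subcommands are optional by default, so parse_args() succeeds with no arguments and the resulting namespace simply lacks func; on Python 2, parse_args() itself exits with "too few arguments" before any set_defaults fallback could take effect.

    import argparse

    def run_init(args):
        print('running init')

    parser = argparse.ArgumentParser(prog='cloud-init')
    subparsers = parser.add_subparsers()
    init_parser = subparsers.add_parser('init')
    init_parser.set_defaults(func=run_init)

    args = parser.parse_args([])  # invoked with no subcommand
    # Python 2: parse_args() has already exited with an error by here.
    # Python 3: parse_args() succeeded, but args has no 'func' attribute,
    # so guard the dispatch and fall back to printing help.
    if not hasattr(args, 'func'):
        parser.print_help()
    else:
        args.func(args)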
[14:12] smoser: We have several different stages defined in cloudinit.shell, but I thought we were going to be running cloud-init as an agent (which would, presumably, only involve a single call to cloud-init).
[14:47] smoser: claudiupopa: harlowja: I'm trying to work out how to name things; I'm going to work on persisting a discovered data source to disk (so that future runs don't have to perform discovery). What should I name the data that cloud-init has derived from its environment?
[14:48] It's not metadata, vendor-data or user-data; those are all inputs.
[14:48] Maybe 'configuration', but that would seem more appropriate for the stuff in /etc that defines how cloud-init will run on an instance.
[14:48] Any thoughts?
[14:48] Persisting the data source to disk, as in caching?
[14:49] claudiupopa: So one of the stub commands in cloudinit.shell is 'search', which will 'search available data sources'.
[14:49] OK. Odd_Bloke: sorry, didn't respond before.
[14:49] So the stages... there are still stages that have to run during boot.
[14:50] There might be a daemon that starts very early, and the stages communicate with that daemon. That is a possible implementation.
[14:50] Also possible is that a daemon just starts later.
[14:50] But either way, as far as my vision can see, we'll have upstart or sysvinit or systemd jobs that run at points in boot.
[14:50] That is what those stages are for.
[14:51] I think making it possible to not run a daemon would be good; I can imagine people who are happy with cloud-init as-is not wanting an extra process running.
[14:51] Wrt storing data, I think 'cache' sounds reasonable.
[14:52] You'll never have to run the daemon.
[14:52] Even if it ran in boot, that'd just be an implementation detail.
[14:52] And then it'd shut itself down.
[14:52] But we can worry about that later.
[14:52] I'm not sure it is, strictly speaking, a cache though; some data sources will only be able to fetch information a single time.
[14:52] (For example, CloudStack passwords can only be read once.)
[14:53] So metadata, user-data and vendor-data all represent the same thing: input data that's used to drive cloud-init.
[14:53] How about drive data?
[14:53] Or execution data.
[14:54] Actually, this is basically what would go in /var/lib/cloud/instance ATM; how about 'instance data'?
[14:55] Yep, that sounds good as well.
=== rangerpbzzzz is now known as rangerpb
[15:05] smoser: Thanks for the info on the commands. :)
[15:06] claudiupopa: smoser: So, next question: what do we want the data to look like when serialised on-disk?
[15:09] claudiupopa: smoser: I'm thinking we could persist a dictionary as JSON, but I don't know if we have lessons from 0.7.x that suggest that's a bad idea.
[15:09] Why would it be a bad idea? I was thinking of JSON as well.
[15:12] claudiupopa: Well, that's not how we do it in 0.7.x; I wasn't sure if that was intentional or not. :p
[15:13] JSON would be nice if there aren't gotchas from 0.7.x that Odd_Bloke refers to.
[15:14] By the way, is the cache persistent per cloud-init run, or is it always there?
[15:14] claudiupopa: I would expect it to always be there.
[15:14] Because some portions of the data shouldn't stay there for long, such as passwords.
[15:15] Potentially the consumers of that data should be responsible for clearing it out?
[15:16] Before it's serialized to disk?
[15:17] It would be good to be able to separate the "fetch all the data we need" step from the "use the data" step.
[15:17] No, I think it would be serialised to disk.
[15:17] And then whatever handles passwords removes passwords from the serialised data.
[15:18] (Side note: if someone can read the password from the disk, they're probably already in a position to do whatever they want anyway. :p)
[15:18] That doesn't seem very good, since it's not separating the concerns properly.
[15:19] Yeah, that's also true.
[15:19] But anyway, it's harder to read it from memory than from disk. ;-)
[15:19] I'm thinking that special-casing passwords isn't particularly useful.
[15:19] claudiupopa: It's easier to just set it to whatever you want than to read it from disk. ;)
[15:20] Because there could be other private data that shouldn't be persisted long-term.
[15:20] Maybe have a way to specify that a piece of data should never be serialized?
[15:20] @dont_serialize_this
[15:21] claudiupopa: That does mean (e.g.) setting passwords in the same process that fetches the password from wherever it comes from.
[15:22] Agree with most of what is above.
[15:22] In order to avoid IPC? If the agent is not involved, I would expect it to happen in the same process nevertheless.
[15:22] JSON I think is fine with me. I used pickle in cloud-init 0.7 largely because it is simpler (I pickled the class).
[15:24] What if we just deprecate passwords in cloud-init 2.0 (and Ubuntu 16.04 cloud images)? :p
[15:24] I think we kind of *have* done that.
[15:25] Well, on Windows they're still somehow required.
[15:25] You never know, perhaps 2016 will finally be the Year of Windows on the Cloud. ;)
[15:25] smoser: What are your thoughts on persisting passwords to disk?
[15:27] Hmm, could we hash the passwords ourselves before putting them on disk?
[15:29] (This is, obviously, special-casing passwords like I said I didn't want to do. :p)
[15:29] How about a specific exemption?
[15:29] Having a decorator that marks a particular piece of data as non-serializable.
[15:30] Right, but that then means that we have to use that data before this particular process dies.
[15:30] Well, you may need to persist them for some time.
[15:30] Right.
[15:30] Yeah.
[15:30] We can do something like hashing; I don't think it's unreasonable.
[15:31] If the perms on the data are correct, it's sane.
[15:31] And then after we consume it we can remove that data.
[15:31] It obviously did get written... maybe we'd need to shred.
[15:32] The same thing happens with hashing: the password will not be available anymore after deserialization.
[15:32] As in, we'll have a hash that can't be used.
[15:32] Why couldn't it be used?
[15:33] Ah, I'm guessing you can't use the hash of a password to set a password on Windows?
[15:33] Nope. ;-)
[15:33] *buys a cheap Windows laptop on eBay, so he can throw it out of the window* :p
[15:37] OK, I think I can implement the first pass as 'serialise all the things' and then we can work out the nuance later.
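For concreteness, a minimal sketch of the JSON persistence being discussed, with the "never serialize this" exemption approximated as a key blacklist (the path, key names, and function names here are all hypothetical illustrations, not cloud-init API — the real design might use a decorator as suggested above):

    import json
    import os

    # Hypothetical location, following the "instance data" naming idea;
    # the parent directory is assumed to exist.
    INSTANCE_DATA_PATH = '/var/lib/cloud/instance/instance-data.json'

    # Keys that should never hit the disk (the "@dont_serialize_this" idea).
    SENSITIVE_KEYS = {'password', 'admin-password'}

    def persist_instance_data(data):
        serialisable = {k: v for k, v in data.items()
                        if k not in SENSITIVE_KEYS}
        # 0o600 so only root can read the persisted data ("if the perms
        # on the data are correct, it's sane").
        fd = os.open(INSTANCE_DATA_PATH,
                     os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, 'w') as f:
            json.dump(serialisable, f, indent=2)

    def load_instance_data():
        # Future runs read this back instead of re-running discovery.
        with open(INSTANCE_DATA_PATH) as f:
            return json.load(f)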
[15:40] We still have operators that use passwords and may want them set. I'm not defending the practice, but it is still done.
[15:41] Do we know for sure we'll have separate processes serializing the data vs. those that consume it?
[15:41] smatzek: Currently there are two different cloud-init sub-commands defined which would do each bit.
[15:42] As stated above, I think there may be other cases of private or sensitive data that we may not want sitting around on disk, so the sensitive-tag idea might be worth pursuing.
[15:42] Odd_Bloke, this does go towards a larger thread.
[15:43] With the goal of cloud-init query.
[15:43] Whether that hits a daemon or hits a cache, we want the user to be able to get some bits of data.
[15:43] And some bits to be privileged-access only.
[15:44] Another item that may be sensitive is the chef module's validation_key, which is a private RSA key. It might be good to delete/shred that once the chef module is done running.
[15:47] So my proposal is (1) we persist all the data to disk, and then (2) individual modules are responsible for shredding whatever data they consider sensitive (and no longer needed).
[15:49] Actually, we could have data sources provide a way of fetching passwords.
[15:50] And then the modules that care about passwords use that.
[15:50] But that doesn't solve the case where the password(s) are in user-data.
[15:51] Why are they two steps?
[15:51] Data retrieval and persistence, and execution?
[15:52] I think I'm missing context here.
[15:52] I don't see why they would be one step (except for the issue we are discussing now). :p
[15:53] I'm taking my lead from smoser having stubbed out 'search' and 'config' as separate subcommands.
[15:54] 'search' need not necessarily encompass actual fetching of the data, I guess.
[15:54] Which I have been assuming.
[15:56] I guess cloud-provided data can also change in the meantime.
[15:56] So maybe we shouldn't be persisting much of this stuff at all...
=== zz_natorious is now known as natorious
[17:10] Hmmmm, Odd_Bloke, I suck at naming things. :-P
[17:10] :D
[17:10] Put stuff into a little sqlite.db file, profit?
[17:10] persistence.db
[17:11] There u go.
[17:11] lol
[17:11] * harlowja is brilliant
[17:14] Honest question: why not just store it in some /var/cloud/persistence.db or something?
[17:14] Might be nice to have a little sqlite thing.
[17:14] I know, I know, the filesystem is currently used for this.
[17:37] Hello, I am trying to use cloud-init on CentOS 7, v0.7.5, from cloud-init-0.7.5-10.el7.centos.1.x86_64. I am using it to install chef; however, the AMI I have is pre-hardened and has noexec set on /tmp. The chef init script tries to download and run the installation out of /tmp, which fails. The chef script honors TMPDIR, so if I can reset the TMPDIR environment variable prior to the chef module, then it'll work. Is there a way to do that in cloud-init?
[17:42] chef runs during cloud_config_modules. Looking at that module list, I don't see any module where you could run arbitrary commands or scripts before it runs. You may be able to use bootcmd, which runs in cloud_init_modules, to change the system env so that the cloud_config_modules process would pick it up, but I'm not sure if that would work.
[17:45] bootcmd would run in a subshell though, wouldn't it?
[17:45] So the env change would be lost.
[17:46] I was hoping there was a way in cloud-init natively to set environment variables for commands.
[18:07] Or, changing util.subp([tmpf], capture=False) to util.subp(['sh', tmpf], capture=False).
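The [18:07] suggestion works because a noexec mount only blocks executing files *on* that filesystem directly; it doesn't stop an interpreter on an exec-able filesystem from reading a script there. Roughly (the surrounding context is paraphrased, not the chef module's exact code):

    # Executing the downloaded omnibus install script directly fails on a
    # noexec /tmp, because the kernel refuses to exec a file on that mount:
    #     util.subp([tmpf], capture=False)
    # Invoking it via the shell sidesteps noexec: the kernel executes
    # /bin/sh, and sh merely *reads* the script from /tmp:
    util.subp(['sh', tmpf], capture=False)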
=== rangerpb is now known as rangerpbzzzz
[21:24] For the chef module, I'd like node_name to be something like 'prefix-$INSTANCEID' as opposed to a static prefix.
[21:24] And not just instance-id.
[21:24] Is there any way to do that?
[21:26] In other words, when using this in an autoscale group I can't use one single node_name with chef, but it's not very friendly to use just i-a234tg names.
[21:26] So for each autoscale group I'd put something like groupname-instanceid.
[21:26] Ideally.
=== natorious is now known as zz_natorious
[22:13] The chef mechanism also needs a way to lay down the encrypted data bag key.
[22:14] Mm, perhaps with write_files.
[22:35] But it needs a way to at least specify the location.
=== hatchetation_ is now known as hatchetation
[22:52] And chef only seems to run if it's installed via gems?
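Picking up the [22:14] write_files suggestion: a rough cloud-config sketch for laying down the encrypted data bag key. The path is the conventional chef location and the content is a placeholder, neither confirmed by the discussion above; write_files runs in the cloud_init_modules stage in the default config, so the file should be in place before the chef module runs in cloud_config_modules.

    #cloud-config
    write_files:
      # Hypothetical: adjust the path to wherever chef is configured
      # to look for the secret.
      - path: /etc/chef/encrypted_data_bag_secret
        owner: root:root
        permissions: '0600'
        content: |
          REPLACE-WITH-ACTUAL-SECRET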