[09:38] <openstackgerrit> Merged stackforge/cloud-init: Use an explicit absolute import for importing the logging module  https://review.openstack.org/210035
[09:49] <Odd_Bloke> claudiupopa: Could you workflow +1 https://review.openstack.org/#/c/202743/ ?
[09:49] <Odd_Bloke> Oh, is it short a +2, actually?
[09:50] <claudiupopa> Is Scott happy with it?
[09:51] <Odd_Bloke> claudiupopa: I think so.
[09:51] <Odd_Bloke> claudiupopa: And I think we said that I'd push forward with main stuff.
[09:52] <claudiupopa> Cool.
[09:52] <claudiupopa> Then I'm happy with it as it is.
[09:52] <claudiupopa> +1ed
[09:52] <claudiupopa> For workflow.
[09:52] <claudiupopa> By the way, could you take a look again at the plugin patch?
[09:52] <claudiupopa> I don't have tests, but I'll appreciate a comment regarding the direction.
[09:55] <openstackgerrit> Merged stackforge/cloud-init: add cloud-init main  https://review.openstack.org/202743
[09:58] <Odd_Bloke> claudiupopa: So with parallel discovery, we'd still load the code from the disk serially?
[09:59] <claudiupopa> Good question. I think it depends on the iterator's flavour.
[09:59] <claudiupopa> Right now the loading is serial.
[10:02] <Odd_Bloke> claudiupopa: Should filtering by name be a strategy?
[10:02] <claudiupopa> It could be.
[10:03] <Odd_Bloke> claudiupopa: We don't actually have anywhere calling get_data_source with a list of strategies yet, right?
[10:04] <claudiupopa> Yep.
[10:05] <Odd_Bloke> claudiupopa: How would a FilterByNamesStrategy be created?
[10:07] <claudiupopa> Writing an example right now.
[10:07] <Odd_Bloke> Thanks!
[10:08] <claudiupopa> Something like this http://paste.openstack.org/show/412159/
[10:09] <claudiupopa> Although _names should be passed somehow to the strategy.
[10:09] <Odd_Bloke> Yeah, that was the bit I couldn't quite work out.
[10:09] <Odd_Bloke> The strategies could be instantiated, and have a method that does the filtering?
[10:10] <claudiupopa> You mean a separate method?
[10:10] <claudiupopa> One for loading the data sources and another one for filtering?
[10:11] <claudiupopa> Mm, the idea is to combine multiple of them to do the filtering, since trying to see if a data source is available or not is still considered a filtering operation.
[10:12] <claudiupopa> I could instantiate them beforehand, in get_data_source.
[10:13] <claudiupopa> And I could pass names only to the FilteringByNameStrategy.
[10:13] <Odd_Bloke> So BaseSearchStrategy.__init__ wouldn't take any parameters by default, and search_data_source would become search_data_sources(<list of data sources>).
[10:13] <Odd_Bloke> And you'd pass the return of that in to the next search_data_sources.
[10:14] <Odd_Bloke> (Rather than in to the constructor of the next strategy, as you do now)
[10:14] <claudiupopa> Oh, that could work.
[10:15] <Odd_Bloke> So I think you would instantiate them in get_data_source, yeah.
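A minimal sketch of the chaining Odd_Bloke describes above — strategies instantiated up front in get_data_source, each one's search_data_sources output fed into the next rather than into the next constructor. The class names follow the pasted example, but this is a hypothetical illustration, not the actual patch:

```python
class BaseSearchStrategy:
    """Base class: a strategy narrows down a list of data source classes."""

    def search_data_sources(self, data_sources):
        # Default: pass everything through unchanged.
        return list(data_sources)


class FilterByNameStrategy(BaseSearchStrategy):
    """Keep only the data sources whose class names were requested."""

    def __init__(self, names):
        self._names = set(names)

    def search_data_sources(self, data_sources):
        return [ds for ds in data_sources if ds.__name__ in self._names]


def get_data_source(data_sources, strategies):
    # Each strategy receives the previous strategy's output, so
    # filtering by name and filtering by availability compose freely.
    for strategy in strategies:
        data_sources = strategy.search_data_sources(data_sources)
    return data_sources
```

With two dummy classes DataSourceEc2 and DataSourceNoCloud, `get_data_source([DataSourceEc2, DataSourceNoCloud], [FilterByNameStrategy(['DataSourceEc2'])])` would keep only DataSourceEc2.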
[11:22] <trueneu> Hi. How can I run a cloud-init script on an already installed instance? I've found that I have to trick cloud-init into thinking this is a fresh boot, but I can't understand where I should place my cloud-init file.
[11:23] <Odd_Bloke> trueneu: Why do you want to run cloud-init, rather than just running a shell script etc.?
[11:24] <trueneu> It's in a neat cloud config form, and it failed to execute at boot somehow, so I need to re-do it.
[11:31] <openstackgerrit> Claudiu Popa proposed stackforge/cloud-init: Add an API for loading a data source  https://review.openstack.org/209520
[12:02] <smoser> Odd_Bloke, or harlowja or claudiupopa your thoughts on my https://code.launchpad.net/~smoser/cloud-init/trunk.reporting/+merge/266578 (0.7) would be appreciated.
[12:07] <Odd_Bloke> smoser: Are registry and reporting copy-paste backports from 2.0?
[12:15] <Odd_Bloke> smoser: Oh, no, there's a WebHookHandler in there?
[12:15] <Odd_Bloke> smoser: Still don't know why you aren't getting stuff in to 2.0 so we can do a copy-paste backport.
[12:15] <Odd_Bloke> Rather than doing a copy-paste backport, a change, and then a forward-port.
[12:16] <smoser> copy & paste + imports + http://bazaar.launchpad.net/~smoser/cloud-init/trunk.reporting/revision/1155
[12:16] <smoser> and the webhookhandler.
[12:17] <smoser> Odd_Bloke, because of the timeline is all.
[12:20] <smoser> and now that i think about it i think that code in that one doesn't work.
[12:21] <smoser> the goal of the change there is to re-initialize if different.
[12:21] <smoser> but i think the check there is comparing a dict to a class.
[12:52] <openstackgerrit> Daniel Watkins proposed stackforge/cloud-init: Fix running cloud-init with no arguments on Python 3.  https://review.openstack.org/210381
[13:00] <Odd_Bloke> smoser: claudiupopa: Minor fix to main. ^
[13:01] <claudiupopa> Why doesn't parsed have the func attribute?
[13:01] <smoser> because it didn't have a subcommand.
[13:01] <Odd_Bloke> claudiupopa: It's a bug in Python 3, I think.
[13:02] <smoser> maybe you can set_defaults on func to get it to call help?
[13:10] <Odd_Bloke> smoser: That works on Python 3, but not on Python 2.
[13:55] <Odd_Bloke> smoser: claudiupopa: So that change gives us consistent behaviour on Python 2 and 3.
[13:56] <Odd_Bloke> smoser: claudiupopa: Getting Python 2 to do something different will mean pre-empting the parser, because just parsing the arguments is what throws up the error.
[13:57] <claudiupopa> I see.
[13:57] <claudiupopa> Then it seems fine to me.
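The behaviour difference is easy to reproduce: on Python 3 (3.3+), parsing an empty command line with subparsers succeeds but leaves no `func` attribute on the namespace, while Python 2 errors out during parsing itself. A hedged sketch of the guard — the subcommand name below is illustrative, not the actual cloudinit.shell layout:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog='cloud-init')
    subparsers = parser.add_subparsers(title='subcommands')
    # Hypothetical subcommand; set_defaults attaches the handler.
    version_parser = subparsers.add_parser('version')
    version_parser.set_defaults(func=lambda args: print('2.0'))
    return parser


def main(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # On Python 3, no subcommand means no 'func' attribute; print
    # help instead of crashing with AttributeError.
    if not hasattr(args, 'func'):
        parser.print_help()
        return 1
    return args.func(args) or 0
```

Getting Python 2 to behave the same way would indeed mean intercepting before parse_args, since argparse on Python 2 raises its "too few arguments" error during parsing.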
[14:12] <Odd_Bloke> smoser: We have several different stages defined in cloudinit.shell, but I thought we were going to be running cloud-init as an agent (which would, presumably, only involve a single call to cloud-init).
[14:47] <Odd_Bloke> smoser: claudiupopa: harlowja: I'm trying to work out how to name things; I'm going to work on persisting a discovered data source to disk (so that future runs don't have to perform discovery).  What should I name the data that cloud-init has derived from its environment?
[14:48] <Odd_Bloke> It's not metadata, vendor-data or user-data; those are all inputs.
[14:48] <Odd_Bloke> Maybe 'configuration', but that would seem to be more appropriate as the stuff in /etc that defines how cloud-init will run on an instance.
[14:48] <Odd_Bloke> Any thoughts?
[14:48] <claudiupopa> persisting data source to disk, as in caching?
[14:49] <Odd_Bloke> claudiupopa: So one of the stub commands in cloudinit.shell is 'search', which will 'search available data sources'.
[14:49] <smoser> ok. Odd_Bloke sorry, didn't respond before
[14:49] <smoser> so the stages... there are still stages that have to run in boot
[14:50] <smoser> there might be a daemon that starts very early, and the stages communicate with that daemon. that is a possible implementation.
[14:50] <smoser> also possible is that a daemon just starts later.
[14:50] <smoser> but either way, as far as my vision can see, we'll have upstart or sysvinit or systemd jobs that run at points in boot
[14:50] <smoser> that is what those stages are for.
[14:51] <Odd_Bloke> I think making it possible to not run a daemon would be good; I can imagine people who are happy with cloud-init as-is not wanting an extra process running.
[14:51] <smoser> wrt storing data, i think 'cache' sounds reasonable
[14:52] <smoser> you'll never have to run the daemon
[14:52] <smoser> even if it ran in boot, that'd just be an implementation detail
[14:52] <smoser> and then it'd shut itself down.
[14:52] <smoser> but we can worry about that later.
[14:52] <Odd_Bloke> I'm not sure it is, strictly speaking, a cache though; some data sources will only be able to fetch information a single time.
[14:52] <Odd_Bloke> (For example, CloudStack passwords can only be read once)
[14:53] <claudiupopa> so metadata, userdata and vendordata all represent the same thing: input data that's used to drive cloud-init.
[14:53] <claudiupopa> How about drive data?
[14:53] <claudiupopa> Or execution data.
[14:54] <Odd_Bloke> Actually, this is basically what would go in /var/lib/cloud/instance ATM; how about 'instance data'?
[14:55] <claudiupopa> Yep, that sounds good as well.
[15:05] <Odd_Bloke> smoser: Thanks for the info on the commands. :)
[15:06] <Odd_Bloke> claudiupopa: smoser: So, next question: what do we want the data to look like when serialised on-disk?
[15:09] <Odd_Bloke> claudiupopa: smoser: I'm thinking we could persist a dictionary as JSON, but I don't know if we have lessons from 0.7.x that suggest that's a bad idea.
[15:09] <claudiupopa> why should it be a bad idea? I was thinking on JSON as well.
[15:12] <Odd_Bloke> claudiupopa: Well, that's not how we do it in 0.7.x; I wasn't sure if that was intentional or not. :p
[15:13] <smatzek> JSON would be nice if there aren't gotchas from 0.7.x that Odd_Bloke refers to.
[15:14] <claudiupopa> by the way, is the caching persistent per cloud-init run, or is it always there?
[15:14] <Odd_Bloke> claudiupopa: I would expect it to always be there.
[15:14] <claudiupopa> because some portions of the data shouldn't stay there for long, such as passwords.
[15:15] <Odd_Bloke> Potentially the consumers of that data should be responsible for clearing it out?
[15:16] <claudiupopa> before it's serialized on disk?
[15:17] <Odd_Bloke> It would be good to be able to separate the "fetch all the data we need" step from the "use the data" step.
[15:17] <Odd_Bloke> No, I think it would be serialised to disk.
[15:17] <Odd_Bloke> And then whatever handles passwords removes passwords from the serialised data.
[15:18] <Odd_Bloke> (Side note: If someone can read the password from the disk, they're probably already in a position to do whatever they want anyway. :p)
[15:18] <claudiupopa> that doesn't seem very good, since it's not separating the concerns properly.
[15:19] <claudiupopa> Yeah, that's also true.
[15:19] <claudiupopa> But anyway it's harder to read it from memory rather than from disk. ;-)
[15:19] <Odd_Bloke> I'm thinking that special-casing passwords isn't particularly useful.
[15:19] <Odd_Bloke> claudiupopa: It's easier to just set it to whatever you want than read it from disk. ;)
[15:20] <Odd_Bloke> Because there could be other private data that shouldn't be persisted long-term.
[15:20] <claudiupopa> Maybe having a way to specify that a piece of data should never be serialized?
[15:20] <claudiupopa> @dont_serialize_this
[15:21] <Odd_Bloke> claudiupopa: That does mean (e.g.) setting passwords in the same process as fetches the password from wherever the password is fetched from
[15:22] <smoser> agree with most of what is above.
[15:22] <claudiupopa> in order to avoid ipc? If the agent is not involved, I would expect it to happen in the same process nevertheless.
[15:22] <smoser> json i think is fine with me. i used pickle in cloud-init 0.7 largely because it is simpler (it pickled the class).
[15:24] <Odd_Bloke> What if we just deprecate passwords in cloud-init 2.0 (and Ubuntu 16.04 cloud images)? :p
[15:24] <smoser> i think we kind of *have* done that
[15:25] <claudiupopa> well, on windows they're still somehow required.
[15:25] <Odd_Bloke> You never know, perhaps 2016 will finally be the Year of Windows on the Cloud. ;)
[15:25] <Odd_Bloke> smoser: What are your thoughts on persisting passwords to disk?
[15:27] <Odd_Bloke> Hmm, could we hash the passwords ourselves before putting them on disk?
[15:29] <Odd_Bloke> (This is, obviously, special-casing passwords like I said I didn't want to do :p)
[15:29] <claudiupopa> how about specific exemption?
[15:29] <claudiupopa> Having a decorator that marks a particular piece of data as non serializable.
[15:30] <Odd_Bloke> Right, but that then means that we have to use that data before this particular process dies.
[15:30] <smoser> well, you may need to persist them for some time
[15:30] <smoser> right.
[15:30] <smoser> yeah.
[15:30] <smoser> we can do something like hashing, i don't think it's unreasonable.
[15:31] <smoser> if the perms on the data are correct, it's sane
[15:31] <smoser> and then after we consume it we can remove that data.
[15:31] <smoser> it obviously did get written... maybe we'd need to shred
[15:32] <claudiupopa> The same thing happens with hashing, the password will not be available anymore after deserialization.
[15:32] <claudiupopa> as in we'll have a hash that can't be used.
[15:32] <Odd_Bloke> Why couldn't it be used?
[15:33] <Odd_Bloke> Ah, I'm guessing you can't use the hash of a password to set a password on Windows?
[15:33] <claudiupopa> Nope. ;-)
[15:33] <Odd_Bloke> *buys a cheap Windows laptop on eBay, so he can throw it out of the window* :p
[15:37] <Odd_Bloke> OK, I think I can implement the first pass as 'serialise all the things' and then we can work out the nuance later.
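claudiupopa's @dont_serialize_this idea could be sketched roughly like this — a decorator that tags an attribute as in-memory only, so the serialisation step skips it. All names here are hypothetical, invented for illustration:

```python
def dont_serialize_this(func):
    """Decorator marking a data-source attribute as in-memory only."""
    func.dont_serialize = True
    return func


class FakeDataSource:
    """Hypothetical data source with one sensitive attribute."""

    @dont_serialize_this
    def admin_password(self):
        return 'hunter2'

    def instance_id(self):
        return 'i-1234'


def serialisable_view(source, names=('admin_password', 'instance_id')):
    # Collect only the attributes not tagged as sensitive; this is
    # what would be handed to the JSON writer.
    out = {}
    for name in names:
        method = getattr(source, name)
        if getattr(method, 'dont_serialize', False):
            continue
        out[name] = method()
    return out
```

The trade-off Odd_Bloke raises stands: anything excluded this way must be consumed before the discovering process exits, since it never reaches disk.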
[15:40] <smatzek> we still have operators that use password and may want it set.  I'm not defending the practice but it is still done.
[15:41] <smatzek> do we know for sure we'll have separate processes serializing the data vs those that consume it?
[15:41] <Odd_Bloke> smatzek: Currently there are two different cloud-init sub-commands defined which would do each bit.
[15:42] <smatzek> as stated above I think there may be other cases of private or sensitive data that we may not want sitting around on disk, so the sensitive tag idea might be worth pursuing.
[15:42] <smoser> Odd_Bloke, this does go towards a larger thread.
[15:43] <smoser> with the goal of cloud-init query
[15:43] <smoser> whether that hits a daemon or hits a cache, we want the user to be able to get some bits of data
[15:43] <smoser> and some bits to be privileged access only
[15:44] <smatzek> another item that may be sensitive is the chef module's validation_key which is a private RSA key.  That might be good to delete/shred once the chef module is done running.
[15:47] <Odd_Bloke> So my proposal is (1) we persist all the data to disk, and then (2) individual modules are responsible for shredding whatever data they consider sensitive (and no longer needed).
[15:49] <Odd_Bloke> Actually, we could have data sources provide a way of fetching passwords.
[15:50] <Odd_Bloke> And then the modules that care about passwords use that.
[15:50] <Odd_Bloke> But that doesn't solve the case where the password(s) are in user-data.
[15:51] <claudiupopa> why are they two steps?
[15:51] <claudiupopa> data retrieval and persistence, and execution?
[15:52] <claudiupopa> I think I'm missing context here.
[15:52] <Odd_Bloke> I don't see why they would be one step (except for the issue we are discussing now). :p
[15:53] <Odd_Bloke> I'm taking my lead from smoser having stubbed out 'search' and 'config' as separate subcommands.
[15:54] <Odd_Bloke> 'search' need not necessarily encompass actual fetching of the data, I guess.
[15:54] <Odd_Bloke> Which I have been assuming.
[15:56] <Odd_Bloke> I guess cloud-provided data can also change in the meantime.
[15:56] <Odd_Bloke> So maybe we shouldn't be persisting much of this stuff at all...
[17:10] <harlowja> hmmmm, Odd_Bloke i suck at naming things :-P
[17:10] <Odd_Bloke> :D
[17:10] <harlowja> put stuff into a little sqlite.db file , profit?
[17:10] <harlowja> persistence.db
[17:11] <harlowja> there u go
[17:11] <harlowja> lol
[17:11]  * harlowja is brillant
[17:14] <harlowja> honest question, why not just store it in some /var/cloud/persistence.db or something
[17:14] <harlowja> might be nice to have a little sqlite thing
[17:14] <harlowja> i know i know the filesystem is currently used for this
[17:37] <Toger> Hello, I am trying to use cloud-init on centos7, v0.7.5.  from cloud-init-0.7.5-10.el7.centos.1.x86_64.  I am using it to install chef, however the AMI I have is pre-hardened and has noexec set on /tmp.  The chef init script tries to download and run the installation out of /tmp which fails. The chef script honors tmpdir, so if I can reset the tmpdir environmental variable prior to the chef module then it'll work. Is there a way to do that in
[17:37] <Toger> cloud-init?
[17:42] <smatzek> chef runs during cloud_config_modules. Looking at that module list I don't see any module where you could run arbitrary commands or scripts before it runs. You may be able to use bootcmd, which runs in cloud_init_modules, to change the system env so that the process running cloud_config_modules would pick it up, but I'm not sure if that would work.
[17:45] <Toger> bootcmds would run in a subshell though, wouldn't it?
[17:45] <Toger> so the env change would be lost
[17:46] <Toger> I was hoping there was a way in cloudinit natively to set environmental variables for commands
[18:07] <Toger> Or, changing util.subp([tmpf], capture=False) to util.subp(['sh', tmpf], capture=False)
[21:24] <Toger> For the chef module, I'd like node_name to be something like 'prefix-$INSTANCEID' as opposed to a static prefix
[21:24] <Toger> and not just instance-id
[21:24] <Toger> is there any way to do that?
[21:26] <Toger> in other words, when using this in an autoscale group I can't use one single node-name w/chef, but it's not very friendly to use just i-a234tg names
[21:26] <Toger> so for each autoscale group I'd put something like groupname-instanceid
[21:26] <Toger> ideally
[22:13] <Toger> the chef mechanism also needs a way to lay down the encrypted data bag key
[22:14] <Toger> mm perhaps with write_files
[22:35] <Toger> but it needs a way to at least specify the location
[22:52] <Toger> and chef only seems to run if it's installed via gems?