#cloud-init 2014-08-11
<frankieonuonga> Hi folks, I am looking to use cloud-init to create some containers on my machine but cannot seem to find any documentation regarding what should sit in user-data and meta-data
<harmw> smoser: if you happen to see RaginBajin, please have him checkout https://code.launchpad.net/~harmw/cloud-init/freebsd
<harmw> RaginBajin: ping
<RaginBajin> harmw: pong
<harmw> https://code.launchpad.net/~harmw/cloud-init/freebsd
<harmw> could you check out that branch?
<harmw> e.g. create a new image (with oz), boot an instance with it, install bzr, check out that branch and run tools/build-freebsd (or whatever I called it)
<RaginBajin> sure. 
<RaginBajin> I haven't used oz before, so I'll have to look up the howto
<harmw> ok, its on github
<harmw> and you'd probably want to add some of my patches as well
<harmw> https://github.com/clalancette/oz/pull/168/commits
<RaginBajin> Cool. I'll take a gander at it here in a few. 
<harmw> hm, I also made a change in Guest.py that's addressed in the ticket tracker on github...
<harmw> something regarding symlinks
<harmw> RaginBajin: I'm looking forward to seeing your results :)
<smoser> pquerna, thanks. thats definitely something i'm interested in getting right. are you involved in the openstack blueprint ?
<pquerna> smoser: mostly indirectly, other people at rackspace are working on the nova/ironic side of it
<pquerna> smoser: happy to push feedback through non-spec view channels if helpful :)
<pquerna> s/view/review/
<smoser> we have a team at canonical who is also in need of a networking information format
<smoser> and i would like to use one format and not have 2.
<smoser> my comment on that review was wrt mtu
<smoser> i think it really should be in there, and canonical is wanting/needing to describe bond interfaces.
<pquerna> yup, makes sense.
<pquerna> JoshG sits 8.9 feet from me in the SF office
<pquerna> smoser: one thing we are looking at adding is a services: [] chunk in the format.  first pass is to add DNS servers, but we were also thinking of adding NTP and maybe even local apt mirror etc as a url in there.
<smoser> oh yeah, and multiple ip addresses per nic. (my comment there). 
<smoser> hm..
<smoser> i originally had kind of done vendor_data to be a way to provide information like that.
<smoser> and had envisioned 'mirrors' to be something that would go there.
<pquerna> right
<smoser> mirrors:
<smoser>  http://archive.ubuntu.com/ubuntu: ['http://ubuntu.localmirror/ubuntu/']
<smoser> something like that. but yeah, i agree that you do want something there. 
<smoser> i even commented on vendor_data that we could use that as a way to experiment on things before they got into the metadata service proper
<pquerna> yes
<pquerna> we are very likely going to deploy it into vendor_data.json first
<smoser> yeah, cool. and i'm ok with having cloud-init read it from there if it's there, and prefer that over meta-data or something.
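Spelled out, the mirrors idea above might look like this as vendor-data cloud-config (the 'mirrors' key and its shape are a sketch of what smoser typed, not a settled format):

```yaml
#cloud-config
# Hypothetical vendor-data sketch: map a default archive to a local mirror.
# Key names and structure are illustrative only.
mirrors:
  http://archive.ubuntu.com/ubuntu: ['http://ubuntu.localmirror/ubuntu/']
```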
<harmw> RaginBajin: any luck yet or should I just upload an image instead?
<RaginBajin> harmw: Not yet.. I've been distracted a bit 
<harmw> :)
<harmw> smoser: would you like me to bug you again about cirros' new buildroot? :p
<pquerna> JoshNang: so
<pquerna> JoshNang: MTUs on things: ethernet is obvious, but should it also be a field on the VLAN and/or bonds?
<pquerna> JoshNang: trying to find docs on what linux does when MTU on eth0 is 1500 but bond or vlan has a higher value.
<JoshNang> so i've seen a couple references to the vlan defaulting to the NIC's MTU if they're different (on blogs/articles), but can't find that in any docs
<pquerna> smoser: for multiple IPs, is having N network {} defs OK, or were you thinking of having an array of IPs inside the network?
#cloud-init 2014-08-12
<nvucinic> hi guys, is there any way to disable every metadata check except nocloud ?
<smoser> pquerna, do you have an example of the json ?
<smoser> that i can just look at without thinking :)
<RaginBajin> harmw: I know I owe you  testing your code. I've been distracted, so I haven't been able to upload an image and test it out. I'm hoping I can get to it today or tomorrow. 
<harmw> RaginBajin: np :)
<harmw> Cris merged my fixes to oz btw, so when using oz you only need to checkout the master branch
<harmw> or whatever terminology github uses
<nvucinic> hello, is there any link on how to configure nocloud option ?
<harmw> you mean explicitly use nocloud instead of just looking for something (datasource) that happens to respond? 
<nvucinic> there are no other datasources except an iso that is mounted on boot
<nvucinic> iso is automounted on boot
<harmw> datasource_list: ['nocloud']
<harmw> I think you would need to try something like that
<harmw> in your /etc/cloud/cloud.cfg
<nvucinic> ok, will try that 
<nvucinic> tell me, do i need to change the instance id every time i mount the iso ?
<harmw> the instance id only changes when you nuke your instance and create a new one
<nvucinic> ok what is the minimum configuration to see if this is working?
<nvucinic> i can change hostname for example
<harmw> I'm afraid I don't know, I only use the openstack datasource
<harmw> haven't touched configdrive/nocloud 
<nvucinic> well i am trying to change hostname 
<nvucinic> i have added this into my user-data 
<nvucinic> #cloud-config
<nvucinic> hostname: mynode
<nvucinic> fqdn: mynode.example.com
<nvucinic> manage_etc_hosts: true
<nvucinic> in meta-data i have only instance-id: iid-10
<nvucinic> that should do it, right?
<pquerna> smoser: yeah, will grab you a sample from our env this morning.
<smoser> nvucinic, well cloud-config won't change meta-data.
<smoser> that doesn't look unreasonable above.
<nvucinic> why not ?
<nvucinic> i mean, how can i set at least hostname with cloud-init 
<smoser> just as you showed.
<smoser> cloud-init has no control over meta-data.
<smoser> but you told it to name itself 'mynode'. it will do that.
<nvucinic> well it does not do that :)
<nvucinic> so i thought that i did something wrong 
<smoser> ubuntu ?
<nvucinic> it's centos setup, on boot mounts cdrom in /mnt/cdrom
<smoser>  you have user-data in /var/lib/cloud/instance/user-data* ?
<nvucinic> no, only user-data file that i have is on iso
<nvucinic> also, i got "No Instance datasource found!
<nvucinic> when I put datasource_list: ['nocloud']
<nvucinic> my cdrom is mounted on /mnt/cdrom and i have user-data there
<smoser> you're on openstack ?
<pquerna> smoser: https://gist.github.com/pquerna/dd4313f34e98d625f351
<pquerna> smoser: that is openstack/latest/vendor_data.json
<smoser> thanks.
<harmw> RaginBajin: ping :)
<pquerna> smoser: one other topic we are thinking about, but no resolution yet: whether we want to start making an omnibus of an updated cloud-init for older operating systems.
<pquerna> smoser: bundle the universe under /opt/$something, don't have to get into packaging python-jsonpatch for old old things.
<smoser> pquerna, yeah. i will support getting cloud-init to run back to older versions
<RaginBajin> harmw: working on it right now actually. (or at least getting the image up and going)
<harmw> nice nice
<harmw> let me know if you need help
<harmw> smoser: if I ever get neutron to do ipv6 in a proper dual stacked way in my cloud I'm going after cirros-ipv6
<smoser> harmw, +1
<harmw> yea yea
<harmw> :p
<harmw> someone should finish https://review.openstack.org/#/c/77471/, would make it a lot nicer
<harmw> RaginBajin: you done yet? :p
<smoser> harmw, 
<harmw> shoot
<harmw> the suspense is killing me, really
<harmw> hate.. this.. smoser..
<smoser> harmw, wait.
<smoser> what ?
<smoser> i'm sorry. i think i might have typed into the wrong window.
<smoser> i think that cirros-ipv6 would be great.
<harmw> you highlighted me, and then nothing
<harmw> something that would eventually lead to the death of either one of us
<harmw> and yes, that would be nice :)
#cloud-init 2014-08-13
<RaginBajin> harmw: ping
<nvucinic> hi guys, i still haven't gotten nocloud to work
<nvucinic> do i need to specify the path to the iso file somewhere in cloud.cfg ?
<nvucinic> this is what i have in the logs
<nvucinic> http://pastebin.com/CGmDzHBJ
<harmw> RaginBajin: pong
<RaginBajin> harmw: Hey, So I was running your build last night. 
<harmw> ah great, hit me with all the bugs you encountered :p
<RaginBajin> Well, I need to go back and create another image. We don't always use the Openstack Metadata service but use the configdrive. So I need to create an additional instance to test with Openstack metadata
<harmw> did it work with the script from tools/?
<harmw> ah ok, yea I havent looked at configdrive yet
<RaginBajin> So, i downloaded the code, ran the python setup.py install but forgot to add --init-system.  We should change that error message to mention using --init-system. 
<RaginBajin> Every time I need to go into the setup.py script and look at the stupid file to find out what the name of the argument is 
<harmw> no no, what you should do is read/use the script in tools/ :)
<RaginBajin> The build-on-freebsd? 
<harmw> http://bazaar.launchpad.net/~harmw/cloud-init/freebsd/view/head:/tools/build-on-freebsd 
<harmw> yep
<RaginBajin> ahh 
<harmw> I needed some way of easy testing, and keeping any necessary hacks documented until there is a real freebsd pkg/port
<harmw> a clean/fresh OZ image only needs bzr installed, from there on you can use bzr to grab my branch and this script will do the rest
<RaginBajin> ahh ok. 
<RaginBajin> Well let me try it that way then. 
<harmw> you can see the required steps as well, regarding the --init-system
<harmw> I'm working on building a real port btw, just need to make myself familiar with the freebsd tools for building such a thing first
<RaginBajin> Yeah. I think the --init-system change is a good over-all change. If it's missing, actually tell you what argument to use, not just what the options are. 
<RaginBajin> gotcha. I was looking at all the changes I needed to make for the configdrive to work. They are not many. Just a few specific things need to be changed when it's fbsd, i.e. mounting a cdrom 
<harmw> yea, it's probably a few details in terms of device naming and such
<harmw> cloudinit already knows how to read from configdrives so that should be no problem
<RaginBajin> yeah. The changes I made were all around mounting and finding the fbsd drive. So, I should clean those up and get them submitted up as well. But it would be good after your changes go in. that way it can be tested with your changes as well 
<harmw> hm, perhaps you can branch me and commit to that :) I don't really know if that's supported though, but it might just work
<harmw> (I'll just assume that branch will be able to merge with both mine and trunk)
<RaginBajin> harmw:  Just ran it. One error that I did get was in the growfs.  It seems that if your disk is already at the max size, it errors out with rc = 1 and the stderr of 'growfs: requested size 20GB is not larger than the current filesystem size 20GB\n' (20GB being the size of my disk)
<harmw> yea, I noticed that as well
<harmw> handling that would be nice, someday
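One way to handle that would be to treat the "already at maximum size" failure as a no-op rather than an error. A minimal sketch (the helper name is made up; the message text comes from the error pasted above):

```python
def growfs_benign_failure(returncode, stderr):
    """Return True if a non-zero growfs exit can be safely ignored.

    FreeBSD's growfs exits non-zero when the requested size is not
    larger than the current filesystem size; for a resize-to-max
    operation that is not a real error.
    """
    if returncode == 0:
        return True
    return "is not larger than the current filesystem size" in stderr


# The error RaginBajin reported would be treated as success:
err = ("growfs: requested size 20GB is not larger than the "
      "current filesystem size 20GB\n")
assert growfs_benign_failure(1, err)
assert not growfs_benign_failure(1, "growfs: some other error")
```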
<RaginBajin> yeah. It also looks like the ssh-key injection didn't work for that user beastie 
<harmw> yea indeed
<harmw> I didn't look into that just yet
<harmw> I noticed for root they were added just fine
<RaginBajin> but it worked before which is strange. I think it's because the user is created after the key injection
<RaginBajin> it didn't create it for me on root either. 
<harmw> hm hm, then it might've done something else, since I passed user-data when creating the instance that also does at least something with users
<harmw> I'll have it fixed somewhere soon though :)
<RaginBajin> also we can fix the update_etc_hosts as well 
<RaginBajin> just need to add a template called hosts.freebsd.tmpl
<harmw> oh ok, sounds nice (though I have no idea what that function/module does)
<RaginBajin> I just copied over one from hosts.debian.tmpl and renamed it and it works easily 
<nvucinic> anyone has some more info about nocloud option ?
<harmw> smoser: someone needs your help :)
<smoser> nooo
<smoser> nvucinic, whats up?
<smoser> i suggest using cloud-localds from cloud-utils
<RaginBajin> harmw:  Figured it out. We commented out the "ssh" part in the cloud.cfg. Once I added that back, all the keys were injected. 
<harmw> you did that? or was it in my branch?
<harmw> hm I did
<harmw> lame
<nvucinic> smoser: trying to get nocloud option to work but i believe that i missed some steps 
<nvucinic> i have datasource_list: ['nocloud']
<nvucinic> in cloud.cfg 
<nvucinic> http://pastebin.com/CGmDzHBJ these are the errors that i get 
<RaginBajin> harmw:  Yeah your branch.  No biggie. just feel better that it wasn't something super crazy 
<harmw> I'll remove that # when I can :)
<harmw> i'd like to see your configdrive code btw, despite the fact that I have no use for it yet
<harmw> (so can't verify in-depth)
<smoser> nvucinic, NoCloud
<smoser> case sensitive as it is used for python import
<smoser> nvucinic, if you're just randomly using pastebins, please use openstack or ubuntu.
<smoser> those ads suck!
<smoser> and along those same lines, 'pastebinit' is awesome.
<smoser> (program that lets you : pastebinit foo
<smoser>  or: foo | pastebinit 
<smoser> )
<nvucinic> smoser: wow, thx
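Pulling the thread together, a minimal NoCloud setup could be sketched like this (the 'seed' directory name is arbitrary; cloud-localds is the cloud-utils helper smoser suggests, shown commented since its availability varies):

```shell
# Minimal NoCloud seed: user-data + meta-data files, as discussed above.
mkdir -p seed

cat > seed/user-data <<'EOF'
#cloud-config
hostname: mynode
fqdn: mynode.example.com
manage_etc_hosts: true
EOF

cat > seed/meta-data <<'EOF'
instance-id: iid-10
EOF

# Then build a seed ISO, e.g. with cloud-localds from cloud-utils:
#   cloud-localds seed.iso seed/user-data seed/meta-data
# and restrict datasource probing in /etc/cloud/cloud.cfg with the
# case-sensitive name:
#   datasource_list: [ NoCloud ]
```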
<pquerna> smoser: hey, have a question, I got a https://gist.github.com/pquerna/59a8671502dffc9d8843 status.json with "unhashable type", which sounds like an exception, but I can't find anything in the actual logs about the exception
<smoser> hm..
<smoser> where did you see that ?
<smoser> oh. i see.
<smoser> kind of sucks that i populated 'finished' and not 'end'
<smoser> :-(
<smoser> and that finished is not finish
<pquerna> heh
<smoser> i'm gonna change it to populate 'end' rather than 'finished'
<smoser> na. i'm just going to drop end. thats the more grown up thing to do.
<smoser> pquerna, can you recreate by just letting the error raise ?
<smoser> in /usr/bin/cloud-init, just 
<pquerna> if i nuke /var/lib/cloud, run cloud-init init, i get the same status.json every time
<smoser>  except Exception as e:
<smoser> raise
<pquerna> ok
<pquerna> thx
<pquerna> smoser: yup, got an exception now. thanks.
<pquerna> https://gist.github.com/pquerna/a8aed8f963663308af7a
<pquerna> hmm
<pquerna> smoser: so one thing i changed, heh, the version of openstack configdrive that read_configdrive was using, was before vendor_data.json was even added in openstack
<pquerna> smoser: so it was always empty
<smoser> oh. hm..
<pquerna> yeah, so the issue is just that vendordata_raw is actually parsed json
<pquerna> this code is expecting a string blob of raw...
<pquerna> right
<smoser> yeah. :-(
<pquerna> self.vendordata_raw = results.get('vendordata')
<smoser> you want to open a bug and submit merge proposal ?
<pquerna> in DataSourceConfigDrive.py...
<pquerna> smoser: i guess my question is what is the intended operation, it seems the code wants to treat vendor_data.json like user_data and do all the various operations on it... so really it's not the same thing. So it seems the fix is to keep vendordata_raw as a vendor_data file (that doesn't exist)
<pquerna> smoser: and then put the vendor_data.json, which is more environmental/etc, as a thing to parse as json and use if needed for info.
<smoser> so yeah, we want to do all the same things to vendor_data as we do to user_data.
<pquerna> https://github.com/pquerna/cloud-init/commit/eec8a00a328b296711ca201f3facdc5910886094
<pquerna> eg, something like that
<smoser> hm.
<smoser> ok. i understand more now.
<smoser> you were expecting it to be a dict (loaded json).
<smoser> and that does seem to make sense. 
<smoser> pquerna, bah.
<smoser> cloudinit/sources/DataSourceOpenStack.py does it right
<pquerna> smoser: ah, i see
<pquerna> smoser: ok
<smoser> it looks like there might be other improvements there too :-(
<smoser> still somewhat broken, but closer.
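The fix being discussed amounts to normalizing decoded vendor_data.json before handing it to the user-data machinery: a plain string is already raw vendor-data, while a dict is environmental info from which only a 'cloud-init' member can be treated as raw. A rough sketch of that logic (names are illustrative, not the exact cloud-init code):

```python
def convert_vendordata(data):
    """Normalize decoded vendor_data.json into raw vendor-data.

    - a string is already raw vendor-data, return it unchanged
    - a dict is environmental info; only a string 'cloud-init' entry
      (if present) is usable as raw vendor-data
    - anything else is rejected
    """
    if data is None or isinstance(data, str):
        return data
    if isinstance(data, dict):
        entry = data.get("cloud-init")
        if entry is None or isinstance(entry, str):
            return entry
        raise ValueError("vendor-data 'cloud-init' entry is not a string")
    raise TypeError("unknown vendor-data type: %s" % type(data))


assert convert_vendordata("#cloud-config\n{}") == "#cloud-config\n{}"
assert convert_vendordata({"region": "dfw"}) is None
assert convert_vendordata({"cloud-init": "#!/bin/sh\ntrue"}) == "#!/bin/sh\ntrue"
```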
<praneshp> hey smoser  you around?
<praneshp> whatâs the minimum user-data file Iâd need to use to add users to an openstack vm when booting it up?
#cloud-init 2014-08-14
<smoser> praneshp, well it should add *a* user by default.
<praneshp> smoser: the user whoâs making this vm?
<praneshp> smoser: i managed to figure that out
<smoser> well, default user per distro
<praneshp> I accidentally had ssh-rsa twice in the key that I was passing in user-data
<praneshp> that was  breaking ssh-ing into the vm
<praneshp> when I fixed that all was well
<smoser> odd. i would have thought that'd be ok.
<praneshp> really? the key was wrong, I guess?
<smoser> oh. ssh-rsa is not public key
<smoser> thats private key
<smoser> users syntax is at: 
<smoser>  https://github.com/number5/cloud-init/blob/master/doc/examples/cloud-config-user-groups.txt
<smoser> http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/doc/examples/cloud-config.txt#L207
<smoser> (line 207 is authorized_keys)
<smoser> line 215 is *private* keys and public keys for those private keys.
<smoser> ie /etc/ssh/rsa_private.key (or whatever that path is)
<praneshp> smoser: sorry. really pulled in several directions here. I meant the public key
<praneshp> it should look like ssh-rsa AAAAB3NzaC1yc2EAAAADAQABA<blah>
<praneshp> for some reason, I typed ssh-rsa, then pasted ssh-rsa AAAAB3NzaC1yc2EAAAADAQABA<blah>
<praneshp> which was probably making the key funny
<smoser> ah. yeah, maybe.
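For reference, a well-formed public key entry in cloud-config carries exactly one type prefix; a minimal sketch (the user name, key material, and comment are placeholders):

```yaml
#cloud-config
users:
  - name: pranesh                    # placeholder user name
    ssh_authorized_keys:
      # exactly one 'ssh-rsa' prefix, then the base64 key material:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABA<blah> user@host
```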
<smoser> i'm really sorry . i have to go though.
<ndonegan> Is there any way to set up a fake metadata server to tell cloud-init to do a no-op?
<ndonegan> For context, using oz to build and customise images. However, on the customise part, it will attempt to do a cloud-init and fail, and has to wait for the timeout, which adds to the image creation time.
<harmw> oz shouldn't actually run cloud-init
<harmw> just install and enable it
<ndonegan> When doing oz-customize (say when the devs are laying down their packages), it boots up the image using libvirt, and then goes in over ssh and installs packages, repos etc.
<ndonegan> It's not a massive deal, it's mostly just extra time, and I can empty /var/lib/cloud after.
<ndonegan> But it would be nice if there was some way to tell cloud-init to basically twiddle its thumbs for this boot.
<harmw> ah yes, that stage..
<harmw> you could just disable cloud-init in final install stage and re-enable it in customize stage
<ndonegan> That's basically what I'm doing now, but it would be nice if there's a simpler way :)
<smoser> ndonegan, for the boot where you don't want it to do anything
<smoser> you can provide cloud-config that basically says : do nothing
<smoser> ie, you can feed cloud-config that provides the list of config_modules, init_modules and final_modules
<smoser> and those can have whatever you need or want and no more.
<smoser> you dont *have* to ever clean /var/lib/cloud
<smoser> as the next time you boot, it should have a new instance
<smoser> and that will trigger the running of things that are "per-instance"
<ndonegan> smoser: so basically pass in a user data with empty config_modules, init_modules and final_modules?
<smoser> yeah. cloud-init wont do much then.
<smoser> and you can feed it NoCloud user-data and meta-data
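The "do nothing" configuration smoser describes would look roughly like this as NoCloud user-data (module list keys as they appear in /etc/cloud/cloud.cfg; empty lists mean no modules run on that boot):

```yaml
#cloud-config
# Run no modules at all for this boot; datasource detection still happens.
cloud_init_modules: []
cloud_config_modules: []
cloud_final_modules: []
```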
<ndonegan> Easy enough to put the link local addy on a dummy interface and "mock" the metadata service.
<smoser> ndonegan, https://gist.github.com/smoser/1278651
<smoser> you can then put whatever user-data and meta-data you want in that python structure there.
<ndonegan> smoser: Nice! Thanks.
<ndonegan> smoser: Works a charm! A bit of customisation so that the returned ip is the client_ip, which probably wasn't 100% necessary, but it's doing what I want.
<harmw> smoser: you happen to know if bazaar supports .tar.gz downloads of branches?
<harmw> smoser: py27-cloud-init-0.7.5.txz
<harmw> I've just created a real FreeBSD port :)
<harmw> well, plus 2 python dependencies actually
#cloud-init 2014-08-15
<harmw> smoser: when may we expect a new version of ci to be released? i've got some important fixes in https://code.launchpad.net/~harmw/cloud-init/freebsd that I'd like in before I go for it and upload a real port to the freebsd ports system
<smoser> harmw, i dont have a strong plan for a release, but we can get your code merged in.
<harmw> ok, well, I'd like a version bump in some form to allow me to build with the merged code :)
<harmw> btw, i've submitted the dependencies that aren't currently in freebsd ports earlier today
<harmw> 2 packages, but still
<harmw> smoser: https://code.launchpad.net/~harmw/cloud-init/freebsd/+merge/231024
<harmw> harlowja: those fbsd initscripts from sean were a bit wacky I'm afraid :p
<smoser> harmw, thanks for being awesome.
<harlowja> yo yo
<harlowja> harmw haha
<harlowja> harmw u getting it up into freebsd repos i see, woot
<harmw> harlowja: yes, I already did the jsonpatch and jsonpointer ports. Someone from the fbsd project needs to commit them, after that they're live
<harlowja> cool
<harmw> I'm now working on the cloud-init port, but that's a little more complicated
<harmw> though that's because of all these new packaging tools I need to learn :p
<harlowja> :)
<harmw> btw, if either one of you knows someone on the neutron crew
<harmw> have them work out https://review.openstack.org/#/c/77471/ :p
<harmw> it's abandoned
<harmw> which is sad
<harlowja> mark killed it, haha
<harlowja> i can bug mark about it i guess (he's on my team)
<harmw> hehe
<harmw> I think it's Randy (who created the code) who needs to fix something now though, but still :p
<harlowja> ya, if person doesn't fix the code, not sure what i can do :-P
<harlowja> *or keep pushing the review discussion/fixes...
<devicenull> Is there a list somewhere of what metadata cloud-init relies on from the ec2 metadata source?
<harlowja> devicenull unsure what that means ;)
<devicenull> right, that was kinda incomprehensible
<harlowja> the metadata api that ec2 can expose is pretty generic; cloudinit will just extract all the resources on demand
<harlowja> http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/ec2_utils.py#L40 is the code that does this
<devicenull> so, what I'm looking for is a list of what metadata cloud-init relies on to actually do things.  I'm writing an implementation of the ec2 metadata service for our software
<harlowja> but the initial fetch from ec2 defines what subsequent keys are fetched...
<harlowja> so i'd say the key items that it uses are block-device-mapping
<harlowja> ssh-keys
<harlowja> and userdata (of course)
<harlowja> http://docs.aws.amazon.com/AWSEC2/2008-05-05/DeveloperGuide/index.html?AESDG-chapter-instancedata.html
<harlowja> availability-zone also (although this is less used)
<harlowja> and the hostname ones
<devicenull> okay, thanks!
<harlowja> 'ami-launch-index' will be used, although its optional
<harlowja> and of course instance-id
<devicenull> I'm thinking the safest way to go is to provide as many of the values as possible
<devicenull> and that are relevant to us
<harlowja> yup
<harlowja> those i would call the main key ones
<harlowja> ssh-keys can be empty
<harlowja> i'd make sure yours provides, userdata, instance-id, block-device ones and ssh-keys (although u can return ssh-keys as empty)
<devicenull> ok
<harlowja> and the hostname ones although cloud-init will skip those if its not there
<harlowja> and try to find the hostname by other mechanisms
<harlowja> http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/__init__.py#L45 is the basic datasource
<harlowja> that modules and all that get access to
<harlowja> methods there expose some of the above
<devicenull> ah, ok that's handy
<harlowja> http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/sources/__init__.py#L164 (for example has the hostname logic)
<harlowja> http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/tools/mock-meta.py might be useful for u too
<harlowja> http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/tools/mock-meta.py#L192 (probably this function most useful)
<devicenull> ahha
<devicenull> yea that last one is very useful
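A stripped-down version of that idea: serve a nested dict using EC2-style paths, where a path ending at a directory returns a listing of child names. A sketch in the spirit of tools/mock-meta.py (the tree contents are examples only, following the key list harlowja gives above; far less complete than the real mock):

```python
# Minimal EC2-style metadata tree walker (example data, not a full mock).
META = {
    "instance-id": "i-0000test",
    "ami-launch-index": "0",
    "hostname": "testhost.internal",
    "local-hostname": "testhost",
    "placement": {"availability-zone": "nova"},
    "public-keys": {"0": {"openssh-key": ""}},  # ssh-keys may be empty
}


def lookup(tree, path):
    """Resolve 'a/b/c' against a nested dict, EC2 metadata style.

    A path ending at a dict returns a newline-joined directory listing;
    a path ending at a leaf returns the value itself; an unknown path
    returns None (a real server would answer 404).
    """
    node = tree
    for part in filter(None, path.split("/")):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    if isinstance(node, dict):
        return "\n".join(sorted(node))
    return node
```

Hooking this up behind an HTTP server on the link-local address would give the kind of "mock" metadata service ndonegan describes earlier in the log.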
<devicenull> so the network/* paths aren't used?
<devicenull> btw is there an actual better debug option?  even with --debug, I don't actually get any info on what cloud-init is doing
<devicenull> just some internal debug
<harlowja> devicenull network/* paths?
<devicenull> network/interfaces/macs/mac/device-number
<devicenull> from http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html
<harlowja> hmmm devicenull that seems newer than what cloudinit knows about
<harlowja> at least i didn't know that existed
<harlowja> devicenull anyway, cloud-init will read that, but it doesn't appear used yet afaik
<harlowja> devicenull as for debug, u should be able to get debug logs in /var/log/cloud-init.log
<harlowja> can also adjust the logging levels to do this too
<harlowja> when starting an instance
<harlowja> depending on what cloud-init version u have this is easier/harder
#cloud-init 2014-08-16
<harmw> if someone sees RaginBajin tell him to ping me
<harmw> i've got a freebsd pkg for him to test
#cloud-init 2015-08-10
<smoser> Odd_Bloke, some data sources cannot change
<smoser> and thus persisting is fine.
<smoser> config-drive simply cannot be made to change. other ones do not change.
<Odd_Bloke> smoser: Others definitely can change though (e.g. you can add SSH keys to a GCE instance).
<smoser> right. and ec2 changes user-data during a stop
<smoser> and meta-data is dynamic
<smoser> so yeah. some stuff we dont want cached, but stuff we can only ever read once... (like the passwd) we'd probably need to cache or consume the first time we read it.
#cloud-init 2015-08-11
<openstackgerrit> Merged stackforge/cloud-init: Fix running cloud-init with no arguments on Python 3.  https://review.openstack.org/210381
<Odd_Bloke> claudiupopa: Is your data source API stuff ready for a re-review?
<Odd_Bloke> (I might not get to it this morning/today anyway)
<claudiupopa> Yep, it should be finished.
<Odd_Bloke> OK, cool, I'll see if I can give it another look today.
<gamename> hi guys.  I think my cloud-init on an AMI is not working.  What should I look for in the system log(s) to indicate cloud-init worked, or at least executed?
<Odd_Bloke> gamename: You should be able to look at /var/log/cloud-init.log, or search for 'cloud-init' in the syslog.
<smoser> in almost all cases, cloud-init writes WARN to that log if something is wrong.
<gamename> smoser: Thanks
<gamename> Odd_Bloke: Thanks
<harlowja> smoser btw, http://governance.openstack.org/resolutions/20150615-stackforge-retirement.html
<harlowja> i'm unsure what we want to do there due to that :-/
<harlowja> back to launchpad :-/
<harlowja> idk
<harlowja> launchpad + git?
<harlowja> https://review.openstack.org/#/c/192016/ did merge recently (the governance blah blah)
<harlowja> * http://eavesdrop.openstack.org/meetings/tc/2015/tc.2015-07-28-20.03.log.html#l-11
<smoser> so the only change really is s,stackforge,openstack,
<smoser> right ?
<harlowja> and http://eavesdrop.openstack.org/meetings/tc/2015/tc.2015-07-21-20.01.log.html#l-29 (more of the logs of that)
<harlowja> smoser seems like it, so far, yes
 * harlowja personally i don't know why they are doing this, but whatever, lol
<smoser> gotta do something. they're not paying people to sit on their hands. (only harlowja is so lucky)
<harlowja> lol
<harlowja> push the hay in that pile, no now move it to that pile, noooo now move it to that pile
<harlowja> lol
<smoser> we can't all get by on our good looks.
<harlowja> lol
<harlowja> ha
<smoser> so the announcement there... doesn't seem to make any negative implications for "non openstack openstack projects"
<smoser> but the discussion on 07-21 does
<harlowja> yup, that was my take on it too
<harlowja> '<dhellmann> jeblair: fwiw, both pecan and wsme will likely move back to github, not because of contempt, but because they're not solely used by openstack' ...
<harlowja> so ya, it confused me too
<harlowja> something changed after 7-21 that i haven't quite figured out lol
<harlowja> cause http://eavesdrop.openstack.org/meetings/tc/2015/tc.2015-07-28-20.03.log.html#l-11 was like no discussion, ha
<harlowja> smoser and how timely, http://lists.openstack.org/pipermail/openstack-dev/2015-August/071816.html
#cloud-init 2015-08-12
<Odd_Bloke> smoser: harlowja: So that looks like we won't need to move to openstack/, right?
<Odd_Bloke> As they are trying to avoid doing moves (rather than remove the string 'stackforge'), moving all the stackforge projects would seem unproductive...
<smoser> Odd_Bloke, it seems that the name stackforge is gone.
<smoser> we will have to move to openstack/ namespace.
<Odd_Bloke> smoser: http://paste.ubuntu.com/12061743/
<Odd_Bloke> smoser: That's from #openstack-infra just now.
<smoser> well thats nicer than i thought
<Odd_Bloke> smoser: Yeah, I think there was an earlier draft which was harsher.
<smoser> i thought we were at least forced to move namespace now rather than "at _some_ undefined point"
<claudiupopa> hey guys.
<claudiupopa> meeting today?
<Odd_Bloke> Yeah, I need to restart browser.
<Odd_Bloke> Will be there RSN.
<smoser> k
<smoser> ok...
<smoser> so we'll start cloud-init meeting here now.
<claudiupopa> yeah, sorry about that, I don't know what's happening with my mic.
<jgrimm> smoser, meeting here? or?
<smoser> no worries
<smoser> yeah
<jgrimm> cool
<smoser> let me type an agenda quick
<smoser> agenda:
<smoser>  * reviews http://bit.ly/cloudinit-reviews-public
<smoser>    * https://review.openstack.org/#/c/209520/
<smoser>  * reporting / smoser and cloud-init 0.7
<smoser>  * main and persisting state
<smoser>  * open discussion
<smoser> seem reasonable ? if i've missed anything then we can add them in open discussion.
<jgrimm> +1
<claudiupopa> Sounds good.
<Odd_Bloke> +1
<smoser> ok.
<smoser> so reading claudiupopa's very nice commit message.
<smoser> i think it looks nice. have to read closer on how the network and local data source would work.
<smoser> but since many comments were addressed in that review i think we're good there.. i'll review that some more after the meeting
<smoser> reporting / smoser and cloud-init 0.7
<Odd_Bloke> Yeah, I need to do a re-review of that as well.
<smoser> == reporting / smoser and cloud-init 0.7 ==
<Odd_Bloke> But it was great last I looked, so that shouldn't take long.
<smoser> i took the reporting code from cloud-init 2.0, and added some things to it and loaded it into 0.7. the things i added were not much, and the intent will be to get them to really just be copied from 2->0.7
<smoser> the things discovered in using it that need to be addressed:
<smoser>  a.) exceptions should not leak through reporting.
<smoser>    the handlers should log errors but exceptions shouldn't be allowed to bubble up.
<claudiupopa> You mean errors inside the handlers?
<smoser> yeah.
<Odd_Bloke> claudiupopa: Yes.
<smoser> ie, posting error to http://foo should not cause cloud-init to stacktrace
<smoser> log failure, go on with life.
<smoser>  b.) the webhook ideally would buffer messages until it has a network connection and then start. non-trivial to decide when it should try to reach an endpoint though.
<Odd_Bloke> That sounds like a nice-to-have to me; do we have a concrete use case for something needing to know what happened before network comes up?
<smoser> other than that i'd like to have that information collected...
<smoser> if we add timestamp of events to the event, then it makes logging boot time very easy.
<smoser>  c.) blocking... it's not really a problem yet, but it'd seem that blocking on a webhook could be problematic for boot performance.
<smoser> if we're just expecting things to log errors rather than raise exceptions, it seems reasonable to somehow not block the rest of the code.
<smoser> make sense ?
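Point (a) can be phrased as a thin wrapper around handlers: catch everything, log a warning, carry on. A minimal sketch (the handler interface is simplified and the names are illustrative, not the actual cloud-init reporting API):

```python
import logging

LOG = logging.getLogger(__name__)


class SafeHandler:
    """Wrap a reporting handler so its failures never propagate.

    A webhook that times out (or any handler that raises) should cost
    a log line, not a cloud-init stack trace.
    """

    def __init__(self, handler):
        self._handler = handler

    def publish_event(self, event):
        try:
            self._handler.publish_event(event)
        except Exception:
            LOG.warning("reporting handler %s failed for event %s",
                        self._handler, event, exc_info=True)


class ExplodingHandler:
    """Stand-in for a webhook whose endpoint is unreachable."""

    def publish_event(self, event):
        raise IOError("endpoint unreachable")


# Publishing through the wrapper does not raise:
SafeHandler(ExplodingHandler()).publish_event("boot/start")
```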
<Odd_Bloke> Could we achieve (b) by writing structured data to the filesystem (i.e. a separate reporter) and then a separate job could Require/After cloud-init-with-networking and push that data out somewhere?
<claudiupopa> Why not store them in memory?
<claudiupopa> Seems redundant to cache them on the filesystem.
<claudiupopa> And that's additional disk access that we don't need.
<Odd_Bloke> Because we don't know when we're going to get networking, and systemd is good at ordered dependencies. :p
<Odd_Bloke> I mean, this sounds like a very specific use-case.
<Odd_Bloke> I'm thinking of a way that someone who wanted it could achieve it.
<claudiupopa> Since I'm not very familiar with it.. what does systemd have to do with this? :-)
<smoser> systemd would start a job for us when networking was all up.
<smoser> just like our next topic. we'll have stages in boot, for local datasource and applying networking config
<smoser> and then a stage that runs when networking config has come up
<smoser> in cloud-init 0.7 that is 2 separate processes, so we need disk to persist and replay.
<smoser> i agree that it is a bit of a specific use case
<claudiupopa> I think I got it. I'm wondering if we could customize the stages or if they are hardcoded, since ideally on windows, we'll have only two of them.
<smoser> how so?
<claudiupopa> basically on windows we're doing almost everything in one run of cloudbaseinit, except that setting a hostname requires a restart, so we have that split into another initial run.
<claudiupopa> that's why we actually don't need persistence on disk for windows.
<smoser> have to think some on it.
<smoser> i think we're kind of moved on to topic 2
<smoser> we've bled over.
<Odd_Bloke> Yeah, I think there's a fairly fundamental design decision to be made here.
<claudiupopa> yeah, regarding topic 1, I'm +1 on not crashing cloudinit when a reporting hook fails.
<Odd_Bloke> Which is going to affect how we implement a whole bunch of things.
<Odd_Bloke> And, yeah, we shouldn't crash out, and we should be able to do reporting in the background.
<Odd_Bloke> We'll obviously need to ensure that the payload contains a timestamp if we're doing it in the background.
<claudiupopa> in the background meaning a thread?
<Odd_Bloke> I'm not fussy. :p
<Odd_Bloke> It's all IO/networking, so the GIL won't be a problem.
<smoser> so ..
<smoser> https://review.openstack.org/#/c/202743/7/cloudinit/shell.py
<smoser> the first 4 are "stages" that are largely what we have in cloud-init.
<smoser> basically you have:
<smoser>   search local datasources (and apply networking if possible). this stage blocks networking from coming up
<smoser>  search network datasources and run "init" modules (in 0.7 talk)
<smoser>  then config-final.
<smoser>   which is really just supposed to be "rc.local like timeframe"
<smoser> realize that i dont know that i wrote those well, but those are the general stages.
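The general stage layout described above, including the 'all' command that comes up later in the discussion, could be illustrated like this; the stage names follow the conversation, but the dispatch code is purely a sketch of the idea, not shell.py's actual interface:

```python
# Hypothetical sketch of the boot stages; a real implementation would
# dispatch to datasource search and config-module code per stage.
STAGES = {
    "local": "search local datasources, apply networking if found",
    "network": "search network datasources, run 'init' modules",
    "config": "run config modules",
    "final": "rc.local-like timeframe",
}

STAGE_ORDER = ["local", "network", "config", "final"]

def run(stage, ran=None):
    """Run a single named stage; 'all' chains every stage in order."""
    ran = [] if ran is None else ran
    if stage == "all":
        for name in STAGE_ORDER:
            run(name, ran)
    else:
        ran.append(stage)  # stand-in for actually executing the stage
    return ran
```

Normally the OS init system invokes each stage at the right point in boot; 'all' is the single-process chaining useful for retries or a windows-style single run.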
<claudiupopa> got it
<smoser> in 0.7 we have 3 config stages: init, config, and rc.local. i'm not sure if we need both 'init' and 'config'.
<claudiupopa> so you're relying on the OS to call these 4 stages for you?
<smoser> right.  but whether it calls me or not... they're still tasks we want to accomplish
<smoser> and also offer as places to "hook" into boot
<smoser> as the earlier you can do something, the less re-work you might have to do .. ie, ideally you can write config for a service before the service would normally start.
<smoser> (so rc.local isn't ideal).
<smoser> anyway..
<claudiupopa> just to be clear on this, why the need of multiple stages in cloud-init? What does it solve from one single stage?
<smoser> because running everything "really early" might mean the network isn't up.
<smoser> and running everything "really late" means you've lost your chance to make a change without restarting a service or possibly other negative side effects.
<claudiupopa> so basically each run is a granular list of tasks to do, of config modules to execute?
<smoser> in 0.7, yes.
<smoser> in 2.0 we'd like this to be more dynamic, but still maintain that idea.
<claudiupopa> yeah, what would be ideal is to have a workflow that doesn't require multiple runs on windows and works with more stages on other platforms.
<claudiupopa> how about having a stage that chains all the other ones together?
<smoser> well, that is what 'all' is.
<claudiupopa> probably windows will be the only platform that will use it.
<smoser> and its useful for a user wanting to re-try something.
<claudiupopa> oh, there is one already.
<smoser> i think you kind of still have 2 stages there though.
<smoser> one before your networking is up, that configures networking, and sets hostname (and reboots)
<smoser> and then one that runs after that in the new boot
<claudiupopa> yeah, currently hostname and network config are separated.
<smoser> ok.
<claudiupopa> Don't know why, but I'm guessing because in the first phase we don't have networking info available.
<claudiupopa> But I have to check it out.
<claudiupopa> So two of them are definitely what we need.
<claudiupopa> this means that there should be a way to customize what a stage is.
<smoser> yeah, we can probably work things out.
<smoser> so when Odd_Bloke went looking at this he came to the problem where the 'network' stage ('locate and apply networking configuration')
<smoser> would crawl a datasource and that datasource had a one-time read-only password in it.
<smoser> the network stage, would only apply networking information ideally, so then he'd have to store that password somewhere so that the 'config' stage could apply it.
<claudiupopa> yep, I understand now why persistence was under discussion.
<smoser> at risk of being called an idiot.. i dont think its all that big of a deal.
<smoser> i'd write the data to a file and read it back later.
<smoser> if its a root owned file, you risk that password being read off the disk by a root user at some future point or having that data found otherwise.
<smoser> i dont advocate storing passwords in plaintext on a disk, but i also dont advocate setting root passwords
<smoser> Odd_Bloke, ? thoughts?
<Odd_Bloke> Hold on, pulled in to another call.
<smoser> claudiupopa, thoughts ?
<claudiupopa> yeah, I'm not into security, but storing a file with potential private data doesn't strike me as being a very good practice.
<smoser> on linux we could encrypt it for inclusion to shadow, so thats better than plaintext.
<claudiupopa> also, if we'll have the ability to customize stages, then on windows we probably won't activate the persistence.
<claudiupopa> If that makes sense.
<claudiupopa> Since the password is probably not used by the network module.
<smoser> and we can also attempt to "shred" the file on disk, but i'm not sure how effective shred is on non-local and spinning disks. (http://superuser.com/questions/22238/how-to-securely-delete-files-stored-on-a-ssd)
<smoser> claudiupopa, i think we can come up with some solution there , yeah.
<smoser> you'll have to drive that :)
<claudiupopa> yeah.
<claudiupopa> Okay.
<claudiupopa> :-)
<Odd_Bloke> Right, back.
<Odd_Bloke> smoser: claudiupopa: So I had an idea: we could make the persistence invisible to consumers and only happen at the last possible moment.
<Odd_Bloke> That way, if we decide to run in stages, we persist the data.
<Odd_Bloke> But if we decide to run all at once, we don't.
<Odd_Bloke> (Or we only persist things that haven't been scrubbed out of the data)
<smoser> i think that sounds reasonable.
<claudiupopa> makes sense
<smoser> the registry could do that
<smoser> you could register 'persistent' items, some with a 'secure' flag or something.
<smoser> and then on exit write it out
<Odd_Bloke> I think we'd always need to write out everything if we were writing out.
<Odd_Bloke> And then consumers of secure data would have a way of expunging it.
<smoser> right.
<Odd_Bloke> Which would remove it from memory and, if it had been persisted, remove it from disk as well.
<claudiupopa> what's a consumer in this case? Another endpoint or?
<smoser> config that consumes password
<claudiupopa> aa.
<Odd_Bloke> A potential future enhancement would be to allow you to configure cloud-init to never persist "secure" information.
<Odd_Bloke> But that would require a way of specifying what was "secure" data, and so is probably too much to bite off at this point.
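The registry idea sketched in this exchange (persist at the last possible moment, write everything out when writing at all, and let consumers expunge secure data from both memory and "disk") might look roughly like this; all names are hypothetical and the persisted store is stubbed as a dict:

```python
class Registry:
    """Sketch of a lazily-persisted registry: items live in memory,
    persistence only happens when a later stage actually needs it,
    and expunged items disappear from both memory and the store."""
    def __init__(self):
        self._items = {}
        self._persisted = {}  # stand-in for an on-disk store

    def register(self, key, value, secure=False):
        self._items[key] = {"value": value, "secure": secure}

    def persist(self):
        # "last possible moment": only called when running in stages;
        # a single-process 'all' run would never call this
        self._persisted = dict(self._items)

    def expunge(self, key):
        # consumers of secure data call this after use: remove from
        # memory and, if it had been persisted, from the store as well
        self._items.pop(key, None)
        self._persisted.pop(key, None)
```

The `secure` flag is only recorded here; the future enhancement of never persisting secure items would key off it.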
<larsks> If I want to add a single item to cloud_final_modules, shouldn't I be able to do that with a cloud-config file that has a merge_how directive?
<smoser> larsks, is the module available somwhere already ?
<larsks> Yeah, it's one of the standard config modules. I just want to enable it (without needing to re-specify everything that's in the system cloud.cfg)
<larsks> I thought something like this might work: http://chunk.io/f/f36e3fad696f4e23bf0b3a3e7eb26182
<larsks> ...but that seems to be *replacing* the cloud_final_modules: key.
<Odd_Bloke> smoser: claudiupopa: So I think I have enough to start on implementing that configuration stuff (and therefore beginning on main work).
<Odd_Bloke> smoser: claudiupopa: Do we have any open discussion topics?
<claudiupopa> I don't think so. You mean working on stages?
<Odd_Bloke> Yeah.
<claudiupopa> one requirement I would like to have is the ability to customize them, we'll probably use something different on windows.
<Odd_Bloke> So the current stubs include 'all' which runs all the stages.
<Odd_Bloke> We could either add an 'all-on-windows', or make the stages that 'all' runs customisable using /etc/cloud.cfg (or the Windows equivalent).
<Odd_Bloke> But getting 'all' working is probably further down the line anyway.
<Odd_Bloke> So we can hash it out then. :)
<claudiupopa> Okay. :-)
<smoser> ok. i think we're done for today.
<Odd_Bloke> Cool.
<Odd_Bloke> I'm going to try to get the meeting I have half an hour after this meeting starts rescheduled.
<claudiupopa> okay, guys.
<claudiupopa> I'll be waiting for a review on the data source api. ;-)
<Odd_Bloke> So that I don't have to disappear just when we're getting to the juicy stuff. :p
<Odd_Bloke> claudiupopa: Well I'm backporting stuff for cloud-init 0.7.x on Azure, which means I regularly have time to do other things while waiting for Azure.
<Odd_Bloke> So hopefully I'll get to that this afternoon.
<larsks> Odd_Bloke: smoser: any thoughts on the merging-vs-replacing question?
<Odd_Bloke> larsks: I've never touched the merging stuff, so I have no idea, I'm afraid. :)
<smoser> larsks, sorry.. reading
<smoser> larsks, that actually looks like it *should* work
<smoser> you can also use cloud-config-jsonp to do something similar
<larsks> smoser: yeah, that's what I thought, too :/.  I'm looking for jsonp examples right now; if you have one handy that would be awesome...
<larsks> I found https://bugs.launchpad.net/cloud-init/+bug/1316323.  Let me give it a shot.
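For context, cloud-config-jsonp applies JSON-Patch-style operations (RFC 6902) to the config dict; cloud-init uses a full JSON Patch implementation, but the "add" operation larsks needs can be sketched minimally as:

```python
# Minimal sketch of the RFC 6902 "add" operation, roughly what a
# #cloud-config-jsonp part does to the merged configuration.
def apply_add(config, path, value):
    parts = path.strip("/").split("/")
    target = config
    for part in parts[:-1]:
        target = target[part]
    last = parts[-1]
    if last == "-" and isinstance(target, list):
        target.append(value)  # "/cloud_final_modules/-" style append
    else:
        target[last] = value  # plain key set/replace
    return config
```

Note that, as the discussion below concludes, in the cloud-init version at hand the patch applies against user-provided config only, not the builtin cloud.cfg.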
<larsks> smoser: do you have a minute to take a gander at my cloud-config-jsonp attempt?
<smoser> larsks, yea. sorry. give me a minute.
<larsks> no rush.  Here are the details: https://gist.github.com/larsks/8e0848d4e81c9e7cb066
<larsks> At the bottom is the jsonp file, based on the examples from the lp bug.  Above that  is the system cloud.cfg, and at the top is the error that cloud-init is throwing...
<larsks> The error suggests that it is applying my patch against an empty configuration.
<smoser> larsks, ok... so
<smoser> from trunk, there is a tool : ./tools/ccfg-merge-debug
<larsks> Okay.  I'll take a look at that in a second.  Did the jsonp in my gist look reasonable?
<smoser> http://paste.ubuntu.com/12062712/
<smoser> so it seems to work for me...
<smoser> maybe you dont have a stock config ?
<smoser> in which case maybe you have to create the entry first ?
<smoser> hm..
<larsks> No, there's definitely a stock config there.
<smoser> yeah
<smoser> odd
<larsks> If it matters, I'm working with 0.7.5, because that's what centos has.
<smoser> do you see any WARN in config  ?
<larsks> smoser:  there are no warning messages in the log.  Using that merge config just replaces cloud_final_modules.  I can get the same behavior by booting with no user-data, and then running "cloud-init --file config-with-merge.yml modules --final"
<smoser> larsks, i'm sorry, i'm not following
<larsks> (a) there are no warnings (b) as we have seen, any use of that merge configuration actually *replaces* cloud_final_modules rather than appending, and (c) I can reproduce that behavior on demand just by passing the merge config to cloud-init using --file.
<larsks> That makes me sad, because I am trying to avoid spinning a new cloud-init rpm every time someone says, "we should enable config module <foo> in the default configuration".
<larsks> I'm going to see if maybe this is a version-related issue and maybe a more recent cloud-init will work correctly.
<smoser> look at /etc/cloud/cloud.cfg.d/05_logging.cfg
<smoser> and comment out
<smoser>  - [ *log_base, *log_syslog ]
<smoser> so it doesnt go to /dev/log (im not sure where that goes)
<larsks> /dev/log ends up in the system journal.
<larsks> (on these systems)
<larsks> Let me try this with 0.7.6 first, because if that makes the problem go away I think we're actually in okay shape.
<smoser> larsks, it seems like maybe its a matter of unicode
<larsks> Interesting. In what way?  Fwiw, 0.7.6 exhibits the same behavior...
<smoser> nah. it's not that.
<smoser> :-(
<smoser> it appears that you can't patch builtin config
<larsks> Ahhhhhhhhh, poop.
<smoser> yeah, crappy
<smoser> you can re-define the whole list though
<larsks> Yeah. Thanks for lookin'!
#cloud-init 2016-08-15
<Odd_Bloke> smoser: I've added some comments on https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1603222; your thoughts would be much appreciated. :)
<smoser> Odd_Bloke, so i think that at this point more of the stuff is in cloud-init, right ?
<smoser> i think what you're saying is that cloud-init in xenial does not depend on udev rules from the walinux agent
<smoser> so i'd prefer if we fix something to have that be consistent between trusty -> xenial +
<Odd_Bloke> smoser: The problem is orthogonal to udev rules, really.
<Odd_Bloke> smoser: One thing I wasn't sure about is how introducing the udev rules to existing instances would affect them.
<Odd_Bloke> But both sets would apply, so I think it would be fine...
<smoser> right. they both should set up symlinks.
<smoser> why is it orthogonal?
<Odd_Bloke> Because the problem is that they don't use the udev rules properly; it doesn't matter which set of udev rules are there.
<Odd_Bloke> But, yeah, I'm happy to backport the udev rules if that seems safe.
<Odd_Bloke> I was just thinking in terms of minimising the backport diff.
<mgagne> smoser: will test your fix for bond right now
<mgagne> however I believe 3.2) described in bug still isn't fixed.
<smoser> mgagne, looking
<mgagne> in a meeting but so far, only the auto stanza on bond0 looks to be missing
<smoser> mgagne, i'm not sure it's required.
<mgagne> it is as far as I know, I added it and it worked
<smoser> i'll compare the network config against other stuff we have examples of (curtin vmtest) where we actually verify
<mgagne> will test after my meeting
<smoser> thanks mgagne
<smoser> rharper, look at https://bugs.launchpad.net/cloud-init/+bug/1605749 and have a think.
<rharper> smoser: is that your bonding fix ?
<rharper> yeah; I read the branch earlier, my initial thought was that we'd generically want to "Resolve links" at render time
<rharper> rather than be bond specific
<rharper> the mechanism of using link['id'] as the interface name key is common to all types; in the network state we only have link ids, and at runtime we'd replace a link-id with a get_name_from_macaddr(link_to_mac[link-id]) sort of lookup
<smoser> rharper, well, do other things have referenced links ?
<rharper> any of the combined types
<rharper> bridges, bonds, and vlans
<smoser> rharper, well, if we want, its easy enough now with the same generic mechanism
<rharper> w.r.t auto on bond0; that is needed; there's a bug related to not getting 'auto' on stanzas without network config (the 'control' value, and its default, are properties of a subnet)
<rharper> so a bond with no subnets, but then vlans on top misses the 'auto bondX' line;  I've found this in curtin since Friday, working on a fix there;
<smoser> ok.
<smoser> then the 2 things are
<smoser> a.) auto bond [necessary]
<smoser> b.) resolve links generically
<smoser> where b is not strictly necessary at this time...
<smoser> can you give another example of where it is?
<rharper> a) bonds default to auto, unless a subnet with 'control' says otherwise
<rharper> for b) just replace the vlan_raw_interface value from bond.X to interface0
<rharper> the underlying device if it's type physical' will refer to another "links" element which may not have a 'name' key set
<rharper> think, eth0.123;  the eth0 would be the link.id; and that may not be the name of the device, like bond_interfaces contains link.ids
<smoser> where did you come up with the string 'vlan_raw_interface' ?
<smoser> i dont see that anywhere.
<rharper> that's the eni name
<rharper> sorry, vlan_link is the network_data.json field
<rharper> in cloudinit/sources/helpers/openstack.py:593
<rharper> we generate a nic name based on the vlan_link and vlan_id, which is OK since vlan is a constructed interface, but the vlan_link points to the underlying device (this is a link.id) and needs to be replaced
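The generic link-id resolution rharper describes could be sketched as below; the `"phy"` type and field names follow OpenStack's network_data.json links format, but the function itself is hypothetical:

```python
def resolve_link_names(links, mac_to_name):
    """Return a resolver that maps a link id to its runtime interface
    name, looked up via the (physical) link's MAC address. Non-physical
    links (bonds, vlans) keep their configured names."""
    id_to_mac = {l["id"]: l.get("ethernet_mac_address")
                 for l in links if l.get("type") == "phy"}

    def name_for(link_id):
        mac = id_to_mac.get(link_id)
        return mac_to_name.get(mac, link_id)

    return name_for
```

Filtering to physical links up front is what avoids the bond-inherits-slave-MAC ambiguity discussed later in the log.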
<smoser> rharper, ok. i can add a test for vlan on that also.
<rharper> if you need a config yaml, I have one
<smoser> http://paste.ubuntu.com/23059320/
<smoser> i think that is what you were saying
<smoser> that shows the error.
<smoser> in 'eth0.602', does 'eth0' actually matter ?
<rharper> for our internal state no
<rharper> but it's typical shorthand for underlying device . vlan_id
<rharper> so, the vlan scripts in ifupdown split on . and call vconfig with the first segment and pass the second as the vlan_id
<rharper> you can instead  say iface vlan1 and underneath specify the vlan_raw_device (eth0), and vlan_id (123)
<smoser> well that sucks
<smoser> harder to get at that
<rharper> no, we just need to tag the elements of state that use a link.id
<rharper> and when we're rendering the interface, do an id lookup by mac
<rharper> we already repeat the vlan_id as an interface attribute
<rharper> we can set the vlan interface name to vlan{index} instead of {link_id}.{vlan_id}
<rharper> which is what we're doing now;
<smoser> well, https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name is updated to now have a vlan test case that i think renders correctly
<smoser> i dont love the mechanism, but seems to work
<rharper> looking
<smoser> still missing the auto up change, assuming youre looking at that.
<rharper> yeah, that's a one-liner in eni.py
<rharper> basically in the case that we don't have a subnet configured, if we have 'bond-masters' or 'bond-slaves' we emit an auto $iface
<rharper> I need to think a bit more
<rharper> as it's a general issue for interfaces that don't include a subnet config since 'control' is a property of a subnet
<smoser> yeah.
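The "one-liner" rharper describes, plus the 'control' rule he mentions, might look something like this predicate; a sketch of the logic under discussion, not the actual eni.py code:

```python
def needs_auto(iface):
    """Should an 'auto <name>' stanza be emitted for this interface?

    Interfaces default to 'auto' unless a subnet's 'control' value says
    otherwise; a bond with no subnets at all (e.g. one carrying only
    vlans) still needs 'auto' so it gets brought up."""
    subnets = iface.get("subnets") or []
    if subnets:
        return all(s.get("control", "auto") == "auto" for s in subnets)
    # no subnets: only bond members/masters get 'auto' emitted
    return "bond-master" in iface or "bond-slaves" in iface
```

The `bond-master`/`bond-slaves` keys follow ifupdown's naming; the exact dict shape here is assumed for illustration.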
<mgagne> smoser: so I patched on my side to get auto added. It now fails as described in the bug description in 3.1)
<smoser> mgagne, well, how are you running that ?
<mgagne> smoser: running what?
<mgagne> smoser: install cloud-init from repo, apply patches found in your branch, build image, upload image, boot
<mgagne> http://imgur.com/yA3eslq
<rocket> I am wondering if I can use cloud-init in my home lab running with vmware fusion.  Is there a metadata server I could run locally thats fairly simple/lightweight?
<smoser> rocket, there is some vmware support, but i'm not familiar enough with their product line to know if 'fusion' supports it.
<rocket> I was just hoping to start up a pythonic based webserver or something .. or should I be looking at creating my own that produces the yaml files I am seeing in documentation?
<rocket> I just didn't know what was required for a really simple setup
<rocket> I *think* I just need random hostnames and point that towards a saltmaster etc..
<smoser> rocket, there are two things that provide difficulty i think
<smoser> a.) you need data per instance-id ... each instance needs to somehow get different data
<smoser> b.) you have to tell cloud-init where the metadata service is.
<smoser> you could mock the ec2 one by mocking 169.254.169.254 and plumbing that network in
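A minimal sketch of mocking the EC2 metadata service for a home lab: a tiny local HTTP server serving just enough of the EC2 metadata paths to experiment with. The instance data below is invented, and as smoser notes you would still have to route 169.254.169.254 to this host:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Invented example data; per smoser's point (a), each instance would
# really need its own instance-id and hostname.
METADATA = {
    "/latest/meta-data/instance-id": "i-homelab-001",
    "/latest/meta-data/local-hostname": "vm01",
    "/latest/user-data": "#cloud-config\nhostname: vm01\n",
}

class MetadataHandler(BaseHTTPRequestHandler):
    def log_message(self, fmt, *args):
        pass  # keep the console quiet

    def do_GET(self):
        body = METADATA.get(self.path)
        if body is None:
            self.send_error(404)
            return
        payload = body.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # cloud-init's Ec2 datasource expects 169.254.169.254:80; the lab
    # network would need to redirect that address here
    HTTPServer(("", 8080), MetadataHandler).serve_forever()
```

This only mimics a few paths; a real Ec2 datasource crawl requests more of the meta-data tree.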
<rharper> smoser: 3.1 is another run-time variant; bonds inherit the mac address of the slaves; when we're doing the lookup, we can filter by type (we only need to look up names by macs of 'physical' devices)
<smoser> rharper, right. but he's getting a stack trace there
<smoser> which is interesting and i can't reproduce.
<mgagne> smoser: we are not yet at 3.2, we are still stuck at 3.1
<mgagne> smoser: how are you testing? do you have access to an openstack cloud?
<smoser> python3 NotADirectory and FileNotFound are obnoxious
<rharper> smoser: we need to not try to look up the bond mac address; it can be called *whatever* bond{index} ; only the bond_interfaces lists of link_ids need to be resolved (and actually) we need to check the type of the links to see if they're physical, otherwise we can ignore the mac lookup
<smoser> mgagne, well, i do, yes, but was not focused there yet, and i dont have an openstack cloud that would ask me to bond like that
<mgagne> this line is problematic: https://git.launchpad.net/cloud-init/tree/cloudinit/net/__init__.py#n99
<mgagne> it makes the assumption that all devices found in this folder are a real device and file is a directory
<mgagne> this is why this line fails: https://git.launchpad.net/cloud-init/tree/cloudinit/net/__init__.py#n350
<mgagne> but could be related to python3 as you said
<smoser> mgagne, well, it doesnt really make the assumption
<smoser> it accepts an OSError and a IOError and does the right thing
<mgagne> I'm not sure why one would list those devices and only filter them later
<mgagne> rharper: I'm not sure why cloud-init tries to configure the network a second time. The 2nd time is run, slaves mac address might be updated and no longer match the ones found in config-drive.
<rharper> smoser: it does the right thing (read_sys_net) however, the interfaces_by_mac does not like 'bonding_masters' file and throws exception; this prevents creating the mac_to_ifname;  we can handle the NotADirectoryError and continue
<smoser> rharper, we are trying to handle that.
<smoser> thats the thing
<smoser> NotADirectoryError is an OSError
<rharper> interesting
<rharper> when I test it, it's not handled
<smoser> but apparently does not have errno = 2
<smoser> where do you test this ?
<rharper> xenial vm
<rharper> with bond added
<rharper> if you're on diglett you can ssh into the vm
<rharper> smoser: ssh ubuntu@192.168.122.178
<rharper> this is not with your branch, so if you've updated it's just what's in xenial (cloud-init level)
<mgagne> current code is testing for ENOENT, not ENOTDIR
<smoser> probably need ENOTDIR
<smoser> yeah.
<rharper> http://paste.ubuntu.com/23059495/
<smoser> rharper, http://paste.ubuntu.com/23059501/
<smoser> obnoxious
<smoser> so open("/sys/class/net/bonding_masters/address") throws a NotADirectoryError with a errno of 20
<mgagne> try with an existing file and append a filename to it and try opening it
<smoser> mgagne, right. thats it. ok. thank you
<rharper> yeah; it's the full path that included not-a-dir-element
<smoser> http://paste.ubuntu.com/23059511/
<rharper> y
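The fix under discussion amounts to treating ENOTDIR like ENOENT when reading sysfs attributes. A simplified, testable version of a `read_sys_net`-style helper (the `base` parameter is added here only to make the sketch self-contained) could be:

```python
import errno
import os

def read_sys_net(devname, path, base="/sys/class/net"):
    """Read a sysfs attribute, treating both a missing path (ENOENT)
    and a non-directory component such as
    /sys/class/net/bonding_masters/address (ENOTDIR) as "no attribute"."""
    try:
        with open(os.path.join(base, devname, path)) as f:
            return f.read().strip()
    except OSError as e:
        # NotADirectoryError is an OSError with errno 20 (ENOTDIR),
        # not errno 2 (ENOENT), which is why checking ENOENT alone
        # let the bonding_masters entry crash the crawl
        if e.errno in (errno.ENOENT, errno.ENOTDIR):
            return None
        raise
```

This matches smoser's paste: `open("/sys/class/net/bonding_masters/address")` raises NotADirectoryError with errno 20.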
<smoser> there is a question in my mind if there could be 2 nics with the same address
<smoser> wow
<smoser> $ cat /sys/class/net/bond0/address
<smoser> 52:54:00:f2:5a:35
<smoser> $ cat /sys/class/net/ens3/address
<smoser> 52:54:00:b2:5a:27
<smoser> $ cat /sys/class/net/ens5/address
<smoser> cat: /sys/class/net/ens5/address: No such file or directory
<smoser> so the answer is that you can't have 2, but if this were to run after a bond were set up, we'd get the bond as the device with that mac
<smoser> which is odd
<mgagne> root@localhost:/sys/class/net# cat bond0/address
<mgagne> 0c:c4:7a:34:6e:3c
<mgagne> root@localhost:/sys/class/net# cat eno1/address
<mgagne> 0c:c4:7a:34:6e:3c
<mgagne> root@localhost:/sys/class/net# cat eno2/address
<mgagne> 0c:c4:7a:34:6e:3c
<rharper> as I mentioned before; for naming, we can ignore non-physical devices;
<rharper> bonds/bridges/vlans have various configs that inherit mac of underlying devices;  we really want to know the physical nic and mac pairing
<smoser> that is odd.
<smoser> well, rharper we dont *always* want the physical nic. we could put a bond on two vlans
<smoser> i think though, that the code i have in that tree is actually right.
<rharper> vlan names are arbitrary
<rharper> as are bond names
<smoser> sure. but if we're looking to get a mapping of mac to interface name, then the path is valid.
<rharper> that is, we can always set them
<smoser> but i think the code is doing the right thing at this point.
<smoser> as it looks through the links and sets up the name, and only overwrites it with what it found in /sys if it does not yet have a mac from the links table.
<mgagne> you can find the original mac address in /sys/class/net/<ifname>/bonding_slave/perm_hwaddr
<rharper> to configure the bond or vlan correctly, we only need to lookup link_ids of physical devices;
<rharper> and emit those names in the config
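Following mgagne's pointer, a sketch of a mac-to-name mapping that prefers the permanent hardware address (`bonding_slave/perm_hwaddr`) for enslaved nics, and skips non-device entries like `bonding_masters`; per rharper's point, real code would additionally filter out non-physical devices entirely:

```python
import os

def interfaces_by_mac(base="/sys/class/net"):
    """Map MAC address -> interface name. Bond slaves report the bond's
    MAC in 'address', so use bonding_slave/perm_hwaddr when present."""
    mapping = {}
    for name in os.listdir(base):
        perm = os.path.join(base, name, "bonding_slave", "perm_hwaddr")
        addr = os.path.join(base, name, "address")
        path = perm if os.path.exists(perm) else addr
        try:
            with open(path) as f:
                mapping[f.read().strip()] = name
        except OSError:
            continue  # entries like 'bonding_masters' are not devices
    return mapping
```

With `perm_hwaddr`, the three identical `0c:c4:7a:34:6e:3c` addresses mgagne pasted would resolve back to distinct physical macs.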
<smoser> oh wait. it doesn't do that. but it should.
<smoser> yeah, its ok as it is right now i think.
<smoser> mgagne, so i think what i just pushed will fix all but the 'auto'
<smoser> i have to run now.
<mgagne> https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bond_name ?
<smoser> right
<mgagne> no, this won't fix the auto
<smoser> the one 2dee860 should fix the NotADirectory
<smoser> right
<smoser> it should fix all *but* the auto
<mgagne> http://paste.openstack.org/show/557628/
<mgagne> and the point 3.2)
<rharper> mgagne: yeah
<mgagne> I'm stuck at 3.2 since last week
<mgagne> everything else has been fixed on my side since
<mgagne> maybe not in the way you would have done it
<rharper> when you say later, do you mean subsequent boots ?
<mgagne> let me check
<mgagne> "3.2) Once 3.1) is fixed, configuration fails again later"
<mgagne> Once 3.1 is fixed, cloud-init will fail again at a different place further down
<mgagne> will reword the description
<smoser> i dont understand 3.2 failure though
<smoser> as when this runs, the bond should not be set up yet
<mgagne> cloud-init looks to be rerun twice. Since bond is already configured at this time, mac isn't found and crash
<mgagne> it is
<smoser> it should not run the network config step on the second time through if it successfully ran it on the first.
<smoser> ie, cloud-init local should set up networking before anything is allowed to come up
<smoser> and then cloud-init should come up and see its not first instance boot (cloud-init local was) and not run that section
<mgagne> well, I didn't see a failure in dsmode=local, and cloud-init still tries to (re)configure the network anyway
<mgagne> I can rebuild with latest patches and pull out logs
<smoser> mgagne, sure. i do agree that if it ran twice the second one well could fail
<smoser> i do have to run now.
<mgagne> the ENOTDIR error is caused by this second run
<rharper> we should debug why you get a second run, but certainly the 'bonding_masters' file is only added after a bond is configured
<mgagne> image is rebuilding with all patches, gonna take a while before it builds and then a baremetal is booted
<mgagne> maybe I can pull the logs from the current baremetal, it ran twice anyway
<mgagne> rharper: http://paste.openstack.org/show/GHjpr6jMq1uxoCtRl922/
<rharper> k
<mgagne> "Execution continuing, no previous run detected that would allow us to stop early."
<rharper> is config provided in two places? or ConfigDrive only?
<mgagne> we only have configdrive in place
<mgagne> "no-net is written by upstart cloud-init-nonet when network failed" I'm not sure xenial still has upstart
<mgagne> the file /var/lib/cloud/data/no-net does not exist on the machine
<rharper> ok
<rharper> yeah, when init runs, it reads the data; that somehow is definitely re-running net config
<mgagne> I don't understand the logic here... if no-net exists, it's because network config failed. The logic makes it so network config in dsmode=net will STOP if network previously failed. in our case, it sure didn't fail... so it will try to configure it again? o_O
<mgagne> I only found occurence of no-net in cmd/main.py and upstart config file so I'm not sure what to think here
<rharper> so, dsmode=net, IIUC, is for things like AWS where the datasource is over the network
<rharper> in that case, cloud-init has to bring up something (fallback networking, dhcp on an interface) and attempt to find a datasource over the networking
<rharper> after acquiring a datasource, networking config may be included, in which case, we'd need to update the network config (override the fallback generated one)
<mgagne> I have those ds loaded: NoCloud, ConfigDrive, OVF, MAAS
<mgagne> Should I remove all but ConfigDrive?
<rharper> in the ConfigDrive case, it's not via network but local file
<rharper> you can configure them off but it will only select one
<mgagne> so I think DataSourceConfigDrive supports both local and net.
<mgagne> and there is no flag to prevent it from running twice. But it could also be by design that just didn't plan for bonding support and all its side effects
<rharper> I think I can see it applied twice; it was only with bonding configs that we trip up
<mgagne> yea
<mgagne> I think there was a lot of logic that didn't account for bonding being configured and enabled at that time
<mgagne> like mac address changing
<rharper> yeah; I don't think we want to apply whole-sale net config twice;
<mgagne> but this *could* be a valid scenario (I don't know yet how)
<mgagne> like boot with APIPA address in dsmode=local and later get an IP with metadata service or whatever.
<mgagne> I just don't want to bulldozer my way to make bonding work and break use cases I didn't know existed
<rharper> yeah, so init [net] mode does re-read the json data and attempts to create network_state which invokes the openstack conversion, which fails when the initial state of the system is already configured
<rharper> I don't think you're breaking anything;  smoser or harlowja will have to help me understand why they get parsed twice
<mgagne> only because it can't find the link mac address (after ENOTDIR is fixed of course)
<rharper> sure; but in general, I'd like to know why we convert it twice (no need, it was already rendered into the instance_id object IIUC)
<rharper> so, if it didn't fail converting due to the ENOTDIR, then it's attached to the stage object and you see:
<rharper> stages.py[DEBUG]: not a new instance. network config is not applied.
<rharper> yours never gets that far; maybe the rebuild will
 * rharper steps out for a bit 
<mgagne> rharper: ok I fixed the last issue. baremetal is now booting fine
<mgagne> rharper: all patches: http://paste.ubuntu.com/23059836/
#cloud-init 2016-08-16
<smoser> mgagne, hm..
<smoser> your 'is_bonding_slave' idea is fine, but i really don't understand why it should be needed.
<smoser> you mentioned upstart, are you using upstart somewhere ?
<smoser> you dont want to mess with dsmode really.
<smoser> in current trunk, dsmode=local would allow you to make init_modules run earlier (without access to network)
<smoser> mgagne, around ?
<mgagne> smoser: I am now
<smoser> hey.
<smoser> so, are you using upstart ?
<mgagne> smoser: I'm not using upstart, I was reading the source code and commenting about it. cloud-init is running twice as per the bug description. Running it a second time fails because cloud-init doesn't expect bonding to be configured at this point; in fact, all code and tests were done without bonding support, so a lot of assumptions were made which aren't true anymore.
<mgagne> I'm booting on ubuntu 16.04, it's systemd afaik
<smoser> cloud-init does run twice for sure, but only the first time should set the networking up.
<smoser> oh. but we rename on every boot, so maybe we're doing that twice.
<smoser> hm..
<mgagne> ok, well that's not the case on my side, bug 3.1) and 3.2) were caused by this double network config run
<smoser> i'll have a look in a bit. your is_bonding_slave change seems to make sense.
<mgagne> no no, I boot ONCE and it fails, I'm not even testing reboot at this point
<smoser> right.
<mgagne> because of this mac/link/device mapping, 2nd run fails because of how bonding behaves, it changes the mac of the bonding slaves, hence the added logic for is_bonding_slave.
<mgagne> I didn't do extensive tests, just boot, ping, ssh (with sshkey) and check hostname
<smoser> right
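[editor's note: mgagne's is_bonding_slave idea could be sketched roughly like this. The sysfs layout (a `bonding_slave` directory under an enslaved device) is an assumption based on the kernel bonding docs; the body is illustrative, not the actual patch from the paste.]

```python
import os

# Illustrative is_bonding_slave-style check (not mgagne's actual patch).
# Bonding rewrites the MAC of enslaved NICs, so MAC-based device renaming
# has to skip slaves. Assumes the kernel exposes a 'bonding_slave'
# directory under enslaved devices in sysfs.
def is_bonding_slave(devname, sys_class_net='/sys/class/net'):
    return os.path.isdir(os.path.join(sys_class_net, devname, 'bonding_slave'))
```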
<harlowja> smoser u ever get a chance to look over https://code.launchpad.net/~harlowja/cloud-init/+git/cloud-init/+merge/302609
<harlowja> its the future!
<harlowja> ha
<smoser> harlowja, i've not looked at it yet.
<harlowja> np
<smoser> so ..
<smoser> Instead of looking in a very specific location for
<smoser> cloudinit config modules; which for those adding their
<smoser> own modules makes it hard to do without patching that
<smoser> location instead use entrypoints and register all
<smoser> current cloudinit config modules by default with that
<smoser> new entrypoint (and use that same entrypoint namespace
<smoser> for later finding needed modules).
<smoser> --
<smoser> how does registering the entry points help "those adding their own modules"
<smoser> rharper, what shall i do for mgagne's auto-bringup of bond.
<smoser> did you have work on that that i didn't see ?
<rharper> smoser: the fix is what i had
<rharper> but in general, we need to think about v4 vs v6
<smoser> what fix ?
<rharper> in eni.py
<smoser> i didn't see, sorry.
<rharper> he posted patches, basically adds the if 'bond-master' or 'bond-slaves' in iface, then emit auto
<harlowja> smoser  so they still need to add a entry to cloud.cfg (either at packaging time, or at userdata/runtime)
<rharper> smoser: <mgagne> rharper: all patches: http://paste.ubuntu.com/23059836/
<harlowja> i didn't go into the path of discovering and creating a cloud[init,config,final] sections of that config
<harlowja> because though i could, it's umm, non-trivial :-P
<harlowja> and likely requires more metadata on modules to define their ordering (not via cloud.cfg at that point)
<rharper> we probably should instead check if iface['type'] in ['bond', 'vlan'] and possibly 'bridge' ;
<smoser> rharper, so you're just assuming all bonds (or vlans or bridges) then are 'auto'
<harlowja> so that kind of stuff seems like a larger change, vs just attempting to find modules that are already defined in cloud.cfg via entrypoints (leaving the change to be just a different way to find modules)
<rharper> smoser: we default to auto if an interface has a subnet
<rharper> in this case, it's a bond with no subnets
<rharper> as it's being assembled but not with a subnet;
<smoser> ie, those default to 'auto' while others (even with 'subnets') default to non-auto
<smoser> we do default to auto if a subnet ?
<rharper> no we always default to auto unless 'control' is set in subnet
<rharper> yes
<smoser> hm.. you're saying that is true after your change or before
<rharper> there are a few known cases where config explicitly wants subnet + control=manual (aka iscsiroot)
<rharper> if iface has subnet, control=auto for the iface/index pair
<rharper> if you do not include any subnet, then no auto (except for bond-slaves)
<rharper> that really should be any interface with a nested config (master/slave); I'm pretty sure
<smoser> if iface has subnet and no control=
<rharper> then control is set to auto
<rharper> for iscsiroot, we specify control: manual
<rharper> override the default;
<smoser> right, so you're not actually checking for bond-master.
<smoser> you're just turning auto on
<rharper> no, we check for bond-master in the case if iface with no subnets
<rharper> and then auto it, *if* it's a slave (slaves point to their master with bond-master key)
<rharper> but, if the bond master itself (bond0) doesn't configure a subnet, it doesn't get an auto
<rharper> I suspect the code in ifupdown/if-pre-up.d/ifenslave could be fixed to raise the bond master independent of whether it's marked auto or not; but it currently does *not* bring up the master unless listed in allow-auto (or marked auto)
<rharper> if bond0 doesn't come up then the rest of the config won't succeed (we timeout waiting on bond0 to be created via slave ifup hook)
<rharper> a bond-specific solution/workaround is to also include the bond master (indicated by key bond-slaves in iface) to be marked auto;
<rharper> that might be enough, but I'd like to test/check bridges without subnets and vlans without subnets to see if we generally need to mark non-subnet interfaces with auto by default;  that is, I don't yet know of a config where we want a manual bond/vlan/bridge
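[editor's note: the defaulting rules rharper lays out above can be condensed into a sketch like this. Illustrative only; the real renderer works per iface/subnet-index pair rather than per interface.]

```python
# Illustrative condensation of the 'auto' rules above; not the actual
# cloudinit/net code.
def should_mark_auto(iface):
    subnets = iface.get('subnets') or []
    if subnets:
        # default to auto unless every subnet opts out with
        # control: manual (the iscsiroot case)
        return any(s.get('control', 'auto') == 'auto' for s in subnets)
    # no subnets: only bond slaves get auto, so the ifenslave ifup hook
    # actually assembles the bond at boot
    return 'bond-master' in iface
```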
<smoser> ok. for now i'm good with the fix as you all had.
<smoser> it is kind of weird and possibly wrong that we are renaming devices in the 'init' stage (in addition to init-local).
<smoser> harlowja, what i dont understand is how you are making the problem of adding a config thing any easier.
<smoser> the cc_foo.py can now be placed in some additional directory ?
<harlowja> smoser i can put the config modules in my own library, expose a named entrypoint, then just update cloud.cfg to reference that module
<harlowja> so cc_blahblah no longer needs to be patched into cloud-init
<smoser> how do you expose a named entry point ?
<harlowja> same way the modification there to cloud-init setup.py is
<harlowja> so the library would just need to add an entry_points entry (like in that setup.py) in their own module
<harlowja> so in said libraries setup.py there would be an entry like
<harlowja> entry_points={
<harlowja>     'cloud.config': [
<harlowja>         'my_thing = my_thing.my_cloud_handler',
<harlowja>     ],
<harlowja> },
<harlowja> so when cloudinit looks for a way to call 'my_thing' (assuming its in a cloud.cfg listing somewhere) then it can go out and try to find it (and load this library to get at it)
<harlowja> (or if nobody registered that module, then die as usual)
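[editor's note: the lookup side of that registration might look roughly like this. 'cloud.config' is the namespace from the setup.py snippet above; the function name is made up for illustration.]

```python
import pkg_resources

# Sketch of the consumer side: scan the 'cloud.config' entry-point
# namespace for a module named in cloud.cfg. Illustrative only.
def find_config_module(name, namespace='cloud.config'):
    for ep in pkg_resources.iter_entry_points(namespace):
        if ep.name == name:
            return ep.load()  # lazily imports the registered object
    # nobody registered that module: "die as usual"
    raise ImportError('no %r entry point in %r' % (name, namespace))
```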
<smoser> harlowja, so..
<smoser>  
<smoser> http://paste.ubuntu.com/23062532/
<smoser> that is what i dont like about entry points
<smoser> takes ~0.01s to bring up python, 0.03s to bring up python3 on a reasonably current SSD
<smoser> (with '0' as first arg)
<smoser> importing the pkg_resources takes 0.3 seconds on python, and 0.25-ish on python3
<smoser> it does look like it caches stuff as 10 runs take about the same as 1
<smoser> i'm guessing that python3 is faster in my test only because i have fewer entry points or packages installed on the system in python3 compared to python2
<smoser> so its doing less work.
<smoser> this is also embarrassing:
<smoser>  http://paste.ubuntu.com/23062545/
<smoser> and it needs fixing
<smoser> but i'm somewhat hesitant to add something like that.
<harlowja> so thats just because u imported 'pkg_resources' ?
<smoser> the pkg resources import takes quite some time (~.1 seconds)
<smoser> the enumerating of some non-existent namespace takes .2 seconds
<smoser> obviously very scientific data there.
<harlowja> :-P
<smoser> i should have done a -1
<smoser> lets re-do that paste
<smoser>  http://paste.ubuntu.com/23062572/
<smoser> there. -1 is just cost of bringing up python
<smoser> fiddle
<smoser> http://paste.ubuntu.com/23062577/
<smoser> there ^
<smoser> -1 is cost of python
<smoser>  0 is cost of import pkg_resources
<smoser>  1 is cost of one call to 'iter_entry_points'
<smoser> 10 is cost of 10 calls
<harlowja> k
<smoser> with revised my.py at http://paste.ubuntu.com/23062581/
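[editor's note: those pastes have since expired; a hypothetical my.py in the same spirit (-1 measures bare interpreter startup, 0 adds the pkg_resources import, N adds N lookup calls) might be:]

```python
import sys

# Hypothetical reconstruction of the timing script (original paste gone):
# -1 = bare interpreter, 0 = import pkg_resources only,
# N > 0 = N iter_entry_points() scans of a namespace.
def main(count):
    if count >= 0:
        import pkg_resources
        for _ in range(count):
            list(pkg_resources.iter_entry_points('cloud.config'))

if __name__ == '__main__':
    main(int(sys.argv[1]) if len(sys.argv) > 1 else 0)
```

Timed externally, e.g. `time python my.py 10`.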
<harlowja> seems like they need to better optimize that entrypoint 'catalog' lol
<smoser> yeah, it is stat crazy
<smoser> those openstack cli programs do that.
<harlowja> right
<smoser> they do cache well
<smoser> since 10 runs takes basically nothing more than 1
<harlowja> but assuming an entrypoint catalog existed, in the core python, then i'd assume that stuff wouldn't take forever
<harlowja> aka a tiny sqlite db
<harlowja> lol
<harlowja> wonder why such a thing doesn't exist
<smoser> yeah, but i think the entry points are stored in that egg-info, right ?
<smoser> thats how those are loaded ?
<smoser> so python goes looking in any possible directory in sys.path for a file egg.info or something and then goes reading it and such.
<harlowja> thats one location of it, but u'd think that pip could update a sqlite db or something
<harlowja> i wonder if the python community is working on anything like that
<harlowja> seems pretty obvious to do that
<harlowja> then X people wouldn't be making their own entrypoint-thing due to this
<smoser> not too long ago i had a spinning disk
<smoser> (more embarrassment)
<harlowja> whats that crap
<harlowja> ha
<smoser> and running 'nova' on it took like 3 seconds to load.
<smoser> nova as in the cli tool, not the service :)
<harlowja> :-P
<harlowja> so ya, the other option is that we make our own loader slightly more advanced
<harlowja> so that say in cloud.cfg u could have fully specified modules + functions
<harlowja> then i could have a entry like
<harlowja> godaddy_ci.handlers:basic_handler
<harlowja> though that starts to just make our own entrypoint like thing :-/
<smoser> ok. one more thing.. http://paste.ubuntu.com/23062607/
<smoser> for my reference mostly.
<harlowja> lol
<smoser> that just runs it with strace too, and counts stats or opens
<harlowja> nice
<harlowja> stats or opens: 2561
<harlowja> lol
<harlowja> ya, idk why they aren't backing that crap via sqlite
<harlowja> afaik entrypoints are all 'static'
<harlowja> in that they are all defined by packaging (in setup.py or other)
<harlowja> seems dumb to rescan the filesystem to find them
<harlowja> it'd seem like a win for most of python if it wasn't so scan happy
<harlowja> though of course any change to do that would probably hit the people that will say its all in cache and such and blah blah
<harlowja> and gets into the question, of make our own thing, or just work with the python stuffs
<smoser> so i had a start of my own thing
<smoser> that took a list of directories
<smoser> and would look in those.
<smoser> cloud-init needs lots of performance improvements for sure
<harlowja> why not at that point just explicitly name full modules in cloud.cfg ?
<harlowja> i'd rather not make our own full entrypoint thing :(
<harlowja> or just at least, try to talk to python-devs, asking what's a solution (is there any, is sqlite db possible, or a static file that everyone updates or ...)
<smoser> ok... i'm just saying this out loud for my own logs and such.
<smoser> http://paste.ubuntu.com/23062681/
<smoser> that is a bzr revno to git hash mapping that seems correct for right now.
<harlowja> woah
<harlowja> ha
<mgagne> so a coworker tested the "fixed" cloud-init and had some form of race condition, the gateway and routes weren't properly configured. rebooting fixed the issue.
<mgagne> will do more tests tomorrow
#cloud-init 2016-08-17
<Odd_Bloke> harlowja: Storing a database becomes problematic if there is more than one source of Python modules, surely?
<Odd_Bloke> harlowja: Consider the case where I activate a --system-site-packages virtualenv for the first time in six months after a bunch of changes to the system Python packages.
<Odd_Bloke> harlowja: Or the case where I have a Python library in my local directory.
<Odd_Bloke> Or really any case where I modify PYTHONPATH. :p
<rharper> smoser: on my trusty image, the procps service applies the cloud-init ipv6 priv extensions after network config (as designed); and clears existing ipv6 ips.   During boot we apply an ipv6 address to an interface (via udev event, call ifup on bond0.108 which sets ipv6 ip), the sysctl from procps service runs once the network is up; we switch from default mode of tempaddr=2 to tempaddr=0 which ends up wiping the ipv6 address that was previously set
<rharper> note the recent comment on the bug https://bugs.launchpad.net/ubuntu/+source/procps/+bug/1068756 for why this doesn't break in xenial
<smoser> that is a pretty dense one line irc statement
<rharper> yeah
<smoser> that seems busted.
<smoser> to apply the change (and wipe ip addresses) after they've been configured
<smoser> doesn't it ?
<rharper> read the comments on the bug; there's some trusty kernel patch that triggers the wipe
<rharper> yeah
<rharper> so, from utopic on, we dropped it
<rharper> but good ol trusty keeps on keeping on
<rharper> and here I was cursing at ifupdown when it wasn't its fault
<smoser> i have found this bug before
<smoser> pretty certain once when one of the maas team was doing ipv6 stuff i found it.
<smoser> https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1352255
<smoser> rharper, there, comment 5 would have saved you a couple days
<smoser> :-(
<smoser> i'm really sorry, rharper
<rharper> heh
<rharper> well, the comment in the sysctl file
<rharper> helped
<smoser> yeah, rharper so its even awesomer than you thought
<rharper> smoser: in #lxd, stgraber says that it's a mistake that it was dropped
<smoser> because if you switch the order of the stanzas, it can work i think.
<rharper> right, for some reason if I bring up bond0.208 (only v4) and then the 108 with v4+v6; that worked (without disabling the v6 change in tempaddr)
<rharper> but that's racy; possible if we can get the procps sysctl to re-trigger networking; I don't know...
<rharper> or if we could set the v6 tempaddr to 0 earlier (rather than procps service)
<smoser> https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1379427 is mentioned from the other bug.
<smoser> key statement there:
<smoser> "The result is that things that were waiting on 'static-network-up', expecting that would provide them with expected networking are not actually guaranteed anything other than the first stanza for each interface."
<smoser> ugh...
<harlowja> Odd_Bloke surely said things are solvable :-P
<harlowja> and scanning all the things isn't exactly a good solution :-P
<harlowja> least python could do is cache some stuff
<harlowja> hash(sys.pythonpath) ---> make a file of entrypoints per hash?
<harlowja> and use that hash if possible going forward, blah blah
<rharper> sorta like ld.so.cache
<harlowja> ya, some crap like that
<smoser> well, but where would it put that ?
<smoser> it does seem to cache sanely, but the first pass per python interpreter fills that cache.
<smoser>  /sbin/ldconfig is fraught with failure paths too
<rharper> but it mostly just works
<rharper> we're not in new territory; so it's best to dig at previous proposals in this space
<rharper> google has shown me plenty of folks with the idea; not many actual attempts or details;
<cn28h> can anybody tell me what the "Ssh Authkey Fingerprints" plugin does? is that responsible for installing the public ssh keys for cloud-user (or whatever user you've chosen)?
<cn28h> asking because we saw it fail unexpectedly
<cn28h> (we're not injecting any ssh keys)
<cn28h> I don't think it's a big deal, but want to understand.. and if one plugin fails, will cloud-init continue processing the rest?
<cn28h> hm.. reading the code it looks rather like it's actually displaying the keys that were injected in pretty tables
<cn28h> not actually injecting keys
<smoser> cn28h, i have seen a race condition on it i think.  but often times you see that failures as a result of something else failing.
<smoser> if you saw it and can reproduce, i'd love to see a /var/log/cloud-init.log
 * smoser out
#cloud-init 2016-08-18
<smoser> harlowja, 'rejected' seems so mean.
<smoser> wrt https://code.launchpad.net/~harlowja/cloud-init/+git/cloud-init/+merge/302609
<brian_price_> Hello, random question. Is cloud-init 2.0 currently on hold or is there work or plans being discussed somewhere?
<smoser> on holser_
<smoser> hold even :)
<smoser> sorry holser_
<smoser> rangerpb, https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/303303
<brian_price_> ohh heh, okay thanks.
<wznoinsk> hi all
<rangerpb> smoser, lgtm
<rangerpb> larsks, u might want to peek at it too
<larsks> smoser: seems like a reasonable idea.  I left a comment re: simplifying is_azure, but functionally everything seems fine.
<smoser> larsks, i know that you'll tell me that i'm over-concerned, but the is_azure that i have there is probably 100x faster than grep
<larsks> I think you are placing too much value on execution efficiency and not enough on code legibility, but the end result is that both options deliver the same answer and the difference in execution time is probably not noticeable.  So go with what you have :)
<smoser> alright, i largely agree, but i dont think the code is non-readable, and avoiding another program execution and one more fork in boot i think is useful.
<smoser> http://paste.ubuntu.com/23068101/
<smoser> ^ that shows the relative performance
<smoser> it is minimal i agree. and its also likely that 'grep' is already in the vfs layer during boot because some other script used it (probably for something very similar)
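[editor's note: the in-process alternative to forking grep amounts to something like this. Illustrative only; the actual is_azure helper's path and marker string aren't shown in the log.]

```python
# Illustrative in-process substring check: one open()+read instead of a
# fork/exec of grep. Not the actual is_azure code from the merge.
def file_contains(path, needle):
    try:
        with open(path) as fp:
            return needle in fp.read()
    except (IOError, OSError):
        return False
```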
<smoser> harlowja, https://code.launchpad.net/~ajorgens/cloud-init/python26/+merge/296575
<smoser> that doesnt work. simply doesnt work
<smoser> right ?
<smoser> os.uname is always a method, not a property
<harlowja> oh man, i got rejected
<harlowja> lol
<harlowja> smoser will u still be my friend
<harlowja> lol
<smoser> harlowja, ^
<smoser> how did / does that work ?
<harlowja> not sure u are my friend anymore
<harlowja> i'm a reject
<harlowja> lol
<harlowja> smoser maybe it never worked?
<harlowja> though why it never blew up
<harlowja> thats another question...
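[editor's note: the os.uname point in two lines — it is a function you have to call, so code reading it as an attribute only ever gets the bound method back, never uname data:]

```python
import os

# os.uname is a callable, not a property; accessing it without () yields
# the function object, which explains how such code could never have
# returned real results yet also never blew up at access time.
assert callable(os.uname)
sysname = os.uname().sysname  # the call is required (Unix-only API)
```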
<prometheanfire> smoser: 0.7.7 has py3 support (from reading the notes)
<prometheanfire> if so, which py3?
<smoser> prometheanfire, probably 3.4+
<prometheanfire> k, that's what I was thinking
<prometheanfire> can we get https://github.com/gentoo/gentoo/blob/master/app-emulation/cloud-init/files/cloud-init-0.7.6_p1212-gentoo.patch in :P
<prometheanfire> the init files are already there
<harlowja> propose a branch ?
<prometheanfire> ya
<prometheanfire> does cloud-init follow the openstack review process?
<smoser> prometheanfire, not openstack. launchpad . see HACKING.rst
<prometheanfire> clone so slow
<smoser> prometheanfire, http://paste.ubuntu.com/23068496/
<smoser> 7 seconds here.
<prometheanfire> ya, second try was fine
<prometheanfire> also, lolmake :P
<prometheanfire> installing requirements and tests requirements in a virtualenv doesn't install the nosetests3
<prometheanfire> binary which it expects
<smoser> ?
<smoser> run tox
<prometheanfire> that's not what hacking says to do
<smoser> does it really.
<smoser> man
<smoser> i should read that
<smoser> run tox
<prometheanfire> :P
<prometheanfire>     make test pep8
<smoser> yeah.
<smoser> tox does the right thing.
<smoser> and will then run python 2 and python3 too
<prometheanfire> ya
<prometheanfire> I forgot about our /sbin/ip vs /bin/ip thing...
<smoser> oh? do we have that hard coded somewhere?
<smoser> that just needs fixing
<prometheanfire> ya
<prometheanfire> cloud-init-0.7.6/cloudinit/sources/DataSourceOpenNebula.py
<prometheanfire> well, in the patch
<smoser> yeah. i got to run, but we should not run by full path.
<smoser> so patches welcome on that.
<smoser> i have to run
<prometheanfire> want me to include that as an additional commit on the same topic?
<prometheanfire> it'd be easier for me at least
<smoser> sure.
<prometheanfire> thanks
<prometheanfire> done, https://code.launchpad.net/~prometheanfire/cloud-init/+git/cloud-init/+merge/303339
#cloud-init 2016-08-19
<smoser> harmw, https://code.launchpad.net/~harmw/cloud-init/growfs-gpart-fixes_for_fbsd/+merge/246075
<smoser> still needed ?
<smoser> rangerpb, hey.
<smoser> one more question...
<rangerpb> yessir smoser
<smoser> config/cloud.cfg
<smoser> you set dhclient_lease to null i guess
<smoser> i dont think that should be necessary
<smoser> i think i need to revert the other changes to that file too
<smoser> Odd_Bloke, around ?
<Odd_Bloke> smoser: In half an hour.
<smoser> Odd_Bloke, https://git.launchpad.net/cloud-init/commit/?id=648dbbf6b090c81e989f1ab70bf99f4de16a6a70
<smoser> rangerpb set the 'agent_command' to __builtin__
<smoser> which, you had (i believe) added that support, but as far as i can tell its not used in ubuntu
<rangerpb> smoser, the __builtin__ was the key/value that lets it go down the route of discovery on azure
<rangerpb> without it, none of the code is used
<smoser> yeah.
<rangerpb> dhclient_lease should probably be set to something distro specific
<rangerpb> it allows fallback to a lease file which at present was only defined for ubuntu
<Odd_Bloke> The __builtin__ code there was written when we were building all-snaps images for Ubuntu Core Series 15, where walinuxagent wouldn't run.
<Odd_Bloke> We aren't presently using it for anything, as we don't publish Series 15 images any more.
<rangerpb> fine but it is a conditional in the current code
<rangerpb> totally open to change
<rangerpb> i never much liked the conditional which was there, it seems in cloud-init we chase down every possible avenue
<rangerpb> smoser, u follow?
<smoser> i have no idea what a series 15 image is
<rharper> ubuntu-core
<rharper> based on 15.04/10 IIRC
<smoser> oh. ok.
<smoser> rangerpb, does fedora/centos run the agent ?
<smoser> Odd_Bloke, what does the agent do for us ? why not use the new path.
<rangerpb> the agent still has some upside over the current
<rangerpb> not really in provisioning but more in configuration, etc
<rangerpb> for example, you can pass a script "through" the agent and it will end up getting executed on the vm
<smoser> well, cloud-init does that too :)
<Odd_Bloke> smoser: I think just because we've always been relying on walinuxagent.
<rangerpb> smoser, i think the agent can do it anytime, not just on boot
<Odd_Bloke> We will still need to run walinuxagent in Ubuntu images (for other Azure-specific things to work), so we'd have to check that the two don't conflict.
<Odd_Bloke> But walinuxagent is reasonably configurable, so if it isn't already possible to stop it doing things that we want to do, we could probably request it.
<smoser> Odd_Bloke, but does the agent start itself ?
<smoser> cloud-init starts it, but is that necessary ?
<rangerpb> the answer is it depends on what you are trying to do imho
<rangerpb> https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceAzure.py#n231 <-- this was the conditional i was referring to
<rangerpb> basically two paths to do the same thing, agree Odd_Bloke ?
<Odd_Bloke> Yeah, but __builtin__ doesn't rely on walinuxagent.
<rangerpb> yeah
<rangerpb> so it might be better? that the conditional check for the existence of the agent?
<Odd_Bloke> But it was only ever used in an environment where walinuxagent _would not_ run, so we don't actually know how it works in an environment where we do __builtin__ _and_ walinuxagent runs.
<rangerpb> well i do
<rangerpb> i didnt see any harm in running both
<Odd_Bloke> Yeah, I don't see any harm in it.
<Odd_Bloke> Unless it breaks. ;)
<Odd_Bloke> But, fundamentally, the two should be fine side-by-side.
<Odd_Bloke> It's just whether or not there's something unconsidered that needs to be cleaned up.
<Odd_Bloke> It sounds like there isn't.
<Odd_Bloke> So that's good.
<rangerpb> the problem is that the one path *requires* an agent, which might not be there
<rangerpb> which is why i said, if the agent is present, prob prefer it; if not, go the "newer" route
<rangerpb> then there is no collision on provision
<rangerpb> smoser, Odd_Bloke however, if you prefer, I can run this by MS
<smoser> rangerpb, sure. i'll have a few changes to my dhclient hook cleanup that i'll point you at.
<Odd_Bloke> I thought you just said you'd identified that you had run the two and there wasn't a collision?  Did I misunderstand?
<smoser> we can have the default behavior to do as you suggest
<Odd_Bloke> (Or just being cautious?)
<rangerpb> being cautious
<Odd_Bloke> OK, cool.
<smoser> but still want it to be configurable
<rangerpb> sort of like, where does provisioning start/stop and other stuff begin
<smoser> so that you could set the agent command
<smoser> Odd_Bloke, in our images, if cloud-init did not start the agent...
<smoser> would it not run ?
<rangerpb> i would like to see the configuration have something like azure_provision: cloud-init|agent
<rangerpb> or one is override, perhaps better
<rangerpb> i.e. default is still agent and override with cloud-init
<rangerpb> i gotta grab some vittles, will read back when i return
<Odd_Bloke> So I think the only restriction we would need is that it be easy for the default to be cloud-init.
<Odd_Bloke> So that we can backport changes more easily, without having to carry a large patch to do that.
<Odd_Bloke> But even that isn't a hard blocker.
<Odd_Bloke> smoser: http://paste.ubuntu.com/23070971/ is the walinuxagent unit file, which I believe means it would come up anyway.
<Odd_Bloke> We just start it earlier.
<smoser> ok
<smoser> http://paste.ubuntu.com/23070985/
<smoser> rangerpb, Odd_Bloke ^ is what i'm looking at
<smoser> rangerpb, that'd mean that right now fedora would have to carry the agent_command setting
<smoser> but i'm open to "if agent is installed" logic
<rangerpb> ok
#cloud-init 2016-08-21
<prometheanfire> so, found a bug
<prometheanfire> https://gist.github.com/prometheanfire/bc3d3f4808462886f13e9e2147b183ef
<prometheanfire> if kernel doesn't support bonding cloud-init doesn't set up networking
<prometheanfire> looks like you don't write anything for gentoo's net config
<prometheanfire> I feel like it used to be correct...
<prometheanfire> proper gentoo network init, not tested yet (taken from my work in glean: http://bpaste.net/show/ced19dd5b7d8)
<prometheanfire> smoser: I'd like your initial review on ^ if you can
#cloud-init 2017-08-14
<boxrick> Does cloud init run at first boot only or each and every boot?
<powersj> boxrick: by default it will run at each boot. However, not all parts of cloud-init will run each time. For example, it does not need to generate ssh keys on each boot.
<boxrick> Ok lovely, cheers for that
<blackboxsw> rharper: I'm tearing out the cloud-init query subcommand, no need for Notimplemented() subcommands. We can add it in when we have something
<blackboxsw> as it is right now, you actually get tracebacks because we don't even set up the event stack appropriately (undefined parameters etc).
<rharper> blackboxsw: +1
<blackboxsw> rharper: I'm having trouble finding where exactly the module runtime is cached (and validated):w  is the frequency caching handling in cloud-init (
<blackboxsw> oops, typo-world. one sec
<blackboxsw> having a hard time finding where the cache of module runtimes is stored once a specific module is run... and where that frequency is validated to avoid running the module a 2nd time (for PER_ONCE frequencies)
<rharper> blackboxsw: when you say module, you mean each cc_<module>  ?
<blackboxsw> rharper: yeah
<rharper> so, in stages.py I Think you'll see the reporter setup
<rharper> also, the sem files, I think record some data
<blackboxsw> ok, yeah I just wasn't really seeing log messages telling me cc_ntp, for instance, is being ignored because it's configured PER_INSTANCE
<blackboxsw> per "cloud-init --debug single --name cc_ntp"
<rharper> so, when you run in single mode, not sure how well the logging is configured
<rharper> if you look at cmd/main.py there's *complicated* logic w.r.t when/how logging is configured/enabled IIRC
<blackboxsw> ahh I'm seeing the issue now. right, logging is incomplete on the single commandline. The appropriate, and needed, messages about skipping due to invalid frequency, or disabled module etc just go into /var/log/cloud-init.log
<blackboxsw> right we need a stdout handler too if running single from the commandline (we don't want that in init,  modules or dhclient-hook subcommand cases I imagine as they are spawned by init or dhcp).
<blackboxsw> ok and all semaphore logic is in cloudinit/helpers.
<blackboxsw> ok now I get how that works
<rharper> yeah
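[editor's note: the semaphore mechanics in cloudinit/helpers boil down to roughly this. Illustrative naming only; the real implementation also handles locking and the distinct PER_INSTANCE/PER_ONCE/PER_ALWAYS frequencies.]

```python
import os

# Rough sketch of the per-module semaphore scheme: a marker file under
# the instance's sem/ directory records that a module already ran at a
# given frequency. Illustrative, not the real cloudinit/helpers.py.
def _sem_path(sem_dir, name, freq):
    return os.path.join(sem_dir, '%s.%s' % (name, freq))

def has_run(sem_dir, name, freq):
    return os.path.exists(_sem_path(sem_dir, name, freq))

def mark_ran(sem_dir, name, freq):
    if not os.path.isdir(sem_dir):
        os.makedirs(sem_dir)
    open(_sem_path(sem_dir, name, freq), 'w').close()
```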
<TemporalBeing> q: what is the intent with `runcmd`? Should it be possible to have multiple `runcmd` sections in a single cloud-init script and have them *all* run or does only the *last* get run?
<TemporalBeing> (I've found that `package_update` and `package_upgrade` don't work for me, but `packages` installs stuff just fine.)
<blackboxsw> my understanding is that cloud-init pieces together all cloud-config parts ultimately into a single structure representing all  cloud-config parts. So if there are multiple-definitions of the same keys, the latest piece read is what gets honored
<blackboxsw> so I believe if there are multiple commands that we want to run via runcmd, you'd need to include them in a single runcmd list like the simplistic example https://cloudinit.readthedocs.io/en/latest/topics/modules.html#runcmd
<rharper> blackboxsw: TemporalBeing: yeah, runcmd takes a list;  package_update and package_upgrade are flags which map to a distro's package manager for fetching repository data (package_update on ubuntu/debian does an apt-get update); and package_upgrade does an apt-get upgrade
<rharper> s/flags/booleans
<TemporalBeing> @rharper: I haven't gotten package_update/package_upgrade to work, so I have two runcmd sections - one to do what those do, then after the packages are installed via `package` more to do other things, including enabling things from the installed packages.
<rharper> TemporalBeing: what does your cloud-config look like ?
<TemporalBeing> @rharper - something like https://gist.github.com/BenjamenMeyer/eaadf44fb6ac187465578cc6d02f59b9
<rharper> k
<rharper> TemporalBeing: looks fine, you can collapse the write-files and run commands together;  I found one issue with your pip requirements file, you marked it b64 encoded, but it's just plain txt, so pip didn't like it, http://paste.ubuntu.com/25314461/
<rharper> trying that now; you'll be able to ssh in if you add a ssh key, and tail /var/log/cloud-init-output.log to see the apt upgrade/install, etc (output of runcmds end up there)
<TemporalBeing> yeah, I had to sanitize it - the b64 encoding difference was part of that; the structure is basically the same
<rharper> ah, gotcha
<TemporalBeing> my biggest issue is that the first cloud-init data (creating the /root/.config, and pushing a file into it) isn't getting run at all; the rest seems to be working
<TemporalBeing> can't figure out why, and can't get cloud-init to dump anything useful to diagnose
<rharper> hrm, let me see
<TemporalBeing> told if I remove some stuff then reboot I can get cloud-init to reprocess, which has kind of worked
<TemporalBeing> but no success in diagnostics of what's wrong
<TemporalBeing> If I need to rewrite to put everything into unique sections, okay...I can manage
<rharper> so, you don't need to do a mkdir for things with write_files
<rharper> cloud-init will ensure the path to the file exists
<TemporalBeing> it failed if I didn't :(
<rharper> hrm
<rharper> TemporalBeing: which cloud-init  version ?
<TemporalBeing> initially on Ubuntu 16.04, now 17.04
<rharper> write_files module uses util.write_file in cloud-init, which calls a method ensure_dir() which does a recursive mkdir on the paths needed
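[editor's note: the behavior described amounts to something like this sketch, not util.write_file itself:]

```python
import os

# Sketch of the ensure_dir behavior above: create any missing parent
# directories before writing, so write_files entries need no mkdir.
def write_file(path, content):
    parent = os.path.dirname(path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)  # recursive mkdir, like util.ensure_dir
    with open(path, 'w') as fp:
        fp.write(content)
```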
<TemporalBeing> so presently working with ubuntu's version 0.7.9-113-g513e99e0-0ubuntu1~17.04.1
<rharper> k
 * rharper removes the mkdir and tries again
<rharper> write_files will happen first, then runcmd, then package-update-upgrade-install
<rharper> TemporalBeing: hrm, I think runcmd happens too soon here;  I'm going to file a bug;  as per the docs, it should be running at "rc.local" like time, but
<TemporalBeing> @rharper thanks for taking a look; refactoring some of my scripts around the "only the last one wins" behavior
<rharper> np, I'll paste the bug here and you can subscribe to it if you like
<TemporalBeing> sure - thanks
<TemporalBeing> the `write_files` didn't create the directory :(
<TemporalBeing> okay...I think I know what happened - I added another `write_files` section and the merge functionality isn't merging that section so only one of the two (the second) gets processed
<rharper> yes
<rharper> that's what I mentioned about combining
<rharper> both runcmd and write_files take a *list* of entries
<rharper> if you have two top-level config keys, one of them will get clobbered when the configs are merged
<TemporalBeing> it *should* have merged them - that's what the merging functionality talks about
<rharper> it does mention that
<rharper> list type should append
<TemporalBeing> http://cloudinit.readthedocs.io/en/latest/topics/merging.html talks about them getting combined together
<rharper> so we may have a second bug here (possibly fixed in artful)
<rharper> but not yet in xenial
<rharper> will need to investigate that
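The failure mode discussed above can be sidestepped by keeping exactly one top-level key per module and putting everything in its list; a sketch (paths, content, and the pip command are illustrative, not TemporalBeing's actual user-data):

```yaml
#cloud-config
# One write_files list instead of two top-level write_files keys;
# a repeated top-level key risks being clobbered when configs merge.
write_files:
  - path: /root/.config/app.conf
    content: |
      setting = value
  - path: /root/requirements.txt
    content: |
      requests
runcmd:
  - pip install -r /root/requirements.txt
```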
<TemporalBeing> gotcha
<blackboxsw> rharper: sorry I wasted a bit of your time on my 2nd review of cloudinit analyze today. I hadn't pushed enough content to you. I've addressed review comments now I think and updated docs.
<rharper> blackboxsw: no worries;  I've implemented a quick summary verb we can use to print a one-line summary of the boot
<rharper> I'll have a patch/MP for you tomorrow; but it allows a cloud-init analyze (no args) to print that out ala systemd-analyze;   then we can do the cloud-init analyze show|blame|dump etc as we do with params and such
<blackboxsw> bah, just saw your branch up CLI features
<blackboxsw> https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/328992
<blackboxsw> I can rework my commandline.rst topic doc to incorporate your branch/changes
<rharper> ah, doc update
<rharper> if you want, that's cool
<blackboxsw> ok I'll pull it in. as I also touched on that
<rharper> I just noticed that when reading rtd this morning
<blackboxsw> sorry it took so long to push it up
<rharper> np
<blackboxsw> rharper: question once you have a chance to tox -e doc on my branch. should all of that content be migrated to capabilities.rst?
<blackboxsw> or should it continue to live on a separate page.
<blackboxsw> It has a lot of your cloudinit analyze README.md notes etc.
<blackboxsw> which doesn't feel like it makes sense as the "entry" page of cloudinit capabilities
<rharper> blackboxsw: lemme see
<rharper> blackboxsw: so I'd make a new topic for the bulk of that instead of the CLI page;
<rharper> something like a Boot Time Analysis
<blackboxsw> Ok and then we just don't talk about cloud-init cmdline
<blackboxsw> much other than analyze specifically
<blackboxsw> so where should cloud-init single go?
<blackboxsw> an amendment to the modules topic?
<blackboxsw> cloud-init (modules|init) are both kind of discussed on the stages topic
<rharper> We may want a general debugging/testing section where we can put both Analyze and single
<blackboxsw> that sounds good, then I can drop the rest
<blackboxsw> and it won't collide w/ your branch
<rharper> we certainly can still have a cli reference section that just has mostly --help like output
<rharper> my branch is tiny
<rharper> easily rebased/appended as needed
<rharper> I wouldn't worry about collision
<blackboxsw> +1
<blackboxsw> thanks
<blackboxsw> approved https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/328992
<blackboxsw> and I pushed my doc changes again
<blackboxsw> separate debugging/testing doc. dropped discussion of cloud-init (init|modules) as that's kinda covered by stages
<blackboxsw> gotta run for a few.
<rharper> later
<blackboxsw> back
#cloud-init 2017-08-15
<blackboxsw> rharper: I think we are good on https://code.launchpad.net/~msaikia/cloud-init/+git/cloud-init/+merge/322991 now. I was likely going to land this later today unless you have objections. smoser did the initial review there; I had a couple of follow-ups, and minor changes have come back per review comments
<rharper> blackboxsw: cool
<blackboxsw> you know, gotta get content for our cloud-init weekly summary
<blackboxsw> :)
<rharper> lol
<rharper> blackboxsw: let me know when you push, and I'll rebase my two branches (doc change and v2 network passthrough)
<blackboxsw> waiting on tox
<blackboxsw> rharper: pushed
<rharper> blackboxsw: cool, thanks
<rharper> TemporalBeing: ok, looked again at the code;  the runcmd items get written out to a scripts file in the instance and then aren't executed until the scripts_user module runs, which happens very late (and after package install);  that explains why we could do package installs and interact with them in the runcmd (like your ufw changes, and pip);  What remains is the dictionary merging;  I'm going to look at that next
<TemporalBeing> ok
<rharper> blackboxsw: ok, pushed my two branches
<blackboxsw> nice rharper. ok I'm seeing a 6 hour offset in my unit tests on analyze branch (the only thing I think that is blocking that branch from landing). Strange thing in test_dump is that I use static SAMPLE_LOGS, so I'm not quite sure how our jenkins environment is parsing that differently. digging a bit
<rharper> blackboxsw: so, I've not updated to your latest branch changes yet, but it looked like we might need to mock out the timestamp value
<rharper> locally, tox -e py3 was failing, maybe you've already fixed that and your offset is different
<rharper> blackboxsw: do you have a link to the failure on jenkins ?
<blackboxsw> rharper: https://jenkins.ubuntu.com/server/job/cloud-init-ci/144/console
<rharper> blackboxsw: yeah, that's what I see locally on my xenial dev box
<rharper> tox -e py3 shows it each time
 * rharper will take a closer look since he can reproduce 
 * blackboxsw makes sure I've pushed latest
<blackboxsw> I get no errors xenial tox -e py3 on 2fdfab7..9e8fc85
<blackboxsw> just pushed
 * rharper refreshes
<rharper> blackboxsw: why does that work?
<rharper> I think we should debug the parse_timestamp with that value
<blackboxsw> I don't know if that works. but I definitely feel the same way
<blackboxsw> timeshifting -6hours just sheds light on something busticated
<blackboxsw> I only pushed that offset -6 hours to let jenkins run and see if it still fell over
<blackboxsw> I didn't intend you to get the latest work in progress branch
<blackboxsw> s/branch/commit
<blackboxsw> good timeshift -6hours didn't work. it's still broken by a 6 hour offset https://jenkins.ubuntu.com/server/job/cloud-init-ci/146/console
<blackboxsw> reverting that last commit
<rharper> I think this is a utc thingy;  when I manually convert the value via date, I get what the unittest gets as well;  so I'm wondering where you got the expected value ?
<rharper> ah
<rharper> it's in MDT
<blackboxsw> hahah
<rharper> % date  +%s.%3N -d "2017-08-08 20:05:07,147 MDT"
<rharper> 1502244307.147
<blackboxsw> yeah
<blackboxsw> my bad
<rharper> well, not really
<rharper> I feel like we need something else in here re: TZ
<rharper> the timestamp value written to the record is including the TZ offset in the timestamp value
<rharper> that feels like a bug, it should be in UTC (or at least we need to record the TZ)
<blackboxsw> right, a TZ-explicit format
<rharper> well, I have a branch to force logging timestamps into UTC
<rharper> which should help here
<blackboxsw> right would avoid the issue
<blackboxsw> but for old logs...
<rharper> I think it's OK, if you're parsing a cloud-init.log file for events, you're "generating" events from the log file, and you get the +TZ offset in the records
<rharper> I don't think that's an issue per-se
<rharper> that is, the delta between events is still correct independent of the TZ offset
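The pitfall in this exchange fits in a few lines: parsing the log timestamp with `calendar.timegm` treats it as UTC regardless of the host's timezone, whereas `time.mktime` would bake the local TZ offset into the epoch value, producing exactly the 6-hour skew seen in the unit tests. The format string matches the `2017-08-08 20:05:07,147` sample quoted above; the helper name is illustrative.

```python
import calendar
from datetime import datetime

def parse_timestamp_utc(ts):
    # Interpret the naive timestamp as UTC: timegm() does not consult
    # the local timezone, unlike time.mktime().
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
    return calendar.timegm(dt.timetuple()) + dt.microsecond / 1e6

epoch = parse_timestamp_utc("2017-08-08 20:05:07,147")
```

For comparison, rharper's `date -d "... MDT"` gives 1502244307.147; interpreting the same string as UTC gives a value 6 hours (21600 s) smaller.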
<rharper> ok, I've got to grab some lunch, will think on this, bbl
<blackboxsw> reset my tz to repro the failure.. ok working it
<jhodapp> dpb1, I'm curious if there is already a way of adding a user via cloud-init/cloud-config for Ubuntu Core?
<dpb1> jhodapp: hey, not everything has landed, a couple of MPs are still in review for snappy.  But it's kind of testable right now
<dpb1> with effort
<dpb1> rharper: if all that was available in uc16, would adding users work?
<rharper> the users configuration dictionary works
<jhodapp> dpb1, oh nice
<jhodapp> rharper, so that's a full on uc16 system user right?
<rharper> if you want to add snappy users, then you use the snap_user format,
<rharper> it's a normal user, if you want a snappy system user, with store privs, you need to supply the email or system_user assertion
<rharper> lemme get the link to docs
<rharper> http://cloudinit.readthedocs.io/en/latest/topics/modules.html#snap-config
<rharper> so snappy: {'email': <addr>}  will trigger the 'snap create-user' which will do the user lookup via store and import ssh keys from associated account if present, and that will be a snap user which can install without sudo
<rharper> the other users: keys will work, and users are added via the --extra-users path, they have sudo but are not authenticated to the store , so one would need to snap login, etc to access snap stuff with an account not created through snap create-user
<jhodapp> rharper, what if the device doesn't have internet access to lookup via the store?
<rharper> it fails
<rharper> that's always the case for snap store interactions
<rharper> if you think you cannot rely on the store
<rharper> then you need to use assertions
<rharper> https://docs.ubuntu.com/core/en/reference/assertions/system-user
<jhodapp> rharper, manually applied assertions or can it be done with cloud-init still?
<rharper> with cloud-init
<rharper> you just inline the assertions in the config
<rharper> snappy: {'assertions': [assert1, assert2, assert3]}
<jhodapp> rharper, ah I see, I think that should work for the requirements I have in mind
<rharper> and then you can do email: <foo> known: true
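Putting rharper's pieces together, a hedged cloud-config sketch (the `snappy:`/`assertions`/`email`/`known` keys are as quoted in the chat; the assertion body and email address are placeholders, and the exact schema should be checked against the snap-config module docs linked above):

```yaml
#cloud-config
snappy:
  assertions:
    # inline system-user assertion text goes here (placeholder)
    - "<inline system-user assertion>"
  # with the assertion acked, create-user can run offline against it
  email: user@example.com
  known: true
```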
<jhodapp> rharper, and that'll work no matter if we've disabled console-conf?
<rharper> no interaction directly with console-conf
<jhodapp> rharper, speaking of which, second question...can we disable console-conf via cloud-init?
<rharper> cloud-init runs 'snap ack  <assertion>'
<jhodapp> ok
<rharper> and 'snap create-user --known <email>'
<jhodapp> so just a wrapper
<rharper> we don't have a config for disabling console-conf, but 1) if you create-user, snap disables console-conf for you
<rharper> 2) if you want to cheat, you can touch the files that the systemd units do to disable console-conf
<jhodapp> rharper, oh interesting
<jhodapp> yeah we knew we could do that manually
<jhodapp> that'll work for us
<rharper> console-conf does 'snap create-user '
<jhodapp> although you may consider adding that ability explicitly
<rharper> so, it's either interactive via console-conf, or via cloud-init
<rharper> I don't think that'll abide
<rharper> in general, we don't want to disable console-conf
<rharper> those that really do want to have a custom image anyhow
<jhodapp> rharper, that's not a safe assumption
<rharper> in which they can write out something to disable console-conf
<jhodapp> the use case for this project breaks that assumption
<rharper> and you're using the stock pc image? instead of customizing ?
<jhodapp> rharper, yes
<rharper> to use system-assertions you need to have your own model
<rharper> so you'll need a custom image anyhow
<jhodapp> rharper, you can't use the default one with assertions?
<rharper> you can't sign it since you're not canonical
<jhodapp> rharper, we are though :)
<rharper> from everything I'm aware of, these should really be custom images; but I'll leave that to you;   you can always do a write_files to touch the console-conf disable, etc if you don't make one;
<rharper> blackboxsw: I was thinking, we should grep for any other configobj + stringio
<rharper> might as well see if we can find any other ones like that
<blackboxsw> robjo: good thought rharper we can fold that in with the cc_landscape change
<blackboxsw> oops sorry robjo.... rharper I mean
 * rharper just keeps handing out the work
<blackboxsw> 15 more mins on my existing unit tests, I'm trying to get coverage of failure paths up
<blackboxsw> rharper: I'm seeing various approaches to using logs in cc modules. some define them locally for the module LOG = logging.getLogger(__name__); others use the log passed into the handle function (param #3). What's the preferred long term?
<blackboxsw> some cc modules use both
<rharper> I believe all new ones use LOG = logging.getLogger(__name__)
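The module-level pattern rharper refers to, spelled out (the handle signature shape is illustrative, not cloud-init's exact API):

```python
import logging

# Module-level logger keyed by __name__, so records in cloud-init.log
# identify which cc_* module emitted them.
LOG = logging.getLogger(__name__)

def handle(name, cfg, cloud, args):
    LOG.debug("handling %s", name)
    return LOG.name
```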
<blackboxsw> ok so we can be certain where the logs are coming from
<rharper> ack
<blackboxsw> rharper: I'll tweak cc_landscape to use this approach then
<blackboxsw> since I'm meddling
<rharper> maybe
<rharper> the log function isn't related to this bug; so my preference is not to touch it
<blackboxsw> maybe a topic to discuss w/ smoser on friday
<rharper> yes
<blackboxsw> +1
<rharper> we may want to file these as backlog bugs
<blackboxsw> falls into the "style" category (like ' versus ")
<blackboxsw> almost
<rharper> heh
<blackboxsw> ok cc_landscape done. now onto cc_puppet.
<blackboxsw> which also has no unit tests
<blackboxsw> and looks to suffer from the same prob
<blackboxsw> the other modules using StringIO seem to have handled things for py3
<blackboxsw> and have coverage
<dpb1> unittesting++
<blackboxsw> up to 61% coverage... slowly climbing out of the hole
<rharper> blackboxsw: nice
<blackboxsw> found more bogus docs for cloud-config options in  cc_puppet docstr
<blackboxsw> ok fixing that too. yay unit tests
<blkadder> Hi all. I am trying to use write_file in order to write files on to an AWS instance and failing miserably. In two of the test cases the files themselves are being written but with no content and in the third nothing is written at all. Any suggestions as to what I should be looking for?
<blkadder> s/are being written/are being created/
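A hedged working example of what blkadder is after (the key is `write_files`, plural; an `encoding: b64` flag on plain-text content produces empty or garbled files, the same issue seen earlier in the channel; path and content here are illustrative):

```yaml
#cloud-config
write_files:
  - path: /etc/example/app.conf
    permissions: '0644'
    owner: root:root
    # plain text: do not set encoding unless the content really is encoded
    content: |
      setting = value
```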
#cloud-init 2017-08-16
<energizer> When i start up there's a long period of waiting for network connection that it doesnt find -- how can i disable this?
<energizer> or how can i disable cloud-init altogether?
<rharper> blackboxsw: https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/329122  (logging-gmtime)
<blackboxsw> grabbing it
<rharper> I think that should force the analyze code which parses timestamp strings to produce UTC based timestamp values
<rharper> it's possible that in the unittest if analyze/dump.py isn't pulling in logging from cloudinit/log.py then we'd miss it
<rharper> but I think in general, we want any cloud-init module to import logging via cloudinit.log
<blackboxsw> rharper: ok reviewed https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+merge/329122 it'd be nice to have a unit test for that. but doesn't have to block this one.
<blackboxsw> also I pushed latest/final changes for the timestamp work in unit tests on the analyze branch.  I didn't have to mock, but let the unit test calculate timestamps using local time as well. https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/328819
<rharper> blackboxsw: cool
<rharper> I'll re-review
<blackboxsw> just sent an email to Sankar about the vmware branch that's in active review (to see if we might get that landed before next week too)
<rharper> powersj: I see you performed the subprocess.Popen dance to get your file via stdin, nice !
<powersj> rharper: thx!
<powersj> that worked really easily
<rharper> we could look at adding stdin support to subp
<rharper> since we're setting the values via Popen anyhow; just not there today
<stanguturi> Need some help regarding the 'network config rendering'... I have a datasource that specifies all the proper network configuration in proper format in network_config property. I noticed that cloud-init writes that settings to /etc/network/interfaces.d/50-cloud..cfg file instead of /etc/network/interfaces
<stanguturi> Any specific reason for this?
<nacc> stanguturi: presumably so that it's easy to know what cloud-init owns?
<nacc> stanguturi: does it break something?
<nacc> (also actually with two . ?)
<stanguturi> @nacc, it doesn't actually break anything. I want to know how the system works if we write the network configuration data in a file other than /etc/network/interfaces
<nacc> stanguturi: does your /e/n/i have a line like "source-directory interfaces.d" ?
<nacc> stanguturi: i believe that is the default in debian, not sure, don't have a fresh image sitting in front of me at the moment
<stanguturi> I don't think I have it. Will check Thanks
<nacc> stanguturi: just going off of `man interfaces`
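For reference, the stanza nacc describes looks like one of the following in /etc/network/interfaces; whether it is present by default varies by distro and image, so treat these lines as an assumption to verify against `man interfaces` on the target system:

```
# Debian-style:
source-directory /etc/network/interfaces.d
# Ubuntu cloud-image style:
source /etc/network/interfaces.d/*.cfg
```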
#cloud-init 2018-08-13
<blackboxsw> smoser: comments are posted on https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/352660. Do you think I should drop the extra check on /var/lib/waagent/ovf-env.xml even though that is where waagent stores the copied file?
<smoser> blackboxsw: i dont know if i pushed 'save' on my comments...
<smoser> but there, the presence of that directory should not indicate anything to cloud-init.
<smoser> it just indicates that walinux-agent has run at some point
<blackboxsw> and if /var/lib/cloud/seed/azure/ovf-env.xml exists we are saying this is an azure environment and that's okay right?
<blackboxsw> because that check is also not currently passed on azure instances I'm able to boot (because we get our data originally from /dev/sr0, right)
<smoser> blackboxsw: i dont care so much about the seed dir.
<blackboxsw> ok I'll drop both
<smoser> seed dirs (other than NoCloud) are really just test paths
<blackboxsw> right
<smoser> blackboxsw: i think dropping the seed dir support is acceptable
<smoser> but i think it is not related to your changes
<smoser> so i'd leave that path un-modified in yours and you can do a separate merge proposal for dropping it entirely.
<blackboxsw> smoser: sorry, I added a seed_dir detection to DataSourceAzure._is_platform_viable.
<blackboxsw> I was going to drop my additions there
<blackboxsw> and leave ds-identify alone
<blackboxsw> sound good?
<smoser> you should not change behavior
<smoser> is_platform_viable should return True if /var/lib/cloud/seed/azure/ovf-env.xml exists.
<smoser> but should not care at all about /var/lib/waagent/ovf-env.xml
<smoser> sorry if i was confusing above.
<blackboxsw> ok sounds good. I'll align w/ ds-identify smoser
<blackboxsw> wow IRC timeout
<blackboxsw> ok pushed
 * blackboxsw is working on jinja for #cloud-config unit tests
<shaner> Hi all, any chance at getting some reviews on https://code.launchpad.net/~shaner/cloud-init/+git/cloud-init/+merge/352572
<blackboxsw> sure shaner, will drop in some comments today
<shaner> thanks blackboxsw
<smoser> blackboxsw, rharper ubuntu@129.146.136.219
<smoser> if you want to see a oracle instance
<smoser>  https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/352921
<smoser> and review ^
<rharper> smoser: k
<blackboxsw> Thx
#cloud-init 2018-08-14
<plujon> I'm not sure if this is a cloud-init question... I have an Ubuntu 16.04.5 Digital Ocean droplet.  Yesterday I ran `apt-get update && apt-get dist-upgrade`, and after rebooting, found my ssh server keys had changed.  Further digging reveals that cloud-init generated new ssh keys for my server.  Why would it do that?
<smoser> plujon: if it thought you were on a new instance it would do that.
<smoser> it should not do that unless you captured a snapshot and created a new instance from that.
<smoser> if you could post /var/log/cloud-init.log that'd be nice.
<smoser> and if you can file a bug and attach 'cloud-init collect-logs' that would have more info for us.
<smoser> you can run 'ubuntu-bug' and follow prompts
<smoser> plujon: thank you for asking.
<plujon> smoser: The /var/log/cloud-init.log is rather big!  I wonder if this is the relevant piece:
<plujon> http://ix.io/1ka2
<plujon> That's a pair of warnings from journalctl.
<plujon> And from cloud-init.log, the key regeneration: http://ix.io/1ka8
<plujon> I'm hesitant to file a bug because it might be something Digital Ocean does intentionally, for some inexplicable reason.
<smoser> plujon: are you not wanting to paste things?
<smoser> pastebinit /var/log/cloud-init.log will do it for you
<smoser> wrt if it is intended or not... i would suspect it is not.
<smoser> just showing that it *did* regenerate keys (i believed you the first time) isn't really helpful. the full log is needed to see why.
<smoser> if you're afraid of personal data being exposed, and you feel more comfortable sharing with me privately that is fine too.
<plujon> smoser: http://paste.ubuntu.com/p/X3CpJqhj6m/
<smoser> plujon: thanks. i'll take a looka fter this call.. 15 minutes or so
<brtknr> Is the command for running tests simply "tox"
<brtknr> ERROR: FAIL could not package project - v = InvocationError('/usr/bin/python2 /opt/bharat/cloud-init/setup.py sdist --formats=zip --dist-dir /opt/bharat/cloud-init/.tox/dist (see /opt/bharat/cloud-init/.tox/log/tox-0.log)', 1)
<brtknr> I get this when i try to run tox
<brtknr> ERROR: invocation failed (exit code 1), logfile: /opt/bharat/cloud-init/.tox/log/tox-0.log
<brtknr> ERROR: actionid: tox
<brtknr> msg: packaging
<brtknr> cmdargs: ['/usr/bin/python2', local('/opt/bharat/cloud-init/setup.py'), 'sdist', '--formats=zip', '--dist-dir', local('/opt/bharat/cloud-init/.tox/dist')]
<brtknr> Traceback (most recent call last):
<brtknr>   File "setup.py", line 272, in <module>
<brtknr>     version=get_version(),
<brtknr>   File "setup.py", line 83, in get_version
<brtknr>     (ver, _e) = tiny_p(cmd)
<brtknr>   File "setup.py", line 48, in tiny_p
<brtknr>     (cmd, ret, out, err))
<brtknr> RuntimeError: Failed running ['/usr/bin/python2', 'tools/read-version'] [rc=1] (, git describe version (17.2-187-gf6249277) differs from cloudinit.version (18.3)
<brtknr> )
<brtknr> ERROR: FAIL could not package project - v = InvocationError('/usr/bin/python2 /opt/bharat/cloud-init/setup.py sdist --formats=zip --dist-dir /opt/bharat/cloud-init/.tox/dist (see /opt/bharat/cloud-init/.tox/log/tox-0.log)', 1)
<brtknr> Okay it worked after changing version inside cloudinit/version.py from 18.3 to 17.2
<blackboxsw> brtknr: what distro?
<brtknr> centos
<blackboxsw> 7 or 6?
<blackboxsw> I have an image here, will double check tip vs an older version
<brtknr>  7.5.1804
<blackboxsw> thx will check
<smoser> plujon: thanks. there is a bunch of bugs represented there... :-(
<plujon> smoser: Heh, you are welcome!
<smoser> this was a 14.04 upgraded ?
<plujon> smoser: Hmm.  Good question.  Let me see...
<plujon> The machine was initiated on 2016-11-07.  I think it was 16.04.
<smoser> plujon: you're probably right. i didnt realise/remember that 16.04 started with such an old version
<smoser> but you are correct. GA was 0.7.7
<smoser>  https://launchpad.net/ubuntu/+source/cloud-init
<blackboxsw> hrm smoser one issue with hoisting EphemeralDHCP up into get_data is that _poll_imds (called by crawl_metadata) needs access to the dhcp lease as well as retrying the EphemeralDHCP context manager on urlerror.
<blackboxsw> I'll try thinking about how to consolidate both callsites
<smoser> hm.
 * blackboxsw doesn't believe we need the EphemeralDHCP within a try/except UrlError...  loop for Azure._poll_imds.  I'll try pulling it up into get_data and just passing the lease info into _poll_imds(lease)
<blackboxsw> hmm but passing the lease into crawl_metadata poses a problem because that call signature needs to be common across all DataSources :/
<blackboxsw> I guess I could persist an instance variable DSAzure.dhcp_lease
<blackboxsw> this feels dirty/broken
<smoser> hm..
<blackboxsw> the whole point of EphemeralDHCPv4 is quickly setting up network for a short period of time and tearing it down afterward, hence the context manager, but this doesn't quite align w/ poll_imds which wants to repeatedly attempt to hit dhclient to get new addresses across a polling loop exception boundary.
<rharper> blackboxsw: can we subclass into a PollingEphemeralDHCP ?
<blackboxsw> it kinda feels like poll_imds should be our greater context....
<blackboxsw> rharper: yeah I kinda feel like that rework needs to happen, PollingEphemeralDHCP., like the name,
<blackboxsw> then we'd have one single context which would result in a properly configured network when we clear that 'gate'
<blackboxsw> ok that feels a bit like rework/tech-debt that'd involve a bit more eyes (like azure folks too) to weigh in
<blackboxsw> because I need access to a configured test/dev platform which blocks on imds polling
<blackboxsw> to validate behavior
<rharper> heh
<rharper> yeah
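The context-manager pattern being debated above can be sketched minimally; the names, interface, and lease contents below are hypothetical stand-ins, not cloud-init's EphemeralDHCPv4 implementation. The point is the shape: network comes up on entry, and teardown always runs on exit, even if polling inside raises.

```python
from contextlib import contextmanager

events = []  # records the ordering, standing in for real side effects

@contextmanager
def ephemeral_dhcp(interface):
    events.append("dhcp-up %s" % interface)        # stand-in for dhclient setup
    try:
        yield {"interface": interface, "fixed-address": "169.254.1.2"}
    finally:
        events.append("dhcp-down %s" % interface)  # teardown always runs

with ephemeral_dhcp("eth0") as lease:
    events.append("poll-imds via %s" % lease["fixed-address"])
```

The tension in the chat is that a long-lived polling loop wanting fresh leases across retries does not fit a single short-lived `with` block, hence the PollingEphemeralDHCP idea.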
<shaner> smoser: really appreciate the review, will work on making the necessary fixes
<smoser> shaner: great. please feel free to ping. and also if i'm delinquent feel free to ping.
<shaner> smoser: ack, will do
<smoser> rharper: what is the 'manual' word for netplan
<rharper> Required for something
<smoser> :-(
<smoser> required for boot != please do not bring up automatically
<rharper> oh, critical connection
<cyphermox> no
<rharper> you said manual which means 'ignore me'
<cyphermox> there's no such thing as "manual".
#cloud-init 2018-08-15
<brtknr> what does it take to setup a dev environment for cloud init?
<brtknr> i have just tried running tox on centos 7.5 and ubuntu 18.04 and both report plethora of error messages on the master branch
<brtknr> centos: ERROR: InvocationError for command '/opt/bharat/cloud-init/.tox/py3/bin/python -m nose --with-timer --timer-top-n 10 --with-coverage --cover-erase --cover-branches --cover-inclusive --cover-package=cloudinit tests/unittests cloudinit' (exited with code 1)
<blackboxsw> brtknr: as far as I recall we didn't touch tox support on centos:      try 'make ci-deps-centos; make tests'
<smoser> brtknr: need more info on the error. full pastebin ?
<blackboxsw> also on an ubuntu system w/ lxc one can run something like ./tools/run-container --package --source-package --unittest centos/6
<blackboxsw> smoser: I'd like to take your suggestion to bubble up network setup into Azure.get_data as a tech-debt item; there's their poll_imds use-case which wants to re-try dhcp attempts across IMDS metadata updates due to a user-triggered IP address change. I think this will need broader discussion with Azure and/or access to a test env that exhibits the imds metadata change to verify that a single dhcp setup suffices. I'm not sure if we
<blackboxsw> want to block the Azure network-config-per-boot branch on that refactor or not.
<blackboxsw> what I'm asking is: do you think https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/352660 can land as is if we have a card/topic/bug for summit next week to discuss this consolidation (and possible rework of EphemeralDHCPv4)?
<smoser> blackboxsw: are you happy with branch as it is right now ?
<smoser> i marked approve
<blackboxsw> thanks smoser yeah, I was going to try a separate proposal for consolidation of EphemeralDHCPv4 uses for azure as the next branch so we can chat over it w/ Azure
<smoser> blackboxsw: set 'approved'. it should land.
#cloud-init 2018-08-16
<powersj> blackboxsw, fyi it may not land as I gotta figure out what to do with the new vlan + proxy
<powersj> I updated a bunch of jobs, but still need to debug the ci jobs
<blackboxsw> No worries powersj.  Thanks for the heads up
<smoser> blackboxsw: jenkins didn't like your azure branch
<braincod1> Hello cloud-init devs!
<braincod1> I have a userdata.sh that I would like to drop into /var/lib/cloud... it is not entirely clear to me (from the docs), if I can do that as-is with a bash script without migrating to cloud-init YAML DSL?
<braincod1> I tried putting it under per-boot, but doesn't seem to work as expected
<smoser> braincode: you're wanting shell script to be executed every boot ?
<braincode> Yeah, it just provisions an EC2 instance
<braincode> This one: https://github.com/umccr/umccrise/blob/master/deploy/roles/brainstorm.umccrise-docker/files/bootstrap-instance.sh
<braincode> I started to migrate it to cloud-init YAML, but there are some parts I would need to refactor (namely, env vars) for it to work with cloud-init: https://github.com/umccr/umccrise/blob/master/deploy/roles/brainstorm.umccrise-docker/files/cloud-init.yml
<smoser> that should generally work
<smoser> http://paste.ubuntu.com/p/kBJCbWXdKJ/
<smoser>  https://github.com/umccr/umccrise/blob/master/deploy/roles/brainstorm.umccrise-docker/files/cloud-init.yml is confused.
<smoser> #cloud-config
<smoser> must be yaml formatted, and can do things lke you have there 'fs_setup:'
<smoser> but your 'sleep 10' is not useful there.
<smoser> the first half of that script... can be in a per-boot script
<smoser> you can put cloud-config things in /etc/cloud/cloud.cfg.d/99-your-stuff.cfg
<smoser> but those things must be valid yaml.
<braincode> Yeah, the cloud-init one is work in progress
<braincode> That's why is half-baked, sorry
<smoser>  http://paste.ubuntu.com/p/QdDYygjNzd/
<braincode> @smoser, I'm getting this out of dropping the shell script in per-boot: https://paste.ubuntu.com/p/5Tc33RtnQQ/
<smoser> but you should absolutely be able to put that shell script into /var/lib/cloud/scripts/per-boot
<braincode> Yes, I know, work in progress, I'm focusing now on just the shell script for now
<smoser> (probably you want per-instance... so it only runs once per instance)
<smoser> then chmod 755 it
<smoser> and it should be good
<smoser> if its not executable it will be ignored.
<smoser> if you have issues....
<smoser> almost certainly there is a 'WARN' in /var/log/cloud-init.log
<smoser> and the output of such scripts is in /var/log/cloud-init-output.log
<smoser> (if not redirected)
<smoser> i might suggest in the script at the top you can redirect it yourself... as I did with the 'tee'
<smoser> (
<smoser> long
<smoser> script
<smoser> here
<smoser> ) 2>&1 | tee /run/this-is-my-output.log
<smoser> make sense ?
<braincode> Yeah, thanks a ton! :D
<smoser> but if you weren't making the script executable, then cloud-init will ignore it.
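Assembled into one piece, the redirect-inside-the-script pattern smoser spelled out line by line above (the log path and the echo line are illustrative, not from braincode's actual script):

```shell
#!/bin/sh
# Would live at e.g. /var/lib/cloud/scripts/per-instance/bootstrap.sh
# and must be chmod 755, or cloud-init silently ignores it.
(
  echo "provisioning started"
  # long provisioning steps would go here
) 2>&1 | tee /tmp/this-is-my-output.log
```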
<braincode> Also not entirely clear to me from https://cloudinit.readthedocs.io/en/latest/topics/datasources.html how to get those AWS env vars I'm defining on the script?
<braincode> gotta go, brb, thanks a lot @smoser!
<braincode> What I mean is that it is well documented where that json with all the instance-data sits, but it's not clear how to e.g. fetch instance-id or region and then use it in the cloud-init.yaml? Am I allowed to call the python methods shown somehow from the YAML?
<blackboxsw> Braincode you are only a few days too early.  https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/335290
<blackboxsw> Should be able to use it in jinja templates
<blackboxsw> As soon as we land that branch
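A hedged sketch of what user-data could look like with that jinja branch: the `## template: jinja` header and the `v1.*` variable names are from the feature under review at the time and may differ in the final landed form.

```yaml
## template: jinja
#cloud-config
runcmd:
  - echo "running in {{ v1.region }} on instance {{ v1.instance_id }}"
```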
<braincode> Hoho, awesome! :)
<braincode> Thanks for the suggestions on the 755... it went a little bit further but not all the way it seems: https://paste.ubuntu.com/p/54Rmy8jpJD/
<braincode> That is, with https://github.com/umccr/umccrise/blob/master/deploy/roles/brainstorm.umccrise-docker/files/bootstrap-instance.sh
<braincode> ... and I did not cut any further output at the end, that's it, all cloud-init executes from the script... wait a second, yeah, for sure it's about the Instance Role permissions
<nicolas17> hi
<nicolas17> on one of kde.org's servers, cloud-init decided that it has never run before
<nicolas17> which means it eg. re-generated the ssh host keys
<nicolas17> any idea why that can happen? I have logs
<nicolas17> https://paste.kde.org/pl76fn3ks
<nicolas17> I guess the error on line 498 may be it?
<powersj> smoser, blackboxsw ci is broken until I figure out the proxy
<smoser> i noticed :)
<blackboxsw> smoser: on your oracle instance I presume the commented #datasource_list: [ OpenStack ] in 99-oracle-bare-metal-datasource.cfg was you right for your dev/testing
<smoser> blackboxsw: this is a pita
<smoser> right. i commented that out.
<smoser> this negative skip is annoying
<smoser> and i'm not sure the one in DataSourceConfigDrive is done right...
<blackboxsw> yeah and  $ python3 -c 'from cloudinit.sources.DataSourceOpenStack import detect_openstack; print(detect_openstack())'
<blackboxsw> True
<blackboxsw> yes, the negative skip as a policy due to ds detection order stinks.
<blackboxsw> so I only recently added that OracleCloud.com asset_tag check to DSOpenStack.
<blackboxsw> do we have different asset tags on OCI-classic vs. OCI
<blackboxsw> doesn't appear to. /me grasps for some other env variable that'd be specific to oci-c only
<blackboxsw> as a 'pass' for DataSourceOpenStack.
<smoser> no asset tags on oci-classic
<smoser> uses ec2
<smoser> not covered here, not negatively affected here.
<blackboxsw> sooo, since the only add of 3cee0bf8 was enabling detection of DMI_ASSET_TAG_ORACLE_CLOUD, we can back that out with your Oracle Datasource introduction
<blackboxsw> right?
<smoser> no.
<smoser> because then we'd have to deal with "transfer"ing
<smoser> blackboxsw: i'm currently confused at this
<smoser> $ sudo python3 -c 'from cloudinit.stages import _pkl_load; print(_pkl_load("/var/lib/cloud/instance/obj.pkl").sys_cfg.get("datasource_list", "why is there no datasource_list"))'
<smoser> why is there no datasource_list
 * blackboxsw hrm same on lxc... checking
<smoser> $ sudo python3 -c 'from cloudinit.stages import Init; x = Init(); print(x.cfg.get("datasource_list"))'
<smoser> ['Oracle', 'None']
<smoser> i thought that that was basically where the datasource got its sys_cfg
<smoser> blackboxsw: sorry. i rebooted. cleaned and rebooted. seeing if it comes up different.
<blackboxsw> no worries smoser I'm on a container
<smoser> well at least it recreates.
<blackboxsw> sources.find_sources gets passed a valid/populated datasource_list
<blackboxsw> checking to see what it does w/ self.cfg
<blackboxsw> checking to see what it does w/ Init.cfg
<blackboxsw> saves as ds.sys_cfg
<blackboxsw> hrm
<smoser> so
<smoser>  sudo cloud-init clean --logs
<smoser>  sudo cloud-init init --local
<smoser>  sudo python3 -c 'from cloudinit.stages import _pkl_load; print(_pkl_load("/var/lib/cloud/instance/obj.pkl").sys_cfg.get("datasource_list", "why is there no datasource_list"))'
<smoser> you will see the datasource now
<smoser>  sudo cloud-init init
<smoser> try again and it is not there.
<smoser> s/datasource/datasource_list/
<blackboxsw> hrm something w/ restoring from cache? not seeing that on lxc's nocloud datasource between --local and network stage runs.
<[42]> i'm doing something wrong somewhere i believe
<[42]> i'm setting a hostname in my user data and i can see it's set in /etc/hosts
<[42]> but it doesn't seem to set the system hostname as i can see in /etc/hostname / hostnamectl
<[42]> preserve_hostname is false
<[42]> any ideas what could be blocking this?
<blackboxsw> dumb, lxc does reproduce
<blackboxsw> i was not printing the pkl'd object's datasource_list, but Init's cfg['datasource_list']
<blackboxsw> so upon reload of cache in network stage, datasource_list is dropped
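The behavior blackboxsw describes follows from how the datasource cache works: obj.pkl is a pickle of the datasource object, and unpickling restores exactly the attribute values present at dump time, so whatever sys_cfg held when the local stage wrote the cache is what the network stage reads back, regardless of what a fresh Init().cfg would contain. A minimal sketch with a hypothetical stand-in class:

```python
import pickle


class FakeDataSource:
    """Stand-in for a cloud-init datasource that caches its system config."""

    def __init__(self, sys_cfg):
        self.sys_cfg = sys_cfg


# obj.pkl is written during the local stage with whatever sys_cfg held then
ds = FakeDataSource({"datasource_list": ["Oracle", "None"]})
blob = pickle.dumps(ds)

# in the network stage the unpickled object still carries the dump-time
# values -- nothing goes back to Init().cfg to refresh them
restored = pickle.loads(blob)
print(restored.sys_cfg["datasource_list"])
```

If the config was pruned or differed at dump time, the cached object and a freshly built Init() will disagree, which is exactly the sys_cfg discrepancy being chased here.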
<[42]> i found https://git.launchpad.net/cloud-init/tree/doc/examples/cloud-config.txt#n291
<[42]> how does it check for "not modified by user"?
<[42]> does that even apply if it's set to false in /etc/cloud/cloud.cfg?
<[42]> found it in /var/lib/cloud/data/previous-hostname
<blackboxsw> so logic for hostname set is here :   /usr/lib/python3/dist-packages/cloudinit/config/cc_set_hostname.py, it checks a previous-hostname cache file to compare hostname/fqdn against any previous cloud-init configured hostname. if cloud-init thinks hostname didn't change, it'll emit a log  "No hostname changes. Skipping set-hostname"
<blackboxsw> sorry took me a while to type. yes that's the cache cloud-init compares against
<blackboxsw> I'd expect you are seeing that Skipping... log in /var/log/cloud-init.log
<[42]> i didn't see that line, might be due to the old debian cloud-init
<blackboxsw> [42] is debian on 0.7.9 still? we've definitely overhauled a bit of the cc_sethostname logic a bit after that.
<[42]> yes
<blackboxsw> :/ yeah I don't think we have the same problem/behavior anymore. I believe we made the cc_sethostname a bit more flexible.
<blackboxsw> flexible/re-entrant
<blackboxsw> cc_update_hostname is the other module that is at play here
<blackboxsw> ... and there really shouldn't be two modules for this ultimately.... but we haven't yet aligned that behavior, though we've talked about that as a possible future
<smoser> [42]:the intent is that it stores what it was previously
<smoser> and if it is not what it was previously, then it assumes written by user.
<[42]> smoser: yup figured that out
<smoser> yeah.. /me catches up with backscroll
<[42]> unfortunately others administering these systems don't always invest enough time to figure out why things don't work to properly fix it, rather than just quick-fixing it on one system
<[42]> which isn't useful when cloning a machine
<smoser> blackboxsw: http://paste.ubuntu.com/p/cwBT2fNMHp/
<smoser> blackboxsw: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1787459 <-- thats a bug for the sys_cfg problem
<ubot5> Ubuntu bug 1787459 in cloud-init (Ubuntu) "datasource.sys_cfg gets different values in local stage and after." [Undecided,New]
<blackboxsw> almost done w/ your review on original branch smoser. will check paste and bug in a few mins
<smoser> blackboxsw: "a couple of nits about persisting variables"
<smoser> but i dont see comments inline
<blackboxsw> bah, checking the submit button
<blackboxsw> ok smoser so your paste you are allowing oracle as openstack if Oracle not in ds list. so old behavior persists for existing pets which are upgraded.  and there is not a world where ds-identify would report OpenStack on OracleCloud.com per your new changes, so looks safe from that front.
<blackboxsw> +1 on your paste
<blackboxsw> smoser: 6 unsaved comments posted on your oci branch
<smoser> blackboxsw: https://meet.google.com/bxm-azib-mtj
<smoser> blackboxsw: http://paste.ubuntu.com/p/V2qQ4BWtVg/
<blackboxsw> smoser: looks good, although the ci-b64: prefix is not being processed to base64-encoded-keys. I can't decide if I care enough about the interim state of the data.
#cloud-init 2018-08-17
<smoser> blackboxsw: ah. yeah, that we should get there... as you can see it just called util.json_dumps
<braincode> Hello cloud-init! If I drop a script under /var/lib/scripts/per-boot ... is it right to assume that by the time cloud-init picks up on it the instance/machine I'm running on will have networking up?
<braincode> It's not entirely clear to me when reading https://cloudinit.readthedocs.io/en/latest/topics/boot.html :-S
<braincode> @powersj, great blogposts about cloud-init, btw! ;)
<blackboxsw> agreed, good work powersj.
<powersj> thanks :)
<smoser> blackboxsw: i'm going to set 'approved' on azure-network-per-boot
<smoser> ok?
<smoser> blackboxsw: https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/352921 ... review and approve there would be grand.
<blackboxsw> will get that now
<blackboxsw> powersj: is https://jenkins.ubuntu.com/server/job/admin-lp-git-autoland/ something we need to kick today (instead of automated trigger)
<blackboxsw> smoser: and I both have cloud-init branches that are queued approved
<powersj> blackboxsw, I disabled testing while deej and I were playing with the network settings
<powersj> give me 20 to re-test things so we don't get silly failures
<powersj> should be setup now for the weekend
<smoser> ok. i was just about to ask that too
<blackboxsw> gotcha sounds good
<smoser> blackboxsw: i'd like a upload to cosmic today.
<blackboxsw> should I unset approve on the cloud-init branches for 20?
<blackboxsw> smoser: +1 me too
<powersj> nah nothing will run
<blackboxsw> ok leaving branches in approved state
<smoser> i'm tempted ... really tempted to merge manually.
<blackboxsw> will put up a cosmic MP 5 mins after they land
<blackboxsw> smoser: your week is almost over.
<blackboxsw> maybe it's best for these two so you don't have to hang around waiting on me
<smoser> ok. ... we think review-mps should work, right ?
<blackboxsw> we'll have more branches to autoland monday (removal of deprecated snap MPs, and jinja template hopefully)
<blackboxsw> smoser: I think it should; I'll do one test locally against my own repo
<blackboxsw> smoser: looks good locally shall I
<blackboxsw> ?
<blackboxsw> review-mps rather
<powersj> blackboxsw, I enabled testing
<powersj> watching jobs notifications...
<blackboxsw> you beat the manual response. :)
<blackboxsw> so I won't touch review-mps then
<powersj> yeah looks like there are 2 getting tested currently
<blackboxsw> hopefully serially :)
<blackboxsw> otherwise we risk a merge collision I suppose
<powersj> test runs concurrently, lander is serially
<blackboxsw> ahh I misunderstood. ok you are running lander tests on non-cloud-init public reviews
<blackboxsw> ok
<powersj> non-cloud-init?
<blackboxsw> ok looks like those are only ~5 mins anyway
<blackboxsw> ok
 * powersj chose a bad week to make this change
<blackboxsw> next week would be a bad week
<powersj> yeah :\
<blackboxsw> :)
<blackboxsw> ok, just so smoser doesn't have to stay late I'll run review-mps on one branch now
<powersj> will that land them?
<powersj> blackboxsw, ^
<blackboxsw> that'll land it in master
<powersj> well... there are 2 branches almost done
<powersj> let me see what is under test
<blackboxsw> cancelled, the branch names looked like yours
<blackboxsw> I thought you were running 1-off tests
<powersj> looks like scaleway?
<powersj> 51f49dc1da0c045d01ddc977874c9bed70cb510f
<blackboxsw> not a valid approved review
<blackboxsw> we should be looking at just oracle or azure related branches
<blackboxsw> so I thought you were doing a little test branch run
<powersj> "<powersj> blackboxsw, I enabled testing"
<powersj> DEBUG: Merging feature/azure-network-per-boot to master
<powersj> that just landed
<blackboxsw> yeah "testing" was vague in my interpretation... testing meant your own test branch runs, "enabled autolander" meant ci was going
<blackboxsw> ahh
<powersj> ah
<powersj> DEBUG: Merging feature/oracle-datasource to master
<powersj> that just landed
<blackboxsw> ahh interesting, so the description on the given test run
<blackboxsw> is tip of master, not the commit of the branch under test=...
<blackboxsw> ok
<blackboxsw> good, I had cancelled my local review-mps on the azure branch
<powersj> ah yeah! that's because we checkout master and merge in the review
<powersj> that is misleading :\
<blackboxsw> yeah makes sense then
<powersj> ok so cloud-init ci is good
<powersj> now gotta go look at curtin
<blackboxsw> excellent
<powersj> also have an integration test running
<powersj> now that I have ppa access again
<blackboxsw> thanks, again powersj . ok I'm cutting a cosmic release branch
<blackboxsw> for review
<blackboxsw> https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/353353 for cosmic release
<blackboxsw> smoser: ^
 * blackboxsw is running through a quick upgrade check on azure
<blackboxsw> azure upgrade path good on cosmic MP
<blackboxsw> minor pre-existing bug found with cloud-init status if someone uses the unsupported CLI path of 'sudo cloud-init init; cloud-init status --wait'. When running any of the stages on the cmdline directly after-boot, we delete /run/cloud-init/result.json   unless the stage run is modules-final... this breaks cloud-init status --wait into lying that  cloud-init is "running"
<blackboxsw> minor bug we don't really have to address https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1787657
<ubot5> Ubuntu bug 1787657 in cloud-init (Ubuntu) "cloud-init status reports 'running' after 'cloud-init init --local' run on commandline" [Low,New]
<blackboxsw> ok CI approved cosmic branch
<blackboxsw> will dput once there is a 2nd review on changes in https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/353353
<smoser> blackboxsw: sorry, went into a python2.6 hole
<blackboxsw> I believe it
<blackboxsw> 'tis fine. I'm just working on the render subcommand (and a couple other tuneups for CLI)
<powersj> blackboxsw, have a merge for cii fixes incoming
 * blackboxsw ducks
<powersj> heh
<powersj> well it's that or lots of emails this weekend ;)
<powersj> I guess those are easy to deal with
<smoser> blackboxsw: i just approved that.
<smoser> thanks for landin the others
<blackboxsw> thanks auto-powers
<smoser> blackboxsw: shoot
<blackboxsw> smoser: listening?
<smoser> http://paste.ubuntu.com/p/gy2vWgQ8Mx/
<smoser> that needs doing (and smoke testing) on the ubuntu/devel branch
<smoser> best if done before you upload
<blackboxsw> smoser: ahh right, pkg update
<smoser> h
<smoser> ok. i have to run
<blackboxsw> will d
<blackboxsw> do
<blackboxsw> have a good one
<powersj> blackboxsw, https://code.launchpad.net/~powersj/cloud-init/+git/cloud-init/+merge/353355 if you aren't still trying to get cosmic out
<blackboxsw> powersj: just finished cosmic upload
<powersj> sweet
<blackboxsw> will look after curtin upload
<blackboxsw> :)
<powersj> thank you!
 * blackboxsw adds another tab... this time to firefox 
<blackboxsw> I'm separating my abuse of browser tabs to avoid chrome memory throttling
#cloud-init 2018-08-18
<smoser> blackboxsw: ok. one or 2 questions
<smoser> then i do go.
<smoser> 18.3-24-gf6249277-0ubuntu2
<smoser> did you actually upload a 0ubuntu1 ?
<blackboxsw> smoser: I didn't so I was debating about just consolidating into ubuntu1
<smoser> if not... you could have just adjusted
<smoser> yeah
<smoser> git rebase is really wonderful for that sort of thing
<blackboxsw> yeah ok I'll do that and the merge into origin/ubuntu/devel
<smoser> and in order to facilitate easier git rebase
<blackboxsw> yeah rebase -i
<blackboxsw> will do and collapse it
<smoser> and also easier 'git cherry-pick'
<smoser> if you put the 'update changelog' in a separate commit
<smoser> then its easier
<smoser> so commit one of debian/cloud-init.templates
<smoser> and then commit -m 'update changelog'
<smoser> because otherwise cherry-pick or rebase with those commits is always going to conflict on a debian/changelog
<smoser> which isn't helpful
<blackboxsw> ok I get it, so don't collapse/"fixup" the separate changelog commit
<smoser> no worries. its all fine.
<smoser> did you upload ?
<blackboxsw> nope just saw my curtin ubuntu/devel merged
<blackboxsw> you didn't do that right?
<smoser> yeah, just have 2 commits. you can look at the log in ubuntu/devel, its filled with "update changelog" messages for each change in debian/
<smoser> i did not intend to merge anything..
<blackboxsw> I was waiting on autolander for the curtin branch.
<smoser> hm..
<blackboxsw> and hoped that jenkins would do it.
<smoser> i was always just merging the debian branches myself.
<smoser> building locally and pushing.
<smoser> but its fine if we want the lander to do it.
<smoser> anyway.. i really should go
<smoser> enjoy the rest of your friday  night ;)
<blackboxsw> and it is merged now so I'll upload curtin per build-and-push  https://code.launchpad.net/~chad.smith/curtin/+git/curtin/+merge/353361
<blackboxsw> http://paste.ubuntu.com/p/Vq8Mn2pTpt/
<blackboxsw> thanks will do see ya
<blackboxsw> yeah waiting on autolander is taking too long, will sort what I can and get it tonight or tomorrow
<blackboxsw> night
<blackboxsw> lander failures 2018-08-18 03:00:18,589 - tests.cloud_tests - ERROR - stage: collect test data for xenial encountered error: [Errno 28] No space left on device: '/tmp/LXDInstance-cloud-test-ubuntu-xenial-modules-ntp-servers-2s2jjcujy8pnqtgks3gb_h9t5s'
<blackboxsw> gonna have to clean up torkoal again
<blackboxsw> ok, upload in progress, gotta head off https://launchpad.net/ubuntu/+source/cloud-init/18.3-24-gf6249277-0ubuntu1/+build/15278099
#cloud-init 2018-08-19
<akik> how can i configure cloud-init for smaller timeouts if there's no cloud configuration data available?
<akik> it took minutes to boot the vm on my local machine with cloud-init enabled
#cloud-init 2019-08-12
<Odd_Bloke> rharper: blackboxsw: https://code.launchpad.net/~daniel-thewatkins/cloud-init/+git/cloud-init/+merge/371053 <-- should be ready for re-review
<Odd_Bloke> (Don't land it yet though, I'll want to do a final smoke test on Oracle with all changes included before it lands.)
<rharper> Odd_Bloke: cool
<danielmarquard> o/ hello! curious if anyone could provide guidance or a link to documentation on how and where cloud-init executes user data and downloads public keys. we track a single config for our centos AMI, built on-prem. it creates a user, but doesn't explicitly make any metadata calls, yet userdata executes by default. trying to understand how cloud-init knows to do these things by default and how the behavior can be changed
<rharper> danielmarquard: hi,  depending on the platform you launch an instance on, the Datasource used is different; for example on EC2, cloud-init will make certain metadata calls per that class, and if launched in Azure, it will make others.    what exactly do you need to do?
<rharper> https://cloudinit.readthedocs.io/en/latest/topics/modules.html#ssh  https://cloudinit.readthedocs.io/en/latest/topics/modules.html#ssh-import-id are two modules which can import ssh keys
<danielmarquard> rharper, thanks for the docs. ultimately, we need to ensure that our custom runcmd stanza executes before userdata
<rharper> you likely want bootcmd
<rharper> that runs early
<danielmarquard> perfect, that sounds about right. thanks for the guidance :)
<rharper> danielmarquard: it's worth looking at /etc/cloud/cloud.cfg to see the config_*_module lists,   that shows the order in which modules are run;  bootcmd runs early, but it also runs before the user is created; so that may present a problem depending on your script.
<danielmarquard> we're expanding all partitions in the root volume on a per-instance basis. will keep that module prioritization in mind. thanks!
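The ordering rharper describes can be expressed in one cloud-config: bootcmd entries run early on every boot (before users exist), runcmd runs late and once per instance, after user data modules. Device and commands below are illustrative only:

```yaml
#cloud-config
bootcmd:
  # early, every boot; cloud-init-per limits this entry to a single run
  - [ cloud-init-per, once, grow-data, growpart, /dev/xvdb, 1 ]
runcmd:
  # late, per-instance, after users-groups and most config modules
  - [ sh, -c, 'echo "late boot task" >> /var/log/late-boot.log' ]
```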
#cloud-init 2019-08-13
<nils_> am I correct in assuming that users are created before disk-setup is run? If I needed to put the home directories for users on a different device then I should have the device mounted and formatted in bootcmd?
<felipe_1982_> I am running boothook scripts in AWS, and the "echo" is not getting logged in /var/log/cloud-init-output.log as the documentation would suggest. Running Amazon Linux 2.
<Odd_Bloke> rharper: Hmm, I was going to suggest cc_growpart/cc_resizefs for the partition expansion question from yesterday, but it looks like cc_resizefs _only_ resizes the root FS.
<lfain> What is the recommended way to access vendor data in a pre-installed cloud-init script? Right now, I get it by calling http://169.254.169.254/openstack/2018-08-27/meta_data.json but I'm looking for a generic method that will work with any datasource type.
<Odd_Bloke> lfain: o/ What do you mean by a "pre-installed cloud-init script"?  You may want to look at `cloud-init query`.  (Run `cloud-init query -a` to see all the instance-data.)
<rharper> Odd_Bloke: yeah,  it's always been root volume only;
<rharper> nils_: look at /etc/cloud/cloud.cfg , 'cloud_init_modules' shows the order in which they run;  users-groups is next to last, and runs after disk_setup and mounts; which you'll need both to create the disks/partitions to hold the user dirs and then mount entries to mount before the user is created.
<nils_> rharper, it may be the ancient version bundled with CentOS, in my case users-groups is last in cloud_init_modules, mounts is first in cloud_config_modules
<Odd_Bloke> rharper: Right, but growpart supports non-root volumes, so the discrepancy caught me off-guard.
<Odd_Bloke> I guess at _first_ boot, you generally won't have filesystems on anything but your root volume.
<rharper> nils_: that's ... odd
<Odd_Bloke> tribaal: https://bugs.launchpad.net/cloud-init/+bug/1839854 <-- as a CloudStack(-ish) operator, I'd love your thoughts on this bug
<ubot5> Ubuntu bug 1839854 in cloud-init "CloudStack provider cannot determine correct metadata IP with multiple network interfaces" [Undecided,New]
<rharper> Odd_Bloke: it's possible that secondary disks could be partial, but unlikely; the resize was mostly about using cloud images overtop a later device
<lfain> I made a custom image where I put a script in the /var/lib/cloud/scripts/per-instance/ folder. This script uses parameters that I provide via the meta data.
<rharper> lfain: you may want your script to read /run/cloud-init/instance-data.json then; that has the instance metadata available
<lfain> rharper: I forgot to write that the image is based on CentOS7 with cloud-init v.18.2 installed. Therefore, I can not run "cloud-init query"   and the /run/cloud-init/instance-data.json also is not available.
<rharper> hrm
<lfain> Is there a way to install latest cloud-init on CentOS7? I didn't find RPMs for it.
<rharper> https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/   we do have daily rpm builds;
<rharper> https://copr.fedorainfracloud.org/coprs/g/cloud-init/el-testing/packages/
<rharper> has 19.2
<lfain> Thank you, rharper. I'll try 19.2. If I run into a problem, then I will return back to my current method (that I don't like...)
<rharper> lfain: cool!
<AnhVoMSFT> Hi folks, I have a CI build failed for this MP https://code.launchpad.net/~vtqanh/cloud-init/+git/cloud-init/+merge/369785 but the link does not work for me so I can't see what went wrong
<rharper> AnhVoMSFT: let me see and I'll let you know
<AnhVoMSFT> thanks rharper
<rharper> AnhVoMSFT: looks like a harness issues, I'll rerun the job
<AnhVoMSFT> thanks
<AnhVoMSFT> rharper I got disconnected from IRC but it looked like CI build failed again
<rharper> AnhVoMSFT: yes, tracking issue down on our side
<rharper> I've poked at jenkins; so this next run should work; but you won't need to do anything; we'll make sure it runs to completion , and if there's an issue in the branch, I'll let you know
<Odd_Bloke> rharper: blackboxsw: What is the relation between /v/l/cloud/instance/user-data.txt and user-data.txt.i?
<AnhVoMSFT> thanks rharper
<rharper> Odd_Bloke: I'd have to look, I thought it was something to do with mime wrappers
<rharper> ah, userdata_raw => user-data.txt ,   "processed user-data" => user-data.txt.i
<rharper> Odd_Bloke: see cloudinit/stages.py:def _store_userdata()
<AnhVoMSFT> looks like it passed this time
<rharper> AnhVoMSFT: yeah, I'll review
<Odd_Bloke> rharper: Did you get to the bottom of that mkdir cloud_tests error?
<rharper> Odd_Bloke: yes; lxd snap on the host needed someone to sudo mkdir a path that we use
<rharper> I suspect the upgrading/updating may have removed it from the common dir
<Odd_Bloke> Aha, OK, thanks!
<Odd_Bloke> Glad I asked, else I was going to spend 3 hours on triage today. :p
<powersj> O.o
<blackboxsw> rharper: Odd_Bloke care if I publish cloud-init to eoan again (now that Azure v2 networking is landed
<blackboxsw> or shall we wait on OCI
<blackboxsw> Odd_Bloke: LGTM https://code.launchpad.net/~daniel-thewatkins/cloud-init/+git/cloud-init/+merge/371053 minor comments/questions for clarification but nothing blocking
<Odd_Bloke> blackboxsw: Good points, I'll fix those up first thing tomorrow.
#cloud-init 2019-08-14
<Odd_Bloke> blackboxsw: rharper: I've addressed the outstanding comments on https://code.launchpad.net/~daniel-thewatkins/cloud-init/+git/cloud-init/+merge/371053; I'll land it after lunch if I don't hear any complaints.
<blackboxsw> Odd_Bloke: my back hurts.... does that count?
<rharper> Odd_Bloke: sounds good
 * blackboxsw needs to get some coffee  before I continue with my standup routine
<Odd_Bloke> blackboxsw: (N.B. I will not be listening to anyone.)
<blackboxsw> Odd_Bloke: one CI issue pycodestyle, then you are good https://code.launchpad.net/~daniel-thewatkins/cloud-init/+git/cloud-init/+merge/371053
<lfain>  Just installed cloud-init v.19.2 on CentOS7. I see that the kernel cmd line parameter "cloud-init=disabled" is ignored.
<lfain> is it a known issue?
<blackboxsw> lfain: from docs, looks like that should still work, I'd be curious about the log files collected by `sudo cloud-init collect-logs` along with `cat /proc/1/cmdline`. If possible please file a bug with that content https://bugs.launchpad.net/cloud-init/+filebug
<blackboxsw> instead of cat /proc/1/cmdline......  python3 -c 'from cloudinit import util; print(util.get_cmdline())'
<blackboxsw> looks like I lost lfain :/
<Odd_Bloke> Not worth digging further until we have a bug report, but it's possible that the generator might not be installed correctly on CentOS.
<Odd_Bloke> Just a thought to bear in mind if it comes up again.
<blackboxsw> yep agreed
<Odd_Bloke> blackboxsw: https://code.launchpad.net/~daniel-thewatkins/cloud-init/+git/cloud-init/+merge/371203 now has the FreeBSD changes.
<blackboxsw> grabbing
<blackboxsw> approved
<blackboxsw> rharper: Odd_Bloke I think we are good on https://code.launchpad.net/~vtqanh/cloud-init/+git/cloud-init/+merge/369785
<rharper> blackboxsw: +1
<blackboxsw> approving that puppy
<felipe_1982_> My boothook script is calling "echo" but the text does not appear in /var/log/cloud-init-output.log as expected. Amazon linux 2. Where do the messages from a boothook go? I need to log and debug. Thanks.
#cloud-init 2019-08-15
<felipe1982> where is the output from "echo" in a "boothook" script sent?
<felipe1982> not seen in /var/log/cloud-init-output.log
<otubo> I have a small question about ds-identify: line 1447 checks $DI_DSNAME, which is the data source specified in the configuration file. If it's specified, that's enough reason to end the script and return FOUND to the systemd generator. Am I correct?
<otubo> I mean, it doesn't really check if the data source specified in the configuration file really exists.
<otubo> I see lots of commits from smoser, so I'll just ping :-)
<smoser> otubo: yes, that is what it does
<smoser> that would be specified as 'datasource=' in the ds-identify config.
<smoser> which is not documented, somewhat by design.
<smoser> https://bugs.launchpad.net/cloud-init/+bug/1838092
<ubot5> Ubuntu bug 1838092 in cloud-init "Add documentation for ds-identify configuration" [Wishlist,Triaged]
<smoser> and /me goes afk for a bit.
<otubo> smoser, shouldn't it really check if the data source exists? Let me explain my issue: I'm deploying a VM with vSphere with network customization (aka data source). First boot ok, network is configured. Second boot, ds-identify fails to check that the data source is gone (because of this check I mentioned) and cloud-init is still enabled. With cloud-init enabled and no data source, it falls back to default configuration (dhcp in my case)
<rharper> otubo: generally if the command line (or config on disk) is telling cloud-init which datasource to use, then there's nothing to check.  If you do not hardcode the datasource in cmdline or in config, then ds-identify will do the checks to find the datasource
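Pinning the datasource as rharper describes is a one-line drop-in config; with it present, ds-identify reports FOUND without probing. Filename below is illustrative:

```yaml
# /etc/cloud/cloud.cfg.d/99-datasource.cfg  (filename illustrative)
datasource_list: [ OVF, None ]
```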
<lfain> The cloud-init.service  that is provided by cloud-init-19.2+10.g7f674256-1.el7.noarch.rpm for Centos7 doesn't contain "ConditionKernelCommandLine=!cloud-init=disabled"
<lfain> Is it a bug?
<Odd_Bloke> lfain: I believe the cloud-init generator reads the kernel cmdline and disables all the cloud-init services if that line exists.
<Odd_Bloke> On an Ubuntu system, that generator is installed to /lib/systemd/system-generators/cloud-init-generator.
<lfain> Unfortunately, cloud-init 19.2 ignores presence of "cloud-init=disabled" in the kernel command line in Centos7...
<Odd_Bloke> lfain: Do you have that generator installed, do you know?
<Odd_Bloke> (If it's not exactly in that path, try `find / -name cloud-init-generator`.)
<lfain> Yes, it is installed. I just checked.
<lfain> "/lib/systemd/system-generators/cloud-init-generator"
<Odd_Bloke> lfain: OK, this is probably worth a bug report.  Could you file one using https://bugs.launchpad.net/cloud-init/+filebug and make sure you attach the output of `cloud-init collect-logs` on a failing instance, please?
<lfain> Ok, I will
<otubo> rharper, doesn't that increase the boot time? On the second boot of my use case I get 1 min and 34 seconds of boot time just for cloud-init-local.service. Also, even if I don't set the data source in ds-identify.cfg, ds-identify recognizes OVF (vmware) but resets the network configuration to dhcp
<lfain> Odd_Bloke: Is it safe to add the "ConditionKernelCommandLine=!cloud-init=disabled" meantime?
<rharper> otubo: ds-identify runs fast, it's reading data from /proc and /sys;  it does invoke blkid once, but the overhead isn't high at all.   you can run cloud-init analyze show to see where time was spent.  depending on the datasource, local may attempt to extract metadata from a network source, which means cloud-init will do a dhcp and then crawl metadata for network config.
<rharper> 1m30 seconds *sounds* like a network timeout though
<rharper> logs will tell the story
<rharper> otubo: if someone specified a datasource that wasn't there but is network-based, then it's going to try to read a url without networking ... or wait for networkd-wait-online.service to complete
<Odd_Bloke> lfain: You'd need to do it for every cloud-init unit.  But, to be clear, I don't know for sure that it will do what you want.
<otubo> rharper, I'm a little bit puzzled from this command, look at the output: https://pastebin.com/hHtSapG4
<otubo> rharper, on line 18 it says it couldn't find OVF data source, even if ds-identify recognizes it.
<otubo> rharper, and on line 38 and 39 it spends a lot of time looking for network based configuration from EC2 and OpenStack (as you said above)
<rharper> otubo: it does not look like ds-identify found an OVF, it looks completely disabled since cloud-init is searching all datasources
<rharper> we recently fixed a bug with the generator on RHEL systems where ds-identify didn't have the correct path
<rharper> do you have a /run/cloud-init/ds-identify.log ?
<otubo> rharper, ok, two important things to mention here: 1) I'm using version 18.5 (but we're open to upgrade if necessary); 2) I can see from ds-identify.log and from the function dscheck_OVF() that it returns FOUND if disable_vmware_customization is set to false (which is in this case)
<rharper> otubo: on (2) do you see a /run/cloud-init/cloud.cfg  ?
<rharper> ds-identify should write out a simple config of datasource_list: [<Found ds name>, None]
<rharper> then cloud-init will only search that list of datasources rather than the complete list;
<otubo> rharper, yes datasource_list: [ OVF, None ]
<rharper> and if you've a cloud-init.log  then let's take a look
<otubo> rharper, sure, hang on
<otubo> rharper, You have exceeded the maximum paste size of 512 kilobytes per paste. hahahaha do you really need it?
<rharper> that's  a lot of reboots =)
<rharper> just the first boot or two
<otubo> rharper, as I understand it, ds-identify finds OVF because it is set in cloud.cfg (disable_vmware_customization to false), and that then leads to the data source NOT being found, hence resetting to dhcp. Now the reason why cloud-init tries all other data sources I don't know.
<rharper> what a mess;  ds-identify can find OVF without it being set in config
<rharper> disable_vmware_customization doesn't have anything to do with which datasource is selected, so that wouldn't make cloud-init search all datasources again
<rharper> something else is going on; so the first boot or two logs
<otubo> rharper, ok, hang on
<otubo> rharper, this is the last reboot https://pastebin.com/80atbhBZ
<otubo> rharper, disclaimer: there's some curse words in Portuguese as I was doing some debugs.
<rharper> oh man, so .. OVF has a really rotten bug right now w.r.t instance-id, https://bugs.launchpad.net/cloud-init/+bug/1835205
<ubot5> Ubuntu bug 1835205 in cloud-init "OVF datasource should check if instant id is still on VMware Platform" [Medium,Triaged]
<rharper> 2019-08-15 13:06:35,076 - util.py[DEBUG]: waiting for configuration file took 90.096 seconds
<rharper> that's your long boot time
<otubo> rharper, yep, just saw it too.
<rharper> the OVF datasource _resets_ its instance-id on each boot so it could apply customizations after a first boot;  I'm not sure how we let that slip into the OVF datasource but it's just wrong from a cloud-init perspective
<otubo> rharper, and I just realized that `cloud-init analyze show' also shows other boots not only the current one. So hopefully ds-identify is doing its job correctly. So the only problem now is fixing this bug you just mentioned.
<rharper> at least that bug needs a fix;  there could be more issues; but it's hard to say without sorting out what to do.
<otubo> rharper, I'm gonna try to contact Pengpeng Sun, perhaps we could work together.
<rharper> otubo: if you have any vmware contacts, it would be great to reach out to them on this issue;  if we don't have anything before the summit, I'd really like to get together then and walk through the vmware scenarios to see how we can support as many and without this instance-id trick
<otubo> rharper, sounds good. BTW, I'll be at the Summit, just got my flight tickets.
<rharper> o/
 * rharper raises other hand
<rharper> \o/
<otubo> rharper, I think I've been working with you for *at least* 8 years (since IBM..) and I have no idea how's your face :-D
<rharper> hehe
<rharper> I think I put something up on linkedin a while back
<otubo> rharper, anyway, I'll work on this issue and will keep launchpad updated. Thanks for the help :-)
<rharper> otubo: great!
<rharper> if you have an existing RHBZ, feel free to link to that from launchpad as well, via the  Also affects distribution/package  link, you can specify distro and bug url
<otubo> will do!
<Odd_Bloke> rharper: So I'm looking at the dracut network configuration, and I was assuming this would be relatively easy as this is sysconfig.  However, it looks like we don't have support for parsing sysconfig network configuration, only for rendering it.  Am I missing something, or is that the case?
<blackboxsw> rharper: do you have access to an OpenStack instance w/ multi-nic and/or bonds/bridges where we can dump curl http://169.254.169.254/openstack/latest/network_data.json   I was looking to avoid setting that up myself if you had a reference (this would be for my testing of OpenStack networking v2)
<blackboxsw> s/a reference/an example/  ...  I'm working on handling the unit test example coverage for v2, but wondered if you had any more complex examples in mind too
<blackboxsw> Odd_Bloke: objections to my uploading to eoan? we have a couple of fixes, azure & oracle that could use some 'exposure' before feature freeze (and the rest of the oracle work is likely going to be a day or two)
<Odd_Bloke> blackboxsw: Nope, except to observe we'll probably want to do an upload right before FF anyway.
<blackboxsw> right +1. no concerns with more uploads :P
<blackboxsw> ok kicking one off
<blackboxsw> ok done [ubuntu/eoan-proposed] cloud-init 19.2-13-g2f3bb764-0ubuntu1 (Accepted)
<rharper> blackboxsw: I was talking with jamespage about that; and it's not very common; I think maybe ironic backend;  the closest to that we have would be something on rax baremetal; they deploy with bonds and vlans;
<rharper> I have some older network_data.json we used in the original parsing, but the upstream "format" should still show all of the details
<rharper> https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/metadata-service-network-info.html and https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/multiple-fixed-ips-network-information.html (proposed but not  yet landed I think)
<rharper> Odd_Bloke: that's correct (we don't have a read-from-sysconfig -> network_state) path;  there are sysconfig reader helpers in tree used for updating and rendering sysconfig from distro classes;
<blackboxsw> thanks for the reference rharper
<Odd_Bloke> rharper: OK, thanks, I'll look around for those.
<rharper>  cloudinit/distros/parsers/sys_conf.py
<rharper> and cloudinit/distros/rhel_util.py
<Odd_Bloke> Thanks!
<blackboxsw> minor fix for ubuntu-drivers for review https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/371369
<Odd_Bloke> blackboxsw: Reviewed.
<blackboxsw> Odd_Bloke: just saw. wrapping up comments now thanks
<blackboxsw> Odd_Bloke: did you mention no sysconfig parsers in cloud-init? cloudinit/distros/parsers/sys_conf.py
<blackboxsw> not sure if you already ran into that case
<blackboxsw> rharper: ^?
<rharper> blackboxsw: I passed it along
<Odd_Bloke> blackboxsw: Yeah, rharper pointed me at it. :)
<Odd_Bloke> Thanks!
<blackboxsw> sorry missed that response
<rharper> np
<blackboxsw> just ran into it while looking at all our uses of ConfigObj in existing cloudinit
<rharper> blackboxsw: I hope you saw my comment w.r.t the drivers branch
<blackboxsw> hadn't yet, refreshing rharper
<rharper> IIUC, the /etc/default/* stuff is shell sourceable, so you can't use spaces in variable assignment
<blackboxsw> ahh
<blackboxsw> will use load_shell_content instead
<blackboxsw> thx
<blackboxsw> hrm but as Odd_Bloke mentioned, once we load_shell_content and attempt re-writing the file for amendments, we'll lose pre-existing comments if added by someone outside of cloud-init
<blackboxsw> not sure if we care or not, but if we are re-writing the file, then I guess we could add the # written by cloud-init header
<blackboxsw> and we can cope with bugs and improve it if need be
<rharper> we can load the content to see what's there and then decide how to update/modify
<rharper> we could render via append with #header\n<content>\n#footer
<blackboxsw> and I see we use         return shlex.split(blob, comments=True)
<blackboxsw> so I think we do have the ability to retain comments
<blackboxsw> but maybe that's just on the specific line which contains a value
<blackboxsw> will test
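The `shlex.split(blob, comments=True)` behaviour under discussion is easy to check; a minimal sketch (the variable contents are illustrative, not the real /etc/default file):

```python
import shlex

# A shell-sourceable /etc/default-style fragment with a header comment
# and an inline comment (contents are illustrative):
blob = "# written by an admin\nNVIDIA_DRIVER_VERSION=450  # pinned\n"

# comments=True makes shlex drop everything from '#' to end of line,
# so neither the header nor the trailing comment survives a round trip
tokens = shlex.split(blob, comments=True)
print(tokens)  # ['NVIDIA_DRIVER_VERSION=450']
```

Which matches the concern above: comments can be skipped on read, but they are not preserved if the parsed content is used to rewrite the file.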
<Odd_Bloke> My original comment doesn't stand as it's shell content.
<Odd_Bloke> So we may just be able to stick with what you had?
<blackboxsw> Odd_Bloke: rharper I think we do need to stick with what I had as load_shell_content actually strips comments, and trying to preserve comments across the shlex.split use is more work than just parsing line by line I feel.
<rharper> blackboxsw: agreed;  I really don't expect that file to have a lot of manual changes
<blackboxsw> rharper: Odd_Bloke force pushed, comments addressed I think.
 * blackboxsw force-pushed set of fixes to https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/371369
<blackboxsw> woot! thx Odd_Bloke rharper on https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/370970
 * blackboxsw celebrates with a victory lap for the trello card
<rharper> nice
<Odd_Bloke> \o/
<felipe1982> where is the output from "echo" in a "boothook" script sent? not seen in /var/log/cloud-init-output.log
<blackboxsw> felipe1982: it could be an error in the script preventing you from getting to execution. try running /var/lib/cloud/instance/boothooks/part-001 to see if you get an error or your echo
#cloud-init 2019-08-16
<smoser> felipe1982: if you have an older version of cloud-init, you may not  have the cloud-init-output.log enabled. and it will just go to wherever the init system output goes (probably /dev/console)
<smoser> but, yeah, you probably can run that as blackboxsw pointed out
<felipe1982> amazon linux 2 is quite new. I would have assumed it was latest [enough]
<felipe1982> what version would I need for *boothook* to write to /var/log/cloud-init-output.log
<Odd_Bloke> Just realised that I did all of my testing for the Oracle secondary VNICs branch with a local change to prefer data source network config over initramfs config.
<Odd_Bloke> So a follow up MP will be coming Real Soon Now.
<Odd_Bloke> rharper: blackboxsw: https://code.launchpad.net/~daniel-thewatkins/cloud-init/+git/cloud-init/+merge/371403 <-- actually enable Oracle secondary VNICs (instead of just generating config which would enable them and throwing it away)
<smoser> :q
<smoser> oops
<Odd_Bloke> rharper: On https://bugs.launchpad.net/cloud-init/+bug/1799301, I think the problem is that it _does_ use sysconfig and we don't detect that we should be using sysconfig, right?  Or are you wondering if the Fedora comment is actually a distinct bug?
<ubot5> Ubuntu bug 1799301 in cloud-init "SUSE sysconfig renderer enablement incomplete" [Medium,Incomplete]
<rharper> Odd_Bloke: yes; but it seems to be related to where the additional files we check are located
<rharper>    'etc/sysconfig/network-scripts/network-functions',
<rharper>         'etc/sysconfig/network-scripts/ifdown-eth'
<rharper> we look for those; I wondered why Fedora wouldn't have those if they're still using sysconfig
<rharper> w.r.t OpenSuSE; they have different paths
<rharper> at least in some of the releases; hence the bug
<blackboxsw> per my ubuntu-drivers branch I think there is an option to provide debconf-set-selections instead of cloud-init knowing to write out /etc/default/linux-modules-nvidia. I'm exploring that option now since we already do debconf-set-selections for byobu, ca-certs and grub
<rharper> blackboxsw: ok;  writing out the line in the conf is going to be faster/cheaper than invoking debconf ;  if the config value or what needs emitting is expected to change ; that does abstract cloud-init from those sorts of changes;
<blackboxsw> rharper: Odd_Bloke smoser: for those interested, the 2nd commit on my branch updates cloud-init to just use debconf-set-selections, making it also a much smaller branch. (assuming that is the right selection value. I like abstracting cloud-init from the config implementation details if possible, in case they change. Even if it is a little slower for the corner-case deployment where cloud-init user is on gpgpu.
<blackboxsw> https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/371369
<blackboxsw> so if we like the 2nd commit more than the first. I can rebase and squash of we get confirmation that debconf-set-selection is valid
<blackboxsw> *if we get confirmation*
#cloud-init 2019-08-17
<ahmedbilalkhalid> How does cloud-init know that it has already run once?
#cloud-init 2020-08-10
<Odd_Bloke> Idle thought: what if we emitted a MOTD message if cloud-init doesn't run successfully?
<smoser> Odd_Bloke: there is infrastructure for that ... sort of.
<smoser> the warnings stuff.
<Odd_Bloke> Ooh, nice, I hadn't seen that before.
<smoser>  i swear that that somehow showed motd
<smoser> but i dont see it now.
<smoser> ah. tools/Z99-cloudinit-warnings.sh
<Odd_Bloke> falcojr: https://github.com/canonical/cloud-init/pull/493 has landed. \o/
<falcojr> sweet! I'll get on the next part right away
<falcojr> Odd_Bloke: FYI, https://github.com/canonical/cloud-init/pull/528
#cloud-init 2020-08-11
<meena> looks like we don't even need `dmidecode` on FreeBSD… https://github.com/puppetlabs/facter/pull/2021/files#diff-91b2b45b4b35ec20430b96e0d0d8dcf8 ⬆️ we can just use `kenv`
<Odd_Bloke> falcojr: Just finished reviewing https://github.com/canonical/cloud-init/pull/528, FYI.
<falcojr> thanks
#cloud-init 2020-08-12
<amansi26> We have some modules specific to a particular platform; what is the procedure for contributing a new module to the cloud-init community? Is there any restriction on doing so? I came across this document https://cloudinit.readthedocs.io/en/latest/topics/hacking.html . Wanted to confirm: is anything needed to contribute apart from that?
<amansi26> can anyone guide me on this?
<smoser> amansi26: there is not really anything else.
<smoser> you're welcome to ask questions here. many/all of developers are in this channel.
<smoser> you could open a bug and describe what you're after, or just make a pull request.
<amansi26> In OpenStack, a feature contribution happens via a blueprint/spec that first needs to be approved before the code can be approved. So I just want to confirm: does cloud-init need anything similar?
<amansi26> smoser:Does community have any restrictions accepting modules that are specific to a particular platform / architecture?
<smoser> no blueprint like thing.  if its a more complex idea that you'd like discussion on, then mailing list is probably the best path forward.
<smoser> Odd_Bloke, blackboxsw_ ^ ?
<smoser> no restrictions on platform/arch... it just has to do the right thing
<amansi26> smoser: Whats the mailing list to contact?
<Odd_Bloke> amansi26: https://cloudinit.readthedocs.io/en/latest/topics/code_review.html#asking-for-help <-- cloud-init@lists.launchpad.net
<Odd_Bloke> mruffell: I'm picking https://github.com/canonical/cloud-init/pull/514/ back up again, aiming to have it landed by the end of this week; if you have any further comments/responses, now is your chance. :)
<Odd_Bloke> falcojr: Is your Oracle PR ready for re-review?
<falcojr> Odd_Bloke yes, sorry...forgot to re-request
<Odd_Bloke> No worries!
<Odd_Bloke> falcojr: The tests are failing, I've commented (https://github.com/canonical/cloud-init/pull/528/files#r469501088) on the part of the code that's causing it (spoiler alert: it's because xenial is old).  (I think the changes will be isolated there, though, so I'm not blocked on continuing my review.)
<falcojr> gah, httpretty strikes again!
<falcojr> thanks
<falcojr> actually, Odd_Bloke that's an old build
<falcojr> I fixed that one already
<falcojr> the new failure is on bionic and something about packaging???
<falcojr> I restarted the job because I didn't see an obvious reason, but still investigating
<falcojr> gah, nevermind...the ordering of things in travis confused me
<falcojr> ignore me :D
<Odd_Bloke> ^_^
<mruffell> roger that Odd_Bloke, I am happy to help with testing
<Odd_Bloke> mruffell: OK, that's a good response.  (No further comments/responses allowed. ;)
<Odd_Bloke> falcojr: I've re-reviewed #528.
<falcojr> cool, I should get to it tomorrow morning
#cloud-init 2020-08-13
<AnhVoMSFT> @Odd_Bloke @blackboxsw_ @rharper any rough ETA on when the next SRU of cloud-init will be?
<Odd_Bloke> AnhVoMSFT: Within the next couple of weeks: once https://github.com/canonical/cloud-init/pull/528 is landed, and once https://github.com/canonical/cloud-init/pull/514 has been approved (that's a packaging change, so it won't "land" per se), we'll be starting to look at it.
<Odd_Bloke> FYI for fokls who aren't already aware: I will be on vacation for a couple of weeks starting at the end of this week.
<Odd_Bloke> ("fokls" <-- I am evidently in need of a vacation ;)
<AnhVoMSFT> Thanks @Odd_Bloke
<rharper> eandersson: hey, I saw your response on the openstack timeout issue;
<rharper> eandersson: I was hoping to see a cloud-init log (or if you know) whether the timeout is seen during cloud-init-local.service where it's bringing up the ephemeral dhcp client, or if it's seen much later ...  and if you're using latest cloud-init on ubuntu or something else?  I'd really like to rule out the classless static route issue
<eandersson> rharper http://paste.openstack.org/show/Udy604fiNCxNqP9pTV7D/
<eandersson> So I have seen a couple examples of this and we can reproduce it pretty easily.
<meena> rharper: what is, "the classless static route issue"?
<eandersson> This is on CentOS. I haven't tried it on Ubuntu.
<eandersson> But the underlying issue is basically either network or something not quite fast enough on the OpenStack side.
<eandersson> When creating a ton of VMs (e.g. 200+ VMs at the same time)
<eandersson> When I tested by allowing it to retry at least a few times it would never fail.
<rharper> https://bugs.launchpad.net/cloud-init/+bug/1821102
<ubot5> Ubuntu bug 1821102 in cloud-init "Cloud-init should not setup ephemeral ipv4 if apply_network_config is False for OpenStack" [High,Fix released]
<rharper> meena: ^
<rharper> it's since been fixed
<rharper> eandersson: thanks, lemme look
<rharper> scale testing sounds like dhcp response times are slow; so it's likely unrelated to the bug I mentioned;
<eandersson> My motivation is really that if something like the EC2 (Amazon) data source has a retry, it is very reasonable that the OpenStack should be allowed to retry at least a couple of times.
<eandersson> But it's easy to just override the default cloud-init pages so it's not a hill I would die on. :D
<eandersson> *settings
<rharper> right, i'm still trying to see if we're failing to bring up networking correctly, or if it's the IMDS itself that's failing;  the snippet makes it hard to see the whole context
<Odd_Bloke> A big difference (until recently, of course) is that EC2 only had Intel hardware, so we would always be able to determine that we were on EC2 reliably (and maybe we still can in their ARM, I don't recall).
<eandersson> Yea - I unfortunately don't have the full logs at the moment.
<Odd_Bloke> So EC2 can configure a timeout because that timeout should only affect EC2 instances.
<eandersson> But this issue is for sure that either the network (e.g. vxlan) or dhcp / network namespace / security group isn't ready.
<rharper> eandersson: it does look like this is during early boot and the ephemeral dhcp; I see the ip commands which tear down what we brought up
<rharper> eandersson: if possible, it would be great to see an Ubuntu log where I know we've got the classless static routes issue fixed;
<eandersson> How would that bug be presented?
<eandersson> Reading
<rharper> so,  looking at the logs for the bug I posted
<rharper> they look almost exactly like yours
<rharper> http://paste.openstack.org/show/796830/
<eandersson> It's for sure not the same issue because we don't have an additional route for 169
<eandersson> and the metadata proxy is local on the hypervisor
<rharper> ok; if it's not the bug and it's a slow metadata service; and 50 seconds isn't enough (10 second timeout * 5 retries); we're left with bumping a default for openstack which makes the remaining network datasources wait longer;
<eandersson> Yea - I mean an alternative would just to be add some documentation tbh.
<rharper> Odd_Bloke: one area we didn't continue the discussion on was whether we should sort the datasource_list by total wait time (timeout * retries)
<rharper> configdrive is still an option for openstack;
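The sorting idea rharper floats could look roughly like this (a toy sketch; the timeout/retry numbers are made up, not the real per-datasource defaults):

```python
# Hypothetical (timeout_seconds, retries) per datasource; ordering by
# worst-case total wait puts the sources that can block longest at the
# end of the search list, so a miss on a fast source costs little.
candidates = {
    "OpenStack": (10, 1),   # 10s worst case
    "Ec2": (50, 5),         # 250s worst case
    "MAAS": (120, 3),       # 360s worst case
}

datasource_list = sorted(
    candidates, key=lambda ds: candidates[ds][0] * candidates[ds][1])
print(datasource_list)  # ['OpenStack', 'Ec2', 'MAAS']
```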
<eandersson> I have never seen it fail more than once btw.
<eandersson> I probably could make it fail more, but would need to create a lot of VMs :D
<rharper> well, it already retries 5 times; with 10 second timeouts before retry
<eandersson> It does not
<rharper> well, then I want to see that log
<rharper> what you posted showed exactly one; the default for openstack is 10 second timeout and 5 retries
<eandersson> > Giving up on OpenStack md from ['http://169.254.169.254/openstack'] after 0 seconds
<eandersson> The TCP timeout is indeed 10s
<eandersson> Which is why it "waits for 10s"
<eandersson> That code path does not adhere to the retries
<rharper> eandersson: I don't understand; https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/__init__.py#L172  this is the default config for all datasources, unless the subclass overrides;
<rharper> DataSourceOpenStack.py does not have an override (except in your PR);
<rharper> so, I see no reason why OS would *not* retry the expected 5 times;   unless we're not getting 404; and there's some url_helper path that's skipping the retry ...
<rharper> hence needing the logs
<eandersson> It's because the code path uses wait_for_url
<eandersson> https://github.com/canonical/cloud-init/blob/411dbbebd328163bcb1c676cc711f3e5ed805375/cloudinit/sources/DataSourceOpenStack.py#L82
<eandersson> It does not even pass on the retries
<eandersson> It basically only uses the max_wait_seconds to determine how long it will retry for
<eandersson> In this case it's -1 (no retries) so it never retries
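A stripped-down model of the semantics eandersson describes (an illustration of the max_wait behaviour only, not cloud-init's actual `wait_for_url` code):

```python
import time

def wait_for_url_model(fetch, max_wait=-1, sleep_between=1):
    """Keep calling fetch() until it succeeds or max_wait seconds have
    elapsed; max_wait <= 0 means exactly one attempt (no retries)."""
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        if fetch():
            return True, attempts
        # with max_wait <= 0 this check fires immediately, so the loop
        # exits after a single attempt
        if max_wait <= 0 or time.monotonic() - start >= max_wait:
            return False, attempts
        time.sleep(sleep_between)

# A fetch that always fails: with max_wait=-1 there is one attempt,
# even though each attempt may itself block on a 10s TCP timeout,
# which is why it can *look* like it "waits for 10s".
print(wait_for_url_model(lambda: False, max_wait=-1))  # (False, 1)
```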
<johnsonshi> Odd_Bloke: Thanks for always reviewing my PRs, Odd_Bloke! I believe my PR (https://github.com/canonical/cloud-init/pull/468/) is finally ready.
<eandersson> It does however use the tcp timeout. So it may look like it tries for "10 s"
<blackboxsw_> Odd_Bloke: you mentioned a couple of prs that needed review, I just +1'd https://github.com/canonical/cloud-init/pull/531 were there 2 others that needed eyes?
<eandersson> https://github.com/canonical/cloud-init/blob/master/cloudinit/url_helper.py#L344 wait_for_url is defined here
<rharper> yeah; I see the try once with -1 ; which is the default;
<eandersson> Yea - that code path is just so unforgiving
<rharper> so, CloudStack, Ec2, Exoscale, MAAS, and Openstack (ignoring ovf since it's local ds mostly); all have max_wait set to something other than default;
<eandersson> And there are so many things that happens at the same time when spawning a new VM.
<Odd_Bloke> blackboxsw_: Ryan beat you to them. :)
<Odd_Bloke> (Thanks for the review!)
<blackboxsw_> that's the spirit !
<rharper> only OpenStack and MAAS, I think support multi-arch
<Odd_Bloke> Obviously more thanks to rharper for doing multiple reviews. ^_^
<rharper> Odd_Bloke: sure
<rharper> man, Xenial is the killer here (and maybe other OS which use cloud-init but don't use ds-identify)
<Odd_Bloke> johnsonshi: My pleasure!  I'm taking another look through the code to confirm that I'm happy with it, but note that I would like you to revert your removal of the type annotations before it lands.
<rharper> it will walk each DS at network time; so putting a large max_wait means pain for anything coming after it
<AnhVoMSFT> @Odd_Bloke - regarding typehints, what are your recommendations regarding other distros like RHEL/CentOS/SUSE, who have many enterprise customers that will continue using them for at least another 3-5 years? There are features that we should be delivering them irrespective of the python version. Do you recommend we send another PR to stable-19.4 with the exact change except dropping the typehints
<AnhVoMSFT> in that PR?
<johnsonshi> Odd_Bloke: Thanks for the note about the type hints. Any recommendations for the pre-existing RHEL and CentOS image maintainers out there? Those images will still be supported for a few more years.
<AnhVoMSFT> oops, didn't realize johnsonshi was on IRC just now and asking same question :-)
<Odd_Bloke> I don't have any particular recommendations, no: it really depends on how they handle backporting cloud-init changes (and, for that matter, how their distro/company handles backporting Python 3-only projects to their Python 2-only releases more generally).  Or, expressed another way: I would expect them to know better than any recommendation I could give. :)
<Odd_Bloke> I think using `stable-19.4` would be appropriate, but note that the type annotations are not the only way in which cloud-init code cannot run on Python 2 any longer, so just removing them may not be sufficient.
<Odd_Bloke> Of course, it's only worth using `stable-19.4` if the maintainers of the existing images will use it, so you'd need to coordinate that with them.
<AnhVoMSFT> is typehints going to be a requirement for all new methods going into cloud-init?
<rharper> Odd_Bloke: so I think the best test of the cost of changing openstack max_wait to something other than -1 would be ec2 xenial boots; which at this time still are not ds-identify strict; so it will walk all of the ds in the list;  we could look at the current boot time now;  make the change to openstack per eandersson PR and retest  boot to see the impact
<Odd_Bloke> AnhVoMSFT: Not a requirement, but I would expect they will become used more generally as people adapt to being able to use them.
<Odd_Bloke> And I would like to discuss the timeline for dropping 3.4 support at the summit, after which point we would be able to use the `typing` module; I would expect an uptick in their use then, too.
<Odd_Bloke> (We can only annotate simple cases without `typing`, which obviously reduces the number of annotations across the codebase.)
<AnhVoMSFT> Given that this PR brings more clarity due to the refactoring and additional telemetry ( we previously did not capture the call to extract goal state ), I think the typehints isn't the major enhancement and dropping it doesn't really make the code worse (the code is being made better by the refactoring, it's not being made worse because the typehints weren't there to start with)
<AnhVoMSFT> by extract goal state I mean extracting the certificates.xml - duh
<Odd_Bloke> To be clear, I didn't say that the PR made our codebase worse, I said that type annotations make our codebase better.
<Odd_Bloke> I'd be happy to accept the type annotations in a follow-up PR, if that would give you an easier-to-backport commit?
<johnsonshi> Odd_Bloke AnhVoMSFT: Thanks to both of you for clarifying. :) What should the next steps be for my PR?
<johnsonshi> AnhVoMSFT Odd_Bloke: Ok I'll be going with that suggestion and have that in a follow up PR instead. Thanks folks.
<AnhVoMSFT> I think we should address the typehints in a follow-up PR, perhaps adding typehints to all the methods in DataSourceAzure, that'll be cleaner and provide better static analysis during testing too
<Odd_Bloke> AnhVoMSFT: We won't be able to add them to all the methods until we have `typing` access, but I agree that typing as many as we can currently would be great!
<AnhVoMSFT> Thanks @Odd_Bloke - I think one topic for the cloud-summit this time should be on how distro maintainers (mostly looking at RHEL, SUSE, Oracle Linux) plan on packaging cloud-init to support their python 2.7 and python 3.4 customers
<Odd_Bloke> johnsonshi: AnhVoMSFT: Landed! :)
<johnsonshi> Odd_Bloke: Woooo! :) I think that was the first substantial PR I made in the cloud-init codebase. Thanks for reviewing! :)
<Odd_Bloke> johnsonshi: Thank you for your work, and congratulations! :)
<Odd_Bloke> AnhVoMSFT: Agreed, that would be a good topic to include.
<Odd_Bloke> rick_h: AnhVoMSFT has suggested including a summit topic on how distro maintainers who still have to maintain packages for Python 2.7 systems (and, once we drop 3.4 support, 3.4 systems) can collaborate.  We wouldn't have any skin in that game as Ubuntu (xenial is on 3.5), but I think it would be a really valuable topic for us to facilitate as cloud-init upstream.
<AnhVoMSFT> johnsonshi I'm looking forward to the follow-up PR. I think there are quite a few important things to follow up
<rick_h> Odd_Bloke:  AnhVoMSFT that makes sense. I'll add it to the topic list. I think this is another place that having the discourse category for cloud-init might be of use. I don't have permission to set it up but working on getting it enabled.
<robjo> What determines the Data source dependencies?
<robjo> Seeing some flakiness in Azure
<Odd_Bloke> robjo: "dependencies" in what sense?
<robjo> Looking for data source in: ['Azure', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM', 'NETWORK']
<robjo> In this case the Azure data source does not get loaded, i.e. I end up with Searching for network data source in: ['DataSourceNone']
<robjo> but when I get: "Looking for data source in: ['Azure', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']"
<robjo> then the Azure data source gets loaded and the instance gets provisioned
<robjo> there is no message in the importer, thus I don't know if we are trying to load the Azure data source and it fails and gets discarded, that would then point to Azure
<rharper> robjo: in sources/DataSourcesAzure.py   at the bottom, there should be 'datasources = []'
<rharper> robjo: Azure has been local (DEP_FILESYSTEM) for some time
<rharper> it uses the EphemeralDHCP to bring up networking to fetch the network config from IMDS; so it wants to only run at local time;
<rharper> well, not an empty array, but that's where it's defined;
<robjo> It has "datasources = [
<robjo>     (DataSourceAzure, (sources.DEP_FILESYSTEM, )),
<robjo> ]
<robjo> "
<robjo> so that looks OK
<rharper> yrd
<rharper> yes
<rharper> that's what's in master
<robjo> still running 19.4 but that's there already
<robjo> so where does the "NETWORK" part come from and why do I end up with None being loaded instead of Azure?
<robjo> None has both dependency on "sources.DEP_FILESYSTEM, sources.DEP_NETWORK"
<rharper> so stages.py:init runs a fetch() which calls the datasource fetch code, this looks for <site-packages>/cloudinit/sources/*.py;  for each of those files, it checks for the datasource attribute; and extracts the array and sees if there is a datasource class named DataSourceXXX, that has deps that match;
<rharper> robjo: it sounds like something went wrong at cloud-init-local.service time;  such that when cloud-init.service runs (which runs with mode=dsnet NETWORK deps) that it's still looking for a datasource
<rharper> there isn't a NETWORK Azure datasource, it only runs at local time
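A toy model of the dep-matching rharper describes (heavily simplified; not the real stages.py code):

```python
# Each datasource module exports (name, deps) pairs; a stage only
# instantiates sources whose declared deps match the stage's deps.
DEP_FILESYSTEM, DEP_NETWORK = "FILESYSTEM", "NETWORK"

datasources = [
    ("DataSourceAzure", (DEP_FILESYSTEM,)),             # local stage only
    ("DataSourceNone",  (DEP_FILESYSTEM, DEP_NETWORK)), # fallback
]

def sources_for(stage_deps):
    return [name for name, deps in datasources
            if set(deps) == set(stage_deps)]

# cloud-init-local.service (FILESYSTEM only) would find Azure...
print(sources_for((DEP_FILESYSTEM,)))              # ['DataSourceAzure']
# ...but if the local stage never ran, the later network stage
# (FILESYSTEM + NETWORK) only matches the None fallback:
print(sources_for((DEP_FILESYSTEM, DEP_NETWORK)))  # ['DataSourceNone']
```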
<robjo> so the Azure data source gets discarded because of no network dependency
<robjo> fair enough, but how do I get there in the first place?
<rharper> the question in the log is why wasn't Azure found at local time
<rharper> if you make it to cloud-init.service (stage 2) on Azure without finding the datasource then it's all going downhill from there
<rharper> Azure its found via several checks, azure_chassis in DMI data, or if it has an azure seed dir or a specific ovf-env.xml file;
<eandersson> Thanks rharper. Let me know how it goes. I have alternative ideas if the impact is too large.
<rharper> eandersson: ok, I'll append the suggested test to the PR;
<robjo> OK, yes, 'local' never runs, the log file starts with: "Cloud-init v. 19.4 running 'init' ...."
<rharper> that's do it
<rharper> *that'll do it*
<robjo> Interesting that this is not consistent and region dependent and storage class dependent....
<rharper> that sounds quite odd
<rharper> maybe then a race
<rharper> so sometimes if there's a systemd unit race, systemd will evict a unit
<rharper> can you check in the logs for 'cycle' I think is what systemd emits
<robjo> systemd expects a live system, AFAIK, given that the system with the condition never boots I can only extract information by attaching the system disk from the failed system
<rharper> should be in syslog or messages; or if your image writes persistent journal, then you can offline journalctl --directory /path/to/journal  to dump
<robjo> thanks, time to go dig, messages file is empty so there are obviously other issues and journald is setup to forward to rsyslog
<AnhVoMSFT> I felt like I ran into this problem before, with init running but not init-local
<AnhVoMSFT> which distro is this?
<rharper> AnhVoMSFT: likely suse related
#cloud-init 2020-08-14
<Odd_Bloke> smoser: https://github.com/canonical/cloud-init/pull/536 <-- the Oracle comment update I mentioned previously
<minimal> Hi folks. Anyone able to give some advice about test_handler dealing with testcases where a cloudinit module writes to a specific file (rather than passed as a parameter) and where the testcase also needs to load the file afterwards to check expected contents? My original method of letting the test_handler create the directory required ("/etc/apk/") for the tests failed with a permission error.
<rharper> minimal: there are a few unittests whic huse the FilesystemMocking class
<rharper> minimal: I think the ntp handler test-case that writes out template files ..
<rharper> minimal: also the reRoot() should let you write files and read them back in via the temp path
<minimal> rharper: Hi. Not looking to mock the write but rather to override the filename used for write by the module and also used for loadfile by the test handler.
<minimal> rharper: I had a look at many of the test_handlers but still haven't figured things out yet.
<minimal> Tried using mkstemp in the setUp function of the class to create a file and then in the testcase function tried using "cc_apk_configure._write_repositories_file.repo_file = self.tmp" to override the filename defined by the repo_file string in the module
<rharper> minimal: right, so reRoot() from one of the unittest classes, I'll look up; will change root into a tmp path, so you won't need to change your repo file, and it'll write to tmp_path as if it were root
<rharper> minimal: so, if you use FilesystemMockingTestCase, and then use reRoot(); that should be enough;  minimal also, I think if you look at tests/unittests/test_handler/test_handler_ntp.py:TestNtp(); I think that should do what you want
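The re-rooting pattern rharper describes can be imitated with the standard library alone. This is a hedged sketch, not cloud-init's actual `FilesystemMockingTestCase.reRoot()` implementation: the function and constant names (`write_repositories_file`, `REPO_FILE`) are illustrative stand-ins for the module under test, and the temp directory plays the role of the re-rooted `/`.

```python
# Sketch: re-rooting a module's fixed file path in a unit test (stdlib only).
# cloud-init's reRoot() helper swaps the filesystem root for a temp path so
# a module writing to a hardcoded location like /etc/apk/repositories lands
# in a throwaway directory. Names below are illustrative, not cloud-init API.
import os
import tempfile
import unittest

REPO_FILE = "/etc/apk/repositories"  # fixed path, as in the module under test

def write_repositories_file(root="/"):
    # Join the fixed path under the (re)root; lstrip so os.path.join works.
    path = os.path.join(root, REPO_FILE.lstrip("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)  # cf. util.ensure_dir
    with open(path, "w") as f:
        f.write("http://dl-cdn.alpinelinux.org/alpine/v3.12/main\n")
    return path

class TestWriteRepos(unittest.TestCase):
    def test_write_and_read_back(self):
        new_root = tempfile.mkdtemp()  # stands in for reRoot()'s temp path
        path = write_repositories_file(root=new_root)
        self.assertTrue(path.startswith(new_root))
        with open(path) as f:
            self.assertIn("alpine", f.read())
```

The test can then read the file back from under `new_root` to check its contents, which is the load-and-verify step minimal was asking about.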
<minimal> rharper: Thanks I'll take a look at that. Unrelated - who's the best person to talk to about the NTP module? I've added Busybox NTP support but several of the NTP testcases are failing as they assume full NTP functionality rather than Busybox's limited support.
<rharper> minimal: I'm familiar with them
<rharper> minimal: is this related to the pools vs servers bit you mentioned in the PR ?
<minimal> The only thing in an ntp.conf file that Busybox's ntp can handle is a "server" line. Also I believe it can't handle "iburst" as part of a server line, so the various testcases that expect "pool" lines and "server" lines with "iburst" will not work for it.
<minimal> I'd added Busybox NTP support as it's lightweight and Alpine's a lightweight distro. Wondering whether I should have just stuck with only Chrony
<rharper> minimal: did you provide a template for it?    I forget
<minimal> yes, again with only server lines
<rharper> I see, so when the alpine template is used, the test-cases themselves expect more things present
<rharper> I wonder if the templates need more logic
<minimal> yes - the NTP testcases loop through each of the supported software (i.e. ntp, chrony, etc) and for each then test each distro that supports that software
<minimal> so they expect a generated ntp.conf to look basically the same for Debian, Ubuntu, etc, and Alpine - which is the problem
<rharper> yeah
<minimal> I could have used OpenNTPD in Alpine for the ntp support but it's "old code" - from memory the last upstream release was 12-18 months ago. Plus OpenNTPD is of similar size to Chrony, whereas for a lightweight distro needing just an NTP client daemon the Busybox NTP seemed like a good candidate
<rharper> so, I think we need two things; 1) when on alpine rendering the template, if the user-data includes unsupported keys (pools e.g.) then cloud-init should emit warnings in the logs;   2) the unittests can be adjusted to handle alpine separately ;  for example we could filter out alpine from the distro list in the module and then add new unittests just for alpine, providing supported user-data (servers) and unsupported
<rharper> (servers/pools) and see that we log the warning
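rharper's first suggestion (warn when user-data contains keys the distro's NTP client can't render) can be sketched as a small filter. This is a hedged illustration: `filter_ntp_cfg` and `BUSYBOX_SUPPORTED` are invented names, not cloud-init's cc_ntp API, and the "alpine implies busybox ntpd" assumption comes straight from this conversation.

```python
# Sketch of the "warn on unsupported NTP keys" idea discussed above.
# filter_ntp_cfg and BUSYBOX_SUPPORTED are illustrative names, not
# cloud-init API; the alpine/busybox pairing is assumed from context.
import logging

LOG = logging.getLogger(__name__)

# Config keys Busybox ntpd's template can express; 'pools' is not among them.
BUSYBOX_SUPPORTED = {"servers"}

def filter_ntp_cfg(distro, cfg):
    """Drop config keys the distro's NTP client can't render, logging a warning."""
    if distro != "alpine":
        return cfg
    filtered = {}
    for key, value in cfg.items():
        if key in BUSYBOX_SUPPORTED:
            filtered[key] = value
        elif value:
            LOG.warning(
                "Skipping unsupported NTP config key %r on %s (busybox ntpd)",
                key, distro)
    return filtered
```

The second suggestion, filtering alpine out of the shared distro loop in the unittests and covering it with its own test cases, would then assert both the filtered output and the presence of the warning in the log.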
<minimal> Yeah, that's why when I realised the implications of Busybox ntp support I wondered who'd be the person(s) to consider whether it was worth it - I could always drop the (Busybox) "ntp" and just go with Chrony, and that would resolve the cc_ntp testcase failures I'm seeing now.
<rharper> minimal: I'm in favor of supporting both;
<minimal> with the present code in my PR if you specify "pools" in the userdata it'll be ignored when the template is processed. However if you specify "pools" but don't specify "servers" then you'll not have any server entries in the resultant config file. If userdata doesn't specify either then the code will place default server lines in the config ok.
<rharper> Yes; I think the implementation is fine, but when user-data is not processed as expected (but not fatal) I'm in favor of logging at least a warning;
<rharper> this community goes back and forth between being resilient, not being overly noisy in logs (we're already really noisy) but indicating to users (via the log at least) that cloud-init encountered something off the normal path
<rharper> minimal: I think if we also in the docs for cc_ntp express that alpine ntp is busybox based and only supports servers (not pool) and indicates that it will enable default servers if no servers are present (and pools are ignored) that should be good enough;
<rharper> even suggesting that if you want/need pools to consider chrony over ntp in your image
<minimal> I'll have a look at the ntp testcases again to see how easy it'd be to insert an "if distro != 'alpine':" clause for the ntp tests
<minimal> Re: the other issue - using reRoot had given some partial progress, looks like the next step to fix is that I need to create the "/etc/apk" directory structure inside the new_root so that the module can actually create the file within it
<rharper> minimal: yeah, you can util.ensure_dir(path)
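As a rough sketch of the step rharper is pointing at: cloud-init's `util.ensure_dir(path)` behaves approximately like `os.makedirs(path, exist_ok=True)`, so in a re-rooted test the missing `/etc/apk` tree can be created under the temp root before the module writes into it. The `new_root` variable below is illustrative; in the real test it would be the path returned by `reRoot()`.

```python
# Sketch: pre-creating the directory tree a module expects, under a
# re-rooted test path. ensure_dir here is a minimal stand-in for
# cloudinit.util.ensure_dir; new_root is illustrative, not cloud-init API.
import os
import tempfile

def ensure_dir(path):
    # approximate behaviour of cloudinit.util.ensure_dir
    os.makedirs(path, exist_ok=True)

new_root = tempfile.mkdtemp()  # stands in for the reRoot() temp path
ensure_dir(os.path.join(new_root, "etc/apk"))
```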
<minimal> rharper: Thanks for your help! - got the cc_apk_repositories test_handler passing all the tests now, just need to squash and push the commit.
<minimal> rharper: I'll now have a look at tweaking the cc_ntp tests to treat Alpine + Busybox NTP as a special case
<rharper> minimal: nice!
