[09:37] <otubo> rharper: I was going over irc logs and found out we agreed to do the NM dropin as a downstream fix, then we would have some time to work on the _postcmd fix on the sysconfig renderer. I'm gonna close my PR, then.
[09:37] <otubo> rharper: I actually already have a draft using _postcmd, I'll be pushing my branch soon.
[10:04] <DanyC> hi folks, i came across https://bugs.launchpad.net/cloud-init/+bug/1020695 and while i see it was never implemented, does anyone have another suggestion on how i can achieve the same in an early cloud-init stage - ie: not the final stage (runcmd / scripts-user)?
[10:26] <meena> otubo: cool.
[10:29] <meena> DanyC: there's some ideas here in smoser's (unmerged) branch: https://code.launchpad.net/~smoser/cloud-init/lp1020695/+merge/163216
[10:31] <meena> DanyC: i think i'd revive that bug / patch
[10:31] <meena> DanyC: but, can you explain why you need it super early?
[10:35] <DanyC> meena: sure thing i can. First let me give you the full picture (you might have seen i've asked various questions recently) ;) . All i'm trying to do is snapshot an EC2 instance and create an AMI from it with an application already configured, so i can later create multiple dev envs from it. Now i have an application which doesn't cope very well with a changing IP, and at the same time it also looks in the /etc/hosts file (in addition to its own "cache").
[10:36] <meena> ooff. I've seen those…
[10:36] <DanyC> and because of all that my final cloud-init stage doesn't kick in / run, due to cloud-final.service not being up, which is being held back by my app service
[10:37] <meena> (my opinion is, like always, a bit different. because i have a config management fetish, in particular puppet)
[10:38] <meena> I'd keep /etc/hosts as created by cloud-init in the AMI, and ensure that the application is *not* auto-started (in the AMI)
[10:38] <DanyC> so i can't use runcmd, i can't use script-user. I tried bootcmd but it doesn't seem to work if i'm trying to "update" the IP in the /etc/hosts file with the info from ec2metadata
[10:38] <meena> then have cfg-mgmt run, fix up /etc/hosts, and whatever else the app needs. and only then enable and start the app.
[10:39] <DanyC> hence my assumption that maybe ec2metadata is not available and so i was looking for s'thing else.
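For reference, the kind of bootcmd approach DanyC describes would look roughly like this cloud-config fragment; the IMDS path is the standard EC2 metadata endpoint, while the `myapp.internal` hostname is purely illustrative:

```yaml
#cloud-config
bootcmd:
  # Fetch the current private IP from the EC2 instance metadata service
  # and rewrite the application's /etc/hosts entry (hostname is illustrative).
  - IP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)
  - sed -i "s/^.* myapp.internal$/$IP myapp.internal/" /etc/hosts
```

Note that bootcmd runs on every boot, early in the sequence, so this only works if the network route to the IMDS is already up at that point — which matches the failure DanyC is seeing.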
[10:39] <DanyC> meena: but a cfg-mgmt tool will need to be run from user-data, no? and if the final stage doesn't kick in then i fail to see how it will help me
[10:40] <meena> uuuh… there's a switch somewhere to disable ec2metadata after the first boot, actually…
[10:40] <DanyC> i don't need to be switched, i need to work in the bootcmd stage
[10:41] <DanyC> *to be switched off
[10:42] <meena> i know, it just occurred to me, that you're trying to access it, while it's maybe switched off.
[10:42] <meena> anyway… let's collect all the things we know so far.
[10:42] <meena> first up: what do you think of my idea of only enabling & starting the app once everything is in place?
[10:50] <DanyC> meena: i wish i could do that; sadly i have 10 other services depending on the one which is holding up cloud-init
[10:51] <DanyC> the only option i see is to update the /etc/hosts file with the current IP of the new EC2 instance and bounce the service, so i can then let cloud-init's other stages kick in
[10:52] <DanyC> but doing that seems harder than i initially thought, it's a catch-22. Not to mention i can't change the silly app :facepalm
[10:52] <meena> DanyC: then all 10 services are disabled and stopped until everything is fixed up.
[10:52] <meena> What's the point of having them running, if they don't run correctly?
[10:53] <meena> heck, you could even make it a systemd service that all of them depend on!
[10:55] <meena> after networking, you run, update_etc_hosts.service, it fixes up /etc/hosts, and *then* all services can be started.
[10:55] <meena> think outside the box :P and inside another box!
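meena's suggestion can be sketched as a systemd unit; the unit name, script path, and `myapp.service` are all hypothetical placeholders, not anything shipped by cloud-init:

```ini
# /etc/systemd/system/update-etc-hosts.service (hypothetical name)
[Unit]
Description=Rewrite /etc/hosts with the current instance IP
After=network-online.target
Wants=network-online.target
Before=myapp.service

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/update-etc-hosts.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Each dependent service would then declare `After=update-etc-hosts.service` and `Requires=update-etc-hosts.service`, so none of them starts until /etc/hosts has been fixed up.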
[10:57] <DanyC> interesting, and i guess you're saying i should have a step in the cloud_init_modules stage that runs before the update_etc_hosts module? or have i misunderstood you?
[11:03] <meena> i don't really know what the best way is to bring that service onto the AMI.
[11:03] <meena> how are you bringing the other services onto the AMI?
[15:32] <rharper> otubo: ah, ok;  I do still wonder why the ordering is not enough;  my reading of systemd documentation suggests that NM should not be started before cloud-init-local.service has run to completion;   but I don't have the journal of when you saw the failure.
[15:46] <Goneri> blackboxsw, I think I've addressed all your comments here: https://github.com/canonical/cloud-init/pull/62
[15:46] <Goneri> blackboxsw, up to date prebuilt images are available here: https://bsd-cloud-image.org/
[18:27] <blackboxsw> Goneri: Thanks for the ping.  Out of curiosity, how often are the prebuilt images updating cloud-init?
[18:29] <blackboxsw> as in, I wonder if that's a good point of reference which you control that could be referenced if people find bugs/issues on bsd.
[18:29] <blackboxsw> once bsd changes are all integrated upstream in cloud-init I guess we could discuss that further
[18:36] <Goneri> blackboxsw, my goal is to autobuild them as frequently as necessary (e.g: after every merge, or even every PR), but it's still a work in progress
[18:37] <Goneri> for now, I still trigger the build manually.
[19:00] <blackboxsw> rharper: we can land this branch now right? https://github.com/canonical/cloud-init/pull/54
[19:00]  * blackboxsw is scrubbing PRs since I'm in the mood :)
[19:00] <rharper> blackboxsw: yes
[19:00] <rharper> it lands to ubuntu/xenial
[19:01] <blackboxsw> ok will get that landed today
[19:35] <blackboxsw> sorry Goneri, one more pull request landed on master; I saw your force push :/
[19:35] <blackboxsw> will wrap up review on that next
[19:36] <Goneri> awesome :-)
[19:38] <meena> blackboxsw: you can probably close some of mine
[19:39] <blackboxsw> yeah, it's about time; PRs don't age as well as fine wine.
[19:48] <meena> btw, can someone who's better at python than me, explain why i did this: https://github.com/canonical/cloud-init/blob/master/cloudinit/util.py#L1824 and how can i do this (in a thread-safe way? without spawning / forking?) so that i just say: find me a libc.
[19:49] <sarnold> what problem are you really trying to solve?
[19:56] <meena> sarnold: make this code work on NetBSD, and on the next version of FreeBSD if it changes that .7
[19:59] <sarnold> meena: aha; I think I'd try a loop over several potential libc pathnames, and populate that list with the paths to current and future libc libraries
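sarnold's loop over candidate pathnames could look like this sketch; the candidate list is illustrative, not exhaustive:

```python
import ctypes

# Candidate libc names across platforms (illustrative, not exhaustive):
# glibc on Linux, FreeBSD 12/13, then a bare soname for NetBSD/musl.
LIBC_CANDIDATES = ["libc.so.6", "libc.so.7", "libc.so"]

def load_libc():
    """Try each candidate in turn; return the first loadable libc, else None."""
    for name in LIBC_CANDIDATES:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None
```

Extending the list is then a one-line change when the next FreeBSD bumps the soname.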
[20:04] <blackboxsw> Goneri: quick volley of comments {% if variant in ["freebsd", "netbsd"] %}
[20:04] <blackboxsw> oops
[20:04] <blackboxsw> Goneri: on https://github.com/canonical/cloud-init/pull/62 I mean
[20:05] <blackboxsw> something concerns me a bit about shifting cloudinitlocal to after networking in the startup scripts, as it diverges from upstream behavior and that could impact our future work
[20:05] <blackboxsw> as we'd have to take into account that netbsd is different in this regard
[20:06] <Goneri> blackboxsw, this is totally a mistake. I'm actually surprised it just works
[20:06] <meena> i could use https://docs.python.org/3/library/ctypes.html#ctypes.util.find_library
[20:06] <Goneri> I will fix the order. thanks blackboxsw for the review!
[20:07] <blackboxsw> Goneri: yeah me too a bit, I would've thought it would have introduced a broken startup-service dependency loop or something
[20:07] <sarnold> meena: oh that looks a lot nicer
[20:07] <meena> but i think i'm gonna have to read the actual code to see what that does on the systems before using it.
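The find_library route meena mentions is nearly a one-liner, with the caveat meena raises: on several platforms CPython's implementation shells out to tools like ldconfig or the compiler to locate the library, so it is not fork-free:

```python
from ctypes import CDLL
from ctypes.util import find_library

# find_library("c") returns a loadable name such as "libc.so.6" on glibc
# systems, or None if it cannot locate one (e.g. minimal containers
# without ldconfig).
name = find_library("c")
libc = CDLL(name) if name is not None else None
```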
[20:08] <blackboxsw> meena: have a url for me of the PR that you'd like looked at first?
[20:08] <blackboxsw> to refresh my memory. I'll try to get a pass on it today
[20:08] <meena> blackboxsw: nope.
[20:08] <blackboxsw> heh, will grab one of 'em and work through it
[20:08] <meena> blackboxsw: all i want is the networking stuff on the Mailinglist to get a response :P
[20:10] <blackboxsw> meena: I'll try to get an update to that then. I think the direction that robjo is going with current PRs (with flavors on sysconfig renderers) is probably the best approach at the moment, but I think the sysconfig renderer is the most contentious/dirty of our cases because suse and rhel differ so much in network config flag support. I'll try saying something smart about that on the mailinglist; what blocked me was examples & suggestions there.
[20:15] <lachesis> hi all, i'm having a problem with the ssh_pwauth setting on DigitalOcean. I am shipping an image that has its own /etc/ssh/sshd_config file with `PasswordAuthentication no` already set, but also with a `MatchUsers` section that enables PasswordAuthentication for some specific users (for somewhat silly reasons that are hard to fix). however, cloud-init is doing a broad string replacement, so it's disabling ssh password auth even for that particular user.
[20:15] <lachesis> how can i prevent this? also, where can i open an issue about this bug?
[20:16] <lachesis> i'd really like to mask that setting out in the image... i suppose i could do this by editing the /usr/lib/python3/dist-packages/cloudinit/config/cc_set_passwords.py file to just disable that check
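A sketch of the Match-aware behaviour lachesis is asking for — rewrite PasswordAuthentication only in the global section and stop touching options once a Match block begins. This is not cloud-init's actual implementation, just an illustration of the distinction:

```python
def set_global_pwauth(config_text, value):
    """Set PasswordAuthentication in sshd_config's global section only,
    leaving any Match blocks untouched (illustrative sketch)."""
    out = []
    in_match = False  # options after the first Match line are conditional
    for line in config_text.splitlines():
        stripped = line.strip().lower()
        if stripped.startswith("match "):
            in_match = True
        if not in_match and stripped.startswith("passwordauthentication"):
            out.append("PasswordAuthentication %s" % value)
        else:
            out.append(line)
    return "\n".join(out)
```

In sshd_config, everything before the first Match keyword is global, so a replacement that respects that boundary would leave lachesis's per-user exception intact.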
[20:16] <meena> blackboxsw: the examples & suggestions provided, or the ones missing?
[20:17] <meena> blackboxsw: re PRs, i think this one can probably be closed: https://github.com/canonical/cloud-init/pull/69 from what i understand from rharper . and i'm wondering if we shouldn't revert the previous work i did there.
[20:18] <lachesis> i tried putting `ssh_pwauth: unchanged` in `/etc/cloud/cloud.cfg` but that didn't help
[20:22] <rharper> lachesis: could you file a bug and include the tarball from 'cloud-init collect-logs'  and any provided user-data ?   we can look into the issue and see what's going on
[20:23] <blackboxsw> lachesis: you can file a bug here: https://bugs.launchpad.net/cloud-init/+filebug to give a bit more context about the problem.
[20:23] <blackboxsw> sorry, would help if I hit enter
[20:24] <lachesis> :) yeah i'll file there... i am not providing any user data but it is possible that DO is providing some without me... let me see if i can track down where cloud-init fetches that, probably some magic 169.254 address
[20:28] <rharper> lachesis: right, I see you were adding the config via /etc/cloud/cloud.cfg vs. user-data; that's good enough
[20:30] <blackboxsw> lachesis: if you ever need to check what userdata and vendordata cloud-init sees: `sudo cloud-init query userdata` or `sudo cloud-init query vendordata` or `sudo cloud-init query --all`
[20:32] <lachesis> ok yeah it looks like ssh_pwauth is coming in from vendor data
[20:36] <lachesis> bug report: https://bugs.launchpad.net/cloud-init/+bug/1865082
[20:41] <rharper> blackboxsw: man, we really should have collect-logs pull in /etc/cloud/cloud.cfg, /etc/cloud/cloud.cfg.d/*.cfg ...
[20:41] <blackboxsw> yeah we really should
[20:41] <blackboxsw> it is a pain point when we have to debug/triage
[20:41] <rharper> lachesis: replied; was this run from an instance with /etc/cloud/cloud.cfg including the added 'ssh_pwauth: unchanged' ?
[20:41] <blackboxsw> though we omitted it because it could contain sensitive info
[20:42] <blackboxsw> but maybe we lump it into the user-data question in apport
[20:42] <rharper> I mean, any of them could;  so can user-data to some degree
[20:42] <rharper> right
[20:42] <blackboxsw> yeah
[20:42] <lachesis> negative, there was no cloud.cfg in this case. i can rebuild the image and regenerate with that set, but it'll take a few min
[20:42] <rharper> lachesis: you can add that to the existing instance
[20:42] <rharper> and then run: sudo cloud-init clean --logs --reboot
[20:42] <lachesis> and reboot it?
[20:42] <lachesis> will do
[20:42] <rharper> that will run like "new instance"
[20:43] <rharper> the code reads that it should exit out without calling the ssh_util path which reads in sshd config ;
[20:45] <lachesis> hmm ok that restarted my box, and it hasn't come back up yet :/
[20:46] <lachesis> it's a cow, not a pet, so i can just destroy and rebuild it, no big worries, but it will prevent me from getting the logs this time :)
[20:46] <rharper> well, that's not nice of DO ...
[20:46] <rharper> lachesis: so DO does provide a *lot* of vendor-data scripts; it's possible that they are including a ssh_pwauth: no  setting by default
[20:47] <lachesis> they definitely are... i included the query --all result in the tar file in that bug
[20:47] <rharper> in which case, that can override system config; which leaves you with having to fight them or disabling vendor-data in your image; so you can set system-config;
[20:48] <lachesis> mm i see, ideally i'd just be able to disable that particular setting and/or that setting would be smart enough to avoid messing up my MatchUsers
[20:48] <lachesis> i am pretty strongly tempted to patch the python file and be done with it lol
[20:48] <rharper> heh
[20:50] <rharper> 2020-02-27 20:50:04,362 - util.py[DEBUG]: Writing to /var/lib/cloud/instances/f1/sem/config_set_passwords - wb: [644] 24 bytes
[20:50] <rharper> 2020-02-27 20:50:04,362 - helpers.py[DEBUG]: Running config-set-passwords using lock (<FileLock using file '/var/lib/cloud/instances/f1/sem/config_set_passwords'>)
[20:50] <rharper> 2020-02-27 20:50:04,363 - cc_set_passwords.py[DEBUG]: Leaving SSH config 'PasswordAuthentication' unchanged. ssh_pwauth=unchanged
[20:50] <rharper> 2020-02-27 20:50:04,363 - handlers.py[DEBUG]: finish: modules-config/config-set-passwords: SUCCESS: config-set-passwords ran successfully
[20:50] <rharper> lachesis: so that's
[20:50] <rharper> what I expect to see if
[20:50] <rharper> your value makes it into the combined cloud config
[20:51] <lachesis> sry how do i get logs from that cloud-init clean run?
[20:53] <blackboxsw> lachesis: logs live in /var/log/cloud-init.log
[20:53] <lachesis> obviously, thx :)
[20:53] <lachesis> 2020-02-27 20:47:12,316 - util.py[DEBUG]: Read 2964 bytes from /etc/ssh/sshd_config
[20:53] <lachesis> 2020-02-27 20:47:12,317 - ssh_util.py[DEBUG]: line 97: option PasswordAuthentication already set to no
[20:53] <lachesis> 2020-02-27 20:47:12,317 - ssh_util.py[DEBUG]: line 103: option PasswordAuthentication updated yes -> no
[20:53] <lachesis> 2020-02-27 20:47:12,317 - util.py[DEBUG]: Writing to /etc/ssh/sshd_config - wb: [644] 2963 bytes
[20:53] <blackboxsw> the cloud-init clean --logs  removes your old /var/log/cloud-init.log so it'll only contain the current boot
[20:53] <lachesis> $ cat /etc/cloud/cloud.cfg
[20:54] <lachesis> # The top level settings are used as module
[20:54] <lachesis> # and system configuration.
[20:54] <lachesis> ssh_pwauth: unchanged
[20:54] <lachesis> wait is that a space?
[20:54] <lachesis> doh
[20:54] <lachesis> it's not a space, the _ just got lost somehow?
[20:54] <lachesis> `ssh_pwauth: unchanged`
[20:54] <lachesis> maybe my xchat font is borked... but there is an _ showing in vim
[20:55] <lachesis> but i imagine the vendor-data just overrode my config there
[20:55] <rharper> that's what I'm thinking
[21:01] <lachesis> ok im gonna go with the dumb patch cc_set_passwords.py option :)
[21:01] <lachesis> thanks for your help folks, i love an active open-source IRC channel :)
[21:05] <rharper> lachesis: yw
[21:56] <blackboxsw> rharper: just repushed  https://github.com/canonical/cloud-init/pull/214 with doc updates
[21:56] <rharper> blackboxsw: cool, did you see my comments in the review re: SRU blocker text / system_info() json encoding potential issues ?
[21:57] <blackboxsw> rharper: I had and responded to both
[21:58] <rharper> k
[21:58]  * rharper reviews 
[21:58] <blackboxsw> I think SRU blocker is a no because we've added new fields across an SRU boundary before, for platform/subplatform.
[21:58] <rharper> and we mark the fields experimental, correct ?
[21:58] <blackboxsw> rharper: I did in the ds in the base, but we can/should add those too, good thought
[21:58] <blackboxsw> doing that now
[21:59] <rharper> so we're not baking things in; though I suspect telling folks they can now use this in their jinja template might make it less useful if it will change on them
[21:59] <rharper> blackboxsw: well, I do think things like system_info/variant, etc
[21:59] <blackboxsw> that's true
[21:59] <rharper> won't be going away
[21:59] <rharper> so those don't have to be experimental, ie, they are fixed values at this point
[21:59] <blackboxsw> right it's already been a requested  feature here once or twice
[21:59] <rharper> we already have runtime code that looks at os.variant etc
[22:00] <blackboxsw> rharper: correct, nothing changed there, just surfaced that key in instance data
[22:00] <rharper> yeah; but once added and SRU'ed, it can't change without potential regressing user scripts
[22:00] <blackboxsw> that's all part of stock util.system_info
[22:00] <rharper> so, we should be happy with them;
[22:00] <blackboxsw> right, we would be unable to change names once SRU'd
[22:01] <blackboxsw> maybe it's worth bikeshedding on key names?
[22:01]  * rharper was just reviewing them 
[22:01] <blackboxsw> as we certainly don't  change v1 keys
[22:01] <blackboxsw> the rest of the dict doc is up for changes in general as we don't promise that part won't change.... just v1 keys
[22:02] <blackboxsw> as v1 is the generalized output
[22:02] <blackboxsw> s/generalized/standardized
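The standardized v1 keys under discussion are what jinja user-data templates can rely on; a minimal example (the keys shown do exist in cloud-init's instance-data, the runcmd itself is illustrative):

```yaml
## template: jinja
#cloud-config
runcmd:
  - echo "booted on {{ v1.cloud_name }} in {{ v1.region }}" > /tmp/instance-info
```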
[22:02] <rharper> the 'cfg' head is going to be free-form, no?  if someone modified their /etc/cloud/cloud.cfg (or added a sub file) then there's going to be whatever in there ...
[22:16] <rharper> blackboxsw: reviewed
[22:21] <blackboxsw> thanks rharper yeah, that cfg probably should be root-readonly
[22:21] <blackboxsw> it could have just about anything
[22:24] <fredlef1> I've been looking at the life cycle of instance_link.  I see it's mostly managed in Init::_reflect_cur_instance(), called in stage6, after a valid data source has been found. But it's also preemptively deleted in Init::_get_data_source(), before initiating a search for available data sources.
[22:24] <fredlef1> The preemptive deletion was added in 2016 in commit 0964b42e5 (quickly check to see if the previous instance id is still valid). Does anyone remember what made the deletion outside of reflect_cur_instance() necessary/desirable?
[23:19] <rharper> fredlef1: hi, I believe the goal is to not trust the on disk object cache (which is only present on datasources which implement check_instance_id());  unless we can confirm the current instance id is the same as what's on disk (either in cache, or via the symlink);  this deletion happens during cloud-init-local time;  on subsequent stages (init, config, final) they all use the trust flag
[23:19] <rharper> fredlef1: do you have a particular bug you're looking at or something else?
[23:22] <fredlef1> rharper: thanks. That's useful information. Unless I'm mistaken, the disk object cache is always created but it only gets used if check_instance_id() is implemented or if manual_cache_clean is set to True.
[23:22] <fredlef1> rharper: I'm looking at ways to gracefully handle a reboot where the datasource is not available anymore
[23:22] <rharper> fredlef1: yes, I believe that's true, we do write it but it won't be used unless the ds implements the function ; its disabled by default in base source class (cloudinit/sources/__init__.py
[23:23] <rharper> fredlef1: this is manual-cache-clean: True
[23:24] <fredlef1> rharper: not quite.  I don't want to reuse the cache if the datasource is available, so as not to miss updates/changes to user-data and network interfaces.
[23:24] <fredlef1> Basically, if the datasource is available, we should always crawl it but if it is not, I want a way out
[23:24] <rharper> interesting
[23:25] <rharper> well
[23:26] <fredlef1> I'm testing a patch that modifies _get_data_source to forcefully load the cached data if we failed to find a valid data source and the cached datasource still matches the config.
[23:26] <fredlef1> I should probably hide that behind a configuration option to make it more palatable
[23:27] <fredlef1> Does that sound senseless ?
[23:30] <rharper> not senseless;  I generally like the idea of using cached datasource object under the following criteria 1) the platform is the same, 2) the datasource is the same 3) metadata service is down;  4) and I think we can compare the object's ds.instance_id value to what's written to /var/lib/cloud/data/instance-id
[23:33] <rharper> currently we won't re-use the object unless the source implements check_instance_id() ;  normally this is done via some non-network verification, some platforms encode instance-id in platform data (like dmi system uuid etc);  the ec2 instance-id is not present in system info that I'm aware of;   but you could attempt to fetch instance id via metadata service; and return true and we'd use the object;  in your "metadata service is down scenario";  then
[23:33] <rharper> you could do the other fallback checks I mentioned (1, 2, 3) and return True only if those hold
[23:33] <rharper> this would restore the obj from cache
[23:34] <rharper> beyond that, we'd need to sort through the other hits to imds, like the .network_config property
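rharper's fallback criteria can be sketched as a pure decision function; all the names here are illustrative, not cloud-init's actual API:

```python
def should_reuse_cached_datasource(same_platform, same_datasource,
                                   metadata_service_up, cached_instance_id,
                                   disk_instance_id):
    """Only fall back to the cached datasource object when the metadata
    service is unreachable AND everything else still matches (sketch of
    the criteria discussed above, not cloud-init code)."""
    if metadata_service_up:
        # Service reachable: always re-crawl for fresh user-data / network config.
        return False
    return (same_platform and same_datasource
            and cached_instance_id == disk_instance_id)
```

The instance-id comparison corresponds to checking the cached object's ds.instance_id against /var/lib/cloud/data/instance-id.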
[23:42] <fredlef1> rharper: I looked at implementing check_instance_id as we do have a way to get the instance id on newer instance types, but it turns out that implementing check_instance_id for the EC2 datasource would conflict with public documentation from AWS. In fact, I'm rather planning to implement a DataSourceEc2::check_instance_id() that returns False, as a place to document why it should not be implemented.
[23:42] <rharper> ok,
[23:43] <fredlef1> I'll keep your criteria for reusing the cached datasource object in mind. Thanks for that.
[23:43] <rharper> that's reasonable;   so, then I suspect we'd need to modify _get_data() to handle this scenario
[23:43] <rharper> but possibly deal with the removal of the object cache , which is why you're asking =)