[02:47] <blackboxsw> @jchittum heh, no thanks on the black isort for what you are touching. We'll address it in a single PR so downstreams will only have to feel the pain of non-functional reformats once instead of 500 paper cuts over time :)
[02:48] <blackboxsw> jchittum: cool on the PR , yes don't worry about xenial-dev environment not succeeding in your env, just tox -e py3  and flake8 and pylin
[02:48] <blackboxsw> t
[13:58] <falcojr> upstream, we're dropping 3.5 support at the end of the year, so we'll also drop the xenial test envs then too
[14:07] <minimal> I am intending to raise a PR but first wanted to sound out the best way to implement it. It is regarding openssh behaviour with locked accounts
[14:08] <minimal> basically "lock_passwd: True" will *also* block key-based SSH logins if OpenSSH's sshd either has "UsePAM no" in its config or if it was built without PAM support
[14:11] <minimal> so wondering whether in a PR to either (a) add a new setting to change the behaviour of "lock_passwd" so that it disables rather than locks accounts or (b) add a new per-user setting "disable_passwd" instead (this will not solve the problem for people expecting "lock_passed" to work regardless of whether PAM is enabled/disabled)
[15:11] <falcojr> minimal: (b) is probably best for backwards compatibility. How would you implement "disable_passwd"? "lock_passwd" just calls passwd -l, so I'm not sure what people would currently be expecting it to do
[15:12] <minimal> falcojr: I'd thought of (a) for backwards compatibility as it would use a "secondary" setting to change the behaviour of "lock_passwd" (with the default being current behaviour)
[15:13] <minimal> whereas introducing a new setting (b) would not resolve the problem for people expecting "lock_passwd" to behave in the same fashion whether PAM is enabled/disabled
[15:14] <falcojr> minimal: well, it literally just locks the password, right? I'd rather we update the docs if that's unclear than make it possible to change the behavior of that setting
[15:15] <minimal> falcojr: the underlying issue is a openssh "strangeness" but it does affect (some) cloud-init users who want to lock the password of an account but then find they cannot SSH in to the account using a key
[15:16] <minimal> the only workaround for this scenario is to set the account's password to "*" rather than "!" (which is what "passwd -l" does)
[15:22] <minimal> falcojr: its not a documentation issue
[15:24] <minimal> falcojr: here's an article that describes the underlying (openssh) issue: https://arlimus.github.io/articles/usepam/
[15:42] <falcojr> "(b) would not resolve the problem for people expecting "lock_passwd" to behave in the same fashion whether PAM is enabled/disabled"
[15:42] <falcojr> I think this is a wrong expectation
[15:45] <minimal> ok, let me turn things around: what would be an acceptable way in cloud-init to resolve the situation where the default account needs to have its password locked (using "lock_passwd: true") but also that SSH access via a key must be possible on a machine where sshd_config "UsePAM no" is set (i.e. as part of system hardening)?
[15:45] <minimal> current the above does not permit SSH access using a key
[16:07] <falcojr> minimal: can't they set passwd to * ?
[16:22] <falcojr> if that's too opaque, I think I'd be ok with a "disable_password" setting that sets the password to *. It'd be silly to combine that with lock_passwd, but I wouldn't think it'd actually be a problem
[16:47] <minimal> falcojr: I just think of user-data as being "generic" (i.e. hiding OS-specific issues) and this issue means that 2 different machines given the same "lock_passwd: true" user-data could behave in different ways
[16:48] <minimal> I'll look at coding up a "disable_passwd" setting
[16:48] <falcojr> how would the same lock_passwd behave in different ways?
[16:49] <falcojr> maybe I'm looking at the wrong resources, but everything I've read says "!" means locked and can be unlocked, and "*" doesn't specifically have a meaning but essentially means disabled and can never be re-enabled (without modifying shadow directly)
[16:49] <minimal> not the lock_passwd but the server's sshd behaviour
[16:51] <minimal> yes the manpage for "shadow" indicates that "!" locks a password, not an account. OpenSSH however has been coded so that when UsePAM is "no" they treat a locked password as a locked account. The "fault" does like with OpenSSH, I'm just trying to come up with a way to deal with this in cloud-init
[16:51] <minimal> s/like/lie/
[17:24] <minimal> falcojr: let me try again to explain again using a simple scenario, a person creates 4 EC2 VMs using user-data that specifies a user "bob", associates a public SSH key with the user (via "ssh_authorized_keys") and also indicates no password access be permitted (via "locked_password: true"). The person then selects 4 "random" AMIs (e.g. Debian, CentOS, Ubuntu, Alpine) to pass this user-data to. Expected end result: the person can successfully ssh as 
[17:24] <minimal> user "bob" to all 4 machines using the relevant SSH key. Actual result: access may work or fail depending on whether the OpenSSH daemon on each of the 4 "random" AMIs has (a) been compiled with/without PAM support, and (b) the value of the sshd_config setting "UsePAM".
[17:35] <minimal> falcojr: relevant text in sshd manpage: "If there is a requirement to disable password authentication for the account while allowing still public-key, then the passwd field should be set to something other than these values (eg ‘NP’ or ‘*NP*’ )"
[17:35] <falcojr> minimal: Yeah, I get that that's a problem, but the lock_passwd option isn't specifically an ssh configuration option. Wouldn't having a disable password option alleviate that? We could add a doc note that if you want to enable ssh accounts with no user passwords that that's the recommended way.
[17:35] <falcojr> What's your concern with a disable password option?
[17:35] <minimal> falcojr: ^^^
[17:37] <minimal> bottom line: cloud-init treats "lock_passwd" as a "block only password access but permit key access to the account" which in general is what happens in reality. However this "assumption" breaks when sshd has PAM disabled
[17:38] <minimal> yes one approach would be to add a "disable_password" option (or use "passwd: *" instead) and document that people not use "lock_passwd".
[17:39] <minimal> historical and current practice for "everyone" out there seems to have been to use "lock_passwd" expecting it to block passwords but permit keys
[17:47] <falcojr> yeah, I'm assuming lock_passwd was added because that made it easy to do passwordless public key auth, but I don't think that means we should change the meaning of lock_passwd. Outside of cloud-init (on Linux at least), locked password still has a specific meaning that I would make me expect a "!"
[17:48] <falcojr> which is why I'm in favor of the separate option
[17:52] <minimal> ok, as long as its documented that people are pointed to use the "correct" option as I expect the vast majority of people locking passwords still want ssh key access to work for those
[17:53] <minimal> I just tried using passwd: "*" in user-data to see what happened - ended up with "!*" as the password field in /etc/shadow - lock_passwd is True by default so that's still a problem!
[17:54] <minimal> so its not as simple as using "passwd" or adding a "disable_password" option, as there's still interaction with "lock_passwd" (as its default-on)
[17:57] <minimal> effectively passwd/disable_password and lock_passwd need to be exclusive options
[17:59] <minimal> well disable_password and lock_passwd as I guess you could specify both lock_passwd: true and passwd: "<some value>" with the expectation that the resultant /etc/shadow password is set to "!<some value>" so that you could later unlock the password
[20:19] <rjschwei> falcojr: Looks like you did some refactoring of the network setup, thanks. But I am at a bit of a loss what happend. Can you give me some pointers where to start pulling on thread?
[20:19] <rjschwei> I need to forward port https://build.opensuse.org/package/view_file/Cloud:Tools/cloud-init/cloud-init-write-routes.patch?expand=1
[20:19] <rjschwei> and I cannot find where route writing happens now
[20:34] <falcojr> rjschwei: for context: https://github.com/canonical/cloud-init/pull/919/files . cloud-init used to do (global)_apply_network_from_network_config->(distro-dependent)_write_network_config->(global)_supported_network_config
[20:34] <falcojr> it now does _apply_network_from_network_config->_write_network_state
[20:35] <falcojr> passing the config-independent state rather than the raw config
[20:36] <rjschwei> Sorry "config-independent state" means nothing to me, is there an example in a test somewhere that I can look at to see what that looks like?
[20:37] <rjschwei> and I did find " _render_subnet_routes" in sysconfig.py just now but am also uncertain as to what gets passed in and how to deal with it.
[20:37] <falcojr> if you need the raw config (you probably don't, but using the network_state will require more changes to your patch), you can probably patch the call on line 215 to call your _write_network_config rather than _write_network_state and all should be good
[20:39] <falcojr> though, I'm guessing this patch could be upstreamed
[20:39] <falcojr> let me find some state tests
[20:39] <rjschwei> I have no problem changing the patch if I can figure out what my starting point is.
[20:40] <rjschwei> Yeah now that things come from the renderer and we have the flavors there already upstreaming looks a lot better than before
[20:41] <rjschwei> I do appreciate that the groundwork appears to be in place, I just have not wrapped my head around what the data structures look like to cast the information into the format wicked expects
[20:43] <falcojr> so generally, since cloud-init supports multiple network formats (v1 and v2), internally, we try to convert that to an intermediate format and use that where possible rather than dealing with the network config directly
[20:44] <falcojr> https://github.com/canonical/cloud-init/blob/main/cloudinit/net/network_state.py#L117
[20:44] <falcojr> and https://github.com/canonical/cloud-init/blob/main/tests/unittests/net/test_network_state.py
[20:44] <falcojr> so that's what I'm referring to when I mention the config-independent state
[20:51] <rjschwei> but those test have no subnet setup and routes, confused....
[21:09] <rjschwei> OK, I see the hacky way now, I think......
[21:09] <rjschwei> going with that first
[21:39] <falcojr> rjschwei: sorry, had to run out for a minute. That's fine, but more tests live at https://github.com/canonical/cloud-init/blob/main/tests/unittests/test_net.py#L3694
[21:40] <falcojr> it's harder to follow, but there's a number of configs and expected results
[21:57] <rjschwei> falcojr: np, thanks pursuing the ugly hack path first so I can get a 21.4 package built for testing. with the new structure in the renderer in place I see a reasonably good way in getting the route writing code upstream. Will work on that after the 21.4 package with the ugly hack is put together
[21:57] <falcojr> sounds good!