[09:42] <faa> hello, does cloud-init still support Python 2.7?
[10:56] <faa> found it in the issues: "Thanks for the pull request! cloud-init 19.4 was the last version of cloud-init to support Python 2"
[13:28] <AnhVoMSFT> rharper powersj I am seeing an "ec2" prefix on the ssh public key generation lines on an Azure instance https://paste.ubuntu.com/p/QSkhKyPS2B/ - Is this expected? It looks like this is the output of ssh-keygen? Is it possible to change that to the platform name?
[13:49] <Odd_Bloke> faa: That's correct, we dropped Python 2 support between 19.4 and 20.1.
[13:53] <Odd_Bloke> AnhVoMSFT: I'm not sure where that string comes from, TBH; could you file a bug?
[14:10] <Odd_Bloke> AnhVoMSFT: Oh, looks like we had a bug filed overnight: https://bugs.launchpad.net/bugs/1869277
[14:14] <Odd_Bloke> Oh, hmph, we're seeing the integration test hang (and therefore timeout) issues we've seen intermittently before: https://travis-ci.org/github/canonical/cloud-init/jobs/667711759
[14:14] <Odd_Bloke> rharper: I remember we discussed ^ some, but did we reach a conclusion as to what we thought the problem was?
[14:19] <rharper> Odd_Bloke: the most recent issues were blocked on snapd.seeded.service taking more than the 300 seconds we wait
[14:28] <Odd_Bloke> rharper: These are xenial instances though, so I think we concluded that there aren't any snaps being seeded?
[14:28] <Odd_Bloke> (I think we had this exact exchange last time too. :p)
[14:28] <rharper> AnhVoMSFT: about "ec2[1850]": in the systemd journal, entries are logged with a tag (the process name or an explicit syslog tag) plus a PID number ... cloud-init's own execs are clearly marked cloudinit[PID] ...
[14:28] <rharper> AnhVoMSFT: can you reproduce this on stock RHEL 7.7 or CentOS 7.7 images versus that OpenLogic image?
[14:28] <Odd_Bloke> rharper: You can reproduce it in an Ubuntu lxd.
[14:28] <Odd_Bloke> (The "ec2" thing, I mean.)
[14:29] <rharper>  ?
[14:30] <Odd_Bloke> In the xenial container I just launched to check snapd.seeded, I have: Mar 27 14:27:03 renewed-corgi ec2[653]: 1024 SHA256:x/P+1raytPVp9l5tcVB28yV48aTs6jG5OhR6Fw09ciM root@renewed-corgi (DSA)
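The payload of that journal line is just `ssh-keygen -l` fingerprint output, so the format is easy to reproduce locally (a sketch, assuming the OpenSSH client tools are installed; the key and comment here are throwaway stand-ins):

```shell
#!/bin/sh
# Generate a throwaway key and print its fingerprint, which matches the
# "<bits> SHA256:<hash> <comment> (<type>)" shape seen in the journal line.
tmp=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -C root@renewed-corgi -f "$tmp/key"
ssh-keygen -l -f "$tmp/key.pub"
rm -rf "$tmp"
```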
[14:30] <rharper> oh, interesting, focal does not
[14:33] <Odd_Bloke> I'm currently getting ~100kB/s downloading an image locally.
[14:33] <Odd_Bloke> So I bet that's what's causing our CI failures.
[14:33] <rharper> Odd_Bloke: what's your build.info on that xenial image ?
[14:34] <Odd_Bloke> 20191107
[14:34] <rharper> I have a vm launched yesterday with serial: 20200320
[14:34] <rharper> dailys are fine
[14:34] <rharper> huh
[14:35] <rharper> I can see it in a container now as well
[14:35] <rharper> not quite sure why it's not showing up in the VM
[14:35] <Odd_Bloke> So the code uses util.multi_log, which has a comment about behaviour being different in containers.
[14:36] <Odd_Bloke> https://github.com/canonical/cloud-init/blob/master/cloudinit/util.py#L515-L522
[14:37] <rharper>  This file is part of cloud-init. See LICENSE file for license information.
[14:37] <rharper> logger_opts="-p user.info -t ec2"
[14:38] <rharper>  /usr/lib/cloud-init/write-ssh-key-fingerprints
[14:38] <rharper> it's been there since 2012
[14:38] <rharper> so that's nothing new
[14:38] <AnhVoMSFT> yes I think it has been there for a long time. We just noticed it yesterday :-)
[14:38] <rharper> hehe
[14:39] <rharper> we can use the bug to adjust that to something else;  I suspect we want to log as cloud-init instead of ec2
[14:39] <AnhVoMSFT> looking through google I think someone using OpenStack was also seeing it. It's basically there for all cloud-init users
[14:39] <AnhVoMSFT> I think ec2 will need their own prefix. I saw some bug against it back then
[14:40] <AnhVoMSFT> https://bugs.launchpad.net/ubuntu/+source/ec2-init/+bug/458576
[14:40] <rharper> I guess someone was extracting the public keys from the serial console log ?
[14:41] <rharper> and yes, ec2-init  was cloud-init before it was cloud-init
[14:41] <AnhVoMSFT> I think we should let datasources provide their own prefix if they would like to (ec2 in this case). Otherwise it should just say cloud-init
[14:42] <Odd_Bloke> It should just say cloud-init, I think.
[14:43] <rharper> yeah
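For context, the prefix comes from the `logger_opts` line rharper quoted: write-ssh-key-fingerprints pipes its output through logger(1), and the `-t` flag sets the syslog tag that the journal then displays as `ec2[PID]`. A minimal sketch of the proposed one-line fix (illustrative, not the actual script body):

```shell
#!/bin/sh
# Current value in /usr/lib/cloud-init/write-ssh-key-fingerprints:
logger_opts="-p user.info -t ec2"
# Proposed change: tag the lines as cloud-init instead, so the journal
# shows cloud-init[PID] rather than ec2[PID]:
logger_opts="-p user.info -t cloud-init"
# The script then logs roughly like:
#   echo "$fingerprint_line" | logger $logger_opts
echo "$logger_opts"
```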
[15:11] <Odd_Bloke> rharper: https://github.com/canonical/cloud-init/pull/287 <-- this will extend the time it takes before CI times out, which should work around the problem for now
[15:12] <Odd_Bloke> It sounds like we're just getting caught in transatlantic link congestion that's out of Canonical's direct control.
[15:12] <Odd_Bloke> I'd like us to investigate a couple of other things: (a) caching images between Travis runs, and (b) emitting progress information during the step that currently times out, so we don't need to use travis_wait.
[15:12] <rharper> Yeah, wondering if we want to bump the timeouts in CI after checkout
[15:13] <rharper> Odd_Bloke: I know lxd can be configured to point to other image servers ... I suspect if we were to cache them, we'd also need an image alias to ensure the image names we use are found in the configured image repo
[15:13] <rharper> Odd_Bloke: we may want to chat with stgraber on travis lxd image caching
[15:14] <Odd_Bloke> Perhaps; doesn't the test framework itself perform the download?
[15:16] <rharper> yeah, I'll look at the implementation here in a second
[15:19] <rharper> for bumping timeout, there's this setting: tests/cloud_tests/platforms.yaml:get_image_timeout: 300   ; we can sed that into whatever large value we want ...
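rharper's sed suggestion might look something like this (a sketch; in the repo the real file is tests/cloud_tests/platforms.yaml, a stand-in file and hypothetical surrounding key are used here so the example is self-contained, and 1800 is an arbitrary larger value):

```shell
#!/bin/sh
# Bump get_image_timeout from 300 to a larger value via sed.
f=$(mktemp)
printf 'default_platform_config:\n  get_image_timeout: 300\n' > "$f"
sed -i 's/get_image_timeout: 300/get_image_timeout: 1800/' "$f"
grep get_image_timeout "$f"
rm -f "$f"
```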
[15:20] <rharper> image caching is harder; we point to DEFAULT_SSTREAMS_SERVER = "https://images.linuxcontainers.org:8443". I see that's also set in tests/cloud_tests/releases.yaml (lxd: sstreams_server), so if we stood up a cache, we could point it there
[15:25] <Odd_Bloke> rharper: That isn't the timeout we're hitting, I commented on the PR.
[15:26] <Odd_Bloke> Ideally, I think we'd be able to identify the path that contains the cache that lxd/cloud-tests uses, and then ask Travis to just retain that between runs.
[15:27] <Odd_Bloke> Using https://docs.travis-ci.com/user/caching/#arbitrary-directories
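Caching an arbitrary directory in Travis is a `.travis.yml` change along these lines (a sketch; the cache path is hypothetical and would need to match wherever lxd/cloud-tests actually stores downloaded images):

```yaml
# .travis.yml (sketch): retain the image cache between runs
cache:
  directories:
    - /home/travis/.cache/cloud-tests   # hypothetical image cache path
```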
[15:34] <blackboxsw> rharper: if you get a chance today, netplan prioritization branch needs your response. Should we write up a cloud_test for this? https://github.com/canonical/cloud-init/pull/267
[15:38] <rharper> blackboxsw: I saw your note, while I would like a cloud-test, we don't yet have a way to reboot between runs ...
[15:44] <blackboxsw> ok will probably write up that multi-stage cloud_test
[15:49] <powersj> blackboxsw, multi-stage?
[15:50] <blackboxsw> powersj: I may peek at it more in a bit. The ifupdown test sort of requires us to run cloud-init clean and re-render networking, so we may need an additional collect stage for that type of test, especially if we want to reboot and recollect after that boot
[15:51] <powersj> blackboxsw, make a card for it and move on
[17:25] <blackboxsw> rharper: https://github.com/canonical/cloud-init/pull/267 ok, I think since we know Focal needs netplan prioritization, let's land this, and we can revisit cloud_tests for this later once we've discussed with cpc/foundations what the plan is for handling ifupdown gaps
[19:25] <rharper> blackboxsw: reviewed, looks good, I've suggested a unittest to add, then we can land.
[19:32] <Odd_Bloke> https://github.com/canonical/cloud-init/pull/288 <-- small precursor PR for the mirror URL sanitisation work
[19:39] <rharper> You need to leave a comment indicating the requested changes.
[19:39] <rharper> I swear
[19:42] <drag0nius> anyone tried running cloud-init on a pre-baked Raspbian image on a Raspberry Pi?
[19:42] <drag0nius> i want to know how to get started and where to put my cloud-config files in the baked image
[19:45] <blackboxsw> drag0nius: that might be better asked in the #raspberrypi channel on freenode?
[19:46] <blackboxsw> I know waveform in Canonical has been looking over raspi development using cloud-init.
[19:50] <Odd_Bloke> Anyone else just getting a loading screen on https://travis-ci.org/ ?
[19:52] <Odd_Bloke> rharper: I've moved that function, net makes more sense for sure.
[19:52] <rharper> cool!
[19:53] <Odd_Bloke> (And this is an example of why smaller code reviews are better; who knows if we'd have picked up on that if I proposed all my code at once.)
[19:53] <rharper> yeah, thanks
[20:14] <ananke> I finally have a few minutes to look back at kali again. Tried their earlier AMI, and I'm stumped: same issue. cloud-init status claims cloud-init is still running
[20:15] <ananke> What's more baffling is the fact that 'cloud-init analyze blame' shows close to a minute being spent on this:
[20:15] <ananke> -- Boot Record 01 -- 51.20500s (init-local/search-Ec2Local)
[20:15] <rharper> do you have the full cloud-init collect-logs tarball?
[20:16] <ananke> I'm preparing them right now
[20:16] <rharper> that looks like maybe networking isn't set up properly and it's timing out on the URL lookup
[20:16] <rharper> but we'll know more for sure with logs
[20:17] <rharper> is kali debian based or something else?
[20:17] <ananke> I believe so
[20:17] <ananke> we're feeding it an identical cloud-init config via user-data as we are ubuntu & centos targets, which makes it so baffling
[20:18] <rharper> I doubt it's your user-data
[20:18] <rharper> more likely issues with the OS and cloud-init integration ...
[20:18] <rharper> if it's debian based, they may have a much older cloud-init
[20:19] <ananke> 18.3, but we saw the same issue with cloud-init 19 on their 2020 AMI. thank you for your interest in this issue. what would be a good way to share the log tarball?
[20:24] <ananke> with the covid19 pandemic and push to go online as much as possible, our workload has increased, as we provide an educational platform via AWS. fun, but busy
[20:25] <rharper> https://bugs.launchpad.net/cloud-init/+filebug
[20:25] <rharper> and attach the tarball
[20:26] <blackboxsw> xenial daily build recipe fixed & rebuilt; working on a parameterized unit test for the netplan-over-eni priority PR
[20:26] <rharper> blackboxsw: \o/
[20:26] <rharper> blackboxsw: heh, or you could paste the one I had ... I know it's begging for a pytest.mark.parametrize ...
[20:27] <rharper> but .. landing focal features > pytest refactor IMO
[20:27] <blackboxsw> rharper: fair point, I was only going to spend ~20 on it shouldn't be long. how long til EOW??
[20:27] <rharper> heh, ok
[20:35] <Odd_Bloke> Travis seems to be back now, but heads-up that I had to logout/login to get the UI to behave properly (it wasn't showing me canonical projects in the sidebar, and I didn't have the restart button on jobs).
[20:36] <ananke> rharper: done, thank you! https://bugs.launchpad.net/cloud-init/+bug/1869430
[20:39] <rharper> ananke: I'll take a look
[20:39] <ananke> thanks!
[20:41] <ananke> right now that's a bit of a problem for us: since the build stage runs cloud-init status --wait, it never finishes
[20:41] <rharper> yeah, that's the right thing to do; so let's see what's going on
[20:45] <rharper> ananke: it appears that cloud-config.service has not yet started (nor has final), so it's hanging ... cloud-config.service waits for systemd's 'network-online.target'. In the journal file, I don't see the 'Reached target Network is Online' message which is emitted when that target is reached.
[20:46] <rharper> I'll update the bug ... do you have an ami id to reference?  I suspect there are issues with the network service files
[20:48] <ananke> sure thing, I can add the AMIs we've tried in the bug report
[20:51] <ananke> ● network-online.target - Network is Online
[20:51] <ananke>    Loaded: loaded (/lib/systemd/system/network-online.target; static; vendor preset: disabled)
[20:51] <ananke>    Active: inactive (dead)
[20:51] <ananke> interesting
[20:52] <rharper> that typically means nothing "asked" for it; in Ubuntu we have a networking.service script which depends on it and should pull in the target
[20:52] <rharper> I would think debian networking.service would as well
[20:52] <rharper> however, kali has both ifupdown and network-manager  installed
[20:53] <ananke> yes, network-manager appears to be running, and it has ifup/ifdown
[20:53] <rharper> cloud-init also has had some edge cases with  ifupdown and nm installed at the same time ;
[20:53] <ananke> great :)
[21:02] <ananke> rharper: are systemd logs uniform across distros? as in, would 'Reached target Network is Online.' be different from 'Reached target Network.'?
[21:04] <ananke> hmm, it appears it is different. I'll have to dig into this deeper
[21:06] <rharper> there are 3 targets
[21:06] <rharper> network-pre, network, and network-online
[21:07] <rharper> cloud-init runs before network.target since it writes out networking config; then networking.service / NetworkManager.service run to bring the network online; once they are active (and the *-wait-online services complete), network-online.target is reached
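One detail worth noting for the "inactive (dead)" state shown above: network-online.target is a passive target, so it is only reached if some other unit pulls it in. A minimal illustration of a unit that does so (generic systemd semantics, not taken from cloud-init's packaging):

```ini
[Unit]
Description=Example service that should start only once the network is up
# Wants= pulls network-online.target into the boot transaction;
# After= orders this unit behind it.
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/usr/bin/true
```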
[21:07] <rharper> ananke: I'm not having luck with those images ...  it's asking me to subscribe to the kali marketplace and I can't do that ...
[21:08] <ananke> yeah, I was afraid of that. Even though they're free, they are still on the marketplace. I am looking at your last notes, those are very astute observations
[21:09] <ananke> though one might wonder why it is taking 40 seconds to resolve an IP-based URL
[21:09] <rharper> I think DNS is not working, likely related to how resolv.conf is configured
[21:11] <ananke> thank you, we'll dig into it and update the bug report with findings. that's very valuable information though, we were stumped why cloud-init was not finishing, despite the host being up and no major issues on the surface
[21:12] <rharper> I'll update the bug with the files I'm looking for
[21:12] <ananke> I have to tend to kids for a bit, since there is no school. thank you, and I'll be back in a bit
[21:12] <rharper> sure
[21:45] <blackboxsw> rharper: wrapped up a parametrized pytest that allows dropping a few other renderer unit tests
[21:45] <blackboxsw> there is obviously more refactoring that could be done, but net.renderer.select doesn't really rely on the distro type, just on available/not-available renderers
[21:46] <blackboxsw> so adding unit tests there doesn't really exercise the merged config rendered case here
[21:55] <rharper> blackboxsw: you mean the system-config policy  ?
[21:55] <rharper> it doesn't really matter where it comes from; we already know that system config is merged over the default policy;
[21:57] <rharper> blackboxsw: right, I didn't look hard enough; but I *think* in the ntp test case I've got one that loads config/cloud-config.cfg.tmpl, renders it, and verifies its default behavior
[21:57] <rharper> we want something like that for renderer