[09:42] hello, does cloud-init now support python 2.7?
[10:56] search in the issues: "Thanks for the pull request! cloud-init 19.4 was the last version of cloud-init to support Python 2"
[13:28] rharper powersj I am seeing an ec2 prefix on ssh public key generation on an Azure instance https://paste.ubuntu.com/p/QSkhKyPS2B/ - is this expected? Looks like this is the output of ssh-keygen? Is it possible to change that to the platform name?
[13:49] faa: That's correct, we dropped Python 2 support between 19.4 and 20.1.
[13:53] AnhVoMSFT: I'm not sure where that string comes from, TBH; could you file a bug?
[14:10] AnhVoMSFT: Oh, looks like we had a bug filed overnight: https://bugs.launchpad.net/bugs/1869277
[14:10] Ubuntu bug 1869277 in cloud-init "azure vms with cloud-init display ec2 info in output" [Undecided,New]
[14:14] Oh, hmph, we're seeing the integration test hang (and therefore timeout) issues we've seen intermittently before: https://travis-ci.org/github/canonical/cloud-init/jobs/667711759
[14:14] rharper: I remember we discussed ^ some, but did we reach a conclusion as to what we thought the problem was?
[14:19] Odd_Bloke: the most recent issues were blocked on snapd.seeded.service taking more than the 300 seconds we wait
[14:28] rharper: These are xenial instances though, so I think we concluded that there aren't any snaps being seeded?
[14:28] (I think we had this exact exchange last time too. :p)
[14:28] AnhVoMSFT: ec2[1850] - in the systemd journal, things get logged with arg0 and the pid number ... cloud-init execs are clearly marked cloudinit[PID] ...
[14:28] AnhVoMSFT: can you reproduce on stock RHEL 7.7 or CentOS 7.7 images versus that OpenLogic image?
[14:28] rharper: You can reproduce it in an Ubuntu lxd.
[14:28] (The "ec2" thing, I mean.)
[14:29] ?
[14:30] In the xenial container I just launched to check snapd.seeded, I have: Mar 27 14:27:03 renewed-corgi ec2[653]: 1024 SHA256:x/P+1raytPVp9l5tcVB28yV48aTs6jG5OhR6Fw09ciM root@renewed-corgi (DSA)
[14:30] oh, interesting, focal does not
[14:33] I'm currently getting ~100kB/s downloading an image locally.
[14:33] So I bet that's what's causing our CI failures.
[14:33] Odd_Bloke: what's your build.info on that xenial image?
[14:34] 20191107
[14:34] I have a vm launched yesterday, serial: 20200320
[14:34] dailies are fine
[14:34] huh
[14:35] I can see it in a container now as well
[14:35] not quite sure why it's not showing up in the VM
[14:35] So the code uses util.multi_log, which has a comment about behaviour being different in containers.
[14:36] https://github.com/canonical/cloud-init/blob/master/cloudinit/util.py#L515-L522
[14:37] This file is part of cloud-init. See LICENSE file for license information.
[14:37] logger_opts="-p user.info -t ec2"
[14:38] /usr/lib/cloud-init/write-ssh-key-fingerprints
[14:38] it's been there since 2012
[14:38] so that's nothing new
[14:38] yes I think it has been there for a long time. We just noticed it yesterday :-)
[14:38] hehe
[14:39] we can use the bug to adjust that to something else; I suspect we want to log as cloud-init instead of ec2
[14:39] looking through google I think someone using OpenStack was also seeing it. It's basically there for all cloud-init users
[14:39] I think ec2 will need their own prefix. I saw some bug against it back then
[14:40] https://bugs.launchpad.net/ubuntu/+source/ec2-init/+bug/458576
[14:40] Ubuntu bug 458576 in ec2-init (Ubuntu Karmic) "ec2: ssh public key fingerprint in console output does not match EC2 standards" [Low,Fix released]
[14:40] I guess someone was extracting the public keys from the serial console log?
[14:41] and yes, ec2-init was cloud-init before it was cloud-init
[14:41] i think we let datasources provide their own prefix if they would like to (ec2 in this case). Otherwise it should just say cloud-init
[14:42] It should just say cloud-init, I think.
[14:43] yeah
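[Editor's note: the ec2[PID] journal prefix discussed above comes from the syslog tag that the fingerprint helper passes to logger via -t. A minimal sketch of that pattern follows; the loop body is a simplification for illustration, not the actual contents of /usr/lib/cloud-init/write-ssh-key-fingerprints:]

    #!/bin/sh
    # The journal shows entries as ec2[PID] because of the -t (tag) option;
    # switching the tag to cloud-init would change the prefix as proposed above.
    logger_opts="-p user.info -t ec2"    # proposed: -t cloud-init

    # Log each host key fingerprint (ssh-keygen -l output) with that tag.
    for key in /etc/ssh/ssh_host_*_key.pub; do
        [ -f "$key" ] || continue
        ssh-keygen -l -f "$key" | logger $logger_opts
    done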
[15:11] rharper: https://github.com/canonical/cloud-init/pull/287 <-- this will extend the time it takes before CI times out, which should work around the problem for now
[15:12] It sounds like we're just getting caught in transatlantic link congestion that's out of Canonical's direct control.
[15:12] I'd like us to investigate a couple of other things: (a) caching images between Travis runs, and (b) emitting progress information during the step that currently times out, so we don't need to use travis_wait.
[15:12] Yeah, wondering if we want to bump the timeouts in CI after checkout
[15:13] Odd_Bloke: I know lxd can be configured to point to other image servers ... I suspect if we were to cache them, then we'd need an image alias to ensure the image names we use are found in the configured image repo
[15:13] Odd_Bloke: we may want to chat with stgraber on travis lxd image caching
[15:14] Perhaps; doesn't the test framework itself perform the download?
[15:16] yeah, I'll look at the implementation here in a second
[15:19] for bumping the timeout, there's this setting: tests/cloud_tests/platforms.yaml:get_image_timeout: 300 ; we can sed that into whatever large value we want ...
[15:20] image caching is harder, we point to DEFAULT_SSTREAMS_SERVER = "https://images.linuxcontainers.org:8443" ; I see that's also set in tests/cloud_tests/releases.yaml, lxd: sstreams_server ... so if we stood up a cache, we could point it there
[15:25] rharper: That isn't the timeout we're hitting, I commented on the PR.
[15:26] Ideally, I think we'd be able to identify the path that contains the cache that lxd/cloud-tests uses, and then ask Travis to just retain that between runs.
[15:27] Using https://docs.travis-ci.com/user/caching/#arbitrary-directories
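[Editor's note: the arbitrary-directory caching linked above is configured in .travis.yml. A minimal sketch, assuming the lxd/cloud_tests image cache path has been identified; the directory below is a placeholder, not the real location:]

    cache:
      directories:
        # Placeholder: replace with the actual directory the lxd/cloud_tests
        # platform uses to cache downloaded images.
        - $HOME/.cache/cloud_tests_images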
[15:34] rharper: if you get a chance today, the netplan prioritization branch needs your response. Should we write up a cloud_test for this? https://github.com/canonical/cloud-init/pull/267
[15:38] blackboxsw: I saw your note; while I would like a cloud-test, we don't yet have a way to reboot between runs ...
[15:44] ok will probably write up that multi-stage cloud_test
[15:49] blackboxsw, multi-stage?
[15:50] powersj: I may peek at it more in a bit. the ifupdown test sort of requires us to cloud-init clean and re-render networking, so we may need an additional collect stage for that type of test, especially if we want to reboot and recollect after that boot
[15:51] blackboxsw, make a card for it and move on
[17:25] rharper: https://github.com/canonical/cloud-init/pull/267 ok, I think since we know Focal needs netplan prioritization, let's land this, and we can revisit cloud_tests for this later once we've discussed with cpc/foundations what the plan is for handling ifupdown gaps
[19:25] blackboxsw: reviewed, looks good, I've suggested a unittest to add, then we can land.
[19:32] https://github.com/canonical/cloud-init/pull/288 <-- small precursor PR for the mirror URL sanitisation work
[19:39] You need to leave a comment indicating the requested changes.
[19:39] I swear
[19:42] anyone tried running cloud-init on pre-baked raspbian on an rpi?
[19:42] i want to know how to get started and where to put my cloud-configs into the baked image
[19:45] drag0nius: might that be in the #raspberrypi channel on freenode?
[19:46] I know waveform at Canonical has been looking over raspi development using cloud-init.
[19:50] Anyone else just getting a loading screen on https://travis-ci.org/ ?
[19:52] rharper: I've moved that function, net makes more sense for sure.
[19:52] cool!
[19:53] (And this is an example of why smaller code reviews are better; who knows if we'd have picked up on that if I proposed all my code at once.)
[19:53] yeah, thanks
=== tds0 is now known as tds
[20:14] I'm finally having a few mins to look back at kali again. Tried their earlier AMI, and I'm stumped: the same issue. cloud-init status claims cloud-init is still running
[20:15] What's more baffling is the fact that 'analyze blame' shows close to a minute being spent on this:
[20:15] -- Boot Record 01 -- 51.20500s (init-local/search-Ec2Local)
[20:15] do you have the full cloud-init collect-logs tarball?
[20:16] I'm preparing them right now
[20:16] that looks like maybe networking isn't set up properly and it's timing out on the URL lookup
[20:16] but we'll know more for sure with logs
[20:17] is kali debian based or something else?
[20:17] I believe so
[20:17] we're feeding it an identical cloud-init config via user-data as we are ubuntu & centos targets, which makes it so baffling
[20:18] I doubt it's your user-data
[20:18] more likely issues with the OS and cloud-init integration ...
[20:18] if it's debian based, they may have a much older cloud-init
[20:19] 18.3, but we saw the same issue with cloud-init 19 on their 2020 AMI. thank you for your interest in this issue. what would be a good way to share the log tarball?
[20:24] with the covid19 pandemic and the push to go online as much as possible, our workload has increased, as we provide an educational platform via AWS. fun, but busy
[20:25] https://bugs.launchpad.net/cloud-init/+filebug
[20:25] and attach the tarball
[20:26] xenial daily build recipe fixed & rebuilt; working on a parameterized unit test for the netplan-over-eni priority pr
[20:26] blackboxsw: \o/
[20:26] blackboxsw: heh, or you could paste the one I had ... I know it's begging for a pytest.mark.parametrize ...
[20:27] but .. landing focal features > pytest refactor IMO
[20:27] rharper: fair point, I was only going to spend ~20 on it; shouldn't be long. how long til EOW??
[20:27] heh, ok
[20:35] Travis seems to be back now, but heads-up that I had to logout/login to get the UI to behave properly (it wasn't showing me canonical projects in the sidebar, and I didn't have the restart button on jobs).
[20:36] rharper: done, thank you! https://bugs.launchpad.net/cloud-init/+bug/1869430
[20:36] Ubuntu bug 1869430 in cloud-init "cloud-init persists in running state on Kali in AWS" [Undecided,New]
[20:39] ananke: I'll take a look
[20:39] thanks!
[20:41] right now that's a bit of a problem for us, as the build stage runs cloud-init status --wait, so it never finishes
[20:41] yeah, that's the right thing to do; so let's see what's going on
[20:45] ananke: it appears that cloud-config.service has not yet started (nor final), so it's hanging ... cloud-config.service waits for systemd's 'network-online.target' --- in the journal file, I don't see the 'Reached target Network is Online' which is emitted when this happens.
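[Editor's note: a short sketch of commands that can confirm this kind of hang on the affected instance, assuming a systemd-based image like the Kali AMIs discussed:]

    # Show which cloud-init stages have completed and where boot time went.
    cloud-init status --long
    cloud-init analyze blame

    # cloud-config.service orders after network-online.target, so check whether
    # that target was ever reached and what (if anything) pulls it in.
    systemctl status cloud-config.service network-online.target
    systemctl list-dependencies --reverse network-online.target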
[20:46] I'll update the bug ... do you have an ami id to reference? I suspect there are issues with the network service files
[20:48] sure thing, I can add the AMIs we've tried to the bug report
[20:51] ● network-online.target - Network is Online
[20:51]   Loaded: loaded (/lib/systemd/system/network-online.target; static; vendor preset: disabled)
[20:51]   Active: inactive (dead)
[20:51] interesting
[20:52] typically means nothing "asked" for it; in ubuntu we have a networking.service script which depends on it and should bring in the target
[20:52] I would think debian's networking.service would as well
[20:52] however, kali has both ifupdown and network-manager installed
[20:53] yes, network-manager appears to be running, and it has ifup/ifdown
[20:53] cloud-init also has had some edge cases with ifupdown and nm installed at the same time ;
[20:53] great :)
[21:02] rharper: are systemd logs uniform across distros? as in, would 'Reached target Network is Online.' be different from 'Reached target Network.'?
[21:04] hmm, it appears it is. I'll have to dig into this deeper
[21:06] there are 3 targets
[21:06] network-pre, network, and network-online
[21:07] cloud-init runs before network.target as it writes out networking config, and then networking.service / nm.service run to bring the network online; once they are active (and the -wait-online services complete) then the network-online.target is reached
[21:07] ananke: I'm not having luck with those images ... it's asking me to subscribe to the kali marketplace and I can't do that ...
[21:08] yeah, I was afraid of that. Even though they're free, they are still on the marketplace. I am looking at your last notes, those are very astute observations
[21:09] though one might wonder why it is taking 40 seconds to resolve an IP-based URL
[21:09] I think dns is not working, likely related to how resolv.conf is configured
[21:11] thank you, we'll dig into it and update the bug report with findings. that's very valuable information though, we were stumped why cloud-init was not finishing, despite the host being up and no major issues on the surface
[21:12] I'll update the bug with the files I'm looking for
[21:12] I have to tend to the kids for a bit, since there is no school. thank you, and I'll be back in a bit
[21:12] sure
[21:45] rharper: wrapped up a parametrized pytest that allows dropping a few other renderer unit tests
[21:45] there obviously is more refactoring that could be had, but net.renderer.select doesn't really rely on the distro type, just available/not-available renderers
[21:46] so adding unit tests there doesn't really exercise the merged config rendered case here
[21:55] blackboxsw: you mean the system-config policy ?
[21:55] it doesn't really matter where it comes from; we already know that system config is merged over the default policy;
[21:57] blackboxsw: right, I didn't look hard enough; but I *think* in the ntp test-case I've got one that loads config/cloud-config.cfg.tmpl, renders it and verifies its default behavior
[21:57] we want something like that for the renderer
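[Editor's note: a minimal, self-contained sketch of the pytest.mark.parametrize pattern being discussed. The select_renderer stand-in below is hypothetical; it only models "first available renderer from a priority list wins" and is not cloud-init's actual renderer-selection code:]

    import pytest

    def select_renderer(priority, available):
        """Hypothetical stand-in: return the first renderer name in `priority`
        that is present in the set of available renderers."""
        for name in priority:
            if name in available:
                return name
        raise RuntimeError("no renderer available")

    @pytest.mark.parametrize(
        "priority,available,expected",
        [
            # netplan preferred and available -> netplan wins
            (["netplan", "eni"], {"netplan", "eni"}, "netplan"),
            # netplan preferred but only eni available -> fall back to eni
            (["netplan", "eni"], {"eni"}, "eni"),
            # priority order reversed -> eni wins even with both available
            (["eni", "netplan"], {"netplan", "eni"}, "eni"),
        ],
    )
    def test_first_available_renderer_is_selected(priority, available, expected):
        assert select_renderer(priority, available) == expected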