faa | hello, does cloud-init support python 2.7 now? | 09:42 |
---|---|---|
faa | search in issues "Thanks for the pull request! cloud-init 19.4 was the last version of cloud-init to support Python 2" | 10:56 |
AnhVoMSFT | rharper powersj I am seeing ec2 prefix on ssh public key generation on Azure instance https://paste.ubuntu.com/p/QSkhKyPS2B/ - Is this expected? Looks like this is output of ssh-keygen? Is it possible to change that to platform name? | 13:28 |
Odd_Bloke | faa: That's correct, we dropped Python 2 support between 19.4 and 20.1. | 13:49 |
Odd_Bloke | AnhVoMSFT: I'm not sure where that string comes from, TBH; could you file a bug? | 13:53 |
Odd_Bloke | AnhVoMSFT: Oh, looks like we had a bug filed overnight: https://bugs.launchpad.net/bugs/1869277 | 14:10 |
ubot5 | Ubuntu bug 1869277 in cloud-init "azure vms with cloud-init display ec2 info in output" [Undecided,New] | 14:10 |
Odd_Bloke | Oh, hmph, we're seeing the integration test hang (and therefore timeout) issues we've seen intermittently before: https://travis-ci.org/github/canonical/cloud-init/jobs/667711759 | 14:14 |
Odd_Bloke | rharper: I remember we discussed ^ some, but did we reach a conclusion as to what we thought the problem was? | 14:14 |
rharper | Odd_Bloke: the most recent issues were blocked on snapd.seeded.service taking more than the 300 seconds we wait | 14:19 |
Odd_Bloke | rharper: These are xenial instances though, so I think we concluded that there aren't any snaps being seeded? | 14:28 |
Odd_Bloke | (I think we had this exact exchange last time too. :p) | 14:28 |
rharper | AnhVoMSFT: ec2[1850]: in the systemd journal, things get logged with the process's arg0 and pid number ... cloud-init execs are clearly marked cloudinit[PID] ... | 14:28 |
rharper | AnhVoMSFT: can you reproduce on stock RHEL7.7 or Centos7.7 images versus that OpenLogic image ? | 14:28 |
Odd_Bloke | rharper: You can reproduce it in an Ubuntu lxd. | 14:28 |
Odd_Bloke | (The "ec2" thing, I mean.) | 14:28 |
rharper | ? | 14:29 |
Odd_Bloke | In the xenial container I just launched to check snapd.seeded, I have: Mar 27 14:27:03 renewed-corgi ec2[653]: 1024 SHA256:x/P+1raytPVp9l5tcVB28yV48aTs6jG5OhR6Fw09ciM root@renewed-corgi (DSA) | 14:30 |
rharper | oh, interesting, focal does not | 14:30 |
Odd_Bloke | I'm currently getting ~100kB/s downloading an image locally. | 14:33 |
Odd_Bloke | So I bet that's what's causing our CI failures. | 14:33 |
rharper | Odd_Bloke: what's your build.info on that xenial image ? | 14:33 |
Odd_Bloke | 20191107 | 14:34 |
rharper | I have a vm launched yesterday serial: 20200320 | 14:34 |
rharper | dailys are fine | 14:34 |
rharper | huh | 14:34 |
rharper | I can see it in a container now as well | 14:35 |
rharper | not quite sure why it's not showing up in the VM | 14:35 |
Odd_Bloke | So the code uses util.multi_log, which has a comment about behaviour being different in containers. | 14:35 |
Odd_Bloke | https://github.com/canonical/cloud-init/blob/master/cloudinit/util.py#L515-L522 | 14:36 |
rharper | This file is part of cloud-init. See LICENSE file for license information. | 14:37 |
rharper | logger_opts="-p user.info -t ec2" | 14:37 |
rharper | /usr/lib/cloud-init/write-ssh-key-fingerprints | 14:38 |
rharper | it's been there since 2012 | 14:38 |
rharper | so that's nothing new | 14:38 |
AnhVoMSFT | yes I think it has been there for a long time. We just noticed it yesterday :-) | 14:38 |
rharper | hehe | 14:38 |
rharper | we can use the bug to adjust that to something else; I suspect we want to log as cloud-init instead of ec2 | 14:39 |
AnhVoMSFT | looking through google I think someone using OpenStack was also seeing it. It's basically there for all cloud-init users | 14:39 |
AnhVoMSFT | I think ec2 will need their own prefix. I saw some bug against it back then | 14:39 |
AnhVoMSFT | https://bugs.launchpad.net/ubuntu/+source/ec2-init/+bug/458576 | 14:40 |
ubot5 | Ubuntu bug 458576 in ec2-init (Ubuntu Karmic) "ec2: ssh public key fingerprint in console output does not match EC2 standards" [Low,Fix released] | 14:40 |
rharper | I guess someone was extracting the public keys from the serial console log ? | 14:40 |
rharper | and yes, ec2-init was cloud-init before it was cloud-init | 14:41 |
AnhVoMSFT | i think we let datasource provide their own prefix if they would like to (ec2 in this case). Otherwise it should just say cloud-init | 14:41 |
Odd_Bloke | It should just say cloud-init, I think. | 14:42 |
rharper | yeah | 14:43 |
Odd_Bloke | rharper: https://github.com/canonical/cloud-init/pull/287 <-- this will extend the time it takes before CI times out, which should work around the problem for now | 15:11 |
Odd_Bloke | It sounds like we're just getting caught in transatlantic link congestion that's out of Canonical's direct control. | 15:12 |
Odd_Bloke | I'd like us to investigate a couple of other things: (a) caching images between Travis runs, and (b) emitting progress information during the step that currently times out, so we don't need to use travis_wait. | 15:12 |
rharper | Yeah, wondering if we want to bump the timeouts in CI after checkout | 15:12 |
rharper | Odd_Bloke: I know lxd can be configured to point to other image servers ... I suspect if we were to cache them, we'd then need an image alias to ensure the image names we use are found in the configured image repo | 15:13 |
rharper | Odd_Bloke: we may want to chat with stgraber on travis lxd image caching | 15:13 |
Odd_Bloke | Perhaps; doesn't the test framework itself perform the download? | 15:14 |
rharper | yeah, I'll look at the implementation here in a second | 15:16 |
rharper | for bumping timeout, there's this setting: tests/cloud_tests/platforms.yaml:get_image_timeout: 300 ; we can sed that into whatever large value we want ... | 15:19 |
rharper | image caching is harder, we point to DEFAULT_SSTREAMS_SERVER = "https://images.linuxcontainers.org:8443"; I see that's also set in tests/cloud_tests/releases.yaml, lxd: sstreams_server ... so if we stood up a cache, we could point it there | 15:20 |
Odd_Bloke | rharper: That isn't the timeout we're hitting, I commented on the PR. | 15:25 |
Odd_Bloke | Ideally, I think we'd be able to identify the path that contains the cache that lxd/cloud-tests uses, and then ask Travis to just retain that between runs. | 15:26 |
Odd_Bloke | Using https://docs.travis-ci.com/user/caching/#arbitrary-directories | 15:27 |
blackboxsw | rharper: if you get a chance today, netplan prioritization branch needs your response. Should we write up a cloud_test for this? https://github.com/canonical/cloud-init/pull/267 | 15:34 |
rharper | blackboxsw: I saw your note, while I would like a cloud-test, we don't yet have a way to reboot between runs ... | 15:38 |
blackboxsw | ok will probably write up that multi-stage cloud_test | 15:44 |
powersj | blackboxsw, multi-stage? | 15:49 |
blackboxsw | powersj: I may peek at it more in a bit. the ifupdown test sort of requires us to cloud-init clean and re-render networking . so we may need an additional collect stage for that type of test. especially if we want to reboot and recollect after that boot | 15:50 |
powersj | blackboxsw, make a card for it and move on | 15:51 |
blackboxsw | rharper: https://github.com/canonical/cloud-init/pull/267 ok, I think since we know Focal needs netplan prioritization, let's land this, and we can revisit cloud_tests later for this once we've discussed with cpc/foundations what the plan is with handling ifupdown gaps | 17:25 |
rharper | blackboxsw: reviewed, looks good, I've suggested a unittest to add, then we can land. | 19:25 |
Odd_Bloke | https://github.com/canonical/cloud-init/pull/288 <-- small precursor PR for the mirror URL sanitisation work | 19:32 |
rharper | You need to leave a comment indicating the requested changes. | 19:39 |
rharper | I swear | 19:39 |
drag0nius | anyone tried running cloud-init on pre-baked raspbian in rpi? | 19:42 |
drag0nius | i want to know how to get started and where to put my cloud-configs into baked image | 19:42 |
blackboxsw | drag0nius: might that be in the #raspberrypi channel on freenode? | 19:45 |
blackboxsw | I know waveform in Canonical has been looking over raspi development using cloud-init. | 19:46 |
Odd_Bloke | Anyone else just getting a loading screen on https://travis-ci.org/ ? | 19:50 |
Odd_Bloke | rharper: I've moved that function, net makes more sense for sure. | 19:52 |
rharper | cool! | 19:52 |
Odd_Bloke | (And this is an example of why smaller code reviews are better; who knows if we'd have picked up on that if I proposed all my code at once.) | 19:53 |
rharper | yeah, thanks | 19:53 |
=== tds0 is now known as tds | ||
ananke | I'm finally having a few mins to look back at kali again. Tried their earlier AMI, and I'm stumped: the same issue. cloud-init status claims cloud-init is still running | 20:14 |
ananke | What's more baffling is the fact that 'analyze blame' shows close to a minute being spent on this: | 20:15 |
ananke | -- Boot Record 01 -- 51.20500s (init-local/search-Ec2Local) | 20:15 |
rharper | do you have the full cloud-init collect-logs tarball? | 20:15 |
ananke | I'm preparing them right now | 20:16 |
rharper | that looks like maybe networking isn't set up properly and it's timing out on the URL lookup | 20:16 |
rharper | but we'll know more for sure with logs | 20:16 |
rharper | is kali debian based or something else? | 20:17 |
ananke | I believe so | 20:17 |
ananke | we're feeding it an identical cloud-init config via user-data as we are ubuntu & centos targets, which makes it so baffling | 20:17 |
rharper | I doubt it's your user-data | 20:18 |
rharper | more likely issues with the OS and cloud-init integration ... | 20:18 |
rharper | if it's debian based, they may have a much older cloud-init | 20:18 |
ananke | 18.3, but we saw the same issue with cloud-init 19 on their 2020 AMI. thank you for your interest in this issue. what would be a good way to share the log tarball? | 20:19 |
ananke | with the covid19 pandemic and push to go online as much as possible, our workload has increased, as we provide an educational platform via AWS. fun, but busy | 20:24 |
rharper | https://bugs.launchpad.net/cloud-init/+filebug | 20:25 |
rharper | and attach the tarball | 20:25 |
blackboxsw | xenial daily build recipe fixed & rebuilt; working on a parametrized unit test for the netplan-over-eni priority PR | 20:26 |
rharper | blackboxsw: \o/ | 20:26 |
rharper | blackboxsw: heh, or you could paste the one I had ... I know it's begging for a pytest.mark.parametrize ... | 20:26 |
rharper | but .. landing focal features > pytest refactor IMO | 20:27 |
blackboxsw | rharper: fair point, I was only going to spend ~20 on it, shouldn't be long. how long til EOW?? | 20:27 |
rharper | heh, ok | 20:27 |
Odd_Bloke | Travis seems to be back now, but heads-up that I had to logout/login to get the UI to behave properly (it wasn't showing me canonical projects in the sidebar, and I didn't have the restart button on jobs). | 20:35 |
ananke | rharper: done, thank you! https://bugs.launchpad.net/cloud-init/+bug/1869430 | 20:36 |
ubot5 | Ubuntu bug 1869430 in cloud-init "cloud-init persists in running state on Kali in AWS" [Undecided,New] | 20:36 |
rharper | ananke: I'll take a look | 20:39 |
ananke | thanks! | 20:39 |
ananke | right now that's a bit of a problem for us, as the build stage runs cloud-init status --wait, so it never finishes | 20:41 |
rharper | yeah, that's the right thing to do; so let's see what's going on | 20:41 |
rharper | ananke: it appears that cloud-config.service has not yet started (nor final) so it's hanging ... cloud-config.service waits for systemd's 'network-online.target' --- in the journal file, I don't see the 'Reached target Network is Online' which is emitted when this happens. | 20:45 |
rharper | I'll update the bug ... do you have an ami id to reference? I suspect there are issues with the network service files | 20:46 |
ananke | sure thing, I can add the AMIs we've tried in the bug report | 20:48 |
ananke | ● network-online.target - Network is Online | 20:51 |
ananke | Loaded: loaded (/lib/systemd/system/network-online.target; static; vendor preset: disabled) | 20:51 |
ananke | Active: inactive (dead) | 20:51 |
ananke | interesting | 20:51 |
rharper | typically means nothing "asked" for it; in ubuntu we have a networking.service script which depends on it and should bring in the target | 20:52 |
rharper | I would think debian networking.service would as well | 20:52 |
rharper | however, kali has both ifupdown and network-manager installed | 20:52 |
ananke | yes, network-manager appears to be running, and it has ifup/ifdown | 20:53 |
rharper | cloud-init also has had some edge cases with ifupdown and nm installed at the same time ; | 20:53 |
ananke | great :) | 20:53 |
ananke | rharper: are systemd logs uniform across distros? as in, would 'Reached target Network is Online.' be different from 'Reached target Network.'? | 21:02 |
ananke | hmm, it appears it is. I'll have to dig into this deeper | 21:04 |
rharper | there are 3 targets | 21:06 |
rharper | network-pre, network, and network-online | 21:06 |
rharper | cloud-init runs before network.target as it writes out networking config, and then networking.service / nm.service run to bring the network online; once they are active (and the -wait-online.service units complete) then the network-online.target is reached | 21:07 |
rharper | ananke: I'm not having luck with those images ... it's asking me to subscribe to the kali marketplace and I can't do that ... | 21:07 |
ananke | yeah, I was afraid of that. Even though they're free, they are still on the marketplace. I am looking at your last notes, those are very astute observations | 21:08 |
ananke | though one might wonder why it is taking 40 seconds to resolve an IP-based URL | 21:09 |
rharper | I think dns is not working, likely related to how resolv.conf is configured | 21:09 |
ananke | thank you, we'll dig into it and update the bug report with findings. that's very valuable information though, we were stumped why cloud-init was not finishing, despite the host being up and no major issues on the surface | 21:11 |
rharper | I'll update the bug with the files I'm looking for | 21:12 |
ananke | I have to tend to kids for a bit, since there is no school. thank you, and I'll be back in a bit | 21:12 |
rharper | sure | 21:12 |
blackboxsw | rharper: wrapped up a parametrized pytest that allows dropping a few other renderer unit tests | 21:45 |
blackboxsw | there obviously is more refactor that could be had, but net.renderer.select doesn't really rely on the distro type, just available/not-available renderers | 21:45 |
blackboxsw | so adding unit tests there doesn't really exercise the merged config rendered case here | 21:46 |
rharper | blackboxsw: you mean the system-config policy ? | 21:55 |
rharper | it doesn't really matter where it comes from; we already know that system config is merged over the default policy; | 21:55 |
rharper | blackboxsw: right, I didn't look hard enough; but I *think* in the ntp test-case I've got one that loads the config/cloud-config.cfg.tmpl, renders it and verifies its default behavior | 21:57 |
rharper | we want something like that for renderer | 21:57 |
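
On the question raised earlier in the log about 'Reached target Network is Online.' versus 'Reached target Network.': a plain substring search for the shorter message also matches the longer one, which makes it easy to misread a journal. The following is a minimal Python sketch, not part of cloud-init, that matches each message exactly at the end of a journal line. The function name, the sample hostname, and the timestamps are made up for illustration, and the exact message wording may differ between systemd versions and distros.

```python
# Message strings as quoted in the discussion; wording may vary by systemd version.
TARGET_MESSAGES = {
    "network.target": "Reached target Network.",
    "network-online.target": "Reached target Network is Online.",
}


def reached_targets(journal_lines):
    """Return the set of targets whose 'Reached target' message ends a line.

    Matching on the end of the line keeps 'Reached target Network.' from
    being confused with 'Reached target Network is Online.'.
    """
    reached = set()
    for line in journal_lines:
        stripped = line.rstrip()
        for target, message in TARGET_MESSAGES.items():
            if stripped.endswith(message):
                reached.add(target)
    return reached


if __name__ == "__main__":
    # Hypothetical journal excerpt: network.target is reached, network-online is not.
    sample = [
        "Mar 27 20:10:01 examplehost systemd[1]: Reached target Network.",
        "Mar 27 20:10:02 examplehost systemd[1]: Started Example Service.",
    ]
    found = reached_targets(sample)
    for target in TARGET_MESSAGES:
        print(target, "reached" if target in found else "NOT reached")
```

Feeding it the output of `journalctl -b` (one line per list element) would show whether network-online.target was ever reached during that boot, which is the condition cloud-config.service is described as waiting on.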