ggoodman | Ahh, every time my computer idles, I lose all history from this channel! I tried googling to see if there are hosted archives and came up short. Anywhere I can read previous conversations? I'm not sure what I might have missed re: the hotplug spinning behaviour we hit last week. | 12:45 |
---|---|---|
ggoodman | To bring this into context, it appears that we picked up a newer version of cloud-init during recent AMI bakes based on Canonical's public ubuntu 18.04 image. As we rolled out the version of our service based on such AMIs we saw a significant and sustained increase in CPU and disk usage. We were able to eliminate code-level changes as contributing | 12:50 |
ggoodman | factors and happened to observe the `cloud-init` process spiking in CPU on affected hosts. Comparison of versions on affected and unaffected hosts identified that this tool had been updated. | 12:50 |
ggoodman | We also so that whereas on older hosts, the `/var/log/cloud-init.log` file did not change after initial boot, on affected hosts, this file was being written to in continuous bursts. | 12:51 |
ggoodman | Some Googling led us to the documentation about tweaking when networking config should be applied: https://cloudinit.readthedocs.io/en/latest/topics/events.html#apply-network-config-every-boot. We attempted this 'fix' on a quarantined host and observed that it did not resolve the issue. | 12:52 |
ggoodman | We were also unable to figure out how to pull in an older, unaffected version of cloud-init using `apt` as the version in use on unaffected hosts didn't appear to be available any longer. | 12:53 |
ggoodman | Our resolution was to tweak our AMI build tooling to pin to an older version of the base ubuntu AMI and then use `apt` to put a hold on `cloud-init`. As a result, we're unable to continue picking up upstream ubuntu base images until this issue is resolved. | 12:55 |
ggoodman | To put a visual on the impact across one deployment: https://imgur.com/a/4RJEt8E. The period where we were spiking to 100% was because the incremental CPU and disk load we were observing caused a bit of a cascade failure and required all hands on deck. The drop in load and variance thereof corresponds with when we finally got a working AMI built | 12:58 |
ggoodman | and shipped with `cloud-init` pinned to the earlier version. | 12:58 |
ggoodman | falcojr, it seemed like the linked logs confirmed some form of bug to you. If so, can you help me understand what you saw and how we should proceed to getting this properly reported? | 13:39 |
falcojr | ggoodman: I'll create the bug and link it here | 13:41 |
falcojr | to your IRC question, we have https://irclogs.ubuntu.com/ | 13:42 |
falcojr | for a brief summary, we added a new "hotplug" feature that allows a network device to be added to the system and cloud-init will detect it and automatically update the metadata and bring it up | 13:43 |
falcojr | the behavior is disabled by default, but for newly added devices, we still do a check to see if we should do anything | 13:44 |
falcojr | and the number of virtual interfaces generated by the docker containers is causing those checks to pile up and consume CPU | 13:45 |
ggoodman | Is there any bit of configuration that would allow us to disable even that behaviour? I think some of our platform folks were a bit worried about wiping out the udev rules as that would put is kind of outside the cloud-init 'happy path'. While I don't 100% agree with that risk-assessment, any other options would be very welcome! :D | 13:50 |
falcojr | ggoodman: Nothing I can think of at the moment. That udev rule is new to the latest release, and the fix will likely involve not writing the rule until we're sure we want the hotplug functionality enabled, rather than always writing it as we currently do | 13:55 |
ggoodman | Thanks falcojr. Makes sense. | 13:56 |
ggoodman | Do you happen to know if there's a (simple) way of installing a version prior to the introduction of the hotplug functionality via `apt`? | 13:57 |
falcojr | we don't have a simple way of getting back to 21.2, but 21.1 should still be in main. "apt install cloud-init=21.1-19-gbad84ad4-0ubuntu2" | 14:03 |
ggoodman | Thanks :D | 14:08 |
falcojr | ggoodman: https://bugs.launchpad.net/cloud-init/+bug/1946003 | 14:19 |
ubottu | Launchpad bug 1946003 in cloud-init "hotplug causing cloud-init to spike CPU usage" [Critical, Triaged] | 14:19 |
ggoodman | ❤️ | 14:21 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!