[06:42] <cpaelzer> hi cloud-init party people, I have a system that is maas deployed
[06:42] <cpaelzer> on a reboot I know see that it seems to have trouble communicating
[06:42] <cpaelzer> due to that boot is super slow
[06:42] <cpaelzer> I get on the SOL console a bunch of "failed posting event"
[06:42] <cpaelzer> and that slows down the network init "a lot"
[06:43] <cpaelzer> in fact - I assume due to timeouts - I see a one minute delay between every action
[06:43] <cpaelzer> example:
[06:43] <cpaelzer> [  316.066379] cloud-init[1075]: 2022-03-29 06:33:32,161 - handlers.py[WARNING]: failed posting event: finish: init-network/activate-datasource: SUCCESS: activating datasource
[06:43] <cpaelzer> [  376.131293] cloud-init[1075]: 2022-03-29 06:34:32,226 - handlers.py[WARNING]: failed posting event: start: init-network/config-migrator: running config-migrator with frequency always
[06:43] <cpaelzer> [  436.243233] cloud-init[1075]: 2022-03-29 06:35:32,338 - handlers.py[WARNING]: failed posting event: start: init-network/config-write-files: running config-write-files with frequency once-per-instance
[06:43] <cpaelzer> [  496.317115] cloud-init[1075]: 2022-03-29 06:36:32,412 - handlers.py[WARNING]: failed posting event: start: init-network/config-growpart: running config-growpart with frequency always
[06:43] <cpaelzer> [  556.477652] cloud-init[1075]: 2022-03-29 06:37:32,572 - handlers.py[WARNING]: failed posting event: finish: init-network/config-resizefs: SUCCESS: config-resizefs ran successfully
[06:43] <cpaelzer> [  616.581319] cloud-init[1075]: 2022-03-29 06:38:32,676 - handlers.py[WARNING]: failed posting event: start: init-network/config-update_hostname: running config-update_hostname with frequency always
[06:43] <cpaelzer> [  676.673728] cloud-init[1075]: 2022-03-29 06:39:32,768 - handlers.py[WARNING]: failed posting event: finish: init-network/config-update_etc_hosts: SUCCESS: config-update_etc_hosts ran successfully
[06:43] <cpaelzer> [  736.753728] cloud-init[1075]: 2022-03-29 06:40:32,848 - handlers.py[WARNING]: failed posting event: finish: init-network/config-rsyslog: SUCCESS: config-rsyslog previously ran
[06:43] <cpaelzer> [  796.809045] cloud-init[1075]: 2022-03-29 06:41:32,904 - handlers.py[WARNING]: failed posting event: finish: init-network/config-users-groups: SUCCESS: config-users-groups previously ran
[06:43] <cpaelzer> [  856.881363] cloud-init[1075]: 2022-03-29 06:42:32,976 - handlers.py[WARNING]: failed posting event: finish: init-network: SUCCESS: searching for network datasources
[06:43] <cpaelzer> once finished the rest of the system is happy
[06:44] <cpaelzer> I'm clear that this is already an error path as it can't post the events - so slowness is due to the underlying issue whatever it will be
[06:44] <cpaelzer> two questions for this
[06:44] <cpaelzer> 1. what would be the best place to start looking why it fails to post these?
[06:45] <cpaelzer> 2. should we make the error path somewhat less-waiting? For example start with the 60 second timeout that we seem to have, but then over time dimish that to 60 -> 45 -> 30 -> 15 -> 5 seconds so that the bad-path isn't "that slow" ?
[06:45] <cpaelzer> holmanb: falcojr: blackboxsw: ^^ ?
[11:26] <minimal> cpaelzer: Hi. Which version of cloud-init are you using? have you defined any reporting handlers? which DataSource are you using?
[12:06] <cpaelzer> hi minimal (and anyone else reading this later)
[12:06] <cpaelzer> as I mentioned this is a maas deployed system, so as I'd expect it uses DataSourceMAAS
[12:07] <cpaelzer> 2021-11-25 10:52:05,249 - stages.py[INFO]: Loaded datasource DataSourceMAAS - DataSourceMAAS [http://10-245-168-0--21.maas-internal:5248/MAAS/metadata/]
[12:07] <cpaelzer> version is: 22.1-14-g2e17a0d6-0ubuntu1~20.04.3
[12:07] <cpaelzer> maas defines reporting handlers AFAIK, I'd need to look what exactly it had set up
[12:11] <cpaelzer> I see the url_helper to try to open a connection
[12:11] <cpaelzer> that is when the 60 second timeout happens
[12:12] <cpaelzer> and once the 60 sec are done I get the "failed posting event"
[12:13] <cpaelzer> url_helper.py[DEBUG]: [0/1] open 'http://10-245-168-0--21.maas-internal:5248/MAAS/metadata/status/wmy6y6' with {'url': 'http://10-245-168-0--21.maas-internal:5248/MAAS/metadata/status/wmy6y6', 'allow_redirects': True, 'method': 'POST', 'headers': {'Authorization':  <keys/tokens removed>}} configuration
[13:01] <minimal> cpaelzer: ok, I misread "maas deployed" as a typo to "mass deployed"
[13:02] <cpaelzer> I see :-)
[13:02] <minimal> I'm suspecting that it is trying to post events before a network is actually up (and so such posts fail)
[13:03] <minimal> you'll notice all the warnings you posted mentioned "init-network"
[13:13] <cpaelzer> minimal: oh I have seen it fail later as well
[13:14] <cpaelzer> e.g. here from modile-final (that is later)
[13:14] <minimal> cpaelzer: ok, my point is that a request cannot be made to a HTTP url without a working network connection
[13:14] <cpaelzer> Mar 29 09:20:55 node-horsea cloud-init[5577]: Cloud-init v. 22.1-14-g2e17a0d6-0ubuntu1~20.04.3 finished at Tue, 29 Mar 2022 09:19:55 +0000. Datasource DataSourceMAAS [http://10-245-168-0--21>
[13:14] <cpaelzer> Mar 29 09:20:55 node-horsea cloud-init[5577]: 2022-03-29 09:20:55,962 - handlers.py[WARNING]: failed posting event: finish: modules-final/config-power-state-change: SUCCESS: config-power-sta>
[13:14] <cpaelzer> Mar 29 09:21:56 node-horsea cloud-init[5577]: 2022-03-29 09:21:56,031 - handlers.py[WARNING]: failed posting event: finish: modules-final: SUCCESS: running modules for final
[13:16] <minimal> so that would explain the earlier warnings
[13:16] <cpaelzer> maybe there is a way I can re-create a valid request out of that url_helper.py[DEBUG] log entry?
[13:16] <cpaelzer> I could then issue that "now" and see if we see e.g. failed-auth, or routing or ...
[13:18] <minimal> I see no reference to port 5248 in the MAAS cloud-init DataSource code and so assume that this url is somehow added to the cloud-init config (i.e. /etc/cloud/cloud.cfg or file in /etc/cloud/cloud.cfg.d/)
[13:19] <minimal> so it seems like its a MAAS-specific configuration issue (of the VM image you are using) rather than than a cloud-init one
[13:21] <falcojr> cpaelzer: https://bugs.launchpad.net/cloud-init/+bug/1910552
[13:21] <falcojr> IIRC, maas has a configuration to automatically turn the reporting behavior on
[13:22] <falcojr> in the near future, I'd like to make this reporting behavior off the main thread as to not block other things
[13:24] <cpaelzer> falcojr: so cloud-init would boot faster again then, and the reportes will (in a different thread) try to submit as they do now?
[13:25] <falcojr> cpaelzer: yes
[13:25] <cpaelzer> falcojr: have you seen jerzy asking if they could set "only report on initial boot" somehow?
[13:25] <cpaelzer> because only on deploy maas will listen (and care)
[13:26] <cpaelzer> are they asking how to do, or do they know and plan to add this in 3.3.0
[13:27] <falcojr> cpaelzer: I took it to mean they know and are planning to do it on the maas side
[13:29] <falcojr> cpaelzer: on a maas instance, at /etc/cloud/cloud.cfg.d/90_dpkg_local_cloud_config.cfg, I get
[13:29] <falcojr> # written by cloud-init debian package per preseed entry
[13:29] <falcojr> # cloud-init/local-cloud-config
[13:29] <falcojr> manage_etc_hosts: true
[13:29] <falcojr> manual_cache_clean: true
[13:29] <falcojr> reporting:
[13:29] <falcojr>   maas:
[13:29] <falcojr>     consumer_key: FBq7akSpdqmhesRNQj
[13:29] <falcojr>     endpoint: http://10-10-10-0--24.maas-internal:5248/MAAS/metadata/status/pwcbwg
[13:29] <falcojr>     token_key: SD6q9qBqWaRxkhCM3r
[13:29] <falcojr>     token_secret: cU5ktnQvnPLEqaEVyJHNrcRhzANtCfhF
[13:29] <falcojr>     type: webhook
[13:30] <falcojr> (that's on an internal VM so not concerned about secrets)
[13:33] <minimal> falcojr: btw when I was looking for docs on c-i reporting in general the only docs as such I could find was doc/examples/cloud-config-reporting.txt
[13:33] <minimal> there's no real info (apart from the source code) into the various reporting dests
[13:34] <falcojr> minimal: yeah, we definitely need to add some documentation around it
[13:34] <minimal> falcojr: e.g. I didn't know there was Azure/Hyper-V reporting until I noticed it in the source
[13:36] <minimal> wondering if that would give similar error as the relevant hyperv/azure kernel module and daemon might not be ready before c-i would try to talk to it
[13:41] <falcojr> don't know the specifics of that off the top of my head, but I imagine it would be the same as long as there's some kind of timeout to the request
[13:45] <minimal> falcojr: I'm working on a OS image for Azure currently and at present I don't start the hv_kvp daemon until the default runlevel whereas c-i (e.g. cloud-init-local) obviously starts earlier
[19:14] <blackboxsw> holmanb: https://github.com/canonical/cloud-init/pull/1357 because tox -e do_format no longer works locally due to https://github.com/psf/black/issues/2964
[19:16] <blackboxsw> Now that we've completed a successful SRU of cloud-init with you driving the downstream ubuntu release,I think it's time we close out on commit bits for the project per https://discourse.ubuntu.com/t/commit-rights-to-cloud-init-for-brett-holman/26271.
[19:17] <blackboxsw> we'll cobble up an email to the mailing list announcing commit rights and hope to get you owning PR review and merging permissions in short order.
[19:18] <blackboxsw> the PR above might be a simple one we can go through to test commit bit access
[19:31] <blackboxsw> json schema-wise this PR is fairly straight-forward https://github.com/canonical/cloud-init/pull/1358 though I probably should consolidate rtd/examples/cloud-config-mount-points into the meta schema examples for the module
[19:53] <holmanb> @blackboxsw: I can review 1358 once it passes ci :)
[19:55] <holmanb> blackboxsw: otherwise falcojr suggested #1311 sidechannel as a first candidate so I might do that first
[19:56] <blackboxsw> +1 on 1311. now that 1357 is landed. since you've already reviewed 1311 anyway.
[20:50] <holmanb> @blackboxsw: oops, I missed the comment about do_format earlier today
[20:50] <holmanb> @blackboxsw: will look into fixing that
[22:43] <blackboxsw> holmanb: no worries James merged it.
[22:44] <blackboxsw> and looks like you got the ds-identify LXD detection for VMS PR merged  good deal. I'd like us to generate and test a new-upstream-snapshot release to Jammy if we could this week. 
[22:49] <blackboxsw> as we have hit beta freeze and coming up on final freeze, it'd be good to get our content from cloud-init into this release to avoid 0-day SRU type situations.