/srv/irclogs.ubuntu.com/2022/03/29/#cloud-init.txt

=== paride3 is now known as paride
=== cpaelzer_ is now known as cpaelzer
[06:42] <cpaelzer> hi cloud-init party people, I have a system that is maas deployed
[06:42] <cpaelzer> on a reboot I now see that it seems to have trouble communicating
[06:42] <cpaelzer> due to that boot is super slow
[06:42] <cpaelzer> I get on the SOL console a bunch of "failed posting event"
[06:42] <cpaelzer> and that slows down the network init "a lot"
[06:43] <cpaelzer> in fact - I assume due to timeouts - I see a one minute delay between every action
[06:43] <cpaelzer> example:
[06:43] <cpaelzer> [  316.066379] cloud-init[1075]: 2022-03-29 06:33:32,161 - handlers.py[WARNING]: failed posting event: finish: init-network/activate-datasource: SUCCESS: activating datasource
[06:43] <cpaelzer> [  376.131293] cloud-init[1075]: 2022-03-29 06:34:32,226 - handlers.py[WARNING]: failed posting event: start: init-network/config-migrator: running config-migrator with frequency always
[06:43] <cpaelzer> [  436.243233] cloud-init[1075]: 2022-03-29 06:35:32,338 - handlers.py[WARNING]: failed posting event: start: init-network/config-write-files: running config-write-files with frequency once-per-instance
[06:43] <cpaelzer> [  496.317115] cloud-init[1075]: 2022-03-29 06:36:32,412 - handlers.py[WARNING]: failed posting event: start: init-network/config-growpart: running config-growpart with frequency always
[06:43] <cpaelzer> [  556.477652] cloud-init[1075]: 2022-03-29 06:37:32,572 - handlers.py[WARNING]: failed posting event: finish: init-network/config-resizefs: SUCCESS: config-resizefs ran successfully
[06:43] <cpaelzer> [  616.581319] cloud-init[1075]: 2022-03-29 06:38:32,676 - handlers.py[WARNING]: failed posting event: start: init-network/config-update_hostname: running config-update_hostname with frequency always
[06:43] <cpaelzer> [  676.673728] cloud-init[1075]: 2022-03-29 06:39:32,768 - handlers.py[WARNING]: failed posting event: finish: init-network/config-update_etc_hosts: SUCCESS: config-update_etc_hosts ran successfully
[06:43] <cpaelzer> [  736.753728] cloud-init[1075]: 2022-03-29 06:40:32,848 - handlers.py[WARNING]: failed posting event: finish: init-network/config-rsyslog: SUCCESS: config-rsyslog previously ran
[06:43] <cpaelzer> [  796.809045] cloud-init[1075]: 2022-03-29 06:41:32,904 - handlers.py[WARNING]: failed posting event: finish: init-network/config-users-groups: SUCCESS: config-users-groups previously ran
[06:43] <cpaelzer> [  856.881363] cloud-init[1075]: 2022-03-29 06:42:32,976 - handlers.py[WARNING]: failed posting event: finish: init-network: SUCCESS: searching for network datasources
[06:43] <cpaelzer> once finished the rest of the system is happy
[06:44] <cpaelzer> I'm clear that this is already an error path as it can't post the events - so slowness is due to the underlying issue whatever it will be
[06:44] <cpaelzer> two questions for this
[06:44] <cpaelzer> 1. what would be the best place to start looking why it fails to post these?
[06:45] <cpaelzer> 2. should we make the error path somewhat less-waiting? For example start with the 60 second timeout that we seem to have, but then over time diminish that to 60 -> 45 -> 30 -> 15 -> 5 seconds so that the bad-path isn't "that slow" ?
[06:45] <cpaelzer> holmanb: falcojr: blackboxsw: ^^ ?
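
The shrinking timeout proposed in question 2 can be sketched as follows. This is a minimal illustration only, not cloud-init's reporting code: `post_all`, the `post` callable and the reset-on-success behaviour are assumptions made for the example; only the 60 -> 45 -> 30 -> 15 -> 5 schedule comes from the discussion.

```python
def post_all(events, post, schedule=(60, 45, 30, 15, 5)):
    """Post each event, shrinking the timeout after every consecutive failure.

    `post(event, timeout)` is a hypothetical callable that raises OSError
    (which includes TimeoutError) when the endpoint cannot be reached.
    Returns the list of timeouts that were used, in order.
    """
    failures = 0
    used = []
    for event in events:
        # Walk down the schedule on repeated failures; stay on the last value.
        timeout = schedule[min(failures, len(schedule) - 1)]
        used.append(timeout)
        try:
            post(event, timeout)
            failures = 0  # assumption: a success restores the full timeout
        except OSError:
            failures += 1
    return used
```

With an endpoint that never answers, ten events would wait 60 + 45 + 30 + 15 + 5 * 6 = 180 seconds in total instead of 600.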
=== FergusL2 is now known as FergusL
[11:26] <minimal> cpaelzer: Hi. Which version of cloud-init are you using? have you defined any reporting handlers? which DataSource are you using?
[12:06] <cpaelzer> hi minimal (and anyone else reading this later)
[12:06] <cpaelzer> as I mentioned this is a maas deployed system, so as I'd expect it uses DataSourceMAAS
[12:07] <cpaelzer> 2021-11-25 10:52:05,249 - stages.py[INFO]: Loaded datasource DataSourceMAAS - DataSourceMAAS [http://10-245-168-0--21.maas-internal:5248/MAAS/metadata/]
[12:07] <cpaelzer> version is: 22.1-14-g2e17a0d6-0ubuntu1~20.04.3
[12:07] <cpaelzer> maas defines reporting handlers AFAIK, I'd need to look what exactly it had set up
[12:11] <cpaelzer> I see the url_helper try to open a connection
[12:11] <cpaelzer> that is when the 60 second timeout happens
[12:12] <cpaelzer> and once the 60 sec are done I get the "failed posting event"
[12:13] <cpaelzer> url_helper.py[DEBUG]: [0/1] open 'http://10-245-168-0--21.maas-internal:5248/MAAS/metadata/status/wmy6y6' with {'url': 'http://10-245-168-0--21.maas-internal:5248/MAAS/metadata/status/wmy6y6', 'allow_redirects': True, 'method': 'POST', 'headers': {'Authorization':  <keys/tokens removed>}} configuration
[13:01] <minimal> cpaelzer: ok, I misread "maas deployed" as a typo for "mass deployed"
[13:02] <cpaelzer> I see :-)
[13:02] <minimal> I'm suspecting that it is trying to post events before a network is actually up (and so such posts fail)
[13:03] <minimal> you'll notice all the warnings you posted mentioned "init-network"
[13:13] <cpaelzer> minimal: oh I have seen it fail later as well
[13:14] <cpaelzer> e.g. here from modules-final (that is later)
[13:14] <minimal> cpaelzer: ok, my point is that a request cannot be made to an HTTP URL without a working network connection
[13:14] <cpaelzer> Mar 29 09:20:55 node-horsea cloud-init[5577]: Cloud-init v. 22.1-14-g2e17a0d6-0ubuntu1~20.04.3 finished at Tue, 29 Mar 2022 09:19:55 +0000. Datasource DataSourceMAAS [http://10-245-168-0--21>
[13:14] <cpaelzer> Mar 29 09:20:55 node-horsea cloud-init[5577]: 2022-03-29 09:20:55,962 - handlers.py[WARNING]: failed posting event: finish: modules-final/config-power-state-change: SUCCESS: config-power-sta>
[13:14] <cpaelzer> Mar 29 09:21:56 node-horsea cloud-init[5577]: 2022-03-29 09:21:56,031 - handlers.py[WARNING]: failed posting event: finish: modules-final: SUCCESS: running modules for final
[13:16] <minimal> so that would explain the earlier warnings
[13:16] <cpaelzer> maybe there is a way I can re-create a valid request out of that url_helper.py[DEBUG] log entry?
[13:16] <cpaelzer> I could then issue that "now" and see if we see e.g. failed-auth, or routing or ...
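
A first step towards the manual check described above could look like the sketch below. It is an assumption-laden illustration, not a replay of cloud-init's request: the Authorization header was removed from the log, so the hypothetical helper `probe_endpoint` only separates name-resolution and TCP-level failures (the "routing" case) and cannot detect a failed-auth response.

```python
import socket
from urllib.parse import urlsplit


def probe_endpoint(url, timeout=5.0):
    """Classify how far a connection attempt to `url` gets.

    Returns one of: "dns-failure", "connect-timeout", "connect-refused",
    "network-error" or "tcp-ok". No HTTP request is sent.
    """
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # Step 1: can the host name be resolved at all?
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    try:
        # Step 2: does anything accept a TCP connection on that port?
        with socket.create_connection((host, port), timeout=timeout):
            return "tcp-ok"
    except socket.timeout:
        return "connect-timeout"
    except ConnectionRefusedError:
        return "connect-refused"
    except OSError:
        return "network-error"
```

Calling it with the status URL from the url_helper.py[DEBUG] line and a short timeout would show whether the 60 second wait is spent in name resolution, in an unanswered connect, or somewhere after the TCP handshake.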
[13:18] <minimal> I see no reference to port 5248 in the MAAS cloud-init DataSource code and so assume that this url is somehow added to the cloud-init config (i.e. /etc/cloud/cloud.cfg or file in /etc/cloud/cloud.cfg.d/)
[13:19] <minimal> so it seems like it's a MAAS-specific configuration issue (of the VM image you are using) rather than a cloud-init one
[13:21] <falcojr> cpaelzer: https://bugs.launchpad.net/cloud-init/+bug/1910552
[13:21] <ubottu> Launchpad bug 1910552 in MAAS "machines fail to boot if MAAS doesn't respond to cloud-init" [Medium, Triaged]
[13:21] <falcojr> IIRC, maas has a configuration to automatically turn the reporting behavior on
[13:22] <falcojr> in the near future, I'd like to move this reporting behavior off the main thread so as not to block other things
[13:24] <cpaelzer> falcojr: so cloud-init would boot faster again then, and the reporters will (in a different thread) try to submit as they do now?
[13:25] <falcojr> cpaelzer: yes
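
The off-main-thread idea confirmed above can be sketched with a queue and a worker thread. This is a generic illustration under stated assumptions, not the planned cloud-init design: `BackgroundReporter` and the `post` callable are hypothetical names introduced for the example.

```python
import queue
import threading


class BackgroundReporter:
    """Queue events and post them from a worker thread.

    `post(event)` is a hypothetical blocking callable that raises OSError
    (which includes TimeoutError) when the endpoint does not answer.
    """

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self, post):
        self._post = post
        self._queue = queue.Queue()
        self.failed = []  # events that could not be delivered
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, event):
        # Returns immediately; the caller never waits on the network.
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            if event is self._STOP:
                break
            try:
                self._post(event)
            except OSError:
                self.failed.append(event)

    def close(self, timeout=None):
        # Let queued events drain, then stop the worker.
        self._queue.put(self._STOP)
        self._worker.join(timeout)
```

Boot stages would call `publish()` and carry on, while any 60 second timeouts are absorbed by the worker. Passing a `timeout` to `close()` bounds how long shutdown waits for events still in the queue.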
[13:25] <cpaelzer> falcojr: have you seen jerzy asking if they could set "only report on initial boot" somehow?
[13:25] <cpaelzer> because only on deploy maas will listen (and care)
[13:26] <cpaelzer> are they asking how to do it, or do they know and plan to add this in 3.3.0
[13:27] <falcojr> cpaelzer: I took it to mean they know and are planning to do it on the maas side
[13:29] <falcojr> cpaelzer: on a maas instance, at /etc/cloud/cloud.cfg.d/90_dpkg_local_cloud_config.cfg, I get
[13:29] <falcojr> # written by cloud-init debian package per preseed entry
[13:29] <falcojr> # cloud-init/local-cloud-config
[13:29] <falcojr> manage_etc_hosts: true
[13:29] <falcojr> manual_cache_clean: true
[13:29] <falcojr> reporting:
[13:29] <falcojr>   maas:
[13:29] <falcojr>     consumer_key: FBq7akSpdqmhesRNQj
[13:29] <falcojr>     endpoint: http://10-10-10-0--24.maas-internal:5248/MAAS/metadata/status/pwcbwg
[13:29] <falcojr>     token_key: SD6q9qBqWaRxkhCM3r
[13:29] <falcojr>     token_secret: cU5ktnQvnPLEqaEVyJHNrcRhzANtCfhF
[13:29] <falcojr>     type: webhook
[13:30] <falcojr> (that's on an internal VM so not concerned about secrets)
[13:33] <minimal> falcojr: btw when I was looking for docs on c-i reporting in general the only docs as such I could find was doc/examples/cloud-config-reporting.txt
[13:33] <minimal> there's no real info (apart from the source code) on the various reporting dests
[13:34] <falcojr> minimal: yeah, we definitely need to add some documentation around it
[13:34] <minimal> falcojr: e.g. I didn't know there was Azure/Hyper-V reporting until I noticed it in the source
[13:36] <minimal> wondering if that would give a similar error as the relevant hyperv/azure kernel module and daemon might not be ready before c-i would try to talk to it
[13:41] <falcojr> don't know the specifics of that off the top of my head, but I imagine it would be the same as long as there's some kind of timeout to the request
[13:45] <minimal> falcojr: I'm working on an OS image for Azure currently and at present I don't start the hv_kvp daemon until the default runlevel whereas c-i (e.g. cloud-init-local) obviously starts earlier
=== EugenMayer1 is now known as EugenMayer
[19:14] <blackboxsw> holmanb: https://github.com/canonical/cloud-init/pull/1357 because tox -e do_format no longer works locally due to https://github.com/psf/black/issues/2964
[19:14] <ubottu> Pull 1357 in canonical/cloud-init "black: bump pinned version to 22.3.0 to avoid click dependency issues" [Open]
[19:14] <ubottu> Issue 2964 in psf/black "Incompatible with click 8.1.0 (ImportError: cannot import name '_unicodefun' from 'click')" [Closed]
[19:16] <blackboxsw> Now that we've completed a successful SRU of cloud-init with you driving the downstream ubuntu release, I think it's time we close out on commit bits for the project per https://discourse.ubuntu.com/t/commit-rights-to-cloud-init-for-brett-holman/26271.
[19:17] <blackboxsw> we'll cobble up an email to the mailing list announcing commit rights and hope to get you owning PR review and merging permissions in short order.
[19:18] <blackboxsw> the PR above might be a simple one we can go through to test commit bit access
[19:31] <blackboxsw> json schema-wise this PR is fairly straight-forward https://github.com/canonical/cloud-init/pull/1358 though I probably should consolidate rtd/examples/cloud-config-mount-points into the meta schema examples for the module
[19:31] <ubottu> Pull 1358 in canonical/cloud-init "schema: add JSON schema for mcollective, migrator and mounts modules" [Open]
[19:53] <holmanb> @blackboxsw: I can review 1358 once it passes ci :)
[19:55] <holmanb> blackboxsw: otherwise falcojr suggested #1311 sidechannel as a first candidate so I might do that first
[19:56] <blackboxsw> +1 on 1311. now that 1357 is landed. since you've already reviewed 1311 anyway.
[20:50] <holmanb> @blackboxsw: oops, I missed the comment about do_format earlier today
[20:50] <holmanb> @blackboxsw: will look into fixing that
[22:43] <blackboxsw> holmanb: no worries James merged it.
[22:44] <blackboxsw> and looks like you got the ds-identify LXD detection for VMs PR merged, good deal. I'd like us to generate and test a new-upstream-snapshot release to Jammy if we could this week.
[22:49] <blackboxsw> as we have hit beta freeze and are coming up on final freeze, it'd be good to get our content from cloud-init into this release to avoid 0-day SRU type situations.
