rharper | falcojr: blackboxsw: for 1910552; I suspect that the issue is that the MAAS seed is network-based, so we do cache the ds such that we don't need to crawl the datasource on subsequent boots, *but*, MAAS configures a reporting endpoint, so each of the cloud-init messages that we send to the reporter results in an attempt to POST to an apparently dead MAAS. I believe the reporter configuration uses url_helper to post, and the timeout | 18:01 |
rharper | is likely high, and there are 10s if not 100s of messages we post during the cloud-init stages | 18:01 |
falcojr | rharper: yeah, I figured that log spam was reporting endpoints, but that shouldn't be blocking anything, right? | 18:05 |
rharper | how long does each message take? it's not async in cloud-init | 18:06 |
rharper | so, each post has to fail before cloud-init can run the next module, etc | 18:06 |
rharper | for each close of the reporting context manager, it builds its post and pushes to the endpoint ... that can hang ... | 18:07 |
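To make the blocking behavior rharper describes concrete, here is a minimal Python sketch of a reporting context manager that builds and POSTs its event synchronously on exit. The class name echoes cloud-init's ReportEventStack, but the body is an illustration of the pattern, not the actual implementation:

```python
import requests


class ReportEventStack:
    """Illustrative stand-in for the reporting context manager."""

    def __init__(self, name, endpoint):
        self.name = name
        self.endpoint = endpoint

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        event = {"name": self.name,
                 "result": "FAIL" if exc_type else "SUCCESS"}
        # Synchronous POST on every context exit: with requests' default
        # timeout of None, a dead endpoint can block here indefinitely,
        # and the next module cannot run until this call returns or fails.
        requests.post(self.endpoint, json=event)
```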
rharper | the Azure reporter uses threads to submit events async, which should be generalized, but I never got time to refactor that into the general case. | 18:09 |
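The thread-based approach rharper mentions looks roughly like the sketch below: events are queued immediately and a daemon worker performs the slow POSTs off the critical path. The names and structure here are hypothetical, a generalization of the pattern rather than the Azure reporter's actual code:

```python
import queue
import threading

import requests


class ThreadedReporter:
    """Hypothetical queue-plus-worker reporter, sketched from the chat."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        self._events = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def publish(self, event):
        # Returns immediately; modules never block on the network.
        self._events.put(event)

    def _worker(self):
        while True:
            event = self._events.get()
            try:
                # Bounded timeout keeps a dead endpoint cheap per event.
                requests.post(self.endpoint, json=event, timeout=10)
            except requests.RequestException:
                pass  # Best-effort delivery: drop rather than stall boot.
```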
rharper | falcojr: yeah, so in the WebHookHandler, no timeout value is set, the OauthUrlHelper (which MAAS uses) does not set one either, and the upstream requests module says that if you don't set a timeout it will just hang, which *sounds* like what they're seeing ... one could set the timeout in WebHookHandler to something other than None to have the POST time out after some number of seconds. | 18:14 |
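A hedged sketch of that suggested fix: pass a finite timeout to the POST so a dead MAAS endpoint fails fast. Only requests.post and its timeout parameter are real API here; the helper name and the 10-second value are illustrative assumptions:

```python
import requests

REPORT_TIMEOUT = 10  # seconds; an assumed value, not cloud-init's choice


def post_event(endpoint, payload):
    try:
        # A finite timeout overrides requests' default of None ("wait
        # forever"), bounding the stall each reporter event can cause.
        return requests.post(endpoint, data=payload, timeout=REPORT_TIMEOUT)
    except requests.exceptions.RequestException:
        return None  # Reporting is best-effort; don't block boot on it.
```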
falcojr | ahhh, thanks for that context...that's helpful | 19:14 |
falcojr | I couldn't find any specific call that hangs indefinitely...but there were a lot of 30-ish second calls being made between all of the various modules | 19:14 |
falcojr | I was suspecting that same call...so it makes sense to try putting a timeout on it | 19:15 |
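Rough arithmetic connecting falcojr's observation to rharper's event estimate (the event count below is an assumed midpoint, purely for illustration):

```python
seconds_per_failed_post = 30   # the "30-ish second calls" falcojr saw
events_per_boot = 60           # assumed, per "10s if not 100s" of messages

print(seconds_per_failed_post * events_per_boot / 60, "minutes of delay")
# -> 30.0 minutes of delay
```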