/srv/irclogs.ubuntu.com/2020/09/04/#cloud-init.txt

smoserOdd_Bloke: the case is just "I have to manually clean out state".  Which I think is limited to /var/lib/cloud/instance link.12:50
smoserOdd_Bloke: i do see your point about reading /var/lib/cloud/cloud-config.txt as config and then basically persisting that across, but i don't think that's really a big deal. do you see a case where it is?12:53
Odd_BlokeI could imagine a use case (perhaps in the broken cloud scenario, but perhaps also more generally) where users might want to be able to temporarily "unlock" an instance (so it's in "check" mode) when they are performing a capture, and then lock it back again (to "trust" mode) after they've captured the image.13:16
Odd_BlokeAnd I think you could do this today manually, but you'd have to know when to move the state aside and when to move it back.13:18
Odd_BlokeI guess there's also the case where you mistakenly passed manual_cache_clean when launching an instance; there's no good way to undo that once the instance has launched.13:20
Odd_BlokeSo, to be clear, I do think what `manual_cache_clean` does today is reasonable and, in some cases, desirable; if you pass it as user-data, you really do have no choice but to clean the cache (or fake a clean with mv) to switch from "trust" to "check", and that ensures that your cache state will never leak out into a captured image.13:24
Odd_BlokeI do think there's a gap of sorts for a more easily reversible version of this.13:27
Odd_BlokeBy "in some cases, desirable", I mean that for many cases it is sufficient (but people might prefer a reversible option), and there are some cases where it is exactly what people want (because they don't want it to be easy to undo).13:28
Odd_Blokesmoser: ^13:29
smoserOdd_Bloke:but why would you ever want to "undo that once the instance is launched"13:40
smoserit really only affects *future* instances.13:40
smoseri do not have a use case for flipping it on and off for *this* instance.13:41
smoserOdd_Bloke: you can (i think) get what you were after by writing a config file somewhere.13:44
smosera.) launch instance with userdata setting of 'manual_cache_clean'13:44
smoserb.) write file in /etc/cloud/cloud.cfg.d/manual_clean.cfg (manual_cache_clean: true)13:44
smoserc. now you're in manual clean mode next time13:45
smoserd.) to disable this setting, you now just have to 'rm /etc/cloud/cloud.cfg.d/manual_clean.cfg /var/lib/cloud/instance/manual-cache-clean'13:46
smoseri think your use case essentially boils down to "i can't change the content of cached user-data during the lifetime of an instance".13:46
smoserwhich is true for *all* settings there.13:46
smosermanual-cache-clean is just made more annoying because of the marker file.13:47
smoserthat marker file exists only so that ds-identify can avoid parsing cloud config files itself.13:47
Odd_Blokesmoser: Should that be "without" in (a)?14:09
smoseryes14:13
* smoser curses his feeble mind14:13
Odd_BlokeOK, good, then I agree.14:13
smoserand actually..14:14
smoseri think the "enable" for this instance can be simplified to just14:14
Odd_BlokeI think the confusing part is that manual_cache_clean modifies "the lifetime of an instance" and I wasn't thinking about it in that way.14:14
smosertouch /var/lib/cloud/instance/manual-cache-clean14:14
smoseri wouldn't want to advertise that interface (because it's only there to help ds-identify)14:14
Odd_Bloke(I'm not saying that it confusing me means anything needs changing, to be clear. :)14:15
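The enable/disable steps smoser lists above can be sketched roughly as follows. This is only an illustration, not cloud-init code: the two paths are the ones quoted in the conversation, while the function names and the `root` parameter (so the sketch can be tried in a scratch directory instead of on a live system) are assumptions made here.

```python
from pathlib import Path

# Paths quoted in the discussion, kept relative so they can be joined to a
# scratch root directory instead of the real filesystem.
DROP_IN = Path("etc/cloud/cloud.cfg.d/manual_clean.cfg")
MARKER = Path("var/lib/cloud/instance/manual-cache-clean")


def enable_manual_clean(root: Path) -> None:
    """Step (b): write the drop-in so the setting applies on the next boot."""
    drop_in = root / DROP_IN
    drop_in.parent.mkdir(parents=True, exist_ok=True)
    drop_in.write_text("manual_cache_clean: true\n")


def disable_manual_clean(root: Path) -> None:
    """Step (d): remove both the drop-in and the marker file, if present."""
    for rel in (DROP_IN, MARKER):
        # missing_ok needs Python 3.8+; it makes the call safe to repeat.
        (root / rel).unlink(missing_ok=True)
```

On a real instance `root` would be `Path("/")`. This mirrors the corrected recipe, where the instance is launched without `manual_cache_clean` in user-data, so nothing cached has to be edited to turn the setting back off.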
smoserOdd_Bloke:well... although it should not necessarily be true, *userdata* modifies "the lifetime of an instance"14:15
paridehey falcojr3, the ubuntu-sru manual verification PRs LGTM. Can I go ahead and merge?14:16
paridefalcojr3, we should be able to tick many boxes in the manual verification cards then :)14:16
Odd_Blokesmoser: I'm not sure I followed that point, could you expand on it?14:17
smoseryour complaint is that user-data modified the instance "permanently" (you can't change the user-data ... /var/lib/cloud/instance/cloud-config.txt is an artifact of user-data, right?)14:18
smoseron AWS at least, user-data *can* be changed within the lifetime of an instance.14:18
smoserbut in cloud-init such changes will not be recognized.14:19
Odd_BlokeI wouldn't characterise it as a complaint: I just meant that I wasn't understanding what manual_cache_clean was doing (and why) because I wasn't thinking about it in those terms.14:20
smoserok. but my point is that manual_cache_clean (when set in user-data) has the same lifetime as *all* things set in user-data.14:20
smoseri'm not certain that is true, but i think so.14:20
Odd_BlokeI think it's true that manual_cache_clean (regardless of value) has the same lifetime as the other user-data that is specified alongside it.  But it affects what that lifetime is: if it is false (i.e. "check"), then the lifetime ends once a new instance ID is detected, but if it's true then the lifetime is the same as the lifetime of the state directory (i.e. until a manual cache clean).14:27
Odd_BlokeAnd I think that makes total sense, it's literally named "manual cache clean" (as you pointed out yesterday afternoon).14:29
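The lifetime rule Odd_Bloke describes can be written down as a small decision function. This is a sketch of the reasoning in the conversation only; the function name and parameters are hypothetical and do not correspond to cloud-init's actual implementation.

```python
def cached_state_still_applies(manual_cache_clean: bool,
                               cached_instance_id: str,
                               current_instance_id: str) -> bool:
    """Return True if previously cached user-data should still be used."""
    if manual_cache_clean:
        # "trust" mode: the cache lives as long as the state directory,
        # i.e. until someone cleans it manually.
        return True
    # "check" mode: the cache's lifetime ends once a new instance ID is seen.
    return cached_instance_id == current_instance_id
```

For example, `cached_state_still_applies(False, "i-old", "i-new")` is `False`, whereas the same call with `True` keeps the cached state, which is the case an image capture has to watch out for.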
smoser+1. i think that is exactly the point.14:29
smoser:)14:29
smoseryeah.14:29
smoserbut you wrote that very well.14:29
Odd_BlokeThanks. :)14:29
Odd_BlokeSo I think where I got confused is how you would switch the lifetime _back_ to being scoped to instance ID.14:30
Odd_BlokeAnd right now, I think you can't do that (non-hackily) if you've specified manual_cache_clean in user-data.14:30
Odd_Bloke(But that's separate, and I was conflating the two.)14:30
smoseryeah.14:31
smoserand i think the reason for that is... that user-data cannot be changed period in a non-hacky way14:31
smosermanual_cache_clean has the additional hack marker file. but other than that, it would be the same as other settings.14:32
smoserpossibly a better name would just make this all obvious14:32
Odd_BlokeYes and no: _if_ (and I'm not proposing we change to this, to be very clear) manual_cache_clean was processed only by writing out the manual-cache-clean file based on the configuration determined from a datasource (i.e. _not_ using the cached user-data at all), and _only_ the flag file were used to determine what mode we were in, then it would be true that you couldn't modify user-data, but you also14:34
Odd_Blokewouldn't need to.14:34
Odd_BlokeBecause if the flag were removed, then the old user-data would be disregarded entirely.14:35
Odd_BlokeSo I think we could implement something like this that wouldn't require hacking at user-data to modify, if we wanted to.14:36
Odd_BlokeDo we want to implement such a thing?  I don't think so ATM.14:37
smoser+1.14:37
falcojr3paride: Yes, thank you14:41
paridefalcojr3, merging!14:41
smoserOdd_Bloke:thank you for pushing/investigating this.14:50
Odd_Blokesmoser: Sure thing!  I'll ping you for doc review once I've figured out the best way of capturing the above.14:57
AnhVoMSFTwhat does it typically mean when udevadm settle failed within cloud-init https://paste.ubuntu.com/p/rYfhBxyvbZ/15:42
smoseri'd say "typically" udevadm-settle doesn't fail15:45
smosermy only 2 guesses for why:15:46
smosera.) timeout of some disk io15:46
smoserb.) i forget15:47
AnhVoMSFTSo this error happened during ephemeral dhcp15:47
AnhVoMSFTFound unstable nic names ['eth0']; calling udevadm settle15:47
AnhVoMSFTutil.py[DEBUG]: Running command ['udevadm', 'settle'] with allowed return codes [0] (shell=False, capture=True)15:48
AnhVoMSFTutil.py[DEBUG]: Waiting for udev events to settle took 120.147 seconds15:48
AnhVoMSFTThen there's a Traceback on ProcessExecutionError. I think after 2 minutes udevadm timed out and returned error 115:48
smoseryeah, it clearly timed out.15:48
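One way to reproduce what the log lines above show (a command that runs for about 120 seconds and then exits non-zero) is to time the command yourself. The helper below is a generic sketch with a hypothetical name, not the code cloud-init uses; it only reports the exit code and the elapsed wall-clock time.

```python
import subprocess
import time
from typing import List, Tuple


def run_timed(cmd: List[str]) -> Tuple[int, float]:
    """Run cmd, discarding its output; return (exit code, seconds elapsed)."""
    start = time.monotonic()
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit is a result here, not an exception
    )
    return proc.returncode, time.monotonic() - start
```

On an affected machine, `run_timed(["udevadm", "settle", "--timeout=30"])` would be expected to return a non-zero code after roughly 30 seconds if events are still not settling; the shorter timeout is just a convenience for debugging and is not something suggested in the discussion.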
smosermaybe dmesg has some info15:48
AnhVoMSFTlet me check15:54
smoseri really suspect that there is very slow disks attached. or network attached disks and a bad network.16:04
AnhVoMSFTlooks like the vm got rebooted sometime after that and the dmesg log that was collected was after the reboot16:10
Odd_BlokeAnhVoMSFT: I would expect `journalctl -k` to include (most? IDK exactly) of the dmesg logs from previous boots.16:11
AnhVoMSFTit's from one of our automated nightly runs - the VM is already gone. What can we collect in cloud-init log to make it easier to root cause the issue next time?16:12
Odd_Blokefalcojr3: What can I pick up SRU-wise?  It looks like I could do SoftLayer or OpenStack, based on the board, but I know you mentioned you were looking at OpenStack access so I don't want to start on that if you're already partially through it. :)16:13
Odd_Bloke(Also "Falco Junior the Third"?? ;)16:13
smoserAnhVoMSFT: running 'cloud-init collect-logs'16:16
smoseras that would contain the journal for the current boot16:17
falcojr3Odd_Bloke yes, I'll pick up openstack. Pretty much anything else including any of the bugs in the bug card would be good16:32
falcojr3and yeah, rebooted the box my instance of the lounge is hosted on and now I'm magically falcojr316:33
blackboxswok falcojr3 I'm back. and ready to start cloud-init SRU verification work. what would you like me to work?16:34
blackboxswsmoser I finished a driveby xenial cloud-utils PR that you probably have a *lot* more context on (daily maas image builds broke yesterday), hence my absence from cloud-init stuff.16:34
blackboxswfalcojr3: I'll grab SRU reviews first16:35
blackboxswand then get manual SRU verification tasks16:35
smoserblackboxsw:link ?16:44
blackboxswsmoser: https://code.launchpad.net/~chad.smith/ubuntu/+source/cloud-utils/+git/cloud-utils/+merge/39031816:44
blackboxswbackport of two of your separate overlayfs fixes into xenial16:44
blackboxswcould have combined them, but thought maybe separate cherry-pick backports may be easier to read/review16:45
blackboxswcloud-init SRU-wise just grabbed the cloud-init query decode user-data test16:46
* blackboxsw wonders if we should make our SRU trello process board public, so external folks could get visibility to the verification process (and contribute manual SRU tests for some of the one off bugs)16:48
Odd_Blokefalcojr3: Aha, right; now we have checklist assignment we aren't creating separate cards for all of those, right?16:48
blackboxswright I believe Odd_Bloke we assign our avatar to each checklist item16:49
blackboxswand check it off once done16:49
Odd_Blokeblackboxsw: falcojr3: Ack, I've updated our template to reflect this for next SRU: https://trello.com/c/6ym50IN3/9-create-trello-cards-for-each-commit-that-could-represent-a-functional-change-to-ubuntu16:50
blackboxsw+1 Odd_Bloke I just updated the top checklist item there so we get a trello formatted log2dch output, which makes it easier for us to link to the individ issues from the card16:59
blackboxswlog2dch --trello16:59
Odd_BlokeNice, thanks!17:01
smoserblackboxsw: i acked. but i would appreciate a fix for 'new' to 'knew'17:27
blackboxswsmoser: thanks! checking and will address it17:27
lucasmourablackboxsw, falcojr3 it seems that PRs #357 and #335 are already in cloud-init 20.219:16
lucasmouraI checked that #357 was already verified in the last SRU, but could not find verification for #33519:16
blackboxswdouble checking too19:16
blackboxswlucasmoura: git describe 7dceb9882590fb738ac0ff3429908cc6c945485a19:18
blackboxsw20.2-3-g7dceb988219:18
blackboxswyep looks like it was in that SRU and already released.19:18
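The `git describe` output above packs three things into one string: the nearest tag, the number of commits since that tag, and an abbreviated commit hash prefixed with `g`. A minimal parser, with hypothetical names chosen for this sketch, makes that structure explicit:

```python
import re
from typing import NamedTuple


class Describe(NamedTuple):
    tag: str                # nearest reachable tag, e.g. "20.2"
    commits_since_tag: int  # commits between that tag and the described commit
    abbrev_hash: str        # abbreviated commit hash, without the "g" prefix


def parse_describe(text: str) -> Describe:
    """Parse the '<tag>-<count>-g<hash>' form printed by `git describe`."""
    match = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", text.strip())
    if match is None:
        # A commit that is exactly on a tag prints only the tag name.
        raise ValueError(f"not in <tag>-<count>-g<hash> form: {text!r}")
    tag, count, abbrev = match.groups()
    return Describe(tag, int(count), abbrev)
```

Applied to the string from the conversation, `parse_describe("20.2-3-g7dceb9882")` gives tag `20.2`, a count of 3, and hash prefix `7dceb9882`, i.e. the commit sits three commits after the 20.2 tag.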
blackboxswlucasmoura: probably don't need to recheck that content unless you want to19:19
blackboxswit's *just* schema validation which generates a warning log at best if people are providing invalid schema19:19
lucasmouraI think we can remove them from the sru list19:20
blackboxswso might just add ~ around the text in that checklist item or remove it19:20
blackboxswyep19:20
blackboxswsave yourself time there19:20
lucasmouraGot it. Also, I think we can skip this PR too: https://github.com/canonical/cloud-init/pull/44319:21
lucasmouraWhat do you think ?19:21
lucasmouraOh wait, I have found a place where it is used, maybe I can still directly test that19:23
blackboxswlucasmoura: I think, right, we  need to test the actual logic change that is using that call with the False param19:23
lucasmouraack19:24
blackboxswminor tooling improvement for log2dch19:57
blackboxswto create links we can click in the bug verification checklist https://github.com/canonical/uss-tableflip/pull/6119:57
blackboxswI'm updating that trello card now19:57
blackboxswwith the markdown output by log2dch --trello19:57
blackboxswusing this branch19:58
falcojr3a lot of our manual sru scripts look specifically for "Trace"20:34
falcojr3shouldn't that be "TRACE"?20:34
falcojr3or is that looking for a specific "Trace" message?20:34
=== falcojr3 is now known as falcojr
Odd_Blokefalcojr: That's looking for a Python traceback, I believe.20:44
Odd_Bloke(i.e. "Traceback (most recent call last)")20:44
falcojrAh20:45
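The check Odd_Bloke describes (looking for the header line of a Python traceback rather than a log level named "TRACE") could look something like this. The function name is hypothetical, and the actual SRU scripts mentioned above are not shown in the log, so this is only an approximation of what they are believed to do.

```python
from typing import List

# The first line Python prints for an unhandled exception's traceback.
TRACEBACK_HEADER = "Traceback (most recent call last)"


def traceback_line_numbers(log_text: str) -> List[int]:
    """Return 1-based line numbers where a Python traceback starts."""
    return [
        number
        for number, line in enumerate(log_text.splitlines(), start=1)
        if TRACEBACK_HEADER in line
    ]
```

Searching for the mixed-case "Trace" matches this header, whereas an upper-case "TRACE" would not, which is why the scripts use the former.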
