/srv/irclogs.ubuntu.com/2020/02/19/#cloud-init.txt

=== tds1 is now known as tds
=== vrubiolo1 is now known as vrubiolo
=== vrubiolo1 is now known as vrubiolo
Odd_BlokeHeads up: GitHub is having availability issues: https://www.githubstatus.com/15:47
Odd_Blokepowersj: blackboxsw: So do we have a release process that I should be following?  (I don't _think_ I've cut an upstream release before.)17:55
blackboxswOdd_Bloke: I added you to the trello board17:55
blackboxswhttps://trello.com/b/obFWGwZY/upstream-cloud-init-20117:56
blackboxswI copied it yesterday from the template board https://trello.com/b/QQYFXpsA/template-sru-cloud-init-xy and deleted any "SRU" cards17:56
Odd_BlokeStatus update on 20.1: we've tested the change that we were worried about and it does not appear to be causing issues17:56
Odd_Blokeblackboxsw: Cool, thanks!17:56
Odd_Blokerharper: Where are we on the IMDS change you wanted to land for 20.1?17:56
rharperwell, it's not a one liner =/17:57
rharperwriting a unittest (augmenting one to ensure we do the redaction)17:57
blackboxswOdd_Bloke: and the SRU process for that trello board template came originally from https://github.com/CanonicalLtd/uss-tableflip/blob/master/doc/upstream_release_process.md17:57
Odd_Blokerharper: OK, but in your view it's close enough that it's worth waiting for?17:57
rharperit should be; but since it's not a one liner; I don't suppose we should wait17:58
Odd_Blokerharper: I'm going to go grab lunch, so you have a bit of time to get something up for review. :)18:08
rharperOdd_Bloke: https://github.com/canonical/cloud-init/pull/21919:15
blackboxswminor volley back rharper19:21
blackboxswI might have to re-review the token interaction again. jussec19:33
blackboxswyeah X-aws-ec2-metadata-token-ttl-seconds header is just a seconds value, non-secret. so no need to redact19:36
rharperblackboxsw: true but we don't need the value logged do we?19:37
rharperI've replied; having the value won't aide in debugging; and if we lower it, we just end up obtaining a new token more frequently;19:40
blackboxswI guess we would if at some point we reduce the hardcoded AWS_TOKEN_TTL_SECONDS which afair I thought we would intend at at some point.19:41
blackboxswfair rharper I concede it's not a big deal, just recognizing that we are redacting things that aren't necessarily sensitive, by that measure where do we draw the line.19:42
blackboxswI have a couple more nits. we're doing a lot more work in that function than we need with every retry.19:57
blackboxswwill finish my review in ~30. have to afk for a bit19:58
Odd_Blokerharper: blackboxsw: OK, that lunch break ended up longer than anticipated; I see that PR is Approved so I'll wait for it before cutting the release.20:11
blackboxswback as well now. a minor set of review comments to followup with20:11
blackboxswthrowing together a paste to see what you guys think20:12
rharperblackboxsw: on my errand I was thinking I'd like to see if I can just copy the redacted header value instead of all headers;20:28
blackboxswOdd_Bloke: rharper here's what I was thinking https://paste.ubuntu.com/p/NXXXyMZKFH/ . I think we should be hoisting the headers_cb calculation out of the retry for loop as those headers should not change across calls to the same url.20:30
rharperblackboxsw: well, that's a separate fixup  no ?20:30
rharperblackboxsw: an alternative approach I thought about was having the headers_cb return the headers and a list of keys to be redacted20:31
blackboxswas it is currently in the function, we are re-calculating and copying the dict of headers on each run. with rharper's change it gets more costly as we are now extracting each header on each retry and comparing it to the redacted list on every iteration of the liip20:31
blackboxswloop20:31
rharperand I guess we don't pass back to headers_cb that we're retrying it just gets called a second time20:31
blackboxswrharper: that'd work, but it's also doing that work on every single retry against a url.20:31
rharperso even if headers_cb wanted to vary based on retry it doesn't know20:32
blackboxswright, it just gets called again and again20:32
blackboxswsoliciting the same response20:32
rharperwell, I like you suggestion but it seems like an optimization20:32
blackboxswrharper: it does seem like an optimization.20:32
rharperand if I push this copy down to only if the key is in redacted list; we're copying a very small amount of data20:32
blackboxswrharper: well yes, we are copying a small amount and settting up a pointer to every key in req_args.20:33
rharperblackboxsw: for now, the only time that it's hitting is if we do a retry anyhow; the happy path never does it more than once;  but I generally like it;20:36
blackboxswright, I think the optimization can wait. I'll add the paste there for reference in case we want to go that route in the future.20:36
Odd_BlokeI'm not really following here, but dict operations are generally _very_ optimised in Python (because it's dicts all the way down, internally :p) so I would be surprised if this is a substantial slowdown.  Which is to say: I think we can land as-is and then follow with the optimisation.20:38
rharperOdd_Bloke: I'm pushing up two changes in just a second, one blackboxsw wanted the string moved to module variable; and the other is to push the copy down to the matched keys rather than copying all headers even if we don't have anything to redact20:39
blackboxswyep I've pasted that comment to the PR and set approved from my end20:40
Odd_Blokerharper: OK, nice.20:41
powersjOdd_Bloke, how's 20.1?20:56
Odd_Blokepowersj: We're landing this IMDS change, then cutting it.20:59
Odd_Blokeblackboxsw: Your Approved is from before rharper's latest push; are you still +1?21:00
blackboxswI'm still  +1 Odd_Bloke21:00
blackboxswyeah looks good21:00
Odd_BlokeCool, merged.21:01
blackboxswheh ohh well.21:01
blackboxswwe didn't really need to deepcopy the entry and then reset the value to REDACTED21:02
blackboxswOdd_Bloke: rharper not needed for 20.1, but here's a low-hanging-fruit review. sorry I just missed that21:05
blackboxswhttps://github.com/canonical/cloud-init/pull/22021:05
blackboxswhrm strike that.21:06
blackboxswshame on me21:06
rharperhehe21:06
blackboxswyes we did, otherwise we'd overwrite the actual headers21:06
rharperit would have had this done a lot faster if I knew I was modifying my req_args21:06
rharperand then we never pass tests... and the execption callback runs *forever*21:06
rharperit's inifinite21:06
rharperso I sat there... for some tim e21:06
blackboxswheh21:06
rharperbefore realizing what was happening21:06
blackboxswyeah I was wondering the same for me. oopsie21:06
blackboxswclosing that PR21:07
rharpereven after I realized that headers were a key *in* all of the args21:07
Odd_Blokepowersj: rharper: blackboxsw: So I just opened up the card for 20.1 (I used shortcuts to assign and move it earlier), and it calls for us to land https://github.com/canonical/cloud-init/pull/54 before cutting.21:13
Odd_BlokeWhat do we think?21:13
powersjthat needs to come after21:14
powersjI don't want to land that right before focal goes out21:14
Odd_BlokeAfter what?21:14
rharperwell21:14
rharperit's non master21:14
rharperit's on the xenial branch only21:14
powersjoh right21:14
blackboxswrharper: hrm. gonna need a second glance at that..... we are in the middle of the for loop where we are setting filtered_reg_args[k] = value... if we aren't copying over filtered_req_args['headers'] = req_args[k], then we don't have to deepcopy I don't think.   Again, doesn't matter for 20.1, but I do think again we can drop that21:14
powersjin which case, not an issue for 20.1, it is for the SRU21:14
Odd_BlokeOK, so that's an SRU thing, cool.21:15
rharperblackboxsw: I was thinking that was correct; it's just a single dict entry;  it *could* have a value of a dict; but I think we can reason that header values cannot be dict21:15
rharperblackboxsw: do we know if we pay a heavier cost calling deepcopy() when we don't need it with such a small dict ?21:16
* rharper smells a timeit coming on 21:16
blackboxswrharper: ahh your branch didn't have my local changes. I did the if k != 'headers': filtered_req_args[k] = v     else: filtered_req_args[k] = REDACTED21:16
blackboxswI wasn't setting filtered_req_args[k] = v   if we were dealing with a 'headers' key21:17
blackboxswrharper: +1 on headers values also being a dict and being concerned about not blowing that away21:17
blackboxswbut, I think our logic initially path setting that filtered_req_args[k] = v  and then dropping into if k == 'headers' to redact is what causes the need for the copy.21:18
blackboxswif we didn't set that in the first place, we wouldn't have to copy and overwrite21:18
blackboxswanyway, not for 20.1 but I'll reopen the appropriate cleanup for us to look at for later21:19
blackboxswand yes might as well add a timeit to the PR once up :)21:22
rharperblackboxsw: ok, I can see a way without copy21:25
rharperhttps://paste.ubuntu.com/p/2XQF6CPntK/21:25
rharperhowever, I wonder if it's really faster than copying the headers dict21:26
blackboxswhttps://github.com/canonical/cloud-init/pull/22021:26
rharperfor small sized dict;21:26
blackboxswhttps://github.com/canonical/cloud-init/pull/22121:27
blackboxswyeah not much performance improvement I don't think21:27
blackboxswyeah and now we've spent more time on this than we'll ever likely get back in performance runs ;)21:28
blackboxswsorry about the rathole there21:28
blackboxswespecially given that our only case of redacting is a simple string value and not a large dict of values21:29
johnsonshiAccording to https://cloudinit.readthedocs.io/en/latest/topics/modules.html#mounts , "Swap files can be configured by setting the path to the swap file to create with filename". The docs imply that cloud-init creates a swap file at the specified filename if using the swap module.22:19
Odd_Blokeblackboxsw: So I'm looking at the release process, and it assumes that I can push the commit I've created locally to master.  We, of course, can't do that with branch protection.  We can either revise the docs to document disabling branch protection temporarily (probably not?), or we can update things so that the tagging of the release happens _after_ merge (via GH button) into master.  However, I think22:20
Odd_Blokethat will cause CI to fail, because it will look for a 20.1 tag (to determine versions) and fail.22:20
johnsonshiMy swap settings contain:22:20
johnsonshi...22:20
johnsonshiswap:22:20
johnsonshi   filename: /mnt/resource/swapfile22:20
johnsonshi   ...22:20
johnsonshi...22:20
johnsonshiWhen the machine is provisioned, cloud-init fails to create the swapfile. The cloud-init.log file shows that the "swapon -a" command failed. When I ssh into the machine and try running "swapon -a" myself, it says that "swapon: stat failed /mnt/resource/swapfile: No such file or directory".22:20
johnsonshiHowever, I can confirm that the /mnt/resource path exists and is accessible, and I am able to read/write files within /mnt/resource. I can also see that cloud-init added the swap entry to /etc/fstab.22:20
johnsonshiWhen I look through the cloud-init code, I can see cloud-init adds the swap entry to /etc/fstab if it exists, then runs "swapon -a" without creating the swapfile.22:20
johnsonshiIs cloud-init supposed to create the swapfile by if it is instructed to do so (the docs seem to imply that)?22:20
Odd_Blokejohnsonshi: I believe you need to specify "size" for the creation to occur.22:22
blackboxswOdd_Bloke: doc reference?22:22
Odd_Blokeblackboxsw: https://github.com/CanonicalLtd/uss-tableflip/blob/master/doc/upstream_release_process.md22:22
blackboxswthnaks22:22
Odd_Blokeblackboxsw: Specifically "Note: you need to push the tag or c-i will fail in check version."22:23
blackboxswOdd_Bloke: we have the ability to git push tags22:23
blackboxswright?22:23
Odd_Blokeblackboxsw: But we don't have the commit to tag until after merge.22:23
Odd_BlokeIt never exists on my system until then.22:23
Odd_BlokeWhich means the 20.1 tag I have pushed (until I realised the conflict) points at the commit which will be squashed into master (and then discarded, of course).22:24
blackboxswahh right, but we will merge you MP into master once you've bumped version to 20.1 right22:24
Odd_BlokeRight.22:24
johnsonshiswap:22:25
johnsonshi   filename: /mnt/resource/swapfile22:25
johnsonshi   size: 214748364822:25
johnsonshiI tried it again and the swapfile still isn't being created.22:25
Odd_Blokejohnsonshi: Could you file a bug with `cloud-init collect-logs` attached to it and drop the link to it back in here?22:25
johnsonshicc_mounts.py does not seem to have code that creates a swapfile?22:25
blackboxswOdd_Bloke: ok, hrm, but your branch did pass ci https://github.com/canonical/cloud-init/pull/22222:25
Odd_Blokeblackboxsw: Right, I pushed the tag.22:25
* blackboxsw checks it out again22:25
Odd_BlokeAs instructed.22:25
blackboxswwith branch restrictions reduced?22:26
Odd_BlokeBut that tag points at a commit which will never be in master.22:26
blackboxswor tag push worked22:26
Odd_BlokeTag push is fine.22:26
blackboxswahh ok. gotcha22:26
johnsonshiOdd_Bloke: Sure thing. So missing swapfile creation is a bug due to the code not being in cloud-init?22:26
blackboxswnot sure if that's a broken process that should be fixes. stale tags pointing to nowhere for previous releases22:26
Odd_Blokejohnsonshi: What version of cloud-init are you using?22:26
johnsonshiOdd_Bloke: The distro I am using it with (RHEL 7.7) comes with cloud-init 18.5.22:27
Odd_BlokeYeah, that should definitely have swap support, I don't know why you wouldn't be seeing that in cc_mounts.22:28
Odd_BlokeThere have been changes since 18.4, however, so you might be hitting an existing bug.22:28
Odd_Bloke*18.522:28
Odd_Blokejohnsonshi: Are you able to try again with 19.4 somehow?22:28
Odd_Blokeblackboxsw: It looks like I fixed up the 19.4 tag on the day it happened.22:30
Odd_Blokeblackboxsw: (And 19.3 predated the GH move, I think?)22:30
blackboxsw johnsonshi: any of the copr dev builds work for you? https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/22:30
blackboxswOdd_Bloke:  you are right . ok so, branch restrictions causing some issues then with upstream releases.22:31
johnsonshiOdd_Bloke: wait a sec22:31
johnsonshiYeah I'll try out 19.422:31
Odd_BlokeThanks!22:31
Odd_Blokejohnsonshi: Actually, if you're able, there have been further changes since 19.4, so a test of a daily build would be best.  (That cloud-init-dev link has those, so you're already doing that if that's how you're going to do it. :)22:32
johnsonshiThanks!22:33
Odd_Blokeblackboxsw: I'm thinking that retagging after is probably easier in the short term.22:33
Odd_BlokeBut I think I'm going to remove the tag and re-run CI, to see if we actually need it before at all.22:34
Odd_BlokeJust in case that the move to Travis has happened to obviate that need.22:34
blackboxsw+1 Odd_Bloke22:35
rharperjohnsonshi: from I can tell, the additional paths to your file are the issue;  a work around would be to either use file in an existing directory; or use bootcmd to run an mkdir -p22:36
rharperand the fix is for cloud-init to use util.ensure_file() on the swapfile path22:36
Odd_Blokepowersj: FYI, we've run into some release tooling issues with the migration to GitHub.  I'm currently pursuing getting it right this time, so we don't have to revisit it, because I'm not aware of any pressing need for 20.1 to be out the door.  Am I missing some info, or are you happy for me to proceed along this path?22:37
powersjOdd_Bloke, can you estimate how long this delay would take?22:38
Odd_Blokepowersj: Tomorrow instead of tonight.22:39
powersjthat's fine22:39
johnsonshiOdd_Bloke: Hmmm this is weird. The logs show that the swapfile creation command succeeds (for RHEL 7.7 cloud-init 18.4).22:59
johnsonshi2020-02-19 22:23:35,338 - cc_mounts.py[DEBUG]: swap file /mnt/resource/swapfile exists, but not in /proc/swaps22:59
johnsonshi2020-02-19 22:23:35,338 - util.py[DEBUG]: Running command ['sh', '-c', 'rm -f "$1" && umask 0066 && { fallocate -l "${2}M" "$1" ||  dd if=/dev/zero "of=$1" bs=1M "count=$2"; } && mkswap "$1" || { r=$?; rm -f "$1"; exit $r; }', 'setup_swap', '/mnt/resource/swapfile', '2048'] with allowed return codes [0] (shell=False, capture=True)22:59
johnsonshi2020-02-19 22:23:35,400 - util.py[DEBUG]: creating swap file '/mnt/resource/swapfile' of 2048MB took 0.062 seconds22:59
johnsonshiAfter rebooting (cloud-init clean -lr) as well as a fresh provision of another machine, the file still was not being created (despite cloud-init.log saying that it is created). When I ssh'ed into the machine and ran the command above, the command succeeded and the swapfile was created.23:00
johnsonshiI just asked some of my teammates (I am in the Azure Linux provisioning team), and they said that we'd have to use this  specific version of cloud-init due to a host of reasons.23:00
Odd_Blokejohnsonshi: What FS is /mnt/resource?23:02
Odd_Blokejohnsonshi: And is /mnt/resource ephemeral?  I think cloud-init is going to assume that if it creates a swapfile, it stays there.23:03
johnsonshiOdd_Bloke: ext4. Yeah it is an ephemeral disk.23:04
johnsonshiOdd_Bloke: Oddly enough, when I included a runcmd directive, the swapfile was created.23:04
Odd_Blokejohnsonshi: So that shell snippet has been replaced by Python code in cloud-init master.23:05
Odd_BlokeSo a one-off test with a recent daily build would give us some more information about what's going on here.23:06
Odd_BlokeBecause if the completely rewritten code _also_ exhibits failure then my guess is that this isn't actually directly related to the swap creation, and instead it's something more complex.23:07
Odd_Blokejohnsonshi: I'm finishing up my day now, I'm afraid.  If you can reproduce this with cloud-init dailies, then please file a bug.  If it only exhibits in the version you're pinned on, then I think you'll need to talk to your team about how you've handled such issues in the past.23:08
Odd_Bloke(And have a good evening once you're done with your day too, of course. :)23:08
johnsonshiOdd_Bloke: OK I will try that out. Thanks for your help and have a good evening too! :)23:09

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!