=== tds1 is now known as tds
=== vrubiolo1 is now known as vrubiolo
[15:47] Heads up: GitHub is having availability issues: https://www.githubstatus.com/
[17:55] powersj: blackboxsw: So do we have a release process that I should be following? (I don't _think_ I've cut an upstream release before.)
[17:55] Odd_Bloke: I added you to the trello board
[17:56] https://trello.com/b/obFWGwZY/upstream-cloud-init-201
[17:56] I copied it yesterday from the template board https://trello.com/b/QQYFXpsA/template-sru-cloud-init-xy and deleted any "SRU" cards
[17:56] Status update on 20.1: we've tested the change that we were worried about and it does not appear to be causing issues
[17:56] blackboxsw: Cool, thanks!
[17:56] rharper: Where are we on the IMDS change you wanted to land for 20.1?
[17:57] well, it's not a one-liner =/
[17:57] writing a unittest (augmenting one to ensure we do the redaction)
[17:57] Odd_Bloke: and the SRU process for that trello board template came originally from https://github.com/CanonicalLtd/uss-tableflip/blob/master/doc/upstream_release_process.md
[17:57] rharper: OK, but in your view it's close enough that it's worth waiting for?
[17:58] it should be; but since it's not a one-liner, I don't suppose we should wait
[18:08] rharper: I'm going to go grab lunch, so you have a bit of time to get something up for review. :)
[19:15] Odd_Bloke: https://github.com/canonical/cloud-init/pull/219
[19:21] minor volley back rharper
[19:33] I might have to re-review the token interaction again. just a sec
[19:36] yeah, the X-aws-ec2-metadata-token-ttl-seconds header is just a seconds value, non-secret, so no need to redact
[19:37] blackboxsw: true, but we don't need the value logged, do we?
[19:40] I've replied; having the value won't aid in debugging, and if we lower it, we just end up obtaining a new token more frequently
[19:41] I guess we would if at some point we reduce the hardcoded AWS_TOKEN_TTL_SECONDS, which AFAIR we intended to do at some point.
[19:42] fair, rharper; I concede it's not a big deal, just recognizing that we are redacting things that aren't necessarily sensitive. By that measure, where do we draw the line?
[19:57] I have a couple more nits: we're doing a lot more work in that function than we need to on every retry.
[19:58] will finish my review in ~30. have to afk for a bit
[20:11] rharper: blackboxsw: OK, that lunch break ended up longer than anticipated; I see that PR is Approved so I'll wait for it before cutting the release.
[20:11] back as well now. a minor set of review comments to follow up with
[20:12] throwing together a paste to see what you guys think
[20:28] blackboxsw: on my errand I was thinking I'd like to see if I can just copy the redacted header value instead of all headers
[20:30] Odd_Bloke: rharper here's what I was thinking: https://paste.ubuntu.com/p/NXXXyMZKFH/ . I think we should be hoisting the headers_cb calculation out of the retry for loop, as those headers should not change across calls to the same url.
[20:30] blackboxsw: well, that's a separate fixup, no?
[20:31] blackboxsw: an alternative approach I thought about was having the headers_cb return the headers and a list of keys to be redacted
[20:31] as it is currently in the function, we are re-calculating and copying the dict of headers on each run; with rharper's change it gets more costly, as we are now extracting each header on each retry and comparing it to the redacted list on every iteration of the loop
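(To make the hoisting idea above concrete: a minimal hypothetical sketch of a readurl-style retry loop with the headers_cb call and the redacted logging copy both computed once, outside the loop. The function name, signature, and _do-nothing retry policy are illustrative, not the actual cloud-init url_helper code.)

    # Hypothetical sketch: call headers_cb once and build the log-safe
    # redacted copy once, instead of redoing both on every retry.
    import time
    from urllib.request import Request, urlopen

    REDACTED = 'REDACTED'

    def readurl_sketch(url, headers_cb=None, headers_redact=None, retries=3):
        headers = headers_cb(url) if headers_cb else {}
        # Headers do not change across retries of the same url, so the
        # redacted copy used for logging can be computed a single time too.
        log_headers = dict(headers)
        for name in headers_redact or []:
            if name in log_headers:
                log_headers[name] = REDACTED
        for attempt in range(retries + 1):
            print('GET %s headers=%s (attempt %d)' % (url, log_headers, attempt))
            try:
                return urlopen(Request(url, headers=headers), timeout=5).read()
            except OSError:
                if attempt == retries:
                    raise
                time.sleep(1)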
[20:31] and I guess we don't pass back to headers_cb that we're retrying; it just gets called a second time
[20:31] rharper: that'd work, but it's also doing that work on every single retry against a url.
[20:32] so even if headers_cb wanted to vary based on retry, it doesn't know
[20:32] right, it just gets called again and again
[20:32] soliciting the same response
[20:32] well, I like your suggestion but it seems like an optimization
[20:32] rharper: it does seem like an optimization.
[20:32] and if I push this copy down to only when the key is in the redacted list, we're copying a very small amount of data
[20:33] rharper: well yes, we are copying a small amount and setting up a pointer to every key in req_args.
[20:36] blackboxsw: for now, the only time that it's hitting is if we do a retry anyhow; the happy path never does it more than once; but I generally like it
[20:36] right, I think the optimization can wait. I'll add the paste there for reference in case we want to go that route in the future.
[20:38] I'm not really following here, but dict operations are generally _very_ optimised in Python (because it's dicts all the way down, internally :p) so I would be surprised if this is a substantial slowdown. Which is to say: I think we can land as-is and then follow with the optimisation.
[20:39] Odd_Bloke: I'm pushing up two changes in just a second: one blackboxsw wanted, the string moved to a module variable; and the other is to push the copy down to the matched keys rather than copying all headers even if we don't have anything to redact
[20:40] yep, I've pasted that comment to the PR and set approved from my end
[20:41] rharper: OK, nice.
[20:56] Odd_Bloke, how's 20.1?
[20:59] powersj: We're landing this IMDS change, then cutting it.
[21:00] blackboxsw: Your Approved is from before rharper's latest push; are you still +1?
[21:00] I'm still +1 Odd_Bloke
[21:00] yeah looks good
[21:01] Cool, merged.
[21:01] heh ohh well.
[21:02] we didn't really need to deepcopy the entry and then reset the value to REDACTED
[21:05] Odd_Bloke: rharper not needed for 20.1, but here's a low-hanging-fruit review. sorry I just missed that
[21:05] https://github.com/canonical/cloud-init/pull/220
[21:06] hrm, strike that.
[21:06] shame on me
[21:06] hehe
[21:06] yes we did, otherwise we'd overwrite the actual headers
[21:06] I would have had this done a lot faster if I'd known I was modifying my req_args
[21:06] and then we never pass tests... and the exception callback runs *forever*
[21:06] it's infinite
[21:06] so I sat there... for some time
[21:06] heh
[21:06] before realizing what was happening
[21:06] yeah I was wondering the same for me. oopsie
[21:07] closing that PR
[21:07] even after I realized that headers were a key *in* all of the args
[21:13] powersj: rharper: blackboxsw: So I just opened up the card for 20.1 (I used shortcuts to assign and move it earlier), and it calls for us to land https://github.com/canonical/cloud-init/pull/54 before cutting.
[21:13] What do we think?
[21:14] that needs to come after
[21:14] I don't want to land that right before focal goes out
[21:14] After what?
[21:14] well
[21:14] it's non-master
[21:14] it's on the xenial branch only
[21:14] oh right
[21:14] rharper: hrm, gonna need a second glance at that... we are in the middle of the for loop where we are setting filtered_req_args[k] = value... if we aren't copying over filtered_req_args['headers'] = req_args[k], then we don't have to deepcopy, I don't think. Again, doesn't matter for 20.1, but I do think we can drop that
[21:14] in which case, not an issue for 20.1, it is for the SRU
[21:15] OK, so that's an SRU thing, cool.
[21:15] blackboxsw: I was thinking that was correct; it's just a single dict entry; it *could* have a value of a dict, but I think we can reason that header values cannot be dicts
[21:16] blackboxsw: do we know if we pay a heavier cost calling deepcopy() when we don't need it with such a small dict?
[21:16] * rharper smells a timeit coming on
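(In the spirit of that timeit: a quick, hypothetical micro-benchmark comparing copy.deepcopy of a small headers dict against a targeted shallow copy of just the headers entry. The header names mirror the ones in this discussion; absolute numbers will vary by machine.)

    # Hypothetical micro-benchmark: deepcopy vs. a targeted shallow copy
    # for redacting one header value before logging.
    import copy
    import timeit

    REDACTED = 'REDACTED'
    TOKEN_HEADER = 'X-aws-ec2-metadata-token'
    headers = {TOKEN_HEADER: 'secret-token-value',
               'X-aws-ec2-metadata-token-ttl-seconds': '21600'}

    def redact_deepcopy(h):
        safe = copy.deepcopy(h)
        safe[TOKEN_HEADER] = REDACTED
        return safe

    def redact_shallow(h):
        # Header values are plain strings, so a shallow copy is enough
        # and the original request headers are never mutated.
        safe = dict(h)
        if TOKEN_HEADER in safe:
            safe[TOKEN_HEADER] = REDACTED
        return safe

    for fn in (redact_deepcopy, redact_shallow):
        secs = timeit.timeit(lambda: fn(headers), number=100000)
        print('%s: %.3fs per 100k calls' % (fn.__name__, secs))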
[21:16] rharper: ahh, your branch didn't have my local changes. I did the if k != 'headers': filtered_req_args[k] = v else: filtered_req_args[k] = REDACTED
[21:17] I wasn't setting filtered_req_args[k] = v if we were dealing with a 'headers' key
[21:17] rharper: +1 on headers values also being a dict and being concerned about not blowing that away
[21:18] but I think our logic initially setting filtered_req_args[k] = v and then dropping into the if k == 'headers' branch to redact is what causes the need for the copy.
[21:18] if we didn't set that in the first place, we wouldn't have to copy and overwrite
[21:19] anyway, not for 20.1, but I'll reopen the appropriate cleanup for us to look at later
[21:22] and yes, might as well add a timeit to the PR once up :)
[21:25] blackboxsw: ok, I can see a way without copy
[21:25] https://paste.ubuntu.com/p/2XQF6CPntK/
[21:26] however, I wonder if it's really faster than copying the headers dict
[21:26] https://github.com/canonical/cloud-init/pull/220
[21:26] for a small-sized dict
[21:27] https://github.com/canonical/cloud-init/pull/221
[21:27] yeah, not much performance improvement I don't think
[21:28] yeah, and now we've spent more time on this than we'll ever likely get back in performance runs ;)
[21:28] sorry about the rathole there
[21:29] especially given that our only case of redacting is a simple string value and not a large dict of values
[22:19] According to https://cloudinit.readthedocs.io/en/latest/topics/modules.html#mounts , "Swap files can be configured by setting the path to the swap file to create with filename". The docs imply that cloud-init creates a swap file at the specified filename if using the swap module.
[22:20] blackboxsw: So I'm looking at the release process, and it assumes that I can push the commit I've created locally to master. We, of course, can't do that with branch protection. We can either revise the docs to document disabling branch protection temporarily (probably not?), or we can update things so that the tagging of the release happens _after_ merge (via GH button) into master. However, I think that will cause CI to fail, because it will look for a 20.1 tag (to determine versions) and fail.
[22:20] My swap settings contain:
[22:20] ...
[22:20] swap:
[22:20]   filename: /mnt/resource/swapfile
[22:20] ...
[22:20] ...
[22:20] When the machine is provisioned, cloud-init fails to create the swapfile. The cloud-init.log file shows that the "swapon -a" command failed. When I ssh into the machine and try running "swapon -a" myself, it says that "swapon: stat failed /mnt/resource/swapfile: No such file or directory".
[22:20] However, I can confirm that the /mnt/resource path exists and is accessible, and I am able to read/write files within /mnt/resource. I can also see that cloud-init added the swap entry to /etc/fstab.
[22:20] When I look through the cloud-init code, I can see cloud-init adds the swap entry to /etc/fstab if it exists, then runs "swapon -a" without creating the swapfile.
[22:20] Is cloud-init supposed to create the swapfile if it is instructed to do so (the docs seem to imply that)?
[22:22] johnsonshi: I believe you need to specify "size" for the creation to occur.
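(For reference, the shape that suggestion takes as cloud-config, going by the mounts-module docs linked above; the size value here is purely illustrative:)

    #cloud-config
    swap:
      filename: /swapfile
      size: 2147483648   # size in bytes of the swap file to create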
[22:22] Odd_Bloke: doc reference?
[22:22] blackboxsw: https://github.com/CanonicalLtd/uss-tableflip/blob/master/doc/upstream_release_process.md
[22:22] thanks
[22:23] blackboxsw: Specifically "Note: you need to push the tag or c-i will fail in check version."
[22:23] Odd_Bloke: we have the ability to git push tags
[22:23] right?
[22:23] blackboxsw: But we don't have the commit to tag until after merge.
[22:23] It never exists on my system until then.
[22:24] Which means the 20.1 tag I have pushed (until I realised the conflict) points at the commit which will be squashed into master (and then discarded, of course).
[22:24] ahh right, but we will merge your MP into master once you've bumped the version to 20.1, right?
[22:24] Right.
[22:25] swap:
[22:25]   filename: /mnt/resource/swapfile
[22:25]   size: 2147483648
[22:25] I tried it again and the swapfile still isn't being created.
[22:25] johnsonshi: Could you file a bug with `cloud-init collect-logs` attached to it and drop the link to it back in here?
[22:25] cc_mounts.py does not seem to have code that creates a swapfile?
[22:25] Odd_Bloke: ok, hrm, but your branch did pass CI: https://github.com/canonical/cloud-init/pull/222
[22:25] blackboxsw: Right, I pushed the tag.
[22:25] * blackboxsw checks it out again
[22:25] As instructed.
[22:26] with branch restrictions reduced?
[22:26] But that tag points at a commit which will never be in master.
[22:26] or tag push worked
[22:26] Tag push is fine.
[22:26] ahh ok. gotcha
[22:26] Odd_Bloke: Sure thing. So missing swapfile creation is a bug due to the code not being in cloud-init?
[22:26] not sure if that's a broken process that should be fixed: stale tags pointing to nowhere for previous releases
[22:26] johnsonshi: What version of cloud-init are you using?
[22:27] Odd_Bloke: The distro I am using it with (RHEL 7.7) comes with cloud-init 18.5.
[22:28] Yeah, that should definitely have swap support; I don't know why you wouldn't be seeing that in cc_mounts.
[22:28] There have been changes since 18.5, however, so you might be hitting an existing bug.
[22:28] johnsonshi: Are you able to try again with 19.4 somehow?
[22:30] blackboxsw: It looks like I fixed up the 19.4 tag on the day it happened.
[22:30] blackboxsw: (And 19.3 predated the GH move, I think?)
[22:30] johnsonshi: any of the copr dev builds work for you? https://copr.fedorainfracloud.org/coprs/g/cloud-init/cloud-init-dev/
[22:31] Odd_Bloke: you are right. OK, so branch restrictions are causing some issues with upstream releases then.
[22:31] Odd_Bloke: wait a sec
[22:31] Yeah, I'll try out 19.4
[22:31] Thanks!
[22:32] johnsonshi: Actually, if you're able, there have been further changes since 19.4, so a test of a daily build would be best. (That cloud-init-dev link has those, so you're already doing that if that's how you're going to do it. :)
[22:33] Thanks!
[22:33] blackboxsw: I'm thinking that retagging after is probably easier in the short term.
[22:34] But I think I'm going to remove the tag and re-run CI, to see if we actually need it beforehand at all.
[22:34] Just in case the move to Travis has happened to obviate that need.
[22:35] +1 Odd_Bloke
[22:36] johnsonshi: from what I can tell, the additional path components to your file are the issue; a workaround would be either to use a file in an existing directory, or to use bootcmd to run an mkdir -p
[22:36] and the fix is for cloud-init to use util.ensure_file() on the swapfile path
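(A sketch of that workaround as cloud-config, assuming the only problem is the missing parent directory; bootcmd runs before the mounts module on each boot, but this is untested against the pinned 18.x version:)

    #cloud-config
    bootcmd:
      - mkdir -p /mnt/resource
    swap:
      filename: /mnt/resource/swapfile
      size: 2147483648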
[22:37] powersj: FYI, we've run into some release tooling issues with the migration to GitHub. I'm currently pursuing getting it right this time, so we don't have to revisit it, because I'm not aware of any pressing need for 20.1 to be out the door. Am I missing some info, or are you happy for me to proceed along this path?
[22:38] Odd_Bloke, can you estimate how long this delay would be?
[22:39] powersj: Tomorrow instead of tonight.
[22:39] that's fine
[22:59] Odd_Bloke: Hmmm, this is weird. The logs show that the swapfile creation command succeeds (for RHEL 7.7 cloud-init 18.4).
[22:59] 2020-02-19 22:23:35,338 - cc_mounts.py[DEBUG]: swap file /mnt/resource/swapfile exists, but not in /proc/swaps
[22:59] 2020-02-19 22:23:35,338 - util.py[DEBUG]: Running command ['sh', '-c', 'rm -f "$1" && umask 0066 && { fallocate -l "${2}M" "$1" || dd if=/dev/zero "of=$1" bs=1M "count=$2"; } && mkswap "$1" || { r=$?; rm -f "$1"; exit $r; }', 'setup_swap', '/mnt/resource/swapfile', '2048'] with allowed return codes [0] (shell=False, capture=True)
[22:59] 2020-02-19 22:23:35,400 - util.py[DEBUG]: creating swap file '/mnt/resource/swapfile' of 2048MB took 0.062 seconds
[23:00] After rebooting (cloud-init clean -lr) as well as a fresh provision of another machine, the file still was not being created (despite cloud-init.log saying that it is created). When I ssh'ed into the machine and ran the command above, the command succeeded and the swapfile was created.
[23:00] I just asked some of my teammates (I am on the Azure Linux provisioning team), and they said that we'd have to use this specific version of cloud-init due to a host of reasons.
[23:02] johnsonshi: What FS is /mnt/resource?
[23:03] johnsonshi: And is /mnt/resource ephemeral? I think cloud-init is going to assume that if it creates a swapfile, it stays there.
[23:04] Odd_Bloke: ext4. Yeah, it is an ephemeral disk.
[23:04] Odd_Bloke: Oddly enough, when I included a runcmd directive, the swapfile was created.
[23:05] johnsonshi: So that shell snippet has been replaced by Python code in cloud-init master.
[23:06] So a one-off test with a recent daily build would give us some more information about what's going on here.
[23:07] Because if the completely rewritten code _also_ exhibits the failure, then my guess is that this isn't actually directly related to the swap creation, and instead it's something more complex.
[23:08] johnsonshi: I'm finishing up my day now, I'm afraid. If you can reproduce this with cloud-init dailies, then please file a bug. If it only exhibits in the version you're pinned on, then I think you'll need to talk to your team about how you've handled such issues in the past.
[23:08] (And have a good evening once you're done with your day too, of course. :)
[23:09] Odd_Bloke: OK, I will try that out. Thanks for your help and have a good evening too! :)
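(For readers following along: the shell pipeline in johnsonshi's log above maps to roughly the following Python. This is a hypothetical sketch for illustration only, not the actual rewritten code in cloud-init master.)

    # Hypothetical sketch of swap file creation in Python, mirroring the
    # logged shell snippet: allocate the file, run mkswap, and remove the
    # file again if any step fails.
    import os
    import subprocess

    def create_swapfile(fname, size_mib):
        try:
            fd = os.open(fname, os.O_CREAT | os.O_WRONLY, 0o600)
            try:
                # Equivalent of `fallocate -l <size>M <file>`.
                os.posix_fallocate(fd, 0, size_mib * 1024 * 1024)
            finally:
                os.close(fd)
            subprocess.run(['mkswap', fname], check=True)
        except (OSError, subprocess.CalledProcessError):
            # Mirror the shell's `|| { rm -f "$1"; exit $r; }` cleanup.
            if os.path.exists(fname):
                os.unlink(fname)
            raise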