[13:06] <ms7821> hi, anyone know how cloud-init calculates metrics? It claims to base them on the adapter ID, but I've ended up with a broken AWS instance because eth0 had metric=200 and eth1 metric=100
[13:07] <ms7821> editing netplan/50-cloud-init.yaml fixed this, and persists across reboots/runs of cloud-init
[13:07] <ms7821> so it looks a lot like metric calculation isn't idempotent
[13:09] <ms7821> this looks suspiciously like it's basing the metric on the MAC https://github.com/canonical/cloud-init/blob/6600c642af3817fe5e0170cb7b4eeac4be3c60eb/cloudinit/sources/DataSourceEc2.py#L766
[13:16] <ms7821> ahh, https://github.com/canonical/cloud-init/pull/114 fixes it by the looks of it
[13:18] <ms7821> which is odd given I have the latest version
[13:21] <Odd_Bloke> ms7821: The "latest version" on which distro?
[13:23] <ms7821> EC2's Ubuntu 20.04 LTS
[13:23] <ms7821> $ cloud-init -v gives 20.1-10-g71af48df-0ubuntu5
[13:23] <ms7821> which should include that PR
[13:24] <Odd_Bloke> Yep, it should.  So perhaps this is a separate bug?  Could you file one at https://bugs.launchpad.net/cloud-init/+filebug and include the tarball that `cloud-init collect-logs` on a failing instance (if you can get access somehow)?
[13:24] <ms7821> ohh wait maybe it doesn't
[13:25] <Odd_Bloke> It does, it was included in https://launchpad.net/ubuntu/+source/cloud-init/20.1-10-g71af48df-0ubuntu3
[13:31] <rharper> ms7821: it would be really good to have cloud-init collect-logs on your failing instance and the rendered netplan file;  prior to the PR you linked, we did not renderer route-metric on secondary nics;  rather we did not render config for any secondary nics in AWS;
[13:32] <ms7821> ahh, perhaps I've misread the PR then. maybe it's *caused* the bug
[13:32] <rharper> possibly, yes
[13:32] <ms7821> I read "On Ec2, cloud-init will configure networking on all attached NICs, instead of just the primary NIC." as the current state of affairs
[13:33] <rharper> yes,  and that was added in Focal only;  older releases do not yet do that by default;
[13:33] <ms7821> so I probably can't share the collect-logs but I'm pretty sure sorting by mac is wrong
[13:33] <rharper> we're not sorting by mac
[13:33] <ms7821> it should use meta-data/network/interfaces/macs/*/device-number
[13:33] <rharper> we rely on the order provided by the instance metadata
[13:34] <Odd_Bloke> ms7821: Can't collect, or can't share for non-technical reasons?  (We can help work around the former, not so much the latter. ;)
[13:34] <ms7821> I mean if I share it I need to rebuild this instance which has taken me a couple of days to troubleshoot already
[13:35] <ms7821> should it be creating a default route for secondary interfaces at all?
[13:35] <rharper> that's what the route metric is for; it's provided in the instance metadata
[13:36] <ms7821> OK, looking at this PR, it deleted and reintroduces the line I linked above
[13:36] <ms7821> deletes*
[13:36] <rharper> ms7821: I think I see where the issue is w.r.t obtaiing the device-number
[13:37] <ms7821> that line where it derives nic_idx from sorting by the macs is wrong
[13:37] <rharper> the macs-to-nics list index isn't what we want at all;  it should have been the device-number set in metadta
[13:37] <rharper> ms7821: indeed
[13:37] <rharper> it's fine for us to enumerate in any order but the index value we assign should come from the instance metadata value
[13:42] <Odd_Bloke> rharper: In your opinion, should we land a fix for this in master before kicking off SRUs?
[13:42] <rharper> Odd_Bloke: yes
[13:42] <rharper> I think our multi-nic testing should/would have found this
[13:42] <Odd_Bloke> rharper: Shall I start looking into it, or are you on your way already?
[13:42] <rharper> Odd_Bloke: please do
[13:42] <rharper> nic_idx = int(nic_metadata['device-number']) + 1  # zero-based
[13:42] <rharper> that's basically what we need to do instead
[13:43] <rharper> so, device-number: 0  will have the lowest metric and all secondary nics+ are higher;
[13:44] <Odd_Bloke> blackboxsw: FYI, ^ is a blocker on starting the SRU process for 20.2
[13:46] <Odd_Bloke> rick_h: (FYI too. :)
[13:50] <ms7821> looks like the macs are static and ordered in test_ec2.py, which is why it won't have been picked up
[13:50] <ms7821> (on the plus side, this will usually be the case so I expect very few people will see it)
[13:51] <Odd_Bloke> Yep, I just changed that test data to reproduce this particular issue.
[13:51] <Odd_Bloke> ms7821: Would you like to file a bug for the issue (even without collect-logs)?  (If not, I will. :)
[13:51] <ms7821> sure, can do
[13:52] <Odd_Bloke> Thanks!
[14:10] <ms7821> https://bugs.launchpad.net/cloud-init/+bug/1876312
[14:10] <ms7821> thanks for picking this up so quickly!
[14:20] <rharper> ms7821: thanks for reporting it in here
[14:21] <rick_h> Odd_Bloke:  sorry was in a call. /me reads back
[14:23] <rick_h> ah good find then and thanks ms7821
[14:23] <Odd_Bloke> rick_h: No problem (and no action from you required), just keeping you in the loop.
[14:38] <Odd_Bloke> rharper: https://github.com/canonical/cloud-init/pull/342
[15:15] <bpo> (Hopefully) quick issue for someone who knows Sphinx/RST better than I do: the link for 'analyze' in the first sentence of this section should link to another page (the top-level 'analyze' label) but instead links to the 'cli_analyze' label within the CLI page: https://cloudinit.readthedocs.io/en/latest/topics/cli.html#analyze -- when building the
[15:15] <bpo> docs there are several warnings about duplicate label names which may point to the issue.
[15:16] <rharper> bpo: thanks
[15:49] <LongLiveCHIEF> is it acceptable to submit documentation issues for cloud-init on launchpad?
[15:50] <powersj> LongLiveCHIEF, please do
[15:50] <blackboxsw> +1 LongLiveCHIEF
[15:50] <LongLiveCHIEF> :thumbsup:
[16:00] <LongLiveCHIEF> like this? https://bugs.launchpad.net/cloud-init/+bug/1876333
[16:30] <Odd_Bloke> bpo: FYI, I filed https://bugs.launchpad.net/cloud-init/+bug/1876323 for the analyze issue you were seeing yesterday.
[16:31] <Odd_Bloke> bpo: Would you mind filing a bug for that doc issue you identified?
[16:31] <bpo> Sure, happy to.
[16:32] <Odd_Bloke> Thanks!
[16:33] <bpo> Odd_Bloke: do you think this issue with the arg0Formatter's interaction with cloud-init should be raised separately with AWS? Or is it something the cloud-init project will address
[16:44] <bpo> doc issue filed here: https://bugs.launchpad.net/cloud-init/+bug/1876341
[17:18] <Odd_Bloke> Thanks for the bug!
[17:19] <Odd_Bloke> bpo: The log format _is_ configurable, so we need to figure out a better way of handling this in general.  In the short term, I'll look into fixing analyze for this particular log format, but in the medium term we may want to prefer using journalctl (which allows us to specify the format we want, circumventing the problem of what's on disk entirely) or something else along those lines.
[17:20] <bpo> Makes sense. Reading straight from journald would be great!
[17:21] <Odd_Bloke> (Obviously with a fallback to the log file if we don't have journald, in case anyone reading along is worried. :)
[17:51] <Odd_Bloke> blackboxsw: So I've attempted to run through the manual process AIUI, and it hasn't worked for me.  Can you review https://paste.ubuntu.com/p/HBbVmcf5BY/ and let me know if I did the wrong thing (or omitted steps)?
[17:54] <Odd_Bloke> blackboxsw: (Just noticed I omitted it, but those commands were all run on the ubuntu/daily/devel branch.)
[18:08] <blackboxsw> Odd_Bloke: will do
[19:15] <blackboxsw> Odd_Bloke: ok I see the problem in the logic there. hrm.  ok might have to go back to the drawing board. I'm toying with a couple of scenarios to make sure it's viable for multiple cherry picks. I expected we'd have to do a manual push to u/daily/devel to get this in the right state. but adding more cherry-picks might always hit those conflicts related to a missing debian/patches/series file each time we merge a
[19:15] <blackboxsw> new cherry pick/patch.
[19:16] <Odd_Bloke> blackboxsw: I have a method that did work: https://paste.ubuntu.com/p/kWqny27sHm/
[19:18] <blackboxsw> nice, though I'm trying to resolve that missing series file. I don't think this was the case if we were merging u/devel -> u/daily/devel. We shouldn't have to resolve those conflicts each time we uss-tableflip/cherry-pic
[19:20] <blackboxsw> so Odd_Bloke what happened on your local ubuntu/devel then if you then ran uss-tableflip/scripts/cherry-pick 25698b144e3b6548ffc29ab14bed1882242b161a
[19:21] <blackboxsw> I'd expect more errors on the ubuntu/daily/devel 'git cherry-pick` that gets called too
[19:21] <blackboxsw> s/errors/conflicts
[19:22] <Odd_Bloke> Well, ubuntu/daily/devel shouldn't have a series file, because we've reverted all the patches.
[19:22] <Odd_Bloke> But ubuntu/devel will have one (if any cherry-picks have happened, of course).
[19:23] <blackboxsw> right, which I think also means that u/daily/devel cannot git cherry-pick <a debian patch commitish u/devel> after we've already reverted the first cherry-pick
[19:24] <blackboxsw> so every time our u/devel gets past 1 cherry-pick-debian-patch, then u/daily/devel can no longer git cherry-pick the 2nd debian patch into u/daily/devel because the debian/patches/series file is already absent
[19:25] <rharper> doesn't daily/devel come from a previous series; don't we always have at least *one* patch in the series file ?
[19:25] <Odd_Bloke> It's absent and that's different to ubuntu/devel (where it exists with N-1 lines, if this is the Nth cherry-pick).
[19:25] <blackboxsw> rharper: not once we git revert that original cherry-pick from u/daily/devel. then series is gone
[19:25] <rharper> can you have comments in series files ?
[19:26] <rharper> can we have an empty file ?
[19:26] <blackboxsw> yes, but still git conflicts because the diff won't match right
[19:26] <Odd_Bloke> Either way the content will be different between the two branches.
[19:26] <Odd_Bloke> So git is going to ask us how it should resolve that difference.
[19:27] <Odd_Bloke> I don't think this is a _huge_ problem: the daily branch allows us to resolve these conflicts _once_ then push it.
[19:29] <Odd_Bloke> rharper: (FYI, responded to your comments on https://github.com/canonical/cloud-init/pull/342)
[19:33] <blackboxsw> hrm. Odd_Bloke I'm trying to get back to the  reason we care about only merging  u/daily/devel back into u/devel and why we don't want to just force push to u/daily/devel and recipe biulds build from there directly
[19:34] <Odd_Bloke> blackboxsw: Force push what to u/daily/devel?
[19:34] <blackboxsw> if we simplified things a bit, and just force pushed u/devel -> u/daily/devel and added a single commit to reverse patch all debian/patches/cpick* files or rm debian/patches/cpick*, then aren't we building what  we care about
[19:35] <Odd_Bloke> We were trying to avoid having to update another branch during Ubuntu releases.
[19:36] <Odd_Bloke> And if we're prepping debian/* changes in ubuntu/devel for the next upload, then we would need to remember to manually sync the daily branch./
[19:36] <blackboxsw> so each time we uss-tableflip/cherry-pick the cherry-pick tool of ours would force push u/devel -> u/daily/devel for us and revert any debian/patches/cpicks
[19:36] <Odd_Bloke> If we're only building from u/daily/devel, then we need to constantly keep u/devel pushed to u/daily/devel though.
[19:37] <Odd_Bloke> Not just on cherry-pick.
[19:37] <blackboxsw> hrm darn right
[19:38] <blackboxsw> thanks for the refresh there. you're right.
[19:38] <Odd_Bloke> No worries, there are a bunch of moving pieces, it is tricky.
[19:39] <blackboxsw> Odd_Bloke: if uss-tableflips/cherry-pick kept a running ordered list of the commits it pushed/popped, it'd be able to re-apply them all and properly revert the latest (it was my earlier iteration on the cherry-pick PR)
[19:40] <blackboxsw> so we could reapply all cpicks  to u/daily/devel then revert all in reverse order
[19:40] <blackboxsw> then we wouldn't have conflcts
[19:40] <blackboxsw> just a bit of extra noise on u/daily/devel branch with the re-apply all, revert all
[19:40] <Odd_Bloke> I _think_ git wouldn't recognise those re-applied cherry-picks as being the same as the cherry-picks applied to u/devel.
[19:41] <blackboxsw> I mean we'd track the local revert commitish on u/daily/devel an revert the revert kinda thing
[19:41] <Odd_Bloke> blackboxsw: Instead of thinking about how to automate it, try runnig through your proposal manually and see what happens. :)
[19:41] <blackboxsw> that sounds good will have a past
[19:41] <blackboxsw> paste
[19:41] <Odd_Bloke> Thanks!
[19:42] <rharper> Odd_Bloke: lol, https://bugs.launchpad.net/bugs/1876363
[19:43] <rharper> this showed up in the inbox *just* as I approved 342 =)
[19:51] <Odd_Bloke> HAH
[21:01] <Goneri> blackboxsw, I've refreshed my patch to use parameterize: https://github.com/canonical/cloud-init/pull/305
[21:24] <blackboxsw> Odd_Bloke: finally got it
[21:24] <blackboxsw> though too little to late I presume. paste coming
[22:07] <blackboxsw> Odd_Bloke: rharper for monday/tuesday https://paste.ubuntu.com/p/jPxsJtQMmg/
[22:08] <blackboxsw> it involves us reseting ubuntu/daily/<series> each time we cherry-pick a change, since this branch is only intended to revert cpicks for daily builds, I don't see the harm in us resetting and force pushing u/daily/<series> when we have a cherry-pick to perform. This would be the only time we'd update u/daily/<series> branches
[22:10] <blackboxsw> it also avoids having to maintain two branches every commit that lands in ubuntu/<series> branches and it avoids merge conflicts as  u/daily/<series> would only get updates to exactly what ubuntu/<series> has exactly when a cherry-pick is added.