[00:03] -queuebot:#ubuntu-release- New binary: iotjs [arm64] (groovy-proposed/universe) [1.0-2] (no packageset)
[00:03] -queuebot:#ubuntu-release- New binary: iotjs [armhf] (groovy-proposed/universe) [1.0-2] (no packageset)
[00:03] -queuebot:#ubuntu-release- New binary: gnss-sdr [arm64] (groovy-proposed/universe) [0.0.13-1] (no packageset)
[00:06] -queuebot:#ubuntu-release- New binary: gnss-sdr [armhf] (groovy-proposed/universe) [0.0.13-1] (no packageset)
[00:06] ddstreet: no more things to do ATM. We have point releases to ship! Or why must 1832754 be fixed on the point release media?
[00:20] Ok, I'll be kicking the .1 release candidate images now before going to sleep
[00:23] 👍
[00:23] -queuebot:#ubuntu-release- New binary: agda [amd64] (groovy-proposed/universe) [2.6.1-1] (no packageset)
[00:25] sil2100: vorlon: I wonder if for bionic we could accept a "no change rebuild" of systemd with a version number that does not supersede the full SRU in the unapproved queue. Such that kmod migrates, and such that bionic d-i can be built.
[00:27] xnox: that sounds reasonable I must say - I didn't check what exact fixes are in the systemd upload in Unapproved, but seeing that the test case is hard to establish for at least one of the fixes there, maybe it's better to go this way [00:45] -queuebot:#ubuntu-release- New binary: agda [ppc64el] (groovy-proposed/universe) [2.6.1-1] (no packageset) [00:54] -queuebot:#ubuntu-release- Builds: Ubuntu Desktop amd64 [Focal 20.04.1] has been updated (20200730) [00:58] -queuebot:#ubuntu-release- Builds: Ubuntu Budgie Desktop amd64 [Focal 20.04.1] has been updated (20200730) [01:00] -queuebot:#ubuntu-release- Builds: Kubuntu Desktop amd64 [Focal 20.04.1] has been updated (20200730) [01:00] -queuebot:#ubuntu-release- Builds: Ubuntu Server arm64 [Focal 20.04.1] (20200730) has been added [01:00] -queuebot:#ubuntu-release- Builds: Ubuntu Server amd64 [Focal 20.04.1] (20200730) has been added [01:03] -queuebot:#ubuntu-release- Builds: Ubuntu Base amd64 [Focal 20.04.1] has been updated (20200730) [01:03] -queuebot:#ubuntu-release- Builds: Ubuntu Base arm64 [Focal 20.04.1] has been updated (20200730) [01:03] -queuebot:#ubuntu-release- Builds: Ubuntu Base armhf [Focal 20.04.1] has been updated (20200730) [01:03] -queuebot:#ubuntu-release- Builds: Ubuntu Base ppc64el [Focal 20.04.1] has been updated (20200730) [01:03] -queuebot:#ubuntu-release- Builds: Ubuntu Base s390x [Focal 20.04.1] has been updated (20200730) [01:06] -queuebot:#ubuntu-release- Builds: Lubuntu Desktop amd64 [Focal 20.04.1] has been updated (20200730) [01:06] -queuebot:#ubuntu-release- Builds: Ubuntu Kylin Desktop amd64 [Focal 20.04.1] has been updated (20200730) [01:08] -queuebot:#ubuntu-release- Builds: Ubuntu MATE Desktop amd64 [Focal 20.04.1] has been updated (20200730) [01:08] -queuebot:#ubuntu-release- Builds: Ubuntu Server arm64+raspi [Focal 20.04.1] (20200730) has been added [01:08] -queuebot:#ubuntu-release- Builds: Ubuntu Server armhf+raspi [Focal 20.04.1] has been updated (20200730) [01:09] 
-queuebot:#ubuntu-release- New binary: agda [s390x] (groovy-proposed/universe) [2.6.1-1] (no packageset) [01:09] -queuebot:#ubuntu-release- Builds: Xubuntu Desktop amd64 [Focal 20.04.1] has been updated (20200730) [01:16] -queuebot:#ubuntu-release- Builds: Ubuntu Server Subiquity amd64 [Focal 20.04.1] has been updated (20200730) [01:16] -queuebot:#ubuntu-release- Builds: Ubuntu Server Subiquity arm64 [Focal 20.04.1] has been updated (20200730) [01:16] -queuebot:#ubuntu-release- Builds: Ubuntu Server Subiquity ppc64el [Focal 20.04.1] has been updated (20200730) [01:16] -queuebot:#ubuntu-release- Builds: Ubuntu Server Subiquity s390x [Focal 20.04.1] has been updated (20200730) === s8321414_ is now known as s8321414 [01:34] -queuebot:#ubuntu-release- Builds: Ubuntu Studio DVD amd64 [Focal 20.04.1] has been updated (20200730) [02:12] -queuebot:#ubuntu-release- New binary: gnss-sdr [riscv64] (groovy-proposed/universe) [0.0.13-1] (no packageset) [05:29] -queuebot:#ubuntu-release- New binary: libgtextutils [riscv64] (groovy-proposed/universe) [0.7-7] (no packageset) [05:49] -queuebot:#ubuntu-release- New binary: libgtextutils [ppc64el] (groovy-proposed/universe) [0.7-7] (no packageset) [05:50] -queuebot:#ubuntu-release- New binary: gnucap-python [ppc64el] (groovy-proposed/universe) [0.0.2-1.2] (no packageset) [05:52] -queuebot:#ubuntu-release- New binary: gnucap-python [riscv64] (groovy-proposed/universe) [0.0.2-1.2] (no packageset) [06:18] -queuebot:#ubuntu-release- New binary: gnucap-python [amd64] (groovy-proposed/universe) [0.0.2-1.2] (no packageset) [06:19] -queuebot:#ubuntu-release- New binary: libgtextutils [amd64] (groovy-proposed/universe) [0.7-7] (no packageset) [06:24] -queuebot:#ubuntu-release- New binary: gnucap-python [s390x] (groovy-proposed/universe) [0.0.2-1.2] (no packageset) [06:25] -queuebot:#ubuntu-release- New binary: libgtextutils [s390x] (groovy-proposed/universe) [0.7-7] (no packageset) [07:19] -queuebot:#ubuntu-release- New binary: libgtextutils 
[arm64] (groovy-proposed/universe) [0.7-7] (no packageset) [07:20] -queuebot:#ubuntu-release- New binary: gnucap-python [arm64] (groovy-proposed/universe) [0.0.2-1.2] (no packageset) [07:23] -queuebot:#ubuntu-release- New binary: gnucap-python [armhf] (groovy-proposed/universe) [0.0.2-1.2] (no packageset) [07:25] -queuebot:#ubuntu-release- New binary: libgtextutils [armhf] (groovy-proposed/universe) [0.7-7] (no packageset) [08:02] -queuebot:#ubuntu-release- Unapproved: apport (focal-proposed/main) [2.20.11-0ubuntu27.4 => 2.20.11-0ubuntu27.5] (core, i386-whitelist) [09:07] -queuebot:#ubuntu-release- New: accepted gnucap-python [amd64] (groovy-proposed) [0.0.2-1.2] [09:07] -queuebot:#ubuntu-release- New: accepted gnucap-python [armhf] (groovy-proposed) [0.0.2-1.2] [09:07] -queuebot:#ubuntu-release- New: accepted gnucap-python [riscv64] (groovy-proposed) [0.0.2-1.2] [09:07] -queuebot:#ubuntu-release- New: accepted libgtextutils [amd64] (groovy-proposed) [0.7-7] [09:07] -queuebot:#ubuntu-release- New: accepted libgtextutils [armhf] (groovy-proposed) [0.7-7] [09:07] -queuebot:#ubuntu-release- New: accepted libgtextutils [riscv64] (groovy-proposed) [0.7-7] [09:07] -queuebot:#ubuntu-release- New: accepted gnucap-python [arm64] (groovy-proposed) [0.0.2-1.2] [09:07] -queuebot:#ubuntu-release- New: accepted gnucap-python [s390x] (groovy-proposed) [0.0.2-1.2] [09:07] -queuebot:#ubuntu-release- New: accepted libgtextutils [ppc64el] (groovy-proposed) [0.7-7] [09:07] -queuebot:#ubuntu-release- New: accepted gnucap-python [ppc64el] (groovy-proposed) [0.0.2-1.2] [09:07] -queuebot:#ubuntu-release- New: accepted libgtextutils [s390x] (groovy-proposed) [0.7-7] [09:07] -queuebot:#ubuntu-release- New: accepted libgtextutils [arm64] (groovy-proposed) [0.7-7] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [amd64] (groovy-proposed) [0.0.13-1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [armhf] (groovy-proposed) [0.0.13-1] [09:08] -queuebot:#ubuntu-release- New: 
accepted gnss-sdr [riscv64] (groovy-proposed) [0.0.13-1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [amd64] (groovy-proposed) [0.0.13-1~build1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [ppc64el] (groovy-proposed) [0.0.13-1~build1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [s390x] (groovy-proposed) [0.0.13-1~build1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [arm64] (groovy-proposed) [0.0.13-1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [s390x] (groovy-proposed) [0.0.13-1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [riscv64] (groovy-proposed) [0.0.13-1~build1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [ppc64el] (groovy-proposed) [0.0.13-1] [09:08] -queuebot:#ubuntu-release- New: accepted gnss-sdr [arm64] (groovy-proposed) [0.0.13-1~build1] [09:08] -queuebot:#ubuntu-release- New: accepted ifenslave [amd64] (groovy-proposed) [2.10ubuntu2] [09:08] -queuebot:#ubuntu-release- New: accepted iotjs [arm64] (groovy-proposed) [1.0-2] [09:08] -queuebot:#ubuntu-release- New: accepted iotjs [ppc64el] (groovy-proposed) [1.0-2] [09:08] -queuebot:#ubuntu-release- New: accepted iotjs [s390x] (groovy-proposed) [1.0-2] [09:08] -queuebot:#ubuntu-release- New: accepted iotjs [amd64] (groovy-proposed) [1.0-2] [09:08] -queuebot:#ubuntu-release- New: accepted iotjs [riscv64] (groovy-proposed) [1.0-2] [09:08] -queuebot:#ubuntu-release- New: accepted iotjs [armhf] (groovy-proposed) [1.0-2] [09:13] !regression-alert bug #1889509 on xenial [09:13] bug 1889509 in grub2 (Ubuntu) "grub boot error : "symbol 'grub_calloc' not found" [Undecided,Confirmed] https://launchpad.net/bugs/1889509 [09:13] xnox: I am only a bot, please don't think I'm intelligent :) [09:13] chrisccoulson: did you see above? [09:13] sil2100: can we pause phasing of grub2 on Xenial? [09:14] cpc-help are Xenial images failing testing on aws? [09:14] xnox, ack [09:16] Not sure who else can pause phasing. 
Or if there is bug report / tag way to do it.
[09:17] Laney: apw: seb128: cjwatson: are you able to pause phasing of grub2/grub2-signed on Xenial please?
[09:17] I'm not in the SRU team and don't know how that works, sorry
[09:18] sil2100 maybe could help there?
[09:18] also that's not going to fix people using apt, we should maybe just remove the update from -updates?
[09:18] or -security rather
[09:19] (would unattended-upgrade respect the pausing?)
[09:22] It's not in -security yet
[09:22] seb128: demote to proposed could be done.
[09:22] grub2 & grub2-signed
[09:23] so these are people using legacy bios who have grub installed to more than one disk, and only one grub image gets updated?
[09:23] In AWS cloud??
[09:23] I can demote it
[09:24] phasing doesn't help for clouds does it?
[09:24] We have dpkg questions to install grub legacy to more than one disk/partition.
[09:24] Laney: it does not.
[09:24] so yeah :p
[09:24] I vote for demoting to proposed
[09:24] what's the bug reference?
[09:24] is that impacting only xenial?
[09:24] Laney, bug 1889509
[09:24] bug 1889509 in grub2 (Ubuntu) "grub boot error : "symbol 'grub_calloc' not found" [Undecided,Confirmed] https://launchpad.net/bugs/1889509
[09:25] oh yeah I see it up there, thanks
[09:25] np!
[09:25] xnox, just looking at the comments on https://askubuntu.com/questions/1263125/how-to-fix-a-grub-boot-error-symbol-grub-calloc-not-found
[09:25] having modules and the grub kernel go out of sync seems the only plausible way for this to happen
[09:25] I'll do xenial, please confirm about other releases
[09:25] comment #3 states
[09:25] Same bug on Ubuntu 20.04 pro in Azure.
[09:25] https://askubuntu.com/questions/1263125/how-to-fix-a-grub-boot-error-symbol-grub-calloc-not-found is 20.04
[09:26] It means that Trusty ESM is probably affected too
[09:26] I am confused how it could go out of sync.
[09:26] xnox, chrisccoulson, should we demote to proposed on all series?
[09:27] "Demoting packages to xenial-updates-proposed"
[09:27] yeaaahhhhh no
[09:27] seb128, I'm not sure. given the bugs that this update addresses have a fair amount of press coverage, this is going to be, uhm, interesting
[09:27] seb128: new installs, new AMIs, are not affected. Upgrades are.
[09:27] and EFI is unaffected
[09:28] chrisccoulson, we need maybe to pull more people in to take a decision
[09:28] I'll just remove it, it can be copied back to proposed by anyone if they want
[09:28] or re-released or whatever
[09:28] Wimpress, ^ help please
[09:28] Laney: right.
[09:28] It is being phased. It is a regression alert, we should pause phasing.
[09:28] Which yeah, means demote to proposed.
[09:29] xnox, I assume they've gone out of sync because they've got the grub kernel installed to the MBR on more than one device, and grub-install only updated one of them (just a guess)
[09:29] https://paste.ubuntu.com/p/XGH6bGWc5r/ confirm
[09:29] I'd like to figure out what's wrong on AWS and fix that.
[09:29] chrisccoulson, but yeah, it's going to be 'interesting' but probably less of an issue than us bricking a stack of machines in a stable update
[09:29] please
[09:30] Laney: looks good to me.
[09:30] xnox: That's not a regression
[09:30] cj
[09:31] hi cjwatson
[09:31] xnox: Every time GRUB changes significantly we get a flurry of reports along these lines; it's because of timebomb local configuration errors
[09:31] xnox: You know about the interface between GRUB's core image and modules?
[09:32] xnox: Part of GRUB lives in a "core image" at the start of the boot disk, and part of it lives in modules on the /boot file system. They're supposed to be updated in sync by grub-install. If those two things get out of sync, and if the interface between the core image and the modules changes (which can happen on any update - there's no interface stability guarantee there), then you'll see this type of problem.
[09:33] ah ok, not removing then :> [09:33] xnox: This happens on improperly configured BIOS systems [09:33] It's fragile, our cloud images don't know which device they will be booted on, and when one upgrades them they get bricked. But we kind of supply both pieces, no?! [09:33] xnox: The standard fix is "sudo dpkg-reconfigure grub-pc" [09:34] xnox: Sure, there are problems here (hard to fix ones), but treating them as blocking a critical security update is totally wrong practice [09:34] I wonder if we have ever had it done right on AWS, such that AMIs can survive grub upgrade. [09:34] xnox: They're not actually related to the update as such [09:34] xnox: Any GRUB change that introduced a new core image symbol that modules need would encounter something similar on such misconfigured systems [09:34] Ok [09:36] we should probably add another section to the knowledgebase article [09:36] xnox: So certainly if we've been shipping cloud images in a state where the GRUB packaging doesn't know where to install GRUB, that's obviously a problem that we need to fix, but it must have been around forever and hit previously on multiple occasions [09:37] can you see the problem before you restart? [09:37] cjwatson: I recall seeing extensive cloud-init change to "fix grub state" which broke since AWS move to nvme drives. [09:37] cjwatson, would you be able to do that? (adding another bit to the "Recovery" section of https://wiki.ubuntu.com/SecurityTeam/KnowledgeBase/GRUB2SecureBootBypass) [09:37] xnox: grub-pc.postinst has countermeasures that try to spot the installation device not being present. 
It's possible those are somehow not working, or something else [09:37] chrisccoulson: No, I'm on leave until Monday [09:38] cjwatson, aha, no worries [09:38] sorry :) [09:38] Wittering on IRC is easy [09:38] let me try to recreate a broken configuration and then I'll try to go through the steps of recovering it [09:38] (And ideally we'd work out a change to move the target grub-install device to a proper configuration file rather than it living in debconf) [09:39] (Would still have to be packaging-specific, and I'd need to review it because we need to agree any approach along those lines between Debian and Ubuntu. But it's a long-standing wart) [09:39] Laney: chrisccoulson: execute dpkg-reconfigure grub-pc on bios booted machines after update is applied, but before reboot is a good thing to do. Either harmless, or will prevent boot failure. For one to double check that the right drives are configured for boot. [09:40] xnox, yeah, ideally I'd like to walk through the process on a broken configuration before updating the knowledgebase article [09:40] And obviously I don't have a veto on you deciding to pull a security update due to cloud image upgrade problems; I just want to make sure that people properly understand the nature of the problem [09:41] cjwatson: juliank was working on installing to multiple ESPs and pc-bios partitions and keeping them all in sync. But I am not sure if all our desires are complete there. Including like autodetecting all the grubs and updating them all. [09:41] xnox: Full autodetection sounds like an approach I would veto in Debian, because it would potentially trash existing disks that *shouldn't* have their boot sectors touched [09:41] could you get this via unattended-upgrades and get timebombed without being aware of it? [09:42] Yeah, I need CPC help on this. To see if we are building our aws images wrong. Should be reproducible by downgrading to release too. [09:42] xnox: But it depends on the details [09:43] Laney: yes one will. 
And the timebomb is the duration since last grub2 update we pushed, or since one installed. Whichever is shorter. [09:43] Laney: not yet, as unattended-upgrades only pull in -security IIRC (and it's in -updates only for now), but I guess it would be possible once it lands in security [09:43] cjwatson: right. [09:43] sil2100: oh yes! [09:43] eh [09:43] There's some code in grub-pc.postinst that tries to scan for existing grub2 boot sectors (for the purpose of upgrading from grub legacy), but it's very difficult to do [09:43] xnox: No, it's the duration since the last update that changed core/modules ABI [09:44] (May happen to be the same in this case, but not necessarily) [09:44] Right. Most grub SRUs we shipped change like things in grub-mkconfig rather than anything substantial in the bios core. [09:45] The other situations that tend to produce this are more easily ascribed to user error (although admittedly the requirements are not very well documented; but at least they tend to correspond to something the user did) [09:45] Replacing a disk, or a bad cloning process [09:46] I was obtusely getting at writing something down on the wiki not being sufficient for those systems [09:46] I struggled with improving the situation here for literally years [09:46] It's really hard to get right without breaking other things [09:47] Not to say that it can't be improved, but nobody should expect it to have a quick or easy solution IMO [09:47] BIOS sucks [09:50] In the absence of a monolithic image the way we have on UEFI (which of course has other problems, and which often doesn't fit on BIOS systems anyway), probably the only robust way to do much better is to figure out how to scan for a best guess at the device that needs to have grub-install run on it. Even then it will likely break some people with fiddly multibooting setups who will be super ... [09:50] ... 
vocal about it [09:51] * cjwatson out [09:56] * xnox ponders if we can copy symbols into every module, which are not in the "core" ABI that we define per series, or some such. [09:57] argh [09:57] aka make core to be "stable" yet make all modules have all the symbols they need too [09:57] you're going to stop me being out if you propose crazy ideas :P [09:57] seb128: I'm reading the backlog... [09:57] also, no, there's also no guarantee that core/modules won't change in ways that aren't visible at the ABI level [09:58] I'll go talk to cpc, to ensure they have testcases to test that after image is booted, the dpkg/grub/etc know and will update the right cores on the right places. [09:58] desynced core and modules may break, and copying symbols around won't fix that [09:58] they must be in sync [10:03] xnox, cjwatson vorlo n also does not want full autodetection, just fallback if we can't find configured targets [10:03] Just installing to any random disk you attached was not the idea [10:05] ack [10:16] Right. Probably won't fix every situation, but is a bit safer [10:16] (ways that aren't visible at the ABI level: e.g. IIRC the recent security update changed the return type of some functions too, and that isn't visible in C ABIs. Just an example) [10:17] -queuebot:#ubuntu-release- New binary: llvm-toolchain-11 [s390x] (groovy-proposed/universe) [1:11.0.0~+rc1-1] (no packageset) [10:18] chrisccoulson: commented on the bug report and askubuntu. All mentions there were about "how to unbrick your boot this one time" none of them had long-term advice on fixing up / doing dpkg-reconfigure grub-pc to apply thing ever after to all the right drives. [10:18] * xnox ponders if grub-install should offer "do you want to add this device to grub-pc debconf for automatic updates? 
[10:45] write a random uuid into the saved block of grub stage1 and record that for finding later
[10:55] -queuebot:#ubuntu-release- New binary: llvm-toolchain-11 [ppc64el] (groovy-proposed/universe) [1:11.0.0~+rc1-1] (no packageset)
[11:28] http://autopkgtest.ubuntu.com/running
[11:28] what is going on with all those KDE packages
[11:28] "Start lintian"
[11:28] ?!
[11:30] xnox sil2100 you can't do a no-change-rebuild of systemd in bionic because it FTBFS, which is one of the bugs fixed in the upload that's been waiting for review for 22 days
[11:30] LP: #1886197
[11:30] Launchpad bug 1886197 in systemd (Ubuntu Bionic) "FTBFS in b due to libseccomp change" [High,In progress] https://launchpad.net/bugs/1886197
[11:31] ddstreet: thanks for pointing this out!
[11:39] RikMills: are you here? can you look into these kubuntu autopkgtest hangs please?
[11:39] I'm thinking of uploading pkg-kde-tools to drop the lintian run in the meantime
[11:48] yes, I'm going to do that
[11:49] just quickly testing with one example that it actually fixes the problem
[12:00] bah, if you pass .debs to autopkgtest it doesn't use them for build-needed
[12:03] xnox, ddstreet: let me review what's in that systemd upload right now
[12:04] Laney: autopkgtest is so silly!
[12:04] Laney: blame pitti!
[12:10] ddstreet: in case this upload gets accepted, how fast will you be able to get all the bugfixes verified?
[12:10] I employed a clever use of sleep to ssh in and dpkg -i the dep
[12:10] autopkgtest would have to be quite determined to undo that ...
[12:12] chrisccoulson: Odd_Bloke: rick_h: rcj reports that AWS AMIs prior to 29th of April had cloud-init with a cc_grub_dpkg run-once module that incorrectly specified /dev/sda to debconf, instead of nvmen0.
[12:12] On xenial, that didn't go out until June 28th
[12:12] so AMIs prior to the cut-off dates have the wrong cc_grub_dpkg executed.
[12:13] Odd_Bloke: rick_h: can we push out a cloud-init SRU that calls ds-identify, checks for AWS & nvme, and clears the state of cc_grub_dpkg / triggers a rerun of it via maintainer scripts somehow?
[12:13] Odd_Bloke: rick_h: can you confirm when the fixes landed in cloud-init xenial...groovy?
[12:14] I grabbed AWS AMI (us-west-2) ami-0813245c0939ab3ca which was released on 2020-04-29 because I wanted to be on the other side of the cloud-init cc_grub_dpkg patch https://git.launchpad.net/cloud-init/commit/cloudinit/config/cc_grub_dpkg.py?id=fc07d633f7cb694423349a2c4b10c91c4b4981a2
[12:16] I used an m5a.large instance because it's nitro-based and will have a root on nvme
[12:17] grrr systemd SRUs
[12:17] rcj: is there any vendor metadata we can push out?
[12:17] xnox: Haven't we had reports of issues with 20.04 Pro in azure? (re: your suggestion to check ds-identify for aws)
[12:17] xnox: how do you mean? what are you thinking?
[12:17] rcj: please check backscroll. I remember somebody mentioning 20.04 PRO but not sure which cloud and which place. Maybe it was an LP bug report?!
[12:17] sil2100 i can verify all the reproducable bugs today, there are fixes for lp #1881972 and lp #1886115 which are reproducable by the bug reporter, though the former has already been verified by the bug reporter and the latter is a trivial clearly correct patch [12:18] Launchpad bug 1881972 in systemd (Ubuntu Bionic) "systemd-networkd crashes with invalid pointer" [Medium,In progress] https://launchpad.net/bugs/1881972 [12:18] Launchpad bug 1886115 in systemd (Ubuntu Bionic) "libseccomp 2.4.3-1ubuntu3.18.04.2 causes systemd to segfault on boot" [Medium,In progress] https://launchpad.net/bugs/1886115 [12:18] rcj: to pubhlish vendor-metadata in the cloud to do one-off re-configure of cc_grub_dpkg [12:18] xnox: azure pro image is mentioned in https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1889509/comments/3 [12:18] Ubuntu bug 1889509 in grub2 (Ubuntu) "grub boot error : "symbol 'grub_calloc' not found" [Undecided,Confirmed] [12:18] xnox: nope, there is no vendor metadata service like that. [12:18] and lp #1832754 which also is not reproducable by me but a clearly correct patch [12:18] Launchpad bug 1832754 in systemd (Ubuntu Bionic) ""shutdown[1]: Failed to wait for process: Protocol error" at shutdown or reboot and hangs." [Medium,In progress] https://launchpad.net/bugs/1832754 [12:20] rcj: sad about lack of vendor metadata [12:20] rcj: the azure pro => is it nvme issue too? [12:21] rcj: Odd_Bloke: rick_h: are there any instructions we can add to "fix" clouds? I.e. "$ sudo cloud-init run-one cc_grub_dpkg" [12:21] xnox: I haven't gotten there yet to check things out [12:21] To the knowledge base article. [12:25] xnox: are you editing? Wasn't clear if that was telling me what you're doing or asking me to do it. [12:28] ddstreet: ok, thanks - too bad the reporter of LP: #1832754 seems to have went silent [12:28] Launchpad bug 1832754 in systemd (Ubuntu Bionic) ""shutdown[1]: Failed to wait for process: Protocol error" at shutdown or reboot and hangs." 
[Medium,In progress] https://launchpad.net/bugs/1832754 [12:35] Laney: have you any idea what changed in last 48hrs? I uploaded new kde plasma Tuesday, and its tests had no problem [12:43] Odd_Bloke: rick_h: opened cloud-init bug, https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1889555 please check if it is feasiable to add maintainer scripts, to call cc_grub_dpkg, again, once-only, upon upgrade of cloud-init package. [12:43] Ubuntu bug 1889555 in cloud-init (Ubuntu Groovy) "cc_grub_dpkg was fixed to support nvme drives, but didn't clear the state of cc_grub_dpkg and didn't rerun it on upgrades" [Undecided,New] [12:45] juliank: chrisccoulson: cjwatson: on AWS, rcj has identified that despite debconf saying to install grub-pc onto /dev/sda, non-interactively, grub-install onto /dev/sda fails (as it does not exist), and yet the package is configured fine. At this point, it is fair to assume that the device in debconf database is wrong/has been renamed (i.e. should have been nvmen0) and yet we configured [12:45] grub-pc/grub-pc-bin and upgraded all modules that are now missmatched from core. Imho grub package configuration should at this point fail and rollback to the old modules. [12:45] such that there is no missmatch between the core on the nvme & modules on disk. 
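The failure described just above (debconf pointing at a device such as /dev/sda that no longer exists) can be spotted before rebooting. Below is a minimal sketch, not the grub-pc postinst logic: `missing_install_devices` is a hypothetical helper name, and it assumes the device list is a comma-separated string like the value stored in `grub-pc/install_devices`, with no spaces inside the paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of grub-pc or cloud-init): print every entry
# of a comma-separated device list that is NOT a block device on this machine.
# Assumption: the list looks like the debconf value grub-pc/install_devices,
# e.g. "/dev/sda, /dev/sdb", and paths contain no spaces.
missing_install_devices() (
    # Subshell body, so the IFS and globbing changes do not leak to the caller.
    IFS=', '
    set -f
    for dev in $1; do
        [ -b "$dev" ] || printf '%s\n' "$dev"
    done
)

# Example with a made-up path that cannot be a block device:
missing_install_devices "/nonexistent/sda"   # prints /nonexistent/sda
```

On a real system the list would come from debconf (for instance from the output of `sudo debconf-show grub-pc`). Any path printed here suggests that `sudo dpkg-reconfigure grub-pc` is worth running before the next reboot.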
[12:46] I think untag me at this point unless you have a patch against the Debian source tree for me to review :) [12:46] xnox: it should reprompt you if the device has gone missing [12:46] that's what the efi code does [12:46] but then I copied the EFI code from the bios code [12:46] I just wanted to give context for what it was supposed to do - I'm not going to debug the Ubuntu postinst [12:46] so i find this confusing [12:47] (I agree, it's definitely supposed to prompt you, there's a whole swathe of code specifically for that) [12:47] =))))) [12:51] rcj: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1889555 [12:51] Ubuntu bug 1889555 in cloud-init (Ubuntu Groovy) "cc_grub_dpkg was fixed to support nvme drives, but didn't clear the state of cc_grub_dpkg and didn't rerun it on upgrades" [Undecided,New] [12:54] jdstrand_: at this point i don't think we should push grub2 to security pocket, until at least after we either have cloud-init fix or grub maintainer fix. [12:57] xnox: https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1889556 [12:58] Ubuntu bug 1889556 in grub2 (Ubuntu) "grub-install failure does not fail package upgrade (and does not roll back to matching modules)" [Undecided,New] [12:59] sil2100: vorlon: the above does not block spinning new media / shipping existing grub. [13:00] sil2100: vorlon: but imho we should pause phasing / not push this update to security. until we either have grub2 postinst fix, or cloud-init fix. [13:01] rcj: xnox: Good morning, I'm catching up on scrollback. [13:01] Odd_Bloke: tl;dr https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1889556 & https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1889555 are good summaries. Old cloud-init, bites us, on nvme, months later, when we try to push out grub2 update. 
[13:01] Ubuntu bug 1889556 in grub2 (Ubuntu Groovy) "grub-install failure does not fail package upgrade (and does not roll back to matching modules)" [Undecided,New] [13:01] Ubuntu bug 1889555 in cloud-init (Ubuntu Groovy) "cc_grub_dpkg was fixed to support nvme drives, but didn't clear the state of cc_grub_dpkg and didn't rerun it on upgrades" [Undecided,New] [13:02] Odd_Bloke: and at the same time, grub2 maintainer scripts appear to ignore non-existence of boot devices. [13:04] -queuebot:#ubuntu-release- Unapproved: accepted systemd [source] (bionic-proposed) [237-3ubuntu10.42] [13:06] xnox: ACK, yeah, sounds nasty [13:07] sil2100: vorlon: but imho we should pause phasing / not push this update to security. until we either have grub2 postinst fix, or cloud-init fix. [13:07] ddstreet: accepted, quick verification would be very useful as we want this for .5! [13:07] xnox: ok, I can look into stopping the phasing [13:07] sorry for the dupe post [13:07] sil2100: yes please. [13:09] xnox: rcj: I'm not 100% I understand the severity of the issue here: from the cloud-init bug filed it sounds like "this has been broken for years but was recently fixed" but the urgency with which we are talking about it right now suggests to me that we've in some way regressed something, and I can't tell what that is. [13:10] Odd_Bloke: many grub updates do not change ABI. And upgrades work correctly on non-nvme cloud init instances. [13:10] Odd_Bloke: All images launched with a cloud-init earlier than the fix for (LP: #1877491) will fail to reboot after grub is installed [13:10] Launchpad bug 1877491 in cloud-init "cc_grub_dpkg: determine idevs in a more robust manner with grub-probe" [Undecided,Fix committed] https://launchpad.net/bugs/1877491 [13:11] Odd_Bloke: when grub on nvme was fixed in cloud-init, it was not fixed for existing instances, only for newly launched ones. 
[13:11] Odd_Bloke: reports have been seen from AWS, Azure, and MAAS in bug #1889509 [13:11] bug 1889509 in grub2 (Ubuntu) "grub boot error : "symbol 'grub_calloc' not found" [Undecided,Confirmed] https://launchpad.net/bugs/1889509 [13:12] Odd_Bloke: thus #1877491 is still unfixed on nvme. [13:12] Odd_Bloke: and we are pushing incompatible ABI grub-pc security update for trusty to Xenial. [13:13] Odd_Bloke: do you want a hangout? [13:13] So are we saying the fix we landed for bug 1877491 has not fixed this issue everywhere it could (or should) have? Or are we saying that that fix is causing this issue? [13:13] bug 1877491 in cloud-init "cc_grub_dpkg: determine idevs in a more robust manner with grub-probe" [Undecided,Fix committed] https://launchpad.net/bugs/1877491 [13:14] (To be clear, I'm just ensuring I understand the issue, I'm not trying to be defensive about it!) [13:14] Odd_Bloke: "has not fixed everywhere it could" [13:15] Odd_Bloke: i.e. when fixing bug in run-once modules, one should be considering to add maintainer scripts to rerun / fixup the broken piece via maintainer scripts. In general. But this one in particular. [13:16] Odd_Bloke: The issue being that cc_grub_dpkg is frequency once-per-instance and the fix for 1877491 didn't force that to re-run [13:16] Leaving running instances or new instances booted from older images and updated unfixed. [13:17] xnox: how does phasing work with cloud mirrors? Will it have any effect? [13:17] rcj: zero =) [13:17] rcj: phasing basically only relates to the UI upgraders [13:17] yeah, I expected that was the answer from everything I knew but I wanted explicit confirmation [13:18] thx [13:18] Since this *is* a security update btw., was the security team ok with stopping the phasing? [13:18] sil2100: that was discussed as contingency, yes. 
[13:18] In general, we will not apply new behaviour to running instances on upgrade, but then again it's rare that the new behaviour is required for instances to continue functioning.
[13:19] sil2100: plus even applying these updates is insufficient. one has to apply dbxupdate, which we cannot push out yet.
[13:19] sil2100: so even if one can fetch all the packages it's not enough.
[13:19] Odd_Bloke: correct, that's a judgement call. In effect, at the moment, on nvme based instances it's a ticking time bomb. And it has now ticked to zero =)
[13:20] so very case by case issue.
[13:20] Odd_Bloke: but we know how to fix it automatically, and if we push out a cloud-init SRU to fix things up for people, we can push out grub2 to security, and nobody will be affected and can continue to install updates unattended and reboot.
[13:21] xnox: If we push grub2 out to -security but cloud-init only goes to -updates, do we have a problem?
[13:21] Odd_Bloke: yes, because grub2 will be autoinstalled within 24h by unattended-upgrades, whereas cloud-init will not.
[13:22] Odd_Bloke: hence ideally this cloud-init SRU should be built against the security pocket, and pushed out to the security pocket, without a USN.
[13:23] Odd_Bloke: effectively this is denial of service, as reboot makes the instance fall off the internet / fail to boot.
[13:24] That would be a major bump of cloud-init version in -security (well, there are none in -security, so -release): from 0.7.7~bzr1212-0ubuntu1 to 20.2-45-g5f7825e2-0ubuntu1~16.04.1 on xenial.
[13:25] (That's just an observation: I don't know what the consequences of that would be off-hand.)
[13:26] Ouch
[13:27] Did nvme instances exist back then?
[13:28] We can add in grub2 that it Breaks cloud-init << version-that-fixes-nvme-in-maintainer-script
[13:28] That should cause cloud-init to be pulled & installed from updates.
[13:28] Or hold off grub security unattended upgrade [13:30] xnox: That sounds like a solution to me, I don't really know enough about the "expected" behaviour of -security for this sort of case, so I can't really approve/disapprove though. :p [13:32] xnox, or lead to grub being removed *g* [13:33] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1336855 <-- apparently not the first time we've dealt with something like this! [13:33] Ubuntu bug 1336855 in cloud-init (Ubuntu Utopic) "[SRU] non-interactive grub updates broken for /dev/xvda devices on Cloud-Images/Cloud-init" [Critical,Fix released] [13:37] argh, crash in change-override call for phasing reset, ouchy [13:40] -queuebot:#ubuntu-release- New binary: linux-signed-gke-5.0 [amd64] (bionic-proposed/universe) [5.0.0-1046.47] (kernel) [13:41] -queuebot:#ubuntu-release- New: accepted linux-signed-gke-5.0 [amd64] (bionic-proposed) [5.0.0-1046.47] [13:55] I've just read through the scrollback but it's not clear to me yet what action we're going to take [13:55] xnox: OK, so are you looking at that postinst change, or shall I start taking a look? [13:55] Odd_Bloke: we need cloud-init fix irrespective of grub2 changes. [13:56] Odd_Bloke: if you have something that was used for similar issue before, yes please reuse that. [13:56] chrisccoulson: So I think that we can make the postinst change to cloud-init and release that to -updates; this should mean (perhaps with grub dependency changes to force installation first?) that instances configured with -updates (i.e. most of them) should be fixed. The question of how to fix -security-only instances is still open, from my POV. [13:57] db_set grub-pc/install_devices "$parent_dev" [13:57] I think if you want to avoid a wholesale update in -security then a cherry pick of the fix itself would be better than anything involving Breaks or whatever [13:57] + grub-install $parent_dev || [13:57] + echo "WARNING! Unable to fix grub device mismatch. You may be broken." 
[13:57] are the key / right pices. [13:57] Odd_Bloke: db_set & grub-install are the right things to do. [13:57] xnox: So you think we should use those directly in the postinst, rather than executing the cloud-init module? [13:58] Odd_Bloke: up to you. whichever way. [13:58] Odd_Bloke: imho, it is cleaner to re-execute the module. But I don't know if that's easy/safe to do. [13:58] versus by-hand handling. [13:58] Odd_Bloke, thanks [13:58] Odd_Bloke: we know that what module does is good and correct. [13:59] chrisccoulson: we need to push dpm trees to the correct repository now, no? [14:00] chrisccoulson: did you push tags to repo we used in the ppa? [14:05] xnox, I didn't. There is already a tag in the repo for the version number we released [14:05] (I have just pushed the latest version of the branch though) [14:05] chrisccoulson: ack thanks. [14:09] juliank: pushing current focal branch to focal-devel; pushing chrisccoulson's branch as focal-security. Somehow we'll need to reconcile the two and merged them into one true focal. [14:09] xnox: ack [14:09] xnox: I can have a go at that on Monday [14:12] xnox, thanks [14:12] Is there anything I'm needed for at the moment? [14:12] chrisccoulson: not really [14:16] rcj: xnox: Does this summary look correct? https://paste.ubuntu.com/p/h9qdZfbCbc/ [14:21] Odd_Bloke: yes. [14:22] Odd_Bloke: you can also mention that "yes, it's also a bug that grub completes package upgrade, when it knows that it failed to update core. That is being addressed separately too, but forces interactive intervention." [14:23] "cloud-init can fix this for nvme drives non-interactively" [14:24] xnox: And that's LP: #1889556, right? [14:25] Launchpad bug 1889556 in grub2 (Ubuntu Groovy) "grub-install failure does not fail package upgrade (and does not roll back to matching modules)" [Undecided,New] https://launchpad.net/bugs/1889556 [14:27] Odd_Bloke: yes. [14:30] xnox: ack [14:31] xnox, did you open the merge request? 
https://gitlab.haskell.org/ghc/ghc/-/issues/18445 [14:31] xnox: I'll be sure to coordinate with rcj, xnox and chrisccoulson before doing anything [14:36] not sure why I said 'xnox' rather than 'you' there... [14:37] Odd_Bloke: I would say this is broader than "NVMe instances", it would be anywhere that cloud-init incorrectly configured grub install-devices. I say this because we have reports of Azure and MAAS (raid was mentioned) that I haven't dove into. [14:37] Laney: is there any way to test that a build is being done in a test runner rather than a normal archive buildd? [14:38] I dunno, README.package-tests.rst documents some variables but I'm not sure what's set at build-needed time [14:39] -queuebot:#ubuntu-release- New binary: llvm-toolchain-11 [amd64] (groovy-proposed/universe) [1:11.0.0~+rc1-1] (no packageset) [14:39] this feels like the wrong way to solve it though, I would rather see a fix that figures out why it's hanging and resolves that [14:39] I agree. I don't have time to investigate at the moment though [14:42] Odd_Bloke: I mention the scope because a solution might need to look at debconf to see if install-devices are valid rather than looking to see if we're booted on NVMe [14:48] jdstrand: me myself and I are not offended. [14:48] hehe :) [14:48] we've started pushing poor old snakefruit over the edge [14:48] load average 14 [14:49] xnox exists only in the first and third persons. [14:50] rcj: Right, I would be a little hesitant to apply this so broadly when we only know about this case. [14:50] (The original change itself was motivated by AWS Nitro instances, I believe, not by a more generally understood issue.) 
[14:53] -queuebot:#ubuntu-release- New: accepted llvm-toolchain-11 [amd64] (groovy-proposed) [1:11.0.0~+rc1-1] [14:53] -queuebot:#ubuntu-release- New: accepted llvm-toolchain-11 [s390x] (groovy-proposed) [1:11.0.0~+rc1-1] [14:53] -queuebot:#ubuntu-release- New: accepted llvm-toolchain-11 [ppc64el] (groovy-proposed) [1:11.0.0~+rc1-1] [14:54] Odd_Bloke: Sure, but 1889509 mentions Azure and MAAS so we have a bit more to understand here. [14:56] rcj: Ack, had missed that one. [14:59] and with maas I think I saw RAID [15:02] xnox: breaks from grub2 in security won't cause unattended-upgrades to pull cloud-init from updates, it will only cause unattended-upgrades to hold back grub2 [15:14] rcj: I guess I'm a little hesitant to use "install-devices are valid", because if a user has changed that then we probably shouldn't be touching it. [15:15] -queuebot:#ubuntu-release- New source: sup-mail (focal-proposed/primary) [1.0-3~0ubuntu20.04.1] [15:16] But we could potentially do "install-devices is /dev/sda and invalid"? [15:18] rcj: I also have a very vaguely defined worry about corner cases where the install-devices are not present when we upgrade or something like that; put another way, I don't know if "an absent install-devices implies it is invalid" is universally true. [15:20] Yeah, and we're stepping across into grub's domain tbh. If grub needs it to be valid during install/upgrade then it needs to ensure it's right. So for the NVMe bug that was fixed without touching existing config we should, as you're suggesting, find a way to correct that debconf. [15:21] I'm just worried that there is a larger problem space where cloud-init isn't/wasn't setting grub install-devices to a valid device. [15:22] Yep, I share that worry. [15:29] can you really always say /dev/sda is your install-devices? 
it's unlikely to be valid across the board when the image runs [15:30] Odd_Bloke: but really a fix for bug #1889556 will catch the issue during grub-install, rollback grub to something that still reboots, and we'll get apport data to fix those places that cloud-init hasn't gotten debconf right (if additional situations do exist) [15:30] bug 1889556 in grub2 (Ubuntu Groovy) "grub-install failure does not fail package upgrade (and does not roll back to matching modules)" [Undecided,Confirmed] https://launchpad.net/bugs/1889556 [15:31] cyphermox: You absolutely can't say /dev/sda is your install device universally and cloud-init cc_grub_dpkg module looks to get it set correctly in debconf for grub on first boot. [15:31] yeah, that's what I was trying to say [15:32] rcj: Right, so long as we have a backstop like that then I think we're fine with the more targetted fix? [15:33] Odd_Bloke: Yeah, thanks for rubber ducking with me. [15:33] I'm not sure who the duck is [15:34] xnox: oh, actually, now that we have systemd in bionic-proposed, let me review and accept your debian-installer! [15:35] Odd_Bloke: Can you think of other distros that need to know about this because they use cloud-init? deb-based distros but is there something similar for the rpm-based folks? [15:36] A question that probably better fits #cloud-init.. [15:36] * rcj walks down the hall to #cloud-init [15:38] sil2100: horay! 
[15:41] -queuebot:#ubuntu-release- New binary: linux-signed-oem-5.6 [amd64] (focal-proposed/main) [5.6.0-1021.21] (no packageset) [15:46] -queuebot:#ubuntu-release- Unapproved: accepted fwupd [amd64] (groovy-proposed) [1.3.11-2] [15:46] -queuebot:#ubuntu-release- Unapproved: accepted fwupd [armhf] (groovy-proposed) [1.3.11-2] [15:46] -queuebot:#ubuntu-release- Unapproved: accepted fwupd [arm64] (groovy-proposed) [1.3.11-2] [15:52] -queuebot:#ubuntu-release- New: accepted linux-signed-5.7 [amd64] (groovy-proposed) [5.7.0-15.16] [15:52] -queuebot:#ubuntu-release- New: accepted linux-signed-5.7 [s390x] (groovy-proposed) [5.7.0-15.16] [15:52] -queuebot:#ubuntu-release- New: accepted linux-signed-5.7 [ppc64el] (groovy-proposed) [5.7.0-15.16] [15:57] rbalint_: google-compute-engine-oslogin: deprecated debhelper version in a new package? [15:58] xnox: hm hm hmmm! So, for 18.04.5 it is the time when we switch the hwe kernels from 5.3 to 5.4 [15:58] While your d-i upload is still using the 5.3 kernel [16:03] xnox: you want to reupload with the version from https://launchpad.net/ubuntu/+source/linux-meta-hwe-5.4 ;) ? [16:06] -queuebot:#ubuntu-release- New: accepted google-compute-engine-oslogin [source] (groovy-proposed) [20200507.00-0ubuntu1] [16:08] -queuebot:#ubuntu-release- New source: telegraf (groovy-proposed/primary) [1.15.1+ds1-0ubuntu1] [16:08] sergiodj, ^^ [16:08] LocutusOfBorg: thanks! [16:11] sil2100: please reject [16:11] sil2100: hmmmm [16:12] is that the right package name i wonder. [16:12] apw: is linux-meta-hwe-5.4 intentional? or should it be https://launchpad.net/ubuntu/+source/linux-meta-hwe ? i.e. did we give up on '-5.4' or not? [16:13] sil2100: i thought we tried linux-5.4 and then dropped that. 
[16:13] klebers: ^^^^ [16:15] xnox: When the cloud-init module runs during postinst, we get this when trying to run debconf-set-selections: debconf: DbDriver "config": /var/cache/debconf/config.dat is locked by another process: Resource temporarily unavailable [16:16] Does this mean we'll need to handle this "manually" using db_get/db_set, or is there another way around it? [16:16] xnox, the meta package name really changed to linux-meta-hwe-5.4 [16:16] xnox: yeah, that's the new style, man! [16:16] xnox, that's the source pkg name, but the binary meta is the same [16:16] Welcome to 2020! [16:16] xnox: "Once singed with the archive key" - while I can see the parallel to cattle branding, you might want to fix the typo in the package description [16:16] ;) [16:17] -queuebot:#ubuntu-release- Unapproved: rejected debian-installer [source] (bionic-proposed) [20101020ubuntu543.16] [16:18] -queuebot:#ubuntu-release- New: accepted shim-canonical [source] (groovy-proposed) [1] [16:18] vorlon: i am confused [16:18] xnox: signed, not singed [16:18] vorlon: what's the typo, and where did i make it? [16:19] xnox: in the shim-canonical package [16:19] vorlon: thanks for reviewing that without context =) weeks later [16:19] xnox: my pleasure [16:19] klebers: thank, thanks a lot! [16:19] If you enjoyed the service ubuntu-archive provided today, please leave us a 5* review on Google Maps [16:20] Odd_Bloke: you must use db_get/db_set; do ensure you run without set -e; or correctly handle _all_ return codes form debconf calls. i.e. thing slike 10 30 etc. [16:20] ;.; [16:21] Odd_Bloke: if things are easy i just dput them; it's only the hard bugs that i talk about; file bugs; do "collaboration" =) [16:21] -queuebot:#ubuntu-release- New binary: shim-canonical [arm64] (groovy-proposed/universe) [1] (no packageset) [16:21] xnox: You're saying you're collaborating against me? 
;) [16:24] I believe that would be "conspiring" :) [16:24] -queuebot:#ubuntu-release- New binary: shim-canonical [amd64] (groovy-proposed/universe) [1] (no packageset) [16:27] -queuebot:#ubuntu-release- New: accepted agda [amd64] (groovy-proposed) [2.6.1-1] [16:27] -queuebot:#ubuntu-release- New: accepted agda [s390x] (groovy-proposed) [2.6.1-1] [16:27] -queuebot:#ubuntu-release- New: accepted agda [ppc64el] (groovy-proposed) [2.6.1-1] [16:38] xnox: So the previous time we did something like this (i.e. https://github.com/canonical/cloud-init/blob/ubuntu/devel/debian/cloud-init.postinst#L193-L199) we performed a grub-install immediately after updating debconf; do you think we should do the same here, so that if we've misconfigured things there's an immediate indicator? [16:40] Odd_Bloke: yes. [16:40] Odd_Bloke: some people have already installed new grub. And if they install cloud-init sru, they do need debconf fix + re-grub-install. [16:41] (installed new grub, but no yet rebooted) [16:48] sil2100: pc gadget for amd64&i386 has been rebuild with boothole fixed grub, for uc16 and uc18. Without bumping editions. [16:48] sil2100: can you respin uc16 & uc18 beta channel images? or are they built daily? [16:54] xnox: those are built daily if anything [16:55] (we do dailies of both edge and beta images) [16:55] You want me to kick those now anyway? [16:56] sil2100: well it needs to be passed over to Cert right? and then i want to promote those gadgets to stable, as soon as fresh images are tested good with it. [16:58] xnox: rcj: I'm going to grab lunch now, here's a rough initial implementation of the postinst change: https://paste.ubuntu.com/p/h8rjJTnFfY/ please let me know what you think! 
[16:58] Odd_Bloke: sounds like xnox is collaborating at you pretty intensively [17:01] Odd_Bloke: I don't know how nvme enumeration works and if you could ever have nvme# without having nvme0 (re: line 26 in your patch) [17:04] Odd_Bloke: I agree that cloud-init install shouldn't exit 0 but you might elaborate from "You may be broken" to say what may be broken and how to check. "You may be unable to reboot. Consider running 'sudo dpkg-reconfigure grub-pc' to set a install device" [17:05] which is pretty wordy [17:05] * rcj -> lunch [17:05] Odd_Bloke: Looks good overall, just had those 2 bits of feedback. [17:13] Odd_Bloke: i love your elisp one-liner there [17:18] -queuebot:#ubuntu-release- New binary: haskell-hgettext [s390x] (groovy-proposed/universe) [0.1.31.0-7] (no packageset) [17:18] -queuebot:#ubuntu-release- New binary: haskell-hgettext [amd64] (groovy-proposed/universe) [0.1.31.0-7] (no packageset) [17:19] -queuebot:#ubuntu-release- New binary: haskell-hgettext [ppc64el] (groovy-proposed/universe) [0.1.31.0-7] (no packageset) [17:20] Odd_Bloke: the function should take "$2" like the other fixers, and you want to dpkg --compare-versions with 20.2-95 such that any upgrades from less than 20.2-95 will trigger executing this code on upgrade to this version of cloud-init only once. [17:20] Odd_Bloke: there is no need to run this code on every cloud-init package install / upgrade, going forward. 
[17:25] -queuebot:#ubuntu-release- Unapproved: landscape-client (xenial-proposed/main) [16.03-0ubuntu2.16.04.7 => 16.03-0ubuntu2.16.04.8] (ubuntu-server) [17:26] I'm going to run out of time, was going to kill all of the stuck kubuntu tests once pkg-kde-tools finished publishing [17:26] but it's not done that [17:26] if someone wants to, it's the things that have stalled on "=== Start lintian" [17:28] kill those once pkg-kde-tools ubuntu2 is available in release [17:28] otherwise they should eventually timeout :/ [17:28] (kill the autopkgtest processes on the cloud worker, that is) [17:31] -queuebot:#ubuntu-release- New binary: haskell-hgettext [arm64] (groovy-proposed/universe) [0.1.31.0-7] (no packageset) [17:31] -queuebot:#ubuntu-release- New binary: haskell-hgettext [armhf] (groovy-proposed/universe) [0.1.31.0-7] (no packageset) [17:32] xnox: Yep, hence the TODO at the top of the function; thanks for the invocation! Any other comments? [17:38] xnox: Do you mean `20.2-45`? [17:40] Odd_Bloke: groovy has 20.2-94. I could have launched an old instance and dist-ugpraded to groovy by now, and still be broken. [17:40] Odd_Bloke: and upgrades "worked" because we have not broken grub core <-> modules abi, until just now in groovy. [17:40] hence upgrading from less than 20.2-95 should trigger the fixer. [17:41] Aha, right, had forgotten another upload to groovy happened since the SRU (I wonder who uploaded that ¬.¬); thanks! [17:42] do use that comparison that treats empty/zero version number as infinity. such that on first-time install of cloud-init the fixer is not triggered. 
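(Editorial sketch, not part of the log.) The version gate described at 17:20 and 17:42 could look roughly like the following. It assumes the fixer is handed the previously configured version, which dpkg passes to postinst as "$2". The function name `needs_fixup` and the version strings are hypothetical placeholders, not cloud-init's actual code. The `lt-nl` operator is the dpkg comparison that treats an empty version as later than any version, so a first-time install does not trigger the fixer. The threshold has to be written as the full package version string, because dpkg splits upstream version and revision at the last hyphen.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a one-off fixer should run.
# $1: previously configured version (what postinst receives as "$2";
#     empty on a first-time install)
# $2: first version that ships the fix (placeholder value below)
needs_fixup() {
    prev_version="$1"
    fixed_in="$2"
    # lt-nl means "strictly less than", with an empty version sorting
    # later than any real version, so fresh installs return false.
    dpkg --compare-versions "$prev_version" lt-nl "$fixed_in"
}

# Illustrative versions only; a real postinst would call
#   needs_fixup "$2" "<full version that ships the fix>"
for prev in "" "1.9-3" "2.0-1"; do
    if needs_fixup "$prev" "2.0-1"; then
        echo "previous version '$prev': run fixer"
    else
        echo "previous version '$prev': skip fixer"
    fi
done
```

With plain `lt` instead of `lt-nl`, the empty version would sort earlier than everything and the fixer would also run on fresh installs.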
[18:01] -queuebot:#ubuntu-release- Unapproved: s390-tools (groovy-proposed/main) [2.12.0-0ubuntu5 => 2.12.0-0ubuntu6] (core) [18:02] -queuebot:#ubuntu-release- New binary: haskell-hgettext [riscv64] (groovy-proposed/universe) [0.1.31.0-7] (no packageset) [18:15] -queuebot:#ubuntu-release- Unapproved: debian-installer (bionic-proposed/main) [20101020ubuntu543.15 => 20101020ubuntu543.16] (core) [19:28] xnox: rcj: Updated proposal: https://paste.ubuntu.com/p/HQZZcWDRDQ/ Note that there is one "XXX" comment in there that I would appreciate your opinions on. [19:33] I think this is close enough that I'll open up a PR with it; that will give us somewhere to actually keep review comments. [19:41] Odd_Bloke: i would skip that paragraph. [19:42] Odd_Bloke: fetch grub_cfg_dev; fetch corect_idef; compare and key off that. [19:42] Odd_Bloke: cause grub_cfg_dev might be xenvhda thing, despite booting of nvme actually for example. [19:42] Odd_Bloke: and we still need to correct from vda to nvme [19:43] Odd_Bloke: it is strictly redundant. and limits amounts of things we could fix. [19:44] xnox: Under what circumstances would it be xvda? [19:44] The old cloud-init code would default to /dev/sda if it didn't find any appropriate devices. [19:45] Odd_Bloke: I don't have high enough confidence in my knowledge of the original bug to say if your check on line 35 does match the comment before it. But given your statement as I typed this, then yes. [19:45] (By appropriate I mean: in the hard-coded set of paths it considered.) [19:46] So I believe if it's /dev/xvda, then when this instance was first booted, /dev/xvda was present. [19:46] right [19:46] I just whinged thinking about the instance resize feature and whether cloud-init detects that as a new instance (and will re-run cc_grub_dpkg) or nto. [19:47] "instance resize" being "move from one instance type to another"? 
[19:47] xnox, linux-hwe-5.4> nominally the first linux-hwe in a cycle is linux-hwe, the others are linux-hwe- because they overlap; like right now hwe@4.15, hwe@5.0, hwe@5.3 and hwe@5.4 are all alive in bionic for various reasons [19:47] (And therefore possibly changing the root disk from being presented at /dev/xvda to /dev/nvme... ?) [19:48] yeah [19:49] or /dev/xvda to /dev/sda [19:51] rcj: xnox: https://github.com/canonical/cloud-init/pull/514/ [19:51] canonical issue (Pull request) 514 in cloud-init "debian/cloud-init.postinst: fix NVME grub install device on upgrade" [Open] [19:51] sil2100: just releasing subiquity to stable now, not sure when you're next planning on rolling images [19:52] Also: currently this isn't scoped to only run on AWS (which I don't think it should be), but that means that we need to make sure we're considering other potential use cases. [19:54] Agreed [19:54] I think the checks are good [20:32] Odd_Bloke: this is not just xenial.... [21:00] xnox: should gfxboot-theme-ubuntu be removed from groovy? [21:00] hmm ubuntu-defaults-builder depends on it [21:00] but does that work [21:00] ubuntu-defaults-builder does not work [21:01] rcj: I added data from one VM in bug 1889505 [21:01] joy [21:01] bug 1889505 in Mahara "Behat: create import_export_skins feature" [Undecided,New] https://launchpad.net/bugs/1889505 [21:01] vorlon: mwhudson: https://code.launchpad.net/~ubuntu-core-dev/grub/+git/ubuntu/+merge/388423 it's looking not that bad. I'll grab chocolate, and then want to test postinst change, and then test this preinst. [21:01] dammit. Me an my typing... 1889509 [21:12] argh *now* i'm releasing a new subiquity to stable [21:12] oh hm previous version was just missing the tag not so bad [21:20] xnox: is that code all copy-pasted from postinst? [21:20] is there some way to not duplicate it? [21:20] i guess preinst can't depend on stuff [21:22] mwhudson: copy-pasted postinst, only some thing tweaked, i.e. 
no " || true" around db_input towards the end. [21:22] hnngh [21:22] mwhudson: i cannot "source /var/lib/dpkg/info/grub2-pc.postinst" because it will, well, execute it. [21:22] can the copy pasting happen at package build time? [21:23] mwhudson: shell_vendor_function blah?! [21:23] well it's called preinst.in so clearly some sed is happening to it already [21:24] mwhudson: yes, it's vendored into every type of grub-$PLATFORM [21:24] dunno just thinking aloud [21:24] hence see all the @PACKAGE@ [21:25] xnox: reviewed with comments [21:29] xnox: Did I say it was just xenial? [21:31] hggdh: Thank you, we're reproducing and debugging [21:31] xnox: rcj: rick_h: I'm EODing now. The status of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1889555 is that we have a PR for xenial open at https://github.com/canonical/cloud-init/pull/514; once that is in a good state, I'll forward-port it to the other releases. rcj and I have discussed testing, so I also have a plan for what to do with that tomorrow. [21:31] canonical issue (Pull request) 514 in cloud-init "debian/cloud-init.postinst: fix NVMe grub install device on upgrade" [Open] [21:31] Ubuntu bug 1889555 in cloud-init (Ubuntu Groovy) "cc_grub_dpkg was fixed to support nvme drives, but didn't clear the state of cc_grub_dpkg and didn't rerun it on upgrades" [Undecided,In progress] [21:32] Odd_Bloke: In azure the root is on /dev/sda and it fails on the reboot. We'll have more tomorrow. [21:32] rcj: Exciting! [21:38] hggdh: thank you for the info, I'm working with rcj on this issue too, and we were able to reproduce the issue [21:38] Odd_Bloke: see https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1889509/comments/24 for more information about /dev/sda failing the reboot [21:38] Ubuntu bug 1889509 in grub2 (Ubuntu) "grub boot error : "symbol 'grub_calloc' not found" [High,Confirmed] [21:43] Odd_Bloke: sorry, i'm confused about cloud-init versions then. 
[21:45] vorlon: on efi systems, do we install grub onto /dev/sda or onto /dev/sda14 (the pc-bios partition)? [21:46] * mwhudson squints at that question [21:47] xnox: /dev/sda, I believe [21:47] because we install to mbr [21:47] and IIRC then the pc-bios extra space is there for making sure there's room for stage1.5 in a way that doesn't interfere with gpt [21:48] now, what should I make of the fact that grub-pc/install_devices is empty on my eoan-installed EFI system [21:49] I guess that means we had an installer bug [22:12] xnox: seen my mp review? [22:22] vorlon: do you have grub-pc installed and/or a bios_grub partition? [22:22] oh eoan probably defaulted to mbr partition tables [22:27] mwhudson: hey! Thanks! [22:27] I'll kick the subiquity images in a moment [22:31] vorlon: so that's what i'm thinking about too. [22:31] vorlon: do we care about changing this behaviour when booted under efi? [22:31] vorlon: imho invalid/empty grub-pc/installed_devices are not that critical if one booted in efi mode. [22:43] i timed out [22:43] mwhudson: vorlon: https://code.launchpad.net/~ubuntu-core-dev/grub/+git/ubuntu/+merge/388423 & https://code.launchpad.net/~ubuntu-core-dev/grub/+git/ubuntu/+merge/388432 [22:43] postinst tested and is good [22:43] preinst not tested, review feedback addressed. [22:44] oooh let me merge both and upload into a ppa i guess, to test tomorrow. 
[22:57] sil2100: great [23:00] sil2100: https://launchpad.net/ubuntu/bionic/+queue?queue_state=1&queue_text=debian-installer [23:19] -queuebot:#ubuntu-release- Unapproved: fwupd (groovy-proposed/main) [1.4.5-1 => 1.4.5-1] (core) [23:22] -queuebot:#ubuntu-release- Unapproved: fwupd (groovy-proposed/main) [1.4.5-1 => 1.4.5-1] (core) [23:23] mwhudson: xnox: proposal: inop the grub-pc postinst as an immediate hotfix because grub-pc is not affected by the security vulnerability [23:24] -queuebot:#ubuntu-release- Unapproved: fwupd (groovy-proposed/main) [1.4.5-1 => 1.4.5-1] (core) [23:24] vorlon: makes sense i think [23:24] vorlon: a consequence of this [23:25] because we can programmatically solve the case that grub is pointing at non-existent disks; we cannot programmatically solve the problem of grub pointing at an existent but wrong disk [23:25] vorlon: is that someone who has grub that is already out of date will "upgrade" to the new grub but in fact not get a new grub [23:25] but well [23:26] so long as we fix this with priority it lets the security fix get out immediately and seems a reasonable compromise [23:27] although i'm not really aware of the ins and outs of the problem [23:28] mwhudson: we could inop it only when upgrading from a limited set of versions [23:29] what does "inop" mean? [23:29] xnox: i assume "make inoperable", i.e. put "exit 0" at the top of it or something [23:30] yeah [23:30] vorlon: mwhudson: that's buggy. [23:30] vorlon: not sure that really makes any difference [23:30] not quite at the top [23:30] vorlon: mwhudson: it means new modules on disk, old core in mbr = fail to boot [23:30] mwhudson: well if you have N-1 grub, then the new grub gives you nothing you need [23:30] when was the last grub sru [23:30] for grub-pc [23:31] vorlon: did you mean exit 1 in grub-pc preinst? [23:31] xnox: no? 
new modules are only written to /boot/grub by grub-install [23:31] so we avoid calling grub-install from grub-pc.postinst in the security version when upgrading [23:32] hangon, how come things fail to boot then? [23:32] they fail to boot when we /are/ calling grub-install and grub-install fails, after copying the modules [23:32] grub-install fails to install core to /dev/sda because it doesn't exist, then copies all modules to /boot anyway? [23:32] yeah wait [23:32] yes, it copies the modules first [23:32] oh [23:32] that's a thing we need to fix [23:32] maybe it shouldn't do that [23:32] indeed it shouldn't [23:32] but that's grub C code and maybe we shouldn't rush that as a hotfix [23:33] so postinst should back up /boot first. [23:33] yeah [23:33] and if things fail, roll it back. [23:33] anyway, strace from rcj of a particularly weird failure from grub-install on azure http://paste.ubuntu.com/p/QqYMwZsBbf/ [23:33] xnox: but this is what we already discussed fixing in grub-install, to make it do things in sensible order? [23:34] so let's call version of grub from last week N [23:34] if someone has version N installed on a bios system, upgrades to N+1, we don't call grub-install -> no consequences, the changes in N+1 make no difference [23:35] if someone has version N-1 installed, they will "upgrade" to N+1 but not actually get the changes from N-1 to N [23:35] do we care? [23:35] if they have it installed, it will boot. [23:35] we only care to call grub-install upon "install" really, not upgrades. [23:35] untrue [23:36] we do care about calling it in the general case on upgrade [23:36] yeah it shouldn't break their system but it will mean they think they have fixes that they do not [23:36] we only care to call grub-install upon "install" really, not in-series security upgrade of efi. [23:36] the latter, yes [23:36] there are bugs in grub-pc too which are addressed by this security vulnerability [23:36] but we don't care. 
[23:36] cause it allows arbitrary code execution anyway [23:37] exactly [23:37] this is totally fine for focal because this is the first sru [23:38] so we know can kicking down the road is bad, and this time we will really kick it far enough [23:38] vorlon: i especially want inop for trusty-esm. [23:39] the last sru to bionic that wasn't strictly efi related was over a year ago [23:39] and xenial about 9 months ago [23:39] and the weird azure => don't they boot in efi mode anyway? [23:40] only gen2 vms [23:40] so yeah, fine with vorlon's plan so long as we do actually do preinsty things soon [23:43] 16 SRUs of grub2 in bionic, and none of them do anything interesting to the grub-pc binary bits [23:45] vorlon: https://paste.ubuntu.com/p/rHhWzd792x/ [23:46] xnox: you're calling that from inside a function, shouldn't that be return 0? [23:47] vorlon: not a function [23:47] vorlon: no he's not? [23:48] vorlon: ignore bogus annotation from git [23:48] heh ok [23:49] i'm down with uploading that. [23:50] it will unlock us publishing security update on monday [23:51] xnox: I don't want us skipping all the rest of the configuration bits, only the grub-install [23:51] vorlon: i did one laser eye surgery already! [23:51] * xnox squits harder [23:51] * xnox squints harder [23:51] xnox: I can work on the patch + upload [23:56] Is the "skip grub-install on upgrade" a temporary thing? (sorry, I've not read through the whole scrollback) [23:57] xnox: I would add it to the conditional on line 541 (on the focal branch) [23:57] chrisccoulson: I think we will want to leave it in place for all future SRUs because we don't have a reliable way to ensure grub-install on BIOS doesn't go wrong [23:58] vorlon, yeah, that seems reasonable
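(Editorial sketch, not part of the log.) The idea raised at 23:33, that the postinst should back up /boot first and roll it back if grub-install fails, might be sketched as below. `run_with_rollback` is a hypothetical helper, not code from the grub2 packaging. It assumes GNU `cp -a` and `find -delete` are available and that there is enough free space for a temporary copy; a real maintainer script would need to think harder about both.

```shell
#!/bin/sh
# Hypothetical sketch: snapshot a directory, run a command, and restore
# the snapshot if the command fails.
# Usage: run_with_rollback DIR COMMAND [ARGS...]
run_with_rollback() {
    target_dir="$1"
    shift

    backup_dir="$(mktemp -d)" || return 1
    # Copy the directory contents (dotfiles included), preserving attributes.
    if ! cp -a "$target_dir/." "$backup_dir/"; then
        rm -rf "$backup_dir"
        return 1
    fi

    if "$@"; then
        # Command succeeded: the snapshot is no longer needed.
        rm -rf "$backup_dir"
        return 0
    fi

    # Command failed: discard whatever it wrote and restore the snapshot.
    echo "command failed; restoring $target_dir" >&2
    find "$target_dir" -mindepth 1 -delete
    cp -a "$backup_dir/." "$target_dir/"
    rm -rf "$backup_dir"
    return 1
}

# Hypothetical call site in a maintainer script:
#   run_with_rollback /boot/grub grub-install "$install_device"
```

The point of the design is that modules already copied by a failed grub-install are removed again, so the files on disk keep matching the old core image.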