[04:37] PR snapcraft#3243 opened: Add skip-keys to catkin plugin [06:08] morning [06:39] mborzecki: good morning - should I merge 9128 or is it not meant to be on master? [06:41] mvo: hey, hm it's probably ok to merge it, it's only adding some debug output to the test [06:45] PR snapd#9126 closed: boot: introduce current_boot_assets and current_recovery_boot_assets to modeenv [06:45] PR snapd#9128 closed: tests/main/cgroup-tracking: try to collect some information about cgroups [07:04] good morning [07:06] morning [07:16] zyga: pstolowski: hey [07:16] zyga: left some comments under https://github.com/snapcore/snapd/pull/9128 [07:16] PR #9128: tests/main/cgroup-tracking: try to collect some information about cgroups [07:17] zyga: some weird thing happening on 16.04 and cgroups [07:32] good morning zyga [07:39] hm, so [07:39] 2020-08-11T07:31:29.8774856Z - google:ubuntu-20.04-64:tests/main/cgroup-tracking-failure [07:39] 2020-08-11T07:31:29.8775399Z - google:ubuntu-20.04-64:tests/main/snap-user-service [07:39] that is known, right? [07:48] mvo: I fear you'll need zyga to undersand the state of that, there's a PR open but I don't know if it's the final story on this [07:49] pedronis: that's fine, just wanted to double check [07:49] pedronis: also it was a bit of a general question, not directly targeted to you :) [07:50] pedronis: did you had a chance to think about the name for the kernel-crypto-api interface? or is the name good already? [07:50] I didn't interpret it as targeted to me, but nobody was saying anything :) [07:51] mvo: no, I haven't seriously thought about that PR since Friday [07:51] pedronis: heh :) thanks! [07:51] pedronis: hey, s/TrustedAssetsChain/TrustedAssets/? [07:53] pedronis: tbh i assumed that we'll glue that higher up the call stack, somewhere in boot when (re)sealing [07:53] mborzecki: yes, as I said, I think the content matches the doc comment, but the name confused Ian [07:54] ok [07:54] or so I think [07:54] the glueing is a bit messy because it is across recovery and run bootloaders [07:55] but we will deal [07:55] yeah, otoh i'm not sure one bootloader (eg. run) should know that it's being loaded by the recovery one [07:55] PR snapd#9111 closed: releases: release 2.46~pre1 to groovy [07:55] mborzecki: I don't think that's the main problem [07:56] mostly for the final glueing we reall need the maps from modeenv [08:30] PR snapd#9131 opened: debian: release 2.46~pre2 [08:47] re [08:47] sorry, I had a small unexpected errand [08:47] back now [08:47] mborzecki: 0 is always the ID of unified cgroup [08:47] mborzecki: cachio suggested that lxd test is responsible [08:47] maybe lxd mounts the unified hierarchy somehow [08:47] zyga: ha, as it mounts the hierarchy and it's left behind? [08:47] mborzecki: I will try that in a moment [08:47] perhaps [08:47] that would explain it [08:47] zyga: and sys from the host is bind mounted to the container? [08:48] and fortunately we have a cleanup script for it [08:48] I don't fully know, there are some approaches, more than one [08:48] duh, how about systemd container protocol then? :) [08:48] there's a special fake cgfs as well [08:48] mborzecki: I think that's beyond my confidence zone [08:48] but we have a clue [08:48] I spent a bit too long last night trying to fix core 16 [08:48] if it is lxd, then stgraber will probably know more [08:49] made progress but I EOD past midnight and I was a bit tired in the morning] [08:49] mborzecki: yeah but it's quick and easy to verify this with -shell-after [08:49] I'll try that now [08:49] hey mvo, sorry for starting late [08:50] mborzecki: actually, first priority is to fix master [08:50] so give me a moment to wrap up that issue first [08:50] errand, back in 30 or so [08:53] ok [08:56] zyga: no worries [09:12] re [09:16] * zyga tries to sit in the office today [09:16] yesterday that didn't go very well [09:17] mborzecki: any highlevel concerns/feedback about #9130? especially aobut the plan to make the mountedfilesystemwriter use the ResolvedSource (i.e. an absolute path) and stop passing in the gadget root there? [09:17] PR #9130: [RFC] Support for gadget.yaml "$kernel:" style references [09:17] mvo: haven't looked yet, i'll do it today [09:17] mborzecki: it will cause some churn so I want to somewhat be certain it's not in vain :) [09:17] mborzecki: ok, no worries [09:25] PR snapd#9132 opened: o/hookstate/ctlcmd: add optional --pid argument to "snapctl is-connected" [09:30] PR snapd#9133 opened: osutil: add CommitAs to atomic file [09:32] hopefully a trivla PR ^^ [09:33] mborzecki: https://github.com/snapcore/snapd/pull/9133#pullrequestreview-464902323 [09:33] PR #9133: osutil: add CommitAs to atomic file [09:33] yes, but with a typo :) [09:33] haha [09:33] ok, let me force push, not worth another patch [09:33] yeah [09:33] and cancel current run [09:33] save $$$ [09:34] * zyga tries another iteration of core16 fix [09:35] btw. gh actions still does not support canceling a workflow when a branch is pushed to? [09:35] canceling a running workflow i mean [09:35] I don't know [09:37] what else is broken in master [09:37] we have systemd-logind failing often [09:37] but there was one more thing IIRC [09:38] zyga: cgroups on xenial [09:38] ah right [09:38] thanks [09:38] zyga: and unit tests using gpg fail occasionally [09:38] thanks :) [09:38] yeah [09:38] about that [09:39] it fails in azure [09:39] snap sign tests iirc [09:39] not in our gce tests [09:39] azure VM has all kinds of modifications that are well documented [09:39] but gpg? [09:40] PR snapd#9095 closed: boot/bootstate20: unify commit method impls, rm bootState20MarkSuccessful [09:40] PR snapd#9116 closed: tests: add system information and image information when debug info is displayed [09:42] mborzecki: they update lots of things to more recent versions [09:43] I mean, really anything that people use [09:43] and pre-install a load of things [09:43] maybe some interaction there leaks [09:45] PR snapd#9001 closed: o/snapshotstate: helpers for calculating disk space needed for an automatic snapshot (2/N) [09:45] PR snapd#9079 closed: gadget/install: retrieve command lines from bootloader [09:48] mborzecki: should we just merge #9127 ? we can change again when it's used, if we discover we really need it to be different [09:48] PR #9127: bootloader: introduce TrustedAssetsBootloader, implement for grub [09:48] pedronis: i think we should land it and tweak later on [09:48] pedronis: this will probably be clearer when i start opening more bits [09:51] zyga: what's the status of https://bugs.launchpad.net/snapd/+bug/1867193 ? the test PR referenced there was merged long time ago, not a bug once "robust-mount-namespace-updates" are enabled? [09:51] Bug #1867193: failure refreshing snap with content interface [09:51] mborzecki: I'm adding this comment to the merge: Notice that this in practice splits across the divide of recovery vs run mode bootloaders, the complete chains for sealing will need to bridge that. We might use a different helper to produce that. [09:51] I'd mark it as fix released [09:51] the last comment may be a different bug [09:52] pedronis: sounds good [09:55] PR snapd#9124 closed: gadget: introduce content update observer [09:55] PR snapd#9127 closed: bootloader: introduce TrustedAssetsBootloader, implement for grub [09:56] yay [09:56] * mvo hugs mborzecki [10:09] sil2100: snapd is in beta now so we can re-enable beta image builds to pending. let me know if you prefer a mail [10:10] core16 tests are running [10:10] maybe no more typos [10:10] fingers crossed [10:12] zyga: what pr is that? [10:12] mvo: still local [10:12] there's a PR but it's only missing core16 [10:12] ok [10:12] PR number is [10:12] 9125 [10:12] if it passes then master will be better [10:12] well, locally, then I push and open the draft for review === facundo__ is now known as facubatista [10:16] pedronis: #9084 is ready for review (prereq PR landed) [10:16] PR #9084: o/snapstate: check disk space before creating automatic snapshot on remove (3/N) [10:17] tests are running, I need to stop sitting now [10:18] man standing desk is a godsend [10:18] I wish I knew that years ago [10:22] mvo: thanks! On it! [10:26] PR snapd#9134 opened: boot, o/devicestate: TrustedAssetUpdateObserver stubs, hook up to gadget updates [10:37] one more tweak and I think it will be good [10:53] pedronis: also, do you have a moment today to talk about disk space check & multi snap install/refresh? [11:12] mvo: can you merge https://github.com/snapcore/snapd/pull/9133 ? the test failure are unrelated [11:12] PR #9133: osutil: add CommitAs to atomic file [11:28] mborzecki: sure, done - could you please check 9131 ? [11:29] 2020-08-11T08:38:45.2793033Z 2020-08-11 08:38:45 Cannot allocate google:ubuntu-core-20-64: google: read JWT from JSON credentials: 'type' field is "authorized_user" (expected "service_account") [11:29] wow [11:31] hmm? [11:31] JWT? [11:31] PR snapd#9133 closed: osutil: add CommitAs to atomic file [11:31] javascript web token [11:31] (just making stuff up) [11:31] zyga: close ;) [11:31] so hot [11:31] I was outside for a sec [11:31] to move walk around the block [11:32] mvo: have you added skip spread tag after opening 9131? [11:34] mborzecki: yes, just now [11:34] mborzecki: actually silly that I require spread for this [11:35] mvo: i've closed and reopened it, the tag should be picked up now [11:35] mborzecki: aha, nice [11:41] mvo: dailies for beta should now be enabled! You want a new build kicked now? [11:41] sil2100: yes please [11:51] pstolowski: we can chat after the standup if that works for you? [11:52] pstolowski: thanks for merging master into #9084 [11:52] PR #9084: o/snapstate: check disk space before creating automatic snapshot on remove (3/N) [11:56] PR snapd#9131 closed: debian: release 2.46~pre2 [12:02] I think it works now [12:02] single test passes [12:02] let me push [12:04] * mvo crosses finger for zygas pr [12:05] yeah, I want this to be over [12:05] I'll take a quick shower and I'll start working on the other bug [12:05] (cgroups) [12:16] pedronis: yes, after standup is fine [12:17] looking into the cgroup + lxd thing [12:24] core16 is almost done testing, no failures [12:25] mborzecki: some questions/comments in #9134 ? [12:25] PR #9134: boot, o/devicestate: TrustedAssetUpdateObserver stubs, hook up to gadget updates [12:26] pedronis: thanks for the reviews (and mvo too) [12:28] core16 is green [12:28] let's see if everything else passes now [12:28] I'm focusing on the lxd thing now [12:28] first I need to see it [12:28] the fix should not be complicated [12:32] mborzecki: yeah, that's really happening [12:32] I wonder how [12:32] zyga: lxd mounting a unified hierachy? [12:33] yep [12:33] zyga: or maybe systemd in lxd doing that [12:33] clear as day [12:33] I will run again and see which part of the test causes that [12:33] I will push a quick fix so that we're okay [12:36] PR snapd#9019 closed: tests/nested/manual/minimal-smoke: run core smoke tests in a VM meeting minimal requirements [12:36] oh man [12:36] mborzecki: I understand now [12:37] mborzecki: that cgroupv2 is not mounted inside [12:37] pedronis, mvo thanks for feedback on disk space awareness [12:37] it was mounted in the container [12:37] but those are global [12:37] all processes see all cgroups [12:38] now where the hell is it mounted? [12:38] and I think this is more than tests [12:38] we need to adjust snapd code as well [12:39] mborzecki: I'm still unclear where this thing is mounted [12:39] it's not inside the container [12:40] it's not mounted in the inner container either [12:41] but the inner container is bionic [12:41] maybe it's just systemd? [12:48] hmmm [12:48] ok I'll bisect [13:50] I'll go for lunch now, ttyl [14:54] * cachio lunch [15:14] re [15:22] PR snapd#9087 closed: o/snapstate: check available disk space before downloading a snap on install (4/N) <â›” Blocked> [15:42] PR snapd#9135 opened: boot: move bootloaderKernelState20 impls to separate file [15:47] PR snapd#9064 closed: .github/workflows: move snap building to test.yaml as separate cached job [16:01] * zyga explores a fix for the cgroup2 thing [16:01] it's a kernel bu [16:01] *big [16:01] *bug [16:01] or feature [16:01] depending on how you look [16:01] we can only work around it [16:13] kenvandine: that's weird. on focal I seem to have gnome-software installed as a deb (fine). I have the review-tools latest/stable installed so I wanted to show someone the permissions dialog, but gnome-software showed that the review-tools were not installed. I pressed the Install button and they install latest/edge [16:17] zyga: the unified cgroup mount after lxd container started ? [16:17] it's not just thta [16:17] just mount cgroupv2 [16:17] and unmount it [16:17] done [16:17] no containers required [16:18] kenvandine: I wonder if it is because I haven't done 'snap login'? (ie, polkit promted me) [16:29] zyga: ah interesting, maybe we just mount cgroupv2 all the time unconditionally in project prepare and deal with the consequences ? [16:29] ijohnson: nah, that's unnatural [16:30] I've adjusted both snapd and the test that's sensitive to it [16:30] just making final touches [16:32] PR snapd#9136 opened: bootstate20: rm bootState20Modeenv, pass around modeenv directly [16:32] zyga: The dark side of cgroups is a pathway to many abilities some consider ... unnatural [16:32] haha, is that a star wars quote? [16:33] yes haha [16:33] * ijohnson goes away now and does real things [16:33] * ijohnson well tries too anyways [16:38] ijohnson: I pushed more fixes to https://github.com/snapcore/snapd/pull/9125 [16:38] PR #9125: tests: fix issues related to restarting systemd-logind.service <âš  Critical> [16:38] zyga: nice let me take a look, would be nice to land that [16:38] I will observe if it passes and we can then either review the ensemble or review in split branches [16:39] it's still a draft but please have a look for early impression [16:39] zyga: yeah I was just gonna say looking at this and the title it's not obvious how much you had to change, but yes +1 to splitting off the actual Go code changes [16:39] I'll still submit a review for you though [16:39] ok [16:39] the go code changes are optional [16:39] we were lucky [16:39] this is just for completeness [16:40] ah ok, I thought the Go code was necessary to fix the test [16:40] I thought so too but again, we were lucky :) [16:40] :-) [16:40] look at https://github.com/snapcore/snapd/pull/9125/commits/4b3bf6cbb08302b1878a9452e8ddcf9190ab8617 [16:40] PR #9125: tests: fix issues related to restarting systemd-logind.service <âš  Critical> [16:40] zyga: so does this present only on 4.15 kernel or is it 4.15+ ? [16:44] it's any version really [16:44] just that xenial's systemd does not mount v2 at all [16:44] so we noticed [16:48] ah right that makes sense [16:48] I just left a review [16:51] thanks! [16:53] replied [16:53] and looking at kernel docs first [17:42] ijohnson: I'm not entirely sure whether #9136 is or isn't on the path I described in the previous, to be clear I'm not sure the path is an improvement, we would have to try, it has the propery of making boostate20, closer to bootstate16 [17:42] PR #9136: bootstate20: rm bootState20Modeenv, pass around modeenv directly <â›” Blocked> [17:46] pedronis: mmm sorry I didn't see that you want the modeenv in the structs to go away entirely, I thought you just wanted the modeenv struct to go away [17:46] ijohnson: yes, my plan was to have it go away completely, that's why I was suprised that not touching initramfs bits was not a problem [17:47] ijohnson: it seems to be a kernel bug [17:47] I'll report it and add the reference to the code [17:55] ijohnson: I guess no bug report [17:56] ijohnson: quote from #lxcontainers [17:56] > do you have a reference? I will link to that next to our workaround [17:56] nope, that kind of upstream discussion is all private IRC and telegram ;) [17:56] I don't mind, I guess that's enough for now [17:56] tests on https://github.com/snapcore/snapd/pull/9125 are in progress [17:56] PR #9125: tests: fix issues related to restarting systemd-logind.service <âš  Critical> [17:56] I'll review the services PR and EOD [17:57] debating whether it's an issue or not, it's unlikely to be security relevant (as no controllers are auto-loaded on cgroup2) but still trying to argue that it could confuse badly written software on the host [17:58] stgraber: yeah, I think it's more surprising than dangerous [17:59] * zyga is tired [17:59] maybe that review needs to wait [17:59] it's 8PM [18:01] * zyga EODs, more reviews tomorrow [18:01] have a good rest of your day friends! [18:03] pedronis: ok, then in this case let's just abandon this PR and do it post uc20 1.0, I really don't know how we could get rid of modeenv entirely without changing a bunch of function/interface signatures and you're about to go on holiday, so not much point to trying to hash it out now I think [18:03] zyga: nice find thanks for looking into it more [18:04] zyga: consensus seems to be that it's been that way since 2016, nobody is supposed to assume that because something is listed in /proc/self/cgroup that it's mounted, you need to look at mount table too (as controllers never go away) [18:04] ack [18:04] I always found that surprising btw, [18:05] that there are those nice CG identifier numbers [18:05] that are dangling and unrelated to the filesystem [18:05] and that you need to match on the controllers and mountinfo [18:05] it's probably uninteresting as future has cgroup2 only [18:05] but v1 was somewhat messy from this point of view [18:07] ijohnson: yes, I expected we'll need to change some signatures [18:07] yeah I guess I totally missed that point sorry [18:07] I don't think it's worth it right now to try and simplify that at all, seems quite complex [18:08] IMHO this PR is still a win because it's less structs and less abstractions, but I suppose it is more type casting and slightly duplicative code, so I could see it not as a win too [18:08] if you don't think it's worth it I will just close it and we can revisit the refactor $SOME_DAY+2 now [18:09] ijohnson: in light of the fact that we want to do something larger it doesn't feel like a win because of your point two about the extra duplication [18:15] zyga, hey [18:16] did you fix the test tests/core/uc20-recovery at some point right? [18:16] ijohnson: can you explain me this change though: https://github.com/snapcore/snapd/pull/9136/files#diff-bff1ced30ed4581302ca81b7b53908f9L727 is that needed for that case? [18:16] ijohnson, or you did it? [18:16] PR #9136: bootstate20: rm bootState20Modeenv, pass around modeenv directly <â›” Blocked> [18:16] ijohnson: isn't that needed [18:17] ijohnson: ah, it's done lazily by revisions in the new code? [18:21] * cachio afk [18:24] sorry was at lunch [18:24] pedronis: it actually was never needed [18:25] pedronis: we already setup the modeenv when we create the bootState20Base in initramfs.go [18:25] cachio: I think I looked at it and I thought we fixed something there [18:25] ijohnson: ah [18:25] cachio: is it failing again now ? [18:25] I forgot about that [18:26] yeah that's kinda why I disliked how InitramfsRunModeSelectSnapsToMount is working where it just creates the objects and calls a method on them and kinda forgoes all the rest of the stuff in that file [18:26] but meh [18:28] zyga: I'm adding a tweak to our apparmor policy to prevent mounting cgroup2 if host doesn't have it, so we'll avoid that kind of issue for now [18:29] ijohnson: now I'm very tempted to try to see if what I have in mind is smaller or not overall (it might not be, which again would make it a bit meh) [18:30] pedronis: I mean hey go for it if you want, feels a bit superfluous right now but it is fun to try and write simple elegant code sometimes :-D [18:34] ijohnson: are you blocked on something from me? [18:35] pedronis: if you wanted to review the last commit on my cross check PR that is a draft that would be nice, but that is blocked on systemd-mount PR, which is then blocked on x n o x [18:35] pedronis: that would be https://github.com/snapcore/snapd/pull/9081/commits/147e3058ed55ff88e2c3aae0b598b346b6af5576 [18:35] PR #9081: secboot,cmd/snap-bootstrap: cross-check partitions before unlocking, mounting <â›” Blocked> [18:36] ijohnson: that last commit says it's +1711 ? [18:36] pedronis: no github is silly, that diff line is for the whole PR I'm pretty sure [18:36] which contains all the systemd-mount PR [18:37] pedronis: git says ` 7 files changed, 424 insertions(+), 96 deletions(-)` [18:37] yea, that seems more reasonable [18:38] PR snapd#9135 closed: boot: move bootloaderKernelState20 impls to separate file [18:38] thanks [18:44] ijohnson: I did a quick pass there, some comments/questions [18:45] thanks [19:30] ijohnson: so I tried, it gains 30 lines, not sure it's clearer or not, the cost is a new interface. There's probably a bit more that can be tried in initramfs.go [19:32] pedronis: mmm interesting, but yes there's a lot more that can be done in initramfs.go [19:32] I need to go to the bank [19:32] * ijohnson afk for hour or two [19:43] PR snapd#9137 opened: boot: refactor such that bootStateUpdate20 mainly carries Modeenv [20:25] ijohnson: do you see a reason for uc20 installation to be failing on sd (on the nuc, not pi)? [20:27] cmatsuoka: by "sd" do you mean like /dev/sda or like an actual SD card [20:27] ijohnson: I always installed to internal storage on real pc hardware, it never occured to me to boot the image and install on external media [20:27] ijohnson: sd card [20:27] ijohnson: like a giant raspberry pi [20:27] Hypothetically it should work [20:27] Did you find an issue with that setup? [20:28] yes, I think it should too, but that's the scenario that doesn't work for galem [21:28] PR snapd#9138 opened: many: refactor tpm seal parameter setting [21:59] PR snapcraft#3244 opened: file utils: introduce get_host_tool_path() to find commands on host