[06:12] morning [06:21] PR snapd#7703 closed: cmd/snap: make 'snap list' shorten latest/$RISK to $RISK [06:24] hey mborzecki [06:26] mvo: hey [06:36] PR snapd#7454 closed: interfaces: extend the fwupd slot to be implicit on classic [07:06] good morning [07:06] zyga: hey [07:06] 5 days left [07:06] let's make it count! === Girtablulu|Away is now known as Girtablulu [07:18] PR snapd#7717 closed: tests/docker-smoke: add minimal docker smoke test [07:19] mvo: some conflicts in #7714 [07:19] PR #7714: overlord: refactor mgrsSuite and extract kernelSuite [07:19] mvo: i can look into that, unless you've already fixed it [07:23] mborzecki: missed the ping, either way is fine [07:23] hey zyga [07:24] mvo: ok, i'll push it [07:28] ta! [07:30] mvo: #7715 is based on 7714, isn't it? [07:30] PR #7715: overlord: add base->base remodel undo tests and fixes [07:31] mborzecki: correct [07:32] mvo: ok, i'll update that one too then [07:32] ok [07:32] mborzecki: it also needs a review and some more things (that samuele pointed out) but let's do the mechanical bits first [07:33] mborzecki: hm, looks like 7665 is in some strange state, says it's waiting for CI for me, maybe we need to push master to it [07:34] mvo: yeah, i'll be addressing your comments and will push there [07:34] ta [07:48] brb [07:56] mborzecki: are you working on the remodel-kernel-undo-tests PR? it has no conflicts so I could just address the feedback and push and ask you to review :) ?
[07:58] mvo: no, i just pushed a little patch to 7714 and gave it +1 [07:58] mborzecki: cool, I will just address samuele's points in 7682 then and push and then you can review that too, [07:58] mvo: i think we can land 7714 and then update 7715, otherwise it'll get messy i think [07:58] mborzecki: it's a subset so review should be trivial [07:59] mborzecki: aha, even better [07:59] mborzecki: one small issue is that 7714 includes 7682 [08:00] mborzecki: so I need to make this one ok first ./ [08:00] mborzecki: sorry, I should have done the refactor independently [08:00] haha [08:06] re === pstolowski|afk is now known as pstolowski [08:11] morning [08:12] hey pawel [08:21] pstolowski: hey [08:38] o/ [08:47] mborzecki: trying to fix the name=snapd branch now [08:48] mborzecki: made it conditional on v1/v2 mode, to use different directories [08:48] hoping /sys/fs/cgroup/snapd survives into lxd [08:48] and permissions there are sufficient [08:48] zyga: you likely need to merge master, parallel instances landed on friday [08:48] mborzecki: I know, I plan to soon [08:49] mvo: left a comment about retry-tool in place of the while loop, feels a bit awkward bc of grep [08:49] mvo: so i'll skip this one, and the mockSuccessfulReboot(), actually that second one is more interesting, it assumes snap_mode = try, but we don't switch the gadget snap during reboot, only base/kernel [08:55] brb [08:57] mborzecki: aha, right - good point [08:57] mborzecki: sorry, I missed that [08:58] mvo: no worries, i think we could make mockSuccessfulReboot work with some tweaking to catch unintended test cases [08:58] mborzecki: *nod* [09:02] chrome said to restart [09:02] mvo: ^ [09:55] mborzecki: mo'in. Does your +1 to #7669 apply to #7670?
:) [09:55] PR #7669: snap, cmd/snap: support (but warn) deprecated multi-slash channel [09:55] PR #7670: cmd/snap: support (but warn) using deprecated multi-slash channel [09:56] Chipaca: let me take a look [09:58] Chipaca: idk, the suffering strings are gone ;) [09:58] mborzecki: ikr [09:58] mborzecki: OTOH valid value + error is a weird flex [10:01] mborzecki: how did you make a comment on an error message that is no longer there? [10:02] haha [10:02] yeah [10:04] Chipaca: btw. do we use the british or us spelling in the messages? [10:11] mborzecki: us :-/ [10:14] another test pass, [10:19] test passed [10:19] that's very promising [10:19] running more challenging test [10:40] mvo: #7714 has two +1s but it includes #7682 which has some nits pointed out by pedronis [10:40] mvo: up to you whether to merge it and then followup or what [10:40] PR #7714: overlord: refactor mgrsSuite and extract kernelSuite [10:40] PR #7682: overlord: add kernel remodel undo tests and fix undo [10:40] grr [10:41] #7682 [10:41] PR #7682: overlord: add kernel remodel undo tests and fix undo [10:41] better :) [10:53] Chipaca: I'm just finishing the unit test asked for in 7682, should be ready in some minutes [10:54] making progress, next up is the lxd test [11:11] Chipaca, mborzecki 7682 should be ready now, I addressed the comments from samuele [11:14] updated cgroup dir based on v1 / v2 mode in snapd side, resuming tests [11:15] Chipaca, mborzecki actually, yeah, probably/possibly easier if I merge 7714 and do my other bits as a followup, up to you if you feel this is easier to review [11:16] mborzecki: the changes are not too terrible, not great [11:16] mvo: sgtm [11:16] mborzecki: we should do runc-like change later on [11:16] mvo: landing 7714 sgtm [11:16] PR snapd#7714 closed: overlord: refactor mgrsSuite and extract kernelSuite [11:19] mborzecki: 7665 now has conflicts, let me know if I should help.
and a strange spread failure [11:20] Chipaca, mborzecki 7682 should be much simpler to review now :) [11:20] \o/ [11:20] tests passed [11:20] good [11:20] moving on [11:20] mvo: that failure in 7665 looks like something raised in the forums during the weekend [11:20] running the lxd test now [11:21] mvo: https://forum.snapcraft.io/t/snap-remove-fails-because-of-user-data/13990 [11:21] mborzecki: saw that one, still puzzled by it [11:22] * Chipaca replies with as much [11:28] mborzecki: ok, feel free to restart the test if it's unrelated [11:30] zyga: i've responded to #7675 [11:30] PR #7675: release: preseed mode flag [11:30] pstolowski: thank you, looking now [11:32] pstolowski: replied [11:32] mvo: lxd test has passed with the new feature enabled *outside* [11:33] zyga: thanks [11:33] mvo: this is promising, looking at making the inside case now [11:34] mvo: the other case is just test adjustment which will later on become a change to lxd [11:35] mvo: this is really promising, I think it will work [11:35] zyga: cool [11:46] Chipaca: hmm regarding snap download, there's some code in image/helpers.go that just calls os.Exit(1) on sigint ;) [11:46] mborzecki: does it also Finalize the progress bar? [11:47] (also, yes, exit 1 would probably be the right thing to do) [11:47] (but after finalizing the progress bar) [11:47] it's Finished() followed by os.Exit() [11:47] perfect :) [11:47] Finished, Finalize, Fwhatever [11:47] hm also, snap download --remote leaves a *.partial file behind, but one that goes via snapd does not? [11:47] make-the-terminal-usable-again [11:48] mborzecki: --direct doesn't leave a .partial when interrupted?
[11:48] that'd be a(nother) regression [11:49] Chipaca: --direct does leave *.partial, the non-direct one does not [11:49] * mborzecki tries to parse what he just wrote [11:50] mborzecki: ah, that's a known regression [11:50] mborzecki: non-direct does not know how to resume [11:50] and indeed it'll take a refactor of the download endpoint to do it [11:57] Chipaca: yeah, I think we need to talk what to do - I think I will give your idea of "snap download --indirect" a shot and then we can iterate on this without regressions [11:57] mborzecki: thanks for looking at this so carefully! [11:58] mvo: my biggest concern is that i don't see how to get resume without a rest api refactor [12:00] actually thinking a bit more, it's a minor refactor, a change to which things are obligatory i guess [12:00] and a change to how it's used for sure [12:00] that's my brain coming back to me with this that i'd been mulling over the weekend [12:01] mvo: for snap download, what you want to do is first do an exact search by name, and take the info you get back to (1) check whether you have that file already, (2) check if you have a partial, (3) download exactly that revision, maybe with a resume in it [12:02] so the refactor for download is (a) always take the revision, and (b) optionally take a resume position [12:02] that's at the api level; internally it's a bit bigger of a change because snap download itself shouldn't be doing the two requests, i guess? in that maybe there's a way to not have to hit info again server-side [12:02] in the fast case at least [12:03] but that can be a later optimization [12:03] or [12:03] OR! [12:04] we could make snap download take the snap id, and then reconstructing the download irl is easier?
[12:09] mvo: I'm working on the denial, no idea what it is yet [12:11] mvo: I've simplified the test and will iterate interactively on the next shell [12:12] Chipaca: for starters we could just write the data to *.partial file and rename, seeing *.snap that isn't really complete is a bit confusing [12:13] cachio: did you have a patched version of spread that can run tests in a specific order? [12:13] mborzecki, yes [12:14] can you point me to it? [12:14] currently stuck on one denial: [12:14] cannot remount /sys/fs/cgroup read-write: Permission denied [12:14] Nov 04 12:13:45 nov041205-146458 audit[28564]: AVC apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-my-ubuntu_" name="/sys/fs/cgroup/" pid=28564 comm="snap-confine" flags="rw, remount" [12:14] mborzecki, sure [12:15] I changed the apparmor profile of lxd to contain [12:15] mount options=(remount, rw) /sys/fs/cgroup/, [12:15] mborzecki, https://cachio.s3.amazonaws.com/spread/spread2 [12:15] but perhaps I misunderstand, need to check what those paths map to in practice [12:15] zyga: isn't that a profile that lxd created for the container? [12:15] mborzecki: yes [12:15] cachio: thanks!
[12:15] mborzecki: the denial is for the profile from lxd [12:15] mborzecki, np [12:15] mborzecki: that is what I am changing [12:16] mborzecki: inside the test I patch and reload that profile [12:16] mborzecki: on the container host [12:16] ah ok [12:16] just use: -order -workers 1 [12:16] mborzecki, this will use 1 instance for all the tests [12:17] and run the tests in the order you add them in the command line [12:18] Chipaca: thank you, yeah, in some sense it's an optimization [12:18] mborzecki: interesting, I can switch all of the apparmor for lxd into complain mode [12:18] cachio: ok, running it now [12:18] and it doesn't help [12:19] I probably don't understand what is really going on [12:21] mborzecki: 7665 now has conflicts :/ [12:21] mvo: mhm, first trying to catch the problem with removal of /var/snap/ [12:24] mborzecki: no worries, I can fix the conflicts [12:26] * pstolowski lunch [12:29] mvo: thanks for merging master to the remodel branch! [12:31] mborzecki: no worries, I want to see it merged :) [12:36] mvo: is there an IRL bug that triggered #7682?
[12:36] PR #7682: overlord: add kernel remodel undo tests and fix undo [12:36] zyga, hey [12:36] Chipaca: no real life bug [12:37] did you see something like this: https://paste.ubuntu.com/p/6NYyBbjtMW/ [12:37] mvo: phew [12:37] Chipaca: it was just the result of missing undo testing in the handlers code [12:37] Chipaca: yeah [12:37] it is failing on ubuntu-core-18 tests on pi3 for edge channel [12:37] * mvo gets some lunch [12:38] making some more progress [12:38] hey cachio [12:38] I realized lxd start / stop rewrites the container profile [12:38] so my changes are not effective [12:38] I started the container [12:39] and used apparmor_parser to switch it to complain mode after that [12:39] I used aa-status to check what the outcome is after that [12:39] and saw that some of the changes are applied, there are a number of processes with complain profiles [12:40] but I can see that the bulk of the container is confined [12:40] I'm exploring why [12:40] the output of aa-status is [12:40] https://www.irccloud.com/pastebin/3m0f3aYx/ [12:40] in case someone figures out why [12:46] zyga, all this is related to the mount namespaces error?
[12:47] cachio: no, this is ongoing work on new cgroup [12:48] ahh, ok [12:54] I managed to switch the snap-confine profile into complain mode [12:55] it seems there's no propagation of any kind [12:55] so changing the profile on the host doesn't impact profiles derived from it in any way [12:55] but that's fine [12:55] so [12:55] progress: [12:55] snap-confine inside is in complain mode but I still get a denial [12:55] looking at why now :) [12:55] perhaps I just put the wrong one in complain mode [12:55] it's not re-execing [12:55] checking [12:56] more profiles in complain mode https://www.irccloud.com/pastebin/cwRZuduL/ [12:58] mvo: in the meantime I'm running all tests to see if anything is impacted, 50% there, I'll know soon [12:58] google:ubuntu-core-18-64 .../tests/main/remodel-gadget# ls /var/snap/pc [12:58] 1001 1002 [12:58] heh, wtf?? [12:59] * diddledan sees zyga wants complaints so moans that it's not bed time yet [13:06] _weird_ [13:06] improvement [13:06] I got ALLOWED messages [13:06] so it did switch to complain mode [13:06] and the things mentioned were really about what is new in the code [13:07] but I got the error still [13:07] but even though i still got an error from mount [13:07] checking why [13:08] the error, in case anyone sees an error in my logic: https://www.irccloud.com/pastebin/9JmCN4RA/ [13:08] * Chipaca wanders off mumbling something about lunch [13:08] Chipaca: good idea [13:08] I'll update the test to do what I did manually [13:08] and wrap up for lunch [13:11] test modified, restarted clean [13:11] I think it is better [13:11] I'll double check the code, maybe there is something silly in my patch [13:13] mvo: we are all doomed ;-) [13:13] https://m.tagesspiegel.de/berlin/experten-warnten-schon-2017-it-katastrophe-am-berliner-kammergericht-kam-mit-ansage/25163810.html [13:14] I want more of https://mspoweruser.com/microsoft-4-day-workweek/ though ;) [13:15] but first, let's ship it for vancouver [13:15] mvo: regular tests are more than
90% complete now [13:15] mvo: if anything exploded I'll know soon [13:26] PR snapd#7718 opened: sandbox/seccomp: accept build ID generated by Go toolchain [13:26] mvo: zyga: it'd be nice to get it for 2.43 [13:26] also pinged jdstrand for a review though the bit is trivial [13:34] mborzecki, hey, this error is happening when updating the arch image https://travis-ci.org/snapcore/spread-cron/builds/607000141#L2056 [13:35] any idea ? [13:37] cachio: yeah, looks like the package needs an update [13:37] let me see if there's anything i can do about it [13:39] * zyga is afk for lunch now [13:39] only lxd and mount-ns tests failed, so good and expected [13:40] mborzecki, nice thanks [13:43] cachio: pinged the guy who maintains the package with a patch that should fix it, let's see if it lands today/tomorrow, if not i can fork the package, put it on github and we can pull it from that location [13:45] mborzecki, nice thanks!! [13:53] hmmm [13:53] stgraber: hello, can you please help me understand how lxd uses apparmor profiles [13:54] stgraber: I'm trying to understand how to switch the entire container into complain mode [13:54] stgraber: I tried running apparmor_parser -r --complain /var/snap/lxd/common/lxd/security/... [13:54] stgraber: using aa-status on the host I can see some changes but I still see a number of confined profiles [13:54] er, enforcing profiles [13:55] e.g. the output of aa-status from the container host is [13:55] https://www.irccloud.com/pastebin/08Gl52Z9/ [13:55] I would appreciate it if you can shed some light on this [13:55] LXD regenerates and reloads the container profile on container startup, files are mostly there to inspect but changes won't persist
[14:07] stgraber: I'm getting things I don't quite understand, after a lot of hand-called apparmor_parser --complain --reload I get some ALLOW but still some EPERM and nothing logged with DENIED [14:08] stgraber: is it safe to change any profiles after the container is started? [14:09] stgraber: note that I suspect that "deny" rules silence logging [14:09] stgraber: so perhaps what I'm hitting is a deny somewhere [14:10] stgraber: alternatively [14:10] stgraber: is there a knob on lxd that would allow me to run a container with complain mode apparmor? [14:23] 15 minute break [14:23] stgraber: I'm all ears if you have some ideas [14:23] stgraber: but I'll be sending patches to lxd to help with this case, it's a one-liner *I believe* for now [14:25] zyga: what' [14:25] zyga: what's the one liner? [14:25] stgraber: to remount /sys/fs/cgroup rw [14:25] all I need is [14:25] mount -o remount,rw /sys/fs/cgroup [14:25] mkdir /sys/fs/cgroup/snapd [14:25] mount -o remount,ro /sys/fs/cgroup [14:26] mount -t cgroup cgroup /sys/fs/cgroup/snapd -o none,name=snapd [14:26] that's all [14:26] I believe all but the rw remount are permitted [14:26] zyga: I'm reasonably sure that we cannot allow that without causing a big security issue [14:26] hmmm [14:26] because of the remount or because of something else? [14:27] please tell me what you think, it's important in our design [14:27] because of the apparmor rule generator being broken [14:28] stgraber: is there anything we can do? [14:28] if rw,remount hits the parser bug, no [14:28] aha [14:28] adding such a rule would allow every single mount to succeed [14:29] zyga: why are you trying to mount your own v1 hierarchy anyway? [14:29] stgraber: can we do something else to avoid the need to mkdir, can we mkdir early in systemd somewhere? [14:30] stgraber: for tracking processes correctly, it's the only way we found that works well in v1 and v2 mode [14:30] zyga: how do you do it in v2 where /sys/fs/cgroup is a cgroup2 mount?
[14:30] stgraber: in /run/snapd/cgroup [14:30] ah so you're assuming that a cgroup2 system will still have cgroup1 enabled in the kernel? [14:31] stgraber: I wanted it to be in /run/snapd/cgroup now but this is more complex to do so that it works in lxd correctly [14:31] stgraber: yes, for now yes [14:31] stgraber: we need to work on extending cgroup2 usage in systemd [14:31] zyga: why would /run/snapd/cgroup be harder in LXD? [14:31] zyga: that should actually be easier. Obviously you'll have the problem that this will fail on any system which doesn't have LXD on the host [14:32] stgraber: because /run is changed by lxd and we need to change it so that it is not necessary to have that mount inside the per-snap mount namespace [14:32] PR snapd#7670 closed: cmd/snap: support (but warn) using deprecated multi-slash channel [14:32] zyga: so you're going to break a LOT of systems, but that's going to happen regardless so long as you want your own named controller [14:32] stgraber: can you explain how we will break things? [14:32] zyga: unprivileged users are only allowed to mount existing controllers [14:32] zyga: if your host doesn't have a name=snapd controller, no container will be allowed to mount it [14:32] aha [14:32] the host will have it [14:33] no it won't [14:33] perhaps I misunderstand [14:33] not for anyone who's not using the snap [14:33] and are there people who are not using lxd as a snap that will run into this? [14:33] so you'll break all chromebooks, all gentoo users, all alpine users, most arch users, ... 
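stgraber's constraint above, that an unprivileged container may only mount a named cgroup v1 hierarchy that already exists on the host, can be illustrated by checking the host's view in /proc/self/cgroup (v1 lines have the form `id:controllers:path`; the v2 line is `0::/path`). This helper is a sketch for illustration, not snapd code:

```python
def has_named_controller(cgroup_lines, name):
    """Check cgroup v1 entries (format 'id:controllers:path') for a
    name=<name> hierarchy; the cgroup v2 entry ('0::/...') has an
    empty controllers field and never matches."""
    for line in cgroup_lines:
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and ("name=" + name) in parts[1].split(","):
            return True
    return False

# on a real system, feed it the calling process's own view, e.g.:
# with open("/proc/self/cgroup") as f:
#     ok = has_named_controller(f, "snapd")
```

On the distros stgraber lists (Chrome OS, Gentoo, Alpine, most Arch setups) this check would come back False for `name=snapd`, which is exactly why the container-side mount would be refused.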
I see [14:33] that's bad news indeed [14:33] I wonder if we have some ideas as a way out then [14:34] stgraber: can you tell me more about the rule [14:34] we actually ran into that very same problem when systemd introduced name=systemd in the first place [14:34] stgraber: about what makes the cgroup mount fail if it doesn't exist on the host [14:34] any distro which wasn't using systemd was unable to run systemd containers because of it [14:34] stgraber: note that the feature is still not on by default, or merged for that matter [14:34] mvo: got a fix for the test failure that occurred in #7665, i'll propose it separately unless the travis job that's currently running fails [14:34] stgraber: we just need _a_ way to track processes reliably [14:34] PR #7665: devicestate: add support for gadget->gadget remodel [14:34] stgraber: and we have found none better than this [14:35] zyga: the kernel effectively looks at all existing controllers (same list you'd see in /proc/self/cgroups) and anything that's not in that list, you can't mount unless you're real root outside of a cgroup namespace [14:35] I see [14:35] and it treats the name=... thing as a controller in this sense? [14:36] note that our hierarchy has no controllers at all [14:36] but I'm sure you know that, just checking if I understand right [14:36] zyga: given that your current plan will not work on any distro which has fully transitioned to cgroup2 (no more cgroup1 in kernel) and will break for a lot of container users, it doesn't seem to me like a particularly good plan :) [14:36] zyga: yes [14:36] stgraber: there are no better plans yet [14:36] zyga: you wouldn't want an unprivileged container being able to mount thousands of those, allocating global kernel structs [14:36] stgraber: pure v2 doesn't let us track processes because we'll interfere with systemd [14:37] zyga: can't you just keep snap-confine as the parent of the processes in the snap and make it a subreaper?
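The subreaper mechanism stgraber brings up can be sketched in a few lines: with PR_SET_CHILD_SUBREAPER set, orphaned descendants reparent to the subreaper instead of to init, so blocking in wait() until there are no children left tells you the whole process tree is gone. Linux-only; the demo below is an illustration of the mechanism, not snapd code:

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # from <sys/prctl.h>, Linux >= 3.4

def become_subreaper():
    """Mark this process as a subreaper: orphaned descendants will
    reparent to us rather than to init."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_CHILD_SUBREAPER)")

def reap_until_empty():
    """Block until every descendant has exited; return the reaped pids."""
    reaped = []
    while True:
        try:
            pid, _status = os.wait()
            reaped.append(pid)
        except ChildProcessError:
            return reaped  # no children (direct or reparented) remain
```

As the discussion that follows notes, this only tells you when the last descendant died; it does not give you a readable link from an arbitrary live process back to its subreaper.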
[14:38] zyga: stgraber: tbh we don't seem to have a reliable, race-free way of tracking processes in v2 world without complicating things [14:38] stgraber: if we do that, we know if a process dies [14:38] stgraber: but not if there are any left [14:38] zyga: but you know that when you die all subprocesses will too [14:38] stgraber: we'd need to match fork and exit reliably but that's not what subreaper can do [14:39] stgraber: I'm not sure I follow, are you saying that we can then kill the snap-confine tracking process to kill all the children? [14:39] zyga: yes [14:39] stgraber: that's not what we are trying to do [14:40] stgraber: the goal is to know when it's safe to refresh [14:40] stgraber: when no apps are running [14:40] stgraber: in a zygote-like mode when snap-confine sticks around and watches processes die [14:41] stgraber: we could perhaps know something useful but I don't believe we can do what we want to with just that [14:43] mborzecki, mvo: I think we need to abort and come up with another idea [14:43] based on what stgraber said above I don't believe this is something we can salvage unfortunately [14:44] this still leaves us with no tracking, let alone notification past v1 [14:45] stgraber: how did you fix the issue of systemd needing to mount name=systemd? [14:46] zyga: most distros switched to systemd, the rest are mounting the systemd controller on the host anyway [14:46] stgraber: I see [14:47] mborzecki: thank you [14:48] mvo, mborzecki: can we meet later today [14:48] zyga: so sadly there's no nice way that I know for you to get notified when a namespace goes away, otherwise you could use something like a uts namespace [14:48] I have an idea [14:49] stgraber: I planned to use release agent on the name=snapd hierarchy [14:49] but thinking about it now [14:49] how to work without any cgroups [14:49] zyga: have one namespace per snap, dump all new processes in there, then when you want to know if the snap is safe to refresh, check if any
process uses it [14:49] stgraber: one uts namespace? [14:49] stgraber: and in v2 mode? [14:49] stgraber: in v1 we can use the freezer for that [14:50] stgraber: (note that we need to have per-app understanding in our current model but perhaps we could drop that) [14:50] stgraber: but what the name=snapd idea was about is something that works both in v1 and v2 mode [14:50] zyga: doesn't matter v1/v2, this wouldn't use cgroups at all [14:50] ah [14:50] sorry [14:50] I didn't understand [14:50] uts namespace [14:50] so iterate over process [14:51] and check which uts namespace is used by each? [14:51] and check if any of those match that of our snap processes? [14:51] yeah, that's the part that's not amazing, having to iterate but otherwise should work fine [14:51] yes [14:51] that might w [14:51] that might work [14:51] I was thinking to use the existing mount namespace [14:51] given that we already create one for strict snaps [14:51] and are about to for classic snaps [14:52] yeah but your mntns are per snap + per user or something so that wouldn't work so well [14:52] PR snapd#7682 closed: overlord: add kernel remodel undo tests and fix undo [14:52] stgraber: yes [14:52] stgraber: so you are saying we could create and save one for each pkg.app pair [14:53] the tracking with a uts namespace would still be bypassable by any snap allowed to do fancy clone/unshare, but that's unlikely to really matter, you're not building security on this, if a snap wants to make itself break on refresh, whatever :) [14:54] stgraber: correct but the kind of snaps that would be hurt by this (think things like docker) really would as they would get unexpected refresh semantics [14:54] stgraber: but thank you for the idea, I think this is workable [14:54] zyga: yeah, on initial startup, you'd unshare the uts namespace, bind-mount its inode somewhere in /run/snap as you do for mntns, then any new process gets added using setns and to see if it's safe, you iterate through /proc for anything 
that has a matching inode on ns/uts [14:55] stgraber: correct [14:55] stgraber: I agree this would work [14:55] you could even make actual use of the uts namespace and give each snap a unique hostname [14:55] stgraber: I think that will break snaps [14:55] stgraber: especially it would be unexpected for snap with classic confinement [14:56] indeed :) [14:56] stgraber: or maybe the time namespace [14:56] ;-) [14:56] one thing to keep in mind is that if the hostname changes on the host, it won't in the snap [14:56] stgraber: yeah, I think it's a good route but we need to plan which namespace to use [14:56] uts doesn't feel like it [14:56] you certainly don't want to use ipc, mnt, net, pid or user [14:56] I would ideally do it via mount namespace and just save those with finer granularity [14:56] so you have two options really [14:57] feels like it's the same borderline use as the pids controller right now [14:57] I think mount is indeed unworkable on second thought [14:57] mount is the most used namespace, a lot of software mess with it and snapd already has more than one mntns per snap, so seems pretty hard to track [14:58] cgroup and uts are the two easy ones, cgroup may actually be a better option [14:58] hmm, though it may seem weird [14:58] yeah, no, don't use cgroup, you'll break lxd :) [14:59] uts won't break us though, the only issue with it is the hostname changes not getting propagated until a snap restart, but I don't think anyone will really notice/care [15:00] zyga: we could probably do something with pidfds though, but that's likely to be an issue due to lack of kernel support right now and it still being in very active development (by us) [15:00] stgraber: how about netlink [15:00] and observing all process activity [15:00] * cachio lunch [15:00] so match fork/exec/exit [15:01] zyga: I didn't think we had a reliable API for that in netlink, short of having to ptrace stuff [15:01] stgraber: I think there's something, not ptrace, there's a program that 
shows that (fork/exec across the whole system) using netlink [15:01] zyga: also, that'll break when snapd restarts as you'll miss all those events [15:01] but yeah, I don't know how that works yet [15:01] stgraber: yes, it'd have to be a special process that's never quitting [15:01] and if it's relying on tracing, then it won't work in containers [15:01] zygote like per snap [15:02] AFAIK it is pure netlink [15:02] stgraber: an alternative we considered before was putting the process in a well-known cgroup name e.g. snap.app under the cgroup created for it by systemd (just one level deeper), but that'd involve walking /proc/*/cgroup [15:02] and walking /proc/*/cgroup feels like a bad idea [15:03] in v2 everything is somewhere in a hierarchy [15:04] zyga: forkstat/netlink connector requires real root [15:04] stgraber: I see [15:04] stgraber: thank you for checking [15:05] zyga: it's also apparently lossy, under load there's no delivery guarantee [15:06] stgraber: thank you, this is even more relevant [15:07] stgraber: anything we could use to tag a process with [15:07] stgraber: I don't mind enumeration [15:07] it won't be done all the time [15:07] just in special moments [15:07] I just want to see if we can identify a process as belonging to a snap [15:08] even weird things like containers [15:08] stgraber: thank you for all the input, it was invaluable, [15:08] I think we need to rethink the idea [15:09] zyga: there have been discussions about ptags in the past but nothing ever got merged [15:09] PR snapd#7719 opened: tests/main/gadget-update-pc: cleanup per-revision directories [15:10] mvo: this addresses the failures due to /var/snap/pc/ directories being left behind in the tests [15:10] mvo: ^^ [15:10] zyga: effectively arbitrary process tags which would be inherited and could even be marked immutable in some cases, it was one of the ideas put forth to solve the audit id issue for containers but after years of bikeshedding, nothing really happened [15:11]
mborzecki: nice, *thank you* [15:13] stgraber: utterly crazy idea, set timerclask_ns to a unique value, use that [15:13] timerslack_ns [15:14] I prefer timertelegram_ns :-p [15:14] slack is second place :-D [15:16] mborzecki: but in a meeting right now so will look in a bit [15:36] PR snapd#7719 closed: tests/main/gadget-update-pc: cleanup per-revision directories [15:48] mvo: do you have time today? [15:48] mvo: for a 10 minute call? [15:49] mborzecki: do you have any technical ideas? [15:49] mborzecki: or can we exchange ideas before your EOD [15:49] mborzecki: it's fine to say no, perhaps at this stage having some rest would help [15:49] mborzecki: I don't see a way to solve this issue now [15:50] zyga: tomorrow morning, i'm taking kids to some extra classes in 10 [15:50] ok [15:50] thank you [15:59] stgraber: can snapd run inside lxd, on travis, on arm64? https://travis-ci.org/snapcore/core20/jobs/600808794?utm_medium=notification&utm_source=github_status [15:59] stoopkid: i see that core snap fails to setup security profiles? [15:59] stoopkid: unping [15:59] stoopkid: i see that core snap fails to setup security profiles, and i'm trying to get arm64 ci going with travis. [16:00] it seems udevadm failed [16:00] exit code 2 [16:01] i'll restart the build to see if things got any better [16:06] PR snapd#7680 closed: daemon: parse and reject invalid channels in snap ops [16:07] xnox: just try to run the snap install twice [16:07] xnox: that's usually enough for those [16:07] #7671 needs two (2!) 
reviews, and it's the last chunk of the channels preparatory work [16:07] PR #7671: overlord/patch: normalize tracking channel in state [16:08] Chipaca: is it okay if I call it quits now, sorry, if it was one review I'd jump in [16:08] but I need to break, think and move a little [16:08] zyga: you can call it quits any time you need to [16:08] zyga: also: stepping away from the keyboard to think is not quitting [16:08] and all my work is essentially down the drain today [16:08] so need to think if anything can save it [16:08] * Chipaca hugs zyga [16:08] * diddledan hugs too [16:08] * zyga hugs Chipaca [16:08] thank you guys! [16:09] * zyga hugs diddledan too :) [16:09] anything, as crazy as possible, that technically works, is good [16:09] I'll grab my bike and go for a ride [16:14] zyga: was in meetings, sorry [16:33] jdstrand, when do you plan on making a release of review-tools that includes https://git.launchpad.net/review-tools/commit/?id=31c894178a8fd110b9f65f0ee532b7e11384f8b8 ? [16:41] oSoMoN: let me prepare a release now [16:41] perfect, thanks! [16:46] roadmr: hi! would you mind pulling 20191104-1644UTC? [16:47] jdstrand: sure [16:49] roadmr: thanks :) [16:49] oSoMoN: fyi ^ (will be a bit before in production) [16:50] cheers [16:58] jdstrand: no name=snapd for now [16:59] jdstrand: if you want to know more I can summarize [16:59] jdstrand: if not we're going to discuss more tomorrow to see if we have other options [16:59] zyga: I read backscroll [16:59] zyga: it's too bad cause it seemed really elegant. 
guess it was too elegant, but good thing we got stgraber in the loop [17:00] yes, the lesson I learned is to ask more experts early [17:00] I wasn't aware of the extra limitations on rootless containers [17:00] actually [17:01] one idea [17:01] stgraber: har har har [17:01] stgraber: can you answer one more question please [17:01] stgraber: if we set a subreaper per snaps [17:01] *per snap* [17:01] stgraber: will we be able to follow that chain reliably [17:01] from random process, to their parent pid, to the subreaper [17:01] so that effectively, as long as the "watcher" subreapers are alive [17:01] we can reliably trace lineage? [17:02] so the idea is that instead of killing the watchers to kill the "container" [17:03] we can just scan all processes and try to match them to the reaper [17:04] hmmm, reading the man page it seems not useful [17:04] because it is not preserved across fork/clone [17:04] but is across exec [17:04] so we could watch our immediately forked process die [17:05] and if they had any children processes, we could watch them die if their parent (that we started) did quit [17:05] but I don't see anything in the man page linking the descendants to the reaper in a way that can be read [17:06] * zyga goes to experiment [17:06] maybe it's not too bad actually === pstolowski is now known as pstolowski|afk [17:43] PR snapcraft#2783 closed: cli: run review tools before pushing to the store if available [17:45] stgraber ^ this works [17:45] jdstrand: ^ this works [17:45] we can track all children's existence [17:45] it's even better because the reaper can quit when they're all gone [17:46] so all we need to do is to unlink a "this snap is busy" file then [17:46] stgraber: ^ [17:46] PR snapcraft#2785 closed: remote-build: add initial command unit tests [17:49] a super-simple interactive reaper [17:49] https://paste.ubuntu.com/p/kYqhmWGtn3/ [17:49] I don't know if this can be made to work with proper pid tracking (e.g.
how would this ever work for services) [17:49] but it's *something* [17:49] need to ponder on it more [18:08] sergiusens did we change the way SNAPCRAFT_PROJECT_DIR behaves? I'm getting an empty string there [18:08] sergiusens SNAPCRAFT_PROJECT_NAME and SNAPCRAFT_PROJECT_GRADE seem to work fine [18:14] ondra: no we did not change it [18:15] sergiusens then there is a regression, as suddenly I'm getting an empty string [18:22] ondra: on stable? [18:22] sergiusens stable and candidate [18:22] sergiusens at least what I tested [18:23] ondra: well stable has not changed for the past 2 months [18:23] sergiusens hmm, then it must be this particular snap triggering it [18:31] sergiusens defo happening also with stable [21:33] it seems like snap is doing something terribly wrong with apparmor on lubuntu 18.04. i've installed both snap-store and vlc and both of them just throw tons of apparmor errors when i try to start them. [21:49] litheum: could you open a topic about this in the forum? [21:49] litheum: it should just work :-) if it doesn't we want to fix it [21:49] but I, personally, don't want to even think about fixing it at this time [21:50] :-) I'm off to bed [23:23] PR snapcraft#2791 opened: introduce bind-ssh support
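For reference, the /proc walk implied by stgraber's earlier uts-namespace suggestion (save the namespace inode at snap startup, then compare every process's ns inode against it) is small. A sketch under the assumption that the tracking inode was captured when the first snap process started; illustrative only, not snapd code:

```python
import os

def ns_inode(pid, ns="uts"):
    """Inode of a process's namespace file, or None if unreadable
    (process gone, or owned by another user)."""
    try:
        return os.stat("/proc/%d/ns/%s" % (pid, ns)).st_ino
    except OSError:
        return None

def pids_in_namespace(target_inode, ns="uts"):
    """Walk /proc and collect every pid sharing the given namespace;
    an empty result would mean the snap is safe to refresh."""
    return [
        int(entry)
        for entry in os.listdir("/proc")
        if entry.isdigit() and ns_inode(int(entry), ns) == target_inode
    ]
```

As noted in the discussion, the iteration is the unappealing part, but it only has to run at refresh time, not continuously.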