[06:12] morning [06:21] PR snapd#7703 closed: cmd/snap: make 'snap list' shorten latest/$RISK to $RISK [06:24] hey mborzecki [06:26] mvo: hey [06:36] PR snapd#7454 closed: interfaces: extend the fwupd slot to be implicit on classic [07:06] good morning [07:06] zyga: hey [07:06] 5 days left [07:06] let's make it count! === Girtablulu|Away is now known as Girtablulu [07:18] PR snapd#7717 closed: tests/docker-smoke: add minimal docker smoke test [07:19] mvo: some conflicts in #7714 [07:19] PR #7714: overlord: refactor mgrsSuite and extract kernelSuite [07:19] mvo: i can look into that, unless you've already fixed it [07:23] mborzecki: missed the ping, either way is fine [07:23] hey zyga [07:24] mvo: ok, i'll push it [07:28] ta! [07:30] mvo: #7715 is based on 7714, isn't it? [07:30] PR #7715: overlord: add base->base remodel undo tests and fixes [07:31] mborzecki: correct [07:32] mvo: ok, i'll update that one too then [07:32] ok [07:32] mborzecki: it also needs a review and some more things (that samuele pointed out) but let's do the mechanical bits first [07:33] mborzecki: hm, looks like 7665 is in some strange state, says it's waiting for CI for me, maybe we need to push master to it [07:34] mvo: yeah, i'll be addressing your comments and will push there [07:34] ta [07:48] brb [07:56] mborzecki: are you working on the remodel-kernel-undo-tests PR? it has no conflicts so I could just address the feedback and push and ask you to review :) ?
[07:58] mvo: no, i just pushed a little patch to 7714 and gave it +1 [07:58] mborzecki: cool, I will just address samuele's points in 7682 then and push and then you can review that too, [07:58] mvo: i think we can land 7714 and then update 7715, otherwise it'll get messy i think [07:58] mborzecki: it's a subset so review should be trivial [07:59] mborzecki: aha, even better [07:59] mborzecki: one small issue is that 7714 includes 7682 [08:00] mborzecki: so I need to make this one ok first ./ [08:00] mborzecki: sorry, I should have done the refactor independently [08:00] haha [08:06] re === pstolowski|afk is now known as pstolowski [08:11] morning [08:12] hey pawel [08:21] pstolowski: hey [08:38] o/ [08:47] mborzecki: trying to fix the name=snapd branch now [08:48] mborzecki: made it conditional on v1/v2 mode, to use different directories [08:48] hoping /sys/fs/cgroup/snapd survives into lxd [08:48] and permissions there are sufficient [08:48] zyga: you likely need to merge master, parallel instances landed on friday [08:48] mborzecki: I know, I plan to soon [08:49] mvo: left a comment about retry-tool in place of the while loop, feels a bit awkward bc of grep [08:49] mvo: so i'll skip this one, and the mockSuccessfulReboot(), actually that second one is more interesting, it assumes snap_mode = try, but we don't switch the gadget snap during reboot, only base/kernel [08:55] brb [08:57] mborzecki: aha, right - good point [08:57] mborzecki: sorry, I missed that [08:58] mvo: no worries, i think we could make mockSuccessfulReboot work with some tweaking to catch unintended test cases [08:58] mborzecki: *nod* [09:02] chrome said to restart [09:02] mvo: ^ [09:55] mborzecki: mo'in. Does your +1 to #7669 apply to #7670?
:) [09:55] PR #7669: snap, cmd/snap: support (but warn) deprecated multi-slash channel [09:55] PR #7670: cmd/snap: support (but warn) using deprecated multi-slash channel [09:56] Chipaca: let me take a look [09:58] Chipaca: idk, the suffering strings are gone ;) [09:58] mborzecki: ikr [09:58] mborzecki: OTOH valid value + error is a weird flex [10:01] mborzecki: how did you make a comment on an error message that is no longer there? [10:02] haha [10:02] yeah [10:04] Chipaca: btw. do we use the british or us spelling in the messages? [10:11] mborzecki: us :-/ [10:14] another test pass, [10:19] test passed [10:19] that's very promising [10:19] running more challenging test [10:40] mvo: #7714 has two +1s but it includes #7682 which has some nits pointed out by pedronis [10:40] mvo: up to you whether to merge it and then followup or what [10:40] PR #7714: overlord: refactor mgrsSuite and extract kernelSuite [10:40] PR #7682: overlord: add kernel remodel undo tests and fix undo [10:40] grr [10:41] #7682 [10:41] PR #7682: overlord: add kernel remodel undo tests and fix undo [10:41] better :) [10:53] Chipaca: I'm just finishing the unit test asked for in 7682, should be ready in some minutes [10:54] making progress, next up is the lxd test [11:11] Chipaca, mborzecki 7682 should be ready now, I addressed the comments from samuele [11:14] updated cgroup dir based on v1 / v2 mode in snapd side, resuming tests [11:15] Chipaca, mborzecki actually, yeah, probably/possibly easier if I merge 7714 and do my other bits as a followup, up to you if you feel this is easier to review [11:16] mborzecki: the changes are not too terrible, not great [11:16] mvo: sgtm [11:16] mborzecki: we should do runc-like change later on [11:16] mvo: landing 7714 sgtm [11:16] PR snapd#7714 closed: overlord: refactor mgrsSuite and extract kernelSuite [11:19] mborzecki: 7665 now has conflicts, let me know if I should help.
and a strange spread failure [11:20] Chipaca, mborzecki 7682 should be much simpler to review now :) [11:20] \o/ [11:20] tests passed [11:20] good [11:20] moving on [11:20] mvo: that failure in 7665 looks like something raised in the forums during the weekend [11:20] running the lxd test now [11:21] mvo: https://forum.snapcraft.io/t/snap-remove-fails-because-of-user-data/13990 [11:21] mborzecki: saw that one, still puzzled by it [11:22] * Chipaca replies with as much [11:28] mborzecki: ok, feel free to restart the test if it's unrelated [11:30] zyga: i've responded to #7675 [11:30] PR #7675: release: preseed mode flag [11:30] pstolowski: thank you, looking now [11:32] pstolowski: replied [11:32] mvo: lxd test has passed with the new feature enabled *outside* [11:33] zyga: thanks [11:33] mvo: this is promising, looking at making the inside case now [11:34] mvo: the other case is just test adjustment which will later on become a change to lxd [11:35] mvo: this is really promising, I think it will work [11:35] zyga: cool [11:46] Chipaca: hmm regarding snap download, there's some code in image/helpers.go that just calls os.Exit(1) on sigint ;) [11:46] mborzecki: does it also Finalize the progress bar? [11:47] (also, yes, exit 1 would probably be the right thing to do) [11:47] (but after finalizing the progress bar) [11:47] it's Finished() followed by os.Exit() [11:47] perfect :) [11:47] Finished, Finalize, Fwhatever [11:47] hm also, snap download --remote leaves a *.partial file behind, but one that goes via snapd does not? [11:47] make-the-terminal-usable-again [11:48] mborzecki: --direct doesn't leave a .partial when interrupted?
[11:48] that'd be a(nother) regression [11:49] Chipaca: --direct does leave *.partial, the non-direct one does not [11:49] * mborzecki tries to parse what he just wrote [11:50] mborzecki: ah, that's a known regression [11:50] mborzecki: non-direct does not know how to resume [11:50] and indeed it'll take a refactor of the download endpoint to do it [11:57] Chipaca: yeah, I think we need to talk what to do - I think I will give your idea of "snap download --indirect" a shot and then we can iterate on this without regressions [11:57] mborzecki: thanks for looking at this so carefully! [11:58] mvo: my biggest concern is that i don't see how to get resume without a rest api refactor [12:00] actually thinking a bit more, it's a minor refactor, a change to which things are obligatory i guess [12:00] and a change to how it's used for sure [12:00] that's my brain coming back to me with this that i'd been mulling over the weekend [12:01] mvo: for snap download, what you want to do is first do an exact search by name, and take the info you get back to (1) check whether you have that file already, (2) check if you have a partial, (3) download exactly that revision, maybe with a resume in it [12:02] so the refactor for download is (a) always take the revision, and (b) optionally take a resume position [12:02] that's at the api level; internally it's a bit bigger of a change because snap download itself shouldn't be doing the two requests, i guess? in that maybe there's a way to not have to hit info again server-side [12:02] in the fast case at least [12:03] but that can be a later optimization [12:03] or [12:03] OR! [12:04] we could make snap download take the snap id, and then reconstructing the download irl is easier?
[12:09] mvo: I'm working on the denial, no idea what it is yet [12:11] mvo: I've simplified the test and will iterate interactively on the next shell [12:12] Chipaca: for starters we could just write the data to *.partial file and rename, seeing *.snap that isn't really complete is a bit confusing [12:13] cachio: did you have a patched version of spread that can run tests in a specific order? [12:13] mborzecki, yes [12:14] can you point me to it? [12:14] currently stuck on one denial: [12:14] cannot remount /sys/fs/cgroup read-write: Permission denied [12:14] Nov 04 12:13:45 nov041205-146458 audit[28564]: AVC apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-my-ubuntu_" name="/sys/fs/cgroup/" pid=28564 comm="snap-confine" flags="rw, remount" [12:14] mborzecki, sure [12:15] I changed the apparmor profile of lxd to contain [12:15] mount options=(remount, rw) /sys/fs/cgroup/, [12:15] mborzecki, https://cachio.s3.amazonaws.com/spread/spread2 [12:15] but perhaps I misunderstand, need to check what those paths map to in practice [12:15] zyga: isn't that a profile that lxd created for the container? [12:15] mborzecki: yes [12:15] cachio: thanks!
[12:15] mborzecki: the denial is for the profile from lxd [12:15] mborzecki, np [12:15] mborzecki: that is what I am changing [12:16] mborzecki: inside the test I patch and reload that profile [12:16] mborzecki: on the container host [12:16] ah ok [12:16] just use: -order -workers 1 [12:16] mborzecki, this will use 1 instance for all the tests [12:17] and run the tests in the order you add them in the command line [12:18] Chipaca: thank you, yeah, in some sense it's an optimization [12:18] mborzecki: interesting, I can switch all of the apparmor for lxd into complain mode [12:18] cachio: ok, running it now [12:18] and it doesn't help [12:19] I probably don't understand what is really going on [12:21] mborzecki: 7665 now has conflicts :/ [12:21] mvo: mhm, first trying to catch the problem with removal of /var/snap/ [12:24] mborzecki: no worries, I can fix the conflicts [12:26] * pstolowski lunch [12:29] mvo: thanks for merging master to the remodel branch! [12:31] mborzecki: no worries, I want to see it merged :) [12:36] mvo: is there an IRL bug that triggered #7682?
[12:36] PR #7682: overlord: add kernel remodel undo tests and fix undo [12:36] zyga, hey [12:36] Chipaca: no real life bug [12:37] did you see something like this: https://paste.ubuntu.com/p/6NYyBbjtMW/ [12:37] mvo: phew [12:37] Chipaca: it was just the result of missing undo testing in the handlers code [12:37] Chipaca: yeah [12:37] it is failing on ubuntu-core-18 tests on pi3 for edge channel [12:37] * mvo gets some lunch [12:38] making some more progress [12:38] hey cachio [12:38] I realized lxd start / stop rewrites the container profile [12:38] so my changes are not effective [12:38] I started the container [12:39] and used apparmor_parser to switch it to complain mode after that [12:39] I used aa-status to check what the outcome is after that [12:39] and saw that some of the changes are applied, there are a number of processes with complain profiles [12:40] but I can see that the bulk of the container is confined [12:40] I'm exploring why [12:40] the output of aa-status is [12:40] https://www.irccloud.com/pastebin/3m0f3aYx/ [12:40] in case someone figures out why [12:46] zyga, all this is related to the mount namespaces error?
[12:47] cachio: no, this is ongoing work on new cgroup [12:48] ahh, ok [12:54] I managed to switch the snap-confine profile into complain mode [12:55] it seems there's no propagation of any kind [12:55] so changing the profile on the host doesn't impact profiles derived from it in any way [12:55] but that's fine [12:55] so [12:55] progress: [12:55] snap-confine inside is in complain mode but I still get a denial [12:55] looking at why now :) [12:55] perhaps I just put the wrong one in complain mode [12:55] it's not re-execing [12:55] checking [12:56] more profiles in complain mode https://www.irccloud.com/pastebin/cwRZuduL/ [12:58] mvo: in the meantime I'm running all tests to see if anything is impacted, 50% there, I'll know soon [12:58] google:ubuntu-core-18-64 .../tests/main/remodel-gadget# ls /var/snap/pc [12:58] 1001 1002 [12:58] heh, wtf?? [12:59] * diddledan sees zyga wants complaints so moans that it's not bed time yet [13:06] _weird_ [13:06] improvement [13:06] I got ALLOWED messages [13:06] so it did switch to complain mode [13:06] and the things mentioned were really about what is new in the code [13:07] but I got the error still [13:07] but even though i still got an error from mount [13:07] checking why [13:08] the error, in case anyone sees an error in my logic: https://www.irccloud.com/pastebin/9JmCN4RA/ [13:08] * Chipaca wanders off mumbling something about lunch [13:08] Chipaca: good idea [13:08] I'll update the test to do what I did manually [13:08] and wrap up for lunch [13:11] test modified, restarted clean [13:11] I think it is better [13:11] I'll double check the code, maybe there is something silly in my patch [13:13] mvo: we are all doomed ;-) [13:13] https://m.tagesspiegel.de/berlin/experten-warnten-schon-2017-it-katastrophe-am-berliner-kammergericht-kam-mit-ansage/25163810.html [13:14] I want more of https://mspoweruser.com/microsoft-4-day-workweek/ though ;) [13:15] but first, let's ship it for vancouver [13:15] mvo: regular tests are more than
90% complete now [13:15] mvo: if anything exploded I'll know soon [13:26] PR snapd#7718 opened: sandbox/seccomp: accept build ID generated by Go toolchain [13:26] mvo: zyga: it'd be nice to get it for 2.43 [13:26] also pinged jdstrand for a review though the bit is trivial [13:34] mborzecki, hey, this error is happening when updating the arch image https://travis-ci.org/snapcore/spread-cron/builds/607000141#L2056 [13:35] any idea ? [13:37] cachio: yeah, looks like the package needs an update [13:37] let me see if there's anything i can do about it [13:39] * zyga is afk for lunch now [13:39] only lxd and mount-ns tests failed, so good and expected [13:40] mborzecki, nice thanks [13:43] cachio: pinged the guy who maintains the package with a patch that should fix it, let's see if it lands today/tomorrow, if not i can fork the package, put it on github and we can pull it from that location [13:45] mborzecki, nice thanks!! [13:53] hmmm [13:53] stgraber: hello, can you please help me understand how lxd uses apparmor profiles [13:54] stgraber: I'm trying to understand how to switch the entire container into complain mode [13:54] stgraber: I tried running apparmor_parser -r --complain /var/snap/lxd/common/lxd/security/... [13:54] stgraber: using aa-status on the host I can see some changes but I still see a number of confined profiles [13:54] er, enforcing profiles [13:55] e.g. the output of aa-status from the container host is [13:55] https://www.irccloud.com/pastebin/08Gl52Z9/ [13:55] I would appreciate it if you can shed some light on this [13:55] LXD regenerates and reloads the container profile on container startup, files are mostly there to inspect but changes won't persist
[14:07] stgraber: I'm getting things I don't quite understand, after a lot of hand-called apparmor_parser --complain --reload I get some ALLOW but still some EPERM and nothing logged with DENIED [14:08] stgraber: is it safe to change any profiles after the container is started? [14:09] stgraber: note that I suspect that "deny" rules silence logging [14:09] stgraber: so perhaps what I'm hitting is a deny somewhere [14:10] stgraber: alternatively [14:10] stgraber: is there a knob on lxd that would allow me to run a container with complain mode apparmor? [14:23] 15 minute break [14:23] stgraber: I'm all ears if you have some ideas [14:23] stgraber: but I'll be sending patches to lxd to help with this case, it's a one-liner *I believe* for now [14:25] zyga: what' [14:25] zyga: what's the one liner? [14:25] stgraber: to remount /sys/fs/cgroup rw [14:25] all I need is [14:25] mount -o remount,rw /sys/fs/cgroup [14:25] mkdir /sys/fs/cgroup/snapd [14:25] mount -o remount,ro /sys/fs/cgroup [14:26] mount -t cgroup cgroup /sys/fs/cgroup/snapd -o none,name=snapd [14:26] that's all [14:26] I believe all but the rw remount are permitted [14:26] zyga: I'm reasonably sure that we cannot allow that without causing a big security issue [14:26] hmmm [14:26] because of the remount or because of something else? [14:27] please tell me what you think, it's important in our design [14:27] because of the apparmor rule generator being broken [14:28] stgraber: is there anything we can do? [14:28] if rw,remount hits the parser bug, no [14:28] aha [14:28] adding such a rule would allow every single mount to succeed [14:29] zyga: why are you trying to mount your own v1 hierarchy anyway? [14:29] stgraber: can we do something else to avoid the need to mkdir, can we mkdir early in systemd somewhere? [14:30] stgraber: for tracking processes correctly, it's the only way we found that works well in v1 and v2 mode [14:30] zyga: how do you do it in v2 where /sys/fs/cgroup is a cgroup2 mount?
[14:30] stgraber: in /run/snapd/cgroup [14:30] ah so you're assuming that a cgroup2 system will still have cgroup1 enabled in the kernel? [14:31] stgraber: I wanted it to be in /run/snapd/cgroup now but this is more complex to do so that it works in lxd correctly [14:31] stgraber: yes, for now yes [14:31] stgraber: we need to work on extending cgroup2 usage in systemd [14:31] zyga: why would /run/snapd/cgroup be harder in LXD? [14:31] zyga: that should actually be easier. Obviously you'll have the problem that this will fail on any system which doesn't have LXD on the host [14:32] stgraber: because /run is changed by lxd and we need to change it so that it is not necessary to have that mount inside the per-snap mount namespace [14:32] PR snapd#7670 closed: cmd/snap: support (but warn) using deprecated multi-slash channel [14:32] zyga: so you're going to break a LOT of systems, but that's going to happen regardless so long as you want your own named controller [14:32] stgraber: can you explain how we will break things? [14:32] zyga: unprivileged users are only allowed to mount existing controllers [14:32] zyga: if your host doesn't have a name=snapd controller, no container will be allowed to mount it [14:32] aha [14:32] the host will have it [14:33] no it won't [14:33] perhaps I misunderstand [14:33] not for anyone who's not using the snap [14:33] and are there people who are not using lxd as a snap that will run into this? [14:33] so you'll break all chromebooks, all gentoo users, all alpine users, most arch users, ... 
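stgraber's constraint above, that an unprivileged container may only mount a named cgroup v1 hierarchy that already exists on the host, can be illustrated by checking the host's view in /proc/self/cgroup (v1 lines have the form `id:controllers:path`; the v2 line is `0::/path`). This helper is a sketch for illustration, not snapd code:

```python
def has_named_controller(cgroup_lines, name):
    """Check cgroup v1 entries (format 'id:controllers:path') for a
    name=<name> hierarchy; the cgroup v2 entry ('0::/...') has an
    empty controllers field and never matches."""
    for line in cgroup_lines:
        parts = line.strip().split(":", 2)
        if len(parts) == 3 and ("name=" + name) in parts[1].split(","):
            return True
    return False

# on a real system, feed it the calling process's own view, e.g.:
# with open("/proc/self/cgroup") as f:
#     ok = has_named_controller(f, "snapd")
```

On the distros stgraber lists (Chrome OS, Gentoo, Alpine, most Arch setups) this check would come back False for `name=snapd`, which is exactly why the container-side mount would be refused.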
I see [14:33] that's bad news indeed [14:33] I wonder if we have some ideas as a way out then [14:34] stgraber: can you tell me more about the rule [14:34] we actually ran into that very same problem when systemd introduced name=systemd in the first place [14:34] stgraber: about what makes the cgroup mount fail if it doesn't exist on the host [14:34] any distro which wasn't using systemd was unable to run systemd containers because of it [14:34] stgraber: note that the feature is still not on by default, or merged for that matter [14:34] mvo: got a fix for the test failure that occurred in #7665, i'll propose it separately unless the travis job that's currently running fails [14:34] stgraber: we just need _a_ way to track processes reliably [14:34] PR #7665: devicestate: add support for gadget->gadget remodel [14:34] stgraber: and we have found none better than this [14:35] zyga: the kernel effectively looks at all existing controllers (same list you'd see in /proc/self/cgroups) and anything that's not in that list, you can't mount unless you're real root outside of a cgroup namespace [14:35] I see [14:35] and it treats the name=... thing as a controller in this sense? [14:36] note that our hierarchy has no controllers at all [14:36] but I'm sure you know that, just checking if I understand right [14:36] zyga: given that your current plan will not work on any distro which has fully transitioned to cgroup2 (no more cgroup1 in kernel) and will break for a lot of container users, it doesn't seem to me like a particularly good plan :) [14:36] zyga: yes [14:36] stgraber: there are no better plans yet [14:36] zyga: you wouldn't want an unprivileged container being able to mount thousands of those, allocating global kernel structs [14:36] stgraber: pure v2 doesn't let us track processes because we'll interfere with systemd [14:37] zyga: can't you just keep snap-confine as the parent of the processes in the snap and make it a subreaper?
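The subreaper mechanism stgraber brings up can be sketched in a few lines: with PR_SET_CHILD_SUBREAPER set, orphaned descendants reparent to the subreaper instead of to init, so blocking in wait() until there are no children left tells you the whole process tree is gone. Linux-only; the demo below is an illustration of the mechanism, not snapd code:

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # from <sys/prctl.h>, Linux >= 3.4

def become_subreaper():
    """Mark this process as a subreaper: orphaned descendants will
    reparent to us rather than to init."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_CHILD_SUBREAPER)")

def reap_until_empty():
    """Block until every descendant has exited; return the reaped pids."""
    reaped = []
    while True:
        try:
            pid, _status = os.wait()
            reaped.append(pid)
        except ChildProcessError:
            return reaped  # no children (direct or reparented) remain
```

As the discussion that follows notes, this only tells you when the last descendant died; it does not give you a readable link from an arbitrary live process back to its subreaper.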
[14:38] zyga: stgraber: tbh we don't seem to have a reliable, race-free way of tracking processes in v2 world without complicating things [14:38] stgraber: if we do that, we know if a process dies [14:38] stgraber: but not if there are any left [14:38] zyga: but you know that when you die all subprocesses will too [14:38] stgraber: we'd need to match fork and exit reliably but that's not what subreaper can do [14:39] stgraber: I'm not sure I follow, are you saying that we can then kill the snap-confine tracking process to kill all the children? [14:39] zyga: yes [14:39] stgraber: that's not what we are trying to do [14:40] stgraber: the goal is to know when it's safe to refresh [14:40] stgraber: when no apps are running [14:40] stgraber: in a zygote-like mode when snap-confine sticks around and watches processes die [14:41] stgraber: we could perhaps know something useful but I don't believe we can do what we want to with just that [14:43] mborzecki, mvo: I think we need to abort and come up with another idea [14:43] based on what stgraber said above I don't believe this is something we can salvage unfortunately [14:44] this still leaves us with no tracking, let alone notification past v1 [14:45] stgraber: how did you fix the issue of systemd needing to mount name=systemd? [14:46] zyga: most distros switched to systemd, the rest are mounting the systemd controller on the host anyway [14:46] stgraber: I see [14:47] mborzecki: thank you [14:48] mvo, mborzecki: can we meet later today [14:48] zyga: so sadly there's no nice way that I know for you to get notified when a namespace goes away, otherwise you could use something like a uts namespace [14:48] I have an idea [14:49] stgraber: I planned to use release agent on the name=snapd hierarchy [14:49] but thinking about it now [14:49] how to work without any cgroups [14:49] zyga: have one namespace per snap, dump all new processes in there, then when you want to know if the snap is safe to refresh, check if any
process uses it [14:49] stgraber: one uts namespace? [14:49] stgraber: and in v2 mode? [14:49] stgraber: in v1 we can use the freezer for that [14:50] stgraber: (note that we need to have per-app understanding in our current model but perhaps we could drop that) [14:50] stgraber: but what the name=snapd idea was about is something that works both in v1 and v2 mode [14:50] zyga: doesn't matter v1/v2, this wouldn't use cgroups at all [14:50] ah [14:50] sorry [14:50] I didn't understand [14:50] uts namespace [14:50] so iterate over process [14:51] and check which uts namespace is used by each? [14:51] and check if any of those match that of our snap processes? [14:51] yeah, that's the part that's not amazing, having to iterate but otherwise should work fine [14:51] yes [14:51] that might w [14:51] that might work [14:51] I was thinking to use the existing mount namespace [14:51] given that we already create one for strict snaps [14:51] and are about to for classic snaps [14:52] yeah but your mntns are per snap + per user or something so that wouldn't work so well [14:52] PR snapd#7682 closed: overlord: add kernel remodel undo tests and fix undo [14:52] stgraber: yes [14:52] stgraber: so you are saying we could create and save one for each pkg.app pair [14:53] the tracking with a uts namespace would still be bypassable by any snap allowed to do fancy clone/unshare, but that's unlikely to really matter, you're not building security on this, if a snap wants to make itself break on refresh, whatever :) [14:54] stgraber: correct but the kind of snaps that would be hurt by this (think things like docker) really would as they would get unexpected refresh semantics [14:54] stgraber: but thank you for the idea, I think this is workable [14:54] zyga: yeah, on initial startup, you'd unshare the uts namespace, bind-mount its inode somewhere in /run/snap as you do for mntns, then any new process gets added using setns and to see if it's safe, you iterate through /proc for anything 
that has a matching inode on ns/uts [14:55] stgraber: correct [14:55] stgraber: I agree this would work [14:55] you could even make actual use of the uts namespace and give each snap a unique hostname [14:55] stgraber: I think that will break snaps [14:55] stgraber: especially it would be unexpected for snap with classic confinement [14:56] indeed :) [14:56] stgraber: or maybe the time namespace [14:56] ;-) [14:56] one thing to keep in mind is that if the hostname changes on the host, it won't in the snap [14:56] stgraber: yeah, I think it's a good route but we need to plan which namespace to use [14:56] uts doesn't feel like it [14:56] you certainly don't want to use ipc, mnt, net, pid or user [14:56] I would ideally do it via mount namespace and just save those with finer granularity [14:56] so you have two options really [14:57] feels like it's the same borderline use as the pids controller right now [14:57] I think mount is indeed unworkable on second thought [14:57] mount is the most used namespace, a lot of software mess with it and snapd already has more than one mntns per snap, so seems pretty hard to track [14:58] cgroup and uts are the two easy ones, cgroup may actually be a better option [14:58] hmm, though it may seem weird [14:58] yeah, no, don't use cgroup, you'll break lxd :) [14:59] uts won't break us though, the only issue with it is the hostname changes not getting propagated until a snap restart, but I don't think anyone will really notice/care [15:00] zyga: we could probably do something with pidfds though, but that's likely to be an issue due to lack of kernel support right now and it still being in very active development (by us) [15:00] stgraber: how about netlink [15:00] and observing all process activity [15:00] * cachio lunch [15:00] so match fork/exec/exit [15:01] zyga: I didn't think we had a reliable API for that in netlink, short of having to ptrace stuff [15:01] stgraber: I think there's something, not ptrace, there's a program that 
shows that (fork/exec across the whole system) using netlink [15:01] zyga: also, that'll break when snapd restarts as you'll miss all those events [15:01] but yeah, I don't know how that works yet [15:01] stgraber: yes, it'd have to be a special process that's never quitting [15:01] and if it's relying on tracing, then it won't work in containers [15:01] zygote like per snap [15:02] AFAIK it is pure netlink [15:02] stgraber: an alternative we considered before was putting the process in a well-known cgroup name e.g. snap.app under the cgroup created for it by systemd (just one level deeper), but that'd involve walking /proc/*/cgroup [15:02] and walking /proc/*/cgroup feels like a bad idea [15:03] in v2 everything is somewhere in a hierarchy [15:04] zyga: forkstat/netlink connector requires real root [15:04] stgraber: I see [15:04] stgraber: thank you for checking [15:05] zyga: it's also apparently lossy, under load there's no delivery guarantee [15:06] stgraber: thank you, this is even more relevant [15:07] stgraber: anything we could use to tag a process with [15:07] stgraber: I don't mind enumeration [15:07] it won't be done all the time [15:07] just in special moments [15:07] I just want to see if we can identify a process as belonging to a snap [15:08] even weird things like containers [15:08] stgraber: thank you for all the input, it was invaluable, [15:08] I think we need to rethink the idea [15:09] zyga: there have been discussions about ptags in the past but nothing ever got merged [15:09] PR snapd#7719 opened: tests/main/gadget-update-pc: cleanup per-revision directories [15:10] mvo: this addresses the failures due to /var/snap/pc/ directories being left behind in the tests [15:10] mvo: ^^ [15:10] zyga: effectively arbitrary process tags which would be inherited and could even be marked immutable in some cases, it was one of the ideas put forth to solve the audit id issue for containers but after years of bikeshedding, nothing really happened [15:11]
mborzecki: nice, *thank you* [15:13] stgraber: utterly crazy idea, set timerclask_ns to a unique value, use that [15:13] timerslack_ns [15:14] I prefer timertelegram_ns :-p [15:14] slack is second place :-D [15:16] mborzecki: but in a meeting right now so will look in a bit [15:36] PR snapd#7719 closed: tests/main/gadget-update-pc: cleanup per-revision directories [15:48] mvo: do you have time today? [15:48] mvo: for a 10 minute call? [15:49] mborzecki: do you have any technical ideas? [15:49] mborzecki: or can we exchange ideas before your EOD [15:49] mborzecki: it's fine to say no, perhaps at this stage having some rest would help [15:49] mborzecki: I don't see a way to solve this issue now [15:50] zyga: tomorrow morning, i'm taking kids to some extra classes in 10 [15:50] ok [15:50] thank you [15:59] stgraber: can snapd run inside lxd, on travis, on arm64? https://travis-ci.org/snapcore/core20/jobs/600808794?utm_medium=notification&utm_source=github_status [15:59] stoopkid: i see that core snap fails to setup security profiles? [15:59] stoopkid: unping [15:59] stoopkid: i see that core snap fails to setup security profiles, and i'm trying to get arm64 ci going with travis. [16:00] it seems udevadm failed [16:00] exit code 2 [16:01] i'll restart the build to see if things got any better [16:06] PR snapd#7680 closed: daemon: parse and reject invalid channels in snap ops [16:07] xnox: just try to run the snap install twice [16:07] xnox: that's usually enough for those [16:07] #7671 needs two (2!) 
reviews, and it's the last chunk of the channels preparatory work [16:07] PR #7671: overlord/patch: normalize tracking channel in state [16:08] Chipaca: is it okay if I call it quits now, sorry, if it was one review I'd jump in [16:08] but I need to break, think and move a little [16:08] zyga: you can call it quits any time you need to [16:08] zyga: also: stepping away from the keyboard to think is not quitting [16:08] and all my work is essentially down the drain today [16:08] so need to think if anything can save it [16:08] * Chipaca hugs zyga [16:08] * diddledan hugs too [16:08] * zyga hugs Chipaca [16:08] thank you guys! [16:09] * zyga hugs diddledan too :) [16:09] anything, as crazy as possible, that technically works, is good [16:09] I'll grab my bike and go for a ride [16:14] zyga: was in meetings, sorry [16:33] jdstrand, when do you plan on making a release of review-tools that includes https://git.launchpad.net/review-tools/commit/?id=31c894178a8fd110b9f65f0ee532b7e11384f8b8 ? [16:41] oSoMoN: let me prepare a release now [16:41] perfect, thanks! [16:46] roadmr: hi! would you mind pulling 20191104-1644UTC? [16:47] jdstrand: sure [16:49] roadmr: thanks :) [16:49] oSoMoN: fyi ^ (will be a bit before in production) [16:50] cheers [16:58] jdstrand: no name=snapd for now [16:59] jdstrand: if you want to know more I can summarize [16:59] jdstrand: if not we're going to discuss more tomorrow to see if we have other options [16:59] zyga: I read backscroll [16:59] zyga: it's too bad cause it seemed really elegant. 
guess it was too elegant, but good thing we got stgraber in the loop [17:00] yes, the lesson I learned is to ask more experts early [17:00] I wasn't aware of the extra limitations on rootless containers [17:00] actually [17:01] one idea [17:01] stgraber: har har har [17:01] stgraber: can you answer one more question please [17:01] stgraber: if we set a subreaper per snaps [17:01] *per snap* [17:01] stgraber: will we be able to follow that chain reliably [17:01] from random process, to their parent pid, to the subreaper [17:01] so that effectively, as long as the "watcher" subreapers are alive [17:01] we can reliably trace lineage? [17:02] so the idea is that instead of killing the watchers to kill the "container" [17:03] we can just scan all processes and try to match them to the reaper [17:04] hmmm, reading the man page it seems not useful [17:04] because it is not preserved across fork/clone [17:04] but is across exec [17:04] so we could watch our immediately forked process die [17:05] and if they had any children processes, we could watch them die if their parent (that we started) did quit [17:05] but I don't see anything in the man page linking the descendants to the reaper in a way that can be read [17:06] * zyga goes to experiment [17:06] maybe it's not too bad actually === pstolowski is now known as pstolowski|afk [17:43] PR snapcraft#2783 closed: cli: run review tools before pushing to the store if available [17:45] stgraber ^ this works [17:45] jdstrand: ^ this works [17:45] we can track all children's existence [17:45] it's even better because the reaper can quit when they're all gone [17:46] so all we need to do is to unlink a "this snap is busy" file then [17:46] stgraber: ^ [17:46] PR snapcraft#2785 closed: remote-build: add initial command unit tests [17:49] a super-simple interactive reaper [17:49] https://paste.ubuntu.com/p/kYqhmWGtn3/ [17:49] I don't know if this can be made to work with proper pid tracking (e.g.
how would this ever work for services) [17:49] but it's *something* [17:49] need to ponder on it more [18:08] sergiusens did we change the way SNAPCRAFT_PROJECT_DIR behaves? I'm getting an empty string there [18:08] sergiusens SNAPCRAFT_PROJECT_NAME and SNAPCRAFT_PROJECT_GRADE seem to work fine [18:14] ondra: no we did not change it [18:15] sergiusens then there is a regression, as suddenly I'm getting an empty string [18:22] ondra: on stable? [18:22] sergiusens stable and candidate [18:22] sergiusens at least what I tested [18:23] ondra: well stable has not changed for the past 2 months [18:23] sergiusens hmm, then it must be this particular snap triggering it [18:31] sergiusens defo happening also with stable [21:33] it seems like snap is doing something terribly wrong with apparmor on lubuntu 18.04. i've installed both snap-store and vlc and both of them just throw tons of apparmor errors when i try to start them. [21:49] litheum: could you open a topic about this in the forum? [21:49] litheum: it should just work :-) if it doesn't we want to fix it [21:49] but I, personally, don't want to even think about fixing it at this time [21:50] :-) I'm off to bed [23:23] PR snapcraft#2791 opened: introduce bind-ssh support
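For reference, the /proc walk implied by stgraber's earlier uts-namespace suggestion (save the namespace inode at snap startup, then compare every process's ns inode against it) is small. A sketch under the assumption that the tracking inode was captured when the first snap process started; illustrative only, not snapd code:

```python
import os

def ns_inode(pid, ns="uts"):
    """Inode of a process's namespace file, or None if unreadable
    (process gone, or owned by another user)."""
    try:
        return os.stat("/proc/%d/ns/%s" % (pid, ns)).st_ino
    except OSError:
        return None

def pids_in_namespace(target_inode, ns="uts"):
    """Walk /proc and collect every pid sharing the given namespace;
    an empty result would mean the snap is safe to refresh."""
    return [
        int(entry)
        for entry in os.listdir("/proc")
        if entry.isdigit() and ns_inode(int(entry), ns) == target_inode
    ]
```

As noted in the discussion, the iteration is the unappealing part, but it only has to run at refresh time, not continuously.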