[04:51] PR snapd#7607 closed: tests: launch the lxd images following the pattern ubuntu:${VERSION_ID}
[05:05] morning
[05:05] hey mborzecki
[05:06] mvo: how are the tests today? i'm seeing quite a lot of red
[05:07] uhh lxd-nofuse/lxd-snapfuse
[05:07] mborzecki: yeah, cachio debugged/fixed this one
[05:07] mborzecki: merging master should help :)
[05:07] ah, great
[05:11] o/
[05:13] good morning
[05:13] I'll grab some breakfast
[05:14] zyga: good morning
[05:25] zyga: hey
[05:25] o/
[05:26] mvo: can you take a look at https://github.com/snapcore/snapd/pull/7605 ? i think i'm missing something, /etc/systemd as a whole is listed in writable, shouldn't mkdir /etc/systemd/journal.conf.d succeed then?
[05:26] PR #7605: tests: configure the journald service for core systems
[05:30] mborzecki: are you sure /etc/systemd is writable? can you provide a reference?
[05:31] zyga: ahh damn, looking at the wrong tab
[05:31] zyga: it's writable on core18, on core there's individual directories under /etc/systemd
[05:31] that's how I remember it
[05:31] yeah, it's complex :/
[05:32] mborzecki: sure, looking
[05:32] mvo: nvm, mystery solved :P
[05:32] mborzecki: but iirc on core16 not all of it is writable just selected parts
[05:33] mvo: right, i have writable-paths from both open and was looking at the other tab :P
[05:34] mborzecki: I wonder if xnox experiment to use systemd in initrd works out
[05:38] * mvo nods
[05:38] I'll get to https://github.com/snapcore/snapd/pull/7547 now
[05:38] PR #7547: many: use a dedicated named cgroup hierarchy for tracking
[05:41] zyga: can you take a look at https://github.com/snapcore/snapd/pull/7602 ?
[05:41] PR #7602: overlord/many: extend check snap callback to take snap container
[05:46] zyga: 1825298 is interesting
[05:53] dear launchpad, please stop timing out on me when I try to do bugtriage, love michael
[05:57] * mvo stops bug triage
[06:05] mvo: looking now
[06:07] mvo: oh boy
[06:07] mvo: how do we leave this misery
[06:08] zyga: that's a tricky one :(
[06:08] zyga: also super annoying :(
[06:09] mvo: one way out would be to stop using /etc for apparmor profiles we use
[06:09] this would involve shipping empty replacements that do NOP
[06:09] and shipping files in /var forever
[06:13] zyga: does apparmor support /var ?
[06:13] mvo: ubuntu's apparmor does, that's how our profiles work
[06:13] mvo: other distributions don't until apparmor 3.0 eventually ships
[06:13] mvo: we wrote snapd.apparmor.service to support that and use it on some systems
[06:14] mvo: solus has a custom solution for loading profiles but it supports /var/lib/snapd/apparmor already
[06:14] mvo: we can do it
[06:16] zyga: interesting, might be a good option, I feel I lack some details though, need to look closer after I did this bug triage
[06:17] mvo: sure, we can talk about it later today
[06:17] mvo: I mainly worry about how to fit this into our framework of rollbacks
[06:18] mvo zyga: you could change the profile in /etc to '#include if exists ' so that it can hang around and still work even if the one in /var goes away
[06:19] amurray: hey :)
[06:19] mvo zyga: ah sorry - this was added in AppArmor 2.13 iirc so won't work for older releases
[06:19] amurray: last time we looked it was not supported
[06:19] :D
[06:19] exactly
[06:19] hey zyga
[06:19] so it's great, but we cannot have it
[06:19] bah ignore me then... oh well
[06:21] amurray: shipping software on linux is easy they said ;)
[06:21] amurray: while we have you
[06:21] amurray: do you have a moment to look at an idea we had
[06:22] amurray: it's described here https://github.com/snapcore/snapd/pull/7547
[06:22] PR #7547: many: use a dedicated named cgroup hierarchy for tracking
[06:22] zyga: sure it's easy shipping software when you have snapd - shipping snapd however... that is harder :)
[06:22] zyga: give me a sec
[06:22] amurray: but the basic idea is that we mount a cgroup v1 hierarchy at /run/snapd/cgroup
[06:22] amurray: no controllers just name=snapd
[06:22] amurray: and we use it in v1 and v2 worlds so that we can always track processes
[06:22] amurray: we want holes to be poked in the idea
[06:24] zyga: my cgroups knowledge is a bit rusty but I will try and brush up and then review it in the next day or so if that's ok?
[06:26] amurray: perfect
[06:26] thank you so much
[06:26] * zyga takes bit out for a walk
[06:27] zyga: no worries
[06:30] amurray: hey, do you happen to know if there is a "if file.exists()" primitive in apparmor profiles? or an include_if_available? context is bug 1825298
[06:30] Bug #1825298: apparmor.service fails to start when apt install/remove snapd due to snapd profile error
=== pstolowski|afk is now known as pstolowski
[07:03] morning
[07:06] mborzecki: hey, have you looked at https://paste.ubuntu.com/p/S48Gyv5mpp/ recently? snapd deb failing to install inside container
[07:06] pstolowski: he, it's fixed in master by cachio
[07:07] mborzecki: ah, great
[07:11] hey pstolowski !
[07:11] o/
[07:13] mvo: do you think https://bugs.launchpad.net/snapd/+bug/1824162 is medium prio? i was contemplating setting it critical for a while :/
[07:13] Bug #1824162: /usr/lib/snapd/snapd:11:runtime:runtime:runtime:runtime:runtime
[07:15] mvo: it may occur regardless of experimental hotplug flag, and may prevent clean shutdown of snapd; of course only if there is udev netlink activity
[07:27] pstolowski: I saw the report
[07:27] pstolowski: it should probably be high or critical, yes
[07:27] pstolowski: do we have a fix yet or is it complicated?
[07:28] mvo: no fix but should be simple, i needed to think a bit about it. should have a PR today
[07:43] PR snapd#7593 closed: recovery-tool: add sfdisk wrapper
[07:55] PR snapd#7609 opened: snap-recovery: remove "usedPartitions" from sfdisk.Create()
[08:06] davdunc, could we update the aws-cli snap? It's getting a bit too old to use with EKS. Anything Weaveworks can do to help get this bumped regularly? Cheers
[08:21] mborzecki: updated https://github.com/snapcore/snapd/pull/7547
[08:22] PR #7547: many: use a dedicated named cgroup hierarchy for tracking
[08:22] mborzecki: I think I applied everything that we agreed upon
[08:23] mborzecki: I didn't apply the changes that would impact people with the feature disabled for now
[08:25] mborzecki: I'll do a small patch on top that enables this unconditionally to see if things fall apart
[08:25] mborzecki: and then move this to happen for all snap confinement types
[08:28] pedronis, hello! would you mind voting on https://forum.snapcraft.io/t/auto-connecting-the-personal-files-interface-for-the-chromium-snap-part-ii/13705 ?
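For readers following the PR #7547 discussion (a cgroup v1 hierarchy mounted at /run/snapd/cgroup with no controllers, just name=snapd), here is a minimal sketch of how membership in such a named hierarchy could be read back from the text of /proc/<pid>/cgroup. This is not snapd's code: the function name, the sample text and the /snap.hello.app path are hypothetical. The only thing assumed from the kernel is the documented "hierarchy-ID:controller-list:cgroup-path" line format, where a named hierarchy shows up as name=<name> in the controller list.

```python
def named_cgroup_path(proc_cgroup_text, name="snapd"):
    """Return the process's path inside the named v1 hierarchy, or None.

    proc_cgroup_text is the content of /proc/<pid>/cgroup. The helper and
    its default name are illustrative assumptions, not snapd's API.
    """
    wanted = "name=" + name
    for line in proc_cgroup_text.splitlines():
        # Split at most twice: the cgroup path itself may contain colons.
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed or empty lines
        _hierarchy_id, controllers, path = parts
        if wanted in controllers.split(","):
            return path
    return None


# Made-up sample resembling /proc/self/cgroup on a hybrid v1/v2 system.
SAMPLE = """\
12:pids:/user.slice/user-1000.slice
1:name=systemd:/user.slice/user-1000.slice/session-2.scope
13:name=snapd:/snap.hello.app
0::/user.slice/user-1000.slice/session-2.scope
"""
```

On a live system the input would come from reading /proc/self/cgroup (or another pid's file); a None result simply means the process is not tracked in a hierarchy of that name.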
[08:34] PR snapd#7602 closed: overlord/many: extend check snap callback to take snap container
[08:35] mborzecki: please do a follow up to #7602 where you swap *snap.Info and snap.Container
[08:35] PR #7602: overlord/many: extend check snap callback to take snap container
[08:36] pedronis: sure
[08:36] thanks
[08:38] mborzecki: to be clear, it means that now args are snap, snapf, curSnap
[08:38] pedronis: yup, got it :P
[08:38] mborzecki: I suppose snap, curSnap, snapf if it's a bit simpler
[08:38] but snapf first is strange because most of these will not use it
[08:39] it's ancillary to the infos
[08:48] mborzecki: trivial https://github.com/snapcore/snapd/pull/7610
[08:48] PR #7610: cmd/snap-confine: remove leftover condition from capability world
[08:49] PR snapd#7610 opened: cmd/snap-confine: remove leftover condition from capability world
[08:56] mvo: hmm i take my words back, a simple fix doesn't cut it for go-udev, it may get more complicated to fix
[09:06] mborzecki: we need to talk
[09:06] mborzecki: about classic mount ns
[09:06] mborzecki: I have an idea that will resolve a problem that was raised during one of the review cycles
[09:06] mborzecki: essentially, we can keep using the mount namespace we inherit when starting
[09:06] mborzecki: and not re-associate back to pid-1 ns
[09:06] mborzecki: I need to go for a haircut now, let's chat later, ok?
[09:07] zyga: sgtm, deep in some bits of gadget/devicestate now
[09:07] mborzecki: but the main idea is: since there is no persistence we don't need to re-associate with pid-1 ns
[09:07] mborzecki: and anything we need to do (like tracking) can be forked to a helper that re-associates and quits
[09:07] mborzecki: so that main process is where it was originally
[09:07] mborzecki: this means that there is perfect compatibility and no impact to any odd setups that work today
[09:08] * zyga runs for an errand, ttyl
=== jacekn_ is now known as jacekn
[09:12] pstolowski: uh, that's slightly sad
[09:32] PR snapd#7577 closed: overlord: set fake serial in TestRemodelSwitchToDifferentKernel
[09:34] hi jdstrand. I am setting up microk8s with strict confinement. Partially works. I have a problem with a pod getting a denial.
[09:34] apparmor="DENIED" operation="ptrace" profile="snap.microk8s.daemon-kubelet//systemd_run" pid=3034 comm="mount" requested_mask="trace" denied_mask="trace" peer="snap.microk8s.daemon-kubelet"
[09:34] Have you seen this before?
[09:40] kjackal: it's a bit early
[09:40] yeap I know, no worries. I will ping him later as well
[09:44] Chipaca: hi, I did a pass on #7608
[09:44] PR #7608: o/snapstate, etc: SnapState.Channel -> TrackingChannel, and a setter
[09:44] just reading it
[09:44] pedronis: the "Invalid channel" was because that way it looked just like the error we would get from the store :-)
[09:45] perhaps too cute a reason tho
[09:59] pedronis: pushed fixes for those and the spread tests
[09:59] now going for a walk, bbiab
[10:06] Chipaca: made a naming remark
[10:08] PR snapd#7611 opened: overlord/many: switch order of check snap parameters
[10:08] pedronis: ^^
[10:10] mborzecki: thanks, the fact that nil now are packed together seems to mean it's the right thing
[10:10] pedronis: wondering about moving it further to the right, after flags though
[10:11] we tend to put flags as right as we can
[10:12] you would have: nil, snapstate.Flags{}, nil
[10:12] doesn't look better to me, any particular reason?
[10:12] if we want to shuffle again
[10:13] having after the first snap would still be best (except it needs more signature churn)
[10:16] pedronis: otoh, it's probably good as it is in the PR, infos come first, then the container, then the rest
[10:17] mborzecki: as I said I'm fine with the PR state
[10:17] ack
[10:20] mborzecki: do you have a moment for https://github.com/snapcore/snapd/pull/7518 ?
[10:20] PR #7518: cmd/snap: sort tasks in snap debug timings output
[10:21] pstolowski: sure
[10:21] thanks!
[10:22] PR snapd#7612 opened: gadget: add a public helper for parsing gadget metadata
[10:22] another simple one ^^
[10:23] PR snapd#7355 closed: interfaces/apparmor: load multiple profiles in a single batch
[10:24] pedronis: what's the problem with the transition in the store wrt #7092?
[10:24] PR #7092: packaging: use snapd type and snapcraft 3.x <⛔ Blocked>
[10:24] PR snapd#7613 opened: snap-recovery: add minimal binary so that we can use spread on it
[10:25] mvo: there is 'include if exists' - see say https://gitlab.com/apparmor/apparmor/blob/master/profiles/apparmor.d/abstractions/dri-enumerate#L11 for an example - but this is only supported by apparmor 2.13
[10:25] and as zyga mentioned, this 2.13 is not available everywhere so include if exists is probably not a realistic option
[10:26] pstolowski: the store checks that the type of snap upload is like the one of the last approved one
[10:26] so for snapd is app
[10:26] atm
[10:27] we are discussing with the store to have an override
[10:27] it's a good check in general, but a couple of cases show we need an override
[10:27] pedronis: i see, thanks
[10:28] amurray: thank you!
[10:33] PR snapcraft#2751 closed: tests: move cli store push/upload tests to FakeStoreCommands
[10:51] PR snapcraft#2745 closed: tests: update spread tests to account for content snaps
[11:12] * pstolowski lunch
[11:52] zyga: question about #7610, about something i only noticed thanks to pstolowski
[11:52] PR #7610: cmd/snap-confine: remove leftover condition from capability world
[11:52] sure
[11:52] looking at the PR
[11:52] zyga: why do you call geteuid etc when you've already loaded their answers via getresuid?
[11:53] Chipaca: there's no good reason, it's just a bunch of old code few people touch
[11:53] Chipaca: some reasons as that it is easier to ... reason about the code in front of you
[11:53] vs some state
[11:53] Chipaca: but I think that it can be simplified
[11:53] pstolowski, Chipaca: would you mind if I open the simplification as another PR if this goes green?
[11:54] I have some more in this area anyway
[11:54] zyga: absolutely
[11:54] zyga: what's currently 'if geteuid() != 0 { bail } ' could go right after the call to getresuid and make it easier to follow without needing the extra syscall
[11:54] yeah
[11:54] etc etc
[11:54] I missed that, I looked at other parts and got confused
[11:54] zyga: also these are all nits :-)
[11:55] sure, but it's good to ask such questions
[11:55] thank you for that!
[11:55] :)
[11:56] :)
[11:57] speaking of questions, lunchp?
[11:57] * Chipaca goes
[11:58] Chipaca: lunchp? is redundant, no?
[11:58] it's either lunch? or lunchp
[11:58] ;-)
[11:59] I was questioning the very question, probably
[11:59] Chipaca: (lunch?)? ;-)
[12:00] PR snapd#7610 closed: cmd/snap-confine: remove leftover condition from capability world
[12:00] it's in :)
[12:05] PR snapd#7611 closed: overlord/many: switch order of check snap parameters
=== ricab is now known as ricab|lunch
[12:38] mborzecki: I opened a draft of device cgroup changes I had stashed, there's more piled on top to support v2 but I think this is a good thing to discuss before
[12:38] mborzecki: if you have some time now we can discuss before the standup
[12:38] PR snapd#7614 opened: cmd/snap-confine: implement snap-device-helper internally
[12:38] mborzecki: it's not complex, I think it's actually more readable than before
[12:39] zyga: ok, let me grab a coffee
[12:41] pstolowski: re 7355> woohoo, merged! :)
[12:41] jdstrand: hey :)
[12:42] jdstrand: there's something that you could look at to help us move forward with a feature
[12:42] jdstrand: we also asked alex and he said he would look tomorrow
[12:42] jdstrand: but it would be good for you to be aware of it in general
[12:42] jdstrand: https://github.com/snapcore/snapd/pull/7547
[12:42] PR #7547: many: use a dedicated named cgroup hierarchy for tracking
[12:43] jdstrand: the main idea is that we mount a new cgroup at /run/snapd/cgroup, just for tracking processes
[12:43] jdstrand: no controllers are involved
[12:43] kjackal: yes, I have seen that and it should be addressed in snapd 2.42. what does 'snap version' give you?
[12:43] jdstrand: and we use this method in v1 and v2 worlds where it surprisingly works for as long as v1 is in the kernel
[12:44] jdstrand: we have some more plans for this, there's a follow up branch that adds and configures a release agent for the cgroup
[12:44] jdstrand: and this lets us solve some long-standing bugs with resource leaks as well as get the new desired features for app termination notification in snapd
[12:50] jdstrand: 2.42+git1515.143caf4~ubuntu16.04.1
[12:50] zyga: thanks for making me aware. the idea seems ok otoh
[12:52] jdstrand: I am now in the middle of another addon that is failing. This time with:
[12:52] apparmor="DENIED" operation="exec" info="no new privs" error=-1 profile="snap.microk8s.daemon-containerd" name="/usr/local/bin/proxy-init" pid=16941 comm="runc:[2:INIT]" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="cri-containerd.apparmor.d"
[12:53] kjackal: can you paste /var/lib/snapd/apparmor/profiles/snap.microk8s.daemon-kubelet ?
[12:54] jdstrand: Here it is http://paste.ubuntu.com/p/fqpxsZscF9/
[12:55] kjackal: re the paste> you don't have the rule in there. can you paste /snap/microk8s/current/meta/snap.yaml ?
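The denial pasted at 12:52 is a flat run of key=value pairs, some quoted and some bare. As a reading aid, here is a minimal sketch that splits such a line into a dictionary so that fields like profile, operation and info can be compared across denials. The function names are hypothetical and this is not an AppArmor or snapd tool; it only assumes the key="value" / key=value shape visible in the pasted line.

```python
import re

# A key followed by either a double-quoted value or a bare token.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_denial(line):
    """Split an AppArmor audit line into a dict of its key=value fields."""
    fields = {}
    for key, quoted, bare in _PAIR.findall(line):
        # Exactly one of the two value groups matched; the other is "".
        fields[key] = bare if bare else quoted
    return fields


def is_nnp_denial(fields):
    """True when the denial carries the 'no new privs' info string."""
    return fields.get("info") == "no new privs"


# The denial quoted in the log at 12:52, copied verbatim.
DENIAL = (
    'apparmor="DENIED" operation="exec" info="no new privs" error=-1 '
    'profile="snap.microk8s.daemon-containerd" '
    'name="/usr/local/bin/proxy-init" pid=16941 comm="runc:[2:INIT]" '
    'requested_mask="x" denied_mask="x" fsuid=0 ouid=0 '
    'target="cri-containerd.apparmor.d"'
)
```

With the fields separated, the two denials in this log are easy to tell apart: the earlier one is a ptrace denial with a peer field, while this one is an exec denial flagged with "no new privs".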
[12:56] jdstrand: http://paste.ubuntu.com/p/4qMqPQjmKn/
[12:57] hmm, I thought that was in 2.42
[12:57] * jdstrand double checks
[13:02] kjackal: ok, I misread the trace denial. I saw it the other way around (kubelet tracing itself) but not kubelet's systemd-run needing to trace kubelet
[13:03] kjackal: do you know what operation is causing that?
[13:04] jdstrand: I shared this via email https://docs.google.com/document/d/1mGHEfgiTDB6Nd11vgkPgCY2OrzrDGk7vmEgjrJGTKmA/edit?usp=sharing
[13:04] kjackal: also, what kernel is this on?
[13:04] jdstrand: it's on Linux ip-172-31-20-243 4.15.0-1051-aws #53-Ubuntu SMP Wed Sep 18 13:35:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[13:04] PR #53: Negate the integration build tag
[13:05] kjackal: ok, I understand the denial and will create a PR
[13:05] kjackal: do you have more information on the proxy-init denial?
[13:06] I am looking into this now
[13:06] What other info would you like me to gather?
[13:07] kjackal: like with the subpath, the context of the denial. also, this looks to be in aws, can you give me ssh access?
[13:08] of course, looking for you in LP
[13:10] jdstrand: try ubuntu@ec2-54-161-115-99.compute-1.amazonaws.com
[13:13] kjackal: ok, I'm in
[13:13] there is a tmux session I am using, in case you want to show me anything
[13:14] kjackal: what are you doing to trigger the proxy-init denial?
[13:14] jdstrand: it is part of the "microk8s.enable linkerd"
[13:15] all containers are failing. I suspect the reason is "securityContext:
[13:15] runAsUser: 2103"
[13:16] jdstrand: see "sudo microk8s.kubectl edit -n linkerd deployment.apps/linkerd-controller"
[13:16] that is entirely possible
[13:18] kjackal: that user doesn't exist on the main system, but more importantly, I think nnp (no new privs) may get in the way if you try to use a different user than what it is running as. try making that root
[13:18] jdstrand: the problem is that the manifests I apply are not mine
[13:19] jdstrand: I am deploying the linkerd manifests. If they fail out of the box they will fail for anyone
[13:20] joedborg: are you using linkerd? have you seen/fixed kjackal's denial in kubernetes-worker?
[13:20] linkerd sounds like a startup's name (linkerd.io! we link all your links!)
[13:21] jdstrand: i'm not using linkerd. kjackal sent me some last week, which one in particular?
[13:21] joedborg: the one from 30 minutes ago
[13:21] (in this channel)
[13:21] * joedborg found it
[13:22] kjackal: nnp presents a number of challenges
[13:22] joedborg: these are the instructions for deploying linkerd https://linkerd.io/2/getting-started/
[13:22] I was just joking about linkerd.io :(
[13:23] roadmr: clearly not ;)
[13:23] well it's like "I was just joking about Donald Trump being president" :(
[13:23] jdstrand: kjackal: Ah, no i haven't, but this may be where k8s-worker and microk8s diverge; microk8s runs a lot of infra outside of k8s, whereas k8s-worker runs it inside, so there will be differences like this (where the paths are different, for example)
[13:24] thanks kjackal, I'll test it on my local cluster
[13:24] joedborg: that's fine, thanks
[13:25] joedborg: jdstrand, do you think we could say that "certain workloads will not run on strictly confined setup" ?
[13:25] kjackal: have you tried to use a layout for /usr/local/bin?
[13:25] kjackal: it isn't in the current snap.yaml
[13:26] kjackal: ala https://github.com/charmed-kubernetes/snap-kubernetes-worker/blob/master/snap/snapcraft.yaml
[13:26] jdstrand: I can add it now
[13:26] kjackal: I would like to better understand what is happening before saying that (re certain workloads)
=== ricab|lunch is now known as ricab
[13:27] jdstrand kjackal I think it's possible to run all workloads under strict confinement, but that it hinges on getting runc to use chroot, rather than pivot (as we discussed yesterday)
[13:32] kjackal: can you not install/refresh microk8s on the aws node I'm on for a bit (I'll let you know)?
[13:32] kjackal: also, what is a command I can run to trigger the subpath denial?
[13:32] jdstrand: kjackal: ah, so I do get that same issue when running linkerd `[86131.195871] audit: type=1400 audit(1571232667.748:226767): apparmor="DENIED" operation="exec" info="no new privs" error=-1 profile="snap.kubernetes-worker.containerd" name="/usr/local/bin/proxy-init" pid=10902 comm="runc:[2:INIT]" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="cri-containerd.apparmor.d"`
[13:33] jdstrand: try this: sudo microk8s.kubectl apply -f /var/snap/microk8s/current/testcfgmount.yaml
[13:34] joedborg: is it possible to intercept that 'runAsUser: 2103' bit and remove it/make it '0'?
[13:35] jdstrand: it might be possible, but would probably mean no cross compatibility with running clusters
[13:36] zyga: I had my hand on "close window" before mvo finished saying "weird users of qemu"
[13:36] * Chipaca stays well away
[13:37] jdstrand: just FYI, that's user 2103 from within containerd
[13:37] Chipaca: any reason not to change .Full already ? just more spread tests breaking? or something else
[13:37] jdstrand: like with the pivot root stuff
[13:38] pedronis: no reason
[13:38] pedronis: oh wait, you mean the implementation?
[13:38] Chipaca: yes
[13:38] kjackal, joedborg: I need to study the linkerd issue. nnp is, shall we say, problematic atm for all the LSMs (including apparmor)
[13:38] it's used only in errors afaict
[13:39] pedronis: I think it'll break some things without the patch
[13:39] let me check though
[13:39] jdstrand: i imagine there's something been done to the docker snap to allow that
[13:40] Chipaca: afai see all usages are in cmd/snap/error.go
[13:40] Chipaca: this is Full, not Clean
[13:40] pedronis: that's usage of .Full, usage of canon(ical)ize are in snapstate
[13:41] I don't see Full in snapstate
[13:41] (I might have removed it myself at some point, don't know, I don't see it anymore)
[13:41] pedronis: my question is what happens if you do --channel=//foo//, shut down snapd, restart with the new one that calls Canonize
[13:42] Chipaca: what it has to do with Full() though?
[13:42] that's a different issue
[13:42] ?
[13:42] Chipaca: yes, we want the patch very soon
[13:42] pedronis: then we're talking cross-porpoises
[13:42] pedronis: what're you talking about :-)
[13:43] Chipaca: change Channel.Full to be Canonize
[13:43] what code would explode ?
[13:43] 🐬
[13:43] pedronis: i thought we were talking about making channel.Canonize call Channel.Full
[13:44] pedronis: you were talking about renaming Channel.Full to be Channel.Canonize?
[13:44] no?
[13:44]