[04:52] the rocketchat snap package is missing two rules (https://github.com/RocketChat/Rocket.Chat/issues/14562). how can i add them? [04:52] sorry, i mean apparmor rules [04:56] scoopex: rebuilding the snap with the network-observe plug would probably do the trick. [04:57] that exposes a lot more though. [05:23] morning === morphis2 is now known as morphis [06:23] Hello [06:24] zyga: hey [06:31] How are you? [06:31] We had first rain in weeks [06:31] It is still raining now [06:42] Hey mvo [06:42] hey zyga [06:46] mvo: hey [06:47] hey mborzecki ! thanks for your decompile of the bpf in the performance thread [06:49] mvo: sure, do you have the link to the image you used with volatility? [06:50] mvo: i'm inclined to try rebuilding a couple of versions of libseccomp, check the bpf they generate and how it impacts the performance [06:50] mborzecki: i used http://amnesia.gtisc.gatech.edu/~moyix/ds_fuzz_hidden_proc.img.bz2 [06:50] mborzecki: it was more or less a random choice in their example image section [06:51] (its one of the smaller ones though) [06:51] mvo: hah, i could probably write a small benchmark but we could equally well use volatility :) [06:55] mborzecki: yeah, volatility is nice as it exhibits the issue easily [07:05] Wimpress: hello [07:05] Wimpress: please ping me when you are working, we need to sync [07:07] https://github.com/snapcore/snapd/pull/7062 is green and needs a 2nd review (re-enabling test) [07:11] * zyga is a little sleepy after uberlong day [07:11] on the upside the new bed is ok [07:14] pstolowski|afk: neat stuff === pstolowski|afk is now known as pstolowski [07:14] mornings, hey zyga [07:15] hey :) [07:15] good morning [07:15] pstolowski: it was raining all night [07:15] every bit of soil and plant around here needed that [07:18] zyga: it has been rainy and cloudy here as well for over a week.. and it's actually a little chilly [07:19] pstolowski: I kind of miss that now [07:27] brb [07:36] coffee and back to work soon [07:36] pstolowski: I'll reivew your PR first [07:36] ty [07:41] pstolowski: thanks! [07:42] coffee-in-hand, review mode :-) [07:42] yw [07:43] jamesh: hi, could you work to try to land 6959 ? [07:52] pstolowski: little chilly, 16C right now here :/ [07:57] mborzecki: wow [07:57] pstolowski: can you help me understand the code in the PR please [07:57] pstolowski: what sets skip-setup-profiles?, I cannot see it [07:57] EDIT: d'oh [07:57] sorry, typo in search [07:57] I should finish that coffee, I see it now [07:58] pstolowski: I moved some cards related to the interface connection work, doesn't seem you have a card in doing about it though [08:03] zyga: good [08:04] pedronis: indeed, thanks [08:10] pstolowski: I'd like to run a scenario by you, [08:14] pstolowski: (still assembling the scenario, I'll get back to you) [08:14] mvo: i'm experimenting a bit with versions of libseccomp, so the impact is marginal [08:15] mvo: i've checked 2.4.1, 2.3.3, latest master and the binary tree branch [08:18] mborzecki: I wonder if we could do a lookup table or if bpf is too limited [08:18] mvo: with ebpf yes, cbpf is limited [08:18] mvo: sadly seccomp is cbpf [08:21] mvo: if you're interested, this is the profile with 2.4.1 https://paste.ubuntu.com/p/wRTDmq39Sw/ and this one is with binary tree patches https://paste.ubuntu.com/p/Q9VYyhchRR/ [08:40] zyga: Morning [08:41] Wimpress: hello [09:06] pedronis: i'm removing ...== "snapd" checks; wondering what to do about client-side filtering and special-casing around connections/interfaces [09:16] pstolowski: we are mostly interested about places that were using .Type for anything else [09:17] pstolowski: client side, not sure, we had a lot of back and forth there [09:17] pstolowski: what's the special casing around connections/interfaces ? [09:18] pedronis: it's mostly about system nickname afaiu [09:18] pstolowski: that goes through the mapper, no? [09:18] the mapper decision could use the type now [09:20] pedronis: maybe, but that may not solve everything, e.g. cmd_connections uses different format for system snaps, i.e. it omits snap name for system snaps, and uses just the name match [09:21] pstolowski: if we do things correctly, selectInterfaceMapper can use the type [09:21] pstolowski: maybe I wasn't clear, the issue is not that we compare with the name in general [09:21] the issue is places that were checking the type of snapInfo for most things [09:21] but the name for snapd [09:22] but I'm probably still confused what the problem is [09:23] pstolowski: are you talking about places that use SystemSnapName ? [09:23] pedronis: isn't the goal to support snapd snap whatever its name might be in the future? [09:24] pstolowski: not really [09:24] never said that [09:24] NOTE: the client side checks were there because some parts of the API did not return enough information AFAIK [09:24] pstolowski: the goal is to make decision sensitive based on the type [09:24] pedronis: allright. wrong assumption on my side then. there is no issue then. [09:27] mvo: added a note about compiled bpf with 2.4.1 and the binary tree branch: https://forum.snapcraft.io/t/concerns-about-performance/12194/8 [09:27] pstolowski: you are talking about isSystemSnap in cmd_connections ? [09:28] pedronis: yes. and there is some special-casing in cmd_interfaces as well. although they're both irrelevant in the light of what you just clarified [09:29] we might want to do something about that, but is not a priority [09:32] pedronis: what should I do wrt the code for snaps with no health? [09:32] in info [09:33] pedronis: I don't have a strong opinion on this :) [09:33] having a code seems more uniform, but this being snap and not snapd it's purely cosmetic [09:38] * Chipaca proposes snapd-health-is-a-state-of-complete-physical-mental-and-social-well-being-and-not-merely-the-absence-of-disease-or-infirmity [09:41] mborzecki: thank you [09:43] Chipaca: no code please, it's the unknown unknown case [09:45] Chipaca: also, code are optional, we should embrace that, not give the impression they aren't [09:46] Chipaca: if you think the code really adds value, then we should always set in the api, and not leave the field empty [09:46] there [09:46] but I don't think it's worth it/makes a lot of sense [09:48] agreed then, no code in this case [09:53] break time! [09:53] * Chipaca dances [09:53] https://www.youtube.com/watch?v=otCpCn0l4Wo obvs [09:58] * zyga -> breakfast [10:30] back [10:37] pstolowski: I sent a partial review earlier; I cannot poke any holes in it [10:40] zyga: ty, i saw you comment. i think it's no different (in terms of risk) to anything else we do; fwtw if we made any error in that logic, we would end up with less persmissions, not more (initial setup-profiles is always done before auto-connect and link-snap). we just need to be careful as always [10:52] pstolowski: left some comments under https://github.com/snapcore/snapd/pull/7081 [10:54] thanks [11:31] Walk [11:39] re [12:21] oho [12:21] splits [12:22] mborzecki: one last python patch https://github.com/snapcore/snapd/pull/7082 [12:31] * zyga switches to reviews === ricab is now known as ricab|lunch [12:56] mborzecki: r cbpf> for some definition of sadly, ebpf would give us a lot of grief due to spectre (and its mitigations) [12:57] (cc mvo) ^ [12:58] jdstrand: as in https://seclists.org/oss-sec/2019/q1/106 ? [12:59] mborzecki: if we're looking to ease the pain of the lookup, trying to push oracle's patch forward is likely the way forward. we could do experimental builds and determine if it is good enough, then vendor libseccomp with that patch if it seemed ok [13:02] mborzecki: yes. some are disabling ebpf entirely. everyone else says only root can use it (if not running under seccomp already iirc) [13:07] mborzecki: I see you post that bintree didn't help. we could try to tweak libseccomp to put common syscalls like openat sooner. [13:08] mborzecki: and then send that upstream [13:08] jdstrand: yeah, the cost seems to be better on average, but lower number syscall seem to fare worse [13:09] and vendor it in the meantime. before vendoring or sending upstream, we'd need to run through the tests I wrote for the daemon user though... [13:09] mborzecki: I suspect if you posted your findings to the bintree one, people would try to come up with a compromise approach [13:09] jdstrand: actually that is interesting whether we should aim for all syscalls be similarly impacted by seccomp filter, rather than favoring some and penalizing others [13:10] mborzecki: since the goal is spead up when it actually slowed down [13:10] jdstrand: right, but if you do say, recvmsg/sendmsg a lot you'd see gains :) [13:12] mborzecki: well, seccomp is always going to have a measurable impact ime. what we can do to soften that... it seems plausible to put massively used ones earlier (like recvmsg/sendmsg). one would think that openat would be at the end though-- read/write/lseek at the beginning since you tend do that more than opens. [13:13] mborzecki: which means that we're always going to be slow in some pathological cases [13:14] mborzecki: if you are right about python compiling modules, shouldn't it be shipping pyc files? [13:20] mborzecki: the fact that we do a lookup at all, for volatility, it seems difficult to imagine doing better than bintree. I mean, read is very high in the list with only 6 syscalls before it, and lseek only 14. they are highly prioritized already [13:22] jdstrand: we can LD_PRELOAD and translate all the syscalls to open + mmap + caching (don't close and reopen) [13:23] jdstrand: we can perhaps do some tricks with multiple profiles, hand written, that use bitmasks to quickly ACK numbers that are high up the call frequency and then offset the cost for the NACK syscalls, maybe some sort of bloom filter approach crafted out of multiple filters that are all attached together [13:24] jdstrand: sorry for delayed replies, we're in the standup currently [13:24] jdstrand: mvo posted some info about -m compileall, but I did not apply it locall [13:26] mborzecki: it is unclear to me the frequency of the syscalls for volatility. the table tells my with <2.4, read has 0 syscalls before RET and with 2.4, 6, but how often is read called? (we know read will be high, but perhaps there is some frequently used syscall that could reasonably moved up the list) [13:27] mborzecki: mvo mentioned lseek and mmap... [13:28] jdstrand: let me grab a summary from strace [13:29] jdstrand: http://paste.ubuntu.com/p/F8PSHcscDg/ [13:31] zyga: I'm not clear on your LD_PRELOAD proposal. you are saying you want to modify program behavior to sidestep the issue? [13:31] jdstrand: so read, lseek, mmap, munmap, fstat should already be handled rather early [13:31] jdstrand: openat is probably an artifact of snap-confine and perhaps python module compilation (?) [13:32] mborzecki: how easy it for you to put lseek first in the list with 2.4 and rerun? [13:32] mborzecki: just to double check - you also see the problem go away once you add @unrestricted into the .src file and compile that, right? [13:32] mvo: yes, @unrestricted is ~14s vs 20 with seccomp set up [13:34] jdstrand: where's the list? [13:36] mvo: it's always been known there is overhead with syscall filtering. a map would sure be nice here... lseek is already prioritized and towards the top of the list. we knew pathological cases would be problematic (eg, opening lots of files over and over, etc) [13:36] mborzecki: well, yes, then I guess that is already putting it towards 'not easy' (I don't know otoh) [13:38] mvo: but volatility is executing lseek nearly a million times... [13:38] jdstrand: yeah, I think this forum post hit exactly this pathological case [13:38] jdstrand: i'll look into how libseccomp generates bpf, i see there's some syscall sorting happening early on so that's probably the right spot :) [13:38] jdstrand: its fine, if we can't do anything we can't do anything :) I was mostly trying to understand the details [13:40] mborzecki: it looks like there is a priority field looking at gen_bpf.h [13:40] err gen_bpf.c [13:41] mborzecki: iirc, upstream said there was more that could be done in this area, separate from bintree [13:42] I think that given the number of system calls it is silly a seccomp program cannot have a small array that has “yes, no, compute” action. Vast majority of times a simple lookup is sufficient [13:43] Maybe we need to invest in a kernel patch [13:43] zyga: I was wondering the same - it seems the bpf machine is very limited here (maciej mentioned this earlier here) [13:44] mborzecki: db.c talks about the priority field, and talks about 'user hints' [13:44] mborzecki: which is... interesting... [13:44] Ebpf seems to have everything we want though. Maps arrays jumps. It is just out of reach :-( [13:45] But even regular bot we could use some tricks with bit mask [13:45] a/bot/bpf [13:46] yeah, was wondering that too [13:47] zyga: once we have the daemon user tests in place, perhaps we could consider using ebpf on systems that have it available. I mean, snap-confine is running as root. it will have CAP_SYS_ADMIN for using bpf()... [13:48] zyga: it wouldn't be something we would want to rush and we'd have to really justify it in my mind, cause the entire industry is using libseccomp. if we go out on our own and get it wrong, that would be, let's shall we say, less than ideal [13:49] (cc mvo) ^ [13:49] jdstrand: hm seccomp_syscall_priority() [13:49] jdstrand: but seccomp does not do eBPF, or does it? [13:49] jdstrand: let's see if i can tweak the code to call it and make it prioritize lseek [13:50] I'd much prefer to see mborzecki's findings as feedback in the bintree pr [13:50] zyga: no, it would need to be a new LSM [13:51] zyga: there are some patches for seccomp to use ebpf. that would definitely need to be evaluated :) [13:51] Understood [13:53] anyway, I suspect bintree is going to help lift everything, even if some pathological cases remain [13:56] mborzecki: yes, look at that, a C api and everything. huh. so, assuming that can make things better, it is theoretically possible for snapd to expose some knobs to tune the filter [13:58] mborzecki: eg, seccomp_syscall_priority(ctx, SCMP_SYS(lseek), 220); seccomp_syscall_priority(ctx, SCMP_SYS(read), 219); [13:59] mborzecki: with knobs, we could perhaps make pathological cases not so bad... [14:00] * jdstrand is a little concerned what that might do to the added tests for snap_daemon since that whole thing was so brittle prior to 2.4 (and its defaults) [14:03] mborzecki: I of course made up 220 and 219. I don't know what the current prio is for either. maybe for your 'quick' test, 255 and 254 is best since volatility is dominated by those two [14:06] jdstrand: heh, https://paste.ubuntu.com/p/CPZ6cGkt8q/ shaved off ~1s [14:08] jdstrand: fwiw scmp_bpf_sim tells me it's fewer instructions now for those syscalls [14:10] mborzecki: hmm, only one second. that suggests the act of filtering at all causes a fairly significant slowdown (at least when calling a function a million times), with the ordering being an additional factor [14:10] jdstrand: fwiw i could possibly come up with a trace to measure how much it takes [14:13] jdstrand: had to fix the unrestricted time in the forum note, idk why i had 14s instead of 18s [14:14] jdstrand: so fiddling with bintree branch or the regular branch, we can gain a bit, but not much tbh [14:14] mborzecki: it might be interestinng for completeness since we were talking about sort order so much and that a map would help (which these finding suggest perhaps not) [14:14] mborzecki: re gain> right [14:17] jdstrand: a map of what? or an eBPF map? [14:18] jdstrand: fwiw http://paste.ubuntu.com/p/DHV5tfChq6/ disassembly output with prioritized syscalls [14:20] that has to be optimizable === pedronis_ is now known as pedronis === ricab|lunch is now known as ricab [14:23] mborzecki: I was saying that your run might suggest that the filtering code in the kernel is what is slow. if lseek is at the top of the list then sort order is essentially taken out of the equation for that syscall. so a(n ebpf) map isn't going to help that [14:23] mborzecki: perhaps the kernel could be improved to do caching of a previous lookup... that is likely memory intensive though [14:25] mborzecki: however, your 18s:20.5s vs 14s:20.5 is a big deal. I think we can live with that overhead for the foreseaable future. we can pull back bintree in the fullness of time and lift everyone a bit. with higher priority items out of the way, we could consider adding seccomp_syscall_priority() to the mix [14:27] jdstrand: w8, that's 18s with @unrestricted, vs 20. with bintree or priorized vs 20. with 2.4.1 [14:27] mborzecki: even better [14:27] mborzecki: point is, the delta between unrestricted and restricted is not great [14:28] jdstrand: right, the penalty is there and it's measurable [14:29] it is there, it is measurable. we knew that (we just didn't measure it). my point is that a 10% slowdown for a pathological case is not terrible [14:29] jdstrand: also, we probably need a better benchmark [14:30] and that perhaps in the future, we can get that to 5% with seccomp_syscall_priority() if that bubbles up to a roadmap item [14:30] mborzecki: re benchmark> absolutely [14:39] mborzecki: some of the rain got here :) [14:41] even storm [14:41] finally colder though [14:53] lots of thunder and rain here, if I don't respond it may that my network got knocked outu [14:53] *out [15:00] mborzecki: fyi, I see you are responding-- I just responded again with my thoughts (perhaps you want to consider/agree/refute them in your response) [15:02] mborzecki: did another pass on 6890, still not super clear why the calls to checkpointPrefix and rollbackPrefix are not more symmetric [15:14] pedronis: hm must have missed that in your previous review, will take a look [15:15] jdstrand: left a note with results from more runs, so only conclusions at this point are: @unrestricted has no overhead and we need a better benchmark [15:17] mborzecki: thanks! those are concrete conclusions. I think an important additional conclusion is that sort order isn't everything and there is overhead in mediating it at all. that is useful and can guide us going forward [15:18] mborzecki: I also think that there is reasonable hope that bintree will help non-pathological cases, but that needs measuring [15:21] mborzecki: it is interesting that bintree/prioritized is slower with the averages than straight 2.4.1 [15:21] * jdstrand updates his math [15:31] mborzecki: I wonder if it makes sense for when bintree is available, to make it chooseable. ie, add an api call to opt into/out of bintree... it seems that the upstream patch doesn't make this tunable. it only looks at MIN_SYSCALLS_TO_USE_BINTREE [15:33] mborzecki: fyi, our default template uses 368 syscalls today [16:08] degville: hey, thanks for debug timings doc tweaks! one typo there: "snapd stores up to 100 measurements with the oldest dropped to for new measurements"; did you mean "to make room for.." ? [16:10] pstolowski: no problem, and yes, you're right. [16:21] * zyga food [16:31] mvo: btw, #7016 needs 2nd review [16:40] pstolowski: yes, will do [16:41] mvo: thanks === pstolowski is now known as pstolowski|afk [19:00] mvo: oh well it seems that govendor doesn't like pull requests [19:08] has anyone used govendor with an unmerged PR? [20:26] cmatsuoka: what do you mean? [20:27] zyga: I have code that depends on a go-tpm PR [20:27] you have to update govendor's json thing [20:27] we rarely do that though [20:27] you may also need to check if debian has that as a package [20:27] or otherwise part of CI will fail [20:28] similar thing for fedora (sorry) [20:28] zyga: you mean, by hand? because the govendor utility ignores the package [20:28] no, not by hand [20:28] I forgot how to useit [20:28] AFAIK you need to "go get" it [20:28] and tell govendor to add it [20:28] which will update the json [20:28] look at the history for vendor.json, maybe there are some clues [20:28] I honestly don't remember how to use it [20:29] if you govendor add the package (that's already positioned at the right branch) it's ignored, govendor doesn't add it [20:29] but if it's a "normal" package it works [20:36] aha, I can tell govendor to use the downstream tree with the upstream name [20:37] ok, that will solve the problem (hopefully) [20:51] yay, it worked! [20:54] thanks zyga for making me read the FAQ again :D