[04:08] <mup> PR snapcraft#2354 closed: Preflight missing multipass <Created by evandandrea> <Merged by sergiusens> <https://github.com/snapcore/snapcraft/pull/2354>
[06:14] <mborzecki> morning
[06:22] <zyga|afk> Hey
[06:22] <zyga|afk> mborzecki: how are you
[06:23] <mborzecki> zyga|afk: hey, feeling better today
[07:08] <mborzecki> zyga|afk: noticed that removable-media allows acessing/wiritng to /mnt, i was always under the impression that only {/run}/media is allowed there
[07:10] <mborzecki> mvo: morning
[07:10] <mvo> mborzecki: good morning! hope you feel better
[07:10] <mvo> zyga|afk: hey, anything new from the aa bug?
[07:11] <mborzecki> mvo: yes, way better today
[07:14] <mvo> mborzecki: great!
[07:21] <mborzecki> zyga|afk: when you're here, can you take a look at https://forum.snapcraft.io/t/the-removable-media-interface/7910 I've added /mnt there since we support it, but did not mention anything about mount propagation since the interface does not allow mounting anyway
[07:58] <pstolowski> hey
[07:59] <mborzecki> pstolowski: hey
[07:59] <pstolowski> mborzecki: hey, how are you, feeling better?
[07:59] <mborzecki> pstolowski: yeah, thanks
[08:01] <mvo> zyga|afk: woah, I got "cannot create temporary directory for /var/lib/snapd mount point: Permission denied" now on first try of mvo5/run-fontconfig-2.36
[08:05] <mwhudson> popey: i took your advice and made a screenshot https://snapcraft.io/go
[08:07] <mup> PR snapd#6223 closed: cmd/libsnap: move apparmor-support to libsnap <Simple 😃> <Created by zyga> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/6223>
[08:07] <mborzecki> mwhudson: nice, should include go env output too
[08:08] <mwhudson> mborzecki: you think? i most of it is pretty boring
[08:08] <mwhudson> +think
[08:08] <pedronis> mvo: hi
[08:09] <mborzecki> mwhudson: maybe go env GOROOT GOTOOLDIR?
[08:09] <pedronis> mvo: anything I can help with?
[08:10] <mwhudson> mborzecki: yeah, might make sense
[08:10] <mwhudson> the whole process is a bit tedious tbh, but if i redo it for some reason :)
[08:11] <mborzecki> mwhudson: that's asciinema right?
[08:11] <mvo> pedronis: good morning - we are still struggling with the apparmor failure in 2.36. I did some work on the --trace-exec in 6185, please have a look but I can do more just got distracted because of the aa issue
[08:11] <mwhudson> mborzecki: yes, plus hacking to smooth out the typing
[08:11] <mvo> pedronis: but 6185 should be cleaner than before
[08:11] <popey> mwhudson: I love that!
[08:11] <pedronis> mvo: ok, do we need to have a chat about the aa issues with zyga as well?
[08:12] <mborzecki> mwhudson: mhm, very nice!
[08:12] <mvo> pedronis: maybe, I think he had some new ideas since last night, but I don't know more details yet
[08:12] <pedronis> ok
[08:13] <popey> mwhudson: I hope it's not so big as to trigger this though ;) https://gitlab.gnome.org/GNOME/gnome-software/issues/532
[08:13] <mwhudson> popey: heh no
[08:13] <mvo> pedronis: 5845 needs a second review (the files interface) but thats probably in good hands with jamie and zyga
[08:14] <pedronis> pstolowski: are you blocked atm?  my and mvo days are a bit full of meetings and we have the 2.36 troubles
[08:14] <pedronis> pstolowski: I still would like the 3 of us to talk about hot plug next step this week tough, but tomorrow might be better
[08:30] <pstolowski> pedronis: not really, it can wait till tomorrow (not sure i can get 2nd review of the hotplug disconnect branchs from mvo under these circumstances anyway). i've also been working on one of the 2.36 blockers
[08:32] <pedronis> pstolowski: ok, btw I answered in https://bugs.launchpad.net/snapd/+bug/1777121 it makes sense to pick it up, make a card and work on it when you have time
[08:33] <mup> Bug #1777121: Remove is called after snap services are stopped  <snapd:Triaged> <https://launchpad.net/bugs/1777121>
[08:33] <pstolowski> pedronis: great, i'll take it
[09:05] <pedronis> pstolowski: can I help somehow with that panic?
[09:06] <pstolowski> pedronis: no, thanks, i think that's fixed, except cannot land due to other random test failures
[09:07] <mvo> pedronis: the panic is interessting, mocking the usr.lib.snapd.snap-confine.real makes the problem go away - it almost looks like there might be a connection to the other bug we are looking at
[09:07] <zyga|afk> Yep
[09:07] <zyga|afk> Hello
[09:07] <mvo> pstolowski: could it be that setup profiles for some reason looks at the real /snap/core/current/... usr.lib.snapd.snap-confine.real and dies there?
[09:07] <zyga|afk> Sorry for starting late. Daughter woes
[09:07] <pstolowski> hey zyga|afk
[09:07] <mvo> zyga|afk: hey
[09:08] <zyga|afk> I’m taking bit out but heading home now
[09:08] <zyga|afk> I will read backlog
[09:08] <pstolowski> mvo: no, it's not looking at real one; it's for sure looking at /tmp/.../check-xyz/snap/core/<rev> (i had debug logs confirming this)
[09:08] <mvo> pstolowski: ok
[09:09] <pstolowski> it's curious why it wasn't failing before though, not sure what changed. or if we were plain lucky before
[09:11] <zyga|afk> Hahhahhahh
[09:11] <zyga|afk> You will have fun knowing why
[09:11] <zyga|afk> Man
[09:12] <zyga|afk> Sorry still not at my kb
[09:14] <zyga|afk> There
[09:14] <zyga|afk> Solved it
[09:16] <mvo> zyga|afk: can't wait to hear the details
[09:16] <mvo> zyga|afk: does it also shed some light on the other issue ?
[09:17] <pstolowski> can't wait to that as well :)
[09:20] <pstolowski> mvo: is there any other 2.36 issue i can help with?
[09:21] <mvo> pstolowski: 6185 needs a second review, we could pull this into 2.36
[09:21] <mvo> pstolowski: but it first needs a merge into master :)
[09:36] <pstolowski> mvo, zyga|afk https://github.com/snapcore/snapd/pull/6219 is green now ; needs 2nd review
[09:36] <mup> PR #6219: overlord/tests: fix panic in managers test <Created by stolowski> <https://github.com/snapcore/snapd/pull/6219>
[09:37] <pstolowski> now sure how relevant it is with zyga|afk's findings about setup-profiles
[09:38] <mvo> pstolowski: I'm looking at the exact error now, I have it in a VM and it seems to be reliable to reproduce
[09:38] <pstolowski> mvo: is it cosmic?
[09:38] <zyga|afk> re
[09:38] <mvo> pstolowski: 16.04
[09:38] <zyga> give me a moment please
[09:41] <zyga> so, the 1000ft version is that we never ever mocked security backends
[09:42] <zyga> and that's not good
[09:42] <zyga> and stuff didn't blow up because we 100% ignored all errors
[09:42] <zyga> so running overlord tests would really run apparmor parser
[09:42] <zyga> reload udev
[09:42] <zyga> and all the other stuff
[09:43] <zyga> including setting up all snaps on manager startup
[09:43] <zyga> including initializing the apparmor backend
[09:43] <mvo> errors:[]state.taskError{state.taskError{task:"Setup snap \"some-snap\" (40) security profiles", error:"cannot setup apparmor for snap \"core\": cannot create host snap-confine apparmor configuration: cannot compute snap-confine profile: cannot open apparmor profile for vanilla snap-confine: open /tmp/check-2391705272889139391/162/snap/core/1/etc/apparmor.d/usr.lib.snapd.snap-confine: no such file or directory"
[09:43] <zyga> and setting up snap-confine profile when core was being setup
[09:43] <mvo> zyga: so yes :)
[09:43] <zyga> so
[09:43] <zyga> the reason that pawel's branch fixed the panic
[09:44] <zyga> is that it made part of the apparmor.Backend.Setup code happy
[09:44] <pedronis> zyga: where?  I'm quite sure we tried to mock aa parser
[09:44] <zyga> I realised while outside that the only reason this can have an effect
[09:44] <zyga> is that we ran with the full backend
[09:44] <zyga> pedronis: managers_test
[09:44] <zyga> it just spawns the full interface manager and carries on
[09:44] <pedronis>  ms.aa = testutil.MockCommand(c, "apparmor_parser", "")
[09:44] <pedronis> etc
[09:45] <mvo> zyga: I think this is understood now, thank you! the next question is why http://paste.ubuntu.com/p/fv5fsh3PDy/ makes the error to setup the aa profile basicly go away
[09:45] <mvo>   
[09:45] <pstolowski> zyga: we do mock apparmor_parser though (at least in these tests i worked on)
[09:45] <pedronis> maybe it doesn't work, but for sure it was trying
[09:45] <zyga> that's only part of the story, we should not run those tests with real backends
[09:45] <pedronis> zyga: ?
[09:45] <pedronis> they are meant to be quite integrationy
[09:45] <pedronis> that seems a bit of a broad statement
[09:46] <zyga> then we need much more preparation in that phase, right now a small tweak in backend needs to be reflected in extra setup or mocking in the overlord test
[09:46] <mvo> pedronis: the error we get in the test is actually that it can't open the core template profile, so this happens before aa_parser (just for context)
[09:46] <zyga> anyway, please let me explore
[09:46] <zyga> yep
[09:46] <zyga> so we need to either:
[09:46] <pedronis> mvo: yes, so we have some code that doesn't respect dirs root
[09:46] <zyga> 1) prepare a full blown environemnt that each backend expects
[09:46] <pedronis> or soemthing
[09:47] <pedronis> I'm just trying to counter that it wasn't trying to mock things
[09:47] <mvo> pedronis: it does respect them, thats the problem, there is nothing there so it fails to open the profile
[09:47] <mvo> pedronis: yeah, that is correct - we do mock stuff
[09:47] <zyga> 2) ask each backend to mock itself properly so that it can kind of run but have no real effect
[09:47] <zyga> 3) not run with real backends at that stage
[09:47] <zyga> we mock things in the wrong place, the overlord doens't know how to mock apparmor
[09:47] <zyga> mocking one command is not enough
[09:48] <pedronis> zyga: assume I understand what you are saying
[09:48] <zyga> the responsibility of knowing how to mock is in the backend itself, not in a test far far away
[09:50] <pedronis> zyga: is there a clean way to turn off the backends? would those tests still pass as they are?
[09:50] <zyga> e.g. calling backend.Initialize from apparmor interrogates the system about nfs and overlayfs
[09:50] <zyga> that's not mocked in any way
[09:50] <zyga> pedronis: sure, we do that in the interface manager tests
[09:50] <zyga> pedronis: I was surprised this is not done so
[09:51] <zyga> partially the question is what do we want to do in those tests
[09:51] <zyga> are they integration tests?
[09:51] <zyga> are they tests for the overlord?
[09:51] <pedronis> zyga: the name tells you what they try to do
[09:51] <pedronis> they try to test the integration of more than one manager
[09:51] <zyga> if they are integration tests then running with real managers, even if their activity is mocked feels wasteful since we don't observe what they do
[09:51] <zyga> pedronis: that's fine
[09:52] <zyga> keeping the manager is ok
[09:52] <pedronis> zyga: anyway I told you a very actionable thing, is there a clean way to turn off the backends? would those tests still pass as they are?
[09:52] <zyga> but if we don't check what is the impact of apparmor or seccomp part of the interface manager, perhaps we should not use real backends
[09:52] <pedronis> you answered the first bit
[09:52] <pedronis> let's see about the 2nd, no?
[09:52] <zyga> yes, checking that now
[09:52] <zyga> hold on
[09:52] <mvo> I think this is interessting and we need to look at this but it does not (afaict) help with the apparmor_parser getting killed issue we have in 2.36
[09:53] <pedronis> mvo: does it get killed on some distros or all distros?
[09:53] <pedronis> when does it get killed?
[09:53] <mvo> pedronis: so far I only saw it on 16.04
[09:53] <pedronis> I mean in which tests
[09:53] <mvo> pedronis: it gets killed when it tries to setup the snap-confine security profiles on the hosts
[09:53] <pedronis> in a spread test? unit test?
[09:53] <pedronis> randomly?
[09:54] <mvo> pedronis: I see it in tests/main/layout and tests/main/parallel-install-layout - spread tests
[09:54] <pedronis> ok, spread test
[09:54] <mvo> pedronis: let me try to find out if it happens in more
[09:54] <pedronis> anywy, yes then is not related
[09:54] <pedronis> by def
[09:55] <mvo> pedronis: making the error from a notify log to a real error changes the outcome for some reason, its super strange but with http://paste.ubuntu.com/p/kksmSpN95H/ I cannot reproduce the issue anymore
[09:56] <pedronis> mvo: there are probably two calls? and the first fails benigly and the 2nd days?
[09:56] <pedronis> that would explain why that would make a difference
[09:57] <mvo> pedronis: also failure seems slightly random, in GH we see it also in parallel-install-interfaces-content and snap-env but all 16.04
[09:57] <zyga> mvo: there is one more place
[09:57] <zyga> mvo: actually two, you just have an older patch
[09:58] <zyga> mvo: the key place is actually in interface manager initialize when we redo security when system key changes, this is still ignored in the older copy you have
[09:58] <mvo> zyga: ok, where can I find the correct one?
[09:59] <mvo> zyga: what is super strange is that literally on the first try of run-fontconfig-2.36 I hit the bug
[09:59] <mvo> zyga: then I added the diff and run again and had ~4-5 runs since and nothing triggers it
[10:08] <zyga> pedronis: all tests pass
[10:08] <pedronis> zyga: good, can you propose something that removes the bad mocking and does the right mocking? so we can look at it
[10:08] <zyga> yep
[10:09] <zyga> doing that now
[10:12] <pedronis> mvo: I added a couple of questions to #6185
[10:13] <mvo> pedronis: thank you, checking
[10:13] <mup> PR #6185: snap: add new `snap run --trace-exec` call <Performance 🚀> <Created by mvo5> <https://github.com/snapcore/snapd/pull/6185>
[10:14] <zyga> mvo: sorry, missed your question
[10:14] <zyga> mvo: there's a patch with logging -> errors on GH but I found one more place where that happens, haven't pushed that part
[10:20] <zyga> pedronis: https://github.com/snapcore/snapd/pull/6226
[10:20] <mup> PR #6226: overlord: mock security backends for testing <Created by zyga> <https://github.com/snapcore/snapd/pull/6226>
[10:20] <zyga> let's see how this runs
[10:20] <mup> PR snapd#6226 opened: overlord: mock security backends for testing (2.36) <Created by zyga> <https://github.com/snapcore/snapd/pull/6226>
[10:22] <mvo> zyga: thank you!
[10:23] <zyga> mvo: re errors vs logging, I checked my patches from last night and they are all up to date
[10:23] <zyga> sorry for the noise, still drinking my first coffee
[10:24] <mvo> zyga: no worries
[10:24] <zyga> what remains a bit of a mystery is the no-output error from apparmor parser
[10:24] <mvo> zyga: I reverted to see if I get the failure without the change or if I'm hunting a ghost
[10:24] <zyga> cookl
[10:25] <mvo> zyga: yeah, especially since we set SNAPD_DEBUG afaik in the tests so we should see output
[10:25] <zyga> I restarted my tests, if I can get it to happen again I'd like to look
[10:26] <zyga> I wonder if it is possible that mocked no-op apparmor_parser is _somehow_ leaking from unit tests into the state of the machine where they execute
[10:30] <zyga> mvo: I will now look at the profile compatibility issue on leap
[10:30] <zyga> mvo: I think we should not compile snap-confine profile if snap-confine in the distribution has disabled apparmor
[10:31] <pedronis> zyga: MockCommand is based on setting PATH so it should't go further than unit tests
[10:31] <zyga> pedronis: yeah but maybe some magic sequence of bugs or something
[10:31] <zyga> mvo: alternatively we can look at apparmor parser version and skip it this way, it should be the same outcome
[10:31] <zyga> s/be/have/
[10:32] <mvo> zyga: first run without the diff that makes it a real error and I hit the issue again
[10:32] <mvo> zyga: "Nov 28 10:32:00 nov281009-194489 snapd[29337]: backend.go:312: cannot create host snap-confine apparmor configuration: cannot reload snap-confine apparmor profile: cannot load apparmor profiles: signal: terminated
[10:32] <mvo> "
[10:32] <mvo> Nov 28 10:32:00 nov281009-194489 snapd[29337]: apparmor_parser output:
[10:33] <zyga> can you check one thing
[10:33] <zyga> if you have shell still
[10:33] <mvo> I do
[10:33] <zyga> look at what apparmor_parser is
[10:33] <zyga> is it the real deal
[10:33] <zyga> also look at journalctl
[10:33] <mvo> zyga: https://paste.ubuntu.com/p/BvBvkxP4Mn/
[10:34] <zyga> if the parser is loading stuff into the kernel it will trigger an audit event
[10:34] <zyga> so you may match the timestamp above
[10:34] <zyga> to an audit even
[10:34] <zyga> (looks allright)
[10:35] <mvo> zyga: https://paste.ubuntu.com/p/KWYSKPVgJY/ - looks like audit is not showing that the snap-cofine profile is loaded
[10:36] <zyga> indeed
[10:36] <zyga> anything around the timestamp of
[10:36] <zyga>  "Nov 28 10:32:00 nov281009-194489
[10:37] <mvo> zyga: https://paste.ubuntu.com/p/6wbYy8psFb/
[10:38] <zyga> hum
[10:38] <zyga> as if nothing had happened
[10:38] <mvo> zyga: I will try to run the apparmor_parser manual now
[10:38] <zyga> mvo: crazy idea, apparmo_parser -> apparmor_parser.real, log all invocations
[10:39] <zyga> can you reliably reproduce it?
[10:39] <zyga> which test was it that failed now
[10:39] <mvo> zyga: the test is pretty random
[10:39] <mvo> zyga: but without your diff I hit it 2/2 so far
[10:39] <mvo> zyga: today
[10:39] <mvo> zyga: via the mvo5/run-fontconfig-2.36
[10:40] <mvo> zyga: I think plain upstream/release/2.36 will also work, takes ~100 tests and then it happens
[10:42] <zyga> mvo: I will try your branch now
[10:45] <mvo> zyga: silly question, what is our cache dir again?
[10:45] <zyga> which cache?
[10:46] <zyga> apparmor?
[10:46] <mvo> zyga: apparmor cache for the re-exec thing
[10:46] <zyga> we use /var/cache/apparmor
[10:46] <zyga> for all our profiles
[10:46] <zyga> we no longer use any other paths
[10:46] <mvo> zyga: thanks
[10:47] <zyga> note: we still use the other place for that single profile that ships with the package
[10:47] <mvo> zyga: and running it by hand - works :(
[10:47] <zyga> yeah, I ran it by hand once in a debug session
[10:47] <zyga> it's maddening
[10:47] <zyga> wish I had a rollback machine to get a few seconds into the past
[10:47] <zyga> mvo: so FYI, /etc/apparmor.d/cache/
[10:47] <zyga> because cache is in etc
[10:49] <pedronis> pstolowski: mvo: I created a meeting for tomorrow, let me know if it doesn't work
[10:50] <pstolowski> pedronis: thanks, it's fine
[10:50] <mvo> pedronis: thank you
[11:04] <zyga> brb, it's super cold in the office today
[11:05] <mvo> zyga: see you - I add the wrapper thing now
[11:14] <zyga> back
[11:14] <zyga> mvo: https://github.com/snapcore/snapd/pull/6226 is green!
[11:14] <mup> PR #6226: overlord: mock security backends for testing (2.36) <Created by zyga> <https://github.com/snapcore/snapd/pull/6226>
[11:14] <zyga> pedronis: can you look please,
[11:14] <zyga> I think this is the dealbreaker
[11:14] <zyga> though it doesn't explain why things fail in non-unit tests
[11:15] <zyga> those also passed on 1st run :)
[11:15] <zyga> virtually unheard of :)
[11:16] <pedronis> zyga: do we have other places outside of overlord that do overlord.New ? and need the same mocking
[11:16] <pedronis> I think daemon might
[11:17] <zyga> I checked all of overlord but indeed, daemon might
[11:17] <zyga> looking now
[11:17] <pedronis> zyga: +1 with this kind of questions
[11:21] <zyga> pedronis: yes, adding the same treatment to daemon now
[11:22] <pedronis> zyga: devicestate/firstboot_test.go seems also to create a full overlord
[11:22] <pedronis> for some tests
[11:23] <zyga> I added a printf that logs the added backends, I'll check that we are good across the tree
[11:24] <zyga> actually, it seems daemon did this already
[11:24]  * zyga double checks
[11:25] <zyga> though not for all suites
[11:30] <mvo> zyga: oh, fun
[11:30] <zyga> mvo: any luck with apparmor_parser?
[11:31] <mvo> zyga: not yet
[11:31] <mvo> zyga: still running
[11:31] <mvo> zyga: green is also annoying - why is it green, it should fail with the same appamor issue
[11:32] <mvo> zyga: I mean, it should fail with permission denied at a random place when the apparmor_profile canot be (re)loaded
[11:33] <zyga> yes :/
[11:33] <zyga> odd observation
[11:33] <zyga> go test
[11:33] <zyga> I see stdout
[11:33] <zyga> go test ./...
[11:33] <zyga> I don't see stdout!
[11:36] <mvo> zyga: I remember I discussed that with john a while ago, it was strange go testrunner setup iirc
[11:44] <zyga> everything is mocked now, let's do a master version of that
[11:47] <mup> PR snapd#6227 opened: overlord,daemon: mock security backends for testing <Created by zyga> <https://github.com/snapcore/snapd/pull/6227>
[11:49] <mvo> zyga: with wrapper -> no error
[11:49] <zyga> seriously
[11:49] <zyga> how can that be!?!
[11:50] <zyga> maybe just back luck?
[11:50] <zyga> *bad
[11:51] <mvo> zyga: maybe, I run it again
[11:51] <mvo> zyga: the mock security backends got cherry-picked, right?
[11:51] <zyga> mvo: meaning?
[11:51] <zyga> I opened a 2.36 and master PRs
[11:51] <zyga> same patch
[11:51] <mvo> zyga: I mean, between 2.36 and master it can be chrry picked?
[11:51] <zyga> yes
[11:51] <mvo> zyga: great
[11:51] <zyga> two PRs are in flight now
[11:52] <mvo> zyga: ok
[11:52] <mvo> zyga: did you restart the 2.36?
[11:52] <zyga> yes
[11:52] <zyga> well
[11:52] <zyga> I pushed to include daemon mocking and devicestate mocking
[11:52] <zyga> so same patch in both places, one was new one was force pushed over the old one
[11:53] <mvo> zyga: ok
[11:58] <mup> PR snapd#6228 opened: snapstate,overlord: update fontconfig caches with overlord mocking (2.36) <Created by mvo5> <https://github.com/snapcore/snapd/pull/6228>
[12:01] <mvo> zyga: second run with wrapper starts now
[12:08] <zyga> mvo: I'm fixing leap bug
[12:08] <zyga> mvo: I'm a bit unsure if we should land https://github.com/snapcore/snapd/pull/6221
[12:08] <mup> PR #6221: interfaces: return security setup errors (2.36) <⛔ Blocked> <Created by zyga> <https://github.com/snapcore/snapd/pull/6221>
[12:08] <zyga> the risk is breaking snapd startup
[12:09] <mvo> zyga: got the error: https://paste.ubuntu.com/p/r54VmSGf37/
[12:09] <mvo> zyga: yeah, this is why I set it to blocked
[12:09] <zyga> perhaps we should still ignore this error: https://github.com/snapcore/snapd/blob/master/overlord/ifacestate/helpers.go#L187
[12:09] <zyga> looking
[12:09] <mvo> zyga: I think this needs some more work, breaking on startup is not ideal
[12:10] <zyga> I checked my PR, it's not blocking startup
[12:10] <mvo> zyga: hm, actually the real error is earlier, let me dig some more
[12:10] <zyga> memory of what I pushed last evening is rusty
[12:10] <zyga> mvo: is that the only invocation?
[12:10] <zyga> or last one
[12:11] <mvo> zyga: the last one, a gazillon before
[12:11] <zyga> aha
[12:11] <mvo> zyga: I thik `--replace --write-cache -O no-expr-simplify --cache-loc=/var/cache/apparmor /var/lib/snapd/apparmor/profiles/snap-confine.core.6022` is the failing one but the script needs to be smarter
[12:11] <zyga> maybe patch the wrapper to log the error too
[12:11] <mvo> zyga: yeah
[12:11] <zyga> cool
[12:11] <zyga> suggestion
[12:11] <zyga> when it fails copy the non -- arguments to /tmp/WAT/
[12:11] <zyga> for inspection
[12:11] <zyga> fingers crossed
[12:14] <zyga> pstolowski, mvo: shall we close https://github.com/snapcore/snapd/pull/6219
[12:15] <mup> PR #6219: overlord/tests: fix panic in managers test <⛔ Blocked> <Created by stolowski> <https://github.com/snapcore/snapd/pull/6219>
[12:18] <pstolowski> closed
[12:18] <mup> PR snapd#6219 closed: overlord/tests: fix panic in managers test <⛔ Blocked> <Created by stolowski> <Closed by stolowski> <https://github.com/snapcore/snapd/pull/6219>
[12:19] <zyga> pstolowski: thank you for writing that, without that PR I would never think that that code is using apparmor backend
[12:21] <mvo> yeah, thanks pstolowski and zyga for this one - now we just need to figure the appamor thing out to make me truly happy
[12:21] <pstolowski> yep, that an interesting one ;)
[12:38] <zyga> mvo: the branches are green
[12:38] <zyga> shall we merge https://github.com/snapcore/snapd/pull/6226
[12:38] <mup> PR #6226: overlord,daemon: mock security backends for testing (2.36) <Created by zyga> <https://github.com/snapcore/snapd/pull/6226>
[12:38] <zyga> and https://github.com/snapcore/snapd/pull/6227
[12:38] <mup> PR #6227: overlord,daemon: mock security backends for testing <Created by zyga> <https://github.com/snapcore/snapd/pull/6227>
[12:46] <zyga> pedronis: ^ ?
[12:48] <pedronis> one sec
[12:49] <pedronis> zyga: did you rebase it?
[12:49] <zyga> pedronis: the one on master, yes
[12:50] <pedronis> well seems you force pushed the one on 2.36 as well
[12:50] <zyga> yes, with daemon and devicestate changes
[12:50] <zyga> I wanted one patch for master
[12:53] <pedronis> zyga: do we need the devicestate changes? it doesn't seem to use the interface manager
[12:54] <zyga> pedronis: it spawns the overlord, that initializes apparmor backend doing some work
[12:54] <pedronis> where?
[12:54] <zyga> just creating the overlord is enough, then that adds the interface manager, that does the rest
[12:55] <zyga> I tested this with a printf next to AddBackend in the interface manager initialization function
[12:55] <zyga> (printing each security backend being added)
[12:55] <pedronis> zyga: it creates only a Mock overlord
[12:55] <pedronis> we do that all over tha place
[12:55] <pedronis> so we need to know because there might be more
[12:55] <zyga> Mock is safe
[12:55] <zyga> let me double check
[12:56] <pedronis> I don't see a overlord.New there
[12:56] <pedronis> but maybe I'm missing something
[12:57] <pedronis> zyga: I'm talking devicestate_test.go to be clear
[12:57] <pedronis> firstboot_test.go does create a full overlord
[12:58] <zyga> testing...
[12:58] <zyga> and I think you are correct, perhaps I did one too many :)
[12:58] <zyga> I can drop that
[13:00] <zyga> pushed
[13:00] <pedronis> looking in a sec
[13:00] <zyga> into the master version of the PR
[13:01] <zyga> devicestate tests are slooow
[13:01] <pedronis> zyga: :)
[13:02] <zyga> 25 seconds on a 12 core VM
[13:02] <pedronis> they got worse then
[13:02] <pedronis> anyway not a priority to improve that
[13:02] <zyga> pushed into 2.36 PR as well
[13:02] <pedronis> right now
[13:02] <pedronis> they take 4s here
[13:03] <pedronis> btw
[13:03] <zyga> PASS: devicestate_test.go:899: deviceMgrSuite.TestDoRequestSerialErrorsOnNoHost	20.213s
[13:03] <zyga> maye OS differences matter?
[13:03] <pedronis> no, network setup/dns resolution
[13:03] <pedronis> differences
[13:03] <zyga> yep
[13:04] <mup> PR snapd#6161 closed: tests: new test suite to run snapd tests on a google remote instance <Created by sergiocazzolato> <Closed by sergiocazzolato> <https://github.com/snapcore/snapd/pull/6161>
[13:06] <zyga> mvo: any luck?
[13:06] <zyga> https://github.com/snapcore/snapd/pull/6228 is green btw
[13:06] <mup> PR #6228: snapstate,overlord: update fontconfig caches with overlord mocking (2.36) <Created by mvo5> <https://github.com/snapcore/snapd/pull/6228>
[13:06] <pedronis> zyga: +1, need a 2nd review for master changes, also agree with mvo how land it in 2.36, will we get conflicts now?
[13:07] <zyga> pedronis: not likely
[13:07] <zyga> pedronis: the patches are the same
[13:07] <zyga> in any case, we can manage if some conflicts happen
[13:22] <zyga> I have a fix for the leap issue
[13:23] <zyga> mvo: I added a new apparmor level
[13:23] <zyga> ancient
[13:23] <zyga> when level is ancient, we don't enable the apparmor backend
[13:23] <zyga> when the parser has insufficient features to compile our basic profiles I return ancient
[13:23] <zyga> (using the existing parser feature check)
[13:24] <zyga> this makes leap 43.2 work
[13:24] <zyga> on TW all is good, as is on 16.04
[13:24] <zyga> parser feature check is not version based, we actually try to use the parseer
[13:24] <zyga> so this feels like a good way to fix this
[13:25] <zyga> I'll clean up the code slightly, add tests and propose
[13:28] <mvo> zyga: no luck yet, one full run worked
[13:28] <mvo> zyga: second run is ongoing
[13:28] <zyga> :/
[13:28] <zyga> but at least we're crawling out of the 2.36 hole
[13:29] <seb128> pstolowski, hey, unsure bug #1754345 should have been closed, it was an understood problem/assigned and an user states he's still having the issue, could you reconsider?
[13:29] <mup> Bug #1754345: Returns "invalid credentials" error while trying to refresh an invalid macaroon <snapd:Invalid by chipaca> <https://launchpad.net/bugs/1754345>
[13:30] <pedronis> zyga: pstolowski: mborzecki: btw both mvo and me have a bunch of meetings today so won't make the standup
[13:30] <zyga> pedronis: ack
[13:30] <mborzecki> ack
[13:30] <zyga> brace, the meetings are coming :)
[13:30] <pstolowski> seb128: sure, looking
[13:30] <seb128> pstolowski, thx
[13:31] <pstolowski> seb128: right, re-opened, thanks; i didn't see the new comment yet
[13:31] <mup> PR snapd#6229 opened: release: probe apparmor features lazily <Created by zyga> <https://github.com/snapcore/snapd/pull/6229>
[13:31] <seb128> pstolowski, thx
[13:32] <zyga> mvo: 1st of the several branches that lead towards leap fix: https://github.com/snapcore/snapd/pull/6229
[13:32] <mup> PR #6229: release: probe apparmor features lazily <Created by zyga> <https://github.com/snapcore/snapd/pull/6229>
[13:32] <zyga> pstolowski, mborzecki ^ if you can
[13:32] <mborzecki> zyga: looking
[13:32] <zyga> for context: we load apparmor even when it's kernel features are sane but userspace is too old,
[13:33] <zyga> this is the first of that sequence: the next is caching of parser feature check, the last is introduction of the "ancient" apparmor "level" (level is how we classify apparmor) and usage of that level when parser doesn't have the feature we need
[13:33] <zyga> decided to split because smaller patches and we can see if something falls apart
[13:34] <mborzecki> zyga: lgtm, not that the user can switch off secrity=apparmor at runtime
[13:34] <zyga> yep
[13:34] <zyga> that's handled
[13:34] <zyga> :)
[13:34] <zyga> (when you do that apparmor fs is not mounted)
[13:34] <zyga> and we treat it as apparmor none
[13:35] <mborzecki> that selinux uhh, funny how all the information is scattered everywhere, i'm sure ther's like a handfull of people who know their way around it
[13:35] <zyga> mborzecki: laughing their way to the bank ;)
[13:36] <zyga> mvo: what do you want to do about 6228
[13:36] <zyga> merge it or drop it?
[13:37] <zyga> brb, more tea, it's still freezing in here
[13:38] <mborzecki> i had this nice split into snapd, helpers with specific types, separate for snap-{update,discard}-ns, s-c and cli tools snap{,ctl} but was getting EACCESS when snapd (snappy_t domain) tried to run s-u-n (snappy_mount_t domain), even though i seemigly had all the stuff set up properly (which apparenly i didn't)
[13:38] <pstolowski> zyga: looking
[13:39] <mvo> zyga: woah, its green now 6228 - thats so strange
[13:39] <mvo> zyga: I run it again, just to see what happens
[13:42] <zyga> mvo: backends were messing up the system
[13:42] <zyga> note
[13:42] <zyga> mvo: when backends were paritally mocked
[13:42] <mvo> zyga: well, maybe
[13:43] <mvo> zyga: but how?
[13:43] <zyga> e.g. not disabled
[13:43] <mvo> zyga: I mean, I get the theory
[13:43] <zyga> but not mocking commands
[13:43] <zyga> and spread runs unit tests as root
[13:43] <zyga> things really affect the system
[13:43] <zyga> and silly tests that fake installs core
[13:43] <mvo> zyga: yeah, but rootdir is set to something different
[13:43] <zyga> mmmm
[13:43] <zyga> yeah
[13:43] <zyga> that's true
[13:43] <zyga> well
[13:43] <zyga> ish
[13:43] <mvo> zyga: I mean, I want to believe
[13:43] <zyga> it may be set to whatever
[13:43] <zyga> but if we can call apparmor_parser /path/to/whatever
[13:43] <zyga> it would still do stuff :)
[13:43] <mvo> zyga: it will
[13:44] <zyga> did your overloaded parser command catch any like that?
[13:44] <mvo> zyga: and yet the file is correct when I get a debug shell, apparmor parser is correct
[13:44] <mvo> zyga: still no reproducer since the one I showed you earlier
[13:44] <zyga> what do you have in your tree?
[13:44] <zyga> as in
[13:44] <zyga> which patches
[13:44]  * zyga does it fail on vailla 2.36?
[13:45] <mvo> zyga: this is run-fontconfig-2.36 with the extra apparmor_parser wrapper
[13:45] <zyga> aha
[13:45] <zyga> well
[13:45] <zyga> magic
[13:45] <mvo> zyga: I have not tried vanialla
[13:45] <mvo> zyga: and yes, magic - the bad kind
[13:45] <zyga> that should trigger it
[13:45] <zyga> what you have
[13:45] <zyga> as long as you don't have the mocking patch
[13:45] <zyga> where backends are nil
[13:45] <mvo> yeah, I don't have this patch
[13:46] <xnox> mvo, has i broken systemd in disco on arm64... or? https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-disco/disco/arm64/s/snapd/20181127_120551_2b149@/log.gz
[13:46] <xnox> 2018-11-27 11:33:48 Error executing autopkgtest:ubuntu-19.04-arm64:tests/main/interfaces-daemon-notify :
[13:46] <xnox> -----
[13:46] <xnox> also lots of apparmor denials
[13:46] <xnox> [Tue Nov 27 11:33:26 2018] audit: type=1400 audit(1543318407.221:18418): apparmor="DENIED" operation="exec" profile="snap.test-snapd-daemon-notify.notify" name="/bin/systemd-notify" pid=4946 comm="notify" requested_mask="x" denied_mask="x" fsuid=0 ouid=0
[13:46] <xnox> [Tue Nov 27 11:33:26 2018] audit: type=1400 audit(1543318407.229:18419): apparmor="DENIED" operation="open" profile="snap.test-snapd-daemon-notify.notify" name="/bin/systemd-notify" pid=4946 comm="notify" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[13:46] <xnox> [Tue
[13:46] <mvo> zyga: I think you are right, something in this test is bleeding into the others but there are some holes still I think
[13:47] <zyga> mvo: perhaps we should not run unit tests as root :)
[13:47] <mvo> xnox: thanks, I need to look at this
[13:47] <zyga> in case they are less unit-y
[13:47] <zyga> call them
[13:47] <zyga> practical tests
[13:47] <mvo> zyga: iirc they run with some su -l call, let me look
[13:47] <zyga> practical detonation tests
[13:47] <zyga> in the spread task?
[13:47] <zyga> I didn't check
[13:47] <mvo> zyga: they run as user "test"
[13:47] <zyga> uh
[13:47] <zyga> ok
[13:48] <zyga> so that theory goes out the window
[13:48] <zyga> how about package build
[13:48] <zyga> do we do nocheck?
[13:48] <mvo> zyga: at least the ones in tests/unit/go
[13:48] <mvo> zyga: yeah, let me check package build
[13:48] <mvo> zyga: dpkg-buildpackage is also run with su test and fakeroot
[13:49] <zyga> ok, so all theories are out the window
[13:49] <zyga> feels like if we don't find this
[13:49] <zyga> it will just come back :)
[13:49] <mvo> zyga: yeah, its very strange
[13:50] <mvo> zyga: anyway, a shame, I really liked the theory :(
[13:50]  * mvo weeps a bit in the corner
[13:58] <zyga> mvo: can you ack https://github.com/snapcore/snapd/pull/6227 please
[13:58] <mup> PR #6227: overlord,daemon: mock security backends for testing <Created by zyga> <https://github.com/snapcore/snapd/pull/6227>
[13:59] <zyga> mborzecki: can you please look at https://github.com/snapcore/snapd/pull/6149 after the standup
[13:59] <mup> PR #6149: cmd/snap-confine: capture initialized per-user mount ns <Per-user mount ns  🐎> <Created by zyga> <https://github.com/snapcore/snapd/pull/6149>
[13:59] <zyga> it's the next part of the feature branches
[14:00] <zyga> holly molly
[14:00] <zyga> mvo: noooooooo
[14:00] <zyga> :D
[14:00] <zyga> https://api.travis-ci.org/v3/job/460760408/log.txt
[14:00] <zyga> https://www.irccloud.com/pastebin/oxArPCqx/
[14:00] <zyga> this is from https://github.com/snapcore/snapd/pull/6226
[14:00] <mup> PR #6226: overlord,daemon: mock security backends for testing (2.36) <Created by zyga> <https://github.com/snapcore/snapd/pull/6226>
[14:03] <zyga> mvo: again failed on 16.04 only
[14:04] <zyga> mvo: my take on this is that there are multiple things happing in parallel: the backends doing stuff turned out to be a fluke (they just affect the unit tests), the ignored error in setup is interesting and I would like to focus on reproducing that problem with the panic issue out of the way
[14:14] <zyga> mvo: I added a restore-each check for "signal: terminated"
[14:15] <zyga> fingers crossed (if that happens)
[14:15] <zyga> using the same seed that was in the failing log
[14:19] <zyga> wow, hit the problem instantly
[14:19] <zyga> looking
[14:20] <zyga> ooooooooh
[14:20] <zyga> mvo: THEORY :)
[14:20] <zyga> 2018-11-28T14:18:10Z INFO Requested daemon restart.
[14:20] <mvo> zyga: tell me
[14:21] <mvo> zyga: termintaes as part of daemon restart?
[14:21] <zyga> yes
[14:21] <zyga> checking
[14:21] <zyga> hold on
[14:21] <zyga> mvo:
[14:21] <zyga> wowoooow
[14:21] <zyga> Nov 28 14:18:44 nov281411-886866 snapd[29935]: helpers.go:187: cannot regenerate seccomp profile for snap "core": signal: terminated
[14:21] <zyga> look what this says
[14:21] <mvo> zyga: nice, can't wait (and I'm in a meeting)
[14:21] <zyga> snap-seccomp
[14:21] <zyga> 14:18
[14:22] <mvo> zyga: ohhh
[14:22] <zyga> this is not even apparmor
[14:22] <mvo> zyga: nice
[14:22] <zyga> it's any child
[14:22] <mvo> zyga: so a missing wait somewhere? when we shut down/restart?
[14:22] <mvo> zyga: nice!
[14:22] <mvo> zyga: nice nice nice
[14:22] <zyga> probably
[14:22] <zyga> but man
[14:23] <zyga> looking at timing logs
[14:23] <mvo> zyga: also means we had this forever
[14:23] <zyga> yes
[14:23] <mvo> zyga: we just now notice because the profile actually changed
[14:23] <mvo> zyga: so far all the bits fit :)
[14:24] <zyga> https://github.com/snapcore/snapd/pull/6230
[14:24] <mup> PR snapd#6230 opened: spread: detect "signal: terminated" in journal logs <Created by zyga> <https://github.com/snapcore/snapd/pull/6230>
[14:24] <mup> PR #6230: spread: detect "signal: terminated" in journal logs <Created by zyga> <https://github.com/snapcore/snapd/pull/6230>
[14:24] <zyga> need to run for lunch
[14:24] <zyga> but so far best theory
[14:24] <zyga> and man, we suck :)
[14:24] <mvo> zyga: !!!
[14:24]  * mvo hugs zyga 
[14:32] <pstolowski>  /o\
[14:35] <zyga> mvo: is it that fix for daemon shutdown that [c]ipaca did?
[14:36] <zyga> I'll try
[14:37]  * cachio going to the bank
[14:37] <cachio> and lunch
[14:39] <mvo> zyga: as a quick test - we can set KillMode=process and see if that helps
[14:39] <mvo> zyga: I bet it does
[14:40] <zyga> mvo: better, I'll cherry pick the two patches from chipaca that fix it
[14:40] <zyga> it's super easy to trigger now
[14:40] <zyga> with that journalctl check
[14:40] <mup> PR snapd#6228 closed: snapstate,overlord: update fontconfig caches with overlord mocking (2.36) <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/6228>
[14:42] <zyga> mvo: running now
[14:43] <mvo> zyga: cool, let me know
[14:43] <zyga> fingers very much croessed
[14:43] <zyga> *crossed
[14:43] <mvo> zyga: hopefully the fix from chipaca is enough
[14:43] <zyga> mvo: I will look at what happens in the daemon restart sequence now
[14:43] <mup> PR snapd#6231 opened: data: set KillMode=process <Created by mvo5> <https://github.com/snapcore/snapd/pull/6231>
[14:44] <zyga> thanks for that ^ mvo
[14:44] <zyga> feels like attacking many fronts at the same time yields results
[14:44] <mvo> zyga: the critical bit is that we can't stop the daemon before all the subprocesses are finished
[14:44] <mvo> zyga: or we need Killmode=process
[14:45] <zyga> mvo: are you out of the meeting spree?
[14:45] <zyga> can you review https://github.com/snapcore/snapd/pull/6227
[14:45] <mup> PR #6227: overlord,daemon: mock security backends for testing <Created by zyga> <https://github.com/snapcore/snapd/pull/6227>
[14:46] <mvo> zyga: yeah, we are doing a pincer move on this (remember Cannae) - anyway, still meeting so only 1/4 brain available
[14:51] <zyga> running, no failures yet
[14:51] <mvo> zyga: the one from john?
[14:51] <zyga> yes
[14:51] <mvo> zyga: that is definitely the best solution if that is enough - yay^2
[14:51] <zyga> I took two patches from john and kept my journalctl test
[14:51] <zyga> yes :)
[14:51] <mvo> zyga: please propose to 2.36
[14:51] <zyga> absolutely will
[14:51] <zyga> if this passes
[14:51] <mvo> zyga: than hopefully when the meetings are over I can merge and get on with life
[14:52] <mvo> zyga: thank you!
[14:52] <zyga> thank you
[14:52] <zyga> for 2.36.2 I would like to fix leap too, so if time permits I will try
[14:52] <zyga> 2.36.2 is tomorrow?
[14:52] <mvo> zyga: or later tonight
[14:52] <zyga> ok
[14:52] <zyga> can be tonight
[14:52] <zyga> I mean, I have the patches now
[14:52] <mvo> zyga: I would rather do a .3 than to wait tbh
[14:52] <zyga> ok
[14:52] <zyga> I can distro patch too
[14:52] <mvo> zyga: I think its fine, we can do .3 with --trace-exec
[14:52] <mvo> zyga: and your fix
[14:52] <zyga> ok
[14:52] <mvo> zyga: etc
[14:53] <zyga> let's do .2 today if we can
[14:53] <mvo> zyga: but yeah, depends on how long tests take and all that
[14:53] <zyga> and .3 tomorrow with extra leap fix
[14:53] <zyga> (leap is not critical since it doesn't break the use of apps)
[14:53] <mvo> zyga: so including it is not off the chart, just that I would not want to wait for it
[14:53] <zyga> I agree
[14:53] <zyga> 40/364, no errors in sight
[14:53] <mvo> zyga: hurrazh!
[14:54] <mvo> zyga: I close the KillMode=process then
[14:54] <mvo> zyga: at least for now
[14:54] <zyga> keep it
[14:54] <zyga> let's see what happens
[14:54] <mvo> zyga: you think so? ok
[15:07] <zyga> mvo: nooo, broke again
[15:07] <zyga> looking
[15:07] <zyga> Nov 28 15:07:08 nov281459-174393 snapd[30000]: helpers.go:187: cannot regenerate seccomp profile for snap "core": signal: terminated
[15:08] <zyga> I find this part interesting:
[15:08] <zyga> https://www.irccloud.com/pastebin/WYfAENsz/
[15:08] <zyga> they always come in pairs, first this then the message that we're ready to go, in deamon
[15:08] <zyga> so it must be the profile regen code
[15:09] <mvo> zyga: lets see if KillMode= helps
[15:09] <zyga> journald logs: http://paste.ubuntu.com/p/hv8y3mgQJF/
[15:09] <zyga> Nov 28 15:06:18 nov281459-174393 snapd[29138]: main.go:82: DEBUG: Setting up sd_notify() watchdog timer every 2m30s
[15:10] <zyga> sooooo
[15:10] <zyga> theory
[15:10] <zyga> we start up
[15:10] <zyga> shit takes time
[15:10] <zyga> systemd says "bye dude" and kills us
[15:10] <zyga> I once told maciej that during startup
[15:10] <zyga> when we regen profiles
[15:10] <zyga> we should tell systemd "gimme more time"
[15:10] <zyga> otherwise watchdog
[15:10] <zyga> may
[15:10] <zyga> may kill us on a slow system
[15:10] <zyga> viable?
[15:12] <zyga> http://paste.ubuntu.com/p/Bp9jBsjN3S/ <- complete journal logs
[15:13] <zyga> so
[15:13] <zyga> I don't get it
[15:13] <zyga> look at that 2nd log please
[15:13] <zyga> go to line 1272
[15:13] <zyga> we just installed core rev 6022
[15:13] <zyga> on line 1280 we restart
[15:13] <mvo> zyga: still meeting will look in a wee bit
[15:13] <zyga> now comes up something weird
[15:14] <zyga> we do a bit of restarts
[15:14] <zyga> one after another
[15:14] <zyga> (sure, I'm just letting my mind flow)
[15:14] <cwayne> Wow when did mvo become Scottish
[15:15] <zyga> it's double plus interesting that the test this failed on is main/degraded
[15:15] <mvo> zyga: oh, interessting - we could disable the watchdog to test this theory
[15:15] <zyga> i.e. nothing special
[15:15] <zyga> yep, I'll try that in a moment
[15:15] <mvo> cwayne: haha - I think I read too much terry pratchett
[15:15] <zyga> we restart about a half a dozen times
[15:15] <zyga> and what's with CUPS and ACPI being restarted
[15:16] <zyga> are we doing something wrong in test prep?
[15:17] <zyga> mvo: I don't think this is the watchdog
[15:17] <zyga> there's just one snap installed, core
[15:17] <zyga> would not take that long
[15:17] <zyga> on line 1375 test helpers say "new test starts here"
[15:18] <zyga> going to look what prints that
[15:24] <mvo> zyga: lets have a quick HO to catchup after my meeting(s)
[15:24] <zyga> gladly
[15:24] <zyga> I'm out of ideas
[15:24] <zyga> looking at your kill mode branch
[15:24] <mvo> zyga: it is running and still going strong afaict
[15:24] <mvo> zyga: so fingers crossed, I have a theory that I would like to talk through (to see if it is really cohesive)
[15:25] <zyga> list of jobs that failed with "signal: terminated" present in the log https://www.irccloud.com/pastebin/ZQikNO6z/
[15:25] <zyga> I wonder if this list is stable
[15:25] <zyga> I wonder if there's a combination of test activity and something else
[15:25] <zyga> your killmode branch seems to suggest that systemd stops snapd
[15:26] <zyga> and things go south because children die too
[15:26] <zyga> question is why
[15:26] <zyga> why are we being stopped
[15:26] <zyga> or more precisely: why this happens this way
[15:26] <zyga> mvo: one more idea to fix this
[15:26] <zyga> mvo: in regen all profiles code path, when anything fails, we don't write system key
[15:26] <mvo> zyga: systemd
[15:26] <zyga> then next time we try again
[15:26] <mvo> zyga: oh, interessting
[15:27] <zyga> yes, systemd but why :-)
[15:27] <zyga> yep
[15:27] <zyga> I'll make coffee
[15:27] <zyga> when will your meetings finish?
[15:27] <mvo> zyga: yeah, lets tlak in the HO
[15:27] <mvo> zyga: in 5-20min
[15:27] <zyga> ok
[15:27] <zyga> I'll take a break now
[15:27] <mvo> zyga: sounds great
[15:27] <zyga> see you in 15
[15:27] <mvo> zyga: I have a theory
[15:27] <mvo> zyga: so don't worry - and poke holes into it once we talk :)
[15:37] <zyga> back
[15:41] <mvo> zyga: let me make a cup of tea, meeting is over, ready in 3min
[15:41] <zyga> ok
[15:44] <zyga> started another run with:
[15:44] <zyga> chipaca's patches
[15:44] <zyga> terminated detector
[15:44] <zyga> propagated error from snap-confine apparmor setup
[15:44] <zyga> and system-key write skip if something fails
[15:44] <zyga> let's see what we get now
[15:46] <pedronis> zyga: mvo: let me know if I can help, want to discuss, I need a short break now tough
[15:46] <zyga> k
[15:51] <pedronis> mvo: is the fontconfig stuff expected in #6231?
[15:51] <mup> PR #6231: data: set KillMode=process <Created by mvo5> <https://github.com/snapcore/snapd/pull/6231>
[15:51] <mvo> pedronis: yes, lets set this to blocked
[15:51] <mvo> pedronis: its an experiment to see if that fixes the issue in the test
[15:51] <mvo> s
[15:54] <Mmike> Hi, lads. Can't snaps run in lxc containers?  I've created a fresh, empty container (lxc launch ubuntu:16.04 xen-snaptest), when it started I added my ssh key to ubuntu user (lxc exec xen-snaptest), sshed into the container, did apt update/upgrade mantra, then did: snap install hello. Snap installed, but when I try to run 'hello' command, this is what I get:
[15:54] <Mmike> ubuntu@xen-test:~$ hello
[15:54] <Mmike> cannot remount /tmp/snap.rootfs_z1wdjU/var/lib/snapd/lib/vulkan as read-only: Permission denied
[15:55] <Mmike> my container is unprivileged
[15:55] <Mmike> at least the lxd docs say defaults are unpriv containers, and also I can see that /proc inside the container is owned by nobody.nogroup
[15:56] <mup> PR snapd#6232 opened: overlord/snapstate: support for pre-remove hook <Created by stolowski> <https://github.com/snapcore/snapd/pull/6232>
[15:58] <pedronis> mvo: zyga: so we are getting killed why running subprocesses?
[15:59] <pedronis> but we don't know what is killing us?
[15:59] <pedronis> s/why/while/
[16:05] <zyga> pedronis: hey
[16:05] <zyga> can you join the standup please
[16:05] <zyga> it will be easier to explain this way
[16:36] <mvo> zyga: and remember to listen to the free software song
[16:38] <zyga> mvo: systemd uses kill(pid, SIGTERM)
[16:38] <zyga> mvo: not -pid
[16:38] <zyga> (sorry, I didn't mean to say -SIGTERM earier)
[16:38] <zyga> I will nuke system key and see what happens during a restart
[16:40] <mvo> zyga: hm, does it send SIGERM to anything else, i.e. a running snap-seccomp or similar?
[16:40] <zyga> mvo: going through the log now
[16:42] <mup> PR snapd#6183 closed: sanity, spread, tests: add CentOS (2.36) <Created by bboozzoo> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/6183>
[16:43] <mvo> 6195 and 6218 need a second review
[16:44]  * mvo gets dinner while waiting for this
[16:53] <zyga> mvo: this is what happens:
[16:53] <zyga> systemd killing everything in a cgroup https://www.irccloud.com/pastebin/2TpqMTd7/
[16:55] <zyga> immediately after that there's a second loop that ensures the cgroup is now empty
[16:56] <zyga> mvo: that's settled
[16:56] <zyga> but one thing still bugs me
[16:56] <zyga> if we only tell systemd that we are ready after we are done starting the overlord
[16:56] <zyga> how can we ever be killed mid way?
[16:56] <zyga> are we not telling systemd about our readiness on 16.04?
[16:59] <zyga> on 14.04 we are type=notify
[16:59] <zyga> I wonder if this means we still don't fully know how this happens
[17:11] <zyga> mvo: forkstat debugging  https://www.irccloud.com/pastebin/yeeIBEmw/
[17:11] <zyga> just to put my mind at ease
[17:11] <zyga> mvo: ^ this is actually a pretty amazing way to see what was going on around each test
[17:12] <zyga> and have historical data if it fails
[17:14] <zyga> small digression about our lives: https://youtu.be/sTdWQAKzESA
[17:21] <zyga> eh, forkstat -l is too new
[17:24] <roadmr> hi jdstrand !
[17:58] <Saviq> cachio: hey, are you guys testing on Fedora rawhide? or plan to?
[17:58] <zyga> Saviq: not testing now
[17:58] <zyga> Saviq: planning to is hard to say
[17:59] <Saviq> what I'm really asking is how to reliably get a rawhide env on spread+google... but I suppose that answers that ;)
[17:59] <zyga> I'll let cachio answer that :)
[17:59] <Saviq> we're trying to upgrade from the latest image available, but that misses the "reliable" bit
[18:06] <zyga> Nov 28 18:05:08 nov281757-030500 snapd[30252]: helpers.go:192: cannot regenerate seccomp profile for snap "core": signal: terminated
[18:06] <zyga> http://paste.ubuntu.com/p/4X5KvfZB7W/ < forkstat
[18:07] <zyga> mvo: ^
[18:10] <zyga> mvo: this forkstat stuff may also help maciej with systemd / mount error
[18:10] <zyga> e.g. would you guess we are calling probe-bcache on all those snaps?
[18:12] <zyga> 18:05:08 exit  30278     15    2.510 /snap/core/6022/usr/lib/snapd/snap-seccomp compile /var/lib/snapd/seccomp/bpf/snap.core.hook.configure.src /var/lib/snapd/seccomp/bpf/snap.core.hook.configure.bin
[18:13] <zyga> this is snap-seccomp existing after SIGTERM
[18:14] <zyga> mvo: there are some more details here
[18:14] <zyga> mvo: anyway, ping if you are around
[18:14] <zyga> otherwise let's chat later
[18:19] <zyga> mvo: we should look at this tomorrow
[18:19] <zyga> it seems that snapd was running long before
[18:19] <zyga> mvo: tracing the snap-seccomp that was stopped
[18:31] <mvo> zyga: hm?
[18:31] <zyga> hey
[18:31] <zyga> if you want please look at the pastebin above
[18:31] <mvo> zyga: looking
[18:32] <zyga> find the line where snap-seccomp process 30278 is killed
[18:32] <zyga> then look back in time
[18:32] <zyga> to see when it is started
[18:32] <zyga> one thing that doesn't make sense in the current theory
[18:32] <zyga> is that "systemctl restart snapd.service"
[18:32] <zyga> that will _wait_ until security is setup, right?
[18:33] <mvo> zyga: I'm not sure
[18:33] <mvo> zyga: I think we need to look at systemd and read how it behaves for things not fully started up
[18:33] <zyga> in general, it waits until the service is ready
[18:33] <zyga> my point is that if we had started snapd  in the past
[18:33] <zyga> it has completed the setup op
[18:33] <zyga> what we're observing is disagreeing with that
[18:34] <zyga> it seems that when we ask snapd to restart (from a synchronously running shell script)
[18:34] <zyga> it is still starting up
[18:34] <zyga> I don't know how to explain that
[18:34] <zyga> apart from my misunderstanding of systemd
[18:34] <zyga> but snapd.service is service type "notify"
[18:34] <zyga> anyway, look at the log, at the branch
[18:34] <zyga> you can reproduce the failure yourself
[18:34] <zyga> and look at both forkstat data
[18:34] <zyga> and at journal
[18:34] <zyga> I think this is very very useful in general, for debugging all sort of issues
[18:36] <mvo> zyga: hm, hm, I see
[18:36] <zyga> did I share the seed?
[18:36] <mvo> zyga: I don't see in the forkstat when a process dies and why - or am I looking at the wrong thing?
[18:37] <zyga> mvo: spread -debug -v -seed=1543419805 google:ubuntu-16.04-64:
[18:37] <zyga> you are not looking right
[18:37] <zyga> look at the output above:
[18:37] <zyga> 18:05:08 exit  30278     15    2.510 /snap/core/6022/usr/lib/snapd/snap-seccomp compile /var/lib/snapd/seccomp/bpf/snap.core.hook.configure.src /var/lib/snapd/seccomp/bpf/snap.core.hook.configure.bin
[18:37] <zyga> this tells you that on 18:05:08 a process with pid 30278 exited due to signal 15
[18:37] <zyga> taking 2.510 seconds of execution time
[18:37] <zyga> you can now look in past for pid 30278
[18:38] <zyga> you will find when the process was launched
[18:39] <mvo> zyga: aha, nice
[18:39] <mvo> zyga: sorry, just not used to ready it
[18:39] <zyga> yeah
[18:39] <mvo> zyga: uh, just looking at daemon.go - we set READY=1 in Start() so maybe we do it too early
[18:39] <zyga> took me some time to figure it out
[18:39] <zyga> oooh
[18:39] <zyga> yeah
[18:39] <zyga> definitely!
[18:39] <zyga> actually
[18:39] <zyga> let me look :D
[18:39] <zyga> I think I'm easily excited
[18:40] <mvo> zyga: ahahahaha
[18:40] <mvo> zyga: well, looking now as well where exactly the security setup happens
[18:41] <zyga> security setup happens in overlord.New
[18:41] <zyga> it's all packed there
[18:41] <zyga> but
[18:41] <zyga> maybe
[18:41] <zyga> what I'm missing is that some of it runs in a goroutine?
[18:41] <zyga> but probably not
[18:42] <zyga> mvo: if you follow overlord.New you will get to ifacestate.Manager()
[18:42] <zyga> that follows to m.initialize
[18:43] <zyga> and that to m.regenerateAllSecurityProfiles
[18:43] <zyga> one possible alternative is that there was a real refresh
[18:43] <zyga> but not sure that's even possible
[18:43] <mvo> zyga: hm, indeed
[18:44] <zyga> one thing for sure: forkstat == invaluable for corelating logs and activity
[18:44] <zyga> we can look at snap changes, journal and forkstat
[18:44] <zyga> and see what happens
[18:44] <zyga> I won't get to the bottom of this tonight but we have all the means to conclusively say what the issue is
[18:45] <zyga> mvo: as for systemd killing processes
[18:45] <zyga> it seems to just read the cgroup .procs file
[18:45] <zyga> and kill one by one
[18:45] <zyga> not the parent
[18:45] <zyga> I would love to cross check that with code and docs
[18:45] <zyga> I have not yet
[18:45] <zyga> that's from my strace'ing session
[18:45] <pedronis> zyga: there shouldn't be snapd own goroutines going (except the watchdog one) until inside start I think
[18:45] <pedronis> Start
[18:45] <zyga> mmm
[18:46] <zyga> interface manager doesn't have any ensure logic that would do security setup
[18:46] <zyga> unless hotplug and autoconnect are considered
[18:46] <pedronis> ?
[18:47] <zyga> I meant that if there are no things going on in the background outside of the state manager then this must be synchronously running from overlord.New
[18:47] <mvo> zyga: do you have a current jounalctl -u snapd log?
[18:47] <zyga> er, I did but I closed the session
[18:47] <pedronis> zyga: that's also what I said
[18:47] <zyga> I can restart and give you one in a moment
[18:47] <zyga> pedronis: then we are in agreement and this suggests that snapd was stopped while it was still starting
[18:47] <mvo> zyga: just curious what we see there (e.g. snapd starting and then the error right after that)
[18:47] <zyga> mvo: started now
[18:48] <zyga> I will pastebin the three logs when I have them
[18:48] <pedronis> zyga: but regenerateAllSecurityProfiles is called by initialize
[18:48] <pedronis> and does security setup, no?
[18:48] <zyga> yed
[18:48] <zyga> *yes it does
[18:48] <pedronis> that's the bit I was confused
[18:48] <zyga> it's on the direct call path from overlord.New
[18:48] <pedronis> yes
[18:49] <pedronis> anyway no Ensure is called before do Loop in Start
[18:49] <cachio> Saviq, we dont test on rawhide
[18:49] <zyga> hmm, overlord.Start is called and then we tell systemd we're ready
[18:50] <zyga> I'm sorry, I meant overlord.Loop
[18:50] <pedronis> yes
[18:52] <cachio> Saviq, so far we test on f28
[18:52] <cachio> Saviq, and we are gonna test on f29 soon
[18:53] <cachio> Saviq, no plans for rawhide, why?
[18:54] <Saviq> cachio: we're ~testing Mir on rawhide, was wondering if you have a reliable solution
[18:54] <Saviq> so we're upgrading from f28 to rawhide in spread, but that's not really reliable...
[18:55] <zyga> mvo: got it
[18:55] <zyga> mvo: Nov 28 18:55:03 nov281847-525365 snapd[30231]: helpers.go:192: cannot regenerate seccomp profile for snap "core": signal: terminated
[18:55] <zyga> mvo: forkstat http://paste.ubuntu.com/p/DqmTPympC6/
[18:56] <zyga> mvo: journal http://paste.ubuntu.com/p/4h9gcyWHCH/
[18:56] <zyga> mvo: snap changes http://paste.ubuntu.com/p/snbqrwBXMW/
[18:57] <cachio> Saviq, still not testing on f29
[18:57] <cachio> Saviq, this it blocking that https://forum.snapcraft.io/t/issue-with-repackaged-core-and-testing/8394
[18:57] <cachio> Saviq, I'll let you know if there is a plan to test on rawhide
[18:57] <zyga> mvo: (phone)
[19:02] <zyga> mvo: if you are looking at this
[19:02] <zyga> mvo: I would love to understand the origin of the restart frenzy from 18:54:58 toll 18:55:02
[19:07] <cachio> Saviq, do you need an image of f rawhide?
[19:08] <Saviq> cachio: if it's not any trouble :)
[19:08] <cachio> Saviq, I'll create it today
[19:08] <cachio> I'll ping you once it is ready
[19:08] <mvo> zyga: sorry, was distracted as well - maybe we should add a debug.log() before READY=1
[19:33] <pedronis> mvo: I gave my +1 to #6185, needs a full 2nd review, and there are already some interesting comments there from other reviewers
[19:33] <mup> PR #6185: snap: add new `snap run --trace-exec` call <Performance 🚀> <Created by mvo5> <https://github.com/snapcore/snapd/pull/6185>
[19:35] <mvo> pedronis: thank you. yeah, second review for fontconfig needed
[19:36] <mvo> pedronis: I will look at the comments from pawel in the morning
[19:40] <zyga> mvo: can you look at https://github.com/snapcore/snapd/pull/6227
[19:41] <mup> PR #6227: overlord,daemon: mock security backends for testing <Created by zyga> <https://github.com/snapcore/snapd/pull/6227>
[19:41] <zyga> I'd love to shrink the number of PRs
[19:43] <mup> PR snapd#6227 closed: overlord,daemon: mock security backends for testing <Created by zyga> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/6227>
[19:44] <mup> PR snapd#6226 closed: overlord,daemon: mock security backends for testing (2.36) <Created by zyga> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/6226>
[19:44] <zyga> thanks
[19:45] <mup> PR snapd#6229 closed: release: probe apparmor features lazily <Created by zyga> <Merged by zyga> <https://github.com/snapcore/snapd/pull/6229>
[19:45] <mvo> zyga: thank *you*
[19:45] <zyga> I'll prepare that final small PR with system key behavior
[19:45] <mvo> zyga: 6195 needs a second review
[19:45] <zyga> aha
[19:46] <zyga> looking
[19:47] <pedronis> mvo: zyga: I think I reviewed the pending things for 2.36 blocking a .2 right?
[19:47] <zyga> AFAIK yes
[19:47] <mvo> pedronis: yeah, I think we are good
[19:47] <pedronis> zyga: mvo: thanks for chasing these issues btw
[19:49]  * pedronis goes a bit afk
[19:49] <mvo> pedronis: thank you, yeah, happy that we found the issue. enjoy the evening
[19:49]  * mvo also needs to be afk
[20:29] <zyga> pushed the syskey handling change
[20:29] <mup> PR snapd#6233 opened: overlord: don't write system key if security setup fails <Created by zyga> <https://github.com/snapcore/snapd/pull/6233>
[20:29] <mup> PR snapd#6234 opened: overlord: don't write system key if security setup fails (2.36) <Created by zyga> <https://github.com/snapcore/snapd/pull/6234>
[20:29] <zyga> I will add unit tests tomorrow, it's far too late anyway
[20:29] <zyga> ttyl!
[20:38] <mup> PR snapd#6235 opened: overlord,apparmor: new syskey behaviour + non-ignored snap-confine profile errors <Created by zyga> <https://github.com/snapcore/snapd/pull/6235>
[20:50]  * roadmr repings jdstrand
[21:03] <jdstrand> roadmr: sorry, still going through backscroll from being off. what's up?
[21:04] <roadmr> jdstrand: wanted to ask! is it possible to make a plug in one snap automagically connect to another snap's slot? I know the IDs of both snaps.
[21:24] <jdstrand> roadmr: yes
[21:25] <roadmr> \o/ jdstrand how would I go about it? I'm happy to study existing examples if you point me to some
[22:14] <zyga> hey jdstrand, long time no see
[22:30] <jdstrand> zyga: hey, yeah