[05:49] morning [06:19] mborzecki: hi! [06:20] mardy: any new findings since yesterday? [06:28] mborzecki: nope [06:51] PR snapd#10848 closed: interfaces/u2f-devices: add GoTrust Idem Key (https://launchpad.net/bugs/1945182) [06:55] mvo: thanks for landing https://github.com/snapcore/snapd/pull/10845, i see it's already set for 2.52 [06:55] PR #10845: interfaces/seccomp: add clone3 to default template [06:56] PR snapd#10845 closed: interfaces/seccomp: add clone3 to default template [07:01] morning [07:06] mborzecki: yeah, I cherry picked it already iirc [07:08] mvo: thank you! [07:23] good morning pstolowski, mvo [07:24] mborzecki: this is another debug attempt, I want to see if it would work if systemd handled it all itself: https://github.com/mardy/snapd/commit/7aa7bca91a4a243e06c6a72bd6aacd9cdc12d457 [07:24] so far appears to be working, but I'll try in the afternoon, when it usually fails [07:28] mborzecki: actually, even not going that far: since our snap is run in a scope, we do not need to move the PID and create a subgroup, right? (opposed to what I was doing yesterday, when I enabled delegation and continued creating a subgroup) [07:37] mardy: hmm actually the docs aren't clear, groups upper in the hierarchy are managed by systemd, below that unit are managed by you, but how about the unit then? [07:38] in the meantime, i think we have real trouble with nvidia and new glibc now [08:21] mardy: out of curiosity i've added the synchronization with JobRemoved signal, I can propose a branch, so that you can pull it, rebase your changes on top and check whether it's behaving better or not [08:22] bit ugly but not too complicated [08:50] mborzecki: ok, though I'm sure it will work (not necessarily because it fixes anything, but just because it adds a delay :-) ) [08:52] hehe, actually the sync with signal is only marginally slower on my system [08:58] jamesh: hi [08:59] pstolowski: hi [09:02] jamesh: do you have a moment for a quick chat about notifications? [09:02] sure [09:04] jamesh: ok, you should have got an invite via email [10:03] mardy: busy loop: 1.26ms mean ± 0.3ms, signal wait: 3.2ms mean ± 0.9ms, N=10 [10:03] on my system [10:22] mborzecki: on the spread the busy loop is a bit more than 4ms [10:32] mborzecki: do you know the reason why we are not using SC_DEVICE_CGROUP_FROM_EXISTING when setting up the cgroup from udev-support.c? [10:33] ah, maybe I'm misinterpreting what that flag does [10:40] mardy: hmm? [10:44] mborzecki: I thought that the flag was used to decide if a separate cgroup had to be created, but it appears that it rather decides only if the directory must be created [10:46] jamesh: i forgot to address the point about GIcon (what you described under the original forum post), I should probably address this in the already opened PR [11:09] mborzecki: I'm running the tests with another possible fix, which so far seems to be working: setup the devices cgroup in the snap scope, without creating another subgroup. In this way the PID is not moved to another group by snap-confine. [11:09] that seems like a good idea mardy [11:10] zyga-mbp: the other change I tested this morning was more drastic, that is leaving it all up to systemd: https://github.com/mardy/snapd/commit/7aa7bca91a4a243e06c6a72bd6aacd9cdc12d457 [11:10] zyga-mbp: of course the devices should not be hardcoded like that, it was just for testing :-) [11:11] remember that hotplug must not regres [11:11] *regress [11:11] mardy: that seems ok and is more in line with v2, I see problems with snap-device-helper though, as right now for v1 and v2 we modify just one resource (group in case of v1, or a map for v2) [11:15] mborzecki: mmm... can you give me some additional pointers? [11:15] mardy s-d-h modifies the existing cgroup on hotplug events [11:16] mborzecki: ah, but then your comment was referring to the "let it all up to systemd" solution, right? [11:17] mardy: no the snap scope thing, i understand you let systemd create /sys/fs/cgroup/devices/..../snap.....scope and then modify devices.allow there, right? [11:17] mborzecki: yep [11:17] so in this snap-device-helper would have to traverse the whole /sys/fs/cgroup/devices tree looking for groups that are named like snap scopes (which we already do for v2 but for other purpose) [11:18] mborzecki: but we do have a say on the naming of the transient scope, don't we? [11:19] mardy: yes, we name in a specifiic way, that's what the current code relies on for instance in refresh app awareness [11:19] mardy: but you don't have a say in the orgranization of parent groups afaik [11:20] mborzecki: the transient snap scope is created right under /sys/fs/cgroup/devices/ [11:21] ah, I see, you mean that globs don't exist in syscalls, so we'd have to list the contents of /sys/fs/cgroup/devices/ and pick the one which starts with the right prefix; is this what you mean? [11:21] mardy: yes, can you paste the output of `tree sys/fs/cgroup/devices`? [11:23] mborzecki: uh... unfortunately not: I'm running the tests, and for some reason I cannot open a secondary ssh connection (it claims that the password is wrong, even though I'm copy-pasting it...) [11:24] oh, the spread machine is stuck :-D [11:35] hehe [11:37] meh, chasing down tests that do something funny with the session [11:38] btw. `su - .. test` enters some silly mode if the test user session isn't up, but if it's up, things work fine, i'm not even sure we need to use `tests.session -u test exec` to run commands, i started fixing tests to use that but noticed along the way that just earlier session prepare is teardown [11:38] s/teardown/and teardown is enough/ [11:38] PR core#125 closed: Copy dpkg.yaml for LP Buildd [11:42] got another spread machine, and still it won't accept the password [11:43] PR core#126 opened: Makefile: refactor to have a common $TARGET_BASENAME [11:44] ah, and after I do that, the spread machine gets stuck again :-o [11:45] mvo: so, the problem was the missing '"'?? [12:29] mardy: I think the missing ";" actually [12:29] mardy: but let's see, test building this is a bit annoying [12:47] PR snapd#10853 opened: tests: use the latest cpu family for nested tests execution [13:42] cachio_: does debian-sid no longer support NTP for some reason ? https://pastebin.ubuntu.com/p/qCZ3YzwnMP/ [13:42] first time I've seen that one fail [13:57] fun, our dbustest seems to be racy [13:57] does 10173 need a security review or is that good to go as is? (the polkit security backend) [13:58] PR snapd#9773 closed: interfaces/apparmor: do not fail during initialization when there is no AppArmor profile for snap-confine [13:58] PR snapd#10571 closed: daemon: implement access checkers for themes API [14:18] eh and dbustest seems to a bit flaky with injecting messages [14:23] mborzecki: ok, my fix does not work: sometimes the device cgroup is just set to "/user.slice", even though we are not moving our PID anymore [14:24] mborzecki: what I don't understand is why the loop check we have in tracking.go is not enough. It checks the cgroup that our PID belongs to, and only loops out when it's the correct one. [14:25] is our PID being moved out and in again? Does not make any sense... [14:27] mborzecki: can you push what you have so far? I'd like to try it, since at this time the bug is relatively easy to trigger [14:46] mborzecki: I now notice that cgroup.ProcessPathInTrackingCgroup() returns true if the name=systemd controller matches (for cgroups v1), which is probably not enough (we also want the "devices" controller to be setup) [14:57] mardy: the check only ensures that the process is in the right group in the systemd hierarchy, which isn't associated with any controller (at least not in v1) [14:58] mardy: my changes are in this branch: https://github.com/bboozzoo/snapd/tree/bboozzoo/snap-run-wait-systemd-group-job-done [15:05] mardy: well, it does the right thing, we're asking it for the tracking group, aren't we? [15:12] mborzecki: yes, from the POV of the master branch; not for my modifications (but that's my problem :-D) [16:03] PR snapd#10854 opened: spread: use `bios: uefi` for uc20 [17:37] mborzecki: still there? I've been running the tests on your branch for about 2 hours, it never failed [17:51] mardy: yay, that's good news then 😉 let's sync tomorrow morning [21:29] PR snapd#10855 opened: tests: reset some mount units failing on ubuntu impish