mupPR snapd#9495 opened: logger: use KernelCommandLineSplit to parse debug flag <Simple 😃> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9495>00:09
mupPR snapd#9495 closed: logger: use KernelCommandLineSplit to parse debug flag <Simple 😃> <Created by cmatsuoka> <Closed by cmatsuoka> <https://github.com/snapcore/snapd/pull/9495>03:05
mupPR snapd#9494 closed: logger: use strutil.KernelCommandLineSplit in debugEnabledOnKernelCmdline <Simple 😃> <Skip spread> <Created by mvo5> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/9494>05:45
zyga-x240mborzecki: o/05:51
mborzeckizyga-x240: hey05:51
zyga-x240mborzecki: I think we should revert bits of the speedup changes05:51
zyga-x240it's been hanging yesterday05:51
zyga-x240or investigate and fix05:51
zyga-x240it may be a python-version-specific bug or just a general bug05:51
mborzeckizyga-x240: did the change where we respect workers count land?05:53
zyga-x240I think so05:53
zyga-x240afk for some time, lucy just woke up05:54
mupPR snapd#9480 closed: snap: support different exit-code in the snap command <Created by mvo5> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9480>06:31
zyga-x240mvo: we need to fix spread-shellcheck06:36
mborzeckizyga-x240: tests/unit/go hangs on 20.04 too right?06:38
mvozyga-x240, mborzecki yeah, something is wrong, I was just starting a spread google run to see what is going on. do you have an idea already?06:39
zyga-x240mborzecki: not sure06:42
zyga-x240mborzecki: I just don't know06:42
zyga-x240mvo: not immediately, either bug in older python (related to recursive executor submit) or something else06:42
mborzeckitrying with --max-procs=2 might be useful06:43
mborzeckizyga-x240: otoh, the unit tests job as run by gh actions does not seem to fail or hang for that matter06:46
zyga-x240yeah, that's what makes me think it may be python version-specific behavior06:46
zygaI'm finishing my breakfast now, I'll start in a moment06:48
mborzeckizyga: looking at one of my PRs, it failed on 20.04, failed on 18.04 (though tests/unit/spread-shellcheck test) hit kill-timeout in both cases, 16.04 passed06:49
zygait's all over the place06:49
zygashall we revert and fix this async?06:49
zygaI'd rewrite the code so that it returns todo units like what jamesh suggested06:50
zygathen it's one loop and one executor only06:50
mborzeckizyga: sounds like more work though ;)06:50
mborzeckizyga: anyways, i think we should revert that last change06:50
zygathat's why separate actions 1) revert 2) fix06:50
zygamvo how does that sound?06:51
mborzeckialso tests/unit/spread-shellcheck duplicates work right now, run-checks is run in tests/unit/go06:51
mborzeckiso either we can disable spread-shellcheck in tests/unit/go or drop the other test06:51
mvomborzecki: I think we can kill tests/unit/spread-shellcheck now06:51
mborzeckimvo: sgtm06:52
mborzeckirunning manually now, those spread nodes have 1 cpu06:54
mborzeckizyga: heh, so some deadlock, a number of jobs submitted, nothing happening, cpu usage 0%06:55
zygathat's quick06:55
pstolowskigood morning!06:57
zyga-x240pstolowski: hello06:58
mborzeckizyga-x240: https://paste.ubuntu.com/p/ZqDTCcFvnb/ heh (cc mvo)06:58
mborzeckipstolowski: hey06:58
mborzeckizyga-x240: only 2 threads and both are waiting06:58
zygaand good idea to use gdb!06:59
zygaso one is running checkpaths07:00
zygagoing through each location07:00
zygawhile the other is in checkfile07:01
zygawaiting for the result07:01
zygamborzecki perhaps just ensuring we have 3 workers minimum :P07:01
zygaa bit lame but ...07:01
zyga(as in the minimum N)07:01
zygamborzecki what do you think?07:01
mborzeckizyga: --max-procs 3 seems to work07:05
jameshIn the end, you really want the code waiting for futures moved outside of the thread pool07:06
zygayeah, I think that's the proper solution07:07
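A minimal sketch of the shape jamesh describes, assuming the hang comes from a job running inside the pool that submits further jobs to the same pool and then blocks on their results. With too few workers, every worker ends up waiting and nothing is left to run the submitted jobs. The names check_file and check_paths are hypothetical stand-ins here, not the actual spread-shellcheck code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_file(path):
    # Hypothetical stand-in for the real per-file check.
    return path, path.endswith(".yaml")


def check_paths(locations, max_workers=1):
    """Submit one job per location and wait for all of them.

    Both the submitting and the waiting happen in the calling thread,
    so no worker ever blocks on another job and the pool cannot
    deadlock, even with a single worker.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(check_file, p) for p in locations]
        for fut in as_completed(futures):
            path, ok = fut.result()
            results[path] = ok
    return results


if __name__ == "__main__":
    print(check_paths(["task.yaml", "README.md"], max_workers=1))
```

The "+1 worker" workaround keeps the nested submit and only guarantees a spare thread; moving the waiting out of the pool removes the dependency on the worker count altogether.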
jameshnext step: overengineer it with asyncio07:11
zygajamesh no no ;)07:14
jameshzyga: but just think of all the threads you'd save!07:15
mvomborzecki: nice find!07:16
mborzeckijamesh: save all the threads!07:16
mborzeckihmm so maybe we just need +1 worker really07:21
zygamborzecki I think that's the easy fix07:22
zygaand we should rewrite that slightly as jamesh mentioned07:22
mborzeckizyga: i'd land an easy fix with a comment, and then maybe work on the larger fix/refactor07:22
mborzeckizyga: fwiw i can reproduce this locally with --max-procs=107:24
zygayeah, my fault for testing on my beefiest system07:24
mborzeckizyga: are you opening a quick pr with the workaround?07:41
zygamborzecki nope, I thought you want to do that07:41
zygaI can though07:41
mborzeckizyga: no worries, i can push it07:41
pstolowskimvo: hi! #8960 got +1 from Samuele and has 3 reviews; would be great to land at the most convenient moment after cutting a new release branch. perhaps it would make sense to squash-merge it in case of anything unexpected (and a need for a revert)?07:42
mupPR #8960: o/snapstate,servicestate: use service-control task for service actions (9/9) <Needs Samuele review> <Services ⚙️> <⛔ Blocked> <Created by stolowski> <https://github.com/snapcore/snapd/pull/8960>07:42
zyga-x240I think I need to solve the blockers on https://github.com/snapcore/snapd/pull/920407:47
mupPR #9204: sandbox: track applications unconditionally <Created by zyga> <https://github.com/snapcore/snapd/pull/9204>07:47
zyga-x240as that's really required to enable r-a-a07:47
zyga-x240mborzecki: IIRC fedora will disable getenforce/setenforce soon07:50
zyga-x240how will that affect our test suite?07:50
mborzeckizyga-x240: hm, got more info?07:50
zyga-x240mborzecki: one sec07:50
mupPR snapd#9496 opened: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple 😃> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9496>07:51
mborzeckizyga-x240: it's ok for us, we only switch between permissive/enforcing07:52
mborzeckizyga-mbp: mvo: ^^ 949607:57
mvomborzecki: looking08:00
mupPR snapd#9497 opened: Have session agent connect to the D-Bus session bus <Created by jhenstridge> <https://github.com/snapcore/snapd/pull/9497>08:01
jameshzyga-x240: ^^ this PR might help out a bit with your notifications work.08:03
zyga-x240jamesh: interesting08:04
zyga-x240I was using the other socket for broadcast but this may be useful as well08:04
jameshother socket?08:05
zyga-x240jamesh: the snapd-user-agent socket08:05
zyga-x240not dbus :)08:05
zyga-x240why do we have to be launched by systemd?08:06
jameshzyga-x240: we're already launched by systemd08:06
zyga-x240that's a dbus service08:06
zyga-x240got confused for a sec08:06
jameshwe want to make sure that if we get activated via D-Bus first, we still get our file descriptor08:06
jameshfor REST08:06
jameshI'm not suggesting replacing the snapd <-> agent communication with D-Bus08:07
jameshthis would be for a return path of the agent <-> desktop shell communication08:07
zyga-x240jamesh: I've sent a quick review just now08:10
zyga-x240something weird on centos 708:22
zyga-x240type=SYSCALL msg=audit(10/13/20 08:11:54.336:587) : arch=x86_64 syscall=kill success=yes exit=0 a0=0xffffffffffffa7e8 a1=SIGKILL a2=0x0 a3=0xf91e50 items=0 ppid=1 pid=22326 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=snapd exe=/usr/libexec/snapd/snapd subj=system_u:system_r:snappy_t:s0 key=(null)08:30
zyga-x240type=AVC msg=audit(10/13/20 08:11:54.336:587) : avc:  denied  { sigkill } for  pid=22326 comm=snapd scontext=system_u:system_r:snappy_t:s0 tcontext=system_u:system_r:snappy_cli_t:s0 tclass=process permissive=108:30
zyga-x240type=SYSCALL msg=audit(10/13/20 08:01:54.377:396) : arch=x86_64 syscall=connect success=yes exit=0 a0=0x8 a1=0xc000320f10 a2=0x22 a3=0x4 items=0 ppid=22326 pid=22552 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=snap exe=/usr/bin/snap subj=system_u:system_r:snappy_cli_t:s0 key=(null)08:31
zyga-x240mborzecki: do you know how to recover from: Corrupted checkpoint file. Inode match, but newer complete event (1602577959.287:791) found before loaded checkpoint 1602577772.146:79008:33
zyga-x240that's from ausearch08:33
zyga-x240nvm I got it08:34
zyga-x240type=AVC msg=audit(10/13/20 08:32:39.287:791) : avc:  denied  { connectto } for  pid=23299 comm=snap path=/run/dbus/system_bus_socket scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=108:34
zyga-x240type=AVC msg=audit(10/13/20 08:32:39.287:791) : avc:  denied  { search } for  pid=23299 comm=snap name=dbus dev="tmpfs" ino=13502 scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:object_r:system_dbusd_var_run_t:s0 tclass=dir permissive=108:34
zyga-x240those are the first two denials to fix08:34
* zyga-x240 tries to adjust the policy08:45
mborzeckizyga-x240: did you manage to fix it?08:57
mborzecki(the policy i mean)08:57
zyga-x240mborzecki: I'm slow at iterating at this08:58
zyga-x240not yet08:58
zyga-x240mborzecki: should we bump the snapd policy version perhaps? :)08:59
zyga-x240maybe it should match snapd version08:59
mborzeckizyga-x240: try with dbus_stream_connect_system_dbusd(snappy_cli_t) and dbus_chat_system_bus(snappy_cli_t)09:00
mborzeckizyga-x240: perhaps we should, though we never did09:00
mborzeckizyga-x240: setting it to version of snapd sounds ok09:01
zyga-x240thanks, trying09:01
mborzeckizyga-x240: the modules in core policy have a version bump on each change (in theory)09:02
zyga-x240I suspect nothing just cares about this number09:02
zyga-x240just an observation, it's not important09:02
mborzeckizyga-x240: there may be some automation under the hood, in our development we replace the module by hand, so it automatically gets a higher priority than the currently loaded one09:03
zyga-x240mborzecki: I'll read https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/services/dbus.te and friends to see if there's something we should use09:09
mborzeckizyga-x240: though you really want to read this: https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/services/dbus.if09:12
zyga-x240(I meant all three)09:13
mborzeckizyga-x240: the *.te (type enforcement?) file is for the actual system dbus daemon, *.fc (file context?) is the files/sockets/dirs09:13
mborzeckizyga-x240: *.if is the interfaces for use from other modules09:13
zyga-x240mborzecki: is all the comment XML in those .if files processed by anything?09:22
zyga-x240is there a "compiled" version anywhere?09:22
mborzeckizyga-x240: yes, there should be documentation in your system, though i don't think it is available anywhere online09:23
zyga-x240ah, ok09:23
zyga-x240I'll check on F3209:23
mborzeckizyga-x240: latest tech :P09:23
zyga-x240mborzecki: more denials: https://pastebin.ubuntu.com/p/5K5TTpKwkT/09:28
zyga-x240I think the blocker is type=AVC msg=audit(10/13/20 09:16:35.588:270) : avc:  denied  { search } for  pid=22502 comm=snap name=dbus dev="tmpfs" ino=13389 scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:object_r:system_dbusd_var_run_t:s0 tclass=dir permissive=109:28
zyga-x240but the rest is also interesting, it seems snapd cannot stop the hook09:29
zyga-x240(after a timeout)09:29
mborzeckiyeah, sigkill and all09:29
mborzeckizyga-x240: dbus_system_bus_client(snappy_cli_t) should do it09:30
mupPR snapd#9293 closed: snap: auto-import will not try to auto-create users on managed devices <Needs Samuele review> <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/9293>09:31
mupPR snapd#9498 opened: client,daemon,snap: auto-import does not error on managed devices <Needs Samuele review> <Run nested> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9498>09:31
mborzeckizyga-x240: and you can probably drop dbus_connect_system_bus(snappy_cli_t) looks like it duplicates some of the things from *_bus_client09:32
pedronis#9463 needs 2nd reviews (it's small)09:38
mupPR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>09:38
mupPR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>09:44
zyga-x240made one remark, so perhaps something to adjust before landing09:44
zyga-x240mborzecki: nice, passing09:45
zyga-x240mborzecki: we need session equivalent though09:45
zyga-x240I don't see any in the refpolicy gh repo09:46
zyga-x240mborzecki: rebased and pushed back to https://github.com/snapcore/snapd/pull/920409:46
mupPR #9204: sandbox: track applications unconditionally <Created by zyga> <https://github.com/snapcore/snapd/pull/9204>09:46
zyga-x240pedronis: should I push a trivial patch for https://github.com/snapcore/snapd/pull/946309:55
mupPR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>09:55
zyga-x240for https://github.com/snapcore/snapd/pull/9463#discussion_r50381393509:55
pedroniszyga-x240: if you have time yes09:55
zyga-x240on it09:56
pedronismvo: should we close #8845 and #8929 until we have time to discuss/adjust them and repropose?09:56
mupPR #8845: [RFC] many: add "system.service.snapd-autoimport.disable" setting <Needs Samuele review> <⛔ Blocked> <Created by mvo5> <https://github.com/snapcore/snapd/pull/8845>09:56
mupPR #8929: [RFC] many: add new "daemon-startup: inhibit" option <Needs Samuele review> <Created by mvo5> <https://github.com/snapcore/snapd/pull/8929>09:56
mvopedronis: sure09:57
zyga-x240https://github.com/snapcore/snapd/pull/9463/commits/26f6b6680027ea9b1262a03b27a28ac6f3d60a9e if anyone wants to cross-check10:01
mupPR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>10:01
mupPR snapd#8845 closed: [RFC] many: add "system.service.snapd-autoimport.disable" setting <Needs Samuele review> <⛔ Blocked> <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/8845>10:02
zyga-x240mborzecki: any idea on how to provide sigkill and other permissions10:14
zyga-x240I think we're missing a test10:14
zyga-x240that we can kill a hook that is running for too long10:14
zyga-x240(or miss a test that runs selinux checks)10:14
zyga-x240mborzecki: do you think we could move the "no denials" check to an invariant?10:14
mborzeckizyga-x240: hm this should do it `allow snappy_t snappy_cli_t:process { sigkill };`10:19
mborzeckizyga-x240: wondering though, why the hook process was still snappy_cli_t10:19
mborzeckican you reproduce that and grab ps -Z ?10:19
zyga-x240I will try10:19
zyga-x240ah I know why10:21
zyga-x240mborzecki: because that was still the "snap run" phase10:21
zyga-x240we got a denial on dbus method call10:21
zyga-x240and got stuck waiting for a response10:21
zyga-x240mborzecki: I wonder if we should setup a timeout / guard of some sort10:22
zyga-x240but that explains the label10:22
zyga-x240we switch that in snap-exec10:22
=== JanC_ is now known as JanC
zyga-x240mborzecki: what is the label we transition to when we run as a snap app?10:25
zyga-x240I added this10:25
zyga-x240# snapd tries to kill hooks that run for over 10 minutes.10:25
zyga-x240allow snappy_t snappy_cli_t:process { sigkill };10:25
mborzeckizyga-x240: if it's under systemd the final label is unconfined_service_t10:25
zyga-x240but I think we need more than that10:25
zyga-x240not under systemd10:26
zyga-x240those are specifically hooks10:26
zyga-x240systemd just tracks them10:26
zyga-x240not spawns them10:26
mborzeckizyga-x240: right, but the transitions take the same route iirc10:26
mborzeckizyga-x240: so the hook ends up as unconfined service too10:26
zyga-x240ok, that's good then10:28
zyga-x240I'll add two lines10:28
* zyga-x240 tests killing w10:29
zyga-x240with disabled dbus perms10:30
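A rough illustration of how the `allow snappy_t snappy_cli_t:process { sigkill };` rule above follows from the sigkill AVC denial pasted earlier: the source type, target type, class and permission are all present in the audit line. The helper name allow_rule_from_avc is hypothetical, and the parsing assumes the `user:role:type:level` context layout seen in this log. The audit2allow tool does this properly, and as noted in the discussion, refpolicy interfaces such as dbus_system_bus_client() are usually preferable to raw allow rules where one exists.

```python
import re


def allow_rule_from_avc(line):
    """Derive a raw SELinux allow rule from one AVC denial line.

    Returns None if the line does not look like an AVC denial.
    Assumes contexts of the form user:role:type:level.
    """
    perms = re.search(r"denied\s+\{([^}]*)\}", line)
    scon = re.search(r"scontext=(\S+)", line)
    tcon = re.search(r"tcontext=(\S+)", line)
    tclass = re.search(r"tclass=(\w+)", line)
    if not (perms and scon and tcon and tclass):
        return None
    # The type is the third colon-separated field of a context.
    source_type = scon.group(1).split(":")[2]
    target_type = tcon.group(1).split(":")[2]
    perm_list = " ".join(perms.group(1).split())
    return "allow %s %s:%s { %s };" % (
        source_type, target_type, tclass.group(1), perm_list)


if __name__ == "__main__":
    denial = (
        "type=AVC msg=audit(10/13/20 08:11:54.336:587) : avc:  denied  "
        "{ sigkill } for  pid=22326 comm=snapd "
        "scontext=system_u:system_r:snappy_t:s0 "
        "tcontext=system_u:system_r:snappy_cli_t:s0 "
        "tclass=process permissive=1"
    )
    print(allow_rule_from_avc(denial))
```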
pstolowskipedronis: hi, i've spent quite a bit investigating undo on remove, found a couple of issues and filed https://bugs.launchpad.net/snapd/+bug/1899614 (and also worked on addressing point #2 there)10:38
mupBug #1899614: multiple problems with undo for 'snap remove'  <snapd:New> <https://launchpad.net/bugs/1899614>10:38
zyga-x240pstolowski: rogpeppe is around and could provide extra information about the other bug we've discussed11:01
pstolowskizyga-x240: thanks, but i don't have anything specific atm, but mvo had requested some info there before, would be good to add it if possible11:02
pstolowskirogpeppe ^11:02
rogpeppepstolowski: ok, i've added that command output11:03
zyga-x240thank you!11:03
rogpeppei don't seem to get email notifications from launchpad, so please ping me here if you want any further interaction from me in that issue, thanks!11:04
zyga-x240ah, will do11:05
pstolowskiok, that log is very interesting.. we saw " there was a rollback across reboot" before, didn't we?11:06
pstolowskithis at least explains the refresh and undo on Oct 5 that I also saw in the state timings; but what happened after next 2 reboots and why did it work till Oct 7th is a mystery11:08
zyga-x240pstolowski: I don't recall how this layer works, what would happen if the boot partition is corrupted (vfat) and we cannot really set the next boot to anything different11:08
zyga-x240like it's always stuck at one thing11:08
zyga-x240would that explain anything?11:08
pstolowskii've no idea about that code11:08
zyga-x240rogpeppe: I think you could try unmounting the boot partition (vfat) and fscking it11:08
zyga-x240rogpeppe: also if you want to recover the system, we should be able to help11:08
zyga-x240just not sure what's the best way to do that11:08
zyga-x240I was also talking to mborzecki about this issue11:09
zyga-x240and we don't remove snapd.service from disk11:09
zyga-x240rogpeppe: could you check if snapd.service is in /etc/systemd/system?11:09
pstolowskiafaiu the system works again, no? logs from oct1211:09
zyga-x240rogpeppe: oh, is snapd running now?11:09
pstolowskimaybe i'm making assumptions.. but afaiu it stopped working on 7th?11:10
rogpeppeone mo, let me check11:10
rogpeppethe system isn't working currently11:10
rogpeppesnapd does seem to be running:11:11
rogpepperogpeppe@localhost:~$ ps alxw | grep snapd11:11
rogpeppe4     0   725     1  20   0 928048 18712 -      Ssl  ?          2:57 /snap/snapd/9169/usr/lib/snapd/snapd11:11
pstolowskirogpeppe: ha, that's weird!11:11
zyga-x240can you run /snap/snapd/9169/usr/bin/snap?11:11
pstolowskirogpeppe: and no /snap/snapd/current symlink right?11:11
rogpeppe /snap/snapd/current is still not there, right11:11
rogpeppei can run /snap/snapd/9169/usr/bin/snap ok11:12
zyga-x240pstolowski: ^ and ideas on what to explore?11:12
zyga-x240rogpeppe: maybe snap list --all11:12
zyga-x240for a first sanity check11:12
rogpeppei could give you a login to the system if you want11:12
zyga-x240then maybe snap install snapd?11:12
zyga-x240pstolowski: do you want to debug this?11:12
rogpeppethis is the output of snap list --all: https://paste.ubuntu.com/p/Rwvr2dPD9M/11:14
pstolowskirogpeppe: yes, sure, that would be great11:14
zyga-x240core           16-2.46.1           9995  latest/stable     canonical*    core11:15
zyga-x240core is linked11:15
pstolowskido we do anything magical where snapd would start without current symlink after reboot? (don't think so, but...)11:15
zyga-x240but the boot base is core18, right?11:15
zyga-x240rogpeppe: what's /meta/snap.yaml snap name?11:15
* rogpeppe tries to remember how to grant ssh access. so rusty!11:15
zyga-x240pstolowski: I don't think we do11:15
zyga-x240rogpeppe: you can try ssh-import-id11:15
zyga-x240not sure if it's preinstalled11:15
rogpeppethat's the command i was trying to remember!11:15
zyga-x240cool :)11:16
zyga-x240mvo: ^ very interesting bug11:16
zyga-x240lots for us to learn on robustness11:16
rogpeppepstolowski: is your launchpad username pstolowski ?11:16
rogpeppeok, no ssh-import-id command11:17
pstolowskirogpeppe: 1 sec, i will import ssh keys from my current box11:17
mvozyga-x240: is there more news?11:19
zyga-x240mvo: yes, we have access to the device11:19
zyga-x240snapd is disabled!11:19
mvowoah, how did that happen :( ?11:20
zyga-x240the boot partition is still corrupted I suspect11:20
pstolowskirogpeppe: my lp user is stolowski11:20
zyga-x240we can get all the logs11:20
zyga-x240mvo: missing undo in unlink snap, I bet11:20
zyga-x240but could be something much more complex11:20
pstolowskii'm not sure it's this zyga-x24011:20
pstolowskibut cannot exclude it of course11:20
mvothank you so much rogpeppe !11:21
zyga-x240pstolowski: ack11:21
pstolowskimvo, zyga-x240 any suggestions what to collect? anything regarding boot?11:21
* mvo hugs pstolowski and zyga-x240 for their tireless debugging also11:21
rogpepperogpeppe@localhost:~/.ssh$ ed11:21
rogpeppe-bash: ed: command not found11:21
zyga-x240pstolowski: maybe to be safe collect all journal logs11:21
zyga-x240rogpeppe: vi is there11:21
zyga-x240rogpeppe: you can also echo >>11:21
rogpeppeyeah, i'll use cat11:21
rogpeppei don't use an ANSI terminal so vi isn't good for me11:21
pstolowskizyga-x240: is tar /var/log.. good enough, or is there a better way?11:23
pstolowskithanks rogpeppe, checking11:23
zyga-x240I think that's good11:23
zyga-x240pstolowski: you can use journalctl with a standalone directory to examine machine logs without journald itself11:23
pstolowskizyga-x240: yeah i did it once.. slightly inconvenient but works11:24
zyga-x240pstolowski: journalctl -D /path/to/var/log/journal11:25
zyga-x240and then it works IIRC11:25
zyga-x240mborzecki: is spread-shellcheck fixed?11:31
zyga-x240I see the PR11:31
amurrayzyga-x240: hey so you mentioned re docker-support/multipass-support being broken with aa3 on groovy that you would prefer a snap-update-ns approach - can you elaborate more on what you are thinking here? I am not sure I understand exactly what you have in mind.11:36
zyga-x240I was thinking that the special interfaces they rely on could provide a mount profile that puts the base snap's apparmor config in /etc11:36
pstolowskimvo: anything re boot env that can be useful?11:37
zyga-x240something like mount --bind /snap/core18/current/etc/apparmor.d /etc/apparmor.d11:37
zyga-x240pstolowski: if you can, try fscking the boot partition11:37
zyga-x240dd it11:37
zyga-x240to analyze post-mortem11:37
zyga-x240you may want to flip it read only for that operation11:37
zyga-x240or unmount it11:37
zyga-x240do you remember that vfat bug we ran into before?11:38
amurrayzyga-x240: ok so is this already easily possible with the existing way that interfaces are declared? I am not super familiar with that...11:38
zyga-x240amurray: I believe it should, the only thing that would be required in addition to this, is the permission for snap-update-ns to do this as well11:38
zyga-x240amurray: if that's urgent I could look11:38
zyga-x240amurray: but do look at the mount profile part11:38
zyga-x240the apparmor part should be easy once that is in the works11:38
zyga-x240you can test this by making a snap that uses the new interface (or the vanilla original snaps)11:39
zyga-x240and looking at the generated mount profile in /var/lib/snapd/apparmor/mount/11:39
zyga-x240there are some examples11:39
zyga-x240for instance, the desktop interface uses this mechanism to bind mount fonts around11:39
amurrayzyga-x240: oh can you point me at examples, I am still confused 😕11:39
amurrayah ok will take a look11:39
zyga-x240amurray: in the snapd tree please look at interfaces/builtin/desktop.go11:40
zyga-x240let me open it as well11:40
amurrayyep am just looking now11:40
pstolowskizyga-x240: i'd rather avoid any potentially destructive steps atm, would leave that to rogpeppe11:40
zyga-x240pstolowski: ok, a dd of the vfat while it is mounted would be useful as well11:40
zyga-x240even if you just stash it on the device11:40
zyga-x240not sure how large it is11:40
zyga-x240amurray: so if you scroll to line 29511:40
amurrayzyga-x240: I am guessing that AddMountEntry() would be the thing?11:40
zyga-x240you can see how it grants apparmor permissions11:41
pstolowskidd of /boot partition? nb, logs will be huuge11:41
zyga-x240there are several profiles involved11:41
zyga-x240pstolowski: the vfat11:41
zyga-x240not sure how big it is11:41
amurrayah yep and the corresponding apparmor bits - thanks :)11:41
zyga-x240amurray: the key part there is AddUpdateNSf function11:41
zyga-x240which adds a piece of text for per-snap profile for snap-update-ns11:41
zyga-x240this just needs the permission to bind /snap/{base}/*/etc/apparmor.d -> /etc/apparmor.d11:42
zyga-x240now jump to 32211:42
zyga-x240this does what you mentioned before11:42
zyga-x240the difference is that we need spec.AddMountEntry (not *User*)11:42
zyga-x240there's more11:42
zyga-x240I believe those should be permanent things11:42
zyga-x240regardless of connection11:43
zyga-x240so the method signature is different11:43
zyga-x240you can see that in ...11:43
zyga-x240if you go to interfaces/mount/spec.go:20611:43
zyga-x240there's a Slot variant just below11:43
zyga-x240the difference is in the arguments provided,11:44
zyga-x240the Permanent methods get an interface and a plug or a slot, not a connected plug / slot11:44
zyga-x240so it's just one side that you see11:44
pstolowskijournal logs are 324M, tgz11:44
zyga-x240I think that's a sensible approach11:44
zyga-x240pstolowski: oh my11:44
zyga-x240pstolowski: maybe too much11:44
zyga-x240pstolowski: not sure, if we can send that over, that's good11:44
zyga-x240but confirm with rogpeppe for sure11:44
amurrayzyga-x240: ok thanks heaps for your guidance - I'll try take a look tomorrow morning and see if I can cook something up11:44
zyga-x240amurray: let me know how this feels like11:45
zyga-x240amurray: if you get stuck I'll help11:45
pstolowskirogpeppe: ok to transfer 324M ^ ?11:45
zyga-x240amurray: which interfaces were those? docker-support and multipass-something?11:45
amurrayzyga-x240: thanks - so multipass-support iirc11:45
zyga-x240ideally we'd have a spread test that installs those snaps11:46
zyga-x240and looks at the mount profile or at the mount namespace11:46
zyga-x240I think that's the last step though11:46
zyga-x240I can definitely help11:46
amurrayyeah I was wondering if I should try and add a test with whatever fix I come up with for this but will focus on getting the right fix first and then can look at that if time permits...11:47
zyga-x240amurray: you can start with a quick failing test11:47
zyga-x240do you know how to write those?11:48
zyga-x240mkdir tests/regression/lp-XXX11:48
zyga-x240add a summary: with some info11:48
zyga-x240then execute: | (newline)(tab)false11:48
zyga-x240run that test with SPREAD_DEBUG_EACH=0 spread -debug -v google:ubuntu-20.10-64:tests/regression/lp-XXX11:49
zyga-x240in the shell install the snap you need11:49
zyga-x240use nsenter / cat to explore the files in /etc/apparmor.d11:49
zyga-x240eventually copy those ideas over to the yaml11:49
zyga-x240quit the debug shell and re-run to verify11:49
zyga-x240at some point it will measure failure11:49
zyga-x240and then that's a good start11:49
zyga-x240we have a library of helper programs that assist in writing tests11:50
zyga-x240but the best thing is you can really experience this from the point of view of a user11:50
zyga-x240and create a valid test11:50
zyga-x240that only later needs tweaking so that it fits the rest of the test stack11:50
amurraythis will be my first time writing a test so again I really appreciate the guidance - cheers11:50
zyga-x240amurray: look at various tests around, though you may stumble on more unusual test from time to time11:51
zyga-x240you can also use qemu locally11:51
zyga-x240you will need a test image, you can get that with adt11:51
zyga-x240I can find the magic line if you want to use that instead of the google backend11:51
zyga-x240just let me know11:51
zyga-x240I think, on 20.10, that is autopkgtest-buildvm-ubuntu-cloud11:51
zyga-x240you just need a few extra args to get a groovy image11:52
amurraysure any help with magic incantations are greatly appreciated :)11:52
zyga-x240drop that into ~/.spread/qemu11:52
zyga-x240as ubuntu-20.10-64.img11:52
zyga-x240and you're good11:52
zyga-x240a bit of advice that qemu tests are heavy on networking11:52
zyga-x240so you may want a good connection11:53
zyga-x240over time you can speed up with things like apt-cacher-ng11:53
zyga-x240anyway, let me know if this helps and if you get stuck on anything just ask11:53
amurraywill do - thanks again (my connection is ok, not great so will see how I fare...)11:53
amurrayok time for me to go eod - thanks again zyga-x240 for your help - have a great day11:54
zyga-x240see you later11:55
pstolowskizyga-x240: sorry, i'm not sure about dd and vfat, can you elaborate?11:58
zyga-x240pstolowski: how large is the vfat partition on that pi?11:58
zyga-x240I don't recall11:58
zyga-x240pstolowski: I wonder what's the impact of the fact that the partition is not cleanly unmounted11:59
zyga-x240and may not have been unmounted11:59
pstolowskizyga-x240: i don't see any mounted vfat partitions11:59
zyga-x240cleanly that is11:59
zyga-x240can you paste mount?11:59
zyga-x240rogpeppe: do you recall if you unmounted the boot partition last time we were looking at this?12:00
rogpeppezyga-x240: yeah, i might have12:00
zyga-x240ah, that explains things12:00
zyga-x240thank you12:00
pstolowskizyga-x240, rogpeppe i've already collected mount output12:00
zyga-x240rogpeppe: did you try to fsck that partition after unmounting it?12:01
rogpeppezyga-x240: i tried, but there's no fsck command available12:01
zyga-x240rogpeppe: oh12:01
rogpeppezyga-x240: (and no way to install one, of course :) )12:01
zyga-x240pstolowski: what's the boot base (/meta/snap.yaml will help)12:01
zyga-x240is that core18 or core?12:01
zyga-x240I see /sbin/fsck.vfat in both core and core1812:03
zyga-x240can you confirm those are on PATH pstolowski?12:03
zyga-x240rogpeppe: ^12:04
zyga-x240rogpeppe: if you can, perhaps fsck.vfat /dev/mmcblk0p{something}12:04
pstolowskizyga-x240: it's core18 - https://paste.ubuntu.com/p/Rwvr2dPD9M/12:05
zyga-x240pstolowski: right and that is the boot base for sure?12:05
zyga-x240our list output doesn't show this12:05
rogpeppezyga-x240: what's that "{something}" supposed to be a placeholder for?12:05
zyga-x240rogpeppe: the number of the partition with vfat12:05
zyga-x240lsblk can help finding it12:05
zyga-x240I think it's just p0 or p112:05
pstolowskino  fsck.vfat on path!12:06
zyga-x240pstolowski: and in /sbin/fsck.vfat?12:06
rogpeppepstolowski: feel free to run fsck...12:06
pstolowski$ ls /sbin/fsck*12:06
zyga-x240pstolowski: can you check /meta/snap.yaml to ensure that the boot base is core18 for sure, I'm surprised to see three core revisions and one core1812:06
zyga-x240let me check12:07
pstolowski"/sbin/fsck  /sbin/fsck.cramfs  /sbin/fsck.ext2  /sbin/fsck.ext3  /sbin/fsck.ext4  /sbin/fsck.minix"12:07
zyga-x240mvo: ^^^12:07
zyga-x240that's very likely a serious problem12:07
zyga-x240pstolowski: how about fsck.fat?12:07
zyga-x240is that gone too?12:07
zyga-x240I see it in my core18 snap on x86-6412:07
zyga-x240pstolowski: that's worth reporting as a separate bug with a regression test that checks that's an executable program12:08
pstolowskizyga-x240: on what path on your system?12:08
zyga-x240and that it can run --help12:08
zyga-x240pstolowski: /snap/core18/current/sbin/fsck.vfat12:08
zyga-x240that's a symlink to fsck.fat12:08
zyga-x240but I see that in the core snap as well12:09
* zyga-x240 looks at revision numbers12:09
* zyga-x240 checks stable channel12:10
zyga-x240pstolowski: waiting for your confirmation of the boot base please12:10
pstolowskizyga-x240: yeah, core has fsck.vfat here. but core18 doesn't. and it's not symlinked anywhere12:11
zyga-x240stable core has fsck12:11
zyga-x240probably core18 is at fault then12:11
zyga-x240pstolowski: but core is the boot base12:11
pstolowskizyga-x240: boot base is core1812:11
zyga-x240so what's going on?12:11
pstolowskii'm collecting all this and will soon attach to the report12:12
zyga-x240pstolowski: I think core18 didn't refresh12:12
zyga-x240it's very old12:12
zyga-x240core18 is revision 1885 here12:12
zyga-x240but 1885 in your log12:12
zyga-x240I think that could be related12:12
zyga-x240rogpeppe: consider running fsck.vfat from /snap/core/current/sbin/fsck.vfat12:12
zyga-x240then we could try running snap refresh core1812:13
zyga-x240and snap install snapd12:13
zyga-x240that may recover the system12:13
zyga-x240I think this system is just stuck at old revisions and cannot move forward12:13
zyga-x240the fsck bug was fixed12:13
zyga-x240but this device is still affected12:13
zyga-x240not sure what you think but the revision number there is really old12:14
zyga-x240pstolowski: so is snapd running now?12:14
zyga-x240the service I mean12:14
pstolowskizyga-x240: yes12:14
zyga-x240pstolowski: can you try refreshing core1812:15
zyga-x240though wait12:15
zyga-x240wait please12:15
zyga-x240that would cause a reboot12:15
zyga-x240and IIRC that's a problem12:15
zyga-x240rogpeppe: ^12:15
zyga-x240rogpeppe: is rebooting that device acceptable for you?12:15
zyga-x240or will it misbehave?12:15
zyga-x240(I think that given its state we should first fsck boot partition12:15
rogpeppezyga-x240: that's fine, although it probably won't restart12:15
zyga-x240mount it12:15
zyga-x240and look at what's there12:15
rogpeppezyga-x240: it will probably need manual intervention12:15
zyga-x240pstolowski: ^ ok, let's not reboot it yet12:15
zyga-x240pstolowski: please fsck boot12:16
rogpeppezyga-x240: it almost always needs to be restarted twice for some reason12:16
zyga-x240using core's fsck12:16
zyga-x240rogpeppe: I think because it reboots to try new core18 snap12:16
zyga-x240but fails12:16
zyga-x240because boot data is corrupted12:16
zyga-x240so it gets stuck12:16
zyga-x240(no watchdog timer)12:16
zyga-x240then reboot rolls back12:16
zyga-x240and we're back here12:16
pstolowskiyeah, i'd avoid any intervention now. i'd discuss the solution for rogpeppe and pass it to him to do12:16
zyga-x240rollback across reboot as pstolowski noted earlier12:16
pstolowskilet's discuss on the standup12:16
zyga-x240pstolowski: agreed12:16
zyga-x240rogpeppe: ^ if you can fsck is IMO safe12:17
zyga-x240then mounting the partition back12:17
rogpeppehow on earth did the boot data get corrupted anyway? i haven't made any changes to it since i installed12:17
zyga-x240that may recover everything, assuming you try to install snapd again12:17
zyga-x240rogpeppe: it's vfat12:17
zyga-x240and it's written to by both uboot and linux12:17
zyga-x240we really don't know12:17
rogpeppezyga-x240: vfat can corrupt even when you're not changing it?12:17
zyga-x240rogpeppe: we do change it on boot12:18
zyga-x240we set a flag that says "we're trying to boot that thing now"12:18
zyga-x240so that on reboot (if we fail) we don't retry12:18
rogpeppei guess that's where the issue comes from12:18
zyga-x240but boot a safe value12:18
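The try-boot flag described above can be sketched as a small state machine. This is a simplified model under stated assumptions, not snapd's or the bootloader's actual code: the dictionary keys (base, try_base, mode) and both function names are hypothetical, chosen only to illustrate the "mark as trying, roll back if never confirmed" idea.

```python
def bootloader_select(env):
    """Model of what the bootloader does at power-on.

    Mutates `env` (a stand-in for the boot environment stored on the vfat
    partition) and returns the base that would be booted.
    """
    if env["mode"] == "try":
        # First attempt at the new base: record that we are trying it.
        env["mode"] = "trying"
        return env["try_base"]
    if env["mode"] == "trying":
        # The previous attempt was never confirmed, so do not retry it;
        # fall back to the known-good base.
        env["mode"] = ""
        env["try_base"] = ""
        return env["base"]
    return env["base"]


def mark_boot_successful(env):
    """Model of the running system confirming that the new base booted."""
    if env["mode"] == "trying":
        env["base"] = env["try_base"]
        env["try_base"] = ""
        env["mode"] = ""
```

In this model a failed attempt needs two power cycles to get back to a working system: the first boots the new base and hangs, the second sees the unconfirmed "trying" flag and rolls back. Both steps write to the boot environment, which is why the partition is modified on boot even when the user changes nothing.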
zyga-x240we found some issues with uboot FAT before12:18
zyga-x240those got fixed12:18
zyga-x240or was that GRUB fat12:18
zyga-x240I think you can fix the partition so that it's not unclean12:18
zyga-x240mount it12:18
zyga-x240so that snapd can write to it12:18
zyga-x240install snapd using the snap binary from core snap12:19
zyga-x240that should make snapd snap not disabled anymore12:19
zyga-x240if that works12:19
zyga-x240I'd try to refresh core18 and see if it works12:19
zyga-x240I suspect it might just12:19
zyga-x240rogpeppe: this is your decision12:21
rogpeppezyga-x240: go for it12:21
rogpeppezyga-x240: i can easily manually reboot if needed12:21
rogpeppezyga-x240: this shouldn't corrupt the main data, right?12:21
pstolowskirogpeppe: give me 30 minutes to finish the download :)12:21
zyga-x240rogpeppe: it's a separate partition12:22
zyga-x240rogpeppe: I think pawel asked you to perform those changes though, I think it's best to let him upload logs for forensics12:22
zyga-x240and for you two to agree as to who runs the commands12:22
zyga-x240so that it's not racy :)12:22
rogpeppei'm happy for pstolowski to do whatever's needed. i'm quite busy currently.12:22
zyga-x240pstolowski: it's up to us12:23
zyga-x240on the upside the bug may be fixed12:23
zyga-x240it's just prevented your device from refreshing12:23
pstolowskiyeah, that all makes sense12:24
pstolowskibut but12:24
pstolowskiwhy no current symlink?12:24
zyga-x240pstolowski: although understanding exactly how snapd became unlinked would be useful12:24
pstolowskiundo bug?12:24
zyga-x240I wonder what happens in that special boot code12:24
zyga-x240that reconciles what the system was booted with (Base name and rev)12:24
zyga-x240with what's in snapd state12:24
zyga-x240maybe that, when a skew is detected, somehow took out snapd snap?12:25
zyga-x240brb, afk for a moment12:26
zyga-x240or maybe until standup12:26
zyga-mbpmvo could you please land https://github.com/snapcore/snapd/pull/949612:52
mupPR #9496: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple 😃> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9496>12:52
zyga-mbpthe store had some issues with searching, probably usual load or something like that, and this blocks master12:52
mvozyga-mbp: done12:56
zyga-mbpthank you!12:56
mupPR snapd#9496 closed: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple 😃> <Created by bboozzoo> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9496>12:57
mborzeckioff to pick up my daughter from school13:47
zygapstolowski I'm hacking tests for this case but if you want to pair-program on fixing that device remotely, let me know13:54
pstolowskizyga: that's actually a good idea. but i'm having a short lunch break now, how about in 30?13:55
zygano rush, I didn't have lunch yet either13:56
zygalet me know when it's comfortable for you13:56
zygathat fsck test is actually pretty cool14:13
pstolowskizyga: i've a meeting in 10 minutes, how about afterwards?14:20
mvopstolowski: I wonder if this will happen, lots of people declined14:21
zygapstolowski: how long is the meeting?14:21
pstolowskimvo: ah ok14:21
zygapstolowski: I'm here for now, I can grab lunch and eat it next to a hangout14:21
pstolowskizyga: ok, i'll know in a few moments if the meeting happens (likely not)14:27
zygapstolowski: ok, can we do it in 15 minutes14:29
zygaI'm getting lunch in this instant14:29
zygasee you at 16:4514:31
mvopstolowski: no one in the call so far14:31
pstolowskizyga: ok, works for me, see you soon14:34
zygamvo, pedronis: pstolowski discovered the root cause15:06
pstolowskibug report coming15:10
mvozyga: ohhh?15:12
* cachio lunch15:19
zygamvo still in a call15:27
zygabut very unexpected15:27
zygaokay, back15:51
zygamvo so two well defined bugs15:51
zygamvo one triggers the other15:51
zygamvo one breaks snapd refreshes15:51
zygaother causes snapd to deactivate itself on failed refresh15:51
zygathe root cause was session services15:51
zygaand the fact that it silently depends on a specific revision of core/core1815:52
zygawe hit an EROFS and snapd refresh fails to proceed15:52
zygapawel is reporting both issues15:52
zygathey are well defined and should be relatively easy to fix15:52
zygaat least the breaking root cause15:52
zygarogpeppe: we fixed snapd on your system15:52
zygarogpeppe but we refrained from doing anything that would reboot the device15:53
zygarogpeppe we also fixed the boot partition15:53
zygarogpeppe with a bit of luck, you should be able to refresh core18 snap now15:53
zygarogpeppe and it should reboot successfully15:53
zygaor if it does not, there are more bugs for us to find15:53
zygarogpeppe if you choose to refresh core18, do let us know what happened15:53
zygarogpeppe we also mounted the boot partition back, so it should be all right now15:53
zygamvo let me know if you want to talk about any details15:54
* mvo is in a meeting15:54
pstolowskimvo, zyga  https://bugs.launchpad.net/snapd/+bug/1899664 and https://bugs.launchpad.net/snapd/+bug/189966516:01
mupBug #1899664: snapd refresh on old core18 fails due to read-only /etc/dbus-1/session.d <snapd:New> <https://launchpad.net/bugs/1899664>16:01
mupBug #1899665: Failed refresh of snapd drops current symlink on failure <snapd:New> <https://launchpad.net/bugs/1899665>16:01
pstolowskipedronis: ^16:02
zygapstolowski \o/16:03
zygathank you!16:04
mvopstolowski: without having looked at those, how hard do you think this is to fix? could we work on this as a blue item relatively soon?16:04
rogpeppezyga: thanks!16:04
mvoand thanks zyga and pstolowski (and rogpeppe of course!)16:04
rogpeppezyga: how would i go about refreshing the core18 snap?16:04
zygarogpeppe snap refresh core1816:05
rogpeppezyga: ok, i'll try that. i.e. run that command, then `sudo reboot now`, right?16:06
zygano need, it will reboot itself16:06
pstolowskimvo: yes i will work on them, r/o filesystem will be easy, not sure about the one re symlink, but probably not too complicated either16:06
zygarogpeppe note that the way we fixed your system is ephemeral16:06
zygaand if core18 fails to refresh16:06
zygasnapd will break itself again16:06
zygathat is why I wanted to know that this works or fails once the reboot completes16:07
zygarogpeppe oh, and we re-started the hydroctl service as well16:07
rogpeppezyga: yay!16:07
rogpeppezyga: it's working!16:08
pstolowskirogpeppe: feel free to remove my ssh access when you confirm core18 works16:08
zygamvo: interestingly, snapd will hit this case in a normal way16:09
zygathat I did not think about before16:09
zygapstolowski we are here because that device has long refresh window16:09
zygait will refresh infrequently16:09
zygaand when it does16:09
zygait refreshes snapd first16:09
zygaso I think the severity of https://bugs.launchpad.net/snapd/+bug/1899664 should be increased16:09
mupBug #1899664: snapd refresh on old core18 fails due to read-only /etc/dbus-1/session.d <snapd:Triaged> <https://launchpad.net/bugs/1899664>16:09
zygait's not such a special case after all16:10
pstolowskiyeah i didn't set sev yet16:10
* mvo hugs pstolowski 16:13
* mvo hugs zyga too16:13
pstolowskiwas a collective effort really16:13
rogpeppezyga: after "snap refresh core18":16:13
rogpeppeerror: cannot perform the following tasks:16:13
rogpeppe- Make current revision for snap "core18" unavailable (cannot set next boot: cannot determine bootloader)16:13
rogpeppe- Make snap "core18" (1889) available to the system (cannot set next boot: cannot determine bootloader)16:13
pstolowskimvo: eod for today, i'll work on fixes tomorrow morning16:13
mvopstolowski: sure thing16:14
pstolowskiooh woot16:14
pstolowskilooks like i'll log in there again16:14
pstolowskicore18         20190723        1076  latest/stable  canonical✓    base,disabled16:15
pstolowskiit did the same for core1816:15
pstolowskiit is disabled now16:15
zygarogpeppe oohh16:18
zygarogpeppe thank you, we will look again tomorrow16:18
rogpeppezyga: ok, thanks16:18
zygarogpeppe this device, while very unfortunate for you, will really help make snapd more robust16:18
rogpeppezyga: i hope so :)16:19
pstolowskilooks like an error on "Make snap ... available to the system" results in wrong undo, this is what i see for core18 and the same happened for snapd16:22
pstolowskibut yes, i'll continue tomorrow16:22
* zyga-x240 works on the test 16:47
zyga-x240so cool things16:48
zyga-x240this is my favourite test!16:48
* zyga runs the 2nd fsck test18:27
zygaI mean in the test18:42
* cachio doctor appointment18:44
zygamore iterations19:16
* zyga goes to do some evening housework while tests run19:18
zygatests pass19:24
zygaOMG THE RAIN19:45
zygamy dog decided to have a slow long walk19:46
zygaI'm so wet19:46
zygatrying core fsck test now19:49
zygaI'll push both tests at once19:49
zygahttps://github.com/snapcore/snapd/pull/9446 needs review19:50
mupPR #9446: overlord,usersession: initial notifications of pending refreshes <Created by zyga> <https://github.com/snapcore/snapd/pull/9446>19:50
zygahttps://github.com/snapcore/snapd/pull/9422 needs review as well and is short19:50
mupPR #9422: overlord: add link participant for linkage transitions <Needs Samuele review> <Created by zyga> <https://github.com/snapcore/snapd/pull/9422>19:50
* zyga shower19:56
zyga16 passes20:09
zyga18 and 20 are in progress20:09
* zyga tea20:09
zygaamurray: oh is it morning for you? :)20:14
zyga18 passes20:21
zyganow just 2020:21
zygaactually making tea :)20:21
zyga-x240core20 takes foreeeever to test boot20:44
zyga-x240but close20:44
mupPR snapcraft#3315 opened: build(deps-dev): bump junit from 3.8.1 to 4.13.1 in /tests/spread/plugins/v1/maven/snaps/maven-hello/my-app <dependencies> <java> <Created by dependabot[bot]> <https://github.com/snapcore/snapcraft/pull/3315>20:45
mupPR snapcraft#3316 opened: build(deps-dev): bump junit from 3.8.1 to 4.13.1 in /tests/spread/plugins/v1/maven/snaps/legacy-maven-hello/my-app <dependencies> <java> <Created by dependabot[bot]> <https://github.com/snapcore/snapcraft/pull/3316>20:45
zyga-x240core20 should be good, running a clean test now21:03
* zyga-x240 pushed https://github.com/snapcore/snapd/pull/9499 and EODs21:31
mupPR #9499: tests: add tests for fsck <Created by zyga> <https://github.com/snapcore/snapd/pull/9499>21:31
zyga-x240xnox: ^ perhaps you know what is responsible for fsck.vfat in uc2021:31
zyga-x240this test shows that ubuntu-seed stays corrupted across reboot, a regression compared to core 16 and core 1821:32
* zyga-x240 EODs 21:33
zyga-x240xnox: if you have feedback please comment on the PR, I'll review that tomorrow21:33
mupPR snapd#9499 opened: tests: add tests for fsck <Created by zyga> <https://github.com/snapcore/snapd/pull/9499>21:35
