mup | PR snapd#9495 opened: logger: use KernelCommandLineSplit to parse debug flag <Simple 😃> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9495> | 00:09 |
---|---|---|
mup | PR snapd#9495 closed: logger: use KernelCommandLineSplit to parse debug flag <Simple 😃> <Created by cmatsuoka> <Closed by cmatsuoka> <https://github.com/snapcore/snapd/pull/9495> | 03:05 |
zyga-x240 | o/ | 05:35 |
mup | PR snapd#9494 closed: logger: use strutil.KernelCommandLineSplit in debugEnabledOnKernelCmdline <Simple 😃> <Skip spread> <Created by mvo5> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/9494> | 05:45 |
mborzecki | morning | 05:51 |
zyga-x240 | mborzecki: o/ | 05:51 |
mborzecki | zyga-x240: hey | 05:51 |
zyga-x240 | mborzecki: I think we should revert bits of the speedup changes | 05:51 |
zyga-x240 | it's been hanging yesterday | 05:51 |
zyga-x240 | or investigate and fix | 05:51 |
zyga-x240 | it may be a python-version-specific bug or just a general bug | 05:51 |
mborzecki | hmmm | 05:52 |
mborzecki | zyga-x240: did the change where we respect workers count land? | 05:53 |
zyga-x240 | I think so | 05:53 |
zyga-x240 | afk for some time, lucy just woke up | 05:54 |
mup | PR snapd#9480 closed: snap: support different exit-code in the snap command <Created by mvo5> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9480> | 06:31 |
zyga-x240 | mvo: we need to fix spread-shellcheck | 06:36 |
mborzecki | zyga-x240: tests/unit/go hangs on 20.04 too right? | 06:38 |
mvo | zyga-x240, mborzecki yeah, something is wrong, I was just starting a spread google run to see what is going on. do you have an idea already? | 06:39 |
zyga-x240 | mborzecki: not sure | 06:42 |
zyga-x240 | mborzecki: I just don't know | 06:42 |
zyga-x240 | mvo: not immediately, either bug in older python (related to recursive executor submit) or something else | 06:42 |
mborzecki | trying with --max-procs=2 might be useful | 06:43 |
mborzecki | zyga-x240: otoh, the unit tests job as run by gh actions does not seem to fail or hang for that matter | 06:46 |
zyga-x240 | yeah, that's what makes me think it may be python version-specific behavior | 06:46 |
zyga | I'm finishing my breakfast now, I'll start in a moment | 06:48 |
mborzecki | zyga: looking at one of my PRs, it failed on 20.04, failed on 18.04 (though tests/unit/spread-shellcheck test) hit kill-timeout in both cases, 16.04 passed | 06:49 |
zyga | heh | 06:49 |
zyga | it's all over the place | 06:49 |
zyga | shall we revert and fix this async? | 06:49 |
zyga | I'd rewrite the code so that it returns todo units like what jamesh suggested | 06:50 |
zyga | then it's one loop and one executor only | 06:50 |
mborzecki | zyga: sounds like more work though ;) | 06:50 |
zyga | yes | 06:50 |
mborzecki | zyga: anyways, i think we should revert that last change | 06:50 |
zyga | that's why separate actions 1) revert 2) fix | 06:50 |
zyga | mvo how does that sound? | 06:51 |
mborzecki | also tests/unit/spread-shellcheck duplicates work right now, run-checks is run in tests/unit/go | 06:51 |
mborzecki | so either we can disable spread-shellcheck in tests/unit/go or drop the other test | 06:51 |
mvo | mborzecki: I think we can kill tests/unit/spread-shellcheck now | 06:51 |
mborzecki | mvo: sgtm | 06:52 |
mborzecki | running manually now, those spread nodes have 1 cpu | 06:54 |
mborzecki | zyga: heh, so some deadlock, a number of jobs submitted, nothing happening, cpu usage 0% | 06:55 |
zyga | backtrace! | 06:55 |
zyga | that's quick | 06:55 |
pstolowski | good morning! | 06:57 |
zyga-x240 | pstolowski: hello | 06:58 |
mborzecki | zyga-x240: https://paste.ubuntu.com/p/ZqDTCcFvnb/ heh (cc mvo) | 06:58 |
mborzecki | pstolowski: hey | 06:58 |
mborzecki | zyga-x240: only 2 threads and both are waiting | 06:58 |
zyga | interesting | 06:59 |
zyga | and good idea to use gdb! | 06:59 |
zyga | so one is running checkpaths | 07:00 |
zyga | going through each location | 07:00 |
zyga | while the other is in checkfile | 07:01 |
zyga | waiting for the result | 07:01 |
zyga | yeah | 07:01 |
zyga | mborzecki perhaps just ensuring we have 3 workers minimum :P | 07:01 |
zyga | a bit lame but ... | 07:01 |
zyga | (as in the minimum N) | 07:01 |
zyga | mborzecki what do you think? | 07:01 |
mborzecki | zyga: --max-procs 3 seems to work | 07:05 |
jamesh | In the end, you really want the code waiting for futures moved outside of the thread pool | 07:06 |
zyga | yeah, I think that's the proper solution | 07:07 |
jamesh | next step: overengineer it with asyncio | 07:11 |
zyga | jamesh no no ;) | 07:14 |
mborzecki | haha | 07:14 |
jamesh | zyga: but just think of all the threads you'd save! | 07:15 |
mvo | mborzecki: nice find! | 07:16 |
mborzecki | jamesh: save all the threads! | 07:16 |
mborzecki | hmm so maybe we just need +1 worker really | 07:21 |
zyga | mborzecki I think that's the easy fix | 07:22 |
zyga | and we should rewrite that slightly as jamesh mentioned | 07:22 |
mborzecki | zyga: i'd land an easy fix with a comment, and then maybe work on the larger fix/refactor | 07:22 |
zyga | +1 | 07:23 |
mvo | +1 | 07:23 |
mborzecki | zyga: fwiw i can reproduce this locally with --max-procs=1 | 07:24 |
zyga | yeah, my fault for testing on my beefiest system | 07:24 |
mborzecki | zyga: are you opening a quick pr with the workaround? | 07:41 |
zyga | mborzecki nope, I thought you want to do that | 07:41 |
zyga | I can though | 07:41 |
mborzecki | zyga: no worries, i can push it | 07:41 |
zyga | thanks! | 07:41 |
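
The deadlock discussed above arises when tasks running inside a thread pool block on futures submitted to the same pool: once every worker is waiting, nothing is left to run the queued jobs. The following is a minimal sketch of the restructuring jamesh suggested, where workers return follow-up work ("todo units") and only the calling thread waits on futures. It is not the actual spread-shellcheck code; `check_path`, `check_all`, and the `tree` mapping are hypothetical stand-ins for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def check_path(path, tree):
    """Check one path; return (finding, follow-up paths) without waiting on any future."""
    children = tree.get(path, [])
    if children:
        # A directory yields no finding, only more work for the caller to schedule.
        return None, children
    return f"checked {path}", []


def check_all(roots, tree, max_workers=1):
    """Drive all checks from the calling thread so workers never block on each other."""
    findings = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        pending = {executor.submit(check_path, root, tree) for root in roots}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                finding, todo = future.result()
                if finding is not None:
                    findings.append(finding)
                # Only the main thread submits follow-up work to the pool.
                pending.update(executor.submit(check_path, p, tree) for p in todo)
    return sorted(findings)


# Hypothetical directory layout, used only to exercise the sketch.
tree = {"tests": ["tests/a", "tests/b"], "tests/a": ["tests/a/task.yaml"]}
print(check_all(["tests"], tree, max_workers=1))
```

Because no worker ever waits for another task, this shape completes even with a single worker, which is the case that hung with --max-procs=1.
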
pstolowski | mvo: hi! #8960 got +1 from Samuele and has 3 reviews; would be great to land at the most convenient moment after cutting a new release branch. perhaps it would make sense to squash-merge it in case of anything unexpected (and a need for a revert)? | 07:42 |
mup | PR #8960: o/snapstate,servicestate: use service-control task for service actions (9/9) <Needs Samuele review> <Services> <⛔ Blocked> <Created by stolowski> <https://github.com/snapcore/snapd/pull/8960> | 07:42 |
zyga-x240 | I think I need to solve the blockers on https://github.com/snapcore/snapd/pull/9204 | 07:47 |
mup | PR #9204: sandbox: track applications unconditionally <Created by zyga> <https://github.com/snapcore/snapd/pull/9204> | 07:47 |
zyga-x240 | as that's really required to enable r-a-a | 07:47 |
zyga-x240 | mborzecki: IIRC fedora will disable getenforce/setenforce soon | 07:50 |
zyga-x240 | how will that affect our test suite? | 07:50 |
mborzecki | zyga-x240: hm, got more info? | 07:50 |
zyga-x240 | mborzecki: one sec | 07:50 |
zyga-x240 | https://lwn.net/Articles/831748/ | 07:51 |
mup | PR snapd#9496 opened: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple 😃> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9496> | 07:51 |
mborzecki | zyga-x240: it's ok for us, we only switch between permissive/enforcing | 07:52 |
mborzecki | zyga-mbp: mvo: ^^ 9496 | 07:57 |
mvo | mborzecki: looking | 08:00 |
mup | PR snapd#9497 opened: Have session agent connect to the D-Bus session bus <Created by jhenstridge> <https://github.com/snapcore/snapd/pull/9497> | 08:01 |
jamesh | zyga-x240: ^^ this PR might help out a bit with your notifications work. | 08:03 |
zyga-x240 | jamesh: interesting | 08:04 |
zyga-x240 | I was using the other socket for broadcast but this may be useful as well | 08:04 |
jamesh | other socket? | 08:05 |
zyga-x240 | jamesh: the snapd-user-agent socket | 08:05 |
zyga-x240 | not dbus :) | 08:05 |
zyga-x240 | why do we have to be launched by systemd? | 08:06 |
zyga-x240 | ah | 08:06 |
jamesh | zyga-x240: we're already launched by systemd | 08:06 |
zyga-x240 | that's a dbus service | 08:06 |
zyga-x240 | got confused for a sec | 08:06 |
jamesh | we want to make sure that if we get activated via D-Bus first, we still get our file descriptor | 08:06 |
jamesh | for REST | 08:06 |
zyga-x240 | mmm | 08:07 |
jamesh | I'm not suggesting replacing the snapd <-> agent communication with D-Bus | 08:07 |
jamesh | this would be for a return path of the agent <-> desktop shell communication | 08:07 |
zyga-x240 | right | 08:07 |
zyga-x240 | jamesh: I've sent a quick review just now | 08:10 |
zyga-x240 | something weird on centos 7 | 08:22 |
zyga-x240 | hmm | 08:30 |
zyga-x240 | type=SYSCALL msg=audit(10/13/20 08:11:54.336:587) : arch=x86_64 syscall=kill success=yes exit=0 a0=0xffffffffffffa7e8 a1=SIGKILL a2=0x0 a3=0xf91e50 items=0 ppid=1 pid=22326 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=snapd exe=/usr/libexec/snapd/snapd subj=system_u:system_r:snappy_t:s0 key=(null) | 08:30 |
zyga-x240 | type=AVC msg=audit(10/13/20 08:11:54.336:587) : avc: denied { sigkill } for pid=22326 comm=snapd scontext=system_u:system_r:snappy_t:s0 tcontext=system_u:system_r:snappy_cli_t:s0 tclass=process permissive=1 | 08:30 |
zyga-x240 | type=SYSCALL msg=audit(10/13/20 08:01:54.377:396) : arch=x86_64 syscall=connect success=yes exit=0 a0=0x8 a1=0xc000320f10 a2=0x22 a3=0x4 items=0 ppid=22326 pid=22552 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=snap exe=/usr/bin/snap subj=system_u:system_r:snappy_cli_t:s0 key=(null) | 08:31 |
zyga-x240 | mborzecki: do you know how to recover from: Corrupted checkpoint file. Inode match, but newer complete event (1602577959.287:791) found before loaded checkpoint 1602577772.146:790 | 08:33 |
zyga-x240 | that's from ausearch | 08:33 |
zyga-x240 | nvm I got it | 08:34 |
zyga-x240 | so | 08:34 |
zyga-x240 | type=AVC msg=audit(10/13/20 08:32:39.287:791) : avc: denied { connectto } for pid=23299 comm=snap path=/run/dbus/system_bus_socket scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1 | 08:34 |
zyga-x240 | type=AVC msg=audit(10/13/20 08:32:39.287:791) : avc: denied { search } for pid=23299 comm=snap name=dbus dev="tmpfs" ino=13502 scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:object_r:system_dbusd_var_run_t:s0 tclass=dir permissive=1 | 08:34 |
zyga-x240 | those are the first two denials to fix | 08:34 |
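
To make the structure of these AVC records easier to follow, here is a small sketch that extracts the source type, target type, object class, and denied permissions from a denial line and prints the corresponding raw `allow` rule. It is a simplified illustration in the spirit of audit2allow, not a replacement for it; `suggest_allow_rule` and `selinux_type` are hypothetical helper names, and in practice a refpolicy interface is often preferable to a raw rule.

```python
import re

# Matches the parts of an AVC denial needed to form an allow rule.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}"
    r".*?\bscontext=(?P<scontext>\S+)"
    r"\s+tcontext=(?P<tcontext>\S+)"
    r"\s+tclass=(?P<tclass>\S+)"
)


def selinux_type(context):
    """Return the type field of a user:role:type:level security context."""
    return context.split(":")[2]


def suggest_allow_rule(line):
    """Return a raw allow rule for one AVC denial line, or None if it does not match."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    source = selinux_type(match.group("scontext"))
    target = selinux_type(match.group("tcontext"))
    perms = " ".join(match.group("perms").split())
    return f"allow {source} {target}:{match.group('tclass')} {{ {perms} }};"


denial = (
    "type=AVC msg=audit(10/13/20 08:32:39.287:791) : avc: denied { connectto } "
    "for pid=23299 comm=snap path=/run/dbus/system_bus_socket "
    "scontext=system_u:system_r:snappy_cli_t:s0 "
    "tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 "
    "tclass=unix_stream_socket permissive=1"
)
print(suggest_allow_rule(denial))
```
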
* zyga-x240 tries to adjust the policy | 08:45 | |
mborzecki | zyga-x240: did you manage to fix it? | 08:57 |
mborzecki | (the policy i mean) | 08:57 |
zyga-x240 | mborzecki: I'm slow at iterating at this | 08:58 |
zyga-x240 | not yet | 08:58 |
zyga-x240 | mborzecki: should we bump the snapd policy version perhaps? :) | 08:59 |
zyga-x240 | maybe it should match snapd version | 08:59 |
mborzecki | zyga-x240: try with dbus_stream_connect_system_dbusd(snappy_cli_t) and dbus_chat_system_bus(snappy_cli_t) | 09:00 |
mborzecki | zyga-x240: perhaps we should, though we never did | 09:00 |
mborzecki | zyga-x240: setting it to version of snapd sounds ok | 09:01 |
zyga-x240 | thanks, trying | 09:01 |
mborzecki | zyga-x240: the modules in core policy have a version bump on each change (in theory) | 09:02 |
zyga-x240 | mm | 09:02 |
zyga-x240 | I suspect nothing just cares about this number | 09:02 |
zyga-x240 | just an observation, it's not important | 09:02 |
mborzecki | zyga-x240: there may be some automation under the hood, in our development we replace the module by hand, so it automatically gets a higher priority than the currently loaded one | 09:03 |
zyga-x240 | mborzecki: I'll read https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/services/dbus.te and friends to see if there's something we should use | 09:09 |
mborzecki | haha | 09:12 |
mborzecki | zyga-x240: though you really want to read this: https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/services/dbus.if | 09:12 |
zyga-x240 | (I meant all three) | 09:13 |
mborzecki | zyga-x240: the *.te (type enforcement?) file is for the actual system dbus daemon, *.fc (file context?) is the files/sockets/dirs | 09:13 |
mborzecki | zyga-x240: *.if is the interfaces for use from other modules | 09:13 |
zyga-x240 | mmm | 09:13 |
zyga-x240 | mborzecki: is all the comment XML in those .if files processed by anything? | 09:22 |
zyga-x240 | is there a "compiled" version anywhere? | 09:22 |
mborzecki | zyga-x240: yes, there should be documentation in your system, though i don't think it is available anywhere online | 09:23 |
zyga-x240 | ah, ok | 09:23 |
zyga-x240 | I'll check on F32 | 09:23 |
mborzecki | zyga-x240: latest tech :P | 09:23 |
zyga-x240 | mborzecki: more denials: https://pastebin.ubuntu.com/p/5K5TTpKwkT/ | 09:28 |
zyga-x240 | I think the blocker is type=AVC msg=audit(10/13/20 09:16:35.588:270) : avc: denied { search } for pid=22502 comm=snap name=dbus dev="tmpfs" ino=13389 scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:object_r:system_dbusd_var_run_t:s0 tclass=dir permissive=1 | 09:28 |
zyga-x240 | but the rest is also interesting, it seems snapd cannot stop the hook | 09:29 |
zyga-x240 | (after a timeout) | 09:29 |
mborzecki | yeah, sigkill and all | 09:29 |
mborzecki | zyga-x240: dbus_system_bus_client(snappy_cli_t) should do it | 09:30 |
zyga-x240 | yep | 09:30 |
mup | PR snapd#9293 closed: snap: auto-import will not try to auto-create users on managed devices <Needs Samuele review> <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/9293> | 09:31 |
mup | PR snapd#9498 opened: client,daemon,snap: auto-import does not error on managed devices <Needs Samuele review> <Run nested> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9498> | 09:31 |
mborzecki | zyga-x240: and you can probably drop dbus_connect_system_bus(snappy_cli_t) looks like it duplicates some of the things from *_bus_client | 09:32 |
zyga-x240 | ok | 09:32 |
pedronis | #9463 needs 2nd reviews (it's small) | 09:38 |
mup | PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463> | 09:38 |
zyga-x240 | ack | 09:40 |
zyga-x240 | https://github.com/snapcore/snapd/pull/9463#pullrequestreview-507255123 | 09:44 |
mup | PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463> | 09:44 |
zyga-x240 | made one remark, so perhaps something to adjust before landing | 09:44 |
zyga-x240 | mborzecki: nice, passing | 09:45 |
zyga-x240 | mborzecki: we need session equivalent though | 09:45 |
zyga-x240 | I don't see any in the refpolicy gh repo | 09:46 |
zyga-x240 | mborzecki: rebased and pushed back to https://github.com/snapcore/snapd/pull/9204 | 09:46 |
mup | PR #9204: sandbox: track applications unconditionally <Created by zyga> <https://github.com/snapcore/snapd/pull/9204> | 09:46 |
zyga-x240 | pedronis: should I push a trivial patch for https://github.com/snapcore/snapd/pull/9463 | 09:55 |
mup | PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463> | 09:55 |
zyga-x240 | er | 09:55 |
zyga-x240 | for https://github.com/snapcore/snapd/pull/9463#discussion_r503813935 | 09:55 |
pedronis | zyga-x240: if you have time yes | 09:55 |
zyga-x240 | on it | 09:56 |
pedronis | mvo: should we close #8845 and #8929 until we have time to discuss/adjust them and reprose? | 09:56 |
mup | PR #8845: [RFC] many: add "system.service.snapd-autoimport.disable" setting <Needs Samuele review> <⛔ Blocked> <Created by mvo5> <https://github.com/snapcore/snapd/pull/8845> | 09:56 |
mup | PR #8929: [RFC] many: add new "daemon-startup: inhibit" option <Needs Samuele review> <Created by mvo5> <https://github.com/snapcore/snapd/pull/8929> | 09:56 |
pedronis | *re-propose | 09:57 |
mvo | pedronis: sure | 09:57 |
pedronis | thx | 09:58 |
mvo | closed | 09:58 |
zyga-x240 | pushed | 10:01 |
zyga-x240 | https://github.com/snapcore/snapd/pull/9463/commits/26f6b6680027ea9b1262a03b27a28ac6f3d60a9e if anyone wants to cross-check | 10:01 |
mup | PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463> | 10:01 |
mup | PR snapd#8845 closed: [RFC] many: add "system.service.snapd-autoimport.disable" setting <Needs Samuele review> <⛔ Blocked> <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/8845> | 10:02 |
zyga-x240 | mborzecki: any idea on how to provide sigkill and other permissions | 10:14 |
zyga-x240 | I think we're missing a test | 10:14 |
zyga-x240 | that we can kill a hook that is running for too long | 10:14 |
zyga-x240 | (or miss a test that runs selinux checks) | 10:14 |
zyga-x240 | mborzecki: do you think we could move the "no denials" check to an invariant? | 10:14 |
mborzecki | zyga-x240: hm this should do it `allow snappy_t snappy_cli_t:process { sigkill };` | 10:19 |
mborzecki | zyga-x240: wondering though, why the hook process was still snappy_cli_t | 10:19 |
mborzecki | can you reproduce that and grab ps -Z ? | 10:19 |
zyga-x240 | oh | 10:19 |
zyga-x240 | I will try | 10:19 |
zyga-x240 | sure | 10:19 |
zyga-x240 | ah I know why | 10:21 |
zyga-x240 | mborzecki: because that was still the "snap run" phase | 10:21 |
zyga-x240 | we got a denial on dbus method call | 10:21 |
zyga-x240 | and got stuck waiting for a response | 10:21 |
zyga-x240 | mborzecki: I wonder if we should setup a timeout / guard of some sort | 10:22 |
zyga-x240 | but that explains the label | 10:22 |
zyga-x240 | we switch that in snap-exec | 10:22 |
zyga-x240 | er | 10:22 |
zyga-x240 | snap-confine | 10:22 |
=== JanC_ is now known as JanC | ||
zyga-x240 | mborzecki: what is the label we transition to when we run as a snap app? | 10:25 |
zyga-x240 | I added this | 10:25 |
zyga-x240 | # snapd tries to kill hooks that run for over 10 minutes. | 10:25 |
zyga-x240 | allow snappy_t snappy_cli_t:process { sigkill }; | 10:25 |
mborzecki | zyga-x240: if it's under systemd the final label is unconfined_service_t | 10:25 |
zyga-x240 | but I think we need more than that | 10:25 |
zyga-x240 | not under systemd | 10:26 |
zyga-x240 | those are specifically hooks | 10:26 |
zyga-x240 | systemd just tracks them | 10:26 |
zyga-x240 | not spawns them | 10:26 |
mborzecki | zyga-x240: right, but the transitions take the same route iirc | 10:26 |
mborzecki | zyga-x240: so the hook ends up as unconfined service too | 10:26 |
zyga-x240 | oh | 10:28 |
zyga-x240 | ok, that's good then | 10:28 |
zyga-x240 | I'll add two lines | 10:28 |
* zyga-x240 tests killing w | 10:29 | |
zyga-x240 | with disabled dbus perms | 10:30 |
pstolowski | pedronis: hi, i've spent quite a bit investigation undo on remove, found a couple of issues and filed https://bugs.launchpad.net/snapd/+bug/1899614 (and also worked on addressing point #2 there) | 10:38 |
mup | Bug #1899614: multiple problems with undo for 'snap remove' <snapd:New> <https://launchpad.net/bugs/1899614> | 10:38 |
pstolowski | *investigating | 10:40 |
zyga-x240 | pstolowski: rogpeppe is around and could provide extra information about the other bug we've discussed | 11:01 |
pstolowski | zyga-x240: thanks, but i don't have anything specific atm, but mvo had requested some info there before, would be good to add it if possible | 11:02 |
pstolowski | rogpeppe ^ | 11:02 |
rogpeppe | pstolowski: ok, i've added that command output | 11:03 |
zyga-x240 | thank you! | 11:03 |
pstolowski | thanks! | 11:04 |
rogpeppe | i don't seem to get email notifications from launchpad, so please ping me here if you want any further interaction from me in that issue, thanks! | 11:04 |
zyga-x240 | ah, will do | 11:05 |
pstolowski | ok, that log is very interesting.. we saw " there was a rollback across reboot" before, didn't we? | 11:06 |
zyga-x240 | aha | 11:07 |
pstolowski | this at least explains the refresh and undo on Oct 5 that I also saw in the state timings; but what happened after next 2 reboots and why did it work till Oct 7th is a mystery | 11:08 |
zyga-x240 | pstolowski: I don't recall how this layer works, what would happen if the boot partition is corrupted (vfat) and we cannot really set the next boot to anything different | 11:08 |
zyga-x240 | like it's always stuck at one thing | 11:08 |
zyga-x240 | would that explain anything? | 11:08 |
pstolowski | i've no idea about that code | 11:08 |
zyga-x240 | rogpeppe: I think you could try unmounting the boot partition (vfat) and fscking it | 11:08 |
zyga-x240 | rogpeppe: also if you want to recover the system, we should be able to help | 11:08 |
zyga-x240 | just not sure what's the best way to do that | 11:08 |
zyga-x240 | I was also talking to mborzecki about this issue | 11:09 |
zyga-x240 | and we don't remove snapd.service from disk | 11:09 |
zyga-x240 | rogpeppe: could you check if snapd.service is in /etc/systemd/system? | 11:09 |
pstolowski | afaiu the system works again, no? logs from oct12 | 11:09 |
zyga-x240 | rogpeppe: oh, is snapd running now? | 11:09 |
pstolowski | maybe i'm making assumptions.. but afaiu it stopped working on 7th? | 11:10 |
rogpeppe | one mo, let me check | 11:10 |
rogpeppe | the system isn't working currently | 11:10 |
rogpeppe | snapd does seem to be running: | 11:11 |
rogpeppe | rogpeppe@localhost:~$ ps alxw | grep snapd | 11:11 |
rogpeppe | 4 0 725 1 20 0 928048 18712 - Ssl ? 2:57 /snap/snapd/9169/usr/lib/snapd/snapd | 11:11 |
zyga-x240 | ! | 11:11 |
pstolowski | rogpeppe: ha, that's weird! | 11:11 |
zyga-x240 | can you run /snap/snapd/9169/usr/bin/snap? | 11:11 |
pstolowski | rogpeppe: and no /snap/snapd/current symlink right? | 11:11 |
rogpeppe | /snap/snapd/current is still not there, right | 11:11 |
rogpeppe | i can run /snap/snapd/9169/usr/bin/snap ok | 11:12 |
zyga-x240 | pstolowski: ^ any ideas on what to explore? | 11:12 |
zyga-x240 | rogpeppe: maybe snap list --all | 11:12 |
zyga-x240 | for a first sanity check | 11:12 |
rogpeppe | i could give you a login to the system if you want | 11:12 |
zyga-x240 | then maybe snap install snapd? | 11:12 |
zyga-x240 | pstolowski: do you want to debug this? | 11:12 |
rogpeppe | this is the output of snap list --all: https://paste.ubuntu.com/p/Rwvr2dPD9M/ | 11:14 |
pstolowski | rogpeppe: yes, sure, that would be great | 11:14 |
zyga-x240 | core 16-2.46.1 9995 latest/stable canonical* core | 11:15 |
zyga-x240 | core is linked | 11:15 |
pstolowski | do we do anything magical where snapd would start without current symlink after reboot? (don't think so, but...) | 11:15 |
zyga-x240 | but the boot base is core18, right? | 11:15 |
zyga-x240 | rogpeppe: what's /meta/snap.yaml snap name? | 11:15 |
* rogpeppe tries to remember how to grant ssh access. so rusty! | 11:15 | |
zyga-x240 | pstolowski: I don't think we do | 11:15 |
zyga-x240 | rogpeppe: you can try ssh-import-id | 11:15 |
zyga-x240 | not sure if it's preinstalled | 11:15 |
rogpeppe | that's the command i was trying to remember! | 11:15 |
zyga-x240 | cool :) | 11:16 |
zyga-x240 | mvo: ^ very interesting bug | 11:16 |
zyga-x240 | lots for us to learn on robustness | 11:16 |
rogpeppe | pstolowski: is your launchpad username pstolowski ? | 11:16 |
rogpeppe | ok, no ssh-import-id command | 11:17 |
pstolowski | rogpeppe: 1 sec, i will import ssh keys from my current box | 11:17 |
rogpeppe | :thumbsup: | 11:17 |
mvo | zyga-x240: is there more news? | 11:19 |
zyga-x240 | mvo: yes, we have access to the device | 11:19 |
zyga-x240 | snapd is disabled! | 11:19 |
mvo | woah, how did that happen :( ? | 11:20 |
zyga-x240 | the boot partition is still corrupted I suspect | 11:20 |
pstolowski | rogpeppe: my lp user is stolowski | 11:20 |
zyga-x240 | we can get all the logs | 11:20 |
mvo | \o/ | 11:20 |
zyga-x240 | mvo: missing undo in unlink snap, I bet | 11:20 |
zyga-x240 | but could be something much more complex | 11:20 |
pstolowski | i'm not sure it's this zyga-x240 | 11:20 |
pstolowski | but cannot exclude it of course | 11:20 |
mvo | thank you so much rogpeppe ! | 11:21 |
zyga-x240 | pstolowski: ack | 11:21 |
pstolowski | mvo, zyga-x240 any suggestions what to collect? anything regarding boot? | 11:21 |
* mvo hugs pstolowski and zyga-x240 for their tireless debugging also | 11:21 | |
rogpeppe | rogpeppe@localhost:~/.ssh$ ed | 11:21 |
rogpeppe | -bash: ed: command not found | 11:21 |
rogpeppe | dammit! | 11:21 |
zyga-x240 | pstolowski: maybe to be safe collect all journal logs | 11:21 |
mvo | +1 | 11:21 |
zyga-x240 | rogpeppe: vi is there | 11:21 |
zyga-x240 | rogpeppe: you can also echo >> | 11:21 |
rogpeppe | yeah, i'll use cat | 11:21 |
rogpeppe | i don't use an ANSI terminal so vi isn't good for me | 11:21 |
pstolowski | zyga-x240: is tar /var/log.. good enough, or is there a better way? | 11:23 |
pstolowski | thanks rogpeppe, checking | 11:23 |
zyga-x240 | I think that's good | 11:23 |
zyga-x240 | pstolowski: you can use journalctl with a standlone directory to examine machine logs without journald itself | 11:23 |
pstolowski | zyga-x240: yeah i did it once.. slightly inconvenient but works | 11:24 |
zyga-x240 | pstolowski: journalctl -D /path/to/var/log/journal | 11:25 |
zyga-x240 | and then it works IIRC | 11:25 |
zyga-x240 | mborzecki: is spread-shellcheck fixed? | 11:31 |
zyga-x240 | ah | 11:31 |
zyga-x240 | I see the PR | 11:31 |
zyga-x240 | thanks! | 11:31 |
zyga-x240 | approved | 11:32 |
amurray | zyga-x240: hey so you mentioned re docker-support/multipass-support being broken with aa3 on groovy that you would prefer a snap-update-ns approach - can you elaborate more on what you are thinking here? I am not sure I understand exactly what you have in mind. | 11:36 |
zyga-x240 | sure | 11:36 |
zyga-x240 | I was thinking that the special interfaces they rely on could provide a mount profile that puts the base snap's apparmor config in /etc | 11:36 |
pstolowski | mvo: anything re boot env that can be useful? | 11:37 |
zyga-x240 | something like mount --bind /snap/core18/current/etc/apparmor.d /etc/apparmor.d | 11:37 |
zyga-x240 | pstolowski: if you can, try fscking the boot partition | 11:37 |
zyga-x240 | or | 11:37 |
zyga-x240 | dd it | 11:37 |
zyga-x240 | to analyze post-mortem | 11:37 |
zyga-x240 | you may want to flip it read only for that operation | 11:37 |
zyga-x240 | or unmount it | 11:37 |
zyga-x240 | do you remember that vfat bug we ran into before? | 11:38 |
amurray | zyga-x240: ok so is this already easily possible with the existing way that interfaces are declared? I am not super familiar with that... | 11:38 |
zyga-x240 | amurray: I believe it should, the only thing that would be required in addition to this, is the permission for snap-update-ns to do this as well | 11:38 |
zyga-x240 | amurray: if that's urgent I could look | 11:38 |
zyga-x240 | amurray: but do look at the mount profile part | 11:38 |
zyga-x240 | the apparmor part should be easy once that is in the works | 11:38 |
zyga-x240 | you can test this by making a snap that uses the new interface (or the vanilla original snaps) | 11:39 |
zyga-x240 | and looking at the generated mount profile in /var/lib/snapd/apparmor/mount/ | 11:39 |
zyga-x240 | there are some examples | 11:39 |
zyga-x240 | for instance, the desktop interface uses this mechanism to bind mount fonts around | 11:39 |
amurray | zyga-x240: oh can you point me at examples, I am still confused | 11:39 |
zyga-x240 | sure | 11:39 |
amurray | ah ok will take a look | 11:39 |
zyga-x240 | amurray: in the snapd tree please look at interfaces/builtin/desktop.go | 11:40 |
zyga-x240 | let me open it as well | 11:40 |
amurray | yep am just looking now | 11:40 |
pstolowski | zyga-x240: i'd rather avoid any potentially destructive steps atm, would leave that to rogpeppe | 11:40 |
zyga-x240 | pstolowski: ok, a dd of the vfat while it is mounted would be useful as well | 11:40 |
zyga-x240 | even if you just stash it on the device | 11:40 |
zyga-x240 | not sure how large it is | 11:40 |
zyga-x240 | amurray: so if you scroll to line 295 | 11:40 |
amurray | zyga-x240: I am guessing that AddMountEntry() would be the thing? | 11:40 |
zyga-x240 | you can see how it grants apparmor permissions | 11:41 |
pstolowski | dd of /boot partition? nb, logs will be huuge | 11:41 |
zyga-x240 | there are several profiles involved | 11:41 |
zyga-x240 | pstolowski: the vfat | 11:41 |
zyga-x240 | not sure how big it is | 11:41 |
amurray | ah yep and the corresponding apparmor bits - thanks :) | 11:41 |
zyga-x240 | amurray: the key part there is AddUpdateNSf function | 11:41 |
zyga-x240 | which adds a piece of text for per-snap profile for snap-update-ns | 11:41 |
zyga-x240 | this just needs the permission to bind /snap/{base}/*/etc/apparmor.d -> /etc/apparmor.d | 11:42 |
zyga-x240 | now jump to 322 | 11:42 |
zyga-x240 | this does what you mentioned bfore | 11:42 |
zyga-x240 | *before | 11:42 |
zyga-x240 | the difference is that we need spec.AddMountEntry (not *User*) | 11:42 |
zyga-x240 | there's more | 11:42 |
zyga-x240 | I believe those should be permanent things | 11:42 |
zyga-x240 | regardless of connection | 11:43 |
zyga-x240 | so the method signature is different | 11:43 |
zyga-x240 | you can see that in ... | 11:43 |
zyga-x240 | if you go to interfaces/mount/spec.go:206 | 11:43 |
zyga-x240 | AddPermanentPlug | 11:43 |
zyga-x240 | there's a Slot variant just below | 11:43 |
zyga-x240 | the difference is in the arguments provided, | 11:44 |
zyga-x240 | the Permanent methods get an interface and a plug or a slot, not a connected plug / slot | 11:44 |
zyga-x240 | so it's just one side that you see | 11:44 |
zyga-x240 | anyway, | 11:44 |
pstolowski | journal logs are 324M, tgz | 11:44 |
zyga-x240 | I think that's a sensible approach | 11:44 |
zyga-x240 | pstolowski: oh my | 11:44 |
zyga-x240 | pstolowski: maybe too much | 11:44 |
zyga-x240 | pstolowski: not sure, if we can send that over, that's good | 11:44 |
zyga-x240 | but confirm with rogpeppe for sure | 11:44 |
amurray | zyga-x240: ok thanks heaps for your guidance - I'll try take a look tomorrow morning and see if I can cook something up | 11:44 |
zyga-x240 | amurray: let me know how this feels like | 11:45 |
zyga-x240 | ok | 11:45 |
zyga-x240 | amurray: if you get stuck I'll help | 11:45 |
pstolowski | rogpeppe: ok to transfer 324M ^ ? | 11:45 |
zyga-x240 | amurray: which interfaces were those? docker-support and multipass-something? | 11:45 |
amurray | zyga-x240: thanks - so multipass-support iirc | 11:45 |
zyga-x240 | right | 11:46 |
zyga-x240 | ideally we'd have a spread test that installs those snaps | 11:46 |
zyga-x240 | and looks at the mount profile or at the mount namespace | 11:46 |
zyga-x240 | I think that's the last step though | 11:46 |
zyga-x240 | I can definitely help | 11:46 |
amurray | yeah I was wondering if I should try and add a test with whatever fix I come up with for this but will focus on a getting the right fix first and then can look at that if time permits... | 11:47 |
zyga-x240 | amurray: you can start with a quick failing test | 11:47 |
zyga-x240 | do you know how to write those? | 11:48 |
zyga-x240 | mkdir tests/regression/lp-XXX | 11:48 |
zyga-x240 | add a summary: with some info | 11:48 |
zyga-x240 | then execute: | (newline)(tab)false | 11:48 |
zyga-x240 | run that test with SPREAD_DEBUG_EACH=0 spread -debug -v google:ubuntu-20.10-64:tests/regression/lp-XXX | 11:49 |
zyga-x240 | in the shell install the snap you need | 11:49 |
zyga-x240 | use nsenter / cat to explore the files in /etc/apparmor.d | 11:49 |
zyga-x240 | eventually copy those ideas over to the yaml | 11:49 |
zyga-x240 | quit the debug shell and re-run to verify | 11:49 |
zyga-x240 | at some point it will measure failure | 11:49 |
zyga-x240 | and then that's a good start | 11:49 |
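
The first few steps above (create the test directory, add a summary, start with a deliberately failing execute section) can be sketched as a small scaffolding script. This is only an illustration under assumptions: `scaffold_spread_test` is a hypothetical helper, the file name `task.yaml` follows the convention used by spread tests, and the body is indented with spaces because YAML does not accept tabs for indentation.

```python
from pathlib import Path


def scaffold_spread_test(root, name, summary):
    """Create tests/regression/<name>/task.yaml with a deliberately failing body."""
    test_dir = Path(root) / "tests" / "regression" / name
    test_dir.mkdir(parents=True, exist_ok=True)
    task = test_dir / "task.yaml"
    # "false" makes the test fail right away, so spread -debug drops into a
    # shell on the test machine where the real checks can be explored.
    task.write_text(f"summary: {summary}\n\nexecute: |\n    false\n")
    return task


if __name__ == "__main__":
    # lp-XXX is the placeholder from the discussion; substitute the real bug number.
    print(scaffold_spread_test(".", "lp-XXX", "placeholder summary"))
```

Once the checks worked out in the debug shell are copied into the execute section, the same file becomes the regression test.
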
zyga-x240 | we have a library of helper programs that assist in writing tests | 11:50 |
zyga-x240 | but the best thing is you can really experience this from the point of view of a user | 11:50 |
zyga-x240 | and create a valid test | 11:50 |
zyga-x240 | that only later needs tweaking so that it fits the rest of the test stack | 11:50 |
amurray | this will be my first time writing a test so again I really appreciate the guidance - cheers | 11:50 |
zyga-x240 | amurray: look at various tests around, though you may stumble on more unusual tests from time to time | 11:51 |
zyga-x240 | you can also use qemu locally | 11:51 |
zyga-x240 | you will need a test image, you can get that with adt | 11:51 |
zyga-x240 | I can find the magic line if you want to use that instead of the google backend | 11:51 |
zyga-x240 | just let me know | 11:51 |
zyga-x240 | I think, on 20.10, that is autopkgtest-buildvm-ubuntu-cloud | 11:51 |
zyga-x240 | you just need a few extra args to get a groovy image | 11:52 |
amurray | sure any help with magic incantations are greatly appreciated :) | 11:52 |
zyga-x240 | drop that into ~/.spread/qemu | 11:52 |
zyga-x240 | as ubuntu-20.10-64.img | 11:52 |
zyga-x240 | and you're good | 11:52 |
zyga-x240 | a bit of advice: qemu tests are heavy on networking | 11:52 |
zyga-x240 | so you may want a good connection | 11:53 |
zyga-x240 | over time you can speed up with things like apt-cacher-ng | 11:53 |
zyga-x240 | anyway, let me know if this helps and if you get stuck on anything just ask | 11:53 |
amurray | will do - thanks again (my connection is ok, not great so will see how I fare...) | 11:53 |
amurray | ok time for me to go eod - thanks again zyga-x240 for your help - have a great day | 11:54 |
zyga-x240 | likewise! | 11:54 |
zyga-x240 | see you later | 11:55 |
pstolowski | zyga-x240: sorry, i'm not sure about dd and vfat, can you elaborate? | 11:58 |
zyga-x240 | pstolowski: how large is the vfat partition on that pi? | 11:58 |
zyga-x240 | I don't recall | 11:58 |
zyga-x240 | pstolowski: I wonder what's the impact of the fact that the partition is not cleanly unmounted | 11:59 |
zyga-x240 | and may not have been unmounted | 11:59 |
pstolowski | zyga-x240: i don't see any mounted vfat partitions | 11:59 |
zyga-x240 | cleanly that is | 11:59 |
zyga-x240 | hmmm | 11:59 |
zyga-x240 | can you paste mount? | 11:59 |
zyga-x240 | rogpeppe: do you recall if you unmounted the boot partition last time we were looking at this? | 12:00 |
rogpeppe | zyga-x240: yeah, i might have | 12:00 |
zyga-x240 | ah, that explains things | 12:00 |
zyga-x240 | thank you | 12:00 |
pstolowski | zyga-x240, rogpeppe: i've already collected mount output | 12:00 |
zyga-x240 | rogpeppe: did you try to fsck that partition after unmounting it? | 12:01 |
rogpeppe | zyga-x240: i tried, but there's no fsck command available | 12:01 |
zyga-x240 | rogpeppe: oh | 12:01 |
rogpeppe | zyga-x240: (and no way to install one, of course :) ) | 12:01 |
zyga-x240 | pstolowski: what's the boot base (/meta/snap.yaml will help) | 12:01 |
zyga-x240 | is that core18 or core? | 12:01 |
zyga-x240 | I see /sbin/fsck.vfat in both core and core18 | 12:03 |
zyga-x240 | can you confirm those are on PATH pstolowski? | 12:03 |
zyga-x240 | rogpeppe: ^ | 12:04 |
zyga-x240 | rogpeppe: if you can, perhaps fsck.vfat /dev/mmcblk0p{something} | 12:04 |
pstolowski | zyga-x240: it's core18 - https://paste.ubuntu.com/p/Rwvr2dPD9M/ | 12:05 |
zyga-x240 | pstolowski: right and that is the boot base for sure? | 12:05 |
zyga-x240 | our list output doesn't show this | 12:05 |
rogpeppe | zyga-x240: what's that "{something}" supposed to be a placeholder for? | 12:05 |
zyga-x240 | rogpeppe: the number of the partition with vfat | 12:05 |
zyga-x240 | lsblk can help finding it | 12:05 |
zyga-x240 | I think it's just p0 or p1 | 12:05 |
pstolowski | no fsck.vfat on path! | 12:06 |
zyga-x240 | pstolowski: and in /sbin/fsck.vfat? | 12:06 |
rogpeppe | pstolowski: feel free to run fsck... | 12:06 |
pstolowski | nope | 12:06 |
pstolowski | $ ls /sbin/fsck* | 12:06 |
zyga-x240 | pstolowski: can you check /meta/snap.yaml to ensure that the boot base is core18 for sure, I'm surprised to see three core revisions and one core18 | 12:06 |
zyga-x240 | woah! | 12:07 |
zyga-x240 | let me check | 12:07 |
pstolowski | "/sbin/fsck /sbin/fsck.cramfs /sbin/fsck.ext2 /sbin/fsck.ext3 /sbin/fsck.ext4 /sbin/fsck.minix" | 12:07 |
zyga-x240 | mvo: ^^^ | 12:07 |
zyga-x240 | that's very likely a serious problem | 12:07 |
zyga-x240 | pstolowski: how about fsck.fat? | 12:07 |
zyga-x240 | is that gone too? | 12:07 |
zyga-x240 | I see it in my core18 snap on x86-64 | 12:07 |
zyga-x240 | pstolowski: that's worth reporting as a separate bug with a regression test that checks that's an executable program | 12:08 |
pstolowski | zyga-x240: on what path on your system? | 12:08 |
zyga-x240 | and that it can run --help | 12:08 |
zyga-x240 | pstolowski: /snap/core18/current/sbin/fsck.vfat | 12:08 |
zyga-x240 | that's a symlink to fsck.fat | 12:08 |
zyga-x240 | but I see that in the core snap as well | 12:09 |
* zyga-x240 looks at revision numbers | 12:09 | |
* zyga-x240 checks stable channel | 12:10 | |
zyga-x240 | pstolowski: waiting for your confirmation of the boot base please | 12:10 |
pstolowski | zyga-x240: yeah, core has fsck.vfat here. but core18 doesn't. and it's not symlinked anywhere | 12:11 |
zyga-x240 | stable core has fsck | 12:11 |
zyga-x240 | ok | 12:11 |
zyga-x240 | probably core18 is at fault then | 12:11 |
zyga-x240 | pstolowski: but core is the boot base | 12:11 |
pstolowski | zyga-x240: boot base is core18 | 12:11 |
zyga-x240 | so what's going on? | 12:11 |
zyga-x240 | ahh | 12:11 |
zyga-x240 | ok | 12:12 |
pstolowski | i'm collecting all this and will soon attach to the report | 12:12 |
zyga-x240 | pstolowski: I think core18 didn't refresh | 12:12 |
zyga-x240 | it's very old | 12:12 |
zyga-x240 | core18 is revision 1885 here | 12:12 |
zyga-x240 | but 1885 in your log | 12:12 |
zyga-x240 | I think that could be related | 12:12 |
zyga-x240 | rogpeppe: consider running fsck.vfat from /snap/core/current/sbin/fsck.vfat | 12:12 |
zyga-x240 | then we could try running snap refresh core18 | 12:13 |
zyga-x240 | and snap install snapd | 12:13 |
zyga-x240 | that may recover the system | 12:13 |
zyga-x240 | I think this system is just stuck at old revisions and cannot move forward | 12:13 |
zyga-x240 | the fsck bug was fixed | 12:13 |
zyga-x240 | but this device is still affected | 12:13 |
pstolowski | interesting | 12:14 |
zyga-x240 | not sure what you think but the revision number there is really old | 12:14 |
zyga-x240 | pstolowski: so is snapd running now? | 12:14 |
zyga-x240 | the service I mean | 12:14 |
pstolowski | zyga-x240: yes | 12:14 |
zyga-x240 | pstolowski: can you try refreshing core18 | 12:15 |
zyga-x240 | though wait | 12:15 |
zyga-x240 | wait please | 12:15 |
zyga-x240 | that would cause a reboot | 12:15 |
zyga-x240 | and IIRC that's a problem | 12:15 |
zyga-x240 | rogpeppe: ^ | 12:15 |
zyga-x240 | rogpeppe: is rebooting that device acceptable for you? | 12:15 |
zyga-x240 | or will it misbehave? | 12:15 |
zyga-x240 | (I think that given its state we should first fsck the boot partition | 12:15 |
rogpeppe | zyga-x240: that's fine, although it probably won't restart | 12:15 |
zyga-x240 | mount it | 12:15 |
zyga-x240 | and look at what's there | 12:15 |
rogpeppe | zyga-x240: it will probably need manual intervention | 12:15 |
zyga-x240 | pstolowski: ^ ok, let's not reboot it yet | 12:15 |
zyga-x240 | pstolowski: please fsck boot | 12:16 |
rogpeppe | zyga-x240: it almost always needs to be restarted twice for some reason | 12:16 |
zyga-x240 | using core's fsck | 12:16 |
zyga-x240 | rogpeppe: I think because it reboots to try new core18 snap | 12:16 |
zyga-x240 | but fails | 12:16 |
zyga-x240 | because boot data is corrupted | 12:16 |
zyga-x240 | so it gets stuck | 12:16 |
zyga-x240 | (no watchdog timer) | 12:16 |
zyga-x240 | then reboot rolls back | 12:16 |
zyga-x240 | and we're back here | 12:16 |
pstolowski | yeah, i'd avoid any intervention now. i'd discuss the solution for rogpeppe and pass it to him to do | 12:16 |
zyga-x240 | rollback across reboot as pstolowski noted earlier | 12:16 |
pstolowski | let's discuss on the standup | 12:16 |
zyga-x240 | pstolowski: agreed | 12:16 |
zyga-x240 | rogpeppe: ^ if you can, fsck is IMO safe | 12:17 |
zyga-x240 | then mounting the partition back | 12:17 |
rogpeppe | how on earth did the boot data get corrupted anyway? i haven't made any changes to it since i installed | 12:17 |
zyga-x240 | that may recover everything, assuming you try to install snapd again | 12:17 |
zyga-x240 | rogpeppe: it's vfat | 12:17 |
zyga-x240 | and it's written to by both uboot and linux | 12:17 |
zyga-x240 | we really don't know | 12:17 |
rogpeppe | zyga-x240: vfat can corrupt even when you're not changing it? | 12:17 |
zyga-x240 | rogpeppe: we do change it on boot | 12:18 |
zyga-x240 | we set a flag that says "we're trying to boot that thing now" | 12:18 |
rogpeppe | oh | 12:18 |
zyga-x240 | so that on reboot (if we fail) we don't retry | 12:18 |
rogpeppe | i guess that's where the issue comes from | 12:18 |
zyga-x240 | but boot a safe value | 12:18 |
zyga-x240 | indeed | 12:18 |
zyga-x240 | we found some issues with uboot FAT before | 12:18 |
zyga-x240 | those got fixed | 12:18 |
zyga-x240 | or was that GRUB fat | 12:18 |
zyga-x240 | anyway | 12:18 |
zyga-x240 | I think you can fix the partition so that it's not unclean | 12:18 |
zyga-x240 | then | 12:18 |
zyga-x240 | mount it | 12:18 |
zyga-x240 | so that snapd can write to it | 12:18 |
zyga-x240 | then | 12:18 |
zyga-x240 | install snapd using the snap binary from core snap | 12:19 |
zyga-x240 | that should make snapd snap not disabled anymore | 12:19 |
zyga-x240 | if that works | 12:19 |
zyga-x240 | I'd try to refresh core18 and see if it works | 12:19 |
zyga-x240 | I suspect it might just | 12:19 |
zyga-x240 | rogpeppe: this is your decision | 12:21 |
rogpeppe | zyga-x240: go for it | 12:21 |
rogpeppe | zyga-x240: i can easily manually reboot if needed | 12:21 |
rogpeppe | zyga-x240: this shouldn't corrupt the main data, right? | 12:21 |
pstolowski | rogpeppe: give me 30 minutes to finish the download :) | 12:21 |
zyga-x240 | rogpeppe: it's a separate partition | 12:22 |
zyga-x240 | rogpeppe: I think pawel asked you to perform those changes though, I think it's best to let him upload logs for forensics | 12:22 |
zyga-x240 | and for you two to agree as to who runs the commands | 12:22 |
zyga-x240 | so that it's not racy :) | 12:22 |
rogpeppe | i'm happy for pstolowski to do whatever's needed. i'm quite busy currently. | 12:22 |
zyga-x240 | ok | 12:23 |
zyga-x240 | pstolowski: it's up to us | 12:23 |
zyga-x240 | on the upside the bug may be fixed | 12:23 |
zyga-x240 | it's just prevented your device from refreshing | 12:23 |
pstolowski | yeah, that all makes sense | 12:24 |
pstolowski | but but | 12:24 |
pstolowski | why no current symlink? | 12:24 |
zyga-x240 | pstolowski: although understanding exactly how snapd became unlinked would be useful | 12:24 |
pstolowski | undo bug? | 12:24 |
zyga-x240 | I wonder what happens in that special boot code | 12:24 |
zyga-x240 | that reconciles what the system was booted with (Base name and rev) | 12:24 |
zyga-x240 | with what's in snapd state | 12:24 |
zyga-x240 | maybe that, when a skew is detected, somehow took out snapd snap? | 12:25 |
zyga-x240 | brb, afk for a moment | 12:26 |
zyga-x240 | or maybe until standup | 12:26 |
zyga-mbp | mvo could you please land https://github.com/snapcore/snapd/pull/9496 | 12:52 |
mup | PR #9496: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple π> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9496> | 12:52 |
zyga-mbp | the store had some issues with searching, probably usual load or something like that, and this blocks master | 12:52 |
mvo | zyga-mbp: done | 12:56 |
zyga-mbp | thank you! | 12:56 |
mup | PR snapd#9496 closed: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple π> <Created by bboozzoo> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9496> | 12:57 |
mborzecki | off to pick up my daughter from school | 13:47 |
zyga | pstolowski I'm hacking tests for this case but if you want to pair-program on fixing that device remotely, let me know | 13:54 |
pstolowski | zyga: that's actually a good idea. but i'm having a short lunch break now, how about in 30? | 13:55 |
zyga | no rush, I didn't have lunch yet either | 13:56 |
zyga | let me know when it's comfortable for you | 13:56 |
mborzecki | re | 14:12 |
zyga | that fsck test is actually pretty cool | 14:13 |
pstolowski | zyga: i've a meeting in 10 minutes, how about afterwards? | 14:20 |
mvo | pstolowski: I wonder if this will happen, lots of people declined | 14:21 |
zyga | pstolowski: how long is the meeting? | 14:21 |
pstolowski | mvo: ah ok | 14:21 |
zyga | pstolowski: I'm here for now, I can grab lunch and eat it next to a hangout | 14:21 |
pstolowski | zyga: ok, i'll know in a few moments if the meeting happens (likely not) | 14:27 |
zyga | pstolowski: ok, can we do it in 15 minutes | 14:29 |
zyga | I'm getting lunch in this instant | 14:29 |
zyga | see you at 16:45 | 14:31 |
mvo | pstolowski: no one in the call so far | 14:31 |
pstolowski | zyga: ok, works for me, see you soon | 14:34 |
zyga | back | 14:45 |
pstolowski | ok | 14:46 |
zyga | mvo, pedronis: pstolowski discovered the root cause | 15:06 |
pstolowski | bug report coming | 15:10 |
mvo | zyga: ohhh? | 15:12 |
* cachio lunch | 15:19 | |
zyga | mvo still in a call | 15:27 |
zyga | but vey unexpected | 15:27 |
zyga | *very | 15:27 |
zyga | okay, back | 15:51 |
zyga | mvo so two well defined bugs | 15:51 |
zyga | mvo one triggers the other | 15:51 |
zyga | mvo one breaks snapd refreshes | 15:51 |
zyga | other causes snapd to deactivate itself on failed refresh | 15:51 |
zyga | the root cause was session services | 15:51 |
zyga | and the fact that it silently depends on a specific revision of core/core18 | 15:52 |
zyga | we hit a EROFS and snapd refresh fails to proceed | 15:52 |
zyga | pawel is reporting both issues | 15:52 |
zyga | they are well defined and should be relatively easy to fix | 15:52 |
zyga | at least the breaking root cause | 15:52 |
zyga | rogpeppe: we fixed snapd on your system | 15:52 |
zyga | rogpeppe but we refrained from doing anything that would reboot the device | 15:53 |
zyga | rogpeppe we also fixed the boot partition | 15:53 |
zyga | rogpeppe with a bit of luck, you should be able to refresh core18 snap now | 15:53 |
zyga | rogpeppe and it should reboot successfully | 15:53 |
zyga | or if it does not, there are more bugs for us to find | 15:53 |
zyga | rogpeppe if you choose to refresh core18, do let us know what happened | 15:53 |
zyga | rogpeppe we also mounted the boot partition back, so it should be all right now | 15:53 |
zyga | mvo let me know if you want to talk about any details | 15:54 |
* mvo is in a meeting | 15:54 | |
zyga | ack | 15:56 |
pstolowski | mvo, zyga https://bugs.launchpad.net/snapd/+bug/1899664 and https://bugs.launchpad.net/snapd/+bug/1899665 | 16:01 |
mup | Bug #1899664: snapd refresh on old core18 fails due to read-only /etc/dbus-1/session.d <snapd:New> <https://launchpad.net/bugs/1899664> | 16:01 |
mup | Bug #1899665: Failed refresh of snapd drops current symlink on failure <snapd:New> <https://launchpad.net/bugs/1899665> | 16:01 |
pstolowski | pedronis: ^ | 16:02 |
zyga | pstolowski \o/ | 16:03 |
zyga | thank you! | 16:04 |
mvo | pstolowski: without having looked at those, how hard do you think this is to fix? could we work on this as a blue item relatively soon? | 16:04 |
rogpeppe | zyga: thanks! | 16:04 |
mvo | and thanks zyga and pstolowski (and rogpeppe of course!) | 16:04 |
rogpeppe | zyga: how would i go about refreshing the core18 snap? | 16:04 |
zyga | rogpeppe snap refresh core18 | 16:05 |
rogpeppe | zyga: ok, i'll try that. i.e. run that command, then `sudo reboot now`, right? | 16:06 |
zyga | no need, it will reboot itself | 16:06 |
pstolowski | mvo: yes i will work on them, r/o filesystem will be easy, not sure about the one re symlink, but probably not too complicated either | 16:06 |
zyga | rogpeppe note that the way we fixed your system is ephemeral | 16:06 |
zyga | and if core18 fails to refresh | 16:06 |
zyga | snapd will break itself again | 16:06 |
zyga | that is why I wanted to know that this works or fails once the reboot completes | 16:07 |
zyga | rogpeppe oh, and we re-started the hydroctl service as well | 16:07 |
rogpeppe | zyga: yay! | 16:07 |
rogpeppe | zyga: it's working! | 16:08 |
pstolowski | rogpeppe: feel free to remove my ssh access when you confirm core18 works | 16:08 |
zyga | mvo: interestingly, snapd will hit this case in a normal way | 16:09 |
zyga | that I did not think about before | 16:09 |
zyga | pstolowski we are here because that device has long refresh window | 16:09 |
zyga | it will refresh infrequently | 16:09 |
zyga | and when it does | 16:09 |
zyga | it refreshes snapd first | 16:09 |
zyga | so I think the severity of https://bugs.launchpad.net/snapd/+bug/1899664 should be increased | 16:09 |
mup | Bug #1899664: snapd refresh on old core18 fails due to read-only /etc/dbus-1/session.d <snapd:Triaged> <https://launchpad.net/bugs/1899664> | 16:09 |
zyga | it's not such a special case after all | 16:10 |
pstolowski | yeah i didn't set sev yet | 16:10 |
pstolowski | +1 | 16:10 |
* mvo hugs pstolowski | 16:13 | |
* mvo hugs zyga too | 16:13 | |
pstolowski | was a collective effort really | 16:13 |
rogpeppe | zyga: after "snap refresh core18": | 16:13 |
rogpeppe | error: cannot perform the following tasks: | 16:13 |
rogpeppe | - Make current revision for snap "core18" unavailable (cannot set next boot: cannot determine bootloader) | 16:13 |
rogpeppe | - Make snap "core18" (1889) available to the system (cannot set next boot: cannot determine bootloader) | 16:13 |
pstolowski | mvo: eod for today, i'll work on fixes tomorrow morning | 16:13 |
mvo | pstolowski: sure thing | 16:14 |
pstolowski | ooh woot | 16:14 |
pstolowski | looks like i'll log in there again | 16:14 |
pstolowski | core18 20190723 1076 latest/stable canonicalβ base,disabled | 16:15 |
pstolowski | it did the same for core18 | 16:15 |
pstolowski | it is disabled now | 16:15 |
zyga | rogpeppe oohh | 16:18 |
zyga | rogpeppe thank you, we will look again tomorrow | 16:18 |
rogpeppe | zyga: ok, thanks | 16:18 |
zyga | rogpeppe this device, while very unfortunate for you, will really help make snapd more robust | 16:18 |
rogpeppe | zyga: i hope so :) | 16:19 |
pstolowski | looks like an error on "Make snap ... available to the system" results in wrong undo, this is what i see for core18 and the same happened for snapd | 16:22 |
pstolowski | but yes, i'll continue tomorrow | 16:22 |
pstolowski | cu | 16:23 |
* zyga-x240 works on the test | 16:47 | |
zyga-x240 | so cool things | 16:48 |
zyga-x240 | this is my favourite test! | 16:48 |
* zyga runs the 2nd fsck test | 18:27 | |
zyga | ... | 18:42 |
zyga | rebooting | 18:42 |
zyga | I mean in the test | 18:42 |
* cachio doctor appointment | 18:44 | |
zyga | o/ | 18:47 |
zyga | more iterations | 19:16 |
* zyga goes to do some evening housework while tests run | 19:18 | |
zyga | woot | 19:24 |
zyga | tests pass | 19:24 |
zyga | OMG THE RAIN | 19:45 |
zyga | my dog decided to have a slow long walk | 19:46 |
zyga | I'm so wet | 19:46 |
zyga | trying core fsck test now | 19:49 |
zyga | I'll push both tests at once | 19:49 |
zyga | https://github.com/snapcore/snapd/pull/9446 needs review | 19:50 |
mup | PR #9446: overlord,usersession: initial notifications of pending refreshes <Created by zyga> <https://github.com/snapcore/snapd/pull/9446> | 19:50 |
zyga | https://github.com/snapcore/snapd/pull/9422 needs review as well and is short | 19:50 |
mup | PR #9422: overlord: add link participant for linkage transitions <Needs Samuele review> <Created by zyga> <https://github.com/snapcore/snapd/pull/9422> | 19:50 |
* zyga shower | 19:56 | |
zyga | 16 passes | 20:09 |
zyga | 18 and 20 are in progress | 20:09 |
* zyga tea | 20:09 | |
zyga | amurray: oh is it morning for you? :) | 20:14 |
zyga | 18 passes | 20:21 |
zyga | now just 20 | 20:21 |
zyga | actually making tea :) | 20:21 |
zyga-x240 | core20 takes foreeeever to test boot | 20:44 |
zyga-x240 | but close | 20:44 |
mup | PR snapcraft#3315 opened: build(deps-dev): bump junit from 3.8.1 to 4.13.1 in /tests/spread/plugins/v1/maven/snaps/maven-hello/my-app <dependencies> <java> <Created by dependabot[bot]> <https://github.com/snapcore/snapcraft/pull/3315> | 20:45 |
mup | PR snapcraft#3316 opened: build(deps-dev): bump junit from 3.8.1 to 4.13.1 in /tests/spread/plugins/v1/maven/snaps/legacy-maven-hello/my-app <dependencies> <java> <Created by dependabot[bot]> <https://github.com/snapcore/snapcraft/pull/3316> | 20:45 |
zyga-x240 | core20 should be good, running a clean test now | 21:03 |
* zyga-x240 pushed https://github.com/snapcore/snapd/pull/9499 and EODs | 21:31 | |
mup | PR #9499: tests: add tests for fsck <Created by zyga> <https://github.com/snapcore/snapd/pull/9499> | 21:31 |
zyga-x240 | xnox: ^ perhaps you know what is responsible for fsck.vfat in uc20 | 21:31 |
zyga-x240 | this test shows that ubuntu-seed stays corrupted across reboot, a regression compared to core 16 and core 18 | 21:32 |
* zyga-x240 EODs | 21:33 | |
zyga-x240 | xnox: if you have feedback please comment on the PR, I'll review that tomorrow | 21:33 |
mup | PR snapd#9499 opened: tests: add tests for fsck <Created by zyga> <https://github.com/snapcore/snapd/pull/9499> | 21:35 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!
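
The "quick failing test" recipe that zyga-x240 gives amurray above (11:48 to 11:49) can be sketched as a short shell session. This is a minimal sketch under stated assumptions, not the project's documented procedure: the file name `task.yaml` and the summary text are not stated in the conversation, `lp-XXX` is the placeholder used in the chat, and the script only prints the spread command instead of running it. The body of `execute:` is indented with spaces because YAML does not allow tabs for indentation.

```shell
#!/bin/sh
# Sketch of the "quick failing test" workflow from the conversation.
set -eu

# lp-XXX is the placeholder used in the chat; substitute the real bug number.
task_dir="tests/regression/lp-XXX"
mkdir -p "$task_dir"

# Assumption: the task definition lives in task.yaml inside the task directory.
# "false" makes the task fail on purpose, so that running spread with -debug
# leaves you in a shell on the test machine.
cat > "$task_dir/task.yaml" <<'EOF'
summary: placeholder, describe the bug being reproduced

execute: |
    false
EOF

cat "$task_dir/task.yaml"

# Command quoted from the conversation; printed here rather than executed,
# since it needs access to the google backend.
echo "SPREAD_DEBUG_EACH=0 spread -debug -v google:ubuntu-20.10-64:$task_dir"
```

From the debug shell, the remaining steps in the chat are manual: install the snap, inspect the files under /etc/apparmor.d, copy the commands that expose the problem into the `execute:` section, then quit and re-run to verify.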