[00:09] <mup> PR snapd#9495 opened: logger: use KernelCommandLineSplit to parse debug flag <Simple 😃> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9495>
[03:05] <mup> PR snapd#9495 closed: logger: use KernelCommandLineSplit to parse debug flag <Simple 😃> <Created by cmatsuoka> <Closed by cmatsuoka> <https://github.com/snapcore/snapd/pull/9495>
[05:35] <zyga-x240> o/
[05:45] <mup> PR snapd#9494 closed: logger: use strutil.KernelCommandLineSplit in debugEnabledOnKernelCmdline <Simple 😃> <Skip spread> <Created by mvo5> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/9494>
[05:51] <mborzecki> morning
[05:51] <zyga-x240> mborzecki: o/
[05:51] <mborzecki> zyga-x240: hey
[05:51] <zyga-x240> mborzecki: I think we should revert bits of the speedup changes
[05:51] <zyga-x240> it's been hanging yesterday
[05:51] <zyga-x240> or investigate and fix
[05:51] <zyga-x240> it may be a python-version-specific bug or just a general bug
[05:52] <mborzecki> hmmm
[05:53] <mborzecki> zyga-x240: did the change where we respect workers count land?
[05:53] <zyga-x240> I think so
[05:54] <zyga-x240> afk for some time, lucy just woke up
[06:31] <mup> PR snapd#9480 closed: snap: support different exit-code in the snap command <Created by mvo5> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9480>
[06:36] <zyga-x240> mvo: we need to fix spread-shellcheck
[06:38] <mborzecki> zyga-x240: tests/unit/go hangs on 20.04 too right?
[06:39] <mvo> zyga-x240, mborzecki yeah, something is wrong, I was just starting a spread google run to see what is going on. do you have an idea already?
[06:42] <zyga-x240> mborzecki: not sure
[06:42] <zyga-x240> mborzecki: I just don't know
[06:42] <zyga-x240> mvo: not immediately, either bug in older python (related to recursive executor submit) or something else
[06:43] <mborzecki> trying with --max-procs=2 might be useful
[06:46] <mborzecki> zyga-x240: otoh, the unit tests job as run by gh actions does not seem to fail or hang for that matter
[06:46] <zyga-x240> yeah, that's what makes me think it may be python version-specific behavior
[06:48] <zyga> I'm finishing my breakfast now, I'll start in a moment
[06:49] <mborzecki> zyga: looking at one of my PRs, it failed on 20.04 and on 18.04 (through the tests/unit/spread-shellcheck test), hitting kill-timeout in both cases; 16.04 passed
[06:49] <zyga> heh
[06:49] <zyga> it's all over the place
[06:49] <zyga> shall we revert and fix this async?
[06:50] <zyga> I'd rewrite the code so that it returns todo units like what jamesh suggested
[06:50] <zyga> then it's one loop and one executor only
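A minimal sketch of the restructuring described here, assuming the checker can enumerate every file up front and check each one independently. The main thread builds one flat todo list, submits it to a single executor, and is the only place that waits on futures, so no worker ever blocks waiting for another worker. `collect_todo`, `check_file`, and `check_all` are hypothetical names, not the actual spread-shellcheck functions.

```python
import concurrent.futures as cf


def collect_todo(paths):
    # Hypothetical stand-in: expand the inputs into a flat, de-duplicated
    # list of independent work units before any checking starts.
    return sorted(set(paths))


def check_file(path):
    # Hypothetical stand-in for running shellcheck on one file.
    # Crucially, it never submits further work to the executor.
    return path, "checked"


def check_all(paths, max_workers):
    todo = collect_todo(paths)
    results = {}
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(check_file, p) for p in todo]
        # Only the main thread waits on futures, so even one worker
        # is enough to make progress.
        for fut in cf.as_completed(futures):
            path, status = fut.result()
            results[path] = status
    return results
```

Because nothing inside the pool waits on the pool, this shape keeps working with `max_workers=1`, the setting under which the hang reproduces.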
[06:50] <mborzecki> zyga: sounds like more work though ;)
[06:50] <zyga> yes
[06:50] <mborzecki> zyga: anyways, i think we should revert that last change
[06:50] <zyga> that's why separate actions 1) revert 2) fix
[06:51] <zyga> mvo how does that sound?
[06:51] <mborzecki> also tests/unit/spread-shellcheck duplicates work right now, run-checks is run in tests/unit/go
[06:51] <mborzecki> so either we can disable spread-shellcheck in tests/unit/go or drop the other test
[06:51] <mvo> mborzecki: I think we can kill tests/unit/spread-shellcheck now
[06:52] <mborzecki> mvo: sgtm
[06:54] <mborzecki> running manually now, those spread nodes have 1 cpu
[06:55] <mborzecki> zyga: heh, so some deadlock, a number of jobs submitted, nothing happening, cpu usage 0%
[06:55] <zyga> backtrace!
[06:55] <zyga> that's quick
[06:57] <pstolowski> good morning!
[06:58] <zyga-x240> pstolowski: hello
[06:58] <mborzecki> zyga-x240: https://paste.ubuntu.com/p/ZqDTCcFvnb/ heh (cc mvo)
[06:58] <mborzecki> pstolowski: hey
[06:58] <mborzecki> zyga-x240: only 2 threads and both are waiting
[06:59] <zyga> interesting
[06:59] <zyga> and good idea to use gdb!
[07:00] <zyga> so one is running checkpaths
[07:00] <zyga> going through each location
[07:01] <zyga> while the other is in checkfile
[07:01] <zyga> waiting for the result
[07:01] <zyga> yeah
[07:01] <zyga> mborzecki perhaps just ensuring we have 3 workers minimum :P
[07:01] <zyga> a bit lame but ...
[07:01] <zyga> (as in the minimum N)
[07:01] <zyga> mborzecki what do you think?
[07:05] <mborzecki> zyga: --max-procs 3 seems to work
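A minimal, self-contained reproduction of the failure mode read from the backtrace (not the actual spread-shellcheck code): `outer` runs on a pool worker, submits `inner` tasks to the same pool, and blocks on their results. The timeout exists only so the demo returns instead of hanging. With one worker the wait can never be satisfied; with two it completes. Every task that blocks waiting on the pool pins a worker, so a fixed minimum worker count only avoids the hang while fewer tasks are blocked waiting than there are workers, which is why it is a workaround rather than a fix.

```python
import concurrent.futures as cf


def inner(x):
    return x * 2


def outer(pool, items, timeout):
    # Runs on a pool worker, submits sub-tasks to the SAME pool and
    # blocks on their results. If every worker is busy in a task like
    # this one, no worker is free to run inner(), so the wait cannot end.
    futures = [pool.submit(inner, i) for i in items]
    return [f.result(timeout=timeout) for f in futures]


def run(max_workers, timeout=0.5):
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        top = pool.submit(outer, pool, [1, 2, 3], timeout)
        try:
            return top.result()
        except cf.TimeoutError:
            # Would have hung forever without the timeout.
            return None
```

Usage: `run(1)` returns `None` (the nested wait starves), while `run(2)` returns `[2, 4, 6]`.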
[07:06] <jamesh> In the end, you really want the code waiting for futures moved outside of the thread pool
[07:07] <zyga> yeah, I think that's the proper solution
[07:11] <jamesh> next step: overengineer it with asyncio
[07:14] <zyga> jamesh no no ;)
[07:14] <mborzecki> haha
[07:15] <jamesh> zyga: but just think of all the threads you'd save!
[07:16] <mvo> mborzecki: nice find!
[07:16] <mborzecki> jamesh: save all the threads!
[07:21] <mborzecki> hmm so maybe we just need +1 worker really
[07:22] <zyga> mborzecki I think that's the easy fix
[07:22] <zyga> and we should rewrite that slightly as jamesh mentioned
[07:22] <mborzecki> zyga: i'd land an easy fix with a comment, and then maybe work on the larger fix/refactor
[07:23] <zyga> +1
[07:23] <mvo> +1
[07:24] <mborzecki> zyga: fwiw i can reproduce this locally with --max-procs=1
[07:24] <zyga> yeah, my fault for testing on my beefiest system
[07:41] <mborzecki> zyga: are you opening a quick pr with the workaround?
[07:41] <zyga> mborzecki nope, I thought you want to do that
[07:41] <zyga> I can though
[07:41] <mborzecki> zyga: no worries, i can push it
[07:41] <zyga> thanks!
[07:42] <pstolowski> mvo: hi! #8960 got +1 from Samuele and has 3 reviews; would be great to land it at the most convenient moment after cutting a new release branch. perhaps it would make sense to squash-merge it in case of anything unexpected (and a need for a revert)?
[07:42] <mup> PR #8960: o/snapstate,servicestate: use service-control task for service actions (9/9) <Needs Samuele review> <Services ⚙️> <⛔ Blocked> <Created by stolowski> <https://github.com/snapcore/snapd/pull/8960>
[07:47] <zyga-x240> I think I need to solve the blockers on https://github.com/snapcore/snapd/pull/9204
[07:47] <mup> PR #9204: sandbox: track applications unconditionally <Created by zyga> <https://github.com/snapcore/snapd/pull/9204>
[07:47] <zyga-x240> as that's really required to enable r-a-a
[07:50] <zyga-x240> mborzecki: IIRC fedora will disable getenforce/setenforce soon
[07:50] <zyga-x240> how will that affect our test suite?
[07:50] <mborzecki> zyga-x240: hm, got more info?
[07:50] <zyga-x240> mborzecki: one sec
[07:51] <zyga-x240> https://lwn.net/Articles/831748/
[07:51] <mup> PR snapd#9496 opened: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple 😃> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9496>
[07:52] <mborzecki> zyga-x240: it's ok for us, we only switch between permissive/enforcing
[07:57] <mborzecki> zyga-mbp: mvo: ^^ 9496
[08:00] <mvo> mborzecki: looking
[08:01] <mup> PR snapd#9497 opened: Have session agent connect to the D-Bus session bus <Created by jhenstridge> <https://github.com/snapcore/snapd/pull/9497>
[08:03] <jamesh> zyga-x240: ^^ this PR might help out a bit with your notifications work.
[08:04] <zyga-x240> jamesh: interesting
[08:04] <zyga-x240> I was using the other socket for broadcast but this may be useful as well
[08:05] <jamesh> other socket?
[08:05] <zyga-x240> jamesh: the snapd-user-agent socket
[08:05] <zyga-x240> not dbus :)
[08:06] <zyga-x240> why do we have to be launched by systemd?
[08:06] <zyga-x240> ah
[08:06] <jamesh> zyga-x240: we're already launched by systemd
[08:06] <zyga-x240> that's a dbus service
[08:06] <zyga-x240> got confused for a sec
[08:06] <jamesh> we want to make sure that if we get activated via D-Bus first, we still get our file descriptor
[08:06] <jamesh> for REST
[08:07] <zyga-x240> mmm
[08:07] <jamesh> I'm not suggesting replacing the snapd <-> agent communication with D-Bus
[08:07] <jamesh> this would be for a return path of the agent <-> desktop shell communication
[08:07] <zyga-x240> right
[08:10] <zyga-x240> jamesh: I've sent a quick review just now
[08:22] <zyga-x240> something weird on centos 7
[08:30] <zyga-x240> hmm
[08:30] <zyga-x240> type=SYSCALL msg=audit(10/13/20 08:11:54.336:587) : arch=x86_64 syscall=kill success=yes exit=0 a0=0xffffffffffffa7e8 a1=SIGKILL a2=0x0 a3=0xf91e50 items=0 ppid=1 pid=22326 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=snapd exe=/usr/libexec/snapd/snapd subj=system_u:system_r:snappy_t:s0 key=(null)
[08:30] <zyga-x240> type=AVC msg=audit(10/13/20 08:11:54.336:587) : avc:  denied  { sigkill } for  pid=22326 comm=snapd scontext=system_u:system_r:snappy_t:s0 tcontext=system_u:system_r:snappy_cli_t:s0 tclass=process permissive=1
[08:31] <zyga-x240> type=SYSCALL msg=audit(10/13/20 08:01:54.377:396) : arch=x86_64 syscall=connect success=yes exit=0 a0=0x8 a1=0xc000320f10 a2=0x22 a3=0x4 items=0 ppid=22326 pid=22552 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=snap exe=/usr/bin/snap subj=system_u:system_r:snappy_cli_t:s0 key=(null)
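The `a0` argument of the `kill` SYSCALL record is printed as raw unsigned 64-bit hex. A small decoding sketch (helper names are illustrative): reinterpreted as a signed value, `0xffffffffffffa7e8` is `-22552`, and kill(2) with a pid below -1 signals every process in process group `-pid`. 22552 is the pid of the `snap` process (snappy_cli_t, ppid 22326 = snapd) in the connect record logged ten minutes earlier (08:01:54 vs 08:11:54), which is consistent with the denied `sigkill` on `snappy_cli_t`.

```python
def to_signed64(value):
    """Reinterpret a raw 64-bit syscall argument as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


def describe_kill_target(a0_hex):
    # Target semantics follow kill(2).
    pid = to_signed64(int(a0_hex, 16))
    if pid > 0:
        return "process {}".format(pid)
    if pid == 0:
        return "caller's process group"
    if pid == -1:
        return "every process the caller may signal"
    return "process group {}".format(-pid)
```

Usage: `describe_kill_target("0xffffffffffffa7e8")` gives `"process group 22552"`.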
[08:33] <zyga-x240> mborzecki: do you know how to recover from: Corrupted checkpoint file. Inode match, but newer complete event (1602577959.287:791) found before loaded checkpoint 1602577772.146:790
[08:33] <zyga-x240> that's from ausearch
[08:34] <zyga-x240> nvm I got it
[08:34] <zyga-x240> so
[08:34] <zyga-x240> type=AVC msg=audit(10/13/20 08:32:39.287:791) : avc:  denied  { connectto } for  pid=23299 comm=snap path=/run/dbus/system_bus_socket scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:system_r:system_dbusd_t:s0-s0:c0.c1023 tclass=unix_stream_socket permissive=1
[08:34] <zyga-x240> type=AVC msg=audit(10/13/20 08:32:39.287:791) : avc:  denied  { search } for  pid=23299 comm=snap name=dbus dev="tmpfs" ino=13502 scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:object_r:system_dbusd_var_run_t:s0 tclass=dir permissive=1
[08:34] <zyga-x240> those are the first two denials to fix
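For triage, each AVC denial can be turned mechanically into a raw allow rule of the same shape as the `allow snappy_t snappy_cli_t:process { sigkill };` line used later in this log. This is a simplified, illustrative parser (roughly the idea behind audit2allow, not a replacement for it). For D-Bus access, the refpolicy interface macros such as `dbus_system_bus_client()` are a better fix than hand-written raw rules.

```python
import re

AVC_RE = re.compile(
    r"avc:\s+denied\s+\{ (?P<perms>[^}]+) \}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+"
    r"tclass=(?P<tclass>\S+)"
)


def selinux_type(context):
    # An SELinux context is user:role:type[:level]; keep only the type.
    return context.split(":")[2]


def avc_to_allow(line):
    """Return a raw allow rule for one AVC denial, or None."""
    m = AVC_RE.search(line)
    if m is None:
        return None
    perms = " ".join(m.group("perms").split())
    return "allow {} {}:{} {{ {} }};".format(
        selinux_type(m.group("scontext")),
        selinux_type(m.group("tcontext")),
        m.group("tclass"),
        perms,
    )
```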
[08:45]  * zyga-x240 tries to adjust the policy
[08:57] <mborzecki> zyga-x240: did you manage to fix it?
[08:57] <mborzecki> (the policy i mean)
[08:58] <zyga-x240> mborzecki: I'm slow at iterating at this
[08:58] <zyga-x240> not yet
[08:59] <zyga-x240> mborzecki: should we bump the snapd policy version perhaps? :)
[08:59] <zyga-x240> maybe it should match snapd version
[09:00] <mborzecki> zyga-x240: try with dbus_stream_connect_system_dbusd(snappy_cli_t) and dbus_chat_system_bus(snappy_cli_t)
[09:00] <mborzecki> zyga-x240: perhaps we should, though we never did
[09:01] <mborzecki> zyga-x240: setting it to version of snapd sounds ok
[09:01] <zyga-x240> thanks, trying
[09:02] <mborzecki> zyga-x240: the modules in core policy have a version bump on each change (in theory)
[09:02] <zyga-x240> mm
[09:02] <zyga-x240> I suspect nothing just cares about this number
[09:02] <zyga-x240> just an observation, it's not important
[09:03] <mborzecki> zyga-x240: there may be some automation under the hood; in our development we replace the module by hand, so it automatically gets a higher priority than the currently loaded one
[09:09] <zyga-x240> mborzecki: I'll read https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/services/dbus.te and friends to see if there's something we should use
[09:12] <mborzecki> haha
[09:12] <mborzecki> zyga-x240: though you really want to read this: https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/services/dbus.if
[09:13] <zyga-x240> (I meant all three)
[09:13] <mborzecki> zyga-x240: the *.te (type enforcement?) file is for the actual system dbus daemon, *.fc (file context?) is the files/sockets/dirs
[09:13] <mborzecki> zyga-x240: *.if is the interfaces for use from other modules
[09:13] <zyga-x240> mmm
[09:22] <zyga-x240> mborzecki: is all the comment XML in those .if files processed by anything?
[09:22] <zyga-x240> is there a "compiled" version anywhere?
[09:23] <mborzecki> zyga-x240: yes, there should be documentation on your system, though i don't think it's available anywhere online
[09:23] <zyga-x240> ah, ok
[09:23] <zyga-x240> I'll check on F32
[09:23] <mborzecki> zyga-x240: latest tech :P
[09:28] <zyga-x240> mborzecki: more denials: https://pastebin.ubuntu.com/p/5K5TTpKwkT/
[09:28] <zyga-x240> I think the blocker is type=AVC msg=audit(10/13/20 09:16:35.588:270) : avc:  denied  { search } for  pid=22502 comm=snap name=dbus dev="tmpfs" ino=13389 scontext=system_u:system_r:snappy_cli_t:s0 tcontext=system_u:object_r:system_dbusd_var_run_t:s0 tclass=dir permissive=1
[09:29] <zyga-x240> but the rest is also interesting, it seems snapd cannot stop the hook
[09:29] <zyga-x240> (after a timeout)
[09:29] <mborzecki> yeah, sigkill and all
[09:30] <mborzecki> zyga-x240: dbus_system_bus_client(snappy_cli_t) should do it
[09:30] <zyga-x240> yep
[09:31] <mup> PR snapd#9293 closed: snap: auto-import will not try to auto-create users on managed devices <Needs Samuele review> <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/9293>
[09:31] <mup> PR snapd#9498 opened: client,daemon,snap: auto-import does not error on managed devices <Needs Samuele review> <Run nested> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9498>
[09:32] <mborzecki> zyga-x240: and you can probably drop dbus_connect_system_bus(snappy_cli_t); it looks like it duplicates some of the things from *_bus_client
[09:32] <zyga-x240> ok
[09:38] <pedronis> #9463 needs 2nd reviews (it's small)
[09:38] <mup> PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>
[09:40] <zyga-x240> ack
[09:44] <zyga-x240> https://github.com/snapcore/snapd/pull/9463#pullrequestreview-507255123
[09:44] <mup> PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>
[09:44] <zyga-x240> made one remark, so perhaps something to adjust before landing
[09:45] <zyga-x240> mborzecki: nice, passing
[09:45] <zyga-x240> mborzecki: we need session equivalent though
[09:46] <zyga-x240> I don't see any in the refpolicy gh repo
[09:46] <zyga-x240> mborzecki: rebased and pushed back to https://github.com/snapcore/snapd/pull/9204
[09:46] <mup> PR #9204: sandbox: track applications unconditionally <Created by zyga> <https://github.com/snapcore/snapd/pull/9204>
[09:55] <zyga-x240> pedronis: should I push a trivial patch for https://github.com/snapcore/snapd/pull/9463
[09:55] <mup> PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>
[09:55] <zyga-x240> er
[09:55] <zyga-x240> for https://github.com/snapcore/snapd/pull/9463#discussion_r503813935
[09:55] <pedronis> zyga-x240: if you have time yes
[09:56] <zyga-x240> on it
[09:56] <pedronis> mvo: should we close #8845 and #8929 until we have time to discuss/adjust them and reprose?
[09:56] <mup> PR #8845: [RFC] many: add "system.service.snapd-autoimport.disable" setting <Needs Samuele review> <⛔ Blocked> <Created by mvo5> <https://github.com/snapcore/snapd/pull/8845>
[09:56] <mup> PR #8929: [RFC] many: add new "daemon-startup: inhibit" option <Needs Samuele review> <Created by mvo5> <https://github.com/snapcore/snapd/pull/8929>
[09:57] <pedronis> *re-propose
[09:57] <mvo> pedronis: sure
[09:58] <pedronis> thx
[09:58] <mvo> closed
[10:01] <zyga-x240> pushed
[10:01] <zyga-x240> https://github.com/snapcore/snapd/pull/9463/commits/26f6b6680027ea9b1262a03b27a28ac6f3d60a9e if anyone wants to cross-check
[10:01] <mup> PR #9463: seed/seedwriter/writer.go: check DevModeConfinement for dangerous features <Bug> <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9463>
[10:02] <mup> PR snapd#8845 closed: [RFC] many: add "system.service.snapd-autoimport.disable" setting <Needs Samuele review> <⛔ Blocked> <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/8845>
[10:14] <zyga-x240> mborzecki: any idea on how to provide sigkill and other permissions
[10:14] <zyga-x240> I think we're missing a test
[10:14] <zyga-x240> that we can kill a hook that is running for too long
[10:14] <zyga-x240> (or miss a test that runs selinux checks)
[10:14] <zyga-x240> mborzecki: do you think we could move the "no denials" check to an invariant?
[10:19] <mborzecki> zyga-x240: hm this should do it `allow snappy_t snappy_cli_t:process { sigkill };`
[10:19] <mborzecki> zyga-x240: wondering though, why the hook process was still snappy_cli_t
[10:19] <mborzecki> can you reproduce that and grab ps -Z ?
[10:19] <zyga-x240> oh
[10:19] <zyga-x240> I will try
[10:19] <zyga-x240> sure
[10:21] <zyga-x240> ah I know why
[10:21] <zyga-x240> mborzecki: because that was still the "snap run" phase
[10:21] <zyga-x240> we got a denial on a dbus method call
[10:21] <zyga-x240> and got stuck waiting for a response
[10:22] <zyga-x240> mborzecki: I wonder if we should set up a timeout / guard of some sort
[10:22] <zyga-x240> but that explains the label
[10:22] <zyga-x240> we switch that in snap-exec
[10:22] <zyga-x240> er
[10:22] <zyga-x240> snap-confine
[10:25] <zyga-x240> mborzecki: what is the label we transition to when we run as a snap app?
[10:25] <zyga-x240> I added this
[10:25] <zyga-x240> # snapd tries to kill hooks that run for over 10 minutes.
[10:25] <zyga-x240> allow snappy_t snappy_cli_t:process { sigkill };
[10:25] <mborzecki> zyga-x240: if it's under systemd the final label is unconfined_service_t
[10:25] <zyga-x240> but I think we need more than that
[10:26] <zyga-x240> not under systemd
[10:26] <zyga-x240> those are specifically hooks
[10:26] <zyga-x240> systemd just tracks them
[10:26] <zyga-x240> not spawns them
[10:26] <mborzecki> zyga-x240: right, but the transitions take the same route iirc
[10:26] <mborzecki> zyga-x240: so the hook ends up as unconfined service too
[10:28] <zyga-x240> oh
[10:28] <zyga-x240> ok, that's good then
[10:28] <zyga-x240> I'll add two lines
[10:29]  * zyga-x240 tests killing w
[10:30] <zyga-x240> with disabled dbus perms
[10:38] <pstolowski> pedronis: hi, i've spent quite a bit investigation undo on remove, found a couple of issues and filed https://bugs.launchpad.net/snapd/+bug/1899614 (and also worked on addressing point #2 there)
[10:38] <mup> Bug #1899614: multiple problems with undo for 'snap remove'  <snapd:New> <https://launchpad.net/bugs/1899614>
[10:40] <pstolowski> *investigating
[11:01] <zyga-x240> pstolowski: rogpeppe is around and could provide extra information about the other bug we've discussed
[11:02] <pstolowski> zyga-x240: thanks, but i don't have anything specific atm, but mvo had requested some info there before, would be good to add it if possible
[11:02] <pstolowski> rogpeppe ^
[11:03] <rogpeppe> pstolowski: ok, i've added that command output
[11:03] <zyga-x240> thank you!
[11:04] <pstolowski> thanks!
[11:04] <rogpeppe> i don't seem to get email notifications from launchpad, so please ping me here if you want any further interaction from me in that issue, thanks!
[11:05] <zyga-x240> ah, will do
[11:06] <pstolowski> ok, that log is very interesting.. we saw " there was a rollback across reboot" before, didn't we?
[11:07] <zyga-x240> aha
[11:08] <pstolowski> this at least explains the refresh and undo on Oct 5 that I also saw in the state timings; but what happened after the next 2 reboots and why it worked till Oct 7th is a mystery
[11:08] <zyga-x240> pstolowski: I don't recall how this layer works; what would happen if the boot partition (vfat) is corrupted and we cannot really set the next boot to anything different?
[11:08] <zyga-x240> like it's always stuck at one thing
[11:08] <zyga-x240> would that explain anything?
[11:08] <pstolowski> i've no idea about that code
[11:08] <zyga-x240> rogpeppe: I think you could try unmounting the boot partition (vfat) and fscking it
[11:08] <zyga-x240> rogpeppe: also if you want to recover the system, we should be able to help
[11:08] <zyga-x240> just not sure what's the best way to do that
[11:09] <zyga-x240> I was also talking to mborzecki about this issue
[11:09] <zyga-x240> and we don't remove snapd.service from disk
[11:09] <zyga-x240> rogpeppe: could you check if snapd.service is in /etc/systemd/system?
[11:09] <pstolowski> afaiu the system works again, no? logs from oct12
[11:09] <zyga-x240> rogpeppe: oh, is snapd running now?
[11:10] <pstolowski> maybe i'm making assumptions.. but afaiu it stopped working on 7th?
[11:10] <rogpeppe> one mo, let me check
[11:10] <rogpeppe> the system isn't working currently
[11:11] <rogpeppe> snapd does seem to be running:
[11:11] <rogpeppe> rogpeppe@localhost:~$ ps alxw | grep snapd
[11:11] <rogpeppe> 4     0   725     1  20   0 928048 18712 -      Ssl  ?          2:57 /snap/snapd/9169/usr/lib/snapd/snapd
[11:11] <zyga-x240> !
[11:11] <pstolowski> rogpeppe: ha, that's weird!
[11:11] <zyga-x240> can you run /snap/snapd/9169/usr/bin/snap?
[11:11] <pstolowski> rogpeppe: and no /snap/snapd/current symlink right?
[11:11] <rogpeppe>  /snap/snapd/current is still not there, right
[11:12] <rogpeppe> i can run /snap/snapd/9169/usr/bin/snap ok
[11:12] <zyga-x240> pstolowski: ^ any ideas on what to explore?
[11:12] <zyga-x240> rogpeppe: maybe snap list --all
[11:12] <zyga-x240> for a first sanity check
[11:12] <rogpeppe> i could give you a login to the system if you want
[11:12] <zyga-x240> then maybe snap install snapd?
[11:12] <zyga-x240> pstolowski: do you want to debug this?
[11:14] <rogpeppe> this is the output of snap list --all: https://paste.ubuntu.com/p/Rwvr2dPD9M/
[11:14] <pstolowski> rogpeppe: yes, sure, that would be great
[11:15] <zyga-x240> core           16-2.46.1           9995  latest/stable     canonical*    core
[11:15] <zyga-x240> core is linked
[11:15] <pstolowski> do we do anything magical where snapd would start without the current symlink after reboot? (don't think so, but...)
[11:15] <zyga-x240> but the boot base is core18, right?
[11:15] <zyga-x240> rogpeppe: what's /meta/snap.yaml snap name?
[11:15]  * rogpeppe tries to remember how to grant ssh access. so rusty!
[11:15] <zyga-x240> pstolowski: I don't think we do
[11:15] <zyga-x240> rogpeppe: you can try ssh-import-id
[11:15] <zyga-x240> not sure if it's preinstalled
[11:15] <rogpeppe> that's the command i was trying to remember!
[11:16] <zyga-x240> cool :)
[11:16] <zyga-x240> mvo: ^ very interesting bug
[11:16] <zyga-x240> lots for us to learn on robustness
[11:16] <rogpeppe> pstolowski: is your launchpad username pstolowski ?
[11:17] <rogpeppe> ok, no ssh-import-id command
[11:17] <pstolowski> rogpeppe: 1 sec, i will import ssh keys from my current box
[11:17] <rogpeppe> :thumbsup:
[11:19] <mvo> zyga-x240: is there more news?
[11:19] <zyga-x240> mvo: yes, we have access to the device
[11:19] <zyga-x240> snapd is disabled!
[11:20] <mvo> woah, how did that happen :( ?
[11:20] <zyga-x240> the boot partition is still corrupted I suspect
[11:20] <pstolowski> rogpeppe: my lp user is stolowski
[11:20] <zyga-x240> we can get all the logs
[11:20] <mvo> \o/
[11:20] <zyga-x240> mvo: missing undo in unlink snap, I bet
[11:20] <zyga-x240> but could be something much more complex
[11:20] <pstolowski> i'm not sure it's this zyga-x240
[11:20] <pstolowski> but cannot exclude it of course
[11:21] <mvo> thank you so much rogpeppe !
[11:21] <zyga-x240> pstolowski: ack
[11:21] <pstolowski> mvo, zyga-x240 any suggestions what to collect? anything regarding boot?
[11:21]  * mvo hugs pstolowski and zyga-x240 for their tireless debugging also
[11:21] <rogpeppe> rogpeppe@localhost:~/.ssh$ ed
[11:21] <rogpeppe> -bash: ed: command not found
[11:21] <rogpeppe> dammit!
[11:21] <zyga-x240> pstolowski: maybe to be safe collect all journal logs
[11:21] <mvo> +1
[11:21] <zyga-x240> rogpeppe: vi is there
[11:21] <zyga-x240> rogpeppe: you can also echo >>
[11:21] <rogpeppe> yeah, i'll use cat
[11:21] <rogpeppe> i don't use an ANSI terminal so vi isn't good for me
[11:23] <pstolowski> zyga-x240: is tar /var/log.. good enough, or is there a better way?
[11:23] <pstolowski> thanks rogpeppe, checking
[11:23] <zyga-x240> I think that's good
[11:23] <zyga-x240> pstolowski: you can use journalctl with a standalone directory to examine machine logs without journald itself
[11:24] <pstolowski> zyga-x240: yeah i did it once.. slightly inconvenient but works
[11:25] <zyga-x240> pstolowski: journalctl -D /path/to/var/log/journal
[11:25] <zyga-x240> and then it works IIRC
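A small sketch of that workflow for journal files copied off the device (for example, extracted from a tarball of /var/log). `-D` (`--directory`) points journalctl at the copied journal directory instead of the local system journal; the `-u snapd.service` filter and the directory path in the test are illustrative choices, and the helper names are hypothetical.

```python
import shutil
import subprocess


def journal_cmd(journal_dir, unit=None):
    # -D/--directory reads journal files from journal_dir rather than
    # from the running system's journal.
    cmd = ["journalctl", "--no-pager", "-D", journal_dir]
    if unit:
        cmd += ["-u", unit]
    return cmd


def read_copied_journal(journal_dir, unit=None):
    if shutil.which("journalctl") is None:
        raise RuntimeError("journalctl is not available on this machine")
    result = subprocess.run(journal_cmd(journal_dir, unit), check=True,
                            capture_output=True, text=True)
    return result.stdout
```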
[11:31] <zyga-x240> mborzecki: is spread-shellcheck fixed?
[11:31] <zyga-x240> ah
[11:31] <zyga-x240> I see the PR
[11:31] <zyga-x240> thanks!
[11:32] <zyga-x240> approved
[11:36] <amurray> zyga-x240: hey so you mentioned re docker-support/multipass-support being broken with aa3 on groovy that you would prefer a snap-update-ns approach - can you elaborate more on what you are thinking here? I am not sure I understand exactly what you have in mind.
[11:36] <zyga-x240> sure
[11:36] <zyga-x240> I was thinking that the special interfaces they rely on could provide a mount profile that puts the base snap's apparmor config in /etc
[11:37] <pstolowski> mvo: anything re boot env that can be useful?
[11:37] <zyga-x240> something like mount --bind /snap/core18/current/etc/apparmor.d /etc/apparmor.d
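To picture what such a mount profile entry could look like, here is a sketch that renders the bind mount above as one line, assuming the profile uses the fstab(5) line format (source, target, type, options, freq, passno, with whitespace escaped as octal). The `ro` option is an illustrative assumption, not something decided in this discussion, and the helper names are hypothetical.

```python
def fstab_escape(field):
    # fstab(5) escapes backslash and whitespace as octal sequences;
    # escape the backslash first so later escapes are not doubled.
    return (field.replace("\\", "\\134")
                 .replace(" ", "\\040")
                 .replace("\t", "\\011")
                 .replace("\n", "\\012"))


def bind_entry(source, target, options=("bind", "ro")):
    # "none" is the conventional filesystem type for bind mounts.
    fields = [source, target, "none", ",".join(options), "0", "0"]
    return " ".join(fstab_escape(f) for f in fields)
```

Usage: `bind_entry("/snap/core18/current/etc/apparmor.d", "/etc/apparmor.d")` yields `/snap/core18/current/etc/apparmor.d /etc/apparmor.d none bind,ro 0 0`.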
[11:37] <zyga-x240> pstolowski: if you can, try fscking the boot partition
[11:37] <zyga-x240> or
[11:37] <zyga-x240> dd it
[11:37] <zyga-x240> to analyze post-mortem
[11:37] <zyga-x240> you may want to flip it read only for that operation
[11:37] <zyga-x240> or unmount it
[11:38] <zyga-x240> do you remember that vfat bug we ran into before?
[11:38] <amurray> zyga-x240: ok so is this already easily possible with the existing way that interfaces are declared? I am not super familiar with that...
[11:38] <zyga-x240> amurray: I believe it should; the only thing that would be required in addition to this is the permission for snap-update-ns to do this as well
[11:38] <zyga-x240> amurray: if that's urgent I could look
[11:38] <zyga-x240> amurray: but do look at the mount profile part
[11:38] <zyga-x240> the apparmor part should be easy once that is in the works
[11:39] <zyga-x240> you can test this by making a snap that uses the new interface (or the vanilla original snaps)
[11:39] <zyga-x240> and looking at the generated mount profile in /var/lib/snapd/apparmor/mount/
[11:39] <zyga-x240> there are some examples
[11:39] <zyga-x240> for instance, the desktop interface uses this mechanism to bind mount fonts around
[11:39] <amurray> zyga-x240: oh can you point me at examples, I am still confused 😕
[11:39] <zyga-x240> sure
[11:39] <amurray> ah ok will take a look
[11:40] <zyga-x240> amurray: in the snapd tree please look at interfaces/builtin/desktop.go
[11:40] <zyga-x240> let me open it as well
[11:40] <amurray> yep am just looking now
[11:40] <pstolowski> zyga-x240: i'd rather avoid any potentially destructive steps atm, would leave that to rogpeppe
[11:40] <zyga-x240> pstolowski: ok, a dd of the vfat while it is mounted would be useful as well
[11:40] <zyga-x240> even if you just stash it on the device
[11:40] <zyga-x240> not sure how large it is
[11:40] <zyga-x240> amurray: so if you scroll to line 295
[11:40] <amurray> zyga-x240: I am guessing that AddMountEntry() would be the thing?
[11:41] <zyga-x240> you can see how it grants apparmor permissions
[11:41] <pstolowski> dd of /boot partition? nb, logs will be huuge
[11:41] <zyga-x240> there are several profiles involved
[11:41] <zyga-x240> pstolowski: the vfat
[11:41] <zyga-x240> not sure how big it is
[11:41] <amurray> ah yep and the corresponding apparmor bits - thanks :)
[11:41] <zyga-x240> amurray: the key part there is AddUpdateNSf function
[11:41] <zyga-x240> which adds a piece of text for per-snap profile for snap-update-ns
[11:42] <zyga-x240> this just needs the permission to bind /snap/{base}/*/etc/apparmor.d -> /etc/apparmor.d
[11:42] <zyga-x240> now jump to 322
[11:42] <zyga-x240> this does what you mentioned before
[11:42] <zyga-x240> the difference is that we need spec.AddMountEntry (not *User*)
[11:42] <zyga-x240> there's more
[11:42] <zyga-x240> I believe those should be permanent things
[11:43] <zyga-x240> regardless of connection
[11:43] <zyga-x240> so the method signature is different
[11:43] <zyga-x240> you can see that in ...
[11:43] <zyga-x240> if you go to interfaces/mount/spec.go:206
[11:43] <zyga-x240> AddPermanentPlug
[11:43] <zyga-x240> there's a Slot variant just below
[11:44] <zyga-x240> the difference is in the arguments provided,
[11:44] <zyga-x240> the Permanent methods get an interface and a plug or a slot, not a connected plug / slot
[11:44] <zyga-x240> so it's just one side that you see
[11:44] <zyga-x240> anyway,
[11:44] <pstolowski> journal logs are 324M, tgz
[11:44] <zyga-x240> I think that's a sensible approach
[11:44] <zyga-x240> pstolowski: oh my
[11:44] <zyga-x240> pstolowski: maybe too much
[11:44] <zyga-x240> pstolowski: not sure, if we can send that over, that's good
[11:44] <zyga-x240> but confirm with rogpeppe for sure
[11:44] <amurray> zyga-x240: ok thanks heaps for your guidance - I'll try to take a look tomorrow morning and see if I can cook something up
[11:45] <zyga-x240> amurray: let me know how this feels
[11:45] <zyga-x240> ok
[11:45] <zyga-x240> amurray: if you get stuck I'll help
[11:45] <pstolowski> rogpeppe: ok to transfer 324M ^ ?
[11:45] <zyga-x240> amurray: which interfaces were those? docker-support and multipass-something?
[11:45] <amurray> zyga-x240: thanks - so multipass-support iirc
[11:46] <zyga-x240> right
[11:46] <zyga-x240> ideally we'd have a spread test that installs those snaps
[11:46] <zyga-x240> and looks at the mount profile or at the mount namespace
[11:46] <zyga-x240> I think that's the last step though
[11:46] <zyga-x240> I can definitely help
[11:47] <amurray> yeah I was wondering if I should try and add a test with whatever fix I come up with for this, but will focus on getting the right fix first and then can look at that if time permits...
[11:47] <zyga-x240> amurray: you can start with a quick failing test
[11:48] <zyga-x240> do you know how to write those?
[11:48] <zyga-x240> mkdir tests/regression/lp-XXX
[11:48] <zyga-x240> add a summary: with some info
[11:48] <zyga-x240> then execute: | (newline)(tab)false
[11:49] <zyga-x240> run that test with SPREAD_DEBUG_EACH=0 spread -debug -v google:ubuntu-20.10-64:tests/regression/lp-XXX
[11:49] <zyga-x240> in the shell install the snap you need
[11:49] <zyga-x240> use nsenter / cat to explore the files in /etc/apparmor.d
[11:49] <zyga-x240> eventually copy those ideas over to the yaml
[11:49] <zyga-x240> quit the debug shell and re-run to verify
[11:49] <zyga-x240> at some point it will measure failure
[11:49] <zyga-x240> and then that's a good start
[11:50] <zyga-x240> we have a library of helper programs that assist in writing tests
[11:50] <zyga-x240> but the best thing is you can really experience this from the point of view of a user
[11:50] <zyga-x240> and create a valid test
[11:50] <zyga-x240> that only later needs tweaking so that it fits the rest of the test stack
[11:50] <amurray> this will be my first time writing a test so again I really appreciate the guidance - cheers
[11:51] <zyga-x240> amurray: look at various tests around, though you may stumble on more unusual test from time to time
[11:51] <zyga-x240> you can also use qemu locally
[11:51] <zyga-x240> you will need a test image, you can get that with adt
[11:51] <zyga-x240> I can find the magic line if you want to use that instead of the google backend
[11:51] <zyga-x240> just let me know
[11:51] <zyga-x240> I think, on 20.10, that is autopkgtest-buildvm-ubuntu-cloud
[11:52] <zyga-x240> you just need a few extra args to get a groovy image
[11:52] <amurray> sure any help with magic incantations are greatly appreciated :)
[11:52] <zyga-x240> drop that into ~/.spread/qemu
[11:52] <zyga-x240> as ubuntu-20.10-64.img
[11:52] <zyga-x240> and you're good
[11:52] <zyga-x240> a bit of advice: qemu tests are heavy on networking
[11:53] <zyga-x240> so you may want a good connection
[11:53] <zyga-x240> over time you can speed up with things like apt-cacher-ng
[11:53] <zyga-x240> anyway, let me know if this helps and if you get stuck on anything just ask
[11:53] <amurray> will do - thanks again (my connection is ok, not great so will see how I fare...)
[11:54] <amurray> ok time for me to go eod - thanks again zyga-x240 for your help - have a great day
[11:54] <zyga-x240> likewise!
[11:55] <zyga-x240> see you later
[11:58] <pstolowski> zyga-x240: sorry, i'm not sure about dd and vfat, can you elaborate?
[11:58] <zyga-x240> pstolowski: how large is the vfat partition on that pi?
[11:58] <zyga-x240> I don't recall
[11:59] <zyga-x240> pstolowski: I wonder what's the impact of the fact that the partition is not cleanly unmounted
[11:59] <zyga-x240> and may not have been unmounted
[11:59] <pstolowski> zyga-x240: i don't see any mounted vfat partitions
[11:59] <zyga-x240> cleanly that is
[11:59] <zyga-x240> hmmm
[11:59] <zyga-x240> can you paste mount?
[12:00] <zyga-x240> rogpeppe: do you recall if you unmounted the boot partition last time we were looking at this?
[12:00] <rogpeppe> zyga-x240: yeah, i might have
[12:00] <zyga-x240> ah, that explains things
[12:00] <zyga-x240> thank you
[12:00] <pstolowski> zyga-x240, rogpeppe i've already collected mount output
[12:01] <zyga-x240> rogpeppe: did you try to fsck that partition after unmounting it?
[12:01] <rogpeppe> zyga-x240: i tried, but there's no fsck command available
[12:01] <zyga-x240> rogpeppe: oh
[12:01] <rogpeppe> zyga-x240: (and no way to install one, of course :) )
[12:01] <zyga-x240> pstolowski: what's the boot base (/meta/snap.yaml will help)
[12:01] <zyga-x240> is that core18 or core?
[12:03] <zyga-x240> I see /sbin/fsck.vfat in both core and core18
[12:03] <zyga-x240> can you confirm those are on PATH pstolowski?
[12:04] <zyga-x240> rogpeppe: ^
[12:04] <zyga-x240> rogpeppe: if you can, perhaps fsck.vfat /dev/mmcblk0p{something}
[12:05] <pstolowski> zyga-x240: it's core18 - https://paste.ubuntu.com/p/Rwvr2dPD9M/
[12:05] <zyga-x240> pstolowski: right and that is the boot base for sure?
[12:05] <zyga-x240> our list output doesn't show this
[12:05] <rogpeppe> zyga-x240: what's that "{something}" supposed to be a placeholder for?
[12:05] <zyga-x240> rogpeppe: the number of the partition with vfat
[12:05] <zyga-x240> lsblk can help finding it
[12:05] <zyga-x240> I think it's just p0 or p1
[12:06] <pstolowski> no fsck.vfat on PATH!
[12:06] <zyga-x240> pstolowski: and in /sbin/fsck.vfat?
[12:06] <rogpeppe> pstolowski: feel free to run fsck...
[12:06] <pstolowski> nope
[12:06] <pstolowski> $ ls /sbin/fsck*
[12:06] <zyga-x240> pstolowski: can you check /meta/snap.yaml to ensure that the boot base is core18 for sure, I'm surprised to see three core revisions and one core18
[12:07] <zyga-x240> woah!
[12:07] <zyga-x240> let me check
[12:07] <pstolowski> "/sbin/fsck  /sbin/fsck.cramfs  /sbin/fsck.ext2  /sbin/fsck.ext3  /sbin/fsck.ext4  /sbin/fsck.minix"
[12:07] <zyga-x240> mvo: ^^^
[12:07] <zyga-x240> that's very likely a serious problem
[12:07] <zyga-x240> pstolowski: how about fsck.fat?
[12:07] <zyga-x240> is that gone too?
[12:07] <zyga-x240> I see it in my core18 snap on x86-64
[12:08] <zyga-x240> pstolowski: that's worth reporting as a separate bug with a regression test that checks that's an executable program
[12:08] <pstolowski> zyga-x240: on what path on your system?
[12:08] <zyga-x240> and that it can run --help
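The regression check suggested above (the fsck helper exists, is executable, and can run --help) could look like the Python sketch below. The path is a parameter. /sbin/fsck.vfat inside the base snap is the location discussed here. The function name is hypothetical. It is an assumption, worth verifying, that every fsck.fat build exits with status 0 for --help.

```python
import os
import subprocess


def check_helper(path: str) -> None:
    """Raise AssertionError unless `path` is an executable program that runs with --help."""
    # os.path.exists follows symlinks, so a dangling fsck.vfat -> fsck.fat link fails here.
    if not os.path.exists(path):
        raise AssertionError(f"{path} is missing")
    if not os.access(path, os.X_OK):
        raise AssertionError(f"{path} is not executable")
    result = subprocess.run([path, "--help"], capture_output=True, timeout=30)
    if result.returncode != 0:
        raise AssertionError(f"{path} --help exited with {result.returncode}")
```

For example, check_helper("/snap/core18/current/sbin/fsck.vfat") would have failed on the affected device, where /sbin/fsck.vfat was absent.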
[12:08] <zyga-x240> pstolowski: /snap/core18/current/sbin/fsck.vfat
[12:08] <zyga-x240> that's a symlink to fsck.fat
[12:09] <zyga-x240> but I see that in the core snap as well
[12:09]  * zyga-x240 looks at revision numbers
[12:10]  * zyga-x240 checks stable channel
[12:10] <zyga-x240> pstolowski: waiting for your confirmation of the boot base please
[12:11] <pstolowski> zyga-x240: yeah, core has fsck.vfat here. but core18 doesn't. and it's not symlinked anywhere
[12:11] <zyga-x240> stable core has fsck
[12:11] <zyga-x240> ok
[12:11] <zyga-x240> probably core18 is at fault then
[12:11] <zyga-x240> pstolowski: but core is the boot base
[12:11] <pstolowski> zyga-x240: boot base is core18
[12:11] <zyga-x240> so what's going on?
[12:11] <zyga-x240> ahh
[12:12] <zyga-x240> ok
[12:12] <pstolowski> i'm collecting all this and will soon attach to the report
[12:12] <zyga-x240> pstolowski: I think core18 didn't refresh
[12:12] <zyga-x240> it's very old
[12:12] <zyga-x240> core18 is revision 1885 here
[12:12] <zyga-x240> but 1076 in your log
[12:12] <zyga-x240> I think that could be related
[12:12] <zyga-x240> rogpeppe: consider running fsck.vfat from /snap/core/current/sbin/fsck.vfat
[12:13] <zyga-x240> then we could try running snap refresh core18
[12:13] <zyga-x240> and snap install snapd
[12:13] <zyga-x240> that may recover the system
[12:13] <zyga-x240> I think this system is just stuck at old revisions and cannot move forward
[12:13] <zyga-x240> the fsck bug was fixed
[12:13] <zyga-x240> but this device is still affected
[12:14] <pstolowski> interesting
[12:14] <zyga-x240> not sure what you think but the revision number there is really old
[12:14] <zyga-x240> pstolowski: so is snapd running now?
[12:14] <zyga-x240> the service I mean
[12:14] <pstolowski> zyga-x240: yes
[12:15] <zyga-x240> pstolowski: can you try refreshing core18
[12:15] <zyga-x240> though wait
[12:15] <zyga-x240> wait please
[12:15] <zyga-x240> that would cause a reboot
[12:15] <zyga-x240> and IIRC that's a problem
[12:15] <zyga-x240> rogpeppe: ^
[12:15] <zyga-x240> rogpeppe: is rebooting that device acceptable for you?
[12:15] <zyga-x240> or will it misbehave?
[12:15] <zyga-x240> I think that, given its state, we should first fsck the boot partition
[12:15] <rogpeppe> zyga-x240: that's fine, although it probably won't restart
[12:15] <zyga-x240> mount it
[12:15] <zyga-x240> and look at what's there
[12:15] <rogpeppe> zyga-x240: it will probably need manual intervention
[12:15] <zyga-x240> pstolowski: ^ ok, let's not reboot it yet
[12:16] <zyga-x240> pstolowski: please fsck boot
[12:16] <rogpeppe> zyga-x240: it almost always needs to be restarted twice for some reason
[12:16] <zyga-x240> using core's fsck
[12:16] <zyga-x240> rogpeppe: I think because it reboots to try new core18 snap
[12:16] <zyga-x240> but fails
[12:16] <zyga-x240> because boot data is corrupted
[12:16] <zyga-x240> so it gets stuck
[12:16] <zyga-x240> (no watchdog timer)
[12:16] <zyga-x240> then reboot rolls back
[12:16] <zyga-x240> and we're back here
[12:16] <pstolowski> yeah, i'd avoid any intervention now. i'd discuss the solution for rogpeppe and pass it to him to do
[12:16] <zyga-x240> rollback across reboot as pstolowski noted earlier
[12:16] <pstolowski> let's discuss on the standup
[12:16] <zyga-x240> pstolowski: agreed
[12:17] <zyga-x240> rogpeppe: ^ if you can, fsck is IMO safe
[12:17] <zyga-x240> then mounting the partition back
[12:17] <rogpeppe> how on earth did the boot data get corrupted anyway? i haven't made any changes to it since i installed
[12:17] <zyga-x240> that may recover everything, assuming you try to install snapd again
[12:17] <zyga-x240> rogpeppe: it's vfat
[12:17] <zyga-x240> and it's written to by both uboot and linux
[12:17] <zyga-x240> we really don't know
[12:17] <rogpeppe> zyga-x240: vfat can corrupt even when you're not changing it?
[12:18] <zyga-x240> rogpeppe: we do change it on boot
[12:18] <zyga-x240> we set a flag that says "we're trying to boot that thing now"
[12:18] <rogpeppe> oh
[12:18] <zyga-x240> so that on reboot (if we fail) we don't retry
[12:18] <rogpeppe> i guess that's where the issue comes from
[12:18] <zyga-x240> but boot a safe value
[12:18] <zyga-x240> indeed
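The "we're trying to boot that thing now" flag described above can be modeled as a small state machine. The Python sketch below borrows variable names loosely from snapd's bootloader environment (snap_mode, snap_core, snap_try_core). It is a simplified illustration, not the real u-boot/GRUB scripts or snapd code, which handle more cases (kernels, both bootloaders). It also shows why a vfat partition matters here: every try boot writes to it. And because there is no watchdog, a hung try boot needs a manual power cycle before the rollback happens.

```python
def request_try(env: dict, new_base: str) -> None:
    """snapd side: ask the bootloader to try new_base on the next boot."""
    env["snap_try_core"] = new_base
    env["snap_mode"] = "try"


def bootloader_select(env: dict) -> str:
    """Bootloader side: pick what to boot, updating the flag before booting."""
    if env.get("snap_mode") == "try":
        # First attempt: record that we are trying, then boot the new base.
        env["snap_mode"] = "trying"
        return env["snap_try_core"]
    if env.get("snap_mode") == "trying":
        # The previous try never confirmed success: do not retry, boot the known-good base.
        env["snap_mode"] = ""
        return env["snap_core"]
    return env["snap_core"]


def mark_boot_successful(env: dict) -> None:
    """snapd side, after a good boot: promote the tried base and clear the flag."""
    if env.get("snap_mode") == "trying":
        env["snap_core"] = env["snap_try_core"]
        env["snap_try_core"] = ""
        env["snap_mode"] = ""
```

A failed refresh therefore takes two boots: one that tries the new base and hangs, and one that rolls back. This matches the "restarted twice" behaviour reported for the device.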
[12:18] <zyga-x240> we found some issues with uboot FAT before
[12:18] <zyga-x240> those got fixed
[12:18] <zyga-x240> or was that GRUB fat
[12:18] <zyga-x240> anyway
[12:18] <zyga-x240> I think you can fix the partition so that it's not unclean
[12:18] <zyga-x240> then
[12:18] <zyga-x240> mount it
[12:18] <zyga-x240> so that snapd can write to it
[12:18] <zyga-x240> then
[12:19] <zyga-x240> install snapd using the snap binary from core snap
[12:19] <zyga-x240> that should make snapd snap not disabled anymore
[12:19] <zyga-x240> if that works
[12:19] <zyga-x240> I'd try to refresh core18 and see if it works
[12:19] <zyga-x240> I suspect it might just
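The recovery order proposed above can be written down as a dry-run plan, so the person at the device can review it before running anything. In the Python sketch below, the function name is hypothetical. The boot device and mount point are placeholders the operator must fill in (lsblk helps find the vfat partition). The snap binary path inside the core snap (/snap/core/current/usr/bin/snap) is an assumption. Nothing is executed.

```python
from typing import List


def recovery_plan(boot_device: str, boot_mountpoint: str) -> List[List[str]]:
    """Return the recovery steps from the chat as argv lists, in order (dry run only)."""
    core = "/snap/core/current"  # core still ships fsck.vfat; this core18 revision does not
    snap = f"{core}/usr/bin/snap"  # assumed location of the snap binary in the core snap
    return [
        # 1. Repair the unclean FAT filesystem (the partition is already unmounted).
        [f"{core}/sbin/fsck.vfat", boot_device],
        # 2. Mount it again so snapd can write the boot environment.
        ["mount", boot_device, boot_mountpoint],
        # 3. Re-install snapd so the snapd snap is no longer disabled.
        [snap, "install", "snapd"],
        # 4. Only if step 3 worked: refresh core18 (this triggers a reboot).
        [snap, "refresh", "core18"],
    ]
```

Printing " ".join(step) for each step gives the commands to review and run by hand.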
[12:21] <zyga-x240> rogpeppe: this is your decision
[12:21] <rogpeppe> zyga-x240: go for it
[12:21] <rogpeppe> zyga-x240: i can easily manually reboot if needed
[12:21] <rogpeppe> zyga-x240: this shouldn't corrupt the main data, right?
[12:21] <pstolowski> rogpeppe: give me 30 minutes to finish the download :)
[12:22] <zyga-x240> rogpeppe: it's a separate partition
[12:22] <zyga-x240> rogpeppe: I think pawel asked you to perform those changes though, I think it's best to let him upload logs for forensics
[12:22] <zyga-x240> and for you two to agree as to who runs the commands
[12:22] <zyga-x240> so that it's not racy :)
[12:22] <rogpeppe> i'm happy for pstolowski to do whatever's needed. i'm quite busy currently.
[12:23] <zyga-x240> ok
[12:23] <zyga-x240> pstolowski: it's up to us
[12:23] <zyga-x240> on the upside the bug may be fixed
[12:23] <zyga-x240> it's just prevented your device from refreshing
[12:24] <pstolowski> yeah, that all makes sense
[12:24] <pstolowski> but but
[12:24] <pstolowski> why no current symlink?
[12:24] <zyga-x240> pstolowski: although understanding exactly how snapd became unlinked would be useful
[12:24] <pstolowski> undo bug?
[12:24] <zyga-x240> I wonder what happens in that special boot code
[12:24] <zyga-x240> that reconciles what the system was booted with (base name and revision)
[12:24] <zyga-x240> with what's in snapd state
[12:25] <zyga-x240> maybe that, when a skew is detected, somehow took out snapd snap?
[12:26] <zyga-x240> brb, afk for a moment
[12:26] <zyga-x240> or maybe until standup
[12:52] <zyga-mbp> mvo could you please land https://github.com/snapcore/snapd/pull/9496
[12:52] <mup> PR #9496: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple 😃> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9496>
[12:52] <zyga-mbp> the store had some issues with searching, probably the usual load or something like that, and this blocks master
[12:56] <mvo> zyga-mbp: done
[12:56] <zyga-mbp> thank you!
[12:57] <mup> PR snapd#9496 closed: spread-shellcheck: temporary workaround for deadlock, drop unnecessary test <Simple 😃> <Created by bboozzoo> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9496>
[13:47] <mborzecki> off to pick up my daughter from school
[13:54] <zyga> pstolowski I'm hacking tests for this case but if you want to pair-program on fixing that device remotely, let me know
[13:55] <pstolowski> zyga: that's actually a good idea. but i'm having a short lunch break now, how about in 30?
[13:56] <zyga> no rush, I didn't have lunch yet either
[13:56] <zyga> let me know when it's comfortable for you
[14:12] <mborzecki> re
[14:13] <zyga> that fsck test is actually pretty cool
[14:20] <pstolowski> zyga: i've a meeting in 10 minutes, how about afterwards?
[14:21] <mvo> pstolowski: I wonder if this will happen, lots of people declined
[14:21] <zyga> pstolowski: how long is the meeting?
[14:21] <pstolowski> mvo: ah ok
[14:21] <zyga> pstolowski: I'm here for now, I can grab lunch and eat it next to a hangout
[14:27] <pstolowski> zyga: ok, i'll know in a few moments if the meeting happens (likely not)
[14:29] <zyga> pstolowski: ok, can we do it in 15 minutes
[14:29] <zyga> I'm getting lunch in this instant
[14:31] <zyga> see you at 16:45
[14:31] <mvo> pstolowski: no one in the call so far
[14:34] <pstolowski> zyga: ok, works for me, see you soon
[14:45] <zyga> back
[14:46] <pstolowski> ok
[15:06] <zyga> mvo, pedronis: pstolowski discovered the root cause
[15:10] <pstolowski> bug report coming
[15:12] <mvo> zyga: ohhh?
[15:19]  * cachio lunch
[15:27] <zyga> mvo still in a call
[15:27] <zyga> but very unexpected
[15:51] <zyga> okay, back
[15:51] <zyga> mvo so two well defined bugs
[15:51] <zyga> mvo one triggers the other
[15:51] <zyga> mvo one breaks snapd refreshes
[15:51] <zyga> the other causes snapd to deactivate itself on failed refresh
[15:51] <zyga> the root cause was session services
[15:52] <zyga> and the fact that it silently depends on a specific revision of core/core18
[15:52] <zyga> we hit an EROFS and snapd refresh fails to proceed
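The EROFS above comes from writing into /etc/dbus-1/session.d, which is read-only on old core18 revisions (per the bug title reported below). One possible way to make such a write non-fatal is sketched in Python. The function names, the injectable writer, and the choice to skip the file rather than abort are all hypothetical. They are not what snapd does today, and they are not necessarily how the bug will be fixed.

```python
import errno
from typing import Callable


def default_writer(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)


def install_session_file(path: str, data: bytes,
                         writer: Callable[[str, bytes], None] = default_writer) -> bool:
    """Write a session bus config file; return False instead of failing on a read-only fs."""
    try:
        writer(path, data)
    except OSError as err:
        if err.errno == errno.EROFS:
            # Old base snaps may not have this directory writable:
            # degrade gracefully instead of aborting the whole snapd refresh.
            return False
        # Any other error (permissions, disk full, ...) is still a real failure.
        raise
    return True
```

The writer parameter exists only so that a read-only filesystem can be simulated without remounting anything.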
[15:52] <zyga> pawel is reporting both issues
[15:52] <zyga> they are well defined and should be relatively easy to fix
[15:52] <zyga> at least the breaking root cause
[15:52] <zyga> rogpeppe: we fixed snapd on your system
[15:53] <zyga> rogpeppe but we refrained from doing anything that would reboot the device
[15:53] <zyga> rogpeppe we also fixed the boot partition
[15:53] <zyga> rogpeppe with a bit of luck, you should be able to refresh core18 snap now
[15:53] <zyga> rogpeppe and it should reboot successfully
[15:53] <zyga> or if it does not, there are more bugs for us to find
[15:53] <zyga> rogpeppe if you choose to refresh core18, do let us know what happened
[15:53] <zyga> rogpeppe we also mounted the boot partition back, so it should be all right now
[15:54] <zyga> mvo let me know if you want to talk about any details
[15:54]  * mvo is in a meeting
[15:56] <zyga> ack
[16:01] <pstolowski> mvo, zyga  https://bugs.launchpad.net/snapd/+bug/1899664 and https://bugs.launchpad.net/snapd/+bug/1899665
[16:01] <mup> Bug #1899664: snapd refresh on old core18 fails due to read-only /etc/dbus-1/session.d <snapd:New> <https://launchpad.net/bugs/1899664>
[16:01] <mup> Bug #1899665: Failed refresh of snapd drops current symlink on failure <snapd:New> <https://launchpad.net/bugs/1899665>
[16:02] <pstolowski> pedronis: ^
[16:03] <zyga> pstolowski \o/
[16:04] <zyga> thank you!
[16:04] <mvo> pstolowski: without having looked at those, how hard do you think this is to fix? could we work on this as a blue item relatively soon?
[16:04] <rogpeppe> zyga: thanks!
[16:04] <mvo> and thanks zyga and pstolowski (and rogpeppe of course!)
[16:04] <rogpeppe> zyga: how would i go about refreshing the core18 snap?
[16:05] <zyga> rogpeppe snap refresh core18
[16:06] <rogpeppe> zyga: ok, i'll try that. i.e. run that command, then `sudo reboot now`, right?
[16:06] <zyga> no need, it will reboot itself
[16:06] <pstolowski> mvo: yes i will work on them, r/o filesystem will be easy, not sure about the one re symlink, but probably not too complicated either
[16:06] <zyga> rogpeppe note that the way we fixed your system is ephemeral
[16:06] <zyga> and if core18 fails to refresh
[16:06] <zyga> snapd will break itself again
[16:07] <zyga> that is why I wanted to know that this works or fails once the reboot completes
[16:07] <zyga> rogpeppe oh, and we re-started the hydroctl service as well
[16:07] <rogpeppe> zyga: yay!
[16:08] <rogpeppe> zyga: it's working!
[16:08] <pstolowski> rogpeppe: feel free to remove my ssh access when you confirm core18 works
[16:09] <zyga> mvo: interestingly, snapd will hit this case in a normal way
[16:09] <zyga> that I did not think about before
[16:09] <zyga> pstolowski we are here because that device has long refresh window
[16:09] <zyga> it will refresh infrequently
[16:09] <zyga> and when it does
[16:09] <zyga> it refreshes snapd first
[16:09] <zyga> so I think the severity of https://bugs.launchpad.net/snapd/+bug/1899664 should be increased
[16:09] <mup> Bug #1899664: snapd refresh on old core18 fails due to read-only /etc/dbus-1/session.d <snapd:Triaged> <https://launchpad.net/bugs/1899664>
[16:10] <zyga> it's not such a special case after all
[16:10] <pstolowski> yeah i didn't set sev yet
[16:10] <pstolowski> +1
[16:13]  * mvo hugs pstolowski 
[16:13]  * mvo hugs zyga too
[16:13] <pstolowski> was a collective effort really
[16:13] <rogpeppe> zyga: after "snap refresh core18":
[16:13] <rogpeppe> error: cannot perform the following tasks:
[16:13] <rogpeppe> - Make current revision for snap "core18" unavailable (cannot set next boot: cannot determine bootloader)
[16:13] <rogpeppe> - Make snap "core18" (1889) available to the system (cannot set next boot: cannot determine bootloader)
[16:13] <pstolowski> mvo: eod for today, i'll work on fixes tomorrow morning
[16:14] <mvo> pstolowski: sure thing
[16:14] <pstolowski> ooh woot
[16:14] <pstolowski> looks like i'll log in there again
[16:15] <pstolowski> core18         20190723        1076  latest/stable  canonical✓    base,disabled
[16:15] <pstolowski> it did the same for core18
[16:15] <pstolowski> it is disabled now
[16:18] <zyga> rogpeppe oohh
[16:18] <zyga> rogpeppe thank you, we will look again tomorrow
[16:18] <rogpeppe> zyga: ok, thanks
[16:18] <zyga> rogpeppe this device, while very unfortunate for you, will really help make snapd more robust
[16:19] <rogpeppe> zyga: i hope so :)
[16:22] <pstolowski> looks like an error on "Make snap ... available to the system" results in a wrong undo, this is what i see for core18 and the same happened for snapd
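The wrong undo described above leaves the snap without a current symlink after the "Make snap ... available to the system" step fails. The Python sketch below shows the intended behaviour: remember the previous target before relinking and put it back on failure. It is a hypothetical illustration, not snapd's Go implementation. make_available stands in for the step that failed on the device, and the revisions 1076 and 1889 are taken from the log.

```python
import os
from typing import Callable


def link_revision(snap_dir: str, new_rev: str, make_available: Callable[[], None]) -> None:
    """Point snap_dir/current at new_rev; if a later step fails, restore the previous target."""
    current = os.path.join(snap_dir, "current")
    # Remember the old target *before* touching anything, so undo can put it back.
    old_rev = os.readlink(current) if os.path.islink(current) else None
    tmp = current + ".new"
    os.symlink(new_rev, tmp)
    os.replace(tmp, current)  # rename over the old link: 'current' never disappears
    try:
        make_available()
    except Exception:
        # Undo: restore the previous link instead of just dropping 'current'.
        if old_rev is None:
            os.remove(current)
        else:
            tmp = current + ".old"
            os.symlink(old_rev, tmp)
            os.replace(tmp, current)
        raise
```

Swapping the link with a rename means there is never a moment without a current symlink, even while the undo runs.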
[16:22] <pstolowski> but yes, i'll continue tomorrow
[16:23] <pstolowski> cu
[16:47]  * zyga-x240 works on the test 
[16:48] <zyga-x240> so cool things
[16:48] <zyga-x240> this is my favourite test!
[18:27]  * zyga runs the 2nd fsck test
[18:42] <zyga> ...
[18:42] <zyga> rebooting
[18:42] <zyga> I mean in the test
[18:44]  * cachio doctor appointment
[18:47] <zyga> o/
[19:16] <zyga> more iterations
[19:18]  * zyga goes to do some evening housework while tests run
[19:24] <zyga> woot
[19:24] <zyga> tests pass
[19:45] <zyga> OMG THE RAIN
[19:46] <zyga> my dog decided to have a slow long walk
[19:46] <zyga> I'm so wet
[19:49] <zyga> trying core fsck test now
[19:49] <zyga> I'll push both tests at once
[19:50] <zyga> https://github.com/snapcore/snapd/pull/9446 needs review
[19:50] <mup> PR #9446: overlord,usersession: initial notifications of pending refreshes <Created by zyga> <https://github.com/snapcore/snapd/pull/9446>
[19:50] <zyga> https://github.com/snapcore/snapd/pull/9422 needs review as well and is short
[19:50] <mup> PR #9422: overlord: add link participant for linkage transitions <Needs Samuele review> <Created by zyga> <https://github.com/snapcore/snapd/pull/9422>
[19:56]  * zyga shower
[20:09] <zyga> 16 passes
[20:09] <zyga> 18 and 20 are in progress
[20:09]  * zyga tea
[20:14] <zyga> amurray: oh is it morning for you? :)
[20:21] <zyga> 18 passes
[20:21] <zyga> now just 20
[20:21] <zyga> actually making tea :)
[20:44] <zyga-x240> core20 takes foreeeever to test boot
[20:44] <zyga-x240> but close
[20:45] <mup> PR snapcraft#3315 opened: build(deps-dev): bump junit from 3.8.1 to 4.13.1 in /tests/spread/plugins/v1/maven/snaps/maven-hello/my-app <dependencies> <java> <Created by dependabot[bot]> <https://github.com/snapcore/snapcraft/pull/3315>
[20:45] <mup> PR snapcraft#3316 opened: build(deps-dev): bump junit from 3.8.1 to 4.13.1 in /tests/spread/plugins/v1/maven/snaps/legacy-maven-hello/my-app <dependencies> <java> <Created by dependabot[bot]> <https://github.com/snapcore/snapcraft/pull/3316>
[21:03] <zyga-x240> core20 should be good, running a clean test now
[21:31]  * zyga-x240 pushed https://github.com/snapcore/snapd/pull/9499 and EODs
[21:31] <mup> PR #9499: tests: add tests for fsck <Created by zyga> <https://github.com/snapcore/snapd/pull/9499>
[21:31] <zyga-x240> xnox: ^ perhaps you know what is responsible for fsck.vfat in uc20
[21:32] <zyga-x240> this test shows that ubuntu-seed stays corrupted across reboot, a regression compared to core 16 and core 18
[21:33]  * zyga-x240 EODs 
[21:33] <zyga-x240> xnox: if you have feedback please comment on the PR, I'll review that tomorrow
[21:35] <mup> PR snapd#9499 opened: tests: add tests for fsck <Created by zyga> <https://github.com/snapcore/snapd/pull/9499>