[01:15] PR snapcraft#3314 closed: schema: add missing name item in package-repositories
[05:40] morning
[06:03] mvo: hey
[06:08] good morning mborzecki
[06:22] mvo: do you recall vfat corruption when we got uboot.env and UBOOT.ENV in the same fs?
[06:23] mvo: do you have any tooling to corrupt the vfat from that investigation? i was able to find https://gist.github.com/bboozzoo/e68254507eef4673dd8c6f9b82f65d92 with the hope it can be used in #9499
[06:23] PR #9499: tests: add tests for fsck
[06:31] mborzecki: I found some go libraries to directly access fat that can probably be used for this
[06:31] mborzecki: let me look at your stuff
[06:31] mvo: fatcat?
[06:32] mvo: i have a vague recollection we tried to do something with https://github.com/Gregwar/fatcat/
[06:32] mborzecki: I don't remember exactly, let me try to look
[06:35] mborzecki: https://github.com/mvo5/go-fs
[06:36] mvo: interesting, thanks!
[06:38] mborzecki: if you want fat32 and lfn support you will need my version but it does not do much, maybe it's actually not that great
[06:39] good morning
[06:39] mvo: hey!
[06:39] good morning zyga
[06:39] mvo: i was thinking that maybe we could break one of the entries, or a superblock
[06:39] mborzecki: +1
[06:39] zyga: morning, left some comments in 9499
[06:40] thanks
[06:40] I'm looking at those now
[06:40] it was late so maybe a typo crept in?
[06:40] mborzecki: we could experiment with corrupting a cluster too, up to you, I follow with popcorn from the sideline :)
[06:41] I'll check the offsets first
[06:41] mborzecki: I replied
[06:42] I really think it's a separate test
[06:42] and I think the core16 / core18 situation is unclear, for the reason I outlined in my own comments
[06:42] so I'll start by expanding that test not to rely on early-boot kernel messages
[06:42] mborzecki: still, we don't fsck on core20 so ... bummer
[06:42] I suspect we have to do it explicitly at the level we are at now
[06:43] zyga: yeah, i recall ian mentioning something about fsck when he worked on snap-bootstrap
[06:43] mborzecki: the good thing is - we know :)
[06:43] the bad thing is, it's not good :)
[06:45] mborzecki: separately I think we should consider re-mounting vfat as read only
[06:45] ubuntu-seed is writable and we mount snaps from there
[06:45] it's just waiting to corrupt and die
[06:45] we could really keep it read only for most of the runtime of a device
[06:45] and only remount to writable when we need to perform a specific operation
[06:45] read-only fat doesn't corrupt on power loss
[06:47] zyga: yeah, it was raised for uc20
[06:56] mmm, good
[06:56] today is a weird day, no school for kids, wife sick and back from work
[06:56] feels like back to quarantine
[07:00] morning
[07:00] zyga: hey, i think we forgot to mount boot back after fsck yesterday, that's why core18 failed to refresh :)
[07:01] pstolowski: hmmm, could be but I don't recall
[07:01] pstolowski: we can look at that again
[07:01] yes i just checked
[07:01] pstolowski: maybe at ... 9:30?
[07:01] ok
[07:01] I need to move back to the office and grab coffee (still upstairs in the kitchen)
[07:01] that's also an interesting material for spread test
[07:01] pstolowski: unmounted boot?
[07:01] zyga: hey so I tried your approach but I am not having much success (I think I am not using the right path for the core snap or perhaps am missing something else) - is this about what you had in mind? https://pastebin.ubuntu.com/p/2FXrMhN2Ck/
[07:01] zyga: yes, but just to provoke failure of refresh
[07:02] pstolowski: indeed
[07:02] amurray: hey!
[07:02] amurray: looking
[07:03] amurray: it looks semi okay, I think the current symlink may be a problem, we code defensively and we avoid mounting over symlinks
[07:03] what happens in your case?
[07:03] zyga: this results in the following: https://pastebin.ubuntu.com/p/mf8fGRW6dr/ so I think it is on the right track
[07:03] yeah
[07:03] that definitely looks good
[07:03] ahh current symlink... do you know how I could know what the actual path would be then to the core snap?
[07:04] *or should I say whatever the base snap would be?
[07:04] amurray: this _may_ be a little more complex, I came to realize, because we cannot easily capture the base revision
[07:04] amurray: so without checking I suspect this is good on paper but breaks because snap-update-ns doesn't want to mount with a source /snap/core/current
[07:05] but it also shows a deeper problem, that now the snap has an explicit relationship to the base snap
[07:05] amurray: so when the base revision changes, ... what then?
[07:05] amurray: normally we have some logic to re-create the mount namespace in that case
[07:05] zyga: also do you know how I can debug what snap-update-ns is doing...?
[07:05] ah right.. so on base refresh...
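Editor's note on the "actual path" question above: `/snap/<name>/current` is a symlink to a revision directory, so one way to see the revision it points at is to resolve the link. The following is only a minimal sketch under that assumption, not how snap-update-ns handles it; the function name `resolve_current` and the revision `1234` are made up for illustration, and the demo runs against a throwaway directory rather than a real `/snap`.

```python
import os
import tempfile
from typing import Tuple


def resolve_current(snap_mount_dir: str, snap_name: str) -> Tuple[str, str]:
    """Return (revision, real_path) for <snap_mount_dir>/<snap_name>/current.

    Hypothetical helper: it only follows the symlink and reports where it lands.
    """
    current = os.path.join(snap_mount_dir, snap_name, "current")
    if not os.path.islink(current):
        raise ValueError(f"{current} is not a symlink")
    real = os.path.realpath(current)
    # The last path component of the resolved target is the revision directory.
    return os.path.basename(real), real


# Demo against a temporary tree that mimics the /snap layout.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "core", "1234"))  # made-up revision
    os.symlink("1234", os.path.join(root, "core", "current"))
    revision, path = resolve_current(root, "core")
    print(revision, path)
```

As the following messages point out, the resolved revision is only valid until the base snap is refreshed, so anything that captures it needs a plan for re-resolving.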
[07:05] amurray: yes, export SNAP_DEBUG or SNAPD_DEBUG and run it by hand
[07:05] you can actually do something simpler
[07:06] just set SNAP_CONFINE_DEBUG=1 and snap run --shell docker
[07:06] this will give you a lot of details
[07:06] it's a bit complex
[07:06] because those snaps have services
[07:06] it may be better to start with a fake-docker snap that just has the interface
[07:06] connect it
[07:06] and then see what happens
[07:06] otherwise the mount namespace is created by the service app they contain
[07:06] and you don't see everything
[07:07] by using a new fake snap you can run it by hand with more exact control
[07:07] amurray: my advice is to see exactly what happens in snap-update-ns first
[07:07] amurray: and then consider two options:
[07:08] amurray: special-case /snap/$name/current as source and de-reference that in snap-update-ns
[07:08] amurray: explicitly teach snap-update-ns about $BASE_SNAP_NAME $BASE_SNAP_REVISION
[07:08] (those variables are made up)
[07:08] snap-update-ns could expand those variables and we could use those variables in the mount profile
[07:08] then we don't need to special case a path
[07:09] and the feature may be useful for other things
[07:09] amurray: how does that sound?
[07:11] it sounds a lot harder than the snap-confine approach... but if it is the better solution then I'll keep plugging away at it - you said 'see exactly what happens in snap-update-ns' - I am still not sure how to see that... I tried running snapd manually with SNAP_DEBUG etc but it didn't seem to show anything relevant
[07:12] amurray: don't run snapd with debug
[07:12] amurray: run snap run with debug
[07:12] snap run invokes snap-confine which invokes snap-update-ns
[07:12] if you enable debug there you will see that
[07:13] one sec
[07:14] hmm https://pastebin.ubuntu.com/p/mDGTJp7GCH/ - I think the 'DEBUG: joined preserved mount namespace docker' means it is not showing what is actually happening
[07:14] amurray: zyga@x240:~$ SNAPD_DEBUG=yes snap run gimp
[07:14] amurray: exactly, because the service has constructed the mount namespace and you are not seeing that anymore
[07:14] amurray: you can work around that without a new snap though
[07:14] amurray: just stop all docker services
[07:14] amurray: discard the mount namespace with sudo /usr/lib/snapd/snap-discard-ns docker
[07:15] and then run
[07:15] SNAPD_DEBUG=yes snap run --shell docker
[07:15] that will give you the full output
[07:15] amurray: snap-confine-based approach is possible too but it would be coarse and unable to handle interfaces
[07:15] I'd prefer to avoid that
[07:16] and it's a step in the wrong direction in a way, as we want to avoid sharing /etc over time, and have less rigid set of mounts there
[07:17] yep that did it (snap-discard-ns) - https://pastebin.ubuntu.com/p/Nx3C4VgCPt/ - ok I gotta run to pick up my daughter but will check back with you later tonight - thanks again for your help :)
[07:19] amurray: ack
[07:23] * zyga improved the fsck test and is running it to see if core16 really fscks or not
[07:23] mborzecki: I need a review for https://github.com/snapcore/snapd/pull/9446
[07:23] PR #9446: overlord,usersession: initial notifications of pending refreshes
[07:23] mvo: ^ and perhaps you since ian is off this week?
[07:24] mborzecki: - google:fedora-31-64:tests/main/sudo-env is that due to sudo changes in fedora?
[07:25] - google:arch-linux-64:tests/main/snapshot-cross-revno failed on unrelated pr
[07:26] pstolowski: ok, I'll set what I'm doing aside and prep for our session
[07:26] zyga: yes, it's on my list but I suck and have not managed to get to it :/ need to do the 360 and an interview prep and then will hopefully get to it
[07:26] zyga: I really want this to land asap
[07:29] zyga: yes, most likely due to sudo changes
[07:29] zyga: i'll open a PR with the fix
[07:29] mborzecki: superb
[07:29] ok
[07:29] coffee
[07:29] pstolowski: give me 3 minutes
[07:33] pstolowski: just warming stuff up
[07:33] ok
[07:33] pstolowski: do you have the session?
[07:34] zyga: i've now
[07:45] mvo: drat, we don't fsck on core16 or core18
[07:45] the test is now improved and fails everywhere
[07:45] so .. well :)
[07:46] zyga: w8, what?
[07:47] aah, core16 and core18 don't use systemd in initrd, so fsck is not automagic
[07:47] mborzecki: yep
[07:47] mborzecki: I have a test
[07:49] rogpeppe: hi
[07:49] mborzecki: the previous test was flawed because core16 and core18 did not have systemd forwarding messages from early boot
[07:50] anyway, I'll push the test
[07:50] pstolowski: hello!
[07:51] rogpeppe: hey, we have mounted /boot back and managed to refresh core18 and everything looked ok, but unfortunately after reboot it didn't come back, cannot ssh anymore :(
[07:51] pstolowski: ok, i'll ask it to be power-cycled
[07:51] rogpeppe: looks like we need a manual restart of your pi3
[07:52] pstolowski: that's what always happens, unfortunately
[07:52] mborzecki: pushed
[07:52] pstolowski: which is a bit of a shame, because it makes the whole thing somewhat unreliable
[07:52] rogpeppe: yeah, no denying, hopefully we will get to the bottom of this
[07:53] rogpeppe: please let us know when you can power cycle it, we will try to understand what's going on
[07:53] pstolowski: i suspect that snappy will sometimes try to reboot itself to update system s/w, and this causes the machine to go down
[07:53] mborzecki: please look at my logic in https://github.com/snapcore/snapd/pull/9499
[07:53] PR #9499: tests: add tests for fsck
[07:54] mborzecki: I really would like to know if we fsck
[07:54] and the test is wrong
[07:54] or if we just don't fsck
[07:54] and the test is sadly right
[07:57] rogpeppe: pstolowski and me will resume debugging once you give us a note that the power cycle occurred
[08:00] zyga: should happen in the next 5 minutes or so; i'll let you know
[08:00] rogpeppe: great, thank you
[08:03] PR snapd#9500 opened: tests/main/sudo-env: snap bin is avaialble on Fedora
[08:03] zyga: ^^
[08:03] superb
[08:04] +1
[08:08] zyga, pstolowski: ok, it's back up now. it took two power-cycles as usual.
[08:08] rogpeppe: thank you
[08:08] two? as in yank cable twice?
[08:09] interesting
[08:09] pstolowski: can you connect?
[08:10] checking
[08:10] zyga: yes, i can
[08:10] pstolowski: ok, let's examine the boot partition again
[08:10] and maybe snap changes
[08:10] and journal?
[08:10] shall we get back to the HO?
[08:11] zyga: yes, give me 5
[08:11] sure
[08:15] ok rdy
[08:18] we were just talking with pawel that maybe the second reboot is because we fail to do the initial reboot at all
[08:19] mvo: per monday's discussion i've updated https://github.com/snapcore/snapd/pull/9420 to use 384MB for UC20
[08:19] PR #9420: tests/nested/manual/minimal-smoke: use 384MB of RAM for nested UC20 <⛔ Blocked>
[08:31] mborzecki: thanks! (in a meeting)
[08:33] rogpeppe: does the device reboot correctly if you just "sudo reboot"?
[08:33] rogpeppe: or is it also going into this weird state that requires more than one reboot to complete?
[08:34] zyga: i don't know... i'd presume not, otherwise it wouldn't have gone down last night
[08:34] zyga: shall we try?
[08:34] mmm
[08:34] we're still coming up with options, we may decide to try that to learn more
[08:34] we'll let you know for sure
[08:34] zyga: ta
[08:35] rogpeppe: no chance of a serial log? :D
[08:36] zyga: nope, i'm afraid not. i've never worked out how to get serial output from the pi.
[08:37] zyga: don't you have to solder something onto the board or something like that?
[08:40] rogpeppe: no, just a pair of wires and a computer with usb port to capture the log
[08:41] zyga: presumably you need some kind of connector for the wires, and a serial port adaptor
[08:42] zyga: not sure my dad has either of those things around currently (actually, he _may_ have a serial port adaptor, but i'm not sure i want to get him to do all this remotely :) )
[08:42] rogpeppe: yeah, those are super common on ebay and amazon and go for a few pounds
[08:42] I was mostly joking, it'd be ideal for debugging
[08:42] we're eliminating possibilities
[08:43] zyga: yeah, i've thought that before, but thought it would be too much hassle
[08:51] rogpeppe: ok, we'd like to reboot the device without any changes
[08:51] we have some ideas but we'd be much faster if we can do some reboots
[08:51] without going through refresh
[08:51] rogpeppe: is it okay if we reboot the device now?
[08:51] zyga: please do
[08:52] ok
[08:52] zyga: let me know if you need it power-cycling again
[08:52] zyga: although i'm in a meeting in 8 minutes
[08:53] ok
[08:53] we are back
[08:53] thank you!
[09:00] rogpeppe: it rebooted successfully
[09:00] zyga: yay!
[09:01] quick errand, brb
[09:21] re
[09:42] zyga: \o/ you rock!
[09:42] mvo: we're not done, it's not over yet
[09:42] and it's pawel and maciek :)
[09:57] * mvo hugs mborzecki and pstolowski
[09:57] mvo: yeah it just survived manual reboot, but not reboot after refresh; we're seeing 'rollback across reboot'
[10:17] quick question: do we have an example gadget with cloud init configuration?
[10:18] * zyga doesn't remember one
[10:18] that's fine, I will just create one I think
[10:21] not a good store day
[11:13] rogpeppe: we did lots of analysis but ultimately reboot just failed
[11:14] rogpeppe: if you can power cycle (once for now), that would be great
[11:14] zyga: ok :\
[11:14] let us know when ready
[11:18] rogpeppe: so, it would be amazing if you could eventually get this
[11:18] https://www.amazon.co.uk/UART-CP2102-Module-Serial-Converter/dp/B00AFRXKFU/ref=sr_1_7?dchild=1&keywords=USB+TTL+serial&qid=1602674287&sr=8-7
[11:19] rogpeppe: let us know when the 1st power-cycle has occurred
[11:19] if you already did that you can go ahead with the second one (2nd power cycle)
[11:19] 1st power cycle did not (as expected, though unclear why) recover the device
[11:22] ogra: do you have the permission to download pi-kernel revno 44?
[11:45] mvo: do you have access to core18 snap?
[11:45] I could use revision 1076 with assertions
[11:45] similarly to pi-kernel at revision 44
[11:45] and pi gadget revision 17
[11:59] zyga-mbp, let me try ...
[11:59] thank you!
[12:00] $ snap download --revision=44 pi-kernel
[12:00] Fetching snap "pi-kernel"
[12:00] error: cannot download snap "pi-kernel": Access by specifying a revision is not allowed for this Snap.
[12:00] nope
[12:00] oh well
[12:00] i fear only the kernel team can ...
[12:01] (i can download the current revision for each track/channel but not specify --revision)
[12:01] yeah, same here
[12:02] also 44 is pretty old .. we're at 200 currently
[12:02] I know, we are trying to reproduce a peculiar failure
[12:02] (for armhf that is)
[12:27] cmatsuoka: hi, can you take a look at https://github.com/snapcore/snapd/pull/9474 ? it's been +1'ed by pedronis
[12:27] PR #9474: boot, overlord/devicestate: list trusted and managed assets upfront
[12:28] mborzecki: already looking, was about to +1 it
[12:29] cmatsuoka: cool thanks!
[12:29] cmatsuoka: fwiw https://github.com/snapcore/snapd/pull/9443 could use your review too :)
[12:29] PR #9443: gadget, gadget/install: support for ubuntu-save, create one during install if needed
[12:36] mborzecki: ah yes, I started reviewing it a few days ago and didn't finish it, resuming now
[12:44] cmatsuoka: thank you
[12:53] mborzecki: done
[12:54] cmatsuoka: yay, now if only testing was green
[12:55] Hi, I cannot launch any snap on groovy. When I launch a command from the terminal, it waits for a moment then I get this error:
[12:55] $ matterhorn
[12:55] cannot stat /var/lib/snapd/seccomp/bpf/snap.matterhorn.matterhorn.bin: No such file or directory
[12:56] is there a known issue?
[12:56] $ snap version
[12:56] snap 2.47.1+20.10
[12:56] snapd 2.47.1+20.10
[12:56] series 16
[12:56] ubuntu 20.10
[12:56] kernel 5.8.0-22-generic
[12:58] jibel hi
[12:58] hmm
[12:58] interesting
[12:58] I assume the referenced file does not exist
[12:58] can you run snap list without failure?
[12:59] yes
[12:59] no failure
[12:59] jibel: yeah, can you also do `ls -l /var/lib/snapd/seccomp/bpf` ?
[12:59] and if you run "snap changes" do you see any errors?
[12:59] and the snaps I cannot launch are listed
[13:00] ah
[13:00] 65 Error today at 13:04 CEST today at 13:06 CEST Auto-refresh snaps "snapd"
[13:00] and the error
[13:00] 2020-10-14T13:04:37+02:00 ERROR cannot compile /var/lib/snapd/seccomp/bpf/snap.dl-ubuntu-test-iso.dl-ubuntu-test-iso.src: fork/exec /snap/snapd/8790/usr/lib/snapd/snap-seccomp: no such file or directory
[13:01] log of change 65 https://paste.ubuntu.com/p/MKWRHhZpX4/
[13:01] upgrade of snapd didn't go well apparently
[13:02] zyga-mbp: I can give you core18 r1076 - give me 1min
[13:02] mvo thank you
[13:02] jibel we have a standup call now
[13:02] I'll get back to you after that
[13:02] np
[13:05] zyga-mbp, for some reason auto-refresh of snapd failed, I re-refreshed snapd manually and it fixed $world
[13:05] weird
[13:08] jibel could you look for system logs from that moment
[13:08] perhaps there are clues there?
[13:19] here it is https://paste.ubuntu.com/p/ZSRcqbsVCM/
[13:28] jibel thank you
[13:28] jibel very interesting
[13:38] jibel let me read the log quickly
[13:38] jibel can you ls -ld /snap/snapd/8790/usr/lib/snapd/snap-seccomp
[13:38] and report a bug with this log, snap changes and snap tasks NNN where NNN is the change that failed
[13:38] I need to get my dog to the vet in a moment and I don't want to lose a chance to see the state your system was in right now
[13:40] jibel: ^
[13:40] zyga-mbp, version 8790 has been removed from the system, I've only 9279 and 9607
[13:40] that explains a lot
[13:40] thank you!
[13:40] $ ls -ld /snap/snapd/*/usr/lib/snapd/snap-seccomp
[13:40] so snap changes
[13:41] -rwxr-xr-x 1 root root 2306928 sept. 4 18:33 /snap/snapd/9279/usr/lib/snapd/snap-seccomp
[13:41] -rwxr-xr-x 1 root root 2306928 sept. 30 06:59 /snap/snapd/9607/usr/lib/snapd/snap-seccomp
[13:41] -rwxr-xr-x 1 root root 2306928 sept. 30 06:59 /snap/snapd/current/usr/lib/snapd/snap-seccomp
[13:41] and snap tasks for the last few changes that happened
[13:41] I really need to go now
[13:41] I'll report a bug
[13:41] mborzecki, pstolowski: ^ perhaps something you can pick up to the extent that the relevant data is collected in a bug
[13:43] * zyga-mbp goes
[13:46] jibel: can you find the pid of `snapd` and then run `sudo ls -l /proc//exe` ?
[13:47] mvo, could you find the error for sbuild?
[13:51] cachio: got log of that failure?
[13:51] mborzecki, which failure?
[13:52] the sbuild on?
[13:52] cachio: yes, the sbuild one
[13:53] mborzecki, https://paste.ubuntu.com/p/f9MJNgc6r9/
[13:54] mborzecki, lrwxrwxrwx 1 root root 0 oct. 14 15:02 /proc/70906/exe -> /usr/lib/snapd/snapd
[13:54] mborzecki, zyga-mbp bug 1899794
[13:54] Bug #1899794: snapd error during refresh
[13:58] jibel: can you attach the output of journalctl -u snapd.failure.service too?
[14:00] done
[14:03] jibel: thanks, this looks very interesting, can you also grab the output of `journalctl -u snapd.service` from before oct 12?
[14:05] jibel: looks like something happened on oct 12 ~7:03 and the failure handler got started, so maybe a complete log of the snapd.service would be even better
[14:09] mborzecki, attached
[14:09] jibel: thanks
[14:09] I'm wondering if it could be related to bug 1871538 where systemd brings the entire session down
[14:09] Bug #1871538: dbus timeout-ed during an upgrade, taking services down including gdm (Ubuntu Focal):Invalid>
[14:17] jibel: not sure, looks like there were 2 instances of snapd service running at one point
[14:30] jibel: was there a package update sometime ~7:00 on oct 12?
[14:32] mborzecki, yes, among others there was snapd:amd64 (2.46.1+20.10, 2.47.1+20.10)
[14:32] jibel: around that time?
[14:33] from 07:01:25 to 07:03:53
[14:33] jibel: cool, thanks
[14:44] jibel: ok, added more comments to the bug, a nice one
[14:45] mvo: pedronis: i've added some comments to https://bugs.launchpad.net/snapd/+bug/1899794 looks like a real problem
[14:45] Bug #1899794: Error during refresh of snapd leads to unusable system
[14:46] errands
[14:49] mborzecki, ta
[15:23] mvo: I did a pass in #9469
[15:23] PR #9469: snapshots: import of a snapshot set
[15:25] mborzecki: thx, snap-failure /revert logic needs improvements
[15:35] pedronis: awesome, thank you!
[15:36] /me lunch
[15:36] * cachio lunch
[16:30] PR snapd#9501 opened: [RFC] wrappers: do not error out on read-only /etc/dbus-1/session.d filesystem on core18
[17:31] cachio: I did not even look at the sbuild error, so sorry! had meeting, an interview and 360
[17:32] mvo, np, I'll take a look
[17:33] mvo, is it failing to build https://paste.ubuntu.com/p/hdxvkmrRVG/
[17:33] mvo, is there any way to show more info about what failed?
[17:34] cachio: yeah, the error as is is really not helpful
[17:34] cachio: there should be maybe a sbuild.log with more info?
[17:34] mvo, nice, thanks
[17:41] re
[17:50] cachio: if you don't find anything, please remind me tomorrow, my day should be a bit less crazy tomorrow :)
[17:50] mvo, sure, thanks!
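Editor's note on bug 1899794 above: the failing change tried to exec `snap-seccomp` from revision 8790, while only 9279 and 9607 were still on disk. A minimal sketch of that comparison follows, assuming the error text and the revision list quoted in the log; the helper names `referenced_revision` and `is_stale` are made up for illustration and this is not part of snapd.

```python
import re
from typing import Iterable, Optional, Tuple

# Matches a path of the form /snap/<name>/<revision>/... that starts a word.
SNAP_PATH_RE = re.compile(r"(?:^|\s)/snap/([^/\s]+)/([^/\s]+)/")


def referenced_revision(message: str) -> Optional[Tuple[str, str]]:
    """Return (snap name, revision) for the first /snap/... path in message."""
    match = SNAP_PATH_RE.search(message)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_stale(message: str, installed: Iterable[str]) -> bool:
    """True when the message names a revision that is no longer installed."""
    ref = referenced_revision(message)
    return ref is not None and ref[1] not in set(installed)


# Error text and revisions as quoted in the log above.
error = (
    "cannot compile /var/lib/snapd/seccomp/bpf/"
    "snap.dl-ubuntu-test-iso.dl-ubuntu-test-iso.src: fork/exec "
    "/snap/snapd/8790/usr/lib/snapd/snap-seccomp: no such file or directory"
)
installed = ["9279", "9607"]
print(referenced_revision(error), is_stale(error, installed))
```

This only flags the mismatch; why the running snapd still pointed at a removed revision is the open question tracked in the bug.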
[17:53] sorry, this was a long visit
[17:53] I'll grab some food and do reviews for cachio
[17:53] and then try to install the kernel and core mvo shared
[18:03] zyga \o/
[18:04] :)
[18:13] pedronis: I addressed the snapshot import feedback, will ask pawel tomorrow
[18:13] pedronis: for a second review
[18:13] pedronis: will call it a day now
[18:23] PR snapcraft#3317 opened: plugin handler: properly handle snapcraftctl errors
[18:46] * cachio -> kinesiologist
[19:38] * zyga picks up zyga:tweak/ignore-running aka https://github.com/snapcore/snapd/pull/9406
[19:38] PR #9406: many: allow ignoring running apps for specific request
[19:53] PR snapcraft#3318 opened: plugin handler: set -x for scriptlets
[20:36] PR snapd#9502 opened: tests: more output for sbuild test
[20:39] * zyga made some progress and calls it a day
[20:49] ok one more patch
[20:49] pushed some tests to https://github.com/snapcore/snapd/pull/9406
[20:49] PR #9406: many: allow ignoring running apps for specific request
[21:03] PR snapcraft#3319 opened: [RFC] plugin handler: use bash with additional error checking for core20 scripts
[21:31] PR snapd#9503 opened: tests: use tests.backup tool