[05:23] morning
[05:32] mborzecki: hi. I saw in a bug report that you'd been looking at the GNOME 3.38 Wayland incompatibility. I've filed https://gitlab.gnome.org/GNOME/mutter/-/issues/1454 upstream to see if we can get things fixed there. We could presumably do something on the snapd side too though.
[05:33] jamesh: thanks, so the problem is mutter dropped the abstract socket for Xwayland completely?
[05:33] mborzecki: yes.
[05:34] jamesh: heh, tough luck, since it's in fedora 33 now, and probably other distros that ship with 3.38
[05:34] like Ubuntu 20.10
[05:34] oops, school run, back in a bit
[05:34] (although we don't default to a Wayland session)
[06:02] re
[06:21] jamesh: otoh, at least the yuzu snap that was tried in https://bugzilla.redhat.com/show_bug.cgi?id=1883621 should have picked up wayland socket?
[06:23] mborzecki: based on the output, it's skipping Qt's Wayland backend and using X11
[06:24] the rest of the message show it using the XCB backend, which is X11
[06:28] jamesh: heh, so I tried setting QT_QPA_PLATFORM=wayland and it seems to have no effect
[06:28] the snap may not ship the Wayland plugin
[06:29] jamesh: there's this message `Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, xcb.` but it's not clear whether those plugins were found, or whether that's some general list
[06:31] mborzecki: presumably it has its plugins in $SNAP/usr/lib/x86_64-linux-gnu/qt5/plugins/platforms or something similar
[06:31] jamesh: fwiw /usr/lib/x86_64-linux-gnu/qt5/plugins/platforms/libqwayland-generic.so exists in the snap
[06:34] mborzecki: this is kind of beside the point though: yes, any of those individual snaps could work if they were ported to Wayland. No, that wouldn't solve the underlying problem
[06:35] mborzecki: I'd like to see this fixed upstream, but there's likely to be systems hanging around for a long time with broken X11 access
[06:36] jamesh: hm wodnering whether there's anything in snapd we could do besides printing warning
[06:36] mborzecki: bind mount host system /tmp/.X11-unix into the sandbox
[06:36] which kind of breaks our "snaps can do anything with they private /tmp" promise
[06:37] yeah, but maybe it's a legitimate exception
[06:38] affected systems are those on recent enough mutter, so maybe not that many yet and we have a bit of time left
[06:38] Perhaps tie it to the interfaces that grant x11 access
[06:38] It's recent enough Mutter _and_ using Wayland
[06:38] I'm not sure if Fedora is Wayland by default yet
[06:38] and there's gnome 38.1 coming too
[06:39] iirc it is, and hopefully f33 doesn't ship until gnome 38.1?
[06:40] reminds me i should update fedora packages to 2.47
[06:46] jamesh: i'll ping zyga when he's around, we can always propose a fix on the snapd side too and then discuss with pedronis/mvo whether it should land
[06:46] sounds good
[06:47] jamesh: hopefully it's fairly simple tweak in x11/wayland interfaces mount profiles
[06:47] mborzecki: I wouldn't think any change would be necessary in the wayland interface?
[06:49] jamesh: don't remember which interface assumes the 'handling' of Xwayland
[06:50] mborzecki: It shouldn't care if it is talking to Xorg on bare metal or Xwayland
[06:51] If you're running a snapped Xwayland, you'd need an x11 slot for other snaps to connect to
[06:52] (of course, this is where things get complicated: if it isn't the implicit :x11 slot, what directory do you bind mount on the plug side?
[06:58] morning
[06:58] pstolowski: hey
[06:59] need to take some paperwork to my accountant, i'll be online in a bit
[07:33] * dot-tobias says morning
[08:09] re
[08:27] zyga: hi, is there something I should re-review or review for you?
[08:35] good morning
[08:35] pedronis not yet, I will finish one or maybe both older branches
[08:35] pedronis they are close but not yet
[08:35] ok
[08:35] mborzecki, jamesh: hey guys
[08:35] hi zyga
[08:36] are you talking about the xwayland change in mutter recently?
[08:37] jamesh: is the bind mounted thing a socket?
[08:37] we need apparmor permission for the socket original bind path
[08:38] as bind mounting doesn't change that
[08:38] zyga: yeah. Mutter/gnome-shell 3.38 no longer creates the abstract "\0/tmp/.X11-unix/Xn" socket
[08:38] right
[08:38] I read the bug thread in the morning
[08:39] (I had some calls and mandatory walks)
[08:39] is the abstract socket path a default value of some environment variable?
[08:39] er
[08:39] not abstract
[08:39] could we set an environment variable and bind mount the non-abstract socket somewhere?
[08:39] I would prefer not to bind mount it to /tmp
[08:40] though not strongly, just feels dirty / old design
[08:40] sockets go to run nowadays
[08:40] zyga: hm yeah, good question, maybe we could overwrite the location somehow
[08:40] worth checking what the libraries do
[08:40] zyga: and if it was run, it should work out of the box without dirty tricks
[08:40] I have some patches to finish from last night but it's something we could do for the next release for sure
[08:40] (sans some apparmor policy tweaks)
[08:40] mborzecki yeah
[08:42] zyga: if an app is run with e.g. DISPLAY=:0, it will attempt to connect to "/tmp/.X11-unix/X0" as an abstract namespace and regular socket (not sure which order it tries)
[08:42] DISPLAY=:1 would try .../X1, etc.
[08:43] jamesh: right, the question is whether we can cantrol the /tmp/.. prefix with some env variable
[08:43] or maybe it's a well known convention and here's no way around it
[08:43] s/here/there/
[08:44] Xorg and Xwayland listen on both by default. In the new Mutter, they're attempting to start Xwayland lazily: open the sockets, wait for someone to try and connect, then have Xwayland adopt the file descriptors
[08:45] they had this implemented with the two sockets, but it was only polling the abstract namespace socket. This meant clients that connected to the regular socket before Xwayland started would hang
[08:45] To fix that, they just got rid of the abstract namespace socket
[08:45] jamesh: if i understood zyga's idea correctly, we could try to bind /tmp/.X11-unix to some location under /run/, and if possible set an env variable to redirect the app there, so as to avoid polluting /tmp inside the snap mount ns
[08:45] The bug I filed is proposing to revert that and just fix it to poll both sockets
[08:46] mborzecki yes
[08:46] jamesh right, I think that's totally fine as well
[08:47] I don't think there is any environment variable to redirect it from /tmp: if there was, I'm sure desktops would have started using it long ago
[08:48] so mounting elsewhere would require putting a symlink in /tmp
[08:49] mborzecki: hi, are you working on a follow up to #9443 to move key saving?
[08:50] pedronis: yes, that and a followup to #9440 to replace lazy bootloader lookup with ForGadget() when creating observers (though i should open that in a short bit since the code is ready)
[08:51] mborzecki: thx
[08:53] pstolowski: hi, are you blocked on me atm?
[08:55] zyga: here's the code where it tries to open the display: https://gitlab.freedesktop.org/xorg/lib/libxcb/-/blob/master/src/xcb_util.c#L226 -- the socket path is hard coded.
[08:55] pedronis: not really, but thanks for asking! do you intend to look at services PR (would be great), or are you ok if it lands with existing reviews? that will wait for 2.48 branch anyway, so not urgent, and not blocking
[08:56] pstolowski: I'll try to look at it but not this week
[08:56] pstolowski: I will try to re-review #9391 though
[09:04] pedronis: i'm currently busy with the recent issues.. i think we need to think about undo of snap remove
[09:06] and yes, landing #9391 would be nice
[09:09] pstolowski: need reviews for that?
[09:11] mborzecki: it got +1, but is missing review from Samuele at this point. but every review is welcome :)
[09:21] pstolowski: yes, I saw from the SU notes, that you are looking into some bugs :/
[09:23] jamesh unfortunate
[09:23] jamesh let's see what upstream response is
[09:29] jamesh: thanks for finding that code
[09:30] mborzecki: there's similar code in libxtrans: I'm not actually sure which one usually gets run for modern apps. In both cases, the paths are hard coded into the library
[09:33] morning folks
[09:33] #9471 and #9472 are both really simple and need 2nd +1s
[09:33] :-)
[09:36] ijohnson: thanks, did 9471, and 72 now too
[09:37] ijohnson: have you heard back from hellsworth about dhcp & nm maybe?
[09:38] ijohnson good morning!
[09:41] Hey zyga and mborzecki
[09:41] mborzecki thanks! I'm talking with hellsworth later today re nm ran out of time yesterday
[09:42] ijohnson: great, thanks, i suspect we'll need to copy /var/lib/dhcp too, but better to have a confirmation
[09:42] Yeah indeed
[09:42] ijohnson: fwiw it's already listed in writable-paths too
[09:48] yeah it's easy enough to add to that list later when we know it's needed, because we could need other things for example too
[09:54] pedronis: hi, do you want to wait until after the sprint to discuss 9458 and exposing a way to build a testkeys trusting snapd snap for CI only?
[09:58] * zyga reboots
[10:04] oh zyga I forgot to ask how is your thumb doing!?
[10:05] ijohnson still in one piece
[10:05] that is a good thing
[10:05] just a cut
[10:05] the nail is the worst part
[10:05] it heals so slowly
[10:05] yeah that sounds really painful
[10:05] brand new knife
[10:05] knife works good though it sounds!
[10:05] haha, yeah
[10:05] so I assume the knife is okay too
[10:06] my grandpa used to tell this story about this guy who worked on a farm wearing and cut his hand up while wearing gloves and he complained "my hand will grow back but this glove never will"
[10:07] well presumably my grandpa still tells the story he's still around after all :-)
[10:08] ijohnson that's a gruesome story :)
[10:08] haha yeah
[10:15] hi ijohnson 👋
[10:16] hey dot-tobias sorry I saw your message yesterday but didn't get a chance to finalize all the commands to debug your issue, will try to finish that after a little bit this morning
[10:17] ijohnson no worries, can image you're busy – appreciate any help whenever you have the time 😊
[10:17] yes of course, we hope to be releasing soon so issues like this are important to work through and solve :-)
[10:28] time to drive back home
[10:29] back online in a bit
[10:29] * zyga-x240 writes some tests
[10:37] ijohnson: yes
[10:37] pedronis: ok
[10:52] ijohnson: I mean, I looked at it, I have opionions, but this week is probably not a good time to discuss them
[10:52] that's fine, I assumed as much
[10:54] zyga: +1 to 7614
[10:54] ijohnson I spoke with mvo
[10:54] it should land after core 20 releases
[10:54] to stay on the safe side
[10:54] I will add a label
[10:54] thanks!
[10:54] zyga: no problem, I guess it will live to see it's first birthday then
[10:56] I've replied to the dprintf comment as well
[10:56] there's a bit of function pointer thing coming up later, dprintf is used in v1 mode
[10:56] v2 uses a different implementation with essentially the same idea (add this major:minor thing)
[10:56] ah ok, I wasn't sure why but if that machinery will be changed, just seems that dprintf is more generically useful than just for udev
[10:57] that's true
[10:57] it's pretty neat actually
[10:57] oh well, after core20 we can dust this off
[10:57] thank you for the review
[10:57] I hope I didn't miss anything, this branch survived a few small changes to the original code
[10:58] yeah it all looks good to me, I look forward to the day we can get rid of the shell script based udev helper entirely :-)
[10:58] well
[10:58] me too :)
[10:58] actually there are some things we could do that would also fix the udev raciness problem
[10:58] but that's not close yet
[11:01] mvo: pedronis: can we document resilience.vitality-hint for folks to use generically ? or is there more work that will come to that setting / feature that necessitates not mentioning it to the world yet
[11:02] ijohnson: I think documenting this is a good idea
[11:02] ok, I will try to write something maybe degville can take a look and integrate into the relevant area
[11:04] \o/
[11:04] thanks
[11:08] * zyga coffee
[11:10] perfect while spread runs :)
[11:10] re
[11:18] degville: I just made edits directly to the system-options page, can you review my proposed addition? It's a little awkward to phrase/explain the feature in a way that doesn't just say "when you set this setting then we set OOMScoreAdjust to this thing"
[11:18] ijohnson: yes, of course - thanks for making those updates.
[11:18] thanks!
[11:19] * ijohnson still thinks it's magic that changes to the forum get reflected on the snapcraft.io/docs page immediately
[11:33] * zyga puts on his canadian pullover
[11:33] that time of the year
[11:42] ...when the lawn mowers magically turn into leaf blowers ...
[11:56] at least they don't suck ;)
[11:57] cachio: https://github.com/snapcore/snapd/pull/9475
[11:57] ijohnson: if you have a moment ^ that's fairly short and simple
[11:58] zyga, nice, thanks, I'll take a look
[11:59] looking
=== pedronis_ is now known as pedronis
[12:13] pstolowski: I reviewed #9391
[12:20] pedronis: ty
[12:45] brb, see you at standup
[14:20] * zyga runs more tests and thinks of lunch
[14:20] it's getting late
=== King_InuYasha is now known as Conan_Kudo
=== Conan_Kudo is now known as King_InuYasha
=== King_InuYasha is now known as Conan_Kudo
=== Conan_Kudo is now known as King_InuYasha
[15:37] a little bit more and things should work
[15:59] * cachio lunch
[16:16] why bind mounting /tmp/.unix-x11 would break "snaps can do anything with they private /tmp" promise?
[16:18] they would still able to do anything except removing that dir but why someone would want to do that?
[16:19] it's likely that nobody would even notice there is new dir there
[16:21] vidal72[m] I think because /tmp was empty and had no meaningful content
[16:21] now it does not
[16:22] yeah but why it matters?
[16:23] from user/developer perspective?
[16:24] it's the same like you when add any other subpath elsewhere which didn't existed but exist now
[16:26] for app this should be invisible change
[16:44] vidal72[m]: I think it's not that, it's just that we try to "tame" the system by defining some rules (any rules are good really)
[16:44] vidal72[m]: I think that by having a swarm of exceptions we revert back to the original moving zoo
[16:45] personally I would prefer if the X11 socket was configurable
[16:45] or if the location was more sensible
[16:48] * zyga-x240 debugs more tests
[16:49] but shouldn't the rules be backed by some reasoning? otherwise you end up with rules for the sake of rules which are disconnected for actual needs
[16:52] vidal72[m]: well, in this specific case, it's clear that /tmp is not a place for sockets
[16:52] it's not an accident we've been using /run for that for a long while now
[16:52] I realize X11 predates that
[16:53] vidal72[m]: just like we are evolving the stack to support containerized apps elsewhere, I don't see why changing the location of the X11 socket, over time, is out of the question
[16:56] I think the expectation is x11 will die over time which is why such fundamental changes are unlikely
[16:56] vidal72[m]: well, I think that X11 will be with my grandchildren
[16:57] and that changing a socket path _location_ to include additional one is fairly trivial
[16:57] even if that has to be made across a few libraries
[16:57] that doesn't mean I think it should be done
[16:57] but I do think we should not size it as something it is not
[16:58] even if it's done then it's not solution for urgent snap issue which occurs right now
[16:59] I think the current issue is easier to solve by reverting the change
[16:59] reverting the change break flatpak
[16:59] which is why it was done in first place
[16:59] I'm not an expert on those libraries but I believe it's possible to handle both locations, is it not?
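(Editor's note) The lookup jamesh describes at 08:42, where a client started with DISPLAY=:N tries "/tmp/.X11-unix/XN" both as an abstract-namespace socket and as a regular socket file, can be sketched in Python roughly as below. This is an illustration only and not the libxcb or libxtrans code: the names `x11_socket_candidates` and `try_connect` are hypothetical, only the plain local `:N` / `:N.S` form of DISPLAY is handled, and it assumes Linux, where an abstract address is the same path prefixed with a NUL byte.

```python
import socket

# Directory is hard coded in the client libraries, as noted in the log.
X11_SOCKET_DIR = "/tmp/.X11-unix"


def x11_socket_candidates(display: str) -> dict:
    """Return both local socket addresses for a DISPLAY value such as ':0'.

    Hypothetical helper for illustration; only the local ':N[.S]' form
    is handled.
    """
    if not display.startswith(":"):
        raise ValueError("only local displays of the form ':N' are handled")
    number = display[1:].split(".", 1)[0]  # drop an optional screen suffix
    if not number.isdigit():
        raise ValueError(f"bad display number in {display!r}")
    path = f"{X11_SOCKET_DIR}/X{number}"
    # Linux abstract-namespace addresses start with a NUL byte.
    return {"abstract": "\0" + path, "filesystem": path}


def try_connect(address: str) -> bool:
    """Report whether something accepts connections on this AF_UNIX address."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        try:
            sock.connect(address)
            return True
        except OSError:
            return False
```

Calling `try_connect` on each value returned for `":0"` would show which of the two sockets a given session actually provides; on the Mutter 3.38 Wayland sessions discussed here one would expect only the filesystem one to answer.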
[17:00] perhaps jamesh should discuss the details
[17:00] and gnome devs are quite close to flatpak
[17:02] I'm sure all changes are done in good faith
[17:03] I think gnome devs weren't aware that it affects snap but if they have to chose to support only one thing then...
[17:03] I really doubt in the software world there's no way to solve this
[17:04] I'm more surprised it's a change now
[17:04] beside that snap requiring abstract socket is problem on its own and you didn't saw problem with fixing it on snap side back then https://forum.snapcraft.io/t/xorg-abstract-socket-is-mandatory-for-running-snaps/4580/9
[17:04] wasn't it a long-standing location
[17:04] or did it suddenly break now due to other changes?
[17:05] vidal72[m]: my opinion there was pure pragmatism, we can still bind mount to /tmp
[17:05] vidal72[m]: but I really would prefer not to, at least without considering the alternatives or a bigger framework that could track such hacks over time
[17:09] it stopped working because gnome mutter disabled support for using xorg abstract socket for xwayland, they weren't aware someone is relying on it
[17:10] vidal72[m]: right, but was it a problem for flatpak just now or was this always broken in some way?
[17:10] you said it was done to fix flatpak somehow
[17:10] always broken
[17:11] Guess not that many people run flatpak and wayland then
[17:11] perhaps by 2022 wayland is complete enough to switch over, I would really like that
[17:11] for 2020 the usability issues were too grave, IIRC
[17:12] well, your prob is Xwayland, not wayland after all 😉
[17:12] yes it;s xwayland + on-dmand launch gnome mutter feature which I think was experimental until recently
[17:13] (i'm pretty sure native wayland apps will just run fine ... its just that there are so few still 🙂 )
[17:13] it was long unfixed until gnome core dev hitted it :)
[17:13] very unusual ;-) (just kidding)
[17:13] I mean, it's just a bug
[17:14] people with the right experience will look at it
[17:14] yeah
[17:14] from snapd point of view, we can always add hacks
[17:14] my entire point is that I'd love to have structure to those
[17:14] james and daniel are already deeply in it it seems
[17:14] I would love some tea with milk
[17:14] I think so, the problem is in good hands
[17:18] I would love to see this "hack" in snap as my system has x11 abstract socket disabled for increasing security
[17:19] vidal72[m]: perhaps I'm mistaken but I believe wayland is pretty much insecure until the entire transition is complete, including input, which is still years away
[17:19] so while good, the security situation is not terribly improved yet
[17:20] people should just use mir ....
[17:20] 😄
[17:21] * zyga-x240 makes small progress on improving snapd test suite
[17:21] zyga-x240: I don't use wayland at all, I mean I disabled xorg abstract socket
[17:21] vidal72[m]: oh, I misunderstood then, how is that better?
[17:23] because restricting access to unix socket (which is just a file) is much easier that restricting access to abstract socket which needs new network namespace
[17:23] vidal72[m]: I see, that's true indeed
[17:24] in snapd, with apparmor, we can restrict both, I believe, but that applies to snap apps
[17:54] zyga-x240, when you have some time, could you please take another look to #9425
[17:54] tx
[17:54] sure
[17:55] zyga-x240: we can do both provided we have the af_unix patch. without that, only named sockets (not abstract or anonymous)
[17:56] I see
[18:01] done
[18:01] cachio: ^
[18:01] tx
[18:05] cachio: could you please look at https://github.com/snapcore/snapd/pull/9478
[18:05] sure
[18:07] zyga-x240: thanks for choosing 14 instead of 60 days for forcing a refresh. please also see https://forum.snapcraft.io/t/wip-refresh-app-awareness/10736/44
[18:07] * jdstrand meant to comment on that sooner
[18:07] jdstrand: looking
[18:09] thank you, replied!
[18:10] jdstrand: I think that the best outcome is that this gathers interest of app developers and we can consolidate snapctl APIs into popular snaps, such as chrome, so that they can provide a more complete experience
[18:11] zyga-x240: thanks! iirc, isn't the current feature no longer experimental in 2.47? does it make sense to mention that there as well? (since stable is still 2.46)?
[18:11] jdstrand: it will be experimental for at least one more release
[18:11] jdstrand: (after this one)
[18:11] jdstrand: we need --ignore-running switch to snap refresh, among others
[18:11] zyga-x240: yes. snaps ideally would be involved because they know best what should happen and how to respond
[18:11] jdstrand: we need some selinux changes for the monitoring being on by default
[18:11] jdstrand: my thoughts exactly
[18:12] jdstrand: so it's not something we will enable yet but I hope that it can be done soon, now that core20 seemingly is close to release, and more review time is available
[18:12] * jdstrand nods
[18:14] I mean, there is always going to be cases where the snap publisher isn't the upstream and they don't really have the capacity to integrate with snapctl, etc as fully as perhaps is desired, so notifications will always be helpful (and a nice fallback for those snaps that do try to do something, but are buggy/etc)
[18:14] jdstrand: I was hoping that with the help of the desktop team
[18:14] jdstrand: we could have an action on the notification
[18:15] * jdstrand nods
[18:15] jdstrand: and interacting with the action would send X11/wayland message asking the application to close gracefully
[18:15] jdstrand: (which would be done without any privilege escalation, locally in the session)
[18:15] jdstrand: but that has no guarantee to close the app, simply to ask it
[18:15] that could be nice and would be a great improvement
[18:15] I had a look at that and had a depression
[18:15] :D
[18:16] heh
[18:16] two decades ago I bought a windows petzold book on how winapi looks like
[18:16] and that had such message in first lesson
[18:16] X11 does not
[18:16] so ;)
[18:16] well, I'm sure the desktop team is more knowledgeable than I am
[18:16] and that something can be arranged
[18:17] (i was especially fond of recalling how windows coalesces messages that invalidate each other, before those are delivered to the application)
[18:17] anyway,
[18:17] :D
[18:18] cachio: one more but not priority: https://github.com/snapcore/snapd/pull/9479
[18:18] and this is really early code
[18:19] I suspect you'd need shell help there rather than just taking it on. X11 should give you everything you'd need to do something (perhaps drastic), but I suspect there would be obstacles doing that on wayland. perhaps there are f.d.o apis for that...
[18:19] zyga-x240, nice
[18:19] I'll check that later
[18:19] I need to go to the kinesiologist now
[18:20] kenvandine: ^ do you know if x11/wayland have a way to gracefully ask an application to quit, assume all we know is the PID
[18:20] cachio: that's allright, it's really early code
[18:20] kenvandine: something that might trigger an app to ask the user to save a document or confirm
[18:20] * cachio afk -> kinesiologist
[18:21] zyga-x240: yeah, there is
[18:21] i don't recall what off hand though
[18:21] kenvandine: \o/
[18:21] kenvandine: if you recall I'd love to know
[18:21] I could really use that
[18:21] but when the user logs out the session triggers all the apps to block until saving
[18:21] oooh
[18:21] that's perfec
[18:21] *perfect, I'll try to integrate that
[18:21] :)
[18:22] it used to be triggered by the gnome-session package
[18:22] I got lost in various standards and arcane specification
[18:22] but i am not sure now how it works
[18:22] probably some glib API
[18:23] kenvandine: fyi, the libreoffice lzo change was merged in the review-tools. emi is working to get that in a store pull. we could do manual reviews for a while if needed
[18:23] cool
[18:23] hellsworth was waiting on that
[18:24] but i'll tell her to go for it
[18:24] kenvandine: oh, is she now taking care of libreoffice?
[18:24] yeah
[18:26] zyga-x240: gnome-session-quit does still seem to support it
[18:26] kenvandine: hat's off to her
[18:26] so look at what it does
[18:26] jdstrand: indeed :)
[18:26] jdstrand: she's also maintaining network-manager, tough
[18:26] apt-get source $(dpkg -S $(command -v gnome-session-quit))
[18:26] thanks!
[18:26] ambitious. awesome :)
[18:28] I'm sad to report that using "devel" instead of groovy does not work for source packages
[18:31] * zyga-x240 spawns a test and takes a break
[18:31] back in 30
[19:40] re
[19:40] that was not 30 :)
[19:57] how do i see the base of a snap?
[19:59] ah info --verbose
[20:23] one more iteration
[20:23] mwhudson: note that there are some weird edge cases
[20:23] like base: core16 being supplied by core
[20:24] not sure if that's handled there
[20:24] zyga-x240: ah i don't really mind about taht
[20:24] yeah, I think core16 never really happened
=== benfrancis5 is now known as benfrancis
[20:46] i don't know if anyone's around, but i've got an odd problem with a raspberry pi running ubuntu core. it's been running for ages without much more issue than the occasional reboot, but just recently (after a power cut), it came back up but its service didn't come up
[20:46] i've just investigated and there's no snap command any more!
[20:47] the file /usr/bin/snap exists but that's a symlink to /snap/snapd/current and that doesn't exist
[20:47] hmm
[20:47] rogpeppe: is that a core18 system?
[20:48] uname -a prints : Linux localhost 4.15.0-1041-raspi2 #44-Ubuntu SMP PREEMPT Wed Jul 3 15:47:01 UTC 2019 armv7l armv7l armv7l GNU/Linux
[20:48] zyga-x240: i've no idea what that means :)
[20:48] rogpeppe: ok, sorry,
[20:48] it's kind of late so my questions may be silly
[20:48] did any snaps fail mounting?
[20:48] zyga-x240: np. your interaction is appreciated!
[20:48] zyga-x240: how can i tell?
[20:49] systemctl --failed
[20:49] that may show what's wrong
[20:49] * zyga-x240 -> shower
[20:49] rogpeppe: best to file a bug with the details you can collect
[20:49] I was about to go to sleep
[20:49] yup, looks like my service failed:
[20:49] rogpeppe@localhost:/snap/snapd$ systemctl --failed
[20:49] UNIT LOAD ACTIVE SUB DESCRIPTION
[20:49] ● snap.hydroctl.hydroserver.service loaded failed failed Service for snap application hydroctl.hydroserver
[20:49] LOAD = Reflects whether the unit definition was properly loaded.
[20:49] ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
[20:50] SUB = The low-level unit activation state, values depend on unit type.
[20:50] 1 loaded units listed. Pass --all to see loaded but inactive units, too.
[20:50] To show all installed unit files use 'systemctl list-unit-files'.
[20:50] is snapd.service running?
[20:50] ls /snap/snapd/current/
[20:50] nope
[20:50] is snapd mounted?
[20:50] how can i tell?
[20:50] that ls command should help
[20:50] df, i guess
[20:50] current is either a link to a mount point
[20:50] or a dangling symlink
[20:51] df works too
[20:51] looks like it might be: https://paste.ubuntu.com/p/McFB3WDSbP/
[20:52] the /snap/snapd/current file doesn't exist
[20:53] that directory just has four directories in with numeric names
[20:53] next step is to check the status of the mount units
[20:53] systemctl status snap-snapd-$number.mount
[20:53] the one that current points to is the most relevant
[20:53] with the numbers in that dir?
[20:53] yes
[20:53] there is no "current" :)
[20:54] is the current symlink entirely gone?
[20:54] oh my
[20:55] pick the largest number
[20:55] https://paste.ubuntu.com/p/mwjp9bJ3Tt/
[20:55] but I suspect something very wrong happened
[20:55] yes, the symlink is entirely gone
[20:55] so they are all mounted
[20:55] that's so weird
[20:55] can you collect journal logs (if you have persistence)
[20:55] if not that's futile
[20:55] I'd re-create the symlink to the largest number
[20:55] and restart the system to see if things get back to shape
[20:56] though no guarantees
[20:56] if that works, snap changes output would be useful
[20:56] there was a power-cut FYI
[20:56] I see
[20:56] well, something to learn each time
[20:56] how should i get journal logs?
[20:56] journalctl --list-boots should tell you if you have persistent logging
[20:56] if there's one entry you do not
[20:57] following that journalctl can be coerced to render all logs as text
[20:58] looks like persistence is enabled: https://paste.ubuntu.com/p/2tRSFgP7bB/
[20:59] i can never remember how to use journalctl :) what's the right incantation to get logs out?
[20:59] rogpeppe: journalctl -b -1 should give you logs from the boot that died due to power cut
[20:59] please collect logs, I think we can learn a lot from this failure
[20:59] and to make the system more robust
[20:59] but to do that, we need to give this problem proper resources (awake developers)
[21:00] https://paste.ubuntu.com/p/7cTd6b8Yq4/
[21:00] if you re-create current and reboot, that's useful to know as well
[21:01] unfortunately the machine is a few hundred miles away from me, so i might have to wait before rebooting
[21:01] Oct 05 12:59:14 localhost kernel: FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[21:01] I see
[21:01] you don't have to reboot
[21:02] I'm just tired and I don't remember the right units to start
[21:02] you may want to fsck that
[21:03] ah, i'll try fsck. hopefully it'll be ok to run that live.
[21:03] rogpeppe: https://github.com/snapcore/core18/blob/master/static/usr/lib/core18/run-snapd-from-snap
[21:03] that's what we run on most systems with core18 + snapd snaps
[21:04] so after i've run fsck, i should try to run that?
[21:04] what is the status off this service? https://github.com/snapcore/core18/blob/master/static/lib/systemd/system/core18.start-snapd.service
[21:04] it's invoked from this systemd unit
[21:04] (that was the part I was not sure of)
[21:04] fsck refuses to run with the filesystem still mounted
[21:05] rogpeppe: you should be able to umount that briefly, that's boot and it is not used now
[21:07] fsck doesn't seem to like me:
[21:07] root@localhost:/snap/snapd# fsck /dev/mmcblk0p1
[21:07] fsck from util-linux 2.31.1
[21:07] what does it mean by "
[21:07] root@localhost:/snap/snapd# fsck /dev/mmcblk0p1
[21:07] fsck from util-linux 2.31.1
[21:08] what does it mean by "fsck from util-linux 2.31.1" ?
[21:08] hmmm
[21:08] not sure
[21:08] can you check the status of core18.start-snapd.service please
[21:08] last thing I'd like to know before going to sleep
[21:09] sorry, how do i do that?
[21:09] systemctl status $unit_name
[21:09] systemctl status core18.start-snapd.service
[21:09] ah, got it
[21:10] it's been too long
[21:10] no worries
[21:10] https://paste.ubuntu.com/p/5xTNkyqdqg/
[21:11] hmm
[21:11] ls -ld /var/lib/snapd/state.json
[21:11] (it should not be empty)
[21:11] -rw------- 1 root root 43119 Oct 7 15:11 /var/lib/snapd/state.json
[21:13] can you try one more thing:
[21:13] please run: /snap/snapd/(pick largest number)/usr/bin/snap debug state /var/lib/snapd/state.json
[21:14] that will tell us what happened
[21:14] it looks like snapd disabled itself somehow
[21:16] q .
[21:16] root@localhost:/snap/snapd# /snap/snapd/9169/usr/bin/snap debug state /var/lib/snapd/state.json
[21:16] ID Status Spawn Ready Label Summary
[21:16] that's that? no output?
[21:16] yup
[21:16] !
[21:17] is /var/lib/snapd/state.json empty/corrputed
[21:17] i'd paste you the contents of state.json but it looks like it's got sensitive keys in it
[21:17] yeah, please dont
[21:17] it's valid json tho
[21:17] hmmhmm
[21:17] any nothing in /changes top-level key there?
[21:18] (inside the json that is)
[21:19] that's empty
[21:19] woah
[21:19] ok
[21:19] please report this
[21:19] I don't have a smoking gun at all
[21:19] we'd have to inspect the state of each snap (in the state)
[21:19] you could make a copy of state.json in case it changes, even on the device itself
[21:20] where should i report it?
[21:20] launchpad.net/snapd
[21:21] lol, launchpad :)
[21:21] indeed
[21:22] i've redacted everything that looks auth-related. i could paste the resulting json if you thought that would be useful.
[21:23] you can file a private bug
[21:23] and attach that to the bug report (works better for larger files)
[21:23] have you got a suggestion for what keywords would be good to use in the bug title?
[21:24] snapd lost current symlink, total system failure
[21:24] something to catch the eye
[21:24] we can change that later
[21:24] cool, that's useful, thanks
[21:25] thank you!
[21:27] zyga-x240: what other info would you suggest is useful to attach to the report?
[21:28] rogpeppe: hmm, maybe last time when you remember it working?
[21:34] ok, will do
[21:42] zyga-x240: https://bugs.launchpad.net/snapd/+bug/1898934
[21:43] is state.json there sanitized?
[21:43] I've made the bug private just in case
[21:43] zyga-x240: i will leave the system up and unchanged for a while, in case you want me to do any further investigation, but then i'll need to reinstall or something because this is running an important service (it's running my parents' house heating system!)
[21:43] zyga-x240: yes it is sanitized
[21:43] rogpeppe: let's chat tomorrow, I think we can recover it
[21:43] (remotely, without rebooting)
[21:44] zyga-x240: i can reboot if needed - my dad can reboot
[21:44] zyga-x240: one other odd thing: when the system crashes (which is does relatively regularly), it always takes two tries before it comes up properly
[21:45] right, I remember that
[21:45] zyga-x240: that's almost certainly entirely unrelated though
[21:45] really not sure why
[21:46] zyga-x240: i'm still intending to try running raspbian for a bit and see if it helps. i don't know if the issues are with ubuntu core or with the hardware itself
[21:46] * zyga-x240 pushed one more bit to https://github.com/snapcore/snapd/pull/8573 and goes to bed
=== diddledan5 is now known as diddledan
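(Editor's note) The manual step suggested at 20:55, re-creating the missing `current` symlink so that it points at the largest numbered revision directory, could be sketched in Python as follows. This is a rough illustration under stated assumptions and not a snapd tool: the helper names `latest_revision` and `restore_current` are hypothetical, revision directories are assumed to have purely numeric names (anything else is ignored), and the relative link target is an assumption about the expected layout. Nothing is written unless `apply=True` is passed.

```python
import os
from typing import Optional


def latest_revision(snap_dir: str) -> Optional[str]:
    """Return the largest purely numeric subdirectory name, or None."""
    revisions = [
        name for name in os.listdir(snap_dir)
        if name.isdigit() and os.path.isdir(os.path.join(snap_dir, name))
    ]
    # Compare as integers: as strings, "998" would sort above "9169".
    return max(revisions, key=int) if revisions else None


def restore_current(snap_dir: str, apply: bool = False) -> Optional[str]:
    """Re-create a missing 'current' symlink pointing at the latest revision.

    Returns the revision chosen, or None if there is nothing to do.
    """
    link = os.path.join(snap_dir, "current")
    if os.path.lexists(link):
        return None  # a link is present (even if dangling); leave it alone
    revision = latest_revision(snap_dir)
    if revision is not None and apply:
        # Relative target, e.g. current -> 9169 (assumed layout).
        os.symlink(revision, link)
    return revision
```

A dry run such as `restore_current("/snap/snapd")` only reports which revision would be chosen, which seems the safer default on a remote device like the one in this log; as zyga says above, there is no guarantee that restoring the link alone brings the system back.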