[05:15] morning
[05:25] Good morning
[05:26] I'm doing my school run now. See you all later
[06:08] back now
[06:08] mborzecki: it's cold today
[06:08] 13C at most
[06:08] brrr
[06:08] I hope it won't rain later
[06:08] and rainy here
[06:09] mborzecki: still, all the cars are stuck in traffic
[06:09] biking to school is way more robust
[06:09] how are we doing today?
[06:09] tests were awful yesterday
[06:09] failing left and right on random stuff
[06:09] store, portals, you name it
[06:14] good morning mvo
[06:15] a little cold and rainy today :)
[06:15] how was your Monday?
[06:19] quick breakfast
[06:19] hey zyga
[06:21] mvo: hey
[06:21] hey mborzecki
[06:25] PR snapd#7425 closed: channel: introduce Resolve and ResolveLocked
[06:25] zyga: can you take another look at #7412? looks like we could land it easily
[06:25] PR #7412: tests: run dbus-launch inside a systemd unit
[06:25] sure
[06:26] looks like we can merge 7342 too?
[06:26] and 7125 needs a second review (should be simple)
[06:27] +1 on 7412
[06:27] mvo, do you want me to merge or do you want to do it yourself?
[06:28] mvo: +1 on 7342
[06:28] I can do the merge, I just noticed it has no second +1
[06:28] looking at 7125 now
[06:29] PR snapd#7412 closed: tests: run dbus-launch inside a systemd unit
[06:31] PR snapd#7342 closed: fixme: rename ubuntu*architectures to dpkg*architectures
[06:42] mvo: reviewed 7125, +1 but please check my comment there
[06:43] zyga: thanks, looking now
[06:50] thank you
[06:52] mvo, mborzecki: I'd like to ask for a review of https://github.com/snapcore/snapd/pull/7435
[06:53] PR #7435: tests: explicitly restore after using LXD
[06:53] it's a blocker for progress on https://github.com/snapcore/snapd/pull/7168
[06:53] PR #7168: tests: measure testbed for leaking mountinfo entries
[06:54] I have one PR slot open so I'll work on finishing and proposing a mount-ns extension that involves a mimic, so that we can properly evaluate https://github.com/snapcore/snapd/pull/7436 later
[06:54] PR #7436: many: make per-snap mount namespace MS_SHARED
[06:57] mvo: off topic, last night I was playing with a raspberry pi
[06:57] and I think we can slightly improve our watchdog story there
[06:57] specifically around try mode boots
[06:59] zyga: oh? tell me more
[07:00] I read a little about the watchdog on the pi
[07:00] it's a bit weird, it has a fixed 15-second interval
[07:00] we could enable it from the boot loader
[07:00] so any try mode boot can recover
[07:00] I will poke around in my free time over evenings
[07:00] maybe I will arrive at something that works
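(Aside: a minimal sketch of driving the Pi watchdog from userspace rather than from the boot loader, assuming the stock bcm2835 watchdog driver and systemd's hardware-watchdog support; the 14-second value is only an illustration, chosen to stay under the roughly 15-second hardware maximum mentioned above.)

```
# enable the hardware watchdog and have systemd (PID 1) pet it;
# if PID 1 ever hangs, the SoC resets the board after ~14 seconds
echo 'RuntimeWatchdogSec=14' | sudo tee -a /etc/systemd/system.conf
sudo systemctl daemon-reexec   # re-exec PID 1 so it opens /dev/watchdog
```

This would not by itself cover the try-mode boot case zyga describes, which needs the boot loader to arm the watchdog before the kernel is up.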
[07:01] hey, is anybody experiencing problems with the telegram snap? I've opened https://forum.snapcraft.io/t/telegram-snap-fails-to-start/13132
[07:02] abeato: that's new to me
[07:02] abeato: ls -ld /mnt ?
[07:03] zyga, $ ls -ld /mnt
[07:03] drwxr-xr-x 2 root root 4096 jul 19 2016 /mnt
[07:03] ok, so regular directory, not a symlink
=== pstolowski|afk is now known as pstolowski
[07:03] mornings
[07:05] abeato: can you please check how many files matching the "*snap-confine*" glob are present in /etc/apparmor.d/
[07:06] zyga, https://paste.ubuntu.com/p/xt7NyYgPKr/
[07:06] that's that!
[07:06] mvo: ^^^^
[07:06] sep 11?
[07:06] abeato: one of the files is wrong
[07:06] it's a bug in our postinst script I believe
[07:06] mvo: should that be fixed and released?
[07:06] hm, interesting
[07:06] abeato: what does "apt-cache policy snapd" say
[07:06] that will be most useful to mvo
[07:07] zyga, https://paste.ubuntu.com/p/G2JMwHJSJG/
[07:08] abeato: the fix for this bug is in 61cc58dbb0a7a1a785e9e3c391b6f593df892839
[07:08] Date: Wed Aug 14 09:43:41 2019 +0200
[07:08] it may not be released yet, perhaps
[07:08] mvo: ^ can you confirm if 2.40 has this
[07:09] abeato: yeah, the /etc/apparmor.d/usr.lib.snapd.snap-confine should not be there :(
[07:09] zyga: yeah, 2.40 should fix it
[07:09] if you want to fix your system please remove the file mvo mentioned and call sudo apparmor_parser -r /etc/apparmor.d/usr.lib.snapd.snap-confine.real
[07:09] mvo, zyga snapd journal: https://paste.ubuntu.com/p/wHsRR4R2xD/
[07:09] mvo: in that case the fix doesn't work
[07:09] zyga: oh well
[07:09] zyga: let me look at this again
[07:09] thank you!
[07:10] do you need more data?
[07:10] abeato: can you please update the forum thread with the log from this conversation?
[07:10] zyga: can you take a quick look at https://github.com/snapcore/snapd/pull/7109 ? pushed some changes there yday
[07:10] PR #7109: snap-confine: fallback gracefully on a cgroup v2 only system
[07:10] abeato: a bug report (super small, just the data you already pasted)
[07:10] zyga, sure
[07:10] abeato: then I will do a sledgehammer fix
[07:10] abeato: thank you
[07:10] mborzecki: sure, looking now
[07:10] mvo, launchpad?
[07:10] abeato: plus the content of the /etc/apparmor.d/usr.lib.snapd.snap-confine please
[07:10] abeato: yeah
[07:10] ok
[07:10] abeato: or if it's already in the forum that's fine
[07:11] abeato: just need a reference in the PR
[07:11] it is already in the forum, yes
[07:11] I will update the forum post then
[07:11] abeato: then that should be fine
[07:11] abeato: thank you!
[07:11] np
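(Aside: the manual recovery collected from the exchange above into one sequence; the paths are exactly the ones mvo and zyga mention, and the proper fix is the postinst cleanup in PR snapd#7439 below.)

```
ls /etc/apparmor.d/*snap-confine*                   # only usr.lib.snapd.snap-confine.real should remain
sudo rm /etc/apparmor.d/usr.lib.snapd.snap-confine  # drop the obsolete leftover profile
sudo apparmor_parser -r /etc/apparmor.d/usr.lib.snapd.snap-confine.real  # reload the correct profile
```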
[07:12] anyone else seen google:debian-9-64:tests/main/snap-service-watchdog fail recently?
[07:12] PR snapd#7438 opened: devicestate: add support for base->base remodel
[07:13] for some reason the snap app gets SIGABRT https://paste.ubuntu.com/p/Z6k382RCrD/
[07:17] mvo, zyga https://forum.snapcraft.io/t/telegram-snap-fails-to-start/13132 updated
[07:18] this one is interesting https://paste.ubuntu.com/p/kmGzQzZRHJ/ probably something for Chipaca or pedronis
[07:18] store woes?
[07:24] idk, nonce is logged in the POST request, so it is sent ;)
[07:27] abeato: could you please pastebin me /var/lib/dpkg/info/ubuntu-core-launcher.conffiles
[07:27] abeato: and snapd.conffiles too?
[07:28] mvo, there is nothing starting with ubuntu-core* in /var/lib/dpkg/info/
=== Greyztar- is now known as Greyztar
[07:29] abeato: uh, sorry, please see if there is "snap-confine.conffiles"
[07:29] mvo: are conffiles retained after a package is removed?
[07:30] mvo: that is, they remain until purged?
[07:30] zyga: yes
[07:30] zyga: correct
[07:30] mvo, that file is not there either
[07:30] mvo, snapd.conffiles exists
[07:31] mvo, https://paste.ubuntu.com/p/pTN8P7Mtt2/ - but note that I already removed the old file and ran apparmor_parser to fix the problem
[07:31] abeato: any output from grep usr.lib.snapd.snap-confine /var/lib/dpkg/info/*.conffiles
[07:31] ?
[07:31] $ grep usr.lib.snapd.snap-confine /var/lib/dpkg/info/*.conffiles
[07:32] abeato: yeah, that's fine - I'm mostly trying to figure out if it's still leftover in some dpkg files
[07:32] /var/lib/dpkg/info/snapd.conffiles:/etc/apparmor.d/usr.lib.snapd.snap-confine.real
[07:32] abeato: that's the only match?
[07:32] mvo, yes
[07:33] abeato: thanks! I'm slightly puzzled but that's fine, I think I know what to do (even though I'm not sure how this happens, i.e. dpkg should either know about the file or it should be gone :/)
[07:33] right, it's weird...
[07:39] abeato: did you develop snapd on this machine?
[07:39] perhaps it came from some earlier hacking
[07:39] zyga, no
[07:39] mvo: I need to take a break, back pain after last evening's longer session
[07:39] I'll stretch and be back in a few moments
[07:40] zyga: sure thing, get well!
[07:43] PR snapd#7439 opened: packaging: remove obsolete usr.lib.snapd.snap-confine in postinst
[07:44] abeato: -^
[07:45] mvo, great!
[08:16] zyga: https://forum.snapcraft.io/t/significance-of-info-files-in-run-snapd-ns/12938
[08:17] tests are red again :/
[08:19] pedronis: hi, i think you weren't around when i linked it, interesting failure i stumbled upon today https://paste.ubuntu.com/p/kmGzQzZRHJ/
[08:21] mborzecki: weird error given that we see the nonce in the requests log
[08:21] mhm
[08:23] for some reason the nonce the store just gave us is considered invalid
[08:23] if it repeats, it's worth poking the store people
[08:23] but we haven't touched anything in that area in a while
[08:28] we do send the exact thing we get back
[08:28] fwiw
[08:28] yes
[08:28] so it's not missing
[08:28] this isn't the first time we've seen this error
[08:29] i think it's worth chasing down
[08:29] * Chipaca gets on it
[08:36] mborzecki: anything changed that makes the tests red?
[08:37] mvo: no, looks like the usual stuff, desktop portal, occasionally installing snapd deps from the package archive, or store hiccups
[08:37] :(
[08:46] mborzecki: thank you, replied
[08:50] snap/channel/channel_test.go:139:17: undefined: arch.UbuntuArchitecture on master
[08:51] opening a pr in a bit
[08:52] mvo: I did a pass over the 2.42 PRs (except mine), they all need a little bit more work I fear
[08:52] mborzecki: crossing merges?
[08:52] pedronis: yes
[08:53] pedronis: ok, I'll have a look, thank you!
[08:54] mvo: I'll try to tweak the test to check for the systemd version in mine when I get a 2nd review, worst case it will not make 2.42
[08:54] PR snapd#7440 opened: snap/channel: fix unit tests, UbuntuArchitecture -> DpkgArchitecture
[08:55] also something weird with tests/unit/go on debian, gofmt is not installed (?)
[08:57] Chipaca: also in #7411 I have two wonderings that maybe you can help with (I @chipaca-ed you on them)
[08:57] PR #7411: cmd/model: output tweaks, add'l tests
[08:57] pedronis: ack
[08:58] pedronis: i noticed i didn't notice some of the things you pointed out
[08:58] will look in a bit
[09:32] mvo: your PRs for 2.42 also need 2nd reviews
[09:57] zyga: i wonder if this might have some relevance: Sep 09 19:27:54 localhost snapd[692]: handlers.go:459: Reported install problem for "core18" as f732dba4-d35a-11e9-a660-fa163e6cac46 OOPSID
[09:57] rogpeppe: it means that snapd fails to refresh core18
[09:57] aborts the transaction
[09:57] and rolls back
[09:57] that explains your reboot loop
[09:57] this is very useful information
[09:58] zyga: i'm not seeing a reboot loop currently FWIW
[09:58] mvo: ^ can we pull the log from that error tracker entry?
[09:58] rogpeppe: it will be re-attempted
[09:58] rogpeppe: until it refreshes successfully
[09:58] that report has hints as to what went wrong
[09:58] zyga: ah, so that's what explains the fact that it's rebooting periodically, i guess
[09:58] yes
[09:58] it's the transactional nature
[09:58] it's just not immune to problems
[09:59] * zyga found a bug (in what he was doing since morning)
[09:59] zyga: here's another one: Sep 09 07:40:40 localhost snapd[715]: handlers.go:459: Reported install problem for "core18" as fc12d9b2-d2e7-11e9-b568-fa163e102db1 OOPSID
[10:01] zyga: INFO Waiting for restart...
[10:01] ERROR cannot finish core18 installation, there was a rollback across reboot
[10:01] huh
[10:01] interesting
[10:01] nothing magic
[10:02] rogpeppe: what does snap version say?
[10:02] looks like the restart is not happening, I see a lot of "Waiting for restart"
[10:02] snapd version is 2.40
[10:03] zyga: https://pastebin.canonical.com/p/WHVwWjRTVs/
[10:03] this is the snapd + core18 arrangement, right?
[10:03] zyga:
[10:03] rogpeppe@localhost:~$ snap version
[10:03] snap 2.40
[10:03] snapd 2.40
[10:03] series 16
[10:03] kernel 4.15.0-1041-raspi2
[10:05] rogpeppe: can you run snap changes
[10:05] and pastebin that?
[10:06] zyga: http://paste.ubuntu.com/p/GbtJH9N6vv/
[10:06] how about snap tasks 11
[10:06] and snap tasks 13
[10:07] zyga: ?
[10:08] run the command: snap tasks 11
[10:08] and pastebin the output please
[10:08] zyga: http://paste.ubuntu.com/p/CnjtBWwrtr/
[10:09] oh
[10:09] Error yesterday at 19:25 UTC yesterday at 19:27 UTC Automatically connect eligible plugs and slots of snap "core18"
[10:09] it failed on auto-connect?!
[10:09] off to school to pick up the kids, afk for a bit
[10:09] pstolowski: ^ can auto-connect prevent a reboot?
[10:09] zyga: this is task 13: http://paste.ubuntu.com/p/SvvXP7rP5R/
[10:09] same thing happened here
[10:10] I think we're getting somewhere now
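(Aside: the inspection sequence zyga walks rogpeppe through, in one place; the change id 11 is specific to this system, so substitute whatever id `snap changes` reports with status Error.)

```
journalctl -u snapd.service | grep "Reported install problem"  # error-tracker OOPS ids, as pasted above
snap changes      # recent changes; a failed refresh shows up with status Error
snap tasks 11     # per-task breakdown of change 11, including which task failed
```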
[10:16] * Chipaca takes a break
[10:17] zyga: like any other task handler that errors out and triggers undo. it's implemented to retry on conflicts (which it did a couple of times in that log), but then there is a bunch of things it looks up in the state that can error out. slightly weird we don't see what the error was for this task
[10:17] pstolowski: we can presumably ask rogpeppe for the state file
[10:17] do you think it would help
[10:18] yes it might help, maybe we will be able to track down what changed in the state that made the task error out
[10:20] rogpeppe: can you please report a bug on bugs.launchpad.net/snapd
[10:20] rogpeppe: include a rough description of the problem as you see it
[10:21] rogpeppe: so yes, if you can grab and send us state.json that would be great (don't pastebin it as it has your macaroon etc)
[10:21] rogpeppe: and then work with pstolowski to attach the logs there
[10:21] rogpeppe: and the state file
[10:21] pstolowski: where does that file live?
[10:21] rogpeppe: /var/lib/snapd/
[10:21] rogpeppe: please make sure to use a private bug (when reporting it) because, as pawel said, the state file contains some shared secrets
[10:21] let us know if you need any help with reporting the bug
[10:22] we will use it for tracking and eventual regression testing
[10:22] mborzecki: found a small-ish bug just now, /etc/ssl is a special case, as you know
[10:22] mborzecki: as is /etc/alternatives
[10:22] mborzecki: and they don't play nicely with the trespassing detector
[10:23] zyga: are the macaroons the only secret things in there?
[10:23] yes
[10:24] zyga: ok, i've redacted them specifically.
[10:26] rogpeppe: ty
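(Aside: one hedged way to do that redaction before attaching the file to a bug, assuming the macaroons live under data.auth in state.json and that jq is available; verify nothing sensitive remains before uploading.)

```
sudo jq 'del(.data.auth)' /var/lib/snapd/state.json > state-redacted.json
```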
[10:43] zyga, pstolowski: https://bugs.launchpad.net/snapd/+bug/1843417
[10:43] Bug #1843417: ubuntu core installation goes down regularly
[10:43] thank you very much
[10:43] we'll try to get to the bottom of this
[10:43] pstolowski: can we use the state file to simulate a refresh somehow?
[10:44] zyga: thanks. BTW if it can't be fixed fairly soon, i'll need to move to another distribution, because winter is coming and my parents need heat in the house :)
[10:44] rogpeppe: you can try one specific command that may help you
[10:44] rogpeppe: snap refresh core18
[10:44] that will refresh the base snap
[10:44] zyga: ok, trying
[10:44] then you can try to refresh snapd
[10:44] essentially one-by-one
[10:45] rather than all at once
[10:45] if there's some kind of conflict happening
[10:45] it might be averted this way
[10:45] zyga: ok, it's refreshed and is rebooting
[10:45] zyga: well, in a minute
[10:46] zyga: ok, we'll see if it comes back up
[10:46] fingers crossed
[10:46] systems at scale are complex
[10:46] we wish to make this totally unattended
[10:46] but as reality shows, it's not trivial
[10:47] zyga: in theory possible but lots of mocking
[10:48] zyga: i'm somewhat unhappy about the reboot just happening randomly, not under some sort of control, particularly if there's a possibility that the system might not recover from it
[10:48] zyga: in this particular case, if this happened when people were away, it could result in the house not being heated enough and pipes freezing, leading to expensive damage
[10:49] rogpeppe: you can schedule updates
[10:49] there's a way to update predictably at very precise moments
[10:49] zyga: oh? how would i do that?
[10:49] https://snapcraft.io/docs/keeping-snaps-up-to-date
[10:49] if in doubt ask mborzecki, he knows this code very well
[10:50] or ask degville about the documentation for suggestions or improvements
[10:50] i'm looking into the state
[10:50] zyga: is there some documentation for the actual refresh timer syntax that isn't just examples?
[10:51] re
[10:51] I don't believe there is, what would you like, a more formal syntax?
[10:51] zyga: yup
[10:51] I believe it's a set of ranges
[10:51] but mborzecki can correct me on this
[10:51] zyga: and an explanation of the semantics
[10:51] mborzecki: is there a more formal syntax of https://snapcraft.io/docs/keeping-snaps-up-to-date
[10:51] zyga: with those docs, i'm left guessing
[10:51] zyga: what does the "5" in "fri5" mean, for example?
[10:52] rogpeppe: I think you're right - we should add a formal explanation of the syntax.
[10:52] rogpeppe: the semantics is that snapd will only attempt to refresh at a time that fits that schedule
[10:52] zyga: it's surely not the 5th friday in the month
[10:52] rogpeppe: just examples
[10:52] rogpeppe: it actually is
[10:52] rogpeppe: you can plan a monthly refresh this way
[10:52] zyga: most months don't have a 5th friday
[10:52] perhaps the example is not the best but that is the intent
[10:53] mborzecki: what would happen if you pick the 5th Friday, actually?
[10:53] rogpeppe: the examples list: fri5,23:00-01:00 - last Friday of the month, from 23:00 to 1:00 the next day
[10:53] rogpeppe: it's day[], 5 means the last week basically
[10:53] mborzecki: i don't understand why that's the last friday of the month - that's what i mean when i say that it would be good to actually document the semantics
[10:54] mborzecki: if the digit "5" is special, then it should say so
[10:54] mborzecki: for example, would fri9 mean the same thing?
[10:55] rogpeppe: some syntax was initially described here https://forum.snapcraft.io/t/refresh-scheduling-on-specific-days-of-the-month/1239/6 some changes were made along the way though
[10:55] mborzecki: more unknowns: would you be allowed to do "22:00~23:00/2" ?
[10:55] mborzecki: if so, what's the difference between that and "22:00-23:00/2" ?
[10:56] I think those are all good questions, thank you for engaging with us rogpeppe
[10:56] mborzecki: basically, i'd like to see some actual description of the semantics, not just a set of examples where i'm left to try to infer the actual rules
[10:57] rogpeppe: there's this page too: https://forum.snapcraft.io/t/timer-string-format/6562
[10:58] i guess it could use a little update with some details
[10:58] PR snapd#7441 opened: asserts,seed/seedwriter: follow snap type sorting in the model assertion snap listings
[10:58] that design doc from niemeyer is a great start. it would be nice if some more of that made it into the actual docs.
[11:00] it also mentions some stuff that isn't documented at all, such as `0:00~24:00/6:00`
[11:01] but maybe that's not implemented, i guess
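(Aside: worked examples of the timer format being discussed, set through the documented refresh.timer option; per mborzecki, the digit after the weekday selects the n-th occurrence of that day in the month, with 5 effectively meaning the last one. The mon1 value is an illustration, not taken from the docs excerpt above.)

```
sudo snap set system refresh.timer="fri5,23:00-01:00"  # last Friday of the month, 23:00 to 01:00
sudo snap set system refresh.timer="mon1,02:00-04:00"  # first Monday of the month
snap refresh --time   # show the schedule currently in effect and the next refresh window
```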
[11:02] zyga: FYI the pi did not successfully reboot.
[11:02] zyga: i'm gonna have to ask for it to be power cycled again
[11:02] rogpeppe: did it reboot but fail, or fail to reboot?
[11:03] zyga: i've got no way of knowing, i'm afraid
[11:03] huh, I see :/
[11:03] zyga: it said it was going down, my ssh connection was terminated, and it hasn't come back up again
[11:03] it has rebooted then
[11:03] zyga: well, it shut down :)
[11:03] and after someone reboots it again it should roll back
[11:04] zyga: and then i'm back in the same position as before?
[11:04] I was discussing a way to make that automatic in case of failure on the pi specifically
[11:04] yes
[11:04] zyga: ok. :-(
[11:04] rogpeppe: you can set the schedule to avoid refreshes while we try to understand the cause of the failure
[11:05] zyga: can i set the schedule so refreshes are turned off entirely?
[11:05] no, that's explicitly not available
[11:05] zyga: AFAICS the minimum frequency is once per month
[11:05] rogpeppe: you can try one more thing
[11:05] you can refresh snapd itself
[11:05] that should not reboot
[11:05] but give you a new software stack
[11:05] so that the bugs that were fixed since 2.40 can be applied
[11:06] well, the fixes that is, not the bugs
[11:06] rogpeppe: you can try to refresh snapd to the candidate channel to get it
[11:06] rogpeppe: with "snap refresh snapd --candidate"
[11:06] on core18 systems you no longer need to reboot to get a new snapd, fortunately
[11:07] using the same strategy you could even refresh to a hotfix branch that contains a fix for your machine
[11:07] which would allow you to refresh the rest of the system correctly, once we understand the nature of the failure
[11:09] zyga: ok, i'll try that when the system has been restarted, thanks
[11:12] zyga: so, the auto-connect handler errors out because WaitRestart() reports a rollback error, we hit the "// TODO: make sure this revision gets ignored for automatic refreshes" case again, there is a revision mismatch there. this was discussed a few months ago when we hit a similar case. pedronis also did some work around reboots recently but i'm not sure if that's in play here
[11:14] pstolowski: the checking for reboots has been added
[11:14] maybe we didn't remove all the TODOs
[11:14] zyga: tbh, in situations like this, maybe we should have a mechanism to temporarily disable refreshes, some local assertion or whatnot
[11:14] zyga: so auto-connect is just a victim here, the problem is elsewhere
[11:14] mborzecki: yeah
[11:15] I don't remember when it landed though
[11:16] zyga: do you think that the failure to reboot correctly is related to this problem here, or just another problem that happens to be exacerbating the issue?
[11:16] I think that it may be a separate problem
[11:17] perhaps it'd be good to look at the snap boot environment and see what it says
[11:17] mborzecki: you can set refresh.hold, no?
[11:17] aahh
[11:17] right
[11:18] a bit annoying to set, we really need a command for it, but it is there
[11:18] zyga: ok, i'm back into the system
[11:19] pstolowski: we should drop that todo, the check is now done, it is done much earlier, in the daemon itself
[11:19] zyga: it's maybe interesting that it seems the system only reboots successfully after exactly two power cycles.
[11:19] pedronis: refresh.hold? what is that
[11:20] zyga: https://forum.snapcraft.io/t/system-options/87#heading--refresh-hold
[11:20] it's on the same page as the timer
[11:21] ah, it's not on https://snapcraft.io/docs/keeping-snaps-up-to-date
[11:21] zyga: afaiu it's not intended to be used by the user
[11:22] well, it's annoying but it exists
[11:22] I would not recommend using it without a reason
[11:22] but it can be used
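(Aside: what refresh.hold from the system-options page linked above looks like in practice, as a sketch; it takes an RFC 3339 timestamp and can push automatic refreshes out by up to 60 days.)

```
sudo snap set system refresh.hold="2019-11-08T10:00:00+00:00"  # no automatic refreshes until then
snap refresh --time   # the hold should be reflected in the schedule output
```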
[11:24] am i right that there's no way to specify that the system will refresh on the first day of every month?
[11:25] I believe you're correct
[11:25] pedronis: should the restart check be dropped from the auto-connect handler?
[11:25] zyga: thanks
[11:25] pstolowski: in which sense?
[11:26] the answer is likely no
[11:26] pedronis: removing WaitRestart() from auto-connect
[11:26] pstolowski: no
[11:26] it will be hit for a bit until we restart/reboot
[11:26] what I said is that it's checked earlier, when a reboot is triggered
[11:26] whether it happened
[11:27] at some point we might be able to not use WaitRestart but that requires changes to the taskrunner etc
[11:31] pedronis: i see. ok, we need to find out what triggers this check to fail sometimes
[11:32] pstolowski: nowadays, if we reboot 3 times and it fails, it will trigger
[11:32] i've a power outage, need to power off soon before my ups runs out of battery
[11:32] well, try to reboot 3 times
[11:37] 2019 is still the year when I wish launchpad supported markdown
[11:41] * pstolowski lunch
[11:42] mborzecki: I reported two bugs that I found today https://bugs.launchpad.net/snapd/+bug/1843421 and https://bugs.launchpad.net/snapd/+bug/1843423
[11:42] Bug #1843421: snap-update-ns doesn't know about the special property of /etc/ssl and /etc/alternatives
[11:42] Bug #1843423: snap-update-ns fails to construct a layout in /etc/test-snapd/foo
[11:43] ondra: ^ some useful bugs for you
[11:43] ogra: in case you run into something like that in the field
[11:44] I meant ondra twice but I think ogra may run into things like that as well
[11:45] zyga: btw. does s-u-n need to know about nsswitch.conf too?
[11:45] mborzecki: probably so
[11:46] zyga thank you :)
[11:46] I'll do my best to fix them obviously
[11:46] writing tests is useful
[11:56] imma go lunch
[11:57] enjoy :)
[11:57] i'll try
[12:11] is it just me or are things extra slow today
[12:11] setting up the main test suite takes about 10 minutes to complete
[12:13] zyga: and spread tests are failing
[12:13] how?
[12:14] I haven't seen any failures though I'm mostly writing new tests now
[12:14] (but no store related failures during that process either)
=== ricab is now known as ricab|brb
=== grumble is now known as \emph{grumble}
[12:41] * zyga quick lunch
=== ricab is now known as ricab|lunch
[13:25] jdstrand: hello, can you please enqueue https://github.com/snapcore/snapd/pull/7421 for a concept review of the idea rather than a security review
[13:25] PR #7421: cmd/snap-confine: unmount /writable from snap view
[13:38] zyga: sure
[13:42] mborzecki: I don't understand the spread failures, many of them don't even seem to have clear errors, or I'm not looking right (quite possible)
[13:59] am I doing something dumb?
[13:59] $ snap info multipass | grep installed
[13:59] installed: 0.9.0-dev.171+g7a968814 (x4) 194MB classic
[13:59] $ snap refresh multipass --revision 1125 --amend
[13:59] error: local snap "multipass" is unknown to the store, use --amend to proceed anyway
[14:01] PR core18#139 opened: hooks: add missing dosfstools to get fsck.fat
[14:01] rogpeppe: hello
[14:01] rogpeppe: we have some more ideas
[14:02] zyga: cool!
[14:03] zyga: BTW ISTM that the fail-to-reboot issue is the main problem here - i'm not sure that the other issue would be a real problem if the reboot hadn't failed
[14:03] rogpeppe: we looked some more and we suspect the boot partition that uses FAT is corrupted; we found a bug related to an absent fsck on core18 systems
[14:03] rogpeppe: we devised a way forward that you should be able to do remotely
[14:03] rogpeppe: if you have an app using "core" installed you should have access to fsck.vfat from /snap/core/current/usr/sbin/
[14:03] rogpeppe: you can use that to fsck the boot partition
[14:04] rogpeppe: you can unmount it for the duration of the check as well
[14:04] zyga: given that i re-flashed the card very recently, it seems slightly unlikely that it's corrupted already (within a few hours of first installing) but happy to try
[14:04] rogpeppe: mvo looked at some of the error tracker logs and found what I believe was the kernel telling us about fs corruption of the FAT partition
[14:05] rogpeppe: so that's the first step, I think you know how to run that without hand-holding but please ask for help if you need any
[14:05] rogpeppe: try to run it in a mode verbose enough for us to see if there were any errors there
[14:05] rogpeppe@localhost:~$ ls -l /snap/core/current/usr/sbin/*fsck*
[14:05] ls: cannot access '/snap/core/current/usr/sbin/*fsck*': No such file or directory
[14:05] rogpeppe@localhost:~$ ls -l /snap/core/current/usr/sbin/*vfat*
[14:05] ls: cannot access '/snap/core/current/usr/sbin/*vfat*': No such file or directory
[14:06] oh, silly me
[14:06] just /snap/core/current/sbin/
[14:06] not /usr
[14:08] zyga: so i'm planning to run these commands; do they seem right to you?
[14:08] umount /boot/uboot
[14:08] fsck -V /dev/mmcblk0p1
[14:08] yes
[14:08] they look good
[14:09] (assuming PATH is set up to find the fsck.vfat)
[14:10] rogpeppe: as an extra remark, it's sometimes good to stop snapd.service during things like this (hands-on experiments)
=== ricab|lunch is now known as ricab
[14:10] to avoid background activity
[14:10] zyga: http://paste.ubuntu.com/p/Fm3RpdC8d2/
[14:11] interesting!
[14:11] zyga: i'd definitely unmounted the fs
[14:11] perhaps uboot and the kernel disagree on which boot sector to use and then something gets out of sync later
[14:11] for the purpose of the experiment, copy the original to a backup
[14:11] I _believe_
[14:12] that is what the kernel would use
[14:12] but I welcome the advice of mborzecki
[14:12] mborzecki: ^
[14:12] zyga: and remove the dirty bit?
[14:12] yes, but please understand my POV of trying to fix the partition and seeing if that means you can correctly boot out of the problem
[14:12] one more idea
[14:12] perhaps tarball all of the boot partition
[14:13] or even dd the whole partition to ext4 somewhere
[14:13] for forensics
[14:13] dd is better
[14:13] zyga: sure, i could send you a copy
[14:13] as you don't have to mount to do it
[14:13] yes, we can then look at it bit by bit in hexedit
[14:13] fortunately we don't keep that many files there
[14:15] zyga: ok, downloading disk image now; will upload to s3
[14:15] thank you
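(Aside: the whole check-and-capture plan from this exchange, assembled in order; it assumes a core snap is installed so fsck.vfat is available, and takes the dd copy before any repair so the forensic image stays untouched.)

```
sudo systemctl stop snapd.service snapd.socket        # avoid background writes during the experiment
sudo umount /boot/uboot
sudo dd if=/dev/mmcblk0p1 bs=4M | gzip > bootcopy.gz  # raw image of the FAT boot partition, for forensics
sudo /snap/core/current/sbin/fsck.vfat -n -v /dev/mmcblk0p1   # -n: check only, report problems
sudo /snap/core/current/sbin/fsck.vfat -r /dev/mmcblk0p1      # -r: interactively repair what -n found
sudo mount /dev/mmcblk0p1 /boot/uboot
sudo systemctl start snapd.socket snapd.service
```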
[14:17] zyga: i still find it very odd that it only boots up ok once every other time. i can't think of anything that might be causing such predictable boot failures.
[14:17] I can offer one
[14:17] zyga: fwiw, i think it's worth checking whether the same "incorrectly unmounted" warning appears on a cleanly built image
[14:18] mborzecki: good idea
[14:18] rogpeppe: if uboot reads the FAT differently (we saw that at least once in the past) and sees a different file than linux
[14:18] then snapd will configure the boot loader to boot kernel-1, core-1 in "trying" mode
[14:18] zyga: BTW the s/w i'm running ran fine without any issues for months on end previously
[14:18] the boot loader will go but never see those values
[14:18] booting something else
[14:18] perhaps something that is removed now
[14:19] or perhaps something that is there but disagrees with what snapd expected
[14:19] so snapd will change the boot configuration again
[14:19] and plan another reboot
[14:20] rogpeppe: but the point is that the oscillation may be kernel/uboot disagreeing on the contents of a specific file
[14:20] and snapd writing to that file in between boots
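(Aside: a much-simplified paraphrase of the try/trying handshake, to make the oscillation scenario concrete; the real logic lives in the bootloader-script and uboot.env.in linked further down, and fw_printenv/fw_setenv stand in here for what u-boot does natively.)

```
# on each boot, the boot loader inspects snap_mode:
mode="$(fw_printenv -n snap_mode)"
if [ "$mode" = "try" ]; then
    # snapd staged new snaps: mark the attempt and boot the new set once
    fw_setenv snap_mode trying
    kernel="$(fw_printenv -n snap_try_kernel)"
    core="$(fw_printenv -n snap_try_core)"
elif [ "$mode" = "trying" ]; then
    # booting again without snapd having confirmed success: roll back
    fw_setenv snap_mode ""
    kernel="$(fw_printenv -n snap_kernel)"
    core="$(fw_printenv -n snap_core)"
else
    kernel="$(fw_printenv -n snap_kernel)"   # normal boot
    core="$(fw_printenv -n snap_core)"
fi
# if u-boot and the kernel read different bytes of uboot.env (the corruption
# theory above), the two sides never see each other's writes and each refresh
# bounces between these branches
```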
[14:22] zyga: ok, interesting
[14:23] PR snapd#7440 closed: snap/channel: fix unit tests, UbuntuArchitecture -> DpkgArchitecture
[14:25] zyga: try this: https://rogpeppe-scratch.s3.amazon.com/bootcopy.gz
[14:25] grabbing
[14:25] dns issue?
[14:26] zyga: i've probably forgotten how s3 urls work
[14:26] i thought they'd turned off that feature
[14:27] of being able to just url the stuff
[14:27] (but i didn't pay too much attention to that email because i don't use it)
[14:29] zyga: just the usual amazon eventual consistency issues; try again
[14:29] zyga: fsck doesn't complain with a pristine core image
[14:29] mborzecki: thank you for checking!
[14:29] rogpeppe: still nothing
[14:29] zyga: ha, it works for me :-\
[14:29] my dns may have cached it as gone?
[14:30] how big is it?
[14:30] can you email me@zygoon.pl
[14:30] zyga: 48621287 bytes
[14:30] might be faster :)
[14:30] I'll go over the bits in the evening to see what's wrong
[14:30] meanwhile you can attempt to fix the FAT
[14:30] or even remake it and copy the files back
[14:31] zyga: for the time being, i've just slowed down updates so it'll only update on the first monday of the month
[14:31] excellent
[14:31] rogpeppe: as pedronis said, you can also use refresh.hold to delay up to 60 days
[14:32] zyga: sent
[14:33] PR snapd#7442 opened: tests: extend mount-ns test to handle mimics
[14:33] zyga: i think 60 days isn't enough longer than 30 days to justify the unpredictability
[14:33] sure, just saying
[14:33] let's try to fix that FAT online
[14:33] apply all the fixes that fsck would normally do
[14:34] zyga: you think that might fix the issue for good?
[14:34] I think it's likely
[14:34] but we don't have the data trends from the error tracker AFAIK so perhaps there's more that we're not aware of
[14:34] zyga: this is quite concerning BTW - this was an absolutely pristine image created following the instructions on the web to the letter
[14:35] zyga: if i'm seeing this problem, then i'd guess that everyone else using ubuntu core is too
[14:35] indeed, we proposed that the boot partition be read-only outside of the transactions that need to use it
[14:35] so that random power failures don't leave the FAT in a mounted state
[14:35] zyga: that seems like a good idea.
[14:35] rogpeppe: there are some issues that can be specific to your device, e.g. the SD card may just really be failing
[14:35] I have a number of cards that reliably corrupt a fixed offset
[14:35] zyga: it's a near-new SD card too, but i guess that's possible
[14:36] you can write zeros or ones, you keep reading that blob that they somehow store
[14:36] rogpeppe: the one I have is a little-used sandisk pro 32GB card
[14:36] rogpeppe: it failed a few weeks after purchase
[14:36] zyga: that would make it two 32GB SD cards failing in a similar way then
[14:37] zyga: because i saw a similar issue with the Pi 2 and assumed the sd card had gone
[14:37] zyga: i actually have that pi with me in fact
[14:37] I got the boot image now, thank you
[14:38] rogpeppe: try something like 'etcher' to see if you can write a pristine image and read it back correctly if you want to check that
[14:38] zyga: yeah, i'll try that
[14:40] zyga: i wasn't surprised when the original sd card failed BTW - it had been in constant use for about 3 years.
[14:40] zyga: ... if it did fail, of course
[14:40] indeed, analysis will reveal the cause
[14:40] there are some moving parts
[14:40] and some failures on our end
[14:40] sil2100: hey, there's one PR for core18
[14:40] sil2100: it's related to what we are discussing now
[14:41] sil2100: we didn't seed fsck.vfat on core18
[14:41] sil2100: do you think you could review it please?
[14:41] https://github.com/snapcore/core18/pull/139
[14:41] PR core18#139: hooks: add missing dosfstools to get fsck.fat
[14:48] rogpeppe: thank you so much for reporting this - and yes, super concerning to us, and we'll dig into it
[14:48] * mvo hugs zyga for digging into it
[14:48] mvo: thanks for your interaction :)
[14:49] rogpeppe: this is what I see on the partition: https://pastebin.ubuntu.com/p/tVXVjCK99Q/
[14:50] I will now check the contents of config.txt, cmdline.txt and uboot.env
[14:50] for one, I like that mtools exists
[14:50] and wish that there was an ext4 variant :)
[14:50] zyga: i don't know about mtools...
[14:50] rogpeppe: it's a GNU tool for interacting with FAT offline
[14:51] hello folks, is there a fix upcoming for "- Download snap "crystal" (71) from channel "latest/stable" (stream error: stream ID 1; PROTOCOL_ERROR)" ?
[14:52] this is really slowing us down
[15:00] sergiusens: we don't have a fix, only a workaround mvo worked on
[15:01] zyga: does the workaround require work (a setting) on our side?
[15:05] zyga I found one cloud instance with 4 broken snaps I was not able to remove; with the latest snapd, I was able to remove them all :) Great work!
[15:05] sergiusens: no, it's automatic
[15:05] ondra: thank you so much :)
[15:06] zyga thank you for fixing it :)
[15:06] ondra: fixing bugs is sometimes very draining, I'm very glad I was able to help you and others in FE
[15:06] rogpeppe: this is the hexyl dump of uboot.env, I'll check out what it says next https://paste.ubuntu.com/p/GgScB76RyT/ -- specifically to see if it is in agreement with snapd's state
[15:07] though I must say that the colorized output from hexyl is easier to read as it shows NUL bytes and other such stuff in a clear, distinct color
[15:08] mvo: one other idea, just looking at this, is to have two uboot environment files: one for the fixed program and the other one for just the handful of actual variables we need
[15:08] oh, mvo is not online anymore
[15:12] rogpeppe: so looking here I see we have the following things: snap_core=core18_1076.snap, snap_kernel=pi-kernel_44.snap, snap_mode= (empty string), snap_try_core=core18_1100.snap, snap_try_kernel=pi-kernel_51.snap
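(Aside: the offline inspection reconstructed from tools named in this log; mcopy is part of mtools and reads the FAT image directly, no mounting needed, and strings then surfaces the snap_* variables quoted above.)

```
zcat bootcopy.gz > bootcopy.img        # the dd image of the boot partition
mcopy -i bootcopy.img ::uboot.env .    # copy the u-boot environment file out of the FAT image
strings uboot.env | grep '^snap_'      # snap_core, snap_kernel, snap_mode, snap_try_*
```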
[15:12] I need to reference the boot logic for a second to understand snap_mode=""
[15:12] mborzecki: ^ unless you remember
[15:12] rogpeppe: do you have the snaps and revisions listed there in /var/lib/snapd/snaps/
[15:13] they should all be mounted as well
[15:13] and visible in snap list --all
[15:13] rogpeppe: (the output of snap list --all would help as well)
[15:16] zyga: this? https://github.com/snapcore/core-build/blob/master/initramfs/scripts/bootloader-script#L89
[15:17] zyga: or this one https://github.com/snapcore/pi3-gadget/blob/16/uboot.env.in#L48
[15:17] huh
[15:17] https://github.com/snapcore/core-build/blob/master/initramfs/scripts/bootloader-script#L104
[15:18] zyga: http://paste.ubuntu.com/p/ktw4XYgrvG/
[15:18] I don't understand how that works
[15:18] ah, wait
[15:18] didn't notice the nesting
[15:18] * zyga re-reads
[15:19] rogpeppe: do you have core18 at revision 1100?
[15:19] I mean
[15:19] I don't see it
[15:19] so I assume that's why it fails
[15:19] it seems we are trying to get to a core18 revision that's simply not here
[15:20] that would explain the immediate failure
[15:20] zyga: i see core18 at 1076
[15:20] though I didn't check what the uboot script does if it cannot find that snap
[15:20] zyga: what's special about 1100?
[15:20] rogpeppe: nothing, it's just referenced from your uboot.env but not present on the system
[15:20] zyga: oh, i see
[15:21] zyga: i wonder how that happened
[15:21] indeed
[15:21] though snapd may have undone the 1100 transaction
[15:21] removing the file from disk
[15:22] if you feel lucky, fix the boot partition with fsck
[15:22] snap refresh core18
[15:22] and check if it manages
[15:22] one other lesson from this
[15:22] is for snapd to fix any boot variables that are inconsistent with reality
[15:22] we have one boot mode
[15:23] as in, one variable called snap_mode
[15:23] that impacts two variables, "trying"
[15:23] i'm not feeling very lucky currently :)
[15:23] and it's clear that in this case there's a chance one of them will fail
[15:23] rogpeppe: I'll collect this in a retrospective
[15:23] there's a lot for us to learn from this
[15:23] rogpeppe: I would suggest fixing the FAT partition
[15:24] that might be enough to fix the other issues
[15:24] zyga: ok, i'll try that
[15:24] I'll collect all of this for a retrospective and share it with you
[15:25] I was thinking about breaking for dinner now
[15:25] zyga: ok, dirty bit removed. it didn't give me an option to do anything else
[15:25] zyga: i suspect i may have done the wrong thing there :-'
[15:25] :-\
[15:25] oh?
[15:25] note, you can always dd it back!
[15:26] zyga: good point!
[15:26] what did you do?
[15:26] try fixing it, mounting it
[15:26] and looking at the files
[15:26] zyga: http://paste.ubuntu.com/p/MtVGCFGGwk/
[15:26] at least the boot.env
[15:27] zyga: i suspect i should've said "no" to removing the dirty bit, i think
[15:27] ah, no
[15:27] I think that's fine
[15:27] it's just a bit
[15:27] fat has no journal
[15:27] so apart from a top-down scan there's little to do
[15:27] zyga: i thought i'd get the option to address the other issue too
[15:28] zyga: so the dirty bit is the only thing that was wrong?
[15:28] ah right
[15:28] I honestly don't know
[15:28] can you fsck again?
[15:28] maybe with some --force option
[15:29] zyga: i ran it again - it does nothing
[15:30] did you use -V?
[15:38] rogpeppe: so
[15:38] rogpeppe: writing the report, I'm not sure we understand what really failed on boot
[15:38] rogpeppe: we know that the FAT was slightly corrupt
[15:38] that it was not unmounted cleanly
[15:38] rogpeppe: we do know that core18 revision 1100 was missing from your disk
[15:39] rogpeppe: though perhaps it was removed by snapd in its undo path
[15:39] rogpeppe: I would like to know if you'd be willing to attempt another reboot, at your convenience, coupled with another reboot done by "snap refresh core18"
[15:40] zyga: so one reboot without "snap refresh core18", then run "snap refresh core18", then let that reboot by itself?
[15:40] yes
[15:40] but only if you have confidence you can recover manually
[15:40] and it's not super inconvenient for you
[15:41] zyga: ok, i'll try that. what should i do when it fails to restart after the first reboot?
[15:41] power cycle
[15:42] zyga: ok
[15:42] if that fails we may be SOL but I don't think it will come to that
[15:42] you may want to mount the partition back
[15:42] mvo: hey, welcome back
[15:42] zyga: does that make a difference?
[15:42] rogpeppe: in case snapd wishes to write to it
[15:42] otherwise no
[15:42] zyga: ok, i'll mount it again now
[15:43] hey zyga - what's new?
[15:43] mvo: we found some things, one sec, I'll share my notes
[15:47] hey folks, could I get another review on https://github.com/snapcore/snapd/pull/7429 ? mvo maybe, if you're not EOD yet and have a couple minutes :-)
[15:47] PR #7429: wrappers/services: add CurrentSnapServiceStates + unit tests
[15:55] ijohnson: heh, let me have a look
[15:57] thanks :-)
[15:58] rogpeppe: let us know what you find please
[16:01] * Chipaca goes for a run
[16:02] ijohnson: you have feedback
[16:02] ijohnson: should be super simple
[16:02] yay, thanks, looking now
[16:04] zyga: will do. it might be tomorrow.
[16:04] ack, thank you for the note
[16:13] mvo: thanks, I fixed the out-of-date comment / for loop, but I think it's still nice to have the more verbose/complete systemctl script
[16:14] ijohnson: that's fine
[16:14] ijohnson: keep it if you prefer it :)
[16:14] okay, cool, so with your and mborzecki's reviews am I good to merge?
[16:15] * ijohnson waits on the merge button
[16:15] ijohnson: yes
[16:15] oh well, I guess the tests have to pass too
[16:15] ijohnson: *cough*
[16:15] :-O
[16:15] ijohnson: :(
[16:15] they're not failing, they just haven't started running yet
[16:16] ijohnson: ok
[16:17] thanks mvo, I'll merge sometime in my afternoon then
[16:18] ijohnson: good luck!
[16:18] :-)
=== pstolowski is now known as pstolowski|afk
[17:25] anyone got any idea why `xdg-open` has stopped opening pdf files correctly for me (it opens them in an ebook viewer)? I think it might have something to do with the fact that `xdg-mime query filetype anyfile.pdf` doesn't print anything, but I'm not sure how that works.
[17:25] oops, wrong channel, sorry!
[17:44] PR snapcraft#2644 closed: Release changelog for 2.44
[20:14] PR snapcraft#2709 opened: incorporate content provider snaps in dependency resolution