ijohnson[m] | waltman: you should create a forum topic with more detailed output to share for folks to look at, it's hard to tell what the problem is without seeing the full output and many folks are offline right now | 01:13 |
---|---|---|
ijohnson[m] | mwhudson: so is there an easy reproducer for this bug ? | 01:13 |
waltman | ijohnson[m]: What category should it go in? | 01:29 |
ijohnson[m] | waltman: probably the snapd category makes the most sense | 01:29 |
waltman | Cool, that's what I picked | 01:30 |
mwhudson | ijohnson[m]: yes, boot the server installer and pkill snapd | 01:36 |
mwhudson | er sudo pkill -TERM snapd | 01:36 |
ijohnson[m] | mwhudson: which server installer? | 01:37 |
mwhudson | ijohnson[m]: impish | 01:37 |
mwhudson | ijohnson[m]: https://cdimage.ubuntu.com/ubuntu-server/daily-live/pending/impish-live-server-amd64.iso | 01:37 |
ijohnson[m] | perfect thanks, I'll give it a try and see what's up | 01:37 |
mwhudson | thanks | 01:37 |
mwhudson | ijohnson[m]: having fun yet? | 03:18 |
ijohnson[m] | yeah this is weird | 03:19 |
ijohnson[m] | mwhudson: so would the previous installers where this was working have used the release pocket or would they have been using the proposed pocket ? | 03:28 |
ijohnson[m] | cause I see the same behavior with 2.53+21.10 deb of snapd too | 03:28 |
ijohnson[m] | so I'm thinking this was broken between 2.51.7 -> 2.53 | 03:28 |
mwhudson | ijohnson[m]: the isos are built with the release pocket | 03:31 |
ijohnson[m] | and that hasn't changed recently right | 03:31 |
mwhudson | ijohnson[m]: i'm not sure if you saying it's weird makes me happy (i didn't miss something obvious) or sad (it might be a pain to fix) | 03:31 |
mwhudson | ijohnson[m]: no | 03:31 |
mwhudson | when did 2.53 hit the release pocket? | 03:32 |
ijohnson[m] | it's okay to have complicated and conflicting feelings about snapd | 03:32 |
ijohnson[m] | 2.53+21.10ubuntu1 just hit like 2 days ago | 03:32 |
ijohnson[m] | but I can see the same behavior with 2.53+21.10 which is why I was wondering if it changed | 03:32 |
mwhudson | 2.53~pre1.git19b68f708 landed about two weeks ago it seems | 03:33 |
ijohnson[m] | that's the one I'm trying now | 03:33 |
mwhudson | tempted to build my own snapd with more debugging | 03:36 |
ijohnson[m] | seems fine with that version | 03:36 |
mwhudson | ah ok | 03:36 |
ijohnson[m] | that's probably the next step | 03:36 |
mwhudson | hopefully that's a smaller diff to read :) | 03:36 |
ijohnson[m] | yeah | 03:36 |
mwhudson | oh no | 03:37 |
mwhudson | i wonder if it's go 1.16 vs go 1.17 | 03:37 |
mwhudson | ah no 2.53~pre1.git19b68f708 was built with 1.17 | 03:38 |
ijohnson[m] | hmm though after numerous iterations on the same VM I can't reproduce it anymore 😕 | 03:38 |
mwhudson | hmm | 03:41 |
mwhudson | i'll try an iso with 2.53~pre1.git19b68f708 installed | 03:41 |
ijohnson[m] | yeah all I was doing was just downloading the debs and installing them in the root shell | 03:42 |
mwhudson | hmm 2.53~pre1.git19b68f708 seems the same here :/ | 03:42 |
mwhudson | oh but i got "WARNING: cannot gracefully shut down in-flight snapd API activity in: 25s" | 03:43 |
mwhudson | how did it get that far without writing maintenance.json | 03:45 |
ijohnson[m] | yeah so I booted a fresh iso that had 2.53+21.10ubuntu1 on it, reproduced the bug, immediately downloaded snapd_2.53~pre1.git19b68f708_amd64.deb + installed it and now I can't reproduce the bug anymore | 03:45 |
mwhudson | hmmmm | 03:45 |
mwhudson | did snapd get restarted during the upgrade? | 03:45 |
ijohnson[m] | yes | 03:48 |
ijohnson[m] | so I see the same thing with 2.53+21.10 too | 03:48 |
ijohnson[m] | this is what I'm doing https://pastebin.ubuntu.com/p/ZR2Zh62DrP/ | 03:49 |
mwhudson | i installed ~pre1 into an iso and trying to reinstall (i accidentally left the deb in the live session's rootfs) is hanging | 03:49 |
ijohnson[m] | maybe I'm not triggering the bug the same way? | 03:49 |
ijohnson[m] | the first time I run pkill -TERM it doesn't kill snapd immediately, but after downgrading the deb, then snapd is restarted and now it doesn't exhibit the bug | 03:50 |
mwhudson | yeah so in general it seems after snapd has restarted once it's ok | 03:50 |
ijohnson[m] | yeah agreed | 03:54 |
ijohnson[m] | it's getting a bit late for me, but I think checking if an iso built with 2.53+21.10 and another one built with 2.53~pre1-blah are affected the same way would be super helpful in bisecting where this got broken | 03:55 |
ijohnson[m] | I really hope we don't have to go all the way back to 2.51.7, but also that would be really surprising if this only recently just started failing | 03:55 |
mwhudson | trying 2.51.1+21.10 now | 04:01 |
mwhudson | seems to fail the same way??? | 04:04 |
mwhudson | i have to go and make dinner now | 04:04 |
ijohnson[m] | oh noes | 04:04 |
ijohnson[m] | mwhudson: I pinged the EU folks who should be around in 1-2 hours who can take a look | 04:04 |
mwhudson | ijohnson[m]: maybe it will turn out to be systemd's fault! | 04:05 |
ijohnson[m] | Oooh that's my favorite | 04:06 |
mwhudson | ijohnson[m]: i'm stumped and am giving up for now, hopefully the european wizards can figure it out | 04:15 |
mwhudson | (also had covid vaccine #2 yesterday and am finding it a bit hard to think) | 04:16 |
mardy | mvo: hi! Does golang have a sort of event loop? I'm just trying to make some sense out of https://bugs.launchpad.net/ubuntu-cdimage/+bug/1946656 (added a few questions in the last comment) | 06:31 |
mup | Bug #1946656: [daily impish-live-server] snap stuck in the installer system <fr-1794> <snapd:New for mardy> <subiquity:New> <Ubuntu CD Images:New> <https://launchpad.net/bugs/1946656> | 06:31 |
mborzecki | mwhudson: still around? i see that subiquity restarts the snapd.service for some reason, and it takes a while for snapd to go down, however after it is finally down, snapd.socket is stopped too (does subiquity request that?), anyway, if the socket is down, snap list will obviously block | 07:09 |
zyga | good morning | 07:25 |
mup | PR snapd#10908 opened: cmd/snap-failure: use snapd from the snapd snap if core is not present <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/10908> | 07:54 |
mborzecki | zyga: hey | 08:01 |
zyga | mborzecki, hey :) | 08:02 |
zyga | how are you today? | 08:02 |
zyga | I'm playing with intellinet PDU | 08:02 |
mborzecki | zyga: fighting fires with PRs | 08:02 |
zyga | mborzecki, good luck, is the fire externally caused or internal / rushing fixes for impish? | 08:05 |
mardy | zyga: hi! | 08:05 |
zyga | hey mardy | 08:05 |
sil2100 | mvo, mardy, mborzecki: thanks for looking into LP: #1946656 ! Hope you and the subiquity guys can find out what's up there since this is basically a 21.10 release-blocker, so we'd appreciate all-hands-on-deck for that one o/ | 08:22 |
mup | Bug #1946656: [daily impish-live-server] snap stuck in the installer system <fr-1794> <snapd:New for mardy> <subiquity:New> <Ubuntu CD Images:New> <https://launchpad.net/bugs/1946656> | 08:22 |
mardy | sil2100: maybe you can help me: can I replace a binary in the impish-live-server-amd64.iso? I mounted it as a loop device, but inside it I don't see a normal Linux FS; just the boot/ folder, and then a debian repo | 08:35 |
mardy | ah, I just found a couple of squashfs files | 08:37 |
sil2100 | mardy: depending on what kind of changes you need to do, but I suppose using something like https://github.com/mwhudson/livefs-editor might be helpful! | 08:37 |
mardy | sil2100: wow, that looks handy, thanks! | 08:46 |
mborzecki | sil2100: who can take a look into subiquity? i see there's like 25 connections to /run/snapd.socket from /snap/subiquity/2793/usr/bin/python3.8 -m subiquity.cmd.server that don't go away when snapd is shutting down (also what's causing snapd to wait longer due to a graceful shutdown) | 08:50 |
sil2100 | mwhudson, dbungert: ^ | 08:54 |
mwhudson | mborzecki: yeah i noticed the same thing | 08:55 |
mborzecki | mwhudson: just added https://bugs.launchpad.net/ubuntu-cdimage/+bug/1946656/comments/13 | 08:55 |
mup | Bug #1946656: [daily impish-live-server] snap stuck in the installer system <fr-1794> <snapd:New for mardy> <subiquity:New> <Ubuntu CD Images:New> <https://launchpad.net/bugs/1946656> | 08:55 |
mwhudson | mborzecki: all i can think is that moving from core18 to core20 brought a new version of some library that is messing things up | 08:55 |
mardy | mwhudson: how do I install livefs-edit? I tried "pip install .", but then the program fails with: | 08:57 |
mborzecki | mwhudson: i suspect the problem here is that snapd tries to do a graceful shutdown and waits for those connections to be idle, but somehow they do not enter such state, so eventually we hit a sigterm timeout in systemd which issues a sigkill, thus snapd fails, triggering snap-failure which hits a 'bug' i fixed in https://github.com/snapcore/snapd/pull/10908, but fixing the connections part would also make the problem go away | 08:57 |
mup | PR #10908: cmd/snap-failure: use snapd from the snapd snap if core is not present <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/10908> | 08:57 |
mardy | File "/home/mardy/.local/bin/livefs-edit", line 8, in <module> | 08:57 |
mardy | sys.exit(__main__()) | 08:57 |
mardy | TypeError: 'module' object is not callable | 08:57 |
mwhudson | mardy: oops | 08:58 |
mwhudson | mardy: i usually just run "sudo PYTHONPATH=~/src/livefs-editor python3 -m livefs_edit ..." | 08:58 |
mardy | mwhudson: I'll follow your steps, then, thanks :-) | 08:59 |
mwhudson | mborzecki: yeah i wonder why the connections are not going idle | 08:59 |
mwhudson | mborzecki: maybe subiquity isn't reading the complete response or something? | 08:59 |
mborzecki | mwhudson: like a trailing \n or something, hmm that's possible | 09:00 |
mwhudson | one quick hack coming up | 09:00 |
mborzecki | mwhudson: also why so many connections? maybe there's a connection per request? | 09:01 |
mwhudson | mborzecki: i suspect the answer to that is the answer to the other thing | 09:01 |
mwhudson | maybe requests_unixsocket just isn't very good | 09:02 |
mborzecki | mwhudson: yeah, might be, perhaps the connection does not become idle as data was not fully received, and a new one is created for a subsequent request | 09:02 |
mardy | mwhudson: could it be possibly related to https://bugs.launchpad.net/snapd/+bug/1943169 ? | 09:05 |
mup | Bug #1943169: snapd daemon doesn't send an EOF <snapd:Invalid by mardy> <https://launchpad.net/bugs/1943169> | 09:05 |
mwhudson | mardy: i doubt it | 09:06 |
mardy | mborzecki: regardless of that, can't/shouldn't snapd just close all the connections abruptly? | 09:06 |
mwhudson | pfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff | 09:06 |
mwhudson | so this patch seems to fix the issue https://paste.ubuntu.com/p/ThkpshCXRz/ | 09:07 |
mborzecki | mardy: well, we can, but that's not nice 😉 | 09:07 |
mardy | mwhudson: nice! ship it! :-) | 09:07 |
mborzecki | mwhudson: ah enter/exit 😉 nice | 09:08 |
mardy | mborzecki: yes, but relying on clients behaving properly is not robust (though it's nice :-) ), since we might not control them | 09:09 |
mborzecki | mwhudson: so what's the plan now? fix in subiquity, rebuild the package & respin the image? | 09:09 |
mwhudson | mborzecki: my plan is to ask sil2100 what to do :) | 09:10 |
mborzecki | haha 🙂 fair enough | 09:10 |
mborzecki | maybe i can patch the livecd and check with the patch locally | 09:11 |
sil2100 | mwhudson: ooooh! So we can fix this in subiquity then? | 09:17 |
sil2100 | mwhudson: yes, if this works then damn, let's get this into subiquity stable and respin o/ | 09:18 |
mborzecki | sil2100: please let me know when you have something | 09:20 |
mwhudson | mborzecki: have you seen the joy/horror that is ./scripts/quick-test-this-branch.sh in the subiquity tree? | 09:22 |
mborzecki | mwhudson: oooh, interesting | 09:24 |
mwhudson | mborzecki, sil2100: https://github.com/canonical/subiquity/pull/1094 | 09:24 |
mup | PR canonical/subiquity#1094: close the session object after each request to the snapd API <Created by mwhudson> <https://github.com/canonical/subiquity/pull/1094> | 09:24 |
mardy | mborzecki: are you doing any more work on this issue? Just to double-check that we don't do duplicate efforts | 09:38 |
mborzecki | mardy: not really, trying to repack subiquity snap and edit the livecd | 09:39 |
mardy | mborzecki: I've reproduced part of the issue locally, where snapd gets stuck after a TERM signal if it has a pending connection, and I'd like to try to fix it (maybe wait for three seconds, and then just close all connections) | 09:39 |
mardy | mborzecki: nice | 09:39 |
mborzecki | mardy: look at the code in daemon.go, we pass context.WithTimeout() to Shutdown(), which in theory should hit a timeout at some point | 09:40 |
mborzecki | meh, livefs-editor doesn't like me really | 10:12 |
frederic_02 | speak french? | 10:23 |
mup | PR snapd#10909 opened: daemon: make daemon shutdown timeout shorter <Simple 😃> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/10909> | 11:00 |
mup | PR snapd#10901 closed: overlord: add managers unit test demonstrating cyclic dependency between gadget and kernel updates <Skip spread> <Created by bboozzoo> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/10901> | 11:20 |
mup | PR snapd#10910 opened: o/snapstate: check snaps for duplicate or invalid names <Created by MiguelPires> <https://github.com/snapcore/snapd/pull/10910> | 12:55 |
mup | PR snapd#10911 opened: daemon: use the syscall connection to get the socket credentials <Created by mardy> <https://github.com/snapcore/snapd/pull/10911> | 13:00 |
mup | PR snapd#10912 opened: tests: not testing lxd snap anymore on i386 architecture <Simple 😃> <Created by sergiocazzolato> <https://github.com/snapcore/snapd/pull/10912> | 14:31 |
flotter | Is snapd supported on the platforms under packages/ and are all those up to date ? | 14:43 |
flotter | i.e. fedora | 14:44 |
mup | PR snapd#10890 closed: tests: using test-snapd-curl snap instead of http snap <Created by sergiocazzolato> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/10890> | 15:31 |
ijohnson[m] | flotter: do you mean `packaging/` dir in the snapd git tree? if so then yes those are the supported distros and they should generally be up to date, sometimes they can lag a bit since it takes time to do release artifacts and get them through the upstream pipelines, etc after we have cut a tag for a release | 15:40 |
miguelpires | mvo: can you merge this https://github.com/snapcore/snapd/pull/10897 please? failures are unrelated | 16:02 |
mup | PR #10897: osutil: ensure parent dir is opened and sync'd <Simple 😃> <Created by MiguelPires> <https://github.com/snapcore/snapd/pull/10897> | 16:02 |
mvo | miguelpires: sure | 16:51 |
mup | PR snapd#10897 closed: osutil: ensure parent dir is opened and sync'd <Simple 😃> <Created by MiguelPires> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/10897> | 16:56 |
mup | PR snapd#10913 opened: [WIP] docs: update HACKING.md instructions <Created by flotter> <https://github.com/snapcore/snapd/pull/10913> | 17:36 |
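
The client-side fix discussed between 09:07 and 09:24 (close the connection after each snapd API request, so that no idle-looking but still-open connections hold up snapd's graceful shutdown) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the subiquity patch itself: it uses the Python standard library instead of requests_unixsocket, and the names `UnixHTTPConnection` and `snapd_get` are hypothetical.

```python
import contextlib
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket (hypothetical helper, stdlib only)."""

    def __init__(self, socket_path, timeout=5.0):
        # The host name is only used for the Host: header.
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self):
        # Replace the default TCP connect with an AF_UNIX connect.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def snapd_get(path, socket_path="/run/snapd.socket"):
    """Issue one GET and always close the connection afterwards.

    contextlib.closing() plays the role of the enter/exit pair mentioned
    in the log: the socket is closed even if the request raises.
    """
    with contextlib.closing(UnixHTTPConnection(socket_path)) as conn:
        conn.request("GET", path)
        resp = conn.getresponse()
        # Read the whole body before closing so nothing is left unread.
        return resp.status, resp.read()
```

A call such as `snapd_get("/v2/snaps")` would then open one connection, read the full response, and close the socket before returning, at the cost of reconnecting for every request. The endpoint path here is illustrative only.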