[00:57] PR snapcraft#3025 closed: tests: move FakeApt fixtures into deb tests [06:17] o/ [06:18] PR snapd#8471 closed: many: fix loading apparmor profiles on Ubuntu 20.04 with ZFS (2.44) [06:58] morning [07:12] Hey Paweł :-) [07:13] Maybe a slow day for a change [07:19] zyga: hey! we will see, mvo had issues with finalizing 2.44 yesterday evening [07:26] i hope to progress with nested vm test for early core config today [07:37] pstolowski: what kind of issues? [07:37] pstolowski: 2.44 backports not working? [07:39] zyga: just tests afaiu, plus something slightly diverged and needed a small fix [07:39] I see [07:45] pstolowski: I see mvo mentioned that master is broken [07:47] zyga: i don't know about that, maybe missed something [07:47] it's in the standup doc, it seems to be that service that gets masked somehow [07:47] so tests don't pass [07:54] zyga: #8472 fixes it for 2.44 [07:54] PR #8472: tests: disable some problematic tests for 2.44 [07:54] (well, disables the test) [07:55] so yes we need a fix for master [07:56] i'll look at it [08:14] tests/main/interfaces-time-control passed on 20.04 for me [08:15] perhaps 20.04 was fixed since yesterday [08:53] pstolowski: I suspect it's something earlier that breaks it [08:53] re [08:54] zyga: aah, i didn't think about it. running entire suite then [08:58] PR snapd#8473 opened: tests: when restoring chrony do not restart systemd-timesyncd [09:00] MVO [09:00] BEHAVE :) [09:00] it's your day off [09:01] pstolowski: ^ [09:01] +1 [10:19] PR snapcraft#3026 opened: spread tests: add core20 and cleanup systems [10:34] * zyga is drained today [10:54] echo .... [10:55] .... [10:56] ho ho ho [10:56] i'm running entire suite from mvo's PR locally to check this ntp issue [10:56] it's like we have a quarantine on the channel too [10:56] :D [10:56] thanks! 
[10:56] thank you for chasing reliability [11:05] zyga: so the test fails on test-snapd-timedate-control-consumer.timedatectl-timeserver set-ntp yes [11:05] mmm [11:05] yes [11:05] zyga: but i'm dropped in the shell, and manually it works [11:05] zyga: perhaps there is a race with ntp being synchronized? [11:05] oh that's curious [11:05] when it failed, how did it fail? [11:05] what was the error message? [11:06] sorry, not synchronized; i mean 'available' or some such [11:06] + test-snapd-timedate-control-consumer.timedatectl-timeserver set-ntp yes [11:06] Failed to set ntp: NTP not supported [11:06] zyga: ^ [11:07] same command now works manually on that system [11:08] r45d09i-o [11:09] that was not my password :) [11:09] just cleaning the keyboard [11:17] pstolowski: not sure if that helps, I sometimes run a spread with -shell, instead of debug, and then play through the "execute" part interactively [11:18] zyga: hmm we do check test-snapd-timedate-control-consumer.timedatectl-timeserver status before we carry on with the rest of the test [11:18] but maybe it's still racy [11:18] i'll try a retry around set [11:18] ah [11:20] ah, no, ignore me, i misunderstood it [11:20] ok, still digging in ;) [12:07] i think i have a tentative workaround, waiting for tests to finish. it involves retries in two places though and i'm unclear about the root cause [12:08] mmm [12:08] I was wondering if https://shop.3mdeb.com/product/tpm2/ is worth buying for desktops that don't have TPM support [12:16] PR snapcraft#3027 opened: static: mypy requires __init__.py [12:21] PR snapd#8474 opened: [WIP] tests: retry timedatectl set ops [12:22] looking [12:23] let's see how it goes across systems [12:23] pstolowski: is the move of the first block relevant? 
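[editor's note] The "retry around set" idea discussed above can be sketched in a few lines. This is only an illustration, not the code from PR snapd#8474: the helper name retry and its parameters (attempts, delay, sleep) are made up for this note, and the real test is a spread shell script rather than Python.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(op: Callable[[], T], attempts: int = 5, delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op() until it succeeds or the attempts run out.

    Hypothetical helper: any exception raised by op() counts as a failed
    attempt. The last exception is re-raised when every attempt failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except Exception as err:  # broad on purpose: this is a test helper
            last_error = err
            if attempt < attempts:
                sleep(delay)  # give the service time to become available
    raise last_error
```

A caller could wrap the failing command from the log, for example retry(lambda: subprocess.run(["test-snapd-timedate-control-consumer.timedatectl-timeserver", "set-ntp", "yes"], check=True)). As noted in the chat, a retry only hides the symptom; it does not explain the root cause.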
[12:24] I'm asking because there's an install of chrony that now happens *after* [12:24] ah [12:24] that's the restore section [12:24] ok, makes sense [12:24] let's see how it works :) [12:27] looks like the mount namespace is failing to be set-up before running the install hook in this forum post: https://forum.snapcraft.io/t/sudo-snap-install-gimp-fails/16526/5 [12:27] it seems so [12:28] diddledan: thanks for raising this, I replied [12:28] thank you :-) [12:29] * diddledan huggle zyga [12:30] funny gif of the day to keep everyone smiling: https://i.redd.it/h9hr7wo8hxq41.gif [12:53] zyga: yeah i think the order of restore was a bit weird/suspect [12:54] mvo messaged me, [12:54] there's a failure in pulseaudio test in the release branch [12:54] he asked us to look [12:54] do you know which test we have for pulse? [12:55] interfaces-pulseaudio? [12:56] let's try it [12:56] running [12:58] running SPREAD_DEBUG_EACH=0 spread -debug google:tests/main/{interfaces-pulseaudio,interfaces-audio-playback-record} [12:59] morning zyga pstolowski [12:59] hey ijohnson [12:59] good morning [12:59] hi! [12:59] happy friday (indoors) [12:59] indeed! [12:59] is it just the 3 of us today then? [12:59] I think so [12:59] jdstrand: is around as well I think [13:00] nice, SU? [13:00] I'm looking into this failure by request from mvo https://www.irccloud.com/pastebin/H0N9IpA9/ [13:00] oh [13:12] * diddledan waves [13:12] I'm lurking :-p [13:12] you lurker you [13:13] https://youtu.be/mFfQQYsamqM [13:18] ijohnson: do you think sending a patch that moves a single test over to session tool is useful? 
[13:19] ijohnson: I have a few of those that I could just send [13:19] not sure that it helps for 2.44 though [13:19] zyga: I think it is useful in general maybe, but I don't think we should do it for the release branch yet [13:19] I was only thinking about master now [13:19] sure for master yes I think we should try to use session-tool as much as possible [13:20] ok, I'll send what I have then [13:27] ack [13:27] I've got the interfaces-audio-playback-record test running now off 2.44 branch to see if I can reproduce the failure [13:28] same here [13:29] ijohnson: I've sent a patch with tree tests [13:29] https://github.com/snapcore/snapd/pull/8475 [13:29] PR #8475: tests: port snap-session-agent-* to session-tool [13:29] cool, I'll take a look in a little bit, gonna finish porting the interfaces-pulseaudio test changes to audio-playback-record first [13:29] PR snapd#8475 opened: tests: port snap-session-agent-* to session-tool [13:30] thank you [13:31] ijohnson: reproduced! [13:31] nice [13:31] logs? 
[13:31] https://www.irccloud.com/pastebin/CXofm8kK/ [13:32] google:ubuntu-19.10-64 .../tests/main/interfaces-pulseaudio# ls -ld /run/user/12345/pulse/native [13:32] srw-rw-rw- 1 test test 0 Apr 10 13:25 /run/user/12345/pulse/native [13:33] processes [13:33] https://www.irccloud.com/pastebin/0oP0R6j2/ [13:33] journal [13:33] https://paste.ubuntu.com/p/yYKS3Rv2JC/ [13:33] weird [13:33] pulse doesn't start [13:33] because it's running [13:34] Apr 10 13:25:17 apr101317-837037 dbus-daemon[612]: Unknown username "pulse" in message bus configuration file [13:34] Apr 10 13:25:17 apr101317-837037 groupadd[19712]: group added to /etc/group: name=pulse, GID=125 [13:34] Apr 10 13:25:17 apr101317-837037 groupadd[19712]: group added to /etc/gshadow: name=pulse [13:34] Apr 10 13:25:17 apr101317-837037 groupadd[19712]: new group: name=pulse, GID=125 [13:34] Apr 10 13:25:17 apr101317-837037 useradd[19716]: new user: name=pulse, UID=116, GID=125, home=/var/run/pulse, shell=/usr/sbin/nologin [13:34] Apr 10 13:25:17 apr101317-837037 usermod[19722]: change user 'pulse' password [13:35] are we installing pulse? [13:35] * zyga checks the test [13:35] oh [13:35] zyga you are running interfaces-pulseaudio [13:35] yes [13:35] well, both but here this one failed [13:35] hmm so that's failing again too ? [13:36] hmm, well fwiw I just ran audio-playback-record via spread and it didn't fail for me [13:36] * ijohnson tries again with interfaces-pulseaudio too [13:36] also sorry I forgot is the failure we see on the release branch just 19.10, or is it 20.04 too? 
[13:37] 19.10 often, 20.04 once - according to mvo [13:37] ok [13:37] interesting [13:38] so we start pulse [13:38] it gets pid 20042 [13:38] that pid is gone now btw [13:38] then we wait for it to respond [13:38] but then another pulse runs [13:38] I'll look for the 20042 pid in the journal [13:38] nothing [13:38] ah, wait [13:38] the pid is useless [13:39] it's the pid of the shell :/ [13:39] actual pulse is 20080 [13:41] https://www.irccloud.com/pastebin/vFqZLjvh/ [13:41] ^ I was trying to run the failing command myself [13:42] right so this matches what happened last time, in that the daemon seemed to be around but wouldn't respond to anything [13:42] https://www.irccloud.com/pastebin/lUWR3zfO/ [13:42] pulseaudio log file [13:42] E: [pulseaudio] socket-server.c: bind(): Address already in use [13:42] because there's a socket [13:42] that's there [13:42] PR snapd#8476 opened: secboot: add tpm support helpers [13:42] and it cannot get it [13:43] :D [13:43] google:ubuntu-19.10-64 .../tests/main/interfaces-pulseaudio# ls -ld //run/user/12345/pulse/native [13:43] srw-rw-rw- 1 test test 0 Apr 10 13:25 //run/user/12345/pulse/native [13:43] google:ubuntu-19.10-64 .../tests/main/interfaces-pulseaudio# date [13:43] Fri Apr 10 13:43:48 UTC 2020 [13:44] 2020-04-10 15:28:22 Debug output for google:ubuntu-19.10-64:tests/main/interfaces-pulseaudio : [13:44] ah wait, that last timestamp is my local time [13:44] zyga: so the socket is there, but nothing is holding on to it? 
[13:44] yes [13:44] and bind fails [13:44] because it's there [13:44] one sec [13:44] let me correlate pulse startup timestamps [13:44] with the date of that socket [13:45] Apr 10 13:25:22 apr101317-837037 dbus-daemon[612]: [system] Activating via systemd: service name='org.freedesktop.RealtimeKit1' unit='rtkit-daemon.service' requested by ':1.89' (uid=12345 pid=20080 comm="pulseaudio --exit-idle-time=300 -n -F /home/test/p" label="unconfined") [13:45] this is the only proof I have [13:45] pulse requests realtime scheduling on startup [13:45] this is at 13:25:22 [13:45] the socket is ... roughly the same [13:46] let's get more precise timestamp [13:47] yay I reproduced it too on 19.10 [13:47] let's have a look [13:47] srw-rw-rw- 1 test test 0 2020-04-10 13:25:21.799833281 +0000 native [13:47] this is definitely younger [13:47] but [13:47] but only by a second [13:47] I doubt it wasn't pulse who made it [13:48] ah no actually I had an error in my test changes porting over, wrong variable name [13:48] * ijohnson tries again [13:49] I'm so inclined to port this to session tool and see if it fails [13:50] also half of the prepare goes away with it [13:50] true, I'd say give it a shot, I mean if it makes the test more reliable then it's got to be better than the current situation [13:51] I cannot explain why socket bind failed [13:51] let me read the man page [13:52] EADDRINUSE [13:52] The specified local address is already in use or the filesystem socket object already exists. [13:52] so [13:52] ijohnson: I'll start by not rewriting this [13:52] but by adding an assert up top [13:52] that that socket file is gone [13:52] maybe it will fail early on that [13:52] any other ideas [13:52] to look in this session? [13:54] zyga: so in your session right now you cannot use paplay ? [13:54] correct [13:54] I tried [13:54] I get... [13:54] zyga: can you try restarting pulseaudio the way the test does ? 
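[editor's note] The man page wording quoted above ("or the filesystem socket object already exists") can be checked in isolation. The sketch below is an assumption-laden illustration, not pulseaudio code: the helper names try_bind and demo are invented here, and a throwaway temporary directory stands in for /run/user/12345/pulse.

```python
import errno
import os
import socket
import tempfile


def try_bind(path: str) -> int:
    """Bind a UNIX stream socket to path; return 0 on success, else errno."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)  # on success this creates the socket file on disk
        return 0
    except OSError as err:
        return err.errno
    finally:
        sock.close()  # closing does NOT remove the socket file


def demo() -> tuple:
    """Bind, bind again over the leftover file, then bind after unlink."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "native")  # same file name as in the log
    first = try_bind(path)           # fresh path
    leftover = os.path.exists(path)  # file stays behind after close()
    second = try_bind(path)          # path already exists on disk
    os.unlink(path)
    third = try_bind(path)           # path removed first
    os.unlink(path)
    os.rmdir(workdir)
    return first, leftover, second, third
```

Comparing the second element of the result with errno.EADDRINUSE shows why an "assert the socket file is gone" check at the top of the test is a reasonable first step: on Linux, bind() refuses any path that already exists, whether or not a process still holds the socket.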
[13:54] https://www.irccloud.com/pastebin/UcKlvZqR/ [13:55] as in? [13:55] kill the current process [13:55] and start it the way it was started? [13:55] yes pulseaudio --kill or something like that [13:55] ok [13:55] then do [13:55] as_user "pulseaudio --exit-idle-time=300 -n -F /home/test/pulse-test.pa --log-level=4 --verbose 2>&1 | tee $PA_TEST_LOG >/dev/null" & [13:56] (some of the variables are not defined in the debug shell) [13:56] so it's not as easy [13:56] one moment [13:56] ah yeah right [13:57] doh, my fix for timeserver failed after all :(. "Failed to restart systemd-timesyncd.service: Unit systemd-timesyncd.service is masked." [13:57] google:ubuntu-19.10-64 /run# su -l -c "HOME=/home/test XDG_RUNTIME_DIR=/run/user/12345 pulseaudio --exit-idle-time=300 -n -F /home/test/pulse-test.pa --log-level=4 --verbose 2>&1 | tee $PA_TEST_LOG >/dev/null" test [13:58] pstolowski: check the maintainer script of chrony [13:58] pstolowski: it must be doing that [13:58] ijohnson: weird, I think the shell is stuck now? [13:58] hmmm [13:59] ok, managed to kill that [13:59] https://www.irccloud.com/pastebin/V0D8IOLf/ [13:59] paplay failed again [13:59] hmm is the pid file for pulseaudio there? [14:00] zyga: hmm indeed, it's restore section that failed because of this [14:00] zyga: it's like in /run/pulse dir I think [14:00] I killed pulse, let me try again [14:00] weird [14:00] killing pulse yanked *all* of /run/user/12345 [14:00] it's empty [14:02] restarted pulse, again socket already in use [14:02] mmm [14:02] the pid file is correct [14:02] https://www.irccloud.com/pastebin/WdaDZDVw/ [14:03] that's weird [14:03] look at this [14:03] srw-rw-rw- 1 test test 0 Apr 10 14:01 native [14:03] I'm sure the permissions were 600 before [14:04] nah, they were the same [14:04] so... 
[14:04] is it connected [14:04] I'll check if that socket is really open [14:04] there is a pulseaudio running [14:04] and there is a socket [14:04] clients try to connect to that socket [14:04] and then pulseaudio just refuses [14:04] https://www.irccloud.com/pastebin/rI9Th6Dn/ [14:06] I don't know how to correlate the fd 9 there to the socket on disk [14:06] any ideas? [14:06] mmm lsof ? [14:07] empty [14:07] as in [14:08] lsof of tthe "native" socket is empty [14:08] nobody has that open [14:08] I'll try stracing pulse [14:08] well the test passed for me again so I'm just gonna keep running it [14:09] https://paste.ubuntu.com/p/4wSSB2mBcV/ [14:09] it immediately died though [14:09] this one worked [14:09] https://paste.ubuntu.com/p/HjsnJgqy9x/ [14:09] 21195 bind(14, {sa_family=AF_UNIX, sun_path="/run/user/12345/pulse/native"}, 30) = -1 EADDRINUSE (Address already in use) [14:10] HUH [14:10] are on disk unix sockets like TCP sockets [14:10] zyga: ok, reproduced outside of our tests by switching between systemd-timesyncd and chrony packages and restarting [14:10] wait, what do you mean when you say this one worked ? [14:10] ijohnson: this one is still running [14:10] ijohnson: the previous run (identical) failed quickly and quit [14:10] look at the end of each strace [14:10] ah, can you get another shell to try playing ? [14:10] * ijohnson looks [14:10] no, same shell [14:11] in both cases I ran this command: [14:11] su -l -c "HOME=/home/test XDG_RUNTIME_DIR=/run/user/12345 strace -f -o /tmp/pulse.strace pulseaudio --exit-idle-time=300 -n -F /home/test/pulse-test.pa --log-level=4 --verbose 2>&1 | tee $PA_TEST_LOG >/dev/null" test & [14:11] the one that failed didn't get to bind [14:12] in the second log something weird happens [14:12] pulse makes a socket (fd 14) [14:12] hmm it's unclear why the strace that failed immediately didn't get to bind [14:12] and then connects! 
[14:12] 21195 connect(14, {sa_family=AF_UNIX, sun_path="/run/user/12345/pulse/native"}, 110) = 0 [14:12] which works [14:12] so it closes the socket (14) [14:12] makes a new socket (well also 14 now) [14:13] and then binds [14:13] which fails [14:13] * zyga reads unix(7) [14:13] 😱 [14:16] so [14:16] I think I know [14:16] well [14:16] maybe :) [14:16] one sec [14:16] yeah [14:16] hehehe [14:16] ijohnson: do you want to know what the problem is :DDD [14:16] ......... [14:16] what happens if I say no [14:16] :-D [14:17] I WANT TO KNOW [14:17] ijohnson: I just go out, get hit by a bus [14:17] okay fine, I can't live with myself denying roadmr the conclusion to this suspense [14:17] thanks 💚 [14:17] <3 [14:18] heh [14:18] ok [14:18] so the bug is really in the test code [14:18] we use su and shit to run as test [14:18] and guess what [14:19] systemd running in the user session starts pulseaudio.socket and .service [14:19] and we race and lose [14:19] it's really incorrect [14:19] if we were using session tool [14:19] we could really not start pulse at all [14:19] because just starting session for the test user gives us one already [14:19] without the race [14:19] hahahahahahahahahahaha [14:19] oh man [14:19] the socket is there because systemd took it [14:19] of course systemd is racing with us [14:19] for socket activation [14:20] hehe ;) [14:20] well in this case zyga let's just port the whole thing to use session-tool [14:20] https://paste.ubuntu.com/p/3HCqGzXrv2/ [14:21] this is strace after masking pulseaudio.{session,socket} for the user [14:21] systemctl --user --global mask pulseaudio.{session,socket} [14:21] 21597 bind(14, {sa_family=AF_UNIX, sun_path="/run/user/12345/pulse/native"}, 30) = 0 [14:21] bind now worked [14:21] paplay now worked [14:21] https://www.irccloud.com/pastebin/QPrlRzPI/ [14:21] problem solved [14:21] I guess if we didn't want to jump ship to session-tool for the release branch we could just mask pulseaudio at the start of the test [14:21] 
can I get a cookie :D [14:22] ijohnson: yes [14:22] yes you can get all the cookies you want [14:22] my thoughts exactly [14:22] haha [14:22] I'll prepare a patch [14:22] you just need to find them without leaving the house [14:22] and we really need to burn user.sh with fire [14:22] thanks zyga, good detective work [14:22] and remove all the hacks [14:22] it's working against us [14:23] zyga: 🍪 [14:26] PR snapcraft#3027 closed: static: mypy requires __init__.py [14:27] ijohnson: just two tests, right? [14:27] I cannot grep any more [14:27] zyga: just interfaces-pulseaudio and interfaces-audio-playback-record [14:29] rm: cannot remove '/run/user/12345': Device or resource busy [14:29] this is also related [14:29] man this is a good day [14:36] \o/ [14:36] https://github.com/snapcore/snapd/pull/8477 [14:36] PR #8477: tests: fix racy pulseaudio tests [14:36] PR snapd#8477 opened: tests: fix racy pulseaudio tests [14:38] ijohnson: I'm happy I didn't port this test [14:38] yes, porting it would have fixed things but would we know why? [14:38] PR snapd#8472 closed: tests: disable some problematic tests for 2.44 [14:38] very true [14:39] but to be honest, do we really need to know all the reasons why all our user session tests are broken [14:39] haha [14:39] I rebased and force pushed as requested by mvo [14:39] also in other news, I fixed the uboot script and have a booting uc20 on the pi \o/ [14:39] wooot! [14:39] ijohnson: I plan to buy one more rpi4 [14:39] maybe next week you could show me how to get it working [14:40] I'll break for lunch now :) [14:42] sounds good [14:50] PR snapcraft#3028 opened: static: add codespell excludes for .direnv [15:00] pstolowski: so what about that ntp test? [15:00] do we know what the problem is? 
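[editor's note] The strace earlier in the log showed connect() on the socket path returning 0 right before bind() failed with EADDRINUSE, which fits the conclusion that something (the systemd user instance, via socket activation) was already listening there. A minimal sketch of that kind of probe follows. The function name socket_state and its three return values are invented for this note and are not taken from pulseaudio or snapd.

```python
import os
import socket


def socket_state(path: str) -> str:
    """Classify a UNIX socket path as 'absent', 'stale' or 'live'.

    Hypothetical helper:
      absent - nothing exists at the path
      stale  - the file exists but nobody is listening (connection refused)
      live   - connect() succeeded, so some process is listening
    """
    if not os.path.exists(path):
        return "absent"
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return "live"
    except ConnectionRefusedError:
        return "stale"
    finally:
        probe.close()
```

In the situation described in the chat, such a probe would presumably have reported "live" even though lsof showed the test's own pulseaudio process not holding the socket, pointing at a second listener. Other errors, such as missing permissions, are deliberately left to propagate.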
[15:00] zyga: i found another revelation :) [15:00] testing atm [15:01] super :) [15:03] PR snapd#8478 opened: tests: fix racy pulseaudio tests === Eickmeyer[q] is now known as Eickmeyer [15:14] zyga: pushed 1 more change to ntp test [15:15] pstolowski: looking [15:15] zyga: approved [15:15] ijohnson: thank you :) [15:15] LOL [15:15] pstolowski: nice [15:16] pstolowski: that's the bug? :D [15:16] how did we miss it before [15:17] zyga: i have *no idea* how did it work before. maybe packaging changed. i reproduced on my local box outside of tests. systemd one gets masked when chrony gets installed. but chrony doesn't even provide systemd-timesyncd, it provides chrony[d].service [15:17] pstolowski: zyga which PR is the ntp test ? [15:17] 8484 [15:17] pstolowski: I think it's sensible that chrony masks systemd feature it replaces [15:17] so that's a good find [15:17] please polish the PR, provide one patch and check if you can get a 2.44 version as well [15:19] jdstrand: I think I won't do much today [15:19] jdstrand: we're getting ready for my father-in-law's 70th birthday [15:19] jdstrand: and people look at me to EOW now [15:19] jdstrand: I'll get through all your comments for Monday though [15:19] jdstrand: I'll be off on Monday but if you are around, try looking at the two PRs then [15:19] zyga: sounds good! there is a chance I'll be off monday [15:20] that's great, I won't stress for Monday then [15:20] just for Tuesday :) [15:20] zyga: enjoy your long weekend :) [15:20] ditto :) [15:20] zyga: try not to stress for either ;) [15:20] I feel happy anyway, this was a 2.44 firefighting day [15:20] jdstrand: if you have a second over weekend [15:20] can I interest you in libzt-doc in debian? :) [15:20] I wrote some manual pages, [15:20] lots of text [15:21] if you want to see how I did, I'd be honored to get any feedback :) [15:21] it's not work related at all so feel free to just stay indoors and read books :) [15:21] * zyga needs to EOW now [15:21] nice! 
:) [15:21] take care [15:21] cake is on the table [15:21] see you next week everyone! [15:22] I'll circle back to read PRs in the evening [15:48] have a good weekend zyga [16:05] Wait I want cake [16:06] 🎂 [16:07] who said cake? [16:08] You did [16:09] oh [16:09] ok then, I thought it was someone else [16:09] * cmatsuoka goes back to sleep [16:14] zzz :) [16:37] PR snapd#8477 closed: tests: fix racy pulseaudio tests (2.44) [16:50] PR snapcraft#3028 closed: static: add codespell excludes for .direnv [16:56] PR snapd#8479 opened: release: 2.44.3 [16:56] PR snapcraft#3026 closed: spread tests: add core20 and cleanup systems [17:44] PR snapcraft#3029 opened: tests: speed up clean command unit tests [17:56] ijohnson: do you have a reproducer for https://bugs.launchpad.net/snapd/+bug/1871189 ? [17:56] or should I go for the full snaps? [17:56] Bug #1871189: Snapd `cannot update snap namespace` when connecting / disconnecting interfaces [17:58] PR snapd#8480 opened: [WIP] tests: fix timeserver-control-test for 2.44 <⛔ Blocked> [18:00] zyga: there? [18:00] sure [18:01] zyga: my fix for ntp failed on pulseaudio on 19.10, everything else passed [18:01] hahaha [18:01] I guess we need mvo to merge all the fixes :) [18:02] zyga: i just force pushed to clean the history, so re-running the tests [18:02] pstolowski: can you force push 8484 with a description of what is broken and how you fixed it [18:02] zyga: prepared 2.44 cherry-pick just in case [18:02] or is that ready as-is? [18:04] spread tests are churning [18:04] I think we can only EOD and wait now :) [18:06] zyga: dang, i think i lost mvo's original commit in the battle :/ [18:06] pstolowski: git reflog [18:06] zyga: i'm too tired already [18:06] you still have it [18:06] can salvage [18:07] zyga: yeah, it's one line, not a problem. but i was testing without it before [18:07] anyway... eod [18:07] see you, and happy easter! [18:07] :) [18:07] you too :) [18:08] have a good evening and the rest of the weekend! 
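[editor's note] The ntp test failure earlier in the log came from restarting a unit that had been masked ("Failed to restart systemd-timesyncd.service: Unit systemd-timesyncd.service is masked."). One way a restore step could guard against that is to ask systemctl for the unit state first. The sketch below is hypothetical and is not what the snapd PRs do: the names is_masked and _systemctl_is_enabled are invented here. It relies only on "systemctl is-enabled UNIT" printing a state such as "masked" or "masked-runtime" for masked units.

```python
import subprocess
from typing import Callable


def _systemctl_is_enabled(unit: str) -> str:
    """Return the raw state printed by `systemctl is-enabled UNIT`."""
    # No check=True: is-enabled exits non-zero for masked or disabled units.
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True, text=True,
    )
    return result.stdout


def is_masked(unit: str,
              query: Callable[[str], str] = _systemctl_is_enabled) -> bool:
    """True if the unit reports a masked state (masked or masked-runtime).

    The query parameter is an assumption added for this note so the logic
    can be exercised without a running systemd.
    """
    return query(unit).strip().startswith("masked")
```

A restore section could then skip the restart when is_masked("systemd-timesyncd.service") is true, instead of failing on it.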
[18:08] likewise! bye [18:12] PR snapd#8478 closed: tests: fix racy pulseaudio tests === ijohnson is now known as ijohnson|lunch [20:18] PR snapcraft#3030 opened: [WIP] repo: drop _AptCache and add migrate to install_stage_packages() === ijohnson|lunch is now known as ijohnson [22:18] PR snapcraft#3022 closed: plugins: introduce v2.PluginV2 and v2.NilPlugin [22:27] PR snapcraft#2966 closed: build providers: move to buildd images