[01:28] PR snapd#10707 opened: tests/regression: add regression test for LP #1942266
[01:43] PR snapd#10708 opened: tests/lib/prepare.sh: use core20 from beta channel temporarily <⚠ Critical>
[05:20] morning
[05:22] good morning school chaos :)
[05:45] zyga-mbp: heh, damn right
[05:45] need to run an errand, bb around 9
[06:27] morning
[06:44] re
[06:44] pstolowski hey :)
[06:44] pstolowski: hey
[07:18] hi all! Yes, I also went to attend to the school chaos :-)
=== alan_g_ is now known as alan_g
[07:22] zyga-mbp: meh, snapd on fedora is broken https://bugzilla.redhat.com/show_bug.cgi?id=1999998 https://forum.snapcraft.io/t/cannot-install-snap-file-snap-is-unusable-due-to-missing-files/25719/15
[07:23] oh
[07:23] not quite sure how that squashfs-tools update went in with only 3 days in testing
[07:23] eh and i hate the fedora packaging process, so many manual steps
[07:23] mborzecki break fast and often
[07:23] my fedora laptop is in the Huawei office today
[07:25] mborzecki what's the fix strategy? change snapd to cope with the new output?
[07:26] perhaps snap-unsquashfs is needed, it could link to cgo libsquashfs?
[07:26] zyga-mbp: i've already fixed it, but did not update the package in fedora yet because i did not suspect they would actually update to new squashfs-tools in a fedora version that was released a while back
[07:27] fedora updates are like that
[07:27] yeah, fun and unpredictable
[07:28] would a zool test prevent that issue?
[07:28] IIRC zool is like autopkgtest?
[07:28] anyways, the fix was cherry-picked for 2.51.7 so i can drop the patch in arch too and probably update the tw package as well
[07:29] so half a day lost to packaging
[07:29] mborzecki good luck! lots of people depend on that
[07:31] zyga-mbp: still, i'm happy that i don't have to deal with *deb bureaucracy
[07:32] see :)
[07:32] some upsides ;D
[07:49] hmmm why do we have this tag in the repo: https://github.com/snapcore/snapd/tree/3.9.9-0sdhd5 ?
[08:02] mborzecki looks like a mistake?
[09:00] hmm i'm very confused by snap-preseed test failures on 21.10 on master; on our PRs it fails in prepare on the /dev/nbd0p1 device check for qemu, but when I run this test manually with spread on google:ubuntu-21.10-64 I'm always getting a build failure on "# build squashfuse and rename to snapfuse"
[09:00] mvo: ^ is there anything magical now happening wrt snapfuse (v3?)?
[09:02] pstolowski: oh, there should not be, can you show me the full output? maybe the script is buggy or something
[09:02] pstolowski: what is needed to reproduce?
[09:03] mvo: I run spread -debug google:ubuntu-21.10-64:tests/main/preseed
[09:05] pstolowski: thanks, let me try this
[09:05] mvo: a bit more context: https://pastebin.ubuntu.com/p/WgvdZK6Nyb/
[09:05] mvo: in our tests it fails on nbd check though https://pipelines.actions.githubusercontent.com/xS8oSnypZkPEQZqiZgDaRp2kdvQJKbOY08TesHp7E8vn7g4hYR/_apis/pipelines/1/runs/34095/signedlogcontent/101?urlExpires=2021-09-01T08%3A25%3A11.5093188Z&urlSigningMethod=HMACV1&urlSignature=IfSWYWEXjw9%2F0MFg%2Bu3E3F3noQ%2BO1o7x%2BD8tFdqxq60%3D
[09:05] pstolowski: oh, this looks like the c-vendor file is not there
[09:06] so i wonder what's different when I trigger it manually
[09:06] pstolowski: probably, just rm -rf ./c-vendor/squashfuse and try again, maybe it's outdated
[09:07] pstolowski: actually I think that is it :/ the script may need to be made smarter for situations like this
[09:07] pstolowski: rm -rf c-vendor/squashfuse/ and then try again, I will also run it here to double check
[09:07] mvo: thanks, re-trying
[09:20] mvo: yup, that solved it, i'm now seeing the same failure as we have on the PRs, thanks!
[09:29] PR snapd#10709 opened: spread: add 21.10 to qemu, remove 20.10 (EOL)
[09:46] mvo, pstolowski: oh no, snapd FTBFS again: https://launchpad.net/ubuntu/+source/snapd/2.51.1+20.04ubuntu1/+build/22030863
[09:48] sil2100: did you try one rebuild already? maybe just bad luck?
[09:50] sil2100: I can also trigger the rebuild of course (don't want to dump this on you)
[10:05] zyga-mbp: can you take a look at https://build.opensuse.org/request/show/915439 ?
[10:06] sure
[10:06] boo?
[10:06] is boo#nnn a suse bug scheme?
[10:08] mborzecki "Do you really want to approve this request, despite of open review requests?"
[10:08] do you know where I can send a review so that this pop-up does not show up?
[10:08] zyga-mbp: iirc it's bugzilla.opensuse.org
[10:10] zyga-mbp: sorry, forgot to drop a patch and had to supersede the request with a new one https://build.opensuse.org/request/show/915446
[10:11] done
[10:19] PR snapd#10709 closed: spread: add 21.10 to qemu, remove 20.10 (EOL)
[10:20] mvo: do we have anyone who can look into fixing https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993233 ?
[10:21] mborzecki is it a dput that's required?
[10:23] mborzecki, zyga-mbp indeed, seems like we need to update snapd in debian to fix this
[10:23] debian is open now
[10:25] mv
[10:25] mvo: fwiw the squashfs-tools patch was cherry-picked to 2.51.7 so a simple update is all we need
[10:26] yeah, I think so, I will try to get to it today
[10:31] sil2100: oh.. and these are different failures
[10:32] sil2100: did anything change with these builders? are they slower than before?
[10:32] 10634 needs a second review, looks like go modules are finally ready
[10:34] PR snapd#10710 opened: tests: add more space on ubuntu xenial
[10:45] PR snapd#10711 opened: tests: bump the number of retries when waiting for /dev/nbd0p1
[11:00] pstolowski: no idea, I think those are standard ones as before
[11:00] Interesting, hirsute was fine?
[11:09] Morning folks, is UC20 in gce spread still broken?
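PR snapd#10711 bumps the number of retries while waiting for /dev/nbd0p1 to appear. The PR's actual change is not shown in this log; as a rough illustration of the bounded-polling pattern it tunes, here is a minimal Python sketch. The function `wait_for_path` and its parameters are hypothetical, and the `exists`/`sleep` hooks exist only so the loop can be exercised without a real block device.

```python
import os
import time
from typing import Callable


def wait_for_path(
    path: str,
    retries: int = 60,
    delay: float = 1.0,
    exists: Callable[[str], bool] = os.path.exists,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `path` exists, checking at most `retries` times.

    Returns True as soon as the path shows up and False if every check
    fails. `exists` and `sleep` are injectable (hypothetical hooks) so the
    loop can be tested without a real device node.
    """
    for attempt in range(retries):
        if exists(path):
            return True
        # Avoid a pointless sleep after the final failed check.
        if attempt < retries - 1:
            sleep(delay)
    return False
```

Raising `retries` (or `delay`) trades a longer worst-case wait for fewer spurious failures on slow hosts, which is the trade-off a retry bump makes.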
[11:13] ijohnson[m]: yeah, it's confusing, I tried it in qemu and it's fine
[11:14] ijohnson[m]: but there was a change https://people.canonical.com/~mvo/core20-changes/html/edge/'20210826'r1090_'20210831'r1097.html so
[11:14] good morning ijohnson[m]
[11:22] Hey zyga-mbp
[11:23] mvo :-/ yeah I was trying to debug it with Sergio last night but didn't make it far enough, I'll resume looking at the GCE stuff, Sergio showed me how to view the serial console on GCE myself so that will aid debugging a bit
[11:24] ijohnson[m]: do you have a log output that you can paste?
[11:24] ijohnson[m] morning. thanks for the help with etrace yesterday
[11:24] ijohnson[m]: I'm very puzzled by the fact that it's only affecting google and not qemu
[11:24] popey: hey, nice to see you
[11:24] hey mvo o/
[11:25] PR snapd#10712 opened: tests: ensure the `libvirt-daemon-system` pacakge is installed <⚠ Critical>
[11:25] ijohnson[m] I filed a few bugs for you on etrace :D
[11:25] pstolowski: thanks so much for 10711
[11:25] pstolowski: I think it's the last thing that fails on 21.10, then we can make it required
[11:26] ijohnson[m]: in case you want a break from debugging that gce, 10634 (moving to go modules) needs a :+1: :)
[11:30] Whoops I step away for a minute and have like 11 pings haha
[11:30] mvo i finished a review on the go modules PR yesterday but I'm looking into the golang-evdev thing still as I'm still puzzled
[11:31] @mvo I put log output in the SU doc
[11:31] popey thanks for testing it out and helping debug yesterday :-)
[11:33] ijohnson[m]: thanks, I think there are some small things to address in the go mod still so no real rush, just really would like to get it out :)
[11:34] Sure
[11:34] ijohnson[m] happy to test further, I have a spare machine i can kick off tests on and leave running for long periods. I want desktop snaps to be faster, so am motivated to help.
[11:36] That's great, thanks for the offer, I'll be sure to take you up on it, I'll take a look at the snaps you filed issues for first though, it might be a simple/silly thing that doesn't require 20 minutes to run through just to get a single error message :-)
[11:37] I do think the ability to run non-snap packages in the same run would be important.
[11:38] so you can compare /snap/bin/vlc with /usr/bin/vlc and see what the difference is. Because some people don't believe there's any issue beyond compression
[11:38] mvo: let's see if it's happy, i only ran it twice manually
[11:41] sil2100: is the riscv64 build failure reproducible on every run?
[11:44] popey yeah I think in response to your issue the thing to do is create a matrix for each startup type so you can compare the different versions, since for example I added some really basic support for running flatpaks too
[11:44] good call
[11:50] PR snapd#10710 closed: tests: add more space on ubuntu xenial
[12:08] ijohnson: hi, do you have a log or sth related to https://github.com/snapcore/snapd/pull/10708 ? i'm trying to investigate why uc20 fails to boot at all with my remodel pr, maybe it's related
[12:08] PR #10708: tests/lib/prepare.sh: use core20 from beta channel temporarily <⚠ Critical>
[12:23] > removing (with --purge) the lxd snap and reinstalling it helped
[12:23] Aside from a potential bug, next time, try and remove the dataset
[12:31] bboozzoo hey there's a log in the SU doc
[12:31] bboozzoo all UC20 systems in gce are failing to boot AIUI
[12:31] On master that is
[12:32] ijohnson: heh, i merged master to my remodel pr yesterday and it was getting stuck on install, although seemingly in tests that required resealing
[12:44] sil2100: i'll see if I can find something about these new failures later today
[13:21] PR core#121 closed: Generate the dpkg.yaml file
[13:36] pstolowski: thanks!
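The startup-type matrix suggested at 11:44 (comparing, say, /snap/bin/vlc against /usr/bin/vlc, or a flatpak) boils down to collecting repeated timings per launch variant and comparing a robust statistic. The sketch below is hypothetical and is not how etrace structures its output; variant names are free-form strings, and the timings would come from whatever measurement tool you use.

```python
import statistics
from typing import Dict, List


def startup_matrix(
    samples: Dict[str, List[float]], baseline: str
) -> Dict[str, Dict[str, float]]:
    """Summarise repeated startup timings (in seconds) per launch variant.

    For every variant with at least one sample, report the median time and
    the ratio of that median to the baseline variant's median.
    """
    if not samples.get(baseline):
        raise ValueError(f"no samples for baseline {baseline!r}")
    base = statistics.median(samples[baseline])
    result: Dict[str, Dict[str, float]] = {}
    for variant, times in samples.items():
        if not times:
            continue  # skip variants that produced no measurements
        med = statistics.median(times)
        result[variant] = {"median": med, "ratio_to_baseline": med / base}
    return result
```

The median is used rather than the mean so that a single cold-cache launch does not dominate the comparison; if first-launch cost is what you care about, measure and report cold runs separately.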
[13:45] PR snapd#10712 closed: tests: ensure the `libvirt-daemon-system` pacakge is installed <⚠ Critical>
[13:50] PR snapd#10713 opened: tests: use host-scaled settle timeout for hookstate tests
[13:51] heh, so the snapd snap from that remodel PR works in qemu
[13:51] sil2100: ^ #10713 should help with one of the failures
[13:51] Bug #10713: Broken context-sensitive spell check in Evolution (Greek, Hebrew)
[13:51] PR #10713: tests: use host-scaled settle timeout for hookstate tests
[13:51] sil2100: but the second failure is unclear; i think mvo wants to re-run this build
[13:55] PR snapd#10714 opened: tests: move interfaces-libvirt test back to 16.04 <⚠ Critical>
[13:57] pstolowski: there is a re-run running right now, let's see how it goes! It takes around 3-4 hours though
[13:57] sil2100: whaaat?!!!
[14:00] PR snapd#10711 closed: tests: bump the number of retries when waiting for /dev/nbd0p1
[14:03] It's riscv!
[14:26] fun, so while I can see serial output from gcloud it has a buffer of like 150K or something silly like this, and so I lost all of the previous serial console output due to the immense spam; seems I need to stream the output continuously while spread is running to get the full log for some reason :-/
[14:29] and the only way to know the next iteration is to parse stderr, but we also want to keep stdout for the log itself
[14:29] look at this grossness https://gist.github.com/anonymouse64/90b30502ffa3093f77d9a48d9f1d5e8a
[14:30] PR core20#108 closed: Generate dpkg.yaml file
[14:30] PR core18#180 closed: Generate dpkg.yaml
[14:36] ijohnson: mvo ok, i can reproduce the failure from my remodel PR with core20 from edge, the system does not boot
[14:36] bboozzoo: I finally have a system running with full serial console logs on gce
[14:37] the exact revision is 1097
[14:37] took a bit of finicky nonsense with the gcloud command to stream the console output
[14:40] ijohnson: i've repacked the pc gadget with cmdline.extra now
[14:40] yeah that's what I'm doing too
[14:41] ijohnson[m]: anything interesting so far on the console :) ? I'm very curious about this one
[14:41] this is the patch to master I'm booting with https://pastebin.ubuntu.com/p/WmtvTVmq63/
[14:42] mvo: just booted now, I'm watching the output
[14:46] hmm it's just stuck in install mode
[14:46] it's not stuck in the initrd, I think that may have been a red herring that it seemed like it was stuck there
[14:47] and actually it made it through install mode successfully, it is just stuck in run mode, I misspoke earlier
[14:47] of course I forgot to include snapd.debug=1 in the kernel command line so probably need to run again with that, but it does seem to have booted successfully into run mode
[14:49] this is the current log https://pastebin.ubuntu.com/p/PQKq6xNDVd/
[14:50] and it just stays stuck there indefinitely
[14:50] alright I'm trying a new run with more debug info
[14:59] ijohnson: hmm, at least what I observe is that the snapd-seeding service starts, but then it takes very long for actual snapd output to appear and the system seems to be stuck
[15:00] ijohnson: and it's happening at random
[15:04] hmm so you see it sometimes works?
[15:05] bboozzoo: also interestingly, @ondra reported the same thing in MM this morning
[15:05] mborzecki: can you reproduce anything there outside of google? or only google too?
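The serial-console problem described at 14:26-14:29 (a limited buffer, so output has to be fetched repeatedly, with the next offset reported on stderr while stdout carries the log) can be sketched in Python as below. Assumptions: `gcloud compute instances get-serial-port-output` accepts `--start` and `--zone`, and its stderr mentions the next offset as `--start=N`; only that fragment is parsed, not the surrounding wording. The helper names are hypothetical, and this is not the script from the linked gist.

```python
import re
import subprocess
import time
from typing import Callable, Optional, Tuple

# Assumption: stderr contains a hint like "... --start=NNN ..." naming the
# offset to request next. Only the "--start=<number>" fragment is relied on.
_START_RE = re.compile(r"--start=(\d+)")


def next_start_offset(stderr: str) -> Optional[int]:
    """Return the last --start=N offset mentioned in stderr, or None."""
    matches = _START_RE.findall(stderr)
    return int(matches[-1]) if matches else None


def run_gcloud(instance: str, zone: str, start: int) -> Tuple[str, str]:
    """Fetch serial port output from `start` onwards; returns (stdout, stderr)."""
    proc = subprocess.run(
        ["gcloud", "compute", "instances", "get-serial-port-output",
         instance, f"--zone={zone}", f"--start={start}"],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout, proc.stderr


def stream_serial_console(
    fetch: Callable[[int], Tuple[str, str]],
    write: Callable[[str], None],
    iterations: int,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeatedly fetch new console output, keeping stdout (the log itself)
    separate from stderr (which only carries the next offset).
    Returns the last offset seen."""
    start = 0
    for _ in range(iterations):
        out, err = fetch(start)
        write(out)  # stdout is the actual console log
        nxt = next_start_offset(err)
        if nxt is not None:
            start = nxt  # only advance when gcloud told us where to resume
        sleep(interval)
    return start
```

For example, `stream_serial_console(lambda s: run_gcloud("my-instance", "my-zone", s), sys.stdout.write, iterations=720)` would poll for about an hour at the default 5-second interval; the instance and zone names are placeholders.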
[15:05] mvo: it kinda sounds like ondra may have reproduced it on whatever physical device he was working on
[15:08] ijohnson: yeah, it kind of gets stuck, happens at random
[15:08] ijohnson[m]: yeah, was just reading this, strange
[15:09] ijohnson: it looks like this: https://paste.ubuntu.com/p/JmtvzVgwMH/
[15:10] then if you wait for like 30s or more it proceeds further
[15:10] what is confusing to me is that the PR that switches core20 back to beta does not help
[15:10] bboozzoo: looking at your logs, but for me using core20 from edge it never proceeds even after waiting 5 minutes
[15:11] mvo: well what I see is that we get further with using core20 to beta
[15:11] ijohnson: it was like that in gce, even when i restarted the vm service
[15:11] ijohnson[m]: yeah, one thing is that it seems there is a new 20/beta,edge kernel from the 24th too
[15:11] wait bboozzoo are you talking about booting uc20 "natively" in GCE or for nested tests ?
[15:12] ijohnson: no, nested in gce
[15:12] I have only concerned myself with native uc20 runs in GCE, haven't looked at nested tests at all
[15:12] ah okay, so we are looking at different things, but could be different manifestations of the same problem
[15:12] ijohnson: heh, now it got stuck locally again, look at this jump from 11 to 39 seconds, https://paste.ubuntu.com/p/Y7qKWG5WBT/ maybe there are also some networking issues in gce?
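The jump from 11 to 39 seconds mentioned at 15:12 is the kind of stall that is easy to miss in a long console log. A small, hypothetical Python helper can flag gaps between consecutive timestamps; it assumes kernel-style `[   11.123456]` prefixes on the lines it cares about and silently skips any line without one.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed line format: a kernel-style console prefix such as "[   11.123456] text".
_TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_stalls(
    lines: Iterable[str], threshold: float = 10.0
) -> List[Tuple[float, float, str]]:
    """Return (previous_ts, ts, line) for every gap larger than `threshold` seconds.

    Lines without a timestamp prefix are ignored and do not reset the gap.
    """
    stalls: List[Tuple[float, float, str]] = []
    prev: Optional[float] = None
    for line in lines:
        m = _TS_RE.match(line)
        if not m:
            continue
        ts = float(m.group(1))
        if prev is not None and ts - prev > threshold:
            stalls.append((prev, ts, line))
        prev = ts
    return stalls
```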
[15:15] oh interesting that is weird
[15:15] okay well I have gotten as much as I can out of this run, need to start another run
[15:15] mvo: this was the next run I did, also with master snapd + edge core20
[15:15] so in "packaging: add libfuse3-dev build-dep" (2 days ago) ubuntu-core-20-64 had no failure, let me try to bisect from our previous runs if I can pinpoint this more
[15:15] I'm going to try beta core20 again to see how far it gets
[15:15] https://pastebin.ubuntu.com/p/QnKdX4bgMf/
[15:16] ijohnson[m]: \o/ thanks
[15:22] late Monday it seems to have started
[15:30] I see https://github.com/snapcore/snapd/pull/10685 was happy
[15:30] PR #10685: many: fix run-checks gofmt check
[15:33] oh but that started running at 2021-08-30 07:35:59
[15:33] I assume UTC?
[15:39] aha, so beta channel core20 snap does work, but it is quirky in that we seeded the beta channel revision, but we use --channel=edge in our ubuntu-image command, so when the system boots up with the beta channel, it immediately refreshes to the edge channel and becomes broken again
[15:39] mvo: ^
[15:40] the reason my PR failed is because spread doesn't expect the system to be rebooting, so it fails trying to run commands on the VM in GCE, and when it sees that the system isn't responding it tries to SSH in via debug and gets EOF since the machine is rebooting at that exact moment
[15:40] so if we do a spread run with the beta channel of the core20 snap, and we make sure that the core20 snap is not refreshed, things should be happy
[15:41] out of curiosity I am checking whether, after refreshing from beta channel core20 to edge, we can still boot or if even that is broken too
[15:43] * cachio_ afk
[15:52] mvo: here's the log from using beta core20 snap, you can see it proceeds all the way to refreshing core20 from beta back to edge and it reboots https://pastebin.ubuntu.com/p/zShWjWCfwm/
[15:52] ijohnson[m]: ohhhh, ok
[15:52] mvo: still no smoking gun on why core20 is broken
[15:52] but now I know how to unbreak our spread systems for PRs
[15:53] just testing my patch locally and then I will update https://github.com/snapcore/snapd/pull/10708
[15:53] PR #10708: tests/lib/prepare.sh: use core20 from beta channel temporarily <⚠ Critical>
[16:06] PR snapd#10715 opened: Bump secboot
[16:08] ijohnson[m]: I'm looking over the PRs that recently got merged for core20 and there are some potential culprits - we merged the systemd-time-wait-sync service, that sounds like something that may break things
[16:09] ijohnson[m]: we changed some bits in writable (/etc/issue,motd)
[16:11] * ijohnson[m] looks for some goats to sacrifice to prevent this from being a writable-paths problem
[16:11] ijohnson[m]: is there a way to run this easily against a branch of the core20 snap? I could prepare something
[16:12] ijohnson[m]: I suspect the enable-time-wait-sync even though it feels odd, but it's the usual messing with systemd dependencies that might have got us into this mess
[16:12] ijohnson[m]: I prepared a PR with the revert but would love to test first
[16:12] mvo: sure, if you can wait like 5-10 minutes I should be done verifying my fix, which should show how to use a different version of core20 to test this easily
[16:13] ijohnson[m]: sure, will wait and prepare the snap in the meantime
[16:16] mvo: ok it worked \o/
[16:16] let me push it
[16:22] a simple pr: 10714 needs a review, another thing that will unbreak tests
[16:24] ijohnson[m]: excellent, I'm preparing the revert and testing it
[16:26] mvo: alright https://github.com/snapcore/snapd/pull/10708 is ready again
[16:26] PR #10708: tests/lib/prepare.sh: use core20 from beta channel temporarily <⚠ Critical>
[16:26] mvo: specifically, just change the channel of snap download core20 in prepare.sh there
[16:27] mvo: I hope it's okay I added a change to my PR there to also patch the gadget so we get more debugging by default
[16:27] though I suppose that change may break some output on QEMU
[16:27] mmm, let me back that out of that PR and we can propose that separately I suppose
[16:27] ijohnson[m]: +1
[16:31] PR snapd#10714 closed: tests: move interfaces-libvirt test back to 16.04 <⚠ Critical>
[16:33] ok, I put the extra gadget debug stuff into 10716 instead, so 10708 is just about switching which channel of the core20 snap we use
[16:33] one thing I'm unsure about with 10708 is whether there are any spread tests which rely on the core20 / base snap being asserted from the store, since that will no longer be true with my branch/workaround; I had a look and couldn't see any but I could be wrong
[16:34] and I did change the patch in 10716 to just make the console changes for GCE specifically, so it shouldn't detract from the qemu experience
[16:36] PR snapd#10716 opened: tests/lib/prepare.sh: add debug kernel command line params via gadget on UC20
[16:38] ijohnson[m]: thanks! I opened pr#110 for core20 to undo the timesync target, but the spread test is still running
[16:39] ack
[16:40] PR core20#110 opened: hooks: revert PR#105 - it seems to
[16:41] ijohnson[m]: fwiw, this is what is running right now https://paste.ubuntu.com/p/kMY3bVz7kY/
[16:42] mvo: ack that looks good, if it works then we have our smoking gun as it were
[16:42] also I'm not sure, should we be using CORE_CHANNEL or BASE_CHANNEL ?
[16:42] also do we want all our CI for snapd to be running on edge ?
[16:49] ijohnson[m]: well, I think we need better QA for core20, i.e. build an image and run a smoke test :/
[16:50] ijohnson[m]: in general I think running edge is fine, because the alternative is that we run beta and get the same issue a bit later when edge goes to beta
[16:50] true
[16:50] mvo: but yeah having spread tests for core20 would be great, would be a bit of work to enable though
[16:50] well, if we have gating that builds a core20 in gce and does the edge->beta core20 then that's fine of course, then beta would be better for us. but I don't think we have this :/
[16:51] mvo: also I think we were bitten by the fact that the TPE lab was moving so the tests that normally do run on edge channel of core20 snap did not run
[16:51] mvo: because AIUI cachio_ has tests that run against the edge channel of the core20 snap
[16:51] ijohnson[m]: could be, I don't know to what extent this is tested, if it is we should probably move to beta indeed
[16:51] but they run on devices in the lab, and the entire lab is down right now
[16:52] though actually if the tests run in the lab they wouldn't have caught this particular issue in GCE 🤔
[16:52] ijohnson[m]: probably not :/
[16:53] * mvo goes for dinner and checks the spread run with the updated core20 with #110 in a bit
[16:53] PR #110: introduce AccountKey, support decoding it and validation
[16:55] PR core20#110 closed: hooks: revert PR#105 - it seems to
[17:28] ijohnson[m]: indeed with this reverted things seem to be unbroken
[17:31] mvo great, and it seems xnox has already merged it too
[17:34] I triggered an edge build
[17:47] Is there a way for a snap that runs as a background service to read environment variables from the HOST ?
[17:48] we have two snaps that run as background services on our hardware. We want developers to export "credentials" on their Ubuntu system and then just restart the two snap daemons
[17:53] ijohnson[m]: and it's ready, I triggered a new test run in #10634
[17:53] Bug #10634: gnome-mag: SONAME change breaks existing gnopernicus package
[17:53] PR #10634: many: move to go modules
[18:11] PR snapd#10717 opened: tests: fix interfaces-libvirt test
[18:15] om26er: can't you just use `snap set foo creds=secrets` ?
[18:15] ijohnson[m] will it persist across reboots though ?
[18:16] om26er: yes snap config set with `snap set ...` is persistent
[18:17] I just tried, it seems I need a config hook for this to work ?
[18:17] yes you do, it can be empty though
[18:30] ijohnson[m] I just updated my snap and created a `snap/hooks/configure` file with a shebang only and made it executable. The snap build was successful and I can call `snap set ...`, however the exported variable does not show
[18:30] ...when I run `snap run --shell `
[18:31] shells don't persist, how is this being exported?
[18:32] om26er: I mean you can do `snap set foo thing=other-thing` and then in that snap do `snapctl get thing`
[18:32] om26er: I made an example of converting snapctl settings to environment variables easily here: https://forum.snapcraft.io/t/neat-trick-for-over-writing-default-environment-variable-values-for-daemons/13404/3
[19:21] Is there a reason why `snap set foo` doesn't support underscores ?
[19:22] You can't set things like `snap set foo CLOUD_USER_ENDPOINT=ws://localhost:9000/ws`
[19:22] just the original design I guess, why is it a problem for you ?
[19:24] The environment variables are generated by a mobile app, which are also going to be used by other clients. So it would have been "simpler" to ask people to just run `snap set foo ENV_VAR=VALUE`
[19:25] We might have to do some special casing for snaps as those keys would have to be redefined
[19:26] i.e. CLOUD_USER_ENDPOINT --> CLOUD-USER-ENDPOINT
[19:27] yeah that would be the easiest thing to do, but probably also worth filing a bug against snapd for allowing _ in key names; the key names just get serialized as json strings in an object so _ is not a problem for serialization
[19:30] PR core20#109 closed: hooks: add bootchart configuration
[19:30] launchpad or forum ?
[19:31] launchpad
[19:35] Issue core20#111 opened: Convert hooks/024-configure-bootchart.chroot to static files
[19:36] Reported https://bugs.launchpad.net/snapd/+bug/1942367
[19:36] Bug #1942367: Support underscore in 'snap set` keys
[19:40] * cachio_ afk -> doc app
[19:41] ijohnson[m], hey
[19:41] ijohnson[m], about the #10708 which fails google:ubuntu-20.04-64:tests/main/interfaces-libvirt
[19:41] Bug #10708: syck_0.42-3(ia64/unstable): FTBFS: test failures
[19:41] PR #10708: tests/lib/prepare.sh: use core20 from beta channel temporarily <⚠ Critical>
[19:41] hey cachio_
[19:41] Can a snap read values from `/etc/environment` if run as a daemon ?
[19:42] I created a pr to fix that but it's still not working
[19:42] om26er: it should automatically inherit from /etc/environment if it is a daemon
[19:42] to work around that we need to revert the test interfaces-libvirt to be executed on xenial
[19:42] cachio_: what do you mean ?
[19:42] ijohnson[m], google:ubuntu-20.04-64:tests/main/interfaces-libvirt
[19:43] that test was updated
[19:43] initially it was running on xenial
[19:43] but a few days ago I moved that to focal
[19:43] cachio_: right, did you see my comment ? mvo reverted that test to xenial so master is unblocked
[19:43] in a PR
[19:43] ah, nice
[19:44] cachio_: still no idea what's wrong with running the test on focal though, I haven't looked at that test very much
[19:44] didn't see that comment
[19:44] ijohnson[m], I left a comment here https://github.com/snapcore/snapd/pull/10717#issuecomment-910529226
[19:44] PR #10717: tests: fix interfaces-libvirt test
[19:45] with the error I see
[19:45] I need to go to the doctor now, I'll continue with that one later
[19:47] ack
[21:06] PR snapd#10708 closed: tests/lib/prepare.sh: use core20 from beta channel temporarily <⚠ Critical>
[21:06] PR snapd#10718 opened: tests/lib/prepare.sh: download core20 for UC20 runs via BASE_CHANNEL
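The workaround discussed from 18:15 onwards (set values with `snap set`, read them with `snapctl get` inside the daemon, and rename keys such as CLOUD_USER_ENDPOINT because `snap set` rejects underscores) can be sketched as a small Python launcher helper. Assumptions: config keys are approximated as lowercase letters and digits separated by single hyphens (the chat's own example kept uppercase, CLOUD-USER-ENDPOINT, so check what your snapd version actually accepts and drop the `.lower()` if needed); `snapctl get <key>` prints a string value as plain text; all function names are hypothetical. This is not the recipe from the linked forum post.

```python
import os
import re
import subprocess
from typing import Callable, Dict, Iterable, Optional

# Approximation (an assumption, not snapd's exact rule): lowercase letters
# and digits separated by single hyphens.
_KEY_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def env_name_to_snap_key(name: str) -> str:
    """Map an env var name to a config key, e.g. CLOUD_USER_ENDPOINT -> cloud-user-endpoint."""
    key = name.lower().replace("_", "-")
    if not _KEY_RE.match(key):
        raise ValueError(f"cannot map {name!r} to a snap config key")
    return key


def snapctl_get(key: str) -> Optional[str]:
    """Read one config value via `snapctl get`; None if unset, empty, or on error."""
    proc = subprocess.run(["snapctl", "get", key], capture_output=True, text=True)
    value = proc.stdout.rstrip("\n")
    return value if proc.returncode == 0 and value else None


def build_env(
    names: Iterable[str], get: Callable[[str], Optional[str]] = snapctl_get
) -> Dict[str, str]:
    """Copy os.environ and override each name with its snap config value, if set."""
    env = dict(os.environ)
    for name in names:
        value = get(env_name_to_snap_key(name))
        if value is not None:
            env[name] = value
    return env
```

A daemon's command could then point at a wrapper that calls `os.execvpe(cmd[0], cmd, build_env(["CLOUD_USER_ENDPOINT"]))`, so each restart of the service picks up the current configuration.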