=== ubott2 is now known as ubottu
=== niemeyer_ is now known as niemeyer
[02:59] PR snapcraft#3295 opened: specification: desktop extension font hook
[05:27] morning
[05:34] back in a bit, need to drive the kids to school and then hopefully a brief visit to the tax office
[06:28] good morning
[06:28] * zyga sees two failures on 16.04 left
[06:28] neat :)
[06:30] zyga: good morning
[06:32] hey :)
[06:32] how are you?
[06:33] ok, tweaked two more tests and started another run
[06:40] 16.04 gets a green pass, looking at 18.04 which unlocks a few more tests
[06:41] re
[06:41] mvo: zyga: morning guys
[06:41] hey mborzecki
[06:42] hey mborzecki
[06:42] it's not quite frosty-morning stage yet
[06:42] but it's definitely a chilly morning here
[06:42] but even more than yesterday, we're looking at a crazy 28C later today
[06:49] I'll grab some food
[07:05] mmmm, scrambled eggs never get old
[07:07] morning
[07:10] hello Pawel
[07:10] good morning pstolowski
[07:26] o/
[07:53] 18.04 is clean
[07:53] 20.04 and then core
[07:54] I think there may be a few tests left
[07:54] and definitely selinux
[07:54] but this looks great
[08:06] zyga: 👋
[08:06] zyga: is snap-confine still your bff?
[08:11] asking because a friend is getting a 'sanity timeout expired' thing, https://dpaste.org/2YDP
[08:12] snap version & etc, https://dpaste.org/mfb8
[08:22] Chipaca hey
[08:22] sorry, I was in a call
[08:22] Chipaca ohhh
[08:22] let me look
[08:22] BTW, the friend he mentions is me
[08:22] StyXman o/
[08:22] StyXman: 🙂
[08:23] so was the machine under some incredible load?
[08:23] zyga-mbp: of note is that this isn't the first time it happens, and last time StyXman had to reboot to fix it
[08:23] that timeout is a duration that's relatively short on a human scale but really huge on a kernel scale, during which we grab an exclusive lock to do something tiny
[08:23] well, 'had to', tried it and it worked
[08:23] PR snapd#9400 opened: o/assertstate: support refreshing any number of snap-declarations
[08:23] I have a few minutes and then another call I cannot miss
[08:23] if you give me more context I will think about what may be going on
[08:24] maybe it's a stuck process dead somewhere
[08:24] well
[08:24] not dead but blocked
[08:24] and holding the lock
[08:24] we rigged it so that processes that can grab it should die
[08:24] the only exception is snapd, which also grabs the lock for brief moments
[08:24] so any context you can provide helps
[08:24] zyga-mbp: maybe running fuser on the lock shines some light on that?
[08:25] yes
[08:25] it's a flock-based lock
[08:25] the locks are in /run/snapd/lock
[08:26] StyXman: try: sudo fuser -v /run/snapd/lock/lxd.lock
[08:26] there's also a global lock /run/snapd/lock/.lock
[08:27] zyga-mbp: o/
[08:27] * zyga-mbp types from two accounts
[08:28] zyga-mbp: holding *which* lock?
[08:28] we grab the global lock, do some setup, then we grab the per-snap lock
[08:28] here it was definitely holding the per-snap lock
[08:28] nothing in fuser's output
[08:28] so /run/snapd/lock/lxd.lock
[08:29] normally snap-confine itself grabs this lock
[08:29] nothing either, weird, even when snapd is running
[08:29] to synchronize concurrently executing snap-confine processes
[08:29] sometimes snapd also grabs this lock
[08:29] but very briefly and in specific scenarios
[08:29] can you reproduce the failure?
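The flock-based locks described above (a global /run/snapd/lock/.lock plus per-snap locks such as /run/snapd/lock/lxd.lock) can be probed without blocking. A minimal sketch using Python's fcntl on Linux; the probe_lock helper is illustrative only, not snapd code — it merely mimics the locking style discussed:

```python
import fcntl
import os


def probe_lock(path):
    """Try to take an exclusive flock on path without blocking.

    Returns True if the lock was free (we grabbed and immediately
    released it), False if another process currently holds it,
    e.g. a stuck snap-confine.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Run as root against /run/snapd/lock/lxd.lock to see whether the per-snap lock is currently free; flock conflicts are per open file description, so this detects a holder even across unrelated processes.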
[08:29] all the time on this machine
[08:29] hmmm
[08:29] ok
[08:30] wait
[08:30] all the time in this reboot of this machine
[08:30] I haven't rebooted it (and can't right now)
[08:30] and I had this before on other hosts
[08:30] https://github.com/lxc/lxd/issues/6772#issuecomment-606597033
[08:31] and in the next one, I'm the same person
[08:31] and I guess you're in a meeting again
[08:34] I notice that the last two messages are not prefixed with DEBUG, so I guess it's not snapd who's printing them?
[08:34] note the link above is an issue opened in the lxd project
[08:35] load is below 6 on a 32 core machine
[08:35] less than 50% ram used by apps, so no, no load at all
[08:36] I had restarted snapd too, just in case
[08:38] PR snapd#9401 opened: gadget: allow content observer to have opinions about a change
[08:57] re
[08:57] DEBUG messages are from snap-confine
[09:01] StyXman: hmmm
[09:01] StyXman: can you describe the setup a bit
[09:01] like the FS used
[09:01] or anything unusual?
[09:03] not much
[09:03] let me check this one
[09:03] lxd on metal?
[09:03] yes
[09:03] sharing with KVM
[09:04] fs is xfs
[09:05] like I said, 32 cores, 128 of ram
[09:06] fs's are on LVM, over several disks, but I don't think that's relevant
[09:07] how can I figure out which lock is failing?
[09:08] run under strace?
[09:08] run snapd*
[09:08] one sec
[09:08] sure
[09:08] you can run snap-confine via strace
[09:08] k
[09:08] how?
[09:09] I mean, which invocation (incantation?) of snap-confine?
[09:09] hold on :)
[09:09] right
[09:11] sudo SNAP_NAME=lxd SNAP_REVISION=$(readlink /snap/lxd/current) strace -f /snap/core/9993/usr/lib/snapd/snap-confine snap.lxd.lxd /bin/true
[09:11] StyXman: ^ try this, you may also add --base core18 if lxd was using core18 as base (I forgot)
[09:11] er
[09:12] SNAP_INSTANCE_NAME=lxd instead of SNAP_NAME= most likely
[09:12] damn, that's definitely an incantation :)
[09:12] typed dry, my setup has a custom snapd that does a lot of new things and I'm not running things now
[09:13] something's *really* slow here
[09:13] I can see read()s scroll by
[09:14] it takes ages reading /proc/self/mountinfo
[09:14] WTF
[09:14] can you read that?
[09:15] note:
[09:15] it may be reading the mountinfo of lxd
[09:15] to reproduce
[09:15] nsenter -m/run/snapd/ns/lxd.mnt cat /proc/self/mountinfo
[09:16] do pastebin it if it's curious and you have no secrets there
[09:16] this is insane: https://dpaste.org/viR7
[09:16] nsenter?
[09:16] hmm, it doesn't load
[09:17] nsenter is a tool that can jump into one of the many linux namespaces
[09:17] the incantation I gave shows the mount namespace of the lxd snap
[09:17] something's really wrong
[09:17] StyXman: I cannot open the URL you have provided
[09:17] it's full of the same mount info over and over again
[09:18] how big is the paste?
[09:18] oh it loaded
[09:18] *sigh*
[09:18] mdione@demo-hv1:~$ sudo nsenter -m/run/snapd/ns/lxd.mnt time wc -l /proc/self/mountinfo
[09:18] nsenter: failed to execute time: No such file or directory
[09:18] can you run the 2nd command and show me the vanilla output?
[09:18] it never ends
[09:18] how about /bin/cat?
[09:19] or /usr/bin/cat
[09:19] wait
[09:19] (it's not running that in a shell)
[09:19] cat worked, it just never ends
[09:19] so if you want time, you need sh -c ...
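The runaway mountinfo seen above can be quantified by tallying repeated entries. A small sketch, assuming the output of the nsenter command above has been saved to a plain text file in mountinfo format (the top_repeated_mounts helper is hypothetical):

```python
from collections import Counter


def top_repeated_mounts(mountinfo_text, n=3):
    """Count how often each mount point appears in mountinfo text.

    Mount IDs differ per entry, so identical lines rarely repeat
    verbatim; counting by mount point (the 5th field) exposes the
    same target being mounted over and over by runaway propagation.
    """
    points = Counter()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points[fields[4]] += 1
    return points.most_common(n)
```

For example, `top_repeated_mounts(open("/proc/self/mountinfo").read())` inside the lxd mount namespace; counts in the thousands for one mount point confirm the explosion rather than a slow reader.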
[09:19] I'll make coffee
[09:19] the 'file' has this line repeated indefinitely:
[09:19] /lxd/storage-pools /mnt/kvm/lxd/storage-pools rw,relatime shared:104 master:34 - ext4 /dev/mapper/ssd-kvm rw,data=ordered
[09:19] I think this may require lxd developers
[09:19] lxd is very much special
[09:20] and does things regular snaps cannot
[09:20] StyXman: please collect the mountinfo in a bug report
[09:20] and share it here
[09:20] bug report for lxd, or snapd?
[09:20] or both :-p
[09:20] for lxd for now
[09:20] on lp it can be both
[09:21] so
[09:21] it's not a bug in snap-confine
[09:21] it's a very unusual mount namespace
[09:21] the origin of the bug is unclear for now
[09:21] ack :(
[09:22] please collect the full output of mountinfo
[09:22] on both the host
[09:22] and as seen from lxd's point of view
[09:23] if you need to reboot please get those first
[09:23] you can redirect to a file to avoid losing it
[09:25] I waited more than a minute before I got tired of seeing the same line appear over and over again
[09:25] also: WTH is it doing between lines?
[09:27] I just sent it to a log file, let's see how long it takes
[09:27] (now I just realized maybe the remoteness was playing a part in it)
[09:29] WTF
[09:29] the first few reads take >100µs, then it jumps up to 10s of ms
[09:32] cat is taking ages
[09:34] 3m48s
[09:34] wc -l /tmp/nsenter_cat_proc_self_mountinfo.log
[09:34] 65631 /tmp/nsenter_cat_proc_self_mountinfo.log
[09:37] StyXman it's a virtual file
[09:38] generating each line is non-trivial in general
[09:38] can you paste it anywhere?
[09:39] I don't think it makes sense
[09:40] oh
[09:40] yes, there are some interesting things
[09:40] we use loop files in the containers
[09:40] and it seems it has something to do with it
[09:40] it's exponential!
[09:41] https://dpaste.org/6qp0
[09:41] before the first loop device there's one line
[09:41] before the second, 2
[09:41] before the third, 4
[09:42] 8, 16, ...
[09:42] 2**n
[09:42] there are 28 loop files on that node
=== benfrancis5 is now known as benfrancis
[09:46] so at the end it should be... around 512 million lines?
[09:53] StyXman that can take a while
[09:53] * Chipaca has a O(no) joke
[09:53] I think we'd like to know how it happened
[09:53] and if it comes back after rebooting
[09:53] Chipaca: hehehehe
[09:54] but before that, please ask stgraber for help
[09:54] my network is experiencing big issues, I may be offline
[09:54] StyXman: stgraber is -0400 so maybe in a bit
[09:56] is that NY?
[09:57] * StyXman has more clocks on his wall than humanly needed
[10:01] StyXman: NY is also -4, yes
[10:08] re
[10:08] I think my network is back
[10:08] StyXman what's in mountinfo outside of lxd?
[10:08] on your host, that is
[10:08] did you make any changes to mount tables recently?
[10:09] on the host not really, it's been ages since it was installed
[10:09] ok
[10:09] the containers come and go as needed
[10:10] it's (ab)used for testing purposes
[10:11] here are the lxd comments: https://github.com/lxc/lxd/issues/6772#issuecomment-697265476
[10:12] thank you
[10:19] Issue core20#72 opened: move docker user/group to extrausers
[10:19] PR core20#45 closed: Add arches
[10:19] PR core20#86 closed: .travis.yml: use stable snapcraft now
[10:24] PR core20#87 opened: Revert "hooks: mv docker user/group definition to extrausers"
[10:39] yay new pc-kernel in edge
[10:39] and also good morning folks
[10:40] hi ijohnson !
[10:41] o/ pstolowski
[10:41] ijohnson: good morning!
[10:42] hmm
[10:42] hey zyga
[10:42] is google down?
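The doubling StyXman observed (1 line before the first loop device, 2 before the second, 4 before the third, ...) is a geometric series, so his half-billion estimate is easy to sanity-check; a quick sketch, assuming the pattern holds for all 28 loop files:

```python
def lines_from_doubling(loop_devices):
    """Total lines if each loop device doubles the preceding count:
    1 + 2 + 4 + ... + 2**(n-1) == 2**n - 1.
    """
    return 2 ** loop_devices - 1


# 28 loop files gives 2**28 - 1 = 268435455 lines -- roughly a quarter
# of a billion, the same order of magnitude as the guess in the channel
```
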
[10:42] * ijohnson is excitedly testing the new edge kernel
[10:43] 2020-09-23 12:43:03 Cannot allocate google:opensuse-15.1-64: cannot perform Google request: Get https://www.googleapis.com/compute/v1/projects/computeengine/zones: Post https://oauth2.googleapis.com/token: dial tcp 172.217.16.10:443: i/o timeout
[10:43] over and over
[10:43] maybe my network segment somewhere is flaky
[10:50] zyga: seems to be working fine for me, I just launched a spread run on gce
[10:50] thanks
[10:57] 2020-09-23T12:56:36+02:00 ERROR Get https://api.snapcraft.io/api/v1/snaps/assertions/snap-revision/EnYyzGIokaLNadpwYWUw0qFczPCprbUPfS3KPBv8VsDMkFLDmosiytxhmkP9hwSP?max-format=0: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
[10:57] hmmm
[10:57] network is troubled
[11:00] mborzecki: I've just installed gimp
[11:00] no fonts
[11:00] and it crashes
[11:01] zyga: on fedora/opensuse?
[11:01] on focal
[11:01] ooh
[11:01] zyga: try with --experimental-gdbserver maybe?
[11:02] zyga: but i investigated it a bit on arch and the backtrace was pointing to font issues again
[11:14] PR snapd#9402 opened: tests/lib/prepare.sh: stop patching the uc20 initrd since it has been updated now <⚠ Critical>
[11:14] weird
[11:14] irc works
[11:14] but stuff other than that falls apart
[11:16] ssh? http? ftp?
[11:17] http
[11:18] http or https?
[11:18] what ISP?
[11:18] t-mobile
[11:22] * zyga goes to debug
[11:22] sigh
[11:31] (gimp:1053668): GLib-ERROR **: 13:30:51.456: ../../../../glib/gmem.c:333: overflow allocating 18446744073709551615*18446744073709551615 bytes
[11:33] mborzecki: reinstalled, no change
[11:35] zyga: yeah, iirc there was an earlier assertion that failed, try running with G_DEBUG=fatal-critical
[11:35] quick errand, back in 30 or so
[11:35] zyga: here's a nickel, kid, get yourself a computer that can allocate 18446744073709551615*18446744073709551615 bytes
[11:35] Fontconfig warning: FcPattern object width does not accept value [85 100)
[11:35] /snap/gimp/292/usr/bin/gimp: Gimp-Text-CRITICAL: gimp_font_factory_load_names: assertion 'fontset' failed
[11:35] * Chipaca is out of nickels
[11:36] heh
[11:36] gimp_device_info_set_device: trying to set GdkDevice 'Ultimate Gadget Laboratories Ultimate Hacking Keyboard' on GimpDeviceInfo which already has a device
[11:50] degville: ijohnson: hi, a naming question here where maybe you can give some input: https://github.com/snapcore/snapd/pull/9401#discussion_r493493467
[11:50] PR #9401: gadget: allow content observer to have opinions about a change
[11:50] pedronis: looking now...
[11:52] pedronis I responded too
[12:01] re
[12:10] pstolowski: hi, added some comments in #9391
[12:10] PR #9391: [RFC] o/assertstate: introduce ValidationTrackingKey/ValidationSetTracking and basic methods
[12:19] ijohnson: I reviewed #9379
[12:19] PR #9379: [RFC] cmd/s-b/initramfs-mounts: use ConfigureTargetSystem for install, recover modes
[12:19] * zyga-mbp goes for a walk
[12:33] pedronis: ty
[12:35] pedronis: wdyt about just using TrustedAssetsBootloader for the merged TrustedAssets* & ManagedAssets* interfaces?
[12:35] or TrustedBootloader maybe?
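The impossible size in the GLib overflow message above is not random: 18446744073709551615 is 2**64 - 1, i.e. a -1 stored in an unsigned 64-bit size. A small sketch of decoding such values (the as_signed64 helper is illustrative):

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit integer as signed two's complement."""
    return value - 2 ** 64 if value >= 2 ** 63 else value


# the GLib allocation size decodes to -1, the classic signature of an
# error sentinel (or underflowed computation) being used as a length
assert as_signed64(18446744073709551615) == -1
```
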
[12:35] mborzecki: TrustedAssetsBootloader was what I thought
[12:37] pedronis: ok
[13:40] thanks pedronis I'll look at your review
[13:41] btw can I get a review on https://github.com/snapcore/snapd/pull/9369 since it's green again due to the new kernel snap in edge/beta ?
[13:41] PR #9369: tests/nested/manual/refresh-revert-fundamentals: re-enable test
[13:41] (from anyone though)
[13:41] that pr is green now, just needs reviews
[14:07] man, the new github actions view is nice in that it jumps to the error again like it did initially a few months ago, but now the log on that page is less than 20% of the screen :-/
[14:12] tests/main/snap-run seems to be hanging in strace reliably on ubuntu 20.04, 18.04 and 16.04 since recently; I wonder if something broke in strace recently that makes this hang happen
[14:24] I'm running opensuse tests and it looks like the apparmor profile for snap-confine is just not getting loaded
[14:25] PR snapd#9369 closed: tests/nested/manual/refresh-revert-fundamentals: re-enable test <⚠ Critical>
[14:29] zyga: I looked at https://github.com/snapcore/snapd/pull/9384, one question there
[14:29] PR #9384: overlord: export and use snapd tools
[14:30] maybe I'm confused or something can be simplified further?
[14:48] #9395 needs another review and is fairly simple
[14:48] PR #9395: o/snapstate/check_snap_test.go: mock osutil.Find{U,G}id to avoid bug in test
[15:04] pedronis: I addressed your feedback on #9379 and marked it ready for review
[15:04] PR #9379: [RFC] cmd/s-b/initramfs-mounts: use ConfigureTargetSystem for install, recover modes
[15:04] pedronis: let me know if you want me to split the pr up at all
[15:17] ijohnson: why split it? It's not large
[15:17] pedronis: I dunno, cause sometimes it is easier to split things
[15:17] but sure, not splitting is easier for me anyways :-)
[15:18] anything up to 500 is usually reasonable, even a bit more if a good chunk is tests
[15:20] makes sense
[15:20] PR snapd#9403 opened: many: move ManagedAssetsBootloader into TrustedAssetsBootloader, drop former
[15:23] ijohnson: question there
[15:24] pedronis: is https://github.com/snapcore/snapd/pull/9005 something we want for 1.0?
[15:24] PR #9005: boot: support setting extra command line argument, bootloader interface tweaks <⛔ Blocked>
[15:24] where?
[15:26] ijohnson: in the PR
[15:26] pedronis: oh it was in the resolved conversation sorry, I see it now
[15:26] mborzecki: possibly, we need to discuss
[15:26] pedronis: ok, let's sync in the morning then
[15:27] pedronis: https://github.com/CanonicalLtd/subiquity/blob/fe012f20dc758f1fb1641d0b5744baf00739524d/bin/console-conf-wrapper#L25
[15:27] afaik writing the complete file shouldn't interfere with that at all
[15:28] ah, thx
[15:29] yes, writing the complete file is not a problem. we still support the recover trigger in that case anyway because of the or condition in the service
[15:34] ijohnson: +1
[15:34] nice, thanks
[16:00] PR snapd#9402 closed: tests/lib/prepare.sh: stop patching the uc20 initrd since it has been updated now <⚠ Critical>
[16:03] pedronis ack, looking
[16:04] zyga-mbp: I got this failure google:arch-linux-64:tests/main/cgroup-tracking:root here: https://github.com/snapcore/snapd/pull/8569/checks?check_run_id=1155234596
[16:04] PR #8569: o/assertstate,asserts: use bulk refresh to refresh snap-declarations
[16:15] PR snapd#8569 closed: o/assertstate,asserts: use bulk refresh to refresh snap-declarations
[16:15] PR snapd#9404 opened: asserts: deserialize grouping only once in AddBatch if needed
[16:25] PR snapd#9334 closed: tests/nested/manual: add test for grades above signed booting with testkeys
[16:25] PR snapd#9397 closed: tests/nested: misc robustness fixes
[16:28] cachio: did you ever look at the api for gcloud launching shielded vm's? I see that in their rest API they actually have a `shieldedInstanceInitialState` setting which can take pk, keks, dbs, and dbxs settings, which I assume is for the secure boot configuration
[16:29] pedronis: looking at the failure
[16:29] cachio: of course to be able to set that, I assume it would need spread changes but that could probably be handled
[16:30] ijohnson, hey
[16:31] I already verified that a while ago
[16:31] they confirmed that it is not possible to set the keys for secure boot
[16:31] cachio: ah ok, so this api is not really usable for that then
[16:31] It was when we were at Frankfurt
[16:31] cachio: I was wondering if maybe they added it since you looked at it
[16:32] I can check again
[16:32] right, a million years ago :-D
[16:32] cachio: no need, I was just waiting for test results and looked at the api
[16:32] ijohnson, it was the first thing we tried to do
[16:32] I see
[16:33] ijohnson, I fixed tumbleweed again
[16:33] for some reason a new kernel appeared
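The `shieldedInstanceInitialState` field ijohnson mentions takes the UEFI key databases as base64-encoded buffers. A sketch of what the corresponding fragment of an instances.insert request body might look like; the field names follow the REST API discussed above, but the exact schema and accepted fileType values should be checked against the compute API docs, and the helper itself is hypothetical:

```python
import base64


def shielded_initial_state(pk, keks=(), dbs=(), dbxs=()):
    """Build a shieldedInstanceInitialState request fragment from
    DER-encoded certificates (bytes). Illustrative only.
    """
    def buf(der):
        # each key is a FileContentBuffer: base64 content plus a type
        return {"content": base64.b64encode(der).decode("ascii"),
                "fileType": "X509"}

    return {
        "pk": buf(pk),                      # platform key
        "keks": [buf(d) for d in keks],     # key exchange keys
        "dbs": [buf(d) for d in dbs],       # signature database
        "dbxs": [buf(d) for d in dbxs],     # forbidden signatures
    }
```

The fragment would be embedded in the instance resource passed to instances.insert; as noted in the channel, spread would also need changes to pass it through.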
[16:33] nice, I'll keep an eye out on runs this afternoon for tumbleweed
[16:33] I am testing right now
[16:35] pedronis: no immediate idea, we did run a test from another suite quite recently before this test, still no idea why this failed exactly
[16:59] * cachio afk -> going to a doctor appointment and getting a study done
[17:01] ijohnson, I am leaving now, if you need any review just leave me a message, I'll do it when I am back
[17:02] cachio: I'm good for now, thanks!
[17:02] ijohnson, nice
[18:40] cachio: when you're back, really simple pr: https://github.com/snapcore/snapd/pull/9405
[18:40] PR #9405: tests/nested/manual/grade-signed-above-testkeys-boot: enable kvm
[18:41] PR snapd#9405 opened: tests/nested/manual/grade-signed-above-testkeys-boot: enable kvm
[19:29] re
[19:29] PT is late recently
[19:29] tired and sleepy
[19:29] and the bloody tests got network timeouts
[19:37] some more network luck now
[19:37] 2020-09-23 21:37:38 Error discarding google:arch-linux-64 (sep231929-708690): cannot deallocate Google server google:arch-linux-64 (sep231929-708690): cannot perform Google request: Delete https://www.googleapis.com/compute/v1/projects/computeengine/zones/us-east1-b/instances/sep231929-708690: dial tcp 216.58.208.202:443: i/o timeout
[19:37] all the time
[19:38] oh well
[19:38] tomorrow I will start with my network
[19:45] on fallback link
[19:46] I need to debug the primary link tomorrow
[19:46] o/
[19:46] * zyga-kaveri EODs
[19:46] ijohnson: if you have a minute I have a super-simple PR at 9407
[19:46] PR snapd#9406 opened: many: allow ignoring running apps for specific request
[19:46] PR snapd#9407 opened: overlord: explicitly set refresh-app-awareness in tests
[19:53] zyga-kaveri: sur
[19:53] Sure
[21:47] PR snapcraft#3294 closed: project loader: install dirmngr prior to configuring package repositories