=== benfrancis9 is now known as benfrancis [01:48] PR snapd#9083 opened: tests: new parameters nested uc20 [05:10] morning [06:53] mborzecki: hi [06:53] pedronis: hey [06:54] mborzecki: should I force merge 9082 ? [06:55] pedronis: yes, i looked at the failures and they are unrelated [06:55] ok [06:58] mborzecki: anything else that requires my immediate attention? [06:59] PR snapd#9082 closed: interfaces/system-key: in WriteSystemKey during tests, don't call ParserFeatures [06:59] pedronis: no i think that's it for now, there's some more review i'll be doing and probably need a 2nd reviewer too [07:04] mborzecki: I commented in #9080 and I think #9079 needs a review [07:04] PR #9080: osutil/disks: use xerrors to indicate a fs label wasn't found [07:04] PR #9079: gadget/install: retrieve command lines from bootloader [07:04] mborzecki: did you see my comment/question in #9078? [07:04] PR #9078: [RFC] boot: fancy marshaller for modeenv values [07:05] pedronis: yes, i've finished the change and ii'll be pushing that in a couple of minutes [07:05] ok [07:35] pedronis: goconfigparser is something we keep bundled like other deps right? [07:36] mborzecki: in which sense? [07:37] pedronis: ah ok, it's vendored like everything else, for minute there i thought we were moving it around in the tree [07:38] pedronis: anywyas, the trouble it we need to land a fix in goconfigparser for json to work [07:38] :/ [07:38] we package it at least for debian afaict: golang-github-mvo5-goconfigparser-dev [07:38] it's confused by [] in lists [07:39] ah, because the section stuff [07:39] doesn't have ^$ ? [07:39] I was wondering about that, also not sure why it's like that [07:39] oerheks: yes, regexp.MustCompile(`\[(?P
[^]]+)\]`), but should be regexp.MustCompile(`^\[(?P
[^]]+)\]`) [07:40] pedronis: ^^ [07:43] mborzecki: fedora has: Provides: bundled(golang(github.com/mvo5/goconfigparser)) [07:43] not sure what that means [07:54] pedronis: i've updated #9078 and opened https://github.com/mvo5/goconfigparser/pull/6 [07:54] PR #9078: [RFC] boot: fancy marshaller for modeenv values <â›” Blocked> [07:54] PR mvo5/goconfigparser#6: configparser: fix section header regex to not interfere with JSON data [11:53] Eighth_Doctor: hey, any clue what changed in rawhide recently that could break building go binaries statically? the usual set of flags does not build a static bin anymore, heck even -ldflags=-extldflags=-static does not [11:54] ugh [11:54] Go 1.15 landed [11:54] you probably should file a bug report about this [11:56] Eighth_Doctor: hm -buildmode pie used to trigger external linker, but it does not anymore, i need to explicitly pass -linkmode external to get what i want [11:58] Go compiler upstream doesn't really have a concept of "stability" here [11:58] heh === pedronis_ is now known as pedronis [12:23] morning folks [12:30] ijohnson, hello [12:30] hi cachio [12:30] good morning [12:32] hey cmatsuoka [12:33] ijohnson: hi, good morning === benfrancis4 is now known as benfrancis [14:12] cachio: something like https://pastebin.ubuntu.com/p/MB9Qh36xkd/ should work for you to run nested jobs via github actions nightly [14:12] I didn't filter for all the relevant jobs, so that just runs all nested suites on 20.04, but that should be easy to fix with spead, etc. [14:12] s/spead/spread/ [14:15] yes [14:15] now I am comparing the times with and without kvm [14:16] ijohnson, so then I can adjust the workers needed to run when we use the "Run nested" label [14:16] right [14:16] thanks for the code, I'll add that to the PR that I am creting [14:16] creating [14:18] ijohnson, what I am testing now is to add more cores when kvm is not used [14:18] perhaps that help as well [14:18] ah true, I remember that bug, yes having more cores should help us I think [14:22] mborzecki: pedronis: I sent an invite for next week to brainstorm about bootstate refactorings/improvements before we add resealing to the mix there [14:22] ijohnson: ack [14:50] * cachio lunch [15:02] ijohnson: with L1 focal it boots correctly, and then strange things happen [15:02] interesting, strange things as in similar strange things we see on GCE ? [15:03] it didn't reboot, but it locked. I wonder if GCE has a watchdog for such cases [15:03] I'll keep it locked to see if it still reboots after some time [15:04] in the meantime I'll look for my cell phone, I think lost it [15:04] interesting one thing we don't know is when the machine reboots vs when it hangs from output to the serial file [15:22] cachio: it would be good to also add the debug output of the serial log for a nested VM when we fail to prepare, currently the debug-each I have that shows the boot log only runs if the execute section is where we fail [15:22] cachio: i.e. from this log, I can't see the serial boot log of the nested VM [15:22] https://github.com/snapcore/snapd/pull/9083/checks?check_run_id=930252395 [15:22] PR #9083: tests: new parameters nested uc20 [15:46] ijohnson: it's really freezing at random points during the boot process, and I think it could be possible that GCE is rebooting the machine if it reaches that state [15:46] cmatsuoka: ahhh very interesting [15:48] maybe we could display an RTC timestamp at the very beginning of userspace to measure the time between reboots? [15:56] ijohnson: sorry, but I xerrors.Errorf is more meant to stack errors that comes from other packages, than to label own errors but making it nested on the spot, that is a bit odd [15:56] s/but making/by making/ [15:56] pedronis: sorry I was just trying to do the easy thing with less code but I can make an Error type [15:56] ijohnson: yes, but in this case the easy thing is also a bit odd :) [15:57] well the error isn't currently surfaced to the user anywhere and we have much more confusing errors coming from the initramfs so I just shrugged when writing the code [15:57] but it appears that shrugging was not justified [15:58] ijohnson: no, in general we should try to improve our errors, I had cmatsuoka remove some nesting on some secboot errors for example [15:59] yep, at a certain point it becomes too verbose and not very informative [16:00] anyway when writing a ErrorMatches test it's always a chance to consider if the error is ok, can be improved [16:06] Sure [16:08] ijohnson: watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [systemd-modules:210] [16:09] https://paste.ubuntu.com/p/9hsS2jHhc8/ [16:10] ijohnson: I re-reviewed #8917 [16:10] PR #8917: osutil/disks: add mock disk and tests for happy path of mock disks <â›” Blocked> [16:10] ssh ijohnson I'll add that [16:13] do we print the go version we use to build the snapds in the spread tests? [16:22] cmatsuoka: ah good find, I was about to say what we could try is some kind of fifo that qemu logs the serial log to, where at the other end of the fifo we prepend the real host (well L1) time stamps for each line that was output to the log [16:22] cachio: thanks [16:24] ijohnson: adding timestamps is a good idea [16:24] pedronis: thanks, I responded to one point about checking the uniqueness of the dev strings, unfortunately we can't enforce that from MockMountPointDisksToPartionMapping [16:25] ijohnson: I think ts(1) does exactly that [16:25] * ijohnson is always learning about new tools that do exactly what I think of without complicated shell scripts :-) [16:26] ubuntu@nested-test:~$ uname | ts [16:26] Jul 31 13:25:59 Linux [16:32] ijohnson: I see, I made a comment, I'm not sure we want to go there but well disks are actually complicated [16:33] pedronis: sorry I didn't understand, where are you not sure we should go there ? [16:33] ijohnson: my suggestion in the new comment [16:33] let me look [16:34] sorry I still don't understand, how could we have a single MountPoint map to multiple MockDiskMappings ? [16:34] which one would be used ? [16:34] pedronis: ^ [16:37] ijohnson: sorry, let me stare at this more carefully, there is something odd here for sure [16:37] cmatsuoka: mmm for the purposes of debugging spread runs I wonder why we even need a file for the serial log, we could just have qemu have serial go to stdout and then pipe that through ts, and that just goes to journald because we run qemu through systemd-run ? [16:38] pedronis: ok I will wait to push any more patches to the mockdisk PR until we are aligned [16:40] ijohnson: I see [16:44] ijohnson: made this make more sense: https://github.com/snapcore/snapd/pull/8917/files#r463717116 [16:44] PR #8917: osutil/disks: add mock disk and tests for happy path of mock disks <â›” Blocked> [16:45] heh, maybe this makes more sense [16:46] pedronis: so then you mean to check if the pointers to MockDiskMapping are different, and if the pointers are different that they have unique DevNum strings ? [16:47] yes, though I really probably mean to use map and to take pointers, taking values of MockDiskMapping is odd [16:47] it's not a very valuey struct [16:47] to me [16:48] cachio: cmatsuoka: I pushed this: https://github.com/snapcore/snapd/pull/9065/commits/05fe6d8f383396e6a53d2cb70739e4c632b483f4 if it is useful I will submit a separate PR for that [16:48] PR #9065: debug: forward systemd journald output to serial console when booting for uc20 [16:48] pedronis: mmm ok [16:50] ijohnson: I mean, go for example will not let you compate two of them [16:50] *compare [16:50] that makes it non valuey in my book [16:51] pedronis: the only reason they are not comparable is because they have a map in them [16:51] ijohnson: yes [16:51] which perhaps is your whole point [16:51] partly yes [16:53] mmm alright I can do that then I don't think anything should be broken doing it this way [16:54] ijohnson: yes, I don't think it should mess too much with your downstream PRs [16:54] maybe a few more & somewhere [17:25] pedronis: ok updated 9080 and 8917 === ijohnson is now known as ijohnson|lunch [18:29] ijohnson|lunch: re-reviewed, closer but not quite === ijohnson|lunch is now known as ijohnson [18:29] yes I saw [18:31] * pedronis is also a bit melting here [18:33] ijohnson: sorry this is becoming a bit an adventure in the ontology of disks [18:33] haha it's ok [18:33] let's just remember to squash merge everything otherwise we will inevitably have to relive this ontology [18:53] ijohnson: +1 on 9080 [18:54] \o/ [18:54] I changed the summary as well [18:54] ah yes I suppose that is now out of date [18:54] s/is/was/ [18:59] ijohnson: reviewed 8987 as well, one small thing (which I should have mentioned before :/ ) [19:00] pedronis: you mean 8917 ? [19:00] ah yes I see [19:06] ijohnson, cachio: I think I tweaked the configuration enough to have the nested machine working with ovmf on my local amd, but I'll do some more tests and try to minimize the amount of changed variables [19:07] nice! [19:09] cmatsuoka, nice [19:09] I am still testing with no kvm [19:09] is not so bad [19:09] cachio: there are two independent things that cause problems here: smp and rng passthrough [19:10] if I disable rng passthrough and set smp to 1, it tends to be much more stable [19:18] cmatsuoka, how did you disabled rng passthrough? [19:19] cachio: I removed "-object rng-random,filename=/dev/hwrng,id=rng0 -device virtio-rng-pci,rng=rng0" from the parameter list, but it depends on you having these parameters already or not [19:20] in my last pr I added that [19:20] but didnt see much change with or without it [19:20] I had to remove it here, otherwise I get a consistent crash [19:20] if I disable kvm the test takes about 8 minutes more to run [19:22] so it seems that from my original set of parameters, if I set -smp 1 and remove rng passthrough it works, but my L0 host is a Ryzen7 [19:22] I'll try with a generic host to see if it still works [19:23] cachio: this is what worked for me: https://paste.ubuntu.com/p/yZhhrFBwBR/ [19:24] it is pretty similar to what we use on nested on google [19:25] yes, I changed more parameters initially then I placed them back to find which ones were actually causing the problem [19:25] bu you run kvm [19:25] but it could depend on the version of their L0 host and cpu as well [19:25] I am runnign qemu with kvm enalbed [19:25] I think it is the same [19:26] and this is L0 focal + L1 focal [19:26] it's more likely that they have L0 bionic (+patches maybe) + L1 focal [19:26] yes [19:27] who knows [19:27] 8 minutes more is not that bad if it's reliable [19:28] cmatsuoka, seems to be acceptable [19:28] I am running 2 tests together to see the time [19:43] yah 8 minutes is acceptable to me [20:26] ijohnson, cmatsuoka updated #9083 [20:26] PR #9083: tests: new parameters nested uc20 [20:26] with the no kvm scenario [20:30] nice thanks cachio [20:31] ijohnson, yaw, let see the action results