mupPR snapcraft#3286 opened: [legacy backport] v1 plugins: lock godep's dependencies <maintenance> <Created by cjp256> <https://github.com/snapcore/snapcraft/pull/3286>00:25
mupPR snapcraft#3286 closed: [legacy backport] v1 plugins: lock godep's dependencies <maintenance> <Created by cjp256> <Merged by sergiusens> <https://github.com/snapcore/snapcraft/pull/3286>01:26
mupPR snapd#9348 opened: tests: print all the serial logs for the nested test <Run nested> <Simple 😃> <Created by sergiocazzolato> <https://github.com/snapcore/snapd/pull/9348>04:08
mborzeckimvo: hey, saw your comment under #9347, i can start looking into this06:11
mupPR #9347: tests/lib/nested.sh: use more focused cloud-init config for uc20 <Run nested> <Simple 😃> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9347>06:11
mvomborzecki: thanks06:11
mvomborzecki: might be as simple as adding something to nested_start_core_vm_unit that already waits for ssh06:11
mborzeckianyway, looks like we're becoming experts in cloud init too06:12
mvomborzecki: yeah :(06:14
mvomborzecki: fun, looks like your better logging PR passed except for the smoke test where the change to add logging broke the test because now snap-confine prints stuff to stderr06:14
mborzeckimvo: oh, and idk if you seen the comment about the nested suite sending 150MB of project data to the vm06:16
mborzeckimaybe that's the kernel/core/snapd snaps repacked06:17
mvomborzecki: yeah, I think06:18
mvomborzecki: looks like we need to either put them elsewhere or clean them after the repacking06:19
mvomborzecki: but yeah, looks like there is some junk around06:19
mvomborzecki: given that all the other tests have passed and that is super rare - what functional change could have triggered this? or do you think it's pure coincidence?06:21
mvomborzecki: i.e. could the cloud-init chnage be responsible?06:21
mborzeckifeels like a stroke of luck06:21
mvomborzecki: ok06:22
mborzeckimvo: as for SNAP_DEBUG=1 in environment, we could drop an override for snapd.service, i had that at some point, but then dropped it and added to /etc/environemnt (because no logs were appearing which i didn't know at the time to be caused by the MaxLevel* in journald)06:34
mvomborzecki: yeah, let me quickly push an idea06:36
mborzeckiok, runing the nested suite now06:36
mvomborzecki: 9349 use your idea but puts it only for the snapd unit06:37
mborzeckimvo: this is what i have for the snap command; https://paste.ubuntu.com/p/s4zr5NsKX4/06:37
mvomborzecki: that looks reasonable, a bit less central than I had hoped, we can't put it into nested_start_for_core_vm ? this waits for ssh already?06:38
mupPR snapd#9349 opened: tests: change snapd.spread-tests-run-mode-tweaks.sh to add logging <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9349>06:39
mborzeckimvo: hm, need to double check it wouldn't conflict the nested/manual tests06:39
mvomborzecki: yeah, probably fine06:43
mvomborzecki: mostly wondering06:43
mborzeckimvo: just checked and it should be fine, i've updated the patch06:44
mvomborzecki: it's also true that muddeling the concept of start and wait-for-ready is a bit strange so keeping seprarate may well be better06:44
mvomborzecki: cool06:44
mvomborzecki: again, mostly thinking that this is okay since we wait for ssh already06:44
mborzeckimaybe i should just push all of it into #934306:45
mupPR #9343: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>06:45
mborzeckii mean ian's path + the tweak for waiting06:45
mvomborzecki: yes06:48
mvomborzecki: I think that's good06:48
pedronismvo: mborzecki: should we sync?06:52
mborzeckipedronis: at 9? i'll grab some tea06:53
pedronismvo: I'm quite confused by #9349 given Ian's merge from yesterday06:54
mupPR #9349: tests: change snapd.spread-tests-run-mode-tweaks.sh to add logging <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9349>06:54
mborzeckiNOMATCH is something that was added to spread recently?06:58
mvopedronis: maybe I'm confused, let me double check what happend :(06:58
mvopedronis: oh, I see :(06:59
pedronismvo: afaik on master we don't use the function you are changing anymore06:59
pedronismvo: I asked you to comment on that yesterday evening when Ian asked to merge and you said yes :)07:00
mvopedronis: it is fine07:00
pedronisI'm going to the SU07:00
mvopedronis: it's just that it's all a bit of a maze and apparently I did not had enough tea yet07:00
mvopedronis: joining07:00
mupPR snapd#9349 closed: tests: change snapd.spread-tests-run-mode-tweaks.sh to add logging <Run nested> <UC20> <Created by mvo5> <Closed by mvo5> <https://github.com/snapcore/snapd/pull/9349>07:04
mvogood morning pstolowski and zyga-kaveri07:07
zyga-kaverihow is core20?07:07
zyga-kaveriI'm baby-sitting still07:07
zyga-kaveripstolowski: hi07:12
zyga-kaveripstolowski: could you please look at https://pastebin.ubuntu.com/p/QMJ3x8HK4h/07:12
zyga-kaveriis anything there out of place for preseeding?07:13
zyga-kaveriit's not the full list07:13
zyga-kaveriI think I know what is going on07:13
* zyga-kaveri fixes07:16
pstolowskizyga-kaveri: seems 3 tasks are missing (your new tasks I presume)07:22
zyga-kaveripstolowski: yeah exactly :D07:23
zyga-kaveriman that run was pretty green overall07:25
zyga-kaverione test adjustment, some random failures07:25
zyga-kaveribut otherwise it's pretty good07:25
zyga-kaveriI was only running core and main tests for ubuntu 1607:25
mborzeckimvo: pedronis: i've updated #934708:15
mupPR #9347: tests/lib/nested.sh: use more focused cloud-init config for uc20 <Run nested> <Simple 😃> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9347>08:15
mvomborzecki: \o/08:25
zyga-kaveriyour video froze mvo08:32
pedronismborzecki: mvo: so it takes around 50-60 to a resealing in the non-accel vms, almost 80 with two run kernels I suppose08:49
pedronismborzecki: we are hasing 2 or 3 kernels each time plus other ops plus signing08:49
pedronismvo: mborzecki: actually not, we are not hashing kernels yet, so it will become even slower08:51
zyga-kaveriraspberry pi 1 speed!08:51
pedronismborzecki: mvo: anyway, we should be able to cut down on the number of reseals, we get so many because current unasserted kernel logic08:52
pedronisspeeding them up is a different matter, it would need help from secboot, also there are trade offs08:53
mupPR snapd#9350 opened: [RFC] store: handle v2 error when fetching assertions <Created by stolowski> <https://github.com/snapcore/snapd/pull/9350>08:59
pstolowskipedronis: hi, let me know if this is what you had in mind ^ ; i'll add tests if the idea is ok09:00
* pstolowski back to snapshots09:03
mupPR snapd#9351 opened: tests: simplify repack_snapd_snap_with_deb_content_and_run_mode_first_boot_tweaks <Cleanup :broom:> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9351>09:19
zyga-kaveri2020-09-15 11:22:19 Executing google:ubuntu-20.04-64:tests/main/preseed (sep150911-454471) (1/1)...09:32
zyga-kaveri2020-09-15 11:22:55 Successful tasks: 109:32
zyga-kaveripstolowski: we should find a quiet time09:41
zyga-kaveripstolowski: and demo vscode to the team09:41
zyga-kaveripstolowski: it's such a gamechanger09:41
zyga-kaveriokay, onto snap-confine changes09:43
zyga-kaveriNOW IS THE TIME :D09:43
pstolowskizyga-kaveri: is it vim vs emacs vs vscode now? ;)09:54
zyga-kaveripstolowski: for some people it's always vim vs emacs09:55
zyga-kaveripstolowski: the rest just uses code09:55
gkoFor me, it's emacs AND code.10:03
zyga-kaveriI'm code and vim10:04
zyga-kaveribut vim much less often10:04
zyga-kaverimainly inertia10:04
zyga-kaveribut code is just genuinely better10:04
gkocode's remote is nice.10:06
gkoEspecially with low memory/disk space VM...10:06
zyga-kaveriI only wish it had riscv and mips binaries10:07
gkoToo bad for Perl, it needs at least 5.18 and my platform uses 5.8.9...10:08
mvomborzecki: 9347 is at 1:28h already, I wonder if that is good or bad10:17
mborzeckimvo: looks like it's slow, i was able to see the logs and minimal smoke hit a 40m timeout10:17
* mvo nods10:17
mborzeckimaybe we should have a longer kill timeout for the nested suite10:18
mvoyeah, might be needed10:18
mvolet's see if it completes, I really hope it does (successfully)10:18
mvomborzecki: and it just failed in 2020-09-15T10:22:06.2176601Z     - google-nested:ubuntu-20.04-64:tests/nested/manual/minimal-smoke :(10:22
mborzeckimvo: so that's it, hit a timeout on minimal/smoke, but otherwise the rest ran succesfuly10:22
mvomborzecki: aha, timeout? ok, that sounds very promising10:23
mborzeckimvo: yup, a spread kill timeout10:23
mvomborzecki: yeah, just saw it. sounds like we should raise it and re-run, can you do that?10:23
mvomborzecki: but that is *very* encouraging, well done mborzecki  and ijohnson  :)10:23
mvomborzecki: I really hope/think this could give us the reliable tests we need10:24
mborzeckimvo: hm i'd be leaning towards landing the PR and bumping the timeout in another one, looking at the logs, the test stated at 9:10, and at 9:49 it was executing tests/smoke/remove in the nested vm, so just running slow/late10:27
mvomborzecki: works for me10:28
mvomborzecki: done10:29
mvomborzecki: ok, let's include the longer timeout as one of the needed next steps (is more logging next?)10:29
mupPR snapd#9347 closed: tests/lib/nested.sh: use more focused cloud-init config for uc20 <Run nested> <Simple 😃> <UC20> <Created by anonymouse64> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9347>10:29
mborzeckimvo: pedronis: #9343 is updated with a bump in kill timeouts and a little tweak for cleaning the serial log before each test (otherwise it would accumulate)10:41
mupPR #9343: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>10:41
pedronismvo: what's #9348 and how it relates to #9343 ?  (cc mborzecki )10:43
mupPR #9348: tests: print all the serial logs for the nested test <Run nested> <Simple 😃> <Created by sergiocazzolato> <https://github.com/snapcore/snapd/pull/9348>10:43
mupPR #9343: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>10:43
mborzeckipedronis: looks like a followup/tweak we could do once 9343 lands?10:45
pedronismborzecki: the changes to the cloud init config don't make sense anymore in it? don't we land some other code now?10:46
pedronisheh, didn't we land10:46
mborzeckiah, damn missed those10:46
pedronisit needs a master merge?10:47
pedronisah, it just had one?10:47
pedronismborzecki: mvo: fwiw I'm working on resealing a bit less even with unasserted kernels10:48
mvomborzecki: does it make sense to land 9343 as it is or instead pull the logging changes out and land that separately and then we merge master into the kernel reseal tests?10:49
mvo(cc pedronis -^)10:49
mborzeckiwe can land the logging first maybe?10:50
mborzeckii can cherry pick the relevant commits and open a PR10:50
pedronismborzecki: would think it's better, reading 9343 is a bit hard10:56
mborzeckihmm landing #9311 would make it smaller still10:57
mupPR #9311: nested: add support to telnet to serial port in nested VM <Run nested> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9311>10:57
pedronismborzecki: that looks ok and has a lot of +1 (it's a bit repetitive but so nested.sh)11:01
mvomborzecki: I merged master into 9311, once that is green we should merge to master11:02
mvomborzecki: also 9311 is not really needed for gce, we can always merge later11:03
mvobut in a meeting so only have 10% of my brain11:03
mupPR snapd#9352 opened: test: improve logging in nested tests <Run nested> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9352>11:15
mborzeckipedronis: mvo: ^^ just the logging11:15
mvomborzecki: \o/11:23
* mvo is still in a meeting11:23
dariballI am putting together a custom-image as documented, works pretty good. I would now like to have some configurations changed like "sudo snap set somesnap foo=bar" for my custom-image, what would be the right way to do this? create a snap which is exclusively running those inside the configure hook? like a configuration-snap ? or is there some other prefered way?11:26
ogra_dariball, the typical way is to use a custom gadget and set them in the "defaults:" in snapycraft.yaml11:29
ogra_dariball, https://snapcraft.io/docs/gadget-snap11:31
dariballtherefore I would fork the default gadget snap, right, because I can only have one gadget snap?11:31
dariballand I would add arbitrary commands to the prepare-device hook?11:32
ogra_yu would use the gadget.yaml options for connections and for setting defaults11:32
ogra_look at the linked page above11:32
dariballand running arbitrary commands shall be avoided?11:33
dariballbecause let's say I have some cli tool from a snap I would like to run ?11:34
ogra_well, thats not esaily doable without forking the snap and make some daemonized script11:35
ogra_(snaps can not easily call commands from other snaps, else you'd be able to just create a snap full of scripts)11:36
dariballokok, thx this already helps alot, hope I get along with connections + defaults, looks good...11:37
ogra_if you have a brands stroe you *can* use the snapd REST API from a config or tooling snap though, that allows things beyond the limited stuff11:37
ogra_*brand store11:37
ogra_but use of the interface (snapd-control) that gives you access to the API is brand store bound, snaps in the global store do not get permission to use it11:38
pedronismvo: mborzecki: I haven't proposed but here are the changes to reseal less with unasserted kernels: https://github.com/pedronis/snappy/commit/39746b163590c1f6be0103362b8fcfeee0f3233811:40
zyga-kaveriok, modified snap-confine and some small bits, running in complain snap-confine apparmor profile to see what we get11:57
mupPR snapd#9185 closed: secboot: use the snapcore/secboot native recovery key type <UC20> <Created by cmatsuoka> <Merged by cmatsuoka> <https://github.com/snapcore/snapd/pull/9185>12:05
mborzeckifunny when spread hangs on discarding a node12:05
mborzeckipedronis: the patch looks reasonable, are you going to propose a branch?12:10
pedronismborzecki: probably not before we get other things in12:12
mupPR snapcraft#3285 closed: v1 plugins: lock godep's dependencies <maintenance> <Created by cjp256> <Merged by sergiusens> <https://github.com/snapcore/snapcraft/pull/3285>12:12
mupPR snapcraft#3287 opened: elf: reduce noise in the developer debug logs <Created by cjp256> <https://github.com/snapcore/snapcraft/pull/3287>12:12
mborzeckipedronis: ack12:12
mborzeckiheh, finally the nested suite is in progress in 935212:24
zyga-kaveriexported tools work12:29
zyga-kaveriI need to adjust apparmor profiles12:29
zyga-kaveribut it's functional12:29
mborzeckiheh, nested tests i ran: !!!! X64 Exception Type - 0E(#PF - Page-Fault)  CPU Apic ID - 00000000 !!!!12:43
mborzeckithat's when rebooting from install -> run12:44
cachiomborzecki, I saww that yesterday as erll12:44
mborzeckiand random, didn't happen in other tests12:45
cachioit is very sporadic12:45
cmatsuokamaybe -smp 1 could help mitigate this?12:46
cachiocmatsuoka, I saw that just running with kvm enabled12:47
cachiomborzecki, are you running with kvm enalbed right?12:47
cmatsuokacachio: what version of ovmf are we using in our tests?12:48
cachiocmatsuoka, need to check, I tried with the latest and the once from  proposed12:48
mborzeckicachio: no, this was on gcp12:51
mborzeckicmatsuoka: and the whichever version of qemu/ovmf we have in focal12:52
cachiomborzecki, how did you run the test on gcp?12:52
mborzeckicachio: spread -debug -v google-nested:ubuntu-20.04-64:tests/nested/...12:52
cachiocmatsuoka, this is the version we use of ovmf 0~20191122.bd85bf54-2ubuntu312:53
cachiomborzecki, well the tests in core20 suite will run without kvm but tests on manual suite will run with kvm enabled12:54
cachiomborzecki, don't know which test showed that panic12:54
mborzeckicachio: tests/nested/manual/core20-early-config so maybe it's related to kvm12:55
cachiomborzecki, ahh, I think so12:55
cmatsuokathe groovy ovmf is a bit newer, but many changes in upstream since then12:58
mupPR snapd#9311 closed: nested: add support to telnet to serial port in nested VM <Run nested> <Created by mvo5> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9311>13:00
mvomborzecki: 9352 just passesd13:24
mborzeckimvo: just landed it13:24
mvomborzecki: \o/ third in a row, fingers crossed13:24
mborzeckimvo: i'll cherry pick the spread test from your PR and add it to kernel reseal one13:24
mvomborzecki: sounds great13:24
mvomborzecki: feel free to close the test PR from me once you chrry-picked to the right place13:25
mupPR snapd#9352 closed: test: improve logging in nested tests <Run nested> <UC20> <Created by bboozzoo> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/9352>13:25
zyga-kaveriI'm going to break for lunch now13:42
mborzeckipedronis: mvo: i've updated https://github.com/snapcore/snapd/pull/933113:44
mupPR #9331: boot: reseal when changing kernel <Run nested> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9331>13:44
mborzeckimvo: i've dropped the patch where test waits for snap to become available, we have that covered in nested setup now13:45
mupPR snapd#9338 closed: tests: add nested core20 kernel reseal test <Run nested> <UC20> <Created by mvo5> <Closed by bboozzoo> <https://github.com/snapcore/snapd/pull/9338>13:45
mupPR snapd#9351 closed: tests: simplify repack_snapd_snap_with_deb_content_and_run_mode_first_boot_tweaks <Cleanup :broom:> <Created by mvo5> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9351>13:45
pedronismborzecki: reviewed, it needs a 2nd review13:49
mborzeckipedronis: thanks!13:49
mborzeckimvo: ijohnson: could you take a look at #9331?13:49
mupPR #9331: boot: reseal when changing kernel <Run nested> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9331>13:49
mvomborzecki: of course13:51
pedronismy own comment on it are addressed in #9340 (which is adding some tests, a doing some refacor/renames)13:52
mupPR #9340: boot: streamline bootstate20.go reseal and tests changes <Run nested> <UC20> <Created by pedronis> <https://github.com/snapcore/snapd/pull/9340>13:52
=== tomwardill_ is now known as tomwardill
=== bluesabre_ is now known as bluesabre
=== marosg_ is now known as marosg
=== beisner_ is now known as beisner
=== coreycb_ is now known as coreycb
=== philroche_ is now known as philroche
=== urluck_ is now known as urluck
=== ogra_ is now known as Guest47593
mupPR snapd#9321 closed: cmd/snap/model: specify grade in the model command output <Simple 😃> <UC20> <Created by anonymouse64> <Merged by anonymouse64> <https://github.com/snapcore/snapd/pull/9321>14:10
ijohnsonmborzecki: pedronis I had a question on the unit tests in 933114:17
ijohnsonperhaps I have misunderstood something14:17
pedronisthere is at least one test that my PR changes to do something quite different14:18
pedronismore in line with the name14:18
pedronisah, something else, yes I think some of those tests are cheating a bit14:19
pedronisbecause the mock bootloader has limitations14:20
ijohnsonok, is it alright that we aren't testing the same way things are being used in real life?14:20
pedroniswell, we just feed all that to a mocked secboot reseal14:20
pedronisthat just check that the processing looked alright14:21
ijohnsondo we have anywhere that unit tests end to end with the actual expected boot chain? or is that just unnecessary since we will have the spread tests14:21
pedronisso I think it's ok but might confuse us later, but it's not a blocker atm14:21
ijohnsonok, thanks14:21
pedronisijohnson: yes, we do have tests that are more realistic14:21
pedronisseal_test.go own tess are more realistic14:21
ijohnsonok sounds good14:21
pedronisthe comment about checking the env seems legit though14:22
pedronisit might be easier to pick it up in my PR tough14:22
ijohnsonthat's fine with me14:24
ijohnsonI'm finishing up a reivew of 9340 now btw14:24
=== xnox1 is now known as xnox
mborzecki16.04 failed with google:ubuntu-16.04-64:tests/main/lxd:snapd_cgroup_neither14:51
* mvo is off for a couple of minutes to taxi kids around, tg for emergencies14:53
cachiomborzecki, again for me https://paste.ubuntu.com/p/6F4jypF8Wd/14:55
mupPR snapd#9353 opened: [RFC] configcore: do not error in console-conf.disable for install mode <Created by mvo5> <https://github.com/snapcore/snapd/pull/9353>14:55
mborzeckihm looks like we still need to do something about the client timeout14:59
ijohnsonmborzecki: which client timeout?15:00
mborzeckidoTimeout in client.go15:00
mborzeckiwas doing a couple of kernel reseal runs on gcp and one failed with cannot communicate with the server, client timeout exceeded and so on15:00
mborzeckiit's set to 50s atm, maybe we should have an env variable to make it longer15:01
* cachio lunch15:12
ijohnsonyeah we should probably bump that timeout, there are reports of it not being long enough on the forum too on like rpi or somethin I think15:18
mupPR snapcraft#3238 closed: db: introduce generalized datastore <enhancement> <Created by cjp256> <Merged by sergiusens> <https://github.com/snapcore/snapcraft/pull/3238>15:22
jdstrandzyga-kaveri: hey, please have snap-confine PRs still go through us. feel free to ping me and I'll assign someone15:29
jdstrandnot that you weren't going to do that, just wanted to make sure people know :)15:30
jdstrandcc pedronis ^15:30
* jdstrand would cc mvo but he's not here atm :)15:30
mborzeckihm mvo isn't around anymore?15:43
ijohnsonyeah he said he had to go taxi his kids around15:43
mborzeckiijohnson: ah, thanks for the info15:43
ijohnsonmborzecki: any pr's I should review right now?15:44
ijohnsonI reviewed both 9331 and 9340 this morning15:44
mborzeckiijohnson: you can try #934115:48
mupPR #9341: tests: add nested core20 gadget reseal test <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9341>15:48
ijohnsonmborzecki: ack15:48
mborzeckipedronis: #9331 spread run finished, there's failure in nested tests, tests/nested/core20/basic which expected test-snapd-sh to be there but looking at the log it got 400 over the API, and minimal-smoke which looks like it was just killed (?) but went throught 3 out of 4 smoke tests by that time15:51
mupPR #9331: boot: reseal when changing kernel <Run nested> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9331>15:51
pedronismborzecki: minimal smoke failed also on 18 ?15:57
ijohnsonpedronis: mborzecki I have seen minimal smoke fail on 18 a few times and have an idea why it failed15:57
ijohnsonI reproduced it yesterday but ran out of time to fully debug15:57
mborzeckipedronis: yeah, but there it seems like an ssh thing15:58
ijohnsonessentially when we create the users and specify them as sudoers by writing the files, somehow those files get  lost/ reset to be empty when the vm is shut down15:58
ijohnsonso I was going to try by doing a sync inside the vm before shutting down, then a sync outside the vm before starting it up again15:58
mborzeckifunny, becuse the setup was successful, `cannot connect: cannot connect to external:ubuntu-core-18-64: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none password], no supported methods remain`15:58
mborzeckiijohnson: wooo, but we call shutdown in the vm, don't we?15:59
ijohnsonit's bugs all the way up and down the stack in nested vms15:59
mborzeckiapaprently ;)15:59
ijohnsonbut I would not be worried about minimal smoke on 18 if that was the only thing that failed15:59
mborzeckianyways, i don't think the failures in nested core20 were related to the kernel reseal test16:00
mborzeckiand we really should do the repacking somewhere outside of $SPREAD_PATH, `2020-09-15T15:30:24.5255269Z 2020-09-15 15:13:31 Project content is packed for delivery (350.90MB).`16:01
pedronismborzecki: ijohnson: so you are saying I should force merge #9331 and the tweak my follow-up?16:02
mupPR #9331: boot: reseal when changing kernel <Run nested> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9331>16:02
mborzeckipedronis: yeah, i'll add a note about failures  to the PR16:02
ijohnsonpedronis: let me have a look again at the test to make sure I'm not errantly identifying the fialure without looking at the logs16:03
mborzeckiha, ok i see why nested/basic failed, it's `error: cannot communicate with server: timeout exceeded while waiting for response`16:03
mborzeckiso the timeout tweak16:03
mborzeckiand the test does something weird, there's `nested_exec "sudo snap install test-snapd-sh" || true` with a note that ssh sometimes dies at this point>16:05
ijohnsonmmm very suspicious16:05
pedronisit should do --no-wait maybe?16:05
ijohnsonbut pedronis mborzecki I looked at the non-uc20 logs for 9331 and I wouldn't be worried about any of those other failures16:05
pedronismmh, it doesn't have two +1 atm16:06
ijohnsonsorry my comment wasn't a +1 because I was unsure about the test situation16:07
ijohnsonpedronis: I +1d it formally16:07
pedronismborzecki: I'll thos kernel_status checks to mine16:08
pedronis*I'll add those16:08
mborzeckipedronis: cool, thank you16:08
pedronismborzecki: this one is also adding the spread test, right?16:10
mborzeckipedronis: yes, the kernel-reseal test was successful16:11
pedronismborzecki: ijohnson: merged16:13
mborzeckipedronis: thanks!16:13
ijohnsonmvo will be happy16:13
pedronisnow I will to deal with conflicts in mine16:13
mvopedronis: yeah just saw it \o/16:14
mvoijohnson: I am!16:14
mvomborzecki: \o/ as well16:14
mvopedronis: I assume you merge master into 9340 now?16:14
mborzeckimvo: shall i merge master to #9341 and make it non-draft?16:15
mupPR #9341: tests: add nested core20 gadget reseal test <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9341>16:15
mvomborzecki: yeah16:15
mvomborzecki: it has this really ugly wart that it makes assumptions about the edition16:16
mupPR snapd#9331 closed: boot: reseal when changing kernel <Run nested> <UC20> <Created by bboozzoo> <Merged by pedronis> <https://github.com/snapcore/snapd/pull/9331>16:16
mupPR snapd#9343 closed: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <Closed by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>16:16
mvomborzecki: let me quickly look if I could improve that with some python-json16:16
ijohnsonmborzecki: mvo: in that pr, what does the XXX2 about? how were you trying to update ubuntu-boot?16:17
mvoijohnson: just bumping the editon there16:17
ijohnsondoes it still fail like that? the nested test is green on that pr16:17
ijohnsonmaybe that was an old failure?16:17
pedronisit will do nothing if the content is the same though16:17
ijohnsonpedronis: the content is changed16:18
ijohnsonwhitespace is added to the recovery grub16:18
pedronisah, ok16:18
mborzeckishall i propose this? https://paste.ubuntu.com/p/fKWpKjkGNz/16:18
mvoyeah, so I think it will still be a problem, I can do a separate PR that illustrates the problem, I think it might be a bug in our16:19
mvoin our gadget update code16:19
ijohnsonmborzecki: lgtm ship it16:19
mvomborzecki: seems reasonable, I think we got a similar request recently by a customer16:19
mborzeckii can set spread to run the nested suite when opening the PR and we'll see about that basic test16:20
* mvo nods16:20
mupPR snapd#9354 opened: client: bump the default request timeout to 120s <Run nested> <Simple 😃> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9354>16:21
mvoijohnson: but to be clear about the XXX2, I think it can wait for later this week :)16:21
ijohnsonI +1d it, lgtm with some small suggestions that can be a followup with no-spread tag16:22
mvoI will work on a cleanup with better python3 json instead of the fugly shell|pipes16:28
ijohnsonyou could use an imcprehensible set of jq expressions too :-)16:30
ijohnsonpedronis: instead of CloudInitRestrictOptions.DisableCloudInit as an opt name, what do you think about AlwaysDisableAfterLocalDatasourcesRun ?16:32
ijohnsonthat conveys that it onnly disables after it runs, and lets us make clear that it applies to both NoCloud and now None too16:33
ijohnson(and potentially others as we see fit later)16:33
pedronisseems ok, do we need the "Always" bit in it?16:33
ijohnsonmmm maybe not16:34
ijohnsonDisableAfterLocalDatasourcesRun seems to make sense enough16:34
mborzeckiijohnson: added a comment explaining what's happening https://github.com/snapcore/snapd/pull/9341/files#r48880661216:39
mupPR #9341: tests: add nested core20 gadget reseal test <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9341>16:39
ijohnsonmborzecki: thanks looking now16:39
ijohnsonmborzecki: ah thanks that makes sense I forgot that the edition for the different structures can be different16:40
mborzeckiijohnson: (cc mvo) that error is pure accident by trying simple s/edition: 1/edition: 2/16:40
mvomborzecki: haha - so the sed is too stupid?16:40
pedronisijohnson: mborzecki: I merged master and updated #934016:40
mupPR #9340: boot: streamline bootstate20.go reseal and tests changes <Run nested> <UC20> <Created by pedronis> <https://github.com/snapcore/snapd/pull/9340>16:40
ijohnsoncan I mark 9341 ready for review ?16:40
ijohnsonpedronis: thanks will try to look before my meeting in 5 minutes16:41
mvoijohnson: +116:41
mvoijohnson: done that now16:41
mborzeckimvo: the regular gadget update spread test mangles gadget.yaml with some python code, but we can improve the nested test later16:41
mvomborzecki: yeah, that sounds reaonable. as long as its a silly issue on the test side by me I'm happy16:42
mvomborzecki: I put this on my todo16:42
mborzeckimvo: take a look at tests/core/gadget-update-pc/generate.py16:42
ijohnsonpedronis: ah it occurs to me now that all of the tests using MockTrustedAssetsBootloader are not using ExtractedRunKernelImageBootloader logic, they are all using EnvRefExtractedKernelBootloader's instead16:44
ijohnsonnot sure if that's an issue or not16:44
mvomborzecki: cool, thanks16:44
ijohnsonbecause you have snap_kernel in the bl vars, which won't be the case for a real trusted assets bl because it will be an extracted one16:44
* ijohnson -> meeting16:45
pedronisijohnson: that is something that comes from the previous PR16:45
pedronisthat we just landed16:45
ijohnsonpedronis: yes I didn't realize it before16:45
ijohnsonI don't think it's an issue16:45
ijohnsonI don't think it's a large issue16:45
mborzeckiijohnson: yeah, mocking the bootloader was a bit pita16:45
pedroniswell, bootloader is a zoo that we need to fix16:45
ijohnsonbut it would be nice if we had at least one unit tests that uses both the extracted run kernel image impl and the trusted assets bl16:45
ijohnsonbecause we won't have any devices in the wild for a while yet that are env ref kernel and trusted assets16:46
* ijohnson -> really meeting now16:46
* mvo needs to taxi around again16:46
pedronismborzecki: anyway this a problem we get from bootloadertest, no?16:47
pedronisWithTrusted assets has the wrong mix of things?16:47
pedronisall the test do is tab := bootloadertest.Mock("trusted", "").WithTrustedAssets()16:48
mborzeckipedronis:  yes16:48
mborzeckipedronis: we'd have to have different variants of trusted assets bootloader, one for extracted run kernel, and another for env ref16:49
pedroniswell, we really need one for now16:49
pedronisbut we have kind of the wrong one :)16:49
pedronisanyway this is not something I can solve in my PR16:50
pedronisI can add a TODO though16:50
ijohnsonwe won't have any trusted assets bootloaders in practice that are env ref (like uboot or lk) until we add support for more bootloaders to uc2016:50
ijohnsonthe only trusted bootloader today will be grub which is always extracted run kernel on uc2016:50
ijohnsonyes I think TODO is fine16:51
pedronisyes, what I meant with we need one16:51
mborzeckiiirc tried to use extracted run kernel one at some point, but it was a bit larger than any of the prs i had at the time16:51
mborzeckii can look into that tomorrow morning16:52
ijohnsonI don't doubt that it was incredibly complicated16:52
pedronisyes, it has a more involved interface16:52
ijohnsonmock bootloaders is a mess of a zoo on top of the mess of a zoo that is already bootloaders16:52
pedronisand usually we have helpers around it16:52
pedronismborzecki: it might not be worth fixing it, before we fix the zoo16:52
mborzeckithat may be true16:52
pedronismborzecki: ijohnson: added this: https://github.com/snapcore/snapd/pull/9340/commits/e22c82485de00a54aeb8a31bec7092af8977899116:56
mupPR #9340: boot: streamline bootstate20.go reseal and tests changes <Run nested> <UC20> <Created by pedronis> <https://github.com/snapcore/snapd/pull/9340>16:57
ijohnsonpedronis: thanks16:57
ijohnson+1d the pr16:58
mborzeckitake a look at the changelog: https://bodhi.fedoraproject.org/updates/FEDORA-2020-0d5e544db717:04
mborzecki/var/lib/snapd/snap/bin is in secure_path in sudo config on fedora now17:04
mborzeckizyga: Eighth_Doctor: ^^17:04
zygamborzecki In a call17:13
zygabut will look soon17:13
cachioijohnson, hey, I updated #932617:15
mupPR #9326: tests: fix for basic20 test running on external backend and rpi <Simple 😃> <Created by sergiocazzolato> <https://github.com/snapcore/snapd/pull/9326>17:15
zygabodhi seems very slow somehow17:15
cachioif you could take quick look would be nice17:16
ijohnsonthanks cachio will look in a bit17:16
Eighth_Doctormborzecki: yay!17:18
pedronismborzecki: ijohnson: I proposed #935517:28
mupPR #9355: boot: with unasserted kernels reseal if there's a hint modeenv changed <Run nested> <Skip spread> <UC20> <Created by pedronis> <https://github.com/snapcore/snapd/pull/9355>17:28
mupPR snapd#9355 opened: boot: with unasserted kernels reseal if there's a hint modeenv changed <Run nested> <Skip spread> <UC20> <Created by pedronis> <https://github.com/snapcore/snapd/pull/9355>17:31
mborzeckimvo: pedronis: i've pinged you in #935418:24
mupPR #9354: client: bump the default request timeout to 120s <Run nested> <Simple 😃> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9354>18:24
ijohnsonmmm minimal-smoke on uc18 strikes again in that pr18:24
ijohnsonI'll file a pr with that sync idea I had, I don't think it can hurt anything18:24
mborzeckiijohnson: ok, if not i'll try to investigate that tomorrow18:25
ijohnsonnote that I only reproduced it by running spread in a loop, don't use the -repeat option to spread, because that will just reuse the same vm18:25
ijohnsoni.e. try something like `for i in $(seq 1 10); do spread --debug google-nested:ubuntu-18.04-64:tests/nested/manual/minimal-smoke; done`18:26
mborzeckiuhh #9341 has spread jobs still queued18:26
mupPR #9341: tests: add nested core20 gadget reseal test <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9341>18:26
ijohnsonmborzecki: hmm try canceling the jobs and restarting them maybe?18:27
mborzeckii'll let it sit for a while18:27
mborzeckiactually need to wrap it up, chase the kids to their beds otherwise they are going to have hard time getting up for school tomorrow18:28
ijohnsonoh man don't ever run `go test -count=100 ./...` unless you want to cry and be scared about our tests18:28
ijohnsonso many tests get broken somehow18:28
ijohnsonperhaps we should have a spread test which just does this18:29
mvoI see that the checks for 9340 got canceled, why?18:33
ijohnsonI assume lots of tests aren't actually properly cleaning themselves up18:33
ijohnsonmvo: I just canceled them to restart them18:34
ijohnsonthey were stuck18:34
ijohnsonrestarted now18:34
pedronisijohnson: I'm not even sure go check supports that tbh18:34
ijohnsonpedronis: oh really ? huh18:35
pedronisit's a go test option after all18:35
pedroniswe have one Test in the go level sense per package18:35
pedronisso not sure if it makes sense, what it does in practice for us18:35
ijohnsonmmm I guess gocheck doesn't even have a count function18:36
ijohnsonerr s/function/option/18:36
pedronisit doesn't18:37
ijohnsonstill feels like this should work though18:37
ijohnsonand to be fair for low numbers of count it does seem to work18:37
pedronisanyway I fear is a matter where feelings don't count. if we have enough seen flakiness without poking for more18:38
pedronismvo: I canceled them because I had more to push18:38
ijohnsonyes I'm ok if we just want to shove this under the rug until it seems relevant18:39
pedronismvo: I'm not sure how you want to proceed  landing wise18:39
pedronismvo: do we need to sync?18:39
mvo pedronis about the order?18:39
pedronisyes, and whether you want to have some control over what we run tests for vs not18:39
pedronismvo: we have 6 PRs for 2.47: https://github.com/snapcore/snapd/milestone/3218:40
pedronisplus the one Maciej asked to consider18:41
mvopedronis: which one is the one from maciej?18:41
mvopedronis: yeah, maybe a sync would be best18:41
pedronismvo: https://github.com/snapcore/snapd/pull/935418:41
mupPR #9354: client: bump the default request timeout to 120s <Run nested> <Simple 😃> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9354>18:41
pedronismvo: we can sync if you want18:42
mvopedronis: yeah, let me grab my headset18:42
mvopedronis: ready when you are in the standup channel18:43
mupPR snapd#9340 closed: boot: streamline bootstate20.go reseal and tests changes <Run nested> <UC20> <Created by pedronis> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9340>18:51
mvoijohnson: do you think you could have a look at 9355 ?18:54
ijohnsonmvo: sure it is in my queue for today, should I look now?18:54
mvoijohnson: I had hoped to land it in my day but I'm not sure that is feasible :/18:55
ijohnsonmvo: ack I will look at it now then18:55
mupPR snapd#9353 closed: configcore: do not error in console-conf.disable for install mode <Created by mvo5> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9353>19:02
mvoijohnson: thank you!19:06
mupPR snapd#9354 closed: client: bump the default request timeout to 120s <Run nested> <Simple 😃> <UC20> <Created by bboozzoo> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9354>19:07
mupPR snapd#9356 opened: sysconfig,o/devicestate: mv DisableNoCloud to DisableAfterLocalDatasourcesRun <UC20> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9356>19:12
ijohnsonmvo: pedronis: approved 9355 conditional on my understanding being correct about a question on gadget update codeflow19:12
* zyga will sort out family and will return to make some more changes to export manager 19:14
pedronisijohnson: I answered, let me know if the answer makes sense19:14
* ijohnson looks19:14
ijohnsonpedronis: yep that's basically what I thought so good to go from me19:15
mupPR snapd#9357 opened: interfaces/builtin/process-control: add scheduler_setattr to seccomp <Needs security review> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9357>19:47
cmatsuokamvo: the gadget update failed19:51
mvocmatsuoka: oh no, in what way?19:52
cmatsuokamvo: it tries to compare the volume name and ends with "cannot find device matching structure #0 ("mbr"): unexpected number of matches (0) for /sys/block/*/ubuntu-data-20833fbc-9153-4ccd-b00b-d687b76dfd0a"19:53
zygajdstrand lol19:53
zygajdstrand funny19:53
jdstrandmy review comment for that? ^ :)19:53
mvocmatsuoka: woah, that is strange19:53
pedronismvo: cmatsuoka: that's a more fundamental problem, not really related to reseal19:53
zygajdstrand I asked about RR scheduling19:54
zygait feels unsafe19:54
cmatsuokapedronis: yes, exactly19:54
mvocmatsuoka: thanks! and yes, not related to reseal :(19:54
zygaunless process control is very privileged already19:54
cmatsuokamvo: the full error message is in my SU notes, now checking the gadget update code19:55
mvocmatsuoka: thanks, I think worst case is we need to skip this PR for 2.47~rc119:55
jdstrandzyga: it is quite privileged already19:55
jdstrandyou can kill or renice an process19:56
zygajdstrand the risk here, as you surely know, is that rr scheduling can lock out all processes and hang the system for most practical purposes19:56
ijohnsonfixed my typo :-)19:56
zygaijohnson do you know if the desire is to use RR scheduling or anything else?19:56
cmatsuokamvo: did we test gadget updates recently? it could be there since we added the uuid to ubuntu-data19:56
jdstrandzyga: sure. so can creative use of kill :)19:56
ijohnsonzyga: all I know is what's on the forum post19:56
ijohnsoncmatsuoka: we should have gadget asset update spread tests that run on every pr19:57
zygajdstrand ish, we can mediate kill, there's no real way around runaway RR process19:57
zygaijohnson let me read it19:57
ijohnsoncmatsuoka: see tests/core/gadget-update-pc19:57
cmatsuokaijohnson: ah ok, that's good, thanks19:57
ijohnsonmaybe the test is insufficient for what we changed in uc20 though19:58
jdstrandzyga: plus, we can't really mediate to the level that we would like since we can't control the pid or the sched_attr struct19:58
zygaoh, right :(19:58
jdstrandzyga: I'm saying, we can mediate kill, and we do, but we grant it in process-control19:58
zygaI wish we had that CPU pinning interface so that we can ensure at least one core remains for system apps19:59
zygathat's fine, I just wanted to raise this19:59
jdstrandif we get some finer controls, we can move towards breaking this up19:59
ijohnsoncmatsuoka: I think that mborzecki had a comment about gadget asset updates this morning and there was something for him to work on related to this area19:59
ijohnson(specifically around the device mapper nodes,etc.)19:59
cmatsuokaijohnson: the existing test is unencrypted only, and it fails with the encrypted device name with the uuid I think20:00
ijohnsoncmatsuoka: though looking at the generate.py, it only handles ubuntu-seed edition updates afaict20:00
ijohnsoncmatsuoka: right20:01
ijohnsonyeah so I think this is the bug that mborzecki was referring to this morning20:01
pedronishttps://github.com/snapcore/snapd/pull/9341/files has the bug in the comment20:02
mupPR #9341: tests: add nested core20 gadget reseal test <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9341>20:02
pedronisin the test20:02
ijohnsoncachio: sorry one more comment on 932620:03
pedronismvo: I'm not sure I understand the test in #9341, isn't it changing a file that we ignore20:04
mupPR #9341: tests: add nested core20 gadget reseal test <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9341>20:04
cmatsuokaok, so maciek is aware20:04
ijohnsonpedronis: isn't the recovery grub in the boot chain for run mode though?20:05
ijohnsonpedronis: ahhhh but that's the recovery grub executable not the config20:05
ijohnsongood catch20:05
pedronisthe config should come from snapd20:05
cachioijohnson, done20:06
ijohnsonpedronis: but is the .cfg file included in the resealing ?20:06
pedronisit's not a trusted asset either way20:06
pedronisso I'm not quite sure what that test does20:07
ijohnsonright so a "realer" test would involve changing grubx64.efi right?20:07
ijohnsonthanks cachio, +120:07
cachioijohnson, thanks20:07
pedronisijohnson: yes20:07
ijohnsonhmm I wonder if we can just pad 0's on the end or something20:07
cachioI am going to pay some taxes now20:07
cachioI'll be back in 30 minutes20:07
* cachio afk20:08
pedronisijohnson: mvo: anyway in theory we should drop those .conf files from the gadget20:08
pedronisthey should be ignored anyway20:08
mvopedronis: uh, good catch indeed. but the test seems to indicate we do reseal?20:08
pedronismvo: well, we reseal at each reboot20:08
pedronisso not sure20:08
mvoat least it did indicate that at some point when I was running it20:08
mvopedronis: fun :(20:08
mvopedronis: ok, then the test is no good20:09
pedronisI mean, it will mean something after my branch lands20:09
mvopedronis: let me mark it as blocked and it will need a better test20:09
ijohnsonmvo: pedronis: I can try modifying the test to just pad 0's on the end of grubx64.efi, not sure if that will boot or not20:09
mvoijohnson: \o/20:10
pedronisijohnson: it's a pe binary I think, there might be some tools to change something in it and then sign it again20:11
ijohnsonmmm yeah but padding 0's on the end is easier if it works20:11
pedronisit has section with sizes, so not sure what adding 0 add it end does20:12
pedronisI would suspect not to work but maybe I'm wrong20:12
pedronismaybe it works but it's a noop, not sure20:12
mvoeasy enough to try :)20:13
ijohnsonalright building image with 0 appended to the grubx64.efi20:13
ijohnsonlet's see what happens20:13
ijohnsonnope didn't work20:15
ijohnsonalright let's look at these pe tools then20:16
pedronisijohnson: you stripped the signature added the zero and signed it again with snakeoil?20:17
pedronis(just making sure=20:18
ijohnson... no? I just added 0's20:18
pedronisit's signed20:18
ijohnsonI guess I didn't realize I needed to resign grub20:18
ijohnsondoes shim verify the signature of grub too?20:18
* cmatsuoka afk for a few minutes to replace a faulty network card20:18
ijohnsondo I need to modify shim at all? I already resigned shim with snakeoil20:18
pedronisijohnson: yes20:19
pedronisbut if you modify grub, you need to sign with snakeoil too20:19
pedronisI mean strip signature and sign with snakeoil20:19
pedronisthe same thing we do with shim20:19
ijohnsonI guess the question is, do I need to modify shim beyond just strip signature from shim and resign shim with snakeoil20:19
ijohnsonack ok I will try that then20:19
pedronisijohnson: no, that's all you need to do, is the same princicple by which we can then sign kernel with snakeoil20:20
pedronisso far we didn't need to sign grub again20:20
pedronisbut if we change it we have to20:20
pedronisijohnson: for all I know changing what signs it might enough of a change20:21
pedronisbut not sure20:21
ijohnsondoes the tpm resealing use the full hash of the file? or maybe it strips the signature and hashes the actual binary content minus the signature20:22
pedronisijohnson: the latter, but it happens only on cmatsuoka PR20:22
pedronisbefore it only matters what are the ca20:22
pedronisbut as I said it might be that signing grub changes that, we would have to look at tcg log20:23
ijohnsonalright I'm at least able to boot if I resign grub with snakeoil, I'm just gonna wait to see if install finishes with grub resigned with snakeoil too just to be extra sure20:25
ijohnsonthen I'll try my resigned with snakeoil grub that is padded with a 020:25
mvoadded a comment and marked as blocked20:26
mvoijohnson: feel free to remove blocked once your changes are there :)20:26
ijohnsonmmm this seems not good20:30
ijohnsonstateengine.go:150: state ensure error: devicemgr: cannot mark boot successful: cannot compose run mode boot chains: cannot find expected boot asset bootx64.efi in modeenv20:31
pedronisthat is weird20:33
ijohnsonthis is with an edge snapd20:33
ijohnsonbut with rebuilt kernel and gadget snaps so maybe I broke something20:34
ijohnsonlet me retry this boot20:34
pedronisijohnson: mvo: the kernel test has the same problem btw, we are reinstalling the same, we should at least unpack it and add a file to the initramfs or something. anyway doing that is much more typical , we do it already20:35
ijohnsonmmm good point20:36
ijohnsonyeah this is what I expected to be needed eventually, I can also work on preparing that kind of change too20:36
pedronisit's probably more important than the gadget20:36
ijohnsonI already do something similar in the kernel-failover test to make the initramfs panic20:36
pedronisright now20:36
pedronisyea, that covers our case a bit but given it fails to boot, we don't have e2e reseal story with the failover one20:37
pedroniswe just reseal back to the one kernel from before20:37
ijohnsonmmm I can still reproduce this issue with cannot find expected boot asset in modeenv20:37
pedronisthat is weird, is install failing to write the right bit in modeenv20:38
ijohnsonpedronis: if it's okay I would like to keep trying to understand this failure, because it just started today20:38
pedroniswe are installing and booting in spread  though20:39
pedronisso seems something misaligned20:39
pedronisor we have a silly bug that spread doesn't exercise20:39
ijohnsonnow I'm trying just a fresh boot with all edge snaps without rebuilding anything, so all ms key signatures20:40
ijohnsonpedronis: we should probably have a spread test too which just tries to boot uc20 with all edge snaps, but with snapd from spread20:40
pedronisbtw the gadget-reseal test failed on that PR20:41
pedronisweird, stat: cannot stat '/run/mnt/ubuntu-seed/device/fde/ubuntu-data.sealed-key'20:41
ijohnsonalright I reproduced this install failure with all edge snaps20:42
ijohnsonso we have a real problem somewhere20:42
pedronissnapd from edge as well?20:44
ijohnsonpedronis: yes20:44
ijohnsonpedronis: here's a full log and all the snaps I used to build the image: https://pastebin.ubuntu.com/p/wgMYSfKZjv/20:44
* ijohnson needs to step out for a little bit, will return to this in 10ish minutes20:44
pedronisijohnson: I suspect kernel doesn't have yet the right bits?20:45
pedronisI mean snap-boostrap dropping values when it deals with modeenv ?20:45
ijohnsonpedronis: ah I bet you are right20:46
mvokernel gets the initrd bits from the beta channel so that would be 2.46 until we release 2.47~rc120:46
ijohnsonpedronis: I will verify that theory when I get back20:46
mvothanks, I need to drop, will read standup doc in the morning or mattermost/tg20:46
* mvo waves20:46
ijohnsonok back now21:13
zygaijohnson how is sealing going?21:15
ijohnsonWe seem to have regressions with the set of snaps on edge21:15
ijohnsonPresumably because something with resealing is broken with the old snap-bootstrap in the initramfs21:16
zygaso no 2.47?21:16
ijohnsonnot tonight at least21:17
ijohnsonunclear if this is a blocker or not21:17
ijohnsonI mean obviously it's a blocker that edge images can't boot21:17
ijohnsonerr can't seed/isntall21:17
zygaoh well21:18
zygaI think everyone working on this is really tired21:18
zygaso no surprise something doesn't work sometimes21:18
ijohnsonyeah I think we just missed it due to a lack of testing for this scenario specifically, with edge kernel snap instead of rebuilt kernel snap21:19
zygaah, we always rebuild the kernel for compatibility?21:19
ijohnsonyes in spread tests, we rebuild the initramfs to have the new snap-bootstrap bits21:20
zygadrat I need to go to the office21:20
zyga(I'm coding from bed)21:20
zygaanyway, time to sleep21:20
ijohnsonpedronis: yes so rebuilding with snap-bootstrap from the edge snapd works to boot an edge uc20 image21:20
zygaI'll finish this in the morning21:20
ijohnsonbye zyga!21:20
zygagood luck guys21:20
ijohnsonhave a good evening21:20
zygayou too :)21:20
ijohnsonpedronis: I'm looking into what bits from snap-bootstrap might be effected here21:21
pedronisijohnson: it the modeenv manipulations I suspect, they don't preserve fields they don't know about21:22
ijohnsonyes that's my suspicions as well21:22
ijohnsoncurrently looking at InitramfsRunModeSelectSnapsToMount21:22
ijohnsonbut what I'm confused about is that nothing should have changed to require a rewrite of the modeenv on first boot of run mode21:23
pedronisijohnson: do we have base set as try for some reason?21:30
ijohnsonpedronis: I don't think so21:30
ijohnsonbut I'm trying to get a peek at the modeenv as written by install mode before run mode snapd runs21:30
ijohnsonbecause I think part of what happens is that we can't run and so snapd uninstalls things21:31
ijohnsonpedronis: no, the modeenv is not rewritten from the initramfs during first boot21:39
ijohnsonpedronis: after snap-bootstrap exits, I still see the same modeenv with current_trusted_boot_assets, etc. in it21:40
ijohnsonso it must be something else21:40
pedronisthis very weird21:40
ijohnsonoh wait that was with the new snapd21:40
ijohnsonsorry let me re-run that test21:40
=== mwhudson_ is now known as mwhudso
=== mwhudso is now known as mwhudson
ijohnsonpedronis: ok so after the snap-bootstrap in the kernel snap runs, the modeenv does  _not_ have the keys in it21:43
ijohnsonbut try_base is not set21:43
ijohnsonso it's a bit unclear to me why we would rewrite the modeenv21:43
ijohnsonbut rewriting the modeenv seems to be the issue here I think21:43
pedronisijohnson: I agree, seems a bug?21:44
ijohnsonyes probably a bug21:44
pedronisanyway we need to teach modeenv to behave correctly21:44
ijohnsonthat too21:44
pedronisbut would still be good to understand the bug21:44
pedronisyou might have to look at the code in 2.46 though21:44
pedronisbecause things have changed since21:44
ijohnsonah yes very good point21:44
ijohnsonactually wait21:44
ijohnsonno we need to look at 2.45.2 (?) code21:45
ijohnsonI don't think that ubuntu-core-initramfs was ever rev'd21:45
ijohnsonactually I don't even know what version of snapd snap-bootstrap is in the kernel snap right now21:45
* ijohnson goes to go digging21:45
ijohnsonso ubuntu-core-initramfs is at version 36, uploaded on may 15th21:46
ijohnsonso snapd 2.45+20.04 is what's in the initramfs right now21:47
pedronisijohnson: it was writing to modeenv incondtionally21:50
ijohnsonpedronis: yeah so we always write the modeenv from the initramfs in 2.4521:50
ijohnsonah yes21:50
ijohnsonwe found the same code :-)21:50
ijohnsonhmm, I'm not sure what to do21:50
ijohnsonmaybe we need to re-release snapd snap to the previous one, and then get xnox to re-build the initramfs asap and release a new kernel snap that doesn't unconditionally write the modeenv21:51
ijohnsonpedronis: thoughts ?21:51
pedronisijohnson: how is the code in 2.46 ?21:53
ijohnsonpedronis: let me look quickly21:53
ijohnsonpedronis: 2.46.1 does not always write modeenv21:54
ijohnsonpedronis: let me quickly check that 2.46.1 snap-bootstrap boots properly21:54
ijohnsonpedronis: 2.46.1 snap-bootstrap works22:00
pedronisok, notice that once we switch to v1 for sealed keys we'll have a bigger problem22:01
ijohnsonpedronis: how so ?22:01
ijohnsonI see22:01
ijohnsonyeah that will be a problem22:01
pedronisthe secboot inside old kernel won't know what to do with it22:02
ijohnsonI think the right thing to do is to get new initramfs re-spun as soon as possible22:02
ijohnsonregardless of what we do with snapd on edge right now22:02
pedronisanyway we need a fix in master to teach modeenv to keep and write back any fields it doesn't use/know22:04
ijohnsonpedronis: yes I'm working on that right now22:04
ijohnsonpedronis: https://github.com/snapcore/snapd/pull/935822:20
mupPR #9358: boot/modeenv: track unknown keys in Read and put back into modeenv during Write <UC20> <⚠ Critical> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9358>22:20
ijohnsonmmm maybe I should add a DeepEquals test for that too22:21
* ijohnson goes to write one22:21
mupPR snapd#9358 opened: boot/modeenv: track unknown keys in Read and put back into modeenv during Write <UC20> <⚠ Critical> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9358>22:22
ijohnsonok, done22:26
ijohnsonpedronis: I'm gonna eod now, but I will be semi-around for reviews or to land things if needed, otherwise I will be working early tomorrow22:27
=== ahayzen_ is now known as ahayzen
pedroniscmatsuoka: are you still around?23:06
cmatsuokapedronis: yes23:06
pedronisI just landed my PR, could you merge master into #9277 ?23:06
mupPR #9277: secboot: add boot manager profile to pcr protection profile <Run nested> <UC20> <⛔ Blocked> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9277>23:06
cmatsuokapedronis: sure, doing it now23:07
mupPR snapd#9355 closed: boot: with unasserted kernels reseal if there's a hint modeenv changed <Run nested> <Skip spread> <UC20> <Created by pedronis> <Merged by pedronis> <https://github.com/snapcore/snapd/pull/9355>23:08
mupPR snapcraft#3288 opened: cmake v2 plugin: add help for cmake generators <Created by sergiusens> <https://github.com/snapcore/snapcraft/pull/3288>23:13

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!