mborzeckidriving the kids to school, bbl05:31
gkoHello, I'm trying to install snap on a CentOS, but get "No package snapd available." even after "sudo yum install epel-release"...06:05
gko(Package matching epel-release-7-11.noarch already installed.)06:06
mborzeckigko: centos 7/8?06:10
gkoIf I search "snapd", I get: "snapd-debuginfo.x86_64 : Debug information for package snapd" and "snapd-glib-debuginfo.x86_64 : Debug information for package snapd-glib"06:12
gkoCentOS Linux release 7.6.1810 (Core)06:12
mborzeckigko: seems working here: https://paste.ubuntu.com/p/z8JmK6qJZX/06:13
zygagood morning06:15
mborzeckigko: is epel actually enabled? maybe only epel-debuginfo is enabled, can you paste the output of `yum repolist`?06:16
* zyga eats breakfast, will start soon06:22
mupPR snapd#9337 closed: boot,many: reseal only when meaningful and necessary <Run nested> <UC20> <Created by pedronis> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/9337>06:23
mborzeckimvo: morning06:26
mvomborzecki: good morning06:26
zygahey mvo06:26
mborzeckimvo: left you a comment yesterday https://github.com/snapcore/snapd/pull/9341#issuecomment-691692486 but looks like you missed it06:27
mupPR #9341: tests: add nested core20 gadget reseal test <Run nested> <UC20> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9341>06:27
mborzeckimvo: anyways, i've merged 9337, so maybe you can just merge master and push again06:28
mborzeckimvo: fwiw, the nested test passed when i pushed that commit i mentioned to your pr06:28
mvogood morning zyga06:28
mvomborzecki: cool, looking06:29
zygamvo: today is different, wife's Mondays got swapped to the afternoon shift06:29
gkomborzecki: repo id                                                   repo name                                                       status06:31
gkoepel                                                      EPEL                                                                 906:31
mvomborzecki: I merged master into 9341 now06:33
mborzeckimvo: cool06:33
mvomborzecki: thank you! and you said it passed earlier?06:33
mborzeckimvo: yes, i think so, it was one of those github action job status emails, let me check if i still have it in the trash06:34
mvomborzecki: nice06:35
mborzeckimvo: https://paste.ubuntu.com/p/kX8kqQSQ9y/ that's the branch right?06:35
mvomborzecki: I think it passed locally but it was a bit of a pain, some strangeness like I see in the "snap change 8" that installs the pc gadget an info about "restarting snapd" but it does restart the system06:35
mborzeckiah w8, 4 annotations, pfff missed that06:35
mborzeckimvo: saw a column of 0s and was too happy about that ;)06:36
mvomborzecki: haha06:36
mvomborzecki: no worries06:36
mvomborzecki: I think the test itself is good06:36
mvomborzecki: but it highlighted some small issues06:37
mborzeckigko: it's showing `epel/x86_64   Extra Packages for Enterprise Linux 7 - x86_64       13,446` ?06:37
mvomborzecki: anyway, I'm quite happy that things seem to be working :)06:37
mborzeckigko: something seems off in your system, it's listing 13k packages here, but only 9 in your setup06:38
gkomborzecki: right... no wonder it can't find anything.06:39
mborzeckigko: maybe try to `yum reinstall epel-release`?06:40
mupPR snapd#9339 closed:  boot: make MockUC20Device use a model and MockDevice more realistic  <UC20> <Created by pedronis> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/9339>06:43
mvomborzecki: hm, 9331 has conflicts now :/06:53
mborzeckimvo: yup, resolving them right now06:53
mvomborzecki: \o/06:53
gkomborzecki: OK, my fault... there was another repo file also using epel... Thanks!06:56
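The symptom above (9 packages listed for epel instead of about 13k) can be spotted by pulling the count out of `yum repolist` output. A minimal sketch, assuming the count is the last column and the repo id is the first; `repo_pkg_count` is a hypothetical helper name, not part of yum:

```shell
# Print the package count that `yum repolist` reports for one repo id.
# Reads repolist output on stdin; the repo id is the first argument.
repo_pkg_count() {
    awk -v id="$1" '
        {
            split($1, parts, "/")     # "epel/x86_64" -> "epel"
            sub(/^!/, "", parts[1])   # yum may mark an id with a leading !
            if (parts[1] == id) {
                count = $NF
                gsub(/,/, "", count)  # "13,446" -> "13446"
                print count
                exit
            }
        }'
}

# Example: yum repolist | repo_pkg_count epel
```

A very small number for epel suggests that the repo definition is being overridden, as turned out to be the case here with a second repo file.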
mborzeckimvo: updated06:59
mvogood morning pstolowski07:07
* tobias_ waves general good morning07:12
mborzeckipstolowski: hey07:32
* zyga canceled PT and jumps into code07:35
pedronismvo: mborzecki: hi, do we need to sync?07:43
mborzeckipedronis: hi, yes, in 5? (cc mvo?)07:43
mvopedronis, mborzecki sounds good07:43
mvozyga: in a call right now08:01
mborzeckimvo: tweaked that diff a bit more https://paste.ubuntu.com/p/K9Dg4tQgkg/08:31
mvomborzecki: thanks!08:32
mvomborzecki: running locally now08:33
mborzeckimvo: i'm setting up an image for running nested locally too, are the nested tests using qemu from the repos or some special-flavor one?08:39
pedronispstolowski: hi, can you setup a slot after your lunch to chat about next topics?08:39
pstolowskipedronis: hey, sure08:40
mvomborzecki: I'm just using focal08:41
mvomborzecki: I merged 9311 to have more options08:42
pstolowskipedronis: sent08:42
mvomborzecki: also make sure you have master, 9310 and 9305 are important08:42
mvomborzecki: and then I just run spread -debug -v qemu-nested:ubuntu-20.04-64:tests/nested/core20/kernel-reseal08:43
mvomborzecki: for me (with amd) nested kvm also works so things are somewhat fast08:44
mvomborzecki: on my intel laptop that appears to be not working but I did not debug this further because the laptop is slower anyway08:44
mborzeckimvo: mhm, let me try that08:44
mvomborzecki: for intel it's also: $ cat /sys/module/kvm_*/parameters/nested08:45
mvomborzecki: (instead of 1 on amd)08:45
mvomborzecki: so the existing detection for support will not trigger on intel (which is a bit of a feature because it did not work for me :)08:45
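The sysfs check mentioned above can be sketched as follows. The log does not say what value Intel reports; this assumes `Y` is used there and `1` on AMD, and both function names are hypothetical rather than taken from nested.sh:

```shell
# Interpret one value read from /sys/module/kvm_*/parameters/nested.
# Assumption: "1" and "Y" (either case) both mean nested KVM is enabled.
nested_value_enabled() {
    case "$1" in
        1|Y|y) return 0 ;;
        *)     return 1 ;;
    esac
}

# Succeed if any loaded kvm_* module reports nested virtualization.
nested_kvm_enabled() {
    for f in /sys/module/kvm_*/parameters/nested; do
        [ -r "$f" ] || continue
        if nested_value_enabled "$(cat "$f")"; then
            return 0
        fi
    done
    return 1
}
```

Accepting both spellings would make the detection trigger on Intel as well, which may not be wanted given that nested KVM did not work on the Intel laptop mentioned above.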
mvomborzecki: anyway, hope that I'm not spamming you too much08:46
mvomborzecki: usually I then just telnet into the spread serial port and login and monitor stuff08:46
mvomborzecki: like tail /tmp/work-dir/logs/serial.log or journalctl -u nested-vm08:46
mvomborzecki: etc08:46
mborzeckimhm, the /sys/ bit is set to 1, so that's good08:47
mvomborzecki: but I wish we could get a more holistic view from spread too08:47
mvomborzecki: yay08:47
mvomborzecki: is that on intel or amd for you?08:47
mborzeckiwaiting for the base image to get updated, then a reboot and i'll try to run it08:47
mborzeckimvo: amd08:47
mvomborzecki: cool08:49
mvomborzecki: something seems to be not quite working with your diff, no snapd output on the serial port right now :/ and huge delays09:01
mborzeckimvo: this is what i see in the logs in the vm the prepare fails:09:01
mborzeckiSep 14 08:51:38 ubuntu snapd[721]: devicemgr.go:725: System initialized, cloud-init reported to be done, set datasource_list to [ None ]09:01
mborzeckiSep 14 08:51:44 ubuntu snapd[721]: taskrunner.go:271: [change 2 "Request device serial" task] failed: cannot deliver device serial request: Cannot process serial request for device with brand "BhgbYoDtThegqVkEU7oiZP8GQwCoUIxz" and model "pc"09:01
mborzeckiSep 14 08:56:45 ubuntu snapd[721]: taskrunner.go:271: [change 3 "Request device serial" task] failed: cannot deliver device serial request: Cannot process serial request for device with brand "BhgbYoDtThegqVkEU7oiZP8GQwCoUIxz" and model "pc"09:01
mvomborzecki: I think the request serial are red-herrings09:02
mvomborzecki: would be great to get more debug output from cloud-init I guess :/09:02
mvomborzecki: i.e. if it actually created the user for us09:03
mvomborzecki: ha, we don't set "enable_ssh" from nested.sh so your tweaks need a slightly different place09:10
mvomborzecki: or we just enable ssh, wonder why we don't09:10
mvomborzecki: I tweaked "nested.sh" now to run "repack_snapd_snap_with_..." to set "enable_ssh" to true, this should also make debugging a lot simpler, lets see how it goes and if that breaks anything09:18
mborzeckimvo: hmm ssh seems to be working fine, i can ssh into the vm after prepare fails :/09:19
mborzeckimvo: heh, and the test is ofc executing now09:23
pedronismborzecki: mborzecki: yes the deliver serial request are red-herrings, we just need to use serial-authority09:23
pedroniswe probably should because they just pollute the logs for nothing09:23
mupPR snapd#9221 closed: tests: disk space awareness spread test <Disk space awareness> <Created by stolowski> <Merged by stolowski> <https://github.com/snapcore/snapd/pull/9221>09:24
pedronismvo: with None I would suspect not09:25
pedronisabout creating the user09:25
pedronisit should say NoCloud if it created the user ?09:26
mvomborzecki: oh, can you paste all the info you have why prepare failed?09:26
pedronisanyway it seems the cloud-init info is not seen/passed right if we get None09:26
mvopedronis: oh, interesting09:28
pedronisor maybe is another strange cloud-init corner case09:28
pedronisin principle we should teach the code to turn None into disabled09:30
mborzeckimvo: https://paste.ubuntu.com/p/MF6sjVhwyr/ timeout checking whether snapd seeded09:31
mborzeckimaybe we should have that timeout configurable via env too09:32
mvomborzecki: and if you ssh into the nested system, what do you see there for "journalctl -u snapd" ?09:33
mvomborzecki: i.e. does it actually fail to seed?09:33
mvomborzecki: i.e. "nested_exec sudo journalctl -u snapd" ?09:33
mborzeckimvo: no, just the snap command hit a timeout, the system seeded ok09:33
mvomborzecki: ohhh09:33
mvomborzecki: ok, let me push something09:33
pedronisI don't even understand where we build the cloud-init data atm with a bit of grepping09:34
mvomborzecki: I pushed a small change that waits for the snap command to become available as suggested by ian09:35
mvomborzecki: this should make this part more robust09:35
mborzeckimvo: from what i can see, the error comes directly from our client code09:38
mvomborzecki: if you log into the system, do you also get a client timeout then or was this a one-off thing?09:39
mborzeckimvo: maybe it needs multiple steps, i.e. the command -v snap loop, then make the timeout somehow configurable via env (SNAP_CLIENT_TIMEOUT?)09:39
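The two steps suggested here (wait for the snap command, then retry with a configurable limit) could look roughly like this. It is a sketch only: `retry_until` and `WAIT_ATTEMPTS` are hypothetical names, and the change that was actually pushed is not shown in this log.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Step 1: wait until the snap binary is on PATH.
#   retry_until "${WAIT_ATTEMPTS:-60}" 1 command -v snap
# Step 2: retry the seeding check, tolerating client timeouts.
#   retry_until "${WAIT_ATTEMPTS:-60}" 5 snap wait system seed.loaded
```

Taking the attempt count from the environment keeps slow local qemu runs from needing a code change, which is the point of the env variable floated above.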
pedronismvo: it's our setup that is weird:  datasource_list: [ "None"]09:39
mvomborzecki: oh, interesting, do you think it's actually that slow?09:39
mvopedronis: oh, nice catch09:39
pedronisin nested.sh09:39
pedronismvo: in theory it's ok to use09:42
mborzeckihmm i use [NoCloud, None] usually09:43
pedronisyes, more typical, but None should be ok afaict09:44
pedroniswhat I mean, it should not cause problems09:45
pedronisyou should still get a user09:45
mvomborzecki: hrm, hrm, something in the nested vm after "Starting create static device nodes in /dev" is really very slow :(09:46
pedronisdo we know when snap-bootstrap run?09:47
mborzeckihm, it takes 500s to read the reboot during initial install09:52
zygaread the reboot?10:06
pstolowskizyga: i've requested your re-review of #9270 because of a few more commits after your previous review (a few more cases where --root=.. was passed to systemctl)10:07
mupPR #9270: wrappers, systemd: allow empty root dir and conditionally do not pass --root to systemctl <Run nested> <Services ⚙️> <Created by stolowski> <https://github.com/snapcore/snapd/pull/9270>10:07
zygapstolowski: ack10:08
* mvo is away for a few min to pickup kids10:10
* zyga grabs some food10:23
pedronispstolowski: you didn't add a meet11:30
pstolowskipedronis: yes, let's use standup HO11:30
mborzeckimvo: have you looked at the snapd snap produced by repack_snapd_snap_with_deb_content_and_run_mode_firstboot_tweaks11:41
mvomborzecki: firstboot tweaks will fail because we do things differently, need to hack this a bit further, no real results yet :(12:01
mborzeckimvo: i'm trying with repacked core20, injecting the bits directly there12:01
zygabrb, tea12:05
mborzeckithe nested suite is dog slow :/12:09
mvomborzecki: yeah, it's all a bit frustrating12:16
mvomborzecki: I modified the image caching to use gzip -1 instead of xz locally but it does not make a huge difference12:17
mvomborzecki: I think part of it is really trying to figure out what part exactly is so slow and if we do something silly somewhere12:18
mborzeckimvo: looking at dmesg timestamps in serial logs, install takes ~500s12:18
* zyga cleans up unit tests12:21
pedronismborzecki: is that just install? without first seed?12:21
mborzeckipedronis: yeah, from first boot, to a reboot12:22
pedroniswe need install mode logs12:23
mvopedronis: +10012:23
mborzeckihmm, something's off, i've added a drop in override for journald to the core snap, since it's a base that runs during install, the logs should be visible on the console12:25
mborzeckifwiw, first reboot is at: [  549.003080] reboot: Restarting system12:25
mupPR snapcraft#3284 opened: build providers: rename clean() -> clean_parts() to clarify scope <Created by cjp256> <https://github.com/snapcore/snapcraft/pull/3284>12:39
mvomborzecki: I merge my install mode pr and see if that gives me any clues12:42
mborzeckimvo: which one is that?12:43
mupPR snapd#9342 opened: tests: add more checks to disk space awareness spread test <Disk space awareness> <Created by stolowski> <https://github.com/snapcore/snapd/pull/9342>12:44
mvomborzecki: 931712:50
mvomborzecki: running it now so after the standup we hopefully have results :)12:51
mborzeckiheh `Sep 14 12:42:14 ubuntu systemd[1]: Startup finished in 36.166s (kernel) + 1min 50.783s (userspace) = 2min 26.949s.`12:55
mborzeckithis is run mode starting up12:55
mborzeckimvo: this is the diff i'm trying right now: https://paste.ubuntu.com/p/hHY6hjWxTV/13:35
mvomborzecki: thanks13:35
mvomborzecki: nice, does it work?13:36
mborzeckimvo: not quite, idk why i'm not seeing the run system logs13:36
mvomborzecki: http://paste.ubuntu.com/p/CTQwsyvZCC/ is my heavily hacked stuff13:36
mborzeckimvo: want to try with systemd.journald.forward_to_console=1 in the command line?13:38
mvomborzecki: oh, excellent idea13:39
mvomborzecki: yeah, I mean, this is obviously just a quick hack to see if I can get any extra data :/13:39
mborzeckimvo: higher chance of succeeding in getting more logs than i have here with repacking13:40
pedroniscmatsuoka: we should try to see what happens combining #9340 and #9277 (for this we probably need to bump secboot version)13:40
mupPR #9340: boot: streamline bootstate20.go reseal and tests changes <Run nested> <UC20> <Created by pedronis> <https://github.com/snapcore/snapd/pull/9340>13:40
mupPR #9277: secboot: add boot manager profile to pcr protection profile <Run nested> <UC20> <⛔ Blocked> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9277>13:40
cmatsuokapedronis: ack13:41
mvomborzecki: I added it now but will let my current run continue13:42
mborzeckimvo: do you cache the kernel/core/snapd snaps somehow locally so that the vms do not have to download them all over again?13:43
mvomborzecki: I don't :( I think having a squid-deb-proxy or debcacher-ng would be helpful13:45
zygalucy is fast asleep - back to work13:59
mborzeckimvo: heh, and now swtpm socket isn't ready by the time the vm starts13:59
zygamborzecki: I have a snap proxy at home13:59
mvomborzecki: meh, it's getting worse and worse13:59
* zyga hugs mvo and mborzecki 13:59
mborzeckizyga: hm spread runs vms with -net user, so afaiu the vm will not have access to lan14:00
zygamborzecki: it does14:00
zyga-net user drops ping and stuff14:01
zygabut it works normally14:01
zygalan or otherwise14:01
zygaas long as the private IP is in another subnet from LAN14:01
zyga(private qemu-given IP)14:01
pstolowskicachio: hey, i just hit 2020-09-14 13:24:53 Cannot allocate google-nested:ubuntu-18.04-64: cannot find any Google image matching "ubuntu-1804-64-virt-enabled" on project "computeengine" or "ubuntu-os-cloud"14:02
cachiopstolowski, 1 sec14:03
cachioit is the gce issue14:03
pstolowskicachio: should i just restart the tests or is it more permanent?14:04
cachiopstolowski, try now please14:05
mvomborzecki: this is all very frustrating, even when adding systemd.journald.forward_to_console=yes it does not even work14:10
mvomborzecki: I mean, I still don't see anything in the serial log14:10
mborzeckimvo: i think that console conf hijacks the console14:11
mvomborzecki: and systemd.debug also does not work14:11
mborzecki(or the serial as such)14:11
mvomborzecki: I can try this14:11
mborzeckithe last line i see is a prompt from console conf14:11
mborzeckimvo: fwiw, might be an accident, but with haveged install time is now ~360s rather than ~50014:16
mborzeckiequally well might be some network stuff causing that14:18
mupIssue core20#80 closed: networking does not persist in a reboot loop on arm64 pi4 <Created by anonymouse64> <Closed by xnox> <https://github.com/snapcore/core20/issues/80>14:28
pstolowskizyga: do you have a sec for https://github.com/snapcore/snapd/pull/9342 ?14:46
mupPR #9342: tests: add more checks to disk space awareness spread test <Disk space awareness> <Created by stolowski> <https://github.com/snapcore/snapd/pull/9342>14:46
zygapstolowski: sure14:46
pstolowskizyga: ty!14:48
mupPR snapd#9342 closed: tests: add more checks to disk space awareness spread test <Disk space awareness> <Created by stolowski> <Merged by stolowski> <https://github.com/snapcore/snapd/pull/9342>14:50
mborzeckimvo: hm maybe this https://paste.ubuntu.com/p/29Xz6JcJrg/15:03
mborzeckihopefully ttyS1 is not hijacked and we can still get logs out of it15:04
mvomborzecki: mux=on? should I also add this?15:06
mvomborzecki: nice, let's hope this gives output15:07
mborzeckimvo: if that doesn't work, we can always have a service that does `journalctl -f > /dev/someserial`, unfortunately none of the systemd-journal-gateway* things are in the core snap15:09
pedronismborzecki: mvo: do we need console-conf in these tests? can't we turn it off? or turn it off in most? or turn it off while debugging?15:12
mvopedronis: yeah, I think we can turn them off15:14
mvopedronis: well, so … having console-conf means there is a way to login15:14
mvopedronis: so it's not entirely without merits but if we provide an alternative login then it's not needed15:14
mvopedronis: while trying to debug I can say that sealing to commandline works, I tried to change it and got a recovery prompt15:25
ijohnsonif we fail to unlock in the initramfs as part of a kernel snap update, will we reboot automatically and trigger rollback to the previous one? I don't think so, but perhaps it would be smart to teach the initramfs to do this for at least specifically the kernel snap, we could detect we are trying a kernel snap update before unlocking the encrypted partition just by looking at kernel_status and bootloader vars15:33
mborzeckimvo: not much of an improvement unfortunately, idk why journald just stops logging to serial console at some point15:33
mvomborzecki: yeah, I'm also a bit stuck here, trying out more things but it's very frustrating15:33
pedronisijohnson: it sounds reasonable but probably to be done after we have landed the current bits15:34
ijohnsonpedronis: ack I will make a small todo for myself to look into that15:34
ijohnson(for later on)15:34
mborzeckimvo: i need to taxi the kids around in 20 minutes, i'll open a branch with the patches i have15:35
mvomborzecki: cool, I keep exploring this15:39
mborzeckipff, something new `Connection timed out during banner exchange` when trying to ssh into a nested vm15:40
mupPR snapd#9343 opened: tests: more logging for UC20 kernel test <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>15:40
mborzeckimvo: need to go out now, check this commit: https://github.com/snapcore/snapd/pull/9343/commits/648801163d3d09f3db18dd71bec3d79690eda3b115:45
mupPR #9343: tests: more logging for UC20 kernel test <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>15:45
mvomborzecki: nice15:45
mvomborzecki: does it work :) ?15:45
mvomborzecki: I keep poking at this15:46
mborzeckimvo: idk yet, just added, check back in a bit15:46
mvomborzecki: \o/ cool15:46
mvomborzecki: thanks in any case15:46
mborzeckimvo: but that may be it, the default is 'debug' for journal/syslog, but only info for console15:46
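The log-level difference described here corresponds to journald's `MaxLevelConsole` setting. A minimal sketch of a drop-in that raises it and forwards to a serial port, assuming the repacked base is unpacked under some root directory; the function name and the file name `90-console.conf` are hypothetical and the linked commit may do this differently:

```shell
# Write a journald drop-in that forwards all messages, including debug,
# to a serial console. The first argument is the root of the tree to modify,
# the optional second argument is the TTY to forward to.
write_journald_console_dropin() {
    root=$1
    tty=${2:-/dev/ttyS1}
    dir="$root/etc/systemd/journald.conf.d"
    mkdir -p "$dir"
    cat > "$dir/90-console.conf" <<EOF
[Journal]
ForwardToConsole=yes
TTYPath=$tty
MaxLevelConsole=debug
EOF
}

# Example: write_journald_console_dropin ./core20-unpacked /dev/ttyS1
```

Using ttyS1 follows the hope voiced earlier that a second serial port is not taken over by console-conf.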
mborzeckiok, got to go, bbl15:47
* cachio lunch15:58
ijohnsoncachio: I've seen the minimal-smoke test fail multiple times now due to issues with running spread inside the external system like this16:25
ijohnsonerr wait sorry need to find the other pastebin16:26
ijohnsonhere it is: https://pastebin.ubuntu.com/p/bmHw78Qfp4/16:28
ijohnsonany ideas on what might be wrong? it doesn't seem to be the case that there is no user on the VM we are using as an external system, just that we can't use sudo or somehow can't login the way that spread is trying to do16:29
ijohnsonthe first failure was from my pr https://github.com/snapcore/snapd/pull/9332, while the second failure was from https://github.com/snapcore/snapd/pull/931116:30
mupPR #9332: spread.yaml, tests/nested: misc changes <Run nested> <Simple 😃> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9332>16:30
mupPR #9311: nested: add support to telnet to serial port in nested VM <Run nested> <Created by mvo5> <https://github.com/snapcore/snapd/pull/9311>16:30
mupPR snapd#9344 opened: tests/lib/nested.sh: wait for the tpm socket to exist <Run nested> <Simple 😃> <Test Robustness> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9344>16:40
ijohnsonsimple nested test robustness PR ^16:41
mborzeckimvo: back for a bit, the logs are visible now!16:44
cachioijohnson, I'll take a look16:48
ijohnsonthanks cachio16:49
ijohnsonI'm running a spread run now of my pr to see if I can reproduce that issue16:49
mvomborzecki: yay17:05
mvomborzecki: running your PR locally now while waiting for spread to catchup17:08
mborzeckimvo: this is what i got in the last local run: https://paste.ubuntu.com/p/pfZsCCyBsS/17:12
mvomborzecki: nice! good debug logs! is it still running or did creating the user fail?17:15
mborzeckimvo: it's `Failed password for user1 from port 56336 ssh2` until the very end, either the user was not created yet (cloud-init does that right?), or something else went wrong17:15
mvomborzecki: ok17:15
mvomborzecki: so it failed, but at least we have more logs now :)17:16
mvomborzecki: hm, at least this "[  161.918339] useradd[760]: new group: name=user1, GID=1000" is visible17:16
mborzeckimvo: yeah, at least we see it's resealing17:16
mborzeckimvo: maybe something wrong with ssh config then?17:17
mvomborzecki: yeah, I wonder if the "preauth" part of the failure gives a clue already17:19
mborzeckimvo: do you remember where PasswordAuthentication in sshd_config gets enabled?17:22
cmatsuokaijohnson: could you have a look at #9185 again when you have time?17:24
mupPR #9185: secboot: use the snapcore/secboot native recovery key type <UC20> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9185>17:24
cachioijohnson, worked for me17:30
cachioI'll retry it in 3 machines17:31
mborzeck1mvo: duh, i have no clue why ssh may be failing, trying one more time17:33
mborzeck1ijohnson: could you run the tests/nested/core20/kernel-reseal test from #9343 locally?17:34
mupPR #9343: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>17:34
zygaonly one function left to test17:34
zygaI made a lot of tea17:34
ijohnsonmborzeck1: sure17:38
mborzeck1ijohnson: in my runs, the test fails to login over ssh, i suspect PasswordAuthentication may have not been enabled17:39
ijohnsoncachio: yeah worked for me too, I will try in a loop to see if we can reproduce17:39
ijohnsonmborzeck1: hmm interesting17:39
ijohnsoncmatsuoka: yes I will try to do a review tonight17:39
mborzeck1ijohnson: otoh, i'm not sure how it gets enabled in those tests, we don't seem to be calling repack_snapd_snap_with_deb_content_and_run_mode_firstboot_tweaks in a way that would enable that, so maybe cloud-init does it implicitly when it adds a user?17:40
ijohnsonmborzeck1: right it should be using whatever is the default17:41
ijohnsonmborzeck1: if cloud-init ran it should have created the user with the pw17:41
ijohnsonmborzeck1: is that your full log of a failed run @ https://paste.ubuntu.com/p/pfZsCCyBsS/ ?17:42
mborzeck1ijohnson: yup, that's the full log i got17:42
pedronismborzeck1: it should be turned on by: ssh_pwauth: True in the cloud-init config17:47
cmatsuokapedronis: #9340 + #9277 + #9185 + some conflict fixing seem to work17:50
mupPR #9340: boot: streamline bootstate20.go reseal and tests changes <Run nested> <UC20> <Created by pedronis> <https://github.com/snapcore/snapd/pull/9340>17:50
mupPR #9277: secboot: add boot manager profile to pcr protection profile <Run nested> <UC20> <⛔ Blocked> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9277>17:50
mupPR #9185: secboot: use the snapcore/secboot native recovery key type <UC20> <Created by cmatsuoka> <https://github.com/snapcore/snapd/pull/9185>17:50
pedroniscmatsuoka: thx17:51
cmatsuokapedronis: and snapcore/secboot update + canonical/go-tpm2 update too17:51
pedronismborzeck1: mvo: could we add printing of ssh config and extrausers content somewhere that runs/at end of cloud-init?18:04
mvopedronis: good idea18:07
* zyga spawns more spread tests, pushes the branch and EODs18:07
mvopedronis: hm, the annoying part is that it appears that the existing "echo test" is not output in the log that maciej posted :(18:09
mborzeck1pedronis: ijohnson: so it looks like it's actually taking that long to reach the point where cloud init makes the system accessible https://paste.ubuntu.com/p/6sPDmDhFzX/18:14
pedronismvo: I suspect cloud-init sends it output somewhere else18:15
ijohnsonmborzeck1: ah you know what I remember this problem18:15
mborzeck1funny there's this:18:15
mborzeck1[  154.076623] passwd[766]: password for 'user1' changed by 'root'18:15
mborzeck1but it seems the change is picked up by pam way later:18:16
mborzeck1[  396.314719] chpasswd[1626]: pam_extrausers(chpasswd:chauthtok): password changed for user118:16
ijohnsoncloud-init creates the user from config but for whatever reason the user isn't accessible until after everything is done running, I've seen this happen in other nested suites where it fails to prepare, then drops me to a shell and 20 minutes later when I see the failure I can login just fine18:16
ijohnsonmborzeck1: yes that makes sense18:16
ijohnsonI think we just need to bump the timeout for now and try to optimize things later on18:17
mborzeck1hah, interesting18:17
mborzeck1[  404.228972] cloud-init[1623]: Cloud-init v. 20.2-45-g5f7825e2-0ubuntu1~20.04.1 running 'modules:config' at Mon, 14 Sep 2020 18:06:32 +0000. Up 390.55 seconds.18:17
mborzeck1[  405.991338] systemd[1]: Finished Apply the settings specified in cloud-config.18:17
ijohnsonit would be interesting to know if gce's tpm implementation works to the point where we could use it now for many of these tests and just use nested tests for things that really actually need nested vm's18:17
mborzeck1wonder whether `Up 390.55 seconds` means that it was running for that long18:18
mvoI'm in the vm now18:19
mvosystemd take 44% cpu18:19
mvoand console-conf was also pretty cpu heavy18:19
mvoand now I got kicked out :(18:19
pedronisijohnson: sounds like something needs reloading to notice the new user?18:19
ijohnsonpedronis: I dunno, it might be a "feature" of cloud-init that your user isn't "login-able" until cloud-init thinks the system is "ready"18:20
pedronismborzeck1: do we run a script inside the vms? where is it defined?18:20
ijohnsonpedronis: what do you mean by a script ?18:20
ijohnsonfor cloud-init ?18:21
mborzeck1pedronis: https://github.com/snapcore/snapd/pull/9343 repacks the core, so we can inject pretty much anything now18:21
mupPR #9343: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>18:21
pedronismborzeck1: no, I'm thinking a bit of code we run everywhere defined in prepare stuff?18:21
mborzeck1pedronis: you mean in the nested vm?18:23
pedronismborzeck1: no, I mean repack_snapd_snap_with_deb_content_and_run_mode_firstboot_tweaks18:24
mborzeck1pedronis: that extra shell bit in the helper is not used when nested runs (i presume we expect cloud-init to set up ssh)18:26
mborzeck1and accounts18:26
mborzeck1well, just one account really18:26
pedronismborzeck1: my question is whether we run it or not, my fear is that things interfere with each other18:26
pedronisthat's why I ask18:26
mborzeck1pedronis: no afaict, the bit is not added, ENABLE_SSH is false when the helper is called from nested prepare18:28
pedronisI see, ok18:28
mupPR snapcraft#3285 opened: v1 plugins: lock godep's dependencies <Created by cjp256> <https://github.com/snapcore/snapcraft/pull/3285>18:30
* zyga opened export manager PR and EODs18:30
mupPR snapd#9345 opened: overlord: introduce the export manager, export snapd tools <Created by zyga> <https://github.com/snapcore/snapd/pull/9345>18:31
mborzeck1tweaked the cloud config we use a bit18:39
mvomborzeck1: \o/18:39
ijohnsonpedronis: fwiw I have an open pr that changes repack_snapd_snap_with_deb_content_and_run_mode_firstboot_tweaks to instead call repack_snapd_deb_to_snap from snaps.sh to reduce confusion about that18:46
ijohnsonpedronis: actually that pr is now green with 2 +1's and I'd like to merge it to master, do you have objections ?18:47
mupPR #9332: spread.yaml, tests/nested: misc changes <Run nested> <Simple 😃> <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9332>18:47
pedronisijohnson: I don't have objections, but mvo and mborzeck1 might19:00
pedronisijohnson: also why was it like this? we have two functions of which the second behaves like the first if a param is false?19:01
ijohnsonpedronis: ack, well mvo already +1d the PR, so unless mborzeck1 has an opinion I will merge it19:01
mvoijohnson: works for me19:01
mborzeck1wfm too19:01
ijohnsonpedronis: cachio added the param to the longer function before I added/created the simpler one19:01
ijohnsonpedronis: I created the simpler one for uc18 nested cloud-init tests specifically19:01
ijohnsonpedronis: but actually the same function works for uc20 nested tests too19:02
ijohnsonmborzeck1: cool I'll merge that one now19:02
mborzeck1ijohnson: tried using [NoCloud,None] in datasources list for cloud init, suspecting it may be getting confused or somesuch, but i don't see any change19:03
pedronismborzeck1: we need to find a way to get the ssh config and the extrausers file into the logs19:03
pedronisijohnson: but now that parameter could be dropped from the 2nd, no?19:04
ijohnsonpedronis: yes it could19:04
ijohnsonpedronis: I can do that19:04
pedronisit would make sense to me, but maybe cachio has reasons, but even then I would then write a wrapper that picks one or the other in this new world19:05
ijohnsonmborzeck1: with 648801163d3d09f3db18dd71bec3d79690eda3b1 from your pr, I got a successful run on google-nested19:06
ijohnsonmborzeck1: I ran `spread --debug google-nested:ubuntu-20.04-64:tests/nested/core20/kernel-reseal`19:06
mupPR snapd#9332 closed: spread.yaml, tests/nested: misc changes <Run nested> <Simple 😃> <Created by anonymouse64> <Merged by anonymouse64> <https://github.com/snapcore/snapd/pull/9332>19:06
ijohnsonpedronis: I really don't think it's necessary anymore and it reads quite confusing to see "false" as an argument to an already really long bash function :-/19:07
ijohnsonbut yes I will get cachio to approve the pr when ready19:07
ijohnsonmborzeck1: so was the issue that you can't run it locally with qemu-nested ?19:07
mborzeck1ijohnson: thanks for trying, it did work for me a couple of runs, but it's not really consistent19:11
ijohnsonmborzeck1: ok, I will run more tries to see if I can reproduce the failure, to be clear though, you're running in gce with google-nested or locally with qemu-nested ?19:11
mborzeck1mvo: ijohnson: pedronis: pushed a bit more changes to #9343, bumped the timeout and somesuch19:12
mupPR #9343: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>19:12
mborzeck1ijohnson: google-nested19:12
mvomborzeck1: looking19:12
mborzeck1need to wrap it up, tuck the kids to bed19:13
mvomborzeck1: thanks!19:13
mvomborzeck1: good night19:14
mupPR snapd#9346 opened: interfaces: builtin: add iotedge interface to support Azure iotedge <Created by kubiko> <https://github.com/snapcore/snapd/pull/9346>19:16
mupPR snapd#9344 closed: tests/lib/nested.sh: wait for the tpm socket to exist <Run nested> <Simple 😃> <Test Robustness> <Created by anonymouse64> <Merged by anonymouse64> <https://github.com/snapcore/snapd/pull/9344>19:41
pedronisijohnson: I think I'm understanding something of the nested issues20:30
ijohnsonunderstanding is good20:31
ijohnsonI have now run 3 times the kernel reseal test on gce and it hasn't failed for me fwiw20:31
pedronisijohnson: basically because we use chpasswd (and not other keys), that happens in a later cloud-init phase20:31
pedronisif you look at the logs of a successful run20:32
pedronis[  125.319739] passwd[783]: password for 'user1' changed by 'root'20:32
ijohnsonyeah that matches basically what I expected20:32
pedronisbut things work only after I see20:32
ijohnsonI use this cloud-init config locally which doesn't use the chpasswd module:20:33
pedronis[  346.008722] chpasswd[1668]: pam_extrausers(chpasswd:chauthtok): password changed for user120:33
pedronisthat's at 5 minutes in20:34
pedronisand counting20:34
pedronisthe issue afaict is that part of cloud-init runs only after we are seeded20:34
pedronisand seeding is slow20:34
ijohnsonpedronis: I wonder if cloud-init creates the user and the passwd message we see is just because the user was created and the password was set to ""20:34
pedronisyes, something like20:34
ijohnsonbut we should try something like the minimal cloud-init I just pasted instead, perhaps that would run faster as that's just using the users module/key thing20:35
* ijohnson afk for 5ish minutes20:35
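The timing gap pedronis describes can be measured directly from a serial log. A minimal sketch, assuming the log keeps the `[ seconds] message` prefix seen in the two lines quoted above; the helper name `first_timestamp` and the `sample` list are illustrative only and are not part of the snapd test suite:

```python
import re

# Matches the "[  125.319739] " style prefix seen in the quoted log lines.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def first_timestamp(lines, needle):
    """Return the timestamp in seconds of the first line containing needle, or None."""
    for line in lines:
        m = STAMP.match(line)
        if m and needle in m.group(2):
            return float(m.group(1))
    return None


# The two lines quoted in the conversation above.
sample = [
    "[  125.319739] passwd[783]: password for 'user1' changed by 'root'",
    "[  346.008722] chpasswd[1668]: pam_extrausers(chpasswd:chauthtok): password changed for user1",
]

# First password event (likely user creation) vs. the later chpasswd run.
created = first_timestamp(sample, "password for 'user1' changed")
usable = first_timestamp(sample, "pam_extrausers(chpasswd")
gap = usable - created
print(f"user created at {created:.1f}s, password usable at {usable:.1f}s, gap {gap:.1f}s")
```

If `usable` comes back as `None` for a failed run, the VM never reached the chpasswd step before the test timed out.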
ijohnsonmmm now I reproduced a failure on the kernel-reseal nested test21:28
pedroniswith the increased timeout?21:33
ijohnsonit looks like the vm is hard-locked up21:34
ijohnsonqemu is still running21:34
ijohnsonbut many messages like this in the serial log21:34
ijohnson[  649.475297] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]21:34
ijohnson[  649.587280] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [sshd:2191]21:34
pedronisfun, not21:34
ijohnson[  684.935379] rcu: INFO: rcu_sched self-detected stall on CPU21:34
ijohnson[  684.935379] rcu: 1-...!: (14946 ticks this GP) idle=ec6/1/0x4000000000000002 softirq=73107/73107 fqs=29521:34
ijohnson[  684.935379] rcu: rcu_sched kthread starved for 14407 jiffies! g142565 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=021:34
ijohnson[  684.935379] rcu: RCU grace-period kthread stack dump:21:34
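A hung VM like this one can be told apart from a merely slow one by scanning the serial log for the watchdog messages. A rough sketch, assuming the message format matches the two soft-lockup lines pasted above; the function name `find_soft_lockups` is hypothetical and the regular expression covers only this one message shape:

```python
import re

# Pattern modelled on the watchdog lines pasted above; other kernel versions
# may word the message differently.
LOCKUP = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(\d+) stuck for (\d+)s! \[([^\]]+)\]"
)


def find_soft_lockups(lines):
    """Return (timestamp, cpu, stuck_seconds, task) for each soft lockup line."""
    hits = []
    for line in lines:
        m = LOCKUP.search(line)
        if m:
            hits.append((float(m.group(1)), int(m.group(2)), int(m.group(3)), m.group(4)))
    return hits


# Lines taken from the paste above, plus one unrelated rcu line.
sample = [
    "[  649.475297] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]",
    "[  649.587280] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [sshd:2191]",
    "[  684.935379] rcu: INFO: rcu_sched self-detected stall on CPU",
]

lockups = find_soft_lockups(sample)
for ts, cpu, secs, task in lockups:
    print(f"{ts:.1f}s: CPU {cpu} stuck {secs}s in {task}")
```

Any non-empty result means the guest stalled rather than just booting slowly, so raising the test timeout would not help that run.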
cmatsuokaijohnson: hummm... we've seen this before I think21:35
ijohnsonyeah I dunno what's the situation, but I don't see that chpasswd message so it seems that it took too long to get to that point21:35
ijohnsonand it seems that kvm was shut off properly21:36
ijohnsonthis is the qemu cmdline21:36
mupPR snapd#9346 closed: interfaces: builtin: add iotedge interface to support Azure iotedge <Needs security review> <⛔ Blocked> <Created by kubiko> <Closed by kubiko> <https://github.com/snapcore/snapd/pull/9346>21:37
cmatsuokaijohnson:  I think it was when cachio's VMs were rebooting randomly and we considered it could be the watchdog rebooting the system21:37
ijohnsonyeah but this is also on GCE, which had that problem21:37
ijohnsonbut I thought that turning off kvm fixed that problem21:37
ijohnsonwell it fixed the problem of randomly rebooting21:37
ijohnsonnow maybe the problem is just that we get hung and don't get rebooted whereas before we would at least get rebooted21:37
cmatsuokaijohnson: is this happening randomly?21:38
cmatsuokaor did you find a way to make it reproducible?21:38
cachioijohnson, hey21:39
cachio2 cpus?21:39
cmatsuokahmm, -smp 221:39
ijohnsoncmatsuoka: it is random, I ran the same branch successfully twice before it failed21:39
cachioI'll try to reproduce it21:39
ijohnsoncachio: should we not be using 2 cpus ?21:39
ijohnsonthis is for nested/core20 suite21:40
cachioit shouldn't be a problem21:40
cachiojust when you have kvm enabled could be a problem21:41
cachiobut it is not the case21:41
ijohnsoncachio: kvm is disabled in this case21:41
ijohnsoncachio: this is from running mborzecki's branch on PR https://github.com/snapcore/snapd/pull/934321:42
mupPR #9343: tests: more logging for UC20 kernel test <Run nested> <Test Robustness> <UC20> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/9343>21:42
cachioijohnson, ah, nice, I'll use it21:43
ijohnsoncachio: also I reproduced the issue I mentioned earlier around lunch time21:44
ijohnsonI don't know a root cause yet, but somehow something deleted our sudoers entries21:44
cachioI ran many times and all of them passed21:44
cachioI saw that many times until I added sync before stopping the vm21:45
ijohnsonyeah maybe sync just makes it less likely?21:46
ijohnsonwhat's weird is that I can see in the spread prepare output it created the users and they executed sudo commands successfully21:46
ijohnsonI wonder if we need to run sync inside the VM as well as outside the VM too21:46
cachioijohnson, perhaps21:47
cachioI can try that21:47
cachioI am preparing a new pr with some improvements21:47
ijohnsoncachio: thanks I'm gonna move on and debug some other things21:47
cmatsuokamaybe the slow system under stress is confusing rcu?21:50
cmatsuokaah we really have cpu soft lockups21:54
pedronislooking at the logs it seems resealing itself is very slow (but we would need to add dedicated logging to be sure)21:56
ijohnsoncmatsuoka: this is the full log I have from that VM before I gave up and killed it https://pastebin.ubuntu.com/p/2w3XvXBh4B/22:06
* cmatsuoka verifies...22:06
cmatsuokaijohnson: is this the kernel-reseal test? any special command line to run it?22:18
* cmatsuoka running the test...22:20
ijohnsoncmatsuoka: yes that's the test22:22
ijohnsonno, just whatever's in that branch right now is what I ran22:22
cmatsuokagce seems especially slow today22:23
ijohnsoncachio: could you please look at pr 9347 quickly tonight? it is small and I would like to try and merge that tonight if possible so it's ready for folks tomorrow morning22:23
mupPR #9347: tests/lib/nested.sh: use more focused cloud-init config for uc20 <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9347>22:23
cmatsuokacachio: any known problem with arch tests and cgroups recently?22:26
mupPR snapd#9347 opened: tests/lib/nested.sh: use more focused cloud-init config for uc20 <Created by anonymouse64> <https://github.com/snapcore/snapd/pull/9347>22:27
cmatsuokadinner, then will check test results22:37
cachioijohnson, sure22:47
cachiocmatsuoka, didn't see errors, do you have a log?22:47
cachioijohnson, +1, now let's wait for test results22:58
ijohnsoncachio thanks but very confusing that the nested runs are all done because it says they were cached?22:59
ijohnsonThis branch has never been run before it was opened less than an hour ago22:59
ijohnsonI have to EOD now but maybe someone should look into that22:59
cachioijohnson, I added the run nested tag23:02
cachiocould you please re-push on that branch23:02
cachioso we force the nested tests to be executed at least once for that PR?23:02
ijohnsonOh I forgot to add the tag23:02
ijohnsonSorry let me close and reopen23:03
cachioijohnson, tx23:03
cmatsuokacachio: it worked now, it was just a random failure I guess23:06
cmatsuokacachio: thanks23:06
cachiocmatsuoka, ok23:10
cachiocmatsuoka, please send me a log in case you see it again23:10
