[02:19] PR snapd#9235 opened: tests: new tests for nested and external [05:00] PR snapd#9233 closed: vendor: run ./get-deps.sh to update the secboot hash [05:23] morning [05:30] PR snapd#9232 closed: run-checks: check for dirty build tree too === blackboxsw_ is now known as blackboxsw_eod [05:39] o/ [05:39] mvo you're up early [05:43] * zyga leaves the macbook to charge and goes upstairs to do reviews from the imac [05:52] mvo: zyga: hey [05:53] hey zyga and mborzecki [06:05] PR snapd#9236 opened: kernel: remove "edition" from kernel.yaml and add "update" [06:07] heh, in the 15.2 images we have there's no daemon user/group [06:11] zyga: i think we need to fix the rpm spec on openssue and add a dependency on system-user-daemon (as daemon is needed for system usernames) [06:12] zyga: idk why but installing lsb release does not seem to pull in that package (even though daemon user is required by lsb?) [06:14] thanks for this [06:14] I mean, thanks for looking/fixing this [06:16] mborzecki: quick question - gadget updates are already smart in the sense that there is no file update if the hash is the same, correct? [06:16] mvo: yes [06:16] ta [06:17] * mvo still scratches head over how to best apply the kernel-dtb refs [06:18] mvo: this is the bit that does that https://github.com/snapcore/snapd/blob/master/gadget/mountedfilesystem.go#L763 [06:18] mborzecki: ta [06:26] zyga: thanks for the review comments. Since you're the expert on process tracking, my spread test is failing on Ubuntu 14.04, where it doesn't seem to be able to identify a process from a classic confined snap. Is that a known limitation? [06:45] jamesh, yes, 14.04 has no tracking [06:45] jamesh we could fake it I mean [06:45] we can expand tracking to use apparmor labels [06:46] that would cover 14.04 [06:46] by fake it, I mean we could be better while not being really equal to 16.04+ [06:46] mborzecki sounds good (dependency change) - I have some changes I didn't merge into master, related to suse packaging [06:46] zyga: weird that the rest of the test using strict confinement passed then. [06:47] jamesh, hmmm, but even with classic confinement, we should be setting an apparmor label [06:47] zyga: i have a fix to push to #9218 [06:47] PR #9218: tests: running tests on opensuse leap 15.2 [06:47] zyga: I could probably just skip the test on 14.04 [06:47] jamesh which exact test is that? [06:47] jamesh I honestly think that's best [06:47] 14.04 is not a fully supported release [06:49] zyga: the spread test in https://github.com/snapcore/snapd/pull/9132: the "exit_staus 11 ..." test at the end to check against a classic process returns 10 (which is "not snap confined") [06:49] PR #9132: o/hookstate/ctlcmd: add optional --pid argument to "snapctl is-connected" [06:49] jamesh I'll look [07:04] morning [07:07] * zyga small errand [07:07] hey Pawel [07:18] good morning pstolowski [07:26] zyga: can you take a look at? [07:26] zyga: pff, forgot to paste the link ;P https://github.com/snapcore/snapd/pull/9218 [07:26] PR #9218: tests: running tests on opensuse leap 15.2 [07:33] eh, snap-logs test failures seem related to my services changes but i don't understand how [07:34] did anyone see this test failing on master recently? [07:46] mborzecki looks sensible [07:46] pstolowski: snap logs? [07:47] pstolowski: there's a PR from ijohnson with a bug fix, do you have a log? [07:47] mborzecki: yes sna logs [07:47] pstolowski: this PR from ian: https://github.com/snapcore/snapd/pull/9234 [07:47] PR #9234: systemd/systemd.go: support journald JSON messages with arrays for values [07:48] related? [07:49] mborzecki: ah, interesting! thanks, could be [07:49] going to test with my branch [07:51] this test often fails on snap logs -f ... check (no match for the expected log message), but snap logs -n=all does include it [08:37] quick errand, back in 30 [09:04] pstolowski: now the service PR is ready for re-review, right? (not sure I'll get to it today though) [09:05] pedronis: i'm testing ijohnson's fix for log parsing to see if it fixes snap-logs test failures in my PR, i'll know shortly [09:15] re [09:16] mborzecki: hi, is #9201 still a draft or should I re-review it? [09:16] PR #9201: boot: observe update & rollback of trusted assets [09:16] pedronis: i converted it to non-draft [09:17] pedronis: would appreciate if you could take a look, and maybe we can discuss things if needed [09:17] ok [09:21] mvo: can you force merge https://github.com/snapcore/snapd/pull/9218 ? the failure on 18.04 is unrelated [09:21] PR #9218: tests: running tests on opensuse leap 15.2 [09:25] mborzecki: sure [09:25] mborzecki: also reviewed/approved the other pr [09:26] mvo: thanks! [09:26] PR snapd#9218 closed: tests: running tests on opensuse leap 15.2 [09:44] pedronis: Ian's fix didn't help with my issue with snap-logs, probably makes sense if you wait till i figure out what's going on [09:55] pstolowski: ok, it wasn't yet at the top of my queue anyway [10:00] ha https://pastebin.ubuntu.com/p/v9G8ZMsw2c/ [10:01] journalctl seems to be skipping messages if we ask for multiple units [10:28] macos github job failing? [10:34] huh https://github.com/snapcore/snapd/runs/1041031007 [10:35] go get golang.org/x/sys/unix apparently fails there [10:39] morning folks [10:40] anybody have any ideas why I would be getting `cannot install "generic-classic" fallback model assertion: no matching public key "Lcw4MOHeD6yUabUqD_Gd5BwBGuJBB6t8jDV42wg02jv5XikFZoe6CBMJAo5pTbrW" for signature by "generic"` for unit tests I didn't change in o/devicestate ? [10:40] pedronis: maybe ^ ? [10:56] hi ijohnson [10:57] hey pstolowski [10:57] more fun with journald I see ? [10:57] ijohnson: yes [10:58] the fun never ends! [11:01] ijohnson: where is that? it seems an assertion db not configured right [11:01] pedronis: it comes from the firstBoot16Suite.TestPopulateFromSeedOnClassicNoop unit test [11:02] this only happens when I enable my 2nd suite in devicestate_cloudinit_test.go for uc20 variants [11:02] where my uc20 suite inherits from the devicemgrbasesuite [11:02] ijohnson: ok, something is not cleaning up correctly I suppose [11:03] pedronis: what's most confusing is that I have tried even deleting all of the setup code from my uc20 suite, and simply calling the device mgr base suite setup is enough to trigger this other unit tests to fail [11:03] pedronis: do you want me to push a wip branch to look at the tests ? [11:04] ijohnson: I suppose [11:04] one moment [11:10] pedronis: sorry still a wip commit, but here is my current branch: https://github.com/anonymouse64/snapd/commit/8c17373ac055a1d0ef90f99f3bc8a566ed594b95 [11:10] pedronis: if you just run all the tests in o/devicestate you should see the issue with all of my various print statements [11:10] * ijohnson afk for quick breakfast [11:15] 2020-08-28T10:15:41.3077042Z - Fetch and check assertions for snap "test-snapd-content-plug" (5) (cannot fetch assertion: got unexpected HTTP status code 408 via GET to "https://api.snapcraft.io/api/v1/snaps/assertions/snap-revision/UzfhJIDc1A04xUNzbAEwvu2_aCeT3FxOvo5tsKFG2V9ZzssK_ZIaEiRAFh8RdNus?max-format=0") [11:15] pedronis: still getting 408 from time time apparently ^^ [11:15] yea [11:27] ok I'm back [11:30] pstolowski: 9211 looks good, please let me know if you need a force-merge or if the spread issues are real [11:39] mvo: could I ask for your super powers on https://github.com/snapcore/snapd/pull/9225#issuecomment-682478359 ? [11:39] PR #9225: many: cloud-init cleanups from previous PR's [11:39] the comment I just linked to explains the various spread failures, some of which are already fixed in master [11:39] none of which are related to the PR in question [11:44] huh I think the issue I was having is that my cloudInitUC20Suite embedded the cloudInitBaseSuite struct, which embeds the deviceMgrBaseSuite, and so it inherited a SetUpTest from that, but then I also called setup on deviceMgrBaseSuite again separately, so setting my test up twice caused a different test to fail [11:45] pedronis: ^ [11:46] mvo: yes i think you can merge [11:46] ijohnson: yes, I see some setup being run twice, is that the problem though? [11:47] pedronis: I am not 100% sure, but I am going to try with a small patch on master to see if I can reproduce with just calling deviceMgrBaseSuite's setup twice on a single test [11:47] ijohnson: anyway fwiw the line that is causing trouble is s.AddCleanup(sysdb.MockGenericClassicModel(s.storeSigning.GenericClassicModel)) [11:47] yes [11:48] sorry I meant interesting [11:48] it seems is not undone properly ? [11:48] hmm [11:49] ah [11:49] it's funny probably [11:49] in a terrible sort of funny way [11:49] let me see if I'm right [11:58] ijohnson: yes, the issue is the double setup, the problem with that is among other things that the first setup is never undone, because the 2nd setup will simply reset the cleanup function list [11:59] pedronis: hmm that's weird, should we fix that ? [12:01] ijohnson: I don't know if it's weird or not, it's one way to implement this, you need to stare at testutils/base.go [12:02] uhh [12:02] AddCleanup just appends the handlers, so the handler should just get called even if it's appended twice ? [12:02] oh I wonder if it's called in the wrong order [12:02] maybe AddCleanup should be a stack [12:02] well, the order is weird as well [12:02] but that's the problem [12:02] *use [12:02] your second setup [12:02] calls the BaseTest.Setup [12:03] that's just forgets everything [12:03] ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh [12:03] yes you're right [12:03] I see it now [12:03] SetUpTest is not really intended to be called twice [12:03] in general I mean [12:03] yes, I'm not sure how I accidentally got to that point [12:04] but I think it's clear to me that as long as we don't use BaseTest.SetUpTest twice things are fine [12:04] so that was my original problem [12:04] yes [12:04] ok, good to know [12:04] but I agree that the order we call the cleanup is weird [12:04] it should be a stack [12:04] I will fix up this commit and open the PR shortly [12:04] I don't know why is like that [12:04] probably just because it was easy [12:04] last change to that file was 4 years ago [12:05] anyway I don't think SetUpTest should need to cleanup the handlers, is not a map, otoh maybe it should panic if the handlers are not empty already [12:06] I like that change a lot actually [12:06] would have made it super obvious my error very quickly [12:06] maybe I should try to do that in a separate PR and fix the order, and see if anything breaks [12:06] I will propose that in a separate PR I think [12:06] or you can [12:06] well I will work on that after opening the cloud-init PR [12:06] so if you want to do it first go for it [12:08] ok, I'll try to do that just now [12:12] ijohnson: we have some tests that fail interestingly [12:12] zyga, hi [12:12] hi [12:12] oooh fun [12:12] morning cachio and zyga [12:12] nice find on non-stack cleanup for base test [12:12] ijohnson, good morning [12:12] and the related issues [12:13] maybe base test should panic if invoked incorrectly [12:13] ijohnson: one is cloud init, was the problem already on master? [12:13] zyga, yesterday implented what you sent and that works, the session is created [12:13] pedronis: no, there shouldn't any problems on master [12:13] well I don't think so [12:13] zyga, during the test the sesions is there [12:13] pedronis: what test failed ? [12:13] ijohnson: well it's panicing [12:13] err how did it fail [12:14] zyga, but still fails when we do "systemctl --user daemon-reload" [12:14] well I added code to panic if cleanup handlers are not empty in setup [12:14] what I said before [12:14] :-( [12:14] ok, I will have a look [12:14] cachio what do you mean "fails", what fails when you do this? [12:15] zyga, 1 sec [12:15] ijohnson: I see issues with repair, something in cmd/snap (just one test, that is weird) [12:15] and then quite a few in devicestate [12:15] the rest is fine [12:17] zyga, https://paste.ubuntu.com/p/PPYFgSQy69/ [12:17] fail like this [12:17] cachio before the failure, do we still have the seesion? [12:17] zyga, yes [12:17] are you absolutely sure? [12:18] when it fails we have the session in the debug session [12:18] the root session? [12:18] yes [12:18] can you check the status of user@0.service [12:18] zyga, sure [12:19] pedronis: I don't understand how, but if I make cloudInitSuite embed a pointer to deviceMgrBaseSuite and initialize that pointer in cloudInitSuite.SetUpTest, then the panic goes away [12:19] pedronis: I have seen other test suites that inherit base suites do that too, so perhaps that's the thing to do [12:20] otoh, I have seen the pattern of not defining SetUpTest for the base suite, and instead having something like setup() for the base suite which needs to be called from the final suite's SetUpTest [12:21] ijohnson: so some tests call SetUpTest on their own, fun [12:21] without TearDown [12:21] ugh what [12:22] I'm still confused how this happens for i.e. the cloudInitSuite since we aren't manually calling SetUpTest there [12:22] well, I'll have to stare it myself in a bit [12:23] ok, well we know how to make the panic's go away [12:23] * ijohnson returns to prepping the cloud-init PR [12:24] zyga, https://paste.ubuntu.com/p/vvxJCWCr67/ [12:24] cachio and systemctl --user status [12:25] (note: assuming this is invoked by root) [12:25] not by the external user [12:25] Failed to connect to bus: No such file or directory [12:25] strace -f systemctl --user status [12:25] and check which connect fails [12:27] I am in core16 [12:27] dont have strace [12:27] is there an y snap instead? [12:27] we have strace static [12:27] try that [12:29] this is hte output https://paste.ubuntu.com/p/KjKMz6s5g9/ [12:30] cachio run strace-static directly, not as a snap please [12:31] https://paste.ubuntu.com/p/w5krDr87Np/ [12:32] like that? [12:34] no [12:34] that's not any different [12:34] directly without snap run pipeline [12:38] zyga, https://paste.ubuntu.com/p/VGQYZKsXP5/ [12:39] open("/sys/fs/kdbus/1001-user/bus", O_RDWR|O_NOCTTY|O_CLOEXEC) = -1 ENOENT (No such file or directory) [12:39] this is not running as root for real [12:39] are you logged in as the external user? [12:39] https://paste.ubuntu.com/p/t3sHWqj9CC/ [12:40] we login as external [12:40] using sudo / su will again bring back the same issues [12:40] it's not the same [12:40] as logging in as root [12:40] through ssh [12:40] then we make sudo to run the test script [12:40] hmm? [12:40] we cannot do that [12:40] so we should run with runuser? [12:40] we must use what I provided to run the script [12:40] runuser won't work directly [12:41] do you understand what the problem is? [12:41] partially [12:41] the script cannot run in the session of the external user which just happens to elevate to root [12:41] our tests do not cope with that [12:42] my suggestion was to run the whole test this way [12:42] "this way" is the way we talked about yesterday, with the service and pam name [12:42] this then runs in an environment very similar to root ssh-ing into the system [12:43] cachio perhaps we can log in as external [12:43] install a ssh key for root [12:43] and then go back to running as root [12:43] is it possible to do that in core syste? [12:43] to add a ssh key for root? [12:44] can you write to /root/.ssh/? [12:44] if so then yes [12:44] I mean... [12:44] root is not read only :D [12:45] zyga, but to connect to the system using spread a password we need for root [12:46] because spread just connects using password [12:46] something has to change [12:46] what we have doesn't work [12:46] we just need to pick the most sensible thing to alter [12:47] zyga, so [12:47] some time ago I did a change [12:47] in sporead to support connecting using ssh keys [12:47] if we could add that to spread we could connect using root [12:47] and the problem is solved [12:48] right? [12:48] if we can connect as root over ssh then yes [12:48] however that is done [12:48] cachio could we hack it like this? [12:48] add an extrauser [12:48] root-with-password [12:48] have it have uid 0 [12:48] and home /root [12:49] but have a password? [12:49] we need password for spread [12:50] right, but isn't that what I just said? [12:52] I though it was a question [12:52] cachio? [12:53] 1 sec [12:55] can do that [12:55] Warning: The home dir /root you specified already exists. [12:55] adduser: The UID 0 is already in use. [12:57] cachio you can add it as another uid [12:57] and just sed it to zero [12:57] it's just the tool that complains [12:58] zyga, cant create the home /root [12:58] because it already exists [12:58] use whatever directory and edit the created user [12:58] you can just edit the text file [12:59] you can even just append to [13:02] PR snapd#9237 opened: [RFC] many: enable cloud-init on uc20 for grade signed and secured [13:03] * zyga will go AFK in about 10 minutes [14:04] errands, bbl [14:23] ijohnson: so on master things are clean for me now [14:23] devicestate was in a very messy state though [14:32] PR snapd#9225 closed: many: cloud-init cleanups from previous PR's [14:39] ijohnson, is this the same error you fixed a time ago for uc20-recovery test? https://paste.ubuntu.com/p/C94XTPBTWG/ [14:40] cachio: could be, will look in little bit === blackboxsw_ is now known as blackboxsw [14:55] * cachio lunch [14:55] pedronis: so are you going to prepare a PR to make the panic change ? [14:55] ijohnson: yes, about to propose it actually [14:55] pedronis: also if you could give a quick review to #9237 on the general direction and options names and such I can split it up [14:55] PR #9237: [RFC] many: enable cloud-init on uc20 for grade signed and secured [14:55] thanks [14:56] I guess names of things can be separate PR's but the overall interaction between devicestate and sysconfig.ConfigureRunSystem via the options is more what I'm curious about [14:56] ijohnson: I have a meeting now but I'll try to look after [14:57] k, thanks [14:57] ijohnson: https://github.com/snapcore/snapd/pull/9238 is the PR, likely it might conflict with yours [14:57] PR #9238: many: check that users of BaseTest don't forget to consume cleanups [14:57] PR snapd#9238 opened: many: check that users of BaseTest don't forget to consume cleanups [14:57] pedronis: sure no problem, the RFC one won't be mergable right away unless one big PR review is easiest [15:07] PR snapd#9239 opened: many: misc doc-comment changes and typo fixes [15:11] pedronis: your hunch about systemd & my --root change was good... this is peculiar: https://pastebin.ubuntu.com/p/yMXYd93ZQZ/ [15:14] also funny how the message is slightly different [15:16] pedronis: I de-conflicted things so that #9237 doesn't need your PR and doesn't conflict with it afaict and updated the PR description to reflect this [15:16] PR #9237: [RFC] many: enable cloud-init on uc20 for grade signed and secured [15:16] the basic problem is we set rsyslog in gagdet defaults (which makes no sense on core18 but because of emulation mode), but then we fail on normal "snap set.. ", but that was masked with --root=.. as my pastebin demonstrates, because it prints error but gives exit 0 [15:17] pstolowski: bash exit codes strike again? [15:17] ijohnson: what about bash exit codes? [15:17] PR snapd#9240 opened: o/devicestate/devicestate_cloudinit_test.go: test cleanup for uc20 cloud-init tests [15:17] pstolowski: you said "because it prints error but gives exit 0" [15:18] ijohnson: see https://pastebin.ubuntu.com/p/yMXYd93ZQZ/ [15:18] just curious if that was related to all the random bash issues that zyga has discovered recently [15:18] ijohnson: ah got you. no it's not bash, it' ssytemctl behavior [15:18] ah great [15:18] that's so silly [15:18] ijohnson: note the slightly different spelling of the error message... [15:18] --root=/ makes it exit with code 0 and not that option makes it exit with code 1 [15:19] yet it means the same [15:19] oh ffs [15:19] that's ridiculous [15:21] "error but not really" [15:26] it's also the case on 20.04 [15:40] mmm how do we do tests for grade signed images in spread with custom snapd snaps [15:40] because grade signed model assertions don't allow unasserted snaps [15:40] pedronis: thoughts ? [15:43] cachio: do you have more logs from that uc20-recovery failure? it seems that it should have gone back to run mode but it didn't so that seems like a real test failure [15:47] PR snapd#9241 opened: tests: do not set rsyslog.disable in core18 config defaults test [15:47] i see no other option than ^ [15:53] ijohnson: there is nothing easy we can do, what kind of modifications do you need? [15:53] ijohnson: also I did a pass on #9237 [15:53] PR #9237: [RFC] many: enable cloud-init on uc20 for grade signed and secured [15:54] hmm i wonder if we should maybe ignore error on disable in configcore [15:56] pstolowski: well, if I understand the issue is that we trying to disable a service that doesn't exist? [15:57] and in that case --root makes a difference [15:58] pedronis: yes, we do that because test setup is wrong, but it is masked because with --root systemctl returns 0 [16:02] pedronis: I'm trying to write a spread test of the grade signed cloud-init changes [16:02] pedronis: but how do I test grade signed with an unasserted custom snapd snap [16:06] pstolowski: systemd seems to be using a special exit code to mean no unit [16:07] but is not consistent [16:07] pedronis: yes [16:08] back from PT [16:08] I need to catch up and cover this time, I'll be here for a few hours at least [16:09] ijohnson: we can only write a test using store snaps , so I'm asking what changes do we need? otherwise we can have a chat but the solutions are slightly horrible [16:10] pedronis: well if we accept that we can only write cloud-init spread tests for grade signed, then we don't need any changes [16:10] pedronis: it just means that getting all the bits correct for uc20 cloud-init will take more time as I can't verify anything with an integration test [16:10] I can't parse that sentence [16:11] pedronis: sorry let me rephrase [16:11] pedronis: I would like to write a spread test like we have for uc16/uc18 cloud-init but for grade==signed uc20 [16:11] pedronis: but I can't write this spread test today because there is no snapd snap in the store with my changes I made [16:11] cachio any luck? [16:11] ijohnson: can we have a quick chat? [16:11] pedronis: sure [16:12] pedronis: I'm in SU HO [16:12] pstolowski: it seems that systemctl status foo uses exit code 4 when the unit is not there, maybe we should teach systemd.Status to detect that and then use Status in the loop to know if a service exits at all [16:13] ans just skip doing anything with it [16:13] if it doesn't [16:13] pedronis: to be clear about systemctl issue, this is not about emulation mode (we use it when we should); this is about running for real, per man page the mere presence of --root bypasses systemd (and apparently changes semantics of error code) [16:14] pedronis: yeah, probably [16:14] yes, that's why I'm suggesting something slightly complicated [16:14] I don't think we want to just ignore errors without understanding them [16:14] fair point [16:18] i'll address this on monday, thanks [16:18] have a great weekend everyone! [16:54] ijohnson, hey, do you know if there was any change that could produce this cd: /home/gopath/src/github.com/snapcore/snapd/tests/core/uc20-recovery: No such file or directory [16:54] I think it is the first thing you fixed [16:54] for this test [16:58] cachio: I asked earlier before you disconnected [16:58] cachio: do you have more logs from that run ? [16:59] no, but I can reproduce it [16:59] cachio: what seems to have happened is a real test failure, where we went to reboot and should have gone back to run mode, but we didn't and went back to recover mode [16:59] cachio: if you could get me the full system journal that would be great [16:59] and give you a shell there [16:59] I need to go break for lunch now, but will be back in a little bit to debug this further with you === ijohnson is now known as ijohnson|lunch [16:59] ijohnson|lunch, sure [17:07] #9201 needs 2nd reviews [17:07] PR #9201: boot: observe update & rollback of trusted assets [17:17] * zyga grabs coffee === ijohnson|lunch is now known as ijohnson [17:51] pedronis: yes I can take a look, also should I look at the cups PR again today ? [17:54] ijohnson: if you can (abou cups) [17:54] pedronis: sure [17:57] ijohnson, I have an instance ready [17:57] ijohnson, which already failed [17:58] you need to be connected to the vpn [18:52] cmatsuoka: ijohnson: do you need anything more from me today? [18:52] pedronis: no I'm good [18:52] * ijohnson is staring at fakestore [18:52] * ijohnson to some useful effect however [19:10] pedronis: I'm trying the bootloader chain composition but found some problems [19:11] pedronis: mainly related to being unable to, after composing the run mode chain, map entries to the asset cache [19:11] (because we don't know from the final chain which parts came from each bootloader) [19:12] in my previous implementation I was doing this mapping before composing the run mode chain [19:13] I could do the same thing but it seems to me that the bootloader shouldn't be cache-aware [19:14] anyway, I'll think a bit about the in-between degraded upgrade states before returning to the boot chains [19:24] cmatsuoka: I see two approaches to that we include the root path in the assets names then you can find which partition/bootloader they belong to from that, or we return just names (that is what we need) and have a separate flag/marker to know which role/partition they are on [19:25] pedronis: just the name with an origin mark should solve it [19:25] pedronis: thanks [21:04] * cachio afk