=== mup_ is now known as mup === Eickmeyer is now known as Eickmeyer-Quasse === Eickmeyer-Quasse is now known as Eickmeyer[q] [05:50] morning [06:20] o/ [06:22] zyga: hey [06:22] zyga: some trouble with the cla-check job [06:22] uc20-snap-recovery failed [06:22] zyga: where? [06:22] but it ran on 19.10 [06:22] https://github.com/snapcore/snapd/pull/8440/checks?check_run_id=569193826 [06:22] PR #8440: github: move spread to self-hosted workers [06:23] zyga: uh, merge master [06:23] is that even expected? [06:23] known issue? [06:23] zyga: yes, it's fixed already [06:24] k [06:25] how did cla check fail? [06:25] it passed on my branch just now [06:25] 38seconds [06:26] meanwhile, travis is broken [06:26] https://t.co/h3UEAleWVW?amp=1 [06:27] I think I can just go back to bed [06:27] zyga: if you open a PR with a commit right on top of the master so that no merge commit is generated it will fail [06:27] I see [06:57] PR snapd#8439 closed: secboot: import secboot on ubuntu, provide dummy on !ubuntu [07:01] morning [07:01] good morning pstolowski [07:02] zyga: quick question, do we have a 32bit machine in travis actions? [07:03] mvo: travis actions? [07:03] zyga: sorry, gh actions [07:03] mvo: as I said yesterday I didn't add a 32bit xenial machine to github actions [07:03] mvo: though it's a one-liner in the matrix, it slipped through the cracks in the initial PRs [07:04] good morning :) [07:04] last night store went belly up [07:04] and everything running failed one way or another [07:04] so I just called it quits and went to sleep (too late anyway) [07:05] mvo: pstolowski: hey [07:05] PR snapd#8455 opened: tests/lib/cla_check: expect explicit commit range [07:05] zyga: can we skip the spread jobs? [07:05] mborzecki: in principle yes but it's not something we coded, we should try that if: ... expression I pasted before [07:05] one sec [07:05] maybe add that to your PR [07:06] contains(github.event.issue.labels.*.name, 'skip-spread') or somesuch? [07:06] yes [07:06] if: !contains ... [07:06] idk tho, just copied and pasted from the docs :P [07:06] :) [07:06] I tried to get https://github.com/snapcore/snapd/pull/8440 green [07:06] PR #8440: github: move spread to self-hosted workers [07:07] but each time something random failed [07:07] PR snapd#8456 opened: tests: add 32 bit machine to GH actions [07:07] some desktop service, some store bits, some reboot tests [07:07] so tough luck [07:18] mvo: could you please merge https://github.com/snapcore/snapd/pull/8454 [07:18] PR #8454: tests/session-tool: session ordering is non-deterministic [07:22] zyga: hm the docs are kinda meh [07:23] PR snapd#8457 opened: github: skip spread jobs when corresponding label is set [07:32] mborzecki: interesting, except that the status check is required [07:32] mborzecki: perhaps instead wrap that in ${{ }} [07:32] and have the worker essentially do nothing? [07:33] mborzecki: ${{ .. }} is required in run blocks [07:33] zyga: hm which pr? [07:33] your pr [07:33] there's 2 ;) [07:33] 8457 [07:33] and there's a syntax error [07:34] I would drop the first part [07:34] as all events are pull reqeusts [07:34] let me pull the docs [07:35] if: contains(github.event.pull_request.labels.*.name, 'Skip spread') [07:35] then just negate [07:35] if: !contains(github.event.pull_request.labels.*.name, 'Skip spread') [07:35] but as we learned, that should not go into if because then the status check wont report [07:35] so maybe: [07:36] run: | echo ${{ !contains(...) 
}} [07:36] and see what that prints (probably true as that is just js) [07:36] heh [07:36] then wrap that into a shell [07:36] and should be good [07:36] i mean, wtf are the docs about labels? [07:36] they are there [07:36] hold on [07:36] it's somewhat confusing because they are not in the action docs [07:36] but in the bigger github docs [07:36] the whole object model is documented [07:37] https://developer.github.com/v3/issues/labels/ [07:37] by doing ${{ ... }} you're effectively tapping into that [07:38] zyga: the pull request event is this: https://developer.github.com/v3/activity/events/types/#pullrequestevent doesn't list the label there but it's in the example [07:38] and it's an empty array [07:39] however, there's actually an example in the issues event payload [07:39] https://developer.github.com/v3/pulls/ has the labels listed [07:40] zyga: re 8454 sure, I will merge once the spread tests finished, they are still running [07:40] thanks [07:41] one test already failed [07:41] on portal info [07:41] zyga: oh, ok. is james aware of the flakiness here? [07:41] I don't know [07:41] it's in spread-unstable so perhaps nobody noticed? [07:42] jamesh: can you please check if this is expected [07:42] aha, could be [07:42] https://github.com/snapcore/snapd/pull/8454/checks?check_run_id=569207445#step:4:814 [07:42] PR #8454: tests/session-tool: session ordering is non-deterministic [07:43] fedora failed to prepare, network error [07:44] zyga: idk, i think that the labels is not actually included there [07:44] mborzecki: where specifically? [07:44] zyga: is the pull_request object is the same as pull_request in https://developer.github.com/v3/activity/events/types/#pullrequestevent then the label is not htere [07:44] but should be? [07:44] idk [07:44] pull request *event* [07:44] zyga: It isn't expected. If you're seeing this error, then it can't map the process ID to a snap via cgroups [07:44] refers to pull request [07:45] that has labels [07:45] jamesh: fun, I guess it is debug time then [07:56] mvo: https://github.com/snapcore/snapd/pull/8456/files [07:56] is the vendor change expected? [07:56] PR #8456: tests: add 32 bit machine to GH actions [07:56] mvo: https://github.com/snapcore/snapd/pull/8440 is green [07:56] PR #8440: github: move spread to self-hosted workers [07:56] but let's chat about that in the call [07:56] zyga: don't think that check works https://github.com/snapcore/snapd/pull/8457 looks like the spread jobs are still schedule [07:56] PR #8457: github: skip spread jobs when corresponding label is set [07:56] d [07:57] mborzecki: how do you determine that? [07:57] mborzecki: they are required, so they are marked as expected [07:57] mborzecki: note that normally you don't get any jobs until the previous pass is successful [07:58] so I don't believe this is accurate as measurement [07:58] ok, let's wait then [07:58] zyga: yeah [07:59] ah [07:59] I see the 2nd commit now [07:59] cool [07:59] thanks [08:11] PR snapd#8440 closed: github: move spread to self-hosted workers [08:15] mborzecki: one option would be to move the if: clause down to the step level [08:16] zyga: have you seen the 'cancel workflow' request to have any effect? 
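The portal-info failure jamesh explains above comes down to mapping a process ID to a snap via cgroups. A minimal sketch of that kind of lookup, assuming snapd-style `snap.<snap>.<app>…` scope names and the standard /proc/<pid>/cgroup layout; the function name and details are illustrative, not snapd's actual implementation:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// snapNameFromPid is a rough sketch (not snapd's real code) of mapping a
// process to a snap by scanning /proc/<pid>/cgroup for a path element that
// looks like "snap.<snap>.<app>...", which is how the transient scopes and
// services created for snap applications are named.
func snapNameFromPid(pid int) (string, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/cgroup", pid))
	if err != nil {
		return "", err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each line is "<id>:<controllers>:<path>".
		fields := strings.SplitN(scanner.Text(), ":", 3)
		if len(fields) != 3 {
			continue
		}
		base := filepath.Base(fields[2])
		if strings.HasPrefix(base, "snap.") {
			// e.g. "snap.firefox.firefox-<uuid>.scope" -> "firefox"
			parts := strings.Split(base, ".")
			if len(parts) >= 3 {
				return parts[1], nil
			}
		}
	}
	if err := scanner.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("cannot map pid %d to a snap via cgroups", pid)
}

func main() {
	name, err := snapNameFromPid(os.Getpid())
	fmt.Println(name, err)
}
```

If no cgroup path element carries a snap security-tag-like name, the lookup fails, which is the situation behind the "cannot map the process ID to a snap" error discussed above.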
[08:16] jamesh: supposedly job level `if` is supported now https://github.blog/changelog/2019-10-01-github-actions-new-workflow-syntax-features/ [08:16] mborzecki: it's not quite as efficient since a job would still be sent to a runner, but it would mean the job would be considered successful [08:17] unless it isn't :/ idk, maybe i just need to wait [08:17] mborzecki: yes, but if the conditional causes the job not to run, then it isn't considered successful [08:18] if you want to get rid of the "Some checks haven’t completed yet" message, the jobs need to at least do something [08:25] mvo: there's a problem with the -32 bit build [08:25] src/github.com/snapcore/snapd/vendor/github.com/chrisccoulson/go-tpm2/mu.go:267:17: constant 4294967295 overflows int [08:26] chrisccoulson: ^ FYI [08:26] mborzecki: IIRC cancelling works but spread doesn't cancel and the worker is killed [08:33] zyga: I know, I updated the PR that adds 32bit works, it should have a fix [08:34] maybe the hash is wrong? [08:34] zyga: oh, let me double check :( [08:34] zyga: could be that govendor confused me [08:34] when you push again merge master please [08:35] zyga: sorry, I'm an idiot, I updated go-tpm instead go-tpm2 [08:35] * zyga hugs mvo [08:36] https://github.com/snapcore/snapd/pull/8403 needs a 2nd review [08:36] PR #8403: sandbox/cgroup: avoid making arrays we don't use [08:37] it failed on store traffic: - Fetch and check assertions for snap "test-snapd-content-slot-no-content-attr" (1) (error reading assertion headers: read tcp 10.240.1.50:58298->91.189.92.20:443: use of closed network connection (Client.Timeout exceeded while reading body)) [08:41] PR snapd#8458 opened: github: allow cached debian downloads to restore [08:41] jamesh: https://github.com/snapcore/snapd/pull/8458 [08:41] PR #8458: github: allow cached debian downloads to restore [08:41] this should fix the cache [08:42] though I think it looks only in the scope of the PR, there's still more opportunity to cache things than we exploit [08:42] (caches are associated with objects and are not global) [08:44] brb [08:46] I suspect caches are probably scoped to the (repo, user) pair [08:53] * zyga monitors https://github.com/snapcore/snapd/actions?query=is%3Aqueued [09:01] PR snapd#8421 closed: tests: enable unit tests on debian-sid again [09:03] mvo: that seems to have fixed things [09:03] oh, I spoke too soon [09:03] mvo: src/github.com/snapcore/snapd/vendor/github.com/snapcore/secboot/utils.go:73:37: cannot call non-function he.TPMError.Code (type tpm2.ErrorCode) [09:03] I think this commit is not good :/ [09:04] why didn't this get flagged by the unit test run? [09:04] are we not building / testing secboot? [09:04] ahh wait [09:04] that's weird [09:04] ah, snapcore/secboot is a different repository [09:04] oh well [09:05] (we don't seem to test anything there in CI) [09:06] zyga: meh [09:06] but at least the tests were quick now :) [09:08] zyga: haha, yes. 
but that's slightly annoying that this fails [09:10] zyga: one more try [09:10] ok [09:10] still 0 queued [09:11] (which is good) [09:15] mborzecki: thanks for the suggestion in https://github.com/snapcore/snapd/pull/7614 [09:15] updated [09:15] PR #7614: cmd/snap-confine: implement snap-device-helper internally [09:16] still 0 queued [09:16] mvo: I also wonder if actions are more heavily used in US, making afternoon "harder" [09:17] I've always found CI runs faster before you Europeans wake up [09:18] mborzecki: could you look at https://github.com/snapcore/snapd/pull/7825 and tell me if you think it's work splitting [09:18] I think it is more a case of two groups of users using CI at once [09:18] PR #7825: many: use transient scope for tracking apps and hooks [09:18] I could take the go bits that do cgroup scanning out and push separately [09:18] jamesh: haha, yeah [09:20] heh, as jamesh commented, https://github.com/snapcore/snapd/pull/8457 does appear to be stuck [09:20] PR #8457: github: skip spread jobs when corresponding label is set [09:20] the unit tests job should run though, but it hasn't yet [09:22] wierd, i'll wait a little bit longer [09:22] could it have rejected the workflow entirely? [09:27] idk, clearly something is off [09:27] one job queued [09:28] (all 32 spread workers are busy) [09:28] mborzecki: werid [09:28] mborzecki: can you rebase on master and push? [09:29] at 32 spread runs I'm seeing roughly 1MB/s in and 1MB/s out [09:29] that's not too terrible [09:29] it spikes to 10MB/s [09:29] especially when new jobs kick in and there's the initial sync [09:29] zyga: where do you see that? [09:29] spread has an inefficiency where the starting worker pushes the same tarball to each node [09:29] mborzecki: on the machine running spread workers [09:30] we could optimize that traffic down by just sending the tarball once and then fetching it from the cloud [09:36] pedronis: hi. currently FilesystemOnlyApply skips core-only handlers if release is classic; i think this needs to be relaxed for image/setupSeed with a flag passed down to FilesystemOnlyApply; makes sense? [09:39] pstolowski: let me look [09:44] pstolowski: yes, the cleanest thing is probably for the package not use release.OnClassic at all, and get info through some options [09:45] pedronis: k, thanks for confirming [10:05] core 18 revert tests failed: https://github.com/snapcore/snapd/pull/8454/checks?check_run_id=570248002 [10:05] PR #8454: tests/session-tool: session ordering is non-deterministic [10:06] + snap list [10:06] error: cannot list snaps: cannot communicate with server: timeout exceeded while waiting for response [10:10] PR snapd#8454 closed: tests/session-tool: session ordering is non-deterministic [10:15] ogra where should bugs about ubuntu core images be filed? [10:16] actually, probably a bug in the installer, is that subiquity on core? (the first run thing) [10:21] * popey starts a forum thread. [10:26] mborzecki: TBH I really wish there were type annotations [10:26] reading foreign python code is like "where are the types" :( [10:30] mborzecki: did you try adding any annotations? 
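pedronis suggests above that FilesystemOnlyApply should stop consulting release.OnClassic and receive that information through options instead. A minimal sketch of that shape, with an invented package name and field names rather than the eventual snapd API:

```go
package configsketch

// FilesystemOnlyApplyOptions is a sketch of the options struct discussed
// above: instead of the package asking release.OnClassic itself, the caller
// (snapd on a classic system, or image/setupSeed building a core image)
// states explicitly which handlers should run.
type FilesystemOnlyApplyOptions struct {
	// Classic is true when applying on a classic system; image/setupSeed
	// can leave it false even when the build host itself is classic.
	Classic bool
}

// FilesystemOnlyApply applies configuration directly to the filesystem under
// rootDir, running core-only handlers unless opts.Classic is set.
func FilesystemOnlyApply(rootDir string, opts *FilesystemOnlyApplyOptions) error {
	if opts == nil {
		opts = &FilesystemOnlyApplyOptions{}
	}
	if !opts.Classic {
		// core-only handlers (e.g. bootloader-related config) go here
	}
	// handlers shared between classic and core go here
	return nil
}
```

The point of the design is that the same code path can be exercised from an image build on a classic host without lying about release.OnClassic globally.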
[10:34] zyga: not really, i've had enough fun with implementing the chooser ui [10:34] zyga: anyways if you want to play with it, better talk to mwhudson first [10:37] mborzecki: https://github.com/CanonicalLtd/subiquity/pull/692#pullrequestreview-389844549 [10:37] * zyga goes upstaris to make tea [10:37] PR CanonicalLtd/subiquity#692: console_conf: various recover chooser tweaks [10:37] we are running at 23/32 workers now [10:37] we've reached saturation once for about 20 minutes [10:39] mvo: I made some comments in #8325, some are really general hindsight questions [10:39] PR #8325: snap-bootstrap: copy auth data from real ubuntu-data in recovery mode [10:45] PR snapd#8458 closed: github: allow cached debian downloads to restore [10:47] PR snapd#8448 closed: tests/session-tool: add session-tool --dump [10:47] thanks! [10:48] pedronis: thanks, will look in a wee bit, looks like it is closed, I will try to get it to a landable point today :) [10:51] popey, yeah, subiquity is correct [10:52] popey, but the issue is indeed the clock ... [11:02] mvo: I don't know, there are some open questions [11:03] mborzecki: https://github.com/CanonicalLtd/subiquity/pull/692#pullrequestreview-389870317 [11:03] PR CanonicalLtd/subiquity#692: console_conf: various recover chooser tweaks [11:04] zyga: thanks! [11:07] ogra ok [11:22] mvo: can you merge #8449, it's all green but travis never came back or started, afaict ? [11:22] PR #8449: dirs: don't depend on osutil anymore, mv apparmor vars to apparmor pkg [11:25] pedronis: sure [11:26] PR snapd#8449 closed: dirs: don't depend on osutil anymore, mv apparmor vars to apparmor pkg [11:30] core 20 recovery design [11:30] MAGA - make appliance good again [11:30] * zyga hides [11:31] we are at 3/32 workers [11:31] though it will go back to ~20 once canary jobs are done [11:32] MAGA ? so should we deny it exists until it hits us hard ? :) [11:37] ogra: you mean another customs war? [11:39] I started implementing snapctl refresh-available [11:39] should have a simple version today [11:39] but first, *hot* tea [11:39] the office is horribly cold even today [11:40] I need a 2nd review for https://github.com/snapcore/snapd/pull/8403 [11:40] PR #8403: sandbox/cgroup: avoid making arrays we don't use [12:08] PR snapd#8459 opened: asserts: it should be possible to omit many snap-ids if allowed, fix [12:15] pedronis: ^ gofmt [12:17] no, you go fmt! [12:26] PR snapd#8460 opened: tests/session-tool: kill cron session, if any [12:26] pedronis: ta [12:42] I'm seeing failures on core-16-64, that are not obviously bogus [12:44] what kind of failures? [12:44] I saw two kinds today: [12:44] - reboot that went nowhere [12:45] - snap rollback and timeout on "snap list" [12:45] that felt really broken [12:45] mborzecki: https://github.com/snapcore/snapd/pull/8457/checks?check_run_id=570718915 <- cache of debian deps worked! [12:45] PR #8457: github: skip spread jobs when corresponding label is set [12:45] zyga: possibly, yes, reboot that went nowhere, but it seems new and real [12:45] mborzecki: I wonder if we can set cache scope to "global" to make sure everyone benefits [12:45] pedronis: I saw the reboot failure about twice last week as well [12:46] but never when testing with -debug to see :/ [12:46] mborzecki: spread-canary started on your skip label PR [12:46] mborzecki: and it works!!! 
[12:46] mborzecki: cool [12:47] mborzecki: with some extra love you could set a status label that shows it was skipped [12:47] but the feature works :) [12:47] zyga: uhh i don't like it though [12:47] mborzecki: why? [12:47] zyga: we still need to take as many workers as distros [12:47] mborzecki: but not spread Vms [12:47] mborzecki: that's nearly free [12:47] mborzecki: they all passed now [12:47] mborzecki: it adds ~30 seconds [12:48] and it's green - except for "pending travis" [12:48] hahah [12:48] mvo: https://github.com/snapcore/snapd/pull/8457 <- [12:48] PR #8457: github: skip spread jobs when corresponding label is set [12:48] no surprises there [12:48] * zyga hugs maciek [12:48] thank you :) [12:53] now, i still need to figure out that cla check [12:54] looks like there's a difference in what gets merged where between gh and travis [12:56] mvo: src/github.com/snapcore/snapd/vendor/github.com/chrisccoulson/go-tpm2/mu.go:267:17: constant 4294967295 overflows int [12:57] mvo: this now breaks master .deb builds [12:57] mborzecki: you can change how we check out things [12:57] mborzecki: there's also plenty of 3rd party solutions for this but I didn't look deeper [12:57] mborzecki: one was cool though, each CLA signature was a signed file in the repo [12:57] mborzecki: so the check was entirely offline [12:57] zyga: it's because we are getting a pc-kernel update in the middle of the tests [12:58] ohh [12:58] pedronis: did you reproduce it [12:58] zyga: there's an action ready for that, broought to you by SAP (?!) [12:58] zyga: no, but the log is obvious [12:58] pedronis: we should probably hold refreshes for snaps that cause reboots [12:58] once you look at it and that the tests [12:58] mborzecki: yes, SAP [12:58] https://github.com/cla-assistant/github-action [12:58] mborzecki: fun world :) [12:58] zyga: we don't have single snaps holding [12:58] I read that one [12:58] but it's still strange [12:58] pedronis: oh... right [12:58] hmm [12:58] but why doesn't it come back? [12:58] why would we try to refresh kernel anyway [12:58] maybe really buggy? [12:58] something is off [12:58] oh [12:58] standup time [12:59] * zyga needs to check one thing first [12:59] zyga: because we make reboot slow explicitly [12:59] the test don't really support unexplicit [12:59] reboots [12:59] ah, indeed [12:59] these tests are not meant to reboot [13:00] PR snapd#8457 closed: github: skip spread jobs when corresponding label is set [13:00] zyga: anyway I do think we want to add per-snap holding at some point, just not clear when [13:09] I just accidentally published a snap to beta, when nothing was published in beta before. Can I undo that? [13:10] rbasak: yes [13:10] rbasak: snapcraft close your-snap beta [13:10] Ah, I found a "close" option? [13:10] Got it. Thanks! [13:10] πŸ‘ [13:12] While I'm on the topic, is there any way I can unpublish the i386 snaps (on a different snap)? We don't build those now, so the ones that are there are way behind and probably useless now. [13:18] zyga: yeah, trying to fix it in 8456 [13:23] zyga: note, I re-read through PR 8408 yesterday (though after it was merged; it was fine) [13:23] PR #8408: snap/naming: add validator for snap security tag [13:23] jdstrand: thank you! 
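The go-tpm2 failure quoted twice above ("constant 4294967295 overflows int") is a plain Go portability issue: on the i386 build `int` is only 32 bits, so the untyped constant does not fit. A small illustration (not the go-tpm2 code itself), with an invented constant name:

```go
package main

import (
	"fmt"
	"math"
)

// On 64-bit platforms int is 64 bits and 4294967295 fits; on 386/armhf int is
// 32 bits and the compiler reports "constant 4294967295 overflows int", which
// is exactly the error from the vendored go-tpm2 update. Giving the constant
// an explicit fixed-width type keeps it portable across architectures.
const maxHandle uint32 = 4294967295

func main() {
	// var bad int = 4294967295 // fails to compile on 32-bit targets
	fmt.Println(maxHandle == math.MaxUint32) // true everywhere
}
```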
[13:24] zyga: PR 7614 and PR 7825 are very high on the list [13:24] jdstrand: thanks [13:24] PR #7614: cmd/snap-confine: implement snap-device-helper internally [13:24] PR #7825: many: use transient scope for tracking apps and hooks [13:24] zyga: the apparmor upload and some training I gave set me back a bit, but I will be getting to them [13:24] jdstrand: both fail in CI now, one on silly thing and one (f31 or f32) fails on something that seems real, I'll invesetigate soon [13:24] jdstrand: but any feedback would be great [13:25] jdstrand: in the branch about refresh-app-awareness please note if I should split up the cgroup scanning code to a separate PR, it could be reviewed faster and land, aiding further review [13:26] * jdstrand nods [13:49] uh [13:50] my daughter's school friend is at a hospital [13:50] he lives next door :/ [13:50] ohnoes [13:51] FYI we run at capacity now, saturated 32 workers [13:51] but the queue is empty [13:51] and we should see a drop to ~ half of that in a few minutes [13:52] I will look at implementing the ideas we had today, that should reduce the queue load significantly [13:52] well, worker load [13:52] we're still not queueing because we manage to process everything (for now) [13:56] and we are at 24/32 [14:06] zyga: oh [14:06] PR snapcraft#3019 closed: static: consolidate tooling setup to setup.cfg [14:06] PR snapcraft#3020 closed: spread tests: default base for local plugin tests [14:06] PR snapcraft#3022 opened: plugins: introduce v2.PluginV2 and v2.NilPlugin [14:07] pstolowski: yeah :/ [14:07] * zyga is hungry and breaks for lunch [14:07] o/ [14:11] PR snapd#8411 closed: boot: cleanup more things, simplify code [14:11] zyga: back to what I asked you about yesterday, any reason why "snap run --command=stop" would block on snapd.socket? [14:11] zyga: if so, you need to find a way to make snapd keep running until the last snap has been stopped [14:11] yes, it does so when system key is different to the one on disk [14:12] it normally happens when you boot a new kernel [14:12] we ask snapd to generate new profiles and wait until it does so [14:12] zyga: the current situation means LXD is never stopped properly, causes a 10min shutdown delay and data loss [14:12] *never stopped properly in those cases [14:13] I'm not sure it's just about kernel updates, I have a system that seems to reproduce it every time, let me try it again today [14:13] stgraber: please raise a bug to mvo [14:13] * zyga is at lunch [14:13] (well almost) [14:13] mborzecki: https://github.com/snapcore/snapd/pull/8461 [14:13] PR #8461: github: run non-canary if label is present [14:13] * zyga is gone [14:14] PR snapd#8461 opened: github: run non-canary if label is present [14:16] stgraber: FYI I experienced this issue but was unable to debug it at the time [14:16] yeah, got it easily reproducible on an arm64 system somehow, seems to happen at every single reboot [14:17] the new kernel thing would explain why other users only get it somewhat randomly though [14:17] For me it was x86 [14:17] filing a critical bug against snapd claiming data loss [14:17] Perhaps system key is buggy [14:17] Please! 
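What zyga describes for `snap run --command=stop`: the client recomputes the system key, compares it with the copy on disk, and on a mismatch waits for snapd to regenerate the security profiles — which is the wait that stalls LXD's stop during shutdown. A rough sketch of that flow, with invented helper names rather than snapd's real API:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// computeSystemKey would, in reality, summarize kernel/apparmor/seccomp
// features, build-id and similar inputs; here it is only a placeholder.
func computeSystemKey() (string, error) {
	return "", fmt.Errorf("sketch only")
}

// waitForProfileRegeneration would, in reality, poll snapd over snapd.socket
// until the profiles have been refreshed; here it is only a placeholder.
func waitForProfileRegeneration(timeout time.Duration) error {
	return fmt.Errorf("sketch only")
}

// maybeWaitForSystemKey sketches the behaviour described above: if the key
// on disk no longer matches the freshly computed one (e.g. after booting a
// new kernel), block until snapd has regenerated the security profiles.
// During shutdown snapd may already be gone, which is why daemons stopped
// via "snap run --command=stop" can hang here.
func maybeWaitForSystemKey() error {
	onDisk, err := os.ReadFile("/var/lib/snapd/system-key")
	if err != nil {
		return err
	}
	current, err := computeSystemKey()
	if err != nil {
		return err
	}
	if string(onDisk) != current {
		return waitForProfileRegeneration(2 * time.Minute)
	}
	return nil
}

func main() {
	fmt.Println(maybeWaitForSystemKey())
}
```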
[14:22] mvo: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1871652 [14:22] Bug #1871652: Daemon snaps not properly stopped in some cases [14:23] mvo: as you know, every single server and cloud instances of 20.04 will use the LXD snap and all upgrading users of 18.04 snap will upgrade to the snap too, so we really really need this resolved or we're in for a lot of data loss / corruption issues. [14:25] stgraber: looking [14:25] mvo: thanks! [14:25] I'm creating a test VM on that arm64 system which can be played with as much as needed, should make fixing this easier [14:27] stgraber: I think zyga is on to something here, snap run will wait for snapd to re-generate the profiles, if snapd is already stopped this of course won't work, I need to see why this happens/how to do fix it [14:40] mvo: I've updated the LP bug with my reproducer on arm64 [14:40] mvo: I'm happy to sort out a way for someone from your team to access that system if that helps [14:41] * cachio afk [14:42] * cachio afk [14:42] stgraber: in various meetings right now, need to find someone to look at this while I'm "off" [14:42] re [14:42] stgraber: I have plenty of arm64 boards [14:43] I can look later today [14:43] zyga: VM capable? [14:43] stgraber: hmmmm [14:43] stgraber: good question [14:43] * zyga checks [14:43] zyga: the system I'm testing this on is a 48 core, 128GB RAM, arm64 server :) [14:43] stgraber: I think you win :) [14:43] (which Qualcomm kindly forgot in my basement before firing the entire team who designed it) [14:44] GCE had a hiccup, restarted a job to see if it was temporary [14:53] PR snapd#8459 closed: asserts: it should be possible to omit many snap-ids if allowed, fix <⚠ Critical> [14:55] PR snapd#8460 closed: tests/session-tool: kill cron session, if any [14:56] that's one way to get hardware [14:58] stgraber: if you _ever_ want to throw it out [14:58] just remember [14:59] bring it to europe on a plane and I can relieve you of it ;-) [15:00] :) [15:00] stgraber: are there any arm servers available that don't require a datacenter contract? [15:04] stgraber: looking at the bug now [15:04] stgraber: would it be possible for me to get a shell on a machine where this can be reproduced? [15:05] alternatively, I'd love to see the system key snapd writes [15:05] zyga: happen to have IPv6 on your side? [15:05] stgraber: if is in /var/lib/snapd/system-key [15:05] stgraber: unfortunately no :/ [15:05] maybe don't open access for now, [15:05] system-key is ... well, the key [15:05] zyga: thanks for looking, I'm a bit busy with meetings [15:05] it may be revealing [15:08] {"version":10,"build-id":"799a88b406b245795da51b18f6224003020c6fb9","apparmor-features":["caps","dbus","domain","file","mount","namespaces","network","network_v8","policy","ptrace","query","rlimit","signal"],"apparmor-parser-mtime":1538072454,"apparmor-parser-features":["unsafe"],"nfs-home":false,"overlay-root":"","seccomp-features":["allow","errno","kill_process","kill_thread","log","trace","trap","user_noti [15:08] f"],"seccomp-compiler-version":"66988dd2c3fb0abf9b1fb29be212771d7c38ae85 2.4.1 8c73f36d3de1f71977107bf6687514f16787f639058b4db4c67b28dfdb2fd3af bpf-actlog","cgroup-version":"1"} [15:08] thanks, let me inspect things now [15:11] stgraber: so, why does lxd stop itself using snap run? 
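The system-key JSON stgraber pasted above can be decoded into a small struct; the field set below is taken directly from that paste, while the Go type and field names are only illustrative, not snapd's internal type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// systemKey mirrors the fields visible in the /var/lib/snapd/system-key dump
// quoted above; a change in any of these inputs makes the computed key differ
// from the one on disk.
type systemKey struct {
	Version                int      `json:"version"`
	BuildID                string   `json:"build-id"`
	AppArmorFeatures       []string `json:"apparmor-features"`
	AppArmorParserMtime    int64    `json:"apparmor-parser-mtime"`
	AppArmorParserFeatures []string `json:"apparmor-parser-features"`
	NFSHome                bool     `json:"nfs-home"`
	OverlayRoot            string   `json:"overlay-root"`
	SeccompFeatures        []string `json:"seccomp-features"`
	SeccompCompilerVersion string   `json:"seccomp-compiler-version"`
	CgroupVersion          string   `json:"cgroup-version"`
}

func main() {
	data := []byte(`{"version":10,"cgroup-version":"1"}`)
	var k systemKey
	if err := json.Unmarshal(data, &k); err != nil {
		panic(err)
	}
	fmt.Println(k.Version, k.CgroupVersion)
}
```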
[15:12] this is not a bug on your side, I think, I'm just curious [15:12] that's how the systemd units are generated [15:12] all Commands in there wrap using snap run I think [15:12] ah, I see, [15:13] indeed [15:15] stgraber: is it possible to reproduce this with SNAPD_DEBUG=1 set [15:16] stgraber: if so please attach that [15:16] I need to break now, my 1yo daughter just woke up [15:16] but I have a hunch I know what it is [15:16] having that will confirm [15:18] so, fun fact about persistent journal; restarting systemd-journald triggers snapd restart (?) and since this is happening from config hook, bad things happen :/ [15:21] pstolowski: that's a problem for sure :/ [15:22] pedronis: it's annoying, because core16 seems to need journald restart [15:25] Whaaat? [15:25] Why do we restart? [15:25] Can we reload it instead? [15:26] zyga: i need to try [15:39] pstolowski: core18 and 20 work without? [15:41] pedronis: core18 - yes. i haven't checked 20 [15:43] Failed to reload systemd-journald.service: Job type reload is not applicable for unit systemd-journald.service. [15:43] :} [15:43] there you go [15:43] only systemctl restart does it [15:44] pstolowski: if it works with 18 and 20 without, I would just go without, restarting the journal is kind of weird anyway [15:45] pedronis: ok. i'll double check if i wasn't dreaming [15:46] pstolowski: anyway what you could try is kill USR1 [15:46] pstolowski: see man systemd-journald [15:47] pedronis: aaha, thanks! [15:49] pstolowski: as usual is not super clear what it does [15:49] from the man [15:55] zyga: mvo: I got again a bunch of allocation problems: https://github.com/snapcore/snapd/pull/8436/checks?check_run_id=570978994 [15:55] PR #8436: configcore,tests: use daemon-reexec to apply watchdog config [15:56] pedronis: looking [15:56] pedronis: happened once today [15:56] it looks like some permission issue [15:56] it was mentioned on the internal channel [15:56] please restart the workflow, it's not a capacity problem [15:56] we don't know what caused it [15:57] better yet, merge master for more fixes :) [15:58] zyga: I merged master many times [16:10] pedronis: systemctl kill --signal=SIGUSR1 systemd-journald does the job on core16 [16:10] pstolowski: good [16:10] pstolowski: that seems safe everywhere [16:11] systemd has Kill I think, right? [16:11] I mean systemd our package [16:12] pedronis: yes, i'm just looking at it right now [16:41] zyga: do you have anything to share about the lxd bug ? anything you figured out already that is worth for me to know? [16:41] mvo: I'm still partially AFK but give me some more time [16:42] mvo: I have conditions to reproduce it [16:42] mvo: and I _suspect_ I know what the problem is [16:43] zyga: nice, keep me updated please [16:47] pedronis: did you want me to change to use a string pointer for mockedMountInfo in 8451 ? [16:48] ijohnson: yes, it's it not too annoying [16:48] heh [16:48] if it's not [16:48] sure I mean I'll only have to re-start the workflows 1000 more times anyways so it's not a big deal [16:49] ijohnson: about the selinux tests, yes, that's fine, anyway is a different package, it was really testing two levels [16:49] right [17:37] let's merge https://github.com/snapcore/snapd/pull/8456#pullrequestreview-390189749 [17:37] it needs a 2nd review [17:38] PR #8456: tests: add 32 bit machine to GH actions [17:38] ijohnson: any issues? [17:39] mm? [17:39] oh the PR you just mentioned? 
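pstolowski's working command, `systemctl kill --signal=SIGUSR1 systemd-journald`, uses the signal documented in man systemd-journald for flushing the /run journal to persistent storage under /var. A sketch of doing the same from Go by shelling out to systemctl; inside snapd the wrapper in the systemd package pedronis mentions would be the more natural home:

```go
package main

import (
	"fmt"
	"os/exec"
)

// flushJournalToPersistent asks systemd-journald to flush journal data from
// /run to /var (SIGUSR1, per man systemd-journald), which is what the config
// hook needs after enabling persistent journal, without the full
// "systemctl restart systemd-journald" that was triggering a snapd restart.
func flushJournalToPersistent() error {
	out, err := exec.Command("systemctl", "kill", "--signal=SIGUSR1",
		"systemd-journald.service").CombinedOutput()
	if err != nil {
		return fmt.Errorf("cannot signal systemd-journald: %v (%s)", err, out)
	}
	return nil
}

func main() {
	fmt.Println(flushJournalToPersistent())
}
```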
[17:39] with CI [17:40] I haven't been looking at CI in the past hour or two, just seems annoying that every time I look at one of my PR's exactly one check out of the 17 failed and so I have to restart everything [17:41] ijohnson: I'll prepare the quad workflows for tomorrow [17:41] I reviewed 8456 [17:41] ijohnson: it's late and I'm looking at something else [17:41] yes that would be much appreciated [17:41] also did you see the mount ns bug I assigned to you last night ? [17:41] * zyga needs coffee and checks [17:41] I couldn't reproduce it with robust-mount-namespace-updates=true with a small reproducer snap, but with the full snap I can still reproduce the EBUSY [17:42] anyways I can send you the snap when you have time to look at the issue [17:42] PR snapd#8456 closed: tests: add 32 bit machine to GH actions [17:43] ijohnson: cannot find it, let me check my mail [17:50] did we get a newer systemd recently in 20.04 ? [18:01] ijohnson: I can override failures in merges fwiw [18:02] ijohnson: we are a bit timezone challenged so not ideal but do ping me if you have such a case [18:04] mvo ack maybe I'll send you an email at my EOD if needed [18:04] ijohnson: sure thing! [18:08] re, back to work [18:09] pedronis: 244 was in Feburary [18:09] February [18:09] 245 was in March [18:10] we are now on 245.2 [18:13] ok, just confused because a test that I tried failed now, anyway it indeeds needs tweaking for systemd >=243 [19:17] mvo: I debugged the issue related to lxd and snapd [19:19] cachio: 19.10 images also have GDM [19:19] cachio: it would be good to regenerate them so that we don't have the desktop [19:20] zyga: what was the issue with lxd and snapd ? [19:20] I'm curious [19:25] ijohnson: https://bugs.launchpad.net/snapd/+bug/1871652 [19:25] it's all there [19:25] Bug #1871652: Daemon snaps not properly stopped in some cases [19:25] ijohnson: but tl;dr; is in the last comment [19:25] * ijohnson reads [19:25] ijohnson: it's pretty interesting actually [19:25] stgraber: thank you for the debugging environment [19:25] stgraber: it's late so unless it's very urgent I will fix it first thing tomorrow after discussing with the team [19:26] zyga: it's been happening for a long long time, we can wait another day :) [19:26] I hope one last day [19:26] let me do one more test today [19:26] it's just that now that we understand it, we also understand the danger from it (containers aren't stopped at all, filesystem isn't unmounted, so data loss potential) [19:26] it actually explains why we've seen some odd db corruption in the past which we couldn't really explained based on logs [19:27] yes, I think the bug is well marked as critical [19:28] zyga: ohhhh, what did you find out? [19:28] zyga: ok, how involved is the fix :) ? [19:28] stgraber: as a small note, setting [19:28] SNAPD_DEBUG_SYSTEM_KEY_RETRY=0 [19:28] should work around it [19:29] mvo: it depends [19:29] mvo: please read https://bugs.launchpad.net/snapd/+bug/1871652 [19:29] mvo: it's probably something we can fix tomorrow [19:29] Bug #1871652: Daemon snaps not properly stopped in some cases [19:29] mvo: tl;dr; is https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1871652/comments/7 [19:29] zyga: nice [19:29] zyga: but it does sounds like the fix will not be entirely easy [19:30] mvo: it's actually very easy [19:30] mvo: just the if ( ... 
) part needs discussing [19:30] I like the sound of that [19:30] we cannot wait for system key on shutdown [19:30] and we probably should depend on core/snapd [19:30] and not let them go away / unmounted [19:30] this was never expressed in systemd terms [19:30] but let's discuss that tomorrow [19:30] ok [19:30] it's late and I'd love to get off my chair :) [19:31] zyga: sounds great, thank you so much [19:31] we know exactly how to reproduce this [19:31] and we know what to change to fix it, we need to discuss how to introduce the changes [19:31] mvo: I suspect we _can_ do a minimal fix tomorrow [19:31] without ill effects [19:31] and work on a more proper fix for +1 [19:32] the minimal fix will just detect the shutdown and ignore system key [19:32] the proper fix will introduce dependencies [19:32] so lxd will not stop after core is unmounted [19:32] zyga: sounds good to me [19:32] but that's more iffy for the reasons you probably know about (wrappers and ensure) [19:32] * zyga waves and takes a break [19:33] zyga, I'll run the update7 [19:33] cachio: let me know if you need mount-ns test changes for that [19:33] zyga: thank again and good night [19:33] cachio: it slipped from my radar but I will send the patches tomorrow [19:34] zyga, sure [19:34] stgraber: and I know why I cannot reproduce it, for development I disabled reexec on my main machine [19:34] ah, that'd do it [19:37] zyga, job running [19:38] zyga, https://travis-ci.org/github/snapcore/spread-cron/builds/672666095 [19:38] it would be ready in about 40 minutes [19:49] PR snapcraft#3021 closed: remote-build: remove artifact sanity check [19:50] stgraber: ^ [19:50] https://github.com/snapcore/snapd/pull/8462 [19:50] actually ^ [19:50] PR #8462: cmd/snap: don't wait for system key when stopping [19:50] it should do the job, we need to package it with tests and stuff [19:50] PR snapd#8462 opened: cmd/snap: don't wait for system key when stopping [19:51] that code didn't seem to have unit tests before so it will take me more [19:51] now I'm really gone [19:51] mvo: T [19:51] ^ [19:51] looks simple enough :) [20:50] PR snapcraft#3023 opened: pluginhandler: move attributes to PluginHandler [21:16] PR snapd#8463 opened: secboot: key sealing also depends on secure boot enabled
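The "minimal fix" zyga outlines (detect the shutdown and skip the system-key wait, cf. PR #8462 "cmd/snap: don't wait for system key when stopping") could look roughly like the sketch below, continuing the earlier one; how the actual PR detects the situation may differ, and the helper names are invented:

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// systemIsStopping is one possible way (an assumption, not necessarily what
// PR #8462 does) to notice that a shutdown is in progress: systemd reports
// the manager state as "stopping" once the shutdown target has been queued.
func systemIsStopping() bool {
	out, _ := exec.Command("systemctl", "is-system-running").Output()
	return strings.TrimSpace(string(out)) == "stopping"
}

// maybeWaitForSystemKey sketches the adjusted behaviour: when running a stop
// command during shutdown, execute it immediately instead of waiting for
// snapd (which may already be gone) to regenerate security profiles.
func maybeWaitForSystemKey(isStopCommand bool) error {
	if isStopCommand && systemIsStopping() {
		return nil // don't block a daemon's stop on snapd during shutdown
	}
	// ...otherwise compare /var/lib/snapd/system-key and wait as before...
	return nil
}

func main() {
	fmt.Println(maybeWaitForSystemKey(true))
}
```

The longer-term fix discussed above (ordering dependencies so snaps are stopped before core/snapd is unmounted) is orthogonal to this shortcut and would be expressed in systemd unit terms instead.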