=== diddledan0 is now known as diddledan [00:32] PR core20#55 closed: hooks: make systemd-modules-load depend on mounts at /usr/lib/{firmware,modules} [00:57] PR snapd#8623 opened: tests/lib/prepare.sh: delete patching of the initrd <⚠ Critical> [05:49] morning [05:57] PR snapd#8622 closed: cmd/snap-bootstrap/initramfs-mounts: add sudoers to dirs to copy as well [06:30] mvo: hey [06:31] mvo: we can land https://github.com/snapcore/snapd/pull/8623 [06:31] PR #8623: tests/lib/prepare.sh: delete patching of the initrd <⚠ Critical> [06:31] mborzecki: cool [06:32] mborzecki: done [06:32] mvo: thanks [06:33] PR snapd#8623 closed: tests/lib/prepare.sh: delete patching of the initrd <⚠ Critical> [06:33] seems like that go 1.9 failure in wrappers is random after all [06:51] good morning [06:51] eventful evening? [06:52] zyga: good morning - we did all the bits missing for beta I think [06:52] zyga: snapd snap just building, that was the last piece, then we hopefully have a working beta [06:52] that's impressive [06:53] mvo: I'll do what claudio mentioned [06:53] boot it and leave it running [06:53] an aging test [06:53] zyga: nice [06:53] zyga: hey [06:54] :-) [06:54] guys, if you have daughters [06:54] hm? [06:54] then spend 30zł / < 10 euro [06:55] and play it with them [06:55] https://www.gog.com/game/foxtail [06:55] this game is incredibly beautiful and fun [06:55] localized, with click-to-advance in dialogue, so kids can read at their own pace [06:55] zyga: hmm looks like an old school adventure game? [06:55] it is [06:56] zyga: do you remember teenagent? :P [06:56] and the mood and looks are just so nicely crafted [06:56] mborzecki: yes but this is way better :) [06:56] (I bought teenagent as a teen) [06:56] no way xD [06:56] yeah :D [06:57] too bad gog doesn't support their client on linux [06:57] there's no DRM and it supports linux as well [06:57] mborzecki: yeah but the game is just a zip to download and run [06:59] zyga: well, it is, but it's clearly a work in progress title, so on linux you'll have trouble keeping up with the updates, none of the open source clients i tried seem to support updates nicely and there's some issue with gog updates api that apaprently makes it hard to integrate [06:59] mborzecki: this game is updated roughly once a year [06:59] mborzecki: anyway, just consider it [07:00] it really is worth the money [07:00] mborzecki: each update brings the next episode (3 out of 7 are available now) [07:00] it is made by a tiny studio [07:03] morning [07:03] good morning pawel! [07:13] anyone able to reproduce the unit test timeout in wrappers package? [07:13] mborzecki: on master? [07:15] mborzecki: I started go test -count 1000 ./wrappers/ [07:15] I'll let you know if it triggers [07:16] pedronis: hi, were you able to reproduce the unit test timeout in wrappers by any chance? [07:16] no [07:17] hm intersting, also the PRs that ijohnson opened yesterday were green today [07:17] it's definitely weird [07:17] also it might be that something else failed and the panic covers it because it doesn't get printed [07:18] yeah, that's true [07:18] mvo: hi, it's not needed for the beta but this also needs to be backported https://github.com/snapcore/snapd/pull/8602 I don't see it in 2.45 yet [07:18] PR #8602: configcore: only reload journald if systemd is new enough [07:21] mvo: this also wasn't ported https://github.com/snapcore/snapd/pull/8622/ ? [07:21] PR #8622: cmd/snap-bootstrap/initramfs-mounts: add sudoers to dirs to copy as well [07:30] pedronis: yes, that is right, let me fix that [07:34] pedronis: I moved https://github.com/snapcore/snapd/pull/8566 to snaplock [07:34] PR #8566: cmd/cmdutil: add run inhibition operations [07:34] once this is +1 I will adjust the dependencies [07:34] thx [07:35] need to do some other things before getting back to reviews [07:35] sure [07:35] I have plenty of things to write so not blocked [07:35] PR snapd#8624 opened: cmd/snap: fix the order of positional parameters in help output [07:37] trivial fix ^^ [07:38] mborzecki: I reproduced the panic [07:38] https://www.irccloud.com/pastebin/vmigrXPM/ [07:46] mborzecki: so [07:46] mborzecki: this trace is funny [07:46] mborzecki: we clearly exec something [07:46] yet the SetUpTest says [07:46] s.systemctlRestorer = systemd.MockSystemctl(func(cmd ...string) ([]byte, error) { [07:46] s.sysdLog = append(s.sysdLog, cmd) [07:46] return []byte("ActiveState=inactive\n"), nil [07:46] }) [07:46] which seems to suggest this mock is one thing [07:47] but calling "systemctl" executable is another [07:47] zyga: which go version is that? [07:47] 1.13.8 [07:47] brb [07:48] ok, so not go specific, probably just borked setup [07:48] or test [08:35] bugs, bugs, bugs [08:35] but also progres [08:40] #8564 got a +1 from pstolowski and now I applied his suggestions and is ready for 2nd reviews [08:40] PR #8564: asserts: introduce Pool [08:53] pedronis: what is the 'h' assertion header for? [08:53] mvo, hi, I canceled today sync as there isn't really anything to talk about [08:53] ackk: ok, I see there is a bug mentioned, I have a look at that [08:54] mvo, ah yeah, not sure if it's really relevant for snapd, but if you want we can talk about it. it's just me today [08:54] mvo, mostly, I added it there to ask if there was a best practice for knowing the underlying OS details from a snap [08:55] ackk: no need to meet just for that :) [08:55] ackk: we can cover it next time [08:55] right [09:01] zyga: they are test-only assertions, they look a bit like snap-declarations and snap-revisions [09:02] zyga: but unless you reviewed the other ones in this chain, I wouldn't recommend looking a tit [09:02] heh [09:02] at it [09:04] mborzecki: so that test failed again with the timeout [09:05] mvo: some the unit tests changes that came with jamesh PR that was merged yesterday are failing regularly but is not super clear what is going on [09:06] mvo: do you need to do a pre4 ? [09:08] mvo, I have a very basic snap with just meta/snap.yaml (to test the system-files interface). I snap pack it but when I install I get - Run configure hook of "testsnap" snap if present (run hook "configure": cannot snap-exec: no such file or directory) [09:08] mvo, why is it trying hooks when there are none? [09:09] ackk: do you have meta/hooks/configure? [09:09] zyga, nope [09:09] though the snap-exec error is also curious [09:09] just meta/snap.yaml and "script" , which is a bash script [09:09] I'm using base: bare if that mattes [09:09] *matters [09:09] ah [09:10] bare base has no shell? [09:10] no /bin/sh [09:10] oh, [09:10] though the configure hook is weird still [09:10] it's bare for a reason :D [09:10] in fact, it has nothing [09:10] you must provide a static binary [09:10] right, I forgot [09:11] pedronis: can you reproduce it? [09:11] pedronis: or was that in some PR? [09:11] mborzecki: a PR [09:12] pedronis: not sure if we need a pre4, I hope we can test what we have in beta and if it's all working fine I can do a 2.45 for real but it depends a bit. do you know already something that is broken? [09:12] mvo: no, just wondered because sudoers bits are not in pre3 [09:13] so you cannot really test recover [09:13] afaiu [09:13] pedronis: yeah, that was silly of me, I cherry picked it and pushed to release/2.45 and rebuild the beta. so it will be ~pre3+git but that should be ok for testing :) [09:14] pedronis: commit d16c44f0aa mentiones some issues with shutting down the session agent [09:15] I know [09:15] maybe we should try undoing and then it happens for often and try to really understand [09:15] s/for/more/ [09:16] anyway this is very annoying [09:25] hm wonde rif it's possible that the agent hits idle timeout and stops listening [09:26] mborzecki: ah [09:26] maybe should be possible to write a test to see what happens in that case [09:26] trying to provoke that in tests [09:26] but there's shoould be an Accept around [09:26] in that case, no? [09:26] sorry, there shouldn't be [09:27] ah right [09:30] zyga: I did another pass on #8566 [09:30] PR #8566: c/snaplock/runinhibit: add run inhibition operations [10:48] hmm we could use systemd version check to avoid LazyUnmount on core16 (which generates a lot of noise in system log) [10:49] pedronis: thank you for the review, I replied to one comment and I'll adjust the rest later today [10:49] pstolowski: +1 [10:49] pstolowski: but then we never rewrite .mount units [10:49] pstolowski: so -1 unless you know how to do that on systemd upgrades [10:50] ah, hmm [10:50] anyway, something to think about [11:01] PR snapd#8625 opened: wrappers: tweak the order of restoring, use client timeout in services test teardown [11:01] pedronis: ^^ maybe this will give us more insight [11:07] hm removed that thing poking the client in teardown and can reproduce the timeout now [11:11] zyga: maybe we could make systemd version part of systemkey [11:12] pstolowski: we don't rewrite mount units on system key [11:12] zyga: yes, but maybe then we would [11:12] pstolowski: what happens when you rewrite mount unit [11:13] I tried to propose that but it got stuck on this discussion [11:13] do we remount things? we cannot really [11:13] allright... just thinking aloud [11:13] no no, that's good [11:13] it's just we need to think how to do that in practice [11:15] zyga: interesting, firefox does this HOME: $SNAP_USER_COMMOM in its snap.yaml [11:15] I hope it's COMMON :) [11:15] but interesting, yeah [11:15] we give people the means and they get creative [11:26] meh, go test -timeout is kinda silly [11:26] PR snapd#8626 opened: tests: fix passing stdin via session-tool [11:27] mborzecki: ^ a trivial patch and an annoying discovery === pedronis_ is now known as pedornis === pedornis is now known as pedronis [11:27] mborzecki: your PR is green, so maybe we were trying to talk to the real agent? [11:27] do we need that poking code? [11:28] yeah, but it'd hard to explain why there would be another agent [11:28] maybe we activate it via socket? [11:28] do we mask the real agent? [11:28] however, go test ./... runs all package tests in parallel, if there's another package starting the agent, we could try to talk to taht other agent ? [11:28] ah, unit tests [11:28] (assuming mocking elewhere is incorrect too) [11:29] mborzecki: that sounds fragile, well there are agents own tests of course [11:30] mborzecki: anyway I'm still unsure why we need that poking code [11:30] right [11:31] perhaps we could land #8625 while i'll try to poke further [11:31] PR #8625: wrappers: tweak the order of restoring, use client timeout in services test teardown [11:31] mborzecki: yes, that's fine [11:38] it's finally warm enough to use the standing desk in the colder corner of the office :) [11:44] zyga, hi [11:44] hi [11:44] yesteday I left some errors related to session tool [11:44] I see you created a new PR with fixes [11:44] oh? [11:44] I didn't see those [11:45] which fixes? :) [11:45] zyga, is it related to this? https://paste.ubuntu.com/p/9Svw3Qd7yB/ [11:45] https://paste.ubuntu.com/p/k7fyPJch78/ [11:45] no [11:45] restore EOFs? [11:45] zyga, ahh, these fixes #8626 [11:45] PR #8626: tests: fix passing stdin via session-tool [11:46] that is related to another PR but not to the pastebin [11:46] is this EOF a one-off or something that always happens/ [11:46] zyga, restore EOF and also the other test [11:46] zyga, the test failing on this https://paste.ubuntu.com/p/k7fyPJch78/ [11:48] zyga, cannot connect to server -> curl --unix-socket /run/user/12345/snapd-session-agent.socket -D- -X POST -H 'Content-Type: application/json' -d '{"action": "daemon-reload"}' http://localhost/v1/service-control [11:50] zyga, this is also failing on uc20 https://paste.ubuntu.com/p/NgHhWVVPFW/ [11:50] zyga, the problem is that as it is failing on restore, it breaks the whole test suite [11:50] zyga, my concern is why it is not failing in google execution [11:51] but fails on edge and beta validation [12:06] re [12:06] cachio: probably random [12:07] cachio: unless it happens each time when executing a specific test [12:07] cachio: can you reproduce the problem in tests/main/snap-session-agent-service-control? [12:07] zyga, mborzecki it happens 100% of the time [12:07] mborzecki, yes I can [12:07] cachio: did you collect information about the other session? [12:08] cachio: from the session-tool failure? [12:08] ah [12:08] wait [12:08] sorry, I misread [12:08] it's the EOF [12:08] so no new data [12:08] perhaps the session agent isn't running yet/already/at all [12:08] zyga, no, but I can do it [12:08] it's only interesting iff we can reproduce it in isolation [12:08] cachio: if you can run session-tool test _alone_ [12:09] zyga, sure [12:09] ok [12:10] zyga, running [12:10] gime me 5 minutes until it fails [12:21] zyga, I reproduced the error [12:21] is any info which I could provide? [12:22] I have an ssh opened [12:22] what's the last thing that was logged in the spread run? [12:24] 2020-05-08 09:15:58 Error restoring external:ubuntu-core-18-64:tests/main/session-tool:test (external:ubuntu-core-18-64) : + session-tool --restore -u test [12:24] 2020-05-08 09:15:58 Error debugging external:ubuntu-core-18-64:tests/main/session-tool:test (external:ubuntu-core-18-64) : EOF [12:24] 2020-05-08 09:15:58 Restoring external:ubuntu-core-18-64:tests/main/ (external:ubuntu-core-18-64)... [12:24] 2020-05-08 09:15:58 Error restoring external:ubuntu-core-18-64:tests/main/ (external:ubuntu-core-18-64) : EOF [12:25] I could run with the flag to show the output [12:25] cachio: how do you log into the system? [12:25] it is a vm here [12:25] cachio: how does spread log into the system? [12:25] I it does not log [12:26] well, does it execute test by magic? [12:26] I have a spreac which show ssh on real time [12:26] it must connect to the system somehow [12:26] zyga, I have an image to reproduce the error if you want [12:26] no [12:26] I'd like to understand if this is a normal spread run that uses ssh to connect [12:27] zyga, it is a normal spread run [12:27] ok [12:27] using external backend [12:27] is it using ssh to log in as the root user on the device under test? [12:28] or is there any other user used [12:29] zyga, it uses root [12:29] same as in google backend [12:29] PR snapd#8627 opened: c/snap-bootstrap: port mount state mocking to the new style on master (2.45) [12:29] ok [12:29] cachio: can you log into the system [12:29] and run, as root [12:30] session-tool --prepare -u test [12:30] session-tool --restore -u test [12:32] zyga, done [12:32] did it EOF? [12:32] no [12:32] ok [12:32] is there anything in journal from the time of the failure? [12:34] zyga, https://paste.ubuntu.com/p/bfJYGthXwB/ [12:35] this is the only I see which seems to be relevant [12:35] I see it many times [12:35] cachio: what specifically? [12:36] May 08 12:15:58 localhost systemd[1]: session-tool-c24e7dda-59d7-4fbd-a871-fc2431b4f5d1.service: Main process exited, code=exited, status=1/FAILURE [12:36] the test is running "false" [12:37] Starting session-tool running false as test... [12:37] to check that exit status is forwarded [12:37] ah [12:39] cachio: K w [12:39] I wonder if this is just a network timeout over ssh [12:39] this test takes a while to run [12:39] maybe tweak it so that there's fewer iterations [12:39] and see if that changes anything [12:40] I am running again but now showing the output [12:40] so I'll see on which line fails [12:41] zyga, perhaps it helps [12:41] yeah, let's try [12:43] it is looping [12:56] cachio: any failures? [12:56] still preparing [12:56] ok [12:56] zyga, am I doing something wrong here https://bugs.launchpad.net/maas/+bug/1876217/comments/5 ? [12:56] I had to re-execute [12:56] Bug #1876217: Controllers report Ubuntu Core version in the Snap [12:56] because once it fails, then the session is not correctly cleaned and fails the next run [12:57] zyga, (wrt system-files plug) [12:57] zyga, I see same behaviour [12:57] as it calls itself it goes into an infinite loop [12:58] ackk: yeah [12:58] ackk: probably ..l [12:58] ls -ld /etc/os-release [12:58] is it a symlink? [12:58] zyga, outside or inside the snap? [12:59] well [12:59] outside [12:59] the hostfs one is failing [12:59] /etc/os-release -> ../usr/lib/os-release [12:59] so [12:59] ah I see [12:59] so I have to use the actual file? [12:59] apparmor grants you rights to read /var/lib/snapd/hostfs/etc/os-release [12:59] but you that is a symlink [12:59] so you need /var/lib/snapd/hostfs/usr/lib/os-release [13:00] and you need to understand this in your app [13:00] it kind of sucks because symlinks seen via hostfs are "hard" [13:00] it's good we don't have absolute paths in them [13:00] extend the read section [13:00] to say /var/lib/snapd/hostfs/usr/lib/os-release [13:00] (in addition to the etc line) [13:00] and you should be good [13:00] zyga, so I suspect that won't work for the purpose of being os-agnostic, as on other OSes that might not be a symlink [13:00] yeah [13:00] zyga, so it won't work even if the interface allows both? [13:00] the denial would also tell you [13:00] well [13:01] the symlink can in theory point anywhere [13:01] in practice it either is not a symlink [13:01] or it points to a finite set of files [13:01] which do differ by OS flavour (I recommend booting a fedora workstation) [13:02] zyga, I see, thanks [13:02] good luck! :) [13:02] oh [13:02] standup!! [13:03] ackk: I would prefer a snapctl os-release -like API [13:03] where you could just ask [13:04] and we'd tell you without this mess [13:04] zyga, yeah, that'd be good [13:05] zyga, alternatively the actual /etc/os-release content could be exposed in some other file, like /etc/os-release-host or something [13:05] although that would only work for core* bases [13:05] ackk: that's tricky too [13:05] it's easier to have an API [13:05] zyga, yeah I assumed it would be trickier :) [13:06] zyga, this is the last part of the output [13:06] https://paste.ubuntu.com/p/mysVkdXQ4M/ [13:06] "last" :D [13:07] I don't know where to look [13:07] the end looks like just cleanup [13:08] I am logging the full run now [13:08] no [13:08] maybe just look at the log [13:08] I mean [13:08] I just don't know what to look at [13:08] there are 35K lines there [13:09] so a longer log doesn't really help much [13:09] do you see where it failed? [13:09] zyga, https://paste.ubuntu.com/p/2NT6zGG5QB/ [13:09] this could help [13:09] what is the status of user-1001.slice? [13:10] cachio: that does not seem to be a fragment of the earlier log [13:10] as I cannot see the EOF messages there [13:11] zyga, no, it is a new run [13:11] zyga, this is from journal https://paste.ubuntu.com/p/hkcJwwpsV3/ [13:12] May 08 13:07:54 localhost sshd[19701]: pam_unix(sshd:session): session closed for user test [13:12] this is weird [13:12] how come are we logging in as the test user? [13:12] cheking [13:13] so [13:14] cachio: let me tell you what I think [13:14] session-tool --restore does stop the slice of the test user [13:14] that stops all the services running as the test user [13:14] and kills all the test users's processes [13:14] the only explanation for the EOF [13:14] is that we ssh as the test user [13:14] not as root [13:14] maybe both, dunno [13:14] but that message looks like we just ssh as test [13:19] zyga, I ran with -vv and I see it is running as root [13:19] export USER="root" [13:20] cachio: can you find the moment where we ssh [13:20] or log into the machine and see if we've ssh'd as root or as test/ [13:21] pstolowski: maybe move the window to the mac's main screen? [13:22] mvo: if you review the shell fix, please merge it, it failed on mount-protocol-error :/ [13:23] zyga: will do [13:23] thanks [13:28] cachio: I know what the problem is [13:28] cachio: look at spread.yaml [13:28] cachio: and look for user: test [13:28] that's the problem [13:29] we do connect as the test user [13:29] no idea why [13:29] it's probably not a good idea [13:29] (username: test) [13:29] zyga, yes [13:29] you are right [13:30] Let me update the test user and try again [13:30] ok [13:31] some tests failed on the store [13:32] updates: got unexpected HTTP status code 408 via POST to "https://api.snapcraft.io/v2/snaps/refresh" [13:32] pedronis: ^ is that expected? [13:37] zyga, https://paste.ubuntu.com/p/fww8t62NbC/ [13:37] thanks! [13:37] excellent [13:37] thanks! [13:37] I'll create a PR with the fix [13:40] PR snapd#8628 opened: cmd/snap: coldplug auto-import assertions from all removable devices [13:44] mvo: if you look at https://tracker.debian.org/pkg/snapd it says there were 8K new commits since release [13:44] https://qa.debian.org/cgi-bin/vcswatch?package=snapd [13:50] mvo: overrides for https://github.com/snapcore/snapd/pull/8626 and https://github.com/snapcore/snapd/pull/8614 would be great [13:50] PR #8626: tests: fix passing stdin via session-tool [13:50] PR #8614: tests: don't create root-owned things in ~test [13:50] I could re-trigger but that would just waste time [13:56] zyga: sure, OMW [13:57] thank you [13:57] PR snapd#8614 closed: tests: don't create root-owned things in ~test [13:57] PR snapd#8626 closed: tests: fix passing stdin via session-tool [13:58] thanks! [14:04] PR snapcraft#3111 opened: elf: fix parsing of notes after patchelf mangling [14:13] mvo: you might want to cherry-pick https://github.com/snapcore/snapd/pull/8623 onto 2.45 as well, since spread tests on the release/2.45 branch will fail there without that PR [14:13] PR #8623: tests/lib/prepare.sh: delete patching of the initrd <⚠ Critical> [14:13] mvo: I milestoned the PR as 2.45 for you [14:17] ijohnson: \o/ [14:35] PR snapd#8629 opened: tests: new test user is used for external backend [14:36] cachio: thank you :) [14:37] zyga, yaw [14:37] zyga, thaanks to you for helping with the issue [14:37] :) [14:37] yeah, it was a fun thing to discover :) [14:39] #8577 needs a 2nd review, maybe ijohnson ? [14:39] PR #8577: secboot,cmd/snap-bootstrap: move initramfs-mounts tpm access to secboot [14:39] * zyga is sorry for missing the TGIF [14:39] lunch [14:43] pedronis: yes it's been in my queue, will take a look [14:44] I forgot about your 8559 [14:46] mborzecki: so I double checked, the workaround would have poked the real agents, so it cannot have a useful effect [14:49] pedronis: i think we should just drop that bit that pokes the agent and see how it will perform in the unit tests accross more runs [14:49] let me push that change actually [14:51] mborzecki: one other thing I notice is that the agent uses SdNotify but I don't see it mocked [14:52] pedronis: that code shoudl be noop mostly withouth the env vars [15:09] PR snapd#8630 opened: tests: inject snapd from edge into seeds of the image in manual preseed test [15:10] cachio: ^ this should fix the problem, passes for me on 19.10 & 20.04 [15:11] pedronis: so i dropped that bit that poked the session agent, and it failed now ;P [15:12] mborzecki: it failed here too, but only with 1.9 [15:12] :D [15:12] maybe that shitdown is buggy [15:12] is it a bug in go? [15:12] shutdown [15:13] shitdown :D [15:13] heh, funny typo [15:13] shit happens [15:13] shutdown may hang [15:13] likely yes, but is not obvious because the shutdown code seems the same [15:14] so it would be lower level [15:14] maybe it relies on infra beneath [15:14] if we only could restart successful unit test jobs [15:14] oh about that [15:14] mvo: what about bit pusher? [15:14] zyga: bit-cacher? still an option, someone needs to review my pr [15:15] got to go, my father's name day, bringing him some cake and all [15:15] mborzecki: ^ you have a solution there [15:15] mborzecki: o/ [15:21] mmh [15:21] there is something odd about the code [15:21] maybe it relates to the problem [15:21] or not [15:28] kenvandine, btw, I reported https://bugs.launchpad.net/snapd/+bug/1877610 now [15:28] Bug #1877610: The gdb option takes over the user configurations [15:29] about the 'gdb screws the configuration and breaks your application' [15:29] seb128: thanks [15:29] pstolowski, great, thanks [15:37] * cachio lunch [15:45] PR snapd#8631 opened: image,cmd/snap,tests: add support for store-wide cohort keys [15:49] maybe I understand what is going on [16:18] PR snapd#8624 closed: cmd/snap: fix the order of positional parameters in help output [16:18] PR snapd#8627 closed: c/snap-bootstrap: port mount state mocking to the new style on master (2.45) [16:19] PR snapd#8629 closed: tests: new test user is used for external backend [16:23] PR snapd#8632 opened: configcore: only reload journald if systemd is new enough (2.45) [16:27] PR snapd#8633 opened: tests: detect and report root-owned files in /home [16:29] ijohnson: I guess with this ^ we could drop the FIXME comments in the pulseaudio tests [16:30] zyga: there's a lot of things there, which one? [16:30] I think I have a fix for the wrappers mystery, actual the workaround would have also helped (if it talks to the right agent as now in Maciej PR) [16:30] ijohnson: ah sorry, this code says BZZ BROKEN if a test leaks root-owned files in /home [16:31] ijohnson: this is the code that showed the three tests fixed today [16:31] the issue is that on older go it seems if you call Shutdown before Serve, you get a race and Shutdown will not quite work [16:31] pedronis: nice [16:31] pedronis: what was the problem? [16:31] ah :) [16:31] in normal usage that shouldnd't happen, but because this is setting up the agent also for a lot of cases that don't use it [16:31] it happens [16:31] ijohnson: if you scroll to the bottom, you can see how this works [16:32] pedronis: which go version is fixed? [16:32] I was looking at some hangs that I could not explain and perhaps those are related [16:32] zyga: no sorry I meant which PR were you referring to, there was like 8 messages from mup above your msg [16:32] the "^" was ambiguous [16:32] ijohnson: ah :D [16:33] ijohnson: see here please https://github.com/snapcore/snapd/pull/8633/files#diff-01bb876cbb65b7a336cd99a41c4b97a9R5 [16:33] PR #8633: tests: detect and report root-owned files in /home [16:33] zyga: I don't know [16:33] actually [16:33] ack looking now [16:33] zyga: it's not a change in Shutdown itself [16:33] so it's not very easy to track [16:33] and haven't looked deeply [16:34] ijohnson: some things we do in ad-hoc way could move there, like looking for selinux denials or apparmor denials or messages from broken udev rules [16:34] ijohnson: I have one more checker, that shows we don't have processes running as "test" user [16:35] this checker looks nice and I like having more checker things like this that make sure tests clean up after themselves [16:35] one thing this buys us [16:35] is perhaps a way to accelerate prepare/restore [16:35] as now it's somewhat heaveyweight because it must be very conservative [16:36] we could maybe make it faster over time [16:36] right [16:37] It would be good to see what kind of performance that is though, because I wouldn't be surprised if it's not actually that slow [16:37] that fast or that slow? [16:38] or rather than not doing automatic cleanup doesn't net us that much total performance [16:38] ah, I see [16:38] *performance gain [16:38] I would be happy if we get a clear message from broken tests :) [16:38] yes I think that's the MVP we should be looking for right now :-) [16:38] and not "one of the tests before, guess which please" but "this test right here" :) [16:39] agreed [16:39] also zyga what happened to "let's write all the -tool's in python :-P [16:39] " [16:39] well [16:39] I did [16:39] believe me [16:39] but it's 20K lines now [16:39] haha what [16:39] and I just realized this will never get merged [16:39] I wrote this [16:39] waaaay more complex [16:39] that's testbed-tool [16:40] it supports python and shell plugins [16:40] and has tests [16:40] and man [16:40] this shell script is good enough [16:40] ah I see what you mean [16:40] indeed [16:40] but [16:40] also much easier to review than 20000 lines of python [16:40] since this is not a source-me shell program [16:40] we can :) [16:40] that's the important distinction [16:41] zyga: seems it's this issue: https://github.com/golang/go/issues/20239 and it's still thee 1.10 but fixed in 1.11 [16:42] and we are building with 1.10? [16:42] we use 1.9 and 1.10 [16:42] I see [16:42] and 1.13 [16:42] well, maybe after beta we could discuss having more recent builds [16:42] in our tests [16:42] maybe for snapd.snap [16:42] dunno [16:43] I guess apart from core / snapd nobody is on go this old anymore [16:43] but after beta [16:43] pedronis: and great find! [16:44] ok I should EOW now [16:45] ijohnson: I'll push a few more checker patches that covert existing random checks into something more organized but probably later tonight only [16:45] sure sounds godo [16:45] good even [16:45] thanks, have a great weekend everyone :) [16:45] o/ [16:45] you too zyga o/ [16:47] PR snapd#8625 closed: wrappers: tweak the order of restoring, use client timeout in services test teardown [16:53] PR snapd#8634 opened: usersession/agent,wrappers: fix races between Shutdown and Serve [16:53] mvo: ijohnson: ^ [16:54] mmm pedronis I haven't been following this discussion super close, is it ok if I review a bit later, trying to wrap up the reboot from recover -> run PR to propose first [16:54] yes [16:55] mostly I pointed it out because it should help with some of the red we are seeing [16:55] ack, sounds good [17:36] pedronis: nice! [17:59] mvo: is the network fix for recover mode already in 2.45? [18:12] mvo: well it seems so, it worked now (but for some reason it failed once before) [18:30] cmatsuoka: yes it should be in the kernel snaps available on edge/beta [18:35] PR snapd#8635 opened: o/devicestate: support doing system action reboots from recover mode === ijohnson is now known as ijohnson|lunch [18:44] zyga, any idea for this? https://paste.ubuntu.com/p/MRVj2xkQJf/ [18:44] it is happening with the new bionic image [18:45] this happens with session tool as I see === ijohnson|lunch is now known as ijohnson [19:06] ijohnson: I added some small comments/questions to that PR [19:06] pedronis: ack I'm addressing them now [19:06] cachio: re [19:07] cachio: I think it's the race I mentioned during standup, I'll send a patch that probably fixes it next week [19:07] zyga, nice, thanks [19:22] so, why do our test images have an "ubuntu" useR? [19:22] sorry [19:22] why does spread-created core-16 and core-18 have an ubuntu user? [19:22] it's really weird [19:22] there's a /home/ubuntu/.ssh/authorized keys that's empty [19:23] but /home/ubuntu is owned by root:root [19:23] but /home/ubuntu/.ssh is ubuntu:ubuntu [19:23] do we have a snap changes change for creating users? [19:30] ok, found the bug [19:37] zyga, did you find it? [19:37] no, it was a false trail [19:37] I added a few more checks [19:37] if you know I'm all ears :) [19:38] zyga, I also see this error + session-tool -u test --has-systemd-and-dbus [19:38] grep error: pattern not found, got: [19:38] no user dbus.socket [19:38] is it related? [19:38] no [19:38] where do you see it? [19:38] 16.04? [19:38] bionic [19:38] using the new image [19:39] which I didnt publish yet [19:39] image: test-bionic-1 [19:39] for this test google:ubuntu-18.04-64:tests/main/session-tool-support:test [19:39] no-install-recommends probably ate dbus-user-session [19:40] should I install dbus-user-session ? [19:40] as a dependency [19:41] if it was preinstalled in the other image, yeah [19:41] or adjust the test, the test really shows us what the system defaults are [19:43] now we hit a different snapd-session-agent issue [19:44] zyga, it is not installed in the previous version [19:45] cachio: that's odd, the test really checks if you have dbus.socket as systemctl --user [19:45] cachio: and we seem to have it in a vanilla bionic image [19:45] cachio: otherwise it would not pass [19:45] cachio: perhaps something just removed it [19:45] cachio: if you have a debug shell, maybe it is in apt/dpkg history log [19:46] zyga, I think the problem is that now we have installed dbus-x11 installed and previously we didnt [19:46] pedronis: what did you run into? [19:46] cachio: dbus-x11 is not checked by session-tool-support [19:46] cachio: look at the test and at the filesystem please [19:46] zyga: test are failing in on core 16 for user services saying there's a start limit issue with the snapd-session-agent [19:47] pedronis: start limit, as in it starts and crashes quicky? [19:47] maybe [19:47] pedronis: core 16 and core 18 did not ship dbus for user sessions (20 did) [19:47] but did that test ever pass there? [19:47] maybe is my code change, but unsure it happens only on core 16 [19:47] I see [19:47] oh fun [19:47] Friday :/ [19:48] I'm trying to run the test alone, to see if it's flaky or always fail [19:57] zyga: alone it works [19:57] fwiw [20:01] zyga: do we start sessions for root in tests? [20:02] pedronis: if the test explicitly wants to, as in runs session-tool --prepare (-u root is default) and session-tool .... [20:02] pedronis: IIRC some tests do but most don't [20:02] pedronis: probably only tests that I wrote do that [20:02] maybe those tests are leaving something behind [20:02] pedronis: try with -repeat 30 or something like this [20:02] it is good at catching weird mistakes [20:03] like test breaking itself [20:03] as well as some chance of catching races [20:03] pedronis: perhaps, which test did you see fail? [20:03] if you dump all the information you have here I may have luck over weekend [20:04] zyga: https://github.com/snapcore/snapd/pull/8634/checks?check_run_id=657046716 [20:04] PR #8634: usersession/agent,wrappers: fix races between Shutdown and Serve [20:05] - Make snap "test-snapd-user-service" (unset) available to the system (Post http://0/v1/service-control: read unix @->/run/user/0/snapd-session-agent.socket: read: connection reset by peer) [20:05] yes, see /0 is root [20:05] yeah [20:05] let me check the test quickly [20:05] but the tests really set up a session only for test [20:05] sorry the test [20:06] I think there's an issue with the tests but also probably a robustness issue in general [20:06] well [20:06] the root user has a session [20:06] because spread logs in over ssh [20:06] perhaps we should mask session services for root [20:07] at least for testing [20:07] until we understand what's going on [20:07] so, without using session-tool, there is a session [20:07] loginctl shows it [20:07] it's the ssh we use to connect [20:07] maybe yes [20:07] so you get session services there [20:07] including auto-start and other things [20:07] if snapd does something to per-user sessions now [20:08] (I kind of forgot how this works) [20:08] it's plausible it talks to root's session as well [20:08] as a quick way to check [20:08] it does, the logic is very naive in some ways [20:08] that's why I said we might have to think a bit [20:08] but indeed my concern right now is not even the feature [20:09] but more test robustness [20:09] systemctl --user mask snapd.session-agent.{socket,service} [20:09] right [20:09] that line in core-16 prepare should disable this for the root user [20:09] so no socket, no autostart [20:09] if you want to get sanity quickly, maybe that's the solution [20:10] not today at this point [20:10] sure, [20:10] it's super late [20:10] I probably should look a bit more how the world looks when it works [20:10] the one problem with root session [20:10] is that we cannot "wipe the slate" like we do for test [20:10] since we're connected as root [20:11] so I guess we must be more careful with what happens across tests [20:13] well it seems we get the user agent for root not working somehow [20:14] ijohnson: I don't think I will get to re-review that PR today, also seems the check I made you add is redundant otoh I don't know how covered all that is [20:14] the connection reset by peer is simply the agent closing? [20:14] pedronis: no problem, I think it's reasonably well tested the only thing I don't have more tests for there that I could add would be that doing various combos of some mode to a differnet mode with a different system [20:14] pedronis: but as mentioned all such cases currently just fail [20:15] err are not supported, and thus fail [20:15] zyga: it's socket activated [20:15] or should be [20:15] maybe the agent crashes? [20:15] do we get a journal entry? [20:15] note [20:15] it's funny [20:15] or mabye the stop on idle is naive [20:15] because journalctl --user is distinct [20:15] so maybe we should look at journalctl --user as ewll [20:15] *well [20:16] the test does this in debug [20:16] + session-tool -u test systemctl --user status snapd.session-agent.service [20:16] we should also do [20:16] + session-tool -u root systemctl --user status snapd.session-agent.service [20:16] or really just [20:16] systemctl --user status [20:16] (without the extra fanfare) [20:28] May 08 20:23:40 may082014-475733 passwd[1570]: password for 'ubuntu' changed by 'root' [20:28] cloud init? [20:28] is our core16 system affected by cloud init data from GCE? [20:29] ha [20:29] it seems so [20:29] :D [20:29] May 08 20:23:40 may082014-475733 passwd[1570]: password for 'ubuntu' changed by 'root' [20:30] that's from cloud-init.service [20:30] pedronis: ^^ lol [20:30] is this expecteD? [20:38] cachio: so, do you know why we get the "ubuntu" user on core16? [20:38] cachio: we do have a weird ubuntu user that belongs to grup "niemeyer" [20:38] this looks very much like an artefact of cloud-init [20:38] but I cannot put my finger on it [20:39] specifically the group is surprising [20:40] how do we build images, do we take a image, ssh to it, add package and then clean up and snapshot it? [20:42] we build an image and copy some bits but I confirmed that /home/ubuntu does *not* exist at that stage [20:42] it really shows up during boot [20:42] even timestamps support that [20:42] I'm unfamiliar with cloud init detials, looking at some of what the logs say now [20:44] (so the directory and the user get added on first boot) [20:44] and it really seems that it's cloud init [20:44] and I would not really complain [20:44] except that it's buggy [20:44] the owner of /home/ubuntu is root :D [20:46] another fun find [20:46] core 16 ships this file [20:46] /usr/share/click/frameworks# cat ubuntu-core-15.04-dev1.framework [20:47] (messed up but you get the idea) [20:47] we also have /usr/share/upstart [21:10] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1639955 [21:10] Bug #1639955: bad test for snappy systems