=== Wouter01004 is now known as Wouter0100 [06:39] morning [07:02] mborzecki: hey :) [07:03] zyga: hey hey [07:03] how was your weekend? [07:03] mborzecki: easy landing https://github.com/snapcore/snapd/pull/7867 [07:04] PR #7867: tests: fix fwupd version regular expression <⚠ Critical> [07:05] mborzecki: https://github.com/snapcore/snapd/pull/7864 needs love [07:05] PR #7864: cmd/snap-mgmt, packaging/postrm: stop and remove socket units when purging [07:06] PR snapd#7867 closed: tests: fix fwupd version regular expression <⚠ Critical> [07:07] zyga: do you know the backstory of https://github.com/snapcore/snapd/pull/7865 ? [07:07] PR #7865: travis-ci: add go import path [07:08] mborzecki: no idea but probably allows one to travis a fork [07:09] zyga: https://golang.org/cmd/go/#hdr-Remote_import_paths [07:12] hm, why didn't that fail on < 19.04 [07:25] PR snapd#7854 closed: snap-confine: raise egid before calling setup_private_mount() [07:25] hello :-) [07:37] hey sdhd-sascha [07:37] PR snapd#7840 closed: tests: use test-snapd-sh snap instead of test-snapd-tools - Part 2 [07:42] zyga: I'm not sure if it made sense to split the travis jobs. Maybe they were summarized because of the total build-time. And it is difficult to get a free build slot on the online travis-platform... [07:43] sdhd-sascha: yeah, it's a weird limitation; I don't know if your PR should be merged or not yet. I think sergio should review it [07:44] mborzecki: can you please review https://github.com/snapcore/snapd/pull/7129 [07:44] PR #7129: userd: allow setting default-url-scheme-handler [07:44] zyga: but the gopath import would be nice. Because with it, it's possible to build on own github/travis account, before PR [07:44] it's pretty old and I think nobody looks at it [07:52] mborzecki: thinking about https://github.com/snapcore/snapd/pull/7205 [07:52] PR #7205: rfc: introduce confinement options failsafe flag [07:52] what can we do to make it landable? [07:54] zyga: hmm so the snap-mgm test fails bc there's lxd socket unit, turns out the socket units are there base image [07:55] ok, let's filter those out then [08:00] morning [08:00] hey :) [08:18] PR snapd#7866 closed: travis-ci: split integration tests into parts (>4MB log limit) [08:19] PR snapd#7868 opened: Revert "travis-ci: split integration tests into parts (>4MB log limit)" [08:21] pstolowski: mvo: hey [08:25] hey mborzecki [08:26] PR snapd#7868 closed: Revert "travis-ci: split integration tests into parts (>4MB log limit)" [08:27] PR snapd#7862 closed: release: 2.42.5 [08:33] zyga: heh, updated #7864 [08:33] mmm, looking [08:33] PR #7864: cmd/snap-mgmt, packaging/postrm: stop and remove socket units when purging [08:33] aha [08:33] smart [08:33] nice [08:35] mborzecki: going to review your seccomp branch this morning [08:35] pstolowski: cool, thank you! [08:35] pstolowski: can you do a pass over https://github.com/snapcore/snapd/pull/7863 [08:35] PR #7863: interfaces/builtin: add uio interface [08:36] zyga: yes, sure [08:36] sdhd-sascha: hey, re 7866 - I accidently merged it but it was not reviewed yet so I reverted it again. please just reopen (or I can reopen). sorry for this! fwiw, we only hit the 4mb limit under certain circumstances of failing tests. not sure we want to split generally - the downside of a general split is that we only have a fixed amount of travis slots aiui so it may slow things down [08:36] ha, wireguard has landed in mainline [08:38] coffee and 1-2-1 prep [08:38] ttyl [08:39] mvo: no problem. [08:39] sdhd-sascha: thank you! [08:40] mvo: could you test the PR with the go import path? with them it was possible to build snapd on my own repo and travis [08:41] sdhd-sascha: could you please merge it on master or rebase it on master - there was a bug in master that caused the integration tests to fail which got fixed this morning. this should make the test green. [08:42] mvo: so anybody can test building before any PR. [08:42] sdhd-sascha: also could you please elaborate a little bit on this works? it sounds very cool! [08:42] mvo: what should i merge? [08:42] sdhd-sascha: just fetch the lastest upstream from snapd and merge master into your PR. I'm happy to do this for you if you prefer [08:43] mvo: sure, but i'm also new on travis ;-) i'm just googling for a solution ... [08:43] sdhd-sascha: I see :) [08:43] mvo: ok [08:44] sdhd-sascha: so with 7865, what would I have to do to run my forked branch? what steps to follow etc :) ? (does my question make sense?) [08:50] mvo: as i understand, you'll be able to run the travis job on your private fork [08:50] mvo: with 7865. just add the go_import_path, and add your repo under "MyRepositories" on travis. Then manually build on travis or wait on the next git push [08:50] mborzecki: right [08:53] re [08:55] I will now merge master on my pull-request. Can somebody please cancel my travis-jobs ? I wait until it works my private repo first. [08:55] sdhd-sascha: jobs auto-cancel on push [08:55] ...on my private repo [08:55] zyga: ? [08:55] no, on your private repo you can cancel them [08:55] I was talking about the snapd repo [08:56] zyga: I'm too. I won't waste your build time [08:56] don't want [08:59] mborzecki, zyga : do either of you know what's going on here? https://bugzilla.redhat.com/show_bug.cgi?id=1780712 [08:59] it seems like snapd doesn't know how to read the path properly... [09:00] Eighth_Doctor: it's sudo dropping env [09:00] Eighth_Doctor: the keyword is sudo [09:00] blech [09:00] I was afraid of that [09:00] try sudo env [09:00] I need to jump to a meeting [09:00] Eighth_Doctor: https://forum.snapcraft.io/t/centos-7-error-system-does-not-fully-support-snapd-cannot-read-the-value-of-fs-may-detach-mounts-kernel-parameter/14392/5 [09:02] would "sudo -E" help ? [09:02] """-E, --preserve-env preserve user environment when running command""" [09:03] sdhd-sascha: iirc the security policy will still drop your current $PATH [09:03] sudo -i should work though [09:32] mvo: is there a reopen button on github for 7866, or should i opened a new pr [09:36] re [09:36] sdhd-sascha: I think you need to open a new PR [09:37] sdhd-sascha, mborzecki thanks for explaining how 7865 works [09:48] Eighth_Doctor: do you think the proposed solution in https://bugzilla.redhat.com/show_bug.cgi?id=1691996 is workable? [09:48] Eighth_Doctor: btw, I need to thank you for excellent maintenance of snapd in Fedora. I've been using Fedora for a few weeks on my daily driver and it's nothing short of excellent [09:49] Eighth_Doctor: it really helps me to develop cgroupv2 features [09:50] PR snapd#7869 opened: travis-ci: split integration tests into parts (>4MB log limit) [09:51] mvo: i had to reapply the commit, because of the revert [09:52] sdhd-sascha: yeah, again, sorry for the trouble :( won't happen again [09:54] PR snapd#7853 closed: [RFC] boot,bootloader: setup the snapd_recovery_kernel grubenv [09:56] zyga: will you be pushing fixes to 7129? [09:56] checking [09:57] what kind of fixes? [09:57] zyga: just added a bunch of comments, but i can push the fixes too if you're busy with somehting else [09:57] mborzecki: you forgot to click send [09:57] there are no new comments [09:57] zyga: try refreshing [09:57] mborzecki: did [09:58] again [09:58] zyga: https://github.com/snapcore/snapd/pull/7129#pullrequestreview-328794160 [09:58] PR #7129: userd: allow setting default-url-scheme-handler [09:58] wow, now I see them [09:58] haha ;) [09:58] man, eventually consistent my ass [09:58] yeah, go for it [09:58] thank you for looking [09:58] ok [09:58] trivials anyway ;) [09:58] feels like something close [09:58] I'll jump back into https://github.com/snapcore/snapd/pull/7625 [09:59] to get it unblocked [09:59] PR #7625: cmd/snap-confine: stop being setgid root [09:59] but really, the desktop stuff is like duct tape and glue all the way [09:59] mborzecki: notice that 7129 is blocked on jdstrand review [09:59] mborzecki: yeah [10:00] pedronis: right, i can ping jdstrand_ in case he missed it (though i'm sure it's in his queue) [10:00] it's in his queue [10:13] PR snapd#7768 closed: interfaces: add raw-volume interface for access to partitions <β›” Blocked> [10:14] PR snapd#7768 opened: interfaces: add raw-volume interface for access to partitions [10:37] morning folks [10:38] hey ijohnson [10:38] hey zyga [10:39] hey ijohnson [10:39] mvo: did you see my comment on the SU doc? I didn't make it very far in updating the grub.cfg.normal for the UC20 pc gadget, also I had it in my notes that grub.cfg.recovery was the ubuntu-seed partition but reading through it, I think grub.cfg.recovery is for ubuntu-boot no? [10:40] reading the grub.cfg specifically [10:42] ijohnson: recovery goes to ubuntu-seed and *not* recovery goes to ubuntu-boot [10:43] pedronis: Hmm that's not how it's written now though, the grub.cfg.recovery is currently doing the chainloading to the recovery system grub IIUC [10:43] ijohnson: sorry, we are talking different things [10:44] sure, it is like 4 am here so not surprised I have something confused :-) [10:44] ijohnson: that should happen only for mode != run [10:45] not saying that is what happens right now, neither are definitive [10:45] or close to that [10:45] ijohnson: I have not yet, will look at it next, sorry! [10:45] which is part of the problem [10:45] ijohnson: I mean in run mode we should chain from ubuntu-seed recovery grub into ubuntu-boot grub [10:46] mvo: no problem just wanted to make sure you were aware if you needed someone else to work on this before Wednesday [10:46] ijohnson: yeah, that's fine, I figured :) [10:46] the recovery systems should really play a role only if mode != run [10:46] ijohnson: hope the trip was fine? [10:46] pedronis: yes I understand that part, I'm just a bit confused which grub file in the repo corresponds to which partition [10:47] ijohnson: recovery should correspond to seed, and the other to boot [10:47] if it's the other way around right, things are more messed up than I thought [10:48] mvo: actually I found a flight this morning so I'm at the airport waiting for my flight (hence being up at 4 am)... Hopefully the trip is fine it's currently snowing and we're supposed to get like 10 cm+ of snow this morning [10:49] ijohnson: uh! good luck! [10:49] pedronis: it's possible I am wrong about what it's currently doing but the recovery grub seems to be doing the logic to look for a recovery system to boot into [10:49] ijohnson: also - 4am sounds brual [10:49] ijohnson: brutal :) [10:49] ijohnson: was wonder about the early time :) [10:49] ijohnson: that's exactly what I would expect for seed [10:50] seed == recovery [10:50] pedronis: right that's why I was confused as well [10:50] Err wait sorry I misread [10:51] it looks correct, maybe there is no run mode logic there yet [10:51] I don't know [10:51] it's all part of the problem [10:52] I think there is no run mode logic in the recovery grub yet [10:52] pedronis: why would the seed recovery system grub need to look for all the other recovery systems? Is that by design that you can switch between recovery systems using grub? I guess that kinda makes sense now thinking about it [10:52] mvo: yeah it's a _bit_ early :-) [10:54] mvo: ah [10:55] pedronis: it's something to discuss with xnox today [11:26] brb, cold, going to make tea [11:34] pstolowski: thanks for the comments, updated #7821 [11:34] PR #7821: interfaces/seccomp: parallelize seccomp backend setup [11:34] PR snapd#7864 closed: cmd/snap-mgmt, packaging/postrm: stop and remove socket units when purging [11:36] yw, thanks for the PR! [11:37] re [11:39] PR snapd#7870 opened: boot,bootlaoder: setup the snap recovery system bootenv [11:47] pstolowski: good catch, mborzecki https://github.com/snapcore/snapd/pull/7821#discussion_r355405488 [11:47] PR #7821: interfaces/seccomp: parallelize seccomp backend setup [12:00] sdhd-sascha: https://github.com/snapcore/snapd/pull/7865#pullrequestreview-328870384 [12:00] PR #7865: travis-ci: add go import path [12:09] zyga: already been addressed [12:09] mborzecki: thanks [12:09] zyga, pedronis did you want jdstrand_ 's review on https://github.com/snapcore/snapd/pull/7830 or can it land? [12:09] PR #7830: interfaces: include hooks in plug/slot apparmor label [12:10] not really but I think it's short and would be good to keep him in the loop [12:11] zyga: sure [12:19] zyga: just started a new build on my repo [12:19] sdhd-sascha: you don't need to use repo builds you know [12:19] zyga: could you see, this build from me ? : https://travis-ci.org/sd-hd/snapd/builds/621989511 [12:19] there are faster ways to iterate [12:19] you can just use spread on a local system [12:19] zyga: kitchen was on my plan [12:19] zyga: kitchen is local travis [12:20] zyga: the build above, from weekend. has this patch and only debian sid is failed. all others are green [12:21] PR snapd#7850 closed: apparmor: allow 'r' /sys/kernel/mm/transparent_hugepage/hpage_pmd_size [12:23] zyga: any blockers to merge #7228 ? [12:23] PR #7228: interfaces: add audio-playback/record and pulseaudio spread tests [12:23] pedronis: checking [12:25] zyga: i do more tests, i will report [12:26] pedronis: I think it's good [12:27] PR snapd#7871 opened: tests: add spread test for hook permissions [12:28] PR snapd#7228 closed: interfaces: add audio-playback/record and pulseaudio spread tests [12:33] zyga: this possibly will be in linux 5.6 https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git/commit/?h=work.openat2&id=b767b87044c4b15c4f222daf6948a7a43f802f7e [12:33] yeah, I noticed on twitter yesterday [12:33] along with the new mount APIs it's great to see those [12:34] I don't have a clear vision of how to adopt such new APIs in snapd [12:34] as we will likely support the old APIs for foreseeable future [12:41] zyga: I pushed something to #7768, let me know whether it is what you expected [12:41] PR #7768: interfaces: add raw-volume interface for access to partitions [12:41] looking [12:43] pedronis: looks great, thanks [12:45] zyga: I am a bit confused by https://github.com/snapcore/snapd/pull/7768/files#r355413333, that's a go regexp, not something we feed apparmor [12:45] PR #7768: interfaces: add raw-volume interface for access to partitions [12:45] oh, did I misunderstand that [12:45] uff [12:45] sorry [12:46] I didn't notice that [12:46] I just noticed composition and was worried [12:46] commented [12:47] given that hotplug is for follow up (and we don't have an immediate use case), I think it can land if it gets green [12:47] I agree [12:47] hotplug is should be more careful, it's not as easy as uio as it is fairly dynamic in nature [12:48] and covers a wide variety of cases [12:48] but I think it's a natural case to attempt [13:04] * zyga small break [13:08] * Chipaca lunch [13:10] PR snapd#7872 opened: doc: HACKING.md change autopkgtest-trusty-amd64.img name [13:11] Can anybody stop my Travis-Build for modifying "HACKING.md" ? [13:19] pstolowski: what's the status of this: https://bugs.launchpad.net/snapd/+bug/1849564 [13:20] Bug #1849564: snap connections doesn't work as expected when interface attributes change === ricab is now known as ricab|lunch [13:30] mvo: zyga: I think we have finally fixed this: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1814141 ? [13:31] Bug #1814141: fail to run any snap after snapd refresh, reinstalling snapd from the archive is a temporary fix [13:32] mvo: I suppose this should be closed somehow at this point: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1739494 [13:32] Bug #1739494: snapd 2.29.4.2 is not installable on powerpc [13:34] pedronis: totally [13:41] pedronis: it's probably another case of https://bugs.launchpad.net/bugs/1848516 (as suggested in the report) but i'd need to double check if it's fixed too. note that in the bug report the plug name was changed, that's why it's the same problem (i think). if it was just attributes change, then i think we would hit the (known) problem of stale attributes on refresh that we only fixed for content interface [13:41] Bug #1848516: snap connections/interfaces shows dropped interfaces as connected after refresh [13:51] PR snapd#7873 opened: image: set recovery system label when creating the image [14:12] mvo: is it won't fix or fix released? [14:30] mvo: how are things going with getting 2.42.5 rolled out? [14:37] stgraber: looking good, we are waiting for the cert team right now to give us a +1 for moving to candidate, cachio has the details on where we stand here [14:38] pedronis: please don't land 7768 yet. some of the changes that went in I want to change === jdstrand_ is now known as jdstrand [14:39] jdstrand: you'll have to tell me more [14:39] jdstrand: which of the changes? [14:39] jdstrand: the test changes or the path changes [14:40] pedronis: in ijohnson's comment, there needed to be a change, but not to rip out the mmc and vda1 [14:40] jdstrand: they aren't used === ricab|lunch is now known as ricab [14:40] pedronis: I know. that is what needed to change, to use them, not to remove them [14:40] jdstrand: the code doesn't care where the slot come from [14:40] so not sure what difference it would make [14:41] jdstrand: I mean, I follow up sounds ok, I don't think anything fondumental is untested at this point [14:41] I picked those 3 paths specifically so make sure it worked in the normal case, the requested mmc case and the i2o case [14:42] the tests aren't done yet so I'd like to add it back [14:42] plus, I don't understand the s#vda1#vda1/# change [14:43] I only am just looking at it, but I want to understand why that was changed before committing [14:43] jdstrand: that tests the Clean [14:43] it wasn't tested [14:43] and zyga asked about it [14:44] we are actually likely missing tests about that for other interfaces that also have a Clean fwiw [14:47] jdstrand: anyway I don't see any fundamental blockers, but of course more testing is alway appreciated [14:48] pedronis: now I'm rethinking that clean is there at all. the regex is very precise [14:48] that clean should be there* [14:48] jdstrand: as I said to zyga all the other interfaces like this one do that [14:48] the Clean [14:48] so it would be incosistent not to have it at this point [14:49] jdstrand: see serial-port or spi [14:49] etc [14:49] I don't know what it was decided to do it that way [14:49] s/what/why/ [14:50] pedronis: yeah, but maybe that is wrong. why would we allow /dev/././././vda1//////// [14:50] jdstrand: we cannot simply change it, you would need to start with warning in review-tools [14:50] for the old ones [14:50] pedronis: well, I'm talking about this one [14:51] it's a well defined process, clean and then check [14:51] and consistency is important [14:51] but consitently wrong? [14:51] I didn't make the initial decision [14:52] it is not a bad pattern in general [14:52] just conceptually [14:52] but seeing /dev/vda1/ - that is not what we want to allow [14:52] jdstrand: but we do allow it [14:52] there are things, like i2o, that are a subdir [14:52] /dev/i2o/ [14:53] that's the code you wrote [14:53] I just refactored it [14:53] pedronis: yes, and I am rethinking that after seeing the test for /dev/vda1/ [14:54] jdstrand: ok, but then you need to give a list of other interfaces you think we ideally should remove the Clean [14:54] of course, /dev/i2o/ going to /dev/i2o doesn't match the regex, but still [14:54] because is everywhere atm [14:54] which is probably why you also put it in the first place [14:55] it is, I copied an existing over as a template [14:55] nice 500s from the forum [14:55] jdstrand: what I'm saying, you either convince me it's exactly a problem only for the new one, or which it is, and a path forward [14:55] pedronis: we should've been doing if path != filepath.Clean(path) err [14:56] just changing it for this one feels arbitrary [14:56] well, bugs happen. sometimes you only notice after a while [14:56] I just noticed this [14:57] I'm not sure consistency is a reason to add another one [14:57] that's ok, but we still need a path forward for everything [14:57] also because otherwise we'll get there again [14:58] jdstrand: 1 thing doing it one way, and 5 another is not a pattern [14:59] pedronis: I'm not saying that none of them can or can't change, but my point of 5 historical and everything else going forward as correct is a pattern. the 5 historical can be documented in the code [14:59] what's the problem with 'path != filepath.Clean(path)'? [15:00] Chipaca: that all the previous interfaces don't do that [15:00] Chipaca: the potential issue is that we might break existing snaps [15:00] ah [15:00] k [15:00] (if we change the existing ones) [15:00] Chipaca: we cannot just change it for old interfaces [15:00] without going to some review-tools long warning phase [15:00] or something [15:01] fyi, common files uses if p != np { [15:02] so does content [15:03] bool does not [15:03] re [15:04] neither does hidraw, i2c, iio, serial-port or spi [15:04] to be clear this has path, so the closest are the ones useing a path attribute [15:04] jdstrand: did snapd solve p != np? [15:04] ;D [15:05] I'll grab lunch and be back [15:05] do I need to read the backlog? [15:05] so, content, personal-files and system-files use !=, and bool, hidraw, i2c, iip, serial-port and spi just feed the cleaned path through the regex [15:07] jdstrand: as I said from attribute used (path), this belong to the latter set [15:08] the first three don't have an attribute named 'path' [15:10] jdstrand: which 3 ? [15:11] pedronis: now that I've seen it. I'm uncomfortable adding this to raw-volume. I disagree we should for consistency. I see your point about content, personal-files and system-files (ie, the first three) not having a "path" attribute, but they are effectively the same an imo should be considered [15:12] pedronis: I don't see how to move existing over in a very acceptable manner. it is your prerogative to say no to me [15:13] pedronis: what I could do is fix raw-volume and then add to all the others a code comment explaining that the pattern is historical [15:13] pedronis: that is probably the best I can do barring a long drawn out migration [15:14] jdstrand: ok, we probably want to think how to change those but is not trivial [15:14] they have been like that since a long time [15:15] yeah [15:18] pedronis: I'm not sure where we stand. I think you said fix raw-volume to use != (and while you didn't say it, I'll do a followup pr to add a code comment to the others and a test verifying the current behavior). if that isn't what you meant, please let me know what you want [15:19] jdstrand: what I would prefer is also that we make the path function a helper in utils, so there's a chance that we can use it again for similar new cases [15:20] jdstrand: also probably all the comments in the old code needs to point to raw-volume and or this helper [15:20] pedronis: something like where the api is pathHelper(path, regex) (name tbd)? [15:21] well it can take the slot etc [15:21] but whatever works [15:21] jdstrand: look at the helper as it exist now [15:21] pedronis: gotcha. ok, I'll do that in a followup PR [15:30] * zyga -> afk [15:31] pedronis: jdstrand: could you do the store check thing to see a full listing of all the paths that folks are using for these slots? If nobody's using a non-clean path then you could make the change more safely [15:36] ijohnson: that is true [15:37] ijohnson: btw, did you solve the rofs issue? [15:37] Well haven't tested solving it yet, but the first thing I did was patch containerd to always put containers under a specified label if an env var is set [15:38] pedronis: btw, when 2.43 goes to candidate, we need to do the final round of pulseaudio grandfathering. we could do a round now(ish) to get a jump start on it [15:38] jdstrand: But I also got strace of the mounts that containerd does and the mount ns of the containerd and I can't see why the rootfs would be ro [15:38] pedronis: what you gave me last time is sufficient for my needs [15:39] jdstrand: ok [15:40] ijohnson: weird. it occurred to me that perhaps temporarily an overlayfs could be used to workaround it. that wouldn't be a long term fix of course [15:40] Hey guys, I am trying to build snap with QtWebEngine for rpi eglfs. So it builds fine. Qt apps work. But when trying to run apps that actually use webengine I am getting a crash in webengine process (failed to execvp...). When running with LD_DEBUG I get this - /snap/screenly-client/x40/bin/screenly-client: error: symbol lookup error: undefined symbol: nspr_use_zone_allocator (fatal) [15:40] any idea what that can be realted with? I am building latest Qt, using system libs as dependencies [15:40] pedronis: just whenever you have a chance. if you don't have a chance until candidate, that is ok too [15:41] jdstrand: yea, we'll see, I'm also not sure when we'll branch 2.43 [15:43] jdstrand: If you have a little bit of time maybe you could look at https://pastebin.canonical.com/p/yh8M7G6qpk/ and https://www.irccloud.com/pastebin/IhtxTEjG/ [15:44] I can't see anything obvious there about why the rootfs in the container would be read-only [15:44] Note that in the strace log, there are multiple containers being created and we really only care about the last one [15:46] * cachio lunch [15:59] ijohnson: sorry, someone asked me something. line 369 of https://pastebin.canonical.com/p/yh8M7G6qpk/ [16:01] * zyga back from lunch, hacking more [16:01] ijohnson: that is doing what I thought it might be. it isn't necessarily an issue, but a pivot_root did happen before [16:02] jdstrand: no worries, so that mount call is making "/" which at this point is an overlayfs mount, a slave mount, but that shouldn't make it read-only right? [16:02] jdstrand: yeah I don't think pivot_root is as big of a deal as I thought it was [16:03] Because apparently our dockerd patches don't even prevent docker from doing pivot_root anymore and it still works because it runs inside a different apparmor label [16:03] sdhd-sascha: left a comment https://github.com/snapcore/snapd/pull/7845#issuecomment-563306248 imo there's an opportunity to fix in in systemd upstream too [16:03] PR #7845: cmd: add prefix to SYSTEMD_SYSTEM_GENERATOR_DIR [16:03] s/it runs/the container runs/ [16:04] ijohnson: no, but doesn't line 285 look weird? what is going on with lowerdir? [16:05] jdstrand: what's weird about that overlayfs lowerdir? It just has multiple lower dirs which last I checked is perfectly valid and well supported [16:06] ijohnson, jdstrand: what are you chasing? [16:06] zyga: the dream [16:06] don't we all? :) [16:06] (of kubernetes as a snap working nicely) [16:06] :-) [16:07] Maybe zyga you could take a look at the links I posted a bit above in the backlog, essentially the issue is that a container is started and it has "/" as read-only which is weird because it should have its own "/" built from overlayfs mounts [16:08] looking [16:09] mborzecki: is there a bug in systemd too ? [16:09] zyga: note to only look at the last set of mounts, etc. Because the first few are unrelated containers [16:09] sdhd-sascha: hm not a bug really, they don't have that path in sytemd.pc [16:09] cachio: is this something you're aware of? https://api.travis-ci.org/v3/job/622591349/log.txt [16:10] mborzecki: ah, ok. i could do that ? [16:11] ijohnson: ok, I haven't seen that before but see in https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt that it is supported. [16:11] ijohnson: stat -f /var/snap/kubernetes-worker/common/containerd/var/lib/io.containerd.snapshotter.v1.overlayfs/snapshots/91/fs [16:12] jdstrand: yeah, that's the thing I mentioned in Vancouver :/ [16:12] just many lowerdirs [16:12] zyga: issue is that this is an ephemeral container that is launched immediately and dies before we can really do anything [16:12] ok [16:13] ijohnson: so what's the symptom, that / is read-only? [16:13] zyga: yes / is read-only is the symptom [16:13] ijohnson: how is this determined? [16:13] ijohnson: can I reproduce this? [16:14] cmatsuoka, thanks for the report, I'll ake a look [16:14] well not easily reproducible locally due to many levels of $CONTAINER_TECHNOLOGIES and $CLOUD_TECHNOLOGIES [16:14] cmatsuoka, this test has been updated recently [16:15] ijohnson: I don't see why it would be read only otherwise [16:15] zyga: the container in it's entry point tries to create `/host/dir` [16:15] aha [16:15] and you get EROFS? [16:15] Yes [16:15] do you get a write error or a specific errno? [16:15] Uhh I don't have the specific error in front of me [16:15] ijohnson: I can help but I would love to get my hands on this [16:15] joedborg: are you around? Do you have the specific error when you get that read only issue? [16:16] ijohnson: otherwise it looks like it's not true, something is read only for real [16:16] the mountinfo dump, where is that from [16:16] from the container? [16:16] ijohnson: i'm here [16:16] can we get a shell at the moment that mountinfo was collected? [16:16] ijohnson: which did you want? [16:17] joedborg: can you pastebin the error you see when launching the container with the AWS script? [16:17] ijohnson: sure [16:17] zyga: kinda it's tough because the container pops up for very short amount of time then exits [16:17] what is going on with line 173: [pid 3012] mount("/", "/", 0xc0001c60a0, MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, NULL) = 0 [16:18] ijohnson: how do you get that shell trace then? [16:18] zyga: so to get that mountinfo we had an `nsenter -all -t $(until pgrep install-aws.sh)` [16:18] ijohnson: looks like it makes the bind mount read only without remounting the filesystem [16:18] er [16:18] jdstrand: ^ [16:18] (sorry can only type so fast on a plane from my phone :-) ) [16:18] and recursively so, though I'm not quite sure that is really respected in the kernel [16:18] ijohnson: oh [16:18] ijohnson: man, you're good :D [16:19] zyga: I more meant, hey, ijohnson, does this look like it is doing the right thing? [16:19] I see [16:19] * ijohnson thinks [16:19] jdstrand: but note that the 1st line of mountinfo shows it is writable, both the filesystem and the mount point - you can see "rw" there twice [16:19] jdstrand: I saw that too I think it's a red herring [16:20] There are multiple containers run and the first one is just a bug checking container for the kernel I think [16:20] ijohnson: so, I bet you could get some of the info zyga wants by using a container that works (ie, one that doesn't try to create /host/..., run a shell, and then try to touch /host/foo or something to see if you get the same error [16:20] I think containerd creates a dummy container on startup to see if it has to work around kernel bugs or not [16:20] jdstrand: that's a good idea [16:21] well yes that would be good but is complicated due to $CLOUD_TECHNOLOGIES and $CONTAINER_TECHNOLOGIES [16:21] Essentially we don't know the full config of the container being created so we could reproduce [16:22] https://www.irccloud.com/pastebin/dna5cAMu/ [16:22] ijohnson: zyga: ^ [16:22] If we did yes it would be easy to reproduce but for example a simple container created with the cli tool ctr for containerd doesn't use anywhere near the same config as this cloud provider does [16:23] ijohnson: well, I know that joedborg was able to get some containers to run with what is available today. I'm just saying, maybe shell into one of those pods and see if the /host issue exists there too. if it does, perhaps it is the same issue was are seeing with eks [16:23] joedborg: that's EROFS allright [16:23] joedborg: how hard would it be to run some custom shell script instead? [16:23] ijohnson: that would at least rule in if it is eks only or if it is a generic issue [16:24] zyga: it’s pretty easy to open a shell, just not within that container as that comes straight from Amazon. Can it be a new container and be useful? [16:24] yeah [16:24] anything that is similar enough to reproduce the problem [16:24] I'm mainly interested in: [16:25] joedborg: that is the question. it is probably useful if we can reproduce the same rofs error in that container [16:25] joedborg: /proc/self/mountinfo from the point of view of the failure [16:25] s/same/same type of/ [16:25] joedborg: stat -f / [16:25] joedborg: along with any logs from the host that may contain relevant stuff [16:26] jdstrand: it would be quite evil to craft a seccomp profile that returns EROFS instead of EPERM ;) [16:26] I'm about to land so I'll be AFK for a bit but yeah if we can get a similarly configured container shell that would be helpful [16:26] ijohnson: safe travel [16:26] hah, it would [16:26] I'll be around for a while [16:26] if I'm gone just leave me a message please [16:31] now working on getting a shell [16:32] thank you [16:36] the issue is, because it's the networking pod that's failing, interacting with the cluster is a pain [16:37] joedborg: are you racing with the container failure? [16:37] joedborg: is this .... (winks) pod racing? :D [16:38] zyga: hahahaha, there are as many crashes [16:57] Hahaha yes this is exactly pod racing, this demo is just like Anakin's racer :-) [16:58] two thin strings holding it all together ? [16:58] PR snapcraft#2825 closed: remote-build: remove need to specify user [17:18] zyga: ijohnson: jdstrand: just wondering if any of you could take a look over this https://pastebin.canonical.com/p/yxh2bnq6ms/, can anyone see if the URI specified at the end is inside a pivot root? I don't think it is, but I think the Permission Denied is related to what we're seeing [17:18] looking [17:19] joedborg: is this from snap run --strace? [17:19] joedborg: if so _all_ of is it after snap-induced pivot [17:20] zyga: thanks! yep, i'm running ctr within the snap [17:20] zyga: when i say within, i mean via [17:20] how do you mean? [17:20] * ijohnson looks [17:20] pivot affects everything in the mount namespace [17:20] all paths are affected [17:21] I see some permission failures on epoll [17:22] zyga: from what i understand, this socket is meant to be opened to transport the tty, but it's failing to so I can't get a shell [17:22] zyga, epoll EPERM is expected I think IIRC [17:22] joedborg: on the host do you see any denials? [17:22] joedborg: to answer your question, yes this is "inside" a pivot root [17:23] joedborg: do you have all interfaces connected [17:23] the permission denial needs to be overcome first [17:23] there are ways to do that [17:23] connect existing interfaces [17:23] or switch the entire profile to complain mode [17:23] ijohnson: yeah i do this time :) [17:23] zyga: there aren't any denails [17:23] *denials [17:23] are you running this as root? perhaps those are DAC failures [17:24] zyga: i am yeah [17:27] 'abstract unix socket' [17:27] dog [17:27] doh [17:27] that won't be affected by pivot_root, assuming the error is accurate [17:27] I didn't notice that [17:27] yeah [17:27] jdstrand: yeah it is /not/ documented well [17:27] jdstrand: sharp eyes! [17:27] so i have no idea where it's meant to be written [17:27] PR snapd#7874 opened: seed: support ModeSnaps(mode) for mode != "run" [17:28] jdstrand: zyga: ijohnson: I've got to run to an appointment, i'll have patchy comms, but will try to answer anything [17:29] joedborg: the policy doesn't allow that path [17:29] we have: [17:29] unix (bind,listen) type=stream addr="@/containerd-shim/moby/*/shim.sock\x00", [17:29] unix (bind,listen) type=stream addr="@/containerd-shim/k8s.io/*/shim.sock\x00", [17:29] we need unix (bind,listen) type=stream addr="@/containerd-shim/default/*/shim.sock\x00", [17:29] jdstrand: it's interesting that joedborg said there are no denials on the host [17:29] jdstrand: i think it used to be the case, but now has changed [17:29] joedborg: did you disable kernel rate limiting? [17:29] zyga: that might be because i have a custom snapd [17:30] jdstrand: yeah [17:30] joedborg: I suggest while trying to get this up and running to stay on a particular tag/commit/something for containerd [17:31] jdstrand: ah i meant between when that policy was written for docker and now, i have always stayed on 1 commiut [17:31] joedborg: can you add to /var/lib/snapd/apparmor/profiles/snap.kubernetes-worker.containerd the above rule? [17:31] joedborg: and reload the policy with apparmor_parser -r /var/lib/... and then restart containerd and see if you get farther? [17:32] jdstrand: ah that works for getting a containerd shell [17:32] thanks [17:32] cool [17:33] joedborg: i've made a note of that [17:34] jdstrand: thanks! [17:36] * zyga EODs [17:36] okay, so this shell allows me to mkdir /host and write to it, so it's either an issue specifically via k8s or the aws container image [17:37] thanks for your help zyga [17:37] good luck [17:38] joedborg: so with that rule you still have issues with the eks container image? [17:39] joedborg: that is useful. you said "it's either an issue specifically via k8s or the aws container image". is the shell you just used not via k8s (ie, the MVP from last cycle)? [17:40] jdstrand: no so i can't get a shell via k8s atm because this aws container that won't start controlls the netwroking in and out of the node [17:40] jdstrand: with out own control plane, it works fine [17:41] joedborg: is this a regression from the mvp, this never worked or this wasn't tested before? [17:41] jdstrand: it's an aws specific thing [17:42] joedborg: ok, so we've seen it work with OTHER_CONTROL_PLANE_TECHNOLOGY [17:42] ijohnson: ;) [17:42] haha [17:42] jdstrand: we have, but this issue will surface with any workload that doesn't conform with standard unix root directories [17:43] jdstrand: i agree they shouldn't do this, but people like to use /myapp when in containers [17:43] jdstrand: :-) if only we weren't going to try and demo this with $SPECIFIC_CONTROL_PLANE vendor [17:43] we can whitelist any number of /myapp variants ;) [17:43] joedborg: sure, I'm just trying to understand if this is a regression or not [17:43] maybe they get bored eventualyl [17:44] * zyga really EODs [17:44] good luck [17:44] guys, really sorry, i have to run for this appointment (normally i wouldn't, but it's not rescheduable) [17:44] ijohnson: you are interested in understanding if the added unix rule allows for a shell with aws? [17:44] joedborg: no problem [17:45] joedborg: np. I'll give you a deb with the unix rule [17:45] jdstrand: sorry don't quite follow the question, yes I am interested in getting AWS eks to work with the snap [17:46] ijohnson: you asked "joedborg: so with that rule you still have issues with the eks container image?" - I was just making sure that question wasn't lost [17:46] jdstrand: thanks, maybe where it says β€œdefault”, β€œk8s.io” should probably be generic [17:46] Oh right yes I did [17:46] Because that’s just the namespace sep by the container [17:47] joedborg: I already did that [17:47] joedborg: thinkikng the same thing [17:47] Awesome thanks [17:52] joedborg: ok, uploaded a new deb [18:04] cachio: got this one too: [18:04] rsync: write failed on "/mnt/user-data/gopath/src/github.com/snapcore/snapd/share/locale/mnw/LC_MESSAGES/snappy.mo": No space left on device (28) [18:05] cmatsuoka, about the previous one I could reproduce the issue, the problem is in another test which is leaving the environment corrupted [18:06] cachio: ah interesting [18:06] cachio: do you know how to fix it? [18:06] so you know in which system is happening hte "no space error"? [18:07] cachio: ubuntu-core-18-64 [18:08] on snap-set-core-config [18:08] cmatsuoka, not yet because for some reason the snap test-snapd-service-watchdog is leaving garbage after it is deleted [18:08] cmatsuoka, I left some links in the stanup doc [18:08] cmatsuoka, https://paste.ubuntu.com/p/FgV9nr9rDg/ [18:09] so then when we run the test snap-set-core-config it fails [18:09] but is it because systemd is degraded [18:09] cmatsuoka, and if I run the snap-set-core-config in a loop it works perfect [18:10] so I presume there is some test which is executed before and it is causing this [18:10] cmatsuoka, I am now trying to see which is that test [18:16] PR snapd#7875 opened: interfaces: refactor path() from raw-volume into utils with comments for old [18:39] PR snapd#7876 opened: seed: fix seed location of local but asserted snaps <:birthday:> [19:06] * cachio afk [20:15] PR snapd#7848 closed: cmd/snap-bootstrap: mount ubuntu-data tmpfs, in one go with kernel & base [20:19] cachio: that test is passing now, did you do anything? === pedronis_ is now known as pedronis [20:24] cmatsuoka, no, it dependes on the tests order [20:25] the problem is on the test ubuntu-core-gadget-config-defaults [20:25] it is leaving a wrong systemd state [20:25] I am testing a fix for this [20:25] cachio: ah nice, thank you [20:26] cmatsuoka, thanks for showing me the issue