[00:00] and if so, how does it run performance wise?
[00:04] bashingboi: I haven't tried on debian yet
[00:04] I can try tomorrow
[00:04] today I should really hit the bed :)
[00:05] ha, np. I will try for myself and check how it runs
[02:17] Laney: yeah i guess unpacking repacking the initramfs is even more steps
[02:18] Laney: it's at https://github.com/CanonicalLtd/subiquity/blob/master/scripts/inject-subiquity-snap.sh although bits of that are hardcoded for subiquity, the general bits could be extracted i think
=== chihchun_afk is now known as chihchun
=== chihchun is now known as chihchun_afk
[06:10] morning
[06:14] o/
[06:16] mborzecki: can you please review https://github.com/snapcore/snapd/pull/4912
[06:16] PR #4912: overlord/configstate: change how ssh is stopped/started (2.32)
[06:16] mborzecki: if you have a kvm or a pi at home it's best to try it for real too
[06:17] zyga: looking
[06:18] zyga: wasn't sshd.service -> ssh.service already addressed by a patch from mvo?
[06:18] zyga: or maybe it was the other way around
[06:19] it was
[06:19] but it broke badly
[06:19] once sshd is disabled
[06:19] it cannot be enabled
[06:19] because it is an alias
[06:20] zyga: snapd alias or some sort of system alias?
[06:21] systemd alias
[06:22] look at systemctl cat ssh.service
[06:22] sshd is an alias there
[06:22] if you disable ssh.service
[06:22] the alias goes away
[06:22] and cannot be enabled, started or stopped
[06:22] we used sshd instead of ssh because of the issue with writable paths
[06:22] if you just switch to ssh everything is fine
[06:22] but then you reboot
[06:22] and ssh is re-enabled
[06:22] because writable paths is terrible
[06:23] and ssh is enabled by default so it brings the file back
[06:23] this is why I used two things: sshd -> ssh so that enable/disable works
[06:23] and the conditional file to actually control being off or on
[06:24] i'm reading up the systemd.unit manpage
[06:24] ok
[06:24] nice, > aliases specified through Alias= are only effective when the unit is enabled.
[06:24] I may have missed something, hence the request for review
[06:24] it's an urgent issue affecting customer deployments
[06:32] zyga: wow that service file is interesting
[06:32] zyga: sshd_not_to_be_run is a debian thing right?
[06:34] Probably
[06:52] zyga: a similar attempt was there in configure hook in the core snap https://github.com/snapcore/core/commit/dc782a37a3aef3fb11f256260f5205f2ff256acd
[06:53] oh, that's interesting
[06:53] we had this
[06:53] and lost this?
[06:54] yeah, i was looking for the actual contents of writable paths and found this instead :P
[07:09] PR snapd#4903 opened: tests: [WIP] split prepare-restore.sh into prepare-restore.d modules
[07:36] PR snapd#4913 opened: interfaces,release: probe seccomp features lazily
[07:38] hmmm, my PR just got a massive TLS handshake failure with everything (google, linode)
[07:38] all tasks aborted
[07:38] hope this is not a norm today
[07:39] zyga: ok, played a bit with core image, if i understand it right since we disable the service, the symlink is removed from etc, but then it's still restored from core snap because /etc/systemd/system is synced in writable-paths?
[07:40] yes
[07:40] that's the issue
[07:40] that's the reason we switched to sshd service
[07:40] but this had unintended consequences
[07:40] from what i see disabling/masking sshd does not work either
[07:46] yep
[07:46] I'm testing another mode now
[07:46] where sshd was disabled
[07:46] thus breaking people
[07:47] and checking if the new core with my fix heals that
[07:48] hah, now i'm locked out of the ssh, and serial console only shows the prompt with ssh login :P, how can i log in?
[07:50] mborzecki: ha
[07:50] yank the card
[07:50] set a password
[07:50] and put it back in
[07:50] mvo: so to recap
[07:51] mvo: I tested the scenario where sshd.service was disabled
[07:51] and that the new patched snapd can fix that state simply by rebooting (which happens on update) and starting ssh.service
[07:51] by setting the core config value
[07:51] so I think this is a fix for the issue in general and for the machines affected in particular
[07:52] zyga: we do more than just disable, we also mask, I think this will result in a different outcome
[07:53] my feeling is that we need to at least unmask (opportunistically and ignore the error)
[07:55] I will test that now
[07:55] yes, perhaps
[07:56] mvo: I think masking doesn't work
[07:56] zyga@localhost:~$ sudo systemctl disable sshd.service
[07:56] zyga@localhost:~$ sudo systemctl mask sshd.service
[07:56] Failed to execute operation: File exists
[07:57] the initial state was that neither sshd nor ssh were masked
[07:57] zyga: let me check
[07:57] hmm
[07:57] maybe I'm wrong
[07:57] let me reset everything
[07:58] zyga: it used to be the case, we masked ssh.service and then sshd.service would reappear via the writable-magic.
but let me double check :)
[07:58] ok
[07:58] tested that now
[07:59] my vm will take a moment to be ready, it has an older core
[07:59] initial state https://www.irccloud.com/pastebin/BMckHA77/
[07:59] so sshd is disabled and masked
[07:59] now I'm setting the core config
[07:59] and ssh starts and I can log in
[07:59] ah
[07:59] because I *start* it
[08:00] not enable it
[08:00] let's reboot now
[08:00] I mean, the hack just starts ssh directly
[08:00] so even if sshd is masked
[08:00] that's not an issue
[08:00] I think we should unmask it though
[08:00] but it's not for correctness
[08:00] but more for purity
[08:01] mvo: it's also good after reboot
[08:01] even though sshd is masked
[08:01] zyga: even with the mask?
[08:01] lrwxrwxrwx 1 root root 9 Mar 23 2018 /etc/systemd/system/sshd.service -> /dev/null
[08:01] zyga: wuuut? crazy
[08:01] (this is after reboot)
[08:02] woah
[08:02] https://www.irccloud.com/pastebin/LHAlPevl/
[08:02] look at this
[08:02] sshd.service changed on disk
[08:02] this is after reboot, no touching at all
[08:02] how come?
[08:02] I'll reboot again just to be 100% sure
[08:03] zyga: I think it's confusion from ssh.service vs sshd.service but let me double check
[08:03] so +1 to unmask it
[08:03] zyga: I'm so happy we have the /etc/ssh_no_start thing, these unit aliases seem to be crazy
[08:04] yes, I have the same feeling
[08:04] testing the new build now
[08:07] I need to buy some more serial ports
=== pstolowski|afk is now known as pstolowski
[08:07] mornings
[08:07] this makes testing so much easier
[08:08] mvo: pstolowski: morning guys
[08:09] hey mborzecki
[08:09] zyga: mvo: could we straighten out the situation with ssh instead of using that ssh_no_run.. monstrosity?
[08:09] mborzecki: maybe
[08:09] mborzecki: so there are multiple issues
[08:09] mborzecki: do you have a proposal?
[08:10] I mean, yes, but not trivially
[08:10] good morning
[08:10] we'd have to remove writable-path sync on systemd services
[08:10] mborzecki: our ssh.service is using Alias=sshd.service
[08:11] mborzecki: our core snap has "/etc/systemd/system/sshd.service" which is in writable-paths marked as "sync". this means that when the file is missing it will be copied from the core snap
[08:11] as i understand we could fix it in the core snap, have just one ssh.service and no aliases
[08:11] mborzecki: so we need to unmask both ssh and sshd service
[08:11] mborzecki: yes, I think that is a good idea, fix the core snap to a) remove the alias b) not use /etc/ but /lib for the unit
[08:12] mborzecki: but it's more risky than what zyga is doing
[08:12] mborzecki: what I really like about the zyga PR is that it's dead-simple
[08:12] mvo: force pushed with the unmask
[08:12] tested that it works all the way
[08:12] ta
[08:12] mhm, it is :)
[08:13] mborzecki: but gustavo also wants to talk about this so maybe it will be solved differently :)
[08:13] mvo, mborzecki: one note, we need to tie this to core16
[08:13] with core18 the approach should be direct and without hacks
[08:13] zyga, mborzecki I think it's fine to revisit this very soon and fix core16 directly, i.e. fix our ssh install and go back to straight systemctl stop/disable ssh
[08:14] as far as i'm concerned, it's +1, but ideally we should fix this and use ssh.service (or sshd, just make sure that it's a single one, no aliases or whatnot)
[08:14] agreed
[08:14] (sshd cannot be used)
[08:14] we must use ssh.service
[08:14] and get rid of that sync stanza on that director
[08:14] *directory
[08:14] mborzecki: what's also annoying is that even if we do stop/disable/mask ssh and sshd it fails on enable because the operations are not symmetric
[08:15] mborzecki: i.e.
unmask/enable/start ssh sshd will fail during enable sshd because it says there is a symlink already
[08:15] mvo: yeah, i did this in the console :P
[08:15] mborzecki: which boggles my mind, but maybe I'm missing something
[08:15] and then locked out myself :P
[08:15] haha
[08:15] :D
[08:15] well
[08:15] mborzecki: like how can you sanely stop a service with aliases
[08:15] mborzecki: haha
[08:15] I'm happy with this patch
[08:15] * mvo hugs zyga
[08:15] I won't touch it until we talk to gustavo
[08:15] * mvo hugs zyga harder
[08:17] mvo: 'Failed to execute operation: Too many levels of symbolic links' maybe it's something in system(d,ctl) too
[08:19] mborzecki: yeah
[08:19] mborzecki: like "wuuuuut"?
[08:19] mborzecki: what kind of error is that to begin with
[08:19] mvo: hi, there's quite a bit of PRs marked for 2.32, anything in particular that I could help with a review?
[08:20] mborzecki: anyway, sorry, I'm still perplexed how difficult it is to disable ssh via systemctl
[08:20] zyga: pushed an update to #4902
[08:20] PR #4902: cmd/snap-confine: nvidia: preserve globbed file prefix
[08:20] pedronis: let me unmark some
[08:20] mvo: disabling is simple, making sure that it never gets enabled is a bit harder :P
[08:20] pedronis: 4882 might be a good one, that is the most critical one right now
[08:21] mborzecki: heh, exactly
[08:22] I'll push an unmask of ssh.service now
[08:22] just testing that after reboot
[08:24] mborzecki: quick question about 4902 - that fixes nvidia issues on arch?
[08:25] mvo: does 4882 need a follow up to use 4911 ?
[08:25] mvo: yes, there's another pr for ubuntu #4908 (though still RFC) that includes the patch from 4902
[08:25] pedronis: yes, I will talk to gustavo about it
[08:25] PR #4908: [RFC] cmd/snap-confine: attempt to detect if multiarch host uses arch triplets
[08:26] pedronis: I think technically 4882 will solve the problems we are seeing but gustavo was keen to add the second layer of check (snap run talking to snapd on system-key mismatch)
[08:26] mborzecki: ta
[08:27] mvo: unmasking both now
[08:27] zyga: ta
[08:28] PR snapd#4909 closed: interfaces: harden snap-update-ns profile (2.32)
[08:30] PR snapd#4907 closed: advisor: deal with missing commands.db file
[08:30] PR snapd#4913 closed: interfaces,release: probe seccomp features lazily
[08:34] PR snapd#4914 opened: advisor: add comment why osutil.FileExists(dirs.SnapCommandsDB) is needed
[08:35] PR snapd#4915 opened: interfaces,release: probe seccomp features lazily
[08:38] mvo: left some questions/comments
[08:38] pedronis: yay, thank you!
[08:54] mborzecki: https://github.com/snapcore/snapd/pull/4902/files#r176670685
[08:54] PR #4902: cmd/snap-confine: nvidia: preserve globbed file prefix
[08:56] zyga: responded :)
[08:57] replied too
[08:59] zyga: ok, makes sense, btw. i was surprised by it too
[09:01] mvo: not urgent but I'm confused/see some duplication of things in the roadmap page (for 2.32 and 2.33 and after)
[09:03] mvo: did you notice https://github.com/snapcore/core/commit/dc782a37a3aef3fb11f256260f5205f2ff256acd
[09:03] mborzecki pointed me to it
[09:05] zyga: force pushed
[09:05] thank you
[09:08] damn, i feel sick, finally caught something from the junior
[09:12] zyga: no, huh
[09:12] mborzecki take it easy
[09:12] mvo: fun, right?
[09:12] where is that, it's gone now?
[09:13] it was lost when we went to in-core config
[09:13] zyga: yeah :(
[09:13] zyga: sadly
[09:13] full circle
[09:14] 'sup?
[09:14] Chipaca: fire, smoke, screams and tears
[09:14] zyga: I didn't ask about your bedroom
[09:17] Chipaca: it's slowly getting better now
[09:18] zyga: good to hear :-)
[09:18] what did we lose by moving to in-core config?
[09:18] * Chipaca curious
[09:19] Chipaca: ssh control
[09:19] https://github.com/snapcore/snapd/pull/4912
[09:19] PR #4912: overlord/configstate: change how ssh is stopped/started (2.32)
[09:23] so, I have a question about the ssh-keys interface. I added it to my snap, did `snap connect` to enable it but when the snap runs I'm getting `Permissions 0644 for '/home/rumo/.ssh/id_local' are too open.`
[09:24] The permissions that ls reports within --shell are the same as on the real system, though. And of course the permissions are correct on the real system.
[09:26] zyga: quick question about reverts from 2.32 -> 2.31. the new/updated security profiles will break reverts, no? I mean, suppose 2.32 has the stricter profiles and it loads the per-snap update-ns profiles. now on revert the old snap run will start network-manager and that will use the new 2.32 profiles which are strict and won't let n-m run, is that correct?
[09:27] on revert snap-confine will load the old profile which is permissive for content and doesn't support layouts
[09:27] kalikiana: that's very weird
[09:27] kalikiana: is the snap in the store? i'd like to take a look
[09:27] mvo: the snap-confine profile is from core directly
[09:27] zyga: awesome
[09:27] mvo: and the issue didn't affect the regular profiles, just s-u-n profile
[09:28] zyga: so no problem here, just wanted to double check
[09:28] mvo: I think it should work
[09:28] but I did not check
[09:28] zyga: it's part of our standard tests so we will know
[09:28] ok
[09:28] Chipaca: I haven't pushed that bit yet since it's not working, but I can push it
[09:29] kalikiana: is it an easy to build snap?
[09:29] i mean, just run snapcraft and wait sort of thing?
[09:29] in that case i could use just the snapcraft.yaml :-)
[09:30] to be clear, this is just me being lazy: i'd rather tinker with it to figure out what's going on, than think about it in the abstract
[09:43] zyga: hey, does https://paste.ubuntu.com/p/jxtYz7z9CD/ ring any bells?
[09:43] mmm
[09:43] zyga: happened on master
[09:43] is that on release
[09:43] aww
[09:43] I didn't push the debug thing to master
[09:45] mvo: can you please force-land that https://github.com/snapcore/snapd/pull/4916
[09:45] PR #4916: tests: change debug for layout test
[09:46] PR snapd#4916 opened: tests: change debug for layout test
[09:56] kalikiana: so, I can't reproduce your issue
[09:56] kalikiana: wondering what's different between here and there
[09:58] kalikiana: one thing though, i don't think your snap needs to include openssh-client, as that's part of core and ssh-keys gives you access to it (at least to /usr/bin/ssh)
[09:58] but here it works with both the ssh in the snap, and the one in core
[10:00] Ah, I didn't realize that. So I can drop the package.
[10:00] kalikiana: but that doesn't answer where the 0644 error is coming from :-)
[10:01] Chipaca: Could it be a bug in snap connect then? Vaguely thinking of content interface issues depending on the order of install/connect/snap execution
[10:01] kalikiana: wait, can you ssh from within --shell?
[10:01] that's what i am doing and it works, if it works for you as well the issue is even weirder
[10:02] Chipaca: No, same error, even on a simple "ssh helga" (helga being an arbitrary host that works outside confinement)
[10:03] phew
[10:04] Chipaca: Also getting a `Control socket connect(/home/rumo/.ssh/socket-root@***.***.***.***:22): Permission denied` with just ssh, before the other error
[10:04] kalikiana: just to make sure, can you ‘find /home/rumo/.ssh -ls’?
[10:06] Chipaca: That works and lists each key with -rw-r--r--
[10:06] kalikiana: inside, or outside?
[10:07] Chipaca: both are the same
[10:07] kalikiana: why are your private keys world-readable?
[10:08] kalikiana: (see also: how is your ssh 'outside' not complaining about it all the time?)
[10:08] Hmmm now that you mention it, they probably shouldn't be...
[10:08] Chipaca: Outside is fine
[10:09] kalikiana: fine how?
[10:09] As in, I get no errors
[10:09] right
[10:09] But lemme try and change them
[10:09] kalikiana: but you should :-)
[10:10] kalikiana: only thing I can think of is that you have seahorse auto-adding your ssh keys on session start, and it's less picky about permissions, and ssh never looks at your actual keys because by the time you run it they're in the agent
[10:11] kalikiana: if that's the case, it sounds like a bug in seahorse to me ;)
[10:12] kalikiana: but also, sounds like an easy thing to detect and warn about from your snap
[10:12] ie, can i access the keys? no: tell user about snap connect, yes: are they the right perms? no: tell user about it
[10:13] Dropping the group/world reads seems to fix it indeed.
[10:14] Now ssh just tells me `bind: Permission denied` and `unix_listener: cannot bind to path`
[10:14] zyga: i think i need to check for a specific library in autodetection, one that's always there, libnvidia-glcore.so.. looks like a good candidate
[10:14] But it unlocked the key/passphrase fine
[10:14] * kalikiana tests git push from the snap
[10:16] kalikiana: that socket might be a control master thing, which isn't supported from inside
[10:16] kalikiana: (the thing that lets you multiplex ssh sessions over a single connection)
[10:21] Chipaca: You were right, disabling it explicitly works
[10:25] Chipaca: Thanks a lot for helping me track this down!
[10:26] Saviq, a new fedora image with the correct size is uploaded
[10:26] Now if it'd actually use the ssh agent, that'd be sweet, but it works perfectly
[10:26] cachio: ack, me tries
[10:27] Saviq, great, tell me how it goes
[10:27] cachio: -64 suffix, too?
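The check suggested at [10:12] (can the key be accessed, and if so are its permissions right) could be sketched in Python roughly as follows. This is only an illustration: `check_private_key` and the hint wording are hypothetical names, not part of snapd or OpenSSH, and the "too open" rule (any group or world bit set) is an assumption modelled on the error message quoted at [09:23].

```python
import os
import stat


def check_private_key(path):
    """Return a hint for the user, or None if the key looks usable.

    Hypothetical helper: it only mirrors the two questions from the log,
    it does not reproduce OpenSSH's own checks.
    """
    try:
        st = os.stat(path)
    except PermissionError:
        # Under confinement a denied access usually means the interface
        # is not connected.
        return ("cannot access %s: the ssh-keys interface may not be "
                "connected (see `snap connect`)" % path)
    except FileNotFoundError:
        return "key not found: %s" % path

    # Any permission bit for group or others counts as too open here.
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        return ("permissions %04o for %s are too open, try `chmod 600`"
                % (stat.S_IMODE(st.st_mode), path))
    return None
```

A snap could call this on each configured key before running ssh and print the returned hint instead of letting ssh fail later.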
[10:28] kalikiana: talk with jdstrand, but I suspect much laughter to be had :-)
[10:29] :-D
[10:40] Saviq, fedora-27-64
[10:40] this is the name you have to use
[10:41] Saviq, you don't need to specify the image
[10:44] zyga: can you take another look at #4902?
[10:44] PR #4902: cmd/snap-confine: nvidia: preserve globbed file prefix
[10:45] zyga: also i'm finishing with fixes for 4908 and will be switching back to ubuntu to check if things didn't break :)
[10:45] zyga: might #1757284 be an instance of the 'interface connections go away' bug?
[10:45] Bug #1757284: Several snap apps fail to launch
[10:46] (now that i think about it, it probably is -- but don't know that bug #)
[10:48] mborzecki: looking
[10:49] PR snapd#4916 closed: tests: change debug for layout test
[10:49] * Chipaca attempts to dad up and get out of bed
[10:50] Chipaca: updated the bug with a question
[10:57] cachio: yeah, it's lookin' good https://travis-ci.org/MirServer/mir/builds/357325003
[11:01] * Chipaca AFK for a while (physio, etc)
[11:04] pedronis: hrm, hrm, my system-key pr has a bit of a problem on core. on firstboot on core snapd starts, generates the system-key and the core revision is empty (because things are not seeded yet). then stuff gets seeded but that means the system-key changes because the core revision is part of the inputs
[11:05] zyga: (cc -^)
[11:05] * mvo will need to think about this while taking a short lunch break
[11:05] hmm hmm hmm
[11:05] mvo: if we drop core revision?
[11:05] we keep it because apparmor profiles are there (includes)
[11:06] hmm :/
[11:06] zyga: yeah, feels strange to just drop it, but maybe we can drop it on core?
[11:07] well, if apparmor gets updated
[11:07] zyga: let me try that
[11:07] and snapd doesn't
[11:07] it's not correct then
[11:07] zyga: hm, good point
[11:08] will apparmor come from the snapd snap?
[11:08] fun questions
[11:09] Saviq, nice
[11:09] good to hear that
[11:09] don't we check the feature of apparmor because of that?
[11:10] pedronis: partly, the abstractions are also part of the mix and those come from the core snap
[11:11] pedronis: apparmor has two inputs: the kernel capabilities and the abstractions with pre-written rules
[11:11] we could force a re-gen of the profiles after seeding but that's a bit ugly
[11:11] full of dragons this pr
[11:12] https://paste.ubuntu.com/p/77k9YJbVQK/
[11:12] humanSuite.TestHuman failed here, but was passing yesterday
[11:12] pedronis: yes, apparmor will come from core snap (on core)
[11:13] pedronis: I mean the include statements, we check for kernel features but each profile we make has include statements that reach to apparmor profiles from either native package or from the core snap on all-snap boxes
[11:14] Chipaca: DST change https://paste.ubuntu.com/p/77k9YJbVQK/
[11:14] LOL
[11:14] stuff will never stop to amaze me
[11:15] otoh, i had no clue that we're changing clocks on saturday ;)
[11:22] mvo: also the issue with seeding exists also on classic
[11:22] mvo: we support seeding there with snaps
[11:23] mvo: I have an idea
[11:23] mvo: it's a bit unclear to me why are we writing a key before being seeded ?
[11:24] mvo: we should wait/skip that, if there are not snaps, we are not seeded
[11:24] mvo: instead of revision we can inspect a single file in /meta/manifest
[11:24] (we don't have that file yet)
[11:24] but as long as it encodes the version of every package
[11:25] it captures the essence of core
[11:25] pedronis: we need that on boot to setup profiles for core itself
[11:25] pedronis: on snapd daemon startup actually
[11:25] mborzecki: on the one hand, huzzah it works
[11:25] mborzecki: on the other, booo
[11:25] zyga: ok, there's a recursion problem here
[11:25] mborzecki: on the last one, a test that works 350 days a year is fine, right?
[11:25] the double build-id doesn't make sense then
[11:26] * Chipaca now really goes
[11:26] pedronis: can you explain?
[11:26] zyga: we generate profile with a key id without build-id for a thing that by definition has one
[11:26] mborzecki: actually that might be a real bug, need to look when i get back
[11:26] then we regenerate just because
[11:26] pedronis: why without a build-id?
[11:27] zyga: because there's no /snap/core/current
[11:27] at that point
[11:27] at least that's what I understood was the problem
[11:27] mvo mentioned
[11:27] pedronis: on core we could just use /usr/lib/snapd/snapd
[11:27] no need for core's build-id
[11:27] zyga: as I said the same issue exists with classic seeding
[11:27] first boot is not a core only thing
[11:28] on classic we can use /proc/self/exe
[11:28] but we cannot capture apparmor changes in the distro
[11:28] but I think that is done by apparmor itself, it has a trick to detect that in the init script
[11:28] zyga: did you look at the PR, the system-key has two build-ids in it
[11:28] atm
[11:28] yes, we discussed that last night
[11:28] PR snapd#4917 opened: repo: added repo ConnectionsInfo method (for the new snap connections API)
[11:28] I think we can evolve that pattern
[11:28] on core we need only one
[11:28] but on classic
[11:28] we have a problem
[11:29] on classic we also need only one but we did two because that's easier
[11:29] but then seeding
[11:29] (because we don't know which)
[11:29] it's not easy anymore
[11:29] mvo: what do you think?
[11:29] pedronis: I agree
[11:29] pedronis: no easy way out yetr
[11:29] *yet
[11:29] one issue is also that we think reexec
[11:30] is something you can turn on and off
[11:30] easily
[11:30] but is quite true
[11:30] especially if you are outside snapd and the inside snapd
[11:30] are far apart
[11:30] *is not quite true
[11:45] mborzecki: https://github.com/snapcore/snapd/pull/4908#issuecomment-375392384
[11:45] PR #4908: [RFC] cmd/snap-confine: attempt to detect if multiarch host uses arch triplets
[11:45] * cachio afk
[11:46] PR snapd#4903 closed: tests: [WIP] split prepare-restore.sh into prepare-restore.d modules
[11:47] zyga: it's already fixed, i also improved detection and look for a specific nvidia lib (libnvidia-glcore.so), the packaging is also updated, now i'm cleaning up s-c apparmor profile
[11:48] zyga: https://github.com/bboozzoo/snapd/commits/bboozzoo/nvidia-glob-prefix-ubuntu-triplet-wip if you want to take a look
[11:50] mvo: zyga: I made a comment, one idea could be to simply delay writing the key when we are not yet seeded
[11:50] thanks,
[11:51] but this is going to change one important property
[11:51] (perhaps)
[11:51] that daemon doesn't respond to api requests before having stable security profiles
[11:51] ?
[11:51] maybe if we are not seeded (rare) we should just always rewrite profiles on startup (mvo: opinion?)
[11:51] actually, I'm confused: writing the system key is not the same as computing it in memory
[11:51] so perhaps that's okay
[11:51] if we are seeded there no snap profiles
[11:52] sorry
[11:52] that's true
[11:52] if we are not seeded
[11:52] yes, that's a good point
[11:52] I'm saying, if not seeded delay until we really need it
[11:52] which is about when we generate profiles for core
[11:52] (which is the first thing we seed)
[11:52] right, I understand that now, I forgot that not seeded == nothing is installed
[11:52] so I think this is okay
[11:53] I might be missing something
[11:56] mborzecki: interesting https://build.opensuse.org/request/show/590390
[12:16] pstolowski: https://github.com/snapcore/snapd/pull/4917#pullrequestreview-106469715
[12:16] PR #4917: repo: added repo ConnectionsInfo method (for the new snap connections API)
[12:23] zyga: that opensuse review looks good, maybe advise him to update to 2.31.2 while at it
[12:24] I'll merge it and encourage him to iterate
[12:25] mborzecki-ubuntu: the package is not buildable on leap though
[12:27] actually, I didn't merge it but I added some comments
[12:32] * zyga lunch
[12:33] zyga: i'll force push to #4908
[12:33] PR #4908: [RFC] cmd/snap-confine: attempt to detect if multiarch host uses arch triplets
[12:33] k
[12:35] and pushed
[12:36] ohmygiraffe works with nvidia and confinement now ;)
[12:37] let me try the flare snap
[12:41] Glad for your giraffe :P
[12:41] Morning all
[12:46] pedronis: delaying sounds reasonable, let me have a look at this
[12:50] Chipaca: is it you that moved the waitMixin stuff to its own wait.go file?
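The "delay writing the key until seeded" idea discussed from [11:50] onwards could look roughly like the Python sketch below. Everything in it is hypothetical: `compute_system_key`, `plan_system_key_update`, the input names and the JSON plus SHA-256 encoding are illustration only and are not how snapd actually builds or stores its system-key.

```python
import hashlib
import json


def compute_system_key(inputs):
    """Hash a dict of inputs (hypothetical names) into a stable key."""
    blob = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def plan_system_key_update(inputs, seeded, stored_key):
    """Return (key_to_store, regenerate_profiles).

    Not seeded means nothing is installed yet, so the stored key is left
    alone and no profile regeneration is requested.
    """
    if not seeded:
        return stored_key, False
    key = compute_system_key(inputs)
    # A mismatch with the stored key is the signal to regenerate profiles.
    return key, key != stored_key


inputs = {"build-id": "abc", "apparmor-features": ["caps", "dbus"]}
print(plan_system_key_update(inputs, seeded=False, stored_key=None))
# prints (None, False)
```

The point of the split is the one made at [11:51]: computing the key in memory is cheap and side-effect free, while deciding whether to write it depends on the seeded state.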
[12:58] pedronis: that sounds like me
[12:58] pedronis: why
[12:58] Chipaca: there's a function in cmd_snap_op.go that probably should have moved too, given that is used only in the wait code, lastLogStr
[12:59] pedronis: ah, good call
[12:59] Chipaca: I noticed because I'm solving conflicts in one of my old PRs
[13:25] pedronis, zyga yet another `snap run` system-key issue on core. when we install a new core we update "current". but we only reboot ~10min later. during that time the system-key on disk and the one that snap run calculates are different because /snap/core/current is part of the inputs
[13:25] mvo: I'm fixed the 10 minutes things but that's part of the problem
[13:26] *that's only
[13:26] pedronis:
[13:26] pedronis: yes
[13:26] mvo: I suppose there the endpoint might help
[13:30] mvo: is there a way to `snap run --shell core18....`?
[13:36] sergiusens: no, snap run only runs apps*
[13:37] * and hooks but you don't want to do that
[13:38] sergiusens: a trivial snap that just has a /bin/sh and uses base: core18
[13:38] sergiusens: might be the solution
[13:39] mvo: yeah, I am doing just that; to verify core18 is in good shape (using the test job I created, but using yours should be fine as well, I couldn't find it from the store just yet though)
[13:39] python/asciinema base: core18 and install to test the story end2end
[13:42] mvo ok, everything is working now
[13:43] sergiusens: sys
[13:43] sergiusens: eh, I mean yay
[13:44] lol, I was trying to decipher that :-P
[13:44] Chipaca: this bit of code seems nonsensical or am I confused?: https://github.com/snapcore/snapd/blob/master/cmd/snap/cmd_snap_op.go#L543
[13:45] E: Type 'curity' is not known on line 50 in source list /etc/apt/sources.list
[13:45] E: The list of sources could not be read.
[13:45] how did we get 'curity' out of 'security'
[13:45] pedronis: that looks broken indeed
[13:45] https://api.travis-ci.org/v3/job/357284871/log.txt
[13:47] Chipaca: I found it by chance, I suppose not all the error paths are tested :/
[13:51] mborzecki-ubuntu: https://github.com/snapcore/snapd/pull/4908 broken
[13:51] PR #4908: [RFC] cmd/snap-confine: attempt to detect if multiarch host uses arch triplets
[13:51] zyga: force pushed just now
[13:52] zyga: `dpkg-architecture -a i386` doesn't work on 14.04, you need to do `dpkg-architecture -ai386` :/
[13:52] pedronis: the wait one is hard to test i guess
[13:52] seems to be the only one with that particular bug though :-)
[13:53] yes
[13:53] funnily enough my test used that one command
[13:53] Hangouts going crazy
[13:56] pedronis: do you want me to fix it, or will you?
[13:58] Chipaca: I can make a PR in a bit, also moving that other function
[13:58] Chipaca: trying to finish the merge right now
[13:58] pedronis: k
[14:02] cachio: for the sru validation, could you pastebin the failures again for me please (or link)
[14:12] Chipaca: if you're willing to try the nvidia stuff, please grab https://github.com/snapcore/snapd/pull/4908
[14:12] PR #4908: [RFC] cmd/snap-confine: attempt to detect if multiarch host uses arch triplets
[14:13] mborzecki: actual nvidia, right?
[14:13] I need to restart my session for that, give me a mo'
[14:13] Chipaca: yes
[14:14] mvo, hi, WRT the discussion at the sprint about having a "nobody" user snaps, has there been further discussion on whether that's acceptable?
[14:17] mborzecki: what do i need to build against this to test?
[14:18] Chipaca: just build and reinstall the package, add SNAP_REEXEC=0 to /etc/environment to make sure you're running the right thing
[14:20] Chipaca: once snapd deb is installed, please snap install graphics-debug-tools-bboozzoo too
[14:28] mborzecki: … i'm getting the DST change error also
[14:29] Chipaca: i've commented it out locally :) sorry
[14:29] mborzecki: is this your revenge for that code :-)
[14:29] * Chipaca runs it again with DEB_BUILD_OPTIONS=nocheck
[14:34] Chipaca: can you also do `cat /sys/module/nvidia/version`?
[14:34] mborzecki: 384.111
[14:35] mborzecki: snapd build and running
[14:35] mborzecki: an' now?
[14:35] Chipaca: have you installed graphics-debug-tools-bboozzoo --edge?
[14:36] * mborzecki forgot to mention --edge before
[14:36] mborzecki: graphics-debug-tools-bboozzoo 1.0 3 edge maciek-borzecki -
[14:36] mvo, https://paste.ubuntu.com/p/KB7SyvgRn4/
[14:36] Chipaca: SNAP_CONFINE_DEBUG=1 SNAPD_DEBUG=1 snap run graphics-debug-tools-bboozzoo.glxinfo |& tee log and paste the log somewhere
[14:36] these are the spread tests
[14:36] oh, and SNAP_REEXEC=0
[14:37] mvo, give me 5 minutes
[14:37] Chipaca: SNAP_CONFINE_DEBUG=1 SNAPD_DEBUG=1 SNAP_REEXEC=0 snap run ..
[14:38] mvo, also many of the autopkgtests failed because of "package context: unrecognized import path "context" (import path does not begin with hostname)"
[14:39] seem to be a different go version installed there
[14:40] mborzecki: http://paste.ubuntu.com/p/94hmf6KcHN/
[14:40] mborzecki: do you also want me to test it with the intel board (ie with prime turned off)
[14:40] or turned to nvidia
[14:40] Chipaca: looking good, can you try some snap that does graphics? eg.
ohmygiraffe
[14:41] * Chipaca doesn't know what these things are called
[14:41] mborzecki: ohmygiraffe, supertuxkart and minecraft worked
[14:41] Chipaca: I managed to add tests, not super interesting tests but better than nothing
[14:42] * Chipaca hugs pedronis
[14:42] pedronis: thank you
[14:42] mborzecki: so as far as i'm concerned there is no regression with this :-)
[14:43] Chipaca: one more thing, can you sudo nsenter -m/run/snap/ns/ohmygiraffe.mnt
[14:43] Chipaca: and then `cat /proc/self/mountinfo` and paste it
[14:43] mborzecki: snapd, but yes
[14:44] right /run/snapd/ns/..
[14:44] mborzecki: whoa, leaving the mountspace makes x unhappy for a bit
[14:45] hm? how so?
[14:48] Chipaca: if you could also check that intel works :)
[14:48] zyga: mvo: do we want the nvidia fixes for 2.32?
[14:52] mborzecki: http://paste.ubuntu.com/p/4GhMJmt22t/
[14:52] mborzecki: or http://paste.ubuntu.com/p/Rp7TQVjsDY/
[14:53] mborzecki: I can try nvidia in a bit
[14:53] Chipaca: great, so it doesn't regress and mounts the right thing :)
[14:53] mborzecki: wrt leaving the mount namespace, it's as if i'd gone to console and back
[14:54] even redshift gets reset
[14:54] Chipaca: that's weird, maybe it's because the env is still there
[14:54] mborzecki: you don't see that?
[14:54] nope
[14:54] ok, switching to nvidia
[14:54] * Chipaca in a rush
[14:56] mborzecki: http://paste.ubuntu.com/p/yKb7RhVffw/
[14:56] mborzecki: with intel
[14:56] mborzecki: but … the ns are still there, i guess, maybe that's the problem?
[14:56] mborzecki: http://paste.ubuntu.com/p/8xhdRfjnZw/
[14:57] now 3d apps don't work
[14:58] Chipaca: is the driver loaded now? lsmod|grep nvidia?
[14:58] mborzecki: i've got to go, but i'll be back in ~45 [14:58] mborzecki: it is not [14:58] PR snapd#4918 opened: cmd/snap: fix one issue with noWait error handling logic, add tests [14:59] * Chipaca runs [14:59] Chipaca: sudo /usr/lib/snapd/snap-discard-ns graphics-debug-tools-bboozzoo [14:59] ah ok :) i'll leave you some notes here [14:59] Chipaca: thank you for testing this [15:12] PR snapcraft#2022 opened: Polish our landing page ✨ [15:24] mvo: zyga: on classic we use the apparmor abstractions from the host? or from core ? [15:24] pedronis: host [15:25] mvo: that is not really part of the system key atm, no? [15:25] then [15:25] pedronis: yes, we just discussed htis that we can drop [15:28] pedronis: which makes the whole problem much easier [15:29] pedronis: also on core we apparently don't need this input because one of the init scripts (apparmor init) will rebuild the profiles if the abstraction change [15:29] pedronis: this makes the problem on core go away when the current symlink is missing [15:29] pedronis: eh, missing or updated too early [15:29] pedronis: which is really nice, I will update my PR now with this, I hope we finally have figured it out [15:30] zyga: ping [15:31] ah, maybe nothing [15:31] Chipaca: I pushed #4918 [15:31] PR #4918: cmd/snap: fix one issue with noWait error handling logic, add tests plus other cleanups [15:31] pedronis: ok [15:32] I’m on a walk [15:34] mvo, there? [15:34] mvo: let me know when/if I should look again [15:35] cachio: yes, sorry [15:35] pedronis: will do [15:35] np [15:36] so, about the sru [15:36] yes [15:36] mvo, did you see the expect error? 
[15:36] cachio: what was the pastebin again, sorry, missed it was in a very long meeting [15:37] np [15:37] mvo, many execs failed because of this https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-xenial/xenial/i386/s/snapd/20180319_122711_e1003@/log.gz [15:37] in autopkgtests [15:38] cachio: uhhh, I don't know where this one comes from :( [15:39] mvo, in xenial all the execs failed because of that [15:39] seem to be a problem with the go version [15:40] mvo: cachio: seems like spread doesn't build with 1.6 anymore [15:40] that's the issue I think [15:40] pedronis, yes [15:40] it's using context directly [15:40] I am builing with 1.9 [15:40] to avoid issues on spread [15:40] well but that test [15:41] is trying to build spread [15:41] on xenail [15:41] go get -u github.com/snapcore/spread/cmd/spread [15:41] boom [15:41] but, why it is doing go get instead of downloading spread? [15:41] downloading from where? [15:42] this is autopkgtests [15:42] from aws [15:42] not even sure they will let us download a random binary [15:42] or the snap [15:42] the snap might be an option I suppose [15:43] or spread need to be made to build 1.6 again [15:43] with 1.6 [15:43] pedronis, well that's something for niemeyer [15:44] though I'm not quire sure if the autopackage test for snapd can use snap [15:44] mvo: we cannot build spread anymore on xenial [15:45] PR snapcraft#2008 closed: Release changelog for 2.40 [15:56] Chipaca: what was the status of "human_tsest.go:67" ? 
it fails here with http://paste.ubuntu.com/p/nyxKqcQFqQ/ [15:56] :-( [15:56] mvo: yeah, mborzecki alerted me of that, and indeed it's failing here as well [15:57] so there's two bugs in it [15:57] mvo: it will be fixed tomorrow, or a day after [15:57] one in the test, which happily pretends it'll never get run on dst, and one in the code because there's a day missing [15:57] until the next dst change [15:57] need to look into that [15:58] mborzecki: glxinfo is still not finding the intel driver, but the games work [15:58] mborzecki: http://paste.ubuntu.com/p/XcmQZmvN5z/ [15:59] mborzecki: and http://paste.ubuntu.com/p/cZBYzXvX73/ [15:59] Chipaca: hmm interesting, can you check if it behaves the same with the regular snapd? [15:59] mborzecki: it did [16:00] Chipaca: ah ok, then there's no regression :) [16:00] bah, i can double check just in case [16:01] mborzecki: I can't wait that long … [16:02] mborzecki: hmm, hold on, some of these tests were run without SNAP_REEXEC=0 :-( [16:02] augh [16:09] mvo, also there errors I saw in linode exec for sru https://paste.ubuntu.com/p/KB7SyvgRn4/ [16:09] mvo, many "Permission denied" [16:10] cachio: I think these are the ones i'm concerned about [16:10] mborzecki: yeah, same [16:10] mvo, I am gonna debug those [16:10] mvo: did we land https://github.com/snapcore/snapd/pull/4877 for 2.32? 
[16:10] PR #4877: snap-confine: fallback to /lib/udev/snappy-app-dev if the core is older [16:10] mvo: I think we need too [16:10] to too [16:10] Chipaca: that's sort of great :) thank you for checking [16:10] mvo: I can fix that human time bug [16:11] mvo: probably :-p [16:11] Chipaca: please [16:12] zyga: I think so [16:12] zyga: but not for 2.31 .( [16:12] zyga: that might be it [16:13] all the failures there look crazy [16:13] I restarted it now [16:15] zyga: I suspect its the symlink for snappy-app-dev to snap-device-helper [16:15] mvo: symlink => copy [16:15] actually, symlink is fine [16:15] I think [16:15] zyga: I think we did not do that [16:16] PR snapd#4891 closed: tests: add support for phased prepare-restore logic [16:16] PR snapd#4914 closed: advisor: add comment why osutil.FileExists(dirs.SnapCommandsDB) is needed [16:16] PR snapd#4915 closed: interfaces,release: probe seccomp features lazily [16:19] zyga: did you merge #4891 with one review? [16:19] PR #4891: tests: add support for phased prepare-restore logic [16:19] mvo: I will tweak the ssh fix now [16:19] pedronis: yes, because it's tiny and I have useful fixes on top [16:19] pedronis: (and doesn't do anything yet) [16:20] did you discuss this with niemeyer? [16:20] I remember we had long code org discussion around this [16:21] I wasn't aware of that, what was the discussion about specifically? [16:21] how stuff should be organized [16:21] the usual contentious topic [16:22] was the outcome captured somewhere? [16:22] no [16:23] well, I can still back that out, I just want to do something because we have real issues and no solution in sight [16:23] I'm just saying that merging somethign that changes deeply how we structure spread bits [16:23] well, it's not doing that yet [16:23] with one review and without niemeyer input [16:23] is perilious [16:24] zyga: also is the usual issue with do nothing code, it's hard to tell how it will be used in practice [16:24] zyga: Yeah, please revert this.. 
[16:25] sure, in a moment, just working on ssh now [16:25] zyga: This sounds like a recipe for shell sausage [16:25] I explained how it will be used in the example file but I'm fine with reverting it [16:26] zyga: Yes, it's an example of sausage [16:26] I don't know what a sausage is [16:26] zyga: It's also not the right way of doing these changes.. please hear pedronis [16:26] zyga: I'll send you a picture later [16:27] our code is a mess there, I want someone to own it and fix it; I may do it in a way some people don't agree with but my motivation was to fix this [16:27] zyga: We have a forum here: https://forum.snapcraft.io [16:28] I don't mind if someone fixes our test suite to run better, really [16:28] zyga: Merging these changes, which sound controversial and unclear to begin with, with no actual discussion despite the fact we spent hours talking today, on a Friday, before a release, when people are leaving on holiday... man... [16:28] With one review! [16:29] niemeyer: none of that is in the release, none of that is contoversial or unclear, did you see what that code did? IMO that is an over reaction now [16:30] zyga: Please step back and breath.. it's not the way we do things.. [16:30] zyga: If you want to reorganize that logic, please open a forum topic and let's talk [16:31] zyga: As perdronis insightfully pointed out, we are very careful to not create a mess of shell on our tests.. we simplified things out of seemingly innocent practices several times.. [16:33] Hi: I'm trying to use wifi-ap snap on ubuntu core in Raspberry Pi 3, but status keeps changing to active=false after a couple of seconds i activate it. Any hints? [16:34] zyga, any news on https://travis-ci.org/snapcore/snapd/builds/357331248#L8138 ? [16:34] cachio: ha, interesting [16:35] cachio: can you look at the denial for the permission denied failures please? [16:35] cachio: only that I still don't understand something there [16:35] cachio: can you pastebin those? 
[16:36] 1 sec [16:36] [Fri Mar 23 11:08:29 2018] audit: type=1400 audit(1521803309.374:626): apparmor="DENIED" operation="open" profile="snap-update-ns.test-snapd-layout" name="/proc/sys/kernel/seccomp/actions_avail" pid=16648 comm="3" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 [16:36] [Fri Mar 23 11:08:29 2018] audit: type=1400 audit(1521803309.402:627): apparmor="DENIED" operation="mount" info="Failed name lookup - deleted entry" error=-2 profile="snap-update-ns.test-snapd-layout" name="/etc/group" pid=16648 comm="3" flags="rw, bind" [16:36] cachio: the 1st one is fixed in another pr [16:37] today at least 3 builds failed on master because of this :( [16:37] cachio: the 2nd one says something removed /etc/group while we were looking and I don't understand it [16:37] jdstrand: ^ do you know what could cause the denial above? [16:38] (the second one) [16:39] PR snapd#4919 opened: Revert "tests: add support for phased prepare-restore logic" [16:39] niemeyer: I pushed the ssh changes [16:39] cachio: After you're done with the current topic, what spread images are ready to merge? [16:40] PR snapd#4919 closed: Revert "tests: add support for phased prepare-restore logic" [16:40] is that what you had on your mind earlier? [16:40] https://github.com/snapcore/snapd/pull/4912/files [16:40] PR #4912: overlord/configstate: change how ssh is stopped/started (2.32) [16:40] cachio: Sorry.. 
I mean, what snapd PRs moving images to Google [16:40] zyga: Looking [16:40] niemeyer, opensuse is almost ready, I think today will be ready to be merged [16:41] then, just fedora is missing, we need to see how to fix those deniails from selinux [16:42] zyga: I'm aware that we have issues with tests, it's a bit unclear that they are organisation problem though, the feeling in that prepare/restore do a lot, unclear if doing a lot differently is what we need, or just try to be more careful/streamline [16:42] yes, I know [16:42] zyga: One comment and LGTM [16:42] anyway, we can discuss this later when there's more time to see what I wanted to do [16:42] niemeyer, well then just missing debian sid which is currently set as manual [16:43] I have a refactor of the original code, that does what it did before but is more readable, that I wanted to use a base for refactoring [16:43] to restore the property that setup sets stuff up from pristine and restore undoes that to pristine [16:43] with hard checks that ensure this is happening for real [16:43] pedronis: Right, exactly.. the moment we allow ourselves to just throw files at a directory, which in fact *need* to be ordered because they depend on each other, is the day we lose complete control over the sanity of something that is already not as clean as it should be [16:43] niemeyer, are we gonna try arch or centos ? [16:44] niemeyer: they don't depend on each other much; I'd love to discuss this but I think you are jumping to conclusions now [16:44] cachio: Yeah, mborzecki had arch working already.. I closed the PR because it was still on Linode, but you two should be able to quickly get it up again on Google once you have everything else sorted [16:44] updating ssh again, just a sec [16:44] niemeyer, ok [16:44] zyga: I'm not jumping into conclusions.. I'm concluding from experience from many years looking at similar systems evolve [16:45] zyga: What's the rationale for the change in the first place? 
[16:45] niemeyer, then we need to discuss the changes needed to make fedora work on google with selinux [16:45] niemeyer, that's missing to move fedora [16:45] cachio: ping me when you'd want to give arch a try, IIRC there was just a couple of tests failing there (one of those was merged usr which got fixed in master not long ago) [16:46] cachio: Ok.. but gback to the original question: what can I merge? [16:46] niemeyer: pushed ssh, [16:46] the rationale... [16:46] " This patch will help to modularize the prepare/restore logic and hopefully make it more testable and easy to reason about." [16:47] niemeyer, opensuse will be ready but later doday [16:47] also that related prepare/restore code is in one file [16:47] cachio: So nothing yet? [16:47] niemeyer, #4886 [16:47] PR #4886: tests: adding opensuse-42.3 to google [16:47] and not far away from each other [16:47] zyga: Yep.. so exactly what I said.. [16:47] yesterday we merged debian [16:47] niemeyer: I disagree but I dont' want to argue abut it this way now [16:48] zyga: Me neither.. that was part of the point [16:48] niemeyer, I pushed a fix for opensuse, tests are waiting to be executed [16:48] niemeyer: is https://github.com/snapcore/snapd/pull/4912/commits/1213460ccc6b96baf28f478bde752d25b8212dbb ok? I think that's it for this fx [16:48] PR #4912: overlord/configstate: change how ssh is stopped/started (2.32) [16:48] zyga: Yeah, that was all, thank you [16:48] excellent, now what's left for the system key work [16:48] mvo: how can I help? [16:49] mvo, zyga: I had a quick question/comment.. can we hang out for a few (hopefully short) minutes? [16:49] yeah [16:50] zyga, niemeyer sure [16:50] I'm there [16:58] ogra_: hi, i followed your wifi-ap tutorial, but i cannot make it work... could you help me pls? [16:58] pablo_, on what board is that ? [16:58] (did you check jrounalctl for errors ?) [16:59] *journalctl [16:59] ogra_: raspberrypi 3. I'll check right now [16:59] any other network related snaps installed ? 
[17:01] ogra_: nmcli [17:01] ah, i think that will take over the wlan device ... remove NM [17:02] (if youo need to configure ethernet, use the config in /etc/netplan/ [17:02] = [17:02] ) [17:02] ogra_:journalctl shows lots of messages i don't understand.. i'll pastebin them. [17:02] ogra_: i'm using raspberry through ssh [17:02] well, likely that wifi-ap cn not switch the interface to ap mode [17:02] if i remove nmcli would i be disconnecteD? [17:03] well, if you did the initial setup with console-conf (to get the user set up etc) you should have a working config for eth0 already ... but you can also double check in /etc/netplan/ [17:04] ogra_: i'll check... if eth0 is configured there i can safely remove NM? then i install wifi-ap? [17:04] yes, that should work [17:06] jdstrand, I see this denial https://paste.ubuntu.com/p/8ZQVRQjZvZ/ running the test security-device-cgroups [17:06] it is happening when execute test-snapd-tools.env [17:07] jdstrand, any idea about what could be causing that? [17:07] ogra_, i removed it, then executer wifi-ap.setup_wizard. Connection keeps geting ap.active=false [17:08] did you reboot ? [17:08] no :S sorry [17:08] the wlan driver might be in a weird state [17:10] ogra_: same thing... after a couple of seconds, ap.active changes to false [17:10] well, then lets check the log ... [17:12] ogra_: how do i check the logg? sorry i'm very noob with this [17:14] journalctl? https://pastebin.com/hcRmcvbq [17:15] Mar 23 17:10:07 ema systemd-networkd[577]: wlan0: DHCPv4 address 192.168.2.111/24 via 192.168.2.1 [17:15] thats clearly in client mode [17:16] do you have an entry for wlan0 in your netplan config ? [17:16] (in /etc/netplan/00-snapd-config.yaml ) === pstolowski is now known as pstolowski|afk [17:18] ogra_, yes i have... it was created when i flashed ubuntu core.. [17:18] ogra_, how i change it? 
[17:18] well, if you use the wlan device in clinet mode it can not run as AP [17:18] *client [17:19] I cannot even use the wifi as client right now.. how can i change client mode? [17:20] you can re-run console-conf with "sudo console-conf" and make sure to only configure wired ... or you can edit /etc/netplan/00-snapd-config.yaml and remove the "wifis" block (but make sure you keep ethernet configured to still get into the system) [17:20] thanks! i'll try with console-conf [17:21] mvo: 4920 fixes the human thing [17:21] PR snapd#4920 opened: timeutil: in Human, count days with fingers [17:22] ogra_, in console-conf, should i erase everything in wlan0? is that enough? or how should i config it to work as ap [17:22] just pick "do not use" [17:24] ogra_, i can select "do not use" in ipv4 and ipv5 settings, but it says "associated to mywifi" [17:24] well, then change the wifi settings too [17:24] not sure how the "dont use" option in there is named [17:25] there isn't... i try to erase configuration. [17:29] pedronis: 4882 is ready for a second look I think [17:29] ogra_, excellent! now it works! thanks a lot.. [17:29] :) [17:29] enjoy [17:30] mvo: ok, almost dinner here [17:30] though [17:30] pedronis: yeah, same here [17:30] ogra_, is it possible to connecto trhough ssh if i connecto to the ap also? i mainly want it to do a simple webhost, but connecting trhough ssh could be helpful [17:31] sure, as long as you use the same key on the client side it doesnt matter through which network device you connect [17:32] thanks a lot! i'll disconnect a little time to connect to the boards ap === sparkieg` is now known as sparkiegeek [17:43] PR snapcraft#2023 opened: repo: catch error due to broken build packages [17:43] the BBC is wondering about Ubuntu's future: https://www.bbc.co.uk/cbbc/quizzes/beyond-bionic-mega-quiz-1 [17:43] * kalikiana wrapping up for the day [17:50] kalikiana: Heya.. didn't forget our conversation.. still need to write the notes into the forum.. 
let me do that now [17:50] kalikiana: Enjoy the weekend [17:51] cachio: Haha :) [17:51] Oops.. that was Chipaca [17:53] niemeyer: Grand! Looking forward to reading it. [17:54] I think this is a mostly-EOD from me [17:55] I'm off to do housework, might pop back in when the neighbours object to the hoovering [17:58] PR snapcraft#2022 closed: Polish our landing page ✨ [18:00] Chipaca: Sweet.. I might have a review done for you, but you shouldn't care anyway because it's your evening isn't it :) [18:00] niemeyer: sorry i can't hear you over all the hoovering [18:00] :) === alan_g is now known as alan_g|EOW [18:04] PR snapcraft#1999 closed: tests: document the SRU testing process [18:08] mborzecki, hey I'll start preparing the image [18:08] cachio: ok [18:09] mborzecki, which version should I use? [18:10] mborzecki, I think we can use this https://github.com/GoogleCloudPlatform/compute-archlinux-image-builder [18:10] cachio: take the last one, 2018.03.01, it's a rolling release anyway, so we'll have to update the image (bi-)weekly [18:10] the ones that are prebuilt are pretty old [18:11] cachio: you can start with those scripts, if they work then we'll have an image, if not we'll try another way :) [18:14] mborzecki, ok, I'll try those scripts and see [18:15] mborzecki, should we try first with some stable image? [18:15] or is it ok with the last one? [18:16] cachio: there's no stable image per se, it's just a snapshot of current repos, it's always updating [18:16] mborzecki, ah, ok [18:16] cachio: https://www.archlinux.org/download/ 2018.03.01 is a good starting point [18:16] let's try with this last one [18:16] mborzecki, nice thanks [18:16] cachio: let me check one more thing though, there was some discussion in the mailing list about a cloud/vagrant image of arch [18:19] cachio: does google require cloud init? [18:19] mborzecki, I don't think so [18:19] it requires its own cloud dependencies [18:20] mborzecki, why? [18:20] cachio: do you know what these deps are? 
[18:22] mborzecki, some of these ones https://github.com/GoogleCloudPlatform/compute-image-packages [18:23] cachio: ok, let me know if you run into problems, mayble i'll need to package something [18:26] PR snapd#4902 closed: cmd/snap-confine: nvidia: preserve globbed file prefix [18:28] mvo: back from dinner, I will look at the PR now [18:28] pedronis: great, thank you [18:32] Thank you pedronis [18:41] Here is a short teaser [18:42] root@spread-cluster:/proc# cat cpuinfo | grep processor | wc -l [18:42] 96 [18:42] root@spread-cluster:/proc# ls -ls kcore [18:42] 0 -r-------- 1 root root 141905719468032 Mar 23 18:42 kcore [18:43] PR snapcraft#2024 opened: errors: improve the UX for sending error data [18:45] mvo: left a few comments, I'm wondering what the move from yaml to json means in terms of the behavior on 2.31->2.32 transition [18:45] niemeyer, awesome [18:45] 141905719468032 is bigggg [18:46] cachio: Yeah, and wrong.. sorry.. :) It's close to the real one, but kcore shows virtual memory.. the actual size is 128GB, which is close, but in GB not TB [18:47] niemeyer, so, spread should be in charge of creating vms inside this server? [18:48] niemeyer, is it possible to create vms with arm32 architecture too? [18:48] pedronis: yeah, thats a critical point indeed [18:48] cachio: Yeah.. I'll do a tiny daemon that basically allows a conversation like "give me a server" => "okay, server 142 has ssh at ...:8782" => "thanks, destroy 142" [18:48] pedronis: I think we need to handle this as a mismatch without [18:48] mvo: bug we ingore mismatches [18:49] s/bug/but/ [18:49] pedronis: I mean, I think we need to return that we have a mismatch and no errror here which sucks a bit [18:49] pedronis: or we could first try json and if that fails try nyaml it [18:50] but the content is different, no? 
[18:50] pedronis: yes, the content is different so it would be a case of "there is a mismatch, try to talk to snapd before contiue" [18:50] niemeyer, opensuse tests passed 100% but I had to retrigger because the layout test failed for ubuntu-sore [18:51] cachio: Due a known issue or hiccup? [18:51] pedronis: maybe that is actually the answer, on error we try to reach snapd instead of silently ignoring the issue? [18:51] niemeyer, there are 2 issues related, for one there is a PR [18:51] niemeyer, the second one is under research [18:51] cachio: Ack [18:52] niemeyer, zyga has more info for sure [18:52] PR snapd#4921 opened: skip test if no user "daemon" in build jail [18:52] mvo: the issue is that there are various scenarios [18:52] the race described in the comment is only one of them [18:52] there's also switchin on or off REEXEC [18:52] potentially [18:53] and this yaml to json transition [18:54] mvo: I sent my review [18:54] pedronis: indeed, at least for the race described in the comment and for the yaml -> json handling this by waiting for snapd should be ok. i.e. on error try to reach snapd and then continue [18:54] zyga: cool, thanks [18:54] sorry for taking so long, I took a break from this [18:55] zyga: no problem, thanks for the review. will address things in a little bit [18:56] pedronis: I'm not sure about the switching on/off of re-exec though [18:56] mvo: there is nothing we can do in general there [18:56] * zyga EOWs [18:57] except for testing/development it shouldn't really be done [18:58] mvo: anyway hopfully people don't have things trying to run while they switch [19:00] we just have to keep the packages up to date ;-) [19:07] mvo: Just reviewed 4882 as well.. very nice, and looks super close [19:20] PR snapd#4918 closed: cmd/snap: fix one issue with noWait error handling logic, add tests plus other cleanups [19:31] niemeyer: yay, thank you all. 
fixing the comments now [19:31] mborzecki, the images that that project is creating are about 100GB [19:32] 107 GB [19:32] I am aborting this, creating arch linux image from scratch [19:42] mvo: niemeyer: the comment about connection to old snapd means we really need #4911 too ? [19:42] PR #4911: daemon,client: add build-id to /v2/system-info [19:48] niemeyer, #4886 in green [19:48] PR #4886: tests: adding opensuse-42.3 to google [19:48] cachio: Is it stable, or are there issues to sort out still? [19:48] niemeyer, opensuse is stable [19:49] I ran more than 10 times and I didn't see any error [19:49] last 3 executions in travis 0 errors for opensuse [19:51] cachio: LGTM then! [19:53] * niemeyer needs to step out.. back later [19:59] As Snap is container based solution, does it effect the performance? [20:04] niemeyer, pedronis updated the PR [20:14] PR snapd#4912 closed: overlord/configstate: change how ssh is stopped/started (2.32) [20:14] zyga: quick question - 4912 needs a port to master, right? 
[20:16] PR snapd#4922 opened: overlord/configstate: change how ssh is stopped/started (2.32) (#4912) [20:27] Re [20:31] PR snapd#4886 closed: tests: adding opensuse-42.3 to google [20:43] * zyga checks 4912 [20:43] mvo: yes b [20:43] zyga: I added it [20:43] mvo: I assumed it would not be an issue if this lands with the beta release you plan on making today [20:43] zyga: all except 4882 are now in [20:44] zyga: it's fine [20:44] zyga: I was just asking to not duplicate work :) [20:44] sorry I wasn't around, I didn't hear the pings [20:44] I was on headphones watching a movie [20:45] zyga: totally fine [20:45] zyga: no worries [20:46] * mvo hugs pedronis for his review [20:46] * zyga makes tea and reads the reviews [20:49] zyga: just 4882 [20:49] zyga: :) [20:49] mvo: I added one nitpick and +1 [20:49] it's not important though [20:49] not worth waiting another round of tests if they go green [20:52] * mvo hugs zyga [20:52] zyga: the fearful kernel [20:52] haha, did I write fearful? [20:52] zyga: no but that is what I read at first [20:54] Yeah they sound kind of similar [21:01] So after this branch lands then what? [21:01] Do you need a PPA build? [21:05] zyga: yeah, when it lands -> ppa build [21:24] oooooh [21:24] I reproduced the /etc issue!!!!!! [21:24] for the first time, I have debug shell [21:24] this is like candy [21:24] but diet [21:27] zyga: yay [21:27] * mvo hugs zyga [21:28] mvo: do we need #4877 ? 
[21:28] PR #4877: snap-confine: fallback to /lib/udev/snappy-app-dev if the core is older [21:31] and I have a strace [21:31] pedronis: thats the master version of a pr that landed in 2.32 already [21:31] [pid 2951] mount("/tmp/.snap/etc/group", "/etc/group", 0xc82010fb6b, MS_BIND, NULL) = -1 ENOENT (No such file or directory) [21:31] this is what fails [21:31] mvo: ah, ok [21:31] pedronis: its "funny" becasue the tests for this one in master fail all the time with snap-confine.dsajlkjfslda (tmpname) leftovers [21:31] pedronis: and its not clear what is writing these leftover files [21:32] mvo: I have a PR that gets layout problems most of the time, interestingly follow up PRs are all green [21:35] * zyga has a feeling this is a kernel bug [21:35] I'll paste the strace [21:37] https://pastebin.ubuntu.com/p/qKyPqTnT3N/ [21:37] pedronis: hm,hm,I saw the layout ones too but not on recent test runs [21:38] * cachio afk [21:39] super interesting facts: [21:39] 1) /dev/sda3 on /etc/group type ext4 (rw,relatime,data=ordered) [21:39] : /etc/group is a bind-mount [21:39] hmmm [21:40] is that the result of threspassing ? [21:40] no [21:41] this is just /etc/ on core [21:41] I don't get this thing: [pid 2951] lstat("/etc/group", 0xc8201097c8) = -1 ENOENT (No such file or directory) [21:41] [pid 2951] mount("/tmp/.snap/etc/group", "/etc/group", 0xc82010fb6b, MS_BIND, NULL) = -1 ENOENT (No such file or directory) [21:42] this last error is the error we are seeing [21:42] we carefully create /tmp/.snap/etc/group [21:42] and it exits, we can see that on line 1478 in the pastebin [21:43] but when we try to bind mount /tmp/.snap/etc/group over to /etc/group, something fails and we get ENOENT [21:43] that pastebin is really weird, [21:43] we are bind mounting over a bind-mount? [21:43] I will need to study it [21:44] no, technically we are mount --rbind /etc /tmp/.snap/etc [21:44] mounting tmpfs over /etc [21:44] and bind-mounting things back. 
one by one [21:44] at this stage /etc/group should be an empty file [21:44] but I think the code gets confused [21:44] but you said it's a bind-mount? [21:44] because we check if it's a file or directory [21:44] on the host [21:45] on core it seems to be [21:45] let me look on my dragon [21:45] hmm [21:45] it's not [21:46] it's squashfs there [21:46] * zyga wonders what's the google core-16 image then [21:46] we build the image [21:46] in the tests [21:46] mountinfo on google ubuntu-core-16 https://www.irccloud.com/pastebin/h9nNyYdf/ [21:47] mmh [21:47] ahhh [21:47] I see what's wrong now [21:47] 117 25 8:3 /system-data/var/lib/extrausers/group//deleted /etc/group rw,relatime shared:7 - ext4 /dev/sda3 rw,data=ordered [21:47] ! [21:47] after booting [21:47] something wiped /writable [21:47] and all hell broke loose [21:48] it's a very peculiar thing in mounts [21:48] I need to add code to detect that [21:48] and bail early [21:48] maybe then we'll know more [21:48] so....... [21:48] I would say that some of our tests killed /writable [21:48] this broke /etc in the real system [21:48] and then my code doesn't cope with a file being there but not really being there because it's a //deleted mount [21:49] I need to check the kernel what the semantics is then [21:49] but I bet a beer that this is broken prepare/restore [21:49] zyga: notice that we do special things to the image to get the spread user in it [21:49] let me find a pointer [21:50] zyga: we add special units that bind mount /var/lib/extrausers/x to /etc/x [21:50] I don't understand why /etc/group is the extrausers/group in this image but not in my dragon [21:50] ooooh [21:50] where is this? [21:50] man, [21:50] I didn't know that [21:50] one sec [21:50] sure, thank you! 
[21:51] I bet it's suite-level restore [21:51] as I only ran this test (layout) with -repeat [21:51] zyga: no, it's done with the image [21:51] it's not a restore thing [21:51] zyga: https://github.com/snapcore/snapd/blob/master/tests/lib/prepare.sh#L388 [21:52] but it doesn't always fail [21:52] that is a different question [21:52] it must be the ordering of tests so that layout runs after something restored the suite in the same system [21:52] but this operation is done with the image once [21:52] when I ran layout (and just layout) with repeat [21:52] but the mounts will happen at boot each time [21:52] yes, but it doesn't hurt normally [21:52] what is wrong is that something removed them [21:54] so reset_all_snap has this at the end [21:54] https://www.irccloud.com/pastebin/SPpOyIr1/ [21:56] -rw-r--r-- 1 root root 453M Mar 23 21:22 /home/gopath/src/github.com/snapcore/snapd/snapd-state.tar.gz [21:56] this doesn't feel right [21:56] 450MB? [21:56] ha [21:56] it's not tar.gz [21:56] it's tar [21:56] what? [21:56] something is clearly wrong here [21:57] ? [22:03] it's probably mostly /var/lib/snapd/snaps that makes up that size [22:14] zyga: I see a few tests that manipulate things in /var/lib/extrausers but nothing removes things totally [22:15] ah but [22:22] re [22:23] but? [22:23] https://github.com/snapcore/snapd/pull/4923 may help us find the culprit [22:23] PR #4923: spread.yaml: look for signs of //deleted mounts [22:23] PR snapd#4923 opened: spread.yaml: look for signs of //deleted mounts [22:23] I need to check what the kernel semantics are [22:24] but I think we are much closer to understanding the issue [22:26] where //deleted comes from https://www.irccloud.com/pastebin/jtTaEdO3/ [22:28] * pedronis -> rest [22:28] pedronis: let's catch up with this next week, have a great weekend
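
Editor's note: the check discussed above (spotting mount entries whose source was removed, shown in the log as a root path ending in `//deleted`) can be sketched roughly as follows. This is only an illustration and is not the content of PR #4923, which is not quoted in the log. The function name `find_deleted_mounts` and the `SAMPLE` text are hypothetical; the first sample line is made up for contrast, and the second is the mountinfo entry pasted at 21:47. The sketch assumes the usual mountinfo layout, where the fourth field is the root and the fifth is the mount point.

```python
from typing import List, Tuple

# Hypothetical sample: the first line is invented for contrast, the second
# is the entry quoted in the log at 21:47.
SAMPLE = """\
25 0 8:3 / / rw,relatime shared:1 - ext4 /dev/sda3 rw,data=ordered
117 25 8:3 /system-data/var/lib/extrausers/group//deleted /etc/group rw,relatime shared:7 - ext4 /dev/sda3 rw,data=ordered
"""


def find_deleted_mounts(mountinfo_text: str) -> List[Tuple[str, str]]:
    """Return (mount_point, root) pairs whose root ends in '//deleted'.

    Assumes whitespace-separated mountinfo fields, with the root in
    field 4 and the mount point in field 5 (1-indexed).
    """
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            # Skip blank or malformed lines rather than failing.
            continue
        root, mount_point = fields[3], fields[4]
        if root.endswith("//deleted"):
            found.append((mount_point, root))
    return found


if __name__ == "__main__":
    for mount_point, root in find_deleted_mounts(SAMPLE):
        print(f"{mount_point} is backed by a deleted source: {root}")
```

On a live system the same function could presumably be fed the contents of `/proc/self/mountinfo` instead of `SAMPLE`, so that a test run can bail out early when such an entry appears.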