[00:31] hello [05:05] morning [05:30] PR snapd#5664 closed: interfaces: workaround for activated services and newer DBus [05:36] mvo: morning [05:37] hey mborzecki [05:37] mvo: regarding experimental.= we seem to have a problem there [05:37] mborzecki: uh, ok, tell me more [05:37] mvo: it's actually quite silly https://paste.ubuntu.com/p/tVGVZjstcm/ [05:38] mborzecki: ohhh, i see [05:39] mborzecki: its because of the bool i guess [05:39] mvo: yeah, should be a quick fix [05:39] mborzecki: funny, so it accepts null as false [05:39] mborzecki: but not "" [05:39] mborzecki: thanks [05:39] mvo: null == nothing to unmarshal, so a bool stays in its default value [05:40] mborzecki: aha, nice [05:40] mborzecki: wasn't aware of this [05:41] mvo: even funnier, the corecfg code unmarshals to a string and allows "" [06:23] PR snapd#5675 opened: overlord/snapstate: improve feature flag validation [06:23] mvo: ^^ [06:31] mvo: updated #5671 too [06:31] PR #5671: tests: basic test for parallel installs from the store [06:43] mborzecki: yay, thank you [07:00] good morning [07:00] I'm not really here, just checking which office to go to handle the car paperwork [07:01] zyga: good morning [07:01] mvo: that initialiazation is for the case when there is nothing to unmarshal or null (cause json, null is untyped :)) [07:01] mborzecki: aha, ok [07:01] zyga: hey, so you're off-really-off today or just regular off? [07:02] mborzecki: I'm swapping for the weekend [07:02] zyga: if the former we can chat about s-c on monday [07:02] mborzecki: but after I handle the paperwork I will return [07:02] mborzecki: and we can chat [07:02] or we can chat straight away now since I'm here [07:02] haha so the 'regular off' ;) [07:02] haha, so that's what you meant by "regular off" :D [07:03] I guess that's only fair :D [07:03] off-but-not-off === pstolowski|afk is now known as pstolowski [07:09] morning [07:23] pstolowski: heyah [07:31] moin moin [07:32] is the lxd snap busted? 
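Editor's note on the experimental.= exchange above (05:37 to 05:41): a minimal Go sketch of why null is accepted but "" is not when the target is a bool. The helper name tryBool is made up for illustration and is not snapd code; only encoding/json behaviour is shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tryBool (hypothetical helper) unmarshals raw JSON into a bool that
// starts at its zero value and returns the value plus any decode error.
func tryBool(raw string) (bool, error) {
	var v bool
	err := json.Unmarshal([]byte(raw), &v)
	return v, err
}

func main() {
	// null has no effect on a bool target, so it stays false with no error;
	// an empty JSON string is a type mismatch and yields an error.
	for _, raw := range []string{`null`, `true`, `""`} {
		v, err := tryBool(raw)
		fmt.Printf("%-4s -> value=%v err=%v\n", raw, v, err)
	}
}
```

This matches the remark that null means "nothing to unmarshal", while code that unmarshals into a string first (as mentioned for corecfg) would accept "".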
[07:32] google:ubuntu-16.04-32:tests/main/lxd seems to be failing [07:32] yeah [07:32] same here [07:32] Chipaca: pedronis: hellos [07:32] mborzecki: hi [07:32] hey pstolowski and Chipaca [07:33] mvo: o/ [07:34] Chipaca: I added some comments to the dump-db PR, I think a slightly more generic format for the output would be nice. I'm thinking about the field spearator, \ff is used right now [07:34] mvo: yep, and wotsisname said it'd be fine [07:34] Chipaca: do you think ":" is reasonable? or shall we go with something else? [07:35] mvo: an emoji would be frouned on, i guess [07:36] : is reasonable [07:36] Chipaca: heh, ok [07:40] mvo: I'm looking again at the changes in device_asserts.go and now I'm very confused [07:41] pedronis: can I help fix that somehow? [07:41] mvo: this should go away no: https://github.com/snapcore/snapd/blob/master/asserts/device_asserts.go#L248 ? [07:42] pedronis: yes, sorry, that was a oversight, let me kill it (with fire) [07:42] mvo: why is this here and not one level up: https://github.com/snapcore/snapd/blob/master/asserts/device_asserts.go#L163 ? [07:42] mvo: the new branch needs a similar check for gadget, no? [07:43] mvo: checkModel means check the "model" header [07:43] pedronis: indeed, let me fix that too [07:44] pedronis: plus some gadget track error tests are missing (which of course would have found the issue) [07:45] mvo: yea, I found these because I had the nagging feeling that something was missing in the new PR, it was too short :) [07:45] and so I went back to see what we did for kernel [07:46] pedronis: thanks for noticing! I will generalize it a bit, I get the feeling that this will come again [07:46] pedronis: should I split the PR up? [07:46] mvo: as your prefer [07:47] pedronis: ok, I think I do that then [07:47] pedronis: do you have an opinion about "Gagdget() SnapWithTrack" vs "Gadget() string and GadgetTrack() string" ? 
[07:52] mvo: I think I prefer the latter until we can do something outside of asserts [07:53] PR snapd#5669 closed: asserts,image: support gadget tracks in the model assertion [07:55] pedronis: ok, thats fine, its easy enough to fix later especially if/when we get support for this in snap install [08:03] PR snapd#5676 opened: asserts: add support for gadget tracks in the model assertion [08:03] pedronis: the first part -^ [08:10] mvo: reviewed [08:12] pedronis: yay, thank you [08:15] PR snapd#5654 closed: cmd/snap-confine: establish snap directory mappings for parallel instances [08:16] PR snapd#5677 opened: image: add support for "gadget=track" [08:16] PR snapd#5678 opened: snapstate: add support for gadget tracks in model assertion [08:17] mvo: this is are all for 2.35, right? [08:18] pedronis: correct [08:18] pedronis: I added tags now [08:19] so [08:20] selftest is failing in lxd in 16.04-32 [08:21] Chipaca: uh, what is the error? [08:21] Chipaca: I mean, what part of the selftest fails? 
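Editor's note on the gadget track discussion above: a rough Go sketch of the "Gadget() string and GadgetTrack() string" shape, assuming the header value has the "gadget=track" form named in the PR title. The type model, the helper splitSnapTrack and the sample value "pc=18" are hypothetical and are not taken from the asserts package.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSnapTrack (hypothetical) splits "name=track" into its two parts.
// A bare "name" is allowed and yields an empty track.
func splitSnapTrack(header string) (name, track string, err error) {
	parts := strings.SplitN(header, "=", 2)
	name = parts[0]
	if name == "" {
		return "", "", fmt.Errorf("missing snap name in %q", header)
	}
	if len(parts) == 2 {
		track = parts[1]
		if track == "" {
			return "", "", fmt.Errorf("empty track in %q", header)
		}
	}
	return name, track, nil
}

// model is an illustrative stand-in exposing two plain string getters,
// the variant preferred in the discussion.
type model struct {
	gadget, gadgetTrack string
}

func (m *model) Gadget() string      { return m.gadget }
func (m *model) GadgetTrack() string { return m.gadgetTrack }

func main() {
	name, track, err := splitSnapTrack("pc=18") // sample value is an assumption
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	m := &model{gadget: name, gadgetTrack: track}
	fmt.Println(m.Gadget(), m.GadgetTrack())
}
```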
[08:21] error: cannot start snapd: cannot mount squashfs image using "squashfs": mount: /tmp/selftest-mountpoint-487148902: mount failed: Unknown error -1 [08:22] lxd fails to start the first time with an error, and it restarts and doesn't print the logs leading up to the error either, which is suspicious [08:22] Aug 17 09:12:18 autopkgtest lxd.daemon[18103]: ==> Setting up persistent shmounts path [08:22] Aug 17 09:12:18 autopkgtest lxd.daemon[18103]: ====> Making LXD shmounts use the persistent path [08:22] Aug 17 09:12:18 autopkgtest lxd.daemon[18103]: ln: failed to create symbolic link '/var/snap/lxd/common/lxd/shmounts': No such file or directory [08:23] ohhh drat, I need to remove the lxd quirk in 18 [08:23] or maybe I did [08:24] the lxd quirk is applied on all classic systems [08:24] _hmmm_ [08:24] * Chipaca goes for more coffee [09:03] Chipaca: root@my-ubuntu:~# systemd-detect-virt --help [09:03] bash: /usr/bin/systemd-detect-virt: Numerical result out of range [09:03] Chipaca: that's inside lxc container [09:04] mborzecki: why does that start with bash:? [09:04] mborzecki: i mean, that's a bash error? [09:04] Chipaca: heh, beats me, no clu [09:04] yes [09:05] mborzecki: it's an error from bash [09:05] because [09:05] *EXECVE RETURNED THAT* [09:05] Chipaca: root@my-ubuntu:~# /usr/bin/systemd-detect-virt --container [09:05] bash: /usr/bin/systemd-detect-virt: Numerical result out of range [09:05] omg [09:05] execve("/usr/bin/systemd-detect-virt", ["/usr/bin/systemd-detect-virt"], [/* 12 vars */]) = -1 ERANGE (Numerical result out of range) [09:05] Chipaca: so if that fails, useFuse() => false, mount is done with -t squashfs which fails [09:05] root@my-ubuntu:~# mount -t squashfs $PWD/data.squashfs /mnt/ [09:06] mount: /mnt/: mount failed: Unknown error -1 [09:06] Chipaca: like this ^^ [09:06] do you have squashfuse installed ? 
[09:06] mborzecki: it gets more interesting [09:06] mborzecki: do a getcap of systemd-detect-virt [09:07] Chipaca: Failed to get capabilities of file `/usr/bin/systemd-detect-virt' (Numerical result out of range) ? [09:08] mborzecki: yes [09:43] stgraber: in 16.04 i386 (only), installing lxd from stable and launching an unprivileged container results in weirdness: /usr/bin/systemd-detect-virt fails to execve, returning ERANGE [09:50] sparkieg`: is that a typo for a german war on spas === sparkieg` is now known as sparkiegeek [09:50] Chipaca: heh, glitch in the matrix, combined with a non-friendly unique-naming scheme in my IRC client :) [09:51] sparkiegeek: you could've gone with 'yes' [09:54] guys, how abou we disable tests/main/lxd on *-32 until this is resolved? [09:54] Chipaca: ah, interesting, I was wondering if systemd-detect-virt coul fail when ways that weren't just I'm not a container [09:55] s/when/in/ [09:55] Chipaca: I might have even asked mvo at some point to put more defensive code around it [09:55] pedronis: I doubt it's systemd-detect-virt itself [09:55] pedronis: it never gets to have a say in the matter [09:56] pedronis: (the execve call fails) [09:56] Chipaca: ok, but our code anyway assumes it means we are not virtualized ? 
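Editor's note on the systemd-detect-virt failure above: a hedged Go sketch of telling "the detector ran and said no" apart from "the detector could not be executed at all" (the ERANGE case). The function detectContainer is hypothetical and is not how snapd's useFuse() is written; it only illustrates one way to avoid treating an exec failure as "not a container".

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// detectContainer (hypothetical) runs a detection command such as
// "systemd-detect-virt --container". Exit status 0 means "in a container",
// a non-zero exit means "not in a container", and any failure to run the
// command at all is returned as an error instead of being read as "no".
func detectContainer(cmd string, args ...string) (bool, error) {
	err := exec.Command(cmd, args...).Run()
	if err == nil {
		return true, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The program started and exited non-zero: a real negative answer.
		return false, nil
	}
	// The program never ran (missing binary, failed exec, ...).
	return false, fmt.Errorf("cannot run %s: %w", cmd, err)
}

func main() {
	inContainer, err := detectContainer("systemd-detect-virt", "--container")
	if err != nil {
		fmt.Println("detection failed:", err)
		return
	}
	fmt.Println("in container:", inContainer)
}
```

With this shape the caller can decide whether to fall back or stop, rather than silently choosing the plain squashfs mount.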
[09:56] yes, yes it does [09:56] that was more my point [09:56] anyway [09:57] we should probably bail there instead of assuming tbf [09:57] maybe we should bubble the error up [09:57] otherwise it's rather cryptic while anything fails at this stage [09:59] mborzecki: dunno, stgraber is often up really early [10:00] Chipaca: i can open a PR and we can close it if a solution is found soon(ish) [10:01] what's VGAuthService [10:02] nm [10:02] mborzecki: sure [10:07] damn, that test has a whitelist of systems [10:10] fun fact: byobu-config will lock up the whole everything [10:10] PR snapd#5679 opened: tests/main/lxd: run ubuntu-16.04 only on 64 bit variant [10:11] * Chipaca was looking to see if any other binaries failed to exec in the same way [10:18] Chipaca: anything else failed? [10:18] mborzecki: my patience [10:18] heh ;) [10:20] I should probably step away from the forum for a bit [10:24] anyone feels like looking at https://github.com/snapcore/snapd/pull/5614 ? [10:24] PR #5614: interfaces: parallel instances support, extend unit tests [10:25] PR snapd#5680 opened: [RFC] hotplug: handling of simple add/remove scenario [10:25] uh, what [10:26] (about byobu-config) [10:28] pstolowski: inside a snapped lxd, inside kvm, inside spread, running 'byoby-config --help /dev/null' locks the whole thing up [10:28] byobu* [10:29] finny [10:29] *funny [10:29] viry finny. hilirius, ivin [10:30] just checked the dictionary in case the word finny exists and means something. not in my dict ;) [10:30] pstolowski: 'abounding in fishes', fwiw [10:31] pstolowski: http://www.dict.org/bin/Dict?Form=Dict1&Query=finny&Strategy=*&Database=*&submit=Submit+ [10:32] aha [10:32] sergiusens: poing [10:51] why do people sell things they call "Ubuntu" with just random crap running as the kernel [10:51] >:-( [10:52] well, its a massive improvement ... 
the last openvz servers i saw (when deugging something similar together with zyga) was 2.x or early 3.x [10:53] surprisingly openvz finally supports 4.x kernels :) [11:00] ogra: that's why snapd ships a test squashfs :-) [11:00] ogra: https://github.com/snapcore/snapd/blob/master/selftest/squashfs.go#L55 [11:01] Chipaca: So looking at snapshotstate, the last missing point is the last id name [11:01] Chipaca: "last-snapshot-set-id"? It's a mouthful, but has precedence in other options [11:01] niemeyer: what do you mean? [11:02] Chipaca: The "snapshots.last-id" thing, and the comment from me and pedronis in the PR [11:02] ah [11:02] snapshots.last-set-id is what it is now [11:04] which seems alright to me, if we need to add more info about snapshots it won't be out of place in there [11:04] dunno [11:07] niemeyer: I think both approaches are fine (I mean: snapshots.last-set-id is fine, and a toplevel last-snapshot-set-id is fine; anything more structured and I'm going to call YAGNI on it) [11:07] exactly which names are best, I don't know [11:08] Chipaca: I think pedronis had a point about "snapshots" generally being a map of actual snapshots per other cases [11:08] Chipaca: And we indeed have the last-foo-bar-id case already in other places [11:09] last-refresh, last-refresh-hints, last-change-id, last-task-id [11:09] "ubuntu-core-transition-last-retry-time" [11:10] /o\ [11:12] did we figure out more about the lxd issue btw? [11:12] mvo: stgraber: in 16.04 i386 (only), installing lxd from stable and launching an unprivileged container results in weirdness: /usr/bin/systemd-detect-virt fails to execve, returning ERANGE [11:12] Chipaca: heh, woah, ERANGE [11:12] mvo: getcap of the file _also_ fails with ERANGE [11:13] mvo: so we're about to learn something about _something_ [11:13] Chipaca: what a surprising error [11:13] Chipaca: yeah, its amazing [11:13] Chipaca: I've +1d assuming that's tuned per agreement.. 
someone else needs a final +1 too [11:13] pedronis ^ [11:14] man, i'm shaking [11:14] whoa [11:14] ok [11:14] niemeyer: thanks [11:14] Chipaca: mvo: yes, ERANGE usually makes me think of math libraries [11:14] I didn't expect the emotional response from myself ¯\_(ツ)_/¯ good thing i'm going on holidays next wednesday :-) [11:14] Chipaca: uhhhh, snapshot going in? === pstolowski is now known as pstolowski|lunch [11:15] mvo: got a +1 from niemeyer [11:15] pedronis: heh, exactly [11:15] ooooh, somebody's jealous :-p [11:15] Chipaca, dmesg -H means he needs to understand that he is in a pager ... i'd have suppressed the -H [11:15] ogra: maybe TERM isn't set or something stoopid [11:16] or that, yeah [11:16] mvo: sorry for being annoying about context, but is really mostly meant to have a place talk back to itself or connected places talk between unrelated or user layers [11:17] I mean it's Value feature [11:18] niemeyer: yes either me or mvo need to do a 2nd pass of snapshotstate [11:20] any consistent reason why all newish PR seems to be red ? [11:20] pedronis: lxd [11:20] pedronis: 16.04-32 lxd errors [11:21] the ERANGE issue ? [11:21] pedronis: yes [11:21] EDERANGED [11:21] fun :( [11:21] ERANGE is not even listed as a return value for execve [11:21] :-) [11:22] (but it's probably something to do with the xattrs) [11:22] if I had to guess, I'd guess that [11:22] because systemd-detect-virt is one of the very few files in 16.04 that uses caps (via xattrs) [11:22] in fact, i should look at the other ones, d'oh [11:22] * Chipaca does that [11:24] pedronis: dunno if you noticed but mvo wasn't online when you were apologising to him [11:24] no [11:24] pedronis: maybe you just wanted to get it off your chest :-D [11:25] https://github.com/snapcore/snapd/pull/5679 shall we pull the trigger? 
[11:25] maybe I didn't complete his nick [11:25] PR #5679: tests/main/lxd: run ubuntu-16.04 only on 64 bit variant [11:28] "The linux beginners course with ogra and Chipaca" [11:28] session 1 ... [11:28] :) [11:29] ogra: "If you survive with both your kidneys, [...]" [11:29] lol [11:30] ogra: as an aside, what on earth have they done to that poor "Ubuntu" that dmesg doesn't work [11:31] good question [11:32] lol [11:32] "no entries" [11:32] probably he run no kernel at all !!!! [11:33] ogra: it's secretly just running WSL [11:38] YESS [11:38] it's the capabilities [11:38] mtr fails in the exact same way [11:39] as does traceroute6.iputils [11:40] Chipaca: what's up? [11:41] Chipaca: should I merge 5679 or is the solution so close that its not worth adding the workaround? [11:41] mvo: NFI about the solution -- merge away [11:41] sergiusens: WHY DIDN'T I GIVE MYSELF MORE CONTEXT WITH THE PING :-( [11:41] sergiusens: now I have _no_ idea what it was about [11:41] it was, like, six context switches ago [11:42] PR snapd#5679 closed: tests/main/lxd: run ubuntu-16.04 only on 64 bit variant [11:42] sergiusens: I hope I'll remember and ping you again [11:42] oh! [11:42] sergiusens: i remembered :-D [11:42] sergiusens: were you aware of 'snap watch --last=auto-refresh?' [11:43] sergiusens: coming in 2.35 [11:45] Chipaca: yes we are https://github.com/snapcore/snapcraft/blob/master/snapcraft/internal/build_providers/_snap.py#L263 [11:45] but I think I might want to disable refreshes for a lot longer to not get killed mid build :-) [11:45] sergiusens: no no [11:45] sergiusens: the question mark at the end of the change type [11:46] sergiusens: means "no error if none found" [11:46] sergiusens: or from --help, “A question mark at the end of the type means to do nothing (instead of returning an error) if no change of the given type is found.” [11:46] Chipaca: oh, then I can get rid of the suppress. 
I must say though that that syntax is hard to spot given you phrased the sentence as a question :-) [11:47] sergiusens: mbuahaha [11:47] sergiusens: (sorry) [11:47] and I did an improper quote match [11:47] tut tut [11:47] * sergiusens needs to put his glasses on [11:47] sergiusens: anyway, 2.35+, so you probably can't use it yet [11:47] Chipaca: still good to know! [11:47] sergiusens: but, with our conversation abour --check-skeleton from the other day i thought i'd call it out to you [11:48] all is good :-) [11:51] ogra: wasn't there a file in proc to tweak the kernel log level? we could ask this person to try that [11:52] Chipaca, not sure if in /proc ... there is a sysctl setting you can apply though [11:52] mvo, this is an interesting one https://forum.snapcraft.io/t/set-system-proxy-from-custom-snap-service/6926 [11:52] ogra: sysctl is just writing to /proc/sys/ [11:52] ogra: :-) [11:52] ah, indeed [11:53] so yeah, there is one, i just dont know the node then ;) [11:53] ogra: but yes better to present it with sysctl [11:53] should be something about log_level [11:54] ogra: sys.kernel.printk ? [11:54] Chipaca, /proc/sys/kernel/printk [11:54] yeah [11:54] heh [11:55] $ cat /proc/sys/kernel/printk [11:55] 4 4 1 7 [12:04] ogra: I did not point them to https://i.imgur.com/Pfr9dj0.jpg ! I think I deserve a cookie. [12:05] * ogra hands Chipaca a well deserved cookie :D [12:06] i wonder if he paid for that carp ... [12:06] ogra: at lunch time? are you mad?!? [12:06] *crap [12:06] :-) [12:06] hahaha [12:06] ogra: 8GB of ram? 
you betcha it was paid for [12:06] although given they said it was Ubuntu and it wasn't, maybe it's "8GB" of "RAM" [12:07] (actually just a big swap file on a 1GB netbook) [12:07] haha [12:12] zyga: snap-update-ns is looking good, did a change, it's actually surprisingly simple [12:12] woot, that's great [12:14] zyga: you know, i might have screwed something up there too :P [12:14] I'll check next week :) [12:15] zyga: hey, fyi, responded to https://github.com/snapcore/snapd/pull/5644 [12:15] PR #5644: interfaces: add audio-playback/audio-record and make pulseaudio manually connect [12:15] jdstrand: thank you [12:15] jdstrand: I'm swapping today, doing office move and legal paperwork [12:15] zyga: https://paste.ubuntu.com/p/YZ7w7RKw5d/ mountinfo (does not include $SNAP_USER_DATA yet) [12:15] zyga: np. I suspect you'll just agree with me and ack [12:16] I'll look quickly now :) [12:16] zyga: but by all means, exercise your day off :) [12:16] PR snapd#5675 closed: overlord/snapstate: improve feature flag validation [12:16] Chipaca: fyi, I think that hostnamectl issue will be resolved if the PR merges from trunk [12:17] Chipaca: and hi :) [12:17] jdstrand: I was assuming as much :-) hi [12:18] jdstrand: excitement now is about lxd on 16.04-32 being unable to execve files that use capabilities [12:19] * Chipaca ~> lunch === pstolowski|lunch is now known as pstolowski [12:23] pedronis, Chipaca https://imgur.com/a/ZFNu5pV ... finally able to reproduce ... capturing logs now [12:23] jdstrand: +1 [12:24] zyga: thanks :) [12:24] marked as such on the PR [12:26] ogra: aha, you reproduced the shutdown hang? 
[12:28] damn, the trackpoint left click in my x220 is starting to fail :( [12:29] mvo, yeah [12:30] mvo, pedronis Chipaca https://pastebin.canonical.com/p/DGKBDMzQ2r/ logs (filtered out binder and anbox audit messages since they ake it unreadable) [12:30] *make [12:33] internal shutdown seems correct [12:33] so it would some handshake with systemd problem [12:34] seems we get a sigterm but don't do the right thing: snapd.service: State 'stop-sigterm' timed out. Killing. [12:36] * mvo wonders if anything has chnaged here [12:37] mvo: well it might have been like since a while [12:38] mvo: it's related to the waiting we do on reboot and signal unhandling [12:38] note this is all edge plus a devmode daemon (anbox) though i see the daemon being killed several lines before the snapd timeout shows up [12:38] (also not sure if a misbehaving snap could actually make snapd not stop) [12:38] mvo: we also added the watchdog [12:39] pedronis: aha, thats a good one [12:39] pedronis: Aug 17 12:19:55 localhost.localdomain snapd[1005]: daemon.go:577: Waiting for system reboot ? [12:39] mborzecki: ? [12:39] isn't snapd waiting in a long sleep here? [12:39] yes [12:39] as I said signal handling is not quite right over shutdown [12:39] so any signals are not really handled [12:39] yes [12:39] what I'm trying to say [12:39] not sure it's related to the timeout though [12:39] if it's really much later [12:41] mvo: we need to call Reset or Stop for the signal handling we setup in main.go, not sure exactly where [12:42] if you want to repro: create a qemu VM with an image from edge ... leave it off over night so there is a new core ... make sure to start it only after core has updated in the store... 
boot it and watch it to do an auto-refresh with that hang [12:42] i have never seen it when doing a manual refresh [12:44] pedronis: i'd assume it's related, then things start to make sense, term get queued, if systemd gets a request to restart the process it would make sense to ignore it since the system is going down anyway, systemd timeouts waiting for snapd to exit, snapd would timout waiting for reset :) [12:45] well, not from the logs it seems systemd then kills snapd [12:46] so snapd doesn't timeout [12:46] yeah, I'm puzzled [12:46] ogra timeout is systemd saying something about snapd again [12:46] but maybe I'm confused [12:46] if snapd does not handle sigterm correct I should be able to simulate this by simply kill -TERM $(pidof snapd) [12:46] and that exist normally [12:46] mvo: this is about reboot mode [12:46] not normal running snapd [12:47] pedronis, i'm seeing exactly whats in the imgur png [12:47] we are inside daemon Stop at that point [12:47] mvo ^ [12:47] pedronis: ok [12:47] I mean, I know what we need to do about sigterm (except exactly how to place it) [12:47] not sure it fixes what ogra sees [12:48] pedronis: that makes sense, we are in stop at this, so yeah [12:49] ah, it says A Stop Job so yes, it's related [12:49] pedronis: sorry to sidetrack you, but did you see Gustavo's comment about last-snapshot-set-id? [12:49] yes [12:49] pedronis: +1? [12:49] I agree with that [12:49] pedronis: k [12:56] mvo: something like this perhaps (untested): https://pastebin.ubuntu.com/p/rYWJ8PR5HH/ [12:58] pedronis: yeah, that looks sensible [13:13] PR snapd#5318 closed: interfaces/builtin: add new cuda-support interface [13:42] mvo: I can try to cut a PR from that pastebin on monday I suppose unless you want to [13:44] pedronis: I can look at this while waiting for travis to biuld my gadget-track PRs [13:44] ok [13:44] pedronis: this mostly needs tests, right? 
the feature itself looks reasonalbe [13:45] yes, some kind of tests (not sure it's easy to test) [13:46] pedronis: yeah, it looks tricky, maybe I can manage some sort of integration test at least, I will poke a bit at it [13:47] a 2nd review for #5676 would be great [13:47] PR #5676: asserts: add support for gadget tracks in the model assertion [13:48] yeah, was about to ask for this - should be straightforward (now) === cory_fu_ is now known as cory_fu [13:55] stgraber: ping? [13:56] Chipaca: pong [13:56] stgraber: I don't know if you saw that we're having issues with lxd since the release yesterday? [13:57] Chipaca: I just read your comment about systemd-detect-virt on i386, will take a look [13:58] stgraber: it's any executable that uses capabilities [13:59] stgraber: systemd-detect-virt, mtr, and traceroute6.iputils [13:59] stgraber: but, on 64 bits, snapcraft is also having trouble with snapd dying, that goes away with switching to 3.0 [14:00] Chipaca: what kernel are you on? [14:00] stgraber: e.g., on travis, «apt install snapd; snap install snapcraft --classic; snap install lxd; mkdir project; cd project; snapcraft init; snapcraft cleanbuild» fails [14:00] not sure what travis is using [14:00] I'm on 4.4.0-131-generic; mborzecki is on something newer I think [14:01] and the spread runs on 16.04-32 are on fresh cloud images, whatever's shipped there [14:01] (I can track it down if it's important) [14:01] i was on whatever we used in spread instances [14:02] oh wait my machine is 4.4.0-131-generic but the test was run in kvm [14:02] I'd have to check what was there :-) [14:02] * Chipaca checks [14:03] i'll start the test and check what's used in gce [14:04] stgraber: 4.4.0-133-generic [14:04] running on a 4.4.0-131-generic [14:07] Chipaca: reproduced the issue [14:08] stgraber: should we be installing lxd from candidate in our integration suite, to catch this family of errors before they hit stable? 
[14:09] Chipaca: that may be useful, yeah, this is definitely related to fscaps in this case so our own tests didn't trip on that [14:09] stgraber: also mborzecki noticed that a privileged container didn't have this issue [14:10] yeah, that part would be expected [14:10] ok [14:10] stgraber: does it only failing on 32 bit intel make sense to you? [14:11] (this particular failure i mean -- the snapcraft one i am yet to dig into) [14:12] Chipaca: no, got it failing the same way on amd64 [14:13] that's very strange [14:13] that's good news, right? [14:13] mborzecki: how are our tests working on amd64? [14:13] Chipaca: they are 'passing' [14:15] mborzecki: that's my point: if the bug is there, they shouldn't [14:15] mborzecki: I'll dig [14:15] stgraber: Chipaca: got a debug shell on i386, do you want to check anything? [14:15] tracked it down and working on a fix now [14:15] stgraber: what's the issue? [14:16] tl&dr is your kernel doesn't support unprivileged file capabilities, yet it lets us write an xattr that uses that new v3 fscap format [14:16] but then blows up when reading the file [14:16] stgraber: heh, nice [14:16] stgraber: so it's a bug in the image itself? 
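Editor's note on the fscaps diagnosis above: a small Go sketch of reading the revision out of a raw security.capability xattr value, which is how one could tell a "v3" value from older ones. The constants follow the layout in linux/capability.h (a little-endian 32-bit magic_etc word with the revision in the top byte); the function capRevision is hypothetical, and fetching the xattr from a file is left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Revision constants for the leading magic_etc word of a
// security.capability xattr value (see linux/capability.h).
const (
	vfsCapRevisionMask = 0xFF000000
	vfsCapRevision1    = 0x01000000
	vfsCapRevision2    = 0x02000000
	vfsCapRevision3    = 0x03000000
)

// capRevision (hypothetical) returns 1, 2 or 3 for a raw xattr value,
// or an error if the value is too short or the revision is unknown.
func capRevision(xattr []byte) (int, error) {
	if len(xattr) < 4 {
		return 0, fmt.Errorf("xattr too short: %d bytes", len(xattr))
	}
	magic := binary.LittleEndian.Uint32(xattr[:4])
	switch magic & vfsCapRevisionMask {
	case vfsCapRevision1:
		return 1, nil
	case vfsCapRevision2:
		return 2, nil
	case vfsCapRevision3:
		return 3, nil
	}
	return 0, fmt.Errorf("unknown capability revision in %#x", magic)
}

func main() {
	// Made-up sample header: revision 3 with the "effective" flag bit set.
	sample := []byte{0x01, 0x00, 0x00, 0x03}
	rev, err := capRevision(sample)
	fmt.Println(rev, err)
}
```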
[14:17] s/bug/regression caused by a change/ [14:18] Chipaca: it's a combination of things, LXD 3.4 introduced logic to remap file caps rather than just strip them and unsquashfs-tools was fixed yesterday to not drop xattrs in 16.04 [14:19] Chipaca: so even in our candidate and edge channels, everything was good until the last snap rebuild yesterday which picked up the fixed unsquashfs [14:19] and most of our manual testing is done on 4.15 kernels which do support the v3 caps or on 4.4 systems with -proposed enabled that again do support v3 caps (next kernel SRU has a backport of the feature) [14:19] MAGIC [14:20] sigh [14:20] I'll push a pr that adds --candidate to the lxd integration test [14:20] and even though we do have snap tests that use the broken kernels, our test image doesn't use file caps (it's just a tiny busybox image) [14:21] stgraber: we woudln't've caught it if we hadn't just happened to be using one of the three binaries that have caps in xenial [14:22] Chipaca: yeah, I expect that detect-virt is what's going to break most users so trying to rush a fix now [14:29] stgraber: as soon as the fix is in candidate let me know, and I'll push a pr that turns the test back on and fetches lxd from candidate [14:32] mborzecki: I'm guessing amd64 just happened to get a newer kernel or something [14:32] dunno [14:35] Chipaca: ubuntu-16.04-32 uses 4.4.0-131-generic [14:35] mborzecki: I got 133 here, but ok [14:36] and indeed if I rebooted I'd have 133 as well [14:36] mborzecki: mirror sync [14:43] Chipaca: ubuntu-16.04-64 image users different kernel indeed, it's 4.13.0-1019-gcp [14:44] haha, hehe. Ha ha ha ha, he he he he [14:44] mborzecki: ok [14:44] big difference [14:44] so there was a little tiny window of hitting the bug, and we *nailed* it [14:44] :-) [14:45] flawless victory [14:45] next question: should we make sure -32 and -64 are running similarish kernels? [14:45] heh :) [14:45] cachio__: ^^ ? 
[14:47] mborzecki, yes [14:48] mborzecki, I think the problem with the function [14:50] is that "if apt-cache policy "linux-image-extra-$(uname -r)" >/dev/null 2>&1; then" is going always through the then [14:50] with match or without [14:52] cachio__: yeah, something to figure out [14:52] the current pr will effectively switch the kernel to something else [14:53] cachio__: i have 4.13.0-1019-gcp and the proposed code wants to install linux-image-extra-4.13.0-31-generic, this will pull in the non-gcp kernel [14:54] mborzecki, yes, but that linux-image-extra-4.13.0-31-generic works well with the current kernel [14:54] mborzecki, at least the tests pass [14:55] so, what should we use when there is no linux-image-extra pkg for the current kernel? [14:58] cachio__: i see that there's linux-module-extra for the newer kernels, and linux-image-extra seems to be gone [14:58] cachio__: the real issue is that apt-cache policy returns 0 :( [14:58] mborzecki, yes [14:59] damn, and so does apt list [14:59] let me try installing linux-modules-extra-4.15.0-1017-gcp and running the tests [15:00] need to look at output it seems [15:00] cachio__: can you replace apt-cache policy with apt-cache show? [15:02] mborzecki, that works [15:05] cachio__: something like this perhaps https://paste.ubuntu.com/p/QVSbPQcqWC/ [15:06] mborzecki, yes, makes sense [15:07] niemeyer: in spread, is there a reason why I can't say “systems: [ubuntu-*, -ubuntu-14*]”? 
[15:08] mborzecki, I pushed that [15:08] I'll test this with the new gce images to see if it works [15:14] PR snapd#5681 opened: overlord: handle sigterm during shutdown better [15:14] pedronis: ^- this is your earlier pastebin with tests, hope the tests make (some) sense, its a bit tricky [15:14] ogra: ^- this should hopefully fix the reboot timeout [15:14] ogra: I mean the wait [15:15] a second review for 5676 and 5677 would be great, should be easy [15:25] mvo, awesome, will happily test once it landed [15:26] is there a way to speed up snap building if I already have all the .deb dependencies installed in the system? [15:27] mvo, btw, kyle told me it is likly he had the SD mounted when dd'ing to it ... (yay, nautilus) ... you really need to promote godd more ;) [15:27] I have a full pipeline on Jenkins now, which runs inside a Docker image that has all the dependencies. Still, the snapcraft step takes a long time because it is re-downloading all the deb files [15:28] PR snapd#5676 closed: asserts: add support for gadget tracks in the model assertion [15:28] mvo: #5678 has conflicts now, merging master into it would be nice either way [15:28] PR #5678: snapstate: add support for gadget tracks in model assertion [15:30] pedronis: yeah, I updated it [15:30] pedronis: and also the snapstate one [15:31] pedronis: when you say "before adding the close probably it would have worked anyway" in the review of 5681 - do you mean that without the close the test I added would behave the same? or am I misunderstanding? [15:31] pedronis: I will add the nil checks, I like that [15:31] mvo: no, I mean the code would not have explode passing nil in, but with the close it does [15:32] pedronis: aha, got it - will fix it [15:32] Chipaca: The logic is simpler.. you start with the fixed list for the file at hand, and can either add or drop.. 
it's not sequential [15:32] pedronis: I just need the close for my test, I could do it differently but this looked most natural (next to sending sigkill itself in the test which does not work) [15:32] mvo: the close is ok, but I think making sigCh optional is also natural [15:33] Chipaca: At least from my vague memories.. it's been a while [15:33] pedronis: yeah, we are in agreement :) [15:34] pedronis: updated, thanks for the feedback [15:37] mvo: I would have put the if around the whole bit using sigCh, Stop supporting nil works but is not super clear from the docs why it should [15:38] pedronis: indeed [15:47] mvo: any idea what failing to reset devices.list means? [15:49] Chipaca: do you have more context [15:50] what is devices.list [15:51] Chipaca: is that an lxc error? related to the lxd devices cgroup? [15:51] Chipaca: or to our deivices cgroup? [15:55] PR snapd#5660 closed: wayland: add extra sockets that are used by older toolkits (e.g. gtk3) [15:55] sergiusens: how often do you normally do a release to stable of the snapcraft snap? Do you spend any time in beta/candidate? it looks like the current set of snaps is stable==candidate==beta, so I'm guessing you just do testing in edge? [15:56] sergiusens: the reason I ask, is we'd like to do some testing of our snaps with upcoming releases of snapcraft, trying to figure out which channel would be the best to pull from [16:02] plars: when we tag, we put the snap in beta once it passes all our automated tests, then we do internal testing and if it all works we move it to candidate and make an ANN for a call for testing [16:09] sergiusens: ok cool, how long do you normally give it in candidate? sounds like that's where we should aim for [16:19] mvo: I'm afraid we moved on, but yes related to lxd (not sure if it's lxd or snapd printing that) [16:29] pedronis: are you around? 
[16:31] he EOW'd a bit ago [16:32] ah well [16:33] pedronis: it was to avoid having to reflash a device to undo a serial assertion, but it'd only save us ~5 minutes :-) [16:33] noise][: thanks === pstolowski is now known as pstolowski|afk [16:44] is there a way to cache a snapcraft pull when I run it in fresh docker images on Jenkins? [16:44] that stage always takes 10 minutes of installing new deb packages [16:45] :%s/fresh docker images/fresh docker containers/ [16:54] plars: 3 days, but we can negotiate on that with something reasonable (like a week) [16:55] s/reasonable/expectable/ [16:59] t1mp, two thoughts: 1) If we're talking build-packages, generate a docker image with them already installed and use that, should speed things up. 2) if we're talking stage-packages, snapcraft caches them in ~/.cache/snapcraft/, you can preserve that between runs [17:00] Consider that each of those steps makes the build less "clean" [17:01] Chipaca: having dinner [17:02] kyrofa: thanks. That's useful info. I'm snapping a python app, so I don't really need to build anything. [17:03] kyrofa: maybe it would be nice to have a 'snapcraft create-cache' command that downloads the .debs. I can include that in my Dockerfile since it doesn't change often. [17:04] basically I only need the 'dump' plugin to copy some files that are already generated in previous steps (using PyInstaller) [17:04] but setting up the stage takes 20 minutes :( [17:05] t1mp, I don't quite understand. If you're only using the dump plugin, why is snapcraft fetching a bunch of stuff? [17:06] sergiusens: I don't have any concerns about that. 3 days sounds reasonable. 
I don't expect we would normally have any problem [17:10] kyrofa: it has dependencies, see https://pastebin.ubuntu.com/p/tbvGycDvvt/ [17:11] Ah, the remote part is probably taking a chunk of time [17:12] yes, and it is repeating every time I make a small change somewhere [17:12] I really only need to build the snap before publishing a new version, but for now I'm building it for each commit to make sure I don't break it. And to test that the snap building works properly [17:12] t1mp, every time you make a small change you fire up a clean docker container? [17:13] Ah, commits, okay that's fair [17:13] yes, Jenkins does [17:13] for each push [17:15] t1mp, if you run `snapcraft define desktop-gtk3` you'll see that it's mostly stage-packages as well [17:15] t1mp, stage-packages are not installed on the host, which means creating a new docker image with them installed won't help you [17:15] t1mp, but you can try preserving that cache between runs (point (2)) so they don't need to be fetched again [17:15] kyrofa, t1mp: you can mount the cache directory into the container [17:16] sergiusens, you're stealing my thunder [17:16] exactly that [17:16] I realized that :) I meant to have an image that includes ~/.cache/snapcraft. That would be kind of ugly though. Probably better to mount it. [17:16] Indeed [17:16] yeah, that (thunder) too :-) [17:16] :D [17:16] sergiusens, kyrofa: right :) [17:17] that caches only the deb files right? They still need to be installed [17:17] t1mp, they're just unpacked into the snap, yeah [17:17] Should be quick compared to fetching them [17:19] yeah pull takes 3 minutes now https://pastebin.ubuntu.com/p/PJqfM6ndQW/ [17:19] (the total time went down from 20 to 5.. I think the server was overloaded before) [17:19] err.. to 8 min, not 5. :) [17:20] I'll look into keeping the cache. On Monday :) [17:20] t1mp, we'll be here! 
[17:20] great, thanks :) [17:20] Have a lovely weekend [17:21] you too [18:14] Chipaca, sergiusens: we have a fix (https://github.com/lxc/lxd/pull/4943), should land upstream in ~30min, then in snap another 30min or so later [18:14] PR lxc/lxd#4943: shared/idmap: test fcaps support [18:14] thanks for the update [19:31] stgraber: so I could push the PR about now? [19:32] Chipaca: yeah, we should have the fix in candidate in the next 10min or so [19:32] Chipaca: once Jenkins is happy on our side, I'll promote to stable [19:32] stgraber: the pr I'd push would bring it back for i386, but also pull from candidate [19:33] stgraber: (is candidate the right place to pull from?) [19:34] Chipaca: the issue isn't arch-specific though so you shouldn't make anything specific to i386 [19:34] yeah, we only use edge, candidate and stable [19:34] stgraber: the difference is our i386 images have 4.4.x, whereas amd64 has 4.13.x [19:34] stgraber: so the issue only manifested on i386 [19:35] Chipaca: looks like we're seconds away from having the fix in candidate now [19:35] that snap takes so long to build... [19:36] 52min for that build [19:36] stgraber: via snapcraft? [19:40] Chipaca: yeah on LP, but it's not a simple snap :) [19:41] PR snapd#5682 opened: tests/main/lxd: pull lxd from candidate; reënable i386 [19:42] stgraber: tadaa [19:52] sergiusens, Chipaca: promoting to stable now [19:52] woop [20:19] Thank you, stgraber