=== NickZ3 is now known as NickZ [01:52] PR snapcraft#2950 opened: meta: Snap to_dict() cleanup [05:14] PR snapd#8148 closed: overlord/configstate/configcore: add support for backlight service <⛔ Blocked> [06:08] morning [06:35] school run [06:38] PR snapd#8160 opened: overlord/configstate: add backlight option [07:12] re [07:33] mvo: hey [07:38] mborzecki: good morning [07:50] good morning [07:50] hmm looks like my git filter-branch incantations on #8156 were not good enough [07:50] zyga: hey [07:50] PR #8156: [RFC] cmd/snap-bootstrap: subcommand to detect if we want a chooser to run [07:51] hmm [07:51] didn't I review https://github.com/snapcore/snapd/pull/8160 already? [07:51] PR #8160: overlord/configstate: add backlight option [07:51] or something just like it? [07:53] zyga: I think the config var got tweaks [07:54] meanwhile master is still red [07:54] Ian sent a fix but it seems to be insufficient or also broken somehow [07:55] hmm, what is curious is that the failure is only present on core18 [07:56] I wonder if that's because of the extra logic on how snapd is started there [08:03] zyga: I run a debug session now [08:03] good morning pstolowski [08:03] morning [08:03] hey Pawel, good morning [08:03] zyga: it's a bit annoying, also it's super unclear why it's starting now [08:03] pstolowski: morning! [08:04] zyga: actually - new systemd in bionic since 2020-02-17 [08:04] ohhh [08:04] zyga: I think this is roughtly when the trouble started? [08:04] that's a good trail [08:04] yeah, let's see the changelog [08:05] zyga: we could run core18 tests with stable core18, that still has the old version [08:05] wow, that's true, that's brilliant [08:05] if that passes we know for sure it's the base OS that changed [08:05] zyga: the changelog has a smoking gun [08:06] reading it now [08:06] og [08:06] OnFailure job something [08:06] don't trigger ONFailure [08:06] - Only trigger OnFailure= if Restart= is not in effect (LP: #1849261) [08:06] Bug #1849261: Update systemd for ubuntu 18.04 with fix for interaction between OnFailure= and Restart= [08:06] [08:06] zyga: exactly, this looks super suspicious [08:06] zyga: also wrong, I mean, if it fails to restart for n times we need to go into failure [08:06] anyway [08:07] hmm hmm hmm [08:07] mvo: but it passes in 20.04 [08:07] so something changed there [08:08] the patch is https://github.com/systemd/systemd/pull/9158/files [08:08] PR systemd/systemd#9158: trigger OnFailure= only if Restart= is not in effect [08:08] I'll check if that got changed further down the line [08:08] zyga: cool, thank you [08:08] zyga: I think this is it, kind of annoying [08:08] zyga: but at least we know now what to do [08:08] mvo: the test does run on 20.04, right? [08:09] zyga: lol - no [08:09] zyga: it has a TODO:UC20: "does not work for unknown reasons" [08:09] pfff [08:09] hahahaha [08:09] so it's broken [08:09] and now we need to live with it [08:10] ok [08:10] zyga: well [08:10] zyga: let me try something [08:10] the code didn't change since that patch, systemd master still behaves the same way it does in 18.04 [08:10] ok [08:10] so [08:11] I think we need to change how snapd restarts [08:11] if we limit that to a fixed number [08:11] zyga: I hope we just need to change the startlimitinterval [08:11] then on failure will trigger again [08:11] no, not the interval [08:11] zyga: then the failure will be triggered again [08:11] zyga: no? 
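To make the changelog entry quoted above concrete: with the updated systemd, OnFailure= only fires once a unit actually ends up in the "failed" state, which for a unit with Restart= means only after the start limit is exceeded. A minimal, hypothetical unit illustrating the interaction (unit names and values are examples, not the shipped snapd.service):
```
# Illustration only -- a hypothetical unit, not the real snapd.service.
[Unit]
Description=Example daemon
# With the bionic systemd update discussed above (systemd PR 9158),
# OnFailure= no longer fires on every crash while Restart= is in effect;
# it fires only once the unit actually reaches the "failed" state.
OnFailure=example-failure.service
# The unit reaches "failed" once it has been restarted more than
# StartLimitBurst times within StartLimitIntervalSec (older systemd
# spells this StartLimitInterval=, usually under [Service]).
StartLimitIntervalSec=10s
StartLimitBurst=3

[Service]
ExecStart=/usr/bin/example-daemon
Restart=on-failure
```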
[08:11] we need to allow snapd to really fail [08:11] that's how I understand this fragment: [08:11] https://www.irccloud.com/pastebin/phqUSrfE/ [08:11] let me read the diff again [08:12] zyga: right, my understanding is that once we hit the restart interval it will actually go into failed state and then the OnFailure is run [08:12] zyga: but I just infered this, did not read the patch [08:12] we need to check [08:13] I mean [08:13] just do [08:13] zyga: mvo: what I think is the best fix to fix this with the new systemd is 2 things, 1 nfs-support test needs to also reset-failed on snapd.socket and 2 snap-failure needs to reset-failed before trying to start the socket unit [08:13] https://github.com/systemd/systemd/pull/9158/files#diff-a89a9b6f80aada989d298b4c2c3a9d64R2435 [08:14] PR systemd/systemd#9158: trigger OnFailure= only if Restart= is not in effect [08:14] so, as long as the service will auto restart [08:14] we don't get failed state [08:14] ijohnson: good mornign! [08:14] (it's kind of early, wow :) [08:14] Yeah couldn't sleep but probably will try to go back to bed shortly [08:15] ijohnson: woha, hey [08:15] ijohnson: good evening :) [08:15] * mvo hugs ijohnson [08:15] mvo: indeed :-) [08:15] ijohnson: hey! already adjusting to CET timezone? [08:16] mvo: I need to handle an errand at home, I'll be back in 20 [08:16] zyga: no worries [08:16] zyga: testing a patch now [08:16] Anyways snap-failure needs to be a bit more robust about starting the snapd.socket service anyways [08:16] ijohnson: thanks, that's good to know [08:17] Because cachio ran into another problem with this test that happens because the socket is still lying around [08:17] ijohnson: I'm poking at it now, a bit annoying since it takes forever to run each test [08:17] Haha yes that was most of my day yesterday waiting for spread and fixing other random things in the meantime [08:18] mvo: but also see my comment about the nfs-support test, it calls reset-failed on _only_ snapd, it should also call that on snapd.socket too [08:22] stepping out for a bit to get the papers to my accountant, and then will probably work from some place in the city [08:34] PR snapd#8161 opened: tests: set StartLimitInterval in snapd failover test [08:36] ijohnson: thanks, I check these area too but it seems the current issue is about the StartLimitInterval [08:36] re [08:41] there's some chaos at home today [08:41] I'm heading out [08:41] mvo: what I was trying to say is that the nfs-support test will start failing too when you make adjustments to the StartLimitInterval that are favorable to snapd-failover because that test is flaky due to not resetting the failed count of the snapd.socket service [08:41] don't want to be a part of this [08:41] (family life and living with parent-in-law) [08:43] * mvo hugs zyga [08:44] ijohnson: aha, thanks. my current approach was to adjust the StartLimitInterval only in this one test [08:44] ijohnson: i.e. 
just enough to get us on our feed again, not at all against fixing more [08:45] Ah okay nvm me then [08:47] ijohnson: let's talk some more later, I really don't want to keep you from sleep :) [08:47] It's fine no worries [09:08] mvo: re [09:08] mvo: small observation [09:08] mvo: perhaps we want to consider a spread suite that does run in autopkgtest and gates classic systemd [09:09] mvo: we would catch this if that test was a part of that set [09:09] mvo: something to consider after this is resolved [09:10] mvo: I'll work on my tasks from Jamie's review - please pull me into a review once you have the fix [09:10] mvo: and it's also a good good catch about this test, if we had disabled that test and released a new core we could really have a bad day [09:11] zyga: it's an interessting idea, we did have autopkgtest for snapd but it was terrile unreliable [09:11] zyga: we would have caught this thought [09:11] yeah, perhaps we can turn some knobs to make it better though [09:11] dunno [09:11] * mvo nods [09:13] totally offtopic, I noticed that systemd --user runs for gdm [09:13] we should probably not run snap services for gdm [09:15] mvo: hi, is it expected there's no standup in the calendar today? [09:16] pedronis: not expected, uh, let me fix this [09:16] pedronis: thanks for catching that [09:23] #8130 needs a 2nd review [09:24] PR #8130: overlord, state: don't abort changes if spawn time before StartOfOperationTime (2/2) [09:27] PR snapd#8162 opened: features: enable "parallel-installs" by default [09:29] yay [09:29] mvo: that's for 2.44? [09:30] mvo: that's great to see, thank you! [09:32] mborzecki: that's something to discuss [09:32] mborzecki: 2.44 means it's ready for 20.04 and most likely on the CD [09:32] mborzecki: which is great [09:33] mborzecki: but also risky [09:34] ayy ;) [09:34] mborzecki: let's talk at the standup [09:34] mvo: ok [09:34] also - amazon linux is failing to install packages right now :( [09:35] mvo: duh, got logs? 
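Regarding ijohnson's point above about the nfs-support test: once the start limit has been hit, the units stay in the "failed" state and refuse to start until their counters are cleared, so the cleanup has to cover the socket unit too. A minimal sketch of that cleanup, assuming a spread-style shell step rather than the actual test code:
```
# clear failure state on both units, not just snapd.service
systemctl reset-failed snapd.service snapd.socket || true
systemctl start snapd.socket
```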
[09:36] mborzecki: yes, one sec [09:36] mborzecki: I moved it to unstable for now, looks like some repo issue [09:36] mborzecki: probably transient [09:37] mborzecki: https://travis-ci.org/snapcore/snapd/jobs/652883253?utm_medium=notification&utm_source=github_status [09:39] yay, 8161 unbreaks the snapd-failover test [09:39] mvo: we really need to do something about aliases before turning on parallel installs by default [09:39] mvo: yeah, looks like inconsistency in amazon's repos [09:40] mvo: and kinda basic, iproute, iptables :/ [09:40] PR snapd#8162 closed: features: enable "parallel-installs" by default [09:40] pedronis: ok, closing this again then until we have time for this [09:41] mborzecki: yeah, that was my impression as well, hopefully transient [09:41] pedronis: thanks for the review [09:41] mvo: yea, sadly it needs more work, but I think the current experience is suboptimal [09:41] and if we turn it on we are stuck with the behavior forever [09:45] reviews for 8161 would be appreciated [09:45] (should be super simple) [09:45] and fixed the snapd-failover test - at least in the one run that happened there :) [10:09] mvo: what's the context of amazon linux and unstable [10:09] the commit message has no other information [10:12] also - amazon linux is failing to install packages right now :( [10:13] PR snapd#8163 opened: tests: enable snapd-failover on uc20 [10:37] * pedronis reboots [11:26] PR snapd#8164 opened: snap: use the actual staging snap-id for snapd [11:26] pedronis: got a question about https://github.com/snapcore/snapd/pull/8156#discussion_r381881462 you'd see the WaitTriggerKey() be moved to some other file, right, and just keep the interfaces and related structs in triggerwatch.go? [11:26] PR #8156: [RFC] cmd/snap-bootstrap: subcommand to detect if we want a chooser to run [11:27] mborzecki: not quite, I think the structs should go where they are used as input or output to functions [11:27] mborzecki: this is go so the interfaces can be defined on the consuming side [11:29] mborzecki: I mean keyEvent etc should be in evdev.go [11:35] pedronis: hmm right, the only concern i have is that keyEvent is a concrete type, so it's kind of shared between the consumer and producer, not sure if i'm explaining this right :) [11:35] mborzecki: you are, but it is not a concern in go [11:36] the interfaces exist mostly for testing, no? [11:36] mborzecki: it's very unusual for a consumer to define structs it wants [11:36] pedronis: yeah, well, maybe i'm trying to complicate it too hard :) [11:37] mborzecki: let's put it this way: if you tried to stick evdev.go in its own package as is, it wouldn't work [11:42] mborzecki: does this ^ remark make sense? I'm also happy to HO if I'm still confusing you [11:47] pedronis: it's ok, i'm overcomplicating this, thought about a scenario when evdev went to a separate package and KeyEvent would either need to be an iface, or one package would need to import the other to get the definition, but it makes no sense to restructure it this way [11:49] cachio: hey! i've asked 2 questions under #8157 [11:49] PR #8157: tests: using google storage when downloading ubuntu cloud images from gce [11:49] pstolowski, hi, I'll take a look [11:54] mborzecki: just out of curiosity, what's the time difference between running the chooser trigger check from initramfs or early in the normal system?
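A self-contained sketch of the Go idiom pedronis describes above: the producer owns the concrete event struct, while the consumer declares only the small interface it needs, which Go satisfies implicitly. The names (KeyEvent, WaitForTrigger, waitTriggerKey) are simplified stand-ins for the triggerwatch/evdev code, not the real implementation:
```
package main

import "fmt"

// --- producer side (think evdev.go): owns the concrete types ---

// KeyEvent is a concrete type passed by value; it can live with the
// producer without forcing any extra imports on the consumer.
type KeyEvent struct {
	Code int
	Err  error
}

type evdevDevice struct{}

// WaitForTrigger reports a key press on the supplied channel.
func (d *evdevDevice) WaitForTrigger(ch chan KeyEvent) {
	ch <- KeyEvent{Code: 59} // pretend the trigger key was pressed
}

// --- consumer side (think triggerwatch.go): declares only what it needs,
// mostly so tests can substitute a fake device ---

type triggerDevice interface {
	WaitForTrigger(chan KeyEvent)
}

func waitTriggerKey(dev triggerDevice) error {
	ch := make(chan KeyEvent, 1)
	go dev.WaitForTrigger(ch)
	ev := <-ch
	if ev.Err != nil {
		return ev.Err
	}
	fmt.Println("trigger key observed, code:", ev.Code)
	return nil
}

func main() {
	if err := waitTriggerKey(&evdevDevice{}); err != nil {
		fmt.Println("error:", err)
	}
}
```
Sharing the concrete KeyEvent type between the two files is not a layering problem here; the interface lives on the consuming side and is satisfied implicitly, which is the point pedronis makes.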
[11:56] cmatsuoka: we can start the detection early, in my VMs it's like <1s earlier than early boot, but i guess it may be different on the actual devices [11:56] cmatsuoka: and there's also a question, whether we'd be able to execute the gadget-provided trigger hook in initramfs, which i suspect we won't [11:57] cmatsuoka: as i said during the standup, it's probably only to the benefit of a device that supports a trivial input method like keyboard or somesuch [11:57] mborzecki: I had this feeling that initramfs should execute really fast [11:58] mborzecki: but yeah I don't know how slow real devices could be [12:07] zyga: hey, fyi, https://github.com/snapcore/snapd/pull/7980 is -><- close [12:07] PR #7980: packaging,snap-confine: stop being setgid root [12:15] jdstrand: cool [12:16] jdstrand: I'll check soon [12:16] jdstrand: going through coverity, last one to fix [12:20] * pstolowski lunch [12:20] moving back home [12:20] bbiab [12:25] mvo: the enable failover tests failed for some reason [12:26] jdstrand: https://github.com/snapcore/snapd/pull/8165 [12:27] PR #8165: cmd/snap-confine: fix everything flagged by coverity [12:27] PR snapd#8165 opened: cmd/snap-confine: fix everything flagged by coverity [12:30] jdstrand: replied to setgid, I'll jump there immediately now as it's so low hanging and it could be done shortly [12:30] jdstrand: the bigger branch is still in progress, I'm working on improving the testing setup and resolving the issue you highlighted in remote access to the 18.04 system [12:30] jdstrand: it is real but for reasons I need to dig deeper to understand [12:31] mvo, jdstrand: FYI - I will make the coverity scan automatic, this is just the first step [12:32] jdstrand: btw, I think we got remarkably few things flagged by coeveity [12:32] Coverity [12:33] I was bracing for a huge list [12:33] It is a really cool tool [12:37] mvo: it seems core 20 needs more love for failover [12:37] mvo: /bin/bash: line 89: /usr/bin/snap: No such file or directory [12:39] * zyga small coffee & pączek break [12:41] Pączki look delicious! [12:44] cmatsuoka: it's fat Thursday here [12:44] cmatsuoka: so pączki are everywhere [12:52] mvo: is it known that the snapd snap ships the code in /snap ? [12:56] pedronis: you mean the go source code? yes, but it's to be solved with ijohnson's patch to use base: core in snapcraft.yaml [12:59] we back up like real men [12:59] by shipping all code to each user in production [13:00] Dear $NAME; This is not a scam. Can you send us your core snap back please? We lost all disks in a thunderstorm. kthxbye [13:05] re [13:07] I just jumped through initrd break=premount [13:07] fun [13:15] I think I will head back [13:15] better to have the standup in a private setting [13:15] I'll finish my coffee and walk home [13:19] Speaking of my branch it should be ready to land when the Snapcraft in candidate goes to stable [13:19] cmatsuoka, zyga re core20 failover> it is strange, I ran this locally and it worked, oh well. needs investigation [13:21] mvo: your fix for master failed on the preseed test during image download. Perhaps it would make sense to land cachio's 8157 first (and flag it with 'skip spread') [13:29] zyga: Can you explain what syntax/grammar the mount command here has? And where is the command received then? [13:29]
```
emit(" mount options=(bind, rw) %s -> %s,\n", bindFile, path)
emit(" mount options=(rprivate) -> %s,\n", path)
emit(" umount %s,\n", path)
```
[13:29] I mean: who receives the message? [13:31] zyga: wait.
it's in the comments. Thank you [13:34] pstolowski: hm, I can cherry pick his fix into my PR [13:36] mvo: that works too [13:44] re [13:44] sdhd-sascha: that's apparmor mount specification [13:45] sdhd-sascha: I'm glad you asked about grammar [13:45] sdhd-sascha: that's about the only thing that is very thoroughly documented :) [13:45] sdhd-sascha: go to http://manpages.ubuntu.com/manpages/bionic/man5/apparmor.d.5.html [13:46] sdhd-sascha: and check the MOUNT RULE = part [13:55] mvo: Thursdays SUs are in a different room now? [13:56] zyga: :-) thank you, again. [13:56] i'm looking for a way to mount when a classic snap service restarts. Think the install hook is not enough. [13:56] sdhd-sascha: can you summarize as to what you want to detect and what you want to do in response please [13:57] cmatsuoka: not really, sorry, silly calendar [13:59] hmm [13:59] should I join what meet proposes [13:59] I'm in the standp [13:59] nobody there [13:59] zyga: use the regular one I guess [14:00] can you spare the link [14:00] zyga: the github-runner, needs write access in the same directory. so i create /var/lib/runner/$SNAP_NAME/_work directory on install-hook. Then i want to be sure, that this is mounted to $SNAP/usr/lib/runner/_work [14:00] I just go to meet.google.com [14:00] mvo: can you share the correct link [14:01] zyga: it's classic, because i don't know what the github-action-runner-scripts can do [14:02] zyga: on classic, the layout in snapcraft.yaml is not executed... [14:02] sdhd-sascha: layouts don't do anything during classic confinement [14:03] * sdhd-sascha nod [14:04] Would be nice if layouts in classic mode would also work. [14:04] sdhd-sascha: what would you expect them to do? [14:04] sdhd-sascha: change the host? [14:05] sdhd-sascha: what if two snaps want to have $SNAP/foo in /usr/lib? [14:05] sdhd-sascha: who wins? [14:05] maybe, limit the mounts to `/var/lib/runner/$SNAP_NAME/$SNAP_REVISION/_work` like above [14:06] sdhd-sascha: what is /var/lib/runner? [14:06] i mean, without runner [14:06] PR snapd#8166 opened: cmd/snap-bootstrap: create a new parser instance [14:06] sdhd-sascha: I think this is weird, a classic snap can just mount stuff there [14:06] sdhd-sascha: perhaps I'm missing something but I think layouts are not meant for this [14:07] ok [14:07] thank you [14:07] sdhd-sascha: layouts take something from the snap and put it somewhere in the view of the snap [14:07] sdhd-sascha: but none of that changes the host for real [14:24] cmatsuoka: btw. could we just mkfs rather than delete/remove partitions? [14:27] mborzecki: right, I think we could just redeploy them instead of messing with the partition table [14:27] cmatsuoka: just look at the attributes to know which ones we can wipe [14:29] mborzecki: let's try that [14:36] mvo, hi, sorry to ping again wrt #1817276, is there anything I can capture when this happens to help debugging? I see it quite frequently locally. it might be made worse by the fact that my laptop is a bit old [14:36] Bug #1817276: snapfuse use a lot of CPU inside containers [14:39] ackk: I need to appologize for this one, we still have not invstigated how this can happen [14:40] mborzecki: I blocked the PR while I try those changes, but I'll work on the TPM cmdline measuring first [14:40] cmatsuoka: cool, added a note there too [14:41] mvo, np. I can't change the status of that bug. does it make sense to reopen or should I file a new one ? 
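For reference, the mount rules generated by the emit() calls quoted earlier expand to something like the following in the generated AppArmor profile; the paths here are hypothetical, and the grammar is documented in the MOUNT RULE section of apparmor.d(5):
```
# bind-mount the prepared source over the target, read-write
mount options=(bind, rw) /snap/example/1/opt/tool -> /opt/tool,
# stop mount events from propagating out of the target
mount options=(rprivate) -> /opt/tool,
# and allow the target to be unmounted again
umount /opt/tool,
```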
[14:42] mborzecki: ping me when the chooser PR is ready for re-review [14:42] ackk: let me reopen it [14:42] mvo, thanks [14:43] ackk: reopened and updated the description. let me try to investigate again [14:48] ijohnson: do you think you could have a look at 8161 ? it's green and seems to fix the snapd failover, if it's not wrong I'm in favor of merging and then we can merge your improvements next? [14:48] ijohnson: it should unblock all the other PRs we have open [14:50] mvo: sure [14:50] mvo, thanks [14:53] mvo: approved [14:54] pedronis: thanks for letting me know about the /snaps subdir in the snapd snap - I (strongly) suspect that snapcraft is acting silly here, will investigate [14:55] mvo: it's fixed in modern snapcraft [14:55] ijohnson: aha! so we just need to land your PR :) ? [14:55] mvo: my PR open right now fixes that, but we just need to wait until snapcraft 3.10 on candidate channel is promoted to stable [14:55] yes [14:55] ijohnson: I guess I could change our build recipe to use snapcraft from candidate? [14:56] ijohnson: let me try this [14:56] mvo: yes that would let us land sooner, not sure if that's okay to build snapd that gets released with candidate? depends on how much you trust sergiusens I guess :-) [14:56] mvo: the spread test we have is already using candidate [14:56] (but that doesn't release anywhere, just makes sure it builds) [14:57] ijohnson: I trust sergiusens a lot - plus the extra testing snapcraft would get this way is probably good [14:57] sure sounds good then :-) [14:57] PR snapd#8161 closed: tests: set StartLimitInterval in snapd failover test <⚠ Critical> [14:57] ijohnson: I am mostly waiting on feedback from field, but I probably won't be releasing this week [14:58] sergiusens: I switched our default snapd builds to use the snapcraft from candidate now, I think this makes sense anyway for testing your stuff earlier [14:59] ijohnson: so just to double check 7904 will work with snapcraft 3.10 in candidate? so I can remove the blocked label? [14:59] mvo: yes [15:00] ijohnson: awsome [15:02] thanks mvo, that will help! [15:06] zyga: https://github.com/snapcore/snapd/pull/8165/files#r382057305 [15:06] PR #8165: cmd/snap-confine: fix everything flagged by coverity [15:07] zyga: found an example of use in qemu: https://github.com/qemu/qemu/blob/master/scripts/coverity-model.c [15:08] mborzecki: looking [15:21] cmatsuoka: can you take a quick look at https://github.com/snapcore/snapd/pull/8166 ? [15:21] PR #8166: cmd/snap-bootstrap: create a new parser instance [15:22] ackk: in the bugreport you write this happens inside a bionic container, this is puzzling. however it seems like I can explain why it's happening inside a disco container. but it's a bionic container? [15:22] ackk: anyway, I will dig a bit deeper, if you have access to the container the output of "apt list squashfsfuse" would be nice [15:23] ackk: to the problematic container [15:27] mvo, well yeah I've seen it in bionic, but also on focal while testing maas upgrade from deb to snap [15:28] ackk: ok, I keep diging [15:28] mvo, in this scenario I launch a bionic container, install maas from deb, do-release-upgrade to focal which updates to the transitional deb (which installs the snap). once services in the snap start, snapfuse eats all cpu (along with python for the services) [15:43] ackk: in a meeting right now, will get back to you (and try this out) [15:52] Is it possible to backup a snap installation and restore them on a new installation as blob ? 
[15:53] It could be faster on github-actions if the installation of "snap", "lxd", "snapcraft" could be shortend with a tar-package [15:54] sdhd-sascha: ish, it's not trivial [15:54] * zyga is busy and won't respond now, sorry [15:56] mborzecki: thanks, that's very useful! [15:58] * mborzecki figures it's better to rebuild initramfs before repacking the kernel snap /o\ [16:03] ackk: just one more question - what is your "host" os that you run lxd on ? [16:03] ackk: just to make sure I get a realistic reproducer [16:03] mvo, currently eoan [16:03] mvo, lxd backend is btrfs (if that matters) [16:04] (lxd 3.20) [16:05] ackk: thank you [16:05] mvo, np, let me know if you need any more info [16:08] * cachio lunch [16:09] ijohnson: 7904 is green :) [16:09] can someone do a second review on 7904 please? [16:10] y [16:10] mvo: pstolowski reviewed and approved it a while ago [16:10] but sure more reviews is never a bad idea [16:10] :-) [16:11] mvo: what does adopt-info snapd do? [16:11] I remember the concept [16:11] PR snapd#8163 closed: tests: enable snapd-failover on uc20 [16:11] but what's the data source for "snapd"? [16:12] zyga: it means to get info like version from the snapd part [16:12] aha, I see [16:12] but where is the part defining anything that is adopted? [16:12] is that set-version snapcraftctl? [16:15] zyga: the data source for snapd when not specified defaults to "." [16:16] err for any snapcraft part rather [16:18] zyga: `adopt-info: snapd` means get metadata for the snap from the part named "snapd", which is a bit counter-intuitive since the whole snap is named snapd and we also have the magic `type: snapd` so it's not clear if it's a special value or just a part [16:18] yeah [16:18] zyga: if you like I could rename the part from snapd to `snapd-deb` as that's really what the part is building [16:18] a comment would be nice [16:18] followup material [16:18] cool, yes I would prefer a followup if that's okay [16:19] * ijohnson is not feeling lucky [16:19] * ijohnson about tests today [16:19] hahaha [16:19] PR snapd#8167 opened: o/standby: add SNAPD_STANDBY_WAIT to control standby in development [16:22] PR snapd#7904 closed: snapcraft.yaml: use build-base and adopt-info, rm builddeb plugin [16:22] ackk: one more question - in the bugrpeort you say that you see squashfuse generating a lot of cpu, it's squashfuse (and not snapfuse), right? [16:23] mvo, let me confirm [16:26] mvo, it seems it's actually snapfuse [16:26] ackk: that's interessting [16:27] ackk: the bugreport says "after installing core and rebooting". this is installing core in the container, I suppose but then rebooting the host? [16:27] mvo, no, rebooting the container, but that might be a red herring, I tried reproducing it that way and didn't manage to. so it might have been something else [16:27] mvo, what I did right now, though is just to install the maas snap, and initialize it [16:27] not a clean container, just my dev onne [16:28] pedronis: mvo: xnox mentioned to me a while back that we should consider upgrading the build-base of the snapd snap to build a newer libc/other tools, thoughts on this? I don't think we need to do this now, but I could add a TODO:UC20: to the snapcraft.yaml for the snapd snap so we don't lose this suggestion [16:28] mvo, I take the more I/O the application does, the more snapfuse has to work as well? [16:28] ackk: yes [16:29] ijohnson: if you do, you do need to check minumum kernel requirements from newer libc, and how it matches the platforms you support. 
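A generic illustration of the adopt-info mechanism discussed above (a fragment, not the real snapd snapcraft.yaml): the part named by adopt-info is the one allowed to provide snap metadata, typically by calling snapcraftctl set-version from an override script.
```
name: example
adopt-info: example-deb   # pull version and other metadata from this part

parts:
  example-deb:
    plugin: nil
    source: .
    override-pull: |
      snapcraftctl pull
      # this call is where the adopted metadata actually comes from
      snapcraftctl set-version "$(git describe --tags --always)"
```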
[16:29] ackk: I mean, snapfuse taking quite a bit of cpu is normal but we fixed using the wrong snapfuse binary a while ago and it puzzling that apparently this is not fully working for you [16:29] also xnox in other news, you should now be able to cleanly build snapd snap from git master with candidate snapcraft :-) [16:29] xnox: hmm good point [16:30] but do note that trusty is no longer a support target for snapd [16:30] mvo, I haven't seen issues with other snaps in containers, but maas does quite a lot of IO even just when starting up / setting up, so maybe the issue shows there more prominently [16:30] mvo, plus, my laptop is old and non-nvme [16:31] mvo, btw, the original issue reported by BjornT was on zfs, mine is on btrfs, so I was wondering if COW filesystems might play a role [16:31] PR snapcraft#2939 closed: pluginhandler: user directories scoped to partdir for snapcraftctl [16:32] mvo, I have a focal machine, I can try spawning a container there and see if it it's different [16:32] ackk: interessting, could be. I'm trying in a clean 19.10 VM, all freshly created in qemu and see snapfuse hoover around 40% cpu which seems not that unexpected [16:33] mvo, is that with maas? [16:33] ackk: yes, with maas init [16:33] ackk: I now see two snapfuse, one 65 the other 20% [16:33] zyga: see ^ or v depending on how quick mup is [16:33] ackk: I think you mentioned multiple ones with 100% ? [16:33] ackk: but this is ext4, I need to try zfs/btrfs :/ [16:33] PR snapd#8168 opened: snapcraft.yaml: add comments, rename snapd part to snapd-deb [16:34] there it is ^ [16:34] mvo, no just one, plus multiple python processes also using quite a lot of cpu, but that's expected during init [16:34] ackk: yeah, I see python too but was assuming that is maas :) [16:34] ackk: just one> but 100% cpu? [16:34] PR snapcraft#2947 closed: remote-build: pass through 'source-subdir' property [16:34] mvo, correct [16:35] ackk: ok, I think I need to re-do this with a different fs, ext4 seems to not show it (or I'm doing something wrong) [16:35] cachio: can you merge master into 8157? [16:37] ackk: it runs for a long time, that is normal? [16:37] mvo, yeah, although snapfuse stealing cpu makes it worse [16:37] ackk: yeah :( [16:39] mvo, fwiw in my container here maas init has finished but snapfuse is still running at 100%cpu [16:39] ackk: will redo this now with a different fs [16:39] ackk: uh, that's interessting [16:39] mvo, that's what always happens [16:39] ackk: can you check where snapfuse comes from? 
it should be /usr/bin/snapfuse from inside the container [16:40] mvo, /usr/bin/snapfuse [16:40] (from /proc/pid/exec) [16:40] ackk: yeah, that should be fine (well, "should") [16:40] err, exe [16:40] PR snapcraft#2943 closed: spread: capture developer debug information [16:40] * mvo add zfs or something to see if it makes a difference [16:41] * ackk stops maas before laptop catches fire [16:42] ackk: let it burn and buy a new one already :) [16:42] mborzecki: 8136 needs a 2nd review [16:43] BjornT, because of free mentining it I'm now kinda waiting for gen8 [16:45] PR snapd#8166 closed: cmd/snap-bootstrap: create a new parser instance [16:46] PR snapcraft#2923 closed: requirements: Update PyYAML requirement to 5.3 [16:47] PR snapd#8155 closed: tests: mv ubuntu-core-snapd{,-failover} to core/ suite <⚠ Critical> [16:51] mvo: #8153 needs more work [16:51] PR #8153: [RFC] "snap run --explain" with different formating [16:52] pedronis: cmatsuoka: i've updated #8156 [16:52] PR #8156: cmd/snap-bootstrap: subcommand to detect UC chooser trigger [16:54] pedronis: as for naming, snap-boostrap core-chooser-trigger and snapd.core-chooser-trigger.service ? [16:55] mvo, I started a bionic container and installed maas there on my other machine which is running focal, snapfuse is not using cpu there [16:55] pstolowski, done [16:55] pstolowski, thanks for the heads up [16:56] mborzecki: s/core/recovery/ [16:56] pedronis: recovery-chooser-trigger? [16:56] ackk: oh? so on focal things are working for you? [16:56] mvo, it seems so, I wonder what happens if I update my laptop [16:57] mborzecki: yes [16:59] cachio: yw [17:02] ackk: but I can reproduce the "keeps hogging cpu" issue in bionic [17:02] mvo, oh, "good" [17:04] cachio: i think only 19.10 and 20.04 images need to be re-downloaded often [17:05] ackk: but I see a ton of run-supervisorctl restart in my ps output, so the snapfuse might be legit traffice because of all this activity in the container? [17:06] ackk: like literally >200 supervisorctl restart [17:06] ackk: is that expected? also it seems like its not actually restarting :( [17:06] ackk: and the number fluctuates [17:06] ackk: like I had 206 some moments ago, now 211, then 214, 215 [17:07] ackk: now 217 [17:07] #8130 and #8136 could use a 2nd review [17:08] mvo, you mean there are that many processes in parallel running? [17:08] PR #8130: overlord, state: don't abort changes if spawn time before StartOfOperationTime (2/2) [17:08] PR #8136: boot: write current_kernels in bootstate20, makebootable [17:09] ackk: correct [17:09] ackk: I added that to the bugreport too [17:10] mvo, can you check /var/snap/maas/common/log/supervisor-run.log ? [17:11] ackk: I see a lot of waiting/stopped/spawned [17:11] ackk: I can try to scp but i'm at >1600 process right now and I think this will soon OOM [17:11] ackk: I am using 2.7/edge maybe edge was not a good idea :) [17:11] mvo, you might wanna snap stop maas [17:11] master is fixed, right? 
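For reference, the check mvo asked ackk to run earlier (which snapfuse binary is actually serving the squashfs) comes down to resolving each process's exe link inside the container; a quick sketch:
```
# inside the affected container
for pid in $(pidof snapfuse); do
    readlink -f "/proc/$pid/exe"   # expected per the discussion: /usr/bin/snapfuse
done
```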
[17:11] o [17:12] zyga: I have seen green things [17:12] zyga: snapd-failover should be good now with mvo's PR [17:12] mvo, oh, yeah although I don't think there's much difference with stable [17:12] superb news [17:12] can't speak to other things [17:12] thank you everyone who pushed towards fixing that bug :) [17:12] ackk: http://paste.ubuntu.com/p/MkgJwgfhnf/ [17:13] zyga: yeah, master should be green [17:13] zyga: thank ijohnson mostly [17:13] Thank you ijohnson :) [17:13] PR snapd#8164 closed: snap: use the actual staging snap-id for snapd [17:13] ackk: should I attach the log to the bugreport too? [17:13] ackk: I wonder if maybe this is causing this heat/cpu issue [17:14] mvo, if you could grab a tarball of the whole log/ dir it would be great [17:14] mvo, I can take a look at that [17:15] pedronis: cool, pushed the naming update to #8156, it'd be great if mvo, cmatsuoka or ijohnson could take a look [17:15] PR #8156: cmd/snap-bootstrap: subcommand to detect UC chooser trigger [17:15] mborzecki: I'll try to take a look today, it was on my queue of things to look at [17:16] mborzecki: will check asap, will finish a debug run here first [17:16] ackk: sure, I will do and attach to the bug(?). it looks huge though [17:16] ijohnson: cmatsuoka: if there's anything super silly feel free to push a patch, i'll pick it up in the morning [17:17] mvo, yeah if you don't want to attach it just put it somewhere and I can download it [17:17] mborzecki: sounds good [17:19] ackk: the gzip version of this dir is ~1.2Gb [17:19] oh boy [17:20] ackk: yeah, I will upload to some gdrive, not sure if LP will like it if I attach to the bugreport [17:22] heh [17:23] PR snapd#8169 opened: [wip] tests/many: don't use StartLimitInterval, StartLimitBurst anymore [17:25] mvo, thanks, I think you're probably now hitting some bug [17:25] mvo, how did you reproduce the issue btw? just installing in a bionic container? [17:25] ackk: correct. it's outlined in the original bug. I did use a clean 19.10 eon VM with 8g ram [17:25] ackk: then installed snapd and lxd [17:26] ackk: then lxd init with btfs as backend [17:26] mvo, weird, on this focal machine I don't even see snapfuse using CPU while maas init is running [17:26] ackk: then created an "lxc launch images:ubuntu/18.04 container" and "lxc exec bash" and installed maas in there and ran "maas init" [17:27] ackk: that's pretty cool, at least it means this problem will go away :) [17:27] mvo: snap-confine bug? [17:27] ackk: but yeah, frustrating [17:27] zyga: do you have more details? [17:27] mvo, yeah I hope so. I'll try upgrading my laptop to focal, the installer was crashing on me before, have to try a more recent daily [17:27] mvo: no, I'm asking what this is about but now I see this is the snapfuse perf bug [17:27] zyga: aha, yes, that's correct [17:28] zyga: trying to reproduce their issue but no luck and apparently hitting a different bug [17:28] mvo: what did you hit? [17:29] mvo, it might be related, because I also usually see supervisord-managed process restarting a lot, but never seen so many running supervsiorctl calls [17:29] zyga: I don't know but it's some sort of (slow) fork bomb [17:29] ackk: interessting! [17:30] mvo, I thought the restarts would be beacuse of timeouts caused by processes being slowed down by snapfuse [17:32] ackk: could be but my VM is pretty fast, it's an ssd and has enough ram and the cpu load is not that high. 
still a possiblity of course [17:37] ackk: shared my dir with you, please let me know if you need anything else before I stop this vm again [17:39] PR snapd#8170 opened: snap-preseed: support for preseeding of snapd and core18 [17:48] mvo, can you try starting the maas snap again? just to confirm if it starts happening again [17:50] mvo, (downloading logs as we speak) [17:50] PR snapd#8167 closed: o/standby: add SNAPD_STANDBY_WAIT to control standby in development [17:52] pstolowski: reviewed 8003 [17:57] ijohnson: thank you! if it's green then i'd like to land it and address your suggestions in a followup; if it fails then i'll push to this PR [17:57] pstolowski: sounds good [18:07] seems travis is clogged up with jobs :-/ === ijohnson is now known as ijohnson|lunch [18:09] we opened too many new PRs especially considering that master was red [18:10] yeah but this actually happens pretty regularly in the afternoon my TZ, not sure why, but my thinking is that everyone in EU submits things before they log off === ijohnson|lunch is now known as ijohnson [18:44] * ijohnson -> dentist [18:50] check yo teef [18:56] jdstrand: it looks like firefox isn't auto-connecting the pulseaudio plug anymore https://twitter.com/thecalmsprings/status/1230542924489875457 [18:56] love getting bug reports via twitter :) [18:58] kenvandine: that's weird. I grandfathered that [18:58] * jdstrand fixes and investigates [18:58] jdstrand: thanks [18:59] ackk: sure, let me retry this [19:00] ackk: seems to be happening again, I see proxy, ntp, syslog [19:00] ackk: then proxy again, it's now at 15 [19:04] ackk: now at 46 [19:08] PR snapd#8149 closed: snapmgr, backends: maybe restart & security backend options [19:50] * ijohnson -> back [20:24] cachio: do you have a reliable way to reproduce that issue with snapd-failover you had the other morning, i.e. where in the logs snap-failure couldn't start snapd.socket because the socket file already exists? I have a fix for that I'd like to confirm if it works but I have never been able to reliably reproduce that error condition [20:25] PR snapd#8165 closed: cmd/snap-confine: fix everything flagged by coverity [20:38] this PR title is a lie, there are three left that weren't "new" [20:38] oh well [20:44] ijohnson, yes I have [20:44] but I am leaving in 2 mins [20:44] I ll be back in 2 hours [20:45] or just telegram me [20:45] cachio: no problem, it can wait til tomorrow [20:45] I will mention you on the PR I open I think, so if you could comment there tomorrow that would be great [20:46] ijohnson, sure [20:46] when I am back I'll comment [20:47] sorry but I was working with the new arch image [20:48] * cachio afk [20:48] cachio: no problem, ttyl [21:01] debug tpm unlocking inside the initrd is painful [21:07] sounds quite painful [21:07] Hope this works now: Hey snap people. A question regarding an issue (https://github.com/keepassxreboot/keepassxc-browser/issues/439) with the KeepassXC browser plugin (Firefox, Chromium) talking to KeepassXC through NativeMessaging. Afaik this won't work with snap-packaged apps, because of sandboxing / security. [21:07] Ah. That looks better... [21:08] PR snapd#8171 opened: cmd/snap-failure/snapd: also rm snapd.socket if it still exists [21:09] What would be the "correct" solution to connect e.g. a browser plugin to an application ouside of the sandbox? [21:10] raer: it depends, I suggest starting a post on forum.snapcraft.io where more folks will be available to look at your issue [21:11] I'm reluctant to open that box... 
[21:11] raer: AFAIK it's a known problem and oSoMoN has a bug filed somewhere about this issue with the chromium snap specifically [21:12] https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1741074 ? [21:12] (and notable oSoMoN is offline right now) [21:12] Bug #1741074: [snap] chrome-gnome-shell extension fails to detect native host connector [21:12] dat :) [21:12] it seems so [21:13] I'm not familiar enough with the chromium snap to answer myself without more details about how plugins work with chromium and you're sure to find the right folks on the forum so that would be my recommendation [21:13] you could also try asking in EU tz morning time [21:13] It is broken for 2 years now [21:14] A workaround is to install from repo, which is kind of ridiculous. [21:15] Might ask oSoMoN in the morning. Not keen on signing up for yet another forum... [21:16] if you have a LP account I believe that the ubuntu forums at discourse.ubuntu.com are tied to your LP account so you don't have to create another account [21:16] and oSoMoN is around there as well [21:17] ok. thanks. [21:20] ijohnson: did you ever get multipass working on the raspberry pi 4? [21:21] NickZ: no I haven't tried for some time, but I think the required kvm options were added to the kernel now [21:21] NickZ: IIRC there was some other problem with multipass not qemu/kvm related [21:21] yeah, I was lookinga t your issue [21:21] that issue seems resolved now, but im having another one [21:22] I was just wondering if anyone else had gotten this working [21:22] what's the new issue ? [21:23] NickZ: do you know if Pi3 kernel also got those? [21:24] can't say, haven't tried, although I heard that hardware virtualization is only supported on arm64 hosts [21:24] ijohnson: https://github.com/canonical/multipass/issues/1376 [21:26] NickZ: thanks, we'll have a look - it's weird that -machine would be required, maybe the default can be set somewhere [21:26] hmm yeah I can't say I know anything about that but I bet Saviq can be of more help :-) [21:27] yeah, it seems unique for raspi hosts, dunno why [21:39] PR snapd#8157 closed: tests: using google storage when downloading ubuntu cloud images from gce [22:06] PR snapd#8172 opened: snapcraft.yaml: add python3-apt, tzdata as build-deps for the snapd snap [23:19] ijohnson: found the problem [23:27] cmatsuoka: what was the problem? [23:41] ijohnson, hey [23:42] ijohnson, I'll run the test in a loop [23:43] the fix is in the code? or in the tests? [23:44] cachio the fix in the PR I mentioned was in the code for snap-failure [23:44] At least what I think is a fix [23:45] I don't entirely understand the root cause but this should at least let snap-failure get past the issue and still recover [23:45] ijohnson, mmmm, in that case it is not so simple to test [23:45] because I need to create an image with the fix [23:46] is this PR still open? [23:46] Yes one sec let me get you the link [23:46] cachio: https://github.com/snapcore/snapd/pull/8171 [23:46] PR #8171: cmd/snap-failure/snapd: also rm snapd.socket if it still exists [23:48] ijohnson, I'll run it in google [23:48] ijohnson, the other way I can reproduce it is difficul because I need an image [23:48] Great, the tests all passed for that PR in Travis which is a good sign [23:48] cachio it can certainly wait til tomorrow to [23:49] *too [23:49] ijohnson, yes, I'll test it tonight and add a note in the PR [23:51] Awesome thanks [23:53] yaw
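A rough sketch of the recovery sequence discussed around PR #8171, written as shell for clarity; the real change lives in the snap-failure Go code, and the exact units, ordering, and socket paths below are assumptions:
```
# assumed steps snap-failure needs to guarantee before restarting snapd
systemctl stop snapd.socket snapd.service || true
systemctl reset-failed snapd.socket snapd.service || true
# a stale socket file left behind by the crashed snapd can prevent
# snapd.socket from binding again, so remove it first (paths assumed)
rm -f /run/snapd.socket /run/snapd-snap.socket
systemctl start snapd.socket
```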