=== chihchun_afk is now known as chihchun
[05:39] morning
[06:28] PR snapd#5459 opened: cmd/snap: add 'debug paths' command
[06:50] good morning
[06:56] zyga: hey
[07:05] mornings
[07:05] pstolowski: heya
[07:08] hey pstolowski and mborzecki
[07:08] mvo: hey, have you seen https://forum.snapcraft.io/t/need-help-snapd-services-in-ubuntu-18-04/6205/4?
[07:10] mborzecki: good morning - not yet, let me look
[07:12] * zyga settles in for work
[07:13] today looks, finally, like an emerging summer day
[07:13] it's not expected to rain and temperatures will be a little above 20
[07:14] mborzecki: hm, hm
[07:22] mvo: wouldn't that be caused by stuff running with 5minute iteration times?
[07:27] mvo: that's not great :/
[07:27] mvo: is seeding blocking 1st boot?
[07:27] mvo: or perhaps registration blocking 1st boot
[07:28] mborzecki: can you elaborate a bit, not sure I get what you mean. I think we have two problems: a) its so slow - why is that? we don't seed on classic, so the snapd startup is slow. I saw earlier reports about missing entropy in VMs so that might be a clue then we need to figure out why we need random data b) why we block login, i.e.
we should not have wantedby=multi-user.target
[07:28] zyga: aiui this happens on every boot not just first boot
[07:29] zyga: in any case, I think we need to check what target we can use that is not blocking login
[07:29] mvo: if this is on every boot then something is very wrong
[07:29] unless it fails and keeps retrying
[07:30] zyga: yeah, it *might* be entropy but it might also be something else, definitely not reproducible on my (real HW) box but I try a VM next
[07:30] mvo: I'm all-day-vm and I haven't seen that
[07:30] mvo: but I'm in vmware
[07:30] ok
[07:34] mvo: thought that it's some ensure thingy, but i see that doMarkSeeded calls EnsureBefore(0) so it's probably good
[07:36] mborzecki: yes, I added it so that registration starts immediately but nothing waits on registration afaiu
[07:36] (I mean refresh does but shouldn't be visible outside)
[07:36] hm if release.OnClassic && !osutil.FileExists(seedYamlFile) {, is there a seed.yaml in the desktop images?
[07:37] they do seeding in 18.04, no?
[07:37] I mean desktop
[07:37] hmm maybe that gnome-calculator snap
[07:38] still doesn't explain why this happens on each boot :(
[07:38] each is weird
[07:38] apparmor stuff ?
[07:38] do we reload profiles for some strange reasons at each boot?
[07:42] mborzecki: do you know if there is a default systemd target that runs after multi-user?
[07:42] pedronis: on a new kernel we would but otherwise no
[07:42] mvo: iirc graphical requires multi-user
[07:42] mborzecki: graphical.target
[07:43] yup systemctl cat graphical.target
[07:43] Requires=multi-user.target
[07:43] * zyga learned this while installing RHEL 7
[07:43] haha ;)
[07:44] any objections for merging https://github.com/snapcore/snapd/pull/5443
[07:45] PR #5443: interfaces: treat "snapd" snap as type:os
[07:45] mborzecki: I was thinking if we should build a script that collects some information for debugging
[07:46] for people that come and say "foo is broken"
[07:46] we ask snap version, os-release, a few others
[07:52] zyga: a combination of all debug commands probably
[07:56] zyga: 'snap debug lies'
[07:57] Chipaca: snap debug report
[07:57] I already like it
[07:57] but I think it should be a shell script in case "snap debug" is broken
[07:57] hm, askubuntu is interessting
[07:57] mvo: ?
[07:57] one guy reports the problem *and* he did not upgrade snapd but the kernel and a bunch of unrelated packages
[08:01] PR snapd#5443 closed: interfaces: treat "snapd" snap as type:os
[08:05] Chipaca: and you don't know if it's a verb or a noun ;)
[08:10] * zyga debugs interfaces-many
[08:13] public service announcement
[08:13] using MATCH for if ... MATCH; then ... else .... fi is not good, it spams the log with garbage that you don't want to see
[08:21] also running this test makes me want to run and optimize
[08:24] so askubuntu indicates that using the older kernel makes the problem go away and the kernel was released two days ago
[08:24] so that matches - *but* its not clear why I can't reproduce it :(
[08:25] zyga: make 'snap foo' try to run snap-foo if foo isn't an internal command, and ship snap-debug-report?
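Editorial sketch: the 'snap foo' falling back to snap-foo idea floated at [08:25] could look roughly like the Go program below. This is only an illustration under stated assumptions, not snapd code: the `internalCommands` table, the `resolve` and `externalCommandPath` helpers, and the use of `/usr/lib/snapd` as the helper directory (the location suggested in the chat) are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// internalCommands is a hypothetical stand-in for the commands built into
// the main binary; the real list is not given in the log.
var internalCommands = map[string]bool{"install": true, "remove": true, "debug": true}

// helperDir is an assumed location for external helpers.
const helperDir = "/usr/lib/snapd"

// externalCommandPath maps a subcommand name to the helper it would run,
// e.g. "debug-report" -> <dir>/snap-debug-report.
func externalCommandPath(dir, name string) string {
	return filepath.Join(dir, "snap-"+name)
}

// resolve reports whether name must be handled by an external helper and,
// if so, which path that helper would have.
func resolve(dir, name string) (path string, external bool) {
	if internalCommands[name] {
		return "", false
	}
	return externalCommandPath(dir, name), true
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: snap <command> [args...]")
		return
	}
	path, external := resolve(helperDir, os.Args[1])
	if !external {
		fmt.Println("would run internal command:", os.Args[1])
		return
	}
	// Hand the remaining arguments and the standard streams to the helper.
	cmd := exec.Command(path, os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "cannot run", path+":", err)
		os.Exit(1)
	}
}
```

Because the helper is looked up by path, it can be a plain shell script, which matches the wish at [07:57] that the report tool keep working even if "snap debug" itself is broken.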
[08:25] ooooh
[08:25] that's sweet
[08:25] actually /usr/lib/snapd/snap-foo maybe
[08:25] ye
[08:26] that would be nice, too bad golang cannot do small executables
[08:26] * zyga applied one simple apparmor optimization and goes to look for breakfast
[08:26] zyga: i thought you said it'd be a script?
[08:26] yes
[08:26] but in general
[08:26] this one would be a script, yes
[08:26] zyga: second breakfast i hope
[08:27] um
[08:27] nope
[08:28] I tend to shift my hours late
[08:28] And I wake up later as well
[08:29] Well apart from today at 5AM because $SUNSHINE
[08:35] * Chipaca off to take one of the boys to the doc
[08:35] bbiab
[08:40] simple review https://github.com/snapcore/snapd/pull/5459 anyone?
[08:40] PR #5459: cmd/snap: add 'debug paths' command
[08:43] ohh, nice
[08:47] Job for systemd-journald.service failed. See "systemctl status systemd-journald.service" and "journalctl -xe" for details.
[08:47] pedronis: I saw that on travis yesterday
[08:47] fun
[08:47] weird-ish but I think not new
[08:47] very unlikely combination of journal ops
[08:48] (racy)
[08:48] we probably don't wait for it to stop and sync
[08:48] and restart fails because it's running
[08:48] or something
[08:51] xnox: hey, good morning! a recent kernel update caused some issues with random numbers and that caused snapd to start super slow (because it now waits for entropy). what is the best way to a) ensure snapd starts b) but is not in the way of the login screen? do we need a different target (wantedby) than multi-user.target?
=== chihchun is now known as chihchun_afk
[09:11] tests are taking longer, I suspect they will pass now
[09:11] which is good :)
[09:13] Mornings
[09:13] hey
[09:14] mvo: I've merged the spread PR, with a small tweak so closing the old session is done right before assigning the new one, similar to what we have before.. that avoids multi-closing and leaving a closed session assigned..
please let me know if the old behavior was intended somehow
[09:15] mvo: Travis is also updated with the new logic
[09:15] niemeyer: sounds correct, thank you
=== chihchun_afk is now known as chihchun
[09:24] mvo: Replied on https://forum.snapcraft.io/t/how-to-specific-the-kernel-snap-on-core18/5947/5 .. please let me know if we need anything else about this
[09:25] mborzecki: mvo: this is happening quite a lot: Job for systemd-journald.service failed. See "systemctl status systemd-journald.service" and "journalctl -xe" for details.
[09:25] I got two test runs in a row hitting that
[09:25] niemeyer: thanks, I think this is perfect
[09:25] pedronis: and on ubuntu-core only iirc
[09:26] seems so, but on random tests
[09:26] anyway is part of prepare
[09:27] pedronis, mborzecki thanks, sounds like we need add debug code there. side-note: tests seems to be more unstable lately again, kind of annoying
[09:27] pedronis: i saw one more with `find ..../state-lib/*`, seemed like a glob gone wrong
[09:27] * mvo meanwhile goes into a systemd fistfight
[09:27] https://paste.ubuntu.com/p/Cp3SdyRyxj/
[09:27] pedronis: ^^
[09:28] uh, google:ubuntu-16.04-64:tests/main/interfaces-contacts-service failure again
[09:29] pstolowski: same as before?
[09:30] mborzecki: i don't remember the previous failure. relevant bit is https://pastebin.ubuntu.com/p/p6vTVH5nxx/
[09:31] pstolowski: nope, this one occurred randomly before (rarely though) and iirc i tracked it back to libevolution-blahblah
[09:31] mborzecki: ack, not good.. thanks
[09:45] Chipaca: hey
[09:45] question about the warnings
[09:45] zyga: 'sup
[09:46] zyga: sure
[09:46] should we seek integration with desktop notification systems?
[09:46] (where appropriate)
[09:46] imagine we never open the CLI
[09:46] and never see any warnings there
[09:46] zyga: I'd expect each client to keep track of warnings in its own way
[09:46] aha
[09:46] so gnome-software
[09:47] interesting
[09:47] that's sane, yes
[09:47] zyga: e.g.
gnome-software would keep a 'last warning seen' timestamp around, separate from snapd's
[09:47] zyga: even the acking mechanism could be different
[09:47] zyga: for example, gnome software might take 'ownership' of the warnings itself
[09:47] could a warning be generated asynchronously by snapd itself
[09:47] zyga: whereas snap does not
[09:48] zyga: did not follow
[09:48] zyga: all warnings are generated by snapd itself
[09:48] you are on an idle desktop, snapd fires a warning, a desktop notification shows up, you click on it and go to gnome-software showing the details there
[09:48] Chipaca: (without user action triggering it)
[09:48] zyga: that'd be gnome-software's integration work if so
[09:48] also that'd probably only happen after we got notifications working from snapd
[09:48] that is
[09:49] gnome-software have asked for a no-polling way of getting stuff
[09:49] (so if you install something from snap, and gnome software has a listing, it can update the listing for example)
[09:49] or if it's showing a listing and a snap refreshes, it can update it
[09:49] anyway, i need to step away again
[09:50] zyga: but, this thing should support all that (modulo the no-poll thing)
[10:13] mvo, it sounds like you want to fix the kernel, no?
[10:13] mvo, do you have the entropy bug? is snapd consuming entropy? (e.g. we had a bug where generating a few uuids could stall things)
[10:14] mvo, if a hardware number generator is available.... is it in use?
[10:15] mvo, and we do need to ensure that snapd starts when it does =/ because of cloud-init, etc.
[10:16] mvo, also, there were some kernels rolled back.
[10:34] xnox: well, I follow #stable-kernel and have not seen anything rolled back yet. as for fixing> yes. however I wonder if we can mitigate this somehow by ensuring that snapd is not blocking login
[10:34] xnox: I think its an entropy problem, we use random numbers for various things, I need to dig where this exactly hangs though
[10:35] mvo, that's conflicting goals.
as then you will break all the public clouds that preseed snaps and expect that sdk/agents are there, and working upon login.
[10:35] xnox: hm, hm. fair point
[10:35] honestly I think that is a special case
[10:36] and it should not be default
[10:37] well, we cannot really change it
[10:37] we have the relationship setup that way since a while
[10:39] I need 2nd review on https://github.com/snapcore/snapd/pull/5457
[10:39] PR #5457: many: lessen the use of core-support
[10:39] mvo: also I'm not quite sure what we need entropy for after the first boot
[10:41] pedronis: yeah, I'm checking if I can find anything
[10:55] pedronis: catalogRefresh needs quite a bit of getrandom() data - that one is easy to delay. however we also have something that uses 4 bytes of getrandom() before main() which I'm looking at right now. aiui the problem is that the urandom entropy now only becomes non-blocking once a certain amount of real entropy is available to seed the prng (but I might be wrong here). so even those 4 bytes are problem blocking the boot
[11:07] mvo: are all our timer randomization things using real randomness?
[11:09] zyga: getrandom is pseudo random unless a flag is specified. but it looks like the bug is that urandom now is blocking until its initialized with real entropy (my working theory so far)
[11:09] hmmm, I strongly doubt that is the case
[11:09] (urandom blocking ever)
[11:12] PR snapd#5460 opened: tests: use grep to avoid non-matching messages from MATCH
[11:13] PR snapd#5461 opened: tests: "snap connect" is idempotent so just connect
[11:13] zyga: well
[11:13] zyga: " If the urandom source has not yet been initialized, then getrandom()
[11:13] will block, unless GRND_NONBLOCK is specified in flags.
[11:13] "
[11:13] zyga: from the man-page
[11:14] !
[11:14] that's very interesting then
[11:15] so urandom is not really that reliable after all
[11:15] zyga: it looks like it, I'm digging.
we are not the only ones affected I'm trying to figure out a fix for the common case
[11:16] zyga: seeding will be hard though, here we need urandom
[11:16] mvo: can we pollinate from snapd?
[11:16] zyga: but at least we should not block in the already-seeded case
[11:16] mvo: if urandom blocks then we do what pollinate does
[11:16] zyga: an interesting idea! the kernel team did investigate pollinate and they suspect it might be buggy though
[11:16] fun :)
[11:16] I'll go to core18 topics for now
[11:16] zyga: I did not follow that
[11:16] good luck
[11:17] ta
[11:20] PR snapd#5403 closed: many: use extra "releases" information on store "revision-not-found" errors to produce better errors
[11:23] hmmpf, working on some changes in snapstate, broke aliases not even touching that code :/
[11:26] mborzecki: aliases are delicate
[11:27] mborzecki: let me know if I can help
[11:38] * Chipaca ~> lunch
[11:48] * zyga tests a switch over to internal LTE
[12:00] mvo: why does catalogUpdate delays start up? it's a goroutine, no?
[12:02] pedronis: because it accesses getrandom
[12:02] and it blocks everything?
[12:03] sorry, I'm dense but still not understanding
=== alan_g is now known as alan_g|lunch
[12:03] pedronis: sorry, let me give a bit more context
[12:03] pedronis: the latest kernel update make reading from urandom block at early boot
[12:03] pedronis: until it has a certain amount of entropy
[12:04] pedronis: that seems to be a regression and a bug but its not totally clear yet, the kernel team is researching this, it might be a valid security fix
[12:04] I understand
[12:04] pedronis: I looked into why we need urandom at startup of snapd
[12:04] but I understand why getrandom from some init code would block stuff
[12:04] I don't understand why getrandom from a goroutine not in the main daemon start path
[12:04] does
[12:05] we do systemdSdNotify READY from daemon.Start
[12:05] pedronis: I need to look, maybe the catalog-update is not the problem.
there is another reader of getrandom() (bson.go) which happens during "func init()" time
[12:06] pedronis: I just wrote it in the forum, its two places right now
[12:06] yes, I understand
[12:06] I see the problem with init
[12:06] not sure I understand the other one
[12:07] unless go is starved somehow of threads for os calls
[12:10] hmm, wasnt the purpose of urandom (vs random) to be non-blocking ?
[12:11] (sounds like a kernel bug all over, why would someone change that)
=== pstolowski is now known as pstolowski|lunch
[12:14] pedronis: if the theory is correct it does not even enter main() because the init code in bson.go reads urandom - or am I misunderstanding you maybe :) ?
[12:14] mvo: yea, then why would catalog-update matter?
[12:14] ogra_: yeah, I'm simplifying here a bit but the getrandom syscall man page explains that it may block if the prng is not initialized yet
[12:15] pedronis: maybe/probably it does not
[12:15] pedronis: sorry, I was just hunting for the sources of what reads getrandom
[12:15] pedronis: I don't run into the bug myself so I can't test my theory :/ but I will add code that does
[12:15] (that does test my theory)
[12:18] mvo, well, but thats new behaviour and getrandom can be switched back to the old one if GRND_NONBLOCK is set as i understand
[12:23] pedronis: you are correct, catalog-update does not matter -
[12:23] pedronis: sorry for that
[12:23] ogra_: correct, GRND_NONBLOCK is not used by the golang runtime though and we can not control that
[12:24] ah
[12:24] evil ...
[12:27] ogra_: yes
[12:28] pedronis: I updated the forum message - its just bson.go it seems that we would have to fix to workaround the issue
[12:29] pedronis: anyway lets talk in the standup
=== alan_g|lunch is now known as alan_g
=== pstolowski|lunch is now known as pstolowski
[13:01] * Chipaca on his way
[13:25] mvo: is #1779948 the same issue again?
[13:25] Bug #1779948: Snapd gets stuck when starting Ubuntu.
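Editorial sketch: the working theory in the discussion above is that a read of random bytes during "func init()" can block before main() is ever entered. One common way to avoid that class of problem in Go is to defer the read until first use with `sync.Once`. The program below illustrates that pattern only; it is an assumption for illustration, not the change made in bson or snapd, and the names `processSeed`, `seedOnce`, `seed` and `seedErr` are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

var (
	seedOnce sync.Once
	seed     [4]byte
	seedErr  error
)

// processSeed returns 4 random bytes. The bytes are read the first time the
// function is called rather than in init(), so a slow or blocking random
// source cannot delay program start-up before main() runs.
func processSeed() ([4]byte, error) {
	seedOnce.Do(func() {
		_, seedErr = rand.Read(seed[:])
	})
	return seed, seedErr
}

func main() {
	// Start-up work that does not need randomness could run here first.
	s, err := processSeed()
	if err != nil {
		fmt.Println("cannot read random bytes:", err)
		return
	}
	fmt.Println("seed:", hex.EncodeToString(s[:]))
}
```

The first caller can still block if the kernel's random source is not ready; the sketch only moves that wait out of package initialisation and into the code path that actually needs the bytes.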
[13:28] Chipaca: probably, let me look
[13:32] ha, this is fun
[13:32] while systemctl restart systemd-journald; do :; done
[13:32] this fails nearly instantly
[13:32] * zyga investigates
[13:56] pstolowski: you reminded me of https://www.facebook.com/jesse.newton.37/posts/776177951574 (viewable in incognito if you don't use the book of faces)
[14:04] pstolowski: hahah https://github.com/snapcore/snapd/pull/5416/files#diff-9c91792e9bb71d29a3ae728fc544152fR48
[14:04] * zyga goes to make some coffee
[14:04] PR #5416: interfaces/hotplug: add hotplug Specification and HotplugDeviceInfo
[14:04] Chipaca: I know that story, it's terrible
[14:05] Chipaca: lovely, rotfl :). and btw, ircloud kindly gave me entire content inline, they do that apparently for facebook too
[14:06] pstolowski: irc cloud is nice, eh?
[14:06] are you using the snap or the web browser to use it?
[14:06] mborzecki: yeah i do that when i run out of foo & bar vocabulary ;)
[14:06] my only issue with it is lack (apparent) of any way to set the font I want
[14:07] zyga: i didn't know of a snap; i just use it in the browser. yes it's nice, i haven't looked back since i started my subscription a few months back
[14:11] Chipaca: found the problem with aliases :( magic name handling in fakeSnappyBackend.ReadInfo
[14:16] mborzecki: that's used for everything though, is not just aliases
[14:16] we fake various snaps there
[14:19] mborzecki: I'm still grinning evilly, here
[14:23] pedronis: yeah, i just missed a little `if strings.Contains(snapName, "alias-snap") {` which changes the name :(
[14:25] funny that it worked until it hit reenabling of manual aliases which looks in info.Apps[], which was obviously an empty map
[14:42] anyone up for a simple review of #5459?
[14:42] PR #5459: cmd/snap: add 'debug paths' command
[14:45] PR snapd#5462 opened: many: use extra "releases" information on store "revision-not-found" errors to produce better errors (2.34)
[15:09] pedronis: i've resolved the conflicts in #5452 and force pushed
[15:09] PR #5452: store, overlord/snapstate: introduce instance name in store APIs
[15:10] mborzecki: thx, I will look tomorrow morning I think
[15:10] pedronis: great, thanks!
[15:12] PR snapd#5463 opened: Optimize snap install time 1
[15:25] PR snapd#5464 opened: vendor: switch to latest bson
[15:30] *ahem*
[15:30] * Chipaca grins
[15:30] PR snapd#5465 opened: daemon, overlord/state: warnings pipeline
=== chihchun is now known as chihchun_afk
[15:44] * cachio lunch
=== pstolowski is now known as pstolowski|afk
[16:11] * ogra_ hugs popey
[16:11] \o/
[16:11] (for also doing an armhf build of xonotic !)
[16:11] lulz
[16:12] bet that doesn't work
[16:12] we're dumping their pre-built binaries, not building from source
[16:13] dont lauch, my next objective is kiosk systems so after i have a proper chromium kiosk image for the pi (which might still take a lot of work, it is definitely not accelerated atm) i'll move on to kodi and xonotic ;)
[16:13] *laugh
[16:13] i dont see why it wouldnt work ...
[16:14] anyway ... as a xonotic junkie i'm happy to see we ha a snap now
[16:14] *have
[16:15] i see threads suggesting it could work
[16:15] the snap is smaller than the upstream zip too :)
[16:15] ha !
[16:16] (and stays compressed of course, double bonus)
[16:17] yeah
[16:17] the littel ram might be an issue while gaming
[16:17] *little
[16:18] though if it fully utilizes the GPU it should work
[16:22] zyga, do you have a dragonboard with you?
[16:23] I can reproduce the error of MATCH
[16:23] mvo, do you have one?
[16:29] cachio: no, I don't :-(
[16:29] cachio: well, I do back in my offie
[16:29] office
[16:29] I could get it online shortly but not instantly
[16:29] cachio: can you tell me more about the match issue?
[16:30] zyga, the test interfaces-bluetooth-control is failing on dragonboard
[16:31] it fails when it does MATCH "Permission denied" < btusb.error
[16:31] but when I debug it, the file contains the string
[16:31] then, if I change the match by a grep it works
[16:32] zyga, it is super weird
[16:32] I'll run with -vv to see the deatuls
[16:32] details
[16:33] cachio: that's the same as the "^test:" string we've seen elsewhere
[16:33] I don't think it's specific to any hardware
[16:34] do we get the wrong MATCH definition in some context?
[16:34] zyga, the test just works on dragonboard
[16:34] it's starting to be too strange
[16:34] cachio: can we add a trace to what MATCH does
[16:34] cachio: yes but the user check is universal
[16:34] pedronis: I doubt that, it is defined by spread
[16:35] pedronis: definitely we don't have one that's nearly the same but inverts one bit of logic while keeping everything else
[16:35] zyga: well, I see tests/lib/spread-funcs
[16:35] .sh
[16:35] look inside
[16:35] that is used sometimes
[16:35] the only definition is that from spread
[16:35] are we getting confused by that
[16:35] and this is not new, we had that for months
[16:36] maybe something changed
[16:36] it'd be interesting to see if we can run tests from last month and hit this
[16:38] in the debug session I defined MATCH as spread does
[16:38] and I run the same line which is failing during the test and it works
[16:41] zyga, pedronis, I'll make a change to spread to add some debug info
[16:41] cachio: maybe just set -x
[16:41] we can also try to do declare -pf MATCH somewhere close to where we get those errors
[16:41] hm
[16:41] or redirect to a file
[16:41] pedronis: good idea
[16:42] maybe in that prepare logic
[16:42] that seems to be hit very often
[16:42] we
can also re-define MATCH just ahead of that
[16:42] though I think it must be something that is racing in the system
[16:42] in a way that we don't understand
[16:42] that impacts MATCH
[16:43] but I cannot put my finger on anything that could do something like this
[16:43] one thing to, perhaps, think about
[16:43] is a very obscure mechanism in bash (and maybe dash)
[16:43] that "inherits" function definitions
[16:43] from one shell to another
[16:43] but this still doesn't explain why it is racy
[16:43] especially when invoked with a file
[16:43] we could also try to copy /etc/passwd to /tmp/INSANE
[16:44] and MATCH that to isolate from anything writing to passwd
[16:44] some ideas to explore
[16:51] zyga, ok, working on that
[16:51] let's see what's going on
[16:52] thank you!
[17:24] PR snapd#5466 opened: tests: remove extra ' which breaks interfaces-bluetooth-control test
[17:26] sigh
[17:41] cachio: sorry, I don't have a dragonboard
[17:54] mvo, np
[17:54] Chipaca: are we there yet ;-) https://twitter.com/c___f___b/status/1014529179742810112
[18:33] * zyga break
[18:58] PR snapd#5467 opened: tests: stop restarting journald service on prepare
[19:22] PR snapd#5461 closed: tests: "snap connect" is idempotent so just connect
[19:32] PR core#92 closed: Remove core-support plug
[19:40] * cachio afk
[20:09] PR snapd#5279 closed: interfaces/builtin: create can-bus interface