[05:56] good morning
[06:06] zyga: morning
[06:06] hey :)
[06:33] mvo: hey
[06:34] hey mborzecki
[06:45] hey mvo
[06:45] quick comment, do you remember the remodel kernel work? there's a test that reliably fails in osc build, there's a trello card for it
[06:45] I wanted to look now but if you know more I'd love to get help
[06:46] zyga: can you point me to the log?
[06:46] zyga: the remodel kernel test takes a while iirc, we had to increase the settle timeout on arm
[06:46] zyga: is it maybe that?
[06:46] that's the same test
[06:46] but it failed on my xeon ;D
[06:48] also with a timeout?
[06:48] it's hard to say, there's very little useful stuff when settle fails
[06:48] I complained that it should not be time-based at all but this is not a simple problem to solve
[06:49] mvo: https://trello.com/c/foU3iOrs/321-investigate-testremodelswitchtodifferentkernel-failure
[06:52] mvo: the whole output is
[06:52] https://www.irccloud.com/pastebin/8NhFWypY/
[06:52] zyga: this happens on your opensuse as well?
[06:52] no, it only happens in 'osc build'
[06:52] though my master is "after" 2.42
[06:52] zyga: is it easy to submit test builds? I could add something that prints more debug
[06:52] mvo: it's not remote
[06:52] it's local
[06:53] mvo: it's like dpkg-buildpackage
[06:53] mvo: anything extra can be added, just make a patch against 2.42
[06:53] mvo: if you push a branch I can extract the patch and run it easily
[06:53] mvo: I would like to make some general changes to the tests later on
[06:53] though I doubt this will be fast
[06:54] 1) log all handlers that executed during failed test
[06:54] 2) change settle in tests to just do more stuff as long as handlers want to run, remove the timeout entirely
[06:54] 3) patch tests that measure failure to introduce callback to break further loops
[06:55] current time-based code is simply not reliable and not engineered correctly as tests IMO
=== pstolowski|afk is now known as pstolowski
[07:03] mornings
[07:22] pstolowski: hey :)
[07:22] man, I'm so sleepy today
[07:23] the rain, the low pressure
[07:23] coffee, sorry
[07:24] hey pstolowski - good morning!
[07:24] mvo: sorry if I came across cranky, I just think we need to acknowledge shortcomings and not just paper over them with "this is how it's been done"
[07:25] zyga: not cranky at all
[07:26] zyga: let me just finish this one thing here and then I can do something, probably something like for _, t := range chg.Tasks() { fmt.Println(t.Kind(), t.Status()) }
[07:26] yeah
[07:29] flashing the card again, last night I had issues and somehow I could not get the card to copy from macos
[07:29] (new sandbox for all apps)
[07:29] mvo: macos catalina switched to ... read only image as root filesystem
[07:29] mvo: yeah :)
[07:29] mvo: ubuntu personal
[07:30] mvo: /Users (/home for macos) and /Applications are writable
[07:30] rest is not
[07:30] and they live on a different apfs dataset (like zfs thing)
[07:38] zyga: updated #7571
[07:38] PR #7571: sandbox/cgroup: refactor process cgroup helper to support v2 and named hierarchies
[07:38] checking
[07:39] zyga: is there a place like /var/log on osx where system logs land? wonder how it's combined with the read only filesystem image
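The debug loop mvo sketches at 07:26 is the core of it: when settle times out, dump the change's tasks so the failing test shows what was still queued. A minimal sketch of such a helper, assuming it is dropped into an overlord/managers test where chg is a *state.Change and the state lock is already held (the helper name is made up):

```go
import (
	"fmt"

	"github.com/snapcore/snapd/overlord/state"
)

// dumpChangeTasks is a hypothetical debugging helper, not part of snapd:
// it prints each task of a change together with its status and log, so a
// settle timeout at least shows what was still in flight.
func dumpChangeTasks(chg *state.Change) {
	for _, t := range chg.Tasks() {
		fmt.Println(t.Kind(), t.Status())
		for _, line := range t.Log() {
			fmt.Println("   ", line)
		}
	}
}
```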
[07:39] mborzecki: actually macos has /home /var /usr and all that
[07:39] they are just hidden
[07:39] this is the current mount table
[07:39] https://www.irccloud.com/pastebin/r4LiHBk7/
[07:40] mborzecki: and yeah, there is /var/log
[07:41] mborzecki: this is also interesting
[07:42] https://www.irccloud.com/pastebin/fRUAdcOV/ls%20-la%20%2F%20
[07:42] note that /home is /System/Volumes/Data/home
[07:42] where Data is the new writable part of apfs
[07:44] mborzecki: the new PR looks nice
[07:46] mborzecki: approved
[07:49] zyga: thx
[07:50] mvo: hi, I made this comment: https://github.com/snapcore/snapd/pull/7538#issuecomment-539882778 let me know if you prefer me to go over it line by line
[07:50] PR #7538: tests: use `snap model` instead of `snap known model` in tests
[07:55] PR snapd#7575 closed: cmd/model: add authority-id to verbose fields
[08:00] pedronis: thanks, looking now
[08:02] pedronis: no need to go over it line by line, thanks, I will make the changes
[08:08] zyga: I created github.com/mvo5/snappy:debug-remodel-kernel-zyga with some extra debug (quite small right now). please run it when you get a chance and let me know what it outputs
[08:08] mvo: thank you
[08:10] mvo: I've rebased on 2.42, trying now
[08:13] mvo: new log https://www.irccloud.com/pastebin/EUXdHkdl/
[08:15] zyga: aha, looks like something in the mock restart is not working
[08:21] mvo: I'm logged into the image from rogpeppe
[08:21] refreshing snaps now
[08:21] we'll know in a second if it crashes on reboot
[08:22] zyga: sometimes it doesn't :)
[08:22] rogpeppe: fingers crossed
[08:22] I have serial output
[08:22] so we'll know what is attempted at least
[08:22] zyga: the other thing is that it surely shouldn't be rebooting very often
[08:22] and I should send the raspi-tool to snapd into some contrib/ directory
[08:23] zyga: how often would you expect? is it usual for it to try to reboot once or twice a day?
[08:23] zyga: will look into this a little bit
[08:23] rogpeppe: not sure, one sec
[08:23] Chipaca: from that device
[08:24] rogpeppe: the only valid reason to reboot that often is if it tracks edge for core/kernel; everything else is most likely a bug
[08:24] Chipaca: system-shutdown messages https://www.irccloud.com/pastebin/sSWsFr69/
[08:24] rogpeppe: thank you so much for helping us track this down!
[08:24] zyga: thank you, super curious about this
[08:24] mvo: i hope you do! :)
[08:24] rogpeppe: it updated to 2.42 correctly
[08:24] zyga: those messages are fine
[08:24] ok
[08:24] oh
[08:24] it reboots instantly?
[08:24] what?
[08:25] I snap refreshed
[08:25] it rebooted
[08:25] then it reboots again
[08:25] straight away
[08:26] mvo: suggestion for improvement
[08:26] mvo: uboot script should log basics
[08:26] like booting that kernel and this base
[08:26] and in this mode
[08:26] would be great to see
[08:27] mvo: snapd change after update
[08:27] it seems that 2nd reboot was the only one
[08:27] snap change on successful 2nd reboot https://www.irccloud.com/pastebin/4X2z8O6R/
[08:27] I'll reboot manually to see if it boots ok
[08:29] rogpeppe: sadly, it works :/
[08:32] mborzecki: question in #7443
[08:32] PR #7443: timeutil: fix schedules with ambiguous nth weekday spans
[08:32] rogpeppe: I'll leave it running
[08:32] zyga: one thing you could do is just leave it running
[08:32] zyga: lol
[08:32] rogpeppe: to see how it ages for a few days
[08:32] :D
[08:33] zyga: yup. i guess it might be that we've got two pi's both with buggy h/w
[08:33] rogpeppe: could it be the SD card?
[08:33] could it be that the additional hardware somehow destabilises the Pis?
[08:34] zyga: i used a different SD card too
[08:34] interesting
[08:34] how is the hardware connected?
[08:34] zyga: and it's a decent quality SD card, not much used.
[08:34] zyga: connected to the network?
[08:34] no, any custom stuff added to your pi?
[08:34] not network
[08:35] zyga: nothing custom added
[08:35] there goes that hypothesis then :)
[08:36] zyga: there was an RTC and a pi-glow on the old pi, but not on this one
[08:36] I can send you my pi if you want to swap :)
[08:37] zyga: that's an interesting thought. the other thing i might try (sorry) is to use a different OS.
[08:37] that would be an interesting data point as well
[08:52] Chipaca: hey, health-check task is not executed on install if flags&skipConfigure != 0 (because we return early and don't schedule the config hook - see the bottom of doInstall); intended or a bug?
[08:54] pedronis: iirc it's not strictly needed, though it kind of highlights that the thing happens every day
[08:55] mborzecki: highlights or confuses people like me :)
[08:55] pedronis: hah ok, i can drop it
[08:55] mborzecki: one possible reading is that it triggers every day of the month
[08:55] anyway with /1
[08:55] the "spec" is not great
[08:56] duh, it is not :/
[09:01] mborzecki: if it's not strictly needed I would drop it, less mental burden to read it. The spec says "Values may be suffixed with "/" and a repetition value, which indicates that the value itself and the value plus all multiples of the repetition value are matched. Two values separated by ".." may be used to indicate a range of values; ranges may also be followed with "/" and a repetition value." of which I'm not sure how to
[09:01] interpret in practice
[09:02] i'm looking at the spec, and we may have a bug there, at least on older systemd versions
[09:02] there's no .. range on xenial :/
[09:03] but our snap with timers does not fail, so the bug is only for numbered ranges
[09:04] another conclusion is that since we had no bug reports or complaints about snap install failing, people do not use it, at least in snap.yaml
[09:06] mborzecki: we'll need to update this, right? https://forum.snapcraft.io/t/timer-string-format/6562
[09:06] yes
[09:07] also before you could do mon1-fri2, right? now you need something like mon1-sun,mon2-fri ?
[09:11] yes, mon1-fri or mon-fri2 (or mon2-fri)
[09:12] so at most the range can describe 8 days, eg mon1-mon
[09:12] ok, not sure I'm worried, also the other one does not quite do the same thing
[09:12] just making sure I understand
[09:14] zyga: I updated the kernel-remodel debug PR, will probably produce a lot of output now, could you please run it?
[09:14] sure
[09:14] one sec
[09:15] mvo: try to rebase on 2.42 next time
[09:16] zyga: mvo: are you looking into TestRemodelSwitchToDifferentKernel ?
[09:17] mvo: building now
[09:17] pedronis: yeah, it seems to fail for zyga in obs
[09:17] pedronis: mvo is, really; I'm just the test runner
[09:17] mvo: in osc, not obs (I didn't push yet)
[09:17] pedronis: it's a bit strange, I cannot reproduce it and I'm a bit lost
[09:17] there's a card, I will move it and put you on it
[09:17] osc -> dpkg-buildpackage, obs -> that thing in the cloud
[09:17] mvo: new log coming up
[09:17] zyga: thanks, is there an ocd as well?
[09:18] ocd? :D
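As an aside on the mon1/mon2 notation in the timer discussion above: the "nth weekday" part is just integer division on the day of the month. A standalone illustration in plain Go (this is not snapd's timeutil code, which does the real parsing and range handling):

```go
package main

import (
	"fmt"
	"time"
)

// nth reports which occurrence of its weekday a date is within its month:
// 1 for the first Monday, 2 for the second Monday, and so on.
func nth(t time.Time) int {
	return (t.Day()-1)/7 + 1
}

func main() {
	d := time.Date(2019, time.October, 9, 0, 0, 0, 0, time.UTC)
	// prints: 2019-10-09 is Wednesday #2, i.e. "wed2" in the timer notation
	fmt.Printf("%s is %s #%d\n", d.Format("2006-01-02"), d.Weekday(), nth(d))
}
```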
[09:18] zyga: just kidding
[09:18] hahaha
[09:18] I was suspecting some :D
[09:18] zyga: we also have obi here
[09:18] zyga: SCNR
[09:18] same here
[09:18] very useful
[09:18] SCNR?
[09:18] zyga: sorry-could-not-resist
[09:19] zyga: these short acronyms are just too hard for me to remember :)
[09:19] zyga: anyway, hope it gives me clues but I have *no* idea right now
[09:20] I can look as well at some point if it remains a mystery, not quite right now
[09:21] though
[09:21] mvo: the log is large and got truncated from my screen, give me a sec to grab it from disk
[09:21] zyga: only the bits before the test failure are of interest
[09:22] zyga: you could also modify the build to just run the remodel test during the build
[09:22] the build takes a second, it's just the timeout that we are waiting for
[09:22] pstolowski: in practice we use SkipConfigure only in firstboot code, doesn't really answer either way
[09:22] ok, collecting now
[09:23] mborzecki: 7572 has an arch specific failure in the spread logs that I have not seen before, could you please have a look before I restart the build?
[09:23] mborzecki: (not urgent :)
[09:24] pstolowski: bug
[09:25] Chipaca: pstolowski: notice that in practice we don't want that behavior; it is right now immaterial, we don't want to run health-checks before at least the essential snaps are installed, so what we do for configure there would apply also for health-checks
[09:26] so in practice it is a feature, just obscurely expressed
[09:26] pedronis: true
[09:26] mvo, zyga, ondra showed me a bootchart yesterday and we noticed that there are a gazillion unsquashfs calls during seeding ... why is that (i.e. if we extract metadata or some such, why dont we mount/umount the squashfs and cp the file, which is a ton faster and less CPU hungry)
[09:26] ogra: we unsquash to look at info AFAIR
[09:27] right, that was my guess
[09:27] ogra: but yeah, I agree
[09:27] pstolowski: basically renaming the flag to SkipConfigureAndHealthCheck would also work
[09:29] ogra: at one point we determined it to be cheaper to unsquash it to extract just one file, than to mount it
[09:30] we also do some checks before we consider a snap safe to mount but I don't remember the specifics there for firstboot
[09:30] we could save on one op entirely by mounting to /tmp/$RANDOM and then moving the mount over to the final location
[09:30] mvo: IMO our checks are not in the "safe to mount" category
[09:30] mvo: it's valid vs invalid but not safe
[09:31] either way, we will do nothing without measuring
[09:31] also it's a delicate area under churn
[09:31] so also nothing very soon
[09:32] ogra: I recommend creating a forum topic with some of those logs, and we'll see
[09:32] Chipaca, well, on first boot of a low end system it significantly eats CPU
[09:32] mvo: it's still running, it's odd - I had a look at the log file it's running under now and it keeps printing open /dev/null and open /bin/false
[09:32] ogra: yes. I sent an email about this three years ago.
[09:32] ogra: I profiled it
[09:34] mvo: the log is crashing pastebin :D
[09:34] one sec
[09:34] zyga: heh, I just need the last few lines
[09:35] zyga: well, I need to see why it's not converging, i.e. what it tries to run over and over again
[09:35] [ 618s] trying to run: runMgrForever Doing 1... []
[09:35] [ 618s] trying to run: runMgrForever Doing 1... []
[09:35] [ 618s] trying to run: runMgr1 Do 1... []
[09:35] [ 618s] trying to run: unknown Do ... []
[09:36] runMgrForever is in overlord_test.go ?
[09:36] no, managers_test.go
[09:36] there's a long sequence of those
[09:36] but earlier there's also this
[09:36] do we have a test that doesn't clean up?
[09:36] https://www.irccloud.com/pastebin/8PTTypwd/
[09:36] anyway it seems unrelated
[09:37] then there's a long wall of
[09:37] zyga: how big is the log? maybe you can put it into google drive?
[09:37] sure, one sec
[09:38] zyga: something along the lines of "] Done link-snap Make snap "brand-kernel" (2) available to the system [2019-10-09T08:11:58Z INFO Requested system restart.]" is what I'm hoping for
[09:38] gedit hangs for a while while pasting
[09:38] 2019
[09:38] ogra: mail subject 'booting the pi', 26 Jan 2017
[09:38] ogra: fwiw
[09:39] mvo: 32M
[09:39] mvo: uploading now
[09:40] zyga: woah
[09:41] Chipaca, hmm, that talks about sha3 and keccak
[09:41] ogra: yes
[09:41] ogra: having profiled, those were the slow things
[09:42] ogra: that's what we mean by 'we will do nothing without measuring': measure first, determine what is slow, see how to improve it
[09:42] ogra: optimising away the calls to unsquashfs will do nothing if those calls are not the thing slowing down first boot
[09:43] of course, it's entirely possible that those calls are now slow
[09:43] but after having received exactly 0 responses to my original mail I'm not going to waste time on this again
[09:43] Chipaca, well, they eat CPU via userspace while mount/umount via the kernel dont
[09:44] the outcome of your mail was in the end that ijohnson developed assembly to improve sha3 (even though that has never been merged or is in use anywhere still)
[09:45] PR snapd#7538 closed: tests: use `snap model` instead of `snap known model` in tests
[09:48] PR snapd#7566 closed: snap-repair: add missing check in TestRepairBasicRun
[09:49] * mvo hugs pedronis
[09:52] mvo: #7568 needs a tweak (or I am confused)
[09:52] PR #7568: snap: when running `snap repair` without arguments, show hint
[09:52] there is some mark down in the error message which I don't think we do
[09:53] heh, Markdown
[09:56] pedronis: sure, let me fix this
[09:57] grr old systemd
[10:07] Chipaca I'm collecting some bootcharts I can share with you
[10:07] Chipaca those are the ones ogra mentioned
[10:08] * zyga is somewhat hungry
[10:08] making progress on cgroup branches, I'll have something to share soon
[10:08] trying to eliminate the extra binary now
[10:09] well, it works, just needs some cleanup still
[10:09] mborzecki: ^ FYI
[10:12] ondra: I'm not sure what you expect to do with them though
[10:13] heh, I just noticed that we have cmd/snapd-generator that claims to be snapd-workaround-generator
[10:15] ondra: ogra: if I'm reading the bootcharts right, the unsquashfs calls are not a big contributor to the overall boot? most don't even register as taking any time at all, only three (of 23) take more than a second
[10:17] Chipaca, they eat CPU .. all of them have pretty bold blue bars .... on a multicore system that wont be significant, on single cores it will bite
[10:17] ogra: sure. So if we avoid them we save... .01% of the overall boot time? or how much?
[10:18] not on single core where your IO is rather serialized
[10:18] that boot time is, still, over 5 minutes. The total sum of all time spent on unsquashfs is less than 10 seconds.
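On the "measure first" point above, a throwaway sketch of timing a single-file unsquashfs extraction; the snap path and scratch directory are examples only, and this is not how snapd itself extracts metadata:

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	start := time.Now()
	// Pull just meta/snap.yaml out of a snap into a scratch directory
	// (assumes /tmp/snap-meta does not exist yet).
	cmd := exec.Command("unsquashfs", "-d", "/tmp/snap-meta",
		"/var/lib/snapd/snaps/core18_1202.snap", "meta/snap.yaml")
	if out, err := cmd.CombinedOutput(); err != nil {
		fmt.Printf("unsquashfs failed: %v\n%s", err, out)
		return
	}
	fmt.Printf("extracting meta/snap.yaml took %v\n", time.Since(start))
}
```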
[10:18] point is it wastes resources that we could avoid
[10:19] sure, there is something else going on with the system ondra took the bootchart on
[10:19] i'm not saying this needs urgent fixing or anything but it is an obvious place for improvement
[10:19] Chipaca my main focus is what is happening between ~40 and ~140 seconds
[10:20] right, it *feels* like there is a Sleep(100) somewhere in snapd's code :D
[10:20] Chipaca it takes almost 100s to start snapd from the snapd snap, and I don't seem to get timing for this anywhere, or logs
[10:20] ondra: have you enabled debug logs?
[10:20] but thats unrelated to the unsquashfs ... i'm just pointing out that unsquashfs is wasteful and it would be nice to improve it
[10:21] i'm not asking for an urgent fix but for considering improving it if there is time
[10:22] ogra: please write something on the forum with data, these IRC discussions will go nowhere
[10:22] will do
[10:22] ogra: what pedronis said. Thanks!
[10:22] Chipaca I'd need to use a custom snapd snap I guess
[10:22] Chipaca or how to do it at first boot
[10:23] ondra: no, you just need to drop a file in etc/systemd/system/snapd.service.d
[10:23] Chipaca also it seems like it's before we start snapd, so logs in snapd will not help
[10:23] ondra: you can do that if you pause the bootloader for ex
[10:23] it would be cool if we could enable debug options for snapd on the kernel cmdline BTW
[10:23] Chipaca can you give me an example of that service, I will drop it into the image
[10:24] adjusting the cmdline on firstboot is very easy .. while injecting a file into the rootfs is hard
[10:24] ondra: $ cat /etc/systemd/system/snapd.service.d/debug.conf
[10:24] [Service]
[10:24] Environment=SNAPD_DEBUG=1 SNAPD_DEBUG_HTTP=7
[10:24] ondra: ^ I also sent an email asking anybody developing things on snapd to have this, ages ago (and it's all over the forum)
[10:25] ondra: or, it might be easier, to just drop those two env vars in /etc/environment
[10:25] * pstolowski walk, bbiab
[10:26] ondra: in core18, given it's a funkier service, you might need to adjust the service name
[10:27] Chipaca Ok thanks, I will try that
[10:27] ondra: i.e. /etc/systemd/system/core18.start-snapd.service.d/debug.conf
[10:27] mvo: I notice this service doesn't source /etc/environment, that's probably not good
[10:28] ondra: or maybe it's snapd.seeding.service?
[10:28] ondra: at this point I don't know which service actually starts snapd :-|
[10:28] in core18
[10:29] Chipaca: oh, good point
[10:29] Chipaca: different ones at different points
[10:29] mvo: do you know how that dance happens?
[10:29] i can't find a snapd.seeding.service but it's in the bootchart
[10:29] there's a bootstrapping one, then it does a bit of seeding, then snapd restarts
[10:30] Chipaca: yes, snapd itself writes it
[10:30] as the real service I think
[10:30] mvo: from where?
[10:30] wrappers/core18.go
[10:30] Chipaca: i.e. when snapd starts and there is nothing on the system yet it calls wrappers/core18.go
[10:30] pedronis: but there isn't a seeding there either
[10:31] are you sure it's seeding?
[10:31] there is no such thing afaik
[10:31] there's a seeded service
[10:31] that one doesn't start snapd
[10:31] pedronis: https://private-fileshare.canonical.com/~okubik/bootchart-20191008-1710.svg
[10:32] pedronis: there's a snapd-seeding.service
[10:32] Chipaca I think it's this one: core18.start-snapd.service
[10:34] Chipaca: it's not a service, it's an ephemeral unit
[10:35] Chipaca: from here (as ogra said): https://github.com/snapcore/core18/blob/master/static/usr/lib/core18/run-snapd-from-snap
[10:35] * pedronis lunch
[10:36] ondra: wrt not sure if it's snapd or not, snapd is running from t=39s at least
[10:36] ondra: so debug logs should shed some light on what's going on there
[10:37] Chipaca, pedronis https://forum.snapcraft.io/t/unsquashfs-vs-mount-during-firstboot-seeding/13614/1 ... (really just a suggestion for times where you guys dont have other things to do :) )
[10:38] Chipaca https://private-fileshare.canonical.com/~okubik/boot-plot.svg
[10:39] Chipaca if you look here, we only mount snap-snapd-5038.mount around 130 seconds
[10:39] Chipaca and snapd is started at 140 seconds
[10:39] yep
[10:40] Chipaca and my interest is before 140
[10:41] Chipaca also from the boot-chart you can see that during that time the system is relatively idle
[10:41] once it starts seeding it pegs one core; well, there we would need to consider parallel jobs to speed it up, but that first 100s seems to be easier to address now
[10:43] ondra: what is it waiting for?
[10:44] Chipaca you tell me :)
[10:44] Chipaca why do you think I'm digging into it, I want to know as well :)
[10:46] ondra: can you stop it inside the initramfs just before it pivots in?
[10:46] i.e. before systemd kicks in
[10:46] Chipaca, pedronis, ondra https://forum.snapcraft.io/t/allowing-snapd-debug-options-to-be-set-on-kernel-cmdline/13615
[10:46] but after everything is mounted
[10:46] :)
[10:46] ogra: tks
[10:47] we need wishlist tags in the forum :D
[10:47] Chipaca possibly, what do you want to do there?
[10:48] ondra: add 'set -x' to run-snapd-from-snap :-)
[10:48] ondra, break=bottom ... then you can modify /writable
[10:48] (indeed only stuff thats not inside snaps :P )
[10:49] Chipaca but that one is the core18 snap
[10:49] ondra: copy it somewhere, edit it, bind mount it back
[10:49] heh
[10:50] Chipaca you are bringing big guns right away :)
[10:50] PR snapd#7576 opened: spread.yaml,.gitignore: sync ignored files
[10:52] Chipaca I guess I can also create my own core18 snap, can't I?
[10:52] ondra: probably, yes :)
[10:53] pstolowski: btw, are you using core18 for your spike ?
[10:54] ogra is it right to have in the core18 snap things like dev/null dev/random dev/urandom dev/zero ?
[10:55] it doesnt do harm
[10:55] ogra seems to me useless
[10:55] (but in general these things should be handled by devtmpfs which we mount already in the initrd)
[10:55] yeah, they likely are ... unless you have a kernel without devtmpfs
[10:56] (these are rare but i guess they still exist)
[10:56] Chipaca anything else you want me to modify in the core18 snap? set -eux -> set -x?
[10:56] Chipaca I will sprinkle that file with a few echos once I'm on it
[10:58] set -x should be enough ... you dont need eu (said boris johnson ...)
[10:59] ondra: get some timestamps in there as well
[10:59] ondra: dunno where the log will end up :)
[10:59] hopefully the journal?
[10:59] if so no timestamps needed
[10:59] dunno
[10:59] Chipaca I will see
[11:00] Chipaca let me run a test boot
[11:18] pstolowski: pedronis: pushed a fix to #7443, we had an actual bug on systems with systemd older than 233
[11:18] PR #7443: timeutil: fix schedules with ambiguous nth weekday spans
[11:21] mborzecki: thanks, I'm a bit confused about the places where the comments mention 28 and the ranges have 29
[11:24] pedronis: heh, right, so the 28th day is part of the range generated by actual calculation, while 29-31 is the bit that varies from month to month (or year), but technically a month can have anywhere from 28 to 31 days
[11:25] perhaps i should leave a note that the 28th is included in the range added further down the code
[11:27] * Chipaca takes a break
[11:31] re
[11:32] pedronis: no, not yet, i'm using a cloud image (core + lxd)
[11:33] pstolowski: ok, that will have its complications
[11:33] maybe
[11:36] * zyga -> tea
[11:38] PR snapd#7577 opened: overlord: set fake sertial in TestRemodelSwitchToDifferentKernel
[11:50] pedronis: force pushed a comment about the 29-31 day range
[11:56] Chipaca https://paste.ubuntu.com/p/fQQXtCBSnQ/
[11:57] Chipaca with different formatting https://paste.ubuntu.com/p/q5MFkKy5cw/
[11:57] Chipaca attempts to overlay that file failed though
[11:59] ondra: and is the time you need to know about between 11:36:25 (131.827907) and 11:38:20 (247.375225), there?
[12:00] ondra: because if so, that _is_ snapd running, and you should be able to set env vars to get it to print more debug info
[12:01] ondra: add --setenv="SNAPD_DEBUG=1 SNAPD_DEBUG_HTTP=7" to the systemd-run call, there
[12:02] Chipaca but 'Started Start the snapd services from the snapd snap.' is only printed at the end
[12:02] Chipaca and don't you connect eligible plugs and slots before you start snapd?
[12:02] ondra: yes, but "started" does not mean "just did the exec"
[12:02] ondra: snapd is the thing doing the connecting
[12:02] Chipaca true
[12:02] those log lines about connecting come from snapd
[12:03] oooh! kernel parameter!
[12:03] systemd.setenv=SNAPD_DEBUG=1
[12:03] ^ that should work!
[12:03] says the manpage
[12:03] Chipaca OK let me test
[12:04] ondra: you can pass it more than once (if you also need _HTTP) but just SNAPD_DEBUG=1 should be enough
[12:04] Chipaca OK
[12:05] Chipaca I might have got the bind mount working
[12:06] Chipaca so i will get logs from this boot first
[12:06] Chipaca do you want logs from snapd.service then?
[12:06] ondra: just the ones from that first run
[12:07] Chipaca but from which service?
[12:07] ondra: snapd-seeding.service
[12:13] Chipaca do you mean snapd.seeded.service ?
[12:13] ondra: no, I mean snapd-seeding.service
[12:13] ondra: as in what systemd-run starts in that script
[12:14] that's the mystery snapd
[12:14] that's the one you want to look at
[12:14] Chipaca I cannot see such a service
[12:14] ondra: where are you looking?
[12:14] systemctl status snapd-seeding.service
[12:14] Unit snapd-seeding.service could not be found.
[12:15] ondra: it doesn't have a unit file
[12:15] ondra: journalctl will know of it though
[12:15] ondra: journalctl -u snapd-seeding.service should have it
[12:15] Chipaca OK I need to re-run the boot as the logs rotated now
[12:16] ondra: wait a mo'
[12:16] Chipaca how can I increase the journal buffer?
[12:16] ondra: there was a way to make it non-ephemeral, maybe ogra knows offhand
[12:16] ondra: systemd.journald.forward_to_syslog
[12:17] ondra: in the commandline
[12:20] Chipaca but on core18 we have no syslog
[12:20] i didnt know there is a cmdline option ... i only know you can create a dir (was it /var/log/journal ???) that will then automagically be used for journald logs
[12:20] ondra: I didn't know that :)
[12:20] yeah, no more syslog
[12:20] Chipaca will it create it?
[12:20] no
[12:20] Chipaca then no syslog
[12:20] bah, i dunno
[12:20] i guess it will just try to write to logger
[12:21] but
[12:21] ondra: but
[12:21] (and not find it ...)
[12:21] Chipaca so I do actually have a log from the seeding service
[12:21] ondra: edit that script, and redirect the output from systemd-run to somewhere
[12:21] that should work
[12:21] well, creating that dir should also work for persistent logs
[12:21] ondra: actually it might _not_ be in the journal, it might all be spewed out from the script
[12:21] Chipaca but it's useless https://paste.ubuntu.com/p/sw5sNBp6WK/
[12:21] they are binary files though
[12:21] I could check but I'm not sure it's the same in 16 and 18
[12:22] ondra: that's not debug output :-|
[12:22] Chipaca booting now with the debug env from the cmd line
[12:24] and I can confirm the output from the systemd-run does reach the journal (but I think your recent pastebin does, too)
[12:26] Chipaca actually reading logs over ssh seems to have messed it up, check this https://paste.ubuntu.com/p/ZFB78ycbMz/
[12:26] ondra: if the rotation is still getting in the way, we can either tweak journald.conf, or have it forward to kmsg and bump the kmsg buffer (but that might be expensive on a lowmem board)
[12:27] Chipaca it's fine, if I know what logs to get there is time to do it once I can access the seeded device
[12:27] ondra: is that with -u snapd-seeding.service ?
[12:27] ondra: it looks like the logs of the start-snapd-wotsit script
[12:27] which would not be in the snapd-seeding.service log
[12:28] (and yes there is noise in that that could be avoided... :-| )
[12:30] Chipaca sudo journalctl --all -u snapd-seeding.service is always empty
[12:30] ondra: what was https://paste.ubuntu.com/p/sw5sNBp6WK/ ?
[12:31] Chipaca core18.start-snapd.service
[12:31] gr
[12:31] welcome to our world :P
[12:32] ondra: and if you forward_to_kmsg ?
[12:32] (debugging the indebuggable)
[12:33] ondra: I've got another option
[12:34] Chipaca OK now with extra verbosity I do not get the start of the log :(
[12:34] ondra: forwarding to kmsg?
[12:34] ondra: or without?
[12:34] I don't know which of all the options you've gone for :)
[12:34] Chipaca without
[12:35] Chipaca BTW it's clear from journalctl of the whole system, there is only kernel chatter from 40 to 140 seconds
[12:35] Chipaca no changes to where the journal goes for now
[12:37] ondra: ok, I'm downloading a core18 for amd64, and will play around with it until I can get logs for that first boot snapd
[12:37] ondra: I'll let you know
[12:37] Chipaca looks like we have no journald.conf at all, so it should be easy to add one to etc
[12:37] ondra: feel free to do so :)
[12:38] seems silly that it doesn't have a kernel option for setting storage as it has for everything else
[12:44] hmm
[12:44] ondra: so, on amd64, all I need to do is set systemd.setenv=SNAPD_DEBUG=1 in the kernel commandline, and when I log in after creating my account and etc I have logs for snapd-seeding.service
[12:45] ondra: https://paste.ubuntu.com/p/kSmxKc6zKw/
[12:46] ogra@pi4:~$ ls /etc/systemd/journald.conf
[12:46] /etc/systemd/journald.conf
[12:46] my core18 image has journald.conf here
[12:46] ogra not mine
[12:47] weird
[12:47] ogra I can add it, but how do I increase the max size
[12:47] given we are most likely using the exact same core18 snap
[12:47] it's set to 10M
[12:47] ondra, ogra: where did you get yours from?
[12:48] mine is my pi4 custom image ... but that uses core18 from stable
[12:48] ogra@pi4:~$ snap info core18|grep refresh
[12:48] refresh-date: 6 days ago, at 22:09 UTC
[12:49] journald.conf is there, and writable, in the amd64 core18
[12:49] same here
[12:49] ogra@pi4:~$ sudo touch /etc/systemd/journald.conf
[12:49] ogra@pi4:~$
[12:49] ogra ah RuntimeMaxUse=
[12:50] ondra: what core18 are you running, that it doesn't have this file?
[12:51] Chipaca edge
[12:51] Chipaca rev 1202
[12:52] ondra: what arch?
[12:52] armhf
[12:53] same here ... but stable
[12:54] ondra: core18 1202 has etc/systemd/journald.conf
[12:55] Chipaca armhf?
[12:55] 1202 is armhf
[12:55] 1202
[12:55] :)
[12:55] would be surprising if it vanished all of a sudden
[12:55] ondra: UBUNTU_STORE_ARCH=armhf snap download core18 --edge
[12:55] ondra: unsquashfs core18_1202.snap etc/systemd/journald.conf
[12:56] ondra: less squashfs-root/etc/systemd/journald.conf
[12:56] ¯\_(ツ)_/¯
[12:56] ogra: yeah it'd be a bug
[12:56] Chipaca OK my bad, I must have had a typo when searching
[12:56] sorry I have it too :)
[12:57] sergiusens, hi, any chance you could take a look at https://github.com/sergiusens/snapcraft-preload/pull/34 and https://github.com/sergiusens/snapcraft-preload/pull/36 ?
[12:57] PR sergiusens/snapcraft-preload#34: preserve LD_PRELOAD from the calling environment (fixes #33)
[12:57] PR sergiusens/snapcraft-preload#36: use SNAP_INSTANCE_NAME instead of SNAP_NAME for prefixing /dev/shm fi…
[12:57] ondra: so now I'm suspicious of your "it's empty" :-|
[12:57] anyway, i've got an up to stand
[12:58] Chipaca no it is not :)
[12:59] Chipaca ogra anyway it only exists in the core snap, and not in the overlay /etc
[12:59] which is probably good, as I can add mine
[12:59] sorry auth
[12:59] I'm increasing the buffer 4 times so I should now have more time
[12:59] core18 has an overlay /etc ??
[13:00] Chipaca overlay of /etc/systemd/
[13:00] Chipaca how would we add new units :)
[13:02] ogra@pi4:~$ grep etc/systemd /etc/system-image/writable-paths
[13:02] /etc/systemd auto persistent transition none
[13:02] ogra@pi4:~$ cat /etc/issue
[13:02] Ubuntu Core 18 on \4 (\l)
[13:03] if you dont have the journald.conf file then something with your initrd is broken
[13:04] (transition means it copies the content from core to /writable on first mount)
[13:04] ondra: sorry, overlay is a bad word (overlayfs ...)
[13:05] well, the point is that it *should* be in the overlay /etc
[13:05] if it isnt, there is a bigger prob
[13:05] ogra@pi4:~$ ls /writable/system-data/etc/systemd/journald.conf
[13:05] /writable/system-data/etc/systemd/journald.conf
[13:05] ogra did we find more bugs ? :)
[13:05] smells like it ... if you are sure the file isn't there at least
[13:06] ogra if I have a file there, will it override it?
[13:06] if you have the file there it is fully writable
[13:06] so you can just edit it
[13:07] ogra no, if I already have a file there, it will not nuke it, right?
[13:07] ogra ignore me, it should not
[13:08] yeah, it wont
[13:09] transition only copies once on first mount ... in subsequent mounts it just uses whats there
[13:09] (so you could even create a new file there)
[13:10] Chipaca https://paste.ubuntu.com/p/bq6b39PBJ4/
=== ricab is now known as ricab|lunch
[13:11] ondra: so there you go
[13:11] ondra: and, snap debug timings 1, for more info
[13:13] Chipaca ?
[13:13] Chipaca how to get more logs?
[13:13] "error trying to compare the snap system key: system-key missing on disk"
[13:13] is that relevant ?
[13:13] ogra: nah, that's expected on first boot
[13:13] or does the system-key only come in with initialize-device ?
[13:13] right
[13:14] so the gap is before "Load: validated crc32" ... i guess the checksumming is slow then ?
[13:14] (once again)
[13:15] crc32 wow what a throwback
[13:15] ogra no, that is a check of the lk boot env
[13:16] ogra that takes 1ms
[13:17] hmm, well, there is a long gap before it runs
[13:17] ondra: did you run 'snap debug timings 1'?
[13:17] ogra exactly, and I want to know why
[13:18] ondra: or even just 'snap changes'
[13:18] ch-ch-ch-changesss
[13:18] * ogra hums david bowie
[13:18] Chipaca https://paste.ubuntu.com/p/FsdG3V4Fjg/
[13:19] PR snapd#7578 opened: sandbox/cgroup, overlord/snapstate: move helper for listing pids in group to the cgroup package
[13:19] zyga: ^^
[13:19] thanks
[13:19] ondra, that doesnt seem to list the bootloader stuff ... perhaps check change 2 instead ?
[13:20] ondra: also, --verbose
[13:20] ogra that only shows the last 2 changes :)
[13:21] Chipaca no, it always starts at number 3
[13:21] those are task numbers, not change numbers
[13:21] but
[13:21] ondra: snap debug timings --verbose 1
[13:22] ondra: and, as ogra said, also look at change 2
[13:23] change 2 will have the keys generation
[13:24] mborzecki: look again
[13:26] Chipaca it does not give me change 1 or 2 https://paste.ubuntu.com/p/Yj5Q2HYpp9/
[13:27] ondra: the "1" in "snap debug timings --verbose 1" is change 1
[13:27] ondra: please also run the same command but for change 2, i.e. "snap debug timings --verbose 2"
[13:27] Chipaca and 'snap debug timings --verbose 2' will only print timing around change 206
[13:27] ondra: the numbers under the ID column in that output are about tasks
[13:28] Chipaca ah sorry
[13:28] soooo complicated !
[13:28] ondra: what else do you have in "snap changes"?
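As a side note on PR snapd#7578 mentioned above: on a unified (v2) hierarchy, "listing the pids in a group" essentially means reading the group's cgroup.procs file. A rough sketch of that idea, not snapd's actual helper:

```go
import (
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// pidsInGroup is an illustrative stand-in for the helper moved in #7578:
// it parses cgroup.procs under the given cgroup directory.
func pidsInGroup(groupDir string) ([]int, error) {
	data, err := os.ReadFile(filepath.Join(groupDir, "cgroup.procs"))
	if err != nil {
		return nil, err
	}
	var pids []int
	for _, field := range strings.Fields(string(data)) {
		pid, err := strconv.Atoi(field)
		if err != nil {
			return nil, err
		}
		pids = append(pids, pid)
	}
	return pids, nil
}
```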
[13:28] Chipaca so change 2 is two tasks, and both take 1 combined
[13:29] Chipaca just starting services and initialize device
[13:29] ondra: can I see the output of 'snap changes' please
[13:30] what do you pay ?
[13:30] Chipaca https://pastebin.canonical.com/p/pzJTRfPtdq/
[13:30] :P
[13:30] ondra: ok, the changes you want to look at are 1 and 6
[13:30] yeah, 6
[13:31] Chipaca https://paste.ubuntu.com/p/dtgwn5C9SK/
[13:31] is that waiting 56s on key generation?
[13:32] ondra: yes
[13:32] ondra: so
[13:32] ondra: your random is probably empty
[13:32] zyga: replied, i had a really trivial check in mind there
[13:32] two times actually, no ?
[13:32] but is that blocking boot?
[13:32] ondra: get more randomses
[13:32] ogra: no, one is the sub of the other
[13:32] ah, k
[13:32] ondra: yes, you can't boot with an empty random pool
[13:33] right, i see the indentation in the summary when looking closer :)
[13:33] ondra, did you already add the cmdline changes for rng_core ?
[13:33] ondra: your device probably has a hardware rng you can plug in … somehow?
[13:33] Chipaca still, even if blocking, and that's a bit odd to block at that stage, it only explains 56s there
[13:34] ogra yep rng_core is there
[13:34] ogra rng_core.default_quality=700
[13:35] yeah, that should have improved it a little bit ... (i'd expect a few seconds less ... so 80-90 vs the 100 before)
[13:36] ogra I do not think it improved
[13:36] well, perhaps a missing kernel config ? rng_core is tied to the kernel SW rng
[13:36] (and will be quietly ignored if the kernel doesnt have it)
[13:37] ogra what config would that be?
[13:37] Chipaca when do we need that random number the first time?
[13:38] ogra CONFIG_HW_RANDOM ?
[13:39] ogra as that is enabled
[13:39] CONFIG_HW_RANDOM and sub-options of that (see menuconfig)
=== chesty_ is now known as chesty
[13:41] ogra is there actually a device node which I should see?
[13:41] ondra, tyhicks was the one that went over the whole rng stuff back then ... he might know what other options are needed (he also suggested the cmdline option ... which back then helped a lot on a beaglebone where we saw the slowness first)
[13:42] not with the in-kernel rng
[13:42] if you want to use a device node there is likely some other kernel config option thats specific to that HW
[13:42] and gives you HW rng ... (if the SoC actually has it at all)
[13:43] you probably want to take a look at the original android config
[13:43] ogra like CONFIG_HW_RANDOM_MTK :)
[13:43] ogra and enabled
[13:43] fooor example :)
[13:43] that might need userspace bits ... not sure
[13:45] ondra: you probably have things in dmesg about not having enough entropy?
[13:45] when we need it → to generate keys
[13:46] 🔑
[13:46] we should just generate them at build time ... makes everything easier ... and everyone has the same key ... :)
[13:47] ondra, btw, do you also see the slowness on second boot ?
[13:47] (i assume not)
[13:47] ogra nope, sub 30 seconds
[13:47] perhaps we should simply tell the customer to boot the device once in-factory
[13:48] ogra from the kernel code, there is a driver using TZ to generate random numbers, so this should be fast
[13:49] make it part of the flashing procedure
[13:49] "power on ... leave it sit for 5min"
[13:49] (unconnected)
[13:55] a timer wont be enough to get entropy
[13:55] you need interrupts
[13:55] attach a mouse and wiggle it :)
[13:55] Chipaca stupid question, but the task ID is assigned when the task is started, is that correct?
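A quick way to check the "empty random pool" theory above is to read the kernel's entropy estimate during that first boot; purely illustrative, not snapd code:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// The kernel exposes its current entropy estimate (in bits) here.
	data, err := os.ReadFile("/proc/sys/kernel/random/entropy_avail")
	if err != nil {
		fmt.Println("cannot read entropy estimate:", err)
		return
	}
	fmt.Println("available entropy bits:", strings.TrimSpace(string(data)))
}
```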
[13:55] Chipaca and tasks from change id 1 will run before tasks from change 2
[13:55] Chipaca because random number generation is run as task 200+
[13:55] Chipaca this is way after snaps were seeded
=== ^arcade_droid is now known as zarcade_droid
=== cjp256_ is now known as cjp256
=== arnatious_ is now known as arnatious
[13:56] wow lots of dead ppl here
[13:56] * Chipaca got better
[13:56] Chipaca did you see my question about task IDs?
[13:57] Chipaca is that ID assigned when the task was started?
[13:57] Chipaca so I can assume they go in sequence?
[13:57] ondra: I did not see your question
[13:57] Chipaca OK, the last two questions I did ask just now
[13:57] ondra: they go in sequence at the time of task creation, which might not line up exactly with task execution
[13:58] Chipaca OK, so then this random key generation might take some time, but it's run way way later
[13:58] Chipaca so it's not that 100s we are looking into
[13:58] Chipaca it's task id 200+ and change 6
[13:59] Chipaca we are looking into the 100s before or at the beginning of change 1
[13:59] Chipaca like task id 1 and 2
[13:59] ondra: I … what? why?
[13:59] ondra: if it's before snapd starts, then it's somebody else you should be talking to
[14:00] Chipaca that 100s is before we start seeding, the rsa key is generated after seeding
[14:01] Chipaca nope, it's this log https://paste.ubuntu.com/p/5jx9kcfZXV/
[14:02] Chipaca or in that seeding log, there is that pause
[14:02] Chipaca the rsa-key comes after seeding
[14:03] ondra: ok
[14:05] Chipaca so can I get more logs there?
[14:07] ondra: did you ever get me a 'snap debug timings --verbose 1'? i might have closed it already
[14:07] Chipaca yes I did, https://paste.ubuntu.com/p/Yj5Q2HYpp9/
[14:08] Chipaca task 4 there is too late, this is after that quiet period
[14:09] ondra: 101ms is too late?
[14:10] ondra: can you put actual timestamps on the journal output, or is it too late for that?
[14:10] ondra: the way I read the one you pasted is probably wrong
[14:10] Chipaca probably too late now
[14:11] Chipaca I have timing from the start of the boot
[14:11] ondra: ok, maybe you can help me then
[14:11] ondra: https://paste.ubuntu.com/p/bq6b39PBJ4/
[14:11] Chipaca which is then 0
[14:11] in that one
[14:11] ondra: the numbers in the left column are seconds in offset since the 'logs start', yes?
[14:11] Chipaca time in that log is relative from 0, which is device power on
[14:11] ondra: ah
[14:11] Chipaca since device power on
[14:12] ondra: and how do I compare it with other things?
[14:12] Chipaca and yes, seconds from device power on
[14:12] Chipaca all the logs and the boot chart are in seconds from device power on
[14:12] Chipaca snap changes are not though
[14:13] nor is '-- Logs begin at Wed 2019-10-09 13:00:46 UTC, end at Wed 2019-10-09 13:10:02 UTC. --'
[14:13] Chipaca so it depends, what do you want to compare it to?
[14:13] Chipaca OK there you go, you have also the log start
[14:14] ondra: that doesn't help, does it?
[14:14] ondra: what wall clock time does the gap happen at?
[14:15] bootchart looks at the kernel counter
[14:15] which starts at 0 when the kernel fires up
[14:15] Chipaca but what are you interested in? Absolute time when the service started, or actual time processing?
[14:15] ogra and I did the same for the journal logs
[14:15] (same thing you get with dmesg if you dont use -T )
[14:16] ogra so you can link them to the bootchart
[14:16] ogra and the same you get with journalctl -o short-monotonic
[14:16] ah, i didnt know that one (i rarely want the kernel timer anyway)
[14:17] Chipaca but if I tell you wall clock time what will you compare it to?
[14:17] ondra: what's the output of 'snap debug timings --verbose --ensure=seed' ?
[14:18] Chipaca https://pastebin.canonical.com/p/3NRRGZYJ9S/
[14:18] Chipaca '75739ms - state-from-seed populate state from seed'?
[14:18] ondra: how many more seconds do you need to find?
[14:18] Chipaca what is that?
[14:19] Chipaca what do you mean 'how many more seconds'?
[14:19] ondra: last time I found you 50s and your first reaction was "i need more"
[14:19] so here i am finding you more
[14:19] that's a chunk of time
[14:19] Chipaca no, that 50 seconds you found is when services run
[14:20] Chipaca device communicated
[14:20] Chipaca so they are not helping me at all
=== ricab|lunch is now known as ricab
[14:20] yes, that was your second reaction
[14:21] Chipaca that's 50 seconds to connect to the store to check for updates, but at that point the device will already be communicating with the user, so that 50s is irrelevant for me
[14:21] ondra: I regret trying to help you, now
[14:22] Chipaca well then just tell me "yep, during first boot we stop for 100s before we start seeding, this is a feature"
[14:23] the first thing I told you before trying to help you was that years ago I pointed out that first boot on armhf takes 5+ minutes because of things
[14:23] and now, after several hours of trying to walk you through it all, I point you at the logs that show you that same bit of information
[14:23] I have no idea what you're so upset about at this point
[14:23] Chipaca this is a smart speaker, it takes 5 minutes to boot before it can connect to the phone. that 5 minutes does not include and is not affected by that 50 seconds of key generation, as you do not have internet anyway.
[14:23] but you have the data, right there
[14:24] Chipaca but you can connect over BT to provision WiFi while the device is generating the rsa-key
[14:24] I am not talking about those 50 seconds for key generation, I am talking about _these_ 300 seconds of seeding
[14:24] Chipaca so it's not having any effect on user experience
[14:25] Chipaca I accept seeding takes time, related to the number of snaps
[14:25] Chipaca but the snapd snap seeds for 150 seconds, that seems odd
[14:27] Chipaca and nobody complains about that, I'm just wondering why we have 100s of nothing. Now you explain to me this is a feature. And you should just take it
[14:27] Chipaca thanks for help
[14:31] ondra: at what point did I say this was a feature and you should just take it
[14:32] Chipaca at the point where you said I was not happy with the 50s discovery, which is post seeding
[14:32] ...?
[14:32] Chipaca I told you those make no difference
[14:34] ondra: I still don't see how I told you that. I see you saying I did, though.
[14:35] or maybe saying I _should_
[14:35] but I won't, because it isn't
[14:39] ondra: could you please snap install http, and pastebin 'http snapd:///v2/changes select==all'
[14:45] Chipaca https://pastebin.canonical.com/p/k8Q6ryzwhn/
[14:47] ondra: where does 2019-10-09T13:06:29.005541125Z fall WRT the missing time?
[14:47] is that before or after the time we're looking for?
[14:48] Chipaca that's after
[14:50] Chipaca I think we are looking for things before 2019-10-09 13:03:00
[14:51] Chipaca more precisely between 2019-10-09 13:01:26 and 2019-10-09 13:03:00
[14:51] ondra: where do those timestamps fall in https://paste.ubuntu.com/p/bq6b39PBJ4/ ?
[14:55] Chipaca between 40 and 140
[14:56] Chipaca from what I understand, the logs start is when the kernel started
[15:03] * zyga calls it a day
[15:10] ondra: just in case, is there anything in any of the other --ensure= options?
[15:10] anything in the times we're looking at
[15:13] Chipaca I did get the whole journal if you want
[15:13] Chipaca I could not see anything there though
[15:14] Chipaca let me paste it for you
[15:14] ondra: there's a little more detail in the timings output
[15:15] ondra: can you patch the snapd for first boot, or is that too much?
[15:15] Chipaca no, that is actually fairly easy
[15:15] Chipaca unlike core, as I learnt
[15:16] Chipaca https://pastebin.canonical.com/p/Bzv44GKwmD/
[15:17] ondra: out of curiosity, what are these about?
[15:17] [ 42.310781] localhost kernel: CPU3 killed.
[15:19] Chipaca just some MTK kernel verbosity :)
[15:19] hmm
[15:33] Chipaca, at least it is a friendly killer ... i have seen other boards where daemons kill their children and such ... way more blood there
[15:33] ondra: so if I give you a patch for snapd you can apply it to your snapd?
[15:34] Chipaca yep
[15:34] Chipaca that won't be a problem
[15:35] ondra: http://paste.ubuntu.com/p/DWYXndYs5C/
[15:35] ondra: won't be too informative but hopefully it'll give us somewhere to start looking
[15:36] hmm
[15:36] ondra: it'll need the debug trick still
[15:38] Chipaca sure, I will let you know once I have it
=== pstolowski is now known as pstolowski|afk
[15:47] * cachio lunch
[15:52] Chipaca it will take some time, as my LP git sync has now been waiting to run the sync 'as soon as possible' for the past 15 minutes
[15:52] Chipaca time is a relative thing
[16:03] blame einstein
[16:04] only important if you're going fast, so it doesn't apply to this board
[16:04] * Chipaca hides
[16:07] If you're routinely waiting for an LP git import, it's probably a sign that you should be hosting the repository directly on LP instead. Imports are best-effort
[16:12] but thats less funny because you cant complain then ...
[16:20] cjwatson it's a bit of a pain to push to two locations as snapd is in github, but I guess it's the best option
[16:20] cjwatson as now I'm waiting over half an hour and still nada
[16:23] folks, so: mir-kiosk has a plugs: - x11, so it offers x11 for other snaps to connect to it, right?
[16:23] ondra: Hm, one of our importds doesn't seem to be doing much though, let me see
[16:27] cjwatson thanks
[16:27] cjwatson I did create a copy of the git repo to kick a build now, so it should be OK
[16:32] ondra: I'm off but I'll check back later tonight
[16:32] Chipaca no worries, I have some other tasks anyway, let's check tomorrow
[17:16] ondra: Repaired the stuck worker, so it should be doing a better job of keeping up with imports now anyway
[17:18] cjwatson thank you :)
[17:32] Chipaca https://paste.ubuntu.com/p/WMKkCPSBjg/
[17:47] * cachio afk
[18:00] roadmr, i think that plug is only for running mir-kiosk on desktop systems .... to actually run an x11 app on top of mir-kiosk (which only offers wayland to apps) your app needs to ship Xwayland and set that up through a wrapper .... the app also needs a loopback slot/plug x11 setup
[18:01] roadmr, i.e. lines 50, 56 and 81 in https://github.com/ogra1/opencv-demo-snap/blob/master/snap/snapcraft.yaml
[19:24] thanks ogra, that'll be useful as a reference
[20:25] PR snapd#7579 opened: interfaces/network-setup-observe: add Info D-Bus method accesses