/srv/irclogs.ubuntu.com/2021/06/17/#snappy.txt

mborzeckimorning04:43
mardymborzecki: hi! Early bird, today :-)05:07
mborzeckimardy: hi, yeah, up since 5:3005:07
mardysleep is overrated anyways ;-)05:09
mborzeckihm ci was very unhappy yesterday but things seem to have improved now05:41
mborzeckieh some failures in tests poking the session agent, wtf? https://paste.ubuntu.com/p/P2zpc9Jbgq/05:47
mborzeckiwhy wasn't this failing yesterday?05:48
mborzeckican this be related? https://paste.ubuntu.com/p/3vYjVrkz5N/06:09
mborzeckidoesn't seem to be a system update06:18
mborzeckihm no way of telling if something was updated in the images 😕06:32
mborzeckimvo: morning07:00
mborzeckimvo: can you land https://github.com/snapcore/snapd/pull/10386 ?07:00
mvogood morning mborzecki ! sure07:01
mborzeckimvo: i see recurring failures in 2 tests that poke user session agent: https://paste.ubuntu.com/p/P2zpc9Jbgq/07:02
mborzeckimvo: i suspect this may be related to https://paste.ubuntu.com/p/3vYjVrkz5N/ but the tests are passing in isolation (well no surprise there though)07:02
mborzeckifun, our code using the cgroup v1 freezer is completely oblivious to v2 and since it assumes that ENOENT on the freezer file is fine, nothing fails ;)07:03
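A minimal sketch of the pitfall described above. It assumes the v1 freezer is driven by writing to a `freezer.state` file and that a cgroup v2 (unified) mount can be recognized by a `cgroup.controllers` file at its root. The function names and the `root` parameter are hypothetical and are not snapd's actual code.

```python
import os


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """Return True if root looks like a cgroup v2 (unified) hierarchy."""
    return os.path.exists(os.path.join(root, "cgroup.controllers"))


def freeze_v1(group, root="/sys/fs/cgroup"):
    """Write FROZEN to the v1 freezer file; return False if it is missing."""
    path = os.path.join(root, "freezer", group, "freezer.state")
    try:
        with open(path, "w") as f:
            f.write("FROZEN")
    except FileNotFoundError:
        # Treating ENOENT as "nothing to freeze" is what hides the v2 case:
        # on a unified hierarchy this path never exists, so nothing fails.
        return False
    return True


def freeze(group, root="/sys/fs/cgroup"):
    """Refuse loudly on cgroup v2 instead of silently doing nothing."""
    if is_cgroup_v2(root):
        raise NotImplementedError("no v1 freezer on a cgroup v2 hierarchy")
    return freeze_v1(group, root)
```

With an explicit v2 check in front, the missing-file case can no longer be mistaken for an empty group.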
mborzeckipstolowski: hey07:06
mvogood morning pstolowski 07:06
pstolowskihey mborzecki and mvo07:06
zyga-mbpgood morning 08:07
zyga-mbpI wrote an ini-file encoder / decoder, similar to encoding/json08:10
zyga-mbpit's currently inside a project repo but if there is any interest I can split it out to a standalone repository08:10
zyga-mbpit has no dependencies08:10
zyga-mbpexcept for check for testing08:10
mborzeckitrivial PR: https://github.com/snapcore/snapd/pull/1042209:43
zyga-mbp+!09:47
zyga-mbp+109:47
jameshpedronis: was this the post you were referring to in the meeting? https://forum.snapcraft.io/t/exceeded-maximum-runtime-installing-snap-store-font-generation-issue/2480109:49
pedronisjamesh: yes09:52
mborzeckijamesh: any clue what may be happening here? https://paste.ubuntu.com/p/P2zpc9Jbgq/ i've only started seeing those today, and found this in the logs too: https://paste.ubuntu.com/p/3vYjVrkz5N/10:26
* jamesh looks10:26
jameshmborzecki: it means there is a stale UNIX socket file that nothing (neither "systemd --user" or snap-session-agent) is listening on10:28
mborzeckiit seems that most of the tests that poke user session agent are affected10:28
jameshmborzecki: that test is mostly concerned with the session agent of the test user, but the checks during install will try to talk to all the session agents for all users logged into the system.10:30
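The stale-socket condition jamesh describes can be illustrated with a short sketch: the socket file exists, but a connection attempt is refused because no process is listening. The helper name `socket_state` is hypothetical, and the behaviour shown assumes Linux UNIX-domain sockets.

```python
import os
import socket


def socket_state(path):
    """Classify a UNIX socket path as 'missing', 'stale' or 'listening'."""
    if not os.path.exists(path):
        return "missing"
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
    except ConnectionRefusedError:
        # The file is still on disk, but nothing accepts connections on it.
        return "stale"
    finally:
        client.close()
    return "listening"
```

A check like this against `/run/user/0/snapd-session-agent.socket` would be one way to tell a leftover file apart from a live session agent.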
mborzeckijchittum: ofc it isn't 100% reproducible all the time, individual tests are passing, so i'm suspecting a stale session10:30
jameshMaybe some of the common test setup/teardown code has changed?10:30
mborzeckihm not quite sure, most of what landed seems unrelated, i looked at system package updates but it hasn't changed since may or so10:32
jameshmborzecki: there's code in tests/lib/tools/cleanup-state that tries to make sure the root user systemd instance is in a sane state, but perhaps it is getting confused somehow10:34
jameshbut that hasn't been updated in ~ 5 months10:34
mborzeckiyeah10:35
mborzeckinothing really stands out ;)10:35
pedronispstolowski: I reviewed https://github.com/snapcore/snapd/pull/1038411:39
pstolowskipedronis: ty11:41
mborzeckimvo: first bit: https://github.com/snapcore/snapd/pull/1042311:57
mvomborzecki: in a meeting but \o/ 12:23
ijohnson[m]oh ffs13:51
ijohnson[m]https://pastebin.ubuntu.com/p/hXvqxMm5FW/13:52
ijohnson[m]I'm pretty sure systemd on centos 7 just like doesn't understand numbers13:53
ijohnson[m]systemctl show --property TasksCurrent snap.$(systemd-escape --path group-top1/group-one/group-sub-one).slice13:59
ijohnson[m]TasksCurrent=1844674407370955161513:59
ijohnson[m]amazing13:59
ijohnson[m]our workaround for memory account doesn't work because sometimes systemd can't keep track of the tasks either14:00
ijohnson[m]*account14:00
ijohnson[m]*accounting14:00
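The reported value 18446744073709551615 is 2^64 - 1, the largest unsigned 64-bit integer, which suggests an unset or invalid counter rather than a real task count. A minimal sketch of detecting it when parsing the `systemctl show --property TasksCurrent` output follows; the function names are hypothetical.

```python
UINT64_MAX = 2**64 - 1


def parse_property(output):
    """Split one 'Key=Value' line as printed by `systemctl show --property`."""
    key, _, value = output.strip().partition("=")
    return key, value


def tasks_current(output):
    """Return the task count, or None when the value is the uint64 maximum."""
    key, value = parse_property(output)
    if key != "TasksCurrent":
        raise ValueError("unexpected property: %r" % key)
    count = int(value)
    # 2**64 - 1 cannot be a real task count, so treat it as "unknown".
    return None if count == UINT64_MAX else count
```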
pstolowskiwoah14:04
mvoijohnson[m]: my gut feeling is that we should just not support this feature on this old systemd there if we can14:07
ijohnson[m]yeah14:08
ijohnson[m]I'm trying to see if xenial has the same problem so we can put a lower bound on the minimum systemd version14:08
ijohnson[m]damn xenial too14:09
mvothat's sad :/14:11
ijohnson[m]I'm checking bionic too fwiw14:11
ijohnson[m]I think bionic was fine, I never saw this sort of thing happen on bionic, but this is a new way to reproduce the same thing14:12
ijohnson[m]also the really unfortunate thing about this is that now we have a quota group (cgroup) which has a real non-zero memory usage with only one task in it, and removing another quota group causes systemd to think that group has infinite tasks and infinite memory usage14:13
ijohnson[m]if the bug was just affecting empty quota groups it's meh but since this is affecting real groups with real services in them it's pretty sad14:13
pedroniswell I hope bionic is fine, if it is we can just make bionic systemd the minimal requirement, if it's not fine we are in bigger trouble14:14
ijohnson[m]indeed, I should know in a minute or two14:15
ijohnson[m]phew bionic is okay14:15
ijohnson[m]so systemd 229 is broken, 237 is okay though14:16
ijohnson[m]I'm gonna put this new reproducer into a spread test and try with the minimum set to 230, see if any other systems are affected with systemd versions in between 229 and 23714:16
pedronisit means it's UC18+ feature, but I think that's ok14:16
ijohnson[m]sad for our original uc16 customer who originally asked for this feature in 2018 for uc16 but oh well14:17
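A version gate like the one discussed above might look like the following sketch. It assumes the first line of `systemctl --version` starts with `systemd <number>`. The threshold of 230 is only the trial value mentioned in the conversation: the log establishes that 229 is broken and 237 works, not where the boundary is. The function names are hypothetical.

```python
def parse_systemd_version(output):
    """Extract the major version from the first line of `systemctl --version`."""
    fields = output.splitlines()[0].split()
    if len(fields) < 2 or fields[0] != "systemd":
        raise ValueError("unrecognized version output: %r" % output)
    return int(fields[1])


def quota_groups_supported(version, minimum=230):
    """Assumed gate: `minimum` is a trial value, not a confirmed boundary."""
    return version >= minimum
```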
mardymvo, pedronis: I guess this MP needs your superpowers to get merged (but please remember to squash): https://github.com/snapcore/snapd/pull/1036314:18
mvomardy: looking14:18
mvomardy: you checked the failures and they are unrelated?14:19
mardymvo: those on 20.04 are about this error: "ERROR Post http://0/v1/service-control: dial unix /run/user/0/snapd-session-agent.socket: connect: connection refused"14:22
pedronisthat's the issues mborzecki mentioned in the standup 14:24
mvoI saw this a bunch of times14:24
ijohnson[m]fwiw I didn't see that at all yesterday in my afternoon14:25
pedronisit's new, we wonder a bit what changed, it's not systemd apparently14:26
pedronismardy: ijohnson[m]: I reviewed https://github.com/snapcore/snapd/pull/10266 , I think there is more code that can be removed but conflict-wise it might be better for ijohnson[m] to do it in one of his open quota PRs14:31
pedronisonce this lands14:32
mardypedronis: thanks, looking14:32
ijohnson[m]pedronis: what are your thoughts about having a flag to not format units as 1.29MB and instead return 1294336 B without the SI units ? it would be convenient for some tests at least I think14:32
ijohnson[m]pedronis: ack yeah that was my thoughts too that it will be simpler to just land mardy's PR and then do followups to clean up the quota stuff14:32
ijohnson[m]pedronis: my thoughts on the flag is that it would be a flag mixin like --abs-time like --abs-units or something14:33
pedronisyes, I was thinking that we have precedent for this in --abs-time, so the question is how to call it14:33
ijohnson[m]--no-metric-unit-prefixes14:34
ijohnson[m]?14:34
ijohnson[m]maybe just --no-unit-prefixes or --base-units14:34
pedronisis it about anything other than sizes?14:35
ijohnson[m]not sure if we have other units that get formatted like that, I suppose right now it is just sizes14:35
ijohnson[m]though also it's probably safe to say that snapd will probably not start returning any volts or grams :-)14:36
mvopedronis: fwiw, there was a systemd update 22h ago into focal-updates, no idea right now if that is related to the new failures14:39
mvo(changes look unrelated though)14:40
pedronisijohnson[m]: so I discovered that we strutil.SizeToStr and quantity.FormatAmount :/14:40
pedronis*we have14:40
ijohnson[m]nice 2 implementations is better than 1 haha14:41
pedronisijohnson[m]: so I suppose we would need a sizesMixin like we have a timeMixin14:43
ijohnson[m]yes14:43
pedronisabout the name --base-sizes  --byte-sizes ?14:44
ijohnson[m]hmm I like --byte-sizes just because it seems smaller and more bite sized to me 🥁 14:44
pedronis--byte-sizes seems fine to me14:45
ijohnson[m]alright I'll try to work that in14:45
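The effect of the proposed `--byte-sizes` flag can be sketched as a single formatting function with a switch. This is an illustration only: the function name, the two-decimal rounding and the unit list are assumptions chosen to match the `1.29MB` and `1294336 B` examples above, not the behaviour of strutil.SizeToStr or quantity.FormatAmount.

```python
def format_size(n, byte_sizes=False):
    """Format a byte count with SI prefixes, or as plain bytes on request."""
    if byte_sizes:
        # What a --byte-sizes style flag would print: no prefix, no rounding.
        return "%d B" % n
    value = float(n)
    unit = "B"
    for next_unit in ("kB", "MB", "GB", "TB"):
        if value < 1000:
            break
        value /= 1000
        unit = next_unit
    if unit == "B":
        return "%dB" % n
    return "%.2f%s" % (value, unit)
```

The plain-bytes form is exact, which is what makes it convenient for tests that compare sizes.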
pedronisijohnson[m]: afaict info quota and snapshot are relevant, apparently quota is reusing code from snapshot atm14:46
pedroniss/relevant/are affected/14:47
ijohnson[m]right, I was also thinking snap info which reports the sizes of snaps too ?14:47
pedronisyea, also snapshots report snapshot sizes14:48
pedronisthey define fmtSize which is used by quota14:48
pedronisinfo is using SizeToStr instead14:48
ijohnson[m]pedronis: I don't see any other commands that use memory so I think it's just the snapshots family, the quota family and info14:50
pedronisyea14:50
pedronisso we have strutil.SizeToStr strutil/quantity and gadget/quantity, fun14:51
pedronisnot saying you sort that out now, but we should look into that at some point14:52
pedronisit's a bit much/confusing14:52
ijohnson[m]double the fun14:54
pedronismborzecki: question in https://github.com/snapcore/snapd/pull/1042115:03
ijohnson[m]alright let's see what other versions of systemd want to lose their quota groups support https://github.com/snapcore/snapd/pull/1042515:05
ijohnson[m]I'm confused by the quota group spread test failures in https://github.com/snapcore/snapd/pull/10266, it seems that on some systems the service did get restarted :-/15:08
ijohnson[m]hmm and I can't reproduce them immediately either :-/15:43
pedronisijohnson[m]: are these tests and maybe also your test hitting an issue where start comes back but the service is not active yet?16:12
ijohnson[m]I know the issue now16:12
ijohnson[m]pedronis: the issue is that the test runs too fast, essentially, it dies 1-2 times because of OOM, and systemd is trying to restart it since it failed (since it will try to restart it at most 5 times with default settings), and then we remove the quota before the 3rd try succeeds, and so by the time the slice is removed the 3rd try to start the service succeeds16:13
ijohnson[m]we need to wait after creating the service until systemd has given up trying to start the service16:14
pedronisah16:14
ijohnson[m]*after creating the quota group16:14
pedronisijohnson[m]: or set restart condition to never? but that service is shared with other tests I suppose16:17
ijohnson[m]yeah, I'm hoping there's something we can ask systemd in a loop to see if the start limit has been hit16:17
ijohnson[m]I suppose we could grep the journal for the message that systemd says when it gives up16:17
ijohnson[m]I guess we want `SubState=failed` to indicate that systemd has stopped trying to restart it, otherwise systemd says it is in `SubState=running` or `SubState=auto-restart`16:20
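The wait described above can be sketched as a polling loop that stops once the unit reports `SubState=failed`. The `query` callable is a hypothetical stand-in for whatever runs `systemctl show --property SubState <unit>` and returns its output; the attempt count and delay are arbitrary assumptions.

```python
import time


def parse_sub_state(output):
    """Return the value from a 'SubState=...' line."""
    key, _, value = output.strip().partition("=")
    if key != "SubState":
        raise ValueError("unexpected property: %r" % key)
    return value


def wait_until_failed(query, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll query() until it reports SubState=failed; False on timeout."""
    for _ in range(attempts):
        if parse_sub_state(query()) == "failed":
            # systemd has stopped trying to restart the service.
            return True
        sleep(delay)
    return False
```

In a real test, `query` could wrap `subprocess.run(["systemctl", "show", "--property", "SubState", unit], capture_output=True, text=True).stdout`, and the quota group would be removed only after the function returns True.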
ijohnson[m]cachio: have you seen this sort of failure before? https://pastebin.ubuntu.com/p/qgpKPFtjrN/20:32
cachioijohnson[m], hi21:23
cachiono 21:23
cachiofirst time21:23
cachiois this a PR? master?21:24
ijohnson[m]cachio: it was one of my PR's21:24
ijohnson[m]I since restarted the checks on that PR so I don't have any more logs21:24
cachiook, I am reviewing logs and adding fixes today21:24
cachioif I see this in another pr I'll tell you21:25
ijohnson[m]ack thanks cachio 21:36
=== RzR is now known as rZr
