[00:23] hello, can we get some love on the auto-aliasing for the slurm snap?
=== ijohnson is now known as ijohnson|pto
[05:11] morning
[06:32] hmm the gadget update test isn't really enabled for uc20
[06:40] mvo: hey
[06:40] mvo: do you remember why we kept the gadget update spread test disabled for uc20?
[06:46] mborzecki: good morning
[06:46] mborzecki: no idea, let's enable it
[06:47] mvo: preserving boot config landed yesterday, so i'm trying to enable it now
[06:48] mborzecki: nice
[06:50] mvo: there's a bunch of times we do some snap install which is expected to trigger a reboot, and then hope that the REBOOT issued by spread will be faster, i'm thinking that maybe we should intercept calls to shutdown like we do in the core20 recovery test
[06:52] mborzecki: I think so too
[06:53] mborzecki: it's probably the reason why sometimes the core dies with connection lost
[06:59] PR snapd#8953 closed: tests: enable system-snap-refresh test on uc20
[07:00] mborzecki: I think we have a smoking gun in the 8952 logs
[07:00] let me see
[07:00] mborzecki: the core20 prepare failed there
[07:01] mborzecki: and with your extra debug we now have state.json and the journal :)
[07:01] wow, didn't expect state to be so full
[07:03] morning
[07:03] mborzecki: yeah, same here, it's quite huge
[07:04] mvo: hah emacs choked on font-lock-mode when i pasted the contents
[07:05] lol
[07:05] mborzecki: get moar RAM!
[07:05] and thanks to pstolowski's snap debug state we have nice info
[07:05] mborzecki: yeah, it's cool
[07:05] mvo: https://paste.ubuntu.com/p/KCmBJVjHZH/
[07:06] mborzecki: a nice smoking gun
[07:07] mvo: so there's a purge during autorefresh i take it
[07:07] mvo: but it's a cloud image, so shouldn't there be a refresh.hold set?
[07:08] mborzecki: I think I saw in the logs the previous refresh was 2020-04-28
[07:08] mvo: it clearly isn't https://paste.ubuntu.com/p/vxJ2sYZfkN/
[07:08] mborzecki: so maybe we just need to ask cachio to refresh the image
[07:09] mborzecki: indeed
[07:09] oh right, it's been more than 60 days?
[07:09] PR snapd#8952 closed: tests: enable tests on uc20 which now work with the real model assertion
[07:10] mborzecki: yeah, it looks like it (I was looking at the raw json) :/
[07:10] mborzecki: anyway, I think we should also make purge more robust
[07:11] mborzecki: it's a bit of an open question how, not sure my (naive) approach that I proposed works for all cases
[07:11] mvo: python tells me 60 days after that was 27.06
[07:11] oh fun
[07:12] mborzecki: 8951 fails the same way, it really affects our tests
[07:13] mvo: hm i guess only cachio can bump the image?
[07:13] mborzecki: yeah, unfortunately
[07:15] ok, maybe we can fix purge in the meantime
[07:15] mvo: fwiw, the gadget update test failed on uc20
[07:16] are you talking about state.Purge()?
[07:16] mborzecki: meh, how so?
[07:16] pstolowski: apt purge snapd
[07:16] aaah
[07:16] ok
[07:16] pstolowski: that's what we were discussing :)
[07:16] got scared for a moment ;)
[07:17] mvo: idk yet, looks like the new files that were supposed to be added by the update are not present, so either the script that modifies gadget.yaml is doing something incorrect or the test needs tweaking
[07:19] mborzecki: ok, keep me updated on this :)
[07:21] mborzecki: I'm inclined to merge 8933 and just do a followup? wdyt?
[07:21] mborzecki: especially since ian is not around for the rest of the week
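Back on the 07:07 refresh.hold question, a minimal sketch of checking and extending the hold on a test image (the timestamp is illustrative):

    # is an auto-refresh hold currently set?
    snap get system refresh.hold
    # push the hold forward; snapd caps it at roughly 60 days past the last
    # refresh, which is why a >60-day-old image starts auto-refreshing
    snap set system refresh.hold="2020-08-30T10:00:00Z"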
[07:22] mvo: yeah, i think it's ok, we can open a PR with tweaks separately
[07:24] mborzecki: cool, will do that then
[07:34] PR snapd#8933 closed: tests/core/uc20-recovery: apply hack to get gopath in recover mode w/ external backend
[07:39] PR snapd#8954 opened: tests: tweak comments/output in uc20-recovery test
[07:43] mvo: there's a new typo in https://github.com/snapcore/snapd/pull/8954
[07:43] PR #8954: tests: tweak comments/output in uc20-recovery test
[07:44] pedronis: oh no, silly me, fixing now
[07:45] PR snapd#8918 closed: many: make nested spread tests more reliable
[07:45] mborzecki: I just checked the removal hook of lxd, I don't think it needs a working snapd
[07:45] mborzecki: so on purge we could stop snapd first
[07:45] mborzecki: and then do all the stopping/removing of snaps
[07:45] mvo: heh, so all structures in the 20/pc gadget have edition set to 1 or 2, and the test just blindly generates structures with edition 1, so nothing gets updated
[07:45] mborzecki: given that lxd is installed on the 20.04 image we can even test this for real easily :)
[07:46] mborzecki: haha - nice find
[07:46] mvo: lxd used to have a stop command that poked lxd to check the reason for stopping
[07:47] mborzecki: indeed, I remember now
[07:48] mborzecki: hm, so we need to either abort all refreshes before purging or put snapd in some sort of maintenance mode where it does not install/refresh anything
[07:48] (which we don't have right now)
[07:52] mvo: hm in postrm we already do rm -rf /var/lib/snapd, so maybe it's a problem with running the postrm from the distro package only?
[07:52] mvo: i mean the version that's currently installed in the image
[07:55] re
[07:55] sorry, polari is a terrible terrible IRC client
[07:55] can you guys see my messages now?
[07:56] zyga: hahahah, i feel you
[07:56] I was talking all morning
[07:56] hexchat FTW !!
[07:56] but apparently polari didn't care to tell me I was not getting my messages across
[07:56] I even asked you guys what IRC clients you use
[07:56] anyway
[07:56] mvo, mborzecki please just remember that stopping services must be done before unmounting snapd/core
[07:56] (I wrote this a moment ago)
[07:56] zyga: works now
[07:56] zyga: i've tried polari so many times, disappointment each time
[07:57] thank you Pawel
[07:57] zyga: fwiw, try to find the directory where the channel logs are kept :P
[07:57] hot garbage, let me get hexchat
[07:57] in other news, most programs suck at error handling
[07:59] mvo: ok, so we have prerm, which stops all snap.* services, followed by snapd (via dh); then purge runs, which removes everything incl. /var/lib/snapd, followed by dh purge
[07:59] mvo: so if purge runs and /var/lib/snapd is removed as the last step, how come it's still there?
[07:59] mborzecki: is stopping lxd really stopping it?
[08:01] mvo: maybe purge fails but since we wrap it with quiet there's no logs
[08:02] did you guys see this ? https://forum.snapcraft.io/t/auto-connected-interfaces-disappeared-after-dist-upgrade/18566
[08:02] ogra: I did
[08:02] oh right
[08:02] he was around on the weekend here on IRC
[08:02] pstolowski: I wrote about this but it never got through
[08:02] pstolowski: we should not auto-remove connections from the state
[08:02] pstolowski: if those refer to implicit slots
[08:02] pstolowski: as those are just masking a bug
[08:03] mborzecki: how do I ask hexchat to talk to nickserv
[08:04] zyga: idk, isn't that automatic?
[08:04] I cannot even find the setting
[08:04] I don't want to manually authenticate each time I disconnect
[08:05] PR snapd#8955 opened: tests/lib/pkgdb: do not use quiet when purging debs
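A sketch of the failure mode 8955 targets, per the 08:01 guess that quiet was hiding the purge failure (the wrapper's shape here is an assumption, not the real helper):

    # hypothetical wrapper: all output is discarded, so a failing purge
    # leaves nothing behind in the spread logs
    quiet() {
        "$@" >/dev/null 2>&1
    }
    quiet apt purge -y snapd    # fails silently, /var/lib/snapd stays behind
    apt purge -y snapd          # unwrapped, the error at least gets logged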
[08:05] zyga: https://freenode.net/kb/answer/hexchat ?
[08:05] thanks!
[08:06] mvo: ^^ 8955
[08:06] hmm
=== zyga is now known as zyga-x240
[08:10] zyga: oh, hmm
[08:11] test
[08:11] maybe now?
[08:11] pstolowski, can you see this?
[08:11] zyga: i can see your messages
[08:12] \o/
[08:12] thanks, and sorry for not being talkative :P
[08:12] how can polari not do any error handling :P
[08:13] Polari felt kind of half finished when I last tried it
[08:13] that, plus it used a lot of memory for what it did
[08:13] removing it I noticed it pulled in the whole telepathy/empathy stack
[08:14] it uses the Telepathy IRC backend, yes.
[08:14] jamesh: maybe it's because it relied on gjs
[08:14] zyga: what auto-remove of connections do you mean? sorry i'm slow this morning and also fighting conflicts in my branch
[08:15] pstolowski, we have a piece of code that runs on startup
[08:15] pstolowski, if you have a connection in the state
[08:15] pstolowski, but the corresponding plug and slot is gone, we remove the connection from the state
[08:15] mborzecki: that's probably part of it, but probably not all of it.
[08:15] pstolowski, now let's assume that there's another bug that makes core not have any interfaces on startup
[08:15] pstolowski, (i have an idea what that bug is)
[08:15] pstolowski, when that bug happens, you permanently drop connections
[08:15] pstolowski, like we saw twice now
[08:15] pstolowski, on next boot you will get properly slotted core but the connections will be gone
[08:16] jamesh: otoh, it's a nice demo showing how you can actually build an app in javascript for the gnome desktop
[08:16] zyga: removeStaleConnections ? we only remove if snap is not installed
[08:16] exactly
[08:16] pstolowski, and when core is broken
[08:16] it's bye bye
[08:16] I bet what is going on is that on startup snapd runs before core is mounted
[08:16] I saw that in spread tests a few times
[08:16] zyga: right, that could explain it
[08:17] and I don't think there's explicit synchronization anywhere to force that
[08:17] I'll check what happens when a snap is broken
[08:17] we know about it from the state
[08:17] but it's not mounted
[08:18] if we drop connections there then that's a good indicator of what occurred
[08:20] zyga: thank you, i can help later, need to finish services PR
[08:20] sure
[08:20] * zyga returns to gimp for now
[08:25] mborzecki: looking in a bit, in a meeting right now
=== zyga-x240 is now known as zyga
[09:03] * zyga is in love with vscode
=== rawr is now known as grumble
[09:37] zyga: I had a go at porting my dbus-activation-session-legacy test to use a systemd unit to manage the private session bus. It seems to be hitting the invariant-tool checks though. Can you think of anything obviously wrong with this? https://github.com/jhenstridge/snapd/blob/dbus-activation-wrappers/tests/main/dbus-activation-session-legacy/task.yaml
[09:38] jamesh: looking
[09:39] hmmm
[09:39] My understanding is that the unit should have been killed when "systemctl stop" completes, but it looks like the process is live enough for invariant-tool to notice
[09:39] do I read it right that you set up two buses?
[09:39] eval "$(dbus-launch --sh-syntax)"
[09:39] and systemd-run --unit=private-session-bus.service \ ...
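For context, a rough sketch of the two setups being compared; the unit name is from the test, while the dbus-daemon flags and socket path here are assumptions:

    # unmanaged: dbus-launch forks a daemon that nothing ever reaps
    eval "$(dbus-launch --sh-syntax)"

    # managed: the daemon lives in its own transient unit, so
    # "systemctl stop private-session-bus.service" kills it together
    # with anything it spawned
    systemd-run --unit=private-session-bus.service \
        dbus-daemon --session --nofork --address=unix:path=/tmp/private-bus.socket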
[09:40] the systemd managed one is surely killed
[09:40] but the dbus-launch one seems to be unmanaged
[09:40] does this make sense?
[09:40] You are completely correct. I forgot to delete the code I was converting
[09:41] :D
[09:42] You happy with managing a private dbus instance through systemd-run otherwise?
[09:43] I can't really use tests.session for this, since I explicitly don't want a systemd managed dbus-daemon
[09:44] yes, totally
[09:44] I understand, I was wondering about that test myself in a review and then it clicked
[09:44] I'm happy we have the detector
[09:44] as it's one thing to easily leak
[09:45] As I understand it, this should also make sure any services spawned by the private session bus are reaped too
[09:45] indeed
[09:45] since they will be considered part of the same systemd level service
[09:45] that's right
[09:46] Just be careful with eval "$(cat dbus-launch.env)"
[09:46] as it sets the address for the test process as well
[09:46] so you may unexpectedly start using it instead of the normal session bus
[09:46] it's usually not a problem
[09:46] I want the test processes in that test to use that bus
[09:47] yeah, I know - you could move that eval so that it only affects
[09:47] test-snapd-dbus-service-client.session | MATCH hello
[09:47] ( eval / source ; run test cmd )
[09:52] there shouldn't be any other session buses available in that test anyway
[09:59] jamesh: you will get a session bus that's socket-activated via PAM when root logs in
[10:00] mvo: can you take a look at https://github.com/snapcore/snapd/pull/8930 ?
[10:00] PR #8930: many: managed boot config during run mode setup
[10:01] mvo: in other news, i think i got the spread test working now, there've been a few more changes to the gadget.yaml that needed accounting for in the test :/
[10:13] * zyga also got the test right
[10:28] brb, small break
[10:33] re
[10:40] mvo: https://github.com/snapcore/snapd/pull/8956 the last commit there
[10:40] PR #8956: tests/core/gadget-update-pc: port to UC20 <⛔ Blocked>
[10:40] PR snapd#8956 opened: tests/core/gadget-update-pc: port to UC20 <⛔ Blocked>
[10:55] xnox, hey ... seems we have a timedatectl issue on core18 ...
[10:55] ogra@pi4:~$ timedatectl | grep service
[10:55] systemd-timesyncd.service active: no
[10:55] ogra@pi4:~$ systemctl status systemd-timesyncd.service | grep Active
[10:55] Active: active (running) since Tue 2020-06-30 18:06:18 UTC; 16h ago
[10:55] ogra@pi4:~$
[10:56] not sure why timedatectl reports the service status at all there (it doesn't on any of my non-core machines)
[10:56] but it definitely reports it wrong ...
[10:56] xnox, want a bug ?
[11:06] xnox, https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1885901 is for you
[11:06] Bug #1885901: timedatectl reports wrong status for timesyncd in core18
[11:15] mvo: thank you for the review
[11:15] I need a 2nd review on https://github.com/snapcore/snapd/pull/8938
[11:15] PR #8938: sandbox/cgroup: extend SnapNameFromPid with tracking cgroup data
[11:17] just a little bit of help to move the backend branch forward
[12:01] * zyga did some improvements to snap-update-ns
[12:01] need some trivial but boring test adjustment
[12:01] but first need some food
[12:11] PR snapd#8957 opened: tests: improve nested tests flexibility
[12:11] PR snapd#8958 opened: tests: nested test improvements from master (2.45)
[12:14] zyga: 8955 has a log about dbus from the invariant tool
[12:19] mvo: did the test pass in #8591, it seemed to consistently fail setting up core 20 ?
[12:19] PR #8591: secboot,cmd/snap-bootstrap: add tpm sealing support to secboot
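Spelling out the 09:47 suggestion as a sketch (the env file and client command are the ones quoted from the test above):

    # scope DBUS_SESSION_BUS_ADDRESS to just the client invocation
    (
        eval "$(cat dbus-launch.env)"
        test-snapd-dbus-service-client.session | MATCH hello
    )
    # outside the subshell the test still talks to the normal session bus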
[12:19] mvo: sorry, I meant #8951
[12:19] PR #8951: gadget/install: move udev trigger to gadget/install
[12:21] PR snapd#8951 closed: gadget/install: move udev trigger to gadget/install
[12:21] PR snapd#8959 opened: gadget,gadget/install: refactor partition table update
[12:29] zyga-mbp: i've restarted the spread jobs in https://github.com/snapcore/snapd/pull/8909 does it need a review from pedronis/mvo?
[12:29] PR #8909: interfaces/apparmor: allow snap-specific /run/lock
[12:30] mborzecki: afaiu it was nacked by security, no?
[12:30] has that changed?
[12:30] pedronis: jdstrand gave +1
[12:31] mborzecki: ok, I see, I should give it a quick look though
[12:31] pedronis: afaiu concerns were raised in a related PR #8926
[12:31] PR #8926: Add microstack-support interface
[12:32] mborzecki: I labeled it
[12:32] mborzecki: I'll probably merge it myself if it's green and I have double checked it
[12:32] pedronis: ok, cool
[12:33] mborzecki: it's failing on arch atm though?
[12:33] pedronis: it's unrelated to the PR, store threw some 400s at some point in the mount protocol spread test
[12:34] ok
[12:56] PR snapd#8892 closed: o/snapstate,servicestate: use service-control task for service actions (9/9) <⛔ Blocked>
[12:57] Issue core18#157 opened: timedatectl reports wrong status for timesyncd in core18
[13:00] Hi, I am trying to update the FreeCAD snap and get an error during the push operation at the end of the upload. Anybody experiencing the same issue?
[13:01] The error is Bad Gateway 502 after 99% of uploading
[13:01] PR snapd#8960 opened: o/snapstate,servicestate: use service-control task for service actions (9/9)
[13:01] the snap is huge, about 570269696
[13:01] bytes
[13:02] What is the process to report an issue associated with that dashboard? https://status.snapcraft.io/
[13:51] zyga: https://github.com/snapcore/snapd/pull/8938 this one?
[13:51] PR #8938: sandbox/cgroup: extend SnapNameFromPid with tracking cgroup data
[13:59] pedronis: do you want to review 8946 yourself or do you think a peek at it is sufficient?
[14:03] PR snapcraft#3193 closed: extensions: plug the opengl interface for GNOME
[14:28] re
[14:28] sorry, had to break to rest
[14:28] mborzecki: yes, that one, it has +2 now
[14:28] ah ok
[14:29] * zyga hates the moment when the drugs wear off and don't kick back in again
[14:31] PR snapd#8938 closed: sandbox/cgroup: extend SnapNameFromPid with tracking cgroup data
[14:47] * zyga starts to feel okay
[14:52] zyga: great to hear!
[14:54] zyga: can you take a look at #8932 again?
[14:54] PR #8932: o/ifacestate: update security profiles in connect undo handler
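As an aside on the 13:01 upload, the reported byte count works out like this (numfmt is from GNU coreutils):

    $ numfmt --to=iec 570269696
    544M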
[14:54] sure
[14:54] +1
[14:57] ty
[15:06] PR snapd#8948 closed: cmd/snap-update-ns: detect unmounted mounts
[15:11] jdstrand: when you have a moment, asked a question to unblock: https://github.com/snapcore/snapd/pull/8870#discussion_r448431814
[15:11] PR #8870: interfaces: add gconf interface
[15:12] mborzecki: re-reviewed #8930, small comment
[15:12] PR #8930: many: managed boot config during run mode setup
[15:12] this needs 2nd reviews ^
[15:12] * cachio -> doctor appointment
[15:14] #8909 just needs to get green
[15:14] PR #8909: interfaces/apparmor: allow snap-specific /run/lock
[15:17] woot :)
[15:23] pstolowski: I updated #8853
[15:23] PR #8853: asserts: introduce the concept of sequence-forming assertion types
[15:24] ty, +1
[15:41] PR snapd#8961 opened: cmd/snap-update-ns: handle anomalies better
[15:42] is Jamie Strandboge in this channel?
[15:42] or Alex Murray
=== alvesadrian is now known as adrian-gluu
[15:52] adrian-gluu: jdstrand and amurray
[15:52] @zyga thanks as usual
[15:56] @zyga also can you help me with this one https://dashboard.snapcraft.io/snaps/gluu-server/revisions/1/feedback/
[15:57] I don't have permissions for that, sorry
[16:06] * zyga reviews assertion sequences
[16:42] zyga: fyi, I answered in the forum. their latest revision passed automated review and they just need to push to a channel
[16:43] jdstrand: thank you for the note
[16:43] np
=== alvesadrian is now known as adrian-gluu
[17:00] adrian-gluu: hey
[17:00] hey
[17:00] The Snap Store encountered an error while processing your request: bad gateway (code 502). [============== ] 99%
[17:00] The operational status of the Snap Store can be checked at https://status.snapcraft.io/
[17:00] noise][, nessita: hey, adrian-gluu is having trouble publishing his revision 3 of https://dashboard.snapcraft.io/snaps/gluu-server to stable ^
[17:03] noise][, nessita: https://status.snapcraft.io/ is all green
[17:04] i will rebuild the snap just in case
[17:04] because the first error that it got was this one
[17:05] Error while processing...
[17:05] The store was unable to accept this snap.
[17:05] - binary_sha3_384: A file with this exact same content has already been uploaded
[17:05] that will happen if you try to snapcraft push the same snap, but it won't affect the snap that is already there
[17:06] adrian-gluu: are you using 'snapcraft release' or the web interface?
[17:06] so is it sorted?
[17:06] web interface?
[17:06] adrian-gluu: the 502 seems like a store-side issue, not something you did
[17:06] i know
[17:08] adrian-gluu: I'm sorry, are you saying you used 'snapcraft release' or the web interface (eg, https://snapcraft.io/gluu-server/releases)
[17:08] am using the cli, snapcraft --release
[17:09] https://snapcraft.io/docs/releasing-your-app
[17:09] snapcraft upload --release=stable
[17:09] @jdstrand ^^^^
[17:10] adrian-gluu: ah, ok, well, you already uploaded it so you can't again. what you want to do is: snapcraft release gluu-server 3 stable
[17:11] degville: there might be a documentation improvement opportunity on https://snapcraft.io/docs/releasing-your-app. that page doesn't mention 'snapcraft release', which might be needed in certain circumstances
[17:11] @jdstrand from cli? "snapcraft release gluu-server 3 stable"
[17:12] jdstrand: thanks! I'll look into it.
[17:12] adrian-gluu: yes, eg, wherever you ran your snapcraft upload command, run that one instead
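The two commands from the exchange above, side by side (the .snap filename is hypothetical):

    # upload a new revision and release it to a channel in one step
    snapcraft upload --release=stable gluu-server_4.1.1_amd64.snap
    # release an already-uploaded revision to a channel, no re-upload needed
    snapcraft release gluu-server 3 stable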
[17:12] ok
[17:12] thanks
[17:12] i'll try that
[17:14] degville: in this case, what happened was the first upload failed automated review, a manual review was requested, then other revisions were uploaded, but they were queued behind the manual review. when the revision that ultimately passed automated review went through review, it didn't get published to a channel (that might be a bug that it lost track of the --stable in upload --stable)
[17:15] degville: so the only way out would be snapcraft release (or upload a new revision). not that you have to go into all that in the docs, but that is the context
[17:15] hence 'certain circumstances' :)
[17:16] adrian-gluu: I see now that it is published. yay!
[17:16] YAY!
[17:16] is it in the store now?
[17:16] noise][, nessita: your help is not needed for this, but you may want to investigate on your end the 502
[17:16] adrian-gluu: yes
[17:16] $ snap find gluu-server
[17:16] Name         Version  Publisher  Notes  Summary
[17:16] gluu-server  4.1.1    mike-gluu  -      Gluu Server 4.1.1
[17:17] jdstrand: that makes complete sense, thanks for the explanation. I'll add a similar example because I think pushing a revision rather than waiting for a manual review is common.
[17:18] degville: cool, thanks! :)
[17:18] np!
[17:18] thanks guys
[17:19] you're welcome
=== alvesadrian is now known as adrian-gluu
[17:44] pedronis: thanks for the PR feedback in my PRs. I'm going through all of it now
[18:38] FYI, something fork-bombed our spread runner
[18:38] I'm recovering it now
[18:39] we should look, I think that is our fault actually
[18:39] I will send logs
[18:51] * cachio -> kinesiologist
[18:52] PR snapd#8962 opened: tests: allow to add a new label to run nested tests as part of PR validation
[19:20] everything is operational now and we are burning through the backlog, early analysis seems to indicate that we just ran over capacity, with individual worker nodes using too much memory all in one go
[19:20] I've established memory limits on each container now
[19:21] we may also consider reducing the number of workers a little so we never run over maximum per container * number of containers
[19:21] I will also try to set a global limit on all containers together, so they never eat all the memory
[19:22] * zyga-x240 EODs
[19:22] I will check back from time to time to see if anything breaks
[19:22] I also enabled backup capacity to drain the queue faster
[19:22] so we are now at 64 workers
[20:13] PR snapcraft#3195 opened: extensions: introduce flutter-master
[20:22] zyga-mbp: hey o/
[20:22] zyga-mbp: you recommended a rpi case / cooling solution a while back that worked really well, do you have a link to that ?
[20:22] istr it was passive cooling
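A sketch of the per-container caps mentioned at 19:20, assuming the workers are LXD containers (names and sizes are made up):

    # cap one worker; total capacity should satisfy
    #   limit-per-container * number-of-containers <= host RAM
    lxc config set spread-worker-01 limits.memory 4GiB
    # a blanket cap could go on whatever systemd slice the
    # containers share (cgroup v2 shown here)
    systemctl set-property --runtime machine.slice MemoryMax=48G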