mborzecki | morning | 05:54 |
---|---|---|
mardy | 'morning mborzecki | 06:13 |
zyga-mbp | good morning | 06:30 |
mborzecki | mardy: zyga-mbp: hey | 06:32 |
mardy | zyga-mbp: hi! | 06:41 |
pstolowski | morning | 07:02 |
zyga-mbp | hej pstolowski | 07:02 |
mardy | hi pstolowski | 07:33 |
pstolowski | hey mardy | 07:35 |
pstolowski | good morning mvo ! | 07:35 |
mvo | good morning pstolowski | 07:36 |
zyga-mbp | good morning mvo | 07:41 |
zyga-mbp | long time no see | 07:41 |
mvo | zyga-mbp: hey, indeed! nice to see you | 07:42 |
* zyga-mbp I was busy on my windows system lately, doing some fun and exciting stuff :) | 07:42 | |
zyga-mbp | it's about deploying Raspberry Pi 4 as a LAVA dispatcher for testing zephyr boards | 07:43 |
* zyga-mbp did his first cloud-init based nested setup | 07:43 | |
zyga-mbp | there's ubuntu, snaps and lava in the mix | 07:43 |
zyga-mbp | what's not to like :) | 07:43 |
zyga-mbp | oh and landscape too | 07:44 |
* zyga-mbp mvo check out my detailed instructions and the photos of the assembled set https://git.ostc-eu.org/OSTC/infrastructure/rpi4-metal-setup :) | 07:45 | |
mvo | zyga-mbp: oh, nice! | 07:46 |
zyga-mbp | I will be setting up a few of those tomorrow, should fill one shelf in the rack | 07:47 |
zyga-mbp | and finally some zephyr :) | 07:47 |
mup | PR snapd#10866 closed: many: replace state.State restart support with overlord/restart <Created by pedronis> <Merged by pedronis> <https://github.com/snapcore/snapd/pull/10866> | 08:28 |
dn___ | I have a desperate problem with snap (I think). I see ` Switch "lxd" snap to cohort "+"` in snap changes and this kills my lxd cluster; I then have to remove lxd, install lxd and restore from a savepoint; | 09:07 |
dn___ | What does the `cohort '+'` mean? And how can I stop it? | 09:08 |
dn___ | I see e.g. | 09:19 |
dn___ | 119 Done today at 03:34 UTC today at 03:34 UTC Switch "lxd" snap to cohort "+" | 09:20 |
dn___ | several times in the log after each other, at some point it stops and lxd is broken and I need to remove/install it again to make it work; | 09:20 |
pstolowski | dn___: that "+" looks very weird, i cannot see anything obvious in the code (yet). you haven't played with cohorts before, have you? you can try snap switch --leave-cohort lxd | 09:31 |
dn___ | pstolowski: never - what does leaving mean? (I don't want to wreak havoc in the cluster again) | 09:32 |
dn___ | I've been kinda struggling with snap/lxd for a while; most of the time it works fine - sometimes it starts upgrading, but not all nodes update at the same moment and then the cluster breaks - but doesn't shut down; this cohort thingie shut down lxd and all VMs | 09:32 |
pstolowski | dn___: see https://forum.snapcraft.io/t/managing-cohorts/8995 ; leaving a cohort basically means that lxd snap would not be constrained by the given cohort when refreshing. what does 'snap info lxd' show - is there a cohort listed? | 09:48 |
dn___ | pstolowski: I ran the command just now - but let me check if I have an old output | 09:49 |
dn___ | http://pastie.org/p/2Dc7yuneXwLZFEpe0nUfjz this was before I ran it - and http://pastie.org/p/31hdtXvVAag2PeUp9ORLJh after | 09:53 |
mborzecki | mvo: can you land https://github.com/snapcore/snapd/pull/10882 ? | 09:54 |
mup | PR #10882: tests/main/interfaces-many: run both variants on all possible Ubuntu systems <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/10882> | 09:54 |
mborzecki | mvo: and this one too: https://github.com/snapcore/snapd/pull/10884 | 09:56 |
mup | PR #10884: tests/main: disable cgroup-devices-v1 and freezer tests on 21.10 <Simple 😃> <Created by bboozzoo> <https://github.com/snapcore/snapd/pull/10884> | 09:56 |
pstolowski | dn___: that's very weird, there is no trace of cohort in the first output. something tried to switch cohort for lxd snap but I've no idea what. do you use any kind of orchestration for your clusters - something that could automatically do something like this (e.g. snap switch --cohort ... or snap refresh --cohort=... lxd)? | 10:00 |
dn___ | pstolowski: hm - the servers are bare metal, set up via ansible - but that was like 1y ago :) otherwise I only use the lxc create/destroy via API, nothing else/nothing fancy | 10:04 |
dn___ | pstolowski: the only other thingie I got as piece of info: it seems like the cluster goes out of sync from time to time (lxd) - and this seems to be related to snap updating/refreshing nodes | 10:05 |
dn___ | pstolowski: http://pastie.org/p/0DzGQNrXA1mdwKbjWQRjIY if that is any clue/help - the remove/install and restore is me manually fixing the node | 10:06 |
pstolowski | dn___: it's a bit mysterious what happened, the "+" isn't even a valid cohort id afaik. | 10:08 |
pstolowski | dn___: also, "Switch "lxd" snap to cohort ..." task can only be a result of manual invocation of "snap switch.." command (or call to snapd's rest api) | 10:09 |
dn___ | oh, I ran it manually on all instances - but it happened again :/ | 10:10 |
dn___ | cluster down - what a joy; anything you want me to check before I do remove lxd, install lxd?:) I'm so glad for any idea | 10:10 |
dn___ | s | 10:10 |
pstolowski | dn___: is lxd in cohort "+" on these other instances? | 10:10 |
dn___ | http://pastie.org/p/2TrdClykJi6sQUGCQrN4Zw | 10:11 |
pstolowski | dn___: oh it got switched to "+" again? | 10:11 |
dn___ | pstolowski: I'm not 100% sure how to check; but all throw the same error before they kill LXD in a bad way | 10:11 |
dn___ | yeap :/ | 10:11 |
dn___ | also -> this then leads to `-bash: /snap/bin/lxc: No such file or directory` | 10:12 |
dn___ | If I now stop lxd; remove it, install it, restore from saved, start it - it will be fine again | 10:12 |
dn___ | also removing is buggy then, too, and I need to do it manually | 10:13 |
dn___ | http://pastie.org/p/0CGlgR34Fn0onbyoEJ1EvD | 10:13 |
dn___ | ls -al /var/snap/lxd/21497/ | 10:14 |
dn___ | drwxr-xr-x 2 root root 4096 Oct 6 09:52 . | 10:14 |
dn___ | drwxr-xr-x 3 root root 4096 Oct 6 10:13 .. | 10:14 |
dn___ | it keeps an empty dir | 10:14 |
dn___ | also -> `lxd 21624 latest/stable canonical✓ disabled,broken,in-cohort` | 10:14 |
dn___ | while broken | 10:15 |
pstolowski | dn___: so I think the magical "Switch "lxd" snap to cohort "+"" is done by something else outside of our control (it's not snapd) | 10:15 |
pstolowski | and this breaks everything | 10:15 |
ogra | ogra@anubis:~$ grep -r cohort /snap/lxd/current/* | 10:17 |
pstolowski | we shouldn't fall over this for sure so this looks like a problem | 10:17 |
dn___ | hmm, any idea what it could be? It's a bare metal machine and nothing besides LXD/snap runs on it/is installed + the VMs | 10:17 |
ogra | /snap/lxd/current/commands/daemon.start: nsenter -t 1 -m snap switch lxd --cohort=+ >/dev/null || true | 10:17 |
ogra | pstolowski, ^^^ | 10:17 |
pstolowski | ouch | 10:17 |
dn___ | right after install snap list shows `lxd 4.19 21654 4.19/candidate canonical✓ -` | 10:17 |
ogra | yes | 10:17 |
dn___ | but after a moment it shows | 10:17 |
dn___ | `lxd 4.19 21654 4.19/candidate canonical✓ in-cohort` | 10:17 |
mup | PR snapd#10874 closed: gadget: mv ensureLayoutCompatibility to gadget proper, add gadgettest pkg <Created by anonymouse64> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/10874> | 10:18 |
mup | PR snapd#10882 closed: tests/main/interfaces-many: run both variants on all possible Ubuntu systems <Created by bboozzoo> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/10882> | 10:18 |
mup | PR snapd#10884 closed: tests/main: disable cgroup-devices-v1 and freezer tests on 21.10 <Simple 😃> <Created by bboozzoo> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/10884> | 10:18 |
mup | PR snapd#10892 opened: tests: add (strict) microk8s smoke test <Created by mvo5> <https://github.com/snapcore/snapd/pull/10892> | 10:18 |
pstolowski | ok looking at the comment there "+" means something special | 10:19 |
dn___ | thank you very much; will try to get the node up again :/ | 10:19 |
dn___ | does it maybe help if I try to downgrade the cluster from 4.19/candidate to 4.18/stable - and is that even possible?:) | 10:21 |
ogra | uh, yo are running production from a candidate channel ? | 10:22 |
ogra | *you | 10:22 |
pstolowski | dn___: might be worth filing a bug against snapd+lxd with all the details (and snap changes list) if it is reproducible | 10:22 |
dn___ | @o | 10:22 |
dn___ | @o | 10:22 |
dn___ | ogra: sorry for spam / not on purpose... I think | 10:22 |
dn___ | I might have wanted 4.X when I did the setup and it was just a candidate - and forgot about it :/ | 10:22 |
dn___ | Is there a way of downgrading it to 4.18/stable? | 10:23 |
pstolowski | snap refresh --stable lxd | 10:23 |
dn___ | I would need this on all nodes + is there any issue e.g. with sqlite/migration/something? (just wondering - never did that) | 10:24 |
pstolowski | dn___: i don't know, may make sense to ask this on snapcraft.io forums, or maybe stgraber ^ can advise if he is online | 10:26 |
ogra | and you should probably file a bug about the cohort thing so it does not move from candidate to stable later ... | 10:27 |
dn___ | thank you - do you know by chance if all 'config/settings/sqlite' are stored in the /var/snap/lxd folder? If so I would stop it, backup, reinstall stable and just try | 10:27 |
dn___ | ogra: got a url to a tracker/where to file it? | 10:27 |
ogra | ogra@anubis:~$ snap info lxd|grep contact | 10:28 |
ogra | contact: https://github.com/lxc/lxd/issues | 10:28 |
ogra | try there 🙂 | 10:28 |
dn___ | thank you | 10:28 |
dn___ | I still don't really understand what is failing/failover to - is there anything I can google/check? | 10:31 |
vidal72[m] | for how long old snap versions are available in snap store? is there time limit or version limit? | 10:32 |
ogra | vidal72[m], that differs between user and developer/uploader | 10:33 |
ogra | as developer you have access to all revisions ever uploaded ... | 10:34 |
vidal72[m] | and as user? | 10:34 |
ogra | only what the developer released to a track or channel | 10:35 |
vidal72[m] | all revisions ever uploaded to track/channel? | 10:37 |
ogra | no, only the ones the developer released | 10:38 |
ogra | i.e. the current ones | 10:38 |
vidal72[m] | you mean only the latest one? For sure it's not the case, I can download older versions. My question is if there is some cleanup happening over time or can I download those old revisions infinitely? | 10:44 |
pstolowski | mardy: hey, can you take a look at https://github.com/snapcore/snapd/pull/10824 again? | 10:44 |
mup | PR #10824: tests: check that a snap that doesn't have gate-auto-refresh hook can call --proceed <Refresh control> <Created by stolowski> <https://github.com/snapcore/snapd/pull/10824> | 10:44 |
pstolowski | would love to land that one | 10:45 |
vidal72[m] | ogra: is my understanding correct that if uploader has access to old revisions forever then everyone else can have access too if they know the link to old revision? | 10:47 |
dn___ | I found one more odd thing, but maybe I'm reading it wrong: `lxd 4.19 21624 latest/stable canonical✓ -` - reads to me like 4.19 is in latest/stable; but snap info lxd shows ` latest/stable: 4.18 2021-09-13 (21497) 75MB -` - am I just reading it wrong? | 10:51 |
ogra | vidal72[m], i dont think so, but a store person would have to answer that ... technically a user should not have any access to not currently released versions | 10:53 |
vidal72[m] | a hidden feature then? :) | 10:58 |
ogra | rather a glaring bug 🙂 | 10:58 |
vidal72[m] | if that's the case then please forget this conversation ;) | 10:59 |
ogra | haha | 10:59 |
mup | PR snapd#10886 closed: o/snapstate: test prereq update if started by old version <Simple 😃> <Skip spread> <Created by MiguelPires> <Merged by bboozzoo> <https://github.com/snapcore/snapd/pull/10886> | 11:13 |
dn___ | without any change from me -> http://pastie.org/p/1AhUlvB7bEZyBwMo8ycAZT | 11:32 |
pstolowski | mvo: could you please land https://github.com/snapcore/snapd/pull/10868 ? | 11:41 |
mup | PR #10868: o/snapstate: support ignore-validation flag when updating to a specific snap revision <validation-sets :white_check_mark:> <Created by stolowski> <https://github.com/snapcore/snapd/pull/10868> | 11:41 |
mardy | pstolowski: +1 | 11:46 |
pstolowski | thx | 11:47 |
pstolowski | dn___: yeah i see this cohort-switching is also present in older versions of lxd (i checked 4.18). can you paste 'snap changes' again? I think it may be best to take this to the forum to also have input from lxd developers | 11:50 |
dn___ | pstolowski: http://pastie.org/p/3o6pg2XAXSXrUHEmlKrbAy is there any better pastie service these days? it seems to have only a ttl of 24h | 12:06 |
pstolowski | dn___: pastebin.ubuntu.com ftw! | 12:07 |
dn___ | needs an account :) | 12:08 |
pstolowski | dn___: just to double-check: right after you notice `Switch "lxd" snap to cohort "+"` task, the lxd snap appears broken right? | 12:09 |
pstolowski | hmm right it needs an account | 12:09 |
dn___ | pstolowski: yes, but I can't say if it happens after the first switch or the last - because the switches happen so fast | 12:10 |
dn___ | pstolowski: https://pastebin.ubuntu.com/p/cdzRymdnhz/ | 12:10 |
pstolowski | dn___: thanks | 12:10 |
dn___ | pstolowski: thank you! Line: 123/124 - it happens quickly and after that it was broken | 12:11 |
pstolowski | dn___: silly question, do you know why there are multiple switches, is this related to the cluster configuration? | 12:13 |
pstolowski | hmm maybe the daemon start simply fails and is retried | 12:13 |
pstolowski | dn___: one more thing, could you paste the output of systemctl status snap.lxd.daemon.service after it appears broken (and before you remove/reinstall it)? | 12:15 |
dn___ | sorry, lost my connection ... super day :) | 12:36 |
dn___ | pstolowski: regarding multiple switches: no idea, I never knew about cohort before I think or noticed - it also happens 'randomly' - was | 12:36 |
dn___ | will try to keep an eye on the status & check systemctl when it happens | 12:36 |
pstolowski | dn___: ok. i think it's just the daemon failing to start and getting re-tried by systemd a couple of times | 12:42 |
mup | PR snapd#10868 closed: o/snapstate: support ignore-validation flag when updating to a specific snap revision <validation-sets :white_check_mark:> <Created by stolowski> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/10868> | 12:49 |
dn___ | pstolowski: do older log entries help you? I can try to check them from when it happened - let me try to find them | 12:56 |
pstolowski | dn___: yes, they might. But it would be best if you collected all this under a bug report | 12:57 |
dn___ | you are right, will try | 12:58 |
pstolowski | dn___: thanks. someone will probably look at this soon, but i'm not sure if it will be me | 13:01 |
dn___ | I'll start collecting - https://pastebin.ubuntu.com/p/f7kP4YJrbg/ and make an issue | 13:02 |
dn___ | that's when the last 'switcheroo' happened | 13:02 |
dn___ | pstolowski: happened again -> https://pastebin.ubuntu.com/p/nxysmpc4FT/ (will do issue, still fighting) | 13:45 |
dn___ | pstolowski: just fyi - this is my full journey & output how I fix it when it happens: https://pastebin.ubuntu.com/p/bHG6cvSqb2/ | 13:50 |
pstolowski | dn___: you can stop doing "snap switch --leave-cohort lxd" because lxd wants to be in this cohort and will just keep switching to it anyway | 13:51 |
dn___ | oki, I see - would have been too easy ;-) | 13:51 |
pstolowski | (I initially thought this was a wrong cohort coming from somewhere) | 13:51 |
dn___ | will try to debug/keep the cluster going & write the issue - also glad for any other idea;-) | 13:54 |
pstolowski | dn___: input from lxd guys may help, therefore forum.snapcraft.io may be a good place | 13:57 |
pstolowski | to discuss | 13:57 |
stgraber | dn___, ogra, pstolowski: the + cohort is a special cohort that all LXD cluster users must be in | 14:01 |
stgraber | if some servers aren't in it, they'll get different LXD releases during phased rollout, breaking the cluster | 14:01 |
dn___ | oki, thank you will try (forum/and trying to figure out if same cohort) | 14:04 |
pstolowski | mvo: can you also land https://github.com/snapcore/snapd/pull/10824 ? | 14:14 |
mup | PR #10824: tests: check that a snap that doesn't have gate-auto-refresh hook can call --proceed <Refresh control> <Created by stolowski> <https://github.com/snapcore/snapd/pull/10824> | 14:14 |
mvo | pstolowski: sure | 14:17 |
pstolowski | ty | 14:18 |
mup | PR snapd#10824 closed: tests: check that a snap that doesn't have gate-auto-refresh hook can call --proceed <Refresh control> <Created by stolowski> <Merged by mvo5> <https://github.com/snapcore/snapd/pull/10824> | 14:19 |
mup | PR core20#116 opened: hooks: adjtime: add adjtime file to etc <Created by stulluk> <https://github.com/snapcore/core20/pull/116> | 14:30 |
ijohnson[m] | degville: can you publish https://forum.snapcraft.io/t/quota-groups/25553 on the snapcraft.io/docs site? | 14:58 |
mup | PR snapd#10893 opened: i/builtin/kubernetes_support: add access to Calico lock file <Created by mardy> <https://github.com/snapcore/snapd/pull/10893> | 15:04 |
=== graham1 is now known as degville | ||
degville | ijohnson[m]: sorry for the delay. That doc can be found at https://snapcraft.io/docs/quota-groups, but I'll also add it to the navigation so it's more discoverable. | 15:48 |
ijohnson[m] | ah perfect, I couldn't find it via searching | 15:48 |
ijohnson[m] | thanks degville | 15:48 |
degville | ijohnson[m]: good point about the search. That sounds like a bug. | 15:49 |
pstolowski | mvo: a bit of a confusion: https://github.com/snapcore/snapd/pull/10894 | 16:14 |
mup | PR #10894: [RFC] o/configcore: allow hostnames up to 253 characters, with dot-delimited elements <Created by stolowski> <https://github.com/snapcore/snapd/pull/10894> | 16:14 |
mup | PR snapd#10894 opened: [RFC] o/configcore: allow hostnames up to 253 characters, with dot-delimited elements <Created by stolowski> <https://github.com/snapcore/snapd/pull/10894> | 16:14 |
pstolowski | also ijohnson[m] ^ | 16:20 |
mup | PR snapd#10895 opened: many: wait for up to 10min for NTP syncronization before autorefresh <Created by mvo5> <https://github.com/snapcore/snapd/pull/10895> | 16:34 |
dn___ | Oct 6 18:54:07 n02 snap-failure[331032]: retry.go:49: DEBUG: Retrying https://api.snapcraft.io/api/v1/snaps/names?confinement=strict%2Cclassic, attempt 1, elapsed time=5.781µs | 18:57 |
dn___ | Oct 6 18:54:07 n02 snap-failure[331032]: retry.go:184: DEBUG: Not retrying: &errors.errorString{s:"too many requests"} | 18:57 |
dn___ | does it mean snap makes too many requests to the API? (I only have 10 instances, but all use the same external ip) | 18:57 |
mup | PR snapcraft#3588 opened: snap: patch patchelf on riscv64 (CRAFT-566) <Created by sergiusens> <https://github.com/snapcore/snapcraft/pull/3588> | 21:15 |
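
The cohort discussion above comes down to one check: every LXD cluster node should be in a cohort, on the same revision, and not broken. A minimal Python sketch of that check follows, working on `snap list` rows like the ones dn___ pasted. It is only an illustration under stated assumptions: the column layout is inferred from the rows quoted in this log (broken snaps appear there without a version column), the helper names `parse_snap_list_row` and `check_cluster` are made up for this example, and the node names are hypothetical. Note that the `in-cohort` note does not say which cohort a snap is in, so this confirms membership only, not that all nodes share the same cohort.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class SnapRow(NamedTuple):
    name: str
    version: str  # empty when the row has no version column (seen for a broken snap)
    rev: str
    tracking: str
    publisher: str
    notes: FrozenSet[str]


def parse_snap_list_row(line: str) -> SnapRow:
    """Parse one data row in the layout of the `snap list` rows quoted in the log."""
    fields = line.split()
    if len(fields) == 6:
        name, version, rev, tracking, publisher, notes = fields
    elif len(fields) == 5:
        # Assumption: a 5-column row is a broken snap with the version missing,
        # as in "lxd 21624 latest/stable canonical✓ disabled,broken,in-cohort".
        name, rev, tracking, publisher, notes = fields
        version = ""
    else:
        raise ValueError(f"unexpected snap list row: {line!r}")
    note_set = frozenset() if notes == "-" else frozenset(notes.split(","))
    return SnapRow(name, version, rev, tracking, publisher, note_set)


def check_cluster(rows_by_node: Dict[str, str]) -> List[str]:
    """Return human-readable problems for a mapping of node name -> lxd row."""
    rows = {node: parse_snap_list_row(line) for node, line in rows_by_node.items()}
    problems = []
    for node, row in sorted(rows.items()):
        if "broken" in row.notes:
            problems.append(f"{node}: lxd is broken")
        if "in-cohort" not in row.notes:
            problems.append(f"{node}: lxd is not in a cohort")
    revisions = sorted({row.rev for row in rows.values()})
    if len(revisions) > 1:
        problems.append("nodes run different revisions: " + ", ".join(revisions))
    return problems


if __name__ == "__main__":
    # Node names are hypothetical; the rows are the ones pasted in the log.
    sample = {
        "n01": "lxd 4.19 21654 4.19/candidate canonical✓ in-cohort",
        "n02": "lxd 21624 latest/stable canonical✓ disabled,broken,in-cohort",
    }
    for problem in check_cluster(sample):
        print(problem)
```

The rows would have to be collected from each node separately (for example by running `snap list lxd` on each one); how that is done is left out here because it depends on the setup.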