wallyworld | kelvinliu: babbageclunk: forgot to check, it shouldn't be too hard to support tag constraints in vsphere i hope. main thing is to be able to query the nodes for what tags they have i think? | 00:52 |
---|---|---|
wallyworld | ie as per the discourse discussion | 00:53 |
kelvinliu | tags is marked as validator.RegisterUnsupported(unsupportedConstraints) , i m not sure why we decided to not support it | 00:58 |
babbageclunk | wallyworld: just reading about tag handling in the vsphere api, I haven't seen it before | 00:59 |
wallyworld | kelvinliu: i'm guess because we didn't yet implelent the api calls to ask vsphere about tags | 01:05 |
wallyworld | i reckon we could do something (he says all handwavy) | 01:05 |
kelvinliu | currently, we just fetch all instances then filter tags in metadata field from client side, https://github.com/juju/juju/blob/develop/provider/vsphere/environ_instance.go#L88 | 01:05 |
wallyworld | so we could do that server side too and support tag placement | 01:06 |
wallyworld | hpidcock: in the for loop to handle the process cancel/kill/exit, i think we'd want to cap the number of times we retry kill without getting notification of process exit? with a suitable user-facing message surfaced | 01:26 |
wallyworld | kelvinliu: tlm: hpidcock: i have a stateful set issue if any of you guys have time for a HO in standup | 03:49 |
tlm | ok | 03:49 |
kelvinliu | yep | 03:50 |
kelvinliu | Updates to the volume claim template are not currently permitted. A feature request to permit this is open at #69041 | 04:27 |
kelvinliu | wallyworld: tlm https://github.com/kubernetes/kubernetes/issues/85955 | 04:28 |
=== wgrant is now known as wgrant_ | ||
=== wgrant_ is now known as wgrant | ||
skay | what do I do when my unit thinks there is a relation when there isn't? https://paste.ubuntu.com/p/w5b8Hw82tZ/ | 14:39 |
rick_h | skay: um don't know? and it skipped it so all good? :) | 14:40 |
skay | rick_h: no, juju status says that the agent is in error | 14:40 |
skay | I'm trying to figure out why and that's a suspicious thing in the logs | 14:40 |
skay | I restarted the service for that unit and the service for the machine, btw | 14:41 |
rick_h | skay: oh hmmm, what does juju status --relations show? and are there > 1 units (peer relation?) | 14:41 |
skay | rick_h: there's only 1 postgresql unit. https://paste.ubuntu.com/p/t5kBzghkg2/ | 14:42 |
skay | I did recently remove a relation I no longer needed. Previously there was pgbouncer. I removed it and connected things to postgresql directly | 14:43 |
skay | and since this is my test environment, I take down units that are connected to it willy-nilly and then spin up new units. sometimes apps | 14:44 |
skay | I like how this unit is on machine 101. it is juju failed 101. I should learn some lessons from this | 14:45 |
skay | if I had more ranks in dadjoke I would be able to make a good-worse joke than that | 14:47 |
skay | rick_h: do you have any troubleshooting tips for this? I am at a loss. | 14:54 |
rick_h | skay: sorry, getting pulled in a few directions atm and we've got a bunch of folks out of the office today | 14:56 |
rick_h | skay: no, I mean I would mark it --resolved and try to see if you can get past it | 14:56 |
rick_h | I'm not sure what is up with "skipping" but then an error | 14:56 |
skay | rick_h: I tried marking it as resolved. I can ask again later when things are less hectic | 14:57 |
skay | it's not in an error state, it's 'active' and 'failed' | 14:57 |
skay | I just noticed the yaml status has a better message. 'message: resolver loop error' | 14:58 |
rick_h | skay: oh sorry, I thought you mentioned it was in an agent error | 14:58 |
rick_h | skay: hmmm, can you paste more of the unit log on there then please? | 14:58 |
skay | brb standup | 15:00 |
rick_h | k | 15:00 |
skay | (the only thing I see in the log is the thing I pasted. I'm tailing it. I'll restart it to see if it has different output after) | 15:00 |
rick_h | skay: ok | 15:06 |
skay | rick_h: I've been tailing the postgres unit's log for a while now and those two lines are the only thing that show up. | 15:21 |
rick_h | achilleasa: do you have any ideas around this agent error skay is seeing? https://paste.ubuntu.com/p/w5b8Hw82tZ/ | 15:22 |
skay | achilleasa: here's a snippet from the status. the juju-status message is 'resolver loop error' https://paste.ubuntu.com/p/GxVq58pH3z/ | 15:26 |
achilleasa | skay: looking | 15:43 |
achilleasa | skay: which juju version are you using on the controller? | 15:47 |
skay | achilleasa: 2.6.10 | 15:47 |
skay | they will be upgrading the controller soon | 15:49 |
achilleasa | skay rick_h: so there are a couple of places in (https://github.com/juju/juju/blob/2.6/worker/uniter/relation/relations.go) where this error is raised but there is not enough context to figure out which one is it (best guess is L383 or L400) | 16:02 |
achilleasa | maybe you could try remove-unit --force to get rid of the stuck unit and spin up a new one? | 16:03 |
skay | achilleasa: ouch. that's my postgresql unit. it's not extremely painful since i don't care about the database in this environment, but if it happens in a real environment it would be painful | 16:06 |
achilleasa | skay: can you share a mongo dump with me? maybe I can track down which relation name is associated with the 303 ID | 16:07 |
skay | achilleasa: I do not have access to the controller. would I need that? if I don't, then are there docs on how to get a dump? | 16:11 |
achilleasa | skay: you might be able to use "juju create-backup" (see https://jaas.ai/docs/controller-backups) | 16:15 |
hml | achilleasa: i’ve reviewed 11255 and added comments. still have qa to do | 16:41 |
hml | achilleasa: one is more of an observation and question, rather than a request to change as it’s a set pattern in the code there. :-/ | 16:42 |
achilleasa | hml: I keep messing up the import stanzas... :-( | 16:43 |
hml | achilleasa: doesn’t help that the static analysis job isn’t working correctly for imports either. | 16:43 |
hml | mine get minimized, so i don’t see them until i push the code up to GH | 16:44 |
achilleasa | I will clean up the commits and force-push the right version | 16:44 |
achilleasa | hml: I think I fixed the stanza issues; can you take another look? | 17:00 |
hml | achilleasa: sure | 17:04 |
hml | achilleasa: the qa isn’t working for me. ho? | 17:15 |
achilleasa | hml: omw | 17:20 |
hml | achilleasa: https://pastebin.canonical.com/p/kS82HDydk7/ model errors | 17:24 |
hml | achilleasa: https://pastebin.canonical.com/p/BQxgqD7zqC/ controller errors | 17:25 |
gnuoy | I'm after some advice if anyone has a sec. I have a charm with a hook erroring with: | 18:03 |
gnuoy | 2020-02-27 17:49:12 ERROR juju.worker.uniter.operation runhook.go:132 hook "ceph-client-relation-changed" failed: could not write settings from "ceph-client-relation-changed" to relation 0: permission denied | 18:03 |
gnuoy | if I resolve the hook it works | 18:03 |
gnuoy | sorry, I meant: | 18:03 |
gnuoy | if I resolve the hook using a debug-hook session the error goes away | 18:03 |
rick_h | gnuoy: ooh, I think achilleasa just fixed this one | 18:03 |
gnuoy | if I do it without a debug-hook session it persists | 18:03 |
rick_h | gnuoy: single unit? | 18:04 |
gnuoy | always the non-leader of a two unit deploy in my case | 18:04 |
rick_h | gnuoy: hmmm yea non-leaders can't write leader data. Sounds like a charm logic problem then | 18:04 |
gnuoy | right, but why does the debug-hook session make a difference ? | 18:05 |
rick_h | gnuoy: the charm should be checking if it's the leader before trying to write the data | 18:05 |
gnuoy | yep, it should, and I believe it is | 18:05 |
rick_h | gnuoy: ok so the question is why does it not do it with debug-hooks? | 18:06 |
gnuoy | yes. Full disclosure: I'm using the operator framework and the bug could lie in there. but it's hard to track down when using debug-hooks seems to make it go away | 18:08 |
gnuoy | I'm ssh'd onto the unit and happily resolving the hook and reproducing the error | 18:09 |
gnuoy | rick_h, I don't want to waste your time, the bug is almost certainly outside of juju. Just wondering if anything springs to mind about the difference in hook env when using debug-hooks? | 18:10 |
rick_h | gnuoy: thinking but confused tbh...if you're ssh'd to the unit I would think you'd not have hook context and have issues | 18:10 |
gnuoy | rick_h: oh, I'm ssh'd in just to observe what's happening, not executing the hook from the ssh session | 18:10 |
rick_h | debug-hooks sets up the hook context, but I can't think of why it would affect leader data type stuff...unless maybe it's working around a check somehow? | 18:10 |
gnuoy | I wonder if this is a focal'ism | 18:13 |
hml | gnuoy: which version of juju? | 18:18 |
gnuoy | I tried with 2.8.1 and 2.7.3 | 18:19 |
hml | gnuoy: there should be no difference between the context during debug-hook and the regular hook execute then. | 18:19 |
hml | previously the only diff i know of was an env var not set with debug hook | 18:20 |
gnuoy | hmm, ok, I must be doing something truly stupid | 18:20 |
hml | but nothing to do with leadership | 18:20 |
hml | gnuoy: it’s always possible something is off with not debug-hooks. definitely shouldn’t be seeing a diff in hook execute | 18:26 |
gnuoy | fwiw https://paste.ubuntu.com/p/5df6Yw6XFH/ | 18:27 |
hml | gnuoy: are there any errors in the juju debug-log for that model? | 18:29 |
gnuoy | hml, does juju just use the exit code of the hook to determine if the hook worked ? | 18:32 |
hml | gnuoy: yes | 18:32 |
gnuoy | hml, I'm going to stop using up your time and go do some more digging, thanks for the ideas | 18:34 |
hml | gnuoy: have fun. i need to lunch, but will be back later | 18:34 |
tlm | morning | 20:42 |
rick_h | morning tlm | 20:42 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!