wallyworld | kelvinliu: next azure PR if you get a chance at some point (another step along the way) https://github.com/juju/juju/pull/11998 | 04:26 |
---|---|---|
kelvinliu | looking | 04:44 |
kelvinliu | wallyworld: lgtm | 04:59 |
kelvinliu | ty | 04:59 |
wallyworld | tyvm | 05:02 |
flxfoo | hi all | 06:22 |
flxfoo | having trouble with pylibjuju, can I use something else? like `curl` to retrieve data from a model? | 06:22 |
wallyworld | what trouble? curl will be messy, using a client is better. what's the issue? | 06:25 |
flxfoo | wallyworld: I posted on Discourse, but I get `RPC: Connection closed, reconnecting` | 06:27 |
flxfoo | then on controller side the `logsink.log` has `ERROR juju.apiserver apiserver.go:939 error serving RPCs: codec.ReadHeader error: error receiving message: websocket: close 1009 (message too big)` | 06:28 |
flxfoo | of course, nothing changed on those machines; the script was working fine the day before, then the next day I got this | 06:28 |
wallyworld | i've not used pylibjuju much, i assume the backup action is configured to leave the backup on the controller and not try to stream it down to the client? | 06:29 |
flxfoo | yeah, it mainly just runs the action: login, run action, do some server stuff, ... | 06:30 |
flxfoo | steps are | 06:30 |
flxfoo | instantiating the controller is fine | 06:30 |
wallyworld | oh, so a charm action | 06:30 |
flxfoo | yes | 06:30 |
wallyworld | sorry, my head went to a juju backup | 06:31 |
flxfoo | the error is raised on the `get_model()` python call | 06:31 |
flxfoo | it is well before an action is called | 06:31 |
flxfoo | the RPC cuts out when getting model data | 06:32 |
wallyworld | i know we had to increase the msg frame size for the juju client, but that was ages ago, perhaps there's a pylibjuju tweak needed | 06:32 |
flxfoo | I tried values from 512 up to 65535 for max_frame_size, with no luck | 06:33 |
flxfoo | I am not sure this is actually about the python lib; maybe it's about what the controller sends back | 06:34 |
wallyworld | what version is the controller | 06:35 |
flxfoo | 2.8.0 | 06:35 |
wallyworld | i can't recall exactly what's been fixed since 2.8.0, but upgrading to 2.8.2 would be something to consider | 06:36 |
flxfoo | ok, I would need to create a new controller and migrate model yes? | 06:36 |
wallyworld | no, you can just upgrade | 06:36 |
wallyworld | juju upgrade-controller | 06:37 |
wallyworld | you can use --dry-run to see what it would do | 06:37 |
flxfoo | it is a production platform... not sure... | 06:37 |
wallyworld | migration is a good option then | 06:38 |
flxfoo | k thanks | 06:38 |
wallyworld | you can even deploy a test model on 2.8.2 | 06:38 |
wallyworld | and see if the issue goes away | 06:38 |
flxfoo | I found in the logs a previous model with the same name that raised some errors; could that be related to the issue? | 06:39 |
wallyworld | juju doesn't allow 2 models with the same name, so any logs would be for a previously deleted model i would expect | 06:39 |
flxfoo | like this `ERROR juju.worker.dependency engine.go:671 “api-caller” manifold worker returned unexpected error: [619092] “machine-0” cannot open api: model cache: model “61909253-939f-48c0-8452-389111410e43” did not appear in cache timeout` | 06:39 |
wallyworld | that could be something else, i don't know off hand | 06:40 |
flxfoo | k | 06:40 |
wallyworld | 2.8.2 did fix some model cache issues i think | 06:40 |
flxfoo | and this `ERROR juju.worker.dependency engine.go:671 “mgo-txn-resumer” manifold worker returned unexpected error: cannot resume transactions: The update path ‘settings.’ contains an empty field name, which is not allowed.` | 06:41 |
flxfoo | (sorry to bother, and I don't have more) :P | 06:41 |
flxfoo | p | 06:41 |
wallyworld | not seen that before. we'd need to see a bit more info like what lead up to that error, any previous errors etc. and maybe even a sanitised chunk of the relevant db records. worth filing a bug | 06:43 |
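The `mgo-txn-resumer` error above is MongoDB refusing an update path whose last dotted component is empty (`settings.` ends in a dot). A minimal Python sketch of that rule follows; the helper names are hypothetical, and it only illustrates why such a path is rejected, not what in juju produced it.

```python
from typing import List


def empty_field_components(path: str) -> List[int]:
    """Return the positions of empty components in a dotted update path."""
    return [i for i, part in enumerate(path.split(".")) if part == ""]


def is_valid_update_path(path: str) -> bool:
    """MongoDB rejects update paths containing an empty field name.

    A trailing dot ("settings."), a leading dot (".x") or a double dot
    ("a..b") each leave an empty component after splitting on ".".
    """
    return not empty_field_components(path)


if __name__ == "__main__":
    print(is_valid_update_path("settings."))     # False
    print(empty_field_components("settings."))   # [1]
    print(is_valid_update_path("settings.foo"))  # True
```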
flxfoo | that's from the logsink.log file (it appears in machine-0.log as well); that error loops together with the previous one, around every 3 minutes. | 06:45 |
flxfoo | last one, more general: if I spawn a new controller and migrate the model, would previously removed models, or leftover objects in general, be cleaned up? | 06:46 |
wallyworld | a new controller starts empty. only the migrated model is copied across, so any old cruft is left behind | 06:48 |
wallyworld | stuff restarting means something has gone wrong and needs looking into, likely a bug that needs fixing | 06:49 |
flxfoo | talking about the RPC? | 06:49 |
wallyworld | no, the error being logged every 3 minutes | 06:50 |
flxfoo | k | 06:50 |
flxfoo | the uuid in front of that log line is the model uuid? | 06:51 |
flxfoo | probably not | 06:51 |
flxfoo | the whole line is eg. | 06:51 |
wallyworld | the uuid is the model uuid | 06:52 |
flxfoo | 66c9ba24-69af-40f0-842f-8613777c1491: machine-0 2020-09-15 06:45:04 ERROR juju.worker.dependency engine.go:671 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: The update path 'settings.' contains an empty field name, which is not allowed. | 06:52 |
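Since the UUID at the start of each logsink line is the model UUID, it can be split off mechanically. A minimal sketch, assuming every line has the `<uuid>: <entity> <date> <time> <LEVEL> <message>` shape of the line above (inferred from this single example); `parse_logsink_line` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Shape inferred from one logsink line:
# "<model-uuid>: <entity> <YYYY-MM-DD> <HH:MM:SS> <LEVEL> <message>"
_LOGSINK_LINE = re.compile(
    r"^(?P<model_uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}):\s+"
    r"(?P<entity>\S+)\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<message>.*)$"
)


def parse_logsink_line(line: str) -> Optional[Dict[str, str]]:
    """Split a logsink line into named fields, or return None if it doesn't match."""
    match = _LOGSINK_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ('66c9ba24-69af-40f0-842f-8613777c1491: machine-0 2020-09-15 06:45:04 '
              'ERROR juju.worker.dependency engine.go:671 "mgo-txn-resumer" manifold '
              'worker returned unexpected error')
    fields = parse_logsink_line(sample)
    print(fields["model_uuid"], fields["level"])  # 66c9ba24-69af-40f0-842f-8613777c1491 ERROR
```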
wallyworld | something has really got itself into a funny state | 06:52 |
flxfoo | how do I know the uuid of models? | 06:52 |
wallyworld | juju models --format yaml | 06:53 |
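To match such a UUID to a model name, the `juju models` listing can be searched. A sketch assuming `juju models --format json` produces a top-level `models` list whose entries carry `name` and `model-uuid` keys, mirroring the YAML output (verify against your juju version); `find_model_by_uuid` is a hypothetical helper.

```python
import json
from typing import Optional


def find_model_by_uuid(models_json: str, uuid: str) -> Optional[str]:
    """Return the name of the model with this UUID, or None if it isn't listed."""
    doc = json.loads(models_json)
    # Assumed shape: {"models": [{"name": ..., "model-uuid": ...}, ...]}
    for model in doc.get("models", []):
        if model.get("model-uuid") == uuid:
            return model.get("name")
    return None
```

The JSON text could come from `subprocess.run(["juju", "models", "--format", "json"], capture_output=True, text=True).stdout`. A `None` result means the UUID belongs to no current model, for example one that was already removed.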
flxfoo | yeah ... I might have done something weird, but apart from creating and removing models or apps, really nothing special | 06:53 |
flxfoo | k | 06:53 |
flxfoo | ok, so that's the controller model for sure | 06:54 |
wallyworld | juju should not error for that | 06:54 |
flxfoo | mgo, I suppose this is mongodb related | 06:54 |
flxfoo | platform is on aws | 06:55 |
wallyworld | more likely juju messing up when a model was removed | 06:55 |
flxfoo | I see... so yeah best option would be to respawn a new controller and migrate the current model... | 06:57 |
flxfoo | how can I debug or get into the mongodb db? | 06:58 |
wallyworld | you can dump a model as yaml by running export JUJU_DEV_FEATURE_FLAGS=developer-mode and then juju dump-model or juju dump-db | 06:59 |
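The dump step can be scripted, for example to compare the dump size with the client's frame limit. A sketch assuming `juju` is on `PATH` and that `dump-model` accepts the usual `-m <model>` flag; it overwrites any existing `JUJU_DEV_FEATURE_FLAGS` value rather than merging flags. Both helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple


def dump_model_command(model: Optional[str] = None,
                       base_env: Optional[Mapping[str, str]] = None
                       ) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for `juju dump-model` with developer-mode on."""
    env = dict(os.environ if base_env is None else base_env)
    # Per the log, dump-model / dump-db need this feature flag.
    # Note: this replaces any flags already set in JUJU_DEV_FEATURE_FLAGS.
    env["JUJU_DEV_FEATURE_FLAGS"] = "developer-mode"
    argv = ["juju", "dump-model"]
    if model:
        argv += ["-m", model]  # assumed: the standard model-selection flag
    return argv, env


def dump_model_size(model: Optional[str] = None) -> int:
    """Run the dump and return the size of its YAML output in bytes."""
    argv, env = dump_model_command(model)
    result = subprocess.run(argv, env=env, check=True, capture_output=True)
    return len(result.stdout)
```

Usage would be something like `dump_model_size("prod")`, where `"prod"` stands in for a real model name.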
flxfoo | is output heavy? | 07:00 |
flxfoo | there is not a lot of nodes | 07:00 |
flxfoo | 17 | 07:00 |
flxfoo | sorry 13 | 07:01 |
wallyworld | the size is more related to how many apps / units | 07:02 |
flxfoo | 7 apps, 13 units | 07:04 |
wallyworld | it won't be much | 07:04 |
flxfoo | done | 07:08 |
flxfoo | sequences: has a few entries for applications that no longer exist | 07:08 |
flxfoo | wallyworld:thanks ... I will try to look into that... | 07:19 |
wallyworld | the sequence entries are ok, they will be ignored. but they should have been cleaned up | 07:33 |
flxfoo | wallyworld: ok, so something definitely did not go right... | 08:02 |
flxfoo | wallyworld: sorry to bother, port 37017 should be accessible to the juju client (cli), right? | 08:15 |
flxfoo | because, sniffing the network, I see traffic coming from the vm's public interface to the private interface on port 37017... | 08:17 |
stickupkid | flxfoo, that's the mongo port | 08:28 |
stickupkid | flxfoo, the default API port is 17070 | 08:29 |
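Per the log, clients and units should go through the API on 17070, while 37017 is the controller's MongoDB port. A minimal sketch for checking, from a given machine, which of those ports accepts TCP connections; the controller address would be supplied by you and `port_open` is a hypothetical helper.

```python
import socket

JUJU_API_PORT = 17070    # default API port, per the log
JUJU_MONGO_PORT = 37017  # controller MongoDB port, per the log


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, filtered (timeout) or DNS failure.
        return False
```

Run from a client machine, `port_open(controller_ip, JUJU_API_PORT)` is expected to be True; reachability of 37017 from outside only reflects firewall or security-group rules, since the client does not need it.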
flxfoo | ok, good news: I found in https://pythonlibjuju.readthedocs.io/en/latest/api/juju.client.html#module-juju.client.connection | 08:43 |
flxfoo | that the default MAX_FRAME_SIZE is 4194304 | 08:43 |
flxfoo | when I doubled it, RPC error is gone | 08:43 |
stickupkid | flxfoo, over 4MB of data, that is some impressive data | 08:45 |
flxfoo | stickupkid:right | 08:45 |
flxfoo | now I dumped the model and the yaml file is about 8022129 bytes | 08:46 |
flxfoo | which fits | 08:46 |
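The arithmetic behind "which fits": the 4194304-byte default (2**22, 4 MiB) is smaller than the roughly 8 MB model dump, and doubling it to 8388608 clears it. It also explains why 512 to 65535 didn't help, since those are far below the default. A minimal sketch, assuming the frame must hold the whole payload and that the limit is raised by doubling from the default; `required_frame_size` is a hypothetical helper.

```python
DEFAULT_MAX_FRAME_SIZE = 4194304  # pylibjuju default quoted above (2**22)


def required_frame_size(payload_bytes: int,
                        default: int = DEFAULT_MAX_FRAME_SIZE) -> int:
    """Return the smallest default * 2**k that is >= payload_bytes."""
    if payload_bytes < 0:
        raise ValueError("payload size must be non-negative")
    size = default
    while size < payload_bytes:
        size *= 2
    return size


if __name__ == "__main__":
    dump_size = 8022129  # model dump size reported in the log
    print(dump_size > DEFAULT_MAX_FRAME_SIZE)  # True: the default is too small
    print(required_frame_size(dump_size))      # 8388608, the doubled default
    print(required_frame_size(65535))          # 4194304: small payloads fit the default
```

The result is the kind of value passed via the `max_frame_size` argument mentioned earlier; which pylibjuju call accepts it depends on the library version, so check the connection module docs linked above.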
stickupkid | flxfoo, classic | 08:46 |
flxfoo | now , there might be some extra data in there | 08:46 |
flxfoo | is there a way to "clean up" | 08:46 |
flxfoo | ? | 08:47 |
stickupkid | flxfoo, without seeing it, I'm unsure | 08:47 |
flxfoo | ok, I mean there are no gc-type tools? | 08:48 |
flxfoo | stickupkid: I manipulated the models quite a lot (from a testing pov), like adding/removing apps/units etc... | 08:49 |
flxfoo | I suppose that comes from that | 08:49 |
flxfoo | If there aren't many more changes, that size should not grow, right? | 08:49 |
flxfoo | and the only way would be to respawn a model? or migrate maybe? | 08:49 |
stickupkid | flxfoo, so I would hope we would clean up if you removed a unit/application | 08:50 |
stickupkid | flxfoo, but yeah, they should be stable size if your model is stable | 08:50 |
flxfoo | stickupkid: right now (production) there are 13 units and 7 apps, not much | 08:51 |
flxfoo | stickupkid: before that "stable" model, I played for quite a while adding/removing etc., probably a lot of units and apps... and maybe some pieces weren't removed properly | 08:52 |
flxfoo | stickupkid: does that sound plausible? | 08:52 |
stickupkid | yeah, but do keep an eye on it and open a bug if you believe we should be cleaning up when we're not | 08:52 |
flxfoo | stickupkid: sure, will do... I'll try to narrow down the issue so I can file something useful... controller version is 2.8.0, and wally already told me that a few model things were fixed since | 08:53 |
flxfoo | stickupkid: do units (instances) connect to 37017 (mongodb) directly? | 09:01 |
flxfoo | does not look like it (sniffing) | 09:02 |
stickupkid | flxfoo, they should all go via an API | 09:02 |
flxfoo | k | 09:03 |
flxfoo | stickupkid: I still don't understand why the private interface tries to connect to port 37017 on the controller's public interface | 09:04 |
flxfoo | stickupkid: what connects to 37017? would a client (cli), or is it internal to the ctrl? | 09:06 |
stickupkid | flxfoo, internal to ctrl, achilleasa any thoughts on above? | 09:07 |
achilleasa | flxfoo: is that a HA controller? | 09:09 |
flxfoo | stickupkid:not yet :p | 09:11 |
flxfoo | stickupkid:no it is not HA | 09:32 |
achilleasa | stickupkid: are you aware of any existing watcher for space topology changes? | 11:27 |
achilleasa | or for subnet doc changes | 11:31 |
achilleasa | the latter is what I actually need | 11:32 |
stickupkid | achilleasa, not that I'm aware of | 11:32 |
achilleasa | stickupkid: there is a lifecycle watcher for subnets | 11:35 |
stickupkid | achilleasa, on develop if you run make static-analysis do you get this issue? | 13:17 |
stickupkid | go/src/github.com/juju/juju/apiserver/facades/client/applicationoffers/state.go:58:2: UserPermission redeclared | 13:17 |
stickupkid | not sure how that's not being picked up, it really should be | 13:17 |
achilleasa | can check in 5' | 13:17 |
stickupkid | wicked | 13:18 |
=== hallback_ is now known as hallback | ||
=== vern_ is now known as vern | ||
=== marosg_ is now known as marosg | ||
=== mskalka_ is now known as mskalka | ||
=== beisner_ is now known as beisner | ||
=== nicolasbock_ is now known as nicolasbock | ||
=== coreycb_ is now known as coreycb | ||
=== skay__ is now known as skay_ | ||
=== skay__ is now known as skay | ||
stickupkid | hml, updated my q&a steps https://github.com/juju/juju/pull/11994 | 14:27 |
hml | stickupkid: rgr | 14:27 |
=== xnox1 is now known as xnox | ||
stickupkid | hml, approved | 14:53 |
hml | stickupkid: ta | 14:53 |
stickupkid | why is CharmID.Metadata an untyped map[string]string | 15:51 |
stickupkid | ? | 15:51 |
stickupkid | just mystery meat | 15:51 |
pmatulis | lol | 15:56 |
tychicus | https://jaas.ai/apache2: it does not appear that the apache2 charm supports the certificates:tls-certificates relation to obtain TLS certificates from vault, is that correct? | 16:03 |
tychicus | If you set bluestore-block-db-size in ceph-osd post deployment, what additional steps are necessary to update the db partition size on all of the osd servers | 21:21 |
tychicus | do you have to remove and re-add the unit to pick up the new setting or can you use osd-out zap-disk and then add-disk | 21:23 |
=== mwhudson_ is now known as mwhudso | ||
=== mwhudso is now known as mwhudson |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!