[04:26] kelvinliu: next azure PR if you get a chance at some point (another step along the way) https://github.com/juju/juju/pull/11998
[04:44] looking
[04:59] wallyworld: lgtm
[04:59] ty
[05:02] tyvm
[06:22] hi all
[06:22] having trouble with pylibjuju, can I use something else? like `curl` to retrieve data from a model?
[06:25] what trouble? curl will be messy, using a client is better. what's the issue?
[06:27] wallyworld: I posted on discourse, but I have `RPC: Connection closed, reconnecting`
[06:28] then on the controller side, `logsink.log` has `ERROR juju.apiserver apiserver.go:939 error serving RPCs: codec.ReadHeader error: error receiving message: websocket: close 1009 (message too big)`
[06:28] of course, nothing changed on those machines; the day before the script was working fine, then the next day I got this
[06:29] i've not used pylibjuju much, i assume the backup action is configured to leave the backup on the controller and not try to stream it down to the client?
[06:30] yeah, mainly it just runs the action. login, run action, do some server stuff, ...
[06:30] steps are
[06:30] instantiating the controller is fine
[06:30] oh, so a charm action
[06:30] yes
[06:31] sorry, my head went to a juju backup
[06:31] the error is raised on the `get_model()` python call
[06:31] it is well before an action is called
[06:32] the RPC cuts out when getting the model data
[06:32] i know we had to increase the msg frame size for the juju client, but that was ages ago, perhaps there's a pylibjuju tweak needed
[06:33] I tried from 512 up to 65535 using max_frame_size, with no luck
[06:34] I am not sure this is about the python lib actually, but maybe about what the controller answers
[06:35] what version is the controller
[06:35] 2.8.0
[06:36] i can't recall exactly what's been fixed since 2.8.0, but upgrading to 2.8.2 would be something to consider
[06:36] ok, I would need to create a new controller and migrate the model, yes?
[06:36] no, you can just upgrade
[06:37] juju upgrade-controller
[06:37] you can use --dry-run to see what it would do
[06:37] it is a production platform... not sure...
[06:38] migration is a good option then
[06:38] k thanks
[06:38] you can even deploy a test model on 2.8.2
[06:38] and see if the issue goes away
[06:39] I have found in the logs a previous model with the same name, which raised some errors; could that be a related issue?
[06:39] juju doesn't allow 2 models with the same name, so any logs would be for a previously deleted model, i would expect
[06:39] like this `ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [619092] "machine-0" cannot open api: model cache: model "61909253-939f-48c0-8452-389111410e43" did not appear in cache timeout`
[06:40] that could be something else, i don't know off hand
[06:40] k
[06:40] 2.8.2 did fix some model cache issues i think
[06:41] and this `ERROR juju.worker.dependency engine.go:671 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: The update path 'settings.' contains an empty field name, which is not allowed.`
[06:41] (sorry to bother, and I don't have more) :P
[06:41] p
[06:43] not seen that before. we'd need to see a bit more info, like what led up to that error, any previous errors etc., and maybe even a sanitised chunk of the relevant db records. worth filing a bug
[06:45] that's from the logsink.log file (it appears in machine-0.log as well); that error is looping together with the previous one, around every 3 minutes.
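
A minimal pylibjuju sketch of the flow flxfoo describes above (instantiate the controller, fetch the model, run a charm action), assuming pylibjuju 2.8.x. The model name "default", application name "my-app" and action name "backup" are placeholders, not details from the conversation.

```python
import asyncio
from juju.controller import Controller

async def main():
    controller = Controller()
    await controller.connect()                      # current controller from the local juju client config
    model = await controller.get_model("default")   # placeholder name; this is the call that hit the RPC error
    try:
        unit = model.applications["my-app"].units[0]   # placeholder application
        action = await unit.run_action("backup")       # placeholder action name
        action = await action.wait()                   # block until the action completes
        print(action.status)
    finally:
        await model.disconnect()
        await controller.disconnect()

asyncio.run(main())
```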
[06:46] last one, more general: in the case of spawning a new controller and migrating the model, would previously removed models, or stale objects in general, be cleaned up?
[06:48] a new controller starts empty. only the migrated model is copied across, so any old cruft is left behind
[06:49] stuff restarting means something has gone wrong and needs looking into, likely a bug that needs fixing
[06:49] talking about the RPC?
[06:50] no, the error being logged every 3 minutes
[06:50] k
[06:51] the uuid in front of that log line is the model uuid?
[06:51] probably not
[06:51] the whole line is eg.
[06:52] the uuid is the model uuid
[06:52] 66c9ba24-69af-40f0-842f-8613777c1491: machine-0 2020-09-15 06:45:04 ERROR juju.worker.dependency engine.go:671 "mgo-txn-resumer" manifold worker returned unexpected error: cannot resume transactions: The update path 'settings.' contains an empty field name, which is not allowed.
[06:52] something has really got itself into a funny state
[06:52] how do I know the uuid of models?
[06:53] juju models --format yaml
[06:53] yeah ... I might have done something weird, but apart from creating and removing models or apps, really nothing special
[06:53] k
[06:54] ok, so that's the controller model for sure
[06:54] juju should not error for that
[06:54] mgo, I suppose this is mongodb related
[06:55] platform is on aws
[06:55] more likely juju messing up when a model was removed
[06:57] I see... so yeah, the best option would be to spawn a new controller and migrate the current model...
[06:58] how can I debug or get into the mongodb db?
[06:59] you can dump a model as yaml by exporting JUJU_DEV_FEATURE_FLAGS=developer-mode and then running juju dump-model or juju dump-db
[07:00] is the output heavy?
[07:00] there are not a lot of nodes
[07:00] 17
[07:01] sorry 13
[07:02] the size is more related to how many apps / units
[07:04] 7 apps, 13 units
[07:04] it won't be much
[07:08] done
[07:08] sequences: there are a few application entries for apps that no longer exist
[07:19] wallyworld: thanks ... I will try to look into that...
[07:33] the sequence entries are ok, they will be ignored. but they should have been cleaned up
[08:02] wallyworld: ok, so definitely something did not go right...
[08:15] wallyworld: sorry to bother, port 37017 should be accessible to the juju client (cli), right?
[08:17] because sniffing the network I have traffic coming from the vm public interface to the private interface on port 37017...
[08:28] flxfoo, that's the mongo port
[08:29] flxfoo, the default API port is 17070
[08:43] ok, good news: I found in https://pythonlibjuju.readthedocs.io/en/latest/api/juju.client.html#module-juju.client.connection
[08:43] that the default MAX_FRAME_SIZE is 4194304
[08:43] when I doubled it, the RPC error is gone
[08:45] flxfoo, over 4MB of data, that is some impressive data
[08:45] stickupkid: right
[08:46] now I dumped the model and the yaml file is about 8022129 bytes
[08:46] which fits
[08:46] flxfoo, classic
[08:46] now, there might be some extra data in there
[08:46] is there a way to "cleanup"
[08:47] ?
[08:47] flxfoo, without seeing it, I'm unsure
[08:48] ok, I mean, is there no gc type of tool?
[08:49] stickupkid: I manipulated the models quite a lot (for testing purposes), like adding/removing apps/units etc...
[08:49] I suppose that comes from that
[08:49] If there are no more big changes, that size should not grow, right?
[08:49] and the only way would be to respawn a model? or migrate maybe?
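
For reference, a sketch of the workaround flxfoo lands on above: juju.client.connection.MAX_FRAME_SIZE defaults to 4194304 (4 MiB), and with a model whose dump is roughly 8 MB the controller's replies exceed that, producing the websocket close 1009 / RPC reconnect errors. Raising the limit when constructing the client avoids the disconnects. This assumes pylibjuju 2.8.x, where max_frame_size is a constructor argument.

```python
import asyncio
from juju.model import Model

async def main():
    # Double the default 4 MiB websocket frame limit so large replies fit.
    model = Model(max_frame_size=2 * 4194304)
    await model.connect()          # connects to the currently active model
    status = await model.get_status()
    print(sorted(status.applications.keys()))
    await model.disconnect()

asyncio.run(main())
```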
[08:50] flxfoo, so I would hope we would clean up if you removed a unit/application
[08:50] flxfoo, but yeah, the size should be stable if your model is stable
[08:51] stickupkid: right now (production) there are 13 units, 7 apps, not much
[08:52] stickupkid: before that "stable" model, I played quite a while adding/removing, probably a lot of units and apps... and maybe some pieces haven't been removed properly or so...
[08:52] stickupkid: does that sound plausible?
[08:52] yeah, but do keep an eye on it and open a bug if you believe we should be cleaning up when we're not
[08:53] stickupkid: sure, will do... I'll try to localise the issue to be able to file something useful... controller version is 2.8.0, and wally told me already that a few things were fixed in the model cache since
[09:01] stickupkid: do units (instances) connect to 37017 (mongodb) directly?
[09:02] does not look like it (from sniffing)
[09:02] flxfoo, they should all go via an API
[09:03] k
[09:04] stickupkid: still don't understand why the private interface tries to connect to the public interface on port 37017 on the controller
[09:06] stickupkid: what is connecting to 37017? would a client (cli), or not? or is it internal to the controller?
[09:07] flxfoo, internal to the controller. achilleasa, any thoughts on the above?
[09:09] flxfoo: is that a HA controller?
[09:11] stickupkid: not yet :p
[09:32] stickupkid: no, it is not HA
[11:27] stickupkid: are you aware of any existing watcher for space topology changes?
[11:31] or for subnet doc changes
[11:32] the latter is what I actually need
[11:32] achilleasa, not that I'm aware of
[11:35] stickupkid: there is a lifecycle watcher for subnets
[13:17] achilleasa, on develop, if you run make static-analysis do you get this issue?
[13:17] go/src/github.com/juju/juju/apiserver/facades/client/applicationoffers/state.go:58:2: UserPermission redeclared
[13:17] not sure how that's not being picked up, it really should be
[13:17] can check in 5'
[13:18] wicked
=== hallback_ is now known as hallback
=== vern_ is now known as vern
=== marosg_ is now known as marosg
=== mskalka_ is now known as mskalka
=== beisner_ is now known as beisner
=== nicolasbock_ is now known as nicolasbock
=== coreycb_ is now known as coreycb
=== skay__ is now known as skay_
=== skay__ is now known as skay
[14:27] hml, updated my q&a steps https://github.com/juju/juju/pull/11994
[14:27] stickupkid: rgr
=== xnox1 is now known as xnox
[14:53] hml, approved
[14:53] stickupkid: ta
[15:51] why is CharmID.Metadata an untyped map[string]string
[15:51] ?
[15:51] just mystery meat
[15:56] lol
[16:03] https://jaas.ai/apache2 - it does not appear that the apache2 charm supports the certificates:tls-certificates relation to obtain tls certificates from vault; is that correct?
[21:21] If you set bluestore-block-db-size in ceph-osd post deployment, what additional steps are necessary to update the db partition size on all of the osd servers?
[21:23] do you have to remove and re-add the unit to pick up the new setting, or can you use osd-out, zap-disk and then add-disk?
=== mwhudson_ is now known as mwhudso
=== mwhudso is now known as mwhudson
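
On the ceph-osd question just above: the config change itself can be scripted with pylibjuju, but whether existing OSDs pick up a new bluestore-block-db-size (as opposed to only disks added afterwards, for example via the osd-out / zap-disk / add-disk actions mentioned) is a charm question this sketch does not answer. Assumes pylibjuju 2.8.x; the size value is illustrative only.

```python
import asyncio
from juju.model import Model

async def main():
    model = Model()
    await model.connect()
    app = model.applications["ceph-osd"]
    # Charm config values are passed as strings; 10 GiB here is just an example.
    await app.set_config({"bluestore-block-db-size": "10737418240"})
    await model.disconnect()

asyncio.run(main())
```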