=== _thumper_ is now known as thumper
[09:34] ec0, xavpaice: woutervb tells me one of you might be able to help with testing of my fix for https://github.com/paulgear/ntpmon/issues/5
[09:34] I've also been talking to ganso from support about the same issue.
[09:48] stickupkid, achilleasa: Still chasing a review of https://github.com/juju/juju/pull/12118
[09:49] manadart: on it
[09:49] manadart: did you see my comment in 12111?
[09:49] achilleasa: Yep.
[09:50] Thanks.
[11:00] blahdeblah: sure thing, although I haven't seen that issue myself. I'm not sure about converting those conditions to warning though, or did you have something else in mind?
[11:42] manadart, here is the thing we just discussed https://github.com/juju/description/pull/91
[11:44] stickupkid: OK, gimme a bit.
[12:02] Hi everyone
[12:03] Can I bother somebody regarding prometheus-ceph-exporter from the "-next" branch?
[12:04] there are great improvements in this branch that we'd love to use but deployment always fails due to this bug: https://bugs.launchpad.net/charm-prometheus-ceph-exporter/+bug/1895531
[12:04] Bug #1895531: -next fails to deploy with TypeError: 'str' object is not callable in ceph_client.auth()
[20:16] blahdeblah, to be honest we've disabled that check on the clouds where it was a problem, because of the noise. Re testing, we didn't have a reliable reproducer so confirming yay or nay is going to be tricky
[21:42] ec0: Yeah - I am not planning to convert either problem from critical to warning; just planning to prevent the NaN from leaking through to the alert value.
[21:42] xavpaice: Understood re: the noise and the difficulty of reproducing. ganso has a couple of clouds where he seems able to reproduce it fairly frequently, so I'll work with him on that.
[21:42] Mostly just wanting someone to test the patches, and if possible, do a code review on an upcoming test suite addition.
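
The NaN concern raised at 21:42 can be illustrated with a minimal sketch. This is not ntpmon's actual code: the function name `classify`, the "higher is worse" thresholds, and the message text are all assumptions made for illustration. It only shows why a NaN measurement needs an explicit guard before threshold comparison, while keeping the problem critical rather than downgrading it.

```python
import math

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(value, warn, crit):
    """Return (status, text) for a hypothetical metric where higher is worse."""
    # NaN compares False against everything, so without this guard a NaN
    # measurement would fall through both threshold tests below and the
    # literal "nan" would end up in the reported value.
    if value is None or math.isnan(value):
        return CRITICAL, "no valid measurement"
    if value >= crit:
        return CRITICAL, f"{value:.3f}"
    if value >= warn:
        return WARNING, f"{value:.3f}"
    return OK, f"{value:.3f}"
```

With this guard the status stays critical when no usable measurement exists, but the reported text never contains NaN.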
[21:43] @blahdeblah - that makes sense to me, if you get a patch together I'll review & test
[21:44] great to see you still hilight on NTP in a round-about way :)
[22:12] ec0: Actually jsing poked me about it a few weeks back. :-P
[22:34] ec0: Also, drewn3ss submitted https://github.com/paulgear/ntpmon/issues/6 a while back, but the Nagios check is stateless.
[22:34] Given that you're just muting the check at the moment, I'm reluctant to invest time on introducing state management.
[22:34] The alternative is using telegraf -> prometheus and adding a minimum time period to the check in prometheus alerter.
[22:34] well, we shouldn't be muting it, frankly
[22:35] I agree, but when it's hard to find time to make progress on actually fixing the reason for the sync failure, and it's intermittent, I can understand making that choice...
[22:38] I've also got limited time I can put into this, and I feel like it's probably better spent making better tests and helping ganso fix the underlying cause of the sync failure (at least in the 2 clouds he's working on at the moment).
[22:45] totally understand
[22:45] the other way to approach it is we could move it into a shared namespace and have some of the people reporting these issues help to contribute and review
[22:50] Happy to consider that - any suggestions as to where?
[22:56] we could set something up on Launchpad maybe?
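
The "minimum time period" idea from 22:34 is the kind of state a stateless Nagios check cannot keep by itself. The sketch below shows roughly what that state management would involve, similar in spirit to the `for` clause of a Prometheus alerting rule. The class name `SustainedFailure` and its interface are hypothetical and do not come from ntpmon, telegraf, or Prometheus.

```python
class SustainedFailure:
    """Report a failure only once it has persisted for min_seconds."""

    def __init__(self, min_seconds):
        self.min_seconds = min_seconds
        self.failing_since = None  # timestamp of the first consecutive failure

    def update(self, failing, now):
        """Record one check result taken at time `now` (in seconds).

        Returns True only when the check has been failing continuously
        for at least min_seconds.
        """
        if not failing:
            # Any successful check resets the timer.
            self.failing_since = None
            return False
        if self.failing_since is None:
            self.failing_since = now
        return now - self.failing_since >= self.min_seconds
```

A short intermittent sync failure would reset the timer on the next good check and never alert, while a sustained one would alert after the chosen period.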