[00:33] what is the rationale for prohibiting read access to application-level data (via relation-get/state-get) to non-leader units?
[00:34] is the recommended course of action for the leader to read those values, then propagate them via peer relations?
[00:47] timClicks: the basic idea is that you shouldn't let a unit read a value unless it'll be notified if the value changes.
[00:47] timClicks: and we don't want the leader setting an application bag value to trigger a hook on the other units.
[00:48] (except in the case of peer relations)
[00:49] I mean, triggering a hook on the other units on the same side of the relation.
[00:50] Anticipating the "why not" though - I'm not totally sure... Having a look in the spec doc for the rationale
[00:56] babbageclunk: I can certainly understand the rationale for only allowing leaders to modify application-level values
[00:56] but I'm less clear on why other units are not allowed to access that data
[00:57] is there a recommended pattern for propagating data to peers?
[01:02] timClicks: I think it's to have a peer relation - the leader sets the information in application data there, then the units can read it because they're on the "other" side.
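
A rough sketch of that pattern using the hook tools mentioned above, assuming a peer relation named "cluster" in metadata.yaml; the --app flags are the application data-bag options being discussed, though the exact positional arguments may vary between Juju versions:

    # leader only: publish a value into the peer relation's application data bag
    relation-set -r "$PEER_RELATION_ID" --app shared-key=abc123

    # any unit, leader or not: read it back, which is allowed on peer relations
    # because each peer counts as being on the "other" side of the relation
    relation-get -r "$PEER_RELATION_ID" --app shared-key "$SOME_PEER_UNIT"

$PEER_RELATION_ID and $SOME_PEER_UNIT are placeholders; inside a cluster-relation-* hook the -r flag can normally be omitted.
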
[01:05] wallyworld: https://github.com/juju/juju/pull/11561
[01:05] looking
[01:12] hpidcock: i merged directly
[01:30] hpidcock: so the issue with the rc is that the juju binary in the snap has the OfficialBuild label set. the "stable" release jobs don't do that, and we need to not do it for the rc either, or if we do a proper beta1, i.e. anything not published to the edge channel
[01:30] hpidcock: ok I'll have a look at that too
[01:31] only the edge snaps built from a PR need the OfficialBuild set
[01:31] so that juju constructs the qualified image tag
[01:31] wallyworld: problem is the ci-run doesn't know this
[01:32] I have some thoughts, let me look at that first
[01:32] righto, let me know if you want to chat about it
[01:33] i think the requirements are clear now, we just need to make the jenkins jobs comply
[01:34] we *could* publish another rc snap without the OfficialBuild to test
[01:34] or just make sure it's fixed for rc2
[01:40] hpidcock: for now, i've simply added the missing tag that the rc1 snap expects and it's all good
[01:44] i upgraded the series of a machine hosting a single principal charm (and one subordinate) and the value under the Series column (juju status) did not change. i am using containers and MAAS. is this a charm bug or a juju bug?
[02:29] pmatulis: it looks like a juju bug to me (as an educated guess looking at a few things in the code)
[02:29] i'm not 100% sure though, but it seems like it could be
[02:29] i can't see where the series is updated on the machine
[02:30] there's an old deprecated method but i can't see a replacement
[02:30] wallyworld, thanks. it actually happened to two charms (different model), but the second one only updated the series of the leader's machine
[02:31] (each application/charm had 3 units)
[02:31] hmmm, there must be a code path to do it then if one of the machines was updated.
[02:32] i think raising a bug is best, and the folks who wrote upgrade-series can take a look
[02:32] they'll probably know straight away what any issue is without having to poke around
[02:35] wallyworld, i will reproduce and open a juju bug. thank you!
[02:35] ty, sorry i couldn't help immediately. not 100% familiar off hand with the code
[03:13] wallyworld: can we HO?
[03:31] wallyworld: I think this is the issue we were talking about this morning... https://bugs.launchpad.net/bugs/1878110
[03:31] Bug #1878110: ErrImagePull on k8s deploys with freshly bootstrapped 2.8-rc1 controller
[03:32] wallyworld: and perhaps this is another dupe of the other k8s delete-model work? https://bugs.launchpad.net/bugs/1878086
[03:32] Bug #1878086: juju destroy-model of the k8s model hangs
[03:32] I don't want to triage them incorrectly given this is something you have touched recently
[03:34] what is the upgrade-charm syntax to upgrade from https://jaas.ai/percona-cluster/286 to https://jaas.ai/u/openstack-charmers-next/percona-cluster/365 ?
[03:34] there is a --channel option but i can't get it to work
[03:46] hpidcock: yeah, we can, sorry, missed your ping
[03:47] wallyworld: in standup
[03:47] thumper: that rc1 issue is solved now, i added an image tag to dockerhub
[03:47] rc2 will have it fixed. it's a build issue
[03:49] wallyworld: I thought it was, but I figured you were best to comment and close / assign the bugs
[03:49] thumper: yup will do, on my list
[03:49] just messing with gui stuff
[03:49] * thumper nods
[03:49] wallyworld: how's it going?
[03:49] thumper: sec, otp
[03:50] pmatulis: is the charm in the same channel?
[03:50] ah... different owner
[03:50] I think you need to use --switch
[03:50] thumper, ohh
[03:51] pmatulis: as far as juju is concerned the charms are unrelated
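
A hedged sketch of the --switch invocation thumper is describing, mapping the jaas.ai URLs above to their charm store identifiers; the application name percona-cluster is assumed to match what was deployed:

    juju upgrade-charm percona-cluster --switch cs:~openstack-charmers-next/percona-cluster-365

--channel only picks a different channel of the same charm, which is why it does not help when the target is published under a different owner.
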
[03:55] that actually worked really well
[03:56] and was instantaneous
[03:56] perhaps just very quick :)
[03:57] wallyworld: I think perhaps we should land the GUI parts as much as we have them (as we fix the minor issues)
[03:57] and rely on getting the simplestreams data updated soonish
[03:58] outside of our changes
[04:10] thumper: that's exactly my plan. i am almost done with code fixes to address *all* the issues I identified. need to test 2.7 -> 2.8 upgrades. the code will not care if streams is broken, you can still deploy the gui from a local tarball. and i had a longish meeting with jeff this morning to communicate the fixes we need to make to the workflow, behaviour and streams, and i am hoping for updated streams tomorrow
[04:11] kelvinliu: are you having luck with the retriable error concept?
[04:12] wallyworld: it didn't work, still investigating
[04:14] kelvinliu: ah, so leaving the operation as pending didn't cause it to be picked up again next time through the loop. i haven't got the exact logic internalised right now for the resolved FSM, but let me know if you need help
[04:15] no, it didn't work..
[04:15] no worries, i'm investigating a bit further
[04:33] wallyworld: do you remember where we can tweak cert complexity for controllers in tests?
[04:33] hmmm, rings a bell, but details not paged in :-(
[04:34] probs in the same package as the test ModelTag stuff
[04:34] nah, me neither
[04:34] I thought briefly that the juju conn suite might set it
[04:34] but seems not
[04:34] * thumper digs some more
[04:35] thumper: juju/juju/testing/cert.go
[04:36] pregenerated certs we then use elsewhere
[04:37] looks like we are using those for the dummy provider
[04:37] I'm trying to track down a weird test timeout
[04:38] we spin doing 7.5 seconds of nothing
[04:53] wallyworld: thumper: tlm might have touched these in the recent certs work
[04:56] hpidcock: locally my machine pauses for about 300 ms
[04:56] so it is clearly doing something
[04:56] yep
[04:56] on a slower, loaded machine it could easily take a lot longer
[04:56] I had a branch that fixed this, let me have a look
[04:56] trying to find out what it is doing
[04:59] thumper: this was my commit that sped up all the cert generation a while back https://github.com/juju/juju/commit/866052a37846a0230e5b3eaf89b373eb5bb5e764 but tlm's recent work might have undone this
[05:01] hpidcock: that still seems to be there
[05:01] which test is taking a long time?
[05:01] a bootstrap test
[05:01] tim@terry:~/go/src/github.com/juju/juju/cmd/juju/commands (develop)$ go test -check.vv -check.f TestAutoUploadAfterFailedSync
[05:02] the pause is between these two lines:
[05:02] [LOG] 0:00.031 DEBUG juju.cmd.juju.commands provider attrs: map[broken: controller:false secret:pork]
[05:02] [LOG] 0:00.370 INFO cmd Adding contents of "/tmp/check-2439852674146353662/1/.ssh/id_rsa.pub" to authorized-keys
[05:02] on my machine it was 320ms
[05:02] on the test machine it was 7.5s
[05:02] let me have a quick look
[05:02] my machine is probably faster and doing less
[05:14] thumper: i've pushed a dashboard PR, but there's a fundamental issue (from when it was written) concerning version checking i need to fix: it currently uses the CLI version, not the controller version. i'll work on that and push a commit, but what's there *should* work with the new simplestreams metadata when published tomorrow (hopefully)
[05:14] ok, thanks
[05:14] hpidcock: thumper Some of my pki tests generate keys
[05:14] that could be the slowdown?
[05:15] tlm: you aren't doing that in the bootstrap tests though?
[05:15] thumper: I'll look. Can't remember off the top of my head
[05:18] thumper https://usercontent.irccloud-cdn.com/file/ceDs66KD/out.png
[05:24] hpidcock: fyi, I made a bug for it https://bugs.launchpad.net/juju/+bug/1878135
[05:24] Bug #1878135: BootstrapSuite.TestAutoUploadAfterFailedSync times out intermittently
[05:24] thumper: out of interest, what architecture was this on?
[05:24] hpidcock: I need to work out how to get that
[05:25] hpidcock: pretty sure it was just amd64
[05:27] ok, amd64 should have been fast
[05:27] but 3072-bit RSA primes are pretty chonky
[05:28] hpidcock: probably from environs/bootstrap/config.go:211
[05:28] ideally, for tests that are not testing certs, we either use pregenerated certs or drop the key strength to 512
[05:28] hpidcock: should be fast, but if the machine is very busy and swapping...
[05:28] hpidcock: right
[05:29] perhaps these tests are just missing some patching of values
[05:29] yeah
[05:29] * thumper wonders what that bug is
[05:29] bug 1558657
[05:29] Bug #1558657: many parts of the code still don't use clocks
[05:29] left there as a todo
[05:30] doesn't look like we patch pki.DefaultKeyProfile anywhere
[05:30] I think there is room for us to do work on the test waiting. If we have a known long-running operation, it could suspend the test timers
[05:31] I'd rather use a smaller key for tests so they are faster
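
A minimal, self-contained sketch of the "use a smaller key in tests" idea discussed above. The names keyGen and TestWithCheapKey are hypothetical stand-ins; in juju itself the equivalent would be patching pki.DefaultKeyProfile (or reusing the pregenerated certs in juju/juju/testing/cert.go) rather than defining a local hook like this:

    package keyprofile_test

    import (
        "crypto"
        "crypto/ecdsa"
        "crypto/elliptic"
        "crypto/rand"
        "crypto/rsa"
        "testing"
    )

    // keyGen stands in for a package-level hook such as pki.DefaultKeyProfile.
    // The production default generates a 3072-bit RSA key, which is what makes
    // the bootstrap test stall on a slow or heavily loaded machine.
    var keyGen = func() (crypto.Signer, error) {
        return rsa.GenerateKey(rand.Reader, 3072)
    }

    // TestWithCheapKey swaps in a cheap ECDSA P-256 key for the duration of the
    // test and restores the original afterwards - the same effect a suite would
    // get from s.PatchValue(...) in the juju testing helpers.
    func TestWithCheapKey(t *testing.T) {
        saved := keyGen
        keyGen = func() (crypto.Signer, error) {
            return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
        }
        defer func() { keyGen = saved }()

        if _, err := keyGen(); err != nil {
            t.Fatal(err)
        }
    }
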
[05:47] tlm: i'm just going to relocate, can look at the PR again soon if you need me to
[05:48] wallyworld: just gonna take the dogs for a run and punch on for a little bit longer. Have fixed a few bugs today and done most of your feedback. About to fix the password generation point you made in the PR
[05:49] no worries, enjoy the walk, i might do the same and come back later
[05:49] wallyworld: hpidcock https://github.com/juju/juju/pull/11558 got this WIP pr to let the 137 error retry, could u take a look to see if it makes sense? ty
[05:49] sure
[05:49] if all good, i will sort out the test
[05:50] kelvinliu: so it all works now?
[05:52] so when I turn the wrench on, I can see nextOps returns a new retry==true remote-init operation. then it finishes if I remove the wrench toggle file
[05:54] I made a bit of a change in the container resolver, I'm not 100% sure if the change makes sense. would be good to get hpidcock to take a look
[06:01] kelvinliu: can we put the retryable check inside the RemoteInit implementation to avoid importing k8s into the uniter worker
[06:04] it's currently in remoteInit.go, we need to mutate the local state, not sure if we can do this outside of the uniter package
[06:04] kelvinliu: i mean just the error wrapping
[06:04] state mutation is fine where it is
[06:05] ok, u mean add one more layer of this special error?
[06:05] the aim is to avoid importing the k8s package into worker/uniter, but i'd need to look closer at the code
[06:07] kelvinliu: so in here: func (op *caasOperator) remoteInit(
[06:07] that's what is ultimately called by callbacks.RemoteInit()
[06:07] sure, im already on it now
[06:08] awesome, that keeps the k8s stuff in one place
[06:09] kelvinliu: also, i thought for sure we have a generic CanRetry() type error somewhere already. that would then fully abstract away any k8s from worker/uniter
[06:09] maybe we don't, i'll check
[06:12] kelvinliu: ah, this is what i was thinking of: the golang net.Error interface has a Temporary() bool method
[06:12] so not directly usable here as it's not really a network error
[06:13] we could add a new interface to the core package perhaps
[06:14] that would decouple worker/uniter from k8s-related knowledge
[06:15] * wallyworld bbiab
[06:19] wallyworld: I added a new uniter error `retryableError` to detect if an operation should be retried or not
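
A minimal sketch of the decoupling wallyworld suggests above: define a small "retryable" interface in a neutral package, have the k8s-side remote-init wrap its exit-137 error in a type that implements it, and let worker/uniter test only for the interface. All identifiers here (Retryable, retryableError, IsRetryable) are illustrative, not the names that landed in PR 11558:

    package retry

    // Retryable is the neutral interface a provider-specific error can
    // implement so callers never need to import provider packages.
    type Retryable interface {
        error
        Retryable() bool
    }

    // retryableError wraps a cause that is safe to retry, e.g. a remote-init
    // process killed with exit code 137.
    type retryableError struct {
        cause error
    }

    func (e *retryableError) Error() string   { return e.cause.Error() }
    func (e *retryableError) Retryable() bool { return true }

    // NewRetryableError marks err as retryable.
    func NewRetryableError(err error) error {
        return &retryableError{cause: err}
    }

    // IsRetryable is what the uniter's resolver would call when deciding to
    // leave the operation pending so it is picked up again on the next loop.
    func IsRetryable(err error) bool {
        r, ok := err.(Retryable)
        return ok && r.Retryable()
    }
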
[06:55] tlm: i've unresolved a few comments that were either resolved without the necessary changes, or resolved without a comment as to why the request was invalid
[12:09] achilleasa: https://github.com/juju/juju/pull/11563.
[12:10] manadart: looking
[12:42] I'd like to deploy another postgresql unit to an environment so that I can have a swappable unit, and I've never done it before. is there a post I can read about that somewhere?
[12:45] on the charm page it looks like all one does is add another unit and magic happens. I'd like to know a little more about this beyond trying to read the charm's source code. like, how long does it take for the database to get replicated (based on the size of the db)
[12:45] etc etc etc
[12:46] and I never set anything on the charm when I deployed it. do the defaults magically work or do I need to set different ones? etc etc
[13:38] hml: the changes in 11523 LGTM. I just have a comment about the relationer constructor. Can you take a look?
[13:38] achilleasa: looking
[13:39] achilleasa: i'll give it a try.
[13:40] achilleasa: thank you.
[13:40] hml: it shouldn't make a difference in total LOC as the PR already lowercases the type
[13:47] manadart: qa good, looking at the code now
[15:25] achilleasa: i agree in principle, however it's breaking down for me here: https://github.com/juju/juju/blob/8ed98bba62d60df44e5648962bf25bb10060f488/worker/uniter/relation/statetracker.go#L52.
[15:26] achilleasa: the StateTracker can't get the Relationer at creation.
[15:26] achilleasa: there are workarounds, but they seemed to defeat the idea behind the change. Or use a PatchValue call? eh.
[15:27] achilleasa: open to more ideas
[15:31] hml: you could hoist the method to RelationStateTrackerConfig and provide a closure 'func(....) Relationer { return relation.NewRelationer(...) }' when you populate the uniter config
[15:31] this way, it can be easily (in relative uniter test-suite terms) mocked at test time
[15:32] achilleasa: i did that, but it seemed to defeat the purpose of the change for me.
[15:32] achilleasa: the only use of it not in test would require a wrapper
[15:33] hml: I guess you are right... it's not like we will have a different Relationer implementation anyway, right?
[15:33] achilleasa: nope. the only reason I added the func was for test, to avoid doing a PatchValue()
[15:34] hml: ok then, I can approve the PR and you can land as-is. Sorry for sending you down a rabbithole; should have spotted that func bit earlier
[15:34] achilleasa: it mostly has to do with how the StateTracker is written. no easy way around it
[15:35] achilleasa: no problem. it's a good learning experience for me. perhaps i can use it later. it does solve a few things i've noticed over time with the "take an interface, return a struct" approach interacting with mocked testing
[15:35] if it wasn't for the darn func. :-/
[15:39] hml: approved
[15:39] achilleasa: ty
[16:53] hml: can you take a look at the appended commits to get https://github.com/juju/juju/pull/11557 green before I hit merge?
[16:53] achilleasa: sure, give me a min or 2
[17:04] hml: since I hit EOD feel free to land the PR if it looks OK
[17:04] achilleasa: rgr
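
For reference, a sketch of the constructor-closure idea achilleasa floats at [15:31], which hml ultimately decided against for the StateTracker: the config carries a factory func so tests can inject a fake without a PatchValue call. Type and field names here are illustrative, not the real worker/uniter/relation API:

    package statetracker

    // Relationer is the collaborator the tracker builds per relation.
    type Relationer interface {
        Join() error
    }

    // NewRelationerFunc is the injection point: production code passes a
    // closure around the real constructor, tests pass one returning a fake.
    type NewRelationerFunc func(unitName string) Relationer

    type StateTrackerConfig struct {
        UnitName      string
        NewRelationer NewRelationerFunc
    }

    type StateTracker struct {
        cfg StateTrackerConfig
    }

    func NewStateTracker(cfg StateTrackerConfig) *StateTracker {
        if cfg.NewRelationer == nil {
            // Production default; tests override this field with a fake factory.
            cfg.NewRelationer = func(unit string) Relationer {
                return &nopRelationer{unit: unit} // stand-in for relation.NewRelationer(...)
            }
        }
        return &StateTracker{cfg: cfg}
    }

    // joinRelation shows where the injected factory is used instead of a
    // hard-wired constructor call.
    func (t *StateTracker) joinRelation() error {
        return t.cfg.NewRelationer(t.cfg.UnitName).Join()
    }

    type nopRelationer struct{ unit string }

    func (r *nopRelationer) Join() error { return nil }

The drawback hml ran into is visible here: the only non-test caller still has to wrap the real constructor in a closure, which is why the PR landed without it.
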