[01:30] wallyworld: just got this bootstrap lxd on 2.8-rc branch
[01:30] Unable to fetch Juju GUI info: error fetching simplestreams metadata: cannot unmarshal JSON metadata at URL "https://streams.canonical.com/juju/gui/streams/v1/com.canonical.streams-released-gui.sjson": json: cannot unmarshal string into Go struct field Metadata.juju-version of type int
[01:31] hpidcock: yup, they messed up the metadata
[01:31] a fix is in progress
[01:31] awesome
[01:31] indeed
[02:07] wallyworld: https://github.com/juju/python-libjuju/pull/412
[02:07] and anyone else interested in python-libjuju
[02:07] looking
[02:08] +43000!!!!!!
[02:09] wgrant: appreciate the time taken to write that response, thanks
[02:09] timClicks: It's possible that I may have some Opinions :)
[02:10] Thanks for starting conversations like this
[02:10] wgrant: am glad that thread was started with a warning ;)
[02:10] Heh, indeed.
[02:14] hpidcock: ideally we'd use 2.8.0 for the schema not 2.8-rc2 as that reference will be out of date in days. can we add in the remaining schema from the model operator branch and use 2.8.0
[02:16] wallyworld: rc2 is 2.8.0 at the moment. If stuff hasn't landed in the rc branch it's not 2.8.0 yet
[02:17] sure, but it seems unfortunate to release a new libjuju which will be out of date in that respect in a matter of days
[02:17] i guess we can do a 2.8.1 after 2.8 ships
[02:29] hpidcock: reviewed libjuju
[02:41] thumper: i'm still waiting on fixes/changes to the dashboard tarball, but this works with the one that's currently published. i also need to retest upgrades after changes to accommodate the recent tarball revision https://github.com/juju/juju/pull/11562
[02:48] wallyworld: ack
[02:51] wallyworld: did you check that the old gui continued to work after the upgrade?
[02:51] i did
[02:52] will retest everything though after final tweaks for the new new new dashboard
[04:01] thumper: everyone else super busy, so sorry, https://github.com/juju/juju/pull/11565
[04:04] ah balls, targeted to wrong branch, should be 2.7
[04:07] wallyworld: did you want me to retarget?
[04:07] or do you need to rebase first?
[04:07] thumper: i just rebased and retargeted
[04:08] now i need coffee before i hack on the dashboard stuff again to support the new new format
[04:14] wallyworld: asked a question...
[04:19] thumper: answered
[04:20] previously the offer string was a const
[04:20] now it's built from series, and we also need the sku
[04:25] wallyworld: right, but the number at the front is something they increment
[04:25] not a fixed 0001
[04:25] supposedly
[04:26] it's not something we know ahead of time, it's not something we can query. we could make it a config somewhere but that has its own issues
[04:26] which is why I asked can we do a substring match on the offer?
[04:26] it's an arbitrary decision outside of juju
[04:27] no, because it's a param passed to azure
[04:27] we don't match on it
[04:27] we pass it to azure
[04:27] right
[04:28] I understand that
[04:28] can we pass a substring?
[04:28] is it an exact match?
[04:28] because we shouldn't be using 0001
[04:29] it's used to tell azure to pick a particular image by tag, no substring involved, eg
[04:29] return &compute.StorageProfile{
[04:29]     ImageReference: &compute.ImageReference{
[04:29]         Publisher: to.StringPtr(publisher),
[04:29]         Offer:     to.StringPtr(offer),
[04:29]         Sku:       to.StringPtr(sku),
[04:29]         Version:   to.StringPtr(version),
[04:29]     },
[04:29]     OsDisk: osDisk,
[04:29] }
[04:29] Offer is a straight out arg to an api call
[04:30] we either have it or we don't
[04:31] in which case we have a problem
[04:31] if they ever change it yes
[04:32] guarantee that they'll change it
[04:32] wallyworld: https://github.com/juju/juju/pull/11566 got this pr to move the retry into the initialization function so it retries the copying-charm request only rather than retrying the whole remote init operation
[04:32] we need to tell them not to
[04:32] could u take a look?
[04:32] sure
[04:32] ty
[04:48] kelvinliu: lgtm but retarget to 2.8-rc branch
[04:48] wallyworld: yep, ty
[04:53] https://github.com/juju/juju/pull/11567 for a fix to the full status query tracker tests
[04:53] hpidcock: kelvinliu: looks like another remote-init issue, bug 1878329 application-mattermost: 15:32:59 ERROR juju.worker.uniter resolver loop error: executing operation "remote init": caas-unit-init for unit "mattermost/0" failed: ERROR failed to remove unit tools dir /var/lib/juju/tools/unit-mattermost-0: unlinkat /var/lib/juju/tools/unit-mattermost-0/goal-state: permission denied
[04:53] Bug #1878329: stuck k8s workload unit following upgrade-charm with new image
[04:53] thumper: looking
[04:55] that's weird, operator is the root user
[04:57] thumper: just a small suggestion
[04:58] thanks for the libjuju release hpidcock, you've made several folks very happy, including the osm guys
[04:59] not to mention solutions qa
[04:59] wallyworld: https://bugs.launchpad.net/juju/+bug/1877935 we still have this bug,
[04:59] Bug #1877935: operator trys to exec into the workload container but the container is not running yet
[05:00] yeah we do :-(
[05:00] forgot to mention in standup, I think this one and the watcher one should have higher priority than the block storage one?
[05:00] yup
[05:00] the block storage one is more a guard rail
[05:01] ok, i'm gonna look at these two first
[05:01] with the init one, maybe we can try looking for the pod a few times, using retry()
[05:01] or at least query the cluster to see if things are starting up
[05:02] and give them time to get done if they are
[05:03] i think there might be an upgrade-charm flow missing in the init process
[05:04] could be, i'd need to look at the logic again to see what's been implemented
[05:05] also I guess the watcher bug is related to the operator rather than the watcher itself. It seems the operator is not responding correctly if any uniter got those errors (like 137, etc).
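(For reference: the retry() idea floated at 05:01 could look roughly like the sketch below. This is only an illustration, assuming the github.com/juju/retry package and a hypothetical getPod helper that queries the cluster for the workload pod; it is not the fix that eventually landed.)

    package caasops

    import (
        "time"

        "github.com/juju/clock"
        "github.com/juju/errors"
        "github.com/juju/retry"
    )

    // waitForWorkloadPod polls the cluster a few times before giving up, so
    // remote init isn't attempted against a pod that hasn't started yet.
    // getPod is a hypothetical helper that returns a NotFound error while
    // the workload pod does not exist.
    func waitForWorkloadPod(getPod func() error) error {
        return retry.Call(retry.CallArgs{
            Func: getPod,
            // Keep retrying while the pod simply isn't there yet;
            // any other error is fatal and aborts the retry loop.
            IsFatalError: func(err error) bool {
                return !errors.IsNotFound(err)
            },
            Attempts: 5,
            Delay:    3 * time.Second,
            Clock:    clock.WallClock,
        })
    }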
[05:07] wallyworld: you think I should just say "FullStatus" without the prefix?
[05:07] I think it would probably match too
[05:08] thumper: it's more about filtering out other method names that contain the one currently being filtered on
[05:08] wallyworld: I don't know what you just said
[05:09] so if the tracer has been set up to look for "FullStatus" and the traceback has "FullStatusVerbose", it would match both with the current code
[05:09] FWIW, tracker := s.State.TrackQueries("FullStatus") works
[05:09] we want to exclude "FullStatusVerbose"
[05:09] well, then the caller should say "FullStatus("
[05:10] ok, I wasn't sure if the expectation / desire was for a single method match
[05:10] to avoid ambiguity
[05:11] it is sufficient for now, but we may need to tweak later, I'll add more of a comment on the TrackQueries method to explain so future people will know
[05:11] sgtm
[05:13] wallyworld: added more of a comment to explain on the tracker
[05:13] so it shouldn't be a surprise to any future people using it
[05:14] ty
[05:44] wallyworld, or anyone: https://github.com/juju/juju/pull/11568
[05:48] tlm, hpidcock, kelvinliu: ^^?
[05:49] looking now?
[05:49] kelvinliu: thanks
[05:53] kelvinliu: generally for things we need to wait on, we use testing.LongWait
[05:53] even if we expect them to happen quickly
[05:53] there is no real change to the test times for running in that package
[05:54] with the patch I had 2.2, 3 and 4s for the package on three different runs
[05:54] before it was 2.7 seconds
[05:54] so within the realm of ok IMO
[05:54] thumper: it's weird to see these errors because we mocked everything in the caas provider, so there's no db or any network access, but the fix lgtm.
[05:54] yeah, I'm betting it is just CPU contention
[05:55] and the number of concurrent goroutines
[05:55] yes it shouldn't take long
[05:55] * thumper shrugs
[05:55] I'm pretty sure this will make the intermittent issues go away
[05:55] yeah, thanks for the fix
[06:03] thumper: that PR should land in the 2.8 branch IMO
[06:04] not 2.8-rc but 2.8
[06:04] tlm: your PR ready for another look?
[07:48] an excellent thread has emerged on a charm's error state for anyone who would like a few minutes of procrastination ahead of them https://discourse.juju.is/t/when-to-send-an-application-into-an-error-state/3046
=== salmankhan1 is now known as salmankhan
=== salmankhan1 is now known as salmankhan
[14:07] hml: Cherry-pick to the RC branch of Tim's ENI/Netplan patch: https://github.com/juju/juju/pull/11569
[14:14] manadart: looking
[14:15] manadart: tick
[14:15] hml: Ta.
=== cory_fu_ is now known as cory_fu
[15:07] hml: Small utility addition. No hurry; I am EoD: https://github.com/juju/juju/pull/11570
[19:26] petevg: I have a present for you: https://github.com/juju/python-libjuju/pull/415 There are still failing tests, but you can actually see what they are now and they would have caught the open issues with the 2.8 release.
[19:27] cory_fu: nice! Thank you :-)
[19:28] petevg: It would also be nice to see all of those "Event loop is closed" errors when a test fails cleaned up to make the results less noisy, but I'll leave that to you. ;)
[19:29] +1
[19:29] :-)
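(For reference: the testing.LongWait pattern thumper describes at 05:53 is usually written like the sketch below. This is only an illustration, assuming Juju's coretesting helpers and a hypothetical operation under test; it is not the code from the PR.)

    package example_test

    import (
        stdtesting "testing"
        "time"

        coretesting "github.com/juju/juju/testing"
        gc "gopkg.in/check.v1"
    )

    func Test(t *stdtesting.T) { gc.TestingT(t) }

    type waitSuite struct{}

    var _ = gc.Suite(&waitSuite{})

    // Even when the event is expected almost immediately, wait up to
    // testing.LongWait so a slow or contended CI machine doesn't make the
    // test flaky; the timeout only matters when something is already broken.
    func (s *waitSuite) TestSomethingFinishes(c *gc.C) {
        done := make(chan struct{})
        go func() {
            // doTheThing() // hypothetical operation under test
            close(done)
        }()

        select {
        case <-done:
        case <-time.After(coretesting.LongWait):
            c.Fatalf("timed out waiting for the operation to finish")
        }
    }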
[19:55] petevg: I see that the Juju repo is also using GH Actions now. It would be really interesting if a portion of that PR could be ported over to the Juju repo so that breakages to libjuju are caught immediately rather than depending on a change to land in libjuju to find it. I guess one aspect of that is that you'd need to do the upstream sync as part of the test so that things like facade or definition changes got picked up. That's probably worth
[19:55] doing on the edge portion of the libjuju tests as well, TBH.
[19:55] cory_fu: that makes sense. It'd be nice to catch this stuff right away ...
[19:57] petevg: Downside is the 20+ minute duration of the integration tests, plus the fact that while the change in Juju might lead to a break, it would likely need to be fixed in libjuju rather than the Juju PR.
[19:57] True.
[19:57] That would create a chicken and egg issue!
[19:57] I wonder if GitHub Actions could do a daily run with notifications? Or since you have a Jenkins already, adding a daily run to that to watch for libjuju breakage.
[19:58] petevg: I actually thought that last one was planned when the Juju team took over libjuju, but I guess it got dropped.
[20:44] cory_fu: petevg one of the issues we had was the whole "land the thing that generates the new schema" and then go update the schema in the library chicken and egg issue
[20:44] cory_fu: petevg I think there is a test that's non-gating in jenkins that checks if things are likely to be broken. You can check with stickupkid on those as he's managed most of that to date
[20:45] rick_h: cory_fu made a pr to make that test gating :-)
[20:45] petevg: Different test
[20:45] Aha! Got it.
[20:47] rick_h, petevg: Would that be this test? https://jenkins.juju.canonical.com/view/github/job/github-integration-tests-pylibjuju/
[20:48] Hrm. Maybe this one? https://jenkins.juju.canonical.com/job/github-schema-tests-pylibjuju/
[20:50] You're showing me red circles and making me sad, cory_fu :-(
[20:50] petevg: Red circles that haven't been run in months, too
[20:51] petevg: Maybe this one will make you happier? https://jenkins.juju.canonical.com/job/github-check-merge-juju-python-libjuju/147/
[20:52] That is running the unit tests, at least, but that doesn't help with catching any of the stuff that the integration tests caught.
[20:54] petevg: Thinking about it more, there's a chicken-and-egg issue in the other direction as well. If we make the libjuju edge tests do the upstream sync and it fails, it wouldn't really be relevant to the PR on that side either.
[20:55] petevg: It definitely seems like the right thing is instead something like a daily build that uses master of Juju and master of libjuju, syncs and builds them, then runs the integration tests from libjuju. As long as that actually generated an alert that got paid attention to, it wouldn't block specific PRs but would let you know if there was an issue that would need to be addressed.
[20:57] Plus the 20 or so minute runtime of the integration tests wouldn't be so onerous if only being run once a day.
[23:42] wallyworld: https://github.com/juju/juju/pull/11571