babbageclunk | tlm: can you review this plz? https://github.com/juju/juju/pull/11591 | 01:10 |
---|---|---|
tlm | sure | 01:10 |
kelvinliu | wallyworld: free HO? | 01:11 |
wallyworld | kelvinliu: ok | 01:11 |
tlm | lgtm babbageclunk, will try again with that when it merges. Just got it again | 01:12 |
babbageclunk | tlm: thanks! | 01:29 |
babbageclunk | sorry about that | 01:29 |
tlm | not your fault | 01:34 |
thumper | hpidcock: if you are going to update juju to use -N on ppc64el, did you want to look at the etcd-io/bbolt? | 02:27 |
hpidcock | thumper: I can look at that too | 02:28 |
thumper | it looks like a drop-in replacement with a mem fix | 02:29 |
thumper | perhaps extra fixes too | 02:29 |
thumper | at least that is the theory | 02:29 |
thumper | hpidcock: how's the python fun going? | 02:29 |
hpidcock | thumper: I think I'm done for pylibjuju for now, just need to cut a release, might have it coincide with rc2 | 02:31 |
thumper | tlm: looks like a test failure in your model agent landing | 02:31 |
thumper | hpidcock: ack | 02:31 |
thumper | tlm: FAIL: machine_test.go:640: MachineSuite.TestManageModelRunsCleaner | 02:31 |
thumper | tlm: I'm wondering how useful that test is, looking at the content, it is doing a hell of a lot that isn't what we care about | 02:37 |
thumper | tlm: looking, I'm not sure you touched that one at all, and it is just a time sensitive test | 02:54 |
tlm | thumper: sorry back from lunch | 02:56 |
tlm | argh, this test is driving me nuts | 02:56 |
tlm | thumper: what do you recommend? | 03:00 |
babbageclunk | tlm: sorry - I've been playing whack-a-mole with those tests. I'm going to bump up the timeouts wholesale | 03:06 |
tlm | ok babbageclunk, no issues at all. Weird that mine is the one suffering. Making me wonder if I have missed something | 03:07 |
babbageclunk | tlm: it's possible I guess but I can't see how it would be something you've done - those tests won't be running your workers | 03:12 |
thumper | tlm: just try to merge again | 03:29 |
thumper | I have a branch that should fix that intermittent timeout | 03:29 |
thumper | just pushing now | 03:29 |
thumper | babbageclunk: https://github.com/juju/juju/pull/11592 | 03:32 |
hpidcock | wallyworld: can you fork github.com/hashicorp/raft-boltdb into juju/raft-boltdb please | 03:33 |
wallyworld | ok | 03:34 |
thumper | hpidcock: what level of changes do we need? | 03:34 |
hpidcock | thumper: just a path rewrite on the bolt db | 03:34 |
hpidcock | because etcd renamed the project | 03:34 |
thumper | hpidcock: and go mod doesn't help there? | 03:34 |
thumper | ah... | 03:34 |
thumper | poo | 03:34 |
hpidcock | not when they renamed it | 03:34 |
wallyworld | thumper: could you do the fork, i'm in the middle of some Z^%W%@! unit tests | 03:35 |
thumper | wallyworld: ack | 03:35 |
thumper | hpidcock: here you go: https://github.com/juju/raft-boltdb | 03:35 |
hpidcock | thumper: thanks | 03:35 |
babbageclunk | thumper: approved with gusto! | 03:40 |
hpidcock | thumper: wallyworld: can you both review and merge please https://github.com/juju/raft-boltdb/pull/1 | 03:45 |
tlm | thumper: thanks for the PR | 03:57 |
babbageclunk | thumper: your pr hit a different intermittent failure, I kicked it off again | 04:15 |
babbageclunk | duh, sorry, that was a check build not a merge one | 04:15 |
* babbageclunk is a dork | 04:15 | |
* tlm offers babbageclunk a run | 04:16 | |
* babbageclunk accepts | 04:17 | |
thumper | babbageclunk: no worries | 04:21 |
thumper | I've filed a bug for that | 04:21 |
thumper | we get a lot of intermittent failures in that package | 04:21 |
thumper | I feel that they all have the same root cause | 04:21 |
thumper | but I've not looked yet | 04:21 |
thumper | tlm: for the record, I kicked your PR merge again | 04:22 |
tlm | thanks thumper | 04:24 |
hpidcock | thumper: can you both review and merge please https://github.com/juju/raft-boltdb/pull/1 | 04:29 |
wallyworld | babbageclunk: it's bigger than it looks due to deleting a lot of code and moving some code. i still have a unit test to fix in worker/uniter but good apart from that https://github.com/juju/juju/pull/11593 | 04:33 |
babbageclunk | wallyworld: normally people say the other way? | 04:33 |
wallyworld | hpidcock: looking now | 04:33 |
babbageclunk | wallyworld: ok. looking | 04:33 |
babbageclunk | oops meant a comma there | 04:33 |
babbageclunk | fullstop sounds super terse! | 04:33 |
wallyworld | all good | 04:33 |
babbageclunk | whoa, looks big! | 04:35 |
wallyworld | lots of deleted code | 04:35 |
wallyworld | and moved code | 04:35 |
wallyworld | core changes not too bad | 04:35 |
wallyworld | hpidcock: done | 04:35 |
hpidcock | wallyworld: many thanks | 04:35 |
wallyworld | babbageclunk: there are 4 commits which match the pr description if that helps. the raft and lease worker bits should be familiar hopefully | 04:36 |
babbageclunk | ok | 04:37 |
babbageclunk | yeah, that definitely helps | 04:37 |
wallyworld | it's all a bit of a rush sorry | 04:37 |
wallyworld | otherwise i'd have done separate prs | 04:37 |
wallyworld | just got to fix this %W@$!%$ uniter test | 04:38 |
babbageclunk | no worries! | 04:43 |
babbageclunk | wallyworld: oh, you've done the autoexpire removal work, nice | 04:48 |
* babbageclunk gets rid of that part of his branch | 04:48 | |
wallyworld | babbageclunk: yeah, sorry, i had to cause it was all mixed up in the work | 04:51 |
babbageclunk | makes sense | 04:53 |
wallyworld | there's still the dummy provider stuff though | 04:54 |
wallyworld | i think there's a fair bit that can be deleted off that | 04:54 |
* tlm ducking out for a little bit to get some air | 04:59 | |
wallyworld | babbageclunk: i added an implementation of RevokeLease() in the dummy store and that fixes the tests | 05:05 |
babbageclunk | nice | 05:06 |
wallyworld | maybe i can delete ExpireLease() now for the dummy store, i think we only use it to claim a lease for leadership testing | 05:07 |
wallyworld | yup, nothing uses it | 05:08 |
babbageclunk | wallyworld: the only extra bit is that there needs to be a background goroutine for the dummy lease store so it can expire leases internally | 05:08 |
wallyworld | babbageclunk: i thought about it but from what i can see, we only ever claim a lease to set up a unit leader | 05:08 |
wallyworld | i am pretty sure the tests will now all pass | 05:09 |
babbageclunk | ok, if you don't think there are any places that need expiry that's easier | 05:09 |
wallyworld | yeah, i'll see if the current tests pass | 05:09 |
babbageclunk | sounds good | 05:09 |
wallyworld | i'll add expiry if needed but i don't think so | 05:09 |
wallyworld | kelvinliu: did moving the uniter struct initialisation help? | 05:11 |
kelvinliu | HO? | 05:11 |
wallyworld | sure | 05:11 |
hpidcock | thumper: the deferreturn issue fix was landed https://go-review.googlesource.com/c/go/+/234105/ | 05:26 |
hpidcock | hasn't been picked up for a backport to 1.14 yet. Will need to keep an eye out for it. | 05:31 |
hpidcock | thumper: https://github.com/juju/juju/pull/11594 | 06:17 |
wallyworld | kelvinliu: this solves most of it - leadership stable after removing wrench. it keeps logging that it wants to depose leadership so a small issue to solve still https://pastebin.ubuntu.com/p/Fx8Y8XsfSd/ | 06:19 |
wallyworld | kelvinliu: just afk for a bit, be back soon | 06:26 |
kelvinliu | wallyworld: looking now, ty | 06:28 |
wallyworld | kelvinliu: did it work for you too? | 07:20 |
kelvinliu | yes finishing the pr now | 07:21 |
wallyworld | kelvinliu: did you see the repeated messages about running a leader deposed hook? | 07:21 |
wallyworld | seems to be more log noise than anything since show-status-log is ok | 07:22 |
wallyworld | but something needs fixing | 07:22 |
wallyworld | it might be the addition of the logger which now prints messages | 07:23 |
wallyworld | so it's always been there | 07:23 |
kelvinliu | I saw the warning message even before this branch | 07:27 |
wallyworld | lots of them repeated? | 07:29 |
wallyworld | i'll see if i can fix | 07:29 |
kelvinliu | did u see lots of repeat? I only saw once | 07:29 |
wallyworld | i saw lots of repeats | 07:29 |
wallyworld | we expect one but not repeated | 07:30 |
kelvinliu | I can't reproduce the warning message now.. | 07:41 |
wallyworld | i'm trying again, we'll see | 07:44 |
wallyworld | kelvinliu: it happens after adding and removing the wrench file | 07:46 |
wallyworld | kelvinliu: and it happens because the unit agent local state struct gets Leader=true for non-leaders for some reason | 07:47 |
wallyworld | because looks like leader tracker is setting remotestate leader to true | 07:48 |
kelvinliu | u mean the local leader state is out of sync | 07:49 |
wallyworld | seems like it, need to do more debugging | 07:49 |
wallyworld | kelvinliu: yeah, the new tracker still gives bad results for non-leaders after the wrench file is removed :-( | 07:57 |
kelvinliu | wallyworld: did u build the latest code? | 07:59 |
kelvinliu | it works fine for me | 07:59 |
wallyworld | kelvinliu: i'll pull your latest code and try again | 08:00 |
wallyworld | i was working with my initial diff | 08:00 |
kelvinliu | wallyworld: I just removed debugging msg and fixed tests. not much change | 08:01 |
wallyworld | ok, i'll pull latest anyway and try | 08:02 |
kelvinliu | yep | 08:10 |
stickupkid | manadart, https://github.com/juju/python-libjuju/pull/423 | 10:27 |
stickupkid | or hpidcock if you're around | 10:28 |
manadart | stickupkid: Approved it. | 10:31 |
stickupkid | ta | 10:31 |
manadart | stickupkid: Landed on develop instead of 2.8. Backport: https://github.com/juju/juju/pull/11597./ | 14:06 |
manadart | achilleasa, hml: Can you tick that patch? ^ | 14:24 |
hml | manadart: looking | 14:24 |
manadart | hml, petevg: Tested the shutdown service. It does indeed just fail on Bionic, and is pointless. | 14:25 |
hml | manadart: i’m getting a compare changes screen, not a pr | 14:26 |
manadart | hml: https://github.com/juju/juju/pull/11597 | 14:26 |
stickupkid | manadart, done | 14:26 |
petevg | manadart: hah! I guess that's a strong argument for just queuing it up for now. And also for making a bug to actually fix it ... | 14:26 |
stickupkid | manadart, achilleasa do we still need this acceptance test, or does the new CI test cover this? https://github.com/juju/juju/blob/develop/acceptancetests/assess_network_spaces.py | 15:13 |
manadart | stickupkid: Which new one do you mean? | 15:31 |
stickupkid | https://github.com/juju/juju/tree/develop/tests/suites/spaces_ec2 | 15:31 |
manadart | stickupkid: Thought so. It doesn't really test the same things. That one tests bindings, including the upgrade-charm path. | 15:33 |
stickupkid | shame, wanted to get rid of another test tbh | 15:33 |
manadart | stickupkid: Python one tests space constraints, including container-in-machine. | 15:33 |
stickupkid | manadart, fiiiiiiiiiiiiiiiiiiine, will add it to the list of things to move rather than delete | 15:34 |
manadart | stickupkid: I can re-write in shell style when I do that bindings card in the "Doing" lane. | 15:34 |
stickupkid | manadart, let's do that because the python one doesn't do a good job at cleaning up | 15:35 |
manadart | stickupkid: Ack. | 15:35 |
stickupkid | also I'm pretty sure we can reuse the VPC... in the python test, but let's ignore that for now | 15:35 |
stickupkid | hml, you could fix that as a temporary measure I guess for running out of VPCs in eu-west-1 | 15:36 |
manadart | hml, petevg. Did a quick smoke test on MAAS. Bionic containers appear to release IPs upon both remove-machine and kill-controller, so we don't need a network shutdown service. | 15:37 |
* manadart heads home. | 15:37 | |
thumper | petevg: https://github.com/juju/juju/pull/11598 plz | 20:57 |
petevg | thumper: taking a look | 20:58 |
thumper | petevg: it is just forward porting the fix I did on friday | 20:58 |
thumper | as wallyworld mentioned, should get it into the 2.8 branch | 20:58 |
thumper | I should have done it friday, or yesterday, but it slipped my mind | 20:58 |
petevg | thumper: Got it. I marked it as approved. | 20:59 |
thumper | petevg: ta | 21:00 |
petevg | np | 21:00 |
thumper | petevg: bug 1876849 | 21:25 |
mup | Bug #1876849: [bionic-stein] openvswitch kernel module was not loaded prior to a container startup which lead to an error <cdo-qa> <OpenStack neutron-openvswitch charm:Incomplete> <juju:New> <https://launchpad.net/bugs/1876849> | 21:25 |
tlm | wallyworld: https://github.com/juju/juju/pull/11599 | 23:42 |
wallyworld | looking | 23:43 |
wallyworld | tlm: ta, lgtm | 23:43 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!