=== mwhudson_ is now known as mwhudson
[00:24] wallyworld: https://github.com/juju/juju/pull/8065 is part of a fix for the enable-ha bug
[00:24] will look at the replicaset stuff after school drop off
[00:24] axw: nw, ty, will look after talking to xtian
[00:27] * thumper needs food badly
[00:36] hml: we need a unit test
[00:36] wallyworld: okay
[00:36] there should be stuff to copy from; it's a bit hairy
[01:22] * thumper is grumpy walking through resources code
[02:25] wallyworld: just a little hairy. ha! - pushing the unit test now
[02:25] great
[02:29] hml: we just need to also check call names to ensure the provider behaved as expected, in addition to not crashing with the error
[02:29] there's examples to copy from
[02:33] wallyworld: i saw examples for storage clients and such… but not the general sender
[02:34] hml: yeah, guess so. it seems TestStopInstancesNotFound() for example just checks err is nil
[02:35] so should be ok to land based on that precedent
[02:35] wallyworld: looked around, not much setup for checking the call tree - though i did verify with some logger messages before finalizing
[02:36] sgtm
[02:36] wallyworld: had a few false positives so i wanted to verify
[02:36] yeah, testing manually is good for this type of issue
[02:40] wallyworld: ty - merging now
[03:03] wallyworld: can you please take a look at https://github.com/juju/juju/pull/8065?
[03:04] sure
[03:04] sorry, forgot
[03:16] axw: done
[03:16] ta
[04:23] jam: do you know why mongo.SelectPeerAddress allows machine-level addresses?
[04:23] axw: you mean 127.* stuff?
[04:23] axw: you can run an HA cluster for testing on just your local machine
[04:23] jam: I meant to say machine-local, but yeah
[04:24] hmm ok.
[04:24] axw: we don't want to allow them ourselves
[04:24] so doing so is a bug
[04:24] axw: but I think that's why *mongo* doesn't refuse them
[04:24] jam: ok, I'll change it then.
I meant in our juju/mongo package
[04:25] axw: so I don't think we personally ever do local-only testing
[04:25] and if we did, we could just use your eth0 ip address 3 times
[05:06] jam: can you please take a look at https://github.com/juju/juju/pull/8066?
[05:49] axw: will do
[06:14] wallyworld: I've added another commit to https://github.com/juju/juju/pull/8056/, can you please look at the last commit? moves the CACert methods around
[06:14] ok
[06:14] wallyworld: sorry wait a sec
[06:14] I mucked up rebase
[06:14] wallyworld: ok, all good now
[06:14] ok
[06:22] axw: so there's 3 facades that dupe the getting of ca cert from controller config - you saying that since it's only half a dozen lines of code each time, it's not worth a common plugin
[06:23] axw: 8066 lgtm
[07:17] jam: thanks
[07:17] wallyworld: sorry, was afk. I took it off APIAddresser because (a) it doesn't have anything to do with API addresses, and (b) it was being exposed by things that didn't care about API addresses, and vice versa
[07:17] wallyworld: i.e. things only cared about CACert and not API addresses
[07:18] which should be a pretty clear indication that they're orthogonal
[07:19] sure, i was thinking about a new common plugin
[07:19] but probably overkill
[07:19] for what it saves
[07:22] anyway, lgtm
[07:25] wallyworld: yeah I don't think it's worthwhile. if it's used again maybe, but I don't see that happening any time soon
[07:25] I guess the caas provisioner might need it. I'll add it then if required
[07:25] np
=== frankban|afk is now known as frankban
[08:17] jam: ping
[08:26] thumper: pong
[08:26] jam: got time for a quick chat about pingers?
[08:26] I'm past EOD, but wanted to follow up
[08:27] jam: I know you are on your standup so I'll leave ideas...
[08:27] The dealing with resources is required but perhaps not sufficient
[08:28] I agree that we should work out where the other pingers are coming from
[08:28] here's a thought...
[08:28] api.Open will try all the apiservers, and kill those that aren't the first to respond
[08:29] perhaps some of those don't get a close noticed on the apiserver, so they hang around for ~1 minute before the agent pinger closes them for not calling Pinger.Ping
[08:30] if we were trying to open every few seconds, and there were some left around, this might be a reason why it floats around 20-30
[08:30] just a thought
[08:30] given that it is required I'd still like to land it
[08:31] I'll leave it to you to do the $$merge$$ if you are happy enough with my comments and rationale
[08:31] * thumper out
[10:08] balloons: something's borked in CI, https://github.com/juju/juju/pull/8056 says it's been accepted, but there's nothing running in jenkins
[10:27] axw: possibly. I've run into a few of those where the bot fails in such a way that it doesn't respond to the PR
[10:27] I can trigger a rebuild if you feel it's ready to land
[10:28] I do see a http://ci.jujucharms.com/job/github-merge-juju/508/
[10:28] which says it failed
[10:33] jam: should be ready, it just failed on an intermittent unit test -- will try and fix that on develop tomorrow
[10:33] jam: what's the procedure? I can probably do it too, I have jenkins login
[10:34] axw: I *do* think we should bring it up to balloons / veebers, since I know when it was happening to me, it was a bug in the test script that it wasn't talking back to the bug.
[10:34] axw: if you log into CI (I use 'developer') you should be able to go back to the bug and just use "rebuild"
[10:34] on http://ci.jujucharms.com/job/github-merge-juju/508/ on the left hand side is a link to: http://ci.jujucharms.com/job/github-merge-juju/508/rebuild
[10:35] jam: ok, thanks
[10:36] externalreality: can you confirm the PR that you wanted me to review? It seems I had linked to the wrong one earlier
[10:37] jam: seems like jenkins is busted. rebuilding, or starting a new build with the same parameters, does not result in a build job...
[10:37] balloons: ^
[10:38] axw: hm. maybe the blue ocean stuff broke what I used to do
[10:39] axw: the other option is that you just reply with the same message that the bot usually does
[10:39] jam: tried that :(
[10:39] never mind, I can land this tomorrow
[10:39] ah, I see you did try that
[10:53] jam: https://github.com/juju/juju/pull/8048
[10:53] thx
[10:53] np
[10:59] externalreality: I'll see about running your stuff in a sec as well
[11:00] cool
[11:05] wpk: did you do a patch to show normal machine error messages in tabular 'juju status'?
[11:06] I'm running 2.3b3 to test things out, and I had an upgrade try-but-fail which is weird in its own right, but then the machines went to "error" but I don't see it in normal status
[11:14] It's even in 2.2
[11:14] the 'Message' field, so it should be there
[11:30] wpk: is it not there because we only include Instance status and not Juju Agent status?
[11:30] wpk: bug #1732156
[11:30] Bug #1732156: juju upgrade-juju --build-agent allows invalid upgrades
[11:32] we're showing machine-status: message:
[11:32] not juju-status:
[11:32] IIRC
[11:39] wpk: so, arguably we should allow for both
[11:39] the former shows provisioning errors
[11:39] the latter shows machiner errors once things are up
[11:40] http://github.com/juju/juju/pull/8063 and http://github.com/juju/juju/pull/8068 could both use reviews
[11:40] externalreality: wpk ^^ if you have a chance
[11:40] I'm happy to be on-hand if someone wants context
[11:40] though I think axw effectively approved 8068 because he approved the upstream mgo patch.
[11:42] I think I figured out the problem with Trello's github integration, is that it doesn't default to hiding closed PRs
[11:51] Bug #1732163 opened: juju status triggers some uninteresting DEBUG level mesasges
[11:54] externalreality: so, how were you testing this that you found sometimes it breaks? Is it the CI tests, or just running "go test" in the right directory?
[11:55] you were mentioning you thought it might be your mongo version, so I'm guessing it was somewhere in local tests
[11:56] balloons: axw: I can confirm the same bad bot behavior for PR #8057
[11:56] something seems very wedged with the bot.
[11:59] wpk: can you join: https://hangouts.google.com/hangouts/_/canonical.com/juju-doc?authuser=1 he had some FAN questions
[11:59] jam: I can't be completely sure what it was
[12:00] externalreality: right, I'm just trying to make sure that I'm exercising the same test that you saw failing
[12:00] I know you said it was blocked at one point, but I don't see what was actually failing.
[12:01] Ah, for example, initialization_test.go would fail attempting to build "txns.log" twice.
[12:02] other tests would fail too, all suites that used stateSuite to establish connections to mongo
[12:02] externalreality: I don't see an "initialization_test.go" file
[12:02] am I just missing it?
[12:03] hmm
[12:03] initialize_test.go ?
[12:05] jam, correct. And a good example of a test that was failing is `TestDoubleInitializeConfig`
[12:33] externalreality: so, that test doesn't have anything to do with your changes, and I don't think it could possibly fail because of your changes (AFAICT).
[12:33] since it's a state/state.go test
[12:33] might still be worth looking at, but otherwise it's just a flaky test, and not related to your patch
[12:38] Yes, perhaps a flaky test or something related to the specific vm that I was running it on (something akin to a messed up clock or something).
[12:39] jam: blah, missed it while lunching. Are you still there?
[12:40] wpk: no, we're done, but if you can respond to peter's questions around setting up VPC and the FAN, that would be useful.
[12:48] kk
[13:31] balloons: just to note, the CI bot seems thoroughly wedged right now, not sure if there is something we could do to fix it.
we should probably learn how, so that we can be landing code even when part of the world is asleep
[13:34] * jam heads away for EOD, though I'm likely to stop back again later.
[13:52] I'll look
[13:52] And I agree
=== freyes__ is now known as freyes
[14:51] just fyi, I did nothing but it seems to have worked itself out
[14:51] I'm curious if someone can comment about what was wrong
[15:27] jam: I realized that I've never created a VPC for Juju, always used existing ones
[15:27] (and if we don't have a clear doc on how to do it that's bad...)
[16:32] balloons: we were submitting requests, and it was saying "going into the queue" but the queue itself was not updating.
[16:33] jam, are things still pending?
[16:34] balloons: I know axw had a PR, but also PR 8057
[16:34] balloons: actually, still just as broken for us
[16:34] balloons: axw was trying to resubmit PR 8056
[16:35] and that is the top of the queue, but didn't get retried, and nothing else got queued
[16:35] balloons: we also tried manually "rebuild" from the Jenkins UI, but didn't seem to do anything
[16:35] hmm
[16:43] jam, ah-hah! the disk is full
[17:19] balloons: ... and there's no nagios to tell anyone ;)
[17:19] wpk, indeed. Jenkins monitors all the nodes; but not itself
=== frankban is now known as frankban|afk
[17:29] hml: fyi https://bugs.launchpad.net/juju/+bug/1732233
[17:29] Bug #1732233: Exiting from a debug-hook session puts hook in error state
[17:31] thedac: was debug-hooks used because of a hook error? if so was it resolved before exit?
[17:31] hml: I purposefully jumped into debug-hooks to run them serially.
Tried to exit clean but no matter what I do it goes into error state
[17:32] all those log entries are me trying exit, exit 0 etc
[17:33] I then have to do juju resolved --no-retry but this never actually passes relation data as juju thinks the hook has not "run"
[17:33] thedac: well that’s not cool, i’m trying to remember if we changed debug-hooks recently…
[17:33] Should be easily reproducible, not specific to openstack
[17:33] thanks
[17:33] no problem
[18:25] balloons: are we back in business with jenkins?
[20:48] hml, sorry, I missed your ping. I was following up on the pr's that seemed stuck
[20:49] hml, yours failed to merge "FAIL github.com/juju/juju/worker/firewaller 1502.008s"
[20:49] can I get a review on https://github.com/juju/juju/pull/8072?
[20:49] just bumping the version
[21:37] balloons: ty for restarting my merge, that failure is really odd, esp with my change, retrying
[22:12] babbageclunk: how goes it with the ss stuff?
[22:13] wallyworld: got confused about it again yesterday afternoon. But going alright again now.
[22:13] ok, i'll review once it's ready
[22:33] wallyworld: have you got a moment for a quick hangout? want to check something with you.
[22:46] babbageclunk, wallyworld, https://github.com/juju/juju/pull/8074. This does juju-versions.yaml now in the snap
[22:48] wallyworld, babbageclunk, however, note the juju-versions.yaml file will be in /snap/bin/juju; aka, next to the binaries
[22:48] balloons: nice
[22:49] tomorrow I'll get the patches included as well, and test it works for how we build / release
[22:49] that will be a bit trickier. I may want to add a note about how to seed an agent yourself
[23:20] balloons: yay, good progress
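
Note on the [08:28] to [08:30] messages: the theory there is that api.Open dials every API server at once, keeps the first to respond, and drops the rest, and that a dropped connection whose close goes unnoticed could linger until the ping timeout. The following is a minimal Python sketch of that "first responder wins, close the losers" pattern, not juju's actual Go implementation. `open_first`, `FakeConn`, `fake_dial`, the addresses and the latencies are all hypothetical, and unlike a real client the sketch waits for every dial to finish before returning, to stay short.

```python
import queue
import threading
import time


class FakeConn:
    """Stand-in for an API connection; records whether close() was called."""

    def __init__(self, addr):
        self.addr = addr
        self.closed = False

    def close(self):
        self.closed = True


def open_first(addrs, dial):
    """Dial every address concurrently and return (winner, losers).

    The first connection to come back wins; every later one is closed
    explicitly so the server is not left waiting for a ping timeout.
    """
    results = queue.Queue()

    def worker(addr):
        try:
            results.put(dial(addr))
        except Exception as exc:  # a failed dial must still be counted
            results.put(exc)

    threads = [threading.Thread(target=worker, args=(a,)) for a in addrs]
    for t in threads:
        t.start()

    winner, losers, errors = None, [], []
    for _ in addrs:
        item = results.get()
        if isinstance(item, Exception):
            errors.append(item)
        elif winner is None:
            winner = item
        else:
            item.close()  # the step the chat suspects is sometimes missed
            losers.append(item)
    for t in threads:
        t.join()
    if winner is None:
        raise ConnectionError(f"no API server reachable: {errors}")
    return winner, losers


# Hypothetical addresses with simulated dial latencies (seconds).
DELAYS = {"10.0.0.1:17070": 0.20, "10.0.0.2:17070": 0.01, "10.0.0.3:17070": 0.10}


def fake_dial(addr):
    time.sleep(DELAYS[addr])
    return FakeConn(addr)


winner, losers = open_first(list(DELAYS), fake_dial)
print(winner.addr, [c.addr for c in losers])
```

If the `item.close()` call were skipped, or the server never saw it, each open attempt would leave the losing connections behind, which is the kind of leftover the chat speculates could explain a pinger count that hovers above zero.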