[00:09] <blahdeblah> axw: Does jujud-machine-0 need to contact itself on 127.0.0.1:17070?  I'm seeing those connections pop up even after seeing remote ones drop away.
[00:10] <blahdeblah> axw: i.e. should I allow the loopback ones?
[00:13] <axw> blahdeblah: it does, yeah. Maybe just stop the agents *except* jujud-machine-0, since jujud-machine-0 talks to itself over loopback too
[00:14] <blahdeblah> axw: Righto; so allow loopback, and stop the other agents on machine 0
[00:14] <axw> blahdeblah: yup
[00:16] <blahdeblah> axw: OK - that's done; how long shall I leave it before checking for recovery?
[00:23] <axw> blahdeblah: I would expect it to be fairly quick if it's going to fix the issue. 10-15 mins
[00:23] <blahdeblah> OK - time to make a coffee and see how it goes then. :-)
[00:41] <blahdeblah> axw: load dropped off pretty dramatically: https://libertysys.com.au/imagebin/2iugcW6L.png
[00:41] <blahdeblah> and network traffic, but that's pretty much expected given I'm stopping the packets: https://libertysys.com.au/imagebin/B8B9g8F7.png
[00:42] <blahdeblah> axw: and here's your smoking gun: network https://libertysys.com.au/imagebin/Dqygzb0N.png
[00:43] <blahdeblah> axw: what next - start the local agents back up?
[00:45] <axw> blahdeblah: yes, unblock them and see what happens please
[00:48] <blahdeblah> OK - that's done
[00:48]  * blahdeblah starts writing an update for the bug report
[00:49] <axw> blahdeblah: thanks for that
[00:58] <blahdeblah> axw: So, where to next with this?  The agents have gone straight back to spamming the controller's logs...
[00:59] <blahdeblah> lots of these: unit-ntpmaster-5[2554]: 2016-11-23 00:58:53 INFO juju.worker.dependency engine.go:351 "uniter" manifold worker stopped: dependency not available
[00:59] <blahdeblah> and these:
[00:59] <blahdeblah> unit-nrpe-6[2568]: 2016-11-23 00:59:20 DEBUG juju.worker.dependency engine.go:301 starting "uniter" manifold worker
[00:59] <blahdeblah> unit-nrpe-6[2568]: 2016-11-23 00:59:20 DEBUG juju.worker.dependency engine.go:268 "uniter" manifold requested "agent" resource
[01:19] <blahdeblah> axw: And we're straight back to previous load & network traffic levels
[01:19] <blahdeblah> RAM still creeping up
[01:20] <axw> blahdeblah: are there errors in the log? about lease manager not being available or whatever it is
[01:57] <thumper> FAIL: pinger_test.go:104: pingerSuite.TestAgentConnectionDelaysShutdownWithPing
[01:57] <thumper> seriously?
[01:58] <thumper> this test is still failing?
[01:58] <thumper> FFS
[01:58] <menn0> thumper: yep... I thought I had it licked, but apparently not
[02:00] <anastasiamac> thumper: menn0: funny how it's not on my hit list of intermittents that block promotion from develop to staging...
[02:00] <anastasiamac> thumper: menn0: do u have time to pick it up?
[02:01] <thumper> dealing with migration stuff right now
[02:01] <menn0> anastasiamac: ditto... I'm not supposed to pick up anything else
[02:01] <anastasiamac> awesome \o/
[02:04] <blahdeblah> axw: Sorry - missed that message earlier.  Should the messages be of type ERROR?
[02:04] <axw> blahdeblah: yes
[02:04] <axw> blahdeblah: in the agent logs
[02:04] <blahdeblah> axw: not machine 0?
[02:05] <axw> blahdeblah: there might be some in there too, but the errors I'm thinking of are the cause of the (non-controller) agent restarts
[02:05] <menn0> wallyworld: review done
[02:05] <wallyworld> tyvm
[02:24] <blahdeblah> axw: no sign of any errors to do with lease manager
[02:25] <axw> blahdeblah: hmm, ok. no errors at all in the agents, suggesting they're restarting?
[02:25] <blahdeblah> axw: only the ones relating to the connection going down when I blocked traffic
[02:25] <blahdeblah> you want me to collect full logs from machine 0 for you?
[02:26] <axw> blahdeblah: yeah wouldn't hurt, thanks
[02:27] <anastasiamac> thumper: I wonder what the magic incantation is - still can't reproduce it after 520 runs under stress
[02:30] <thumper> menn0: https://github.com/juju/juju/pull/6597 easy one
[02:30] <menn0> thumper: looking
[02:31] <menn0> thumper: QA steps?
[02:31] <thumper> run the tests
[02:31] <thumper> I made sure it failed first
[02:31] <menn0> thumper: ok fine
[02:32] <menn0> thumper: how serious is the problem that's being fixed?
[02:32] <menn0> seems problematic
[02:32] <thumper> menn0: migrations will fail
[02:32] <thumper> if anyone has ever removed a unit that had an action
[02:32] <thumper> as the cleanups need to be empty for the model
[02:32] <thumper> also, data just hangs around
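To illustrate the invariant thumper is describing, here is a minimal Go sketch of a migration precheck that refuses to proceed while cleanup documents are still queued. The `backend` interface and `PendingCleanupCount` are hypothetical stand-ins, not Juju's actual API:

```go
package migration

import "fmt"

// backend is a hypothetical stand-in for whatever exposes the model's
// queued cleanup documents (e.g. leftovers from removing a unit that
// had actions).
type backend interface {
	PendingCleanupCount() (int, error)
}

// precheckNoCleanups fails the precheck while any cleanups are still
// pending, since the export requires the model's cleanups collection
// to be empty.
func precheckNoCleanups(b backend) error {
	n, err := b.PendingCleanupCount()
	if err != nil {
		return err
	}
	if n > 0 {
		return fmt.Errorf("cannot migrate model: %d cleanup(s) still pending", n)
	}
	return nil
}
```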
[02:33] <thumper> plan on backporting to the 2.0 branch...
[02:33] <menn0> thumper: glad you found it now then
[02:33] <thumper> well
[02:33] <thumper> are we going to do another 2.0 release?
[02:33] <menn0> nfi
[02:33] <menn0> thumper: ship it
[02:33] <thumper> menn0: found it as part of the b7 upgrade
[02:34] <thumper> as horrible as that work has been, it's found many issues
[02:38] <anastasiamac> thumper: another 2.0.x release may be a possibility - it's on an as-needed basis...
[02:38] <anastasiamac> thumper: original plan is to support 2.0.x until feb
[03:03] <wallyworld> menn0: tech board?
[03:03] <menn0> wallyworld: just when I was actually starting to make some progress today...
[03:03] <wallyworld> yeah, always the way
[08:41] <macgreagoir> voidspace frobware: I've a couple of IPv6-related PRs in, if you have a minute. Both required for the next juju PR.
[08:41] <macgreagoir> https://github.com/juju/testing/pull/117
[08:42] <macgreagoir> https://github.com/juju/replicaset/pull/3
[08:42] <macgreagoir> Both simple too.
[11:21] <jam> frobware: ping
[11:21] <jam> just wondering what you discovered WRT the bridge script in your testing yesterday
[11:36] <jam> macgreagoir: just to mention, you can ping me as well now. :)
[11:37] <macgreagoir> jam: Of course :-) cheers.
[11:42] <frobware> jam: well, one thing is that if there's nothing to do we still run ifupdown - which was fine before, but not now.
[11:42] <frobware> jam: was offline - powercut
[11:42] <jam> frobware: good catch
[11:42] <jam> good thing you're using 4g everywhere, right? :)
[11:42] <jam> macgreagoir: reviewing
[11:43] <frobware> jam: I got sidetracked by landing https://github.com/juju/juju/pull/6599
[11:43] <frobware> jam: we will need the script to be present from now on so it seemed prudent to make that so...
[11:44] <jam> sure, though maybe we should just trigger writing it from jujud itself? given we have a copy inside the jujud binary
[11:45] <frobware> jam: can do. baby steps.
[11:45] <jam> :)
[11:45] <jam> I also wonder if we ultimately want it as a python script if we are moving to doing it ourselves. It made sense when it was driven by cloud-init. Stuff to consider, at least.
[11:45] <frobware> jam: I got a little annoyed it wasn't on there during my testing.
[11:45] <frobware> jam: yes, something I had considered too.
[11:46] <frobware> jam: as it stands now, it will work for dynamic and our current static bridging.
[11:59] <macgreagoir> frobware: replicaset comments amended.
[12:00] <jam> macgreagoir: commented.
[12:00] <frobware> macgreagoir: was still looking at this
[12:15] <frobware> macgreagoir: reviewed both
[12:15] <frobware> jam, macgreagoir, voidspace: PTAL @ https://github.com/juju/juju/pull/6599
[12:27] <frobware> it seems the pre-flight PR checks are failing
[12:27] <frobware> take a look at: http://juju-ci.vapour.ws/job/github-check-merge-juju/268/artifact/artifacts/trusty-err.log
[12:27] <frobware> scripts/setup-lxd.sh: line 10: lxd: command not found
[12:29] <voidspace> rick_h: grabbing coffee will be 2 mins late to first standup
[12:29] <rick_h> voidspace: rgr
[12:54] <jam> mgz: http://juju-ci.vapour.ws/job/github-check-merge-juju/268/ is that just an out-of-disk-space error?
[12:54] <jam> lxd-out.log at least has: "Failed to copy file. Source: /var/lib/jenkins/cloud-city/jes-homes/merge-juju-lxd/models/cache.yaml Destination: /var/lib/jenkins/workspace/github-check-merge-juju@2/artifacts/lxd/controller"
[12:55] <jam> and why does http://juju-ci.vapour.ws/job/github-check-merge-juju/268/artifact/artifacts/trusty-out.log have "17.04,Angsty Antelope,angsty,2016-10-23,2017-04-30,2018-01-29 "
[12:58] <jam> ++ lxd --version
[12:58] <jam> scripts/setup-lxd.sh: line 10: lxd: command not found
[12:58] <jam> seems fishy
[12:58] <perrito666> hey jam, are you on our TZ this week, or just working incredibly late?
[12:59] <jam> perrito666: it is just hitting 5pm here.
[12:59] <jam> I'm UTC+4, so not *that* far away from your TZ, just not very close.
[12:59] <perrito666> it's 7 hours, that's far enough
[13:00] <perrito666> also I forgot it was morning here :p
[13:00] <mgz> jam: the Angsty thing is a hack to make sure distro-info behaves
[13:00] <perrito666> I might be needing vacations
[13:01] <jam> perrito666: when you start mixing up day and night...
[13:01] <jam> mgz: k, just thinking that since it is labeled 17.04 it will become wrong pretty soon.
[13:01] <perrito666> jam: just morning and afternoon; to be fair, 10AM and 8PM are not really different around here
[13:01] <jam> perrito666: more drinking at 10am?
[13:02] <perrito666> I won't admit or deny that
[13:03] <jam> mgz: so I think the final thing is that trusty is somehow missing LXD (not installed from trusty-backports?)
[13:06] <mgz> jam: the thing is, it works a portion of the time
[13:06] <mgz> and the script isn't changing
[13:06] <mgz> so I don't at a glance know what's up
[13:07] <mgz> someone just needs to spend some time to debug it; it happens enough that it really needs resolving
[13:07] <mgz> I'll see if there's a bug already
[13:07] <jam> mgz: I don't see anything in http://juju-ci.vapour.ws/job/github-check-merge-juju/268/artifact/artifacts/trusty-out.log that indicates it is installing lxd
[13:08] <jam> is it doing something like accidentally using the Xenial script, where lxd is preinstalled?
[13:08]  * mgz finds a passing one to compare
[13:08] <jam> mgz: also, looking at the apt-update HTTP list, I don't see trusty-backports in there
[13:08] <jam> trusty and trusty-updates, but not trusty-backports
[13:09] <jam> mgz: I remember there were some issues with particular Trusty MAAS images that didn't have trusty-backports enabled by default, which differed from CPC images
[13:09] <mgz> ha, a passing one also fails there
[13:09] <mgz> but exits 0
[13:09] <jam> mgz: so it passes because it failed to fail?
[13:10] <mgz> see 267
[13:10] <mgz> no, it ran the tests, going by the trusty-out.log, and they pass
[13:10] <jam> mgz: yeah, I see the "lxd not found" line.
[13:11] <mgz> so, the fatal error is actually the "Connection to ...amazonaws.com closed by remote host."
[13:12] <mgz> I don't see a bug, so will file one.
[13:12] <mgz> atm check job history is mostly red, and I suspect that's largely spurious
[13:21] <jam> frobware: commented on https://github.com/juju/juju/pull/6599
[13:33] <voidspace> every time the SSO two-factor login annoys me, I remember I implemented it...
[13:33] <voidspace> dammit
[13:34] <mgz> voidspace: aha! does this mean we can all come to you with complaints? :)
[13:34] <voidspace> mgz: haha, I'm not on that team any more
[13:34] <voidspace> mgz: but feel free to complain, sure
[13:35] <gnuoy> do juju 1.X and 2.X both use the same version of goose?
[13:36] <voidspace> mgz: ^^^^
[13:36] <voidspace> gnuoy: I doubt it
[13:38] <voidspace> gnuoy: different revisions of goose.v1
[13:38] <gnuoy> voidspace, hmm, ok, ta
[13:38] <voidspace> gnuoy: http://pastebin.ubuntu.com/23522141/
[13:38] <voidspace> gnuoy: 2.0 is on a 2016-11 revision
[13:39] <voidspace> gnuoy: 1.25 a 2015-11 revision
[13:39] <gnuoy> voidspace, ok thanks, thats v. helpful
[13:42] <mgz> gnuoy: nope
[13:42]  * mgz was late
[13:42] <mgz> gnuoy: we can bug fix goose for 1.25 if needed though
[13:42] <mgz> but it notably isn't going to have the new neutron stuff
[13:42] <gnuoy> mgz, I think the bug impacts both tbh but I've only confirmed on 1.X
[13:43] <gnuoy> mgz, the new neutron stuff has landed?
[13:43] <mgz> gnuoy: yes, I've delayed sending out announcement as there was some fallout to handle
[13:43] <mgz> but will do so.
[13:45] <voidspace> rick_h: I'm out of bourbon, can you bring me a nice bottle to Barcelona and I'll recompense you there... (pretty please :-)
[13:45] <voidspace> rick_h: if you can remember and be bothered of course...
[13:47] <rick_h> voidspace: hah, what is your preferred one?
[13:48] <voidspace> rick_h: any reasonable Kentucky bourbon will be fine, I'm no expert
[13:48] <voidspace> rick_h: the one I've just finished that I liked
[13:48] <voidspace> rick_h: was Bulleit (or something like that)
[13:48] <voidspace> rick_h: scotch is too rough for me, but you Americans can make nice whisky
[13:49] <frobware> jam: there is a wrapper around the bridge script which invokes it at its new location
[13:50] <frobware> jam: that wrapper has always existed. It checks which version of python is available, and whether all interfaces should be bridged.
[13:51] <frobware> jam: and then runs: /path/to/python /path/to/the/bridge/script.py
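As a rough illustration of jam's suggestion above, here is a minimal Go sketch of jujud writing out its embedded copy of the bridge script and invoking it with whichever python interpreter is found on PATH, much as the shell wrapper does. The script contents, install path, and function names are hypothetical placeholders:

```go
package network

import (
	"fmt"
	"io/ioutil"
	"os"
	"os/exec"
)

// bridgeScript stands in for the copy of the bridging script that is
// embedded in the jujud binary (placeholder content).
const bridgeScript = "#!/usr/bin/env python\n# ... bridging logic ...\n"

// runBridgeScript writes the embedded script to disk and runs it with
// the first python interpreter found on PATH.
func runBridgeScript(args ...string) error {
	var python string
	for _, candidate := range []string{"python3", "python2", "python"} {
		if path, err := exec.LookPath(candidate); err == nil {
			python = path
			break
		}
	}
	if python == "" {
		return fmt.Errorf("no python interpreter found on PATH")
	}
	scriptPath := "/usr/local/bin/add-juju-bridge.py" // hypothetical location
	if err := ioutil.WriteFile(scriptPath, []byte(bridgeScript), 0755); err != nil {
		return err
	}
	cmd := exec.Command(python, append([]string{scriptPath}, args...)...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```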
[13:54] <gnuoy> mgz, fwiw https://bugs.launchpad.net/juju/+bug/1625624/comments/7
[13:54] <mup> Bug #1625624: juju 2 doesn't remove openstack security groups <ci> <landscape> <openstack-provider> <juju:Triaged by alexis-bruemmer> <https://launchpad.net/bugs/1625624>
[13:55] <voidspace> rick_h: and if there's anything I can bring from the UK just let me know, scotch, marmite, scottish shortbread....
[13:56] <mgz> gnuoy: thanks, having a look
[13:56] <frobware> note bug 1643057 was fixed in MAAS 2.1.2
[13:56] <mup> Bug #1643057: juju2 with maas 2.1.1 LXD containers get wrong ip addresses <cdo-qa-blocker> <landscape> <juju:Invalid> <MAAS:Fix Committed by mpontillo> <MAAS 1.9:Won't Fix> <MAAS 2.0:Won't Fix> <MAAS 2.1:Fix Released by mpontillo> <https://launchpad.net/bugs/1643057>
[13:57] <mgz> gnuoy: do you know if this is tied to a particular openstack version?
[13:57] <mgz> because goose hasn't changed the underlying http handling in a long time I believe
[13:57] <gnuoy> mgz, that is an excellent question. I believe this broke for us when we went to Mitaka
[13:58] <mgz> gnuoy: that sounds very likely
[13:58] <mgz> most of our CI testing is against older (much older) versions
[14:08] <balloons> ahh frobware, you need to look at http://juju-ci.vapour.ws/job/github-check-merge-juju/269/artifact/artifacts/trusty-out.log/*view*/. You have unit test failures. Everything else is noise
[14:22] <frobware> balloons: heh, none of these seem to be related to what I changed.
[14:22] <mgz> balloons: 268 is the one to look at. it either disconnected after the build or something similar.
[14:22] <mgz> 269 is perrito666's
[14:22] <mgz> (which does have real unit test failures)
[15:09] <mup> Bug #1625624 opened: juju 2 doesn't remove openstack security groups <ci> <landscape> <openstack-provider> <juju:Triaged by alexis-bruemmer> <juju 2.0:Triaged> <juju-core:Triaged> <https://launchpad.net/bugs/1625624>
[15:20] <katco> natefinch: on that branch; not shown: I have to land the CallMocker type into juju/testing
[15:21] <natefinch> katco: ahh, thanks, that would be a key piece :)
[15:21] <katco> natefinch: yeah, sorry, I just ran out of time last night
[15:21] <katco> natefinch: you can see what it looks like because it's removed from deploy_test.go
[15:21] <katco> natefinch: but doesn't build atm. feel free to wait on review if you like
[15:22] <katco> natefinch: i will endeavor to get that landed around lunch
[15:25] <natefinch> katco: I can start the review now, that's no problem
[15:27] <natefinch> wow that call mocker adds a lot of complexity
[15:30] <katco> natefinch: going into a meeting, but would appreciate more thoughts on that
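For readers following along, this is roughly the shape of a call-mocking test helper; a minimal sketch with hypothetical names, not the actual CallMocker API under review. Tests queue canned results per method name, and the mocker records every call so assertions can be made afterwards:

```go
package testing

import (
	"fmt"
	"sync"
)

// callMocker maps method names to canned results and records each
// invocation it sees.
type callMocker struct {
	mu      sync.Mutex
	results map[string][]interface{}
	calls   []string
}

func newCallMocker() *callMocker {
	return &callMocker{results: make(map[string][]interface{})}
}

// SetResults registers the values Call should hand back for fnName.
func (m *callMocker) SetResults(fnName string, results ...interface{}) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.results[fnName] = results
}

// Call records the invocation and returns the canned results (if any)
// registered for fnName.
func (m *callMocker) Call(fnName string, args ...interface{}) []interface{} {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.calls = append(m.calls, fmt.Sprintf("%s(%v)", fnName, args))
	return m.results[fnName]
}
```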
[16:31] <perrito666> mgz: ??
[16:32] <perrito666> mgz: I don't have a merge in progress
[16:36] <mgz> perrito666: the check on pr #6564
[16:38] <perrito666> mgz: ah yes, I know
[18:45] <mup> Bug #1644333 opened: !amd64 controller on MAAS cloud requires constraints <juju-core:New> <https://launchpad.net/bugs/1644333>
[20:24] <babbageclunk> hey menn0
[20:24] <menn0> babbageclunk: howdy
[20:25] <babbageclunk> so my logtransfer stuff is failing because the migrationmaster worker is connecting as a machine agent but the debuglog endpoint expects a user.
[20:26] <katco> menn0: hey, curious what you think of this https://github.com/juju/testing/pull/118/files in relation to this https://github.com/juju/juju/pull/6598/files
[20:27] <babbageclunk> menn0: I guess I should add another httpContext authentication method that has an authfunc for any one of machine/unit/user?
[20:27] <menn0> babbageclunk: well just machine+user would be enough right?
[20:28] <babbageclunk> menn0: oh, right
[20:28] <menn0> katco: i'll take a look shortly
[20:28] <katco> menn0: no rush, ta
[20:28] <babbageclunk> menn0: I was going to use stateForRequestAuthenticated, but that doesn't check permissions
[20:29] <menn0> babbageclunk: but yes that seems ok. there's not much danger in opening up that endpoint to machines
[20:29] <babbageclunk> menn0: Ok, thanks, I'll do that. Tempted to do it as a function that accepts the tag kinds and rework the others in the same way.
[20:30] <menn0> babbageclunk: that could be nice as long as it's not too big a change
[20:30] <menn0> babbageclunk: i.e. don't get too distracted
[20:32] <babbageclunk> menn0: No, it's simple.
[20:32] <menn0> babbageclunk: cool, then go for it
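A minimal sketch of the generalised check babbageclunk is describing, with stand-in types rather than Juju's real httpContext and tag machinery: the endpoint declares which tag kinds it accepts, e.g. machine and user for the logtransfer/debug-log endpoint, and anything else is rejected.

```go
package apiserver

import (
	"fmt"
	"strings"
)

// authTag is a stand-in for the authenticated connection's entity tag;
// Juju tags look like "machine-0", "unit-nrpe-6", or "user-admin".
type authTag string

// kind returns the tag's leading kind component.
func (t authTag) kind() string {
	return strings.SplitN(string(t), "-", 2)[0]
}

// checkAuthKinds rejects any connection whose tag kind is not in the
// allowed set; the logtransfer endpoint would pass "machine" and "user".
func checkAuthKinds(t authTag, allowed ...string) error {
	for _, kind := range allowed {
		if t.kind() == kind {
			return nil
		}
	}
	return fmt.Errorf("tag kind %q not valid for this endpoint", t.kind())
}
```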
[20:32] <babbageclunk> menn0: I'm so close to getting this working I can taste it!
[20:33] <menn0> babbageclunk: what does it taste like? :)
[21:34] <babbageclunk> menn0: it tastes like... victory
[21:40] <perrito666> babbageclunk: for a moment I thought we were still talking about what came out of your keyboard :p
[21:41] <babbageclunk> perrito666: ugh, who put all this cat hair on my keyboard!?
[21:41] <babbageclunk> perrito666: never clean anything, it only leads to heartache
[22:23] <thumper> anyone else noticed a plethora of ERRORs now being written to the logs like this:  11:17:30 ERROR juju.rpc error writing response: write tcp 127.0.0.1:17070->127.0.0.1:52408: write: connection reset by peer
[22:23] <thumper> normally about 10 at a time
[22:42] <babbageclunk> thumper: yeah, I think I've seen those - I didn't realise they were new though
[22:47] <babbageclunk> axw, thumper, menn0: can someone take a look at this? https://github.com/juju/juju/pull/6606
[23:00] <menn0> katco: I like the approach. I've added a bunch of suggestions; let's get that in.
[23:00] <menn0> babbageclunk: looking
[23:01] <thumper> menn0: oh fuck...
[23:02]  * menn0 cringes
[23:02] <thumper> menn0: I've just hit something quite interesting
[23:02] <thumper> quick HO to discuss?
[23:02]  * menn0 doesn't like interesting
[23:02] <menn0> thumper: no :)
[23:02] <thumper> :(
[23:02]  * menn0 buries head in sand
[23:02] <menn0> thumper: see you in 1:1
[23:02] <thumper> ack
[23:04] <thumper> machine-0: 12:01:20 ERROR juju.worker.migrationmaster:9097b3 source prechecks failed: unit ubuntu/0 not idle (failed)
[23:20] <babbageclunk> thumper: ping?
[23:20] <thumper> babbageclunk: hey
[23:22] <babbageclunk> thumper: my migration master's getting an error when connecting to the logtransfer endpoint - the target model isn't importing (since it's already at success).
[23:22] <thumper> ah...
[23:22] <babbageclunk> thumper: how do you think I should handle that - another phase?
[23:23] <thumper> what comes after success?
[23:23] <babbageclunk> thumper: ...profit?
[23:23] <thumper> heh
[23:23] <menn0> thumper, babbageclunk: SUCCESS -> LOGTRANSFER
[23:23] <babbageclunk> menn0: but is the target model in that state?
[23:24] <thumper> I think perhaps the method should take the expected import state
[23:24] <thumper> stateForMigration should take an expected state
[23:24]  * menn0 agrees with thumper
[23:24] <thumper> and check against that
[23:24] <thumper> rather than just assuming importing
[23:25] <menn0> the logtransfer endpoint should only work if the migration state is "none"
[23:25] <babbageclunk> thumper, menn0 - yeah, makes sense. In this case that's MigrationModeNone
[23:25] <menn0> (that's not the exact name)
[23:25] <thumper> why is the state none?
[23:25] <babbageclunk> ok, thanks chaps - doing that now
[23:25] <thumper> if it is still migrating?
[23:26] <babbageclunk> menn0: ^
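A minimal sketch of thumper's suggestion, using illustrative types rather than Juju's real state API: the caller passes the migration mode it expects instead of the lookup assuming the model is importing, and the logtransfer endpoint would pass the "none" mode, since by that phase the import has already succeeded.

```go
package apiserver

import "fmt"

// MigrationMode is illustrative; Juju defines the real modes on the
// model (e.g. MigrationModeNone while no migration is active).
type MigrationMode string

const (
	MigrationModeNone      MigrationMode = "none"
	MigrationModeImporting MigrationMode = "importing"
)

type model struct {
	migrationMode MigrationMode
}

// stateForMigration checks the model against the mode the endpoint
// expects rather than assuming it is importing, and fails fast on a
// mismatch.
func stateForMigration(m *model, expected MigrationMode) (*model, error) {
	if m.migrationMode != expected {
		return nil, fmt.Errorf("model migration mode is %q, expected %q",
			m.migrationMode, expected)
	}
	return m, nil
}
```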
[23:26] <thumper> menn0: FYI, the precheck is failing because the machine instance status is "started" not "running"
[23:27] <menn0> thumper: ok cool .. easy to fix then
[23:27] <thumper> I'm looking at what full status is looking at
[23:27]  * thumper is still digging
[23:28] <menn0> thumper: fullstatus should be using the same thing to generate the status but perhaps it's the interpretation of the status
[23:28] <thumper> yeah, that's what I'm checking
[23:29] <menn0> babbageclunk: review done. looks great.
[23:29] <babbageclunk> menn0: thanks!
[23:33] <axw> mgz: thanks for the review, PTAL at my response when you have a moment
[23:34] <perrito666> axw: wallyworld: I need to skip standup, sorry; I'll update you guys upon return
[23:34] <perrito666> axw: good catch with region check
[23:35] <axw> perrito666: ok, ttyl
[23:37] <axw> perrito666: BTW see validateCloudRegion in state/model.go for one place we're doing validation of region. I think we just want to copy that logic into environs.MakeCloudSpec
[23:37] <axw> we're validating when we *do* pass a region, but not when we don't
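Roughly the shape of the check axw is pointing at, as a sketch with stand-in types (the real logic lives in validateCloudRegion in state/model.go): a supplied region must be one the cloud defines, and an empty region should not silently pass validation for a cloud that defines regions.

```go
package environs

import "fmt"

// cloudDef is a stand-in for the cloud definition the spec is built from.
type cloudDef struct {
	name    string
	regions []string
}

// checkCloudRegion validates a supplied region against the cloud's
// regions, and treats an empty region as an error whenever the cloud
// defines any regions at all.
func checkCloudRegion(c cloudDef, region string) error {
	if region == "" {
		if len(c.regions) > 0 {
			return fmt.Errorf("missing region for cloud %q", c.name)
		}
		return nil
	}
	for _, r := range c.regions {
		if r == region {
			return nil
		}
	}
	return fmt.Errorf("region %q not valid for cloud %q", region, c.name)
}
```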
[23:37] <thumper> menn0: status always shows "started", never "running"
[23:43]  * thumper sighs
[23:44] <thumper> apiserver converts running -> started somewhere
[23:44]  * thumper still digging
[23:46] <menn0> thumper: i'll show you where after the standup
[23:46] <thumper> ok
[23:47] <wallyworld> perrito666: standup?
[23:55] <thumper> menn0: fyi https://bugs.launchpad.net/juju/+bug/1620438
[23:55] <mup> Bug #1620438: model-migration pre-checks failed to catch issue: not idle <ci> <intermittent-failure> <regression> <juju:Triaged by menno.smits> <https://launchpad.net/bugs/1620438>
[23:59] <mup> Bug #1640521 changed: Unable to deploy Windows 2012 R2 on AWS <juju-core:Won't Fix> <https://launchpad.net/bugs/1640521>
[23:59] <mup> Bug #1644333 changed: !amd64 controller on MAAS cloud requires constraints <juju:New> <https://launchpad.net/bugs/1644333>