[00:15] <wpk> anastasiamac: thanks
[00:16] <anastasiamac> wpk: anytime \o/
[00:45] <menn0> babbageclunk: I guess the lazy lookup was in case something happened to import the package without actually running the mongod-using tests?
[00:45] <menn0> babbageclunk: for that reason I guess it makes sense to keep it lazy
[00:49] <thumper> wallyworld: I have things to do in town this afternoon, so let's cancel our call
[00:49] <thumper> wallyworld: we will be seeing each other in two days anyway
[00:51] <wallyworld> thumper: sgtm
[01:05] <axw> menn0: I've noticed that we're not passing "--storageEngine wiredTiger" to mongo 3.2 in the tests. have you tested with that added?
[01:20] <babbageclunk> menn0: yeah, that'd be the reason not to do it at import time.
[01:52] <menn0> axw: isn't the default WT?
[01:53] <axw> menn0: yeah, just the help text is a bit confusing "defaults to wiredTiger if no data files present"
[01:53] <axw> menn0: anyway, I tested and it made no difference
[01:53] <menn0> axw: ok good to know
[01:54] <axw> menn0: forcing it to mmapv1 makes it faster though, in the limited testing I did
[01:54] <axw> menn0: I don't think we want to do that though, since we use WT in prod
[01:54] <axw> though maybe just having WT in CI would be good enough
[01:56] <menn0> axw: we think the problem is that we're still deleting DBs instead of clearing the collections in a lot of places
[01:56] <menn0> WT is much slower at deleting DBs
[01:59] <axw> menn0: could be, but I did notice that at least some of the test code (i.e. between SetUpTest completion and TearDownTest start) is slower with WT
[02:01] <menn0> ok, so there's issues there as well
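A minimal sketch of the "clear the collections instead of dropping the database" approach menn0 describes, assuming gopkg.in/mgo.v2; the helper name and the system-collection filtering are illustrative, not the actual juju/testing code:

    package mongotest

    import mgo "gopkg.in/mgo.v2"

    // clearCollections empties every user collection in the given database
    // rather than dropping the database itself, which WiredTiger handles slowly.
    func clearCollections(session *mgo.Session, dbName string) error {
        db := session.DB(dbName)
        names, err := db.CollectionNames()
        if err != nil {
            return err
        }
        for _, name := range names {
            if name == "system.indexes" || name == "system.users" {
                continue // leave system collections alone
            }
            if _, err := db.C(name).RemoveAll(nil); err != nil {
                return err
            }
        }
        return nil
    }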
[02:32] <menn0> axw, thumper, babbageclunk: there's quite a significant performance increase with wiredtiger if you disable transparent huge pages
[02:32] <menn0> echo never > /sys/kernel/mm/transparent_hugepage/enabled
[02:33] <menn0> without changing anything else this takes the agent/agentbootstrap tests from 2.8s to 1.8s on my machine
[02:33] <menn0> consistently
[02:35] <menn0> interestingly, Juju is supposed to be setting that on controller machines but I don't see that
[02:36] <babbageclunk> menn0: right, but do you know what other impact that would have on our systems?
[02:36] <menn0> babbageclunk: not sure yet
[02:36] <axw> menn0: hmm interesting. doesn't improve allWatcherStateSuite.TestChangeApplications in state for me
[02:36] <menn0> axw: what's the setting set to on your machine?
[02:36] <axw> menn0: it was set to always
[02:37] <menn0> ok
[02:37] <babbageclunk> menn0: It makes sense to do it on dedicated machines (although I guess it wouldn't help in lxd?)
[02:37] <menn0> duh
[02:37] <axw> menn0: with mongo 2.4 (juju-mongodb), that test takes just under 0.4s. with mongo 3.2 (juju-mongodb3.2) it takes 1.3s
[02:38] <menn0> that sucks
[02:38] <menn0> maybe we shouldn't be using wiredtiger at all
[02:38] <menn0> there's a lot of horror stories online about WT
[02:38] <menn0> lots of people seem to have switched back
[02:43] <menn0> babbageclunk: https://github.com/juju/testing/pull/123/files
[02:43] <thumper> hmm...
[02:43] <thumper> wow
[02:45] <thumper> menn0: initial timing tests... 2.6 mongod 23m17s, api 32s, apiserver 138s, state 671s
[02:45] <menn0> babbageclunk, axw, thumper: seems like we get most of the performance back by passing --storageEngine mmapv1
[02:45] <menn0> but then it's not like production
[02:45] <babbageclunk> menn0: I think I floated that at the time.
[02:46] <thumper> menn0: initial timing tests... 3.2 mongod 55m36s, api 366s, apiserver 1515s, state 1455s
[02:46] <thumper> api and apiserver packages are 10x slower
[02:46] <menn0> thumper: can you try with transparent huge pages turned off?
[02:46] <babbageclunk> menn0: I guess the main risk/annoyance would be if there's some behaviour difference and we only find out when the CI tests fail.
[02:46] <thumper> let me address the clear databases
[02:47] <thumper> then I'll try with the transparent pages off
[02:47] <menn0> thumper: yep ok.
[02:47] <axw> menn0: yeah that's what I found too. I think I'm OK with that as long as our CI still uses 3.2 (which it would, since there's no way to override that?)
[02:47] <babbageclunk> menn0: less of a problem now that we have check builds though
[02:51] <axw> babbageclunk: do you have time for a small review? https://github.com/juju/testing/pull/124
[02:52]  * thumper has kicked off the new test run
[02:53] <menn0> babbageclunk, thumper: I'm going to return to other work for now but will be happy to discuss ideas or test stuff
[02:53] <thumper> menn0: ack
[02:57] <thumper> babbageclunk: using clear databases rather than reset actually looks like it is taking longer...
[03:00] <thumper> hmm
[03:00] <thumper> api package was 378s
[03:01] <thumper> menn0, babbageclunk: I'll wait for the apiserver package timings, but the clear databases call makes it even slower
[03:01] <thumper> I'll try the transparent pages turned off
[03:03] <babbageclunk> axw: sure, looking at that now (and menn0's too).
[03:10] <babbageclunk> axw: LGTM'd
[03:19] <axw> babbageclunk: thanks
[03:26] <axw> babbageclunk: did you forget about this one? https://github.com/juju/description/pull/8
[03:26] <axw> there's no bot on that repo, I can merge if you like
[03:27] <babbageclunk> axw: oh, yes please - I think I decided to merge it when thumper was away and then forgot to follow it up.
[03:27] <babbageclunk> ta
[03:27] <axw> babbageclunk: done
[03:31] <menn0> babbageclunk: thanks for the review
[03:31] <babbageclunk> menn0: <thumbsup emoji>
[03:32] <thumper> menn0, babbageclunk: changing the transparent huge pages setting is making very little difference
[03:32] <thumper> I don't know what to do next
[03:33] <menn0> thumper: weird. it seemed to make a big difference for me.
[03:33] <babbageclunk> thumper: :( you said clearDatabases is slower?
[03:33] <thumper> yeah
[03:35] <babbageclunk> thumper: well bums
[03:35] <thumper> yeah
[03:37] <thumper> even with the huge pages set to never, the api package is 10x slower
[03:37] <thumper> so 300s instead of 30s
[03:38] <thumper> NFI what to do next
[03:39] <babbageclunk> convenient timing too
[03:41] <menn0> thumper: --storageEngine mmapv1 :)
[03:41] <thumper> menn0: where?
[03:41] <menn0> in juju/testing/mgo.go, in the args we pass to mongod
[03:42] <menn0> have to be careful that mongod is 3.x though as --storageEngine doesn't exist in 2.x
[03:43] <menn0> thumper: also, what about dropping DBs in MgoSuite.TearDownTest instead of clearing?
[03:54] <thumper> menn0: did you do something about caching the mgo version?
[03:54] <menn0> thumper: yes
[03:55] <thumper> menn0: where?
[03:55] <menn0> https://github.com/juju/testing/commit/3ccb7d0a3f3412b41f8c1bf000ff7078d28c0af9
[03:59] <thumper> k
[04:00] <menn0> thumper: that doesn't deal with the slowness. it just closes a potential problem.
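For reference, a sketch of the kind of caching that commit introduces, probing mongod --version only once per run; sync.Once and the names here are illustrative, not necessarily how the juju/testing change is actually written:

    package mongotest

    import (
        "os/exec"
        "sync"
    )

    var (
        versionOnce   sync.Once
        cachedVersion string
        versionErr    error
    )

    // mongodVersionString runs "mongod --version" at most once and caches
    // the raw output for subsequent callers.
    func mongodVersionString(mongodPath string) (string, error) {
        versionOnce.Do(func() {
            out, err := exec.Command(mongodPath, "--version").Output()
            if err != nil {
                versionErr = err
                return
            }
            cachedVersion = string(out)
        })
        return cachedVersion, versionErr
    }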
[04:02] <thumper> hmm with mmapv1 storage engine it is only 30% slower
[04:09] <thumper> are we prepared to accept a 30% decrease in test speed?
[04:09] <thumper> and what other choice do we have?
[04:10] <thumper> and should this info drive production choices?
[04:13] <babbageclunk> thumper: production usage patterns are very different from the tests though - we very rarely drop databases. :)
[04:13] <thumper> yeah...
[04:13]  * thumper needs to organise some trip stuff
[05:19] <babbageclunk> jam: ping?
[06:16] <axw> wallyworld: oracle bootstraps for me on my trial account FWIW
[06:17] <wallyworld> axw: it is failing because it is starting a yakkety instance (and looking for yakkety tools in /var/lib/juju) even though juju thinks it has asked for a xenial instance. so there's an issue with image selection :-(
[06:18] <wallyworld> i will try and remove the yakkety image from my account, that should fix it
[06:18] <axw> okey dokey. I think I only have xenial in mine
[06:19] <wallyworld> yeah, i'll let gabriel know, just testing to make sure
[07:22] <wpk> axw: re: proxy - this can break things, as it really sets the proxy globally
[07:45] <wpk> axw: I forwarded you an e-mail from jam describing changes, I don't want to land it without a proper test run, preferably by someone who uses proxying
[07:45] <wpk> axw: there is a 'workaround' (remove systemd code), but that has yet to be decided
[07:59] <jam> wpk: how about if we leave the ability to do systemd, but have it disabled, and land the rest ?
[07:59] <jam> I think the rest is a clear improvement
[08:00] <jam> and we can discuss whether systemd will actually break things
[09:14] <mup> Bug #1686938 opened: During a destroy-model the units first update/upgrade which delays the destroy process <apt> <delay> <destroy-model> <upgrade> <juju-core:New> <https://launchpad.net/bugs/1686938>
[09:59] <wpk> Is anyone willing to review #7204? I know it's big but it's been waiting for almost a month now..
[11:12] <jam> wpk: did you go through it as though you were doing a review as well?
[11:26] <wpk> jam: yes, although it is a huge one so I might have missed some things...
[11:27] <jam> wpk: so it seems there is one piece that needs fixing wrt bridge_ports and inet and inet6 stanzas
[11:28] <wpk> jam: we should only put one?
[11:28] <jam> bug #1650304
[11:28] <mup> Bug #1650304: Juju2: 'Creating container: failed to ensure LXD image: image not imported!' <oil> <oil-2.0> <regression> <juju:Incomplete> <juju 2.1:Incomplete> <https://launchpad.net/bugs/1650304>
[11:29] <jam> from axw: https://bugs.launchpad.net/juju/+bug/1650304/comments/7
[11:37] <wpk> jam: Reading through bridge-utils, IMHO it's OK to have it but I'll check
[11:38] <jam> wpk: it wasn't ok from experience, given the original bug
[11:41] <wpk> jam: as I understand that wasn't the problem, and looking at the code it's simply doing for (bridge in bridge_ports) if (bridge not already in the bridge) brctl addif....
[11:42] <jam> hopefully that's if "port not already in the bridge" :)
[11:42] <jam> but I get your point
[11:43] <jam> wpk: are we sure that is true across trusty+xenial+yakkety, etc?
[11:46] <wpk> trusty, xenial, zesty
[11:46] <wpk>     if [ "$MODE" = "start" ] && [ ! -d /sys/class/net/$IFACE/brif/$port ]; then
[11:46] <wpk> I don't have any Yakkety but I don't think they'd change that line just for this one :)
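For context, a hand-written illustration (not taken from the PR) of the kind of /etc/network/interfaces output being discussed, with bridge_ports repeated in both the inet and inet6 stanzas; the bridge-utils hook quoted above only runs brctl addif when the port is not already in the bridge, which is why the repetition should be harmless:

    auto br-eth0
    iface br-eth0 inet dhcp
        bridge_ports eth0
    iface br-eth0 inet6 auto
        bridge_ports eth0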
[12:35] <axw> wpk: if you're changing it back (I'm happy to be told I was wrong - again, I'm no expert), can you please run the QA steps from that PR that jam linked to?
[12:35] <axw> I mean, the PR linked from the bug that jam linked to...
[12:44] <wpk> axw: will do, I'm checking how the 'old version' worked and bridge_ports wasn't the only problem (repeated auto, etc.)
[12:44] <wpk> axw: was the problem only with rackspace or with other providers too?
[12:45] <axw> wpk: yep, I may well have conflated the other issues. that's the only place I observed and reproduced the issue. I think it was also seen on MAAS by others, but could never confirm myself
[12:48] <rogpeppe> here's a tiny little PR for review, controller-specific cookie jars: https://github.com/juju/juju/pull/7294
[12:49] <rogpeppe> axw, wpk, jam, wallyworld: if someone manages to review it, i'll owe them a beer or two :)
[12:50] <axw> rogpeppe: heh. I've already got a beer and it's 9pm on a friday night ;)  I'll take a look on monday if nobody does it sooner
[12:50] <rogpeppe> axw: thanks
[12:51] <rogpeppe> axw: this work was the subject of this tweet: https://twitter.com/rogpeppe/status/849963422032723968 :)
[12:52] <axw> rogpeppe: I figured :)