[00:55] <wallyworld_> axw: hey, with your race fix PR - would it be simpler just to have each test start/stop workers explicitly rather than mess with setup/teardown
[01:02] <axw> wallyworld_: TBH I don't think we really need to duplicate the tests for each permutation. I'd rather fix the dependencies in the long term, and do the easy fix for now
[01:03] <wallyworld_> ok, maybe add a TODO
[01:13] <axw> wallyworld_: done
[01:13] <wallyworld_> ta
[02:01] <axw> wallyworld_: did I miss anything in standup? I've got PRs up for GCE and OpenStack to use volume attachment AZs in StartInstance now, will move on to PrecheckInstance if there's nothing release-related I should be looking at
[02:03] <wallyworld_> axw: we're waiting on a new CI test run with resource limits fixed so we can see where we stand. So right now, nothing that warrants attention (but that might change) We want to release Thursday. I saw the PRs, will look soon
[02:03] <axw> wallyworld_: okey dokey
[03:04]  * axw screams into a pillow
[03:04] <axw> ci failures driving me crazy
[03:04] <veebers> axw: which failure is giving you grief?
[03:05] <axw> veebers: mostly grant and windows
[03:05] <axw> veebers: and mostly mongo stuff on windows
[03:05] <axw> if not only
[03:06] <veebers> ah right :-\
[03:07] <veebers> axw: I might get a chance today to look at why grant is so flaky, see how it might be fixed
[03:07] <axw> veebers: that would be wonderful, thanks. I know you're busy though
[03:08] <axw> I intend to see what we can do to either cut mongo out of windows tests, or see if they can be made more robust
[03:08] <axw> but features
[04:44] <wallyworld_> babbageclunk: have you looked at assess_log_forward.py ?
[04:45] <wallyworld_> that CI test should have deets on how to set stuff up
[04:45] <babbageclunk> wallyworld_: ooh, no - I'm trying the charm at the moment to see how that does it. Looking at the test now.
[04:46] <babbageclunk> wallyworld_: thanks
[06:29] <rogpeppe> wpk: ping
[06:29] <wpk> pong
[06:29] <rogpeppe> wpk: i just took a look at https://github.com/juju/juju/pull/7414
[06:30] <rogpeppe> wpk: i think that returning the original error is fine, but there should definitely not be a warning there
[06:30] <wpk> rogpeppe: Ian merged it already, but I wanted you to take a look at it as that's your change
[06:30] <wpk> Why no warning?
[06:30] <rogpeppe> wpk: because that situation happens all the time and it's normal
[06:31] <rogpeppe> wpk: we don't want people to see warnings all the time - it makes them worried :)
[06:31] <wpk> rogpeppe: the warning is issued only if both methods fail
[06:31] <wpk> rogpeppe: (and the connection fails)
[06:33] <rogpeppe> wpk: hmm, i guess it's not that common to get a cert error then another error. i still don't think it's worth a warning though.
[06:34] <rogpeppe> wpk: i think i'd put it at info level
[06:34] <wpk> rogpeppe: but the connection fails
[06:34] <rogpeppe> wpk: not necessarily
[06:34] <rogpeppe> wpk: there can be many concurrent dial instances
[06:34] <rogpeppe> wpk: and if it's the only reason it fails, we'll display the returned error anyway
[06:35] <wpk> but the returned error is half of the story
[06:36] <rogpeppe> wpk: yeah, but i can't see that mattering much unless you're trying to debug, and in that case you can see Info or Debug level messages easily
[06:37] <rogpeppe> wpk: i'm wary of unnecessary warnings
[06:37] <rogpeppe> wpk: oh yes, and also line 705 already logs the returned error, so we'll be logging the same error twice
[06:39] <rogpeppe> wpk: i'd suggest changing line 746 to: logger.Debugf("failed to connect to websocket with public cert after private cert failed: %v", rootCAErr)
[06:44] <rogpeppe> wpk: i'm particularly wary of extra warning messages because they're a common source of bug reports
[07:46] <wpk> rogpeppe logger.Debugf("Failed dialing websocket using fallback public CA - %q", rootCAErr
[07:46] <wpk> rogpeppe: looks OK?
[07:47] <rogpeppe> wpk: usually we prefix errors with a colon
[07:48] <rogpeppe> wpk: and i think "failed to dial" reads a bit better than "failed dialing"
[07:48] <rogpeppe> wpk: i'm never sure whether log messages should start with a capital letter or not...
[07:49] <wpk> logger.Debugf("Failed to dial websocket using fallback public CA: %v", rootCAErr)
[07:50] <rogpeppe> wpk: SGTM, but i'd probably use lower case "failed" as 90% of debug messages start with lower case
[07:51] <rogpeppe> wpk: thanks
[07:55] <wpk> rogpeppe: https://github.com/juju/juju/pull/7416
[07:56] <rogpeppe> wpk: LGTM
[08:34] <SimonKLB> is it possible to set up the controller and a worker on the same machine using manual provisioning?
[08:35] <SimonKLB> im just setting up a poc on a single machine right now, but id like to have the possibility to scale out to two machines later on
[08:35] <SimonKLB> using the localhost controller will bind me to a single machine right?
[08:45] <axw> SimonKLB: you can deploy applications to the controller machine. just bootstrap, then switch to the controller model ("juju switch controller"), and then deploy the app with "--to 0" to place the application unit on machine 0
[08:55] <SimonKLB> axw: ah of course :) thanks
[09:09] <SimonKLB> there seem to be some problems installing a snap-based charm in lxd, the mount unit fails
[09:09] <SimonKLB> is this a known limitation?
[09:10] <SimonKLB> May 30 09:07:02 juju-222327-0-lxd-0 mount[10844]: fusermount: mount failed: Operation not permitted
[09:49] <rogpeppe> axw: if you're still around, i need a second review of https://github.com/juju/juju/pull/7407
[09:50] <rogpeppe> jam: ^
[09:51] <jam> SimonKLB: snaps and lxd have some known caveats regardless of juju and charms
[09:51] <jam> SimonKLB: I don't remember exactly, it may be that you can install user-space mounting tools and get it to work
[09:51] <jam> SimonKLB: https://stgraber.org/2016/12/07/running-snaps-in-lxd-containers/
[09:51] <jam> SimonKLB: seems you need 'squashfuse' ?
[09:54] <SimonKLB> jam: https://bugs.launchpad.net/snappy/+bug/1611078
[09:54] <mup> Bug #1611078: Support snaps inside of lxd containers <landscape> <lxd> <nova-lxd> <verification-failed-xenial> <Snappy:Fix Released by stgraber> <apparmor (Ubuntu):Fix Released by tyhicks> <linux (Ubuntu):Fix Released by jjohansen> <lxd (Ubuntu):Fix Released by stgraber> <apparmor (Ubuntu
[09:54] <mup> Xenial):Fix Released by tyhicks> <linux (Ubuntu Xenial):Fix Released by jjohansen> <lxd (Ubuntu Xenial):Fix Committed> <apparmor (Ubuntu Yakkety):Fix Released
[09:54] <mup> by tyhicks> <linux (Ubuntu Yakkety):Fix Released by jjohansen> <lxd (Ubuntu Yakkety):Fix Released by stgraber> <https://launchpad.net/bugs/1611078>
[09:54] <SimonKLB> it got fixed but then it hit a new (?) issue
[09:55] <SimonKLB> this guy seems to have hit the exact same thing as i have https://bugs.launchpad.net/snappy/+bug/1611078/comments/29
[09:56] <jam> SimonKLB: is that after installing squashfuse?
[10:00] <SimonKLB> jam: nope :) that fixed it
[10:00] <SimonKLB> probably need to add that to the snap layer
[10:00] <jam> SimonKLB: you only need squashfuse inside lxd
[10:00] <jam> so snaps on VM/baremetal don't need it
[10:01] <SimonKLB> jam: right, so what would be the correct way to fix it? determine where the charm is deployed and install squashfuse if it's in lxd?
[10:02] <jam> SimonKLB: it's an unfortunate leaky abstraction, and I don't have a great answer for it. charms generally shouldn't know where they are installed, but maybe occasionally they have to
[10:02] <SimonKLB> jam: agreed
[10:02] <jam> *juju* shouldn't care that a charm uses snaps, etc
[10:03] <jam> SimonKLB: if anything, I would tend to say that 'snapd' should know its in a container and depend on squashfuse
[10:04] <SimonKLB> jam: yea, could this perhaps be fixed using profiles?
[10:04] <jam> SimonKLB: or maybe, since the default ubuntu images for lxd come with snapd installed, they should have squashfuse installed as well
[10:04] <SimonKLB> jam: i think that would be the most straight forward fix
[10:05] <SimonKLB> since snap is pretty useless in lxd, the lxd image should have squashfuse installed
[10:05] <jam> (caveat that there is a push to have 1 'pristine' Ubuntu build, that then gets used for MAAS/LXD, etc)
[10:05] <SimonKLB> pretty useless without squashfuse that is
[10:05] <jam> SimonKLB: yeah, I agree that snaps in lxd without squashfuse just don't work
[10:05] <jam> SimonKLB: I think it is "known", I'm not sure who/where they should be working on it.
[10:05] <SimonKLB> jam: any idea where to propose this addition to the lxd images?
[10:05] <SimonKLB> ah
[10:06] <jam> SimonKLB: so, file a bug, tell me about it, and I'll raise it to some people
[10:06] <SimonKLB> is there a repo or something like that for the images?
[10:06] <SimonKLB> or where should it be filed
[10:06] <jam> SimonKLB: if anything I would raise it against Snappy in Ubuntu
[10:06] <SimonKLB> okok!
[10:06] <jam> https://bugs.launchpad.net/ubuntu/+source/snapd
[10:07] <SimonKLB> snappy or snapd? :D
[10:07] <jam> they can always add other projects to it
[10:07] <SimonKLB> right
[10:07] <jam> well 'snapcraft'/ building snaps doesn't need it, I think
[10:07] <jam> I don't know snappy vs snapd very well
[10:07] <jam> snapcraft being the user tools, and something being the snap store, etc.
[10:08] <SimonKLB> me neither :) ill report it in snapd and someone that knows better can re-label it perhaps
[10:11] <SimonKLB> jam: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1694411
[10:11] <mup> Bug #1694411: Add squashfuse to the Ubuntu LXD images <snapd (Ubuntu):New> <https://launchpad.net/bugs/1694411>
[10:11] <jam> SimonKLB: thanks, will add some people
[10:11] <SimonKLB> great!
[11:13] <SimonKLB> got everything installed for a basic kubernetes install on manual provisioned lxd containers except etcd
[11:13] <SimonKLB> ERROR cannot add application "etcd": cannot deploy to machine 0/lxd/6: adding storage to lxd container not supported
[11:13] <SimonKLB> :(
[11:14] <SimonKLB> is this a WIP?
[11:35] <SimonKLB> turns out this only applies when you use the --to flag when deploying a charm
[11:35] <SimonKLB> https://github.com/juju/juju/blob/635d98d8cb34e0124f1baf1aa3baec2e28511a64/state/unit.go#L1496
[12:22] <jam> SimonKLB: axw is in Perth, Australia, who's been working on storage, but AIUI we don't support custom storage on the LXD provider yet. I'm not sure whether the bundle is trying to supply something custom that we don't support, or whether we just have a bug where we're trying storage when it isn't requested
[12:24] <SimonKLB> jam: looking at that line i linked to it seems that the storage is only validated when you deploy using the --to flag to do a specific machine assignment
[12:24] <SimonKLB> wasn't a problem deploying etcd without --to
[12:25] <SimonKLB> should probably validate it a bit less strictly though, since persistent storage isn't a requirement of etcd, just an option
[13:24] <rogpeppe> jam: ping
[13:45] <rogpeppe> jam: i've replied to your concerns voiced in https://github.com/juju/juju/pull/7407 - i'd be interested to have a chat about it if you have a moment at some point
[13:46] <jam> rogpeppe: hey, I'm mostly at the end of my day, can we do something more when you come online tomorrow? Should be middle of my day
[13:46] <rogpeppe> jam: i'd really love to be able to get the next PR up for review today - it's been too long already
[13:46] <rogpeppe> jam: but i can't propose it until this one lands
[13:47] <rogpeppe> jam: but if your day's finished, then not much we can do i guess
[13:50] <jam> rogpeppe: so i think we can talk more about whether we want a dns cache, but that doesn't have to block this code landing
[13:50] <jam> I don't think you've made anything worse, as long as the next PR wouldn't be unhinged if we removed it all
[13:51] <rogpeppe> jam: ok, cool. if you could LGTM it with that caveat, that would be great, thanks
[13:51] <jam> done
[13:52] <jam> rogpeppe: found an interesting performance bug
[13:52] <wpk> yay, another dumb Windows-only unit test error fixed!
[13:52] <jam> rogpeppe: txn.Resume ends up being quadratic on the number of txns to resume at least when they are all on the same doc
[13:52] <rogpeppe> jam: ha, i'm not too surprised
[13:53] <jam> interestingly, cpuprofile says we spend 5.5s/10s in token.id() which parses the token string into a ObjectId
[13:53] <jam> and those numbers are also quadratic
[13:54] <jam> for 10 entries, we make 316 calls, for 400 entries we make 402,601 calls
[13:54] <jam> for 800 entries, 1,605,201 calls
[13:55] <jam> that's a lot of short-lived strings and hex parsing
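The call counts quoted above grow quadratically; as a quick sanity check, the closed form below is just a curve fit to the three reported data points (not derived from the actual mgo/txn code):

```go
package main

import "fmt"

// calls reproduces the token.id() call counts reported above:
// 10 entries -> 316 calls, 400 -> 402,601, 800 -> 1,605,201.
// The fit f(n) = (5n^2 + 13n + 2) / 2 matches all three points,
// confirming the growth is quadratic in the number of txns.
func calls(n int) int {
	return (5*n*n + 13*n + 2) / 2
}

func main() {
	for _, n := range []int{10, 400, 800} {
		fmt.Printf("%d entries: %d calls\n", n, calls(n))
	}
}
```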
[13:55] <wpk> Q: if you'd see DialOpts{Total: 0, RetryDelay: 0, Min: 0} - what's your intuition about how many times it could be retried?
[13:58] <jam> wpk: either we'd say "that's a zero value, so replace it with the default", or 1 attempt
[13:59] <wpk> jam: that's passed to retry.Regular used in retry.StartWithCancel
[13:59] <wpk> (directly)
[13:59] <wpk> on Linux that means try once, we failed, oops - Total exceeded, return
[13:59] <wpk> on Windows we can try to connect 3 or even 4 times during those 0 seconds.
[14:01] <jam> wpk: often on windows time resolution is actually like 15ms
[14:01] <jam> its a very old 'clock()' resolution thing
[14:01] <jam> I believe you have to explicitly request high-perf clocks on Windows and many don't
[14:01] <wpk> jam: I'll just add a workaround
[14:02] <jam> either that or something like if you call it it *globally* changes the clock down to 1ms resolution (affects other programs), something like that
[14:02] <wpk> if delay==0 && total == 0 -> delay = 1
[14:02] <jam> wpk: delay = 1 or total = 1?
[14:02] <wpk> delay = 1
[14:02] <wpk> after first try it'll notice that now+delay > total and return
[14:03] <rogpeppe> wpk: that's a really good question
[14:03] <rogpeppe> wpk: i think the current behaviour is wrong
[14:04] <wpk> rogpeppe: and what would be the correct behaviour?
[14:04] <wpk> rogpeppe: I think that 'try once' is OK in this case
[14:04] <wpk> At least it's intuitive for me
[14:04] <rogpeppe> wpk: i'm not sure. we'd need to decide how long to try for.
[14:04] <rogpeppe> wpk: does a zero deadline mean an infinite deadline?
[14:05] <wpk> for me a zero means no deadline in this case
[14:05] <wpk> 'try once'.
[14:10] <rogpeppe> wpk: so zero deadline means "no deadline" and a zero retry-delay means "no retries" ?
[14:13] <wpk> zero deadline means 'try once', zero retry-delay means 'no delay between retries'
[14:13] <wpk> total 1, retry-delay 0 means try as many times as possible in 1 second
[14:14] <wpk> that's consistent with gopkg.in/retry nomenclature
[14:16] <wpk> https://github.com/juju/juju/pull/7417
[14:20] <wpk> This fixes TestDialAPIMultipleError for me
[14:53] <rogpeppe> wpk: i'd like to make the deadline a proper hard deadline on the whole dial - at the moment it's not
[14:55] <rogpeppe> wpk: are you aware that the delay timing is in nanoseconds? a delay of 1 really won't make it wait much longer... :)
[15:09] <wpk> rogpeppe: are you sure?
[15:09] <rogpeppe> wpk: sure about what?
[15:09] <wpk> rogpeppe: that it's in nanoseconds?
[15:09] <wpk> rogpeppe: because it works
[15:09] <wpk> oh, and it makes a difference
[15:09] <rogpeppe> wpk: see https://golang.org/pkg/time/#Dur
[15:10] <wpk> because N + 1 is still > N
[15:10] <rogpeppe> wpk: see https://golang.org/pkg/time/#Duration
[15:10] <wpk> so it's enough
[15:11] <rogpeppe> wpk: maybe the retry code is wrong. i'm not sure it should retry if the timeout is zero
[15:13] <wpk> it's start=now, end=now+Total, start.add(delay); if start.after(end) (a sharp comparison) return
[15:14] <wpk> that'd work if that were a 'not before' comparison
[15:17] <wpk> oh, that's your package :)
[15:18] <wpk> if !end.after(start) should work here
[15:59] <rogpeppe> wpk: agreed
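The difference between the two termination checks can be sketched with a frozen clock standing in for a coarse timer (this is a stand-in loop, not the actual gopkg.in/retry code):

```go
package main

import (
	"fmt"
	"time"
)

// attempts simulates the retry loop discussed above with a frozen
// clock, mimicking a coarse timer where successive time.Now calls
// return the same instant. With fixed=false it uses the sharp
// start.After(end) check; with fixed=true it uses !end.After(start).
func attempts(total, delay time.Duration, fixed bool) int {
	now := time.Unix(0, 0) // the clock never advances
	start, end := now, now.Add(total)
	for n := 1; ; n++ {
		// one (failed) dial attempt happens here
		start = start.Add(delay)
		if fixed && !end.After(start) {
			return n
		}
		if !fixed && start.After(end) {
			return n
		}
		if n >= 10 {
			return n // guard: the sharp check never fires while start == end
		}
	}
}

func main() {
	// Sharp check with Total=0, Delay=0: start == end forever, so the
	// loop keeps retrying (capped at 10 here).
	fmt.Println("sharp check:", attempts(0, 0, false))
	// !end.After(start) stops after exactly one attempt.
	fmt.Println("fixed check:", attempts(0, 0, true))
	// The 1ns-delay workaround also makes the sharp check stop.
	fmt.Println("sharp check, 1ns delay:", attempts(0, 1, false))
}
```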
[18:23] <stokachu> rogpeppe: does https://github.com/juju/juju/commit/874fbd53dd898c325edc36ec37d0518f03bfd987 fix that issue i was seeing with the certificate error trying to connect to jaas?
[18:31] <rogpeppe> stokachu: not quite. the next PR to follow it will though.
[18:32] <rogpeppe> stokachu: sorry, it's taken a while to get it reviewed
[18:38] <stokachu> Cool np!
[21:01] <thumper> mornign
[21:01] <thumper> ^T
[21:15] <babbageclunk> wallyworld_: ping?
[21:15] <babbageclunk> morning thumper
[21:15] <thumper> babbageclunk: morning
[21:17] <babbageclunk> thumper: I finally got log forwarding working - certificates are fiddly and opaque, although that's probably just unfamiliarity.
[21:17]  * thumper nods
[21:17] <thumper> coolio
[21:17] <wpk> 23:01 <@thumper> mornign
[21:17] <thumper> wpk: hey
[21:17] <thumper> wpk: why are you still up?
[21:18] <babbageclunk> thumper: I still need to fix something about the last-seen tracking, but otherwise it seems to be working.
[21:18] <thumper> babbageclunk: sweet
[21:18] <babbageclunk> wpk: timezones, ha
[21:18] <wpk> thumper: bugs won't fix themselves ;)
[21:18] <wpk> babbageclunk: I had a proposal that we all should just abandon timezones and switch to UTC
[21:18] <wpk> babbageclunk: that'd make scheduling much easier
[21:19] <babbageclunk> wpk: good call - gets rid of the blight that is daylight savings too.