/srv/irclogs.ubuntu.com/2017/05/30/#juju-dev.txt

wallyworld_axw: hey, with your race fix PR - would it be simpler just to have each test start/stop workers explicitly rather than mess with setup/teardown00:55
axwwallyworld_: TBH I don't think we really need to duplicate the tests for each permutation. I'd rather fix the dependencies in the long term, and do the easy fix for now01:02
wallyworld_ok, maybe add a TODO01:03
axwwallyworld_: done01:13
wallyworld_ta01:13
axwwallyworld_: did I miss anything in standup? I've got PRs up for GCE and OpenStack to use volume attachment AZs in StartInstance now, will move on to PrecheckInstance if there's nothing release-related I should be looking at02:01
wallyworld_axw: we're waiting on a new CI test run with resource limits fixed so we can see where we stand. So right now, nothing that warrants attention (but that might change) We want to release Thursday. I saw the PRs, will look soon02:03
axwwallyworld_: okey dokey02:03
=== thumper is now known as thumper-afk
* axw screams into a pillow03:04
axwci failures driving me crazy03:04
veebersaxw: which failure is giving you grief?03:04
axwveebers: mostly grant and windows03:05
axwveebers: and mostly mongo stuff on windows03:05
axwif not only03:05
veebersah right :-\03:06
veebersaxw: I might get a chance today to look at why grant is so flaky, see how it might be fixed03:07
axwveebers: that would be wonderful, thanks. I know you're busy though03:07
axwI intend to see what we can do to either cut mongo out of windows tests, or see if they can be made more robust03:08
axwbut features03:08
=== thumper-afk is now known as thumper
wallyworld_babbageclunk: have you looked at assess_log_forward.py ?04:44
wallyworld_that Ci test should have deets on how to set stuff up04:45
babbageclunkwallyworld_: ooh, no - I'm trying the charm at the moment to see how that does it. Looking at the test now.04:45
babbageclunkwallyworld_: thanks04:46
=== frankban is now known as frankban|afk
rogpeppewpk: ping06:29
wpkpong06:29
rogpeppewpk: i just took a look at https://github.com/juju/juju/pull/741406:29
rogpeppewpk: i think that returning the original error is fine, but there should definitely not be a warning there06:30
wpkrogpeppe: Ian merged it already, I wanted for you to take a look at it as that's your change06:30
wpkWhy no warning?06:30
rogpeppewpk: because that situation happens all the time and it's normal06:30
rogpeppewpk: we don't want people to see warnings all the time - it makes them worried :)06:31
wpkrogpeppe: the warning is issued only if both methods fail06:31
wpkrogpeppe: (and the connection fails)06:31
rogpeppewpk: hmm, i guess it's not that common to get a cert error then another error. i still don't think it's worth a warning though.06:33
rogpeppewpk: i think i'd put it at info level06:34
wpkrogpeppe: but the connection fails06:34
rogpeppewpk: not necessarily06:34
rogpeppewpk: there can be many concurrent dial instances06:34
rogpeppewpk: and if it's the only reason it fails, we'll display the returned error anyway06:34
wpkbut the returned error is half of the story06:35
rogpeppewpk: yeah, but i can't see that mattering much unless you're trying to debug, and in that case you can see Info or Debug level messages easily06:36
rogpeppewpk: i'm wary of unnecessary warnings06:37
rogpeppewpk: oh yes, and also line 705 already logs the returned error, so we'll be logging the same error twice06:37
rogpeppewpk: i'd suggest changing line 746 to: logger.Debugf("failed to connect to websocket with public cert after private cert failed: %v", rootCAErr)06:39
rogpeppewpk: i'm particularly wary of extra warning messages because they're a common source of bug reports06:44
wpkrogpeppe: logger.Debugf("Failed dialing websocket using fallback public CA - %q", rootCAErr)07:46
wpkrogpeppe: looks OK?07:46
rogpeppewpk: usually we prefix errors with a colon07:47
rogpeppewpk: and i think "failed to dial" reads a bit better than "failed dialing"07:48
rogpeppewpk: i'm never sure whether log messages should start with a capital letter or not...07:48
wpklogger.Debugf("Failed to dial websocket using fallback public CA: %v", rootCAErr)07:49
rogpeppewpk: SGTM, but i'd probably use lower case "failed" as 90% of debug messages start with lower case07:50
rogpeppewpk: thanks07:51
wpkrogpeppe: https://github.com/juju/juju/pull/741607:55
rogpeppewpk: LGTM07:56
SimonKLBis it possible to set up the controller and a worker on the same machine using manual provisioning?08:34
SimonKLBim just setting up a poc on a single machine right now, but id like to have the possibility to scale out to two machines later on08:35
SimonKLBusing the localhost controller will bind me to a single machine right?08:35
axwSimonKLB: you can deploy applications to the controller machine. just bootstrap, then switch to the controller model ("juju switch controller"), and then deploy the app with "--to 0" to place the application unit on machine 008:45
SimonKLBaxw: ah of course :) thanks08:55
SimonKLBthere seem to be some problems installing a snap-based charm in lxd, the mount unit fails09:09
SimonKLBis this a known limitation?09:09
SimonKLBMay 30 09:07:02 juju-222327-0-lxd-0 mount[10844]: fusermount: mount failed: Operation not permitted09:10
rogpeppeaxw: if you're still around, i need a second review of https://github.com/juju/juju/pull/740709:49
rogpeppejam: ^09:50
jamSimonKLB: snaps and lxd have some known caveats regardless of juju and charms09:51
jamSimonKLB: I don't remember exactly, it may be that you can install user-space mounting tools and get it to work09:51
jamSimonKLB: https://stgraber.org/2016/12/07/running-snaps-in-lxd-containers/09:51
jamSimonKLB: seems you need 'squashfuse' ?09:51
SimonKLBjam: https://bugs.launchpad.net/snappy/+bug/161107809:54
mupBug #1611078: Support snaps inside of lxd containers <landscape> <lxd> <nova-lxd> <verification-failed-xenial> <Snappy:Fix Released by stgraber> <apparmor (Ubuntu):Fix Released by tyhicks> <linux (Ubuntu):Fix Released by jjohansen> <lxd (Ubuntu):Fix Released by stgraber> <apparmor (Ubuntu Xenial):Fix Released by tyhicks> <linux (Ubuntu Xenial):Fix Released by jjohansen> <lxd (Ubuntu Xenial):Fix Committed> <apparmor (Ubuntu Yakkety):Fix Released by tyhicks> <linux (Ubuntu Yakkety):Fix Released by jjohansen> <lxd (Ubuntu Yakkety):Fix Released by stgraber> <https://launchpad.net/bugs/1611078>09:54
SimonKLBit got fixed but then it hit a new (?) issue09:54
SimonKLBthis guy seems to have hit the exact same thing as i have https://bugs.launchpad.net/snappy/+bug/1611078/comments/2909:55
jamSimonKLB: is that after installing squashfuse?09:56
SimonKLBjam: nope :) that fixed it10:00
SimonKLBprobably need to add that to the snap layer10:00
jamSimonKLB: you only need squashfuse inside lxd10:00
jamso snaps on VM/baremetal don't need it10:00
SimonKLBjam: right, so what would be the correct way to fix it? determine where the charm is deployed and install squashfuse if it's in lxd?10:01
jamSimonKLB: its an unfortunate leaky abstraction, and I don't have a great answer for it. charms generally shouldn't know where they are installed, but maybe occasionally they have to10:02
SimonKLBjam: agreed10:02
jam*juju* shouldn't care that a charm uses snaps, etc10:02
jamSimonKLB: if anything, I would tend to say that 'snapd' should know its in a container and depend on squashfuse10:03
SimonKLBjam: yea, could this perhaps be fixed using profiles?10:04
jamSimonKLB: or maybe, since the default ubuntu images that come for lxd have snapd installed, they should have squashfuse installed as well10:04
SimonKLBjam: i think that would be the most straight forward fix10:04
SimonKLBsince snap is pretty useless in lxd, the lxd image should have squashfuse installed10:05
jam(caveat that there is a push to have 1 'pristine' Ubuntu build, that then gets used for MAAS/LXD, etc)10:05
SimonKLBpretty useless without squashfuse that is10:05
jamSimonKLB: yeah, I agree that snaps in lxd without squashfuse just don't work10:05
jamSimonKLB: I think it is "known", I'm not sure who/where they should be working on it.10:05
SimonKLBjam: any idea where to propose this addition to the lxd images?10:05
SimonKLBah10:05
jamSimonKLB: so, file a bug, tell me about it, and I'll raise it to some people10:06
SimonKLBis there a repo or something like that for the images?10:06
SimonKLBor where should it be filed10:06
jamSimonKLB: if anything I would raise it against Snappy in Ubuntu10:06
SimonKLBokok!10:06
jamhttps://bugs.launchpad.net/ubuntu/+source/snapd10:06
SimonKLBsnappy or snapd? :D10:07
jamthey can always add other projects to it10:07
SimonKLBright10:07
jamwell 'snapcraft'/ building snaps doesn't need it, I think10:07
jamI don't know snappy vs snapd very well10:07
jamsnapcraft being the user tools, and something being the snap store, etc.10:07
SimonKLBme neither :) ill report it in snapd and someone that knows better can re-label it perhaps10:08
SimonKLBjam: https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/169441110:11
mupBug #1694411: Add squashfuse to the Ubuntu LXD images <snapd (Ubuntu):New> <https://launchpad.net/bugs/1694411>10:11
jamSimonKLB: thanks, will add some people10:11
SimonKLBgreat!10:11
SimonKLBgot everything installed for a basic kubernetes install on manual provisioned lxd containers except etcd11:13
SimonKLBERROR cannot add application "etcd": cannot deploy to machine 0/lxd/6: adding storage to lxd container not supported11:13
SimonKLB:(11:13
SimonKLBis this a WIP?11:14
SimonKLBturns out this only applies when you use the --to flag when deploying a charm11:35
SimonKLBhttps://github.com/juju/juju/blob/635d98d8cb34e0124f1baf1aa3baec2e28511a64/state/unit.go#L149611:35
jamSimonKLB: axw, who is in Perth, Australia, has been working on storage, but AIUI we don't support custom storage on LXD provider yet. I'm not sure whether the bundle is trying to supply something custom that we don't support, or whether we just have a bug where we're trying storage when it isn't requested12:22
SimonKLBjam: looking at that line i linked to it seems that the storage is only validated when you deploy using the --to flag to do a specific machine assignment12:24
SimonKLBwasn't a problem deploying etcd without --to12:24
SimonKLBshould probably validate it a bit less strictly though, since persistent storage isn't a requirement of etcd, just an option12:25
rogpeppejam: ping13:24
rogpeppejam: i've replied to your concerns voiced in https://github.com/juju/juju/pull/7407 - i'd be interested to have a chat about it if you have a moment at some point13:45
jamrogpeppe: hey, I'm mostly at the end of my day, can we do something more when you come online tomorrow? Should be middle of my day13:46
rogpeppejam: i'd really love to be able to get the next PR up for review today - it's been too long already13:46
rogpeppejam: but i can't propose it until this one lands13:46
rogpeppejam: but if your day's finished, then not much we can do i guess13:47
jamrogpeppe: so i think we can talk more about whether we want a dns cache, but that doesn't have to block this code landing13:50
jamI don't think you've made anything worse, as long as the next PR wouldn't be unhinged if we removed it all13:50
rogpeppejam: ok, cool. if you could LGTM it with that caveat, that would be great, thanks13:51
jamdone13:51
jamrogpeppe: found an interesting performance bug13:52
wpkyay, another dumb Windows-only unit test error fixed!13:52
jamrogpeppe: txn.Resume ends up being quadratic on the number of txns to resume at least when they are all on the same doc13:52
rogpeppejam: ha, i'm not too surprised13:52
jaminterestingly, cpuprofile says we spend 5.5s/10s in token.id() which parses the token string into a ObjectId13:53
jamand those numbers are also quadratic13:53
jamfor 10 entries, we make 316 calls, for 400 entries we make 402,601 calls13:54
jamfor 800 entries, 1,605,201 calls13:54
jamthat's a lot of short-lived strings and hex parsing13:55
wpkQ: if you'd see DialOpts{Total: 0, RetryDelay: 0, Min: 0} - what's your intuition about how many times could it be retried ?13:55
jamwpk: either we'd say "that's a zero value, so replace it with the default", or 1 attempt13:58
wpkjam: that's passed to retry.Regular used in retry.StartWithCancel13:59
wpk(directly)13:59
wpkon Linux that means try once, we failed, oops - Total exceeded, return13:59
wpkon Windows we can try to connect 3 or even 4 times during those 0 seconds.13:59
jamwpk: often on windows time resolution is actually like 15ms14:01
jamits a very old 'clock()' resolution thing14:01
jamI believe you have to explicitly request high-perf clocks on Windows and many don't14:01
wpkjam: I'll just add a workaround14:01
jameither that or something like if you call it it *globally* changes the clock down to 1ms resolution (affects other programs), something like that14:02
wpkif delay==0 && total == 0 -> delay = 114:02
jamwpk: delay = 1 or total = 1?14:02
wpkdelay = 114:02
wpkafter first try it'll notice that now+delay > total and return14:02
rogpeppewpk: that's a really good question14:03
rogpeppewpk: i think the current behaviour is wrong14:03
wpkrogpeppe: and what would be the correct behaviour?14:04
wpkrogpeppe: I think that 'try once' is OK in this case14:04
wpkAt least it's intuitive for me14:04
rogpeppewpk: i'm not sure. we'd need to decide how long to try for.14:04
rogpeppewpk: does a zero deadline mean an infinite deadline?14:04
wpkfor me a zero means no deadline in this case14:05
wpk'try once'.14:05
rogpeppewpk: so zero deadline means "no deadline" and a zero retry-delay means "no retries" ?14:10
wpkzero deadline means 'try once', zero retry-delay means 'no delay between retries'14:13
wpktotal 1, retry-delay 0 means try as many times as possible in 1 second14:13
wpkthat's consistent with gopkg.in/retry nomenclature14:14
wpkhttps://github.com/juju/juju/pull/741714:16
wpkThis fixes TestDialAPIMultipleError for me14:20
rogpeppewpk: i'd like to make the deadline a proper hard deadline on the whole dial - at the moment it's not14:53
rogpeppewpk: are you aware that the delay timing is in nanoseconds? a delay of 1 really won't make it wait much longer... :)14:55
wpkrogpeppe: are you sure?15:09
rogpeppewpk: sure about what?15:09
wpkrogpeppe: that it's in nanoseconds?15:09
wpkrogpeppe: because it works15:09
wpkoh, and it makes a difference15:09
rogpeppewpk: see https://golang.org/pkg/time/#Dur15:09
wpkbecause N + 1 is still > N15:10
rogpeppewpk: see https://golang.org/pkg/time/#Duration15:10
wpkso it's enough15:10
rogpeppewpk: maybe the retry code is wrong. i'm not sure it should retry if the timeout is zero15:11
wpkit's start=now, end=now+Total, start.add(delay); if start.after(end) (a sharp comparison) return15:13
wpkthat'd work should that be 'not_before'15:14
wpkoh, that's your package :)15:17
wpkif !end.after(start) should work here15:18
rogpeppewpk: agreed15:59
stokachurogpeppe: does https://github.com/juju/juju/commit/874fbd53dd898c325edc36ec37d0518f03bfd987 fix that issue i was seeing with the certificate error trying to connect to jaas?18:23
rogpeppestokachu: not quite. the next PR to follow it will though.18:31
rogpeppestokachu: sorry, it's taken a while to get it reviewed18:32
stokachuCool np!18:38
thumpermornign21:01
thumper^T21:01
babbageclunkwallyworld_: ping?21:15
babbageclunkmorning thumper21:15
thumperbabbageclunk: morning21:15
babbageclunkthumper: I finally got log forwarding working - certificates are fiddly and opaque, although that's probably just unfamiliarity.21:17
* thumper nods21:17
thumpercoolio21:17
wpk23:01 <@thumper> mornign21:17
thumperwpk: hey21:17
thumperwpk: why are you still up?21:17
babbageclunkthumper: I still need to fix something about the last-seen tracking, but otherwise it seems to be working.21:18
thumperbabbageclunk: sweet21:18
babbageclunkwpk: timezones, ha21:18
wpkthumper: bugs won't fix themselves ;)21:18
wpkbabbageclunk: I had a proposal that we all should just abandon timezones and switch to UTC21:18
wpkbabbageclunk: that'd make scheduling much easier21:18
babbageclunkwpk: good call - gets rid of the blight that is daylight savings too.21:19
