[00:44] <thumper> dog walk, then addressing review comments...
[00:44]  * thumper afk for a bit
[01:21] <redir> mañana juju-dev
[01:41] <wallyworld> menn0: a small one, fixes 2 blockers, when you have a moment http://reviews.vapour.ws/r/5276/
[01:43] <menn0> wallyworld: looking
[01:45] <menn0> wallyworld: ship it
[01:45] <wallyworld> menn0: ta
[01:45] <anastasiamac> menn0: axw beat u to it too :D
[02:02] <menn0> wallyworld: I think I've figured out what's going on with https://bugs.launchpad.net/juju-core/+bug/1604514
[02:02] <mup> Bug #1604514: Race in github.com/joyent/gosdc/localservices/cloudapi <blocker> <ci> <joyent-provider> <race-condition> <regression> <juju-core:In Progress by menno.smits> <https://launchpad.net/bugs/1604514>
[02:02] <menn0> it's certainly not a new issue
[02:02] <menn0> and I really don't think it should be a blocker
[02:02] <wallyworld> yeah, i'd be surprised if it were
[02:03] <menn0> I think the problem is that the joyent provider destroys machines in parallel
[02:03] <wallyworld> it's not a regression
[02:03] <wallyworld> i'm surprised it was marked as such
[02:03] <menn0> but the joyent API test double isn't safe to access concurrently
[02:03] <wallyworld> sounds plausible
[02:04] <menn0> the correct place to fix it is in the test double but that's not our code
[02:04] <wallyworld> yep, i think we can unmark as a blocker and figure out what to do from there
[02:05] <wallyworld> we may need to pull in that external code, as I doubt we will get it fixed upstream
[02:05] <menn0> wallyworld: ok, i'll update the ticket so it's no longer blocking
[02:06] <menn0> wallyworld: and then I'll poke it some more to see if I can figure out a fix
[02:06] <menn0> wallyworld: I can /occasionally/ reproduce the race if I use dave's stress script
[02:07] <wallyworld> maybe there's a workaround in the non-test code, but would be better to fix upstream i guess
[02:13] <stokachu> menn0: im still seeing https://bugs.launchpad.net/juju-core/+bug/1604644
[02:13] <mup> Bug #1604644: juju2beta12: E11000 duplicate key error collection: juju.txns.stash <blocker> <conjure> <mongodb> <juju-core:Triaged> <https://launchpad.net/bugs/1604644>
[02:13] <stokachu> just fyi
[02:13] <menn0> stokachu: that's the issue xtian was looking at
[02:14] <stokachu> menn0: this one was https://bugs.launchpad.net/bugs/1593828
[02:14] <mup> Bug #1593828: cannot assign unit E11000 duplicate key error collection: juju.txns.stash <ci> <conjure> <deploy> <intermittent-failure> <oil> <oil-2.0> <juju-core:Fix Released by 2-xtian> <https://launchpad.net/bugs/1593828>
[02:14] <stokachu> and it was marked fixed
[02:15] <menn0> stokachu: they're the same issue (dup)
[02:16] <menn0> stokachu: which version of Juju are you using? I think it was only fixed very recently (not sure exactly when though)
[02:16] <stokachu> menn0: correct, i opened a new issue as the previous one was marked fix released
[02:17] <stokachu> Bug #1604644: juju2beta12: E11000 duplicate key error collection: juju.txns.stash
[02:17] <mup> Bug #1604644: juju2beta12: E11000 duplicate key error collection: juju.txns.stash <blocker> <conjure> <mongodb> <juju-core:Triaged> <https://launchpad.net/bugs/1604644>
[02:17] <stokachu> juju beta 12
[02:17]  * menn0 checks when the fix went in
[02:18] <stokachu> beta12 lol
[02:19] <thumper> menn0: perhaps the patch approach didn't work?
[02:20] <mup> Bug #1589471 changed: Mongo cannot resume transaction <canonical-bootstack> <juju-core:Invalid> <https://launchpad.net/bugs/1589471>
[02:20] <menn0> stokachu, thumper: nope the fix didn't make beta12
[02:20] <mup> Bug #1604641 opened: restore-backup fails when attempting to 'replay oplog' again <backup-restore> <blocker> <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1604641>
[02:20] <mup> Bug #1604644 opened: juju2beta12: E11000 duplicate key error collection: juju.txns.stash <blocker> <conjure> <mongodb> <juju-core:Triaged> <https://launchpad.net/bugs/1604644>
[02:20] <stokachu> lmao
[02:20] <stokachu> it got marked fix released
[02:20] <menn0> the fix is here: 99cb2d1c148f5ed1d246bf4fe44064363226e12e (Jul 15)
[02:20] <menn0> it's not in beta12
[02:21] <stokachu> menn0: can you update that bug with your findings
[02:21] <stokachu> 1604644
[02:21] <menn0> stokachu: will do. shall I also mark it as a dup of the other one?
[02:22] <stokachu> menn0: the other bug is already marked fix released
[02:22] <thumper> menn0: I thought the patch was applied to the top of our mgo branch
[02:22] <stokachu> i think we should leave that one alone and work off this new one
[02:22] <thumper> menn0: check with mgz and sinzui
[02:22] <thumper> and balloons I suppose
[02:22] <stokachu> sinzui: ^ they are saying it didn't make it into beta12
[02:22] <menn0> thumper: no, it looks like we copied in a fixed version of mgo's upsert code into juju
[02:24] <menn0> ah crap... chrome crash
[02:26] <menn0> thumper: oh never mind, you're right we patch over mgo in the build
[02:26] <menn0> thumper: at any rate, that change isn't in beta12
[02:26] <thumper> Which was the release we just did? It should be in that
[02:27] <thumper> if stokachu is building from source, he won't have it
[02:27] <stokachu> this is from the ppa
[02:27] <thumper> hmm...
[02:27] <thumper> that should have the fix
[02:27] <menn0> thumper: the latest tag in git is "juju-2.0-beta12"
[02:27] <menn0> the fix is 99cb2d1c148f5ed1d246bf4fe44064363226e12e
[02:27] <menn0> when I check out the tag, the fix isn't there
[02:28] <menn0> when I check out master, it is
[02:28] <thumper> ugh
[02:28] <stokachu> im guessing a one-off was done for this issue?
[02:29] <menn0> perhaps there was some miscommunication about when the release was ok to cut
[02:29] <lazyPower> booo, that was in the release notes too
[02:29] <lazyPower> mgo package update that retries upserts that fail with ‘duplicate key error’ lp1593828
[02:29] <lazyPower> speaking of o/ hey core team :
[02:29] <lazyPower> :)
[02:31] <stokachu> so we're sure that fix isn't in beta 12 from the ppa?
[02:32] <stokachu> because it's also uploaded to the archive :)
[02:32] <menn0> stokachu: pretty sure. the release tag is there in git, and the fix isn't part of that release.
[02:33] <stokachu> menn0: ok, if you don't mind updating that bug so i can follow up with balloons/mgz in the morning
[02:33] <menn0> awesome :(
[02:33] <menn0> stokachu: will do. i'll poke xtian too so he's in the loop
[02:33] <stokachu> menn0: ok cool thanks a bunch
[02:33] <sinzui> menn0: thumper: The patch was added to the juju tree, and the script that makes the tar file applies it. That is the hack that mgz put together
[02:33] <thumper> sinzui: looks like something didn't take though
[02:34] <thumper> o/ lazyPower
[02:34] <axw> wallyworld: I'm planning to add this to the cloud package: http://paste.ubuntu.com/20129296/. one of those will be present in a new environs.OpenParams struct. sound sane?
[02:34] <menn0> sinzui: it looks like the rev didn't make the cut of the release.
[02:34] <axw> sound/look
[02:34] <sinzui> Yeah, that is a bad way to deliver a fix
[02:34] <wallyworld> axw: looking
[02:34] <menn0> sinzui: what to do now?
[02:35] <axw> wallyworld: open to suggestions for a better name also
[02:35] <sinzui> menn0: I have no idea. I think godeps should define the repo and rev. Other wise we continue to maintain the patch in the tree and apply it each time the tar file is made
[02:36] <menn0> sinzui: the immediate problem is that beta12 didn't include the fix at all. the revision with the fix was committed *after* beta12 was cut.
[02:37] <menn0> sinzui: the mgo patch doesn't exist in beta12
[02:37] <stokachu> we should amend the release notes and set the fix for beta13
[02:37] <wallyworld> axw: i don't think that struct belongs in cloud - it's an amalgamation of things used for an environ, so it really belongs in environs
[02:37] <wallyworld> and then it could be called CloudSpec
[02:37] <wallyworld> or something
[02:37] <sinzui> menn0: I cannot help at this point. The release was started, we aborted, and tried again.
[02:38] <stokachu> so can the mgo fix be pulled into godeps now?
[02:38] <stokachu> what was the reason for applying the fix during the tarball build
[02:39] <thumper> anastasiamac: while you are doing virt-type fixes, core/description/constraints_test.go:25, the virt type needs to be added to the allArgs func
[02:39] <axw> wallyworld: yeah ok, that's what I had to start with. issue is how to then make State implement EnvironConfigGetter. I think I'll have to define a type outside of the state package that adapts it to that interface
[02:40] <wallyworld> stokachu: the reason was we don't control upstream and we could not get the fix landed for us to use
[02:40] <wallyworld> so we were forced to adopt a solution where the change was patched in as part of the build
[02:40] <stokachu> wallyworld: so the fix was pulled in before the PR was accepted?
[02:40] <thumper> stokachu: more complicated than that...
[02:40] <stokachu> ah ok
[02:41] <stokachu> just trying to understand
[02:41] <thumper> related to golang, imports and the mgo release process
[02:41] <wallyworld> stokachu: no, the upstream PR was accepted but it landed in an unstable v2 branch which we could not use directly
[02:41] <wallyworld> it's all a mess
[02:41] <stokachu> ok, but the status in master is it is now part of the tree?
[02:41] <wallyworld> no :-(
[02:41] <thumper> kinda
[02:41] <wallyworld> not that i am aware of
[02:41] <thumper> but poorly
[02:42] <thumper> wallyworld: it is in a patch...
[02:42] <thumper> in the tree
[02:42] <thumper> ick
[02:42] <stokachu> how do you guys do it, this makes my head hurt
[02:42] <wallyworld> sure, but unless you apply the patch manually....
[02:42] <thumper> yes
[02:42] <wallyworld> mine too
[02:42] <thumper> stokachu: many years of built up resistance
[02:42] <lazyPower> stokachu - i'm going to say copious amounts of beer and a callousness to shenanigans
[02:42]  * thumper goes to put the kettle on
[02:42] <stokachu> thumper: lol, you guys will lead the zombie resistance
[02:43] <stokachu> lazyPower: :D
[02:43] <thumper> I for one await the zombie apocalypse
[02:43] <menn0> stokachu: this is partially due to the way Go handles imports
[02:43] <lazyPower> I never trusted go imports
[02:43] <stokachu> ok so not as simple as placing the git rev in the Godeps stuff
[02:43] <menn0> stokachu: b/c mgo is imported all over the place across multiple repos, if we want to fork it, we would have to change *everything*
[02:44] <menn0> stokachu: no, b/c the fix got accepted into mgo's unstable branch, but isn't yet in the stable branch
[02:44] <stokachu> ah i see
[02:44] <stokachu> gotcha, i didnt realize it was never in the stable branch
[02:44] <menn0> stokachu: we *could* use the unstable branch, but that pulls in a bunch of other stuff we don't really want
[02:44] <stokachu> understood
[02:46] <lazyPower> doesn't that mean its going to wind up landing in stable and pull in that bunch of other stuff eventually?
[02:46]  * lazyPower is showing his ineptitude at golang
[02:52] <natefinch> the whole "unstable" thing in the import path just seems like a bad idea.  Either make it a new version or don't.  If you want to mark it as unstable, do so in the readme.
[02:57]  * thumper notes that we are still using charm.v6-unstable
[02:57] <natefinch> yep
[02:57] <natefinch> dumb idea
[02:58] <natefinch> instead of having to go change all the imports once when we move to a new version, we have to do it twice.  Assuming we ever actually bother to rename it from unstable.
[04:08] <menn0> wallyworld: fix for the joyent race: http://reviews.vapour.ws/r/5277/
[04:08] <wallyworld> looking
[04:09] <wallyworld> menn0: lgtm
[04:10] <menn0> wallyworld: thanks
[04:11] <menn0> wallyworld: backport to 1.25 as well?
[04:11] <wallyworld> menn0: um, it's such a simple fix, why not
[04:11] <menn0> wallyworld: ok
[04:11] <wallyworld> might get a bless more often than twice a year
[04:22] <thumper> menn0: re dump-model review, and See Also, I copied it from elsewhere...
[04:23] <thumper> I did think it was strange
[04:25]  * thumper looks for a good example
[04:38] <thumper> menn0: updated http://reviews.vapour.ws/r/5265/
[04:38] <thumper> added a few drive by fixes for "See also:" formatting, made consistent with juju switch
[04:38] <thumper> menn0: made the apiserver side a bulk call, client api still single
[04:38] <thumper> added client side formatting
[04:44] <menn0> thumper: looking. I wasn't really suggesting that you had to do the bulk API work given the rest of the facade but great that you did anyway :)
[04:45] <menn0> thumper: "See also" is already quite inconsistent between commands
[04:45] <menn0> sigh
[04:45] <thumper> I thought that switch was most likely to be right
[04:45] <thumper> I looked at quite a few
[04:45] <menn0> thumper: oh hang on... you fixed them all!
[04:45] <thumper> and picked the resulting style
[04:45] <menn0> thumper: nice
[04:46] <thumper> well, in that package
[04:49] <menn0> thumper: ship it!
[04:50] <thumper> menn0: ta
[05:53] <babbageclunk> menn0: D'oh.
[08:00] <frobware> dooferlad: ping
[08:01] <dooferlad> frobware: hi
[08:01] <frobware> dooferlad: any chance we can meet now?
[08:02] <dooferlad> frobware: need 5 mins
[08:02] <frobware> dooferlad: I have a plumber arriving in ~30 mins which is likely to clash with our 1:1
[08:02] <frobware> dooferlad: ok
[08:03] <babbageclunk> menn0: ping?
[08:04] <menn0> babbageclunk: howdy... i'm in the tech board call atm. talk after?
[08:04] <babbageclunk> menn0: cool cool
[09:13] <menn0> babbageclunk: hey, done now
[09:14] <babbageclunk> menn0: Sorry, in standup.
[09:14] <wallyworld> fwereade: in prep for some work, i have needed to move model config get/set/unset off the client facade to their own new facade, so essentially a copy of stuff and a bit of boilerplate for backwards compat until the gui is updated. would love a review at your leisure so i can land when CI is unblocked http://reviews.vapour.ws/r/5279/
[09:15] <wallyworld> i also removed jujuconnsuite tests \o/
[09:17] <menn0> babbageclunk: np, I'll hang around for a bit.
[09:23] <fwereade> wallyworld, ack, thanks
[09:24] <fwereade> wallyworld, I presume: s/have needed to/gladly took the opportunity to/ ;p
[09:34] <wallyworld> fwereade: that too, but also a need
[09:34] <wallyworld> :)
[09:44] <babbageclunk> menn0: Sorry, rambling discussion about godeps and vendoring. Nearly done.
[09:45] <menn0> babbageclunk: sounds like a repeat of the tech board meeting :)
[09:45] <babbageclunk> menn0: quite
[09:45] <babbageclunk> menn0: ok, done
[09:46] <babbageclunk> menn0: did you manage to reproduce stokachu's problem?
[09:46] <babbageclunk> menn0: sorry, I mean, has anyone had a chance to reproduce it?
[09:47] <menn0> babbageclunk: nope. I gave stokachu a rebuild of 2.0-beta12 which definitely had the patch applied.
[09:47] <babbageclunk> menn0: And does he see it with that?
[09:47] <menn0> babbageclunk: he was going to try it out and see if the problem happened with that as he's able to make it happen fairly reliably.
[09:48] <menn0> babbageclunk: I don't know. He never got back to me. I think it was quite late for him at the time.
[09:48] <menn0> babbageclunk: he was going to report back on the ticket but hasn't yet.
[09:48] <babbageclunk> menn0: Ok, cool - I had a go with a checkout of the right commit and the patch applied, but no luck yet - not sure which bundle to use.
[09:48] <menn0> babbageclunk: my goal was to establish whether or not the patch made it into the release or not
[09:49] <menn0> (and whether or not it worked)
[09:49] <menn0> babbageclunk: I imagine we'll hear back from stokachu when he starts work again
[09:49] <babbageclunk> menn0: Also not sure whether my laptop has enough oomph to cause the contention needed.
[09:50] <menn0> babbageclunk: it seems like there was some process failure when the official beta12 was produced so I'm not ruling out that the patch didn't actually make it into the release
[09:50] <babbageclunk> menn0: Yeah, it was a bit crazy.
[09:52] <menn0> babbageclunk: stokachu said he could make the problem happen quite often with just using add-model and destroy-model
[09:52] <menn0> I'm not sure how hard he was really pushing things
[09:53] <babbageclunk> menn0: Ok, I'll try that a few more times. The hadoop-spark-zeppelin bundle really squishes my machine. It's pretty cool.
[09:53] <menn0> babbageclunk: I guess you could try making the problem happen with a juju that's built without the patch
[09:54] <menn0> and when you have a reliable way of triggering the problem
[09:54] <menn0> rebuild with the patch and see if it goes away
[09:54] <babbageclunk> menn0: Well, I'm more concerned that the 5-retry thing just made it a bit less likely, but not really better.
[09:54] <menn0> or, you could hold off and do something else until we hear more from the QA peeps and stokachu
[09:55] <babbageclunk> menn0: I'll give it a couple more kicks and then get in touch with the US peeps.
[09:55] <menn0> you would think 5 would be enough...
[09:55] <babbageclunk> I would and did!
[09:55] <menn0> maybe a random short sleep between each loop would help?
[09:55] <menn0> ethernet style
[09:56] <babbageclunk> Yeah, could help - want to be sure it's happening first though.
[09:56] <menn0> for sure... need more info
[09:57] <babbageclunk> amusing - the test that was originally causing the problem in tests has been deleted.
[09:57] <babbageclunk> I mean, in our suite.
[09:58] <menn0> babbageclunk: for unrelated reasons?
[09:58] <babbageclunk> yeah, because address picking has been removed.
[10:01] <menn0> ha funny... still needs to be fixed of course
[10:01] <menn0> babbageclunk: I've got to go. I've got a literal mountain of washing to contend with.
[10:03] <babbageclunk> menn0: ok, thanks. Happy climbing!
[11:40] <mup> Bug #1604785 opened: repeatedly getting rsyslogd-2078 on node#0 /var/log/syslog <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1604785>
[11:40] <mup> Bug #1604787 opened: juju agents trying to log to 192.168.122.1:6514 (virbr0 IP) <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1604787>
[11:51] <frankban> cherylj: hey morning, could you please merge trivial http://reviews.vapour.ws/r/5280/ ?
[11:55] <cherylj> frankban: sure
[11:55] <frankban> cherylj: ty!
[12:10] <mup> Bug #1598272 changed: LogStreamIntSuite.TestFullRequest sometimes fails <ci> <intermittent-failure> <test-failure> <juju-core:Fix Released by fwereade> <https://launchpad.net/bugs/1598272>
[12:20] <stokachu> babbageclunk: retrying to reproduce this morning, was late last night for me
[12:22] <perrito666> morning all
[12:44] <frankban> cherylj: how do I check what failed at /var/lib/jenkins/workspace/github-merge-juju/artifacts/trusty-err.log ?
[12:44] <frankban> cherylj: sorry, at http://juju-ci.vapour.ws:8080/job/github-merge-juju/8475/console
[12:45] <cherylj> frankban: I've pinged mgz to take a look.  I think it's a merge job failure
[12:45] <frankban> cherylj: ty
[12:46] <mup> Bug # changed: 1603596, 1604176, 1604408, 1604561, 1604644
[13:31] <perrito666> wallyworld: go to sleep?
[13:31] <wallyworld> ok, about that time
[13:31] <mup> Bug #1604817 opened: Race in github.com/juju/juju/featuretests <blocker> <ci> <intermittent-failure> <race-condition> <regression> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1604817>
[13:33] <natefinch> wallyworld: if you have 2 minutes, I'd love it if you could just read and maybe quickly respond to a couple review comments I have: http://reviews.vapour.ws/r/5238/
[13:34] <wallyworld> ok
[13:36] <wallyworld> natefinch: done
[13:36] <natefinch> wallyworld: thanks
[13:37] <natefinch> hey, we're down to just two blocking tests in master, awesome
[13:37] <natefinch> (sorta)
[13:48] <babbageclunk> fwereade: ping?
[13:48] <frankban> cherylj: should I try merge again?
[13:49] <fwereade> babbageclunk, pong
[13:50] <fwereade> babbageclunk, what can I do for you?
[13:50] <babbageclunk> fwereade: I'm trying to understand the relationship between container and machine provisioners.
[13:50] <babbageclunk> fwereade: Sorry, environ provisioners
[13:51] <babbageclunk> fwereade: (looking at bug 1585878)
[13:51] <mup> Bug #1585878: Removing a container does not remove the underlying MAAS device representing the container unless the host is also removed. <2.0> <hours> <maas-provider> <network> <reliability> <juju-core:Triaged by 2-xtian> <https://launchpad.net/bugs/1585878>
[13:52] <fwereade> babbageclunk, at the heart of a provisioner there is a simple idea: watch the machines and StartInstance/StopInstance in response
[13:52] <fwereade> babbageclunk, I think that's called ProvisionerTask?
[13:53] <babbageclunk> fwereade: yup, and it's the same between the environ and container provisioners.
[13:53] <babbageclunk> fwereade: but with different brokers, I think.
[13:53] <fwereade> babbageclunk, yeah, exactly
[13:54] <babbageclunk> fwereade: So it looks like the environ provisioner explicitly excludes containers from the things it watches
[13:54] <fwereade> babbageclunk, ultimately we *should* be able to just start each of them with a broker, an api facade, and knowledge of what set of machines they should watch
[13:54] <fwereade> babbageclunk, yeah, that should be encapsulated in what it watches
[13:55] <fwereade> babbageclunk, I expect they actually make different watch calls or something, though? :(
[13:55] <babbageclunk> fwereade: Ok - in the maas case I need to tell maas the container's gone away after getting rid of it.
[13:55] <cherylj> frankban: yes, looks like one PR went through, so something's working...
[13:55] <cherylj> frankban: so I'd retry
[13:56] <frankban> cherylj: retrying
[13:56] <fwereade> babbageclunk, ha, ok, let me think
[13:56] <babbageclunk> fwereade: Until I started saying this, I thought that the container broker didn't talk to the environ, but now I think that's wrong - it needs to tell it when it starts, right?
[13:57] <fwereade> babbageclunk, I am confident that a container provisioner should *not* talk to the environ directly, because that would entail distributing environ creds to every machine
[13:58] <babbageclunk> fwereade: Ok, that makes sense. So in order to clean up the maas record of the container, the environ provisioner would also need to watch containers, right?
[13:59] <babbageclunk> I should trace the start path so I can see where maas gets told about the container.
[14:00] <fwereade> babbageclunk, I would be most inclined to have a separate instance-cleanup worker on the controllers, fed by provisioners leaving messages (directly or indirectly) on instance destruction
[14:00] <babbageclunk> fwereade: leaving messages how? Files?
[14:00] <fwereade> babbageclunk, db docs?
[14:00] <babbageclunk> fwereade: oh, duh
[14:00] <fwereade> babbageclunk, ;p
[14:01] <fwereade> babbageclunk, there is a general problem with having all-the-necessary-stuff set up before a provisioner sees a machine to try to deploy
[14:02] <babbageclunk> fwereade: ok, so the container provisioner creates a record indicating that it killed a container, and a controller-based worker watches those and does the environ-level cleanup.
[14:02] <fwereade> babbageclunk, trying to set up networks etc in the provisioner is wilful SRP violation -- but I think we do have a PrepareContainerNetworking (or something) call that the provisioner task makes
[14:03] <babbageclunk> fwereade: ok
[14:03] <fwereade> babbageclunk, yeah, I would be grateful if we would cast it in terms that applied to machines and containers both, and didn't distinguish between them except in the worker that actually handles them
[14:03] <babbageclunk> fwereade: so that's in the environ provisioner - it talks to the provider.
[14:04] <fwereade> babbageclunk, I don't think any provisioner should be responsible for doing this work, I think it should be a separate instance-cleanup worker
[14:04] <babbageclunk> fwereade: (oops, that was in response to the prev)
[14:04] <fwereade> babbageclunk, yuck :)
[14:05] <babbageclunk> fwereade: Ok - so the provisioner task would just say "this instance needs cleaning"...
[14:05] <babbageclunk> fwereade: and then the new worker would see all of them and just do stuff for the containers for now.
[14:05] <fwereade> babbageclunk, so, really, *that* should be happening in an instance-preparer worker, which creates tokens watched by the appropriate provisioner, which can then only try to start instances that have all their deps ready
[14:06] <fwereade> babbageclunk, yeah, I think so
[14:06] <fwereade> (I refer, above, to the instance-prep work currently done by the provisioner, not to what you just said, which I agree with)
[14:06] <babbageclunk> fwereade: right, I was just going to check that.
[14:07] <babbageclunk> fwereade: sounds good, thanks!
[14:07] <fwereade> babbageclunk, note that there's an environ-tracker manifold available on environ manager machines already, it gets you a shared environ that's updated in the background, you don't need to dirty your worker up with those concerns
[14:08] <babbageclunk> fwereade: ok, I'll make sure to base my worker on that.
[14:10] <fwereade> babbageclunk, and it is called "environ-tracker", set up in agent/model.Manifolds
[14:10] <fwereade> babbageclunk, just use it as a dependency and focus the worker around the watch/response loop
[14:11] <fwereade> babbageclunk, you should then be able to just assume the environ's always up to date, and if you do race with a credential change or something it's nbd, just an error, fail out and let the mechanism bring you up to try again soon
[14:12] <babbageclunk> fwereade: ok
[14:12] <fwereade> babbageclunk, ...or, hmm. be careful about those errors, actually
[14:12] <fwereade> babbageclunk, we want those to be observable, I think
[14:13] <babbageclunk> fwereade: observable?
[14:13] <fwereade> babbageclunk, and we probably shouldn't mark the machine that used them dead until they've succeeded
[14:13] <fwereade> babbageclunk, report the error in status, I think, nothing should be competing for it by the time this is running
[14:14] <babbageclunk> fwereade: oh, gotcha
[14:14] <fwereade> babbageclunk, so, /sigh, this implies moving responsibility for set-machine-dead off the provisioner and onto the instance-cleaner
[14:14] <fwereade> babbageclunk, which is clearly architecturally sane, but a bit of a hassle
[14:15] <fwereade> babbageclunk, otherwise we'll be leaking resources and not having any entity against which to report the errors
[14:15] <fwereade> babbageclunk, sorry again: not set-machine-dead, but remove-machine
[14:15] <fwereade> babbageclunk, the machine agent sets itself dead to signal to the rest of the system that its resources should be cleaned up
[14:16] <babbageclunk> fwereade: ok
[14:16] <frankban> cherylj: looked at the tests and I've found that the failure is real for my branch. I have a fix already, but how do I check the tests that actually failed from the CI logs?
[14:16] <fwereade> babbageclunk, but we shouldn't *remove* it until both the instance (by the provisioner) and other associated resources (by instance-cleaner, maybe more in future) have been cleaned up
[14:17] <babbageclunk> fwereade: Yeah, that makes sense.
[14:17] <cherylj> frankban: go to your merge job:  http://juju-ci.vapour.ws:8080/job/github-merge-juju/8475/
[14:17] <cherylj> frankban: and click trusty-err.log
[14:17] <fwereade> babbageclunk, ...and ofc *that* now implies that we *will* potentially have workers competing for status writes
[14:17] <cherylj> frankban: argh, looks like it failed to run again
[14:17] <cherylj> balloons, sinzui - can you take a look:  http://juju-ci.vapour.ws:8080/job/github-merge-juju/8475/artifact/artifacts/trusty-err.log
[14:18] <fwereade> babbageclunk, so... it's not trivial, I'm afraid, but I can't think of any other things that'll interfere
[14:18] <frankban> cherylj: I am running 8477 now
[14:18] <frankban> cherylj: let's see if it will fail to run again, it should fail with 2 tests failures in theory
[14:19] <fwereade> babbageclunk, do you know what dimitern has been doing lately? I think he had semi-detailed plans for addressing the corresponding setup concerns but I'm not sure he started implementing them
[14:19] <cherylj> frankban: ah, well, when it completes you can view that trusty-err.log file for the test output
[14:19] <babbageclunk> fwereade: sorry, no - he's been away for the last week and a bit, not sure what he's working on at the moment.
[14:19] <frankban> cherylj: yes thank you, good to know
[14:19] <fwereade> babbageclunk, no worries
[14:20] <fwereade> babbageclunk, do sync up with him when he returns
[14:20] <babbageclunk> fwereade: hang on, why multiple workers competing to write status?
[14:20] <fwereade> babbageclunk, if the provisioner StopInstance fails that should report; if the instance-cleaner Whatever fails, that should also report
[14:21] <fwereade> babbageclunk, it might also be useful to look at what storageprovisioner has done
[14:21] <fwereade> babbageclunk, with the internal queue for delaying operations if they can't be done yet
[14:22] <babbageclunk> fwereade: Oh I see, so if both of them fail then an error in the provisioner might be hidden by one in the cleanup worker.
[14:22] <fwereade> babbageclunk, yeah, exactly
[14:23] <babbageclunk> fwereade: ok, that's heaps to go on with - I'll probably need more pointers once I'm a bit further along.
[14:23] <babbageclunk> fwereade: Thanks!
[14:23] <fwereade> babbageclunk, (nothing would be *lost*, because status-history, but it would be good to do better)
[14:23] <fwereade> babbageclunk, np
[14:23] <fwereade> babbageclunk, always a pleasure
[14:25] <mup> Bug #1604644 opened: juju2beta12: E11000 duplicate key error collection: juju.txns.stash <conjure> <mongodb> <juju-core:New> <https://launchpad.net/bugs/1604644>
[14:43] <sinzui> sorry cherylj: got pulled into a meeting. Go is writing errors to stdout. You can see the failure in http://juju-ci.vapour.ws:8080/job/github-merge-juju/8475/artifact/artifacts/trusty-out.log
[14:43] <sinzui> cherylj: I think we can create a unified log so that the order of events and where to look are in a single place
[15:02] <rick_h_> katco: ping for standup
[15:02] <katco> rick_h_: oops omw
[16:16] <mup> Bug #1604883 opened: add us-west1 to gce regions in clouds via update-clouds <juju-core:New> <https://launchpad.net/bugs/1604883>
[16:25] <mup> Bug #1604883 changed: add us-west1 to gce regions in clouds via update-clouds <juju-core:New> <https://launchpad.net/bugs/1604883>
[16:34] <mup> Bug #1604883 opened: add us-west1 to gce regions in clouds via update-clouds <juju-core:New> <https://launchpad.net/bugs/1604883>
[16:41] <perrito666> anyone have spare time to review this http://reviews.vapour.ws/r/5282/diff/# ? it's not a very short one; it's part of a set of changes to support ControllerUser permissions. I am happy to discuss what this particular patch does if anyone goes for the review
[16:45] <natefinch> rick_h_: I have a ship it for the interactive bootstrap stuff... should I push it through or wait for master to be unblocked?
[16:45] <rick_h_> natefinch: wait for master please atm
[16:46] <rick_h_> natefinch: just mark it as blocked on the card on master
[16:51] <natefinch> rick_h_: will do
[16:52] <rick_h_> natefinch: got a sec?
[16:52] <natefinch> rick_h_: yep
[16:52] <rick_h_> natefinch: https://hangouts.google.com/hangouts/_/canonical.com/rick?authuser=1
[16:58]  * rick_h_ goes for lunchables then
[17:37] <natefinch> god I love small unit tests
[17:38] <natefinch> I love that it tells me "you have an error in this 20 lines of code"
[18:49] <rick_h_> jcastro: marcoceppi arosales heads up docs switch is done and the jujucharms.com site is all 2.0 all the time https://jujucharms.com/docs
[18:49] <marcoceppi> rick_h_: yesssss
[18:50] <mup> Bug #1604915 opened: juju status message: "resolver loop error" <oil> <oil-2.0> <juju-core:New> <https://launchpad.net/bugs/1604915>
[18:50] <rick_h_> marcoceppi: will send an email shortly, want to check on status of b12 in xenial update to go along with it
[19:05] <mup> Bug #1604919 opened: juju-status stuck in pending on win2012hvr2 deployment <oil> <oil-2.0> <juju-core:New> <https://launchpad.net/bugs/1604919>
[19:06] <natefinch> rick_h_: output for interactive commands on stdout or stderr?
[19:07] <rick_h_> natefinch: so jam had some thoughts and added notes to the interactive spec on that
[19:07] <natefinch> rick_h_: ok, I was wondering who added that.  it was incomplete so I was hoping for clarification
[19:07]  * rick_h_ loads doc to double check
[19:08] <rick_h_> natefinch: ah, yea looks like he didn't finish typing
[19:08] <natefinch> rick_h_: I know the answer for non-interactive commands, but not sure if it should be different for interactive
[19:08] <natefinch> rick_h_: given that there's no real scriptable output
[19:09] <natefinch> (I mean, you can script anything, but it's not made with that in mind)
[19:09] <rick_h_> natefinch: can you ping him to clarify the rest, but the start is there as far as for interactive I think that's the idea that the questions/etc should go to stderr, but if we confirm things "successfully added X" it's stdout
[19:09] <natefinch> rick_h_: ok, yeah, I'll talk to him about it.
[19:09] <rick_h_> natefinch: ty
[19:15] <natefinch> oh man... writing this package to handle the formatting of user interactions was the best idea I ever had.
[19:16] <natefinch> ok, maybe not the best idea ever. But... it's certainly saving my ass.
[19:22] <rick_h_> natefinch: <3
[19:23] <alexisb> natefinch, so that begs the question, what was your best idea ever
[19:24] <natefinch> alexisb: that's like the best set up for a joke I've ever had....
[19:25] <natefinch> alexisb: marrying my wife, obviously.  Only slightly behind would be the idea to switch from Mechanical Engineering to Computer Science in school.  Dodged a bullet there.
[19:26] <natefinch> I have a couple mech-e friends... they basically design screws all day long
[19:26] <alexisb> natefinch, yep
[19:27] <alexisb> I got to my first statics class, followed by drafting and went "o hell no!"
[19:28] <alexisb> I also spent some time at Racor systems (a Parker affiliate) and watched their engineers at a ProE screen all day
[19:28] <alexisb> no thank you
[19:28] <natefinch> yuuup
[19:29] <natefinch> I realized fairly early that I found physics fascinating in the abstract, but the reality of actually figuring shit out was mind-bogglingly boring.
[19:29] <alexisb> at racor, the actual factory was AWESOME, which is where I started with control systems
[19:30] <perrito666> wanna do some boring mech things? try calculating elevators for a living
[19:30] <perrito666> most revealing class I ever had
[19:30] <natefinch> friend of mine makes maglev elevators for things like aircraft carriers.... still pretty boring work in the small
[19:31] <natefinch> he'd probably say the same for my job, though ;)
[19:31] <natefinch> "So... you twiddled with carriage returns all day?"
[19:32] <perrito666> lol "so, found that missing statement?"
[19:32] <mup> Bug #1604931 opened: juju2beta12: unable to destroy controller properly on localhost <conjure> <juju-core:New> <https://launchpad.net/bugs/1604931>
[19:32] <perrito666> but I was talking about building elevators, I actually had to spend a semester calculating those
[19:38]  * rick_h_ runs to get the boy from school
[19:53] <arosales> rick_h_: great to hear thanks for the fyi
[20:30] <natefinch> are we supposed to be able to add-cloud for providers like ec2?
[20:31] <natefinch> it doesn't look like we're stopping people from doing that
[20:35] <mup> Bug #1604955 opened: TestUpdateStatusTicker can fail with timeout <ci> <intermittent-failure> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1604955>
[20:44] <mup> Bug #1604959 opened: Failed restore juju.txns.stash 'collection already exists' <backup-restore> <ci> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1604959>
[20:48] <natefinch> rick_h_: It's a little weird that clicking on "stable" in jujucharms brings you to the 2.0 docs, which say at the top, in red "Juju 2.0 is currently in beta which is updated frequently. We don’t recommend using it for production deployments."
[20:58] <natefinch> the problem with MAAS API URL is that it looks like I'm shouting, but really it's just TLA proliferation
[21:11] <mup> Bug #1604961 opened: TestWaitSSHRefreshAddresses can fail on windows <ci> <intermittent-failure> <test-failure> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1604961>
[21:11] <mup> Bug #1604965 opened: machine stays in pending state even though node has been marked failed deployment in MAAS <oil> <oil-2.0> <juju-core:New> <https://launchpad.net/bugs/1604965>
[21:17] <redir> you sure are
[21:19] <redir> ignore ^^
[21:25] <menn0> perrito666: ping
[21:26] <perrito666> menn0: ping
[21:27] <perrito666> sorry pong
[21:27] <perrito666> menn0: did I break something?
[21:28] <menn0> perrito666: no, I just wanted to apologise for not getting to your ACLs PR yesterday... the day got swallowed up by critical bugs
[21:28] <menn0> perrito666: I was about to review now and see that it's been discarded?
[21:29] <perrito666> menn0: oh, no worries, you would not have been able to review it yesterday anyway, it had a dependency on an unmerged branch and the diff was incomprehensible. I dropped it, merged the pending branch and re-proposed
[21:30] <perrito666> RB is really misleading, I thought that adding the dependency in the Depends On field would fix the diff but it did nothing at all, and then it would not allow me to upload my own diff
[21:30] <perrito666> I think we should change RB for something a bit more useful, like snapchat
[21:30] <menn0> perrito666: LOL :)
[21:31] <menn0> perrito666: I've been wondering about Gerrit or Phabricator, they seem like the best alternatives
[21:31] <perrito666> I checked one of those during the sprint, and I liked it, I can't remember which one though, Phabricator I think
[21:32] <perrito666> menn0: also, the only person that knew something about our RB is no longer on this team, which makes an interesting SPOF
[21:32] <menn0> perrito666: I don't think the ops side of RB is particularly hard
[21:32] <menn0> perrito666: and I *think* the details are written down /somewhere/
[21:33] <perrito666> menn0: I fear that the certain somewhere is an email :s
[21:33] <perrito666> anyway, eric usually knew the dark secrets like how to actually make a branch depend on another
[21:33] <menn0> perrito666: phab is nice. I've used it a bit at one job. it supports enforcing a fairly strict (but customizable) development process..
[21:34] <menn0> perrito666: you do this: rbt post --disable-ssl-verification -r <review_number> --parent <parent_branch>
[21:34] <menn0> perrito666: and then check how it looks on the RB website and hit Publish
[21:35] <katco> menn0: perrito666: i've been interested in how this works out for teams: https://github.com/google/git-appraise
[21:35] <perrito666> menn0: ah, I need some non magic interaction :)
[21:35] <perrito666> menn0: if you ask me (and even if you dont) if it cant be done on the website, its broken
[21:36] <menn0> perrito666: I think you can upload arbitrary diffs to RB... but I've never done it
[21:36] <menn0> katco: looks interesting! I hadn't heard of git-appraise before
[21:36]  * menn0 reads more
[21:37] <katco> menn0: i enjoy the decentralized nature. no ops needed
[21:37] <katco> menn0: or at least i think i *would*. i've never used this
[21:37] <perrito666> menn0: well I actually tried, it seems to assume RB has something it doesn't, we might have broken that particular workflow with our magic bot
[21:39] <perrito666> katco: that looks amazing but seems to not work very nicely with github workflow (which we sort of use)
[21:39] <menn0> katco: storing the reviews in git is a nice idea. the way you add comments is a little unfriendly though. I guess the expectation is that someone will create a UI/tool for that.
[21:40] <katco> perrito666: just saw this: https://github.com/google/git-pull-request-mirror
[21:40] <katco> menn0: and just found this: https://github.com/google/git-appraise-web
[21:41] <redir> who's the resident data race expert?
[21:41] <perrito666> katco: mm, really interesting, do you know of actual users of this? I am interested in seeing how it behaves in heavily conflict-prone envs
[21:41] <perrito666> redir: we are all good at adding data races :p
[21:42] <redir> perrito666: OK who's the resident data race tortoise?
[21:42] <perrito666> redir: well, you are not in luck, its dave cheney :p
[21:42] <menn0> katco: that improves the situation somewhat! :)
[21:42] <perrito666> redir: just throw the problem to the field and we'll see how we can attack it
[21:43] <katco> perrito666: i do not. this looks fairly active? https://git-appraise-web.appspot.com/static/reviews.html#?repo=23824c029398
[21:43] <perrito666> man, was that english broken or what? ;p I am losing my linguistic skills
[21:43] <redir> I think it is pretty straightforward
[21:46] <perrito666> katco: very interesting, I really like the idea of storing these things in the repo
[21:46] <perrito666> but ill say something very shallow
[21:46] <redir> https://github.com/go-mgo/mgo/blob/v2/socket.go#L329 needs to be locked so it doesn't race with https://github.com/go-mgo/mgo/blob/v2/stats.go#L59
[21:46] <redir>  I think
[21:46] <perrito666> the UI is ugly as f***
[21:46] <redir> trouble reproducing
[21:46] <katco> perrito666: it is certainly spartan
[21:47] <katco> perrito666: personally, i would be writing an emacs plugin for this if someone hasn't already
[21:48] <katco> redir: why are stats being reset before kill has been returned? i think there's a logic bomb there
[21:49] <perrito666> I dont know what kind of spartans you know, the ones from the movie certainly look better than that UI :p
[21:49] <katco> perrito666: sorry, i intended this usage: "adj.	Simple, frugal, or austere: a Spartan diet; a spartan lifestyle."
[21:51] <perrito666> katco: I know, I intended to : " troll (/ˈtroʊl/, /ˈtrɒl/) is a person who sows discord on the Internet by starting arguments or upsetting people,"
[21:51] <katco> lol
[21:52] <perrito666> redir: while killing, imho you should be locking everything indeed, but I have not checked past these two links to know if I am speaking the truth about this particular issue
[21:53] <perrito666> katco: I do dislike the ui though, I prefer something like github without the insane one mail per comment thing
[21:55] <katco> redir: also i don't think that's the race. socketsAlive locks the mutex before doing anything: https://github.com/go-mgo/mgo/blob/v2/stats.go#L135
[21:56] <redir> mkay thanks
[21:56] <redir> perrito666: katco ^
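For context on the race redir and katco are chasing, here's a minimal Go sketch of the pattern under discussion. The names are illustrative, not mgo's actual identifiers: a package-level stats counter is decremented when a socket is torn down and read elsewhere, and both sides must hold the same mutex (as `socketsAlive` does in mgo's stats.go) or `go run -race` / `go test -race` will flag it.

```go
// A hypothetical sketch of the locking pattern discussed above;
// not mgo's real code, just the shape of the fix being considered.
package main

import (
	"fmt"
	"sync"
)

// stats mirrors the idea of mgo's connection counters.
type stats struct {
	mu           sync.Mutex
	socketsAlive int
}

// socketKilled is the writer side: called on socket teardown.
// It must take the mutex before touching the counter.
func (s *stats) socketKilled() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.socketsAlive--
}

// alive is the reader side: if it skipped s.mu.Lock(), the race
// detector would report a data race against socketKilled.
func (s *stats) alive() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.socketsAlive
}

func main() {
	s := &stats{socketsAlive: 10}
	var wg sync.WaitGroup
	// Tear down all sockets concurrently, like a provider
	// destroying machines in parallel.
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.socketKilled()
		}()
	}
	wg.Wait()
	fmt.Println(s.alive()) // prints 0
}
```

The point katco makes above is exactly this symmetry: locking only the writer (or only the reader) is not enough; every access to the shared field has to go through the same mutex.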
[21:59] <perrito666> moving to a silent neighbourhood is glorious for work
[22:05] <mup> Bug #1604988 opened: Inconsistent licence in github.com/juju/utils/series <jujuqa> <packaging> <juju-core:Triaged> <juju-core 1.25:Triaged> <juju-core (Ubuntu):New> <https://launchpad.net/bugs/1604988>
[22:43] <menn0> katco: you're convincing me that we should experiment with vendoring some more :)
[22:43] <katco> menn0: eep...
[22:43] <katco> menn0: as long as how go does vendoring is well understood, i'm happy. i am scared of diverging too much without forethought
[22:44] <menn0> katco: sure... it's not something we should do lightly. and if we do it, it should use Go's standard mechanism.
[22:44] <perrito666> menn0: re our previous talk http://reviews.vapour.ws/r/5282/diff/#
[22:45] <katco> menn0: yeah, agreed
[22:48] <menn0> perrito666: ok. I can take a look.
[22:49] <menn0> perrito666: my initial comment is that I wish this was 2 PRs: one for state and one for apiserver (but I will cope)
[22:50] <perrito666> menn0: I am sorry, I promise I tried to make it smaller
[22:51] <perrito666> menn0: it's smaller than it looks though, small changes in many files
[22:52] <katco> menn0: i think i messed up the tech board permissions. i was trying to get a link and it looked publicly accessible, so i disabled that. now i can't view it
[22:53] <menn0> katco: I'll take a look
[22:53] <katco> menn0: sorry about that
[22:53] <menn0> katco: you completely removed canonical access :) not sure how to put it back yet
[22:54] <mup> Bug #1605008 opened: juju2beta12 and maas2rc2:  juju status shows 'failed deployment' for node that was 'deployed' in maas <oil> <oil-2.0> <juju-core:New> <MAAS:New> <https://launchpad.net/bugs/1605008>
[22:54] <katco> menn0: wait what! all i did was turn off link sharing :(
[22:54] <menn0> katco: figured it out. what was it before? anyone at canonical can edit or view?
[22:55] <menn0> or comment?
[22:55] <katco> menn0: could comment i think, but it looked like external people with link could view as well
[22:56] <menn0> katco: ok, it's fixed. anyone from canonical can comment again.
[22:56] <katco> menn0: ta, sorry
[22:57] <axw> wallyworld: did I miss anything on the call? slept through my alarm supposedly, pretty sure it didn't go off though
[22:57] <axw> need to ditch this dodgy phone
[22:58] <perrito666> axw: or get a clock
[22:58] <wallyworld> axw: not a great deal, just release recap, tech board summary
[22:58] <axw> perrito666: could do that too, I'd rather have it near my head so I don't wake up my wife
[22:58] <axw> suppose I could move the clock...
[22:58] <perrito666> axw: get a deaf people clock
[22:58] <axw> wallyworld: ok, ta
[22:58] <perrito666> (not trolling, these are a thing)
[22:59] <axw> perrito666: ah, have not seen one
[22:59] <perrito666> they have a thing that you put in your pillow and it vibrates
[22:59] <perrito666> much like your phone, but less points of failure
[22:59] <axw> I guess I could just use my fitbit then. if I can find it, and my charger...
[23:00] <mup> Bug #1605008 changed: juju2beta12 and maas2rc2:  juju status shows 'failed deployment' for node that was 'deployed' in maas <oil> <oil-2.0> <juju-core:New> <MAAS:New> <https://launchpad.net/bugs/1605008>
[23:00] <axw> anyway
[23:00]  * axw stops debugging alarm replacement issues
[23:18] <alexisb> axw, thumper ping
[23:18] <axw> coming, sorry
[23:18] <thumper> coming
[23:49] <redir> axw: thanks for the protip in the review. helpful
[23:49] <axw> redir: np
[23:57] <thumper> menn0: so you are working with redir on the race?