[01:46] <davecheney> thumper: more debug is needed here
[01:46] <davecheney> 2013-08-29 01:42:25 DEBUG juju.worker.uniter.filter filter.go:289 got unit change
[01:46] <davecheney> 2013-08-29 01:42:26 INFO juju.worker.upgrader upgrader.go:138 required tools: 1.13.2-precise-amd64
[01:46] <davecheney> all the units are blocked on this line
[01:46] <davecheney> no further output
[01:46] <thumper> hmm...
[01:47] <thumper> doesn't seem too helpful
[01:47] <davecheney> this is in ap-southeast-2 as well
[01:47] <bigjools> wallyworld__: how's the coffee machine?
[01:47] <davecheney> so it shouldn't take more than a few seconds to get the tools
[01:49] <thumper> I wonder where it is blocked
[01:50] <davecheney> i'll hit it with SIGQUIT and hope stderr goes somewhere
[01:51] <axw> wallyworld__: I've just pushed some changes to my image-metadata branch that fixes the marshalling so numbers aren't floats
[01:51] <davecheney> thumper: http://paste.ubuntu.com/6038698/
[01:52] <axw> wallyworld__: it's a bit gnarly though, you might want to review my changes to environs/simplestreams
[01:52] <davecheney> thumper: right, it did the upgrade, but it didn't restart
[01:53] <thumper> hmm...
[01:54] <davecheney> hmm, maybe it worked
[01:55] <davecheney> hard to tell from the output
[01:59] <davecheney> thumper: ok, here is the issue
[01:59] <davecheney> 2013-08-29 01:59:40 INFO juju runner.go:253 worker: start "upgrader"
[01:59] <davecheney> 2013-08-29 01:59:43 DEBUG juju.worker.uniter.filter filter.go:289 got unit change
[02:00] <davecheney> 2013-08-29 01:59:43 INFO juju.worker.upgrader upgrader.go:138 required tools: 1.13.3.1-precise-amd64
[02:00] <davecheney> ^ this is the message from the upgrader _after_ it has restarted
[02:04] <davecheney> oops 2013-08-29 02:03:48 INFO juju.worker.uniter context.go:234 HOOK Shutting down without a db
[02:04] <davecheney> 2013-08-29 02:03:48 INFO juju.worker.uniter context.go:234 HOOK /var/lib/juju/agents/unit-mediawiki-0/charm/hooks/db-relation-departed: line 4: /var/lib/juju/agents/unit-mediawiki-0/charm/hooks/stop: No such file or directory
[02:04] <davecheney> nope, charm bug
[02:05] <wallyworld__> bigjools: awesome. i've just made a second one for today
[02:05] <bigjools> wallyworld__: \o/
[02:05] <wallyworld__> axw: thanks, i'll take a look
[02:06] <axw> wallyworld__: thanks. also, I started using tools.Fetch to test... but it looks like that only returns a single version? is that right?
[02:07] <axw> so I guess I'll just grab the files from storage and check their contents directly
[02:08] <wallyworld__> axw: the Fetch method can return multiple but the constraint limits it. i've pushed a branch for review which allows looser matching and hence returns multiple
[02:08] <axw> wallyworld__: ok, cool
[02:08] <wallyworld__> to generate the metadata, i think the current best option is to grab from storage
[02:08] <wallyworld__> we will/should revisit that i think
[02:24] <wallyworld__> axw: it is hacky isn't it. perhaps we can make it better by working the construct logic into the unmarshal method
[02:24] <axw> wallyworld__: moving it into which unmarshal method?
[02:25] <wallyworld__> the json unmarshaller for the collection
[02:26] <wallyworld__> so instead of shoving stuff into a "" key, we can call construct and fill out the collection directly
[02:26] <axw> wallyworld__: the problem is that the item type isn't known there
[02:27] <axw> wallyworld__: there's no way to convey context to the unmarshaller, except through a global :(
[02:27] <wallyworld__> it sort of is, since we figure out the call point
[02:27] <axw> hm
[02:27] <wallyworld__> so we can map that to type
[02:27] <axw> I don't like any of the solutions :)
[02:27] <wallyworld__> i'm being a bit hand-wavy, but i think it is possible
[02:28] <wallyworld__> i agree, they all suck
[02:28] <axw> yeah I get you
[02:28] <wallyworld__> i think Go's json unmarshalling sucks
[02:28] <wallyworld__> it can't be extended quite right
[02:29] <wallyworld__> at least we would be confining the hackery to a single unmarshal method
[02:29] <wallyworld__> so the method is a black box that "just works" but is hacky inside
[02:29] <wallyworld__> and i'd put it separately in a json.go file in the same package
[02:31] <axw> hmmmm
[02:31] <axw> I'll have a look later
[02:31] <wallyworld__> ok
[02:31] <axw> need to respond to some review comments
[02:31] <wallyworld__> np
[02:34] <axw> wallyworld__: so one way you could do it is this: have ParseCloudMetadata store itemType in a global, protected with a mutex (only one unmarshal at a time); have ItemsMap.UnmarshalJSON do the Callers check, and then use the global
[02:34] <axw> is that what you were thinking?
[02:35] <wallyworld__> axw: sort of. i was thinking there'd be a map of method name (as determined in the current code) -> type
[02:35] <wallyworld__> so no mutex needed
[02:35] <wallyworld__> just look up the map after the calling method is determined
[02:36] <axw> ah yeah ok
[02:36] <axw> so they just register up front the function name -> reflect.Type
[02:37] <wallyworld__> yep
[02:37] <wallyworld__> still hacky, but it allows it to be kept under the rug
[02:38] <wallyworld__> and isolated from the core business logic
[03:23] <axw> thumper: can you please explain this to me (comment by William)? "Never use conn.Environ if you can possibly help it. It's basically never up to date."
[03:24] <thumper> axw: sure
[03:24] <thumper> the conn.Environ is the environment based purely on the parsing of the local config
[03:24] <thumper> not the value in the bootstrap node
[03:24] <thumper> when a machine is bootstrapped
[03:24] <thumper> it initializes passwords etc
[03:24] <thumper> so they are no longer the same
[03:24] <axw> ah right
[03:24] <thumper> also, if someone calls juju set-environment
[03:24] <thumper> it doesn't modify the local
[03:25] <thumper> only the bootstrap copy
[03:25] <thumper> which then notifies all the workers
[03:25] <thumper> does that make sense?
[03:25] <axw> thumper: yes
[03:25] <thumper> cool
[03:26] <axw> I've seen the code that updates config from state
[03:26] <axw> not sure about an Environ tho
[03:27] <davecheney> trollololo, more bugs
[03:33] <davecheney> https://bugs.launchpad.net/juju-core/+bug/1218168
[03:33] <_mup_> Bug #1218168: cmd/juju: upgrade-charm does not expand tilde in filepaths <papercut> <juju-core:Triaged> <https://launchpad.net/bugs/1218168>
[04:08]  * thumper runs the tests knowing that they'll fail
[04:09]  * thumper was very surprised to see them all pass
[04:09] <thumper> was in the wrong pipe
[04:12]  * thumper smiles at the failures
[04:12] <thumper> seeing base64 encoded certs rather than yaml serialized []byte
[04:34] <davecheney> thumper: so you fixed that huge turd of output in the cloud-init-output ?
[04:42] <davecheney> bigjools: https://bugs.launchpad.net/maas/+bug/1218182
[04:42] <_mup_> Bug #1218182: No way to put a node into "maintenance mode" <MAAS:New> <https://launchpad.net/bugs/1218182>
[04:42] <davecheney> how can this be a thing
[04:43] <davecheney> surely someone has asked for this already
[04:43] <bigjools> just re-commission it
[04:43] <bigjools> but yes generally that would be useful
[04:43] <bigjools> we even have a node state for it that's not used yet
[04:52] <davecheney> bigjools: can you recommission remotely ?
[04:52] <davecheney> or does one need to get off ones ass ?
[04:53] <bigjools> davecheney: just click on the ui button to do it
[04:53] <bigjools> however
[04:53] <bigjools> it'll do the usual cycle and may not fail if nothing is getting installed
[04:53] <bigjools> we need to add commissioning tests
[04:54] <bigjools> deletion is the best way of taking it out for now
[04:58] <jam1> davecheney: ~ is expanded by your shell, and has different behaviors if you attach an argument. Try: "echo -f=~/" vs "echo -f ~/" For me the former doesn't expand, the latter does expand.
[05:00] <davecheney> jam: we hit this in japan
[05:00] <davecheney> it would be nice if there was a solution
[05:01] <jam> davecheney: use --repository ~/foo/bar
[05:01] <jam> it works
[05:01] <jam> don't use --repository=~/foo/bar it doesn't
[05:01] <davecheney> lucky(~/src/launchpad.net/juju-core) % juju upgrade-charm --repository=~/charms --switch local:mediawiki mediawiki
[05:01] <jam> bash-ism
[05:01] <davecheney> error: no repository found at "/home/dfc/src/launchpad.net/juju-core/~/charms"
[05:01] <davecheney> sure
[05:01] <davecheney> but can we expand it in the command
[05:30] <axw> thumper: did you want to do another review of my manual provisioning changes, or are you happy with it? fwereade_ has LGTM'd
[05:41] <thumper> axw: if you don't mind, I'll take a quick look
[05:41] <thumper> but probably just before the meeting
[05:42]  * thumper breaks for dinner, back later
[05:45] <axw> thumper-dinner: not at all, I will go back to simplestreams stuff for now
[05:46] <axw> jam: what is that gwacl/failing-test thing about?
[05:47] <jam> axw: sorry for the noise. bigjools has asked that I set up gwacl under our tarmac bot, and I want to test that it both "successfully lands good patches" and "successfully fails bad patches"
[05:49] <axw> jam: I was just curious, not bothered :)
[05:49] <axw> thanks
[06:23] <rogpeppe1> mornin' all
[06:28] <jam> morning rogpeppe1
[06:28] <rogpeppe> jam: hiya
[06:33] <rogpeppe> jam: any idea what might be going on here? It stopped my merge last night and i can't reproduce it locally. https://code.launchpad.net/~rogpeppe/juju-core/376-factor-out-provider-utils/+merge/182465/comments/414178
[06:33] <rogpeppe> fwereade_: ^
[06:35] <jam> rogpeppe: the only thing that comes to mind is that the test suite is "leaking" information and the test case is trying to contact a mongodb running elsewhere.
[06:35] <jam> (either a 'zombie' one that didn't get torn down properly or something else)
[06:36] <rogpeppe> jam: that is a possibility i suppose, if the port selection logic isn't working
[06:36] <rogpeppe> jam: i'm considering changing it actually
[06:37] <rogpeppe> jam: currently it makes a socket, letting the system choose the port, then closes the socket, and trusts that it will be still ok to use in a short while
[06:37] <rogpeppe> jam: we can't avoid some window, but i'm wondering if it might be better to pick a port at random, check that we can't dial it, then use that
[06:38]  * rogpeppe wishes the port name space was considerably larger
[06:38] <jam> rogpeppe: could we pick a port but open it as reopenable and only close it after something else has grabbed it?
[06:39] <jam> SOCK_REUSEADDR or whatever that param is
[06:39] <rogpeppe> jam: i don't think SOCK_REUSEADDR works like that
[06:39] <rogpeppe> jam: it only allows address reuse of unique local/remote pairs, AFAIR
[06:39] <rogpeppe> jam: i don't think it allows you to bind two listeners to the same port
[06:39] <jam> http://stackoverflow.com/questions/775638/using-so-reuseaddr-what-happens-to-previously-open-socket
[06:40] <jam> rogpeppe: probably
[06:40] <jam> I know you can bind them to the same port by doing fork magic
[06:40] <rogpeppe> jam: you mean by inheriting the fd?
[06:40] <jam> rogpeppe: right. it is how apache used to do it with their forking daemons.
[06:41] <jam> one of the subprocesses "wins" the Accept request
[06:41] <rogpeppe> jam: that's kind of a different thing - it's just sharing the already bound socket
[06:41] <jam> rogpeppe: though to be fair, I don't have high expectations that it is actually the bug you are seeing.
[06:41] <jam> rogpeppe: but I *have* been seeing a lot of zombie mongodb's this week.
[06:41] <rogpeppe> jam: i went through a rash of "cannot bind to port" problems last week
[06:43] <jam> rogpeppe: "SO_REUSEPORT... allows you to bind an arbitrary number of sockets to exactly the same source address and port as long as all prior bound sockets also had SO_REUSEPORT"
[06:43]  * rogpeppe didn't know about REUSEPORT
[06:43] <jam> but it looks like it may be a BSD flag
[06:43] <rogpeppe> jam: but i don't think that helps us
[06:44] <rogpeppe> jam: even if it was available
[06:44] <rogpeppe> jam: because mongod won't be using that flag
[06:44] <jam> rogpeppe: "Linux 3.9 added the option SO_REUSEPORT to Linux as well"
[06:44] <jam> rogpeppe: it does
[06:44] <jam> because *we* use that flag
[06:44] <jam> and then the next person can bind without the flag
[06:44] <jam> at least from what I read
[06:44] <jam> I could be wrong
[06:45] <rogpeppe> jam: i slightly doubt it. let's check, one mo
[06:45] <jam> rogpeppe: the wording is a bit funny, so no guarantees
[06:45] <jam> also, I think precise is older than kernel 3.9
[06:45] <rogpeppe> jam: the wording sounds to me as if all binders to the port must use that option
[06:45] <rogpeppe> jam: otherwise it's really quite dangerous
[06:46] <rogpeppe> jam: because i might wish to bind a server to a port, but because the previous server on that port has used REUSEPORT, it allows us anyway, and then we get two different servers randomly sharing the same network address
[06:47] <rogpeppe> jam: there's another possibility which won't rule out the failure but will make it more obvious what's happening
[06:48] <rogpeppe> jam: which is to get the API server to send back the env UUID and have the client check that
[06:48] <rogpeppe> jam: then at least we will definitively know when we're talking to an unexpected server
[06:50] <jam> rogpeppe: except we've already gotten rejected by that point haven't we? Or are you saying we send it before login?
[06:51] <rogpeppe> jam: we'd need to make it available even to non-logged-in clients though, and i'm not sure if that would be judged to be an unwanted information leak
[06:51] <jam> haven't we validated the cert by this point?
[06:51] <rogpeppe> jam: yeah, we have validated the cert, yes
[06:51] <rogpeppe> jam: and the cert is randomly generated, so actually that's a good point
[06:53] <rogpeppe> jam: and i *think* we use a secure mongo connection even in tests
[06:55] <rogpeppe> jam: no, i don't think it can be a duplicate mongo problem
[06:55] <jam> rogpeppe: I know that if you have a mongo that doesn't support TLS the test suite dies in horrible ways
[06:58] <rogpeppe> hrmph, well it's managed to merge now
[07:00] <rogpeppe> that failure does concern me though. wtf can be going on, when we've got two concurrent sessions connected to the same API port, the second of which fails logging in with exactly the same creds as the first succeeded with?
[07:05] <rogpeppe> fwereade_: i just noticed this comment of yours: "
[07:05] <rogpeppe> Never use conn.Environ if you can possibly help it. It's basically never up to
[07:05] <rogpeppe> date.
[07:05] <rogpeppe> "
[07:05] <rogpeppe> fwereade_: what does it mean for an Environ to be "up to date"?
[07:06] <rogpeppe> fwereade_: ah, you mean that it might have out of date config attrs?
[07:09] <jam> rogpeppe: he means that conn.Environ is read from local disk, but the source of truth is actually conn.State.Environ
[07:09] <jam> or whatever the actual request is
[07:10] <jam> rogpeppe: which is why for the CLI API stuff we went away from using APIConn and are trying to get away with just api.Client
[07:10] <rogpeppe> jam: yeah
[07:10] <rogpeppe> jam: (well, that's not the only reason for avoiding client use of Environ)
[07:11] <jam> rogpeppe:  this one also failed earlier today, and left a mongodb running: https://code.launchpad.net/~allenap/juju-core/makefile-stuff/+merge/181113
[07:11] <jam> a Watcher didn't err when it exited
[07:11] <jam> I don't know why
[07:11] <jam> I'm concerned we introduced some race conditions recently without realizing it.
[07:11] <jam> Or the bot is just on a VM that is having neighbor issues, which triggers these less frequent problems.
[07:14] <rogpeppe> jam: interesting, that also died in TestManageStateServesAPI
[07:14] <rogpeppe> jam: (the same place i saw the problem)
[07:15] <jam> rogpeppe: I don't quite see that in the 500 line panic, but I trust you did
[07:15] <jam> and yes, we might just have a flakey test, that when it fails might leave a mongodb running.
[07:16] <rogpeppe> jam: (to find which test is running, search for .Test in the stack trace
[07:16] <rogpeppe> )
[07:16] <rogpeppe> jam: even if we have moribund mongodb's, that shouldn't be a problem AFAICS
[07:17] <jam> rogpeppe: I'm saying the test is *causing* moribound mongodb's not that it is affected by them.
[07:17] <rogpeppe> jam: ah, right, yeah
[07:17] <jam> (test suite teardown is known to fail to tear down the mongodb it started)
[07:17] <rogpeppe> jam: it probably happens when a goroutine panics without a recover
[07:18] <rogpeppe> jam: (that happened in this case)
[07:21] <jam> rogpeppe: I thought gocheck recovered from all panics in order to catch them cleanly and report errors from test cases ?
[07:21] <jam> is this happening in a TearDownSuite or something where it isn't being caught?
[07:21] <rogpeppe> jam: it can't recover from panics in goroutines that it didn't start
[07:22] <jam> rogpeppe: ah, even if it starts the one that started it
[07:22] <rogpeppe> jam: which is true in this case - the panic is in a watcher goroutine
[07:22] <rogpeppe> jam: yes - you can't catch panics from many goroutines at once - that would be nastily asynchronous
[07:23] <rogpeppe> jam: the usual solution is to put some cleanup code outside the main executable
[07:27] <axw> wallyworld_: I came up with a better approach to the JSON problem. It's now very similar to how it was originally, but without the float problem
[07:28] <wallyworld_> great :-)
[07:28] <axw> and with no runtime callstack crap
[07:28] <wallyworld_> hooray, i'll take a look in a bit
[07:29] <rogpeppe> jam: i've found how that panic can happen, i think
[07:54] <TheMue> rogpeppe: you've once discussed https://bugs.launchpad.net/juju-core/+bug/1202163 with hazmat
[07:54] <_mup_> Bug #1202163: openstack provider should have config option to ignore invalid certs <papercut> <juju-core:Triaged by themue> <https://launchpad.net/bugs/1202163>
[07:54] <mgz> okay, nearly reboot for meeting time
[07:54] <mgz> TheMue: I'm probably the best person to ask about that bug
[07:54] <TheMue> rogpeppe: could you give me a hint where this change has to be done
[07:55] <TheMue> mgz: ok, so we'll continue after the meeting, thanks
[07:55] <TheMue> mgz: that you can reboot now ;)
[07:55] <rogpeppe> TheMue: it's probably something that needs a change in goose too
[07:56] <rogpeppe> TheMue: basically we need some way to tell goose that it should ignore unknown certs
[07:56] <rogpeppe> TheMue: within goose, it would need to set the TLS config on the https request that it makes to have InsecureSkipVerify=true
[07:57] <TheMue> rogpeppe: ah, this makes it more clear. already looked in our code as a first step, but not goose
[07:58] <rogpeppe> TheMue: cool
[08:01] <mgz> TheMue: posted a comment in the bug
[08:03] <TheMue> mgz: thx
[08:04] <thumper> davecheney: coming to the meeting?
[08:43]  * fwereade_ goes to get breakfast
[08:44] <jam> rogpeppe: you mention that you might have a solution for the TestManageStateServesAPI bug?
[08:44] <rogpeppe> jam: for the panic(no error) problem anyway, i think, yes
[08:45] <jam> fwereade_: what did you decide on: https://code.launchpad.net/~fwereade/juju-core/prepare-leave-scope/+merge/181065
[08:45] <jam> the thread went long, and sort of ended on "maybe this is or isn't correct"
[08:46] <rogpeppe> jam: i believe it's because the underlying State is being closed, which causes the state.Watcher to be stopped and return without an error
[08:46] <rogpeppe> s/state\.Watcher/watcher.Watcher/
[08:46] <jam> rogpeppe: so still just a race condition, right?
[08:46] <rogpeppe> jam: not really
[08:46] <jam> as in we expect the Watchers to be closed first
[08:46] <jam> but in this case the State got closed.
[08:47] <jam> rogpeppe: given it doesn't always fail it is clearly *some* sort of race condition :)
[08:47] <rogpeppe> jam: hmm, there's definitely some racing involved, yes
[08:48] <rogpeppe> jam: but i think it's wrong that the state watchers assume that because the underlying state has been closed they can panic
[08:48] <rogpeppe> jam: you're probably right that the watchers should probably be closed nicely before the state is closed
[08:49] <jam> rogpeppe: so I think the MustErr thing is because in production they want to expect that if they are shutting down, there is a reason for it.
[08:49] <jam> rogpeppe: But I find it very strange to panic when you *don't* have an error :)
[08:50] <jam> rogpeppe: there *might* be a case where we purposefully trigger something like that during upgrade to force the process to restart, but I'm not 100% sure how all that ties together.
[08:50] <rogpeppe> jam: the usual reason for MustErr is that if you're the code solely in control of a watcher, you *know* that it can't have been stopped by something else
[08:50] <rogpeppe> jam: so if it dies without an error there's something weird enough going on to warrant a panic
[08:50] <jam> fwereade_: thinking of upgrading. You had a comment about WatchAPIVersion (I think) being unhappy before we've gotten our env credentials set up. Is it just that you don't want that API call to return until we're ready to handle FindTools requests, or ?
[08:51] <rogpeppe> jam, fwereade_: FWIW i don't think it's a good idea to make that call block indefinitely until there's a valid environ config
[08:52] <rogpeppe> jam, fwereade_: because that could take forever and there's no way of interrupting that call once it's in progress
[08:53] <rogpeppe> fwereade_:
[08:53] <rogpeppe> jam, fwereade_: although having said that, i'm not sure i can think of a better alternative
[08:54] <jam> rogpeppe: well, avoiding panic will be a good first step towards figuring out what is going wrong. :)
[08:54] <rogpeppe> jam: indeed
[08:55] <jam> rogpeppe: because I think fwereade_ said jpds managed to get 'juju status' to run and either we still have the log-replay bug, *or* it was still failing on something.
[08:55] <jam> I was very surprised to see the err was a "schema.error_" which sounds like it is coming from somewhere else.
[08:56] <rogpeppe> jam: that panic should have been fixed by the recent ServerError change
[08:56] <rogpeppe> jam: that error comes when a schema doesn't match
[08:59] <fwereade_> jam, it's Tools in particular
[08:59] <fwereade_> jam, rogpeppe: I am relatively unbothered by blocking forever there
[08:59] <fwereade_> jam, rogpeppe: and using the environ in the first place is somewhat suboptimal
[09:00] <rogpeppe> fwereade_: yeah, i was just wondering about that - we're only using for agent-version, right?
[09:00] <rogpeppe> s/using/using it/
[09:00] <fwereade_> jam, rogpeppe: so at some point it will wither away regardless, in favour of a cache of simplestreams data
[09:00] <jam> fwereade_: as in, WatchAPIVersion runs, finds stuff, and then we call FindTools but that blocks until we have creds?
[09:00] <fwereade_> rogpeppe, nope
[09:00] <fwereade_> rogpeppe, it's FindExactTools that's the problem
[09:01] <fwereade_> rogpeppe, I'm 80% sure that we manage to extract agent-version, for the watcher, without creating an environ
[09:01] <fwereade_> jam, yeah
[09:01] <rogpeppe> fwereade_: oh, of course
[09:02] <rogpeppe> fwereade_: i think i agree that we can block forever - it doesn't mean we can't still respond the Upgrader.Kill, as long as we run the Tools call in a new goroutine
[09:02] <rogpeppe> s/respond the/respond to/
[09:03] <jam> fwereade_: well one option would be to add one more api which is "State.DesiredAgentVersion()" rather than having it always be Tools with the URL to get the new tools.
[09:03] <jam> fwereade_: because then it will go quiescent without having to search the provider.
[09:03] <fwereade_> rogpeppe, WaitForEnviron requires a done chan or something already iirc
[09:03] <jam> fwereade_: the change that worker/upgrader/upgrader.go did
[09:03] <jam> is that on startup for every agent it must go scan the bucket
[09:03] <fwereade_> jam, what's the utility there?
[09:03] <rogpeppe> fwereade_: but we're talking server side here, right?
[09:04] <jam> I believe we did it that way because it seemed silly to do 2 api requests
[09:04] <jam> but given that 1 is expensive (must read the provider buckets)
[09:04] <jam> and one is cheap (just from the db)
[09:04] <jam> maybe it is worth splitting up
[09:04] <fwereade_> jam, istm that it just complicates things without delivering any benefits?
[09:05] <jam> fwereade_: it very specifically moves us closer to how we used to do it, and would avoid this bug
[09:05] <jam> fwereade_: as the Upgrader would say "what version do you want me to be at?" and the response would match currentTools.version and we would go back to waiting on a change
[09:05] <jam> fwereade_: and we don't need provider creds to get that answer.
[09:06] <jam> rogpeppe: we got Unauthed access again: https://code.launchpad.net/~thumper/juju-core/container-address/+merge/182271
[09:07] <jam> mgz: it has already been approved, but if you could look over https://code.launchpad.net/~thumper/juju-core/container-address/+merge/182271 to see how it fits with your Addresses stuff.
[09:07] <rogpeppe> jam: that is *so weird*
[09:07] <fwereade_> jam, ah, ok... maybe that leads to a nicer server-side implementation too?
[09:07] <jam> rogpeppe: I agree, it is clearly a test that is an issue
[09:07] <jam> fwereade_: it splits the bits nicely
[09:07] <jam> fwereade_: and is probably really easy to implement
[09:08] <fwereade_> jam, ok, consider it blessed, thanks :)
[09:08] <jam> fwereade_: do we have a bug about failing during startup in 1.13.x ?
[09:11] <mgz> jam: is this the one I read through the other day, looking...
[09:11] <jam> fwereade_: also, this probably only happens when you are using a private bucket that requires creds.
[09:11] <jam> mgz: if you read thumper's MP you didn't respond to it
[09:12] <jam> fwereade_: it also means that we don't scan the provider every time *anything* in environconfig changes
[09:13] <jam> since we don't have smarts about just noticing api version changes yet
[09:16] <rogpeppe> jam: i'm not sure i see how DesiredAgentVersion would help
[09:16] <rogpeppe> jam: we'd still need to wait for the environment config to be valid, wouldn't we?
[09:17] <mgz> jam: right, didn't comment, still not completely sure on the implications (but as a hack, it seemed fine)
[09:22] <rogpeppe> afk
[09:35] <jam> rogpeppe: we write AgentVersion into the environment config during bootstrap. We don't yet have *provider* (eg EC2/MaaS/Openstack) credentials.
[09:35] <jam> rogpeppe: the existing Tools() api requires searching the provider.Storage for the binary that matches the desired version
[09:35] <jam> rogpeppe: because it returns the URL to download those tools.
[09:35] <jam> which is a problem during bootstrap because we don't actually have the credentials to search Storage yet.
[09:35] <jam> well, "first boot"
[09:40] <rogpeppe> jam: ah yes, of course
[09:41] <rogpeppe> jam: i think that separating the API calls makes a lot of sense actually
[09:41] <rogpeppe> jam: then the relationship between the version watcher and the API call we're making is obvious
[10:19] <fwereade_> jam, I'm feeling somewhat nauseous and it isn't going away, I'm going to lie down for a while
[10:20] <jam> fwereade_: shame, I'm just about to propose the fix. Feel better soon.
[10:20] <rogpeppe> fwereade_: hope it *does* go away
[10:20] <fwereade_> jam, if I don't come back for the meetings please make my apologies
[10:20] <fwereade_> jam, I will surely at least manage some reviewing later today though
[10:21] <fwereade_> rogpeppe, cheers )
[10:22] <jam> davecheney: if you are around, didn't you have a bug about seeing listing-the-storage-bucket over and over in the logs?
[10:22] <jam> I think my fix will also address that
[10:25] <jam> rogpeppe: if you have time https://codereview.appspot.com/13380043/
[10:33]  * TheMue => lunchtime
[10:49] <rogpeppe> jam: reviewed
[10:51] <jam> rogpeppe: so I was following the example of "tools.Tools" which returns a pointer rather than a struct.
[10:51] <jam> Is there a reason they should be different?
[10:52] <rogpeppe> jam: just convention
[10:52] <rogpeppe> jam: we always pass around *tools.Tools
[10:52] <rogpeppe> jam: and version.Number
[10:53] <rogpeppe> jam: if we want to change it, we should do it consistently across the code base
[10:53] <jam> rogpeppe: what about the API itself then? I think it was using *tools so that it just puts nil there when it say, has an error.
[10:53] <jam> it seems a little strange to serialize a bunch of 0's onto the wire when you have an error.
[10:54] <rogpeppe> jam: omitempty might possibly work
[10:54] <rogpeppe> jam: although it may not work on structs, come to think of it
[10:55] <rogpeppe> jam: i don't mind if the serialisation struct uses a pointer, if that's what we need to lose it from the RPC result
[10:56] <mgz> what is "empty" in the context of a struct... can you test for nilledness easily with go?
[10:56] <mgz> zeroedness actually I guess
[10:56] <jam> mgz: foo == Foo{}
[10:56] <rogpeppe> mgz: yeah
[10:56] <rogpeppe> mgz: but it's not that easy to check for using reflection, so it's quite probably it doesn't do that
[10:57] <rogpeppe> probable even
[10:57] <mgz> that makes sense
[10:58] <rogpeppe> mgz: yeah, it doesn't do that
[11:06] <mgz> jam: any ideas why sync-tools --source is trying to list tools before doing anything?
[11:06] <mgz> on canonistack, so may involve simplestreams...
[11:06] <jam> mgz: as in why it would be listing ~/go/bin or listing something upstream?
[11:06] <jam> it normally lists the target tools
[11:06] <jam> so it can find what tools need to be copied
[11:06] <mgz> it's trying to list a swift container. it should just splat up the local stuff, no?
[11:07] <jam> mgz: not when you already have stuff
[11:07] <jam> it is "put stuff I don't have"
[11:07] <jam> not "overwrite everything"
[11:07] <jam> mgz: that way if I get interrupted after copying 3 things, I don't start from scratch again.
[11:07] <mgz> but... this is then a catch-22. have no tools, can't upload my local tools, because I have no tools?
[11:07] <jam> not that it should fail if it can't find any target tools
[11:07] <jam> mgz: I think you are misreading it. I think it is failing because it can't find any *source* tools to copy
[11:08] <mgz> jam: that dir certainly contains jujud
[11:09] <mgz> all: will miss standup, things to mention, filed a selection of goose bugs about issues related to rackspace
[11:09] <mgz> and er... I'm being prodded
[11:11] <jam> mgz: tools are usually a foo-series-arch-number.tar.gz sort of thing
[11:16] <arosales> jam, https://bugs.launchpad.net/juju-core/+bug/1216768
[11:16] <_mup_> Bug #1216768: Azure provider: Authentication error when using public tools <juju-core:Fix Committed by axwalk> <https://launchpad.net/bugs/1216768>
[11:32] <jam> rogpeppe: https://plus.google.com/hangouts/_/f497381ca4d154890227b3b35a85a985b894b471 standup?
[11:49] <arosales> jam, https://bugs.launchpad.net/juju-core/+bug/1218329 is the other Azure bug needed.
[11:49] <_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series <juju-core:New> <https://launchpad.net/bugs/1218329>
[11:50] <arosales> jam what method would you suggest to test trunk with a public tools? sync-tools to upload the current trunk tools to a public bucket?
[11:50] <jam> arosales: you can create a different "public" bucket and sync-tools into it, I think.
[11:51] <jam> and then configure you environment to treat that as the actual public bucket.
[11:52] <arosales> jam, ok I'll give that a try today, thanks.
[11:52] <jam> arosales: for bug #1218329 you've confirmed we can switch to precise? because it is pretty trivial to change that one line.
[11:52] <_mup_> Bug #1218329: Update default environment.yaml for Azure to use Precise for default-series <juju-core:New> <https://launchpad.net/bugs/1218329>
[11:53] <arosales> jam, the other point I need to confirm is the image-stream.
[11:53] <arosales> I need to confirm if we are calling the latest precise images with the fix as "released" or "daily" in simple streams and in Azure publication.
[11:53] <arosales> jam, I also added a comment with that.
[12:07] <jam> rogpeppe: I think I responded to all of your requests: https://codereview.appspot.com/13380043/
[12:08] <jam> arosales: I won't be around tomorrow to make sure azure things are addressed before the cut of the release (usually happens on the weekend). You can probably poke some of the people who are in Europe during your morning (mgz, fwereade_, allenap come to mind)
[12:09] <jam> hey mramm, we didn't think we'd see you today
[12:09] <mramm> jam: I had the overnight flight to london
[12:09] <mramm> just checking in at the hotel
[12:09] <arosales> jam, I'll sync with davecheney and fwereade_ on it if a release is targeted this week
[12:12] <jam> arosales: I fully expect at least 1.13.3 to be out this weekend. And we'd like to release it, test it, and possibly directly upgrade it to 1.14
[12:14] <arosales> jam, ok, and if a stable release causes a lot of pain and you just go with devel, just let me know, for azure documentation setup purposes.
[12:18] <rogpeppe> jam: reviewed
[12:40] <TheMue> jam: due to doc sprint tomorrow and now having discovered that it's a deeper goose change too i would like to see lp:1202163 reassigned to one of the goose team. what do you say?
[12:42] <rogpeppe> jam, fwereade_: state.RelationUnitsWatcher doesn't seem to have any tests at all. do you know what's going on there?
[12:50] <rogpeppe> hmm, looks like it was deleted in https://codereview.appspot.com/7198051/
[12:54] <rogpeppe> https://bugs.launchpad.net/juju-core/+bug/1218362
[12:54] <_mup_> Bug #1218362: state.RelationUnitWatcher is not tested <tech-debt> <juju-core:New> <https://launchpad.net/bugs/1218362>
[13:36]  * rogpeppe just ran the coverage test tool for the first time
[13:36] <rogpeppe> it works well
[13:36] <natefinch> nice
[13:36] <rogpeppe> 85% coverage of the state package
[13:36] <natefinch> hey, 85% is really good
[13:36] <rogpeppe> natefinch: it would be 86.9% if we actually tested RelationUnitWatcher
[13:37] <rogpeppe> aw, "cannot use test profile flag with multiple packages"
[13:37] <natefinch> rogpeppe: heh. I don't put a ton of stock in coverage, since covered doesn't necessarily mean tested... but not covered is definitely not tested
[13:37] <rogpeppe> natefinch: that's my thought
[13:38] <rogpeppe> natefinch: i was wondering about things that would have made it obvious that we'd lost test coverage
[13:38] <natefinch> rogpeppe: so, only 15% definitely not tested, and 85% that is at least exercised, so that's pretty great
[13:38] <rogpeppe> natefinch: (as happened in this CL in january: https://codereview.appspot.com/7198051/ )
[13:39] <natefinch> rogpeppe: yeah, detecting when we lose coverage is a good idea
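The "cannot use test profile flag with multiple packages" error rogpeppe hit reflects that `go test` of this era only accepted `-coverprofile` for a single package at a time, so profiles had to be generated per package. A sketch of the usual invocation (the `./state` package path is taken from the conversation; run from the repo root):

```shell
# Coverage for one package at a time; older 'go test' rejects
# -coverprofile when given multiple packages at once.
go test -coverprofile=state.out ./state

# Per-function summary; the final line reports the total percentage.
go tool cover -func=state.out

# Or open an annotated, line-by-line source view in the browser.
go tool cover -html=state.out
```

For whole-tree numbers, the common workaround was a small loop generating one profile per package and merging them afterwards.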
[14:01] <rogpeppe> jam: if you're still around, this fixes the watcher panic: https://codereview.appspot.com/13386044/
[14:01] <rogpeppe> fwereade_, natefinch, mgz: reviews appreciated
[14:02]  * rogpeppe goes for some lunch
[14:03] <natefinch> rogpeppe: I'll take a look
[15:21] <arosales> any juju core folks around to attend vUDS session: http://summit.ubuntu.com/uds-1308/meeting/21899/servercloud-s-juju-new-user-ux/
[15:22] <arosales> rogpeppe,  ^
[15:22]  * arosales was going to bother fwereade but I don't see him online atm
[15:23] <rogpeppe> arosales: he's sick atm, i think
[15:23] <arosales> ah, sorry to hear
[15:23] <rogpeppe> arosales: what time zone is that time in?
[15:23] <arosales> rogpeppe, utc
[15:23] <arosales> so roughly in half hour
[15:23] <rogpeppe> arosales: ok, i'll be there
[15:24] <arosales> rogpeppe, much appreciated, thank you
[15:28] <rogpeppe> arosales: do i have to register as attending to join?
[15:28] <arosales> rogpeppe, I don't think so
[15:28] <arosales> rogpeppe, I'll post the hangout url in the channel here in a bit, and we can take it from there.
[15:29] <rogpeppe> arosales: ah, it's a hangout - i was wondering where the video was
[15:29] <rogpeppe> arosales: thanks
[15:30] <arosales> rogpeppe, yup live hangout, but I haven't started it just yet
[15:30] <arosales> still have about 20 minutes
[15:47] <rogpeppe> natefinch: i've addressed your concerns, i hope: https://codereview.appspot.com/13386044
[15:48] <rogpeppe> natefinch: or replied, at any rate :-)
[15:57] <natefinch> rogpeppe: cool, I'll take a look.  Would like someone more familiar with the problem to give it the lgtm if possible, though
[15:58] <rogpeppe> natefinch: yeah, i guess i'll have to wait until tomorrow unless mgz or jam are around
[16:00] <mgz> the watcher change makes some sense to me, but I'm not sure we have anyone more qualified than rog on the problem :)
[16:01] <mgz> what's the implication of a one-value return from a channel, rather than the `, ok` form?
[16:09] <natefinch> rogpeppe: it sorta bugs me that Watcher is an interface without a Watch() method :p
[16:09] <rogpeppe> please think of a better name!
[16:09] <rogpeppe> natefinch: ^
[16:12] <natefinch> rogpeppe: I would almost call it Stopper or Ender... it's the  Changes() method that really makes a watcher a watcher
[16:14] <rogpeppe> natefinch: i originally thought about Worker
[16:14] <natefinch> rogpeppe: yeah, I was thinking something like that originally
[16:16] <natefinch> arosales: you should email the team ahead of time so we can plan on being around for these meetings :)
[16:17] <arosales> natefinch, you guys didn't know about vUDS?
[16:18] <arosales> natefinch, but noted I don't think I included juju-core folks in my uds reminders
[16:18] <natefinch> arosales: reminders are much appreciated :) Also, I'm new, so I might not know things I'm supposed to know :)
[16:18] <arosales> natefinch, for sure you get a pass :-)
[16:19] <arosales> but these other veterans  . . . .
[16:19] <arosales> natefinch, but I agree with you on reminders. I'll note for next time bother the juju-core folks :-)
[17:39] <weblife> Juan Negron in here?
[18:27]  * rogpeppe has reached eod
[18:27] <rogpeppe> g'night all
[21:31] <thumper> morning folks
[23:18]  * thumper is at that WTF moment debugging
[23:24]  * thumper grunts...
[23:37] <thumper> found the source of my confusion
[23:37] <thumper> simplified to this:
[23:37] <thumper>     c.Assert(obtained, gc.DeepEquals, expected)
[23:37] <thumper> ... obtained []uint8 = []byte{}
[23:37] <thumper> ... expected []uint8 = []byte{}
[23:39] <thumper> bigjools: the magic of gocheck  :)
[23:39]  * thumper fixes 
[23:40] <bigjools> thumper: wtf
[23:40] <thumper> bigjools: I know, right?
[23:40] <thumper> had me scratching my head for quite a while
[23:40] <bigjools> what is it hiding?
[23:40] <thumper> it is outputting something incorrectly
[23:40] <thumper> here are the two lines prior
[23:41] <thumper> 	obtained := []byte{}
[23:41] <thumper> 	var expected []byte
[23:41] <thumper> obtained is an empty slice
[23:41] <thumper> expected is a nil slice
[23:45] <davecheney> thumper: we need a SliceEquals checker
[23:45] <thumper> what is shown here is just a part of the problem
[23:45] <thumper> I'm checking an entire structure
[23:45] <thumper> using deep equals
[23:45] <thumper> the slice is just one part
[23:47] <thumper> a key part of the problem is that it isn't showing []byte{nil}
[23:47] <thumper> which is what *should* be shown for a nil slicke
[23:47] <thumper> slice
[23:47] <thumper> that would have made the problem completely obvious
[23:47] <thumper> instead of hiding it
[23:48] <thumper> oh...
[23:48] <thumper> check this out:
[23:48] <thumper> http://play.golang.org/p/LRDkBszMNa
[23:48] <thumper> vs http://play.golang.org/p/iO2wpSUxeO
[23:49] <thumper> a nil string slice is output as []string(nil), but a nil byte slice is shown as []byte{}
[23:49] <thumper> however, a nil byte slice does not equal an empty byte slice
[23:50] <thumper> davecheney: go bug?
[23:53] <davecheney> nope
[23:54]  * bigjools boggles
[23:56] <thumper> davecheney: why?
[23:56] <davecheney> otp