[01:50] <axw> wallyworld: in case you're wondering why I didn't land the simplestreams changes yet, I found a bunch of tests I forgot to update :)
[01:51] <wallyworld> ah np :-)
[01:59]  * thumper -> school run
[02:51] <thumper> wallyworld: please don't remove the names when done
[02:52] <wallyworld> thumper: ok, i saw some names had already been removed for things besides mine so i thought i'd tidy it up
[02:52] <wallyworld> looks neater :-)
[02:52]  * thumper fixes
[02:52] <thumper> it isn't how it's done
[02:52] <wallyworld> do we care about the names?
[02:52] <wallyworld> once done
[04:48] <axw> jam: I had it as two tests, but changed it because I keep getting told to ;)   I agree - will change it back to two
[04:48] <jam> axw: having worked through the Uniter tests, I find them very hard to debug when things go wrong
[04:48] <jam> because the Nth item in the test is failing
[04:49] <jam> and the log is 1000 lines long
[04:49] <axw> yeah, I find this problematic too
[05:05] <jam> wallyworld: poke
[05:06] <jam> for https://code.launchpad.net/~jameinel/juju-core/faster-passwords/+merge/193667 I realized that if an agent logs in with the "slow" hash, we can just rewrite it there to the fast one
[05:07] <jam> (or if we want something with salt, etc, etc)
[05:33] <wallyworld> jam: hi
[05:34] <jam> hey wallyworld
[05:34] <wallyworld> you want me to look at that review again?
[05:41] <wallyworld> jam: so did you need me to do anything re: the above poke?
[05:42] <jam> wallyworld: Sorry, I haven't finished responding to all the feedback, but I wanted to ask if a change seemed reasonable.
[05:42] <jam> Namely
[05:42] <jam> when running entity.PasswordValid
[05:42] <jam> if we see that the PasswordHash in the DB is the old form
[05:42] <jam> just rewrite the DB to the new form
[05:42] <jam> we can trivially compute the hash
[05:42] <jam> because the agents always just pass in the full password
[05:43] <wallyworld> right, i thought you were looking to do that. seems reasonable to me, since cost is trivial and it will incrementally upgrade the db
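The rewrite-on-login idea jam describes could look roughly like this. Everything here is an illustrative stand-in (both "hash schemes" are plain SHA-512 variants just to keep the sketch self-contained), not the real juju-core code:

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
)

// oldHash and newHash stand in for the slow/old and fast/new schemes.
func oldHash(password string) string {
	sum := sha512.Sum512([]byte(password))
	return base64.StdEncoding.EncodeToString(sum[:])[:24]
}

func newHash(password string) string {
	sum := sha512.Sum512([]byte("new-scheme:" + password))
	return base64.StdEncoding.EncodeToString(sum[:])[:24]
}

// entity stands in for a state entity whose PasswordHash lives in mongo.
type entity struct {
	passwordHash string // what is stored in the DB
}

func (e *entity) setPasswordHash(h string) { e.passwordHash = h }

// PasswordValid checks the password and, if the stored hash is the old
// form, transparently rewrites it to the new form: the agent always
// sends the full password, so the new hash is trivially computable here,
// and the DB upgrades incrementally as agents log in.
func (e *entity) PasswordValid(password string) bool {
	if e.passwordHash == newHash(password) {
		return true
	}
	if e.passwordHash == oldHash(password) {
		// Valid under the old scheme: upgrade the stored hash in place.
		e.setPasswordHash(newHash(password))
		return true
	}
	return false
}
```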
[05:44] <jam> wallyworld: what about "salt" for UserPasswords
[05:44] <jam> sounds like fwereade would like to see that added
[05:44] <jam> doesn't seem hard, though it means adding another field to the DB
[05:44] <wallyworld> that makes sense to me too - i'd personally feel better with it
[05:44] <jam> wallyworld: is it worth doing for agent passwords?
[05:45] <wallyworld> that's a harder question
[05:45] <jam> also, is it worth doing something like a len(password) >= 18 for the fast version?
[05:45] <jam> wallyworld: so "worth it" is just in the "so the code paths are similar"
[05:45] <jam> I'm 99% sure it isn't worth it from an actual increased security
[05:45] <wallyworld> i'd like that since we're relying on long enough password = hard enough to brute force
[05:45] <jam> (len(password) maybe)
[05:46] <wallyworld> does the salt for agent passwords add any tangible benefit?
[05:46] <wallyworld> cf the extra complexity
[05:47] <wallyworld> i guess if the code is the same anyways....
[05:47] <jam> wallyworld: no
[05:47] <jam> salt is a "prevent someone from precomputing 1B password hashes"
[05:47] <jam> of known user-likely passwords
[05:47] <jam> the whole point is we don't have known user-passwords for agents
[05:47] <wallyworld> yeah
[05:47] <wallyworld> so i wouldn't do it
[05:48] <wallyworld> we could still use same code
[05:48] <wallyworld> ie look up salt and use it if there
[05:48] <wallyworld> i think mongo returns empty for non existent fields
[05:49] <jam> wallyworld: I'm sure we can tell and be compatible
[05:50] <wallyworld> so then, add salt for user passwords, check length of agent passwords, rewrite out of date hashes
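The "look up salt and use it if there" compat check wallyworld suggests might be sketched like this; mongo decodes a missing field as the zero value, so an empty salt marks a document written before salting. All names here are illustrative, not real juju-core API:

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
)

// hash is a stand-in for whichever digest the real code uses.
func hash(s string) string {
	sum := sha512.Sum512([]byte(s))
	return base64.StdEncoding.EncodeToString(sum[:])[:24]
}

// passwordValid falls back to the unsalted (compat) hash when the stored
// salt field is empty, i.e. the document predates salting.
func passwordValid(storedHash, storedSalt, password string) bool {
	if storedSalt == "" {
		return storedHash == hash(password) // pre-salt document
	}
	return storedHash == hash(storedSalt + ":" + password)
}
```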
[05:50] <wallyworld> do we pass password over the wire in plain text?
[05:50] <wallyworld> i guess we do?
[05:51] <jam> wallyworld: yes. though we have a TLS connection by that point
[05:51] <wallyworld> ok
[05:51] <jam> wallyworld: I know fwereade also talked about using CA signed client certs for agents
[05:51] <jam> because that also helps in the case of "recovery" mode
[05:52] <wallyworld> ok
[05:52] <jam> but you'll still want *some* sort of user identity token/password/thingy
[05:52] <wallyworld> yeah
[05:52] <jam> because you don't want machine-1 agent pretending to be machine-0
[05:52] <jam> since machine-0 gets all the passwords
[05:52] <wallyworld> yep :-)
[05:52] <wallyworld> or machine-N
[05:52] <jam> right
[05:53] <wallyworld> where N is a HA state server
[05:53] <jam> wallyworld: yeah, I think recovery will need some thought about security
[05:53] <wallyworld> indeed
[05:53] <jam> one can argue the attack surface is minimized by requiring a user to engage the mode
[05:53] <jam> and it could even require Admin registration sort of thing
[05:54] <jam> I don't know how much we want to automate all of recovery
[05:54] <jam> so EOUTOFSCOPE for now :)
[05:54] <wallyworld> yep
[05:54] <wallyworld> there's lots of prior art for this sort of thing too i think
[05:54] <wallyworld> let's not reinvent the wheel
[06:02] <jam> wallyworld: oh, the other bit about requiring min length of Agent passwords
[06:02] <jam> is it is going to disrupt the test suite a lot
[06:02] <jam> because we have tests that set the password for machine to "test-password"
[06:02] <jam> which is a lot less than the 24 bytes we normally have
[06:02] <wallyworld> s/test-password/test-password1234567890 :-)
[06:03] <jam> wallyworld: well that, and it *might* bite us in backwards compatibility mode
[06:03] <jam> since we can't change the actual password
[06:03] <jam> we can change what we *store*
[06:03] <jam> but we don't have a "that Login is valid, but you need to create a new Password now"
[06:04] <wallyworld> true
[06:04] <jam> utils.RandomPassword has generated 24-byte passwords for a long time now, though
[06:04] <wallyworld> i reckon it's worth trying
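What jam describes — 18 bytes of entropy, base64-encoded into a 24-character password — could be sketched like this (the function name is illustrative; the real helper is `utils.RandomPassword`):

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
)

// randomPassword generates 18 cryptographically random bytes and
// base64-encodes them. 18 is a multiple of 3, so the result is exactly
// 24 characters with no '=' padding.
func randomPassword() (string, error) {
	buf := make([]byte, 18)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf), nil
}
```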
[08:47] <rogpeppe> mornin' all
[08:51] <rogpeppe> fwereade: ping
[08:51] <axw> morning rogpeppe
[08:51] <rogpeppe> axw: yo!
[08:55] <axw> rogpeppe: should everything prefer to use state.Machine.Addresses() rather than go to environs.Environ.WaitDNSName()?
[08:55] <axw> I'm updating juju ssh to use the API; it uses WaitDNSName currently
[08:55] <rogpeppe> axw: yes, it should definitely use state.Machine.Addresses
[08:55] <axw> ok
[08:56] <rogpeppe> axw: or Unit.Address(es?) when appropriate
[08:56] <axw> yup
[08:56] <axw> cool
[08:56] <axw> rogpeppe: main problem now is that NewAPIConn doesn't pass secrets... I guess I'll do that now
[08:57] <rogpeppe> axw: ah yes, that definitely needs to happen
[08:57] <rogpeppe> axw: you mean, it doesn't push secrets if it's the first connection, right?
[08:57] <axw> rogpeppe: yup
[08:57] <axw> there's a TODO
[08:59] <rogpeppe> axw: ha, the TODO just above it is very stale...
[08:59] <rogpeppe> axw: i'm just wondering if there's a nicer way to do secret pushing that doesn't incur an extra round trip
[09:00] <axw> rogpeppe: the API server could return an error that says "I haven't got my secrets yet"? and then we push and retry?
[09:00] <rogpeppe> axw: something a little like that, yes
[09:01] <TheMue> morning
[09:01] <axw> morning TheMue
[09:01]  * TheMue fights with a mail backlog of one week :)
[09:04] <rogpeppe> TheMue: hiya
[09:04] <TheMue> hiya axw and rogpeppe
[09:05] <rogpeppe> axw: one alternative i'm considering is that the Login response, rather than failing, returns a "lacking secrets" status
[09:06] <rogpeppe> axw: that can be cached locally in the api.State and queried.
[09:06] <axw> rogpeppe: ah ok. I don't know the internals well enough to know how separated they are...
[09:06] <axw> sounds sensible
[09:07] <axw> so instead of a GetEnvironment/SetEnvironment, it'd just push secrets during login
[09:07] <rogpeppe> axw: yes
[09:08] <rogpeppe> axw: so we might add an environ config argument to api.Open
[09:08] <rogpeppe> axw: which is allowed to be nil, but if it is, then the connection will fail if it's the first API connection
[09:09] <rogpeppe> axw: in that case in fact it might work well to have Login fail
[09:10] <rogpeppe> axw: hmm, not sure though
[09:10] <rogpeppe> axw: depends whether we want the login message to contain the environ config
[09:12] <axw> rogpeppe: if it's just the first connection, why not just have the login proceed, and then have the server request the secrets (via a special error)?
[09:12] <axw> then a second message
[09:12] <axw> it's only once
[09:13] <rogpeppe> axw: so the error implies "login has actually succeeded (despite the error) but secrets are needed" ?
[09:14] <axw> rogpeppe: yeah. perhaps confusing, but that's one option anyway :)
[09:14] <rogpeppe> axw: i definitely don't mind a second message to push the secrets
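The push-and-retry option being discussed could look roughly like this on the client side. The error value and function shapes are hypothetical, purely to show the flow, not real juju-core API:

```go
package main

import "errors"

// errSecretsNeeded stands in for the hypothetical "I haven't got my
// secrets yet" response from the API server.
var errSecretsNeeded = errors.New("state server has no environ secrets yet")

// loginWithSecrets logs in and, if the server reports missing secrets,
// pushes them and retries once. Only the very first connection to a
// fresh environment should ever take the retry path.
func loginWithSecrets(login func(password string) error,
	push func(secrets map[string]string) error,
	password string, secrets map[string]string) error {

	err := login(password)
	if !errors.Is(err, errSecretsNeeded) {
		return err // nil on success, or some unrelated failure
	}
	if err := push(secrets); err != nil {
		return err
	}
	return login(password)
}
```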
[09:18] <rogpeppe> axw: one question that arises from this: is there ever a case where we want to allow some request to the API server *without* pushing the admin secrets?
[09:18] <rogpeppe> axw: because if we push environ config with Login, that will be ruled out
[09:18] <rogpeppe> axw: but that might well be a good thing
[09:19] <rogpeppe> axw: because then there's no way that any client can do anything at all with an environment with no secrets
[09:20] <axw> hmm
[09:20] <rogpeppe> axw: the other thing that i'm thinking about is how does the server know when secrets have been pushed
[09:23] <rogpeppe> axw: the most straightforward approach is simply to get the environment config and see if there are any secrets in it
[09:23] <rogpeppe> axw: but perhaps there might be an environment that has no secrets
[09:23] <rogpeppe> axw: ha, that's actually not a problem, i realise
[09:23] <axw> rogpeppe: I thought the idea was an environ's config must be invalid if it doesn't have its secrets
[09:23] <rogpeppe> axw: yeah, it is
[09:23] <rogpeppe> axw: but we don't even need to create the Environ
[09:23] <axw> rogpeppe: as for allowing no secrets... sounds preferable to require them always, but I don't know if there's a case or not
[09:23] <rogpeppe> axw: because we've got EnvironProvider.SecretAttrs
[09:23] <rogpeppe> axw: so if that returns nothing, we know that we don't require secrets to be pushed
[09:23] <axw> ah yeah, I see. then we can distinguish an invalid env from one with no secrets
[09:24] <rogpeppe> axw: yeah
[09:24] <rogpeppe> axw: although...
[09:24] <rogpeppe> axw: perhaps it might be a good plan to actually validate the environ
[09:24] <rogpeppe> axw: something we can't do currently
[09:25] <axw> can't?
[09:25] <axw> rogpeppe: why can't we?
[09:26] <rogpeppe> axw: because clients talk directly to mongo
[09:26] <rogpeppe> axw: so there's nothing stopping a dodgy client pushing a bad environ config
[09:27] <axw> ah ok, I see
[09:27] <axw> there's no good time to do it currently
[09:28] <rogpeppe> axw: the other thing that occurs to me is that we could cache "secrets pushed" in the apiserver.Server (because it can only go from false to true), meaning that any api server would only need to check once
[09:28] <rogpeppe> axw: but that's just an optimisation (but one that's not possible if the secrets checking is done client-side)
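The server-side caching optimisation rogpeppe describes works because the flag only ever goes from false to true. A sketch, with all names invented for illustration:

```go
package main

import "sync"

// secretsCache remembers, per API server, whether secrets have been
// observed in the environ config, so the expensive check runs only
// until the first positive answer.
type secretsCache struct {
	mu     sync.Mutex
	pushed bool
	check  func() (bool, error) // e.g. read environ config from mongo
}

// SecretsPushed returns true once secrets are known to be present.
// Because "pushed" is monotonic, a cached true never needs revalidating.
func (c *secretsCache) SecretsPushed() (bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.pushed {
		return true, nil
	}
	ok, err := c.check()
	if err != nil {
		return false, err
	}
	c.pushed = ok
	return ok, nil
}
```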
[09:30] <axw> rogpeppe: is all of this going to break the GUI horribly?
[09:30] <rogpeppe> axw: i don't think so
[09:30] <rogpeppe> axw: because AFAIK the GUI can't currently make the first connection anyway
[09:30] <rogpeppe> axw: and the Login call can be changed in a totally backwardly compatible way
[09:31] <axw> cool
[09:38] <rogpeppe> axw: so to summarise, how does this sound? http://paste.ubuntu.com/6363731/
[09:39] <axw> rogpeppe: sounds great.
[09:41] <axw> rogpeppe: are you planning to do this yourself, or are you working on other things?
[09:41] <rogpeppe> axw: i'm currently oriented more towards the HA stuff - if you feel like doing this, it would be great.
[09:42] <axw> sure, I will look into it (probably in the morning)
[09:42] <jam> axw: just to let you know, I'm currently poking a lot of stuff underneath login (PasswordHash) stuff
[09:42] <jam> it probably won't conflict, but you might want to wait a sec on it
[09:42] <axw> jam: ok no worries
[09:42] <rogpeppe> jam, fwereade: how does the above plan look to you?
[09:42] <axw> thanks
[09:43] <jam> rogpeppe: I *think* it is all unnecessary. thumper was quite keen on changing "juju bootstrap" to wait until it can connect to the API server
[09:43] <jam> in which case
[09:43] <jam> bootstrap does all the work
[09:43] <jam> and then we don't have to do it for every API connection.
[09:43] <rogpeppe> jam: i don't think that's viable
[09:43] <jam> rogpeppe: because ?
[09:44] <rogpeppe> jam: what happens if someone interrupts "juju bootstrap" ?
[09:44] <jam> rogpeppe: they have to start over
[09:44] <jam> they don't have more than 1 machine at that point
[09:44] <jam> so we aren't destroying an environment that is well set up anyway
[09:44] <jam> Or we allow "juju bootstrap" to start where it left off
[09:45] <fwereade> rogpeppe, jam: the plan was to catch interrupts and takes the machine down if it's interrupted
[09:46] <rogpeppe> fwereade: i'm not sure that's great actually - what if the network is down? does that mean you can't interrupt bootstrap?
[09:46] <fwereade> rogpeppe, jam: blocking bootstrap actually has a lot of advantages -- no silly secrets dance, ability to create storage in the environment instead of the provider
[09:47] <fwereade> rogpeppe, it just fails
[09:47] <jam> fwereade: I'm a big fan, plus the fact you can give the user feedback about how far it gets
[09:47] <fwereade> rogpeppe, don't think it's any worse than the network going down during a normal bootstrap
[09:47] <fwereade> jam, rogpeppe: indeed, useful feedback during bootstrap is also awesome
[09:47] <jam> rather than trying to do that at every "juju status" or "juju deploy" or ... etc
[09:48] <fwereade> jam, in a sense that's just an extension of the secrets dance, but yeah, would be good to drop it entirely, no argument
[09:50] <rogpeppe> fwereade: FWIW i think we can create storage in the environment instead of the provider anyway, can't we?
[09:50] <fwereade> jam, rogpeppe: the question is *when* thumper is likely to do this, because we need some solution for the cli-api work
[09:51] <jam> fwereade, rogpeppe: going back to another discussion, I'm going back to the PasswordHash stuff, and splitting it into a UserPasswordHash(password, salt) and AgentPasswordHash(password)
[09:51] <jam> where we allow CompatPasswordHash, but if that succeeds, we then change the DB to set it to the new methods
[09:51] <jam> fwereade: well, 'juju status' and 'juju deploy' are going to be some of the last steps we actually finish :)
[09:51] <fwereade> rogpeppe, well, we do, for the manual provider -- but that's a blocking bootstrap ;)
[09:51] <jam> we can set someone on it, even if it isn't thumper
[09:52] <rogpeppe> jam: i don't think you can salt user passwords until the entire CLI is API, can you?
[09:53] <fwereade> jam, heh, axw springs to mind given that he did the manual stuff -- it's just "make everything else work like manual bootstrap" :)
[09:53] <axw> heh
[09:53] <fwereade> jam, axw: modulo *also* needing provider storage -- or some alternative mechanism -- to store the bootstrap info
[09:53] <jam> rogpeppe: so we don't salt the Mongo password, so I don't think that actually changes, it is just when someone *does* connect via the API, we look up the hash + salt.
[09:54] <axw> jam, fwereade: per my email before, lack of secrets via API kinda blocks work I'm doing
[09:55] <axw> I can move onto destroy-environment maybe
[09:55] <axw> but otherwise, I could look at the secrets/bootstrap business
[09:55] <rogpeppe> jam: currently we do hash the mongo password, but i guess we could use a known salt for that
[09:56] <rogpeppe> jam: (in fact that's what we do currently)
[09:56] <jam> rogpeppe: CompatPasswordHash() uses the same UserPasswordHash(password, FIXEDSALT)
[09:58] <rogpeppe> jam: seems reasonable
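The relationship jam describes — CompatPasswordHash being UserPasswordHash with a known constant salt — could be sketched as below. The salt value is invented, and plain SHA-512 stands in for whatever (possibly slower) KDF the real UserPasswordHash uses:

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
)

// fixedSalt stands in for the FIXEDSALT constant mentioned above.
const fixedSalt = "juju-fixed-salt"

// UserPasswordHash is the salted hash for user passwords; the real
// implementation may use a deliberately slow KDF.
func UserPasswordHash(password, salt string) string {
	sum := sha512.Sum512([]byte(salt + ":" + password))
	return base64.StdEncoding.EncodeToString(sum[:])[:24]
}

// AgentPasswordHash is the fast variant: agent passwords are long random
// strings, so neither salt nor a slow KDF buys anything.
func AgentPasswordHash(password string) string {
	sum := sha512.Sum512([]byte(password))
	return base64.StdEncoding.EncodeToString(sum[:])[:24]
}

// CompatPasswordHash is the old scheme, expressed as the user hash with
// the known constant salt.
func CompatPasswordHash(password string) string {
	return UserPasswordHash(password, fixedSalt)
}
```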
[09:59] <rogpeppe> jam: while we're about it, can we increase the password strength?
[09:59] <jam> rogpeppe: 18-bytes of entropy is about 2^53 or so. We could easily go up to 24 (and get 32-byte base64 passwords). which gets us up into 2**72.
[10:00] <jam> I may be wrong on the exact values
[10:00] <jam> but < 2^64 today, and >2^64 with a size bump.
[10:00] <jam> rogpeppe: the code itself says "we stick to 18-bytes because mongo uses md5sum anyway"
[10:00] <rogpeppe> jam: i'm thinking we could usefully use 256-bit random passwords and hashes
[10:01] <jam> rogpeppe: I honestly don't think that improves our actual security, but yes, we could make it really big.
[10:02] <jam> #1 mongo is still the most critical part
[10:02] <jam> as getting *that* password gives you everything
[10:02] <rogpeppe> jam: yeah, but cracking the user password might give you access to other environments
[10:02] <jam> rogpeppe: doesn't matter, we don't set the user password
[10:03] <jam> and these passwords that we are generating are only good for a given agent
[10:03] <jam> so no leakage
[10:04] <jam> we're using sha512 as our hash, so we have room internally
[10:04] <jam> if mongo is using md5sum
[10:04] <jam> then that would be 128 bits
[10:05] <jam> but 18 bytes of entropy = 2^144 (i was doing the math wrong before)
[10:05] <jam> so we're already better than md5
[10:06] <rogpeppe> yeah, i thought your numbers looked weird (i thought you perhaps meant 10^53)
[10:07] <jam> rogpeppe: I was doing 8 bytes == 8^8 rather than 256^8
[10:08] <rogpeppe> jam: or 2^(8 * 8)
[10:08] <rogpeppe> jam: (easier just to work in bits, i reckon)
[10:09] <jam> rogpeppe: so because we take the raw bits and put it into base64 encoding, the useful bits are 18 bytes, 24 bytes and 30 bytes
[10:09] <jam> since those leave us with a base64 password that doesn't have '=' padding.
[10:09] <rogpeppe> jam: tbh a 144 bit random password is probably ample
[10:10] <jam> rogpeppe: for our attack surface I think it is more than ample myself
[10:10] <rogpeppe> jam: that's not gonna be the way that someone breaks into our system
[10:10] <jam> rogpeppe: yeah, I was considering it when my math was bad, because 2^64 isn't great security. but 2^144 is perfectly fine
[10:11] <rogpeppe> jam: yeah
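The arithmetic in the exchange above can be checked mechanically: n random bytes carry 8n bits of entropy, and base64 emits '=' padding unless n is a multiple of 3, which is why 18, 24 and 30 bytes (24-, 32- and 40-character passwords) are the convenient sizes. A purely illustrative helper:

```go
package main

import "encoding/base64"

// encodedInfo reports the entropy in bits, the base64-encoded length
// (including any padding), and whether padding would be present for a
// password built from nBytes of random data.
func encodedInfo(nBytes int) (bits, chars int, padded bool) {
	return nBytes * 8, base64.StdEncoding.EncodedLen(nBytes), nBytes%3 != 0
}
```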
[10:37] <rogpeppe> fwereade: are you around for a chat about the HA stuff?
[10:37] <rogpeppe> fwereade: ah, it's standup in a mo actually
[10:46] <dimitern> rogpeppe, mgz, fwereade: standup
[10:46] <mgz> ta
[10:46] <jam> fwereade: standup?
[10:46] <jam> fwereade: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig
[11:50]  * TheMue => lunch
[12:32] <tasdomas> does juju support tokens in config.yaml? I.e. in cases where one value is dependent on another one (like base path and subfolders)?
[12:38] <mgz> tasdomas: nope
[12:38] <tasdomas> mgz, right, thanks
[14:16] <abentley> sinzui: did thumper fill you in on the lxc/local-provider developments?
[14:16] <sinzui> No, but I saw the branch merge
[14:17] <abentley> sinzui: I did a test run after the branch merged and the local provider still failed, but I did not have time to check whether it was the same issue as before.
[14:17] <sinzui> abentley, are you using mysql in the test?
[14:17] <abentley> sinzui: Yes.
[14:18] <jam> rogpeppe1: fwereade: https://code.launchpad.net/~jameinel/juju-core/faster-passwords/+merge/193667 has been updated. It now sets a Salt for User passwords and uses clearly denoted AgentPasswordHash vs UserPasswordHash vs CompatPasswordHash
[14:19] <rogpeppe1> jam: looking
[14:19] <jam> I haven't had a chance to test live upgrades, but I have every belief things will JustWork
[14:22] <rogpeppe1> jam: one thing that occurs to me as *potentially* useful in the future, if we have many "users" that are actually agents, is that if we've generated the admin secret automatically (i.e. it's got lots of entropy) we could eschew the salting.
[14:22] <rogpeppe1> jam: something to think about for the future, perhaps
[14:24] <jam> rogpeppe1: well, eventually we'll get real users
[14:24] <jam> we do often generate admin-secret today
[14:24] <rogpeppe1> jam: indeed.
[14:24] <jam> but, meh, salting is cheap, I'm only looking to change this stuff for the AgentPasswordHash changes
[14:25] <rogpeppe1> jam: salting is cheap, but UserPasswordHash is not. at some point in the future, we *might* come to a situation where we've got many agents (the GUI is one example) that reconnect when an API server goes down, and hence use lots of CPU resources when doing so
[14:26] <rogpeppe1> jam: so, i guess i'm not really talking about the salting per se
[14:26] <rogpeppe1> jam: anyway, it was just a thought that occurred to me; ignore me :-)
[14:28] <abentley> sinzui: my test was invalid, because it started with 1.16.x, which isn't expected to have a fix yet.
[14:28] <sinzui> ah
[14:29] <sinzui> abentley, we could add stable branches to the test? juju/1.16 will build a 1.16.3 client and server
[14:30] <abentley> sinzui: Certainly.  Did that temporarily for thumper yesterday.
[14:32] <abentley> sinzui: Are you in the stand-up?  I switched my urls around.
[14:33] <rogpeppe1> jam: why is environs/cloudinit.go using CompatPasswordHash ?
[14:34] <jam> rogpeppe1: that is the old "use the hashed password until we can use the real one"
[14:34] <sinzui> abentley g+ is asking me to juggle three identities
[14:34] <abentley> sinzui: Fun.
[14:34] <rogpeppe1> jam: why can't we use a salted password there too?
[14:35] <jam> rogpeppe1: that would require changing what we pass to cloud-init, I think
[14:35] <rogpeppe1> jam: (after all, it's actually one of the most insecure places that the admin password is kept)
[14:35] <rogpeppe1> jam: yes, it would.
[14:35] <rogpeppe1> jam: is that a problem?
[14:35] <jam> which is something I wasn't as comfortable with because it isn't hidden behind the api
[14:36] <rogpeppe1> jam: i'm not that comfortable seeing "Compat"PasswordHash being used in a place where it looks like it will not be deprecated.
[14:36] <jam> rogpeppe1: regardless, the data we write to cloud init gets rewritten anyway,
[14:36] <rogpeppe1> jam: how do you mean?
[14:36] <jam> rogpeppe1: I'm fine changing the name back, or having multiple names for the same thing.
[14:36] <jam> rogpeppe1: once an agent is up, it resets its password
[14:37] <jam> so it won't match what is in cloud-init
[14:37] <jam> and bootstrap changes the admin password back to the real password rather than the hashed password
[14:37] <rogpeppe1> jam: but the admin password is still hashed in cloud-init, no?
[14:38] <rogpeppe1> jam: so someone that gets access to the cloud-init data (probably not too hard) can still brute-force the non-salted admin password AFAICS
[14:38] <rogpeppe1> jam: if we *are* going to salt user passwords, i think that's probably one of the most important places to do it
[14:40] <natefinch> rogpeppe1, jam:  any place we *can* salt passwords, we  should.  It's not computationally expensive, and even if we're not too worried about that vector of attack, it certainly can't hurt.
[14:40] <jam> rogpeppe1: so I don't think anything I've done precludes us adding salt there, and I think bootstrap is particularly a place where it is easy to break compatibility accidentally
[14:44] <rogpeppe1> jam: yeah, it *would* mean you couldn't use a new juju to bootstrap with old tools
[14:45] <rogpeppe1> jam: but perhaps you could change the occurrences of CompatPasswordHash to call UserPasswordHash with the known constant salt - then it's more obvious what's going on, perhaps. And a TODO in the code would be nice too.
[15:19] <rogpeppe1> jam: you have a review
[15:21] <abentley> sinzui: As we move to have more sets of tests running, we'll want to have multiple versions of juju running concurrently. Which makes me think we need to chroot (not lxc because that would break local provider).
[15:23] <sinzui> abentley, I understand
[15:39] <tasdomas> I am getting a strange panic when running go test on juju-core/worker/uniter
[15:39] <tasdomas> http://pastebin.com/ER6GUuza
[15:40] <tasdomas> (this is juju-core trunk)
[15:42] <jcsackett> sinzui: just realized what time it is. are we 1x1ing?
[15:44] <sinzui> jcsackett, sorry, had another meeting
[16:03] <abentley> sinzui: AFAICT, it's impossible to run "make clean", because "go list -e -f '{{.Dir}}' launchpad.net/juju-core" doesn't find anything.  I could do "bzr clean-tree --unknown --ignored"
[16:15] <sinzui> abentley, the make-recipe-and-package script creates a new directory with the revision in the name, so I don't understand how we could be reusing a built tree.
[16:17] <abentley> sinzui: Yesterday, I was testing 1.16 in the tree normally used for trunk.  With bad luck, a 1.16 revno could match a trunk revno.
[16:18] <sinzui> ah!
[16:18] <sinzui> abentley, I have experienced that just after I created the 1.16 branch
[16:20] <abentley> sinzui: I also trigger the tests manually, without waiting for the revno to update, but that's probably less of an issue.
[16:23] <bac> jcsackett: are you around?  can we talk in a bit?
[16:47] <abentley> sinzui: I'm adding the clean-tree anyway to reclaim disk space.
[16:47] <sinzui> Yay
[17:53] <jcsackett> bac: sorry, i missed your message earlier. i can chat now, if you like.
[18:01] <bac> hi jcsackett
[18:01] <bac> now is good
[18:02] <jcsackett> bac: g+?
[18:02] <bac> jcsackett: https://plus.google.com/hangouts/_/72cpjm0fnhduq36l7pim15v0mk?hl=en
[18:22] <jcsackett> sinzui: can you join in on the above g+? ^
[18:29] <jcsackett> sinzui: nm.
[18:40]  * rogpeppe1 is done for the day.
[18:41] <rogpeppe1> g'night all
[19:37] <bac> sinzui: would you have a moment to review my one-line migration script?  i'll get someone else to look at the rest.  https://codereview.appspot.com/21790045
[20:06] <jcsackett> sinzui: can you look at https://code.launchpad.net/~jcsackett/charmworld/rollback-422/+merge/193999 today?
[20:33] <sinzui> bac, jcsackett I can start the reviews now
[20:33] <jcsackett> sinzui: awesome, thanks.
[20:33] <bac> sinzui: please do jc's first
[20:42] <sinzui> jcsackett, r=me
[20:43] <jcsackett> sinzui: thanks.
[20:45] <jcsackett> bac: i'll ping you when i've qa'ed it on staging.
[20:45] <thumper> sinzui: morning
[20:45] <thumper> sinzui: thoughts on 1.16.3?
[20:47] <sinzui> bac: LGTM. Thank you for updating the migration template
[20:47] <bac> cool.  thanks for looking at the migration stuff sinzui.
[20:48] <bac> jcsackett: any problem with me landing my branch now or do you want me to wait?  i can walk the dog now and do it later
[20:48] <sinzui> bac: I am just happy we remember that es-update is automatically run for us
[20:49] <jcsackett> bac: i don't think our branches collide, so it should be fine.
[20:55] <sinzui> thumper, These bugs are fix releases in stable. they are fixed in trunk. I think I can mark these as fix released because everyone has the fix https://bugs.launchpad.net/juju-core/+bugs?search=Search&field.importance=Critical&field.status=New&field.status=Incomplete&field.status=Confirmed&field.status=Triaged&field.status=In+Progress&field.status=Fix+Committed
[20:56] <thumper> sinzui: well, kinda, are we going to roll out 1.17?
[20:56] <thumper> should we make them fix released when we release them?
[20:57] <sinzui> I was thinking of doing it this week, but doing 1.16.3 might exhaust me
[20:57] <thumper> well, bug 1246556 is in 1.16.2
[20:57] <_mup_> Bug #1246556: lxc containers broken with maas <api> <maas-provider> <juju-core:Fix Committed by thumper> <juju-core 1.16:Fix Released by thumper> <juju-core (Ubuntu):Fix Released> <juju-core (Ubuntu Saucy):New> <juju-core (Ubuntu Trusty):Fix Released> <https://launchpad.net/bugs/1246556>
[20:57] <thumper> so I think that should be fix released
[20:57] <thumper> hmm, I see the 1.16 task is released
[20:58] <thumper> personally I'd rather not have them marked fix released until we have a release with the fix
[20:58] <thumper> fix committed is enough to say they are in trunk
[21:00] <sinzui> thumper, I think only your addition last night is unreleased as well as the complicated maas bug
[21:01] <thumper> sinzui: I was concerned that we'd break people in a charm school at ODS
[21:01] <thumper> as the local provider is broken for everyone
[21:01] <thumper> due to an old mistake and a precise update
[21:01] <jcsackett> bac: qa-ok.
[21:02] <jcsackett> bac: how did you cleanup the bad old review jobs last time? my terminal-fu is weak, and we have processes that won't die.
[21:02] <sinzui> thumper, yes, but not every charm is affected. I won't push back on the releases, but each release of stable delays other work. It took days to get 1.16.2 to every place it had to be
[21:03] <thumper> sinzui: hmm, not every charm, but the local provider is broken now
[21:03] <thumper> for everyone except those compiling trunk
[21:04] <thumper> no install hooks complete
[21:04] <thumper> because apt is left in an incomplete state
[21:04] <sinzui> look at https://bugs.launchpad.net/juju-core/+bug/1240709. I think this is fix released for everyone because the juju-core project maintains stable and devel trees.
[21:04] <_mup_> Bug #1240709: local provider fails to start <local-provider> <juju-core:Fix Committed by thumper> <juju-core 1.16:Fix Released by thumper> <juju-core (Ubuntu):Fix Released> <juju-core (Ubuntu Saucy):New> <juju-core (Ubuntu Trusty):Fix Released> <https://launchpad.net/bugs/1240709>
[21:04] <bac> jcsackett: it was in such a sad state i just rebooted the charmworld instance
[21:06] <sinzui> thumper, I typed into the wrong channel 30 minutes ago: thumper I can do it. I was hoping that CI would be up today. but abentley and I have had to rethink the branch + revno tactics used with jenkins
[21:06]  * thumper nods
[21:06] <thumper> Normally I wouldn't be too concerned, but with ODS and charm school...
[21:06] <thumper> could look real bad
[21:07] <sinzui> thumper, remember that the release takes between 8 hours and many days
[21:07] <thumper> :(
[21:07] <sinzui> I don't want to be rushed if it has to be rushed because we don't control the builders and the copy step just adds more hours to the release
[21:09] <sinzui> thumper, If I hand off a package in a few hours, I can hope that I have something to republish to end users in the first hours of my morning
[21:09] <thumper> sinzui: I think we wouldn't need to push new tools
[21:09] <thumper> sinzui: as 1.16.3 should get 1.16.2 tools
[21:10] <thumper> which would work fine
[21:10] <thumper> the only change is in the local provider cloud-init config
[21:10] <sinzui> because juju-core (client and server) are deployed by the client on the same machine?
[21:10] <thumper> yeah
[21:11] <thumper> the local provider always pushes local tools
[21:11] <thumper> it is horrible
[21:11] <thumper> and we should fix it to be nicer
[21:11] <thumper> but it works for now
[21:11] <sinzui> thumper, this is an awkward position to take because we have gone out of our way to ensure the jujuds are published before users get the clients
[21:11] <thumper> :)
[21:11] <sinzui> The number mismatches are scary
[21:12] <thumper> I understand
[21:12] <thumper> was just trying to make things easier
[21:12] <thumper> but I can understand that it isn't working
[21:12] <sinzui> thumper, jamespage makes the stable packages...
[21:14] <sinzui> but I could go crazy and upload my own package to the builders. My packaging does not yet support dpkg alternatives, which would be a nasty regression for IS
[21:14] <sinzui> and I haven't fixed packaging because I personally spent 2.5 days releasing 1.16.2
[21:15] <thumper> wow, why so long?
[21:19] <sinzui> CI does not test stable yet. I do all the testing, send the tarball for building. While it builds I write release notes and steal natefinch's time to get a Windows installer, and set up all the upgrade tests again. When I see all packages are built, I spend an hour assembling the tools and publishing them. Then a few hours testing that everything works still. Then I do the release announcements, then I work on Windows and Mac
[21:19] <sinzui> distribution with upstreams
[21:20] <sinzui> If HP loses authentication like last time, I can spend 90 minutes flailing to get jujuds into the cloud
[21:23] <sinzui> thumper, how many hours until charm school?
[21:24] <thumper> sinzui: no idea, best asking jcastro or marcoceppi
[21:24] <thumper> I don't know that there is one
[21:24]  * sinzui is calculating risk and time
[21:24] <thumper> but they normally do something
[21:28] <wallyworld> fwereade: hey, any update on bug 1233457?
[21:28] <_mup_> Bug #1233457: service with no units stuck in lifecycle dying  <cts-cloud-review> <destroy-service> <juju-core:Triaged> <https://launchpad.net/bugs/1233457>
[21:45] <thumper> sinzui: news isn't good, cts is doing the charm school
[21:45] <sinzui> when?
[21:46] <thumper> don't know
[21:56] <sinzui> thumper, I have a release plan nonetheless https://docs.google.com/a/canonical.com/document/d/1J0xf_G1ZRU5timhVBnPsrDbQmZmW3iuw02ReCpO9cMk/edit#
[21:58] <thumper> sinzui: weird
[21:58] <thumper> sinzui: I paste that into the browser and get nothing
[21:59] <thumper> not an error, just nothing
[22:00] <sinzui> thumper, I think you are not logged in as your canonical id
[22:00] <abentley> thumper: I could see it.
[22:00] <sinzui> thumper, I can go through the steps to the tarball phase. At that point I could return to fixing the devel packaging rules. When the rules are compatible, I can safely place packages into any archive. Or we abandon the recipe... just ignore that that is how we build test packages and fall back to tarball + packaging + bzr
[22:01] <thumper> sinzui: do you have 1.16 installed locally?
[22:01] <thumper> abentley: do you have 1.16 and not trunk?
[22:01] <sinzui> If I have the right rules, I am unblocked from loading the package to any archive
[22:02] <sinzui> thumper, I do, the current packaging rules do not support dpkg switch...IS cannot use pyjuju in production
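[The missing piece sinzui describes is dpkg alternatives, which would let juju-core and pyjuju coexist on one machine. A sketch of the registration the packaging would need, only building the command rather than running it; the link and binary paths here are assumptions, not the real package layout:]

```python
# Sketch: build the update-alternatives invocation a package's postinst would
# run so /usr/bin/juju can switch between implementations.
# The paths below are hypothetical examples, not the actual package layout.
def alternatives_cmd(link, name, path, priority):
    return ["update-alternatives", "--install", link, name, path, str(priority)]

# e.g. alternatives_cmd("/usr/bin/juju", "juju",
#                       "/usr/lib/juju-core/bin/juju", 20)
```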
[22:02] <abentley> thumper: I have 1.16.0 on my local machine.
[22:02] <thumper> abentley: 1.16.0 or 1.16.2?
[22:02] <abentley> thumper: It claims to be 1.16.0.
[22:02] <thumper> hmm...
[22:02] <thumper> abentley: does it start the local provider?
[22:03] <abentley> thumper: At the sprint, it worked sometimes.
[22:03] <sinzui> oh, thumper , abentley you might be referring to my quickly put together notes. stable is 1.16.2 now. that is what we test
[22:04] <thumper> funny, but my system one says 1.16.0 too
[22:05] <sinzui> thumper, depending on your mirrors and update checks you can be a few weeks behind
[22:06]  * thumper nods
[22:06] <thumper> sinzui: my ppa's were probably disabled when I moved to saucy
[22:06] <thumper> and I've not enabled them
[22:06] <thumper> yet
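[The 1.16.0-vs-1.16.2 confusion above comes down to mirror lag. A naive dotted-version comparison illustrating the check; the function names are illustrative and this ignores juju's full version grammar (build numbers, tags):]

```python
# Sketch: compare dotted release versions numerically, so "1.16.0" < "1.16.2"
# even though string comparison would also happen to agree here.
def version_tuple(v):
    return tuple(int(p) for p in v.split("."))

def needs_upgrade(installed, released):
    return version_tuple(installed) < version_tuple(released)

# needs_upgrade("1.16.0", "1.16.2") → True
```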
[22:06] <sinzui> thumper, The only way for me to get the 1.16.3 to complete the release is to install the deb I pulled locally, or bleed. I am running trusty (bleed)
[22:07] <thumper> :)
[22:07] <thumper> I'm not that trusty of trusty
[22:07] <sinzui> I still think of her as tarty
[22:08] <thumper> sinzui: what is the doc called? going through the listing
[22:08] <thumper> link still doesn't work, and I am logged in with the right creds
[22:08] <fwereade> wallyworld, heyhey
[22:08] <sinzui> thumper, 1.16.3 Release Log
[22:08] <wallyworld> hi
[22:09] <wallyworld> fwereade: just thought i'd check in in case i could help at all
[22:09] <thumper> sinzui: can't find it
[22:10] <thumper> sinzui: can I get you to check the sharing on it?
[22:10] <sinzui> thumper, try again
[22:11] <sinzui> I think I have ensured the entire folder is searchable
[22:12] <thumper> sinzui: nope...
[22:12] <thumper> what is the folder?
[22:13] <sinzui> thumper, "Juju QA" and I just shared the doc directly with you.
[22:13] <thumper> restarting chromium seemed to help
[22:13] <wallyworld> fwereade: also, save me some search time - is there currently a way to get an environs.Environ instance using an apiclient made from a call to NewAPIClientFromName? I can't see a way
[22:14]  * sinzui starts blessing and cursing
[22:14] <thumper> sinzui: expose on the local provider does precisely nothing
[22:15] <thumper> sinzui: also, why test installing apache not mysql?
[22:22] <sinzui> thumper, That was copied from the 1.16.3 release. I am going to do the mysql + wordpress stack. The apache charm was an example of a charm not affected by the apparmor/cgroups issue
[22:25] <thumper> sinzui: wow, that is quite a list
[22:25] <sinzui> This is half the size of the 1.16.2 that has canonistack, hp, azure, and ec2
[22:26] <davechen1y> linkage
[22:26] <sinzui> If I die, someone can check off the boxes I didn't get to
[22:26] <davechen1y> sorry, i missed the discussion
[22:32] <abentley> thumper, sinzui: I just tested 1.16r1982, and mysql was unhappy: http://162.213.35.28/job/test-no-upgrade-stable/3/console
[22:33]  * sinzui looks
[22:35] <thumper> abentley: this isn't the local provider
[22:35] <abentley> thumper: Yeah, I don't know whether it was caused by the local provider or just happened on the local provider.
[22:36] <thumper> abentley: looked like it was happening on the hp test
[22:36] <thumper> isn't it?
[22:36] <sinzui> yep
[22:37] <sinzui> that is the very risk I am taking with my plan. The change only affected the local-provider
[22:37] <abentley> thumper: No, we set them up in parallel to reduce lag.
[22:37] <abentley> thumper: So exposing on test-release-hp is the last thing we did before we turned our attention back to the local provider.
[22:38] <thumper> abentley: do you have access to logs?
[22:38] <thumper> I don't entirely believe this
[22:38] <abentley> sure, what do you want?
[22:39] <thumper> /var/lib/juju/containers/*/console.log
[22:39] <thumper> from the machine running the local provider
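[Collecting the container console logs thumper asks for can be scripted. A minimal sketch; the `/var/lib/juju/containers/*/console.log` pattern comes from the conversation, and the function name and `root` parameter are illustrative:]

```python
# Sketch: gather console.log files from every local-provider container
# directory, e.g. /var/lib/juju/containers/local-machine-2/console.log.
import glob
import os

def container_console_logs(root="/var/lib/juju/containers"):
    return sorted(glob.glob(os.path.join(root, "*", "console.log")))
```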
[22:39] <sinzui> abentley, thumper I commonly see start-errors with local. I did *not* get errors starting mysql+wordpress with 1.16.2 just now. I am set up for an upgrade test
[22:40] <thumper> sinzui: really? WTF?
[22:40] <thumper> I guess it is good...
[22:40] <thumper> but kinda shit
[22:40] <thumper> sinzui: doing mysql+wordpress with the local provider?
[22:40] <sinzui> I have noted in the past that mysql seems to work between the hours of UTC 0 and 6
[22:40] <sinzui> yep
[22:41] <sinzui> thumper, remember when you helped me last week with local provider... that is the thing I deployed 3 times to convince myself all was good...
[22:41] <davechen1y> 09:40 < sinzui> I have noted in the past that mysql seems to work between the hours of UTC 0 and 6
[22:41] <davechen1y> o_O!
[22:41] <sinzui> then the next morning mysql told me to go fuck myself
[22:42] <davechen1y> sinzui: are you running through squid-apt-proxy ?
[22:42] <thumper> sinzui: it worked last week, but I didn't expect it to work today...
[22:42] <sinzui> davechen1y, not in this test
[22:42] <davechen1y> if not, maybe that would make things more reliable^h^h^h^h^h reproducible
[22:43] <sinzui> I have a package. I will do the upgrade
[22:45] <abentley> thumper: mailed to you.
[22:45] <thumper> ta
[22:45] <sinzui> thumper, abentley local 1.16.2 to 1.16.3 worked.
[22:48] <thumper> abentley: the error seems to be unrelated to apt or lxc, but a networking glitch
[22:48] <thumper> near the end of the first machine log file
[22:50] <abentley> thumper: it appears that mysql is on the second machine.
[22:50] <abentley> thumper: And that the problem is with mysql itself not coming up.
[22:50] <thumper> abentley: unit log file for it?
[22:52] <abentley> thumper: something different from local-machine-2/console.log?
[22:52] <thumper> abentley: yeah...
[22:52] <thumper> is the environment still "live"?
[22:52] <abentley> thumper: No, it's down.
[22:53] <thumper> bugger
[22:53] <thumper> that means the internal log files are gone too
[22:53] <sinzui> 1.16.3 deployment of mysql+wordpress == PASS
[22:54] <abentley> thumper: It might just be ENOMEM.
[22:56] <sinzui> tarball is building
[23:24] <sinzui> thumper, I have a tarball that I am willing to (and have) signed. I could send this to be packaged. I would prefer to fix my packaging branch. If I cannot solve its dep problems by my bedtime, I can send off the tarball.
[23:42]  * wallyworld off to accountant, bbiab
[23:49] <sinzui> thumper, do you have any insight into why this upgrade did not complete? http://pastebin.com/QuLFM3hq
[23:50] <sinzui> oops, thumper, I think this has the log that shows a failed upgrade http://pastebin.com/3dTCS9W1