[00:41] <redir> so I am back to: value of (*params.Error) is nil, but a typed nil
[00:41] <redir> :/
[00:49] <redir> nm
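The "typed nil" problem redir hit above is a classic Go trap: storing a nil `*T` in an interface produces an interface value that does not compare equal to nil. A minimal sketch (the `opError` type and `lookupError` helper are illustrative, not juju code):

```go
package main

import "fmt"

type opError struct{ msg string }

func (e *opError) Error() string { return e.msg }

// lookupError mimics a params field of pointer type that happens to be nil.
func lookupError() *opError { return nil }

// describe shows the gotcha: assigning a typed nil pointer to an error
// interface yields an interface that is NOT nil, because the interface
// carries a non-nil type descriptor alongside the nil pointer.
func describe() string {
	var err error = lookupError() // err now holds (*opError)(nil)
	if err != nil {
		return "non-nil interface holding a typed nil"
	}
	return "nil interface"
}

func main() {
	fmt.Println(describe())
}
```

The usual fix is to check the pointer before assigning it to the interface, or to compare against the concrete typed nil.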
[01:15] <axw> wallyworld: I'm around now, let me know when you want to chat (can wait till 1:1 if you like)
[01:15] <wallyworld> axw: am just typing in PR, will push in a sec
[01:15] <menn0> thumper: tools migration is going well so far. here's one change - several more on their way: http://reviews.vapour.ws/r/5033/
[01:22] <wallyworld> axw: i have not reviewed or tested live yet http://reviews.vapour.ws/r/5034/ we can chat soon
[01:23] <wallyworld> omfg those internal networking tests are a waste of time and a bitch to fix
[01:24] <wallyworld> damn, am still missing one apiserver test too
[01:34] <menn0> thumper: thanks
[02:33] <redir> off to do dinner. bbiab
[02:39] <natefinch> thumper: why is our default log level <root>=WARNING;unit=DEBUG ?
[02:40] <thumper> because it is how we see what the units are doing
[02:40] <thumper> unit logging is the output from hooks
[02:40] <thumper> and always useful
[02:40] <thumper> but you can explicitly turn it off
[02:40] <natefinch> ....then why is it at debug?
[02:41] <natefinch> also, I thought it was juju.unit ?  or is unit special?
[02:42] <natefinch> also, why don't we show info by default?  defaulting to warning means we drop a ton of useful context on the floor, and make debugging production systems really difficult
[02:46] <thumper> wallyworld: if you ignore all the hook failures... http://pastebin.ubuntu.com/17163593/
[02:46] <thumper> natefinch: unit is special
[02:46] <thumper> natefinch: we should probably change to default to INFO
[02:47] <thumper> I have no real good reason why
[02:47] <wallyworld> thumper: nice, were you going to split the charm url also?
[02:47] <thumper> not in this branch
[02:47] <wallyworld> lots of people want warning
[02:47] <wallyworld> info is too verbose for them
[02:47] <natefinch> wallyworld:  they're welcome to set it to warning, but I think Info is a more reasonable default
[02:47] <wallyworld> depends who the audience is
[02:47] <wallyworld> do we cater for developers or devops people
[02:48] <wallyworld> or
[02:48] <natefinch> wallyworld: not really.  We limit the amount of logs we store
[02:48] <natefinch> wallyworld: and they can turn it down to warning if they want
[02:48] <wallyworld> and we can turn it up if we want
[02:48] <wallyworld> devops people i have met do not want lots of verbose logging
[02:48] <natefinch> it's not verbose.  It's specifically not.
[02:48] <wallyworld> but i have not talked to lots and lots of them
[02:49] <natefinch> it's not debug... except unit, evidently :/
[02:49] <thumper> but info is noise
[02:49] <wallyworld> verbose is subjective
[02:49] <wallyworld> yes it is noise
[02:49] <wallyworld> they just want warnings
[02:49] <natefinch> I have tried working with logs set to warning and it's basically unusable
[02:49] <wallyworld> they just want to know when things go wrong
[02:49] <natefinch> you can't tell WTF is going on
[02:49] <thumper> natefinch: for us, yes
[02:49] <wallyworld> unusable for you as a dev
[02:49] <wallyworld> not unusable for a devops person
[02:49] <natefinch> usable for anyone who wants to support the server and figure out what is wrong
[02:50] <wallyworld> and that's the friction that always happens in these cases
[02:50] <natefinch> I don't believe the devops people choosing warning know what they're talking about.
[02:51] <thumper> wallyworld: http://reviews.vapour.ws/r/5035/
[02:51] <wallyworld> you forgot the IMHO
[02:51] <wallyworld> looking after i finish current queue
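The config string debated above, `<root>=WARNING;unit=DEBUG`, is a loggo-style list of module=level pairs, where a module falls back to its dotted-path ancestors and finally `<root>`. A simplified sketch of that resolution (not juju's actual loggo implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// parseLogConfig parses a loggo-style config string such as
// "<root>=WARNING;unit=DEBUG" into a module->level map.
func parseLogConfig(spec string) map[string]string {
	levels := make(map[string]string)
	for _, entry := range strings.Split(spec, ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			continue
		}
		levels[strings.TrimSpace(parts[0])] = strings.TrimSpace(parts[1])
	}
	return levels
}

// effectiveLevel walks up the dotted module path (e.g. "juju.worker.x")
// until it finds a configured level, falling back to <root>.
func effectiveLevel(levels map[string]string, module string) string {
	for m := module; m != ""; {
		if lvl, ok := levels[m]; ok {
			return lvl
		}
		if i := strings.LastIndex(m, "."); i >= 0 {
			m = m[:i]
		} else {
			m = ""
		}
	}
	return levels["<root>"]
}

func main() {
	levels := parseLogConfig("<root>=WARNING;unit=DEBUG")
	fmt.Println(effectiveLevel(levels, "unit"))        // DEBUG
	fmt.Println(effectiveLevel(levels, "juju.worker")) // WARNING
}
```

This is why `unit` output stays at DEBUG while everything else is filtered to WARNING, which is the asymmetry natefinch is questioning.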
[02:51] <axw> wallyworld: in all cases, APIPort use by the providers is only used in StartInstance. how about we just add it to StartInstanceParams for now?
[02:51] <wallyworld> hmmm, that would work i think
[02:51] <natefinch> .... some of them do.  The people (mostly internal to canonical) who have used juju a lot, sure.
[02:52] <axw> wallyworld: we could do the same for controller-uuid, and then add another method to Environ to destroy all hosted models/resources
[02:52] <axw> (passing in the controller UUID to that)
[02:52] <wallyworld> axw: for now, i can just add controller uuid to setconfig params
[02:52] <wallyworld> and do that next bit later
[02:53] <axw> wallyworld: yep doesn't have to be in one hit, but I think that's how we can make it a bit cleaner
[02:53] <wallyworld> +1
[02:53] <wallyworld> one step at a time
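axw's suggestion above, threading APIPort through StartInstanceParams instead of exposing it on the whole environ config, might look like this sketch (the struct and field names are illustrative stand-ins, not the real juju types):

```go
package main

import "fmt"

// StartInstanceParams sketches the idea discussed above: pass APIPort
// only to the one call site that needs it, rather than widening the
// Environ config. Field names here are hypothetical.
type StartInstanceParams struct {
	InstanceType string
	// APIPort is the controller API port the new instance must reach.
	APIPort int
}

func startInstance(p StartInstanceParams) string {
	return fmt.Sprintf("starting %s instance, api port %d", p.InstanceType, p.APIPort)
}

func main() {
	fmt.Println(startInstance(StartInstanceParams{InstanceType: "m3.medium", APIPort: 17070}))
}
```

The same pattern would apply to controller-uuid, as discussed in the following lines.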
[03:09] <menn0> thumper: well that's gone a bit better than charms. tools migration worked first time once all the required infrastructure was in place.
[03:09] <thumper> menn0: awesome
[03:12] <thumper> wallyworld: I'm looking at breaking out the charm now rather than my normal friday afternoon thing
[03:12] <wallyworld> rightio, almost starting a review
[03:22] <wallyworld> axw: were we going to put region in controllers.yaml?
[03:24] <axw> wallyworld: I already did
[03:24] <axw> wallyworld: maybe we want to remove region from there? and just have it on the model?
[03:25] <wallyworld> yep
[03:25] <axw> cloud on controller, region on model
[03:25] <wallyworld> yep
[03:26] <axw> wallyworld: I added some comments to the diff
[03:26] <axw> er review comments to your diff
[03:26] <wallyworld> ty, looking
[03:27] <wallyworld> axw: what's wrong with embedding that interface?
[03:27] <axw> it's not what the interface is meant to be doing ...
[03:27] <axw> wallyworld: its purpose is to get you a state.Model
[03:27] <axw> not to get a model and model config and controller config
[03:27] <wallyworld> sure, but i'm extending its behaviour
[03:28] <axw> wallyworld: which defines its purpose
[03:28] <wallyworld> an interface can do whatever methods you decide to put on it
[03:28] <wallyworld> i should change its name i guess
[03:28] <wallyworld> the Environ name i think was from the old days when model was environ
[03:29] <axw> wallyworld: no, I don't think you should change the name. the checkToolsAvailability function isn't even using the existing method on EnvironGetter AFAICS
[03:30] <axw> wallyworld: separate responsibilities -> separate interfaces
[03:30] <wallyworld> axw: it does because it passes it to GetEnviron
[03:30] <axw> wallyworld: which expects a ConfigGetter, no?
[03:31] <wallyworld> yes, or an interface that embeds that
[03:31] <axw> wallyworld: so why would you wrap X in Y, only to pass X through to some other thing? that is pointless
[03:32] <axw> and makes it unclear what the function really needs
[03:32] <axw> it doesn't need the Model() method, it only needs the ConfigGetter part
[03:32] <wallyworld> it means we pass in one param whose behaviour we use in the method body in various places. i can do a separate param if you want
[03:33] <wallyworld> eg we pass in StateInterface in places and don't always use every method
[03:33] <axw> wallyworld: yeah, that's a smell. we do that so we don't have to pass around a *state.State, which we used to
[03:34] <wallyworld> but in this case, the method being called directly does use every method on the interface
[03:34] <axw> less smelly, but still a smell
[03:34] <axw> wallyworld: checkToolsAvailability doesn't. updateToolsAvailability does
[03:35] <axw> updateToolsAvailability should take two things: an interface for getting the current config (ConfigGetter), and an interface for updating the model (EnvironGetter)
[03:35] <axw> checkToolsAvailability only needs a ConfigGetter
[03:35] <wallyworld> ah, damn, i may have been dyslexic
[03:35] <wallyworld> i think i was confusing two method names as the same thing
[03:35] <wallyworld> ffs
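axw's "separate responsibilities -> separate interfaces" point above is the standard Go idiom: each function declares only the narrow interface it actually uses. A sketch with illustrative names (ConfigGetter/ModelGetter and the two functions are stand-ins, not the real apiserver code):

```go
package main

import "fmt"

// One small interface per responsibility.
type ConfigGetter interface {
	Config() map[string]string
}

type ModelGetter interface {
	ModelName() string
}

// checkToolsAvailability needs only config, so it asks only for a
// ConfigGetter rather than a wider composite interface.
func checkToolsAvailability(g ConfigGetter) string {
	return "agent-version=" + g.Config()["agent-version"]
}

// updateToolsAvailability needs both responsibilities, so it takes both
// interfaces explicitly, making its real dependencies visible.
func updateToolsAvailability(c ConfigGetter, m ModelGetter) string {
	return m.ModelName() + ": " + checkToolsAvailability(c)
}

// fakeState satisfies both interfaces, as a *state.State wrapper would.
type fakeState struct{}

func (fakeState) Config() map[string]string {
	return map[string]string{"agent-version": "2.0-beta9"}
}
func (fakeState) ModelName() string { return "controller" }

func main() {
	fmt.Println(updateToolsAvailability(fakeState{}, fakeState{}))
}
```

A single concrete value can still satisfy both interfaces; the gain is that each function's signature documents exactly what it needs.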
[03:43] <axw> wallyworld: am I making this login thing a critical/blocker to land?
[03:43] <wallyworld> sure
[03:46] <axw> thumper, wallyworld: do we really want to repeat the cloud name for each model? they are always going to be the same
[03:46] <axw> (in status)
[03:47] <wallyworld> i had read that as cloud region
[03:47] <thumper> axw: that's what was asked for
[03:47] <wallyworld> damn, dyslexic again
[03:47] <thumper> and it isn't always the same
[03:47] <thumper> if I have different models, they won't necessarily be in the same controller or cloud
[03:47] <thumper> hmm...
[03:47] <wallyworld> true, for the aggregated case
[03:48] <axw> thumper: we're going to show models for multiple controllers?
[03:48] <axw> I don't think so...
[03:48] <thumper> um...
[03:48] <axw> thumper: OTOH it would be useful to see at a glance from a snapshot of status which cloud
[03:48] <thumper> perhaps I'm no longer clear what you are talking about
[03:48] <axw> thumper: if I run "juju status", I'm seeing all the models for one controller
[03:49] <thumper> um...
[03:49] <axw> thumper: ah hm never mind
[03:49] <thumper> if you run juju status, you only see one model
[03:49] <axw> thumper: yep, forget me. that makes sense
[03:52] <wallyworld> axw: one of your comments is blank so the ditto beneath it makes no sense
[03:53] <axw> wallyworld: ignore ditto sorry. I (tried to) delete a comment after I answered my own question
[03:54] <wallyworld> ok
[04:29] <wallyworld> axw: i've left two issues open but answered the questions....
[04:30] <menn0> thumper: tools migration done: http://reviews.vapour.ws/r/5036/
[04:31] <axw> wallyworld: "no, different models will want to use their own logging levels on the agents" -- the controller agent(s) manage multiple models
[04:33] <wallyworld> axw: so a machine agent on a worker for model 1 will want to log differently to an agent for model 2
[04:33] <wallyworld> model 1 and model 2 should have their own logging-config right?
[04:33] <axw> wallyworld: I'm talking about the controller
[04:33] <axw> wallyworld: they are the same agent
[04:33] <wallyworld> sure, but not on worker nodes
[04:33] <axw> fair point about other workers tho
[04:33] <natefinch> if anyone's feeling ambitious, this is a mostly mechanical change, to drop lxc support and use lxd in its place: http://reviews.vapour.ws/r/5027/
[04:34] <axw> wallyworld: I guess we shouldn't constrain it to how it works today anyway. it would be nice if it weren't global. we could have each worker in the controller take a logger with levels configured for the model
[04:34] <axw> wallyworld: so I'll drop
[04:34] <wallyworld> natefinch: any progress on the --to lxd issue?
[04:35] <wallyworld> you have a +1 from eric right?
[04:35] <natefinch> wallyworld: I do have a +1 from eric, yes.... do we need 2 +1's now?
[04:35] <davecheney> func (fw *Firewaller) flushUnits(unitds []*unitData) error {
[04:35] <davecheney>   // flushUnits opens and closes ports for the passed unit data.
[04:35] <wallyworld> not if i have anything to do with it - except for when the reviewer feels like they need a second opinion
[04:35] <davecheney> worst, name. ever
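The snippet davecheney is quoting puts the function's doc comment inside the body; idiomatic Go puts it immediately above the declaration so godoc picks it up. A sketch with stand-in types (the real `Firewaller` and `unitData` live in juju's firewaller worker):

```go
package main

import "fmt"

type unitData struct{ name string }

type Firewaller struct{}

// flushUnits opens and closes ports for the passed unit data.
// (Doc comment above the declaration, where tooling expects it.)
func (fw *Firewaller) flushUnits(unitds []*unitData) error {
	for _, u := range unitds {
		fmt.Println("flushing ports for", u.name)
	}
	return nil
}

func main() {
	fw := &Firewaller{}
	_ = fw.flushUnits([]*unitData{{name: "wordpress/0"}})
}
```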
[04:36] <natefinch> wallyworld: also, no, I don't have an idea about the lxd thing... my guess is that it's a switch statement that we forgot to add lxd to
[04:37] <wallyworld> natefinch: so we can land this and then fix the other issue before release
[04:38] <axw> wallyworld: going for lunch then fixing car, will finish review later
[04:39] <wallyworld> axw: np, ty
[04:39] <wallyworld> i'll start on the next bits
[04:39] <natefinch> wallyworld: master is blocked, and this doesn't have a bug, AFAIK
[04:40] <wallyworld> natefinch: either jfdi or create a bug - i have been jfdi
[04:40] <wallyworld> we need this work for release
[04:40] <wallyworld> natefinch: ah but wait
[04:41] <wallyworld> we can't land until deploy --to is fixed
[04:41] <wallyworld> because it will break QA
[04:41] <wallyworld> doh
[04:41] <natefinch> right, ok. I'll do that first
[04:41] <wallyworld> ty
[04:46] <natefinch> gotta catch up on sleep, will figure it out in the morning.  Seems like it's probably something pretty dumb.
[05:18] <redir> wallyworld: axw whomever pr is in http://reviews.vapour.ws/r/5037/
[05:18] <redir> Be back in the local AM.
[05:18] <wallyworld> ty
[05:30] <davechen1y> https://github.com/juju/juju/pull/5594
[05:30] <davechen1y> ^ anyone experienced with the firewaller, this is a small fix as a prereq for 1590161
[06:07] <axw> wallyworld: reviewed
[06:33] <wallyworld> axw: ty
[07:01] <axw> wallyworld: you did a half change in your previous PR, you called the doc "defaultModelSettings" but the method is called "CloudConfig" still. shall I change it to DefaultModelConfig? sounds a bit off -- like it's config for a default model. maybe ModelConfigDefaults?
[07:02] <wallyworld> axw: yeah, sounds good ty
[07:37] <frobware> dimitern: ping
[08:05] <dimitern> frobware: pong
[08:05] <frobware> dimitern: was just about the resolv.conf issue
[08:05] <dimitern> frobware: I was looking at those bugs
[08:05] <dimitern> frobware: trying to reproduce now with lxd on 1.9.3
[08:09] <frobware> dimitern: I can help out in a bit - just trying to stash some stuff in a meaningful state.
[08:09] <dimitern> frobware: ok
[08:27] <dimitern> frobware: no luck reproducing this so far :/
[08:27] <frobware> dimitern: sounds like the whole of my yesterday :/
[08:27] <dimitern> frobware: (that is, if the lxds even come up ok)
[08:27] <frobware> dimitern: oh?
[08:28] <dimitern> frobware: I noticed on machine-0 there was an issue and all 3 lxds came up with 10.0.0.x addresses
[08:28] <frobware> dimitern: heh, that caught me out this morning. they are on the LXD bridge.
[08:29] <frobware> dimitern: when we probe for an unused subnet, that's pretty much the default address you'll get as there's not much else, network-wise, running
[08:29] <dimitern> frobware: yeah, the issue is due to a race between setting the observed machine config with the created bridges and containers starting to deploy and trying to bridge their nics to yet-to-be-created host bridges
[08:29]  * frobware notes that his git stash list has grown to a depth of 32...
[08:32] <frobware> dimitern: explain that one to me in standup :)
[08:32] <dimitern> frobware: otoh, if the bridges are created ok, lxds come up as expected with all NICs, and /e/resolv.conf has both nameserver and search (i.e. ping node-5 and ping node-5.maas-19 both work)
[08:33] <dimitern> frobware: sure :)
[09:02] <frobware> dimitern: standup
[09:07] <fwereade> voidspace, http://reviews.vapour.ws/r/5029/
[09:48] <frobware> dimitern: regarding resolv.conf. we did a change way back to copy the /etc/resolv.conf from the host. is it possible that it is triggering that path but the host has no valid entry (not for you, but the bug reporter)
[09:50] <dimitern> frobware: it's very much guaranteed that container's resolv.conf will be broken if their host's resolv.conf is also broken
[09:51] <dimitern> frobware: btw commented on that bug for '--to lxd'
[09:51] <dimitern> mgz: hey
[09:52] <dimitern> mgz: are there any places in the CI tests which do the equivalent of 'juju deploy xyz --to lxd' ?
[09:53] <dimitern> mgz: if there are any, it should be because there is a machine with hostname 'lxd' that's the intended target
[10:13] <babbageclunk> dimitern: is it actually ambiguous? Can you use a maas-level machine name there instead of a juju-level machine number?
[10:13] <dimitern> babbageclunk: of course you can
[10:13] <dimitern> babbageclunk: unless your node happens to be called 'lxd'
[10:14] <babbageclunk> dimitern: ok, just thought I'd check.
[10:15] <dimitern> babbageclunk: actually... hmm - maybe only on maas I guess?
[10:17] <dimitern> babbageclunk: placement is supposed to work with existing machines (including containers), or new containers on existing machines
[10:18] <babbageclunk> dimitern: So is the bug really that --to lxd (or lxc or kvm) should be an error?
[10:18] <dimitern> babbageclunk: it even supports a list when num-units > 1: `juju deploy ubuntu -n 3 --to 0,0/lxd/1,lxd:1`
[10:19] <dimitern> babbageclunk: placement for deploy and add-machine/bootstrap is handled slightly differently
[10:19] <dimitern> babbageclunk: for the latter you *can* use 'add-machine ... --to lxd' or 'bootstrap --to node-x' (on maas)
[10:20] <babbageclunk> dimitern: yeah, I was getting confused between them - I've interacted with add-machine and bootstrap more.
[10:20] <dimitern> babbageclunk: that's an inconsistency though
[10:21] <dimitern> babbageclunk: add-machine can do more than that - e.g. add-machine ssh:10.20.30.2
[10:21] <dimitern> babbageclunk: bootstrap --to lxd at least fails with `error: unsupported bootstrap placement directive "lxd"`
[10:23] <dimitern> babbageclunk: so it looks like a maas provider issue - it implements PrecheckInstance (called by state at AddMachine time), but apparently not very well
[10:24] <babbageclunk> dimitern: Ok, that seems easy enough to fix.
[10:24] <dimitern> babbageclunk: tell-tale comment on line 566 in provider/maas/environ.go: `// If there's no '=' delimiter, assume it's a node name.`
[10:25] <dimitern> but doesn't bother to validate it
[10:25] <dimitern> fwereade: hey
[10:26] <dimitern> fwereade: I think we don't have a clear separation between deploy-time placement and provision-time placement (i.e. deploy --to X vs add-machine X)
[10:28] <dimitern> fwereade: I might be wrong, but I think 'deploy ubuntu --to lxd' was never intended to work, unlike '--to lxd:2', '--to 0', or '--to 0/lxd/0'
[10:38] <dimitern> frobware: how about if we pass a list of interfaces to bridge explicitly to the script?
[10:38] <frobware> dimitern: sure; can we HO anyway as I have discovered some issues with lxd on aws
[10:39] <dimitern> frobware: I was just about to have a quick bite - top of the hour?
[10:39] <frobware> dimitern: or later if you want more time; that's only 20 mins
[10:39] <frobware> dimitern: let's say ~1 hour and I'll go and eat too
[10:40] <babbageclunk> frobware: I'm trying to add a machine to understand deploying to lxd better, but when I do add-machine it never goes from Deploying to Deployed in MAAS.
[10:40] <dimitern> frobware: ok, sgtm
[10:41] <frobware> babbageclunk: for that I think you'll have to dig into the MAAS logs.
[10:41] <frobware> babbageclunk: oh, 2.0?
[10:41] <dimitern> babbageclunk: trusty?
[10:42] <babbageclunk> frobware: 2.0, xenial
[10:42] <dimitern> babbageclunk: you run 'add-machine lxd' ?
[10:42] <babbageclunk> frobware: just the machine, first - haven't gotten to deploy anything into a container.
[10:42] <frobware> babbageclunk: I don't use 2.0 very much, if at all. Most of the bugs I'm looking at explicitly reference 1.9.x
[10:43] <babbageclunk> frobware: no, add-machine --series=xenial
[10:44] <babbageclunk> frobware: Any idea how I can get onto the machine? I think it's the network that's not coming up.
[10:44] <frobware> babbageclunk: you can get to and see the console?
[10:44] <babbageclunk> frobware: yeah, but I don't know login details.
[10:44] <dimitern> babbageclunk: use vmm ?
[10:45] <dimitern> if it's a kvm on your machine..
[10:45] <babbageclunk> dimitern: what username/password though?
[10:45] <frobware> babbageclunk: apply this http://pastebin.ubuntu.com/17167820/
[10:46] <frobware> babbageclunk: (cd juju/provider/maas; make)
[10:46] <dimitern> babbageclunk: none will work; the user is 'ubuntu' but password auth is disabled
[10:46] <frobware> babbageclunk: then build juju
[10:46] <frobware> babbageclunk: then either start-over or run upgrade-juju and add another machine
[10:47] <babbageclunk> frobware: hmm, I might try removing all of the vlans from the node first.
[10:47] <frobware> babbageclunk: that's ^^ a useful exercise as it does allow you to login when we bork networking
[10:47] <babbageclunk> frobware: ok, will try it.
[11:05] <fwereade> dimitern, hey, sorry
[11:06] <dimitern> frobware: what's up?
[11:06] <fwereade> dimitern, in my understanding `--to lxd` means "hand over deployment to the notional lxd compute provider that spans the capable machines in your model"
[11:07] <fwereade> dimitern, "I want it in a container, don't bother me with the details"
[11:07] <dimitern> oops, sorry frobware
[11:08] <dimitern> fwereade: well, why do we have container=lxd as a constraint then?
[11:08] <fwereade> dimitern, hysterical raisins
[11:10] <dimitern> fwereade: so 'juju deploy ubuntu --to lxd' is supposed to work exactly like 'juju add-machine lxd && juju deploy ubuntu --to X', where X is the 'created machine X' add-machine reports
[11:16] <fwereade> dimitern, yes
[11:52] <frobware> dimitern: hey, I kept working... can we sync after I have some lunch. :)
[11:53] <dimitern> frobware: sure :)
[12:48] <voidspace> dimitern: ping
[12:49] <voidspace> dimitern: a quick sanity check. Every LinkLayerDevice should have a corresponding refs doc with a ref that defaults to 0. If non-zero the references are the number of devices that have this device as a parent (set in ParentName)?
[12:50] <voidspace> dimitern: so a quick scan of the linklayerdevices counting parent references should enable me to reproduce it without having to directly migrate it.
[13:02] <dimitern> voidspace: sorry, just got back
[13:02] <dimitern> voidspace: yes, I think that's correct
[13:03] <dimitern> voidspace: ah, well 'quick scan' could work but only if nothing else can add or remove stuff from the db while you do it
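voidspace's "quick scan" above, recomputing each device's reference count from the ParentName fields, could be sketched like this (a pared-down stand-in for the state docs, and, per dimitern's caveat, real code would need to run while nothing else mutates the collection):

```go
package main

import "fmt"

// linkLayerDevice is a simplified stand-in for the state doc discussed
// above: each device may name a parent device.
type linkLayerDevice struct {
	Name       string
	ParentName string // empty if the device has no parent
}

// recomputeRefs rebuilds the parent reference counts by scanning all
// devices: every device gets a refs entry defaulting to 0, and each
// child increments its parent's count.
func recomputeRefs(devices []linkLayerDevice) map[string]int {
	refs := make(map[string]int)
	for _, d := range devices {
		if _, ok := refs[d.Name]; !ok {
			refs[d.Name] = 0
		}
		if d.ParentName != "" {
			refs[d.ParentName]++
		}
	}
	return refs
}

func main() {
	devs := []linkLayerDevice{
		{Name: "eth0"},
		{Name: "br-eth0"},
		{Name: "eth0.100", ParentName: "eth0"},
	}
	fmt.Println(recomputeRefs(devs))
}
```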
[13:15] <babbageclunk> frobware: I tried your patch after trying a few other things, but it seems like passwd -d ubuntu just makes it so that ubuntu can't login through the terminal.
[13:15] <babbageclunk> frobware: trying it with chpasswd instead.
[13:16] <frobware> babbageclunk: I use that all the time
[13:17] <babbageclunk> frobware: hmm. Definitely didn't let me log in.
[13:17] <babbageclunk> frobware: maybe it's hanging before the bridgescript runs?
[13:17] <dimitern> frobware, babbageclunk: the ubuntu account is locked usually
[13:17] <frobware> babbageclunk: if that's the case my patch is either borked, or the bridgescript did not run
[13:18] <dimitern> in the cloud images
[13:20] <babbageclunk> frobware, dimitern - trying deploying from maas without juju.
[13:21] <babbageclunk> frobware, dimitern - how does the bridgescript get run? Juju gives it to maas which runs it via cloud-init?
[13:21] <frobware> babbageclunk: yep
[13:21] <dimitern> babbageclunk: yeah, as a runcmd: in cloud-init user data
[13:22] <frobware> dimitern: can we HO?
[13:23] <dimitern> frobware: sure - omw
[13:24] <dimitern> frobware: joined standup HO
[13:24] <frobware> dimitern: heh, I was in the other one. omw
[13:24] <babbageclunk> frobware, dimitern - ok, I see the same problem deploying with maas-only, so presumably the bridgescript never gets to run.
[13:24] <dimitern> babbageclunk: is this with trusty on maas 2.0 ?
[13:26] <babbageclunk> the install's paused for a long time with "Raise network interfaces", then it times out and continues to stop at a login prompt, but it's before cloud-init runs.
[13:26] <babbageclunk> dimitern: xenial on maas 2.0
[13:26] <dimitern> babbageclunk: hmm well that's odd
[13:28] <babbageclunk> dimitern: yeah. I'm going to kill off the vlans, that seems to trigger it. But I don't see why, since they didn't cause a problem before.
[13:28] <babbageclunk> dimitern: Then at least I can try to understand the lxd deploy bug better without this getting in the way.
[13:33] <dimitern> babbageclunk: sorry, otp
[14:02] <natefinch> dimitern, dooferlad:  are you guys looking at the deploy --to lxd issue?  I had started looking at that last night, but didn't get very far. I need that to be fixed so I can land my code that removes all the lxc stuff
[14:12] <dimitern> natefinch: yeah, I posted updates as well
[14:14] <frobware> natefinch: we may need to reassign if not finished by EOD
[14:14] <dimitern> natefinch: deploy --to lxc and --to lxd or --to kvm are equally broken, so it shouldn't block landing your patch
[14:17] <dimitern> natefinch: side-note: I'm more concerned with removing the LXC container type as valid; wasn't there a discussion to still allow both 'lxd' and 'lxc' (but treat both the same as 'lxd') for backwards-compatibility with existing bundles?
[14:27] <natefinch> dimitern: bundles will treat lxc like lxd, yes
[14:27] <natefinch> dimitern: it's just everything else that is getting lxc removed
[14:28] <dimitern> natefinch: ok then
[14:28] <natefinch> dimitern: btw, I swear there used to be help text for --to lxc that said "deploy to a container on a new machine"
[14:29] <natefinch> dimitern: but I don't see it now, so maybe I'm crazy
[14:29] <dimitern> natefinch: if there was, it was never tested
[14:30] <natefinch> dimitern: so are we fixing the bug that it doesn't immediately error out, or are we fixing the bug that it doesn't work?
[14:30] <dimitern> natefinch: and I know for sure maas provider is not handling this as it should; not tried others
[14:30] <cherylj> hey dimitern, should bug 1590689 be fixed in 1.25.6?
[14:30] <mup> Bug #1590689: MAAS 1.9.3 + Juju 1.25.5 - on the Juju controller node eth0 and juju-br0 interfaces have the same IP address at the same time <cpec> <juju> <maas> <sts> <juju-core:Fix Committed> <juju-core 1.25:Triaged> <MAAS:Invalid> <https://launchpad.net/bugs/1590689>
[14:31] <dimitern> cherylj: not without backporting the fix I linked to from master
[14:31] <cherylj> dimitern: sorry, what I mean is, should we hold off releasing 1.25.6 until that gets done?
[14:31] <dimitern> cherylj: oh, sorry not that one
[14:32] <dimitern> cherylj: ah, yeah it *is* that one - and FWIW I think we should not release 1.25.6 without it
[14:32] <cherylj> dimitern: is the backport already on your (or someone's) to do list?
[14:33] <mup> Bug #1591225 opened: Generated image stream is not considered in bootstrap on private cloud <juju-core:Incomplete> <https://launchpad.net/bugs/1591225>
[14:33] <dimitern> cherylj: not to my knowledge
[14:33] <dimitern> cherylj: I could switch to that and propose it (I have too many things in progress..)
[14:34] <cherylj> boy I know how that feels.
[14:34] <cherylj> dimitern: I think we're still a couple days away from a 1.25.6, so maybe aim to have it in by Tuesday?
[14:37] <dimitern> cherylj: that would be great!
[14:38] <cherylj> thanks, dimitern!
[14:38] <perrito666> bbl
[14:39] <dimitern> frobware: guess what?
[14:39] <frobware> its broken
[14:39] <frobware> dimitern: in beta6
[14:39] <dimitern> frobware: nope :) it works just the same with beta6
[14:39] <frobware> dimitern: sigh
[14:39] <natefinch> dimitern: so are we fixing it so that deploy --to lxd errors out the way --to lxc does?  in my tests --to lxc says: "ERROR cannot add application "ubuntu3": unknown placement directive: lxc"
[14:39] <dimitern> (...for a change)
[14:40] <dimitern> natefinch: is that on maas btw?
[14:40] <natefinch> dimitern: whereas --to lxd doesn't error out (but then never works either)
[14:40] <dimitern> frobware: added a comment anyway
[14:40] <frobware> dimitern: thx
[14:41] <natefinch> dimitern: no.  I never test on maas. don't have one.  GCE.  but I can try aws if it's not still broken like it was yesterday
[14:41] <natefinch> dimitern: it should be provider independent, though
[14:42] <dimitern> natefinch: yeah, it *should*, but as it turns out it's not unfortunately
[14:42] <natefinch> dimitern: I guess maas has that messed up "if it doesn't match anything else, let's assume it's a node" thing
[14:43] <dimitern> natefinch: I'll do a quick test now how deploy --to lxc and lxd is handled on maas, gce, and aws
[14:43] <natefinch> dimitern: I did GCE, so you can skip that one
[14:43] <dimitern> natefinch: ok, I'll try azure then
[14:44] <natefinch> dimitern: lxd and kvm behave the same - they both return no error, but then never create a machine either
[14:45] <dimitern> natefinch: something just occurred to me.. lxd uses the 'lxd' as the default domain for container FQDNs
[14:46] <dimitern> natefinch: it might be the reason why lxd is different
[14:46] <natefinch> dimitern: I'm pretty sure a placement directive of just a container type is supposed to work: https://github.com/juju/juju/blob/master/instance/placement.go#L71
[14:48] <dimitern> natefinch: yeah, but there's also the PrecheckInstance from the prechecker state policy, which is called while adding a machine
[14:51] <dimitern> natefinch: hmm it looks like only maas is affected
[14:52] <dimitern> natefinch: as all other providers expect '=' to be present in the placement or parsing fails
[14:54] <dimitern> natefinch: or like joyent simply fails with placement != ""
[14:56] <dimitern> cloudsigma doesn't even bother to do anything.. precheckInstance is { return nil }.. why implement it then?
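The maas gap dimitern identified above, assuming anything without '=' is a node name without validating it, suggests a stricter check. This is a hypothetical sketch of such a prechecker, not any provider's real `PrecheckInstance`:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// precheckPlacement sketches the stricter validation being discussed:
// reject a bare container type, accept key=value constraints, and only
// accept a node name if the provider actually knows it. Hypothetical.
func precheckPlacement(placement string, knownNodes map[string]bool) error {
	if placement == "" {
		return nil // no directive: nothing to check
	}
	switch placement {
	case "lxd", "lxc", "kvm":
		return errors.New("container type is not a valid provider placement")
	}
	if strings.Contains(placement, "=") {
		return nil // key=value constraint, parsed elsewhere
	}
	// Otherwise it is claimed to be a node name: validate it instead
	// of assuming, per the comment in provider/maas/environ.go.
	if !knownNodes[placement] {
		return fmt.Errorf("unknown node name %q", placement)
	}
	return nil
}

func main() {
	nodes := map[string]bool{"node-5": true}
	fmt.Println(precheckPlacement("lxd", nodes))
	fmt.Println(precheckPlacement("node-5", nodes))
}
```

With a check like this, `deploy --to lxd` on maas would fail fast instead of silently never provisioning, matching the error the other providers already give.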
[14:59] <babbageclunk> dimitern, natefinch: I can see in the add-machine case where the decision to add a new machine with a container is made for lxc, I can't find anything corresponding to that in the deploy code.
[15:01] <natefinch> ahh, add machine, that's where it is: juju add-machine lxd                  (starts a new machine with an lxd container)
[15:01] <natefinch> I don't know why deploy would be any different
[15:01] <babbageclunk> dimitern, natefinch: ooh - does State.addMachineWithPlacement need to grow a call to AddMachineInsideNewMachine to do it?
[15:02] <babbageclunk> (in state/state.go:1249)
[15:02] <katco> natefinch: standup time
[15:02] <natefinch> katco: oops, thanks
[15:03] <natefinch> babbageclunk: 1275
[15:03] <dimitern> babbageclunk: the actual code deploy uses lives in juju/deploy.go
[15:03] <babbageclunk> dimitern: Yeah, but that will only put a new container in an existing machine.
[15:04] <babbageclunk> dimitern: vs this code from add-machine https://github.com/juju/juju/blob/master/apiserver/machinemanager/machinemanager.go#L158
[15:04] <dimitern> natefinch: on AWS 'deploy ubuntu --to lxd' and --to lxc both appear to work, but neither adds a machine for the unit
[15:04] <natefinch> dimitern: yeah, same for GCE
[15:06] <dimitern> natefinch: so it looks consistently broken everywhere :)
[15:06] <dimitern> I'd vote to reject '--to <container-type>' for deploy on its own (i.e. still allow '--to <ctype>:<id>')
[15:07] <babbageclunk> dimitern: So the code from add-machine will create a new host with a container inside, but the deploy codepath won't because it doesn't call AddMachineInsideNewMachine.
[15:07] <dimitern> until we can untangle the mess around it and make add-machine and deploy --to behave the same way
[15:08] <dimitern> babbageclunk: yeah, because nobody thought about it too much I guess
[15:10] <babbageclunk> dimitern: I think it's just an extra check in that function - if machineId is "", call AddMachineInsideNewMachine instead of AddMachineInsideMachine.
[15:11] <babbageclunk> dimitern: testing it now
[15:11] <dimitern> babbageclunk: that sounds correct
[15:12] <dimitern> babbageclunk: but definitely *isn't* the way to fix the bug
[15:13] <dimitern> babbageclunk: I mean.. this will allow deploy --to lxd to work, but it might also open a whole new can of worms on all providers
[15:13] <babbageclunk> dimitern: I don't see why? (But I haven't been following the discussion closely.)
[15:14] <dimitern> babbageclunk: e.g. deploy --to kvm on aws will start an instance but then fail to deploy the unit as kvm won't be supported
[15:14] <babbageclunk> dimitern: Isn't that the same behaviour as add-machine kvm?
[15:15] <dimitern> babbageclunk: similarly, --to lxd with 'default-series: precise' will similarly seem to pass initially, then fail as lxd is not supported on precise
[15:15] <dimitern> babbageclunk: add-machine is similarly broken in those cases
[15:16] <babbageclunk> dimitern: Isn't it worth doing this fix so add-machine and deploy behave in the same way (although both broken in the cases you describe)?
[15:16] <dimitern> babbageclunk: add-machine accepts other things, e.g. ssh:user@hostname
[15:17] <dimitern> babbageclunk: they still won't act the same
[15:17] <dimitern> babbageclunk: but, at least they will be a step closer
[15:18] <babbageclunk> dimitern: Yeah, it still seems like people expect them to work in the same way in this case.
[15:18] <natefinch> they should be as consistent as possible
[15:19] <dimitern> babbageclunk: ok, please ignore my previous rants then :) what you suggest is a good fix to have
[15:20]  * dimitern is just twitchy about changing core behavior before the release..
[15:20] <babbageclunk> dimitern: :) I mean, I think you're right that those cases are problems.
[15:23] <dimitern> we should have a well-defined format for placement, which allows provider-specific scopes; e.g. deploy --to/add-machine <scope>:<args>; where <scope> := <container-type>|<provider-type>; <args> := <target>|<key>=<value>[,..]
[15:23] <frobware> dimitern: in AWS with AA-FF why do we use static addresses and not dhcp?
[15:23] <frobware> dimitern: in containers
[15:24] <dimitern> frobware: because the FF
[15:24] <frobware> dimitern: sure, but really asking why static in that case
[15:24] <dimitern> frobware: i.e. the user asked for static IPs
[15:25] <dimitern> frobware: we use dhcp otherwise
[15:26] <dimitern> frobware: but the whole point of the FF and now the multi-NIC approach on maas has always been to have static IPs for containers
[15:32] <frobware> dimitern: it was AWS I was questioning; the MAAS I can see because you can ask for static/dhcp there
[15:33] <dimitern> frobware: you can on AWS as well
[15:33] <dimitern> frobware: AssignPrivateIpAddress
[15:33] <dimitern> http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssignPrivateIpAddresses.html
[15:34] <dimitern> well not nearly equivalent to what maas offers.
[16:01] <alexisb> natefinch, when you have five minutes I have a few qs
[16:01] <dimitern> frobware: ping
[16:01] <dimitern> frobware: here's my patch so far: http://paste.ubuntu.com/17174180/
[16:02] <natefinch> alexisb: sure.
[16:02] <dimitern> frobware: now testing on aws w/ && w/o AC-FF (xenial), and on maas-19 (t) / maas-20 (x)
[16:03] <alexisb> https://hangouts.google.com/hangouts/_/canonical.com/juju-release
[16:03] <alexisb> natefinch, ^^
[16:03] <frobware> dimitern: it's nuts... all this manual testing we're BOTH doing... Grrr.
[16:03] <alexisb> cherylj, feel free to crash the party
[16:04] <dimitern> frobware: yeah..
[16:05] <frobware> dimitern: your patch "so far" - does that mean use or wait?
[16:06] <dimitern> frobware: so far only as long as the currently running make check passes
[16:07] <dimitern> frobware: or if something comes up from the live tests (will be able to tell you shortly); otherwise I think I covered everything in what I pasted
[16:11] <dimitern> frobware: yeah, I've missed a few tests in container/kvm
[16:15] <alexisb> babbageclunk, dimitern: what is the consensus for a fix on lp 1590960 ??
[16:16] <alexisb> lp1590960
[16:16] <babbageclunk> alexisb: maybe bug 1590960? Or is mup sulking?
[16:16] <mup> Bug #1590960: juju deploy --to lxd does not create base machine <deploy> <lxd> <juju-core:Triaged by 2-xtian> <https://launchpad.net/bugs/1590960>
[16:17] <alexisb> there we go :)
[16:17] <babbageclunk> I've got a fix, tested manually, just finishing the unit test for it.
[16:17] <dimitern> alexisb: we can fix deploy to work with --to <container-type>, but that's not what's blocking natefinch's patch LXC-to-LXD
[16:17] <alexisb> dimitern, correct it is not blocking
[16:18] <babbageclunk> Should be up for review in ~10 mins
[16:18] <alexisb> but looking at this morning's discussion there seemed to be some different ideas about what should work with --to and what shouldn't
[16:18] <alexisb> was just curious what the expected behavior should be
[16:20] <cherylj> alexisb, natefinch looks like --to lxc is also a problem on 1.25:  https://bugs.launchpad.net/juju-core/+bug/1590960/comments/6
[16:20] <mup> Bug #1590960: juju deploy --to lxd does not create base machine <deploy> <lxd> <juju-core:Triaged by 2-xtian> <https://launchpad.net/bugs/1590960>
[16:20] <dimitern> alexisb: that's the real issue: behavior was neither clearly defined nor tested
[16:20] <alexisb> dimitern, exactly
[16:21] <dimitern> alexisb: but it's sensible to expect deploy --to X to work like add-machine X does
[16:21] <alexisb> dimitern, also agree
[16:21] <natefinch> cherylj: an error is a lot better than silently half-working... but yeah, should be fixed to mirror add-machine
[16:21] <dimitern> alexisb: and babbageclunk's fix should get us there
[16:21] <natefinch> huzzah :)
[16:22] <dimitern> but not all the way
[16:22] <alexisb> dimitern, though a note to the juju-core team might be good so that we highlight the change and educate the team
[16:22] <alexisb> babbageclunk, ^^
[16:22] <dimitern> agreed
[16:25] <babbageclunk> alexisb, dimitern: Clarifying - am I sending the note about this change?
[16:25] <dimitern> babbageclunk: I'd appreciate if you do it, I can help clarifying something or other if you need though
[16:26] <dimitern> frobware: so the patch didn't work for aws
[16:26] <alexisb> babbageclunk, yeah just to the juju-core lanchpad group
[16:26] <frobware> dimitern: what happened?
[16:26] <dimitern> frobware: ERROR juju.provisioner provisioner_task.go:681 cannot start instance for machine "0/lxd/0": missing container network config
[16:27] <dimitern> frobware: it slipped through somewhere.. looking
[16:27] <frobware> dimitern: why do I think that's an existent bug... ?
[16:28] <babbageclunk> dimitern, alexisb: Ok cool - I think I understand the wider issues now. Basically just that this will still do slightly weird things on clouds that don't support the container type, but at least the add-machine and deploy behaviour is more consistent.
[16:28] <alexisb> babbageclunk, yep
[16:29] <alexisb> and we as a team should be clear on what the current behaviour is and the gaps, so we can both explain to users *and* make better decisions on what the behaviour should be
[16:30] <alexisb> cmars, cherylj, do we have any progress on https://bugs.launchpad.net/juju-core/+bug/1581157
[16:30] <mup> Bug #1581157: github.com/juju/juju/cmd/jujud test timeout on windows <blocker> <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged by dave-cheney> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1581157>
[16:41] <dimitern> frobware: nope, it was a warning before
[16:41] <dimitern> frobware: I'll need to add a few more tweaks to the patch and will resend
[16:50] <cherylj> alexisb: I haven't heard anything from cmars about it
[17:18] <mup> Bug #1591290 opened: serverSuite.TestStop unexpected error <ci> <intermittent-failure> <regression> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1591290>
[17:21] <dimitern> frobware: fixed patch: http://paste.ubuntu.com/17177501/
[17:23] <dimitern> frobware: should now work ok on AWS (testing again); all unit tests fixed
[17:24] <dimitern> frobware: I probably should've proposed it rather than to bug you with it :/
[17:26] <natefinch> dimitern: you mention placement strings with = in them in that bug... but placement strings don't use = AFAIK?  placement is like --to 0/lxc/0 or --to lxd:4  maybe you mean constraints?
[17:28] <dimitern> natefinch: on maas you can do --to zone=foo
[17:28] <dimitern> natefinch: and I think most others support zone= as well
[17:29] <dimitern> natefinch: see, it's confusing :)
[17:37] <natefinch> dimitern: gah, zone should be a constraint :/
[17:38] <natefinch> well... maybe not
[17:38] <natefinch> I guess constraints are for all units of a service
[17:39] <natefinch> still... weird
[17:40] <dimitern> natefinch: yeah, it can't be useful as a constraint if we're to do automatic zone distribution
[17:41] <natefinch> dimitern: right (sometimes you might not want them distributed, but that's the exception).  Anyway... many valid placements do not use =... like specifying containers or machines
[17:41] <dimitern> natefinch: there's also a container=lxd constraint btw, hardly tested
[17:42] <babbageclunk> natefinch, dimitern: halp! After state.AddApplication's been called, the units are just staged, is that right? When/how does juju/deploy:AddUnits get called?
[17:43] <babbageclunk> Is it triggered by a watcher of some sort?
[17:43] <natefinch> babbageclunk: there's the unitassigner that makes sure units get assigned to machnies
[17:43] <dimitern> babbageclunk: it goes like this: cmd/juju/application/deploy.go -> api/application/deploy -> apiserver/application/deploy -> juju/deploy -> state
[17:43] <natefinch> babbageclunk: it's a worker
[17:45] <babbageclunk> dimitern: yeah, I could follow that, but none of the code in that chain actually ends up calling AssignUnitWithPlacement.
[17:45] <babbageclunk> natefinch: Ah, ok - thanks.
[17:46] <natefinch> babbageclunk: yeah, we add a staged assignment during deploy, and then the unitassigner reads those and turns them into real assignments.
[17:50] <babbageclunk> natefinch: ok - that makes sense. I was trying to understand why I didn't see the error I see in my unit test when running deploy manually.
[17:52] <babbageclunk> natefinch: It's because the errors are raised by the unitassigner and logged somewhere, rather than coming back from the api to the command.
[17:55] <dimitern> frobware: FYI, proposed it: http://reviews.vapour.ws/r/5040/
[18:10] <babbageclunk> dimitern, natefinch: review please? http://reviews.vapour.ws/r/5041/
[18:10] <dimitern> babbageclunk: cheers, looking
[18:10] <babbageclunk> dimitern: I mean, you shouldn't now! It's late there! It's kinda late here now!
[18:12] <babbageclunk> dimitern: but thanks!
[18:12]  * babbageclunk is off home - have delightful weekends everyone!
[18:12] <dimitern> babbageclunk: likewise! :)
[18:20] <frobware> dimitern: will take a look
[18:47] <redir> brb reboot
[19:22] <mup> Bug #1588924 opened: juju list-controllers --format=yaml displays controller that cannot be addressed. <juju-core:Fix Committed> <juju-deployer:Invalid> <https://launchpad.net/bugs/1588924>
[20:41] <perrito666> cmars: still around?
[21:19] <mup> Bug #1591379 opened: bootstrap failure with MAAS doesn't tell me which node has a problem <v-pil> <juju-core:New> <https://launchpad.net/bugs/1591379>
[21:26] <cmars> perrito666, yep, what's up?
[21:26]  * perrito666 deleted what he was writing because he began in spanish
[21:26] <perrito666> cmars: I wanted to ask you about juju/permission
[21:26] <perrito666> We are sort of moving in another direction http://reviews.vapour.ws/r/4973/#comment27181
[21:26] <cmars> leo un poquito ["I'll read a bit"]
[21:27] <cmars> looking
[21:27] <perrito666> don't do that (the spanish) you just short circuited my brain badly :p
[21:27] <perrito666> it is fun to see your own language and not understand it
[21:28] <cmars> :)
[21:28] <cmars> perrito666, is there a doc or tl;dr for the permissions changes?
[21:55] <mup> Bug #1591387 opened: juju controller stuck in infinite loop during teardown <juju-core:New> <https://launchpad.net/bugs/1591387>
[21:58] <mup> Bug #1591387 changed: juju controller stuck in infinite loop during teardown <juju-core:New> <https://launchpad.net/bugs/1591387>
[22:10] <mup> Bug #1591387 opened: juju controller stuck in infinite loop during teardown <juju-core:New> <https://launchpad.net/bugs/1591387>