redir | so I am back to: value of (*params.Error) is nil, but a typed nil | 00:41 |
---|---|---|
redir | :/ | 00:41 |
redir | nm | 00:49 |
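The typed-nil trap redir mentions is a classic Go gotcha: an interface value holding a nil `*params.Error` is itself non-nil, because the interface still carries the concrete type. A minimal sketch, with an illustrative `Error` type standing in for `params.Error`:

```go
package main

import "fmt"

// Error stands in for params.Error; the body is illustrative only.
type Error struct{ Message string }

func (e *Error) Error() string { return e.Message }

// getError returns a nil *Error through the error interface. The interface
// value then holds the concrete type *Error with a nil pointer inside, so
// comparing the interface itself to nil yields false.
func getError() error {
	var p *Error
	return p
}

func main() {
	err := getError()
	fmt.Println(err == nil) // false: a "typed nil"
	if p, ok := err.(*Error); ok {
		fmt.Println(p == nil) // true: the underlying pointer is nil
	}
}
```

The usual fix is to return a literal `nil` from the function (or check the concrete pointer before converting it to the interface), rather than returning a possibly-nil concrete pointer as an `error`.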
axw | wallyworld: I'm around now, let me know when you want to chat (can wait till 1:1 if you like) | 01:15 |
wallyworld | axw: am just typing in PR, will push in a sec | 01:15 |
menn0 | thumper: tools migration is going well so far. here's one change - several more on their way: http://reviews.vapour.ws/r/5033/ | 01:15 |
wallyworld | axw: i have not reviewed or tested live yet http://reviews.vapour.ws/r/5034/ we can chat soon | 01:22 |
wallyworld | omfg those internal networking tests are a waste of time and a bitch to fix | 01:23 |
wallyworld | damn, am still missing one apiserver test too | 01:24 |
menn0 | thumper: thanks | 01:34 |
redir | off to do dinner. bbiab | 02:33 |
=== natefinch-afk is now known as natefinch | ||
natefinch | thumper: why is our default log level <root>=WARNING;unit=DEBUG ? | 02:39 |
thumper | because it is how we see what the units are doing | 02:40 |
thumper | unit logging is the output from hooks | 02:40 |
thumper | and always useful | 02:40 |
thumper | but you can explicitly turn it off | 02:40 |
natefinch | ....then why is it at debug? | 02:40 |
natefinch | also, I thought it was juju.unit ? or is unit special? | 02:41 |
=== Spads_ is now known as Spads | ||
natefinch | also, why don't we show info by default? defaulting to warning means we drop a ton of useful context on the floor, and make debugging production systems really difficult | 02:42 |
thumper | wallyworld: if you ignore all the hook failures... http://pastebin.ubuntu.com/17163593/ | 02:46 |
thumper | natefinch: unit is special | 02:46 |
thumper | natefinch: we should probably change to default to INFO | 02:46 |
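The default natefinch quotes is a loggo-style configuration string of semicolon-separated module=level pairs, where `<root>` sets the fallback level. A toy parser just to show that shape; the real parsing lives in github.com/juju/loggo, and this sketch is not that implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// parseLogSpec splits a spec like "<root>=WARNING;unit=DEBUG" into
// module->level pairs. Illustrative only.
func parseLogSpec(spec string) map[string]string {
	levels := make(map[string]string)
	for _, entry := range strings.Split(spec, ";") {
		if kv := strings.SplitN(entry, "=", 2); len(kv) == 2 {
			levels[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return levels
}

func main() {
	// "<root>" is the default for all modules; "unit" (hook output) is
	// overridden to DEBUG, which is the default being debated here.
	fmt.Println(parseLogSpec("<root>=WARNING;unit=DEBUG"))
}
```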
thumper | I have no real good reason why | 02:47 |
wallyworld | thumper: nice, were you going to split the charm url also? | 02:47 |
thumper | not in this branch | 02:47 |
wallyworld | lots of people want warning | 02:47 |
wallyworld | info is too verbose for them | 02:47 |
natefinch | wallyworld: they're welcome to set it to warning, but I think Info is a more reasonable default | 02:47 |
wallyworld | depends who the audience is | 02:47 |
wallyworld | do we cater for developers or devop people | 02:47 |
wallyworld | or | 02:48 |
natefinch | wallyworld: not really. We limit the amount of logs we store | 02:48 |
natefinch | wallyworld: and they can turn it down to warning if they want | 02:48 |
wallyworld | and we can turn it up if we want | 02:48 |
wallyworld | devop people i have met do not want lots of verbose logging | 02:48 |
natefinch | it's not verbose. It's specifically not. | 02:48 |
wallyworld | but i have not talked to lots and lots of them | 02:48 |
natefinch | it's not debug... except unit, evidently :/ | 02:49 |
thumper | but info is noise | 02:49 |
wallyworld | verbose is subjective | 02:49 |
wallyworld | yes it is noise | 02:49 |
wallyworld | they just want warnings | 02:49 |
natefinch | I have tried working with logs set to warning and it's basically unusable | 02:49 |
wallyworld | they just want to know when things go wrong | 02:49 |
natefinch | you can't tell WTF is going on | 02:49 |
thumper | natefinch: for us, yes | 02:49 |
wallyworld | unusable for you as a dev | 02:49 |
wallyworld | not unusable for a devop person | 02:49 |
natefinch | usable for anyone who wants to support the server and figure out what is wrong | 02:49 |
wallyworld | and that's the friction that always happens in these cases | 02:50 |
natefinch | I don't believe the devops people choosing warning know what they're talking about. | 02:50 |
thumper | wallyworld: http://reviews.vapour.ws/r/5035/ | 02:51 |
wallyworld | you forgot the IMHO | 02:51 |
wallyworld | looking after i finish current queue | 02:51 |
axw | wallyworld: in all cases, APIPort use by the providers is only used in StartInstance. how about we just add it to StartInstanceParams for now? | 02:51 |
wallyworld | hmmm, that would work i think | 02:51 |
natefinch | .... some of them do. The people (mostly internal to canonical) who have used juju a lot, sure. | 02:51 |
axw | wallyworld: we could do the same for controller-uuid, and then add another method to Environ to destroy all hosted models/resources | 02:52 |
axw | (passing in the controller UUID to that) | 02:52 |
wallyworld | axw: for now, i can just add controller uuid to setconfig params | 02:52 |
wallyworld | and do that next bit later | 02:52 |
axw | wallyworld: yep doesn't have to be in one hit, but I think that's how we can make it a bit cleaner | 02:53 |
wallyworld | +1 | 02:53 |
wallyworld | one step at a time | 02:53 |
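axw's suggestion, in sketch form: thread per-call values like the API port through `StartInstanceParams` instead of reading them from provider config. The struct below is a stub illustrating the proposal, not juju's real `environs.StartInstanceParams`:

```go
package main

import "fmt"

// StartInstanceParams is a stub; the proposed APIPort and ControllerUUID
// fields mirror the discussion, everything else is omitted.
type StartInstanceParams struct {
	APIPort        int
	ControllerUUID string
	// ...the real struct carries constraints, tools, instance config, etc.
}

// startInstance reads the port from the params at the one call site that
// needs it, rather than from global model config.
func startInstance(p StartInstanceParams) string {
	return fmt.Sprintf("agent will dial the controller on port %d", p.APIPort)
}

func main() {
	// 17070 is juju's customary API port, used here only as sample data.
	fmt.Println(startInstance(StartInstanceParams{APIPort: 17070, ControllerUUID: "deadbeef"}))
}
```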
menn0 | thumper: well that's gone a bit better than charms. tools migration worked first time once all the required infrastructure was in place. | 03:09 |
thumper | menn0: awesome | 03:09 |
thumper | wallyworld: I'm looking at breaking out the charm now rather than my normal friday afternoon thing | 03:12 |
wallyworld | rightio, almost starting a review | 03:12 |
=== Spads_ is now known as Spads | ||
wallyworld | axw: were we going to put region in controllers.yaml? | 03:22 |
axw | wallyworld: I already did | 03:24 |
axw | wallyworld: maybe we want to remove region from there? and just have it on the model? | 03:24 |
wallyworld | yep | 03:25 |
axw | cloud on controller, region on model | 03:25 |
wallyworld | yep | 03:25 |
axw | wallyworld: I added some comments to the diff | 03:26 |
axw | er review comments to your diff | 03:26 |
wallyworld | ty, looking | 03:26 |
wallyworld | axw: what's wrong with embedding that interface? | 03:27 |
axw | it's not what the interface is meant to be doing ... | 03:27 |
axw | wallyworld: its purpose is to get you a state.Model | 03:27 |
axw | not to get a model and model config and controller config | 03:27 |
wallyworld | sure, but i'm extending its behaviour | 03:27 |
axw | wallyworld: which defines its purpose | 03:28 |
wallyworld | an interface can do whatever methods you decide to put on it | 03:28 |
wallyworld | i should change its name i guess | 03:28 |
wallyworld | an Environ i think was from the old days when model was environ | 03:28 |
axw | wallyworld: no, I don't think you should change the name. the checkToolsAvailability function isn't even using the existing method on EnvironGetter AFAICS | 03:29 |
axw | wallyworld: separate responsibilities -> separate interfaces | 03:30 |
wallyworld | axw: it does because it passes it to GetEnviron | 03:30 |
axw | wallyworld: which expects a ConfigGetter, no? | 03:30 |
wallyworld | yes, or an interface that embeds that | 03:31 |
axw | wallyworld: so why would you wrap X in Y, only to pass X through to some other thing? that is pointless | 03:31 |
axw | and makes it unclear what the function really needs | 03:32 |
axw | it doesn't need the Model() method, it only needs the ConfigGetter part | 03:32 |
wallyworld | it means we pass in one param whose behaviour we use in the method body in various places. i can do a separate param if you want | 03:32 |
wallyworld | eg we pass in StateInterface in places and don't always use every method | 03:33 |
axw | wallyworld: yeah, that's a smell. we do that so we don't have to pass around a *state.State, which we used to | 03:33 |
wallyworld | but in this case, for the method being called directly, its logic does use every method on the interface | 03:34 |
axw | less smelly, but still a smell | 03:34 |
axw | wallyworld: checkToolsAvailability doesn't. updateToolsAvailability does | 03:34 |
axw | updateToolsAvailability should take two things: an interface for getting the current config (ConfigGetter), and an interface for updating the model (EnvironGetter) | 03:35 |
axw | checkToolsAvailability only needs a ConfigGetter | 03:35 |
wallyworld | ah, damn, i may have been dyslexic | 03:35 |
wallyworld | i think i was confusing two method names as the same thing | 03:35 |
wallyworld | ffs | 03:35 |
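axw's point ("separate responsibilities -> separate interfaces") can be sketched in Go. The interface names echo the discussion, but all bodies here are stubs, not juju's actual apiserver code:

```go
package main

import "fmt"

// Config is a stand-in for model config.
type Config struct{ AgentVersion string }

// ConfigGetter is the narrow interface checkToolsAvailability actually needs.
type ConfigGetter interface {
	ModelConfig() (*Config, error)
}

// EnvironGetter additionally knows how to fetch the model; only callers
// that really need the model (like updateToolsAvailability) should ask
// for it. Demanding it here would wrap X in Y just to pass X through.
type EnvironGetter interface {
	ConfigGetter
	ModelName() string
}

// checkToolsAvailability takes only the ConfigGetter part, making its real
// dependency clear at the signature.
func checkToolsAvailability(g ConfigGetter) (string, error) {
	cfg, err := g.ModelConfig()
	if err != nil {
		return "", err
	}
	return cfg.AgentVersion, nil
}

// fakeState satisfies both interfaces, but callers demand only what they use.
type fakeState struct{}

func (fakeState) ModelConfig() (*Config, error) { return &Config{AgentVersion: "2.0-beta9"}, nil }
func (fakeState) ModelName() string             { return "controller" }

func main() {
	v, _ := checkToolsAvailability(fakeState{})
	fmt.Println(v)
}
```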
axw | wallyworld: am I making this login thing a critical/blocker to land? | 03:43 |
wallyworld | sure | 03:43 |
axw | thumper, wallyworld: do we really want to repeat the cloud name for each model? they are always going to be the same | 03:46 |
axw | (in status) | 03:46 |
wallyworld | i had read that as cloud region | 03:47 |
thumper | axw: that's what was asked for | 03:47 |
wallyworld | damn, dyslexic again | 03:47 |
thumper | and it isn't always the same | 03:47 |
thumper | if I have different models, they won't necessarily be in the same controller or cloud | 03:47 |
thumper | hmm... | 03:47 |
wallyworld | true, for the aggregated case | 03:47 |
axw | thumper: we're going to show models for multiple controllers? | 03:48 |
axw | I don't think so... | 03:48 |
thumper | um... | 03:48 |
axw | thumper: OTOH it would be useful to see at a glance from a snapshot of status which cloud | 03:48 |
thumper | perhaps I'm no longer clear what you are talking about | 03:48 |
axw | thumper: if I run "juju status", I'm seeing all the models for one controller | 03:48 |
thumper | um... | 03:49 |
axw | thumper: ah hm never mind | 03:49 |
thumper | if you run juju status, you only see one model | 03:49 |
axw | thumper: yep, forget me. that makes sense | 03:49 |
wallyworld | axw: one of your comments is blank so the ditto beneath it makes no sense | 03:52 |
axw | wallyworld: ignore ditto sorry. I (tried to) delete a comment after I answered my own question | 03:53 |
wallyworld | ok | 03:54 |
=== Spads_ is now known as Spads | ||
wallyworld | axw: i've left two issues open but answered the questions.... | 04:29 |
menn0 | thumper: tools migration done: http://reviews.vapour.ws/r/5036/ | 04:30 |
axw | wallyworld: "no, different models will want to use their own logging levels on the agents" -- the controller agent(s) manage multiple models | 04:31 |
wallyworld | axw: so a machine agent on a worker for model 1 will want to log differently from an agent for model 2 | 04:33 |
wallyworld | model 1 and model 2 should have their own logging-config right? | 04:33 |
axw | wallyworld: I'm talking about the controller | 04:33 |
axw | wallyworld: they are the same agent | 04:33 |
wallyworld | sure, but not on worker nodes | 04:33 |
axw | fair point about other workers tho | 04:33 |
natefinch | if anyone's feeling ambitious, this is a mostly mechanical change, to drop lxc support and use lxd in its place: http://reviews.vapour.ws/r/5027/ | 04:33 |
axw | wallyworld: I guess we shouldn't constrain it to how it works today anyway. it would be nice if it weren't global. we could have each worker in the controller take a logger with levels configured for the model | 04:34 |
axw | wallyworld: so I'll drop | 04:34 |
wallyworld | natefinch: any progress on the --to lxd issue? | 04:34 |
wallyworld | you have a +1 from eric right? | 04:35 |
natefinch | wallyworld: I do have a +1 from eric, yes.... do we need 2 +1's now? | 04:35 |
davecheney | func (fw *Firewaller) flushUnits(unitds []*unitData) error { | 04:35 |
davecheney | // flushUnits opens and closes ports for the passed unit data. | 04:35 |
wallyworld | not if i have anything to do with it - except for when the reviewer feels like they need a second opinion | 04:35 |
davecheney | worst, name. ever | 04:35 |
natefinch | wallyworld: also, no, I don't have an idea about the lxd thing... my guess is that it's a switch statement that we forgot to add lxd to | 04:36 |
wallyworld | natefinch: so we can land this and then fix the other issue before release | 04:37 |
axw | wallyworld: going for lunch then fixing car, will finish review later | 04:38 |
wallyworld | axw: np, ty | 04:39 |
wallyworld | i'll start on the next bits | 04:39 |
natefinch | wallyworld: master is blocked, and this doesn't have a bug, AFAIK | 04:39 |
wallyworld | natefinch: either jfdi or create a bug - i have been jfdi | 04:40 |
wallyworld | we need this work for release | 04:40 |
wallyworld | natefinch: ah but wait | 04:40 |
wallyworld | we can't land until deploy --to is fixed | 04:41 |
wallyworld | because it will break QA | 04:41 |
wallyworld | doh | 04:41 |
natefinch | right, ok. I'll do that first | 04:41 |
wallyworld | ty | 04:41 |
natefinch | gotta catch up on sleep, will figure it out in the morning. Seems like it's probably something pretty dumb. | 04:46 |
redir | wallyworld: axw whomever pr is in http://reviews.vapour.ws/r/5037/ | 05:18 |
redir | Be back in the local AM. | 05:18 |
wallyworld | ty | 05:18 |
davechen1y | https://github.com/juju/juju/pull/5594 | 05:30 |
davechen1y | ^ anyone experienced with the firewaller, this is a small fix as a prereq for 1590161 | 05:30 |
axw | wallyworld: reviewed | 06:07 |
wallyworld | axw: ty | 06:33 |
axw | wallyworld: you did a half change in your previous PR, you called the doc "defaultModelSettings" but the method is called "CloudConfig" still. shall I change it to DefaultModelConfig? sounds a bit off -- like it's config for a default model. maybe ModelConfigDefaults? | 07:01 |
wallyworld | axw: yeah, sounds good ty | 07:02 |
=== frankban|afk is now known as frankban | ||
frobware | dimitern: ping | 07:37 |
dimitern | frobware: pong | 08:05 |
frobware | dimitern: was just about the resolv.conf issue | 08:05 |
dimitern | frobware: I was looking at those bugs | 08:05 |
dimitern | frobware: trying to reproduce now with lxd on 1.9.3 | 08:05 |
frobware | dimitern: I can help out in a bit - just trying to stash some stuff but in a meaningful state. | 08:09 |
dimitern | frobware: ok | 08:09 |
dimitern | frobware: no luck reproducing this so far :/ | 08:27 |
frobware | dimitern: sounds like the whole of my yesterday :/ | 08:27 |
dimitern | frobware: (that is, if the lxds even come up ok) | 08:27 |
frobware | dimitern: oh? | 08:27 |
dimitern | frobware: I noticed on machine-0 there was an issue and all 3 lxds came up with 10.0.0.x addresses | 08:28 |
frobware | dimitern: heh, that caught me out this morning. they are on the LXD bridge. | 08:28 |
frobware | dimitern: when we probe for an unused subnet, that's pretty much the default address you'll get as there's not much else, network-wise, running | 08:29 |
dimitern | frobware: yeah, the issue is due to a race between setting the observed machine config with the created bridges and containers starting to deploy and trying to bridge their nics to yet-to-be-created host bridges | 08:29 |
* frobware notes that his git stash list has grown to a depth of 32... | 08:29 | |
frobware | dimitern: explain that one to me in standup :) | 08:32 |
dimitern | frobware: otoh, if the bridges are created ok, lxds come up as expected with all NICs, and /e/resolv.conf has both nameserver and search (i.e. ping node-5 and ping node-5.maas-19 both work) | 08:32 |
dimitern | frobware: sure :) | 08:33 |
frobware | dimitern: standup | 09:02 |
fwereade | voidspace, http://reviews.vapour.ws/r/5029/ | 09:07 |
frobware | dimitern: regarding resolv.conf. we did a change way back to copy the /etc/resolv.conf from the host. is it possible that it is triggering that path but the host has no valid entry (not for you, but the bug reporter) | 09:48 |
dimitern | frobware: it's very much guaranteed that container's resolv.conf will be broken if their host's resolv.conf is also broken | 09:50 |
dimitern | frobware: btw commented on that bug for '--to lxd' | 09:51 |
dimitern | mgz: hey | 09:51 |
dimitern | mgz: are there any places in the CI tests which do the equivalent of 'juju deploy xyz --to lxd' ? | 09:52 |
dimitern | mgz: if there are any, it should be because there is a machine with hostname 'lxd' that's the intended target | 09:53 |
babbageclunk | dimitern: is it actually ambiguous? Can you use a maas-level machine name there instead of a juju-level machine number? | 10:13 |
dimitern | babbageclunk: of course you can | 10:13 |
dimitern | babbageclunk: unless your node happens to be called 'lxd' | 10:13 |
babbageclunk | dimitern: ok, just thought I'd check. | 10:14 |
dimitern | babbageclunk: actually... hmm - maybe only on maas I guess? | 10:15 |
dimitern | babbageclunk: placement is supposed to work with existing machines (including containers), or new containers on existing machines | 10:17 |
babbageclunk | dimitern: So is the bug really that --to lxd (or lxc or kvm) should be an error? | 10:18 |
dimitern | babbageclunk: it even supports a list when num-units > 1: `juju deploy ubuntu -n 3 --to 0,0/lxd/1,lxd:1` | 10:18 |
dimitern | babbageclunk: placement for deploy and add-machine/bootstrap is handled slightly differently | 10:19 |
dimitern | babbageclunk: for the latter you *can* use 'add-machine ... --to lxd' or 'bootstrap --to node-x' (on maas) | 10:19 |
babbageclunk | dimitern: yeah, I was getting confused between them - I've interacted with add-machine and bootstrap more. | 10:20 |
dimitern | babbageclunk: that's an inconsistency though | 10:20 |
dimitern | babbageclunk: add-machine can do more than that - e.g. add-machine ssh:10.20.30.2 | 10:21 |
dimitern | babbageclunk: bootstrap --to lxd at least fails with `error: unsupported bootstrap placement directive "lxd"` | 10:21 |
dimitern | babbageclunk: so it looks like a maas provider issue - it implements PrecheckInstance (called by state at AddMachine time), but apparently not very well | 10:23 |
babbageclunk | dimitern: Ok, that seems easy enough to fix. | 10:24 |
dimitern | babbageclunk: tell-tale comment on line 566 in provider/maas/environ.go: `// If there's no '=' delimiter, assume it's a node name.` | 10:24 |
dimitern | but doesn't bother to validate it | 10:25 |
dimitern | fwereade: hey | 10:25 |
dimitern | fwereade: I think we don't have a clear separation between deploy-time placement and provision-time placement (i.e. deploy --to X vs add-machine X) | 10:26 |
dimitern | fwereade: I might be wrong, but I think 'deploy ubuntu --to lxd' was never intended to work, unlike '--to lxd:2', '--to 0', or '--to 0/lxd/0' | 10:28 |
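The directive forms dimitern lists ("0", "0/lxd/0", "lxd:1", bare "lxd") differ in whether a scope prefix appears before a colon. A simplified split, loosely modeled on (but not identical to) juju's `instance.ParsePlacement`:

```go
package main

import (
	"fmt"
	"strings"
)

// splitPlacement separates "lxd:1" into scope "lxd" and directive "1".
// In this simplified split a bare "lxd" has no colon and falls through as
// a plain directive, which mirrors how a provider fallback (like the maas
// "assume it's a node name" path quoted above) can misread it.
func splitPlacement(p string) (scope, directive string) {
	if i := strings.Index(p, ":"); i >= 0 {
		return p[:i], p[i+1:]
	}
	return "", p
}

func main() {
	for _, p := range []string{"0", "0/lxd/1", "lxd:1", "lxd"} {
		s, d := splitPlacement(p)
		fmt.Printf("%-8s scope=%q directive=%q\n", p, s, d)
	}
}
```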
dimitern | frobware: how about if we pass a list of interfaces to bridge explicitly to the script? | 10:38 |
frobware | dimitern: sure; can we HO anyway as I have discovered some issues with lxd on aws | 10:38 |
dimitern | frobware: I was just about to have a quick bite - top of the hour? | 10:39 |
frobware | dimitern: or later if you want more time; that's only 20 mins | 10:39 |
frobware | dimitern: let's say ~1 hour and I'll go and eat too | 10:39 |
babbageclunk | frobware: I'm trying to add a machine to understand deploying to lxd better, but when I do add-machine it never goes from Deploying to Deployed in MAAS. | 10:40 |
dimitern | frobware: ok, sgtm | 10:40 |
frobware | babbageclunk: for that I think you'll have to dig into the MAAS logs. | 10:41 |
frobware | babbageclunk: oh, 2.0? | 10:41 |
dimitern | babbageclunk: trusty? | 10:41 |
babbageclunk | frobware: 2.0, xenial | 10:42 |
dimitern | babbageclunk: you run 'add-machine lxd' ? | 10:42 |
babbageclunk | frobware: just the machine, first - haven't gotten to deploy anything into a container. | 10:42 |
frobware | babbageclunk: I don't use 2.0 very much, if at all. Most of the bugs I'm looking at explicitly reference 1.9.x | 10:42 |
babbageclunk | frobware: no, add-machine --series=xenial | 10:43 |
babbageclunk | frobware: Any idea how I can get onto the machine? I think it's the network that's not coming up. | 10:44 |
frobware | babbageclunk: you can get to and see the console? | 10:44 |
babbageclunk | frobware: yeah, but I don't know login details. | 10:44 |
dimitern | babbageclunk: use vmm ? | 10:44 |
dimitern | if it's a kvm on your machine.. | 10:45 |
babbageclunk | dimitern: what username/password though? | 10:45 |
frobware | babbageclunk: apply this http://pastebin.ubuntu.com/17167820/ | 10:45 |
frobware | babbageclunk: (cd juju/provider/maas; make) | 10:46 |
dimitern | babbageclunk: none will work; 'ubuntu' but pwd auth is disabled | 10:46 |
frobware | babbageclunk: then build juju | 10:46 |
frobware | babbageclunk: then either start-over or run upgrade-juju and add another machine | 10:46 |
babbageclunk | frobware: hmm, I might try removing all of the vlans from the node first. | 10:47 |
frobware | babbageclunk: that's ^^ a useful exercise as it does allow you to login when we bork networking | 10:47 |
babbageclunk | frobware: ok, will try it. | 10:47 |
fwereade | dimitern, hey, sorry | 11:05 |
dimitern | frobware: what's up? | 11:06 |
fwereade | dimitern, in my understanding `--to lxd` means "hand over deployment to the notional lxd compute provider that spans the capable machines in your model" | 11:06 |
fwereade | dimitern, "I want it in a container, don't bother me with the details" | 11:07 |
dimitern | oops, sorry frobware | 11:07 |
dimitern | fwereade: well, why do we have container=lxd as a constraint then? | 11:08 |
fwereade | dimitern, hysterical raisins | 11:08 |
dimitern | fwereade: so 'juju deploy ubuntu --to lxd' is supposed to work exactly like 'juju add-machine lxd && juju deploy ubuntu --to X', where X is the 'created machine X' add-machine reports | 11:10 |
fwereade | dimitern, yes | 11:16 |
frobware | dimitern: hey, I kept working... can we sync after I have some lunch. :) | 11:52 |
dimitern | frobware: sure :) | 11:53 |
voidspace | dimitern: ping | 12:48 |
voidspace | dimitern: a quick sanity check. Every LinkLayerDevice should have a corresponding refs doc with a ref that defaults to 0. If non-zero the references are the number of devices that have this device as a parent (set in ParentName)? | 12:49 |
voidspace | dimitern: so a quick scan of the linklayerdevices counting parent references should enable me to reproduce it without having to directly migrate it. | 12:50 |
dimitern | voidspace: sorry, just got back | 13:02 |
dimitern | voidspace: yes, I think that's correct | 13:02 |
dimitern | voidspace: ah, well 'quick scan' could work but only if nothing else can add or remove stuff from the db while you do it | 13:03 |
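voidspace's quick-scan idea, counting how many devices name each device in ParentName, can be sketched with stub types (these are not juju's actual state documents, and as dimitern notes the scan is only safe while nothing else mutates the collection):

```go
package main

import "fmt"

// linkLayerDevice stubs the two doc fields the scan needs.
type linkLayerDevice struct {
	Name       string
	ParentName string // empty when the device has no parent
}

// countParentRefs rebuilds per-device reference counts from the device list
// itself, so the refs docs need not be migrated directly.
func countParentRefs(devs []linkLayerDevice) map[string]int {
	refs := make(map[string]int)
	for _, d := range devs {
		if d.ParentName != "" {
			refs[d.ParentName]++
		}
	}
	return refs
}

func main() {
	devs := []linkLayerDevice{
		{Name: "eth0"},
		{Name: "br-eth0", ParentName: "eth0"},
		{Name: "eth0.50", ParentName: "eth0"},
	}
	fmt.Println(countParentRefs(devs)) // eth0 is referenced by two devices
}
```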
babbageclunk | frobware: I tried your patch after trying a few other things, but it seems like passwd -d ubuntu just makes it so that ubuntu can't login through the terminal. | 13:15 |
babbageclunk | frobware: trying it with chpasswd instead. | 13:15 |
frobware | babbageclunk: I use that all the time | 13:16 |
babbageclunk | frobware: hmm. Definitely didn't let me log in. | 13:17 |
babbageclunk | frobware: maybe it's hanging before the bridgescript runs? | 13:17 |
dimitern | frobware, babbageclunk: the ubuntu account is locked usually | 13:17 |
frobware | babbageclunk: if that's the case my patch is either borked, or the bridgescript did not run | 13:17 |
dimitern | in the cloud images | 13:18 |
babbageclunk | frobware, dimitern - trying deploying from maas without juju. | 13:20 |
babbageclunk | frobware, dimitern - how does the bridgescript get run? Juju gives it to maas which runs it via cloud-init? | 13:21 |
frobware | babbageclunk: yep | 13:21 |
dimitern | babbageclunk: yeah, as a runcmd: in cloud-init user data | 13:21 |
frobware | dimitern: can we HO? | 13:22 |
dimitern | frobware: sure - omw | 13:23 |
dimitern | frobware: joined standup HO | 13:24 |
frobware | dimitern: heh, I was in the other one. omw | 13:24 |
babbageclunk | frobware, dimitern - ok, I see the same problem deploying with maas-only, so presumably the bridgescript never gets to run. | 13:24 |
dimitern | babbageclunk: is this with trusty on maas 2.0 ? | 13:24 |
babbageclunk | the install's paused for a long time with "Raise network interfaces", then it times out and continues to a login prompt, but that's before cloud-init runs | 13:26 |
babbageclunk | dimitern: xenial on maas 2.0 | 13:26 |
dimitern | babbageclunk: hmm well that's odd | 13:26 |
babbageclunk | dimitern: yeah. I'm going to kill off the vlans, that seems to trigger it. But I don't see why, since they didn't cause a problem before. | 13:28 |
babbageclunk | dimitern: Then at least I can try to understand the lxd deploy bug better without this getting in the way. | 13:28 |
dimitern | babbageclunk: sorry, otp | 13:33 |
natefinch | dimitern, dooferlad: are you guys looking at the deploy --to lxd issue? I had started looking at that last night, but didn't get very far. I need that to be fixed so I can land my code that removes all the lxc stuff | 14:02 |
dimitern | natefinch: yeah, I posted updates as well | 14:12 |
frobware | natefinch: we may need to reassign if not finished by EOD | 14:14 |
dimitern | natefinch: deploy --to lxc and --to lxd or --to kvm are equally broken, so it shouldn't block landing your patch | 14:14 |
dimitern | natefinch: side-note: I'm more concerned with removing the LXC container type as valid; wasn't there a discussion to still allow both 'lxd' and 'lxc' (but treat both the same as 'lxd') for backwards-compatibility with existing bundles? | 14:17 |
natefinch | dimitern: bundles will treat lxc like lxd, yes | 14:27 |
natefinch | dimitern: it's just everything else that is getting lxc removed | 14:27 |
dimitern | natefinch: ok then | 14:28 |
natefinch | dimitern: btw, I swear there used to be help text for --to lxc that said "deploy to a container on a new machine" | 14:28 |
natefinch | dimitern: but I don't see it now, so maybe I'm crazy | 14:29 |
dimitern | natefinch: if there was, it was never tested | 14:29 |
natefinch | dimitern: so are we fixing the bug that it doesn't immediately error out, or are we fixing the bug that it doesn't work? | 14:30 |
dimitern | natefinch: and I know for sure maas provider is not handling this as it should; not tried others | 14:30 |
cherylj | hey dimitern, should bug 1590689 be fixed in 1.25.6? | 14:30 |
mup | Bug #1590689: MAAS 1.9.3 + Juju 1.25.5 - on the Juju controller node eth0 and juju-br0 interfaces have the same IP address at the same time <cpec> <juju> <maas> <sts> <juju-core:Fix Committed> <juju-core 1.25:Triaged> <MAAS:Invalid> <https://launchpad.net/bugs/1590689> | 14:30 |
dimitern | cherylj: not without backporting the fix I linked to from master | 14:31 |
cherylj | dimitern: sorry, what I mean is, should we hold off releasing 1.25.6 until that gets done? | 14:31 |
dimitern | cherylj: oh, sorry not that one | 14:31 |
dimitern | cherylj: ah, yeah it *is* that one - and FWIW I think we should not release 1.25.6 without it | 14:32 |
cherylj | dimitern: is the backport already on your (or someone's) to do list? | 14:32 |
mup | Bug #1591225 opened: Generated image stream is not considered in bootstrap on private cloud <juju-core:Incomplete> <https://launchpad.net/bugs/1591225> | 14:33 |
dimitern | cherylj: not to my knowledge | 14:33 |
dimitern | cherylj: I could switch to that and propose it (I have too many things in progress..) | 14:33 |
cherylj | boy I know how that feels. | 14:34 |
cherylj | dimitern: I think we're still a couple days away from a 1.25.6, so maybe aim to have it in by Tuesday? | 14:34 |
dimitern | cherylj: that would be great! | 14:37 |
cherylj | thanks, dimitern! | 14:38 |
perrito666 | bbl | 14:38 |
dimitern | frobware: guess what? | 14:39 |
frobware | its broken | 14:39 |
frobware | dimitern: in beta6 | 14:39 |
dimitern | frobware: nope :) it works just the same with beta6 | 14:39 |
frobware | dimitern: sigh | 14:39 |
natefinch | dimitern: so are we fixing it so that deploy --to lxd errors out the way --to lxc does? in my tests --to lxc says: "ERROR cannot add application "ubuntu3": unknown placement directive: lxc" | 14:39 |
dimitern | (...for a change) | 14:39 |
dimitern | natefinch: is that on maas btw? | 14:40 |
natefinch | dimitern: whereas --to lxd doesn't error out (but then never works either) | 14:40 |
dimitern | frobware: added a comment anyway | 14:40 |
frobware | dimitern: thx | 14:40 |
natefinch | dimitern: no. I never test on maas. don't have one. GCE. but I can try aws if it's not still broken like it was yesterday | 14:41 |
natefinch | dimitern: it should be provider independent, though | 14:41 |
dimitern | natefinch: yeah, it *should*, but as it turns out it's not unfortunately | 14:42 |
natefinch | dimitern: I guess maas has that messed up "if it doesn't match anything else, let's assume it's a node" thing | 14:42 |
dimitern | natefinch: I'll do a quick test now how deploy --to lxc and lxd is handled on maas, gce, and aws | 14:43 |
natefinch | dimitern: I did GCE, so you can skip that one | 14:43 |
=== tvansteenburgh1 is now known as tvansteenburgh | ||
dimitern | natefinch: ok, I'll try azure then | 14:43 |
natefinch | dimitern: lxd and kvm behave the same - they both return no error, but then never create a machine either | 14:44 |
dimitern | natefinch: something just occurred to me.. lxd uses the 'lxd' as the default domain for container FQDNs | 14:45 |
dimitern | natefinch: it might be the reason why lxd is different | 14:46 |
natefinch | dimitern: I'm pretty sure a placement directive of just a container type is supposed to work: https://github.com/juju/juju/blob/master/instance/placement.go#L71 | 14:46 |
dimitern | natefinch: yeah, but there's also the PrecheckInstance from the prechecker state policy, which is called while adding a machine | 14:48 |
dimitern | natefinch: hmm it looks like only maas is affected | 14:51 |
dimitern | natefinch: as all other providers expect '=' to be present in the placement or parsing fails | 14:52 |
dimitern | natefinch: or like joyent simply fails with placement != "" | 14:54 |
dimitern | cloudsigma doesn't even bother to do anything.. precheckInstance is { return nil }.. why implement it then? | 14:56 |
babbageclunk | dimitern, natefinch: I can see in the add-machine case where the decision to add a new machine with a container is made for lxc, I can't find anything corresponding to that in the deploy code. | 14:59 |
natefinch | ahh, add machine, that's where it is: juju add-machine lxd (starts a new machine with an lxd container) | 15:01 |
natefinch | I don't know why deploy would be any different | 15:01 |
babbageclunk | dimitern, natefinch: ooh - does State.addMachineWithPlacement need to grow a call to AddMachineInsideNewMachine to do it? | 15:01 |
babbageclunk | (in state/state.go:1249) | 15:02 |
katco | natefinch: standup time | 15:02 |
natefinch | katco: oops, thanks | 15:02 |
natefinch | babbageclunk: 1275 | 15:03 |
dimitern | babbageclunk: the actual code deploy uses lives in juju/deploy.go | 15:03 |
babbageclunk | dimitern: Yeah, but that will only put a new container in an existing machine. | 15:03 |
babbageclunk | dimitern: vs this code from add-machine https://github.com/juju/juju/blob/master/apiserver/machinemanager/machinemanager.go#L158 | 15:04 |
dimitern | natefinch: on AWS 'deploy ubuntu --to lxd' and --to lxc both appear to work, but neither adds a machine for the unit | 15:04 |
natefinch | dimitern: yeah, same for GCE | 15:04 |
dimitern | natefinch: so it looks consistently broken everywhere :) | 15:06 |
dimitern | I'd vote to reject '--to <container-type>' for deploy on its own (i.e. still allow '--to <ctype>:<id>') | 15:06 |
babbageclunk | dimitern: So the code from add-machine will create a new host with a container inside, but the deploy codepath won't because it doesn't call AddMachineInsideNewMachine. | 15:07 |
dimitern | until we can untangle the mess around it and make add-machine and deploy --to behave the same way | 15:07 |
dimitern | babbageclunk: yeah, because nobody thought about it too much I guess | 15:08 |
babbageclunk | dimitern: I think it's just an extra check in that function - if machineId is "", call AddMachineInsideNewMachine instead of AddMachineInsideMachine. | 15:10 |
babbageclunk | dimitern: testing it now | 15:11 |
dimitern | babbageclunk: that sounds correct | 15:11 |
dimitern | babbageclunk: but definitely *isn't* the way to fix the bug | 15:12 |
dimitern | babbageclunk: I mean.. this will allow deploy --to lxd to work, but it might also open a whole new can of worms on all providers | 15:13 |
babbageclunk | dimitern: I don't see why? (But I haven't been following the discussion closely.) | 15:13 |
dimitern | babbageclunk: e.g. deploy --to kvm on aws will start an instance but then fail to deploy the unit as kvm won't be supported | 15:14 |
babbageclunk | dimitern: Isn't that the same behaviour as add-machine kvm? | 15:14 |
dimitern | babbageclunk: similarly, --to lxd with 'default-series: precise' will similarly seem to pass initially, then fail as lxd is not supported on precise | 15:15 |
dimitern | babbageclunk: add-machine is similarly broken in those cases | 15:15 |
babbageclunk | dimitern: Isn't it worth doing this fix so add-machine and deploy behave in the same way (although both broken in the cases you describe)? | 15:16 |
dimitern | babbageclunk: add-machine accepts other things, e.g. ssh:user@hostname | 15:16 |
dimitern | babbageclunk: they still won't act the same | 15:17 |
dimitern | babbageclunk: but, at least they will be a step closer | 15:17 |
babbageclunk | dimitern: Yeah, it still seems like people expect them to work in the same way in this case. | 15:18 |
natefinch | they should be as consistent as possible | 15:18 |
dimitern | babbageclunk: ok, please ignore my previous rants then :) what you suggest is a good fix to have | 15:19 |
* dimitern is just twitchy about changing core behavior before the release.. | 15:20 | |
babbageclunk | dimitern: :) I mean, I think you're right that those cases are problems. | 15:20 |
dimitern | we should have a well-defined format for placement, which allows provider-specific scopes; e.g. deploy --to/add-machine <scope>:<args>; where <scope> := <container-type>|<provider-type>; <args> := <target>|<key>=<value>[,..] | 15:23 |
frobware | dimitern: in AWS with AA-FF why do we use static addresses and not dhcp? | 15:23 |
frobware | dimitern: in containers | 15:23 |
dimitern | frobware: because of the FF | 15:24 |
frobware | dimitern: sure, but really asking why static in that case | 15:24 |
dimitern | frobware: i.e. the user asked for static IPs | 15:24 |
dimitern | frobware: we use dhcp otherwise | 15:25 |
dimitern | frobware: but the whole point of the FF and now the multi-NIC approach on maas has always been to have static IPs for containers | 15:26 |
frobware | dimitern: it was AWS I was questioning; the MAAS I can see because you can ask for static/dhcp there | 15:32 |
dimitern | frobware: you can on AWS as well | 15:33 |
dimitern | frobware: AssignPrivateIpAddress | 15:33 |
dimitern | http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssignPrivateIpAddresses.html | 15:33 |
dimitern | well not nearly equivalent to what maas offers. | 15:34 |
=== tvansteenburgh1 is now known as tvansteenburgh | ||
alexisb | natefinch, when you have five minutes I have a few qs | 16:01 |
dimitern | frobware: ping | 16:01 |
dimitern | frobware: here's my patch so far: http://paste.ubuntu.com/17174180/ | 16:01 |
natefinch | alexisb: sure. | 16:02 |
dimitern | frobware: now testing on aws w/ && w/o AC-FF (xenial), and on maas-19 (t) / maas-20 (x) | 16:02 |
alexisb | https://hangouts.google.com/hangouts/_/canonical.com/juju-release | 16:03 |
alexisb | natefinch, ^^ | 16:03 |
frobware | dimitern: it's nuts... all this manual testing we're BOTH doing... Grrr. | 16:03 |
alexisb | cherylj, feel free to crash the party | 16:03 |
dimitern | frobware: yeah.. | 16:04 |
frobware | dimitern: your patch "so far" - does that mean use or wait? | 16:05 |
dimitern | frobware: so far only as long as the currently running make check passes | 16:06 |
dimitern | frobware: or if something comes up from the live tests (will be able to tell you shortly); otherwise I think I covered everything in what I pasted | 16:07 |
dimitern | frobware: yeah, I've missed a few tests in container/kvm | 16:11 |
alexisb | babbageclunk, dimitern: what is the consensus for a fix on lp 1590960 ?? | 16:15 |
alexisb | lp1590960 | 16:16 |
babbageclunk | alexisb: maybe bug 1590960? Or is mup sulking? | 16:16 |
mup | Bug #1590960: juju deploy --to lxd does not create base machine <deploy> <lxd> <juju-core:Triaged by 2-xtian> <https://launchpad.net/bugs/1590960> | 16:16 |
alexisb | there we go :) | 16:17 |
babbageclunk | I've got a fix, tested manually, just finishing the unit test for it. | 16:17 |
dimitern | alexisb: we can fix deploy to work with --to <container-type>, but that's not what's blocking natefinch's patch LXC-to-LXD | 16:17 |
alexisb | dimitern, correct it is not blocking | 16:17 |
babbageclunk | Should be up for review in ~10 mins | 16:18 |
alexisb | but looking at this morning's discussion there seemed to be some different ideas about what should work with --to and what shouldn't | 16:18 |
alexisb | was just curious what the expected behavior should be | 16:18 |
cherylj | alexisb, natefinch looks like --to lxc is also a problem on 1.25: https://bugs.launchpad.net/juju-core/+bug/1590960/comments/6 | 16:20 |
mup | Bug #1590960: juju deploy --to lxd does not create base machine <deploy> <lxd> <juju-core:Triaged by 2-xtian> <https://launchpad.net/bugs/1590960> | 16:20 |
dimitern | alexisb: that's the real issue: behavior was neither clearly defined nor tested | 16:20 |
alexisb | dimitern, exactly | 16:20 |
dimitern | alexisb: but it's sensible to expect deploy --to X to work like add-machine X does | 16:21 |
alexisb | dimitern, also agree | 16:21 |
natefinch | cherylj: an error is a lot better than silently half-working... but yeah, should be fixed to mirror add-machine | 16:21 |
dimitern | alexisb: and babbageclunk's fix should get us there | 16:21 |
natefinch | huzzah :) | 16:21 |
dimitern | but not all the way | 16:22 |
alexisb | dimitern, though a note to the juju-core team might be good so that we highlight the change and educate the team | 16:22 |
alexisb | babbageclunk, ^^ | 16:22 |
dimitern | agreed | 16:22 |
babbageclunk | alexisb, dimitern: Clarifying - am I sending the note about this change? | 16:25 |
dimitern | babbageclunk: I'd appreciate it if you do it; I can help clarify something or other if you need, though | 16:25 |
dimitern | frobware: so the patch didn't work for aws | 16:26 |
alexisb | babbageclunk, yeah just to the juju-core lanchpad group | 16:26 |
frobware | dimitern: what happened? | 16:26 |
dimitern | frobware: ERROR juju.provisioner provisioner_task.go:681 cannot start instance for machine "0/lxd/0": missing container network config | 16:26 |
dimitern | frobware: it slipped through somewhere.. looking | 16:27 |
frobware | dimitern: why do I think that's an existing bug... ? | 16:27 |
babbageclunk | dimitern, alexisb: Ok cool - I think I understand the wider issues now. Basically this will still do slightly weird things on clouds that don't support the container type, but at least add-machine and deploy behaviour will be more consistent. | 16:28 |
alexisb | babbageclunk, yep | 16:28 |
alexisb | and we as a team should be clear on what the current behaviour is and the gaps, so we can both explain to users *and* make better decisions on what the behaviour should be | 16:29 |
alexisb | cmars, cherylj, do we have any progress on https://bugs.launchpad.net/juju-core/+bug/1581157 | 16:30 |
mup | Bug #1581157: github.com/juju/juju/cmd/jujud test timeout on windows <blocker> <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged by dave-cheney> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1581157> | 16:30 |
dimitern | frobware: nope, it was a warning before | 16:41 |
dimitern | frobware: I'll need to add a few more tweaks to the patch and will resend | 16:41 |
cherylj | alexisb: I haven't heard anything from cmars about it | 16:50 |
=== frankban is now known as frankban|afk | ||
mup | Bug #1591290 opened: serverSuite.TestStop unexpected error <ci> <intermittent-failure> <regression> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1591290> | 17:18 |
dimitern | frobware: fixed patch: http://paste.ubuntu.com/17177501/ | 17:21 |
dimitern | frobware: should now work ok on AWS (testing again); all unit tests fixed | 17:23 |
dimitern | frobware: I probably should've proposed it rather than bug you with it :/ | 17:24 |
natefinch | dimitern: you mention placement strings with = in them in that bug... but placement strings don't use = AFAIK? placement is like --to 0/lxc/0 or --to lxd:4 maybe you mean constraints? | 17:26 |
dimitern | natefinch: on maas you can do --to zone=foo | 17:28 |
dimitern | natefinch: and I think most others support zone= as well | 17:28 |
dimitern | natefinch: see, it's confusing :) | 17:29 |
natefinch | dimitern: gah, zone should be a constraint :/ | 17:37 |
natefinch | well... maybe not | 17:38 |
natefinch | I guess constraints are for all units of a service | 17:38 |
natefinch | still... weird | 17:39 |
=== benji is now known as Guest73726 | ||
dimitern | natefinch: yeah, it can't be useful as a constraint if we're to do automatic zone distribution | 17:40 |
natefinch | dimitern: right (sometimes you might not want them distributed, but that's the exception). Anyway... many valid placements do not use =... like specifying containers or machines | 17:41 |
dimitern | natefinch: there's also a container=lxd constraint btw, hardly tested | 17:41 |
babbageclunk | natefinch, dimitern: halp! After state.AddApplication's been called, the units are just staged, is that right? When/how does juju/deploy:AddUnits get called? | 17:42 |
babbageclunk | Is it triggered by a watcher of some sort? | 17:43 |
natefinch | babbageclunk: there's the unitassigner that makes sure units get assigned to machines | 17:43 |
dimitern | babbageclunk: it goes like this: cmd/juju/application/deploy.go -> api/application/deploy -> apiserver/application/deploy -> juju/deploy -> state | 17:43 |
natefinch | babbageclunk: it's a worker | 17:43 |
babbageclunk | dimitern: yeah, I could follow that, but none of the code in that chain actually ends up calling AssignUnitWithPlacement. | 17:45 |
babbageclunk | natefinch: Ah, ok - thanks. | 17:45 |
natefinch | babbageclunk: yeah, we add a staged assignment during deploy, and then the unitassigner reads those and turns them into real assignments. | 17:46 |
babbageclunk | natefinch: ok - that makes sense. I was trying to understand why I didn't see the error I see in my unit test when running deploy manually. | 17:50 |
babbageclunk | natefinch: It's because the errors are raised by the unitassigner and logged somewhere, rather than coming back from the api to the command. | 17:52 |
dimitern | frobware: FYI, proposed it: http://reviews.vapour.ws/r/5040/ | 17:55 |
babbageclunk | dimitern, natefinch: review please? http://reviews.vapour.ws/r/5041/ | 18:10 |
dimitern | babbageclunk: cheers, looking | 18:10 |
babbageclunk | dimitern: I mean, you shouldn't now! It's late there! It's kinda late here now! | 18:10 |
babbageclunk | dimitern: but thanks! | 18:12 |
* babbageclunk is off home - have delightful weekends everyone! | 18:12 | |
dimitern | babbageclunk: likewise! :) | 18:12 |
frobware | dimitern: will take a look | 18:20 |
redir | brb reboot | 18:47 |
=== Spads_ is now known as Spads | ||
mup | Bug #1588924 opened: juju list-controllers --format=yaml displays controller that cannot be addressed. <juju-core:Fix Committed> <juju-deployer:Invalid> <https://launchpad.net/bugs/1588924> | 19:22 |
=== Spads_ is now known as Spads | ||
=== natefinch is now known as natefinch-afk | ||
perrito666 | cmars: still around? | 20:41 |
mup | Bug #1591379 opened: bootstrap failure with MAAS doesn't tell me which node has a problem <v-pil> <juju-core:New> <https://launchpad.net/bugs/1591379> | 21:19 |
cmars | perrito666, yep, what's up? | 21:26 |
* perrito666 deleted what he was writing because he began in Spanish | 21:26 |
perrito666 | cmars: I wanted to ask you about juju/permission | 21:26 |
perrito666 | We are sort of moving in another direction http://reviews.vapour.ws/r/4973/#comment27181 | 21:26 |
cmars | leo un poquito ["I'll read a little"] | 21:26 |
cmars | looking | 21:27 |
perrito666 | don't do that (the Spanish) you just short-circuited my brain badly :p | 21:27 |
perrito666 | it is fun to see your own language and not understand it | 21:27 |
cmars | :) | 21:28 |
cmars | perrito666, is there a doc or tl;dr for the permissions changes? | 21:28 |
mup | Bug #1591387 opened: juju controller stuck in infinite loop during teardown <juju-core:New> <https://launchpad.net/bugs/1591387> | 21:55 |
mup | Bug #1591387 changed: juju controller stuck in infinite loop during teardown <juju-core:New> <https://launchpad.net/bugs/1591387> | 21:58 |
mup | Bug #1591387 opened: juju controller stuck in infinite loop during teardown <juju-core:New> <https://launchpad.net/bugs/1591387> | 22:10 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!