[00:08] hey rick_h, just wanted to thank you for the help so far. [00:15] is there support for artful series on aws? [00:16] Im seeing juju has grabbed a machine http://paste.ubuntu.com/25970976/ [00:16] but just doesn't want to start [00:17] been sitting there for a while, making me think artful hasn't made the cut just yet [00:18] by a "while" I mean like 20 mins === frankban|afk is now known as frankban [09:49] is the jaas controller being super slow right now or is it just me? [09:50] ChaoticMind, what cloud/region are you seeing problems in? [09:50] aws/eu-central-1 [09:52] ChaoticMind, thanks I'll try it out and see if I see the same. [09:52] thanks [09:57] ChaoticMind, is there any particular command that's taking it's time for you? [09:58] mhilton: just deploying the bundle took forever (like 3 minutes for a smallish bundle). Setting relationships took like 15 seconds each [09:58] Usually it's about 0.5 seconds [09:59] I made a new model and tried it again now, it seems ok now! [10:00] ChaoticMind: I'll look into it, one of the controllers may be overloaded. Thanks for mentioning it. [10:01] no worries === salmankhan1 is now known as salmankhan === zeus is now known as Guest84717 === Guest84717 is now known as zeus [16:12] yoyoyo - whats the deal with artful deploys? Will someone `juju add-machine --series artful` on aws and let me know if I'm crazy [16:14] oooh shoot, looks like adding a machine of artful worked actually [16:14] nm [16:14] jeeze [16:14] ahh, `juju deploy ubuntu --series artful --force` is what is failing [16:16] bdx: how about juju add-machine --series artful; juju deploy ubuntu --to X --force? [16:16] just a shot in the dark [16:16] kwmonroe: no, great shot, I actually just did that and it worked [16:16] sweet [16:17] and it looks like what was failing me last evening is now working too, with the `juju deploy ubuntu --series artful --force` [16:17] gd [16:18] I was experiencing some extreme jitter yesterday on JAAS I think [16:18] I was trying to get an artful deploy going for quite a while and it was just failing at machine "pending" [16:18] really strange [16:18] ha! yeah, "juju deploy ubuntu --series artful --force" just worked for me too on aws [16:34] hey, so as part of my ongoing evaluation of juju, I've just created an HA controller. But how do I specify what subnets to create the ha-instances into? [16:35] jam:^^ [16:36] R_P_S: that is possible with the `--to` directive, its just not documented yet [16:37] I think its something like `--to subnet=subnet-` [16:37] so $ juju enable-ha --to subnet=subnet-priv1b --to subnet=subnet-priv1c ? [16:38] R_P_S: let me see if I can get it, omp [16:40] R_P_S: `juju bootstrap aws/us-west-2 --to subnet=subnet- --credential mycred` [16:40] ^^ worked [16:40] I'll see about the HA omp [16:40] yeah, that worked for the first instance... [16:40] instances are launching faster then I've ever experienced [16:41] but after creating the initial controller, enabling HA put them in random subnets as far as I can tell [16:41] I already have a bootstrapped controller [16:41] crazy [16:41] including moxing public and private subnets for controllers 1 and 2 [16:43] R_P_S: `juju enable-ha --to subnet=subnet-,subnet=subnet-` [16:43] worked seamlessly [16:44] R_P_S: would you mind putting some heat on this please https://github.com/juju/docs/issues/2122 [16:47] I don't have a github account and I'm at work... I'll need to do that later... as an aside, that ticket doesn't appear to be about enable-ha [16:52] so now I'm not sure how to remove the extra HA controllers [16:52] https://jujucharms.com/docs/2.2/controllers-ha [16:53] juju status doesn't have any mention of "has-vote" for the controller model... and "remove-machine" just fails with a message that "machine 2 is required by the model" [16:55] R_P_S: so juju show-controller should mention HA status bits and has-vote I believe [16:55] hi all, I'm trying to do a simple LXD conjure-up k8s with help, all-in-one. But it fails from the get-go. [16:55] R_P_S: you can always remove-machine --force but yea, best to know what's up there. [16:56] jamesbenson: whats the issue? [16:56] jamesbenson: bummer, what's the issue? I'm sure folks can get you good to go here. [16:56] thanks stokachu and rick_h [16:56] I hope so [16:57] So there seems to be a few issues. Sidenote: I'm doing this from a ubuntu server VM in openstack, xlarge. Pretty sure that doesn't matter, but just in case [16:57] jamesbenson: whats the hw specs? [16:57] ram, cpus [16:58] 8 vCPU, 16GB RAM, 160 GB HD [16:58] ok should be fine [16:59] ubuntu 16. LTS [16:59] These are the commands I do from deploy: http://paste.openstack.org/show/626536/ [17:00] https://snag.gy/nT0LPv.jpg [17:00] that's the latest state... [17:01] actually I've tried twice... here's the other: https://snag.gy/Z9cs2x.jpg [17:02] thoughts stokachu? [17:15] so I'm trying to rollover controllers by simply terminating "bad" ones in aws directly and re-running enable-ha [17:16] but I'm still unable to remove machines that don't show up with ha-status enabled in "show-controller" [17:17] I upped enable-ha to 7 to test the subnets... it looks like I need to specify the subnets with each enable-ha command :\ [17:18] but show-controller lists machines 0,3,4,5,6,7,8 (1,2 were "demoted" accoding to enable-ha output)... but I still get "machine 1 is required by the model" [17:19] and one thing I've found is that using --force for remove-machine leaves an orphaned security group [17:22] jamesbenson: just a guess, but can those units get to the outside world? i know etcd and k8s charms snap install stuff, so i wonder if they're having trouble getting out. can you pastebin a "juju debug-log --replay -i etcd/0"? [17:27] juju remove-machine 1 --force [17:27] fails [17:31] this bug was opened almost a year https://bugs.launchpad.net/juju/+bug/1658033 [17:31] Bug #1658033: Juju HA - Unable to remove controller machines in 'down' state <4010> === frankban is now known as frankban|afk [17:39] R_P_S: downsizing the controller cluster isn't supported [17:40] you have to dump the db and restore to a smaller cluster (I think) [17:40] correct, once an -n N has been specified, it can't be shrunk [17:40] but I'm trying to simulate failure [17:40] so I terminated one instances and reran enable-ha to rebuild new ones [17:40] but the terminated onces are still in the list, unable to be removed [17:41] I'm up to 13 "machines" in the config, with 5/7 ha (currently rebuilding) [17:41] * controller admin superuser aws/us-east-1 2 13 5/7 2.2.6 [17:41] kwmonroe: I'm hitting it again, I just deployed these and they stood up just fine, tore it down and redeployed and its the artful instances that have been in pending for > 20mins now -> http://paste.ubuntu.com/25975589/ [17:52] kwmonroe: create a new model on the same controller, then deployed the same charm http://paste.ubuntu.com/25975644/ [17:52] see what I'm saying about the inconsistencies ? [17:52] bdx: i'm in us-east-1, and just verified "juju deploy ubuntu --series artful --force" worked again. hard to say what's up with it being intermitten. do a "juju ssh -m controller 0" and sudo grep around /var/log/juju for 'machine-X' to see if there's a provisioner issue. [17:52] right [17:53] yeah bdx, frustrating for sure.. i'm hoping there's something in the controller log that will be more insightful about an artful provisioning issue. [17:55] kwmonroe: http://paste.ubuntu.com/25975668/ - oh man [18:00] bdx: i haven't seen "failed to start instance (failed to start instance in provided availability zone)" before, and no sign of it in my controllers. however, i'm on 2.3-beta1 so that could be new logging in beta3. [18:01] bdx: what kind of constraints do you have for redis-cache? any wicked machine reqs there? [18:01] root-disk, spaces [18:02] testing w/o any constraints [18:03] bdx: i was hoping you had "instance-type=p3.xxlarge" and i could just say "us-west is simply out of those instance types", but that doesn't seem like the case. [18:05] kwmonroe: it was the constraints [18:05] I removed them, and wala [18:05] must have been spaces, right? surely not root-disk [18:05] I wonder if I'm hitting disk cap on aws [18:05] testing that right now [18:06] yeah, make sure you're asking for GB and not PB ;) [18:06] othwise RIP your wallet [18:09] ha, yeah, "G" [18:09] so I just logged into aws console and created 10 x 100G ebs volumes [18:09] no issues [18:11] kwmonroe: https://bugs.launchpad.net/juju/+bug/1706462 [18:11] Bug #1706462: juju tries to acquire machines in specific zones even when no zone placement directive is specified

[18:12] see my comment at the bottom [18:19] kwmonroe: I'm about to suggest something crazy [18:19] http://paste.ubuntu.com/25975792/ [18:19] taking ^ into consideration [18:20] redis-space and ubuntu-space are both deployed with only a "spaces" constraint [18:20] the ubuntu-space didn't have a --series constraint [18:20] or bah [18:20] --series argument [18:21] the only failures I'm seeing here are when '--series' is specified alongside a spaces constraint [18:21] because we see from ^ that redis-disk worked, it had '--series artful' and '--constraints "root-disk=100G" [18:21] and ubuntu-space worked [18:22] which had no '--series' arg, but had the spaces constraint [18:22] but the only thing failing consistently [18:23] are things deployed to a space that has the '--series' arg [18:23] I'll prove it by specifying '--series' with another series other than artful [18:23] how about zesty [18:24] since we see from ^ that zesty worked w/o a spaces constraint [18:25] bdx: i don't know enough about juju's zone handling, but happened here with graylog / #38? http://paste.ubuntu.com/25968550/ [18:25] did graylog have constraints? [18:26] no [18:26] well yes [18:26] but that isn't happening because of that [18:27] that happens with every single instance deployed with 2.3beta3 [18:27] it eventually gets past the "failed to start instance (failed to start instance in provided availability zone) " and finds one and eventually starts [18:27] I was just posting that to show that its not only maas thats having that issue [18:28] ok, well I think this verifies my theory http://paste.ubuntu.com/25975824/ [18:28] I jus deploy the ubuntu-zesty-space [18:28] it required the spaces constraint and --series [18:29] and it failed similar to the artful [18:29] just stuck pending [18:29] #@(*$U(#@*$@#*& [18:30] idk [18:30] I may as well go back to sleep [18:30] somehow I knew today would be a trying day [18:37] :) [18:39] bdx: i would note that in bug 1706462, that spaces + series repro this easily on aws [18:39] Bug #1706462: juju tries to acquire machines in specific zones even when no zone placement directive is specified

[18:40] kwmonroe: series + spaces only with artful [18:40] AND @kwmonroe [18:40] ^ bug is entirely different then what I'm seeing I think [18:40] kwmonroe: sorry for the delay, turkey-luncheon thingy.... that command is giving me a TLS handshake timeout.. [18:41] kwmonroe: The instance can ping google... [18:41] kwmonroe: this verifies that it is only happening with artful http://paste.ubuntu.com/25975887/ [18:42] kwmonroe: what I'm seeing is the instances stay in pending for only series + space + artful [18:42] kwmonroe: what I'm seeing is the instances stay in pending for only series + space + artful [18:42] 1706462 - failed to start instance (failed to start instance in provided availability zone) within attempt 0, ret [18:42] rying in 10s with new availability zone [18:43] wait bdx, your previous paste shows machine 8 waiting for machine with series zesty: http://paste.ubuntu.com/25975824/ [18:43] but then on *a* next attempt, juju will find an instance, and start it, and go on its way [18:43] kwmonroe: ah, my bad, yea, that machine started [18:43] which made me realize, in all cases, its only artful that is the commonality here [18:44] when used with spaces + series [18:44] try it [18:44] oooh, it may be only beta3, let me try this on jaas [18:48] jamesbenson: how about just "juju debug-log"? does that give you a tls timeout too? [18:48] works great on jaas http://paste.ubuntu.com/25975933/ [18:48] kwmonroe: the juju agent never starts, so I don't get any log from those instances [18:48] yes [18:48] kwmonroe: ^ [18:49] oooh jamesbenson [18:49] my b [18:49] lol [18:49] :) [18:49] http://paste.openstack.org/show/626542/ [18:49] jamesbenson: ooooohhh, i thought you meant the debug-log command wasn't showing any output. [18:50] kwmonroe: No, seems to have issues with net/http: TLS handshake timeout... [18:51] I'm not sure why that is, though, since easyrsa is able to get active ... [18:51] so it must be able to reach out, correct? [18:52] jamesbenson: easyrsa doesn't snap install anything [18:52] etcd and k8s charms do [18:52] oh man.. [18:52] so something with the bridge then [18:52] so jamesbenson, i'll bet you all the money in my pockets that if you do a "juju run --unit easyrsa/0 'sudo snap install etcd'", it'll fail [18:53] kwmonroe: seems to be just sitting there... [18:54] yep, same error [18:54] juju run --unit easyrsa/0 'sudo snap install etcd' [18:54] error: cannot perform the following tasks: [18:54] - Download snap "core" (3440) from channel "stable" (Get https://068ed04f23.site.internapcdn.net/download-snap/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_3440.snap?t=2017-11-16T20:00:00Z&h=30ced1b835617d49d8ff4221a62d789f7ca638aa: net/http: TLS handshake timeout) [18:54] sorry about the paste there... [18:54] jamesbenson: to test the tls/http connectivity more generically, do this.. juju ssh etcd/0, then wget https://google.com from the etcd unit. [18:55] (make sure it's https) [18:55] ok, here it is https://bugs.launchpad.net/juju/+bug/1732764 [18:55] Bug #1732764: series + spaces + artful + juju2.3beta3 = fail [18:55] kwmonroe: works. [18:55] interesting [18:56] http://paste.openstack.org/show/626545/ [18:56] jamesbenson: how about a "sudo snap install etcd" from that same etcd unit? [18:58] kwmonroe : nope...http://paste.openstack.org/show/626546/ [18:59] interesting... [18:59] jamesbenson: if this is an egress-restricted environment and you're unable to hit the snap store, I can provide you with some steps for installing them manually. [18:59] all ports are open.... [18:59] I'll double check though.. [19:01] bdx: nice detail in 1732764. interesting that's it's such a specific combo. also, you may want to s/"spaces=myspace"/"spaces=facebook" in case a more recent social media platform helps. [19:01] ryebot kwmonroe: iptables are empty, and security group is open all ports in and out. [19:01] http://paste.openstack.org/show/626547/ [19:03] This is the only rule in my iptables -t nat: MASQUERADE all -- 10.55.234.0/24 !10.55.234.0/24 /* managed by lxd-bridge */ [19:04] jamesbenson: stick some quotes around that url... wget 'https://068ed04f23.site.internapcdn.net/download-snap/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_3440.snap?t=2017-11-16T20:00:00Z&h=30ced1b835617d49d8ff4221a62d789f7ca638aa' [19:04] hmm... still shows connected. But once connected it sits. [19:05] jamesbenson: how about running "env | grep -i proxy" on that unit. anything in there? [19:07] NO_PROXY=10.55.234.245,127.0.0.1,::1,localhost [19:07] no_proxy=10.55.234.245,127.0.0.1,::1,localhost [19:09] hmph jamesbenson, that seems legit [19:10] ....so confused.... not a good sign that everything seems legit from you too... [19:11] kwmonroe: do you deploy on baremetal or in VM's? Do you have any script or anything? [19:22] jamesbenson: by "legit", i meant the no_proxy stuff looks legit :) if you can't do a "sudo snap install foo" from the unit, juju won't be able to either. [19:22] jamesbenson: there's a gremlin in there to be sure. just need to figure out why those unit's can't snap install. [19:23] jamesbenson: i typically deploy to clouds or localhost (lxd). not much experience with maas. [19:24] well this is in an openstack VM, so not with maas... [19:24] ah, right [19:25] I know I can deploy using openstack magnum, but want to do it manually... [19:25] well jamesbenson, from what i can tell, apt install works and wget works, so it's not like your units are totally locked down. i'm not sure what's causing snap install to fail. [19:25] error: cannot install "foo": snap "core" has changes in progress [19:26] silly rabbit, dont' actually stick 'foo' in there [19:26] :-p [19:26] hey, didn't know if it was a test option ;-) [19:27] ansible ping/pong test ... [19:27] :) [19:27] jamesbenson: what does snap changes show? [19:27] "snap changes" [19:27] i'm guessing it's stuck somewhere trying to download the core snap [19:27] http://paste.openstack.org/show/626550/ [19:28] you'll like that.. [19:28] heh, classy [19:29] jamesbenson: how about a "snap download etcd"? [19:30] we should see the tls error.. just making sure. [19:31] "Hello Kubernetes support desk, Kevin speaking, how may I help you today???" [19:31] phew! backup arrives. magicaltrout, meet jamesbenson. he's having trouble snap installing k8s. [19:31] yep, tls error [19:31] i have many k8s installations [19:31] too many [19:32] magicaltrout: any on openstack? [19:32] sorta [19:32] its manual though not openstack cloud provider [19:33] magicaltrout: I've got a ubuntu 16 LTS, VM sitting in openstack. Security group is completely open. No iptables rules... [19:34] 8 vCPU, 16GB RAM, 160 GB HD; deployed using these commands: http://paste.openstack.org/show/626536/ [19:37] can't seem to install though, giving me tls errors. [19:41] okay, jamesbenson your cluster lives inside lxd on nodes on openstack? [19:42] yes [19:42] VM in openstack, lxd on that VM. [19:42] hmm i've not tried that before [19:42] if you snap install at vm level does it work? [19:42] yeah, did that to install conjure-up [19:42] and lxd [19:45] hmm [19:47] jamesbenson: it feels like something about your lxd-bridge is interferring with fetching data from the snap store, but i can't fathom a reason why it would affect snap and not apt or wget. [19:49] I am having difficulties adding subnets to spaces to ensure instances are deployed in the correct VPC/AZ [19:49] I get an error "cannot add subnet: no subnets defined" while running [19:50] juju add-subnet 1.2.3.4/5 public subnet-12345678 [19:52] kwmonroe, magicaltrout: do you have general guidelines/rules/instructions on how you set up lxd, zfs, and the network? [19:53] ipv6 is disabled... [19:53] but I wasn't sure about the bridge [19:53] i've only installed k8s with conjure up on lxd once [19:53] i just did whatever it told me [19:53] how do you typically install it? [19:54] i have 1 standard aws install and 3 openstack manual provider installs [19:54] I'm doing lxd to do some dev with multiple "nodes" in an all in one... [19:55] openstack with magnum? [19:55] nope [19:55] manual? [19:55] yeah just spin up some nodes [19:55] and deploy some stuff to them [19:55] using which method? [19:56] https://jujucharms.com/docs/2.2/clouds-manual [19:56] just like a small 3 node cluster for k8s dev [19:57] jamesbenson: how's your lxd bridge configured [19:57] stokachu: any command to detail it? [19:58] jamesbenson: lxc network show lxdbr0 [19:59] jamesbenson: easiset to do `lxc network show lxdbr0|pastebinit` [19:59] http://paste.ubuntu.com/25976235/ [19:59] do you have another bridge defined? [19:59] no idea about the pastebinit... awesome.. [20:00] jamesbenson: whats `lxc network list|pastebinit` show [20:00] http://paste.ubuntu.com/25976248/ [20:01] http://paste.ubuntu.com/25976252/ [20:01] jamesbenson: yea youve got no bridge defined for lxd to use [20:02] okay... [20:02] jamesbenson: how'd you create lxdbr0 before? [20:02] sudo lxd init --auto [20:03] jamesbenson: fwiw, we have a generic lxd guide: https://jujucharms.com/docs/stable/tut-lxd. might be worth following that and bootstrap on a new node, then "juju deploy ubuntu", then "juju ssh ubuntu/0" and see if a "sudo snap install core" works. [20:03] kwmonroe: it's his bridge [20:03] it isn't configured [20:03] BRB [20:04] could you send me a few commands? [20:04] jamesbenson: `lxc profile show default|pastebinit` [20:05] bdx / rick_h: any ideas why add-subnet is not working and complaining about subnets not being defined? [20:05] stokachu: if the bridge was borked, how did he get this far with kubeapi-load-balancer going active: https://snag.gy/nT0LPv.jpg [20:06] (and easyrsa) [20:06] well for one, his lxd bridge is inet addr:10.55.234.1 [20:06] and those ip's are different [20:06] http://paste.ubuntu.com/25976291/ [20:07] kwmonroe: also thats not output from conjure-up [20:07] so i dont know what he did there [20:08] jamesbenson: basically your lxd network bridge is acting up [20:08] jamesbenson: i recommend tearing down that setup [20:09] jamesbenson: juju kill-controller localhost-localhost [20:09] okay [20:09] and how do I bring it back up? [20:09] then delete that lxdbr0 bridge [20:09] one sec [20:09] ok [20:09] thanks 🙏 [20:09] jamesbenson: then do `sudo brctl delbr lxdbr0` [20:11] jamesbenson: let me know when you've done that, and give output of `ip addr|pastebinit` [21:01] so I just discovered that by creating a new model, the subnets aren't populated... [21:02] the subnet info is available in the controller and default models, but I want to build a model for each environment [21:02] how do I populate... or copy the subnet info from one model to the other? [21:07] juju switch default && juju list-subnets -> full subnet output [21:07] juju switch dev-k8s && juju list-subnets -> No subnets to display [21:20] R_P_S: does 'juju reload-spaces' while in the dev-k8s model do anything? [21:21] R_P_S: on aws, all new models look populated with subnets for me. [21:24] reload-spaces appears to not do anything [21:25] hold on [21:26] would reload-spaces be dependent on a vpc-id being specified in the model-config? [21:30] stokachu: I think it's easier to reset the VM and start from scratch, no? [21:30] jamesbenson: probably [21:34] stokachu: So it's rebuilt. [21:34] ok so do this, `sudo apt-add-repository ppa:ubuntu-lxc/stable` [21:34] `sudo apt update && sudo apt install lxd lxd-client` [21:35] then `lxd init --auto` (no sudo) [21:35] then `lxc network create lxdbr0 ipv4.address=auto ipv4.nat=true ipv6.address=none ipv6.nat=false` [21:35] then `snap install conjure-up --classic` [21:35] and run conjure-up [21:38] snap needs sudi [21:38] sudo [21:39] running conjure-up [21:40] ok, turns out you can't just add a VPC to a model after the fact (got errors), as the VPC parameters need to be specified during model creation with --config [21:41] stokachu: oooo something different is happening... getting a good feeling ^_^ [21:41] jamesbenson: \o/ [21:42] what's the watch command again? [21:43] got it [21:50] ok, now I'm straight up running into this bug :( https://bugs.launchpad.net/juju/+bug/1704876 [21:50] Bug #1704876: can't deploy to specific AWS subnets due to `juju add-subnet` fails [21:59] how do you delete a space in a model? [21:59] spaces can’t be deleted currently. :-( [21:59] ... [22:00] are spaces completely broken? :( can't delete, can't add subnets to a space... can't do anything with them? yet they're core to defining where things will be deployed? [22:01] i never use spaces personally - there are other ways to define how things are deployed [22:01] depends on the cloud you’ve bootstrapped [22:01] I'm following: https://insights.ubuntu.com/2017/02/08/automate-the-deployment-of-kubernetes-in-existing-aws-infrastructure/ [22:02] how would I rewrite this command then to not use spaces? [22:02] juju deploy --constraints "instance-type=m3.medium spaces=private" cs:~containers/etcd-23 [22:03] ah… [22:03] you can just make a space with a different name to use - suboptimal i know - [22:04] the only things I've done different so far are that I'm not using cloudformation (infrastructure preexisting) and creating a model [22:04] But how do I use empty spaces? [22:04] since I can't add-subnet to a space? [22:04] stokachu : etcd/0 Missing relation to certificate authority. [22:05] https://snag.gy/5ED6sa.jpg [22:05] ah, my nginx just became active.... [22:08] so apparently you need to define your subnets when calling add-space... [22:09] do it once, don't screw it up... and if you ever accidentally re-assign a subnet to a different space, you're screwed? [22:14] kwmonroe stokachu : thoughts? seems like having a similiar issue like before. the bridge is managed now though [22:16] If there is no error in juju status then give it time [22:17] error hook failed: "install" ? [22:17] for all etcd [22:17] but keeps on restarting... [22:17] Is this a full VM you're running? [22:17] full? yes, ubuntu server cloud image... [22:18]