[00:05] <stokachu> fallenour_: how'd it go?
[01:07] <fallenour_> still trying to install
[01:07] <fallenour_> keep getting a failed error; just realized though the error was lying, and the install was fine
[01:07] <fallenour_> so the past six or so installs were for nothing, all wasted because of false error reporting
[01:08] <fallenour_> needless to say, juju is bringing me bad juju, and making me one sad panda
[01:08] <fallenour_> yogurt made my night better though
[01:11] <fallenour_> also a question: I'm installing the standard conjure-up deployment; why aren't ceph-osd 2 and 3, the standard ones +1, seeing the ceph-mon, even though the system installs it automatically?
[01:14] <fallenour_> @stokachu
[01:16] <stokachu> fallenour_: they should be related and once the deployment is complete they would see each other
[01:16] <stokachu> fallenour_: also we are testing `sudo snap refresh conjure-up --candidate`, you may have a better experience there
[01:17] <fallenour_> @stokachu yea it's cleaning up
[01:17] <fallenour_> you guys should really consider using my project as a large scale guinea pig
[01:17] <fallenour_> as insane as it drives me, it's a great project and a great idea, and I use openstack to provide a lot of services for free to a lot of major efforts
[01:18] <stokachu> what project?
[01:18] <fallenour_> Project PANDA, short for platform accessibility and development acceleration
[01:18] <stokachu> is it public?
[01:18] <fallenour_> Its designed to provide free infrastructure and services to nonprofits, research institutes, universities, and OSS developers
[01:18] <fallenour_> yeap, very public
[01:19] <stokachu> whats the project url?
[01:19] <fallenour_> pending these last hurdles, I expect to take it fully public and live by the end of this month
[01:19] <fallenour_> 100 Gbps pipe, and about 10 racks of gear to start with
[01:19] <fallenour_> 3 supercomputers (small beowulf clusters)
[01:20] <fallenour_> damn, neutron gateway errored out
[01:21] <stokachu> fallenour_: yea neutron needs access to a bridge device
[01:21] <fallenour_> @stokachu giving me a "config-changed" error
[01:21] <stokachu> so depending on your server you can set a range of bridges for neutron to search through
[01:21] <fallenour_> it should have one
[01:21] <fallenour_> right now the test stack is about 15 servers
[01:22] <fallenour_> does it configure a bridge when building via conjure-up?
[01:22] <fallenour_> it deploys the system, I figured it did by default
[01:22] <fallenour_> via eth1....
[01:22] <fallenour_> o.o
[01:22] <fallenour_> 8O
[01:23] <stokachu> fallenour_: https://jujucharms.com/neutron-gateway/237 look at port configuration
[01:23] <stokachu> fallenour_: not openstack on maas, that's up to you
[01:23] <stokachu> you can configure the port in the configure section for neutron gateway
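The port configuration stokachu points at can also be set from the CLI. A minimal sketch, assuming the charm's `bridge-mappings` and `data-port` options and that `eth1` on `br-ex` is the interface carrying tenant traffic (adjust names to your hardware):

```shell
# Map a physical network name to an external bridge, then bind that
# bridge to the NIC neutron-gateway should use for data traffic.
juju config neutron-gateway bridge-mappings="physnet1:br-ex"
juju config neutron-gateway data-port="br-ex:eth1"

# Watch the unit re-run config-changed and (hopefully) clear the error.
juju status neutron-gateway
```

After a config change, the unit's hook fires again automatically, so no manual `resolved` should be needed unless the unit is already in an error state.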
[01:24] <fallenour_> oh my dear lawd! https://jujucharms.com/neutron-gateway/234
[01:24] <fallenour_> Holy geebus Batman! you even provided me the config links via the status command output
[01:26] <fallenour_> it's not letting me ssh in?
[01:30] <fallenour_> isn't it supposed to inherit my MAAS ssh key?
[01:37] <fallenour_> hmmm
[01:50] <fallenour_> @stokachu Hey just an fyi, one of the systems we are working on is an equivalent to Red Hat Satellite for OpenStack environments, didn't know if that's a system that already exists
[01:51] <fallenour_> but it's majorly helpful for us, especially because we have limited bandwidth at the current location
[02:44] <fallenour_> not seeing two of my storage nodes in my volumes, can anyone provide any insight as to why?
[02:44] <fallenour_> I have ceph-mon and ceph-osd installed
[02:44] <fallenour_> ceph-mon shows 5/5 of cluster
[03:40] <bdx> rick_h: just to recap, I was haggling the collectd charm to get the prometheus-node-exporter, I just ended up going with subordinate that relates to prometheus on the scrape interface https://jujucharms.com/u/jamesbeedy/prometheus-node-exporter/1
[03:40] <bdx> and just dropping collectd
[08:35] <tlyng> I'm trying to bootstrap a controller on Azure and it's stuck at "Contacting Juju controller at <internal-ip> to verify accessibility...". The controller VM gets assigned an internal IP and an external IP. I've tried connecting to the external IP using SSH and that is successful. How is juju supposed to connect to an internal IP at Azure which is not routable from here? Apart from that, I noticed the API server is listening on port 17070 or so
[08:35] <tlyng> Is there a list of ports that need to be open (apart from ssh) in the firewall to actually manage to use juju on public clouds?
[10:30] <tlyng> I deployed Kubernetes using JAAS, but when trying to download the kubectl configuration from kubernetes-master/0 I get an authentication error. My private ssh key is not recognized by that node (juju scp kubernetes-master/0:config ~/.kube/config), how am I supposed to get hold of this configuration?
[10:35] <mhilton> tlyng: have you tried running juju add-ssh-key to add your key to the model?
[10:36] <tlyng> mhilton: no, didn't even know that command existed (I'm new :-)) I will try it. Should I do it before I deploy the model or is it possible to do it after it's up and running?
[10:37] <mhilton> tlyng: I think it should work after the model is up and running.
[10:37] <rogpeppe1> tlyng: what mhilton says
[10:37] <mhilton> tlyng: if your key is in github or launchpad then it can also be imported with juju import-ssh-key which might be slightly easier.
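The two key-management routes mhilton describes look roughly like this; `your-github-username` is a placeholder, and the `scp` line is the one from tlyng's question above:

```shell
# Add a local public key to the current model:
juju add-ssh-key "$(cat ~/.ssh/id_rsa.pub)"

# Or import keys already published on GitHub (gh:) or Launchpad (lp:):
juju import-ssh-key gh:your-github-username

# Once the key is on the model's machines, the copy should authenticate:
juju scp kubernetes-master/0:config ~/.kube/config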
[10:39] <tlyng> mhilton: Ok thanks, I'll try. Another quick question if you have time / knowledge about it. I've tried bootstrapping my own controller at Azure, but after it has launched the bootstrap agent it tries to connect to the VM's internal IP address - which is not routable.
[10:39] <tlyng> Contacting Juju controller at 192.168.16.4 to verify accessibility... ERROR unable to contact api server after 1 attempts: try was stopped
[10:41] <mhilton> tlyng, azure can be slow to bootstrap, it sometimes has to wait a while before it gets an external IP address. What version of juju have you got (output of "juju version")
[10:42] <tlyng> 2.2.3-sierra-amd64
[10:43] <tlyng> (the one provided by homebrew on mac)
[10:44] <tlyng> it connects using the external IP to bootstrap (after it first tries the internal IP). But when it's waiting for the controller it only tries the internal IP, and it deletes everything when it fails.
[10:46] <mhilton> tlyng: OK that's interesting. I'll see if I see the same behaviour.
[10:48] <tlyng> Sadly I have to use Azure, at least for the time being. It looks like Microsoft has created this stuff called "security" and told the authorities about it. So if you're in the financial industry only "azure" is certified/approved by the government.
[11:44] <mhilton> tlyng, I've just successfully bootstrapped an Azure controller with that juju version. I think your bootstrapping problem was that it couldn't talk to port 17070 on the external address. Even though it only said it was contacting the internal address it will be contacting all of them at the same time.
[11:45] <mhilton> tlyng: port 17070 is the only port you'll need access to for juju to communicate with the controller.
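Since 17070 is the only port Juju needs, a quick reachability check from the workstation rules the firewall in or out before a 13-minute bootstrap attempt. A sketch (substitute your controller's external address for the placeholder):

```shell
# Succeeds quickly if the API port is reachable; a timeout or refusal
# points at a firewall/NSG rule rather than at Juju itself.
nc -vz <controller-external-ip> 17070
```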
[11:46] <tlyng> mhilton: Ok, thank you. From now on I will use my phone as modem. Did I mention I hate firewalls?
[11:46] <mhilton> tlyng: The easiest way to run models on Azure is through JAAS
[11:47] <rick_h> tlyng: I'm testing it as well and seeing some issues. I'm working to collect a bootstrap with --debug for filing a bug. At the moment seems Juju can't get the agents needed. :/
[11:47] <rick_h> tlyng: I'll bug balloons once it finishes timing out and get a bug report going
[11:54] <tlyng> What about persistent volume claims after deploying to Azure, do they work out of the box?
[11:55] <tlyng> Currently it says "Pending" and it's been like that for some time.
[12:15] <urulama> mhilton, rick_h: fyi, i was able to bootstrap on azure/westeurope with 2.2.3 ... might be region thing
[12:32] <ejat> hi .. can we use --constraints with bundle ?
[13:21] <fallenour_> !ceph
[13:24] <rick_h> ejat: you stick the constraints on the machine or application in the bundle.
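A hypothetical bundle fragment showing both placements rick_h mentions (charm and option values are illustrative; older bundles use a `services:` key instead of `applications:`):

```yaml
applications:
  mysql:
    charm: cs:mysql
    num_units: 1
    constraints: mem=4G cores=2   # per-application constraints
machines:
  "0":
    constraints: arch=amd64 mem=8G  # per-machine constraints
```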
[13:28] <BarDweller> Nice work on adding flush =) my vagrant provisioning is a little more chatty now =) nice to see it slowly put the world together =)
[13:44] <rick_h> urulama: mhilton tlyng so I did get azure to bootstrap but it literally took 13min to get there.
[13:47] <tlyng> rick_h: yes, it's slow. I'm still unable to use azure storage and the loadbalancer stuff (for services). It doesn't look like the canonical distribution of kubernetes actually configures cloud providers, which I would say is broken.
[13:47] <tlyng> using ceph on cloud providers ain't that wise
[13:47] <tlyng> (due to fault domains, data locality etc)
[13:53] <stokachu> BarDweller: nice!
[13:53] <SimonKLB> tlyng: juju currently doesn't enable charms to do anything cloud-native such as setting up policies etc, but with conjure-up there is some initial work on bootstrapping the kubernetes cluster (only on aws for now)
[13:54] <fallenour_> I figured out my problem is that the keyrings are in the wrong place, which is why it never got configured, but I need to know the cluster id so I can move the keyring to the appropriate directory @stokachu
[13:54] <SimonKLB> tlyng: see https://github.com/conjure-up/spells/pull/79
[13:55] <stokachu> coreycb: jamespage ^ do you know anything about this wrt ceph-mon/ceph-osd?
[13:55] <stokachu> tlyng: yea azure is next on our list to enable their storage/load balancer
[13:55] <fallenour_> the ceph god himself o.o
[13:55] <stokachu> :)
[13:55] <fallenour_> I am not worthy o.o
[14:03] <fallenour_> by the way, for future reference guides on Ceph-OSD, please see: http://docs.ceph.com/docs/jewel/rados/operations/add-or-rm-osds/ http://docs.ceph.com/docs/master/radosgw/admin/ http://docs.ceph.com/docs/master/radosgw/config-ref/ https://fatmin.com/2015/08/13/ceph-simple-ceph-pool-commands-for-beginners/
[14:04] <fallenour_> http://docs.ceph.com/docs/dumpling/rados/operations/pools/ http://ceph.com/geen-categorie/how-data-is-stored-in-ceph-cluster/
[14:04] <fallenour_> all very good resources
[14:15] <jamespage> fallenour_: give me the 101 on what you are trying to do
[14:35] <fallenour_> @jamespage hey james, https://github.com/fallenour/panda this is what I am working towards.
[14:36] <fallenour_> right now my struggle is getting the environment stable so I can go live, which is proving to be difficult
[14:37] <fallenour_> right now I think the issue is related to ceph-osd and ceph-mon, specifically with the /var/lib/ceph/mds directories missing on all ceph-mon  and ceph-osd systems
[14:38] <fallenour_> the error output is  "No block devices detected using current configuration" and "  auth: unable to find a keyring on /etc/ceph/ceph.client.admin.keyring: (2) No such file or directory"
[14:39] <fallenour_> my direct thoughts are that since the directory /var/lib/ceph/mds was never created, and the /etc/ceph/ceph.conf file points to it for a keyring for mds, that is the reason why it's not working or responding to ceph-osd commands, which would make sense as to why it thinks there aren't any ceph block storage devices
[14:39] <fallenour_> what confuses me the most though is in my horizon, I see 3 of the 5 storage devices.
[14:39] <fallenour_> my guess is that because nova-compute is still working, the host can still see the storage, even though it may not be able to use it.
[14:41] <fallenour_> I've identified that /var/lib/ceph/mon has its keyring, and both upgrade keyrings are present, but I'm not sure which keyring to copy to fix it, or if that's even the issue.
[14:42] <fallenour_> one thing I did realize is that the keyring in /var/lib/ceph/mon/$cluster_id is the same keyring across multiple systems, but I'm not sure what uses it.
[14:43] <jamespage> fallenour_: thats generated by ceph during the cluster bootstrap process I think
[14:43] <fallenour_> yea I found the bootstrap scripts for that
[14:43] <fallenour_> @jamespage Do you think the error might be that the mds directories were never created? And if so, why didn't the yaml build script build those?
[14:43] <jamespage> the mds directory being missing should not be a problem - that's related to ceph-fs
[14:44] <fallenour_> mmk
[14:44] <jamespage> where are you trying to run the ceph commands from?
[14:44] <fallenour_> from juju
[14:44] <jamespage> example?
[14:44] <jamespage> which unit?
[14:45] <fallenour_> juju run --unit ceph-osd/3 .....
[14:45] <fallenour_> and I've tried running it on multiple systems
[14:45] <fallenour_> do I need to run it specifically against the radosgw system?
[14:46] <fallenour_> also, one thing I just noticed, my $cluster_id variable is empty on the ceph nodes. If I'm not mistaken, that variable is used to define where keyrings are located
[14:46] <fallenour_> @jamespage how can I verify that the variable is populated properly, aside from juju run --unit ceph-osd/3 'echo "$cluster_id"'
[14:47] <jamespage> fallenour_: that's internal to ceph, not an environment variable
[14:47] <jamespage> the cluster_id is by default 'ceph'
[14:47] <fallenour_> @jamespage ahh I see. So what happens when ceph needs that variable or something outside of ceph needs the variable info in order to locate the keyring?
[14:48] <jamespage> that all gets passed via command line options
[14:48] <jamespage> fwiw the ceph-osd units don't get admin keyrings so you won't be able to run commands from those units
[14:48] <jamespage> only from the ceph-mon units, where "sudo ceph -s" should just work
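Putting jamespage's advice into the `juju run` form fallenour_ was using: status commands go to a ceph-mon unit, not a ceph-osd unit (the OSD units carry no admin keyring). A sketch, assuming a unit named `ceph-mon/0`:

```shell
# Cluster health summary (the command jamespage suggests):
juju run --unit ceph-mon/0 'sudo ceph -s'

# See which OSDs have actually joined the cluster:
juju run --unit ceph-mon/0 'sudo ceph osd tree'
```

If `ceph osd tree` shows no OSDs while ceph-osd units report "No block devices detected", the osd-devices configuration on the ceph-osd application is the next thing to check.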
[14:48] <fallenour_> @jamespage my /etc/ceph/ceph.conf file still reads at /var/lib/ceph/mon/$cluster-id in the config file
[14:49] <fallenour_> @jamespage I made all of my units a ceph-osd / ceph-mon pair. I didn't know if they all needed ceph-mon, so I made 5 and 5 respectively
[14:50] <jamespage> fallenour_: ok so that's actually broken atm - you can't co-locate the charms (there is a bug open)
[14:50] <fallenour_> @jamespage ooooh...
[14:50] <jamespage> fallenour_: normally we deploy three ceph-mon units in LXD containers, and ceph-osd directly on the hardware
[14:50] <fallenour_> @jamespage Yea that's what I did
[14:51] <fallenour_> @jamespage I put all 5 on hardware, and 5 in lxd containers, ceph-osd hardware, ceph-mon lxd
[14:51] <fallenour_> @jamespage I figured it was done that way for a reason, so I copied the design for the other 2 additional storage units
[14:51] <jamespage> oh well that should work just fine - what does "sudo ceph -s" on a ceph-mon unit do?
[14:51] <jamespage> but 5 is overkill - 3 is fine
[14:52] <jamespage> there is no horizontal scale-out feature for ceph-mon - it's control only
[14:52] <jamespage> have to drop for a bit to go find my room at the PTG
[14:53] <fallenour_> I didnt want to have to scale it later, I figured 5 for 500 PB of storage would be good
[14:53] <fallenour_> Output: cluster fc36db4c-9693-11e7-aae7-00163e20bc2c | health HEALTH_ERR | 196 pgs are stuck inactive for more than 300 seconds | 196 pgs stuck inactive | 196 pgs stuck unclean | no osds | monmap e2: 5 mons at {juju-950b53-0-lxd-0=10.0.0.51:6789/0,juju-950b53-1-lxd-0=10.0.0.10:6789/0,juju-950b53-2-lxd-0=10.0.0.37:6789/0,juju-950b53-3-lxd-0=10.0.0.252:6789/0,juju-950b53-4-lxd-0=10.0.0.40:
[14:53] <Dweller_> when I'm running --edge, I had to install lxd with snap before installing conjure-up, and I don't think I have a conjure-up.lxc command anymore..
[14:54] <stokachu> Dweller_: yea all that went away now you just use the snap lxd
[14:54] <Dweller_> but when I do lxc list .. it doesn't show the juju containers?
[14:55] <Dweller_> mebbe I need to set a config somewhere
[14:55] <stokachu> does juju status show anything?
[14:55] <Dweller_> it does.. until I first do lxc list, and then it breaks
[14:56] <Dweller_> (rebuilding the vm at the mo, will be able to confirm when it comes back up)
[14:57] <fallenour_> @jamespage just an fyi, power is becoming unstable, a hurricane is coming towards georgia, so if I don't respond, that is why.
[17:13] <Dweller_> ok.. vm is back up.. juju status shows all containers as running and active
[17:14] <Dweller_> is there any config I should set for lxc to list the containers using lxc ?
[17:15] <magicaltrout> hello y'all, completely off the wall question here, but here goes
[17:16] <magicaltrout> if I wanted to run K8S in LXC/LXD on Centos, my understanding is that conjure-up makes some changes to the profile to allow it on Ubuntu? Can i manually make those changes on Centos or is that out of the question?
[17:19] <stokachu> magicaltrout: i think the changes made are related to app armor
[17:19] <stokachu> some of the changes
[17:19] <stokachu> the others are just enabling privileged etc
[17:19] <magicaltrout> hrmmm
[17:19] <magicaltrout> k
[17:19] <stokachu> magicaltrout: https://github.com/conjure-up/spells/blob/master/canonical-kubernetes/steps/lxd-profile.yaml
[17:20] <stokachu> thats what our profile looks like
[17:20] <stokachu> the lxc.aa_profile is apparmor
[17:20] <stokachu> not sure if devices apply either
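For magicaltrout's Centos question, the shape of the profile stokachu links is roughly the following. This is an illustrative sketch built only from what is said above (privileged containers, nesting for Kubernetes-in-LXD, and the `lxc.aa_profile` apparmor line, which would be a no-op on a Centos host without AppArmor); see the linked lxd-profile.yaml for the authoritative contents:

```yaml
config:
  security.privileged: "true"   # mentioned above as one of the changes
  security.nesting: "true"      # commonly needed for k8s inside LXD
  raw.lxc: |
    lxc.aa_profile=unconfined   # the apparmor bit; N/A on SELinux hosts
```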
[17:21] <magicaltrout> okay cool thanks stokachu i'll have a prod
[17:21] <stokachu> magicaltrout: np
[18:38] <Dweller_> confirmed.. I'm probably doing something wrong.. I install lxd with snap install lxd .. then I install conjure-up with  snap install conjure-up --classic --edge  then I bring up kube with  conjure-up kubernetes-core localhost  ..  after which   juju status  shows the stuff up and running..  I then do lxc list  and it mumbles about generating a client certificate, then lists no containers at all, and after that.. juju status
[18:38] <Dweller_> just hangs and doesnt work anymore
[18:38] <stokachu> hmm
[18:39] <stokachu> Dweller_: what does `which lxc` show
[18:39] <Dweller_>  /usr/bin/lxc
[18:39] <stokachu> try /snap/bin/lxc list
[18:39] <stokachu> im curious
[18:39] <Dweller_> that works
[18:40] <Dweller_> ok.. so stock ubuntu has an lxc that isn't the one that juju used =) no probs.. I can work with that
[18:40] <stokachu> Dweller_: yea conjure-up uses the snap lxd for its deployments
[18:40] <stokachu> though I thought the environment's PATH had /snap/bin listed first
[18:41] <Dweller_> for me, /snap/bin is at the end
[18:41] <stokachu> ok, it may just be something i have to document for now
[18:41] <Dweller_> I wonder if I can apt uninstall the old lxc
[18:41] <stokachu> until snap lxd becomes the default
[18:41] <stokachu> Dweller_: yea if you aren't using the deb installed one
[18:55] <Dweller_> added apt-get purge -y lxd lxd-client to my vagrantfile =) that should sort it
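The root cause of the whole exchange, PATH order picking the deb `lxc` over the snap one, can be demonstrated with stub binaries (paths under /tmp/demo are throwaway for the demo):

```shell
# Two fake 'lxc' binaries standing in for the deb and snap clients.
mkdir -p /tmp/demo/deb /tmp/demo/snap
printf '#!/bin/sh\necho deb-lxc\n'  > /tmp/demo/deb/lxc
printf '#!/bin/sh\necho snap-lxc\n' > /tmp/demo/snap/lxc
chmod +x /tmp/demo/deb/lxc /tmp/demo/snap/lxc

# Whichever directory appears first in PATH wins the lookup:
PATH=/tmp/demo/deb:/tmp/demo/snap command -v lxc   # /tmp/demo/deb/lxc
PATH=/tmp/demo/snap:/tmp/demo/deb command -v lxc   # /tmp/demo/snap/lxc
```

So either purge the deb packages (as Dweller_ does above) or put `/snap/bin` ahead of `/usr/bin` in PATH; both make `lxc` resolve to the client that owns the containers Juju created.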
[20:52] <Dweller_> hmm.. my last 2 bringups have got stuck at the 'setting relation' bit
[23:27] <Dweller_> interesting.. I need to confirm this.. but I _think_ if I apt-get purge lxd-client before I do conjure-up lxd / conjure-up kubernetes-core .. then conjure-up kubernetes-core hangs at the 'Setting relation ...' phase (never gets to 'Waiting for deployment to settle' log output)
[23:29] <Dweller_> which really kinda makes you wonder whats going on there, and could it be using the 'wrong' lxc atm ?
[23:31] <Dweller_> hmm.. I mean snap install xd / conjure-up kubernetes-core ;p
[23:31] <Dweller_> s/xd/lxd