[07:09] <kjackal> Good morning Juju world!
[07:52] <erik_lonroth> good morning
[10:22] <armaan_> jamespage: Hello, I am trying to figure out the rolling upgrade process for Ceph with juju. I want to upgrade from firefly -> hammer -> jewel. Could you please let me know if there is any official documentation available for this process?
[10:23] <jamespage> armaan_: there is some documentation in https://jujucharms.com/ceph-mon/ 'Rolling Upgrades'
[10:24] <jamespage> armaan_: basically you have to set the 'source' configuration option
[10:25] <armaan_> jamespage: AFAIU, I will need to follow these steps: (1) set action-managed-upgrade=true (2) juju upgrade-charm (3) Set new origin "openstack-origin=cloud:trusty-mitaka"? Or am i missing some steps here
[10:25] <jamespage> armaan_: no
[10:25] <armaan_> jamespage: ahh, thanks let me have a look at the link
[10:25] <jamespage> armaan_: just the source config option
[10:25] <jamespage> armaan_: the ceph charms don't have 'action-managed-upgrade'
[10:25] <jamespage> or openstack-origin
[10:26] <jamespage> 'Supported Upgrade Paths' is also important
[10:33] <armaan_> jamespage: please correct me if i am wrong, by executing "juju set ceph-mon source=cloud:trusty-mitaka"; the charm will upgrade the ceph-mon to jewel release?
[10:33] <jamespage> armaan_: yes but you have to set via kilo first
[10:33] <jamespage> armaan_: if you just try to jump directly to mitaka from icehouse (stock trusty), it won't upgrade
[10:35] <armaan_> jamespage: My environment is running on Liberty + Firefly and the target is to upgrade to Newton + Jewel.
[10:35] <jamespage> armaan_: you'll only be able to get as far as mitaka on trusty
[10:36] <armaan_> jamespage: oh, you mean Newton is not supported on trusty.
[10:36] <jamespage> armaan_: liberty was aligned with hammer
[10:36] <jamespage> http://reqorts.qa.ubuntu.com/reports/ubuntu-server/cloud-archive/liberty_versions.html
[10:36] <jamespage> not jewel - be careful mixing things as client/server version mismatches between ceph releases can cause issues
[10:37] <armaan_> jamespage: ok so, juju set ceph-mon source=cloud:trusty-liberty, will upgrade the ceph charm to hammer?
[10:37] <jamespage> basically yes
[10:41] <armaan> sorry, bad internet connection
[10:43] <armaan> jamespage: this was my question before i got disconnected :(. Is it fair to assume that i will have to upgrade from 14.04 to 16.04 first, because Newton charms are not supported in Trusty?
[10:43] <jamespage> armaan: no that's not correct
[10:44] <jamespage> the charms are the same whatever the release, the Newton UCA is for Xenial onwards
[10:44] <jamespage> armaan: however, it's not possible to do an in-place upgrade between Ubuntu series using Juju
[10:44] <jamespage> armaan: you can get trusty to mitaka, but that's the last supported OpenStack release for trusty in the UCA
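The stepwise Ceph path discussed above (stock trusty with firefly, then hammer, then jewel) can be sketched as a small helper that lists the `source` settings in order. This is an illustrative sketch only: the release-to-pocket table is assembled from what is said in this conversation (liberty aligned with hammer, mitaka as the last pocket for trusty), the function name `upgrade_commands` is hypothetical, and `juju set` is the Juju 1.x spelling used in this log.

```python
# Ceph release -> cloud archive pocket on trusty, as stated in the chat.
# Assumption: firefly is what stock trusty ships, so it needs no pocket.
CEPH_PATH = [
    ("firefly", None),
    ("hammer", "cloud:trusty-liberty"),
    ("jewel", "cloud:trusty-mitaka"),
]


def upgrade_commands(current, target, application="ceph-mon"):
    """Return the ordered `juju set` commands, one Ceph release per step."""
    names = [name for name, _ in CEPH_PATH]
    start, end = names.index(current), names.index(target)
    if end < start:
        raise ValueError("downgrades are not supported")
    # Skip the current release and walk up to (and including) the target.
    return [
        "juju set {} source={}".format(application, pocket)
        for _, pocket in CEPH_PATH[start + 1:end + 1]
    ]


if __name__ == "__main__":
    for command in upgrade_commands("firefly", "jewel"):
        print(command)
```

The point of walking the table one entry at a time is that no release is skipped, which matches the advice that a direct jump from stock trusty to mitaka will not upgrade.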
[10:49] <dakj> Can anyone help me with the OpenStack Base bundle? It has been deployed on all nodes on MAAS, but nova-cloud-controller and ceph-mon are stuck in pending and do not complete their tasks. I've also opened a post here (https://askubuntu.com/questions/913007/openstack-base-bundle-deployed-with-juju). Thanks
[10:54] <jamespage> dakj: looking
[10:56] <jamespage> dakj: it's a little hard to say from your question post
[10:56] <armaan> jamespage: understood, thanks!
[10:56] <jamespage> dakj: it would appear that the ceph-mon cluster is struggling to bootstrap
[10:57] <dakj> jamespage: I have an issue with nova-cloud-controller and ceph-mon: both stay in pending and do not complete their tasks
[10:57] <armaan> jamespage: but for openstack, these steps: (1) set action-managed-upgrade=true (2) juju upgrade-charm (3) Set new origin "openstack-origin=cloud:trusty-mitaka" are okay?
[10:59] <jamespage> armaan: if you set action-managed-upgrade=true you'll also have to run the 'openstack-upgrade' action on each unit of every charm in turn
[10:59] <jamespage> dakj: we'll need to figure out why it's not bootstrapping
[11:00] <jamespage> dakj: (for ceph)
[11:00] <armaan> jamespage: Ok, so only two steps (1) juju upgrade-charm <service> (2) Set new origin "openstack-origin=cloud:trusty-mitaka" ?
[11:00] <jamespage> I'm wondering whether you might be seeing some memory contention, but I thought the bundle contained config to reduce the chance that happens
[11:00] <jamespage> armaan: that will work
[11:00] <jamespage> 1) upgrades the charm 2) tells the charm to upgrade openstack
[11:00] <dakj> jamespage: what do you need to figure that out?
[11:03] <armaan> jamespage: awesome, thanks! :)
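The two OpenStack upgrade flows discussed above (the default one, and the one with action-managed-upgrade=true) can be sketched as a command planner. This is a minimal sketch under assumptions: the function name `openstack_upgrade_plan` and the example service and unit names are hypothetical, and the commands use the Juju 1.x forms (`juju set`, `juju action do`); Juju 2.x spells these differently.

```python
def openstack_upgrade_plan(service, origin, units=None):
    """Build the command list for upgrading one OpenStack charm.

    With `units` left as None, the charm upgrades OpenStack as soon as
    openstack-origin changes. Passing unit names models the
    action-managed-upgrade=true flow, where the openstack-upgrade action
    has to be run on each unit in turn.
    """
    plan = ["juju upgrade-charm {}".format(service)]
    if units:
        plan.append("juju set {} action-managed-upgrade=true".format(service))
    plan.append("juju set {} openstack-origin={}".format(service, origin))
    if units:
        plan.extend(
            "juju action do {} openstack-upgrade".format(unit) for unit in units
        )
    return plan


if __name__ == "__main__":
    # Hypothetical service name, for illustration only.
    for command in openstack_upgrade_plan("keystone", "cloud:trusty-mitaka"):
        print(command)
```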
[11:03] <jamespage> dakj: you'll need to ssh to the ceph-mon units and look at the log files
[11:03] <jamespage> in /var/log/ceph
[11:04] <dakj> jamespage: ok, I'm doing that and will post the result
[11:21] <dakj> jamespage: here it is: http://paste.ubuntu.com/24580436/
[12:25] <Hetfield> hi all. i deployed openstack-base but i have issues with radosgw
[12:26] <Hetfield> basically when deployed on a lxc container, with basic network settings (all endpoint are the same) it's not reachable by VM in openstack world
[12:26] <Hetfield> actually all the admin network is not reachable, the neutron-gateway doesn't let the VM reach the infra endpoints
[12:26] <Hetfield> i.e. a guest vm cannot reach keystone
[12:27] <Hetfield> anyone with a similar issue?
[12:34] <dakj> jamespage:
[12:34] <dakj> jamespage: any suggestions?
[12:35] <jamespage> dakj: not just from that - what does "sudo ceph -s" say?
[12:36] <jamespage> Hetfield: the network topology between your instances and the control plane of the cloud is not limited by the bundle
[12:36] <jamespage> Hetfield: so if the network containing floating ip's used to access your instances can't route to/from the network hosting the IP addresses
[12:37] <Hetfield> jamespage: sure, actually rados is the only app that is needed by users, the others are just internal
[12:37] <jamespage> Hetfield: for the API services in the control plane, then no, VMs won't be able to reach them
[12:37] <dakj> Jamespage: here is http://paste.ubuntu.com/24580701/
[12:37] <Hetfield> jamespage: but it looks like the vxlan packets coming from a guest VM are going to the neutron-gateway machine correctly; the router routes everything except those destined for the admin network
[12:38] <jamespage> dakj: it looks like the ceph-mon units are not able to form a new cluster for some reason
[12:38] <jamespage> dakj: can I see a 'ps -aef' from all three units please
[12:41] <dakj> Jamespage: here is http://paste.ubuntu.com/24580706/ for the LXD that is in maintenance.
[12:42] <dakj> Jamespage: the juju status for ceph gives this result http://paste.ubuntu.com/24580714/
[12:44] <jamespage> dakj: can you do the "sudo ceph -s" from all three ceph-mon units please
[12:45] <Zic> hi here
[12:45] <jamespage> dakj: the 'unable to detect block devices' may be because the osd-devices in the bundle uses /dev/sdb, but your VMs won't have that block device - probably /dev/vdb
[12:45] <Zic> it seems that etcd snap of CDK bundle is a little... too confined: http://paste.ubuntu.com/24580734/
[12:45] <Zic> (cc lazyPower / kjackal)
[12:45] <jamespage> Hetfield: the neutron router on the gateway should just ship everything to the default gateway it has set
[12:46] <jamespage> it is possible to add extra routes to its routing table
[12:46] <jamespage> but I'd have to google to remember quite how
[12:47] <lazyPower> Zic: Ah, yeah. What you can do, is path that to $HOME/snap/etcd/  and it should work as expected. I'll file a bug for that as well.
[12:47] <jamespage> I expect its exposed in the api somewhere
[12:47] <dakj> jamespage: here is the first one (http://paste.ubuntu.com/24580733/), the second one (http://paste.ubuntu.com/24580737/), the last one (http://paste.ubuntu.com/24580741/)
[12:47] <Zic> lazyPower: thanks, I will try that (hello o/)
[12:48] <jamespage> dakj: "noname-b=10.20.81.16:6789/0" - not something I've seen before
[12:48] <jamespage> one of the monitors is not bootstrapped into the cluster correctly AFAICT
[12:49] <jamespage> dakj: might be something time-ish
[12:50] <jamespage> dakj: if clocks are not synced between the physical hosts hosting the ceph-mon units you might get this
[12:50] <dakj> jamespage: it has created 3 LXC machine for CEPH-mon, 3 for CEPH-osd and 1 for CEPH-radosgw
[12:52] <dakj> jamespage: `date` reports two different times between the host with MAAS and the VM. Why?
[12:52] <jamespage> well that's a good question
[12:52] <Hetfield> jamespage: the default routing is already working. my issue is only when routing tries to reach a network directly connected to the hypervisor hosting the neutron-gateway unit
[12:53] <jamespage> dakj: which MAAS version are you using?
[12:53] <dakj> jamespage: the VM where I've installed MAAS has the correct clock, the VMs used for OpenStack have a different clock
[12:53] <dakj> Jamespage: MAAS Version 2.1.5+bzr5596-0ubuntu1 (16.04.1)
[12:53] <jamespage> dakj: timezone or clock?
[12:54] <jamespage> dakj: in any case, the important bit here is the time is in sync between the VM's that are hosting the cloud
[12:54] <jamespage> dakj: I'd expect those are using UTC
[12:54] <dakj> jamespage: on MAAS it is CEST, while the other VMs are on UTC
[12:54] <jamespage> yeah that's what I'd expect
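The distinction drawn above between timezone and clock can be made concrete: a host showing CEST and a host showing UTC are in sync as long as they agree on the underlying instant. The sketch below compares epoch timestamps, which do not depend on the displayed timezone. The host names and sample readings are made up for illustration, and `max_skew_seconds` is a hypothetical helper, not part of any Juju or Ceph tooling.

```python
from datetime import datetime, timedelta, timezone


def max_skew_seconds(readings):
    """Largest difference between any two clocks, in seconds.

    `readings` maps a host name to a timezone-aware datetime sampled at
    roughly the same moment on each host. Converting to epoch seconds
    removes the timezone, so only real clock drift remains.
    """
    stamps = [reading.timestamp() for reading in readings.values()]
    return max(stamps) - min(stamps)


# Illustrative values only: one host displays CEST, the others UTC.
CEST = timezone(timedelta(hours=2), "CEST")
readings = {
    "maas-host": datetime(2017, 5, 15, 14, 52, 0, tzinfo=CEST),
    "mon-unit-1": datetime(2017, 5, 15, 12, 52, 0, tzinfo=timezone.utc),
    "mon-unit-2": datetime(2017, 5, 15, 12, 52, 3, tzinfo=timezone.utc),
}

if __name__ == "__main__":
    print("max skew: {:.1f}s".format(max_skew_seconds(readings)))
```

In this made-up sample the CEST host and the first UTC unit agree exactly; the only real skew is the three seconds on the second unit, which is the kind of drift that matters between monitor hosts.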
[12:55] <jamespage> dakj: you did the MAAS VM install by hand?
[12:55] <jamespage> dakj: anyway the problem machine is juju-37af3b-2-lxd-1
[12:55] <dakj> No, I used the Ubuntu 16.04 ISO and then updated it via the stable PPA
[12:56] <jamespage> dakj: can you check the available free memory on that machine please
[12:58] <dakj> jamespage: all 4 nodes dedicated for Openstack have 12GB RAM each one, and 250GBx2 of HDD
[12:58] <jamespage> yeah I see the spec - but how much free memory does machine 2 have?
[12:58] <jamespage> once it has everything running on it
[12:58] <jamespage> or trying to run on it
[12:59] <Zic> lazyPower: thanks, it works
[12:59] <lazyPower> Zic: cheers :) sorry you hit that snag
[12:59] <jamespage> dakj: my thought process here is that something is inhibiting the third mon unit from joining the cluster properly - trying to figure out what
[12:59] <dakj> jamespage: here is its free memory: http://paste.ubuntu.com/24580784/
[13:00] <jamespage> hmm it's been in and out of swap a lot
[13:00] <lazyPower> Zic: i think the crux of the issue here is etcdctl is shipping with the etcd server bin, and if we strictly confine one it affects the other. In order to get the behavior you're looking for we'd need to package up etcdctl as a separate snap and in turn change its confinement flags. I may be wrong and we might be able to just add some plugs to etcdctl. But i do believe it presumes you'll be working in $HOME with
[13:00] <lazyPower> its current confinement model.
[13:00] <lazyPower> Zic: you might have been able to get away with just $HOME, i believe etcdctl has the home slot declared.
[13:10] <dakj> jamespage: all vnodes on MAAS are using UTC, while the node where MAAS is installed uses CEST. On VMware ESX, ntp.ubuntu.com is configured as the NTP server
[13:43] <dakj> jamespage: the VM with Juju also has UTC as its clock!!!! Why does the host server have the correct clock and the nodes do not???
[13:44] <jamespage> dakj: anything deployed by MAAS will have UTC
[13:44]  * jamespage thinks that is generally a good practice for servers btw
[13:44]  * jamespage spent too long doing 0100 support to 'watch' daylight saving changes in a past life
[13:45] <lazyPower> +1
[13:45] <jamespage> management used to insist that some things got shutdown for the lost/gained hour...
[13:45] <jamespage> just in case they got confused....
[13:48] <dakj> jamespage: all the VMs are deployed on VMware ESX, I don't understand why the VM used for MAAS has CEST and the other VMs UTC!!! This way the clock between host and VM is never in sync.
[14:05] <dakj> jamespage: I've changed the clock on MAAS from CEST to UTC to match the nodes. Now I'm trying to re-deploy all nodes and run the bundle; I'll let you know whether anything has changed. See you later
[14:35] <Zic> lazyPower: is changing the Docker storage driver supported? I saw it's overlay for now, but as we have "out of inodes" issues, we saw that overlay2 might be a solution
[14:35] <lazyPower> Zic: it's not exposed, but you bet if that's something you need supported i can expose that
[14:36] <lazyPower> Zic: is this in a test cluster that you can just drop that graph driver in and give it a trial before we expose it?
[14:36] <lazyPower> i'm not certain what types of headaches that may bring in for operators
[14:38] <Zic> lazyPower: https://docs.docker.com/engine/userguide/storagedriver/images/driver-pros-cons.png
[14:38] <Zic> it's a bit complex :(
[14:38] <lazyPower> Zic: i dont see overlay2 in this chart at all
[14:38] <Zic> https://docs.docker.com/engine/userguide/storagedriver/selectadriver/#overlay-vs-overlay2
[14:39] <Zic> it's merged I think
[14:39] <Zic> "The overlay driver has known limitations with inode exhaustion and commit performance." <= we are affected by the "inode" problem, we are actually OK with the performance
[14:43] <Zic> lazyPower: I will do some tests on testing cluster, I will report to you what I discovered :)
[14:43] <lazyPower> Zic: thanks for driving that. I'm happy to support you in this effort though
[14:44] <lazyPower> so keep me in the loop and let's do a discovery on what needs to happen. I think it may be as simple as exposing a graph config option, but we may need supporting packages yeah?
[14:44] <lazyPower> i guess i should hold my questions until you've done discovery
[14:47] <Zic> for now, the switch from overlay to overlay2 is my main fear, as it will need a full docker stop, change driver, clean overlay FS, start docker, re-pulling all image containers of the cluster
[14:48] <Zic> I think overlay2 is already a part of the docker package of Ubuntu archive
[14:48] <Zic> > 1.11 is needed said the doc
[14:48] <Zic> Ubuntu have 1.12.6
[14:53] <lazyPower> ah yeah
[14:53] <lazyPower> doing the backend graph migration is going to be intense for you on your existing deployment
[14:53] <lazyPower> i can see why that would induce concern
[14:54] <lazyPower> Zic: i wonder if it wouldn't make more sense for you in this case to deploy a fresh worker pool set and migrate stuff instead of attempting to do an in place update
[14:54] <lazyPower> basically cordon + drain the overlay nodes, and let k8s migrate to overlay2
[15:42] <Zic> lazyPower: on the production cluster, some kubernetes-worker are physical machines
[15:42] <Zic> so it's not that easy for this ones :) for VMs it will be OK
[15:42] <lazyPower> Zic: ack. So i'll work with you to make sure this isn't nail biting.
[15:43] <Zic> it will maybe take an outage to do a replacement migration from overlay to overlay2
[15:43] <Zic> but if it's planned and at midnight, it's not a big problem :}
[15:44] <Zic> my concern is more about whether it will work just by stopping docker, changing the driver, cleaning all of /var/lib/docker/overlay, starting docker, and letting kubelet handle the re-pull for every pod/container
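The "change the driver" step in that sequence can be sketched as an edit to Docker's daemon configuration. This is a minimal sketch under a loud assumption: it supposes the daemon reads /etc/docker/daemon.json and honours its `storage-driver` key, which may not be how the charm in this conversation configures Docker. The helper name `with_storage_driver` is hypothetical, and the function only transforms text; stopping Docker, cleaning the old overlay directory and restarting are left to the operator.

```python
import json


def with_storage_driver(daemon_json_text, driver="overlay2"):
    """Return daemon.json content with `storage-driver` set to `driver`.

    Existing keys are preserved. An empty or whitespace-only input is
    treated as an empty configuration.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["storage-driver"] = driver
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical existing configuration, for illustration only.
    print(with_storage_driver('{"log-driver": "json-file"}'))
```

Keeping the change as a pure text transformation makes it easy to try on a test cluster first, as suggested earlier in the conversation, before touching a production worker.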
[16:31] <catbus1> Hi, does nova-cloud-controller charm support subordinate charms?
[20:19] <cholcombe> is there a library for juju storage functions?  I couldn't find anything in charmhelpers
[20:49] <lazyPower> cholcombe: there isn't any library that i'm aware of. This is a prime opportunity to contribute charms.storage though :)
[20:50] <cholcombe> lazyPower: thanks.  wolsen pointed out that hookenv has it
[20:50] <lazyPower> oh nice
[20:50] <cholcombe> lazyPower: brain not working haha
[20:50]  * lazyPower still likes the idea of charms.storage though
[20:50] <cholcombe> yeah i like that also
[20:51] <lazyPower> charms.storage.format('xfs', '/path/to/bd')   charms.storage.mount('/path/to/bd', 'path/to/mount', persist=True)
[20:51] <lazyPower> and have sub classes for specifics like doing tuning to zfs pools or whatever the case may be
[20:51] <lazyPower> #wishlisted
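The wishlisted interface could look roughly like the sketch below. Everything here is hypothetical: no `charms.storage` library exists according to this conversation, so the function names, signatures and the dry-run design (returning commands instead of running them) are assumptions made for illustration. The `persist=True` idea is modelled as producing an /etc/fstab line, with `auto` as the assumed default filesystem type.

```python
def format_command(fstype, device):
    """Command that would create a filesystem of `fstype` on `device`."""
    return ["mkfs", "-t", fstype, device]


def mount_command(device, mountpoint):
    """Command that would mount `device` at `mountpoint`."""
    return ["mount", device, mountpoint]


def fstab_entry(device, mountpoint, fstype="auto", options="defaults"):
    """Line to append to /etc/fstab so the mount persists across reboots."""
    return "{} {} {} {} 0 0".format(device, mountpoint, fstype, options)


def mount_plan(device, mountpoint, persist=False, fstype="auto"):
    """Return (commands, fstab_lines) for a mount, without executing anything."""
    fstab_lines = [fstab_entry(device, mountpoint, fstype)] if persist else []
    return [mount_command(device, mountpoint)], fstab_lines


if __name__ == "__main__":
    print(format_command("xfs", "/path/to/bd"))
    print(mount_plan("/path/to/bd", "/path/to/mount", persist=True))
```

Returning the commands rather than executing them keeps the sketch testable without root or a spare block device; a real library would presumably run them and add filesystem-specific subclasses for tuning, as suggested above.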
[21:26] <Budgie^Smore> o/ juju world
[21:53] <thumper> o/