[00:12] <stokachu> Dweller_: Yea if you can confirm I'd like to know
[00:14] <Dweller_> results so far, with lxd/lxd-client purged before snap install lxd / snap install conjure-up / conjure-up kubernetes-core localhost == hang, apt-purge of just lxd-client before same == hang..  apt-purge line commented out = no hang..  currently testing with apt-purge lxd-client moved to after the conjure-up kubernetes-core is complete..
[00:33] <stokachu> Dweller_: this is on edge channel too?
[00:33] <Dweller_> aye.. snap install conjure-up --classic --edge
[00:33] <stokachu> Is this through vagrant as well?
[00:35] <stokachu> Dweller_: and virtual box?
[00:36] <Dweller_> still running tests tho.. so bear with me.. it takes quite a while for each conjure-up to complete (especially if it hangs ;p)
[00:37] <stokachu> Dweller_: np, if it's vagrant and you have a vagrantfile to share I can try to reproduce here as well
[00:37] <Dweller_> and contradicting everything I've said so far.. I've got one terminal that I think should have hung, that's currently on the 00_deploy-done step (that's traditionally very quiet for quite a while)
[00:37] <stokachu> Dweller_: you can do a watch juju status in another terminal to make sure things are progressing
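The suggestion above can be done with something along these lines (a minimal sketch; the refresh interval is arbitrary):

```shell
# refresh `juju status` every 10 seconds in a separate terminal
# so you can watch units progress while conjure-up runs
watch -n 10 juju status
```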
[00:38] <Dweller_> Yeah I did when it had hung.. it said everything was up and active
[00:38] <stokachu> Ah
[00:40] <Dweller_> the "Running step: 00_deploy-done" takes quite the long while
[00:51] <Dweller_> I wonder if you could have each sub container output something as they complete their init step or whatever during the 00_deploy_done phase
[00:52] <Dweller_> like .. I started this current vm off back at 14mins past, and it's now 52mins past .. ;p
[00:52] <Dweller_> it took about 7 mins to do the apt-get upgrade, snap installs, etc.. (I dump date out before launching the conjure-up kubernetes-core localhost)
[00:53] <Dweller_> at 21mins past it started the conjure-up kubernetes-core localhost
[00:54] <Dweller_> 54mins past now, and still on "Running step: 00_deploy-done"
[00:54] <Dweller_> ooh.. I dont think its going to complete...
[00:55] <Dweller_> from juju status
[00:55] <Dweller_> Machine  State    DNS            Inst id        Series  AZ  Message
[00:55] <Dweller_> 0        down                    pending        xenial      failed to setup authentication: cannot set API password for machine 0: cannot set password of machine 0: read tcp 10.193.182.114:33282->10.193.182.114:37017: i/o timeout
[00:55] <Dweller_> 1        started  10.193.182.34  juju-46dac4-1  xenial      Running
[00:55] <Dweller_> 2        pending                 pending        xenial
[00:55] <Dweller_> 3        pending                 pending        xenial
[00:56] <Dweller_> i/o timeout would sound bad..
[01:09] <Dweller_> I'll have another look tomorrow
[01:49] <Dweller_> k8s-local: [error] cannot add relation "flannel:cni kubernetes-master:cni": read tcp 10.4.78.199:32894->10.4.78.199:37017: i/o timeout
[01:49] <Dweller_> (diff attempt)
[01:49] <Dweller_> meh.. off to sleep.. will try again tomoz.. feels like sommat isnt waiting long enough anymore
[10:17] <erik_lonroth_> rick_h: I've sent you an email with details on our problems connecting to AWS. We have found it relates to a session token which is provided for time-limited API keys. The environment variable used is "AWS_SESSION_TOKEN" and according to AWS documentation site it is used to sign API requests. I've commented in the bug report: https://bugs.launchpad.net/juju/+bug/1714022
[10:17] <mup> Bug #1714022: Juju failed to run on aws with authentication failed <juju:New> <https://launchpad.net/bugs/1714022>
[11:48] <erik_lonroth_> rick_h: I'm looking into the code of juju and can't yet see if support for AWS_SESSION_TOKEN/KEY is in juju yet. It prevents us from using AWS in our current federated setup. How would you suggest we proceed?
[11:49] <rick_h> erik_lonroth_: this is what I need to get to the core folks. I don't believe it's supported and so I want to get engineers looking into what it'll take to support. We need to.
[11:49] <rick_h> jam: ^
[11:49] <rick_h> erik_lonroth_: ty for updating the bug with details.
[11:50] <erik_lonroth_> We can start looking into this also from our end, however, we just need to double check that there is indeed a need for this before we start up a pull request.
[11:52] <erik_lonroth_> The documentation on AWS is "pretty" clear on how the extra signing of API calls needs to happen, but we are not that experienced as juju developers so we don't want to fuck your code up and waste your time fixing our code. =/
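For reference, the time-limited credentials being discussed use the standard AWS environment variables; a temporary credential set looks roughly like this (all values here are placeholders — real ones come from AWS STS):

```shell
# placeholder values: a real set is issued by AWS STS (temporary, time-limited)
export AWS_ACCESS_KEY_ID="<temporary-access-key>"
export AWS_SECRET_ACCESS_KEY="<temporary-secret-key>"
# the session token must be included when signing API requests made
# with temporary credentials
export AWS_SESSION_TOKEN="<session-token>"
```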
[12:01] <Dweller_> [error] cannot add relation "flannel:cni kubernetes-master:cni": read tcp 10.4.78.199:32894->10.4.78.199:37017: i/o timeout
[12:01] <Dweller_>   :(
[12:31] <stokachu> Dweller_: hmm
[12:32] <stokachu> Dweller_: are you running this with vagrant+virtualbox?
[12:38] <Dweller_> yep =)
[12:38] <Dweller_> still on edge, but hadn't seen timeouts until yesterday evening
[12:39] <Dweller_> the system is the twin xeon rig, 24g of ram, and the cpu's are barely breaking a sweat running the vagrant box.. plenty of ram left, no swap in use, and the only disk is an ssd
[12:45] <Dweller_> hmm.. mebbe networking issues?
[12:45] <Dweller_> [error] cannot get resource metadata from the charm store: Get https://api.jujucharms.com/charmstore/v5/~containers/easyrsa-15/meta/resources: dial tcp: lookup api.jujucharms.com on 10.157.242.1:53: read udp 10.157.242.193:41730->10.157.242.1:53: i/o timeout
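Lookup failures like the one above can be narrowed down by querying the resolver named in the error directly (resolver IP taken from the log; the timeout values are arbitrary):

```shell
# query the VM's configured resolver directly; a timeout here points at
# the resolver rather than at the charm store
dig @10.157.242.1 api.jujucharms.com +time=2 +tries=1

# bypass the local resolver to see whether upstream DNS works at all
dig @8.8.8.8 api.jujucharms.com +time=2 +tries=1
```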
[12:52] <Zic> hello here: one of my kubernetes-master is blocked in "maintenance" in juju status, but in fact it's ok
[12:53] <Zic> (was after a juju upgrade-charm kubernetes-master)
[12:53] <Zic> I have two other master in this K8s cluster which are "idle"
[12:54] <Zic> "Starting the Kubernetes master services." is the message for "maintenance"
[13:27] <Zic> I'm searching for something like "juju resolved kubernetes-master/0" but "resolved" does not work on status "maintenance"
[13:35] <kjackal> Hi Zic
[13:37] <kjackal> Zic: is this deployment one that got updated?
[13:39] <kjackal> Zic: can you show me the output of this: juju run --unit kubernetes-master/0 'charms.reactive --format=yaml get_states'
[13:48] <Zic> yeah, it was from 1.6.2 to 1.7.4
[13:48] <Zic> but I found something new/weird : all my nodes are in NotReady in kubectl get nodes :/
[13:48] <Zic> and their logs say:
[13:49] <Zic> kubelet_node_status.go:106] Unable to register node "ig1-k8s-01" with API server: the server has asked for the client to provide credentials (post nodes)
[13:49] <Zic> seems they lost their certificate
[13:49] <Zic> http://paste.ubuntu.com/25521019/ <= kjackal
[13:58] <kjackal_> Zic: I do not see the kube-api-server relation between the master and the workers
[13:59] <kjackal_> Zic: is this a production cluster? If not I would remove and re-add the kubernetes-master <-> kubernetes-worker relations
[14:01] <kjackal_> Zic: from 1.7 we did harden the auth mechanism between master-workers and admins
[14:02] <kjackal_> that means you should also grab the updated config file from the master: juju scp kubernetes-master/0:config ~/.kube/
[14:02] <Dweller_> hmm.. can't bring up vagrant with conjure up kubernetes core since yesterday.. seems something is now taking too long, causing a timeout that leads to failure
[14:02] <stokachu> Dweller_: can you pastebin your vagrantfile?
[14:02] <stokachu> i can try to reproduce
[14:03] <Dweller_> sure.. give me a mo..
[14:03] <Dweller_> it's in github, but our enterprise one .. which wont help you.. lemme paste it =)
[14:07] <Dweller_> stokachu: https://pastebin.com/DvuSEWvs
[14:08] <stokachu> is bento/ubuntu-16.04 like an official image?
[14:08] <Dweller_> aye.. it's ubuntu-16.04
[14:09] <Dweller_> http://chef.github.io/bento/
[14:09] <Dweller_> but it was all working pretty well until yesterday evening.. but since then I've not been able to bring up a vm
[14:10] <stokachu> Dweller_: ok ill try with this bento project but you should use https://app.vagrantup.com/ubuntu instead
[14:10] <Dweller_> I just tried one reverted to the non edge version (you'll need to edit the vagrant file if you want to try edge.. looks like I lost the --edge too in my hackery)
[14:10] <stokachu> those are the ones we build
[14:10] <Dweller_> sure.. will try that one now..
[14:11] <stokachu> ok i need to install virtualbox and vagrant it'll be a few minutes
[14:15] <Dweller_> no probs.. I've got it attempting with ubuntu/xenial64 at the mo
[14:15] <Zic> kjackal_: was a dev cluster yup, testing if upgrading directly from 1.6.2 to 1.7.4 was possible
[14:16] <Dweller_> although given it used to work, and stuff is timing out.. I'm wondering if networking somewhere is causing me grief
[14:16] <Zic> kjackal_: I think I missed something, I just read the upgrade page from kubernetes.io about Ubuntu/Juju and noted the specific release notes of every release
[14:16] <stokachu> Dweller_: yea im running it now
[14:20] <Zic> do you have a link to read all the upgrade steps between CDK versions or do I need to find old articles on Ubuntu Insights?
[14:20] <Zic> I know that this blogpost sometimes has more extra steps than https://kubernetes.io/docs/getting-started-guides/ubuntu/upgrades/
[14:22] <kjackal_> Zic: Here is the announcement we had when 1.7 came out: https://insights.ubuntu.com/2017/07/07/kubernetes-1-7-on-ubuntu/
[14:22] <kjackal_> Looking for the upgrade and release doc
[14:23] <Zic> yup, just found it, just saw the auth/cert part, don't know what happens with my charms relation so :(
[14:23] <Dweller_> stokachu: so the official box uses 'ubuntu' as the user, not 'vagrant'; the file will need changes for that..
[14:23] <Zic> I also just noted that my Juju GUI is down on https://<host>:17070/
[14:23] <stokachu> Dweller_: yea just ran into that :)
[14:24] <Zic> < HTTP/1.1 400 Bad Request
[14:24] <Zic> * no chunk, no close, no size. Assume close to signal end
[14:28] <stokachu> Dweller_: it's deploying now
[14:28] <Zic> (fixed for the Juju GUI, some part of the full URI to access it was missing)
[14:30] <Zic> kjackal_: can you confirm the juju remove-relation / juju add-relation ? I fear to do something nasty :)
[14:31] <Zic> even if it's non-prod, I prefer to try to solve properly in case if it happens one day in prod
[14:31] <Dweller_> ==> k8s-local: error: cannot perform the following tasks:
[14:31] <Dweller_> ==> k8s-local: - Download snap "core" (2844) from channel "stable" (Get https://068ed04f23.site.internapcdn.net/download-snap/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_2844.snap?t=2017-09-12T16:00:00Z&h=4d4b35a936b3094a2dcbba86a2d9063de4b843ac: dial tcp: lookup 068ed04f23.site.internapcdn.net on 192.168.1.1:53: server misbehaving)
[14:32] <kjackal_> Zic: I am not sure why that deployment went into this state. However, I see a state missing indicating this relation is not in place
[14:32] <kjackal_> so...
[14:33] <Zic> yup, and it seems logical then that the master does not recognize its nodes
[14:35] <stokachu> Dweller_: what's your bridge defined as?
[14:35] <stokachu> Dweller_: i picked lxdbr0 for mine
[14:36] <Dweller_> enp2s0 .. the adapter with access to my lan
[14:38] <stokachu> Dweller_: hmm ok so that's one thing i did differently
[14:38] <stokachu> Dweller_: picked a virtual bridge
[14:40] <stokachu> Dweller_: oh, are you running out of space on the device?
[14:40] <stokachu> Dweller_: because that just happened to me
[14:40] <Dweller_> 184g available
[14:40] <stokachu> 9.7G for / is not enough
[14:40] <stokachu> what does `df -h` show
[14:41] <Dweller_> oh.. you mean inside the vm ?
[14:41] <stokachu> yea
[14:41] <Dweller_>  /dev/sda1       9.7G  1.3G  8.4G  14% /
[14:41] <stokachu> yea you're going to run out of space
[14:41] <stokachu> that's one issue
[14:42] <stokachu> that's probably why it seemed like it was hanging
[14:43] <Dweller_> I wonder how big the bento image was ;p
[14:44] <Dweller_> still 8.3g available on / tho.. how much does conjure-up kubernetes-core need?
[14:45] <stokachu> well i was at 00_deploy-done and all 9.7G was used
[14:45] <stokachu> i dont know how much exactly but i would do at least a 40G /
[14:47] <Dweller_> https://github.com/sprotheroe/vagrant-disksize  =)
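Assuming the vagrant-disksize plugin linked above, growing the root disk to the 40G suggested earlier is roughly (a sketch, not a verified recipe):

```shell
# install the plugin once on the host
vagrant plugin install vagrant-disksize

# then, inside the Vagrantfile, set the desired root disk size:
#   config.disksize.size = '40GB'

# recreate the VM so the resize takes effect
vagrant destroy -f && vagrant up
```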
[14:50] <stokachu> Dweller_: cool!
[14:54] <stokachu> Dweller_: yea that gave me a 40GB partition
[14:54] <stokachu> re-running now
[14:54] <Dweller_> same
[14:55] <kjackal_> Zic: did it work?
[14:58] <Zic> kjackal_: oh oops, I asked you about the exact juju remove-relation/add-relation command since I fear doing something nasty; habitually I only use the Juju GUI to prepare new deployments
[14:59] <stokachu> Dweller_: did you get past that snap download error?
[14:59] <Dweller_> not this time..
[15:00] <Dweller_> ==> k8s-local: error: cannot install "conjure-up": Get
[15:00] <Dweller_> ==> k8s-local:        https://api.snapcraft.io/api/v1/snaps/details/core?channel=stable&fields=anon_download_url%2Carchitecture%2Cchannel%2Cdownload_sha3_384%2Csummary%2Cdescription%2Cdeltas%2Cbinary_filesize%2Cdownload_url%2Cepoch%2Cicon_url%2Clast_updated%2Cpackage_name%2Cprices%2Cpublisher%2Cratings_average%2Crevision%2Cscreenshot_urls%2Csnap_id%2Csupport_url%2Ccontact%2Ctitle%2Ccontent%2Cversion%2Corigin%2Cdeveloper_id%2Cpri
[15:00] <Dweller_> vate%2Cconfinement%2Cchannel_maps_list:
[15:00] <Dweller_> ==> k8s-local:        net/http: request canceled while waiting for connection (Client.Timeout
[15:00] <Dweller_> ==> k8s-local:        exceeded while awaiting headers)
[15:00] <stokachu> Dweller_: im thinking you got some network issues happening
[15:00] <kjackal_> Zic: removing and readding relations is safe, should always work
[15:01] <Dweller_> yarp.. gonna add some changes to my lan & see if I cant route that box out via a different network provider
[15:02] <Dweller_> (I have 3 exits from my lan to the internet, loadbalanced using mwan3 on openwrt)
[15:09] <stokachu> Dweller_: ok, it all came up for me
[15:10] <stokachu> Dweller_: oh i also added `apt-get remove -qyf lxd lxd-client`
[15:10] <stokachu> so that it doesn't get confused there
[15:11] <Dweller_> yeah.. thats what broke me yesterday evening.. although I'm suspecting now thats when my network went nuts, rather than it being the cause
[15:13] <stokachu> Dweller_: ack
[15:16] <Zic> kjackal_: "kube-control relation removed between kubernetes-worker and kubernetes-master."
[15:16] <Zic> is it good?
[15:19] <kjackal_> Zic: sure add it back
[15:20] <kjackal_> there is also the relation kube-api-endpoint missing between master and worker
[15:20] <kjackal_> Zic: ^
[15:21] <kjackal_> actually if you do a juju add-relation kubernetes-master kubernetes-worker you will see the two relations that need to be added between master and worker
[15:27] <Zic> kjackal_: yup, I did that, now the master is in "blocked / Waiting for workers"
[15:27] <Zic> I'm waiting a bit :)
[15:29] <Zic> but nothing much is happening now in "juju debug-log"
[15:36] <kjackal_> Zic: did you add the kube-control relation between master and worker? This message is shown when the relation is not there: https://github.com/kubernetes/kubernetes/blob/master/cluster/juju/layers/kubernetes-master/reactive/kubernetes_master.py#L420
[15:38] <Zic> kjackal_: yup, I re-ran the get_states command after: http://paste.ubuntu.com/25521483/
[15:40] <kjackal_> Zic: did you also add the kube-api-endpoint relation? https://github.com/kubernetes/kubernetes/blob/master/cluster/juju/layers/kubernetes-master/reactive/kubernetes_master.py#L432
[15:41] <kjackal_> Zic: workers need to know where the api-server is
[15:44] <Zic> # juju add-relation kubernetes-worker kube-api-endpoint
[15:44] <Zic> ERROR application "kube-api-endpoint" not found (not found)
[15:44] <Zic> hmm?
[15:44] <kjackal_> wait Zic this is a relation between master and workers
[15:45] <kjackal_> should be something like: juju add-relation kubernetes-master:kube-api-endpoint kubernetes-worker:kube-api-endpoint
[15:45] <kjackal_> Zic: ^
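Putting the repair steps from this exchange together, the remove/re-add cycle for the missing master/worker relations looks roughly like this (relation names as discussed above; run against your own model):

```shell
# drop and re-establish the kube-control relation mentioned earlier
juju remove-relation kubernetes-master:kube-control kubernetes-worker:kube-control
juju add-relation kubernetes-master:kube-control kubernetes-worker:kube-control

# re-add the api endpoint relation so workers can find the api-server
juju add-relation kubernetes-master:kube-api-endpoint kubernetes-worker:kube-api-endpoint

# then confirm the reactive states on the master
juju run --unit kubernetes-master/0 'charms.reactive --format=yaml get_states'
```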
[15:46] <Zic> thanks, it works for now
[15:46] <kjackal_> Awesome
[15:47] <kjackal_> I have to go Zic
[15:47] <Zic> thanks anyway for your help kjackal_ ;)
[15:47] <kjackal_> Should be back in a few hours, sorry
[16:10] <Fallenour> @jamespage hey Im back, survived hurricane minus modest house damage.
[16:10] <Fallenour> I didnt see any of the last replies though after you went to your hotel. Did you get the ceph output?
[16:50] <Fallenour> can anyone help me out with a ceph issue?
[16:52] <stormmore> o/ juju world... hey rick_h I have started to build my first charm :)
[16:54] <rick_h> stormmore: woot woot
[16:54] <rick_h> stormmore: whatcha building?
[16:54] <stormmore> rick_h: sub-ord charm for etckeeper
[16:54] <Fallenour> @rick_h @stormmore hopefully not ceph, talk about a rough start
[16:55] <stormmore> Fallenour: thankfully not, I know that charm well though
[16:55] <stormmore> Fallenour: well well-ish even
[16:56] <Fallenour> @stormmore its giving me nothing short of pure hell. Built an entire openstack charm, got it working EXCEPT for ceph LOL
[16:57] <rick_h> Fallenour: yay on openstack but :/ on ceph. Sorry, not an expert there.
[16:57] <Fallenour> @stormmore so pretty much, I have a fully blown, rocking hard openstack, it just has the memory of a goldfish LOL
[16:57] <stormmore> Fallenour: why build when there is a good bundle already avialable?
[16:58] <Fallenour> @stormmore I did use that build, but I also built a separate one with hyperscalability. The current issue with the current trusty charm built in for newton is that its not ocata, and its trusty, at least last I checked. As for the openstack base from charmers, its only for 3, and I need at least 5 ceph-mon boxes to handle all the future storage add
[16:58] <rick_h> stormmore: cool on the sub for etckeeper.
[17:00] <stormmore> Fallenour: ah Trusty enough said! As far as being able to scale, I haven't seen any issue with add-unit from the main bundle
[17:00] <Fallenour> @stormmore @rick_h Current upgrading is being a huge pain in the ass though, and its holding up my project really bad
[17:01] <Fallenour> @stormmore The concern isnt the add to, the issue is drive management requests once you get over several Petabytes. My end objective is currently over 500PB for the current project build as is. 3 Ceph-Mon systems cant manage that many requests.
[17:02] <stormmore> Fallenour: oh I get that but you should just have to manipulate the bundle yaml
[17:03] <stormmore> put in the number of units you want and the placements
[17:03] <Fallenour> @stormmore yea the overall idea is fix the issues at small scale, then implement to yaml, push to Juju / Salt power combo, and scale like a mad man
[17:03] <stormmore> I have been playing with OS on LXD on my poor laptop
[17:03] <Fallenour> @stormmore the current issue is if I cant get 5 ceph-osd / ceph-mon nodes to work, there's no way im gonna get 5000 to play nice
[17:06] <stormmore> Fallenour: oh I understand the problem, just don't really see it as a charm problem, more of a charm bundle
[17:07] <Fallenour> @stormmore Well it kind of is. The issue is if the basic charms dont deploy correctly, which they didnt, I cant trust them to work at scale
[17:07] <catbus> Fallenour: how do they not deploy correctly? What's the symptom?
[17:08]  * stormmore is still wondering why Trusty not Xenial
[17:08] <Fallenour> @catbus ceph is down, even though they show 3 of the 5 nodes in openstack horizon, but are missing two of the larger nodes.
[17:08] <Fallenour> @catbus what makes it more confusing is it shows 5/5 and 5/5 respectively
[17:09] <zeestrat> @Fallenour is there a copy of the bundle you're deploying?
[17:09] <Fallenour> @catbus checks of the /etc/ceph/ceph.conf show all configs configured properly, but health outputs show pgs not building properly, all 196 pgs are stuck for some reason, and its not responding to ceph pg repair commands
[17:10] <Fallenour> @zeestrat yea, its the standard that ships with juju, so like millions of copies
[17:10] <Fallenour> @zeestrat the only difference is after build failure, I simply ran the upgrade commands and let it upgrade to xenial.
[17:11] <Fallenour> @zeestrat my first thought was that it needed to upgrade in order to work, so I pushed upgrade across all, but still didnt resolve issue.
[17:12] <Fallenour> @catbus @zerestrat @stormmore I can dump ceph health, ceph tree, and ceph -s if that helps
[17:17] <catbus> Fallenour: I am no ceph expert but I can look up if I see something suspicious. Can you also show juju status?
[17:18] <catbus> in a pastebin.ubuntu.com
[17:26] <Fallenour> @catbus @zeestrat @stormmore @rick_h Here is the full paste, includes: juju status, ceph health, ceph health detail, ceph pg dump_stuck unclean
[17:26] <Fallenour> http://pastebin.ubuntu.com/25522179/
[17:27] <Fallenour> my thoughts on a nuclear option: ceph osd force-create-pg <pgid>
[17:28] <stormmore> Fallenour: have you looked at the juju logs for the osds? /var/log/juju/unit-*?
[17:28] <catbus> Fallenour: You see "No block devices detected using current configuration" for ceph-osd units in juju status?
[17:28] <Fallenour> @catbus yea, I saw that. I saw this too: juju run --unit ceph-mon/1 'ceph osd stat'      osdmap e35: 0 osds: 0 up, 0 in             flags sortbitwise,require_jewel_osds
[17:29] <stormmore> Fallenour: what catbus said is what I am getting at. I suspect that it can't create the PGs cause it doesn't know what block devices to use
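A quick way to confirm the "no OSDs registered" theory above, using the same juju-run pattern as in the log (unit names taken from the conversation):

```shell
# 0 osds up/in means PGs can never become active, no matter how many
# `ceph pg repair` commands are issued
juju run --unit ceph-mon/0 'ceph osd stat'

# show which (if any) OSD daemons have registered, and where
juju run --unit ceph-mon/0 'ceph osd tree'

# per-PG detail on why health is degraded
juju run --unit ceph-mon/0 'ceph health detail'
```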
[17:29] <catbus> Fallenour: thanks for using conjure-up. conjure-up with openstack-base is using this bundle https://api.jujucharms.com/charmstore/v5/openstack-base/archive/bundle.yaml, which specifies /dev/sdb for ceph-osd to use.
[17:29] <Fallenour> @catbus @stormmore No, not yet. Power issues kept me from going much further.
[17:30] <Fallenour> @catbus yea, conjure up is pretty amazing. I used the current one that ships with the up to date conjure-up. Thats my big question
[17:30] <catbus> Fallenour: you can modify the ceph-osd config when you select openstack with novaKVM.
[17:30] <Fallenour> @catbus why do 3 of the 5 show as working, yet none of them are working?
[17:31] <catbus> after you select the spell, it will present all the service configurations, you can select ceph-osd and find the 'osd-devices' to modify accordingly to your environment.
[17:31] <Fallenour> @catbus Thats exactly what I did, I did the configure, added 2 machines, added OSD to bare metal, mon to lxd
[17:31] <Fallenour> @catbus thats exactly what I did
[17:31] <Fallenour> @catbus other than that, and assigning the specific machines, I let it auto deploy the rest of the way
[17:33] <Fallenour> @catbus @stormmore do you think I should rebuild?
[17:33] <catbus> Fallenour: what are the block devices you set for ceph-osd units?
[17:33] <Fallenour> @catbus Dell R610 and R710s.
[17:33] <catbus> Fallenour: how many hard disk drives are in these servers?
[17:34] <catbus> each.
[17:34] <Fallenour> @catbus 8,8,2,2,2. the 2, 2 , and 2 are the 610s, and their drives are raided. Should I break the raid?
[17:34] <zeestrat> And what are the device names for the block devices?
[17:35] <Fallenour> @zeestrat systems were autonamed with maas, so like fresh-llama and hot-seal (not my idea I swear).
[17:35] <catbus> Fallenour: you can have raid for drive numbers >=2, but for 2-drive-only servers, break the raid, so you can have 1 drive for ceph-osd to use.
[17:36] <Fallenour> @catbus, soo, break the raid, which blows away OS, let it rebuild those three, or just rerun the charm install?
[17:36] <zeestrat> @Fallenour, those sound like the hostnames of the servers. the 'osd-devices' needs a list of block devices
[17:36] <catbus> Fallenour: i'd start over since the OS will be gone.
[17:37] <Fallenour> @catbus #sadpanda :'( its gonna take a while with my current connection at 6/1
[17:38] <stormmore> catbus: couldn't Fallenour use a loopback device at least for a PoC before going to the extreme of a rebuild?
[17:39] <stormmore> or resize the OS drive and make a data partition?
[17:40] <catbus> stormmore: re-build usually takes ~1 hour for me, so I usually prefer to start over. It's up to Fallenour.
[17:40] <stormmore> those might be faster than a rebuild to confirm the setup but I would definitely do a rebuild
[17:40] <stormmore> catbus: oh I am with you there ;-)
[17:40] <catbus> Fallenour: and you can put '/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh' for 'osd-devices' in ceph-osd configuration. It will take whatever is available on the host.
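The osd-devices change described above can also be applied to a live model with the juju config command (a sketch; the device list must match the actual disks on your hosts, and per the log ceph-osd takes whatever devices are available):

```shell
# point ceph-osd at the data disks; devices not present on a host are ignored
juju config ceph-osd osd-devices='/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh'

# then watch for the OSDs registering with the monitors
juju run --unit ceph-mon/0 'ceph osd stat'
```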
[17:41] <Fallenour> @catbus If the rebuild is a confirmed kill, Ill take that for sure. Whatever it takes to get it up and running, I have a lot of people depending on me to get this system live, I cant keep pushing back anymore
[17:42] <catbus> Fallenour: if you need professional service with SLA, Canonical provides that, you know? ;)
[17:43] <Fallenour> @catbus I really wish I could, but Im already eating massive losses by giving everything away for free now as is
[17:45] <catbus> Fallenour: https://jujucharms.com/ceph-osd/245 for ceph-osd charm configuration reference.
[17:47] <stormmore> just out of curiosity, does anyone have an example of a charm with an action unit test using action_get?
[17:50] <Dweller_> stokachu: network gremlins must like me again.. back up and running, and managed to use the registry action on the worker node to add the registry running in the host, and have done a simple test of a custom image pushed / built to that docker from outside the vm, and then deployment/exposing of that image via kubectl from outside.. and it worked (more or less.. I need to sort out my image a little more, but it did send back my
[17:50] <Dweller_> ctx root not found error page from liberty running in the container)
[17:50] <Dweller_> so at this point.. I now have a kube-in-a-box =)
[17:53] <stokachu> Dweller_: \o/
[17:55] <stokachu> Dweller_: if possible please post a blog post or something on your setup
[17:55] <stokachu> so we can share that out
[17:56] <Dweller_> yep, there'll be one via the gameontext blog site .. I'll ping you a preview if possible to give you a chance to point out if I've been idiotic somewhere ;p
[17:58] <Dweller_> gameontext.org is a microservice based text adventure that a bunch of us contribute to as a way to learn about microservices, and related technologies.. each room in the game is its own microservice, and the core is built with a 12 factor approach.. it's currently running in IBM Bluemix, and we regularly post bits about it =)
[17:58] <Dweller_> in the not too distant future, we're planning to move the core from docker-compose to k8s, and part of that story includes figuring out a sensible local development story for that
[17:59] <Fallenour> @Dweller_ Once I get all this running, id be more than happy to host your project for free. its right up our alley of projects we support
[17:59] <Dweller_> and having kube in a box that we can target for docker builds & k8s deploys, fulfils that pretty neatly
[18:00] <Dweller_> Thanks for the offer, but we're not paying for it at the mo either =) (Full disclosure, I work for IBM, in the Cloud Native Java team, so this kinda stuff is important to us too)
[18:00] <Dweller_> At somepoint, I should really look into what it would take to add bluemix to the set of juju supported clouds too =)
[18:01] <Fallenour> @Dweller_  LOL! Well thats definitely a good benefit to have XD. You guys ever think of cleaning out your DCs to upgrade, give me a call. We will decom and drag the gear off for free. A lot better than paying 100000k+ a quarter for decom.
[18:01] <stokachu> Dweller_: really happy conjure-up helps with that
[18:02] <stokachu> and by extension juju
[18:02] <Dweller_> Yeah.. problem with IBM is its huge.. like 400k employees worldwide huge, which means I have virtually no visibility over that stuff.. I work remote out of Montreal CA, my team is based in UK, Austin, and New York =)
[18:03] <Dweller_> stokachu: its a nice solution.. I like it.. it has a lot of scope for expansion and experimentation.. which makes it much better suited to my goals, than say, minikube
[18:03] <Fallenour> @dweller_ yea I saw a similar issue with AT&T and Verizon. Nobodies knows where the gear comes or goes from or to, just that it does haha
[18:03] <stokachu> Dweller_: awesome, feature requests welcomed too if you think something conjure-up could do to help out
[18:04] <Dweller_> aye.. when I worked out of UK we used to know a few ppl in goods inwards.. and once or twice heard about equipment being skipped that we got to salvage
[18:04] <Dweller_> but that was like once or twice in 17 years
[18:05] <Dweller_> that said.. I ended up with a large bag of 72pin simms which is still handy for Amiga's and stuff today
[18:11] <Dweller_> stokachu: from an education perspective, is there a way to have conjure-up say what it's doing? .. I love that it can do it all for me.. but I also want to know what it did.. I can mostly figure it out now by going and looking at the stuff in github.. conjure up feels like a big macro engine for juju ;p and juju is like a swiss army knife.. where I don't totally understand what the blades are, or how many it has ;p
[18:12] <stokachu> Dweller_: yea true, cory_fu  and I were kicking around an idea at one point where we basically record what the juju equivalent commands are during each step of the deployment
[18:14] <Dweller_> mebbe a variant on headless where it writes out a script with the juju commands it would have run.. then you can edit & run the script
[18:14] <Fallenour> @catbus @stormmore @rick_h @Dweller_ The issue is definitely my raid. If someone has the same issue as me in the future, tell them to break the raid on their Nova-Compute nodes, and make sure their dedicated storage nodes have at least 2 PDs per Span, with at least 2 Spans. Otherwise the install will fail, and the rebuild will be a sad drink of coffee
[18:15] <rick_h> Fallenour: :( sucky
[18:15] <rick_h> Fallenour: glad you found some root cause to attack
[18:15] <rick_h> ty catbus stormmore and Dweller_ for helping out
[18:15] <stormmore> not a problem rick_h
[18:15] <Dweller_> aye.. fwiw, I stopped using RAID, in favour of file duplicating stuff like snapraid / drivepool etc
[18:16] <stormmore> glad to when I can
[18:16] <Fallenour> @rick_h yurp. on the funny note, when I pulled one of the drives out to physically check it, the sled was empty LOL, so I got a great laugh outta that one. Lesson Learned, Dont build boxes while beer:30, will not end well XD
[18:16] <rick_h> Fallenour: hah
[18:16] <stormmore> Fallenour: lol
[18:16] <rick_h> Fallenour: #lifelessons
[18:16] <Dweller_> have over 70tb running under windows using drivebender pooling.. and about half that again using snap/flex raid on linux
[18:41] <rick_h> anyone gotten ssl and haproxy playing nice? I'm trying to get the charm to proxy something with ssl termination on it
[18:48] <bdx> rick_h: I've put a bit of time in there
[18:48] <rick_h> bdx: I've put a ssl_key and ssl_cert and I see the unit did create a valid .pem file but the config written doesn't do any ssl config
[18:49] <rick_h> bdx: I'm missing flipping some bit in the charm I'm thinking
[18:49] <bdx> rick_h: https://gist.github.com/jamesbeedy/d587cbf048038fb274ef4cd55c4ee3dd
[18:50] <rick_h> bdx: ah, so you setup the services yourself bummer
[18:50] <bdx> yeah ...
[18:50] <bdx> my way is the simplemans way
[18:51] <rick_h> bdx: heh, I wanted simpler: the charm to go "oh I see you like you some ssl here" :P
[18:51] <Fallenour> @dweller_ @rick_h @catbus @stormmore @stokachu do any of you happen to know of a guide or collection of guides to publish a charm bundle? Im building a self-configuring, audit compliant cloud environment, and I want to make it available via the charm store, how do I do that?
[18:51] <bdx> rick_h: you can provide that info via relation too, instead of making it static in the config
[18:52] <bdx> the reverseproxy relation is difficult because of the formatting
[18:52] <rick_h> Fallenour: https://jujucharms.com/docs/stable/charms-bundles is the start
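[Editor's note: beyond the docs link above, publishing a bundle in this era went through the charm-tools CLI. A hedged sketch, assuming the `charm` command is installed and `~myuser/mybundle` stands in for your own namespace and bundle name (both hypothetical); verify the exact flags against the charm store docs:]

```shell
# Sketch only: charm-store publishing workflow of the juju 2.x era.
charm login                                        # authenticate against the charm store
charm push . cs:~myuser/bundle/mybundle            # upload the bundle dir; prints a revision URL
charm release cs:~myuser/bundle/mybundle-0 --channel stable   # make that revision public
```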
[18:52] <rick_h> bdx: yea, gotcha. K, I'll poke at it. TY for your sample config
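[Editor's note: the gist bdx links is not reproduced here, but the haproxy charm's `services` config option is the usual place to wire up SSL termination by hand. A minimal sketch, assuming the charm's documented `services` YAML shape; the service name, backend address, and ports are hypothetical, and `crts: [DEFAULT]` is the documented way to reference the cert installed from the `ssl_cert`/`ssl_key` options — check the charm README before relying on this:]

```yaml
# services.yaml -- hypothetical haproxy charm `services` config sketch
- service_name: my_web_app        # placeholder name
  service_host: "0.0.0.0"
  service_port: 443
  crts: [DEFAULT]                 # terminate SSL with the cert from ssl_cert/ssl_key
  service_options:
    - mode http
    - balance leastconn
  servers:
    - [backend_1, 10.0.0.10, 8080, check]   # placeholder backend
```

Applied with something like `juju config haproxy services="$(cat services.yaml)"`.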
[19:13] <Fallenour> ok here we go
[19:13] <Fallenour> wish me luck. if this all goes well, im live. If not, im probably gonna cry. grown men shouldnt cry. at least not over spilt bits
[19:44] <rick_h> mhilton_: interesting, that looks close to what I'm doing. I wonder what I've got off.
[19:44] <rick_h> mhilton_: does the controller website charm add the service with https then I wonder?
[19:46] <rick_h> mhilton_: oh hmm, that seems to be non-https setup. interesting
[20:15] <xarses_> so I changed the password that is in my cloud credentials yaml file, ran juju update-credentials, how can I ensure that these are the credentials the model is using now? P.S. the credential keeps locking out and is shared with some other systems, so I can currently neither prove nor disprove that juju is the problem
[20:19] <rick_h> xarses_: hmmm...juju add-unit and check the dashboard?
[20:20] <rick_h> xarses_: this is something we're actively working to improve right now as it's come up with folks that need to swap credentials on running models so I admit it's kind of sucky atm
[20:20] <xarses_> story of my life ...
[20:22] <rick_h> xarses_: we need to get you a better life
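[Editor's note: a hedged sketch of the credential-refresh flow being discussed, using juju 2.x-era commands; the cloud name `mystack` and credential name `admin-cred` are placeholders, and the actual command is `update-credential` (singular) in most 2.x releases:]

```shell
# Sketch: push an edited credential to the controller and inspect the result.
juju update-credential mystack admin-cred   # upload the locally edited credential
juju credentials --format yaml              # confirm what juju now holds locally
juju models                                 # then watch whether running models pick it up
```

As rick_h notes, there was no clean way at the time to prove a *running* model had switched over, which is the gap being worked on.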
[20:22] <Fallenour> @rick_h @stokachu wait..we? Youve both said "we" and "working", are both of you part of the official juju team? o.O
[20:22] <rick_h> Fallenour: yes, stokachu works on conjure-up and I work on jaas
[20:22] <Fallenour> 8O
[20:22] <Fallenour> so...
[20:22] <rick_h> Fallenour: so we're canonical folks working around the juju community of projects
[20:23] <Fallenour> if plebian is like...1
[20:23] <Fallenour> and godmode is like a 10
[20:23] <Fallenour> you guys are like...35?
[20:23] <rick_h> hah, no. we're like 5 or 6
[20:27] <xarses_> rick_h: if you are going to keep a bucket list: credential validation, always using env_vars for openstack, and supporting clouds.yaml + secrets.yaml (from openstack_cloud_config)
[20:27] <xarses_> autoadd doesn't support the last
[20:27] <rick_h> xarses_: interesting on the openstack bits. Can you file a bug on those and I can group them into the credential mgt discussion?
[20:28] <rick_h> xarses_: no, but does normal add-credential support a file?
[20:28] <rick_h> xarses_: if not that should be the right way to grab a standard file I think
[20:28] <rick_h> it's kind of how the gce one works. We just accept the json file it dumps out
[20:42] <xarses_> rick_h: file? ya, in the juju format, sure, but that's not how any of the other providers lead you to storing credentials to use against them
[20:44] <rick_h> xarses_: huh? I missed that sorry
[20:44] <rick_h> xarses_: you mean the secrets.yaml?
[20:44] <rick_h> oh sorry, I thought you meant that openstack_cloud_config would dump a file
[20:45] <xarses_> "does normal add-credential support a file" I think so, but not the clouds.yaml format
[20:45] <rick_h> xarses_: gotcha
[20:46] <xarses_> rick_h: they use different key names
[20:47] <xarses_> I'd hope that auto-add would be able to scan it, but add from file, I wasn't holding my breath
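[Editor's note: the key-name mismatch xarses_ is describing, sketched side by side. Both fragments are illustrative with placeholder values; the juju shape follows the documented `credentials.yaml` format, while OpenStack's `clouds.yaml` nests everything under an `auth:` block with different key names, which is why one can't be auto-scanned as the other:]

```yaml
# Juju credentials.yaml (roughly):
credentials:
  mystack:                 # cloud name (placeholder)
    admin-cred:            # credential name (placeholder)
      auth-type: userpass
      username: admin
      password: s3cret
      tenant-name: admin

# OpenStack clouds.yaml (roughly):
clouds:
  mystack:
    auth:
      auth_url: http://keystone.example:5000/v3
      username: admin
      password: s3cret
      project_name: admin
```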
[20:56] <Fallenour> @rick_h @stokachu @catbus @stormmore so far it looks like exact same issue, even with raids broken.
[21:00] <catbus> Fallenour: can you show juju status in a pastebin.ubuntu.com?
[21:01] <Fallenour> @catbus Give me a bit, it looks like its finalizing now. might take a moment.
[21:41] <Fallenour> @catbus It was the exact same issue as last time. Same output. This time all I'm gonna do is conjure-up, nova-kvm install, assign devices via standard configure, no extra machines, deploy all 16. If this fails, I'm not insane, and something via conjure-up simply doesn't work
[21:56] <Fallenour> Any ideas as to why I keep getting a neutron-gateway/0 "hook failed: "config-changed"" error? This time it was totally native, no changes or additions, a plain run of conjure-up @stokachu
[21:56] <stokachu> Fallenour: you need to `juju ssh neutron-gateway/0`, then on the unit run `cd /var/log/juju; pastebinit unit-neutron-gateway-0.log`
[21:56] <stokachu> i imagine it is because it can't find the interface
[21:57] <stokachu> that's that whole port mapping thing i pointed you to on sunday
[21:57] <Fallenour> @stokachu alright, I'll do that once the install is finished. Hopefully that will be the only issue. I imagine if it is, a simple change of the interface and a reboot will fix that?
[21:58] <Fallenour> @stokachu I hate to ask, but can you relink me that info? I lost everything when the hurricane hit and wiped out power :(
[21:58] <stokachu> Fallenour: so you'll want to juju config neutron-gateway <key>=<value>
[21:58] <stokachu> then juju resolved neutron-gateway/0
[21:59] <stokachu> Fallenour: https://jujucharms.com/neutron-gateway/238 look under Port Configuration
[21:59] <stokachu> specifically note:
[21:59] <stokachu> If the device name is not consistent between hosts, you can specify the same
[21:59] <stokachu> bridge multiple times with MAC addresses instead of interface names. The charm
[21:59] <stokachu> will loop through the list and configure the first matching interface.
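[Editor's note: the port-mapping fix stokachu is pointing at, as a command sketch. The bridge, interface name, and MAC addresses below are placeholders (check `ip link` on your hosts); the MAC-based form is the one the charm docs quoted above recommend when device names differ between hosts:]

```shell
# Sketch: map the external bridge to a data port on the gateway charm.
juju config neutron-gateway data-port="br-ex:eth1"

# Or, if interface names vary across hosts, repeat the bridge with MACs;
# the charm configures the first interface that matches:
juju config neutron-gateway data-port="br-ex:00:16:3e:aa:bb:cc br-ex:00:16:3e:dd:ee:ff"

# Then retry the failed hook:
juju resolved neutron-gateway/0
```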
[22:01] <catbus> Fallenour: can you confirm 'same output' as "No block devices detected using current configuration" for ceph-osd units in juju status?
[22:12] <Fallenour> @catbus yea on the previous build it was the exact same. Im doing a very generic conjure-up this time, no additional machines, no additional services
[22:12] <Fallenour> @catbus so far, it looks good, but only time will tell.
[22:13] <stokachu> For ceph do your machines have 2 disks?
[22:13] <catbus> ok.
[22:40] <xarses_> @rick_h: never created the machine, so I'm not sure but I think juju is stuck with my dead credentials
[22:41] <xarses_> and now it won't destroy, because a machine is in pending state
[22:43] <stormmore> does anyone know how to install a trusty charm? I am aware of the Ubuntu charm but I have only managed to get it to install Xenial so far
[22:46] <xarses_> just set the series, or declare it explicitly if the charm has multiple
[22:46] <xarses_> https://jujucharms.com/docs/2.2/charms-deploying
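[Editor's note: the two forms xarses_ and catbus are describing, spelled out as a sketch (juju 2.x syntax):]

```shell
# Pin the series via the charm store URL, as catbus suggests below:
juju deploy cs:trusty/ubuntu

# Or set the series explicitly, if the charm supports multiple series:
juju deploy ubuntu --series trusty
```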
[22:48] <Fallenour> @catbus @stokachu @stormmore @rick_h same exact issue, completely native install this time. What gives? Is the conjure-up instance simply not working natively? Should I use a different install bundle?
[22:50] <catbus> Fallenour: what's the issue exactly? Please show error messages.
[22:50] <Fallenour> @catbus hang on. I did a raid 0 thinking it would split the disks, lemme rebuild....again
[22:50] <stormmore> Fallenour: I would have to see what you are putting in for the devices for ceph-osd
[22:51] <stormmore> Fallenour: it still sounds like ceph-osd is not finding the drives where it is looking to me
[22:52] <catbus> stormmore: juju deploy cs:trusty/ubuntu?
[22:52] <stormmore> catbus: that is what I was wondering :) trying it now
[22:53] <Fallenour> @stormmore I used a raid 0 instead of simply leaving the drives as is.
[22:54] <stormmore> Fallenour: then ceph needs a loopback device of some sort to act as a virtual drive
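[Editor's note: a hedged sketch of the two ways out of "No block devices detected" being discussed: point ceph-osd at real spare disks, or back it with a loopback device as stormmore suggests. Device paths and the image size are placeholders, and the loopback route is a testing-only assumption, not something for production:]

```shell
# Option 1: give ceph-osd real, unpartitioned spare disks (not a RAID 0 of them):
juju config ceph-osd osd-devices="/dev/sdb /dev/sdc"

# Option 2 (test rigs only): fake a disk with a loopback device.
truncate -s 10G /srv/osd0.img
sudo losetup /dev/loop0 /srv/osd0.img
juju config ceph-osd osd-devices="/dev/loop0"
```

Either way, `juju status` should stop showing the "No block devices detected" message for the ceph-osd units once a usable device is found.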