=== CyberJacob is now known as zz_CyberJacob
=== urulama is now known as urulama|afk
=== zz_CyberJacob is now known as CyberJacob
=== CyberJacob is now known as zz_CyberJacob
=== urulama|afk is now known as urulama
=== zz_CyberJacob is now known as CyberJacob
[13:07] morning
[13:31] howdy
[13:42] jamespage do you know if I can get juju + aws to automatically attach ebs drives for ceph to use as a block device?
[13:42] Icey, probably
[13:42] Icey, andrew is the right person to ask
[13:44] Icey: you'll want to use the experimental storage stuff axw has been working on
[13:51] Hi folks.. can anyone help me with what's happening here? Hit this twice today in a setup that was previously working fine.. destroyed and re-created my environment (MAAS), deploying the first service, juju-gui, to the same machine I bootstrapped onto (which is libvirt).. stuck on "filter.go:137 tomb: dying -> leadership failure: leadership manager stopped" - there are some earlier errors about i/o errors/losing communication - full log; http://lathi.at/fil
[13:52] not entirely sure which of the errors is more related to the actual issue, and what that is
[13:59] lathiat: didn't get the full log link, could you supply it again?
[13:59] full log; http://lathi.at/files/juju-leadership.txt
[14:00] I am guessing the real issue is the EOF stuff much earlier, but I'm a bit lost with it from there
[14:00] maybe related: https://bugs.launchpad.net/juju-core/+bug/1493123
[14:00] Bug #1493123: Upgrade in progress reported, but panic happening behind scenes
[14:01] lathiat: that's odd, I've not seen that
[14:01] thing is, I ran into this issue just before, but I was also getting an error about machine 7 not existing, which I had previously force destroyed; assumed I had corrupted something.. decided to try and start over, and I'm hitting the same thing now.
=== JoshStrobl is now known as JoshStrobl|AFK
=== CyberJacob is now known as zz_CyberJacob
=== zz_CyberJacob is now known as CyberJacob
[16:34] beisner, lazyPower: deploying trusty-kilo-ha with next charms eliminated 99% of the problems I was experiencing :-)
[16:35] hi bdx, good to hear - we're feverishly working on the release process this wk.
[16:36] beisner, lazyPower: Good to hear! There is one remaining issue I can't seem to get around that still exists out of the issues I was experiencing when deploying kilo-trusty-ha with trunk charms
[16:36] that is, I can't query the keystone vip api endpoint....
[16:36] bdx, grain of salt: next charms are not recommended for production as they are generally in active dev
[16:37] bdx, well that would be problematic ;-)
[16:37] beisner: totally.
[16:37] bdx, enabling keystone ssl by chance?
[16:38] beisner: I'm totally down, what reasoning have you, if any?
[16:38] besides security
[16:38] ha
[16:39] bdx, just wondering, as the client connections get trickier. if not specifically needed, it's less hassle (and less secure, of course) to use the default non-ssl.
[16:39] so nvm me
[16:39] totally, ok
[16:40] you have never experienced this?
[16:40] bdx, gotta run. if you'd like input, I'd start with inspecting your sanitized novarc / openstackrc file or env vars + keystone --debug catalog + keystone --debug token-get.
[16:40] that info might give insight
[16:41] + juju stat --format tabular :)
[16:42] also might do a sanity check that the services are all running, and that the ips are in place on each unit's nic.
[16:42] o/
[16:42] beisner: charmconf.yaml <- http://paste.ubuntu.com/12886818/
[16:42] juju status --format tabular <- http://paste.ubuntu.com/12886819/
[16:44] I'm not worried about the dashboard relation; I am currently troubleshooting it, as it depends on the keystone api as well
[16:46] which is why I think it's giving me grief and status shows "Incomplete relations: identity"
[16:47] beisner: oooh, just saw ^^
[16:47] nice, thanks
[16:47] later
[16:55] bdx, yeah, that's the new workload status. it gives a lot better feedback as to what is going on throughout the deployment steps, and through managing the thing longer term.
[17:28] mgz, I made updates to https://code.launchpad.net/~jog/juju-ci-tools/centos_deploy_stack/+merge/275135
[17:31] jog: land it
[17:31] mgz, thanks
[18:44] jamespage, marcoceppi, gnuoy, beisner, lazyPower: After toying with failing openstack ha deploys, most of the issues I was experiencing have been resolved. I have found the primary issue(s) that still exist in next charms concerning ha deploys.... the issue is that service charms do not get the keystone vip in their .conf files, and keystone endpoints get created for non-vip service endpoints
[18:44] ^^ resolved in next branches*
[18:46] After manually making the needed modifications to the endpoints in the keystone.endpoints table and correcting each of the charms' configs to include the keystone vip endpoint, I have a working ha stack
[18:46] !
[18:46] yea!
[18:48] bdx: none of the service charms get the keystone vip? Or is there a specific charm? Also, do you have the bundle you are deploying from so I can see?
[18:48] I'll file bugs on these issues. It would be nice to see these things fixed in the 15.10 release so those of us looking for HA stacks have a somewhat stable answer, instead of heading into the 15.10 release with HA still borked
[18:48] bdx: fantastic. Let me know when those bugs are filed
[18:48] thedac: juju status --format tabular <- http://paste.ubuntu.com/12887639/
[18:49] thedac: deployer.yaml <- http://paste.ubuntu.com/12887642/
[18:49] thanks
[18:50] thedac: that's correct, none of the service charms get the keystone vip
[18:50] also keystone.endpoints has all non-vip entries
[18:51] ok, I'll take a look today
[18:51] thedac: awesome! thanks!
[19:05] thedac, openstack-charmers, core: https://bugs.launchpad.net/charms/+source/keystone/+bug/1508575
[19:05] Bug #1508575: Keystone DB gets all non vip endpoints + openstack service conf files get keystone non vip
[19:05] boom
[19:12] bdx: thanks
[19:13] thedac: NP, thanks for looking into this!
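A hypothetical sketch of the kind of manual workaround bdx describes above: re-registering the keystone catalog endpoints against the VIP. The credentials and the use of python-keystoneclient here are illustrative assumptions (bdx's actual fix was a direct edit of the keystone database), and the two addresses are the VIP and unit address that come up later in this log:

    # Illustrative only: replace unit-address catalog entries with VIP-based ones.
    from keystoneclient.v2_0 import client

    VIP = 'http://10.16.100.34'    # keystone VIP (placeholder)
    UNIT = 'http://10.16.100.72'   # individual keystone unit address (placeholder)

    keystone = client.Client(username='admin', password='secret',
                             tenant_name='admin',
                             auth_url=VIP + ':35357/v2.0')

    for ep in keystone.endpoints.list():
        urls = (ep.publicurl, ep.adminurl, ep.internalurl)
        if any(UNIT in url for url in urls):
            # re-create the endpoint pointing at the VIP, then drop the old entry
            keystone.endpoints.create(
                region=ep.region,
                service_id=ep.service_id,
                publicurl=ep.publicurl.replace(UNIT, VIP),
                adminurl=ep.adminurl.replace(UNIT, VIP),
                internalurl=ep.internalurl.replace(UNIT, VIP))
            keystone.endpoints.delete(ep.id)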
[19:29] hello bdx
[19:29] bdx, we're a bit late for the 15.10 release for any more bugs today (as it's tomorrow)
[19:30] bdx, I am keen to understand the problems you're having - as thedac and others have noted, we've run our own internal QA cloud for 1.5 years in an HA deployment through three openstack series upgrades
[19:45] bdx, I added a comment to that bug - really need to see the output of "sudo crm status" on any of the haclustered services
=== JoshStrobl|AFK is now known as JoshStrobl
[20:10] jamespage: here's the output of "sudo crm status" on a keystone node -> http://paste.ubuntu.com/12888152/
[20:11] bdx, ok - that looks fine
[20:11] * jamespage thinks
[20:12] bdx, could you do the following and pastebin the output
[20:12] of course :-)
[20:13] bdx, juju run --unit keystone/0 "relation-ids ha"
[20:13] and then
[20:13] jamespage, ha:81
[20:13] juju run --unit keystone/0 "relation-get -r - keystone-hacluster/0"
[20:14] bdx, for that next one you'll need to use ha:81 and the unit name for the paired hacluster unit
[20:14] clustered: "yes"
[20:14] private-address: 10.16.100.72
[20:15] bdx, well again that looks ok - next link...
[20:15] clustered = yes is the good bit there
[20:16] jamespage: let me note, a) this is a repeated issue across 15x deploys of trunk and next, and b) on every deploy, the service clusters form without error for each service
[20:17] bdx, yeah - just puzzled as to why that's not propagating out correctly
[20:18] bdx, the clustered=true triggers a re-run of relation hooks where things need to be changed, and the code that determines endpoint resolution should detect the same thing and start using the VIPs
[20:19] jamespage: ok, good, how does this happen: "the code that determines endpoint resolution should detect the same thing and start using the VIP" --> the vip isn't the same as private-address: 10.16.100.72
[20:19] bdx, let's see what keystone is propagating
[20:20] bdx, there is an endpoint resolver in charmhelpers that figures that out consistently, taking into account cluster status and configuration
[20:20] I have modified all of my endpoints and .conf files to resolve the issue, and also as a proof of concept
[20:20] suffice to say with split networks it gets quite hairy, but your deployment is not doing that
[20:20] bdx, you should categorically not need to do that
[20:21] jamespage: where is the "endpoint resolver in charmhelpers"? if you don't mind?
[20:22] bdx, http://bazaar.launchpad.net/~charm-helpers/charm-helpers/devel/view/head:/charmhelpers/contrib/openstack/ip.py#L106
[20:23] bdx, can you also do: juju run --unit keystone/1 'relation-ids identity-service'
[20:23] bdx, the log data from /var/log/juju/unit-keystone-1.log would also be useful
[20:23] identity-service:41
[20:23] identity-service:45
[20:23] identity-service:57
[20:23] identity-service:59
[20:23] identity-service:70
[20:23] identity-service:89
[20:23] bdx, ok, and now
[20:24] juju run --unit keystone/1 'relation-get -r identity-service:89 - keystone/1'
[20:24] jamespage: /var/log/juju/unit-keystone-4.log <- http://paste.ubuntu.com/12888244/
[20:25] private-address: 10.16.100.72
[20:27] bdx, that's very light
[20:27] bdx, ok, can I see this (need to know which unit is leader)
[20:27] jamespage: I don't see how resolve_address could return the vip .....
[20:27] bdx, L132 should be in path
[20:29] bdx, juju run --service keystone 'is-leader'
[20:29] jamespage: and also, keep in mind that I have destroyed two keystone units and re-added units.... in case you are wondering why the log is light, and also the extra ids for past units... also I don't have debug or verbose logging on.... grrr... my bad
[20:29] jamespage: if not net_addr: will never execute ....
[20:29] bdx, oh, the relation data was light, not the log
[20:29] oh
[20:30] bdx, it will - if config is unset, None gets returned
[20:31] so not net_addr will equal True in that case
[20:31] jamespage: what config must be unset?
[20:32] bdx, os-XXX-network
[20:32] unset*
[20:32] omg
[20:32] it has no default
[20:32] jesus
[20:33] where XXX in public, internal, admin
[20:34] this is totally my bad... I should have read into that weeks ago
[20:36] jamespage: MAJOR revision to any and all docs concerning HA deploy to include ^^
[20:37] I should have gotten to the bottom of this earlier on my own, by investigating, but ...thank god
[20:38] jamespage: thanks for your help getting to the bottom of this
[20:39] bdx, have we?
[20:39] bdx, got to the bottom of this?
[20:39] just eating dinner as well, biab
[20:39] jamespage: yes! You must leave the os-xxx-network unset for vip endpoints to get set anywhere!!!!
=== natefinch is now known as natefinch-afk
[20:40] jamespage, openstack-charmers: that is the missing piece! you all have been keeping secretssssss!
[20:41] not really though.... I could have found it :-/
[20:41] :-)
[20:49] woot!
[20:50] jamespage, beisner: thanks for your help concerning this
[20:51] actually thedac just uses the aliases jamespage and beisner
[20:51] jusssst kidding.
[20:51] heh
[20:51] thedac: ^^
[20:51] bdx: fwiw, I just ran a test over lunch that proves this point. Services do get keystone's vip
[20:52] bdx, erm, that's not quite true
[20:52] jamespage: which?
[20:53] bdx, you can use vips with configurations that also use os-XXX-network
[20:53] configuration options
[20:53] bdx, vip can be a single VIP or a space delimited list, if you are splitting endpoints across networks
[20:53] jamespage: concerning the resolve_address function, I don't see how that could happen....?
[20:53] bdx, L134
[20:54] for vip in vips:
[20:54] check if in network for endpoint type
[20:54] if it is, use this one
[20:54] basically
[20:54] oooh I see
[20:54] bdx, looking at http://paste.ubuntu.com/12887642/
[20:55] I can't see that you are setting os-XXX-network config options for the keystone charm
[20:56] jamespage: shoot... you're right....
[20:58] jamespage: I must disclose.... initially one of the keystone endpoints was set to the vip in the database on my last deploy.... the keystone admin endpoint of http://10.16.100.34:35357/v2.0
[20:59] jamespage: every other endpoint was not set to the vip, including the other keystone endpoints
[21:06] bdx, the charm would have blindly configured that anyway
[21:09] bdx, did you figure out which is the lead keystone unit? I really want to see the juju log file from that one
[21:15] jamespage: here is the keystone log from the leader: http://paste.ubuntu.com/12888588/
[21:18] unit-keystone-1.log*
[21:39] jamespage, beisner, thedac: I propose I redeploy, and this time I will give the services 30 mins to settle and ensure clusters form before I add any relations.... this could rule out any possibility of timing issues with clusters not being fully formed when relations are made.
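A simplified sketch of the endpoint resolution logic jamespage describes above, loosely modelled on resolve_address() in charmhelpers contrib/openstack/ip.py. This is an illustration of the behaviour being discussed, not the actual charmhelpers code:

    # Illustration only: when clustered with a vip set, an unset os-XXX-network
    # option means net_addr is None, so the (single) VIP is used directly;
    # with os-XXX-network set, the VIP on that network is chosen instead.
    from charmhelpers.core.hookenv import config, unit_get
    from charmhelpers.contrib.hahelpers.cluster import is_clustered
    from charmhelpers.contrib.network.ip import is_address_in_network

    def resolve_address(endpoint_type='public'):
        net_addr = config('os-{}-network'.format(endpoint_type))  # None when unset
        vips = (config('vip') or '').split()
        if is_clustered() and vips:
            if not net_addr:
                # os-XXX-network unset: fall back to the VIP (single-VIP case)
                return vips[0]
            for vip in vips:
                # split networks: pick the VIP that sits on this endpoint's network
                if is_address_in_network(net_addr, vip):
                    return vip
        # not clustered (or no vip configured): advertise the unit's own address
        return unit_get('private-address')

In bdx's deployment no os-XXX-network options were set, so on the face of it the plain VIP path should have applied; the actual culprit surfaces later in the log (the vip_iface / juju-br0 issue in the ha-relation-joined hook).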
[21:40] thedac: Did you use juju-deployer in a once-through to deploy all services and relations sequentially in your test?
[21:40] bdx: yes
[21:41] and I am testing one of our ha oneshot bundles right now. I'll let you know
[21:41] thedac: sweeet!
[21:45] thedac: are you deploying ha services on containers?
[21:47] to containers*
[21:47] let me confirm.
[21:48] ah, no actually, so that may not be a valid test
[21:48] I'll test with your bundle.
[21:50] thedac: nice
[22:12] thedac, jamespage, beisner: So... I haven't set the param 'vip_iface', hence I am assuming the default of 'eth0'. Seeing as I am deploying these services to containers, the primary interface is not 'eth0', but 'juju-br0'. This is a red herring to me, and it looks like the ha-relation-joined hook could be affected.
[22:13] http://bazaar.launchpad.net/~openstack-charmers/charms/trusty/keystone/next/view/head:/hooks/keystone_hooks.py#L515
[22:15] vip_iface:
[22:15]   type: string
[22:15]   default: eth0
[22:15]   description: |
[22:15]     Default network interface to use for HA vip when it cannot be
[22:15]     automatically determined.
[22:15] so you may be on to something there
[22:19] thedac, jamespage, beisner: WAAALAAA
[22:19] thedac, jamespage, beisner: >>> import netifaces
[22:19] >>> netifaces.interfaces()
[22:19] ['lo', 'eth0', 'lxcbr0']
[22:20] no juju-br0!!!!!
[22:21] http://bazaar.launchpad.net/~charm-helpers/charm-helpers/devel/view/head:/charmhelpers/contrib/network/ip.py#L156
[22:23] bdx: so you might test with vip_iface set to juju-br0
[22:26] thedac: entirely.... what I'm pointing out.... is that netifaces.interfaces() does not recognize juju-br0!
[22:27] which would implicate the call to netifaces.interfaces() in network/ip.py as the culprit
[22:31] http://bazaar.launchpad.net/~openstack-charmers/charms/trusty/keystone/next/view/head:/hooks/keystone_hooks.py#L515 get_iface_for_address *OR* vip_iface. I think setting vip_iface and vip_cidr will fix this.
[22:36] thedac: as an example, if "get_one()" returns "one" and "get_two()" returns "two"
[22:37] thedac: and you have "one_or_two = (get_one() or get_two())"
[22:37] thedac: one_or_two == "one"
[22:38] thedac: so even if 'vip_iface' is set, it would still not return the correct iface
[22:38] sorry, I am trying desperately to get something up and running to actually validate this. But if get_iface_for_address does not return an address because juju-br0 is not in netifaces.interfaces(), then the or would work. If it does return, you are right.
[22:41] thedac: entirely
[22:43] Looking at lines 145-183, it looks like it will return None: http://bazaar.launchpad.net/~charm-helpers/charm-helpers/devel/view/head:/charmhelpers/contrib/network/ip.py#L156
[23:08] thedac: totally
[23:08] thedac: I'm redeploying with 'vip_iface' configured to juju-br0
[23:08] great, fyi, you may also need vip_cidr set
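A minimal, self-contained sketch of the fallback behaviour bdx and thedac are reasoning about at keystone_hooks.py#L515. The helper below is a simplified stand-in for charmhelpers' get_iface_for_address(), not the real implementation, and the addresses are placeholders from this discussion:

    # Because 'or' short-circuits, the configured vip_iface is only consulted
    # when the interface lookup comes back falsy (None). Inside a container
    # where juju-br0 is not reported by netifaces.interfaces(), the lookup
    # fails, so the vip_iface/vip_cidr config values are what get used.
    import netifaces
    from netaddr import IPAddress, IPNetwork

    def get_iface_for_address(address):
        # simplified: first interface whose IPv4 network contains `address`
        for iface in netifaces.interfaces():
            for cfg in netifaces.ifaddresses(iface).get(netifaces.AF_INET, []):
                if 'addr' in cfg and 'netmask' in cfg:
                    net = IPNetwork('{}/{}'.format(cfg['addr'], cfg['netmask']))
                    if IPAddress(address) in net:
                        return iface
        return None   # e.g. the VIP lives on juju-br0, which is not listed

    vip = '10.16.100.34'       # keystone VIP from the discussion (placeholder)
    vip_iface = 'juju-br0'     # the charm config option bdx ends up setting

    iface = get_iface_for_address(vip) or vip_iface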