[01:37] thedac, wolsen, gnuoy, coreycb, dosaboy - fix for bug 1485722 proposed and tested. i view this as critical as it is a deployment blocker for Vivid and later when nrpe is related to rmq. please & thanks :-)
[01:37] Bug #1485722: rmq + nrpe on >= Vivid No PID file found
[01:38] beisner, well sir let me go look
[01:42] beisner - added a comment
[01:43] beisner, please take a look - if I don't hear back I can fix it in the merge as the rest of the change looks fine
[01:45] wolsen, removed a cat :-)
[01:46] beisner, awesome, those pesky felines getting in the way :)
[01:46] cats, rabbits, meh
[01:48] beisner, this is backport-potential yes?
[01:48] wolsen, definitely
[01:48] wolsen, affects next and stable rmq + nrpe
[01:48] well just the rmq charm but nrpe users will hurt
[01:48] beisner, yuppers - though its slightly mitigated by not affecting an lts version - but ya
[01:49] wolsen, it'll be a MP blocker once those tests land ;-) then it'll get heat if it doesn't already have any.
[01:49] beisner, +1 on that
[01:49] beisner, I was noticing earlier with a proposal that jillr put up - there's not a lot of testing around some of the nrpe stuff - which is something we should mark for improvement somewhere
[01:50] wolsen, i added a comment on that fyi
[01:50] basically to hang tight while we iron out these crits
[01:50] then resubmit
[01:51] beisner, ah ok - hadn't looked at it too closely, but was trying to help jillr take a look at it
[01:51] beisner, did she retarget? I had asked her to retarget /next (honestly haven't looked at that)
[01:51] wolsen, yep. but it's running the old tests, which as we know now, lead to releasing broken-ass charms.
[01:51] beisner, ack
[01:52] wolsen, hence the hang-tight, then rebase & resubmit review. no sense in not-testing any additional features.
[01:53] biab, thanks for the check-in wolsen
[01:55] beisner, landed
[02:01] wolsen, thanks sir
[02:03] beisner, np
=== scuttlemonkey is now known as scuttle|afk
[13:17] Hi, I am trying to log in to the Juju MongoDB instance and I am getting "not authorized" errors. I am using '-u admin' and '-p ....' from the '/var/lib/juju/agents/machine-0/agent.conf' file (API and state password are the same). The 'mongod' instance was started with '--keyFile ....' but there does not seem to be an equivalent option for the 'mongo' client. Suggestions welcome.
[13:23] also curiously all three members of the replica set have different passwords. How does one member log into the other members?
[13:27] Walex: MongoDB replicaset passwords are cluster specific, so typically you log in through a mongos gateway to reach your cluster nodes
[13:27] however i'm not certain how you would log in and poke around in the Juju DB - thats a good question. The core devs probably would have some insight here
[13:28] lazyPower: http://www.metaklass.org/how-to-recover-juju-from-a-lost-juju-openstack-provider/ has a suggestion which I tried...
[13:28] natefinch: wwitzel3 - Any insight for Walex on how they can connect to the jujudb?
[13:30] lazyPower: maybe I am using the wrong terms here, I don't see any 'mongos' daemons running on the nodes. I meant perhaps the state servers.
[13:31] the 3 nodes can log into each other obviously (I see the port 37017 traffic)...
[13:32] but that's obviously done with the replica set keyfile.
[13:32] right. I'm not so familiar with how our juju stateserver is set up to be honest. i just know it exists and what function it serves
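For reference, the login being worked out above tends to take roughly this shape; the agent.conf path and the admin user are as quoted in the discussion, while the grep/awk extraction of the oldpassword field is an assumption about the file's layout:

    # pull the mongo password out of the machine-0 agent config (field name per the chat above)
    PASS=$(sudo grep oldpassword /var/lib/juju/agents/machine-0/agent.conf | awk '{ print $2 }')
    # log in against the admin database on the state server's mongod (port 37017)
    mongo localhost:37017/admin -u admin -p "$PASS"
    # some installs may also need --ssl if the server only accepts TLS connections

As Walex works out below, the same credentials also open the 'juju' database when '--authenticationDatabase admin' is passed to the mongo client.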
[13:32] the experience i speak of is from running a distributed mongodb cluster
[13:33] will wait or perhaps later send a mailing list message.
[13:33] let me see if I can remember
[13:34] you use the oldpassword field from the agent config on machine 0....
[13:35] I forget the exact incantation
[13:35] natefinch: I'll try again
[13:35] natefinch: in http://www.metaklass.org/how-to-recover-juju-from-a-lost-juju-openstack-provider/
[13:35] there is a plausible looking line
[13:36] Thanks natefinch and good morning o/
[13:36] but I use the 'oldpassword' and get "auth fails" - not sure if 'admin' is the right user
[13:36] Walex: that looks good to me. I usually just get the password the old fashioned way (copy and paste) but assuming the grep does the right thing, then yes
[13:38] ahhhhhhhhh I have just noticed my mistake: I was trying to log into the 'juju' database, not 'admin'. oops
[13:40] indeed with "/admin" that works, sorry...
[13:44] Walex: glad we could get you sorted!
[13:46] lazyPower: and I can connect directly to 'localhost:37017/juju' if I add '--authenticationDatabase admin' as an option to 'mongo'
[13:46] sorted!
[13:51] cory_fu: so, if you've got a moment - we got this far yesterday - http://paste.ubuntu.com/12205843/
[13:52] thats our reactive/nginx.py, down to implementing a relationship stub, and the super simple intro to reactive and layers is basically complete. we've mirrored what we use in charm schools to teach charming w/ docker
[13:53] lazyPower: Would it have killed you to select Python as the language to get syntax highlighting? ;)
[13:53] cory_fu: i used pastebinit?
[13:53] Ah. Fair enough
[13:55] I'm surprised pastebinit doesn't guess the format based on the file name
[13:56] papercutz
[13:56] :)
[13:59] cory_fu: thanks for the sync yesterday. that really got mbruzek and I moving. We're going to have this particular charm wrapped today and ready to move on to extending the base layer(s) and writing docs before the week is up
[14:00] lazyPower: You have a bit of a bug in your config-changed handler. It could potentially call stop_container and attempt to issue docker commands before docker is installed
[14:00] that was critical to resolving things we were doing exploratory dev for
[14:00] cory_fu: in our testing it was from install => and the entire chain ran before it hit a possible stop hook.
[14:00] what scenario would be exposed that leaves us vulnerable to calling stop before it's present
[14:02] Oh, yeah, I suppose you're right. docker.available will be set during the install hook, so it's a bit moot. Though, if your docker base layer ever changes (say to require a repo URL config or something) that could potentially delay docker install until config, it could open you up. *shrug*
[14:02] lazyPower: I was going to suggest creating an nginx.restart state that you could set
[14:02] Would be another potentially useful entrypoint for layers using this
[14:03] And would future-proof the code against the admittedly non-issue
[14:03] we thought about that, and i forget why exactly we refactored down to just dropping in a config-changed hook context vs using the state
[14:03] but it makes sense
[14:11] * mbruzek starts reading the scrollback
=== mgz is now known as mgz_
[15:06] hey lazyPower, i just deployed cs:trusty/etcd and was met with: http://paste.ubuntu.com/12206396/ have you seen that before?
[15:06] ^^ no relations or anything, just "juju deploy cs:trusty/etcd" got me there
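A quick way to pull the full unit log that mbruzek asks for just below is to read it straight off the unit; the log path is the conventional juju 1.x location, and the unit name etcd/0 is an assumption for this deploy:

    # grab a copy of the failing unit's log from the machine it runs on
    juju ssh etcd/0 'sudo cat /var/log/juju/unit-etcd-0.log' > unit-etcd-0.log
    # or, on recent juju 1.x releases, stream that unit's log lines from the state server
    juju debug-log --include unit-etcd-0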
=== scuttle|afk is now known as scuttlemonkey
[15:16] hey Kevin that looks awful
[15:18] kwmonroe: that looks like our overuse of path.py has come to bite us.
[15:18] kwmonroe: Can you paste the entire unit log? Did the pip install path.py not work?
[15:19] in the install hook
[15:19] * marcoceppi wears a smug face
[15:19] * mbruzek waves and nods at marco
[15:21] momento mbruzek, i tore that env down, but I'll fetch the logs again shortly
[15:21] kwmonroe: it looks to me like the install hook does not install pip
[15:32] I see that the Juju "command" node(s) don't run (necessarily) any daemon, and that the "state" nodes run 'mongod' from the 'juju-mongodb' package. Also I see that all nodes run 'jujud' from the unit 'tools' directories. I am about to update to 1.24.5. How do the 'jujud' binaries in each unit get updated? When? Are there ordering dependencies among the 'juju-*' packages for upgrade, and among the state servers? ...
[15:32] * Walex worries about details...
[15:38] mbruzek: i betcha you gotta do "from path import Path" (cap P on the 2nd path): http://paste.ubuntu.com/12206611/
[15:38] kwmonroe: It looks like path.py was updated today! It is possible that is not working.
[15:39] kwmonroe: we use lowercase path all over the place. whit can you help with this problem?
[15:39] mbruzek: we use cap P in our big data charms... now fight.
[15:40] kwmonroe: hmm interesting, let me check the charm code 1 sec kwmonroe
[15:40] mbruzek: lazyPower.. i'm just gonna leave this here: https://pypi.python.org/pypi/path.py
[15:49] kwmonroe: i follow that with the bug that spawned this issue
[15:49] https://github.com/jaraco/path.py/issues/102
[15:51] kwmonroe: but thanks for the heads up on the issue. We'll cut a hotfix patch and get it queued up - as we are apparently broken in the store now
[15:53] gracias lazyPower!
[15:53] kwmonroe: once we have a fix in place do you mind being the on-call reviewer for that MP? i'll stack it on what you're already reviewing so its applicable :D
[15:54] sure lazyPower, i'll be your huckleberry
[15:54] aww yeee
[16:14] kwmonroe: broken rel of path.py was just pulled from pypi
[16:14] ready for you to re-test at your leisure
[16:18] whoa juju gui just removed its crosshatched background - https://github.com/juju/juju-gui/pull/799
[16:21] confirmed lazyPower, latest deploy pulls path.py-7.7.1 and all is right with the world. would you like a tracking bug requesting s/path/Path for the inevitable time when path does finally go away?
[16:22] lazyPower: quit watching us :P
[16:22] kwmonroe: we're going to pin package deps now, and prepare for the inevitable breakage when we have the bandwidth
[16:22] thats used in a lot of places
[16:22] and we have a lot of auditing to do
[16:23] ack lazyPower. thanks to you and whit for the nudge to get path-8 out of pypi :)
[16:42] I updated packages on the master node, and now 'juju upgrade-juju' tells me "no more recent supported versions available". how do I make a more recent version of the tools available to the nodes?
[16:46] Walex: when you juju upgrade-juju it should have published a newer version of the tools to your state server
[16:47] the nodes will slowly start to upgrade once your environment is upgraded if memory serves me correctly
[16:49] lazyPower: ahhhhhhh so in theory I just wait. I noticed somewhere a mention of a queue
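A rough sketch of the upgrade flow lazyPower describes above - the client publishes newer tools to the state server and the agents pick them up over time. The explicit --version, the grep on status output, and the sync-tools step are illustrative and only apply to environments like this juju 1.24 one:

    # see what the agents are currently running
    juju status --format=yaml | grep agent-version
    # if the environment cannot reach the public tools stream, copy tools in first
    juju sync-tools
    # ask the environment to upgrade to a specific tools version
    juju upgrade-juju --version 1.24.5
    # then re-check periodically and watch the agent-version fields converge as each jujud restarts
    juju status --format=yaml | grep agent-version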
[16:50] ah but just looked at my state servers and I don't see the 1.24.5 directories - I'll investigate
[16:58] what's peculiar is that when I upgraded 'juju-core' it took many minutes and it is a fairly small package.
[17:00] ackk, regarding our discussion for the keystone pause/resume
[17:00] wolsen, yes
[17:01] ackk, so for the clustering support - if we had the support in the hacluster charm to move the vip off a node (e.g. get a node in maintenance mode or paused), then I think what is in the keystone charm would be fine actually
[17:01] ackk, the pause and such that is
[17:02] ackk, I still have a concern that if a user were to simply issue the pause against keystone but they hadn't done the appropriate action on the hacluster charm that they could end up with a service disruption
[17:02] which might not be great
[17:02] wolsen, right. there are other similar cases where there's more to do than "service foo start/stop". for instance ceph OSDs need to be set to "noout"
[17:03] ackk, right - maybe we can address it with docs around the pause/resume action?
[17:04] wolsen, I see your point. I'm a bit worried about putting a lot of logic in a single action and having an action with the same name doing different things across charms
[17:05] wolsen, there are other cases where you'd definitely want separate steps, like for nova-compute
[17:05] ackk, that's a fair point, but to me the action defines the semantics of what you want to happen and its up to the charm to define what needs to happen for that action to take place - which can add some complications
[17:05] ackk, that being said lets try to keep it simple until we have to
[17:05] do more
[17:05] ackk, but if we do keep it simple, we still need to be able to inform the user what other actions need to take place
[17:06] wolsen, you mean documenting that you should do other stuff before stopping services?
[17:11] ackk, yep
[17:12] ackk, I'm thinking the action docs would say something about requirements in a clustered scenario, e.g. running the pause action there first
[17:13] wolsen, btw what's needed on the hacluster side to move the VIP?
[17:14] ackk, if we could enforce that the action were run first, that'd be great, but that's kind of above and beyond...
[17:14] ackk, for the hacluster - theres the option to move a resource - but the cluster may need to be in maintenance mode as well or the node marked as offline
[17:15] ackk, i'd have to go through the specific details of how to do that (to refresh my memory)
[17:16] wolsen, I see. so basically we could add a pair of actions there so that you'd "juju action do pause hacluster-keystone/X; juju action do pause keystone/Y"
[17:16] (roughly)
[17:16] ackk, yep
[17:16] wolsen, cool
[17:16] ackk, so it'd still keep the building blocks you're looking to add (we can fancy it up in the future if needed)
[17:16] wolsen, +1
[17:17] ackk, but the user needs to know that they have to do the multi-step process
[17:17] wolsen, agreed
[17:17] wolsen, could you sum that up in a comment on the MP?
[17:17] ackk, doing so now
[17:17] wolsen, thanks
[17:18] wolsen, totally unrelated (but since we're on the openstack charms topic), do you know of any downside of not using the embedded webserver for ceph-radosgw?
[17:19] ackk, and I think the other proposals which are similar (e.g. glance and percona-cluster) will likely fall into the same - though for percona-cluster I think we should think that through in some more depth (I'll try to give some more thought to it)
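If the hacluster charm grows the pause/resume pair being proposed here, the operator-facing flow would presumably look something like the two-step sequence quoted above; unit names are placeholders, and the hacluster actions do not exist at the time of this conversation:

    # move the VIP / put the cluster node into maintenance before touching keystone
    juju action do hacluster-keystone/0 pause
    juju action do keystone/0 pause
    # ... maintenance work on that node ...
    juju action do keystone/0 resume
    juju action do hacluster-keystone/0 resume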
[17:20] ackk, when not using the embedded server, it doesn't have the 100-continue support built in to the apache service. Ceph devs used to provide an apache package which had it but they yanked it in favor of the embedded web server
[17:20] ackk, the 100-continue support is necessary for some of the use cases (e.g. using it from the horizon dashboard)
[17:20] wolsen, I see
[17:21] ackk, so the preferred way forward is the embedded server
[17:21] ackk, but is there another use case that you have for not using it?
[17:22] wolsen, well, we've seen failures in autopilot deploys recently. I'm not sure it's related, but it might have happened since we switched to the embedded server
[17:23] ackk, oh :(
[17:24] wolsen, as said it's just a guess, maybe it's an unlucky coincidence
[17:26] ackk, logs and a bug would be great (if you haven't gotten one already)
[17:27] wolsen, https://bugs.launchpad.net/charms/+source/ceph-radosgw/+bug/1477225
[17:27] Bug #1477225: ceph-radosgw died during deployment
[17:27] ackk, also wanted to say the MP looked really good in general and thanks for that contribution!
[17:27] wolsen, np! :)
[17:29] wolsen, wrt maintenance, we're also not sure yet of what needs to be done on neutron-gateway nodes (see notes in the doc)
[17:30] ackk, bleh yeah that's tricky as it will almost certainly cause service disruption unless dvr is enabled I believe
[17:30] wolsen, specifically whether removing/re-adding the l3 router in neutron is needed, and how to properly cause a failover if
[17:30] wolsen, we deploy with l3ha
[17:30] ackk, ok
[17:30] router-ha, that is
[17:31] wolsen, still, stopping services on the node is not enough to cause a failover
[17:32] ackk, I'll have to dig into it (I don't have enough background on neutron gateway and ha to be honest)
[17:33] wolsen, ok, thanks for the info
[17:33] wolsen, and for the review :)
[17:35] ackk, np :-) it was fun!
[17:36] heh
[17:52] lazyPower: fwiw, i saw pypi went to path.py-8.1 and re-checked etcd. you're still good.
[17:52] kwmonroe: above and beyond, thats awesome. Thanks!
[17:52] np lazyPower, gives you time to work out which version you want to pin.
[18:00] lazyPower: this path.py hiccup makes me think we should have an official juju python index
[18:00] whit: that sounds like the opposite of what we need, why not just version lock your deps?
[18:01] marcoceppi: accomplishes the same thing without having to edit all the places the dep is defined every time you need to update
[18:01] marcoceppi: think of it as a hierarchy of control
[18:02] the index is centralized, but under our control (unlike pypi)
[18:02] then reqfiles and setup.pys become the more granular control
[18:02] sounds like a lot of work for little payoff
[18:02] whit: that sounds like an extra maintenance burden and infrastructure for the sake of running infrastructure. It would yield some benefit, but i'm not certain thats enough to not just version lock deps.
[18:03] if we had packages constantly getting yanked from pypi, then yes, that sounds like the way to move forward
[18:03] so we can maintain the versions we depend on that are otherwise disappearing
[18:03] the issue is the "default" version
[18:03] which is always the most current in the index
[18:03] well, thats fair, but we also didn't define any of that in our requirements. in 2 places we had a blind install of path.py on the CLI
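The pinning marcoceppi is arguing for amounts to replacing those blind 'pip install path.py' calls with exact versions in requirements.txt; a minimal sketch, using the known-good release from earlier in the day:

    # declare the dependency with an exact version instead of installing it blind
    echo "path.py==7.7.1" >> requirements.txt
    pip install -r requirements.txt
    # once everything works, record every resolved version for repeatable installs
    pip freeze > requirements.txt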
[18:04] if you pin, you no longer will pull newer
[18:04] and in others we had no version data in the requirements.txt file
[18:04] sure, so you should develop a charm, pip freeze, deliver, iterate, pip update, freeze again, release
[18:04] I don't think you all are grokking how these things scale or the tooling available
[18:05] * marcoceppi shrugs
[18:05] just take it as my advice until it makes sense ;)
[18:05] whit: thats possible
[18:05] whit: but do we adopt the same thing with companion technologies for other languages? Run our own gem host, npm index, et al.?
[18:05] think of every deploy of any charm as your "product website"
[18:06] you wouldn't deploy packages straight from pypi to prod
[18:06] quality control, folks
[18:06] i did :3
[18:06] ur.. production
[18:06] and i hated the pain that introduced like today, but we pinned deps then too. we didn't spin up a gem host.
[18:06] so, how does pip freeze not solve this?
[18:06] I don't want us to be responsible for someone's charm not working because an index is down
[18:07] or they're using a newer version than our index or vice versa
[18:07] marcoceppi: vs. pypi or github being down which you can do nothing about?
[18:08] yeah, but we're not responsible and they're all well-established services that have a team dedicated to keeping those things running
[18:08] no one's perfect but I don't want to run pager duty because we're running our own index
[18:08] marcoceppi: yeah, but those being down == a crap experience for charm deployment
[18:09] and our index being down?
[18:09] this is an academic example in reliability and control
[18:09] we can fix that
[18:09] we can't fix externalities
[18:09] that's the point of the example
[18:09] this is the exact reason SaaS exists
[18:10] marcoceppi: when your shit breaks because someone else's saas breaks, you still look like an ass
[18:10] if you want to run this for a set of charms you maintain, sure, that sounds great, I don't think it sets us up for success any more than what exists with pypi or other services
[18:10] saas exists so you don't have to build it
[18:10] but when aws goes down, netflix loses money
[18:11] and when your shit breaks because you can't run a web service 24/7 due to staffing you look like a bigger ass
[18:11] marcoceppi: that we can fix ;)
[18:12] marcoceppi: my general point is that python libraries working are part of charms working and therefore part of a good charming experience
[18:12] which is important to the success of juju
[18:12] whit: this sounds more like deps should be bundled with charms.
[18:12] not that we should run an indexer
[18:12] running a proxy isn't a project concern
[18:12] it's an operations concern
[18:12] we run pypi proxies in our private environments
[18:12] resources would help, but an index fixes the problem now without the developer issue of pin maintenance
[18:13] that just works, leave this for ops people to run themselves, not us
[18:13] https://pypi.python.org/pypi/devpi
[18:13] lunch is here
[18:14] marcoceppi: yes it is an operations concern. that we agree on.
[18:14] whose operations concern it is, and why, we don't
[18:17] whit: actually the fact we were pulling from pypi and not some mirror helped us today... had we still had the 8.0 copy cached we would still be broken in the charm store right now
[18:17] whit: marcoceppi fyi from another team's perspective. We've recently discussed talking with IS on running a pypi index in prodstack for our services there and gating/curating. However, we currently have a matching "xxx-download-cache" project for each codebase and build it into the project's build steps.
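rick_h_'s download-cache approach can be approximated with stock pip: populate a local directory from the pinned requirements once, then install from it with the index disabled. The deps/ directory name is arbitrary, and this is only a sketch of the idea rather than the team's actual build tooling:

    # populate a per-project cache of the pinned dependencies (run while you do have network)
    pip wheel --wheel-dir=deps/ -r requirements.txt
    # later, install entirely offline from that cache
    pip install --no-index --find-links=deps/ -r requirements.txt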
[18:17] this allows for complete offline building of code/projects
[18:17] and completely version locked w/o internet access (since prodstack has egress firewall locks)
[18:18] whit: marcoceppi so I guess some additional feedback, I can't not redeploy my production because pypi or GH are having DDoS issues atm.
[18:18] * rick_h_ goes back to lurking
[18:23] Every organization and project has different tolerances for acceptable risk. Some projects may be willing to accept the risk that goes with depending on github or pypi being up, others cannot.
[18:45] lazyPower: we would have tested the new copy before updating the index
[18:45] therefore no breakage?
[18:45] i guess
[18:45] * lazyPower shrugs
[18:46] i'm not interested in running a pypi mirror
[18:46] but if someone else is, i'm thumbs up to them doing it
[18:47] rick_h_: that sounds good
[18:47] ideally grabbing and freezing all necessary resources for a charm has lots of benefits
[18:47] this is the general idea behind IBWFs, more or less
[18:49] whether you are building on the fly or building resource blobs or some sort of image, controlling the source material has lots of benefits
[18:50] image workflows do have the advantage of breaking before deploy (in the build stage) rather than during deploy
[18:53] whit: how far do you take this? what makes archive.ubuntu and PPA different from pypi?
=== fgallina`` is now known as fgallina
[19:01] Fwiw I use a devpi caching mirror in uosci
[19:02] As doing a lot of iterations revealed pypi weaknesses and false test failures
=== scuttlemonkey is now known as scuttle|afk
[20:30] jrwren: archive.ubuntu is better curated than pypi
[20:30] ppa is effectively == to a self-hosted index if you control the ppa
[20:30] if you don't, you trust the maintainer, so it depends on the nature of the relationship
[20:31] whit: ah, I thought you were referring to uptime. I used to deal with pypi being down often enough.
[20:31] so if you run the index, you have a bit more control of the uptime
[20:31] rather than depending on the packaging volunteers of pypi
[20:32] yes, my point is, that to a 3rd party, there isn't much difference between pypi and a ppa.
[20:33] Both are out-of-our-control systems which present risk.
[20:33] jrwren: risk is contextual
[20:33] what do you mean?
[20:33] if i am trying out juju, and my charm fails due to pypi being down, I will still say juju sucks
[20:34] definitely true.
[20:34] if I'm using juju for a situation I'm invested in, it behooves me to run a debian index and a python index
[20:34] and my own charm server
[20:34] exactly.
[20:35] or accept the risk of not doing so.
[20:35] so from the context of eco (and anyone who cares about adoption), controlling the central vectors of potential failure is valuable
[20:35] yep
[20:36] a good first-time experience is one that works
[20:37] from our perspective, it's a tradeoff between investing in running, curating and monitoring our own index, vs. the less-known cost of random failure
[20:38] its a very good point.
[20:39] it makes me wonder if charms shouldn't declare their external dependencies.
[20:41] certainly with unit status external deps could be handled in the charm, and a status-set blocked used to clearly tell the end user that an external dep failed.
[20:59] +1 to charms declaring external dependencies, at minimum in the form of a README blip.
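The status-set idea from above, sketched as it might appear in a bash install hook; the reachability check, the URL and the messages are all illustrative, while status-set itself is the stock juju 1.24 workload-status hook tool:

    #!/bin/bash
    # fail early and visibly if the external dependency is unreachable
    if ! curl -sf --max-time 10 https://pypi.python.org/simple/ > /dev/null; then
        status-set blocked "cannot reach pypi.python.org - check firewall/proxy settings"
        exit 0
    fi
    status-set maintenance "installing python dependencies"
    pip install -r "$CHARM_DIR/requirements.txt"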
[21:00] That is so much better than having to figure it out via install hook failures when you're sitting behind firewalls and proxies.
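For completeness, the caching-mirror setup mentioned above (whit's devpi link, and the mirror used in uosci) is roughly: run devpi-server somewhere reachable and point pip's index-url at its root/pypi mirror index. The host name is made up and devpi's CLI flags have changed across releases, so treat this as a sketch rather than a recipe:

    # on the mirror host (devpi-server 2.x era invocation)
    pip install devpi-server
    devpi-server --host 0.0.0.0 --port 3141 --start
    # on the units / build machines, point pip at the mirror's pypi-caching index
    printf '[global]\nindex-url = http://pypi-mirror.internal:3141/root/pypi/+simple/\n' | sudo tee /etc/pip.conf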