=== blahdeblah_ is now known as blahdeblah
[00:49] rick_h: is there a special prometheus/telegraf/grafana bundle floating around that works with cmr/juju-2.3beta?
[00:50] bdx: just have to grab my versions in http://jujucharms.com/u/rharding
[00:50] Ooooooo
[00:51] rick_h: possibly you haven't granted me access?
[00:52] oh the charms there
[00:52] are the ones I'm looking for
[00:52] bdx: everyone has? https://jujucharms.com/u/rharding/prometheus/2 and https://jujucharms.com/u/rharding/telegraf/0
[00:52] IC
[00:52] bdx right
[00:53] you deploy telegraf to the model you want to monitor the things on?
[00:53] and relate telegraf across to prometheus then?
[00:53] or, you offer up the prometheus endpoint
[00:53] I see
[00:54] offer up the prometheus endpoint
[00:54] then telegraf in the target model can relate
[00:58] bdx: right, telegraf is the subordinate that goes onto the things you want to watch
[00:58] totally
[00:58] bdx: right, should be some blog examples to walk through from the demos if that helps
[00:58] already on it
[00:58] thx
[00:58] awesome
[00:59] not awesome rick_h
[01:00] kwmonroe: :( but but can it please be awesome?
[01:00] i hate our monitoring stack, and here's why...
[01:01] it's too hard
[01:03] rick_h: i want to gather logs and metrics from my deployment.. maybe that's k8s, maybe that's hadoop, etc. it's hard to figure out whether elastico or prometheus or roll-your-own-rsyslog will serve me best.
[01:04] kwmonroe: yea, I think that getting the data into prometheus is alright as things tend to build that part in
[01:05] kwmonroe: the missing gap is that the relations need to send sample dashboards over the wire to pre-setup the grafana/visualization bit for that data
[01:05] kwmonroe: imo and all that
[01:07] rick_h, yes! we can give peeps a great deployment and functional UI in one shot. things like "sample dashboards" are what we're lacking.
[01:07] https://github.com/jamesbeedy/layer-elasticsearch/issues/10
[01:07] I'm on the same track
[01:07] kwmonroe: yea, so in the work we did for monitoring Juju we gave a downloadable sample dashboard but you have to import it manually
[01:07] I figured if I can get the grafana-source relation working there
[01:08] that would help pull it all together
[01:08] kwmonroe: I think the thing is that the thing feeding prometheus data knows about the dashboard json, but has to pass that through prometheus, to grafana somehow
[01:08] bdx: yea, but it's a bit different since it's a relation once removed (one link back)
[01:09] huh
[01:09] "things like "sample dashboards" are what we're lacking" - totally
[01:09] @kwmonroe we need to start a hoard of dashboard json
[01:10] bdx: sorry, I thought you were saying prometheus should do it "I figured if I can get the grafana-source relation working there"
[01:10] nah.
[01:10] I got it working
[01:10] with elasticsearch
[01:10] not 100%
[01:10] I'm just saying if you monitor a machine with telegraf the telegraf charm should have sample grafana dashboards and I'm not sure how to get them to grafana through prometheus
[01:10] but like working/bronk working
[01:10] bdx: ah, ES in particular
[01:10] yeah
[01:10] bdx: gotcha, I read too much into it my bad
[01:10] nw
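A rough sketch of the offer/relate flow rick_h describes above, for the record. The model names, the offer name, and the telegraf/prometheus endpoint names are placeholders, and the exact CMR syntax in the 2.3 betas may differ:

    # in the model running prometheus/grafana
    juju offer prometheus:target prometheus-target

    # in the model whose machines you want to watch
    juju consume admin/monitoring.prometheus-target
    juju deploy cs:~rharding/telegraf-0 telegraf
    juju add-relation telegraf myapp                       # subordinate lands on myapp's units
    juju add-relation telegraf:prometheus-client prometheus-target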
[01:12] I prefer grafana over kibana, I think it's good to have both options for people (I've a new kibana charm in the works now)
[01:12] but for *me*
[01:12] I just want it all in one place
[01:12] it makes sense to just have the two backends plugged into grafana
[01:13] then you can get your elastic beat metrics
[01:13] for logs
[01:13] i prefer "juju run --all top" for my metrics, but i get that others want this pesky stuff called "data".
[01:13] lol
[01:13] right
[01:16] bdx: let's say i want to propose a visualization engine for metrics/log analysis. your vote is +1 for grafana?
[01:17] yeah
[01:17] it's multi-backend capable
[01:17] k, that settles it. we're doing kibana.
[01:17] haha
[01:17] ;)
[01:18] bdx: you tested so well the first time, let me try another. prometheus vs elastic. what say you?
[01:18] we need a snap monitor
[01:19] to export logs from snaps
[01:19] kwmonroe, rick_h: how are you guys getting snap logs exported?
[01:21] oh, nm
[01:21] I'm guessing you just pick them up from syslog
[01:23] bdx: i really don't know what you're asking... like a report from "snap list" for installed snaps?
[01:24] how in the holy heck this is related to prometheus/grafana/kibana monitors is a mystery to me. get back in line bdx, one topic at a time.
[01:25] lol
[01:25] well
[01:25] I just snapped our api
[01:25] my next goal
[01:25] like tomorrow is to get the logs sending through filebeat
[01:26] https://forum.snapcraft.io/t/best-practice-for-logging/1210/7?u=jamesbeedy
[01:27] so I asked the snapcraft guys what the best practice is there, to get the logs from your snaps
[01:27] and they say to log to syslog
[01:27] so I did that
[01:28] but that doesn't actually mean your same log entries get into the actual system syslog
[01:28] all in all
[01:28] what I'm saying is
[01:28] I don't know how to export the logs of a snap
[01:28] and I feel it's a difficult task
[01:28] that I am extremely pressed to figure out
[01:29] and I'm entirely blocked because I don't know how to export the darn logs for my snapped processes
[01:29] if everyone is snapping their software, this has to be a common issue
[01:30] if there is one thing that is a blocker for the whole logging deal using canonical tech
[01:31] it's exporting logs from snaps
[01:31] for sure
[01:31] bdx: hmm, yea not sure tbh. I think stuff is still early into snaps and gaps like this need some love. I've not tried it out myself so no first hand exp.
[01:31] ok
[01:31] kwmonroe: you're snapping quite a bit
[01:32] how are you exporting your snapped processes' logs?
[01:36] ahh possibly the snap can't be in devmode
[02:00] omg
[02:00] I'm so silly
[02:01] my logs were never showing up in syslog because I hadn't ever taken my snaps out of devmode
[02:02] oh it was probably actually the confinement mode
[02:02] getting switched to classic
[02:02] there we go
[02:03] ok glad I figured that out
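For anyone else who lands here: the Notes column of `snap list` shows whether a snap is still in devmode, and reinstalling without --devmode is the quick way out (the `myapi` snap name below is a placeholder):

    snap list                           # Notes column shows devmode / classic / -
    sudo snap remove myapi
    sudo snap install myapi             # strict confinement (the default), or
    sudo snap install myapi --classic   # classic confinement, if the snap declares it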
[02:06] bdx: i had to go see a guy about a thing. i'm sorry i missed the diatribe, but here's the juice. use the content interface to provide data from one snap to another (or the filesystem as a whole)
[02:06] it works wonders with 2.29
[02:07] sure, classic mode can save you, but you can enforce all kinds of specialness if you go confined. nothing good, mind you, but all kinds for sure.
[02:07] nice
[02:07] ok
[02:08] I'll look into this immediately
[02:08] snapd 2.29?
[02:10] is that in zesty or something
[02:11] https://packages.ubuntu.com/bionic/snapd
[02:11] it's not even in bionic
[02:11] oooh
[02:11] aha
[02:11] bdx, stop being a sheeple
[02:11] that is a funny joke kwmonroe
[02:11] ;)
[02:12] snap refresh core --channel candidate will get ya 2.29
[02:12] sudo ^^ and what not
[02:14] anywho, i really do have to bug out.. point i was trying to make is the new ability to share bits via the content interface. if you're interested, here's how something like the "pig" snap talks to hadoop, which is only available in 2.29: https://github.com/juju-solutions/snap-pig/blob/master/snap/snapcraft.yaml#L29
[02:14] oh beautiful
[02:15] thank you
[02:15] go bug out now
[02:15] ;)
[02:18] fwiw bdx, that only works because the 'configure' script of the pig snap makes that directory. it's a fickle time to be alive. i'm happy to chat with you more about how this stuff works -- maybe even headline the juju show this week.
[02:19] yeah
[02:19] you should
[02:19] that would be great
[02:19] @rick_h^
[02:19] not sure what your agenda is
[02:20] oh hell, now you've gone and made it official
[02:20] I'm hungry
[02:20] for blood
[02:20] heh, see you folks manana
[02:20] Lol on the hook!
[02:20] Night kwmonroe
[05:54] I am still looking for some help with this: "unable to remove a service using juju remove-service or juju remove-unit, or even remove a machine with juju remove-machine. Even after I deleted the container, juju show-status still displays the associated machine and application"
=== frankban|afk is now known as frankban
[13:51] I'm having trouble provisioning a ceph cluster via juju. I think I got most of it going, but curl'ing the radosgw results in "Empty reply from server" -- can any folks with ceph/openstack experience provide insight?
[14:13] Hey ChaoticMind, what are you deploying on (MAAS?) and how (bundle?)
[14:20] zeestrat: I'm deploying on aws, via my own bundle which includes ceph-mon/ceph-osd/ceph-radosgw/keystone/horizon/percona(mysql) with appropriate relations etc
[14:24] ChaoticMind: Gotcha. What do the radosgw logs say? Are you using network spaces by any chance?
[14:27] Is there maybe like a minimum viable bundle available somewhere? The logs via juju debug-log aren't super helpful. Nope, not using network spaces.
[14:27] Is there some specific log you're referring to?
[14:29] oh, also, trying to follow https://jujucharms.com/ceph-radosgw/#access results in the command just hanging
[14:42] Try to `juju ssh ceph-radosgw/0` and look at `/var/log/ceph/radosgw.log`
[14:45] The guys in #openstack-charms (https://jujucharms.com/u/openstack-charmers/ in the charm store) have a bundle, but that is aimed at maas. You could ask the guys there if they have anything specific for aws.
[15:01] ah thanks zeestrat - it looks like it didn't initialize properly -- unable to find a keyring on /etc/ceph/keyring.rados.gateway. Have to figure out why
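A rough checklist for chasing that missing keyring, along the lines zeestrat suggested (the paths are the ones mentioned above, not verified against the charm):

    juju ssh ceph-radosgw/0
    sudo tail -n 50 /var/log/ceph/radosgw.log   # look for keyring/auth errors
    ls -l /etc/ceph/                            # keyring.rados.gateway should exist here
    exit
    juju status ceph-mon ceph-radosgw           # confirm the mon relation is joined and units are active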
[16:51] <[Kid]> zeestrat, i am trying my new YAML and getting the same thing i was before
[16:51] <[Kid]> ERROR invalid charm or bundle provided at "./k8s-v1.yml"
[16:52] <[Kid]> https://pastebin.com/5XCNwXT3
[16:52] <[Kid]> is there a way to get a more verbose output on where juju is erroring?
[16:52] <[Kid]> like what line, column type debug
[17:50] [Kid]: try running 'charm proof' on the directory with your yaml. you'll need to call it 'bundle.yaml' because i think that's a hard req for proofing bundles.
[17:50] [Kid]: when i ran it, it complained on line 28 and 40ish. looks like you're missing the "to:" directive before your lxd:x entries.
[17:53] Also, I think you can drop machine definitions 2 to 7 and just add the constraints part to the worker charm
[18:45] Juju show in 15min!!!!
[18:45] woot woot!
[18:51] link for folks that want to join in the chat https://hangouts.google.com/hangouts/_/zxyr3qie45ejfh2x7x7sva5szee
[18:52] kwmonroe: bdx hml tvansteenburgh cory_fu marcoceppi and anyone else that I can talk into it :)
[18:52] ^
[18:52] for watching from the sidelines you'll want the link: https://www.youtube.com/watch?v=YFwE6x6JETY
[18:53] Cynerva, ryebot: ^
[18:58] bdx: you coming today?
[19:14] https://github.com/juju-solutions/charms.reactive/pull/123
[19:19] Also, here is an example interface refactored to use it: https://github.com/juju-solutions/interface-kube-control/blob/6987921aacb7a03c01b635a9465873683a6a54ac/provides.py
[19:21] bdx: Have you used the --wheelhouse-overrides option for charm build to test some of these in-progress features?
[19:21] no
[19:22] I can feed that a wheelhouse tarball?
[19:22] bdx: Newer versions of charm-build support --wheelhouse-overrides that lets you provide an external wheelhouse.txt file when building a charm that will override items provided by the charm or layers
[19:22] Not a tarball, alas
[19:22] Might be worth doing as another feature
[19:23] But it lets you override items line-by-line
[19:28] cory_fu: how is it that I get your branch into the wheelhouse.txt?
[19:29] content interface docs: https://forum.snapcraft.io/t/the-content-interface/1074 <-- rick_h
[19:31] bdx: You can use the VCS URL support: https://pip.readthedocs.io/en/stable/reference/pip_install/#vcs-support
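A sketch of what cory_fu's suggestion could look like in practice; the branch name below is a placeholder, not the actual ref behind PR 123:

    # hypothetical override file pointing at an in-progress charms.reactive branch
    cat > wheelhouse-overrides.txt <<'EOF'
    git+https://github.com/juju-solutions/charms.reactive.git@my-feature-branch#egg=charms.reactive
    EOF

    # build the charm with the override applied over the layers' wheelhouse entries
    charm build --wheelhouse-overrides wheelhouse-overrides.txt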
[19:31] Who's in charge of charmhelpers? The docs situation is a bit of a mess. https://github.com/juju/charm-helpers/issues/27
[19:40] damn, hit done on that and it killed my chrome
[19:41] kwmonroe: :P the last thing I heard before crashing was something about "get it together at the end"
[19:42] yeah rick_h, i was trolling on your ability to get your k8stuffv2 deployed. you did it though. nice job :)
[19:42] kwmonroe: hah, /me dances harder for the fans
[19:47] zeestrat: Hrm, yeah, that's probably because pythonhosted.org has been deprecated and uploads all but disabled. We need to set it up on RTD.io
[19:49] That would be great :)
[19:51] tvansteenburgh: Are you admin on the charm-helpers GH repo?
[19:52] cory_fu: yes, and so are you
[19:52] oh wait
[19:52] tvansteenburgh: Nope.
[19:53] It's probably just jamespage, I suspect
[19:53] cory_fu: yeah probably, i'm not
[19:55] tvansteenburgh, zeestrat: Ok, I pinged him on the issue. Should be pretty straightforward to get done when he's in.
[19:56] https://imgur.com/a/4oM40
[19:56] all of those ^ machines are right next to each other in the same subnet
[19:57] just because it's a cross-model relation shouldn't mean we automatically assume the public ip
[19:57] I know I've talked about this with jam and axw (I think)
[19:58] but I just want to make it apparent such that it's something people think about as they use the new networking functionality
[19:58] I absolutely do not want my things to talk over the wan when they are sitting right next to each other
[19:58] hopefully others start to see this, and start yapping about it
[19:59] "all of those ^ machines are right next to each other in the same subnet" - the machines are also right next to prometheus and grafana
[19:59] in the same subnet
[19:59] not sure if people are aware
[19:59] or even care
[20:00] I guess we are making the assumption that if things are in a different model, then they don't have routability to each other
[20:01] or possibly, just the wrong endpoint was chosen from network-get
[20:01] idk
[20:01] <[Kid]> kwmonroe, thanks! i completely missed that
[20:01] <[Kid]> and that charm proof command is handy!
[20:02] np [Kid]
[20:02] cory_fu: hey - what do you need?
[20:03] jamespage: You're up late. :) I was going to set up the charm-helpers repo on readthedocs.org, but I don't have sufficient permissions. So it's up to you to either set it up there, or give me perms to do so
[20:03] cory_fu: in australia - what's your github handle?
[20:04] jamespage: johnsca Then you're up even later than I thought!
[20:04] cory_fu: ok you have admin on that repo now
[20:04] * jamespage heads for breakfast
[20:04] <[Kid]> kwmonroe, is charm a separate program?
[20:05] jamespage: Thanks! Oh, I guess it's early there, then
[20:05] <[Kid]> i.e., do i need to apt-get install charm?
[20:05] I'm bad at timezones
[20:05] cory_fu: Great. Would also be handy to redirect the pythonhosted site and update the pypi as it has a decent google rank.
[20:05] <[Kid]> ahh got it
[20:05] <[Kid]> charm-tools
[20:05] zeestrat: Unfortunately, I haven't found a way that (still) works to do a pythonhosted.org redirect. Best I can do is delete the docs from there entirely
[20:07] <[Kid]> haha, now i am getting FATAL: Not a Bundle or a Charm, can not proof
[20:07] <[Kid]> and i named it bundle.yaml
[20:09] <[Kid]> haha, but it works anyways
[20:09] <[Kid]> oh well
[20:14] [Kid]: charm is a separate program, and while i think you've stumbled onto the ppa that provides charm-tools, the recommended install method is to 'sudo snap install charm'.
[20:15] not sure why you're getting a fatal error, but you can pass --debug to the charm command. perhaps that'll reveal something.
[20:16] <[Kid]> it works
[20:16] <[Kid]> it is deploying, but still only deploying 5 out of the 8 nodes
[20:16] <[Kid]> will it deploy in stages?
[20:16] <[Kid]> i.e., do 5 then the other 3?
[20:17] <[Kid]> this is my new bundle.yaml
[20:17] <[Kid]> https://pastebin.com/GBHKFtzi
[20:17] <[Kid]> and i have the "worker" tag on the nodes in MAAS that are not being deployed
[20:18] <[Kid]> but they are showing "down" in juju status
[20:20] [Kid]: juju should do all the machines at once -- maybe maas is just taking a bit of time to fulfil the request. you could try passing --debug to the juju deploy command to see if that bubbles up any provisioning errors from maas.
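Pulling the suggestions from this exchange together, a rough debugging pass might look like this (charm proof wants the file named bundle.yaml):

    sudo snap install charm              # charm-tools: provides `charm proof`, `charm build`, ...
    cp k8s-v1.yml bundle.yaml
    charm proof                          # lints the bundle; flags things like missing to: directives
    juju deploy ./bundle.yaml --debug    # surfaces provisioning errors coming back from MAAS
    juju debug-log --replay              # agent-side errors, if any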
[20:21] [Kid]: fyi, you can run 'juju deploy' multiple times and it'll reuse existing deployed stuff, so like if easyrsa is already in your model, a subsequent juju deploy will just reuse it.
[20:24] <[Kid]> ahh that is good information
[20:25] <[Kid]> maybe that will kick start MAAS
[20:31] [Kid]: Just to be clear, how many nodes do you have in maas and how are they tagged? Could ya also pastebin `juju status`?
[20:33] <[Kid]> zeestrat, i have 8 nodes
[20:33] <[Kid]> first two are tagged: master,worker
[20:33] <[Kid]> the rest are tagged: worker
[20:34] <[Kid]> https://pastebin.com/DRAgWsV5
[20:34] <[Kid]> juju status
[20:34] <[Kid]> thing is, MAAS spun up 5 nodes
[20:35] <[Kid]> and not 3
[20:35] <[Kid]> and the 3 it didn't spin up are in a "Ready" state powered off
[20:36] <[Kid]> the two that are labeled master,worker were spun up, but they aren't being seen by juju
[20:37] [Kid]: are there any errors in the juju log?
[20:37] <[Kid]> just ran deploy again against the same bundle also.
[20:37] <[Kid]> https://pastebin.com/CYbbP8qA
[20:37] <[Kid]> umm in /var/log/juju.log?
[20:38] <[Kid]> i mean /var/log/juju/logsink.log?
[20:38] or juju debug-log
[20:38] [Kid]: Gotcha. Thanks for the info. I see the model shows version 2.0.2. Could you write the output of `juju version`? Just wanna make sure you're on an up to date juju first.
[20:38] actually
[20:38] * thumper thinks of a nice way
[20:39] <[Kid]> juju --version
[20:39] <[Kid]> 2.0.2-xenial-amd64
[20:39] oh dear
[20:39] you really want to be using newer
[20:39] however
[20:39] <[Kid]> i used snap install
[20:39] <[Kid]> that should have pulled latest, right?
[20:39] juju debug-log -m controller -l warning --replay
[20:40] [Kid]: snap list
[20:40] how old is the controller?
[20:40] <[Kid]> snap list
[20:40] <[Kid]> Name    Version    Rev   Developer  Notes
[20:40] <[Kid]> core    16-2.28.5  3247  canonical  core
[20:40] <[Kid]> juju    2.2.6      2739  canonical  classic
[20:41] when did you bootstrap the controller?
[20:41] <[Kid]> like a week ago
[20:41] juju status -m controller
[20:41] <[Kid]> this is a brand new install
[20:41] @thumper, might we have a case of mixed snap and ppa install?
[20:41] <[Kid]> juju status -m controller
[20:41] <[Kid]> Model       Controller  Cloud/Region  Version  Notes
[20:41] <[Kid]> controller  dal01-maas  dal01-maas    2.0.2    upgrade available: 2.0.4
[20:41] <[Kid]> App  Version  Status  Scale  Charm  Store  Rev  OS  Notes
[20:42] <[Kid]> Unit  Workload  Agent  Machine  Public address  Ports  Message
[20:42] <[Kid]> Machine  State    DNS          Inst id  Series  AZ
[20:42] <[Kid]> 0        started  10.1.105.15  6hqhy4   xenial  default
[20:42] <[Kid]> 1        started  10.1.105.16  n83trc   xenial  default
[20:42] <[Kid]> 2        started  10.1.105.17  e6qkdm   xenial  default
[20:42] [Kid]: Could you also do a `sudo dpkg -l | grep juju`
[20:43] <[Kid]> just to be clear the 3 controllers it is showing are VMs
[20:43] <[Kid]> and the machine i am running these commands from is the MAAS server
[20:43] <[Kid]> so i could bootstrap
[20:43] <[Kid]> ii  juju                  2.0.2-0ubuntu0.16.04.2  all    next generation service orchestration system
[20:43] <[Kid]> ii  juju-2.0              2.0.2-0ubuntu0.16.04.2  amd64  Juju is devops distilled - client
[20:43] <[Kid]> ii  python-jujubundlelib  0.4.1-0ubuntu1          all    Python 2 library for working with Juju bundles
[20:46] Looks like there's an old juju installed. I'd suggest removing the juju and juju-2.0 packages with apt remove etc. Then hopefully you should end up with 2.2.6 from the snap.
[20:48] Then if it's OK, destroy the controller and do a fresh bootstrap with 2.2.6.
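Concretely, that cleanup would look something like the following (the PATH line comes up again just below):

    sudo apt purge juju juju-2.0      # drop the 2.0.2 debs; plain remove leaves config behind
    hash -r                           # forget the shell's cached path to the old binary
    export PATH=/snap/bin:$PATH       # prefer the snap's juju; add to ~/.bashrc to persist
    juju version                      # should now report 2.2.6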
[20:48] <[Kid]> this is weird
[20:48] <[Kid]> Removing juju (2.0.2-0ubuntu0.16.04.2) ...
[20:48] <[Kid]> juju --version
[20:49] <[Kid]> Removing juju (2.0.2-0ubuntu0.16.04.2) ...
[20:49] <[Kid]> 2.0.2-xenial-amd64
[20:50] You probably need to purge those packages
[20:51] [Kid]: might also want to 'export PATH=/snap/bin:$PATH' to make sure the snap executables take precedence
[20:52] <[Kid]> kwmonroe, that did it!
[20:52] <[Kid]> the export now points to the right juju
[20:52] <[Kid]> 2.2.6
[20:54] <[Kid]> another weird thing is when it destroys the model, it will release all the nodes except for the two that have the dual tags.
[20:54] <[Kid]> the worker,master two
[20:54] <[Kid]> i have to manually release those
[20:54] <[Kid]> trying with 2.2.6 now
[20:57] <[Kid]> well this is stupid. i just rebooted and right back to 2.0.2
[20:58] <[Kid]> ok, it's cause it isn't keeping the system variable
[20:59] <[Kid]> how can i make that variable persistent?
[20:59] Yeah, if purging those packages ain't an option, you could fix your path in .bashrc or the like
[21:00] <[Kid]> crap, i have to bootstrap again too
[21:00] <[Kid]> cause the controllers are at 2.0.2
[21:09] [Kid]: As it's getting late for my brain, here's what I'd do if you still end up with the nodes not deploying. Make sure juju is able to deploy all maas nodes, for example by just deploying a bunch of ubuntu charms `juju deploy ubuntu -n 8`. If all good, remove the worker tags from the master nodes in maas for now just to avoid any confusion and try to deploy something like this: https://pastebin.com/UP3CpdD3
[21:09] By the way, what version of maas are you running?
[21:10] <[Kid]> 2.2.2
[21:10] <[Kid]> zeestrat, kwmonroe, and thumper. thank you for your help, i really appreciate it
[21:10] Great. That should work fine.
[21:11] <[Kid]> i see how beneficial juju and maas can be together, just have to get past these bumps. helps me learn though
[21:11] w00t, great to here [Kid]
[21:11] er, hear
[21:20] [Kid]: you probably could have upgraded the controllers
[21:20] rather than redeploy
[21:21] <[Kid]> crap. oh well
[21:22] <[Kid]> i am almost done bootstrapping again
[21:22] <[Kid]> also, i meant to mention i am in a HA cluster
[21:22] <[Kid]> for the controllers
[21:22] you will find 2.2.6 much better than 2.0.2
[21:22] faster
[21:22] and more stable
[21:22] <[Kid]> that is good
[21:22] better memory footprint
[21:23] <[Kid]> yeah, the ubuntu repos are usually way behind
[21:23] <[Kid]> i don't like bleeding edge, but they are barely trickling blood edge
[21:29] <[Kid]> well, this is interesting
[21:30] <[Kid]> 2.2.6 gives me more detail
[21:30] <[Kid]> cannot run instances: cannot run instance: N
[21:30] <[Kid]> o available machine matches constraints: [('tags', ['master', 'worker']), ('agent_name', ['d71fcd25-e
[21:30] <[Kid]> 11b-4991-8831-1349aca1ad9a']), ('zone', ['default'])] (resolved to "tags=master,worker zone=default")
[21:30] <[Kid]> they are there.
[21:31] <[Kid]> and now the only machines it spun up are the two that have the master,worker tags
[21:31] <[Kid]> ahhh there we go
[21:31] <[Kid]> it is spinning more up now...
[21:32] <[Kid]> so now i think it has something to do with those tags
[21:32] <[Kid]> it's like it doesn't like having two tags as a constraint
[21:40] [Kid]: I'd try not using multiple tags in constraints like I described above to avoid the issue for now and come back to that later.
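A minimal sketch of the single-tag layout zeestrat is pointing at; the application names and unit counts are placeholders, not the contents of the pastebin:

    cat > bundle.yaml <<'EOF'
    services:            # newer juju also accepts 'applications:'
      kubernetes-master:
        charm: cs:kubernetes-master
        num_units: 2
        constraints: tags=master
      kubernetes-worker:
        charm: cs:kubernetes-worker
        num_units: 6
        constraints: tags=worker
    EOF
    juju deploy ./bundle.yaml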
[22:00] <[Kid]> ok
[22:01] <[Kid]> and i can just use the to: instead of constraints
[22:37] [Kid]: why does the machine need both master and worker tags?
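For the record, the to:-based placement [Kid] mentions would look roughly like this; machine ids, charms, and tags are placeholders:

    cat > bundle.yaml <<'EOF'
    machines:
      "0":
        constraints: tags=master
      "1":
        constraints: tags=worker
    services:            # newer juju also accepts 'applications:'
      kubernetes-master:
        charm: cs:kubernetes-master
        num_units: 1
        to: ["0"]
      kubernetes-worker:
        charm: cs:kubernetes-worker
        num_units: 1
        to: ["1"]
    EOF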