=== thumper-dogwalk is now known as thumper [00:17] cory_fu: kwmonroe: http://i.imgur.com/QMMtsAt.gif :-) [01:39] does juju support running against a snapped lxd? I get 'ERROR can't connect to the local LXD server: LXD socket not found; is LXD installed & running?' any way to fix [01:40] lutostag: wallyworld had some issues running with snapped lxd :) he may b the best person to help \o/ [01:41] lutostag, export LXD_DIR=/var/snap/lxd/common/lxd [01:41] lutostag, see this https://github.com/conjure-up/conjure-up/blob/master/snap/wrappers/juju [01:41] stokachu: my man!!! ty [01:42] np [01:42] lutostag, you can't have both deb and snap lxd installed though as they conflict on port 8443 [01:42] and thats hardcoded in juju [01:43] yeah, I have been fighting a bug with juju2.1rc with bootstrapping, seeing if blowing away lxd will fix (and a time to try lxd snapped on my main machine) [01:43] lutostag, i know lxd 2.8 which is in snap stable works fine [01:43] anything newer though is broken [01:46] stokachu: thanks [01:46] axw: fiddlesticks, still can't bootstrap against a wiped lxd [04:20] lutostag: I have no idea what's going on :/ do you have a firewall on your host perchance? ufw enabled? [04:21] lutostag: can you try starting a lxd container by hand, and connecting to 10.232.128.1:8443 with telnet or whatever? [04:27] lutostag: the only other thing I can think of that might be useful is the output of: lxc network show lxdbr0 === frankban|afk is now known as frankban [08:43] Good morning Juju world! [09:46] How does Juju decide which subnet to use within a specific space? I'm deploying to MaaS where each machine has 2 NIC:s, one NIC is exposed to Internet and one NIC is for internal communication. I want to deploy openstack-base. If I have 2 separate spaces (one external, one internal), Juju seems to become confused. If I have 1 space with both external and internal subnet, I can't communicate with some units as [09:46] they get internal IP:s listed as public... 
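On the snapped-LXD error from [01:39]: the fix given at [01:41] was to point LXD_DIR at the snap's data directory. A minimal Python sketch of that lookup order follows. The name `find_lxd_dir` is hypothetical, `/var/lib/lxd` is assumed to be the deb package's default directory, and the socket filename `unix.socket` is assumed as well. It illustrates the idea only and is not how juju itself locates the socket.

```python
import os

SNAP_LXD_DIR = "/var/snap/lxd/common/lxd"  # path quoted in the channel
DEB_LXD_DIR = "/var/lib/lxd"               # assumed default for the deb package


def find_lxd_dir(environ=None, candidates=(SNAP_LXD_DIR, DEB_LXD_DIR),
                 exists=os.path.exists):
    """Return the directory to use as LXD_DIR, or None if no socket is found.

    An explicit LXD_DIR wins; otherwise each candidate directory is probed
    for a unix.socket file (assumed filename), in order.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("LXD_DIR")
    if explicit:
        return explicit
    for directory in candidates:
        if exists(os.path.join(directory, "unix.socket")):
            return directory
    return None


if __name__ == "__main__":
    lxd_dir = find_lxd_dir()
    if lxd_dir is None:
        print("no LXD socket found; is LXD installed and running?")
    else:
        print(f"export LXD_DIR={lxd_dir}")
```

The `exists` parameter is there only so the probe can be swapped out in tests; in normal use the defaults are enough.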
=== frankban is now known as frankban|afk === lukasa_ is now known as lukasa [13:10] hi, how do i get juju to use a socks5 proxy for connections? [13:10] or even better a pac file [13:34] cnf: I don't think you can. https://jujucharms.com/docs/devel/models-config has details of the various environment configuration options, but no mention of socks5. [13:36] stub: that's remote proxies, from what I get [13:36] i need it locally [13:37] juju can't talk to the cloud API directly [13:37] stub: I'm experiencing some odd behavior when requesting the same database from multiple applications, e.g. one application seems to have higher privs then the other [13:39] stub: is there anything that would cause the privs for a subsequent user to be less than the user that requested the database initially [13:40] hmm [13:40] i think setting http_proxy works [13:41] but then i get [13:41] ERROR invalid URL "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson" not found [13:43] stub: for example, after each app requests access to the database, one of the apps can communicate with postgres just fine, see http://paste.ubuntu.com/24046542/ [13:44] hmm, can I pre-download whatever image it wants from http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson ? 
[13:44] because i need to go through a proxy to get to my controller, but not to get the image [13:45] the other application just barfs on `rails c`, due to postgres acccss http://paste.ubuntu.com/24046554/ [13:46] stub: I see that postgres has created the correct entries in pg_hba.conf http://paste.ubuntu.com/24046557/ [13:48] hmz "getting started" with juju is turning out to be quite a rabbit hole :/ [13:49] and that the database has the correct acceess for each user created http://paste.ubuntu.com/24046568/ [13:50] stub: what is odd, and I'm just noticing this, is that there is a database created for each user as well as a database created with the dbname that I request in each of my charms [13:51] should the postgres charm create a db named that of each user? [13:51] alongside the one you request? [13:53] ugh, so i guess i can't use the vsphere provider [13:53] this was working for me some time ago, not sure if recent changes may have impact on what I'm seeing here or what the deal is [13:53] that leaves me with nothing to try it out [13:57] I had changed some perms around prior to getting that paste of the \l [13:57] any suggestions? [13:58] I was trying to test some new logic with an upgrade charm and can't eploy local changes: https://bugs.launchpad.net/juju/+bug/1666904 [13:58] Bug #1666904: upgrade-charm --switch doesn't work with local charms [14:00] where does juju store local config, btw? [14:02] $HOME/.local/share/juju [14:02] thanks jrwren [14:03] now to find a provider i can actually use [14:25] hmz [14:39] why would I get ERROR failed to bootstrap model: cannot start bootstrap instance: cannot run instances: cannot run instance: No available machine matches constraints: mem=3584.0 zone=default [14:39] ? 
[14:40] hmz, I can't get any of the providers working [14:44] wow, maas is horrible [14:46] right, 3 hours "getting started with juju", and I got nothing working [14:47] i'm done for the day [14:48] can't get any provider working === frankban|afk is now known as frankban [15:23] cory_fu: I have a couple of MP for charm-haproxy. Please take a look? [15:34] cnf - are you really done for the day? if not I can lend a hand in getting you unblocked [15:50] kwmonroe: I am merging the jenkins jobs backup and restore! [15:51] lazyPower: i'm mostly frustrated atm :P [15:51] got about an hour before end of day [15:55] ack kjackal - thanks! [16:00] cnf: just catching up on backscroll.. if you're still hitting the bootstrap error mentinoed above, the default bootstrap requires 3.5G of ram. you can override that with "juju bootstrap --bootstrap-constraints mem=2G", for example. [16:00] yeah, that's not going to help. that was me misunderstanding maas [16:01] gotcha [16:01] I just want to learn how juju works, but it seems to want cloud stuff I don't have available [16:01] and I can't get the vsphere one working [16:03] cnf: sorry this isn't going smoothly! i don't have vsphere/juju experience, but i can point you to free aws creds: http://developer.juju.solutions/. i know you're about to EOD, but perhaps it's worth signing up there so you can check out juju/aws. [16:03] i'll look at that for learning [16:04] but i'm mostly interested in juju for setting up openstack [16:04] if i am understanding things correctly, maas is pretty much the only way [16:04] but thanks for that link [16:05] np [16:06] admcleod: you still around? what's the best guide for juju/maas/openstack these days? is it the openstack-base readme (https://jujucharms.com/openstack-base/)? [16:07] cnf - MAAS on vsphere would be your best bet yeah. [16:07] we have quite a bit of testing around that in our OIL lab [16:07] add vm's to a maas controller? 
[16:08] for testing that could work, I guess [16:08] cnf - yep, thats how we do it. You can also skip MAAS and use the vsphere direct https://jujucharms.com/docs/stable/help-vmware [16:08] so you have 2 options there [16:08] lazyPower: yeah, I could not get that working [16:08] ok so the vsphere provider is what was giving you heartburn attempting to bootstrap? [16:08] i need to go through a proxy to get to the vsphere api [16:09] and then it can't get the ubuntu img list, iirc [16:09] can you do me a favor, just so i can get a bug filed if there's a bug in there - juju bootstrap --debug 2>&1 | tee bootstrap.log and pastebin that log? [16:09] ooooohhh ok [16:09] so its a localized setup, that has some restrictions our bootstrap process assumes are not there [16:09] ERROR invalid URL "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson" not found [16:10] it is [16:10] you can set a proxy for the bootstrap process to use, however i do beleive it works as follows: its going to fetch the cloud image to your workstation, and then push that to the bootstrap node (the docs state this) and that might also be troublesome [16:10] yeah, i can't get the image through the proxy [16:10] but i need the proxy to get to the vsphere [16:10] and it doesn't support my pac file, I think [16:11] and i could not figure out how to download the image manually [16:11] that's where I got stuck [16:12] cnf - ok, can i get you to file a bug against our docs for this? I'd like to see if we can get you a working path to resolution. The alternative would be to install maas, and then pre-load maas with both images and vm's [16:12] but thats a load of extra setup steps you can forego [16:12] yeah [16:12] eg: why am i setting up pxe to juju some vms? 
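The constraint described here (the vsphere API is only reachable through a proxy, while cloud-images.ubuntu.com is only reachable directly) is the kind of per-host routing a PAC file expresses. Command-line tools generally read `http_proxy` and `no_proxy` instead. The Python sketch below illustrates suffix-style `no_proxy` matching. `choose_proxy`, the proxy address, and the vsphere hostname are hypothetical, and juju's own matching rules may differ.

```python
from urllib.parse import urlsplit


def choose_proxy(url, proxy, no_proxy=""):
    """Return the proxy URL to use for `url`, or None for a direct connection.

    `no_proxy` is a comma-separated list of host suffixes, e.g. "ubuntu.com".
    """
    host = (urlsplit(url).hostname or "").lower()
    for entry in no_proxy.split(","):
        suffix = entry.strip().lower().lstrip(".")
        if not suffix:
            continue
        # Match the domain itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return None
    return proxy


if __name__ == "__main__":
    proxy = "http://proxy.example:3128"  # hypothetical proxy address
    urls = (
        "http://cloud-images.ubuntu.com/releases/streams/v1/index.sjson",
        "https://vsphere.example/sdk",  # hypothetical vsphere endpoint
    )
    for url in urls:
        route = choose_proxy(url, proxy, no_proxy="ubuntu.com") or "DIRECT"
        print(url, "->", route)
```

With `no_proxy` set to `ubuntu.com`, the image index is fetched directly and everything else, including the vsphere endpoint, goes through the proxy.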
[16:13] indeed [16:13] i feel ya, let me get you a bug link [16:13] the end goal is to evaluate juju as a mechanism for running openstack in production [16:13] but i'm not there yet [16:13] cnf https://github.com/juju/docs/issues/new -- can you file a bug here, describing the limitations of your setup wrt the vsphere / network limitations? [16:14] i [16:14] we can see about getting some updated docs cut around those limitations and also ping the right people to weigh in on what you would need to request from your IT staff (if applicable) for allowed proxy domains. [16:14] hmm, not sure how to word this [16:14] (end of day, i'm tired and hungry, i'm afraid) [16:14] cnf - however i do believe its just the cloud image archive and the jujucharms.com api is all thats expected. Charms on the other hand will want hte cloudarchive bits. [16:19] lazyPower: are you one of the devs? should I mention you in the ticket? [16:20] cnf - I work on the charm ecosystem, but you bet feel free to ping me direct. i'm @chuckbutler on github [16:21] https://github.com/juju/docs/issues/1676 [16:21] I hope that's a bit clear [16:24] cnf - thanks, acknowledged receipt [16:24] i'll shop this with some of the core devs when they come online and see if we cant get you unblocked [16:24] cool, thanks [16:31] lazyPower: from what I understand, the --config sets the proxy on the remote side [16:32] export http_proxy=http://ip:port/ works though [16:32] except i can't download the image through the proxy [16:33] ok so confirmed its during the client side image fetch to load the cloud image. [16:33] ie "The HTTP proxy value to configure on instances, in the HTTP_PROXY environment variable" [16:34] for http-proxy [16:34] right, the bootstrap controller is going to want proxy access as well to reach the charmstore. 
I'm looking now for osx proxy settings you can set on the CLI to bypass thsi [16:34] i read that you have browser proxy working, i do beleive there's a way to proxy cli tooling too, i think its with networksetup but its been quite a while since i've done that [16:35] well, i have a pac file [16:35] that sets proxy servers differently according to the url [16:35] networksetup -setautoproxyurl "wi-fi" "http://somedomain.com/proxy.pac" [16:35] yeah, cli tools don't respect that [16:35] they use the http_proxy env var [16:36] lazyPower: hi! I just saw Kubernetes 1.5.3 was out since 15th, is it already available through Juju ? [16:36] (just like on linux) === redir_holiday is now known as redir [16:36] Zic - we are a bit behind with 1.6 code freeze. I can kick off a build and run e2e on that today if you want the 1.5.3 bump [16:36] i suspect we can get you an edge by EOD, probably closer to beta/stable by end of week [16:36] Zic - however, this depends on clean e2e results :) [16:37] cnf - ok, and i guess the PAC file changes from time to time so its not really convenient to use the HTTP_PROXY url with a manual config? [16:37] lazyPower: my main problem is that the vsphere api is behind a proxy, the REST of the internet is NOT [16:37] was just to know :) if you have any "edge" channel that I can test on preprod I can give you some review with my usage :) [16:37] cnf sorry for basic questions, just trying to wrap my head around the domain here. [16:38] Zic - ok, lets shoot for later this week then my plate is full today. [16:38] the pac file is always the same here, just ubuntu.com needs a DIRECT connection, and vsphere needs to go through a proxy [16:38] Zic i'll try to get you an edge build by tomorrow, if all else fails, friday [16:38] np :) [16:38] lazyPower: and i can't have both in the shell [16:38] and i can't split the commands, i think [16:39] cnf thats unfortunate. 
I dont have a direct answer right now, let me think on this and see if we cant resolve this with some clever cli fu [16:39] or maybe less-than-clever cli fu [16:39] :P [16:39] i'll keep up in the bug sinc eyou're close to EOD [16:39] lazyPower: the only thing i can think of is download the image manually [16:39] and put it somewhere juju can find it [16:39] does that work for you? [16:39] sure [16:40] fantastic. Thanks for being patient cnf, i'll try to run down some answers regarding manual image upload and instructing juju what to do with that image [16:41] kwmonroe nice drive by on that bug [16:42] i responded to that kwmonroe [16:42] to add context from here to the bug [16:42] rick_h - when you have a sec to glance at https://github.com/juju/docs/issues/1676 it would be good to gather info on if we have encountered split proxy clouds before, and if there's a known path to success here. [16:44] kwmonroe: i just noticed the no_proxy as an env var [16:45] kwmonroe: setting that to ubuntu.com gets me to [16:45] ERROR failed to bootstrap model: cannot package bootstrap agent binary: no prepackaged agent available and no jujud binary can be found [16:45] kwmonroe: im here, yeah - depends, what do you want to do with it? [16:46] kwmonroe - do we know if the 2.0.3 release was pre no-more --upload-tools? [16:46] kwmonroe: oh. i see [16:47] oof cnf, i haven't seen that error before. lazyPower, i dunno the state of --upload-tools in 2.0.3. [16:47] hmm [16:47] there is also the option of doing openstack-on-lxd on a laptop,etc [16:47] i beleive that went away in teh 2.0 release chain but i forget when [16:47] no lxd on my laptop [16:48] it adopted new behavior where it "just does the right thing" during the bootstrap process. [16:48] oh, on that [16:48] might want to hide the lxd options on the osx build [16:48] doesn't work, anyway :P [16:49] thx admcleod - cnf was working on an openstack/juju happy-fun-time, so wanted to point him at the most recent docs. 
atm, we're working through bootstrapping vsphere with proxies, so we'll need to get that sorted before diving into the 'stack. [16:49] cnf - there's been work in progress to command lxd providers on remote units. [16:49] kwmonroe: yep read some of the scroll [16:49] lazyPower: that would be awesome [16:49] cnf - so you can point it at a vm and suddenly you have a developer cloud on your laptop :) [16:49] spiffy right? [16:49] yep [16:50] i do that with docker things, atm [16:50] prior to that work it was a nasty hack with socat and some tls cert syncs and other fun time stuff [16:51] uhu [16:51] how do you communicate with it? [16:51] a socket file? [16:51] lxd is a restful api [16:51] (recent versions of ssh support forwarding socket files) [16:53] i saw lxd intro at fosdem a few years ago [16:56] hmm [16:57] lazyPower: well, lxc (as the command) is in homebrew [16:57] (also, juju doesn't like socks5:// as proxy protocol [16:57] maybe i should file a bug against that as well [17:01] ok, as they say, badly translated from dutch, my pipe is out [17:01] i'm going home :P [17:01] \o [17:01] cheers cnf [17:01] thanks for the help [17:18] o/ Juju world! === scuttle|afk is now known as scuttlemonkey [17:23] lazyPower, you awake today? ;-) Just noticed my first "issue" with my k8s cluster. kube-dns seems to be in CLBO but the only error I see is https://gist.github.com/cm-graham/bc9ff905ca63b06c393c08e0f33a8e33 [17:24] stormmore - thats a new one. can you file a bug for this? it looks like kubedns might be failing health checks [17:24] stormmore - one thing we did notice, is that if you hvae a busy cluster you might be filling the max_con_table [17:24] it is failing health checks, it is how I found it [17:24] stormmore - https://github.com/juju-solutions/bundle-canonical-kubernetes/issues/216 [17:24] related bug [17:26] yeah this is not a busy cluster (yet!) 
only running k8s services and 3 in house ones right now [17:26] ok, that doesn't sound like it woudl be the culprit [17:26] Cynerva ryebot - can we add a todo to add an action for re-scheduling the addons? Seems like this clears up a lot of the funky issues we've found with addons [17:27] stormmore - one thing to try is scale the replicas to 0, then re-scale to n [17:27] being the self-hater that I am, I suspect something stupid I did ;-) [17:27] stormmore - getting the pods rescheduled seems to unstick whatever the root cause is there [17:27] lazyPower: sure thing [17:28] ryebot - ty, in the process of fighting with merge bots :| [17:28] np, good luck! [17:36] OK lazyPower that seems to have brought it back to a healthy state for now ;-) [17:36] stormmore - ok, let me know if it continues to give you issues, this bit Zic before as well [17:37] if we can identify a root cause for why kubedns is messing its own bed, it would be good to capture that and get a patch submitted upstream [17:37] but i also know its abysmal at giving output as to why its having trouble [17:37] :| [17:37] lazyPower, yeah I remember seeing that :-/ [17:38] still more useful than that 404 I am getting from the internal dev's container! :) [17:46] stormmore - welp. :) [17:46] i can only solve so much with magic and khai [17:51] lazyPower, I know this thankfully :) the devs I am working with are a little green (read a lot!) in containers [17:57] wow I am really laggy today... going up to 13secs to here :-/ [18:09] lazyPower / stormmore : yep, I had this one, have you try to lurk at the logs directly through the docker container? 
because I didn't have any useful logs through "kubectl logs" [18:09] stormmore: in my case, it was the max conn for dnsmask which was reached [18:09] I just scaled kube-dns and then deleted the kube-dns pods [18:09] (it respawned to respect the scale deployment) [18:10] Zic - i dont know if thats tuneable, but we should probably investigate making that tuneable via configmaps [18:10] yeah, it seems to be tuneable... at least the maxconn is in the starting process line of dnsmask container [18:10] Zic, I am no where near hitting a conntrack limit with the little number of containers I am running right now === frankban is now known as frankban|afk [18:11] stormmore: I'm running ~120 containers but only 4 was responsible of the maxconn of dnsmask [18:11] because they were too heavily request resolving [18:11] lazyPower, isn't a tunable option on the hosts /proc? [18:11] (was Cassandra pods...) [18:11] I am only running about a dozen right now [18:12] stormmore: but all that said, it's not a conntrack limit on my case, just a software configuration-max in dnsmask container of kube-dns pod [18:12] ah :) [18:13] my team haven't realized just what we can do with kube-dns yet [18:13] if you want to verify this issue, just run a `docker logs` at dnsmask container (it's bad, you should prefer kubectl logs but in that case, I didn't find these logs anywhere else except directly with docker command) [18:13] not inclined to let them in on that yet, too many other things need my attention [18:14] if you have something like "dnsmask : Max request reaching (150)" (from what I remember), try to scale kube-dns [18:14] i'm going to interject that it gives me an extreme happy face to see you two self helping and talking through these issues here... [18:14] :D [18:15] I personally scaled kube-dns to the number of nodes I have [18:15] (one kube-dns per nodes so) [18:15] Zic - there is a horizontal autoscaler addon for this case... 
[18:15] i wonder if thats something worth investigating as its jsut another manifest [18:15] yes, I know you can automatize that :) [18:16] but as my nodes will not expand daily... [18:16] and as you told me you will take a look for this autoscaler in CDK :D [18:16] * Zic runs [18:16] haha [18:16] oh sure, put the flaming bag on MY doorstep, thanks Zic ;) [18:19] Hi. I'm trying to bootstrap my juju 2.0.3 with a private openstack environment. I'm making good progress (instance created, etc) but it seems to "hang" on Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux [18:19] any ideas? [18:19] shewless_ - any proxy requirements in your env? [18:19] Here is my bootstrap line: juju bootstrap --metadata-source /home/test/simplestreams --config network=9a7d0138-ecf6-4c16-a894-e033e5be9631 --config use-floating-ip=true myclouder openstack [18:20] lazyPower: no I expect the floating ip will work [18:20] ..and from within the instance there isn't a proxy [18:20] shewless_ - can you capture the same command but add --debug and pastebin the logs? i imagine we can get some output as to why its hanging there. [18:20] beauty.. I will do that [18:20] jrwren: Sorry for the delay. Comments added to the two MRs [18:21] also: I can't seem to ssh into the instance (public key). what's the default username created? [18:21] cory_fu: thanks! [18:21] shewless_ - there are key credentials in $HOME/.local/share/juju/ssh so you should be able to ssh -i $HOME/.local/share/juju/ssh/id_rsa ubuntu@$IP [18:22] cory_fu: great catch on the lint. I'm sorry I didn't catch that myself. [18:23] lazyPower: just a question without -> do upgrade-charm is required/advised as pure-incremental? or can I "jump" an upgrade (imagine 1.5.1 -> 1.5.3 for CDK)? [18:24] lazyPower: thanks. I can see that resolv.conf doesn't have the nameservers I want in it. Is there an option for me to tell juju which nameservers to use? 
[18:24] s/without// [18:25] (or s/without/without real intention to do it/ if I remember what my fingers wanted to type :p) [18:27] lazyPower: trying to set the nameserver on my pre-created network to see if that helps [18:30] lazyPower: much better with a good DNS server! thanks for the help [18:51] Zic - at present we dont have any dependent chains of charms so you should be g2g [18:52] Zic - if thats not the case we'll certainly signal you before the 1.6 upgrade steps are published to help vet in your staging env [18:52] shewless_ - awesome glad we got you unblocked :) sorry i stepped out for lunch in the middle of all that [18:52] ok :) [18:53] so close for your multi-master branch -https://github.com/kubernetes/kubernetes/pull/41910 [18:53] once that lands and fixes the queue we should be unblocked to land the multi-master patch and you can scale your control plane. couple that with an haproxy rewrite of the apilb and you should be in HA nirvana [18:53] * lazyPower isn't ambitious or anything [18:54] oh yeah [18:54] with the new kube-api-loadbalancer charm, do you have plan for scaling it? [18:54] like 2 apilb and a VIP? [18:54] well we can give you the 2 apilb [18:55] the impl of VIP would be up to your model environment [18:55] you can ELB, you can floating ip, you can round robin dns [18:55] or as you said, VIP [18:55] however if you want to ELB you can probably just negate the apilb and go direct to ELB, and use that in place. 
but i dont have confirmation on any of that yet as its still WIP :) but yes, i dont want to reintroduce a SPF to solve HA [18:55] I can maybe implement an Heartbeat VIP directly on the VM which host the kube-api-loadbalancer [18:56] I think there is no overlap with Juju deployed files [18:57] lazyPower: this part (master & kube-api-loadbalancer) of the cluster is not hosted @aws [18:57] lazyPower: I can share you something in private :) [18:59] lazyPower, well it is back :-/ going to go look at the docker logs in a little bit [19:04] lazyPower: Now I'm trying to add some charms via the gui... I see an instance is created in openstack but the charm just says "pending" - hints at where the logs are? [19:06] shewless_ juju debug-log would be a good place to start [19:06] if the instances aren't coming back you might need to remote into the controller or switch to the controller model and check the logsync [19:09] lazyPower.. hmm the logs are showing things happening.. maybe I just didn't wait long enough [19:09] shewless_ - i can be a process sometimes depending :) [19:09] BTW will it download images as needed or do I need to provide them like I did in order to bootstrap? [19:10] shewless_ - should download as needed, it pulls that data from simplestreams [19:10] lazyPower: so I have to add the images to the simplestream then? [19:11] shewless_ - give me a moment to re-read thread. i'm in standup and want to answer this correctly [19:13] lazyPower, nothing there other than the sidecar's log showing the connection refused [19:14] stormmore - that sounds suspect and similar to Zic's issue where its a conmax on the daemon [19:14] stormmore - if you sale kubedns to 3 replicas does it continue to be a CLBO issue? 
[19:16] to fetch the dnsmask logs, I needed to use `docker logs ` directly at the kubernetes-worker which host the Pod's containers, because through `kubectl logs` I have nothing interesting [19:16] scaling up now to find out :) [19:16] days after, I think it was because I didn't know the `kubectl logs kube-dns --container dnsmask` :p [19:16] +syntax [19:18] Zic, yeah I did that to confirm that I was seeing the same output in the container logs in the kubernetes dashboard too [19:18] OK it is scaled and green (for now!) [19:19] lazyPower, if you are correct, scale is not the issue since I basically had the default pods setup [19:20] stormmore: you have maybe one container/pod which "talk" a lot to kube-dns [19:21] in my case, it was not the number of Pods I have, this error was triggered by only 4 pods [19:21] was Cassandra pods misconfigured, which try to querying kube-dns in loop [19:24] Zic, nope only basic nodejs containers that are not really aware of their environment yet. [19:24] lazyPower, definitely isn't working, the new pods are in a restart loop already [19:25] stormmore: if you tail -f /var/log/syslog at your kubernetes-master, do you have any suspect lines ? [19:25] as I understood, kube-dns healthz/readyness is checked through the API [19:26] so check if you see any denied or error GET at kubernetes-master [19:37] nothing but a bunch of 200s [19:38] hmmm... somethigns amiss if you aren't getting error output in the console and the api is giving you 200's [19:38] stormmore - ok lets try to reduce to square 1, can you whole sale remove the kube-dns deployment and reschedule? the rendered template is in /etc/kubernetes/addons/ [19:39] stormmore - i'd like to get you to attach and tail the container logs and kubectl log output for the application pods until it reaches CLBO. we might catch something happening [19:40] this is where i wish i had prometheus monitoring completed, we could likely scrape the issue out of the metrics. 
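The two debugging steps discussed above (bouncing the kube-dns replicas so the pods get rescheduled, then reading logs one container at a time) can be sketched as command builders in Python. The `kube-system` namespace, the deployment name `kube-dns`, and the container names are assumptions to adjust for the cluster at hand, and the pod name is a placeholder. The functions only build argument lists; pass each one to `subprocess.run` to actually execute it.

```python
import shlex

NAMESPACE = "kube-system"  # assumed namespace for the kube-dns addon


def rescale_commands(deployment, replicas, namespace=NAMESPACE):
    """Scale a deployment to 0 and back to `replicas` so its pods are recreated."""
    base = ["kubectl", "--namespace", namespace, "scale", "deployment", deployment]
    return [base + ["--replicas=0"], base + [f"--replicas={replicas}"]]


def log_commands(pod, containers, namespace=NAMESPACE):
    """Build one `kubectl logs` invocation per container in the pod."""
    return [
        ["kubectl", "--namespace", namespace, "logs", pod, "--container", name]
        for name in containers
    ]


if __name__ == "__main__":
    commands = rescale_commands("kube-dns", 3)
    # Pod and container names below are placeholders, not read from a real cluster.
    commands += log_commands("kube-dns-POD", ["kubedns", "dnsmasq", "sidecar"])
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
```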
[19:40] yeah that would be nice too :P [19:40] future me will appreciate it :) [19:40] but present me hates that its not there [19:42] so basically you want me to detroy / recreate the kubdns deployment, right? [19:42] (lazyPower: CDK plans to integrate Prometheus as default? or through a third-party charm?) [19:42] third party charm - i'm 90% certain of that [19:42] as for the log output lazyPower, you want the container logs from all 3 containers in the pod, right? [19:42] there are already helm charts to deploy prometheus if you want it today, but thats not a very juju-style answer. what do you do when your k8s is sick and you want that data? :) [19:42] stormmore - yeah, we'll need all 3 to correlate [19:43] ack [19:43] because I'm scratching my head to add metrics to our Nagios/Naemon by hands... if a Promethus charm automatize this, it will helps me a lot, I confirm :p [19:43] Zic - thats the idea, you betchya [19:44] my Naemon's metrics are curently just a bunch of curl to the K8s API status :/ [19:44] for pods, nodes, services... [19:44] it's kinda creep [19:44] Zic - in the interrim there's always BEATS [19:44] and with metricbeat you can create custom metrics fairly easily [19:44] which could in turn handle that and stuff it in ES to be graphed with kibana [19:45] Beats just sond like horrible earpods in my head [19:45] do I miss something? :D [19:45] elastic released golang based agent's to ship arbitrary metrics [19:45] s/sond/sounds* [19:45] Zic https://insights.ubuntu.com/2016/09/22/monitoring-big-software-stacks-with-the-elastic-stack/ [19:46] "elastic stack" sounds as ELK now, ARGH! :p [19:46] its the successor to ELK [19:48] oh nice [19:49] because my current Naemon checks looks like (ugly) this: `curl -sk https:// I'm trying to avoid grep and use "jq" instead, as it stdout is JSON [19:49] yeah, metricbeat can just poll that whole json feed, and stuff it in ES [19:50] you can then subquery in teh dashboard to make nice timeseries charts out of it. 
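As an alternative to grepping curl output in the Naemon checks described above, the pod list JSON can be parsed directly. A minimal Python sketch, assuming the usual pod list shape (`items[].metadata.name`, `items[].status.phase`, `items[].status.containerStatuses[].restartCount`). The `check_pods` name and the restart threshold are illustrative, and the sample data is made up.

```python
import json

OK, WARNING, CRITICAL = 0, 1, 2  # conventional Nagios plugin exit codes


def check_pods(pod_list, max_restarts=5):
    """Return (exit_code, message) for a parsed Kubernetes pod list."""
    not_running, restarting = [], []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        if status.get("phase") not in ("Running", "Succeeded"):
            not_running.append(name)
        restarts = sum(
            c.get("restartCount", 0) for c in status.get("containerStatuses", [])
        )
        if restarts > max_restarts:
            restarting.append(f"{name} ({restarts} restarts)")
    if not_running:
        return CRITICAL, "CRITICAL: not running: " + ", ".join(not_running)
    if restarting:
        return WARNING, "WARNING: restarting: " + ", ".join(restarting)
    return OK, "OK: all pods running"


if __name__ == "__main__":
    # Made-up sample; a real check would read the API response instead.
    sample = json.loads("""
    {"items": [
      {"metadata": {"name": "kube-dns-a"},
       "status": {"phase": "Running", "containerStatuses": [{"restartCount": 12}]}},
      {"metadata": {"name": "web-b"},
       "status": {"phase": "Running", "containerStatuses": [{"restartCount": 0}]}}
    ]}
    """)
    code, message = check_pods(sample)
    print(message)  # a real plugin would also exit with `code`
```

Keeping the check as a pure function over parsed JSON means the same logic works whether the data comes from curl, `kubectl get pods -o json`, or a saved file.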
[19:50] or do additional parsing in logstash, whatever your business logic is [19:50] the idea is to be flexible and give you a swiss army knife to make meaningful reports based on whats important to you as an operator / analyst [19:50] thats why i fell in love with beats, you dont have to code your app to integrate with it like you do with prometheus [19:51] well it is re-provisioned and running green for now, going to make a cup of tea and see if it can last at least that long [19:51] my main concern is for alerting (we have TVs which displayed the current status of all our platforms at office) and mail-alerting/SMS for our on-call rotation [19:51] stormmore - ok, thanks for doing the debug work, i'm concerned that theres a hidden dragon in here we've not encountered so therefore we aren't accounting for. [19:52] you're the second user thats reported kubedns failures in < 1 month. its likely that release of the addon might just be hinky [19:52] all these are linked to Naemon for now [19:52] but as we're testing Prometheus of others platform, it will be nice to have it for CDK also :D [19:52] s/of/for/ [19:52] Zic - its future work but on the roadmap :) again, i'll ping ya when somethings brewing there [19:53] happy to help you clean up addon services to replace with charms, because thats how we roll [19:55] not a problem, least I can do :) [19:57] although the bad good news might be that the rescheduled deployment might have solved the problem [19:57] in any case, even if I don't have anymore CLBO or dnsmask maxconn reached on kube-dns, it continues to restart sometime, but as I scaled them, at least they are not restarting at the same time, so no unavailability : http://paste.ubuntu.com/24048434/ [19:57] look at restarts column [19:57] as lazyPower said, this kube-dns release seems to not be so stable :/ [19:58] so far I am not seeing any connection refused errors in the sidecar container which was what I was setting before [19:59] stormmore - might be a sync 
issue :| [19:59] i'm not impressed with this release of kubedns. when we circle back to the 1.5.3 release we'll grab the latest manifests for that rel and see if we can get this resolved via addon bumps [19:59] but i'm not hopeful [20:01] it's not blocking as I have like stormmore a hard CLBO before, now it's scaled, I just have some "instant-restart", and not all at the same time [20:01] but it's weird :/ [20:02] does make me question the decision to use kubernetes / docker vs some lxd type environment right now [20:02] dns seems critical enough to me that it needs to be stable [20:03] i'm sure if we gather enough info and bug it, that it'll get fixed [20:03] we just need to find that root cause and get it contributed [20:03] if its biting us, its biting other users [20:03] and i'd rather not throw the baby out with the bath water :) but on that note stormmore - i'm more than happy to support you in a move to lxd as well. because LXD is the bees knees [20:03] oh agreed, questioning isn't going to keep me from figuring it out [20:04] lazyPower, I just need to do my research on Docker to LXD [20:05] we're not planning to use Kubernetes and LXD at the same place here, we're using Kubernetes as PaaS (= our customer managed which pods are running, we are managing the deployment and the availability of the cluster) ; for full-managed infrastructure, we're curently using VMware ESXi or Proxmox, and LXD will be part of this list [20:05] lazyPower, from the little research I have done, outside of maybe Juju, LXD management / orchestration isnt as mature as k8s [20:07] TL;DR: Kubernetes as Docker's orchestrator / LXD as hypervisor, even if LXD use LXC-component of containers, it's more like VMs [20:08] the only things that have a "versus" to Kubernetes is Swarm or Rancher, with less features [20:08] (we have some Rancher here, and our PoC of Swarm was not satisfying) [20:09] oh I definitely get that by management, I mean things like the kubenetes dashboard level maturity 
[20:09] ehhhhhh
[20:09] not so sure i agree with that sentiment, but i'm clearly biased
[20:10] so i'll let you come up with your own conclusions, however lxd has been in prod here at canonical since release, and before that with lxc. flockport even wrote an entire hosting company based on lxc
[20:10] lazyPower, don't get me wrong the juju gui is one of the nicest guis I have seen but it doesn't have the level of data that the kubernetes one does
[20:11] well sure, those are wildly different use cases
[20:12] the juju gui is only intended to be used for modeling your applications and then routine tasks. There have been many requests to integrate things like ssh in the browser, log aggregation, etc.
[20:12] i don't think we've had the manpower to promote that in priority however, as other things like model migrations and whatnot have taken precedence.
[20:12] I didn't touch the Juju webUI since I finished the deployment personally, I'm doing all post-stuff with the juju-cli only
[20:12] which are arguably larger / important features for the core of juju to have.
[20:12] yeah
[20:12] we find that most operators tend to do that
[20:13] and it was for our baremetal/manual installation
[20:13] myself included, i look at the juju ui during testing only or when i'm mocking something up quick to share.
[20:13] for labs, with conjure-up, I didn't use Juju WebUI at all
[20:13] but that being said
[20:13] comparing apples/oranges here :)
[20:13] i found this though
[20:14] https://github.com/aarnaud/lxd-webui
[20:14] i haven't used it, and it looks a bit long in the tooth - 9 months since last contri - however... looks neato
[20:14] not faulting juju at all, just saying from a cluster management perspective the kubernetes dashboard is awesome
[20:14] stormmore - well it's a good thing we grabbed it for part of the CDK :D I'm happy i could deliver on that one
[20:15] stormmore - still no issues with kube-dns?
[20:15] here it's that way: LXD is used as a hypervisor (and has a take on Proxmox, VMware ESXi, KVM) even if it uses LXC-container-technologies-inside ; Juju is used as a tool to deploy and manage highly-complicated platforms like K8s or OpenStack ; Kubernetes is used for a customer who comes with "I have 100 docker running at a raw-dockerd, do you have something to orchestrate them and pass to production?" :)
[20:15] if it's running idle as it should right now, i fear we're running into a race condition with the pods or a sync issue or something similar. Just keep that pipeline open and if you catch something dump the logs and let's bundle up for an issue, even if it's inconclusive results.
[20:16] lazyPower, Zic I use the command line more often than not for things but it is always nice to have readily available "pretty pictures" to show people
[20:16] and lazyPower still green
[20:17] ok, sounds good. Thanks again, i'm going to context switch back into the etcd bits and focus on landing this multi-master PR
[20:17] ping me if you need anything. otherwise i'm resuming silent operation
[20:17] yeah I am going to go back to determine my permissions issue solution
[20:19] lazyPower: the good thing I note for later is that you're in the middle of your office-day when I'm on on-call rotation :p
[20:19] (it's 21:19 here o/)
[20:19] and I'm on-call this week :p
[20:23] I can happily say I am not on call at the moment :)
[20:25] :D
[20:43] oh that just means I have more time to architect and design environments for now
[20:55] jrwren: http://pastebin.ubuntu.com/24048767/ on the test now
[21:13] jrwren: And on the other MR: http://pastebin.ubuntu.com/24048870/
[21:45] petevg: i need your unit test guru status. i wanna unit test actions. my actions have hyphens in the name "do-stuff" without a .py extension. that makes imports hard. my workaround is to have "do-stuff" import ./stuff.py, and call "stuff.action()". then i just unit test stuff.py. any better way?
[21:47] kwmonroe: Your way sounds pretty good. Not naming a python script "blah.py" is kind of an anti pattern, so the workaround isn't necessarily going to be pretty.
[21:48] kwmonroe: you could also copy the file to a temp dir, with a ".py" extension, and import it from there.
[21:48] kwmonroe: ... or you could try to hack on Python's import command, to make it work with a non .py file. Shouldn't be too scary, but I don't know what you'd do off the top of my head.
[21:49] omg petevg, i don't know why i talk to you. you went from "sounds pretty good" to "this is gonna hurt" in like 2 messages.
[21:49] Just trying to be helpful :-)
[21:50] well i'm all for anti anti-patterns, but i don't believe actions can have a suffix, which makes this particularly annoying
[21:51] lib + wrapper, or action == executor, action.py == library
[21:51] i do appreciate the alternatives petevg! just giving you grief. also, it's 85F here in texas, i'm coding by the pool. how's shoveling your driveway going?
[21:51] i don't know what that would do tho, if you use foo executor and foo.py... if it would give you grief during import
[21:53] kwmonroe: The snow is actually basically melting, because it's kind of the same thing as being 85F here, relatively speaking :-p
[21:53] wait lazyPower, i don't follow your == suggestions. are you suggesting symlink the action to action.py?
[21:53] Ooh. A temp symlink > temp file.
[21:54] symlinks would work
[21:54] slightly opaque, but doable
[21:54] i know the bashism to call a method based on $0, but how do you do that in python?
[21:54] sys.argv[0]
[21:55] nice
[21:58] nm, i hate that for the same reason i hate trying to follow old charms with links to hooks.py.
[22:00] i'll just shell out to "java -jar myaction.jar " like matt taught me.
[22:00] I want to know the number of units of a charm deployed, from within the charm code
=== Siva is now known as Guest1503
[22:00] How can I find that?
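For the hyphenated, extensionless action file discussed above ("do-stuff"), one option that avoids temp copies and symlinks is to load the file by path with the standard-library importlib machinery (the successor to the older imp module). This is only a sketch: the helper name load_action and the module name "action_under_test" are made up for illustration, and it assumes the action file guards its entry point with `if __name__ == "__main__":` so that loading it in a test does not run the action.

```python
import importlib.machinery
import importlib.util


def load_action(path, module_name="action_under_test"):
    """Load a Python source file that has no .py extension as a module.

    `path` is the action file (for example "actions/do-stuff");
    `module_name` is an arbitrary name chosen for the test, not a Juju concept.
    """
    # SourceFileLoader treats the file as Python source regardless of its name.
    loader = importlib.machinery.SourceFileLoader(module_name, path)
    spec = importlib.util.spec_from_loader(module_name, loader)
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code inside the new module object.
    loader.exec_module(module)
    return module
```

A unit test could then call `load_action("actions/do-stuff").action()` directly. The wrapper approach (a thin "do-stuff" that imports stuff.py) is still the simpler choice if the action logic can live in a normal module.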
[22:01] kwmonroe: it looks like there's a way to do it, no hacking needed, with the "imp" module. Or so says Stack Overflow: http://stackoverflow.com/questions/2601047/import-a-python-module-without-the-py-extension
[22:06] I want to find out the IP address of all the peer units deployed. How can I find that from within the charm code?
[22:07] Guest1503: if the peers in the charm have a relation to each other, you can query the conversations in the relation.
[22:08] For an example, see the interface the Zookeeper charm uses to wrangle its peers at https://github.com/juju-solutions/interface-zookeeper-quorum/blob/master/peers.py
[22:09] Guest1503: this assumes that you're writing a layered charm using the reactive framework, and it does require writing an interface. If you have some trouble figuring out how everything works, I might be able to answer specific questions. cory_fu might be able to help you out, as well.
[22:11] @petevg, I am not writing layered charms....normal charms based on the hooks
[22:13] Guest1503: I'm afraid that you've stepped outside of my area of expertise, then :-/ Does anyone else have any advice on doing interface style stuff in an older style charm?
[22:14] @petevg, in my case I want to make sure I have all the peer IPs before I do some operation. The problem is how do I find out the num_units you specify in the bundle in the charm code?
[22:14] Guest1503: Either way, you will need to use a peer relation. On the -relation-joined hook, you should be able to use `relation-get private-address`
[22:15] @cory_fu, Yes that will work.. but for me I need a way to know if all the peer relation IPs have been fetched
[22:15] How do I find that out?
[22:15] Or you could iterate over the peers in any hook using related-units, and call relation-get with an explicit relation-id and unit
[22:16] Guest1503: I'd consider refactoring so that your charm can handle an additional peer joining after you've done whatever processing that you're doing. Someone can add another peer with "juju add-unit" at any time, so the code will need to handle the case where you add a peer, anyway.
[22:17] What do you mean by "all the peers"? Each unit will be able to see all of the peers that are connected to it, though it might take a small amount of time before a new peer is connected to all of the other units
[22:17] Right, what petevg said. You can always add more units
[22:17] Or remove them
[22:18] @cory_fu, say I deploy 3 units of a charm using a bundle... so you recommend the 'for' loop for peers in some other hook rather than
Guest1503: the "best practice" pattern is "this hook/event fires off when I have a new peer on my relation, and I do the appropriate thing." There isn't really a "wait until all my peers have come up" event, because you can never be confident that the operator is done adding peers.
[22:19] Guest1503: so the is the correct hook. It just needs to do the right thing whenever a new peer joins.
[22:20] Guest1503: in zookeeper's case, it writes out the peer to a list in a config file, then restarts the zookeeper daemon.
[22:41] cory_fu: if I want to grab the latest cwr-ci bundle, this invocation should do it, right? `charm pull cs:~bigdata/bundles/cwr-ci`
[22:41] (Or did we move it out of bigdata?)
[22:42] petevg: It's in ~juju-solutions
[22:42] ... and it's singular bundle, rather than plural.
[22:42] cory_fu: thx.
[22:42] Apparently, my bash history is all lies.
[22:42] cory_fu: are you using a different lint tool? `make lint` returns nothing for me.
[22:43] jrwren: Lint is fine now, it's `make test` that's failing now. And the config-changed hook in the other MR
[22:45] cory_fu: I can't repro that either. :( not saying it's not happening though. I'm sure it's something strange about my setup. Thanks for the fast feedback. I'll have fixes tomorrow.
[22:45] k
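For the peer-address question earlier in this log (a classic hook-based charm, not a layered one), the "iterate over the peers in any hook" suggestion can be sketched roughly as below. Assumptions, none of which come from the log: the peer relation is named "cluster", the listing is done with the relation-list hook tool (which charmhelpers wraps as related_units), and the hook tools accept --format=json. The function only reports peers that have joined so far, which fits the advice above that there is no "all peers are up" event.

```python
import json
import subprocess


def _hook_tool(*args):
    """Run a Juju hook tool and decode its JSON output.

    Only works inside a hook context, where the hook tools are on PATH.
    """
    cmd = [args[0], "--format=json"] + list(args[1:])
    return json.loads(subprocess.check_output(cmd).decode("utf-8"))


def peer_addresses(relation_name, run=_hook_tool):
    """Return {unit_name: private_address} for every peer currently visible.

    `run` is injectable so the logic can be unit tested without Juju.
    """
    addresses = {}
    for rid in run("relation-ids", relation_name):
        for unit in run("relation-list", "-r", rid):
            addresses[unit] = run(
                "relation-get", "-r", rid, "private-address", unit
            )
    return addresses
```

Calling `peer_addresses("cluster")` from the peer relation's joined, changed, and departed hooks, and rewriting the config from the result each time, keeps the charm correct whether the operator deploys 3 units from a bundle or adds a fourth later.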