[04:31] juju makes me want to bang my head on the desk sometimes :)
=== mup_ is now known as mup
[07:40] Good morning Juju world!
=== CyberJacob is now known as zz_CyberJacob
[09:05] heya, is there a way to force juju 1.25 to use a particular controller?
=== zz_CyberJacob is now known as CyberJacob
=== jamespag` is now known as jamespage
[14:27] Hi folks!
[14:27] https://gist.github.com/pananormalny/11144709446aa80e956d7977df3d88d1 anyone know how to run it just once? @only_once seems to be not working here...
[14:28] only_once is really a thing?!
[14:28] also i have no idea what is the best way to check if the db is already filled, should I check that some tables exist or is there some other fancy way of doing this
[14:28] so it is
[14:29] I don't know how fixed it is Spaulding but an issue here says:
[14:29] "Another consideration that needs to be documented because it is very non-obvious is that, because of an implementation detail, @only_once must be the innermost decorator, if combined with @when, etc."
[14:29] yours is the outermost
[14:29] :)
[14:30] omfg
[14:30] let's try it then :)
[14:30] i know... my google-fu is amazing on a Friday
[14:31] haha
[14:31] friday the 13th
[14:31] shit
[14:31] i didn't even notice
[14:31] so now you know ;)
[14:31] your magic abilities work even today!
[15:01] hello there
[15:01] i used juju with maas and i created several network aliases on one vlan/interface
[15:02] maas deploys it correctly
[15:02] i can access the servers on every network
[15:03] but when i let juju generate lxd's on these hosts, they only receive one ip
[15:08] hi here, I'm (always) testing the canonical-kubernetes bundle charms, and I tested shutting down an etcd member to see if the 2 others deployed by default would be hurt
[15:08] when I shut down etcd/1 or etcd/2 it's fine
[15:08] but if I shut down etcd/0, the cluster goes unhealthy
[15:08] lazyPower: one for you -^
[15:08] Zic which version of the etcd charm?
[15:08] have you any idea about this issue?
[15:09] charm: cs:~containers/etcd-21
[15:11] I have done disaster testing where we tear down nodes, and i haven't encountered this behavior. Is the broken deploy still around? Can you collect the logs from one of the etcd nodes that's failing after you've destroyed etcd/0?
[15:12] yep, I collected it, it just tries to relink to etcd/0 in a loop (nothing unexpected): http://paste.ubuntu.com/23792623/
[15:13] I have the same type of message if I shut down etcd/1 or etcd/2, but the cluster stays healthy and the Kubernetes cluster keeps working
[15:13] but if I shut down etcd/0, all kubectl calls and even the kubernetes-dashboard are dead, with an "etcd cluster unhealthy or misconfigured" error message
[15:14] oh Zic i know what happened here. it looks like the etcd cluster lost quorum and didn't re-elect a new master because it couldn't reach consensus via the 2 nodes that were still active and not acting as coordinator
[15:16] lazyPower: oh, that seems logical, but how can I prevent that?
[15:16] Zic juju add-unit etcd -n 2 -- that will bring the total number of etcd units to 5, and you should be able to test the disaster recovery at that point
[15:17] note that I put etcd on the same machine that's running the kubernetes-master charm, instead of the default model where etcd runs alone on a machine
[15:17] don't know if it's a safe way
[15:17] Zic thats not recommended for production deployments :)
[15:17] etcd needs to man up and make a decision! ;)
[15:17] as etcd is the core database tracking the state of your cluster.
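A minimal sketch (plain Python, not from the channel) of the majority-quorum arithmetic behind the suggestion above to grow the cluster to five units with juju add-unit etcd -n 2, and behind the "Optimal Cluster Size" table linked later at [15:44]; the function names are illustrative only:

    # Majority-quorum arithmetic for etcd cluster sizing (illustrative only).
    # An N-member cluster needs floor(N/2) + 1 members up to keep quorum,
    # so it tolerates floor((N - 1) / 2) failed members.

    def quorum(members: int) -> int:
        return members // 2 + 1

    def failure_tolerance(members: int) -> int:
        return (members - 1) // 2

    if __name__ == "__main__":
        for n in (1, 3, 5, 7):
            print(f"{n} members: quorum {quorum(n)}, tolerates {failure_tolerance(n)} down")
        # 3 members tolerate 1 failure; 5 tolerate 2, hence the advice above to
        # add two more etcd units before testing disaster recovery again.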
[15:17] yeah, but I scale to 3 kubernetes-master
[15:18] (and test kube-api-loadbalancer ;))
[15:18] but its fine to colocate etcd for testing purposes
[15:18] even if I run 3 kubernetes-master, do you recommend putting etcd separately?
[15:19] to put it correctly: the default model deploys 1 kubernetes-master, 1 etcd, 1 etcd, 1 etcd
[15:19] I add two more kubernetes-master, and place etcd charms alongside
[15:19] is it a poor design choice for a production cluster? :|
[15:22] i think thats fine
[15:22] extend the etcd cluster onto those principal units, 3 independent, 2 shared
[15:22] where 2 is N really
[15:23] lazyPower: thanks, I will try that
[15:24] bonus question: is kube-api-loadbalancer really experimental? I hesitate to run it in a production environment, but as I looked at what it is doing, it's just an automatically configured nginx which acts as a reverse proxy
[15:26] Zic we've found some interesting behavior with the proxy, there's still some tuning to do before we can call it GA
[15:26] it works well, and needs no further tuning in 90% of the scenarios in which its deployed
[15:27] but if you're using addons like Helm, we've found that it doesn't work when routed through the LB
[15:43] lazyPower: 5 etcd units in total permits what fault-tolerance? 2 nodes down, since 3 remain to make a quorum decision?
[15:44] Zic https://coreos.com/etcd/docs/latest/v2/admin_guide.html
[15:44] see the table under Optimal Cluster Size
[15:44] oh yeah, I found this link yesterday and forgot it, sorry :)
[15:45] Zic no worries :) I just find it useful to spot check my own knowledge against their upstream docs
[15:45] it keeps me honest
[15:45] :)
[15:45] nooo lazyPower make him pay!!!!
[15:45] make him paaaaaaaay!
[15:45] sorry, friday.
[15:45] and according to magicaltrout you owe us pizza and beer... because we posted links.
[15:45] and he pinged me
[15:45] * lazyPower pokes magicaltrout
[15:45] \o/
[15:45] marcoceppi_: i should be in DC in the last week in March
[15:46] consider yourself warned
[15:46] conference season hasn't even started and i'm already juggling trips
[15:46] true! come to France, but I must warn you, we don't have good pizzas (and our 'good' beer is stolen from our Belgian neighbors)
[15:47] i was in Paris last year, that was fun
[15:49] Zic i'll gladly take your belgian brews
[15:49] hehe
[15:49] wise lazyPower you need to get practicing ;)
[15:50] practicing what exactly?
[15:50] drinking 15% beer for Ghent ;)
[15:50] dear lord
[15:50] i am not ready
[15:50] magicaltrout you have lofty goals ;)
[15:51] its always achievable... just requires dedication and practice... and the distinct possibility of doing talks whilst still drunk
[15:51] That would clearly follow in the footsteps of my hero Jamie Windsor
[15:52] i think it was chef conf 2012, where he took the stage blitzed and tried to run a live siri-controlled infra deploy and it failed fantastically. RIP that demo, but it certainly was memorable
[15:52] lol
[15:53] hehe, here we use Puppet generally... but for some new projects (like K8S) we're testing Juju :)
[15:56] lazyPower: if I don't care about resources, do you advise me to split etcd off the machines which already host kubernetes-master? or is staying with 3 kubernetes-master+etcd and 2 other etcd units separately also fine?
[15:56] (more questions, more beers, I know :()
[15:56] i do, i think its the best strategy to keep them out of the kubernetes units themselves for instances of scale
[15:57] if you colocate etcd with kubernetes-master-3,4,5 for example
[15:57] and your cluster goes dormant, and you remove master-4,5
[15:57] you'll still have tainted machines running with those etcd units
[15:58] Plus, in juju land, when you colocate services (we call this hulk-smashing), it *can sometimes* have unintended behavior, like say we add an addon that requires an isolated etcd container to be running... (i'm looking @ you CNI networking providers)
[15:58] if you've got a colocated etcd unit on that node, you'll hit a port collision
[15:58] ok, good advice, I will apply it :)
[15:58] and in some cases, it will cause the addon workload to fail
[18:40] so, my juju is broken badly, mongo won't cluster and keeps restarting, I can't issue ensure-availability, I tried restoring a lxc snapshot but the last two didn't help, before I try something drastic, like restoring the first snapshot (removing the newer ones to do so), does anyone have any ideas how I might recover?
[19:25] spaok have you been able to collect the logs from your container? which juju version?
[19:25] should be 1.25
[19:26] its an older install we are trying to decomm, but juju broke so it's making that harder
[19:26] yeah, thats a rough situation. sorry spaok i'm not certain how to resolve mongo clustering issues but i would lead with a bug with those logs so we can attempt to triage, but it sounds like you're on a timetable
=== Guest87783 is now known as med_
=== med_ is now known as medberry
[21:25] friday is here \o/ gin o'clock
[21:26] this one's for you kwmonroe https://youtu.be/u6mJMJzDD-M?t=1m13s
[21:26] no way i'm clicking that
[21:26] lol
[21:26] its music! :P
[21:26] it popped up on spotify on the way home and made me think of you
[21:27] ugh.. fine. i'll click it.
[21:28] ha! i like it. you're back on my nice list magicaltrout!
[21:28] lol
[21:28] thanks :P
[21:41] spaok: did you try sshing into the mongo machine and looking at the logs?
[23:39] elbalaa: ya there was some corruption or something, mongo would try to cluster and then restart
[23:39] we were able to finally find a mix between a previous snapshot and turning off the repl
[23:39] then recovering from there
[23:40] so it's up enough that we can finish the decomm process
[23:58] spaok: "corruption or something" story of my life when working with mongo
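Returning to the @only_once exchange at [14:27]-[14:31]: the quoted issue says @only_once must be the innermost decorator when combined with @when. A minimal charms.reactive sketch of that ordering (not from the channel); the state name and handler body are hypothetical stand-ins for the gist's logic, and the exact import path may differ between charms.reactive releases:

    # Decorator ordering per the issue quoted above: @only_once goes innermost.
    from charms.reactive import when, only_once

    @when('db.connected')   # outer decorator(s) first
    @only_once              # innermost, directly above the handler
    def seed_database(db):
        # Runs at most once; e.g. create tables / load initial data here.
        pass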