rmcadams | juju makes me want to bang my head on the desk sometimes :) | 04:31 |
---|---|---|
=== mup_ is now known as mup | ||
kjackal | Good morning Juju world! | 07:40 |
=== CyberJacob is now known as zz_CyberJacob | ||
spaok | heya, is there a way to force juju 1.25 to use a particular controller? | 09:05 |
=== zz_CyberJacob is now known as CyberJacob | ||
=== jamespag` is now known as jamespage | ||
Spaulding | Hi folks! | 14:27 |
Spaulding | https://gist.github.com/pananormalny/11144709446aa80e956d7977df3d88d1 anyone know how to run it just once? @only_once doesn't seem to be working here... | 14:27 |
magicaltrout | only_once is really a thing?! | 14:28 |
Spaulding | also i have no idea what the best way is to check if the db is already filled, should I check that some tables exist or is there some other fancy way of doing this | 14:28 |
magicaltrout | so it is | 14:28 |
magicaltrout | I don't know how fixed it is Spaulding but an issue here says: | 14:29 |
magicaltrout | "Another consideration that needs to be documented because it is very non-obvious is that, because of an implementation detail, @only_once must be the innermost decorator, if combined with @when, etc." | 14:29 |
magicaltrout | yours is the outermost | 14:29 |
magicaltrout | :) | 14:29 |
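A minimal sketch of the ordering magicaltrout quotes above, assuming the gist is a charms.reactive handler layer; the handler name, state names, and the table-existence probe are illustrative and not taken from Spaulding's gist:

```python
# Sketch only: assumes a charms.reactive layer; names here are made up.
from charms.reactive import when, when_not, only_once, set_state

# @only_once has to be the innermost decorator (closest to the function),
# with @when/@when_not stacked above it -- the opposite of the gist's ordering.
@when('db.connected')
@when_not('myapp.db.seeded')
@only_once
def seed_database(db):
    load_fixtures(db)                 # hypothetical seeding helper
    set_state('myapp.db.seeded')      # also record it in a reactive state

# One way to answer "is the db already filled?" without relying on @only_once:
# probe for a known table (a psycopg2-style connection and Postgres are assumed).
def db_already_seeded(conn, table='my_table'):
    with conn.cursor() as cur:
        cur.execute("SELECT to_regclass(%s) IS NOT NULL", (table,))
        return cur.fetchone()[0]
```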
Spaulding | omfg | 14:30 |
Spaulding | let's try it then :) | 14:30 |
magicaltrout | i know... my google-fu is amazing on a Friday | 14:30 |
Spaulding | haha | 14:31 |
Spaulding | friday the 13th | 14:31 |
magicaltrout | shit | 14:31 |
magicaltrout | i didn't even notice | 14:31 |
Spaulding | so now you know ;) | 14:31 |
Spaulding | your magic abilities work even today! | 14:31 |
BlackDex | hello there | 15:01 |
BlackDex | i used juju with maas and i created several network aliases on one vlan/interface | 15:01 |
BlackDex | maas deploys it correctly | 15:02 |
BlackDex | i can access the servers on every network | 15:02 |
BlackDex | but when i let juju create lxd containers on these hosts, they only receive one ip | 15:03 |
Zic | hi here, I'm (still) testing the canonical-kubernetes bundle charms, and I'm shutting down an etcd member to see if the 2 others deployed by default are affected | 15:08 |
Zic | when I shut down etcd/1 or etcd/2 it's fine | 15:08 |
Zic | but if I shut down etcd/0, the cluster goes unhealthy | 15:08 |
magicaltrout | lazyPower: one for you -^ | 15:08 |
lazyPower | Zic which version of the etcd charm? | 15:08 |
Zic | have you any idea about this issue? | 15:08 |
Zic | charm: cs:~containers/etcd-21 | 15:09 |
lazyPower | I have done disaster testing where we tear down nodes, and i haven't encountered this behavior. Is the broken deploy still around? Can you collect the logs from one of the etcd nodes thats failing after you've destroyed etcd/0? | 15:11 |
Zic | yep, I collected them, it just tries to reconnect to etcd/0 in a loop (nothing unexpected): http://paste.ubuntu.com/23792623/ | 15:12 |
Zic | I get the same type of message if I shut down etcd/1 or etcd/2, but the cluster stays healthy and the Kubernetes cluster keeps working | 15:13 |
Zic | but if I shut down etcd/0, all kubectl commands and even the kubernetes-dashboard are dead, with an "etcd cluster unhealthy or misconfigured" error message | 15:13 |
lazyPower | oh Zic i know what happened here. it looks like the etcd cluster lost quorum and didn't re-elect a new master because it couldn't reach consensus via the 2 nodes that were still active and not acting as coordinator | 15:14 |
Zic | lazyPower: oh, that makes sense, but how can I prevent that? | 15:16 |
lazyPower | Zic juju add-unit etcd -n 2 -- that will bring the total number of etcd units to 5, and you should be able to test the disaster recovery at that point | 15:16 |
Zic | note that I put etcd on the same machine that's running the kubernetes-master charm, instead of the default model where etcd runs alone on its own machine | 15:17 |
Zic | don't know if it's a safe way | 15:17 |
lazyPower | Zic thats not recommended for production deployments :) | 15:17 |
magicaltrout | etcd needs to man up and make a decision! ;) | 15:17 |
lazyPower | as etcd is the core database tracking the state of your cluster. | 15:17 |
Zic | yeah, but I scaled out to 3 kubernetes-master units | 15:17 |
Zic | (and test kube-api-loadbalancer ;)) | 15:18 |
lazyPower | but it's fine to colocate etcd for testing purposes | 15:18 |
Zic | even if I run 3 kubernetes-master units, do you recommend putting etcd separately? | 15:18 |
Zic | to put it correctly: the default model deploys 1 kubernetes-master, 1 etcd, 1 etcd, 1 etcd | 15:19 |
Zic | I added two more kubernetes-master units, and placed the etcd charms alongside them | 15:19 |
Zic | is that a poor design choice for a production cluster? :| | 15:19 |
lazyPower | i think thats fine | 15:22 |
lazyPower | extend the etcd cluster onto those principal units, 3 independent, 2 shared | 15:22 |
lazyPower | where 2 is N really | 15:22 |
Zic | lazyPower: thanks, I will try that | 15:23 |
Zic | bonus question: is kube-api-loadbalancer really experimental? I hesitate to run it in a production environment, but as I looked into what it does, it's just an automatically configured nginx acting as a reverse proxy | 15:24 |
lazyPower | Zic we've found some interesting behavior with the proxy, there's still some tuning to do before we can call it GA | 15:26 |
lazyPower | it works well, and needs no further tuning in 90% of the scenarios in which it's deployed | 15:26 |
lazyPower | but if you're using addons like Helm, we've found that it doesn't work when routed through the LB | 15:27 |
Zic | lazyPower: what fault tolerance do 5 etcd units in total give? 2 nodes down, since 3 remain to make quorum decisions? | 15:43 |
lazyPower | Zic https://coreos.com/etcd/docs/latest/v2/admin_guide.html | 15:44 |
lazyPower | see the table under Optimal Cluster Size | 15:44 |
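Not from the chat, but a quick way to sanity-check the arithmetic behind that table (etcd's Raft majority quorum), confirming Zic's reading that 5 members tolerate 2 failures:

```python
# etcd uses Raft majority quorum: a cluster of N members needs
# floor(N/2) + 1 members reachable to commit writes, so it tolerates
# N - (floor(N/2) + 1) member failures.
def fault_tolerance(members: int) -> int:
    quorum = members // 2 + 1
    return members - quorum

for n in (1, 3, 5, 7):
    print(f"{n} members -> quorum {n // 2 + 1}, tolerates {fault_tolerance(n)} failure(s)")
# 1 -> 0, 3 -> 1, 5 -> 2, 7 -> 3
```

So with `juju add-unit etcd -n 2` bringing the cluster to 5, losing any single member (etcd/0 included) still leaves a majority of 4.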
Zic | oh yeah, I found this link yesterday and forgot it, sorry :) | 15:44 |
lazyPower | Zic no worries :) I just find it useful to spot check my own knowledge against their upstream docs | 15:45 |
lazyPower | it keeps me honest | 15:45 |
Zic | :) | 15:45 |
magicaltrout | nooo lazyPower make him pay!!!! | 15:45 |
magicaltrout | make him paaaaaaaay! | 15:45 |
magicaltrout | sorry, friday. | 15:45 |
lazyPower | and according to magicaltrout you owe us pizza and beer... because we posted links. | 15:45 |
lazyPower | and he pinged me | 15:45 |
* lazyPower pokes magicaltrout | 15:45 | |
magicaltrout | \o/ | 15:45 |
magicaltrout | marcoceppi_: i should be in DC in the last week in March | 15:45 |
magicaltrout | consider yourself warned | 15:46 |
magicaltrout | conference season hasn't even started and i'm already juggling trips | 15:46 |
Zic | true! come to France, but I must warn you, we don't have good pizzas (and our 'good' beer is stolen from our Belgian neighbors) | 15:46 |
magicaltrout | i was in Paris last year, that was fun | 15:47 |
lazyPower | Zic i'll gladly take your belgian brews | 15:49 |
magicaltrout | hehe | 15:49 |
magicaltrout | wise lazyPower you need to get practicing ;) | 15:49 |
lazyPower | practicing what exactly? | 15:50 |
magicaltrout | drinking 15% beer for Ghent ;) | 15:50 |
lazyPower | dear lord | 15:50 |
lazyPower | i am not ready | 15:50 |
lazyPower | magicaltrout you have lofty goals ;) | 15:50 |
magicaltrout | it's always achievable... just requires dedication and practice... and the distinct possibility of doing talks whilst still drunk | 15:51 |
lazyPower | That would clearly follow in the footsteps of my hero Jamie Windsor | 15:51 |
lazyPower | i think it was chef conf 2012, where he took the stage blitzed and tried to run a live siri-controlled infra deploy and it failed fantastically. RIP that demo, but it certainly was memorable | 15:52 |
magicaltrout | lol | 15:52 |
Zic | hehe, here we use Puppet generally... but for some new projects (like K8S) we're testing Juju :) | 15:53 |
Zic | lazyPower: if I don't care about resources, do you advise me to split etcd off from the machines which already host kubernetes-master? or is staying with 3 kubernetes-master+etcd and 2 other etcd units separately also fine? (more questions, more beers, I know :() | 15:56 |
lazyPower | i do, i think it's the best strategy to keep them out of the kubernetes units themselves for instances of scale | 15:56 |
lazyPower | if you colocate etcd with kubernetes-master-3,4,5 for example | 15:57 |
lazyPower | and your cluster goes dormant, and you remove master-4,5 | 15:57 |
lazyPower | you'll still have tainted machines running with those etcd units | 15:57 |
lazyPower | Plus, in juju land, when you colocate services (we call this hulk-smashing), it *can sometimes* have unintended behavior, like say we add an addon that requires an isolated etcd container to be running... (i'm looking @ you CNI networking providers) | 15:58 |
lazyPower | if you've got a colocated etcd unit on that node, you'll hit port collision | 15:58 |
Zic | ok, good advice, I will apply it :) | 15:58 |
lazyPower | and in some cases, it will cause the addon workload to fail | 15:58 |
spaok | so, my juju is badly broken: mongo won't cluster and keeps restarting, I can't issue ensure-availability, and I tried restoring an lxc snapshot but the last two didn't help. before I try something drastic, like restoring the first snapshot (removing the newer ones to do so), anyone have any ideas how I might recover? | 18:40 |
lazyPower | spaok have you been able to collect the logs from your container? which juju version? | 19:25 |
spaok | should be 1.25 | 19:25 |
spaok | it's an older install we are trying to decomm, but juju broke so it's making that harder | 19:26 |
lazyPower | yeah, that's a rough situation. sorry spaok, i'm not certain how to resolve mongo clustering issues, but i would lead with a bug report with those logs so we can attempt to triage, but it sounds like you're on a timetable | 19:26 |
=== Guest87783 is now known as med_ | ||
=== med_ is now known as medberry | ||
magicaltrout | friday is here \o/ gin o'clock | 21:25 |
magicaltrout | this ones for you kwmonroe https://youtu.be/u6mJMJzDD-M?t=1m13s | 21:26 |
kwmonroe | no way i'm clicking that | 21:26 |
magicaltrout | lol | 21:26 |
magicaltrout | its music! :P | 21:26 |
magicaltrout | it popped up on spotify on the way home and made me think of you | 21:26 |
kwmonroe | ugh.. fine. i'll click it. | 21:27 |
kwmonroe | ha! i like it. you're back on my nice list magicaltrout! | 21:28 |
magicaltrout | lol | 21:28 |
magicaltrout | thanks :P | 21:28 |
elbalaa | spaok: did you try sshing into the mongo machine and looking at the logs? | 21:41 |
spaok | elbalaa: ya there was some corruption or something, mongo would try to cluster and then restart | 23:39 |
spaok | we were finally able to fix it with a mix of restoring a previous snapshot and turning off the repl | 23:39 |
spaok | then recovering from there | 23:39 |
spaok | so it's up enough that we can finish the decomm process | 23:40 |
elbalaa | spaok: "corruption or something" story of my life when working with mongo | 23:58 |