/srv/irclogs.ubuntu.com/2017/04/20/#juju.txt

=== petevg is now known as petevg_afk
anrah_Quick question about the juju instances hostnames06:06
anrah_Is the format always juju-<someid>-<modelname>-<instance number> ?06:07
anrah_thing is that I need the name of the current model to be resolvable within instances; agent.conf shows only the uuid and I can't find any other appropriate way (besides talking to the API) to get the model where the instance is running06:08
ZiclazyPower: saw your message of yesterday, ack, so I will need to downgrade docker at our production-cluster also, before upgrading to K8s 1.6 :(08:13
Zicbut I'm afraid of the garbage collection problem with Docker that we had with the version from the Ubuntu archive08:13
Zicthe Docker version from docker.com private repository fixed that08:13
Zica bug is open at Docker for that, but the error message is so generic that this bug stayed open since April 2016 with no other news than "We upgraded to the latest version of Docker and that fixed everything!" :'(08:14
iatrouhi, I am testing cdk using snapd 2.22.6 conjure-up 2.1.5 and the kubernetes-master is stuck for a while in waiting:  http://paste.ubuntu.com/24420144/13:16
stokachuiatrou: yea ive seen that before, lazyPower ^13:16
lazyPoweriatrou: can you pastebin the output of `kubectl get po --all-namespaces`? You can execute that on the master itself.13:19
iatrou here is /var/log/juju/unit-kubernetes-master-0.log http://paste.ubuntu.com/24420169/13:20
iatroulazyPower: "The connection to the server localhost:8080 was refused - did you specify the right host or port?"13:20
lazyPoweriatrou: i see that13:22
lazyPoweriatrou: do you get that same error message when attempting to issue the kubectl command on the master?13:23
zeestratDoes anyone know of a way to use dynamic variables such as env variables when deploying bundles with the native juju deploy in 2.x?13:23
lazyPowerzeestrat: you would need to use something like envtemplate to render the bundle and substitute the variables, then deploy that generated bundle13:23
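A minimal sketch of that render-then-deploy approach, using envsubst from gettext rather than envtemplate specifically (whose exact CLI isn't shown here); the template filename and CHANNEL variable are placeholders:
    # bundle.yaml.tmpl contains e.g. "    channel: ${CHANNEL}"
    export CHANNEL=stable
    envsubst < bundle.yaml.tmpl > bundle.yaml    # substitute env vars into the bundle
    juju deploy ./bundle.yaml                    # deploy the rendered bundle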
iatroulazyPower: that was from kubernetes-master, from the conjure-up host, for kubectl.conjure-canonical-kubern-f24 get po --all-namespaces I get 502 Bad Gateway13:25
lazyPoweriatrou: whats the status of the apiserver when you `systemctl status kube-apiserver`?13:25
iatroulazyPower: inactive (dead)13:26
* magicalt1out wants to update to the latest CDK but my underlying openstack hardware is so woeful I don't currently dare =/13:27
=== magicalt1out is now known as magicaltrout
lazyPoweriatrou: we've found the culprit, can you attempt a restart of that service? `systemctl restart kube-apiserver`  then re-inspect to make sure its active13:27
stokachulazyPower: this problem has happened to me a few times too13:28
lazyPowerstokachu: any indicator or collection of logs so we can scrutinize what actually happened?13:29
lazyPoweri suspect a race condition but i'm not certain of that13:29
stokachuim running more tests today and will get you those logs13:29
lazyPowerok, thanks stokachu13:30
iatroulazyPower: I must be doing something wrong: on kubernetes-master I get Failed to restart kube-apiserver.service: Unit kube-apiserver.service not found.13:31
lazyPoweroh, wait, i see what i did there. my mistake13:32
lazyPoweri gave you the wrong service name13:32
lazyPoweriatrou: systemctl status snap.kube-apiserver.daemon13:32
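For reference, the corrected restart/verify sequence on the master, using the service name above:
    sudo systemctl restart snap.kube-apiserver.daemon
    systemctl status snap.kube-apiserver.daemon    # should now report active (running)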
lazyPoweri'm going to have to un-learn that muscle memory13:33
magicaltroutor get a mac with those silly bars, so you can put it on a hot key ;)13:34
lazyPower^ that13:34
lazyPoweronly, can we get an updated x1 carbon with a silly bar?13:34
ZiclazyPower: hey, I just upgraded a production-cluster this time and... except that I needed to kubectl delete -f <all_cdk_addons> && kubectl create -f <all_cdk_addons> like earlier, it works well13:34
magicaltrouthehe13:34
Zicit's one of my two K8s clusters, the smaller one13:34
lazyPowerZic: and this is with install_from_upstream=true?13:35
iatroulazyPower: OK, still dead, but for the "right" reason this time, the restart "fixed" the waiting on the master, but  not the workers...13:35
ZiclazyPower: nope, I downgraded just before13:35
lazyPoweriatrou: ok, so i suspect kubelet needs restarted on the workers13:35
ZicI didn't try with it :(13:35
lazyPoweriatrou: its that or wait for update-status to run and it should attempt to reconverge13:35
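A hedged sketch of restarting kubelet on every worker from the Juju client, assuming the snap service naming matches the master's (snap.<component>.daemon):
    juju run --application kubernetes-worker 'sudo systemctl restart snap.kubelet.daemon'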
lazyPowerZic: ok, good13:35
lazyPowerZic: i know the GC issue is a thing for you, but we dont have the bandwidth to enable that level of testing at this time if the upstream project isn't doing it :(13:35
lazyPowerZic: i suspect with 1.7/1.8 those newer dockers will be addressed, however at the release of 1.6 it was only vetted against 1.10-1.1313:36
ZiclazyPower: before the kubectl create/delete cdk-addons, my kube-dns was stuck at ContainerCreating with a Mount option error13:36
ZiclazyPower: for this smaller customer, he has no Docker garbage collection error13:36
Zicso I downgraded without any drawbacks13:36
Zicit's the bigger one where it's a problem13:36
* lazyPower nods13:37
Zicin practice, the GC issue goes like this: some of the containers of a pod (the ZooKeeper one) have heavy usage which can keep the kubernetes-worker unit's machine busy at 100% of its capacity13:38
Zic- docker stuck13:38
Zic- docker will eventually come back alive, but with orphan container errors and GC in a loop13:38
Zic- kubelet cannot schedule any new pod on the node of this docker13:39
Zic-> solution, restart dockerd13:39
Zicit happens twice a day13:39
Zicthe 17.04-ce version of Docker is not affected13:40
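The "restart dockerd" workaround above, as it would be run on an affected worker (a minimal sketch):
    sudo systemctl restart docker    # bounce the Docker daemon
    sudo docker ps                   # confirm the daemon is answering again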
lazyPowerZic: catch-22, the 17.04-ce edition of docker breaks networking as we've discovered13:40
Zicyep :(13:40
Zicin 1.6.113:40
Zicworked well in 1.5.3 :)13:41
Zica possible mitigation: I recommended that my customer set Kubernetes "limits" in his manifest for the ZooKeeper pods13:41
Zicof CPU/RAM13:41
Zicto never hit the 100% spot13:41
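A rough sketch of setting such limits with kubectl; the workload name "zookeeper" and the numbers are placeholders, not taken from this log:
    kubectl set resources deployment zookeeper \
        --requests=cpu=500m,memory=1Gi \
        --limits=cpu=2,memory=4Gi    # cap CPU/RAM so the node never hits 100%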
lazyPowerZic: that sounds like the proper way to do this13:43
lazyPoweryou might have to scale zookeeper.... but if its related to resource consumption and why it gets stuck GC'ing...13:43
Zicas the GC issue with dockerd seems to trigger when the kubernetes-worker unit's machine is under heavy load-average13:43
lazyPowerah13:43
lazyPowerok13:43
lazyPowerwell thats certainly something to explore and try13:44
lazyPoweri would def. give it resource limits and see if that resolves it13:44
lazyPowerZic: also if you want to file a bug to track this issue with us, i'm sure someone else is bound to hit this and it would be good to have it documented somewhere13:44
Zichttps://github.com/kubernetes/kubernetes/issues/39028 <= it's this kind13:48
Zicwith this kind of error:13:49
ZicMar 31 00:08:15 mth-k8s-01 kubelet[21200]: E0331 00:08:15.911907   21200 kubelet.go:1128] Container garbage collection failed: operation timeout: context deadline exceeded13:49
lazyPoweriatrou: any change in status on your workers?13:54
* lazyPower looks @ linked issue from zic13:54
Zicno clear identification / solution of the problem :s13:55
Zicthe OP seems to have this issue with a heavy Prometheus pod13:55
lazyPowerok i see the linked issue now as well relating to gc13:56
lazyPowerthis is a hairy issue: if we're seeing a perf/GC lock from the underlying container daemon... we're running heapster, which should be sending data back to kubelet to do GC at runtime13:56
lazyPowerbut if the container process fails to GC, yeah we're kind of dead in the water here too13:56
* lazyPower sighs13:56
lazyPowerif its not one thing its another amirite Zic?13:56
ZicI needed to Google "amirite" xD </French_guy>13:58
ZiclazyPower: most of the time, it's Zookeeper pods14:00
ZicI never saw anything else, except one time, a Kibana one, but that seems to be an isolated case14:00
lazyPowerZic: fair enough, i'm speaking internet gibberish14:00
Zic:D14:01
lazyPowerso i dont think its english guy vs french guy lingo, its just me being full of trash talk from the internet14:01
Zickube-system pods or default pods like the ingress-controller never failed with this GC problem14:01
lazyPowerZic: that kibana pod gc seems unrelated, kibana is client side. I would expect that from ES though.14:01
lazyPowerlike kibana has *some* Server side jruby code, but for the most part its client side14:01
Zicyep14:02
ZicI also have ES pods14:02
lazyPowerand those dont give you trouble w/ GC?14:02
Zicthey never provoke this GC issue14:02
* lazyPower does the schenanigans dance14:02
Zicit always starts with this ZK, and sometimes other pods begin to fail14:02
lazyPoweri suspect its ZK bleeding into other areas because of the GC issue14:02
Zic(I think it's unrelated, they are just crashing; auto-healing tries to heal them but cannot, as kubelet cannot reschedule new containers)14:03
lazyPoweryeah14:03
lazyPowerthat14:03
lazyPowerthats my bold prediction as well14:03
Zicin conclusion, with docker_from_upstream=false and for that smaller cluster which doesn't have any GC issue, the only "problem" I got after the upgrade was the need to delete/create all the cdk-addons YAML manifests14:08
Zicdon't know if "all" was really required, was just kube-dns which failing14:08
Zicbut just in case, I did it for all cdk-addons14:08
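A hedged sketch of that delete/create step; the directory holding the cdk-addons manifests is an assumption, not confirmed in this log:
    kubectl delete -f /path/to/cdk-addons/    # remove the addon manifests (kube-dns, dashboard, ...)
    kubectl create -f /path/to/cdk-addons/    # recreate them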
Zic(I think you already identified this problem yesterday)14:10
iatroulazyPower: nope, the workers are still in waiting14:11
Zicdon't know if it's good advice but, when juju status is stuck at something with "waiting", and if I see no action at all in "juju debug-log" or in the units concerned by the "waiting", I just try to reboot the juju controller and/or the unit concerned :>14:13
Zic90% of the time it works14:13
Zicthe other 10% of the time, I keep this error until lazyPower wakes up :>14:13
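In Juju 2.x terms, that reboot advice looks roughly like this; the machine and unit names are examples only:
    juju ssh -m controller 0 'sudo reboot'        # bounce the controller machine
    juju ssh kubernetes-worker/0 'sudo reboot'    # or bounce the machine of the stuck unit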
* Zic => exit14:13
lazyPoweriatrou: ok, lets remote into one of the workers and see whats happening. do you have a minute to troubleshoot?14:14
iatroulazyPower: sure, what am I looking for?14:17
lazyPoweriatrou: systemctl status snap.kubelet.daemon14:17
lazyPoweris it dead? any indicator in the logs of a failure scenario?14:17
iatrouinactive (dead), snap[1510]: cannot change profile for the next exec call: No such file or directory14:19
iatrourestart seems to recover it14:21
iatroulazyPower: ^^14:22
lazyPoweriatrou: excellent, sorry you ran into this. was this an upgrade or a fresh deploy?14:27
erik_lonrothHello! I've spent some time writing a "Hello World" tutorial about getting started with juju development. If you'd like to give me feedback on it, I'd be glad. I intend to expand with more tutorials as I learn from this myself. https://github.com/erik78se/juju/wiki/The-hello-world-charm14:28
lazyPowererik_lonroth: certainly, thanks for sharing. Another good place to cross post that would be the juju mailing list "juju@lists.ubuntu.com"14:28
erik_lonrothOh, thanx for that tip.14:28
iatroulazyPower: fresh install14:29
lazyPoweriatrou: ah interesting. What cloud was this deployed to?14:30
iatroulazyPower: localhost14:31
lazyPowerok, thats interesting. There appears to be a race that you hit and i'm not sure why14:33
lazyPoweriatrou: if you could snag the contents of the kube-apiserver logs, i think that's good enough to diagnose what happened14:33
iatroulazyPower: from the master? where are they located?14:36
lazyPoweriatrou: from master14:39
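Since the apiserver runs as a snap-packaged systemd service, its logs can typically be pulled on the master with journalctl (a minimal sketch):
    journalctl -u snap.kube-apiserver.daemon --no-pager > kube-apiserver.log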
lazyPoweriatrou: in standup give me 514:39
magicaltroutbet you're not stood up14:39
lazyPowermagicaltrout: i have a convertible desk14:41
lazyPoweri stand for meetings sometimes :)14:42
bdxjam: ping14:50
lazyPoweriatrou: ok, are you familiar with juju-crashdump?14:59
lazyPoweriatrou: it would probably be easier to just collect the env so i dont have to come back for more logs down the road.14:59
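Rough usage sketch, assuming juju-crashdump is already installed (e.g. from its snap or via pip):
    juju-crashdump    # gathers logs and status from the units in the current model into an archive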
iatroulazyPower: apologies otp, getting back to you in a bit15:02
lazyPoweriatrou: no worries, ping when you're available15:02
magicaltroutotp? off to the pub?15:04
aisraelon the phone, but the pub sounds good too. ;)15:05
magicaltroutah15:09
magicaltroutyeah the pub is the better one15:09
magicaltroutor15:09
magicaltroutotpitp15:09
magicaltroutis acceptable if you need to combine work and alcohol15:10
ZiclazyPower: found a minor "bug" -> kube-apiloadbalancer drops my kubectl exec executed remotely after 90s15:28
Zicmaybe because of the       proxy_read_timeout      90;15:29
Zicat /etc/nginx/sites-enabled/apilb15:29
lazyPowerZic: you can try increasing that to some absurdly large number like 99999999;15:30
lazyPoweryou cannot wholesale disable the read timeout though, that much i'm 90% certain of.15:30
ZicI switched it to 600s/10min instead15:31
Zicbut I don't know if it will be overwritten at the next Juju check, or just at the next juju upgrade-charm kubeapi-load-balancer (not important)15:31
mbruzekZic - Then you can edit the template used to write that file. /var/lib/juju/agents/unit-kubeapi ... find that template file and edit it there15:33
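A hedged sketch of bumping that timeout in the rendered config on the load balancer unit (the charm may re-render this file, as noted above):
    sudo sed -i 's/proxy_read_timeout[[:space:]]*90;/proxy_read_timeout 600;/' /etc/nginx/sites-enabled/apilb
    sudo nginx -t && sudo systemctl reload nginx    # validate and reload nginx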
Zicif it's just at next-upgrade, it's not a very big problem, and actually, next-upgrade is switching to HAProxy right? :)15:35
Zicif it's actively overwritten, yeah, I will edit the template, thanks for the trick15:35
mbruzekNot sure when haproxy is coming, we had problems with the tls keys15:35
mbruzekpassing through the traffic as a layer 4 router, the addresses were incorrect15:36
lazyPowero/ mbruzek15:46
mbruzekheyo15:46
lazyPowermbruzek: i saw you on steam last night and got giddy :D15:49
mbruzekFirewatch is a great linux game15:50
lazyPower+1 to that15:50
lazyPowerso is *Drumroll* ARK15:50
lazyPowerbut i'm clearly biased ;)15:50
mbruzekAh ark15:50
lazyPoweri started playing the even more difficult version: Scorched Earth15:50
lazyPoweri need to find some good persian music as a backdrop while playing15:50
lazyPoweror maybe just bolero on a loop15:51
kwmonroeaisrael: marcoceppi:  will one of you confirm the right benchmark doohickey to import:15:58
kwmonroe>>> from charmhelpers.contrib.benchmark import Benchmark15:58
kwmonroe>>> from charms.benchmark import Benchmark15:58
kwmonroeit's the last one, right?15:58
aisraelthe last one, yeah15:58
kwmonroegracias15:58
aisraelcharms.contrib.benchmark was the initial version, before we spun it out15:58
kwmonroeack15:59
Zicthe ambiance of Firewatch is brilliant :>16:25
lazyPowerZic: +1 to that, Panic! writes great software16:36
lazyPowera bit pricey, but excellent tools all the same16:36
=== lutostag_ is now known as lutostag
sivaI am seeing an issue where 'juju status' command just hangs17:29
=== siva is now known as Guest31335
Guest31335How do I resolve this issue?17:30
Guest31335Both the 'juju status' and 'juju list-models' commands just hang17:30
Guest31335ubuntu@juju-api-client:/var/log$ juju --version 2.0.2-xenial-amd6417:31
Guest31335How do I resolve this one?17:31
=== frankban is now known as frankban|afk
=== lutostag_ is now known as lutostag
=== mup_ is now known as mup
