/srv/irclogs.ubuntu.com/2017/01/27/#juju.txt

=== thumper is now known as thumper-afk
=== thumper-afk is now known as thumper
stubcory_fu: Should I do a followup MP that tears out the Python2 import machinery completely? Is there any Python2 code still out there using charms.reactive?06:11
SaMnCo@Zic thanks. Today is a bit busy for me, but can we do a call like next week?06:32
=== scuttlemonkey is now known as scuttle|afk
=== frankban|afk is now known as frankban
kjackalGood morning Juju world08:23
BlackDexhello :)08:29
BlackDexCan i upgrade a charm which is installed via cs, but i now want it to use a local version?08:29
BlackDexor use code.launchpad.net for its source?08:30
aisraelBlackDex, yes. check out the --switch flag of upgrade-charm08:31
BlackDexoke08:31
BlackDexi think i need --path :)08:33
AnkammaraoHi juju world08:47
Ankammaraodo we need to create terms each time we are pushing to the charm store ..08:47
Ankammaraoor is it enough to create them one time08:48
ZicSaMnCo: I'm also very busy at this time at office because of Vitess (the Canonical Kubernetes was one of the quicker part :D) as we're late on the deadline, but I'm available through IRC all the (France UTC+1 o/) time. If you prefer an audio call I will try to find a solution :)09:11
Zicfeel free to pm me if you need09:12
ZicI saw the blog post by jcastro in the Ubuntu Newsletter: so conjure-up is now the official way to install Kubernetes through Juju? I personally used the "manual provisioning" of Juju as I'm on bare-metal servers and don't use Ubuntu MAAS09:17
ZicI will surely bootstrap new k8s clusters, so I'm wondering whether to continue this way or start using conjure-up for the next ones09:18
Zic(I know that conjure-up is just an ncurses-like GUI for Juju, but I don't know if that install path does exactly the same thing as what I did)09:19
aisraelBlackDex, Aha. I was close!09:24
BlackDexaisrael: You indeed were, and it worked :)09:48
BlackDex--switch --revision and --path are mutually exclusive :)09:48
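(For reference, a minimal sketch of the two upgrade paths discussed above; the application name and local path are placeholders, not taken from this log:
# upgrade a store-deployed application from a locally built charm directory
juju upgrade-charm my-app --path ./builds/my-app
# --switch instead points at a different charm URL in the store,
# and cannot be combined with --path or --revision
juju upgrade-charm my-app --switch cs:~someone/my-app
)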
SaMnCoZic are you using MAAS?10:00
SaMnCofor bare metal management?10:01
SaMnCoif you do, then conjure-up will help. If not and you are on full manual provisioning then I guess you'll be good with your current method.10:01
SaMnCoconjure-up is a wizard to provide some help10:06
ZicSaMnCo: ok, yeah we don't use MAAS as we have a roughly equivalent homemade product here, so I bootstrap Ubuntu Server from it and add the machines via juju add-machine over SSH11:33
Zicand when I want to deploy the canonical-kubernetes bundle charms, I delete all the "newX" machines Juju wants to spin up and reassign the charms to machines already added via manual provisioning11:34
Zic(just via drag'n'dropping)11:34
Zicat this step, I personally scale etcd to 5 instead of the default 3, and put the EasyRSA charm on the same machine as kube-api-load-balancer11:35
Zic(and scale kubernetes-master to 3 also, I forgot to mention)11:35
Ziceven though we have a MAAS-like tool in our company, maybe I will try in the future to set up MAAS here just to automate everything with Juju :)11:37
SaMnCoZic that or write a juju provider for your tool. Is it all in house development or another product like crowbar ?12:45
ZicSaMnCo: completely homemade; it was built to let our customers reinstall their VMs or physical servers from our information system13:08
BlackDexhow can i define a local charm in a bundle file?13:18
BlackDexor at least, in which directory does it look? "local:xenial/charm-name" should be enough i think?13:19
anrahBlackDex: charm: "./build/xenial/my-charm"13:20
anrahfor example13:20
BlackDexso instead of local i can just input the full path?13:21
anrahyep13:21
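(A minimal sketch of what such a bundle entry might look like, assuming a Juju 2.0-era bundle with a services: section; names and paths here are illustrative only:
cat > my-bundle.yaml <<'EOF'
series: xenial
services:
  my-charm:
    charm: ./build/xenial/my-charm   # path resolved relative to the bundle file
    num_units: 1
EOF
juju deploy ./my-bundle.yaml
)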
BlackDexoke :)13:21
BlackDexnice13:21
BlackDexthx13:21
BlackDexin juju 1.25 it took some hassle13:21
anrahif you download the bundle through the GUI you must change those manually13:21
anrahI haven't found a better way to do that13:22
BlackDexthats no prob13:22
BlackDexi have a bundle file already :)13:23
BlackDexusing the export via the gui makes a messy bundle file in my opinion13:23
anrahthat is true13:23
BlackDexi don't need the placements for instance13:24
BlackDexor annotations as they are called13:24
BlackDexoh strange, i see a lot of "falsetrue" in the exported file13:25
BlackDexthose are values which should be default13:25
BlackDex:q13:25
jcastroZic: yeah for first use we went with conjure-up because it's a better user experience, especially for those getting started, it's all juju under the hood though so it's all good.13:26
=== scuttle|afk is now known as scuttlemonkey
SaMnCoZic: whaow. This is a significant engineering effort, congrats on building that.13:45
Zichmm, my kubernetes-dashboard displays a 500 error with "the server has asked for the client to provide credentials (get deployments.extensions)"14:34
Zichave you seen that before? I just ran apt update & upgrade and rebooted kubernetes-master and etcd, one by one14:35
Zicthe juju status is all green14:43
=== mskalka|afk is now known as mskalka
jcastrothat one sounds like a bug14:46
jcastrobut mbruzek and lazypower aren't awake yet :-/14:46
cory_fustub: There is not.  Other parts of the framework, mainly the base layer due to the wheelhouse, require Python 3.  So, +1 to pulling out py2 support14:46
Zicjcastro: I also get some errors when running commands that create or delete resources, but they are intermittent, unlike the kubernetes-dashboard error:14:47
Zickubectl create -f service-endpoint.yaml14:47
ZicError from server (Forbidden): error when creating "service-endpoint.yaml": services "cassandra-endpoint" is forbidden: not yet ready to handle request14:47
Zicthis kind of error14:48
jcastrook as soon as one of them shows up we'll set aside some time and get you sorted14:48
Zicthanks a lot14:49
ZicI will try to debug and collect some logs14:49
Zichttp://paste.ubuntu.com/23875089/14:57
Zic"has invalid apiserver certificates or service accounts configuration" hmm14:58
lazyPowerZic - thats a new one to me, hmmmm15:01
Zicmany pods are in CrashLoopBackOff, the Ingress ones included :/15:02
lazyPowersounds like something botched during the upgrade. you ran the deploy upgrade to 1.5.2 correct?15:02
ZicW0127 15:01:40.848867       1 main.go:118] unexpected error getting runtime information: timed out waiting for the condition15:03
ZicF0127 15:01:40.850545       1 main.go:121] no service with name default/default-http-backend found: the server has asked for the client to provide credentials (get services default-http-backend)15:03
ZicI just upgraded the OS via apt update/upgrade15:03
lazyPowerDid the units assign a new private ip address to their interface perhaps?15:03
Zicand rebooted the machines which host kube-api-load-balancer, kubernetes-master and etcd15:03
ZiclazyPower: hmm, to the eth0 interface?15:05
lazyPowerZic - correct. The units request TLS certificates during initial bootstrap of the cluster, and we dont yet have a mechanism to re-key with new x509 data, such as if the ip addressing changes15:06
lazyPowerwhich would yield an invalid certificate if the ip addresses changed15:06
lazyPoweri'm trying to run the gamut of what might have happened to cause this in my head15:06
ZicI only use one private eth0 interface (static) for management VMs like master, etcd and kube-api-loadbalancer/easyrsa15:08
Zicfor workers, I use bonding on two private interfaces15:09
Zicbut nothing changed in that area :(15:09
lazyPowerok i dont think thats the issue then if the addressing hasn't changed15:09
lazyPowerhmmm15:09
SaMnColazyPower, Zic would maybe removing the relation to easyrsa and adding it again fix?15:09
Zicfor info, I reboot the VM which host juju controller also15:09
Zicrebooted*15:10
lazyPowerZic - i dont think its a juju controller issue, its an issue with the tls certificates it seems. Something changed that's causing them to be invalid which is causing a lot of sick type symptoms with the cluster15:10
Ziclet me check the date of cert files15:10
Zicis it in /srv/kubernetes root?15:11
lazyPoweryep, the keys are stored in /srv/kubernetes15:11
Zic16 January15:11
Zic:(15:11
ZicSaMnCo: do I risk losing the PKI if I do that?15:12
SaMnCothat's what I am asking myself, if it would just regen the certs for the whole thing or not15:12
SaMnColazyPower would know better15:12
lazyPowerSaMnCo - i'm mostly certain there's logic to check if the cert already exists in cache and will re-send the existing cert15:13
lazyPowerwe have an open bug about rekeying the infra but haven't taken an action on it yet15:13
Zicand there is some strange behaviour: via kubectl, I can do read actions (get/describe) without any problem15:13
Zicbut writes, like create/delete, sometimes return a Forbidden15:13
Zic(I posted the exact message above)15:13
Zicbut for Ingress or dashboard, it's a strong "nope"15:13
lazyPowerZic - have you upgraded the kube-api-loadbalancer charm? we changed some of the tuning to disable proxy-buffering which was causing those issues15:13
SaMnCoI have seen that behavior in clusters where the relation with etcd or etcd itself was messy15:14
SaMnCok8s seems to keep a state as long as it can15:14
SaMnCoso if you break etcd, it will keep returning values for its current state, but will refuse to change anything15:14
ZiclazyPower: I didn't upgrade any Juju charm, just regular .debs via apt15:14
Zicoh, in the apt upgrade, I saw etcd being upgraded15:14
Ziccan it be...?15:15
lazyPower:| i sincerely hope this is not related to the deb package doing something with what we've done to the configuration of etcd post deployment15:15
lazyPowerif it is, i'm going to be upset and have nobody to complain to15:15
ZicI run an etcdctl member list on etcd machines15:15
Zicseems OK15:15
Zicbut I don't know what to do more to check the health15:16
lazyPowermember list and cluster-health are the 2 commands that would point out any obvious failures15:16
SaMnCoetcdctl cluster-health15:16
SaMnCoand tail the log, it tells if a member is out of sync15:16
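(The health checks being suggested, collected in one place; run on an etcd unit, assuming the etcd v2 tooling the charm shipped at the time:
etcdctl member list                        # every member listed, exactly one marked as leader
etcdctl cluster-health                     # expect "cluster is healthy" plus one line per member
journalctl -u etcd --since "1 hour ago"    # look for sync/election complaints around the reboot
)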
Zichttp://paste.ubuntu.com/23875201/15:16
SaMnCook so not that issue then15:16
lazyPowerso that doesn't seem to be the culprit15:16
ZicI did a etcdctl backup before the upgrade also, just in case15:17
lazyPowerexcellent choice15:17
Zichmm, so it seems to be tied to the CA15:17
Ziccan I run some manual curl --cacert to one point of the API to check it?15:17
lazyPowerZic - yeah, so long as you use the client certificate or server certificate for k8s, you should be able to get a valid response if the certificates are valid15:18
lazyPowerthe server certificates are generated with server and client side x509 details. meaning the k8s certificates on the unit can be used as client or server keys.15:18
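(One way to run that curl check, as a sketch only: the /srv/kubernetes file names below are assumptions based on the paths mentioned in this conversation, and 6443 is the usual local apiserver port behind the load balancer:
curl --cacert /srv/kubernetes/ca.crt \
     --cert /srv/kubernetes/server.crt \
     --key /srv/kubernetes/server.key \
     https://127.0.0.1:6443/healthz     # "ok" means the certificate chain is accepted
)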
ZiclazyPower: that's what is strange: kubectl get/describe commands always work, while kubectl create/delete on the contrary works only 1 time out of 3, returning a Forbidden message15:19
Zicand on the Ingress/default-http-backend/kubernetes-dashboard side, it's just CrashLoopBackOff :(15:20
lazyPowerZic - can you check the log output on the etcd unit to see if there's a tell-tale in there?15:20
Zicyep15:20
lazyPowerZic - it does sound like the cluster state storage is potentially at fault here15:20
Zicthe etcd cluster doesn't show any weird logs, it just shows that I upgraded the etcd package :s15:24
SaMnCoZic what are the logs of the Ingress/default-http-backend/kubernetes-dashboard pods?15:25
ZicUnpacking etcd (2.2.5+dfsg-1ubuntu1) over (2.2.5+dfsg-1) ...15:25
Zic(was the update)15:25
Zicthe '1ubuntu1' part15:26
Zicseems that the etcd from Ubuntu archive installed over the Juju charm one, no?15:26
ZicSaMnCo: (I'm pasting you the log shortly)15:26
lazyPowerZic - thats expected. the etcd charm installs from archive15:26
Zicyeah, but as it didn't have an "ubuntu" tag in the deb version, I thought it came from outside of archive.ubuntu.com15:27
Zichttp://paste.ubuntu.com/23875254/15:29
ZicSaMnCo: ^15:29
lazyPowerZic - from your kubernetes master, can you grab the x509 details and pastebin it? openssl x509 -in /srv/kubernetes/server.crt -text15:33
lazyPoweri dont need the full certificate output, just the x509 key usage bits so i can cross ref this info w/ whats in the cert15:33
lazyPoweri'm expecting to find IP Address:10.152.183.1, in the output15:33
Zicoki15:34
lazyPowerZic - additionally, if you could run juju run-action debug kubernetes-master/0  && juju show-action-output --wait  $UUID-RETURNED-FROM-LAST-COMMAND15:35
lazyPowerit'll give you a debug package you can ship us for dissecting the state of the cluster and we can try to piece together whats happened here15:35
Zic            X509v3 Subject Alternative Name:15:35
Zic                DNS:mth-k8smaster-01, DNS:mth-k8smaster-01, DNS:mth-k8smaster-01, IP Address:10.152.183.1, DNS:kubernetes, DNS:kubernetes.cluster.local, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local15:35
lazyPoweri think i transposed debug and kubernetes-master15:35
lazyPoweryeah, the cert's valid, it has all the right SANs i would expect to see there15:35
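(A shorter form of that check, printing only the SAN block so the expected service IP and DNS names are easy to eyeball:
openssl x509 -in /srv/kubernetes/server.crt -noout -text | grep -A1 'Subject Alternative Name'
)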
Zicthat part of the certificate?15:35
Zicok15:36
Ziclet me run this juju command15:36
* lazyPower sighs15:36
lazyPowerthis is a red herring, its something else thats gone awry15:36
Zicerror: invalid unit name "debug"15:37
Zichmm?15:37
Zicmaybe I need to inverse the two args :D15:37
Zicjuju run-action kubernetes-master/0 debug ?15:37
ZicAction queued with id: 99267d59-f3aa-467d-8686-130e90dc47a015:38
Zicseems to be that :)15:38
Zic# juju show-action-output --wait 99267d59-f3aa-467d-8686-130e90dc47a015:38
Zicerror: no action ID specified15:38
lazyPower:|15:39
lazyPowerjuju y u do dis15:39
lazyPowerZic  if you omit the --wait, it'll give you what you're looking for now15:40
lazyPowerthe debug action doesn't take long to run15:40
lazyPowerits just aggregating information and then offers up a tarball of files15:41
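(The corrected sequence, pieced together from the exchange above: unit name first, then the action name, and the tarball is fetched with the juju scp command that the action output prints:
uuid=$(juju run-action kubernetes-master/0 debug | awk '{print $NF}')   # prints "Action queued with id: <uuid>"
juju show-action-output "$uuid"    # once complete, lists the debug tarball path and a juju scp command
# then copy the tarball off the unit with the juju scp line shown in that output
)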
Zichttp://paste.ubuntu.com/23875303/15:42
Zicbut at that path, I don't have any debug-20170127153807.tar.gz15:42
Zicam I missing something? :o15:42
lazyPowerCynerva - have we encountered any situations where the debug package isn't created?15:42
lazyPower1 sec, cc'ing the feature author15:43
Zicif I run the proposed juju scp it's ok15:43
CynervalazyPower: I haven't seen anything like that, no15:44
lazyPowerwait so it did create?15:44
ZiclazyPower: if I run the juju scp manually, I don't know if that's what you expected from me :)15:44
Zicor if the show-action should exec it15:44
SaMnCoZic, lazyPower other people have had that: https://github.com/kubernetes/minikube/issues/36315:44
lazyPowerZic - I'm looking for the payload from that juju scp command that showed up in the action output15:46
lazyPowerZic - that tarball will have several files which includes system configuration, logs, and things of that nature15:46
Zicyeah, I have it15:46
Zicjust untared15:46
lazyPowerDo you have a secure means to send that to us? if not i can give you a temporary dropbox upload page to send it over15:47
Zicyeah, I can generate you a secure link15:47
lazyPowerexcellent, thank you15:47
mbruzekHello Zic, sorry I am late to the party. I heard you were having trouble with the Kubernetes cluster.15:48
Zicyeah :(15:50
ZiclazyPower: I pm-ed you the link with its password15:51
lazyPowerZic - confirmed receipt of the file15:51
Zicmbruzek: I just ran apt update/upgrade across the different canonical-kubernetes machines, one by one, and the API began to refuse some requests for an unknown reason15:51
lazyPoweri'll take this debug package and we'll dissect it to see if we can discern whats happened post apt-get upgrade. i can't for the life of me think what went wrong but i suspect there's clues in here.15:52
Zic(for TL;DR :))15:52
mbruzekThanks for bringing me up to speed.15:52
lazyPowerZic can you also send me the output from a kubernetes-worker node as well?15:52
lazyPowersame process to run the debug action15:52
ZiclazyPower: just before the upgrade, I ran kubectl delete ns <a_large_namespaces> and it was still in Terminating when I ran kubectl get ns15:53
Zicdon't know if it can help15:53
lazyPowerZic - it might be trash in the etcd kvstore, but i'm not positive this is the culprit yet15:53
Zicthe goal was to delete all the large namespaces used for PoC, upgrade the whole cluster, reboot it, and begin some prod; but it seems today is not the right day :p15:54
Zic(I'm generating you other logs)15:55
mbruzekZic: I am sorry you ran into this problem15:55
mbruzekZic Have you verified that kube-apiserver is running on your kubernetes-master/0 charm?15:56
ZicI'm running a permanent watch -c "juju status --color"15:57
Zicit should be red if it's not working, correct?15:57
Zicbecause all is green atm :)15:57
mbruzekZic not necessarily15:57
Zicoh15:57
Ziclet me check directly so15:57
Zicbut even if it was that, no queries would work at all; here I have consistent success with kubectl get/describe, random success with kubectl create/delete (sometimes a Forbidden error, then it works on the 2nd try...), and 0 success with Ingress & dashboard15:58
Zic(yeah it's running fine)15:58
lazyPowerZic  - ok we're going to need a bit to sift through this data and see what we come up with16:04
lazyPoweri have the whole team looking at these debug packages, i'll ping you back when we've got more details16:04
Zicthanks for all your help!16:04
mbruzekZic: You rebooted the nodes after apt-get update?16:06
Zicyep16:06
Zicall of it16:06
mbruzekZic: Do you remember what time about? Looking at the logs I see some connection loss about 2017-01-25 10:3516:07
Zichmm, I began with the kube-api-load-balancer, the 3 kubernetes-masters and two of the etcds at ~14:15 (UTC+1)16:10
Zicand finished the 3 remaining etcds and all the kubernetes-workers about an hour later, I think16:11
mbruzekZic: OK that does not appear to be the problem then16:11
Zicbut on 25th january, all was fine16:11
Zic(I didn't notice the date, sorry)16:11
Zicthe exact timeline is: I deleted 4 large namespaces, which stayed forever in the Terminating state, and no pods or other resources were stuck in Terminating, so I deleted them one by one (without --force or --grace-period=0, just normally)16:13
Zicall pods & svc were terminated, but the namespaces still showed "Terminating" in kubectl get ns16:13
Zicas I needed to upgrade and reboot the whole cluster anyway, and had seen an issue about this that was fixed by rebooting the masters, I did it16:14
lazyPowerDELETE /apis/authorization.k8s.io/v1beta1/namespaces/production/localsubjectaccessreviews: (698.088µs) 405 -- this seems to be dumping stacks in the apiserver log16:14
lazyPower405 response16:14
lazyPowerundetermined if this is the root cause, but it is consistent16:14
Zicyeah so it's maybe this large deletion which is the root cause :/16:14
Zicwas 4 namespaces hosting 4 Vitess Cluster labs16:15
lazyPowerlogging error output: "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"the server does not allow this method on the requested resource\",\"reason\":\"MethodNotAllowed\",\"details\":{},\"code\":405}\n"16:15
lazyPowerwhich is interesting, i know for a fact you can delete namespaces16:15
lazyPoweri believe what might be the cause, is it caused some kind of lock in etcd16:15
Zicyeah16:15
Zicfor my previous labs, I just delete ns and all was clean16:15
lazyPowerand k8s is stuck trying to complete that request and etcd is actively being aggressively in denial about it16:15
Zicbut I never deleted 4 large ones at the same time...16:15
lazyPowerbut not positive this is the root cause, we're still dissecting16:15
mbruzekZic our e2e tests do large deletes of namespaces so that should be fine.16:17
Zicok16:17
Zicatm, these namespaces are still in "Terminating"16:18
ZicI checked that the rc, pods, services, statefulsets, all the resources were terminated16:18
mbruzekZic Did you reboot the etcd node(s) while this was still trying to delete? Was there an order of reboot?16:18
Zicmbruzek: I just checked that the rc/pods/svc/statefulsets of these namespaces were properly terminated, but the namespaces were still stuck at Terminating16:19
ZicI rebooted the etcd node one by one16:19
Zic(and try etcdctl member list after each reboot)16:19
lazyPoweryeah16:19
ZicI have a previous backup of this morning for etcd16:20
Zic(and one after the upgrade)16:20
lazyPowerthe more we think this through, i think etcd is the core troublemaker here16:20
lazyPoweri think the client lost the claim on the lock16:20
Zicbecause of the high number of delete requests, or because of the upgrade of its package via apt?16:20
lazyPowercombination of the operation happening and then being rebooted during the op16:21
lazyPoweretcd is still waiting for that initial client request to complete16:21
Zic:s16:21
lazyPoweri hear you, etcd is very finicky, and this is exactly why we label it as the problem child16:21
lazyPoweri'm looking up how to recover from this16:22
Zicall my troubles have been with etcd all that time :D with K8s or Vitess16:22
lazyPowerZic - can you curl the leader unit's leader status in etcd?16:22
lazyPowereg:  curl http://127.0.0.1:2379/v2/stats/leader16:22
lazyPowerthe leader is identified with an asterisk next to the unit-number in juju status output16:22
Zichmm16:24
ZicI have a non-printable character in return16:24
ZicI have a bad feeling about this16:24
* lazyPower 's heart sinks a little in his chest16:24
Zichttps://dl.iguanesolutions.com/f.php?h=1mvhf5F9&p=116:26
Zicoh wait16:28
Zicit's not the master16:28
Zicetcd/0*                   active    idle   5        mth-k8setcd-02             2379/tcp        Healthy with 5 known peers.16:28
ZicI will try here16:28
Zicsame non-printable-character :(16:28
mbruzekZic: juju run --unit etcd/0 "systemctl status etcd" | pastebinit16:29
lazyPowerZic - etcdctl ls /registry/namespaces16:29
Zichttp://paste.ubuntu.com/23875518/16:29
Zicmbruzek: ^16:29
Zichttp://paste.ubuntu.com/23875523/16:30
ZiclazyPower: ^16:30
Zicjma, production, integration, development were the namespaces I deleted16:31
Zic(which are still stuck in the "Terminating" status)16:31
Zichmm, lazyPower I run the same curl with https instead of http16:34
Zicroot@mth-k8setcd-01:~# curl -k https://127.0.0.1:2379/v2/stats/leader16:34
Ziccurl: (35) gnutls_handshake() failed: Certificate is bad16:34
Ziceven with the "-k"16:34
lazyPowerZic - etcd is configured to listen to http on localhost16:34
Zicoh ok, so it was correct16:34
lazyPoweryou'll need https if you poll the eth0 interface ip16:34
ZicI try16:35
Zic# curl -k https://10.128.74.205:2379/v2/stats/leader16:35
Ziccurl: (35) gnutls_handshake() failed: Certificate is bad16:35
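(Likely why -k alone fails: the eth0 listener wants mutual TLS, so the client has to present a certificate too. A sketch only; the certificate paths below are assumptions, use wherever the etcd charm placed its client key pair on that unit:
curl --cacert /etc/ssl/etcd/ca.crt \
     --cert /etc/ssl/etcd/client.crt \
     --key /etc/ssl/etcd/client.key \
     https://10.128.74.205:2379/v2/stats/leader
)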
mbruzekZic: juju run --unit etcd/0 "journalctl -u etcd"  | pastebinit16:37
Zichttp://paste.ubuntu.com/23875565/16:38
Zicthe 14:17-14:32 interval is the upgrade/reboot I think16:40
lazyPowerZic - etcdctl ls /registry/serviceaccounts/$deleted-namespace16:43
lazyPowerdo you have 'default' listed in there in any of those namespaces?16:43
Zicroot@mth-k8setcd-02:~# etcdctl ls /registry/serviceaccounts/production16:43
Zic/registry/serviceaccounts/production/default16:43
Zicyep16:43
Zicsorry, I will be afk for 1 hour (breaking the K8s cluster was not a sufficient punishment, I'm also on the on-call rotation tonight... need to go home before it begins... double punishment :D)16:48
mbruzekZic we were about to offer some face to face support. We can wait until you get home.16:49
mbruzekZic ping us when you are back17:07
bdxis there a reason juju automatically adds a security group rule to every instance that allows access on 22 from 0.0.0.0/0?17:08
bdxI'm guessing juju just assumes you will always be accessing the instance via public internet and not from behind vpn?17:09
lazyPowerZic - we think we've narrowed it down to the one area we dont have visibility into at the moment, we're missing debug info from etcd, and there's no layer-debug support in the etcd charm at present. When you surface and have a moment to re-ping, we'd like to gather some more information from the etcd unit(s) in question and i think we can then successfully determine what has happened.17:56
stormmorehowdy juju world18:01
lazyPowero/ stormmore18:01
stormmoreso I got my first k8s cluster up and running yesterday, woot!18:02
lazyPowerAWE-SOME18:05
lazyPowerdoing anything interesting in there yet stormmore?18:05
stormmorenot yet, still teaching the Devs how to create containers18:06
stormmoreit will get more interesting when I migrate out of AWS to our own hw18:06
stormmoreI just love all the "pretty" dashboards that I can "show off" to management18:07
stokachustormmore, it's what promotions are built on :)18:22
stormmorestokachu not in a startup where you already report to the CEO18:25
stormmorejob security maybe18:26
Zicping back lazyPower and mbruzek18:29
Zicsorry, the commute was hell18:29
stokachustormmore, hah, maybe some nice hunting retreats18:31
stokachustormmore, that may just be here in the south though18:31
stormmoreyeah that is a southern thing stokachu, definitely not a bay area thing18:31
ZiclazyPower: what do you need from etcd, just the journalctl entries?18:32
lazyPowerZic - can you grab that, the systemd unit file /var/lib/systemd/etcd.service   and the defaults environment file   /etc/defaults/etcd18:37
lazyPowerer18:37
lazyPowersorry /lib/systemd/service/etcd.service18:37
lazyPoweri clearly botched the systemd unit file location. herp derp18:37
Zic/lib/systemd/system/etcd.service?18:39
Zicbecause /lib/systemd/service does not exist :)18:39
lazyPowercorrect18:39
Zic'k18:39
Zichttp://paste.ubuntu.com/23876230/18:40
Zicetcd.service date of Dec 18th Jan 16th, if it can help18:42
Zicoops, missing copy/paste18:42
Zicetcd.service is Dec 18th and /etc/default/etcd is Jan 16th18:42
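(Both files can be pulled straight off the unit in one go, e.g.:
juju run --unit etcd/0 "cat /lib/systemd/system/etcd.service /etc/default/etcd" | pastebinit
)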
lazyPowerok these unit files appear to be in order. We found some issues that also look related to the core problem regarding flannel not actually running on the units18:47
lazyPowerit failed contacting etcd18:47
mbruzekZic are you able to hangout with us for a debug session?18:47
Zicso sorry but I can't, my wife and my child will kill me if I jump into a debugging session with audio :/ but I really appreciate your kindness, thanks18:50
ZicI'm out of the office actually but I can do some IRC discreetly :)18:50
ZiclazyPower: yes, we already discussed this, but I did `systemctl start flannel` on all the rebooted kubernetes-workers18:51
Zicor maybe the time Flannel was not running caused the problem?18:51
Zic(and I also did a `juju resolved` on the flannel unit which was in error)18:52
Zicas you taught me :)18:52
lazyPowerZic - seems like flannel is having an issue contacting etcd per the debug output from kubernetes-master18:52
lazyPowerwhich in turn is causing the kubernetes api to not be available to pods, which is causing the pod crashloop18:53
mbruzekZic: Can you pastebin the /var/run/flannel/subnet.env also ?18:53
Zichmm, I remembered to start the flannel service after every kubernetes-*worker* reboot18:53
Zicnot on master18:53
Zic(as Juju doesn't tell me anything is on error after master reboots)18:54
mbruzekZic Were the flannel services not autostarting?18:54
Zicyes, on worker18:54
ryebotfwiw, that issue was fixed in 1.5.218:55
Zicnice, noted18:56
ZicI will plan an upgrade if I'm able to recover from this crash18:56
mbruzekZic: Can you get the /var/run/flannel/subnet.env  file for us?18:57
Zicoh sorry, I missed your message18:57
Zichttp://paste.ubuntu.com/23876307/18:59
ryebotThanks, Zic18:59
=== frankban is now known as frankban|afk
ZiclazyPower: at this point, you can tell me the truth, do you think I will be able to recover from this crash? :D not so important because it's not in prod yet, and it's easy to redeploy from scratch19:24
Zicbut to know of what mu monday will be made :)19:25
Zics/mu/my/19:25
lazyPowerZic - we have some ideas, but nothing definitive for the root cause so its hard to point at what  a fix would be without access to real time debug.19:25
lazyPowerZic - we're trapped in a meeting atm thats starting to wind down, but we've been scrubbing through the logs you sent all morning, it all seems to point back to flannel + etcd as the core of the issue19:25
lazyPowermbruzek - any ideas left to try before we call it DOA?19:26
Zicdon't hesitate to ping me for more info, I'm @home but lurking at IRC as usual19:27
lazyPowerZic - will do. just pending feedback from the remainder of the team actively cycling on the issue.19:27
lazyPoweri did the same oeprations you outlined on my home cluster running 1.5.1 and i got the intermediary flannel connection issue19:27
lazyPowerit resolved once i restarted the master however19:27
ZicI rebooted in this order : juju controller, kube-api-loadbalancer+easyrsa, kubernetes-master, kubernetes-worker19:28
lazyPowerThat all sounds correct to me in terms of ordering19:29
ZicI don't try to reboot anything since then19:29
lazyPowerwould you mind terribly trying to reboot the kubernetes-master unit one last time to see if it "unsticks" the error?19:29
Zicbut I can if it's needed19:29
Zicyeah, I can19:29
Zicthe 3 ones?19:29
Zicone by one?19:29
lazyPowerZic - i would pick one, and restart it yes. identified as the leader19:30
lazyPowerstart there, and lets see what results we get back from that single reboot19:31
lazyPowerif it looks promising, then cycle one by one the other two nodes19:31
lazyPowers/nodes/units/19:31
Zicreboot launched on the active master19:31
lazyPowerZic - i need to cycle into another role at the moment, but i'm leaving you in the very capable hands of mbruzek, ryebot, and Cynerva - they're going to keep the stream alive and ask for details about post-reboot19:32
ryebotReady and waiting, Zic :)19:33
Zicoki, thanks anyway lazyPower for your great help :)19:33
lazyPowerZic - no, thank YOU for the patience during this debugging session. I know its unnerving19:33
Zicryebot: o/, reboot finished, do I start the flannel.service?19:33
lazyPowerif we can fix it we'd like to do so19:33
ryebotZic: yes, please19:34
mbruzekZic: Yes19:34
Zicstarted, systemctl status seems correct19:34
ryebotgreat, and /var/run/flannel/subnet.env exists?19:34
lazyPowermbruzek ryebot - the flanneld unit not being up before kube-apiserver/scheduler/controller-manager might be a bigger portion of the error set as well.19:34
lazyPowerduring bootstrap that happens after flanneld has indicated it's running and available19:35
stormmoreHey lazyPower just out of curiousity, do you know why the k8s bundle adds flannel connection to the k8s master that takes a full /24? seems like a bit of a waste to me19:35
lazyPowerstormmore - expedience, that seems like an area we can optimize19:35
Zichttp://paste.ubuntu.com/23876577/19:35
ryebotlazyPower ack, we'll investigate19:35
Zicryebot: ^19:35
ryebotZic great, thanks19:35
Zicdo I mark the flannel/0 unit as "resolved" with juju cli?19:36
Zicit's in error atm19:36
stormmorelazyPower - seems like it, still not a deal breaker for me ;-)19:36
ryebotZic, yes, please19:36
Zicdone, it's green19:36
ryebotgreat19:36
ryebotone sec19:37
lazyPowerstormmore glad to hear it :) As you can see per the channel logs, we take deployments seriously and value all feedback. keep it comin. if you'd like to file a bug against github.com/kubernetes/kubernetes regarding the master service CIDR range we can angle to get it on the roadmap in the future.19:37
mbruzekZic: Can you also reboot the other masters and start flanneld as well.19:37
Zicok19:38
Zicok for the 2nd master19:40
Zicalso for the 3rd one19:41
ryebotokay, and you resolved the errors?19:41
Zicyep19:41
mbruzekZic can you try to create a simple pod to see if that works?19:41
stormmorelazyPower this channel is one of the reasons I am driving adoption in my company of MAAS and Juju to underpin the k8s environment instead of other options19:41
mbruzekAre you still in a Crash Loop back off?19:41
lazyPowerstormmore <319:41
lazyPowerwe appreciate you too19:42
mbruzekThanks stormmore!19:42
Zicmbruzek: I will deploy a simple nginx pod, give me a few seconds19:42
ryebotThanks, Zic19:42
Zic$ kubectl run my-nginx --image=nginx --replicas=2 --port=8019:44
ZicError from server (Forbidden): deployments.extensions "my-nginx" is forbidden: not yet ready to handle request19:44
Zicsame from before19:44
ryebotalright, thanks, Zic, one moment19:44
ZicI run it a 2nd time, same error19:45
Zicbut the 3rd time operation was executed19:45
Zicso, same as earlier19:45
stormmoreso on a less work-related topic, when is the next convention that I can try and twist the boss's arm into letting me attend?19:45
mbruzekZic what master does your kubectl point to?19:45
Zicthe kube-api-load-balancer19:46
ZicI can test from a master directly19:46
mbruzekZic: Do you have a bundle or description of how you deployed your cluster?19:46
Zicyeah : 1 machine for kube-api-loadbalancer and easyrsa (identified by mth-k8slb-01 in my infra), 3 machines for kubernetes-master (mth-k8smaster-0[123]), 8 kubernetes-worker (mth-k8s-0[123], mth-k8svitess-0[12], mth-k8sa-0[123])19:49
Zicand one Juju controller of course : mth-k8sjuju-0119:49
mbruzekZic: Can you pastebin juju status?19:49
Zicof course19:49
Zichttp://paste.ubuntu.com/23876663/19:50
lazyPowermbruzek - one thought just occurred to me as well, if Zic hasn't updated to our 1.5.2 release, the api-lb still has the proxy buffer issue which reared its ugly head on delete/put requests as well.19:50
lazyPowerbut not certain if thats pertinent19:50
Zic(oh, I forgot to mention mth-k8setcd-0[12345] which running etcd parts)19:50
Zicmbruzek: ^19:50
Zicall of it is added manually ("cloud-manual" provisioning of Juju, even the AWS EC2 instances)19:52
ZicI don't use the AWS connector since I have baremetal servers and EC2 instances in the same juju controller19:52
mbruzekZic would you be able to upgrade this cluster to see if our operations code corrects this problem?19:55
Zicyeah, it will be the first upgrade I conduct through Juju since the bootstrapping of this cluster :)19:55
Zicwhat is the recommended way?19:55
mbruzekWe have that documented in this blog post: http://insights.ubuntu.com/2017/01/24/canonical-distribution-of-kubernetes-release-1-5-2/#how-to-upgrade19:56
mbruzekSince you don't seem to be using the bundle, I would recommend the "juju upgrade-charm" steps19:56
mbruzekWe can walk you through it.19:56
ZicI use the canonical-kubernetes but just scale etcd from 3 to 5 and master from 1 to 3 in the Juju GUI19:57
Zic(and make easyrsa to be on the same machine as kube-api-loadbalancer)19:58
Zicthe canonical-kubernetes bundle*19:58
mbruzekZic: Ah I see, still would recommend the upgrade-charm path19:58
Zicok, I just read the how-to-upgrade section, I've never used upgrade-charm one by one :(19:59
mbruzekThere is a first time for everything!19:59
Zic:)19:59
Zicso I run `juju upgrade-charm <on_every_charm_one_by_one>` ?20:00
mbruzekZic: Just the applications in the cluster, kubernetes-master, kubernetes-worker, etcd, flannel, easyrsa, and kubeapi-load-balancer20:01
mbruzekThe _units_ will upgrade automatically when the _application_ does20:02
mbruzekSo you don't need to use the /0 /1 /2 /320:02
ryebotZic: Right, so literally as it is in those docs :)20:02
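(The application-level upgrade spelled out; these are the six applications mbruzek listed, no unit numbers needed, and the ordering here is just a sensible guess with the infrastructure pieces first:
juju upgrade-charm easyrsa
juju upgrade-charm kubeapi-load-balancer
juju upgrade-charm etcd
juju upgrade-charm flannel
juju upgrade-charm kubernetes-master
juju upgrade-charm kubernetes-worker
)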
Zicok20:02
Zicjust so I know, for the future (~yay~), is this the standard recommended way for a cluster like mine?20:03
Zicso I will try to memorize this :)20:03
ryebotZic: As far as I know, for custom deployments, yes.20:05
ryebotYou might be able to use your own bundle without version numbers, but I'm not sure. I'll have to investigate.20:05
mbruzekZic: Technically you could export a bundle of your current system and it would be reproducible; you could also edit the version numbers of the charms and upgrade your cluster in one step20:05
mbruzekZic: But we are just trying to fix the cluster here, we can add automation and reproduction as following steps20:06
Zicok, so since I modified the number of units and replaced the easyrsa charm, the out-of-the-box way is via juju upgrade-charm, that's all I wanted to know; it's not the time for automation, I agree :)20:07
Zic(the upgrades are running btw)20:07
Zics/replace/place in the same machine of kube-api-loadbalancer/20:08
mbruzekZic: Actually you can export the deployment from the Juju GUI in one step, and it will have the machines and everything just as you have it now.  We could then copy that yaml and edit it.20:08
Zicsmooth, noted, I will study this after :)20:09
Zic(easyrsa, kube-api-load-balancer already OK, kubernetes-master in progress, I will follow then with etcd and kubernetes-master)20:10
Zicworker*20:10
Zichmm20:11
ZicI had a `watch "kubectl get pods --all-namespaces"` running20:11
mbruzekZic: The documentation describes this process: https://jujucharms.com/docs/2.0/charms-bundles20:11
Zicand suddenly all the pods switched to Running after the kubernetes-master upgrade20:11
Zic\o/20:11
mbruzekExcellent!20:11
Ziclet me check carefully20:11
lazyPower@!#$%^@!#$%!@#$%20:11
lazyPowerAWESOME mbruzek  GREAT WORK!20:11
Zicyeah, all is running20:12
mbruzeksweet20:12
ZicI continue the upgrade-charms20:12
lazyPoweri think this calls for a trademarked WE DID IT!20:12
Zic(I will try to schedule a new pod deployment just after)20:12
Zicclap clap anycase o/20:12
Zicoh maybe "clap clap" is not sounding like a STANDING APPLAUSE in English, sorry :)20:13
mbruzekZic: It translated in my head just fine20:13
Zic:)20:13
Zicthe etcd charm is upgrading20:14
Zic<crossfinger>20:14
Zicdone for etcd, I'm finishing with the kubernetes-worker charm20:15
Zicall upgrades are finished20:17
Ziclet me run a kubectl deployment20:17
mbruzekZic: Can you create a small test deployment?20:17
Zicyep20:17
Zic(oh btw, kubectl get ns does not return any of the old locked Terminating namespace)20:17
Zicit's clean now20:17
lazyPoweryassssss20:18
ryebotgreat20:18
Zicso, it works, but something weird though maybe normal: some Ingress pods go to CrashLoopBackOff but they look to come back to Running atm20:20
Zicyeah, they are all running now20:20
Zicsome strange flap20:20
lazyPowerif the default-http-backend pod is being re-created it'll crash-loop the ingress controller until the backend stabilizes20:20
lazyPowerthats known and normal20:20
Zicoh, so it's normal20:20
ZicCOOL :D20:21
lazyPowerphwew20:21
lazyPowerman, fan-tastic20:21
Zicso your new version amazingly corrected everything20:21
Zicand I don't know if it's tied to it, but scheduling is much quicker20:21
lazyPowerthats a 1.5.2 fixup :)20:21
ZicI sometimes waited up to 3min between Scheduling and ContainerCreating20:22
lazyPowerplus you probably have less pressure on the etcd cluster without those large namespaces20:22
Zichere it was done in 10s20:22
mbruzekZic: The upgrade process installed new things and reset the config files to what we would expect, that is why I think you are having so much success here.20:24
mbruzekZic: Is everything ok? do you have any other problems?20:24
ZicI'm checking some more test but all seems OK now !!20:25
mbruzekZic: There was also a fix for LB where we turned off the proxy buffering in the 1.5.2 update.20:26
Zichmm, some Ingress stayed in CrashLoopBackOff and others are running correctly:20:28
Zichttp://paste.ubuntu.com/23876828/20:28
mbruzekZic it is possible that those are still being "restarted" by the operations20:30
mbruzekBut good information.20:30
Zichmm, kubernetes-dashboard is staying in CrashLoopBackOff, I forgot to check the kube-system namespace :/20:31
Zic(kube-dns had also crashed, but it has been Running since the upgrade)20:31
ryebotZic, is your juju status all-green right now?20:32
Zicyep20:32
ryebotthanks20:32
Zicheapster also was CrashLoopBackOff and just Running fine now20:36
mbruzekZic: so just the ingress ones are in CLBO ?20:37
Zicthe only CrashLoopBackOff pods now are the dashboard and some of the Ingress controllers20:37
Zicoh I anticipated the question :)20:37
ZicI'm trying to relaunch the pod of kubernetes-dashboard, maybe it will help20:38
Zic  2m1s163{kubelet mth-k8svitess-01}WarningFailedSyncError syncing pod, skipping: failed to "SetupNetwork" for "kubernetes-dashboard-3697905830-qv6hv_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kubernetes-dashboard-3697905830-qv6hv_kube-system(8785f143-e4cc-11e6-b87d-0050569e741e)\" using network plugins \"cni\": open /run/flannel/subnet.env: no such file or directory; Skipping pod"20:40
mbruzekZic: Did you get to juju upgrade-charm flannel?20:41
Zicno, I just realized that, I thought it was part of the kubernetes-worker charm20:41
Ziccan I run its upgrade now?20:41
mbruzekZic: Yes please20:42
Zicdone20:44
mbruzekZic: Can you please pastebin `kubectl logs nginx-ingress-controller-jlxr5` ?20:45
mbruzekWe want to see why that is in a CLBO20:45
Zichttp://paste.ubuntu.com/23876915/20:46
Zichmm20:47
ZicI didn't have this error before20:47
mbruzekAre they still in crash loop back off?20:47
Zicyep20:47
Zicoh20:47
Zicdashboard is running20:47
Zicand 2 Ingress are back in Running20:48
ryebot6 total ingress running now?20:48
Zic2 are still in CLBO but I think they will come back20:48
Zichop, all Running20:48
ryebot8 total running ingress?20:48
Zicyep20:48
ryebotgreat20:48
Zicall is Running now, and I'm checking --all-namespaces20:48
Zic(I forgot kube-system earlier :/)20:49
Zicdashboard is working effectively20:49
Zichmm, 2 Ingress controllers are back in CLBO atm20:49
mbruzekZic: Send us pastebin logs for those20:50
Zichttp://paste.ubuntu.com/23876940/20:50
ZicI have another log from the RC of Ingress:20:52
ZicLiveness probe failed: Get http://10.52.128.99:10254/healthz: dial tcp 10.52.128.99:10254: getsockopt: connection refused20:52
ryebotthanks, looking20:53
=== wolverin_ is now known as wolverineav
Zicso the only CLBO part now is that 2 Ingress:20:53
Zicdefault       nginx-ingress-controller-vg9qc            0/1       CrashLoopBackOff   17         13h20:53
Zicdefault       nginx-ingress-controller-w1dhl            0/1       CrashLoopBackOff   92         10d20:53
lazyPowerZic - which namespace is this in?20:54
Zicdefault, it's the builtin nginx-ingress of the canonical-kubernetes bundle20:54
lazyPowerack, ok20:54
Zic(I don't add any Ingress myself)20:54
Zicingress controller* to use the right terminology20:54
lazyPowercan you kubectl describe po  nginx-ingress-controller-vg9qc | pastebinit20:55
Zic(Actually, I add Ingress, but not Ingress controller)20:55
Zichttp://paste.ubuntu.com/23876964/20:56
ryebotthanks20:56
lazyPowerZic - thats totally fine20:57
lazyPowerhmm nothing is leaping out at me from the pod description.... it did say it was failing health checks20:57
lazyPowerfrom earlier pastes it looked like it was running out of file descriptors20:58
Zicin fact, my Ingress serves my test website just fine21:00
Zicbut these two controllers stay in CLBO :/21:00
Zicthe 6 others are Running fine21:00
cholcombeis it possible to tell juju to give my instance 2 ip addrs when it starts them?21:00
lazyPowercholcombe - you have to sacrifice 40tb of data and do the chant of "conjure-man-rah"21:01
ZiclazyPower: if I reboot (yeah, it's a bit brutal) the nodes which host these two CLBO pods, maybe they will come back as Running?21:02
cholcombeman i need to brush up on that chant haha21:02
lazyPowerin short, i dont know but i think extra-bindings and spaces is what would introduce that functionality21:02
ZiclazyPower: it's what I did for kubernetes-dashboard21:02
lazyPowers/know/so/21:02
lazyPowerno no, nevermind that last edit, it was right21:02
lazyPowerZic - its worth a shot.21:02
stormmorehmmm that is curious for an "idle" cluster, one of my nodes just dropped 4GB of memory usage and increased network tx for no obvious reason!21:03
ZiclazyPower: it seems to work, weird but I like it xD21:07
ZicI will wait few minutes before confirming21:07
lazyPowerZic - i'm going to blame gremlins for that one21:07
Zicdefault       nginx-ingress-controller-7qcsn            0/1       CrashLoopBackOff   10         13h       10.52.128.135   mth-k8sa-0121:10
Zicdefault       nginx-ingress-controller-lx6kt            0/1       CrashLoopBackOff   16         13h       10.52.128.253   mth-k8sa-0221:11
Zicit didn't work for long :)21:11
ryebotZic, another option is destroying them; the charm should launch new ones to replace them on the next update (<5 mins)21:11
Zicwith the same error: F0127 21:11:44.648841       1 main.go:121] no service with name default/default-http-backend found: the server has asked for the client to provide credentials (get services default-http-backend)21:12
Zicryebot: thanks, I will try21:12
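(In practice that looks something like the following, using the pod names from the paste above; the replacements should reappear on the charm's next update cycle:
kubectl delete pod nginx-ingress-controller-7qcsn nginx-ingress-controller-lx6kt
kubectl get pods -w    # watch for the new controller pods to come up and settle in Running
)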
Zicit pops again and they are Running21:13
Ziclet few minutes pass to confirm :)21:14
ryebot+121:14
ZicCLBO for one of them21:15
mbruzekZic: Please try one more thing21:15
mbruzekjuju config kubernetes-worker ingress=false21:15
mbruzek*wait* for the pods to terminate.21:15
Zicthe two newly brought up ingress controllers are now in CLBO again21:15
Zicmbruzek: ok21:16
mbruzekThen you should be able to juju config kubernetes-worker ingress=true21:16
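(The reset mbruzek is describing, as a sequence; the wait in the middle matters:
juju config kubernetes-worker ingress=false
watch kubectl get pods              # wait until the nginx-ingress-controller pods are fully gone
juju config kubernetes-worker ingress=true
)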
Zicmbruzek: (it reminds me that at the end, I need to turn off the debug I enabled earlier on the kubernetes-master?)21:17
Zicall pods are terminated21:18
Zicswitched back to true21:19
Zicall is Running, let's wait some minutes21:20
Zicthey are all in CLBO now xD21:20
Zicbut they switch back to Running again21:21
lazyPowerit will register as running when they initially come up, but have to pass the healthcheck21:21
lazyPowerZic - can you juju run --application kubernetes-worker "lsof | wc -l"   and pastebin the output of that juju run command?21:21
Zichttp://paste.ubuntu.com/23877086/21:23
ZicAll Ingress are flapping between CLBO and Running now21:24
lazyPoweri'm not positive, but k8s worker/2 and k8s-worker/1 seem to have a ton of open file descriptors21:24
lazyPowerlooks like something is leaking file descriptors :|21:24
lazyPowerwhich would explain the crash loop backoff on a segment of the ingress controllers vs the handfull that succeed21:25
lazyPowerZic - at this point we'll need a bug about this, and can look into it further, but we're not in a position to recommend a fix at this time.21:26
lazyPowerwe have encountered this before but the last patch that landed should have both a) enlarged the file descriptor pool, and b) hopefully corrected that. we might be behind by a patch release on the ingress manifest that fixed this21:26
lazyPoweri'll take a look in a bit, but i think we're 1:121:27
Zicin fact the Ingress is working even if the Ingress controllers are flapping21:27
lazyPowerdo you have any ingress controllers listed as running that are not in CLB?21:27
Zicstrange21:27
Zicyeah, since they flap between the two states at different times21:27
lazyPowerif thats the case, kubernetes will do its best to use a functional route through the ingress API21:27
lazyPoweras it round-robin distributes them. you'll likely find some requests that get dropped if you run something like apache-bench or bees-with-machine-guns against it21:28
lazyPowerbut typical single user testing likely looks fine21:28
ryebotZic, one last thing - can you pastebin /lib/systemd/system/kubelet.service on a kubernetes-worker node?21:28
Zichttp://paste.ubuntu.com/23877117/21:30
lazyPowerjuju run --application kubernetes-worker "sysctl fs.file-nr" | pastebinit -- as well would be helpful in ensuring it is indeed related to file descriptors. this will list the number of allocated file handles, the number of unused-but-allocated file handles, and the system-wide max number of file handles21:30
Zichttp://paste.ubuntu.com/23877122/21:31
lazyPowerwell those fd numbers went way down21:32
lazyPowerhowever there are very different configurations listed there21:32
lazyPowerwhere some have 400409 and some have 6573449 listed as max21:32
ryebotThey might be his baremetal/ec2 machines?21:32
lazyPowerah, you are correct21:32
lazyPowerdifferent substrates21:32
lazyPowerdifferent rules21:32
Zicyeah, only mth-k8s-* and mth-k8svitess-* are robust physical servers21:33
lazyPowerall the units appear to be well within bounds of those numbers though21:33
Zicmth-k8sa-* are EC2 instances21:33
lazyPowerso maybe its not FD Leakage21:33
Zichttp://paste.ubuntu.com/23877130/ <= and this error about credentials reminded me of some bad hours earlier today21:35
Zic(which was the same error returned by some kubectl commands and the dashboard earlier)21:35
Zic21:28 is @UTC21:36
Zicso few minutes ago21:36
mbruzekZic: Can you please file a bug about this problem on the kubernetes github issue tracker? https://github.com/kubernetes/kubernetes/issues21:38
mbruzekMaybe someone else knows why the ingress would be in CrashLoopBackOff.21:38
mbruzekPlease list if you added any ingress things and what manifest you used for that.21:39
Zichmm, about this, maybe I can delete my Ingress21:39
mbruzekPut Juju in the title and your best estimation on how to reproduce these errors.21:39
mbruzekZic: The ingress=false would have deleted them no?21:40
mbruzekDid you put in your own _different_ ingress objects?21:40
Zicif you talk about Ingress controller, no, I stayed with the default nginx-ingress-controller of the charms bundle21:41
Zicbut I create two Ingress yep21:41
lazyPoweringress objects are related to, and depend on the ingress controller, but have very little to do with ingress controller operations21:41
lazyPowerunless i'm misinformed21:41
Zicok, so I promise to file a bug tomorrow morning, it's late now and if I want to file a well-written/well-described bug I prefer to do it seriously :) I will post the issue link here tomorrow21:44
Zicthe steps-to-reproduce part will be the most difficult21:44
mbruzekZic yes21:44
mbruzekZic, I just don't know how to reproduce this21:44
mbruzekI realize it is late for you, sorry about the problems.21:44
Zicno worries, you were all wonderful to help me and focus on this problem for hours today, I can at least continue the debugging on IRC even if I'm out of the office!21:46
Zicthanks a lot mbruzek, ryebot, lazyPower, jcastro21:49
ryebotHappy to help, Zic!21:50
ZicI will pop again here tomorrow about the issue I will report21:50
ryebotthanks, feel free to me ping when you post it, I'd like to track it21:51
ryebotping me* :)21:51
Zichuh21:51
ZicI deleted and recreated my Ingress, and the controllers have now been Running without flapping for 6min21:52
Zicengineering is SO 2016, magic is the new way21:52
Zic:|21:52
ryebotlol21:52
ryebotZic, when you say you deleted your ingress, can you provide the command you executed?21:52
Zickubectl delete ing <my_two_ingress>21:53
Zic(one was exposing a nginx-deployment-test with a nodeSelector on machines labelled as our Paris datacenter (bare-metal servers))21:53
Zic(and another one on EC2 only)21:54
Zic8min without flapping, lol21:54
ryebotI wonder if there was a conflict with our automatic ingress scaling. Shouldn't be, but I should probably make sure.21:54
ZicI'll try to keep all the traces to file a bug tomorrow anyway21:55
ryebotHmm, you said you recreated them, too, so I guess that can't be it.21:55
ryebotI guess you're right... magic!21:55
Zicand add how I resolved it, if it doesn't flap again by tomorrow21:55
ryebot+1 sounds good, thanks Zic.21:55
Zicryebot: yeah, and they're really simple Ingresses, with only one rule on the hostname21:55
Ziclike baremetal.ndd.com for the first, ec2.ndd.com for the second21:56
Zic11min without flapping \o/21:56
ryebot\o/21:56
* Zic will buy some magic powder before sleeping21:57
ryebotheh21:57
Zichmm, I discovered some more information in kubectl describe ing22:00
Zicall the operation that the controller did22:00
ZicI didn't know where to find those22:00
ZicI have a lot of   42m42m1{nginx-ingress-controller }WarningUPDATEerror: Operation cannot be fulfilled on ingresses.extensions "nginx-aws": the object has been modified; please apply your changes to the latest version and try again22:01
Zicduring the CLBO period22:01
Zicand now it's working, juste MAPPING action22:01
Zicjust*22:01
Zic(no flapping for 38min :p I'm going to bed, g'night and thanks one more time to all of you :))22:18
lazyPowerZic : excellent news. have a good sleep and enjoy your weekend o/22:18
ZicI will ping back with the GitHub issue tomorrow o/22:18
stormmorewish there was a good recommendation guide for SDN hardware :-/22:24
stormmore(still think running some of the MAAS services on an SDN switch would rock!)22:24
=== scuttlemonkey is now known as scuttle|afk
stormmoresince it is Friday and I am not doing anything to affect my cluster but I want to learn more about juju, is it possible to run your own charm "store"?22:46
lazyPowerstormmore - not at this time22:49
stormmorelazyPower another "future work" thing then :)22:49
lazyPowerstormmore - you can keep a local charm repository which is just files on disk, but as far as running the actual store display + backend service(s), thats not available for an on-premise solution22:49
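(In other words, a "local repository" is just charm directories you deploy by path, e.g. the following, where the path is illustrative:
juju deploy ./charms/builds/my-charm
)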
bdxyo whats up with the charmstore ?23:37
bdxERROR can't upload resource23:37
bdxwill we ever fix this?23:37
bdxkillin' me here23:37
