/srv/irclogs.ubuntu.com/2018/09/03/#juju.txt

[00:43] <veebers> anastasiamac: can do :-)
[00:48] <veebers> anastasiamac: LGTM
[00:50] <veebers> wallyworld: I'm seeing "agent lost, see 'juju show-status-log mariadb/0'". You mentioned this last week but I can't recall the context. Is this something new? I wasn't seeing it earlier (before I rebased develop onto my branch)
[00:51] <wallyworld> veebers: the new presence implementation breaks k8s status. it's something i need to fix
[00:51] <veebers> wallyworld: ack, ok that makes sense that I'm seeing it then :-) As long as I know the reason I'm comfortable that I'm sane (ish)
[00:52] <veebers> wallyworld: in other news: https://pastebin.canonical.com/p/tWCjdgMCH7/ (re: units being terminated)
[00:53] <wallyworld> veebers: good, so that means we don't need to explicitly set the status in the api facade
[00:53] <veebers> indeed
[00:54] <hloeung> is it safe to upgrade juju from 2.3.8 to 2.4.3?
[01:43] <anastasiamac> wow, so provisioner unit tests now started failing as frequently as 1 out of 4 times... working thru these
[01:53] <anastasiamac> hloeung: fwiw, yes it should be safe. if u know otherwise, please let us know :)
[01:53] <hloeung> ok, will let you know when I get to upgrading our CI environment. Thanks
[01:54] <thumper> hloeung: there are no known issues upgrading from 2.3.8 to 2.4.3
[01:56] <hloeung> ack, thanks
[02:20] <thumper> babbageclunk: is your recent fix for the intermittent timeout in the raft worker test?
[02:26] <veebers> wallyworld: I'm confused, I'm trying to deploy cs:~wallyworld/caas-mariadb (without doing the 'juju trust aws-integrator' step, to create an error). With "kubectl -n message log -f juju-operator-mariadb-744bb855-vtvbd" I see no complaints; with juju debug-log -m controller I see "ERROR juju.worker.dependency "caas-unit-provisioner" manifold worker returned unexpected error: resource name may not be empty" every 4 seconds. I must be missing something obvious
[02:28] <wallyworld> veebers: did you use the --storage deploy arg?
[02:28] <wallyworld> the way it is erroring is a bug also
[02:28] <veebers> wallyworld: aye, "ERROR juju.worker.dependency "caas-unit-provisioner" manifold worker returned unexpected error: resource name may not be empty"
[02:28] <veebers> sorry, juju deploy cs:~wallyworld/mariadb-k8s --storage database=10M,k8s-ebs
[02:29] <wallyworld> did you create the storage pool?
[02:29] <wallyworld> juju create-storage-pool k8s-ebs kubernetes storage-class=juju-ebs storage-provisioner=kubernetes.io/aws-ebs parameters.type=gp2
[02:29] <veebers> yep, as per the discourse post
[02:30] <wallyworld> maybe there's a bug if operator storage is missing
[02:30] <wallyworld> juju create-storage-pool operator-storage kubernetes storage-class=juju-operator-storage storage-provisioner=kubernetes.io/aws-ebs parameters.type=gp2
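Collected from the exchange above, the full sequence veebers was running. The pool names, storage classes, and the 10M size come straight from the log; exact flag behaviour may vary across Juju 2.x releases, so treat this as a sketch rather than a reference:

```shell
# Storage pools for the k8s model: one for the operator, one for the charm's data.
juju create-storage-pool operator-storage kubernetes \
    storage-class=juju-operator-storage \
    storage-provisioner=kubernetes.io/aws-ebs parameters.type=gp2
juju create-storage-pool k8s-ebs kubernetes \
    storage-class=juju-ebs \
    storage-provisioner=kubernetes.io/aws-ebs parameters.type=gp2

# Deploy against the charm storage pool (command as pasted in the log).
juju deploy cs:~wallyworld/mariadb-k8s --storage database=10M,k8s-ebs
```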
[02:31] <veebers> wallyworld: I'll run that now?
[02:31] <wallyworld> yeah
[02:32] * veebers makes it so
[02:32] <wallyworld> you may need to deploy a new app though
[02:33] <veebers> ah, ack
[02:33] <wallyworld> but i expect it should work
[02:33] <wallyworld> it should bounce the worker and read the new storage pool info
[02:34] <veebers> I'm trying a new deploy, was still sing the manifold errors in logs every 4sec
[02:36] * anastasiamac imagines veebers singing errors
[02:36] <anastasiamac> like a snake charmer
[02:36] <veebers> ugh, I'm seeing "Warning  FailedScheduling  6s (x8 over 1m)  default-scheduler  pod has unbound PersistentVolumeClaims" for the new deploy, but that's not reflected in juju status. it's not getting pushed through the cloud container status properly it seems
[02:36] <veebers> anastasiamac: hah ^_^
[02:47] <thumper> anastasiamac: I'm looking at another intermittent test failure
[02:47] <thumper> and I think I have worked out the race in that...
[02:47] <thumper> it may be similar to yours...
[02:48] <thumper> I'm thinking through a solution
[02:48] <anastasiamac> thumper: which test and what race?
[02:48] * thumper dabbles
[02:48] <anastasiamac> thumper: mine seems to be all in provisioner_task
[02:48] <thumper> http://10.125.0.203:8080/view/Unit%20tests/job/RunUnittests-s390x/lastCompletedBuild/testReport/github/com_juju_juju_apiserver_facades_client_client/TestPackage/
[02:48] <thumper> FAIL: client_test.go:762: clientSuite.TestClientWatchAllAdminPermission
[02:48] <thumper> the fundamental problem is the test goes:
[02:48] <thumper> do something
[02:48] <thumper> do something
[02:48] <thumper> start watcher
[02:48] <thumper> expect X changes
[02:49] <thumper> there is the expectation that the second do something had been processed before the watcher started
[02:49] <thumper> so there is a race there
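The race thumper describes can be sketched outside Juju with any asynchronous write: if the "watcher" snapshot is taken before the second change has been processed, it may see only the first. A minimal shell illustration (file writes stand in for state changes; this is an analogy, not the actual Go test):

```shell
# Two "do something" steps, the second asynchronous, then a snapshot
# taken without waiting for the async step to complete.
tmp=$(mktemp -d)
echo change1 >> "$tmp/state"                   # first change: synchronous
( sleep 0.2; echo change2 >> "$tmp/state" ) &  # second change: async, like a worker
cp "$tmp/state" "$tmp/snapshot"                # "watcher" starts; may miss change2
wait                                           # the missing synchronization step
wc -l < "$tmp/state"                           # prints 2 once both changes landed
```

Moving the `wait` before the `cp` is the analogue of syncing with the underlying watcher before asserting, rather than narrowing the window with a sleep.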
[02:49] <anastasiamac> thumper: oh k... i hope it's similar to mine...
[02:50] <thumper> there is another bug... and a kinda big one...
[02:50] <anastasiamac> thumper: I'm working at the moment on "start controller machine, start another machine, remove 1 machine... oooh... both machines removed"...
[02:50] <thumper> related to CMR
[02:50] <anastasiamac> so maybe our failures are similar but maybe not...
[02:50] <anastasiamac> ouch, cmr bugs are scary :)
[02:50] <thumper> that I'm not sure whether it has real world impact or not
[02:50] * anastasiamac looks in direction of FTP and wallyworld :D
[02:51] <wallyworld> huh?
[02:51] <anastasiamac> wallyworld: nothing
[02:51] <anastasiamac> thumper and i are having fun with intermittent watcher test failures :D ignore me
[02:51] <thumper> wallyworld: the multiwatcher interaction with CMR is questionable
[02:53] <wallyworld> multiwatcher does report on remote apps
[02:56] <thumper> yes, but it seems more by good luck
[02:58] <thumper> wallyworld: did you want to chat about caas presence at some stage?
[02:58] <wallyworld> nah, fixed it
[02:58] <wallyworld> just testing
[03:11] <veebers> sigh, I almost tried to put my glass of water in my pocket so I could carry my muesli bar back to the office :-|
[03:12] <anastasiamac> veebers: it does get better... at least it was not a hot drink like coffee or tea
[03:12] <veebers> hah ^_^ true that
[03:13] <wallyworld> thumper: fyi, the presence fix https://github.com/juju/juju/pull/9150
[03:19] <anastasiamac> thumper: https://github.com/juju/juju/pull/9151 (one test fix) ... I'm chasing the 2nd one
[03:20] <anastasiamac> turns out we are just way too efficient now sometimes
[03:26] * thumper looks at both
[03:27] <thumper> I'm testing a fix for mine too
[03:28] <thumper> anastasiamac: I think my fix would be more appropriate
[03:28] <thumper> anastasiamac: I think yours is adjusting the timing by side-effect
[03:28] <thumper> the start sync method doesn't do any syncing with the underlying txn watcher
[03:28] * thumper sighs
[03:28] <thumper> mine just failed too
[03:29] <thumper> FFS
[03:29] <thumper> I made the race much smaller... but it is still there
[03:29] * thumper thinks some more
[03:30] <thumper> testing async code is hard...
[03:30] <anastasiamac> thumper: k, I'm chasing the 2nd failure... I'm sure that the 1st failure is not with the code but with the test setup..
[03:30] <anastasiamac> thumper: hence, the sync felt appropriate
[03:30] <thumper> the StartSync doesn't do anything for the JujuConnSuite
[03:31] <thumper> except poking the presence worker
[03:31] * thumper thinks
[03:31] <thumper> and something else
[03:31] * thumper goes to look at the something else
[03:31] <anastasiamac> thumper: k
[03:32] <thumper> pingBatcher
[03:32] <anastasiamac> thumper: what about it?
[03:32] <thumper> that is the other thing StartSync pokes
[03:32] <thumper> presenceWatcher and pingBatcher
[03:32] <thumper> nothing to do with the normal watchers
[03:33] <anastasiamac> thumper: right. so the first failure was because we were creating a machine, setting harvest mode, and removing it in the hope that harvest mode would be respected... occasionally, and now more often, harvest mode was not set when we came to remove... hence we failed...
[03:33] * thumper nods
[03:33] <anastasiamac> thumper: as soon as sync was added before removal, the failure disappeared
[03:34] <thumper> but that was just due to a change in timing
[03:34] <thumper> if you added a 10ms sleep it would probably do the same
[03:34] <thumper> we work really hard to have workers work asynchronously
[03:34] <thumper> then want control in tests
[03:35] <anastasiamac> thumper: k... can we ho?
[03:35] <thumper> sure
[04:10] <veebers> wallyworld, kelvinliu__: any idea what might cause the error "pod has unbound PersistentVolumeClaims"?
[04:11] <wallyworld> if the underlying volume cannot be created
[04:13] <veebers> wallyworld: ok, so I did create-storage-pool; is it likely something aws related? Perhaps previously storage bits weren't cleaned up?
[04:13] <wallyworld> new volumes are created on demand
[04:13] <wallyworld> did you deploy the aws-integrator?
[04:13] <wallyworld> and used juju trust?
[04:14] <wallyworld> kubectl get all,pv,pvc
[04:14] <wallyworld> will show status of volumes and claims
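A hedged sketch of the inspection wallyworld suggests. The `mariadb` namespace here is an assumption for illustration (a CAAS model's namespace normally matches the model name); the commands themselves are standard kubectl:

```shell
# Overall view of workloads, persistent volumes, and claims.
kubectl -n mariadb get all,pv,pvc

# Events on a Pending claim explain why it is unbound
# (e.g. missing storage class, or the provisioner lacks credentials).
kubectl -n mariadb describe pvc

# Scheduler events, e.g. "pod has unbound PersistentVolumeClaims".
kubectl -n mariadb describe pod
```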
[04:14] <kelvinliu__> veebers, is the aws-integrator/0 in active status?
[04:14] <veebers> wallyworld: ah hah right, no I didn't do the juju trust part. So this is the failure I was expecting to see, right?
[04:15] <wallyworld> yup
[04:15] <wallyworld> that's what should be surfaced in juju status
[04:15] <veebers> wallyworld: ok cool. I'm not seeing it surfaced, need to debug why
[04:26] <wallyworld> veebers: fyi, "lost" status fix just landed
[04:28] <veebers> wallyworld: yay, thanks!
[04:41] <veebers> gah, why is "Running machine config. script" taking so damn long in aws/ap-southeast-1. It would be quicker to deploy locally in lxd :-|
[04:55] <veebers> wallyworld (sorry to pester) Am I reading this right in that the storage error is stopping the operator pod from being deployed, and thus updateStateUnits et al. won't be in operation? (https://pastebin.canonical.com/p/rQrDsx7KRM/)
[04:57] <wallyworld> yes, that's right. but you should be able to deploy the operator without any storage unless there's a bug
[04:58] <wallyworld> if there is a bug and the operator does need storage, you could always create the mariadb storage pool with a dud provisioner
[04:58] <wallyworld> that should induce an error in deploying the mariadb unit
[05:02] <veebers> wallyworld: hmm, so I had created both operator-storage and k8s-ebs before deploying mariadb (still without having run juju trust for the k8s cluster)
[05:02] <wallyworld> when you run juju trust the storage will come good and things will provision
[05:03] <wallyworld> so you can create a new, different storage pool with a dud provisioner
[05:03] <veebers> wallyworld: right, but the intention is to be able to surface the fact that the storage is borked, right?
[05:03] <wallyworld> and deploy a new mariadb with an alias using that dud pool
[05:04] <veebers> ah shoot, I also (somehow) misspelled the image path (caas-operator-image-path=veebers/caas-operator...) :-\
[05:04] <wallyworld> that would explain things a bit
[05:04] <wallyworld> you shouldn't need to create a storage pool for the operator
[05:04] <wallyworld> hence you can leave off the trust step
[05:04] <wallyworld> and the operator will deploy
[05:04] <veebers> but, it's not trying to install that as far as I can tell. At any rate I'll fix that and re-deploy
[05:04] <wallyworld> and the app itself will fail
[05:05] <veebers> ack
[05:07] <veebers> argh, it's still complaining about storage with the proper image url
[05:08] <veebers> wallyworld: does that suggest a bug where juju is putting storage constraints on the operator pod that shouldn't be there?
[05:09] <wallyworld> i'd have to see the error. but you can deploy the operator with storage and poison the app storage pool to get by
[05:11] <veebers> wallyworld: deploy op with storage as in run 'juju trust aws-integrator'?
[05:11] <wallyworld> yeah
[05:11] <veebers> ack ok cheers
[05:11] <wallyworld> just set up the mariadb storage pool with a typo in the provisioner attribute
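A sketch of the "dud provisioner" trick described above. The pool name `dud-pool` and the deliberately wrong provisioner string are invented for illustration: with a provisioner that does not exist, the claim can never bind, so the unit's storage error should surface in `juju status`:

```shell
# A pool whose provisioner name is intentionally misspelled,
# so any claim against it stays Pending forever.
juju create-storage-pool dud-pool kubernetes \
    storage-class=juju-dud \
    storage-provisioner=kubernetes.io/aws-ebs-typo parameters.type=gp2

# Deploy under an alias using the broken pool to induce the storage error.
juju deploy cs:~wallyworld/mariadb-k8s mariadb-dud \
    --storage database=10M,dud-pool
```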
[05:12] <veebers> ah right, ack will do
[05:12] <veebers> I've just enabled trust, waiting for the scheduling to succeed
[08:22] <stickupkid> jam: I just want to amend some stuff in here before we merge https://github.com/juju/juju/pull/9148
[09:09] <jam> stickupkid: sorry about that, I had 2 PRs up and accidentally submitted the wrong one
[09:10] <stickupkid> jam: haha, you can merge away now :p
[09:35] <manadart> Need a review: https://github.com/juju/juju/pull/9153
[09:35] <manadart> Small change, with easy QA.
[09:46] <stickupkid> manadart: looking
[09:48] <manadart> stickupkid: Ta.
[09:48] <stickupkid> manadart: done
[09:49] <manadart> Cheers.
[13:24] <manadart> stickupkid: As discussed - https://github.com/juju/juju/pull/9155
[13:28] <stickupkid> manadart: nice, will have a look now
[22:28] <wallyworld> babbageclunk: have you tried bootstrapping lately?
[22:29] <babbageclunk> wallyworld: not today
[22:29] <babbageclunk> wallyworld: y?
[22:29] <wallyworld> since late yesterday it's hung for me
[22:30] <wallyworld> just wondering if it's just me
[22:31] <veebers> wallyworld: in aws I see "Running machine config. script" take *ages*
[22:31] <wallyworld> for me on aws or lxd it just hangs at that point
[22:37] <babbageclunk> wallyworld: ok, having a go myself after pushing this change.
[22:38] <wallyworld> ok, let's see how it goes
[22:45] <veebers> wallyworld, babbageclunk: I got a successful bootstrap, took almost 40 minutes
[22:45] <babbageclunk> crazy
[22:46] <wallyworld> there's got to be something that's changed. it could be a slow apt-get of image updates or mongo or something
[22:46] <veebers> maybe cloud-init taking a while when it's apt installing?
[22:46] <veebers> heh
[22:46] <thumper> wallyworld: I've worked out this bug, but would like to talk if you have a chance
[22:46] <wallyworld> sure, give me 5
[22:46] <wallyworld> otp
[22:49] <thumper> ack
[23:01] <thumper> wallyworld: actually, never mind
[23:01] <wallyworld> thumper: sorry, still in 1:1
[23:13] <babbageclunk> wallyworld: bootstrap was about normal speed for me
[23:13] <wallyworld> damn ok
[23:14] <veebers> babbageclunk: where were you bootstrapping to?
[23:15] <veebers> into? at? into probably
[23:15] <babbageclunk> veebers: localhost.
[23:15] <babbageclunk> I'll try aws
[23:15] <babbageclunk> ooh, meeting
[23:25] <thumper> wallyworld: this one is for you https://github.com/juju/juju/pull/9156
[23:25] <wallyworld> ok, will look after standup
[23:27] <veebers> babbageclunk: which region did you bootstrap aws in? I'm using ap-southeast-1
[23:28] <babbageclunk> I used ap-southeast-2
[23:35] <wallyworld> thumper: lgtm, nice pickup
[23:37] <thumper> wallyworld: took me a while because I had the assumption that the initial state was wrong and we weren't waiting
[23:37] <wallyworld> seems obvious now
[23:37] <thumper> but it was: the initial state was right, and a subsequent update fubared it
[23:37] <wallyworld> always is, after the fact
[23:37] <thumper> sure is

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!