veebers | anastasiamac: can do :-) | 00:43 |
---|---|---|
veebers | anastasiamac: LGTM | 00:48 |
veebers | wallyworld: I'm seeing "agent lost, see 'juju show-status-log mariadb/0'" you mentioned this last week but I can't recall the context. Is this something that's new, I wasn't seeing it earlier (before I rebased develop onto my branch) | 00:50 |
wallyworld | veebers: the new presence implementation breaks k8s status. it's something i need to fix | 00:51 |
veebers | wallyworld: ack, ok that makes sense that I'm seeing it then :-) As long as I know the reason I'm comfortable that I'm sane (ish) | 00:51 |
veebers | wallyworld: in other news: https://pastebin.canonical.com/p/tWCjdgMCH7/ (re: units being terminated) | 00:52 |
wallyworld | veebers: good, so that means we don't need to explicitly set the status in the api facade | 00:53 |
veebers | indeed | 00:53 |
hloeung | is it safe to upgrade juju from 2.3.8 to 2.4.3? | 00:54 |
anastasiamac | wow, so provisioner unit tests now started failing as frequently as 1 out 4 times... working thru these | 01:43 |
anastasiamac | hloeung: fwiw, yes it should be safe. if u know otherwise, please let us know :) | 01:53 |
hloeung | ok, will let you know when I get to upgrading our CI environment. Thanks | 01:53 |
thumper | hloeung: there are no known issues upgrading from 2.3.8 to 2.4.3 | 01:54 |
hloeung | ack, thanks | 01:56 |
thumper | babbageclunk: is your recent fix for the sometimes timeout with the raft worker test? | 02:20 |
veebers | wallyworld: I'm confused, I'm trying to deploy cs:~wallyworld/caas-mariadb (without doing the 'juju trust aws-integrator' step to create an error). With "kubectl -n message log -f juju-operator-mariadb-744bb855-vtvbd" I see no complaints; with juju debug-log -m controller I see "ERROR juju.worker.dependency "caas-unit-provisioner" manifold worker returned unexpected error: resource name may not be empty" every | 02:26 |
veebers | 4 seconds. I must be missing something obvious | 02:27 |
wallyworld | veebers: did you use the --storage deploy arg? | 02:28 |
wallyworld | the way it is erroring is a bug also | 02:28 |
veebers | wallyworld: aye, "ERROR juju.worker.dependency "caas-unit-provisioner" manifold worker returned unexpected error: resource name may not be empty" | 02:28 |
veebers | sorry, juju deploy cs:~wallyworld/mariadb-k8s --storage database=10M,k8s-ebs | 02:28 |
wallyworld | did you create the storage pool? | 02:29 |
wallyworld | juju create-storage-pool k8s-ebs kubernetes storage-class=juju-ebs storage-provisioner=kubernetes.io/aws-ebs parameters.type=gp2 | 02:29 |
veebers | yep, as per discourse post | 02:29 |
wallyworld | maybe there's a bug if operator storage is missing | 02:30 |
wallyworld | juju create-storage-pool operator-storage kubernetes storage-class=juju-operator-storage storage-provisioner=kubernetes.io/aws-ebs parameters.type=gp2 | 02:30 |
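The storage setup discussed in the exchange above can be collected into one sketch. This is illustrative only, based on the commands quoted in the log: the pool names, storage classes, the `gp2` parameter, and the deploy line are as given there, for a Kubernetes model backed by AWS EBS.

```shell
# Sketch of the storage-pool setup quoted above (AWS-EBS-backed k8s model).
# Names and parameters are taken from the log; adjust for your cloud.

# Pool backing the operator's storage:
juju create-storage-pool operator-storage kubernetes \
    storage-class=juju-operator-storage \
    storage-provisioner=kubernetes.io/aws-ebs \
    parameters.type=gp2

# Pool backing the workload (mariadb) storage:
juju create-storage-pool k8s-ebs kubernetes \
    storage-class=juju-ebs \
    storage-provisioner=kubernetes.io/aws-ebs \
    parameters.type=gp2

# Then deploy, referencing the workload pool (charm and size from the log):
juju deploy cs:~wallyworld/mariadb-k8s --storage database=10M,k8s-ebs
```

As the log notes, volumes are created on demand, so provisioning only completes once `juju trust aws-integrator` has granted the cluster credentials.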
veebers | wallyworld: I'll run that now? | 02:31 |
wallyworld | yeah | 02:31 |
* veebers makes it so | 02:32 | |
wallyworld | you may need to deploy a new app though | 02:32 |
veebers | ah, ack | 02:33 |
wallyworld | but i expect it should work | 02:33 |
wallyworld | it should bounce the worker and read the new storage pool info | 02:33 |
veebers | I'm trying a new deploy, was still sing the manifold errors in logs every 4sec | 02:34 |
* anastasiamac imagines veebers singing errors | 02:36 | |
anastasiamac | like a snake charmer | 02:36 |
veebers | ugh, I'm seeing "Warning FailedScheduling 6s (x8 over 1m) default-scheduler pod has unbound PersistentVolumeClaims" for the new deploy, but that's not reflected in juju status. it's not getting pushed through the cloud container status properly it seems | 02:36 |
veebers | anastasiamac: hah ^_^ | 02:36 |
thumper | anastasiamac: I'm looking at another intermittent test failure | 02:47 |
thumper | and I think I have worked out the race in that... | 02:47 |
thumper | it may be similar to yours... | 02:47 |
thumper | I'm thinking through a solution | 02:48 |
anastasiamac | thumper: which test and what race? | 02:48 |
* thumper dabbles | 02:48 | |
anastasiamac | thumper: mine seems to be all in provisioner_task | 02:48 |
thumper | http://10.125.0.203:8080/view/Unit%20tests/job/RunUnittests-s390x/lastCompletedBuild/testReport/github/com_juju_juju_apiserver_facades_client_client/TestPackage/ | 02:48 |
thumper | FAIL: client_test.go:762: clientSuite.TestClientWatchAllAdminPermission | 02:48 |
thumper | the fundamental problem is the test goes: | 02:48 |
thumper | do something | 02:48 |
thumper | do something | 02:48 |
thumper | start watcher | 02:48 |
thumper | expect X changes | 02:48 |
thumper | there is the expectation that the second do something had been processed before the watcher started | 02:49 |
thumper | so there is a race there | 02:49 |
anastasiamac | thumper: oh k... i hope it's similar to mine... | 02:49 |
thumper | there is another bug... and a kinda big one... | 02:50 |
anastasiamac | thumper: m working at the moment on "start controller machine, start another machine, remove 1 machine... oooh... both machines removed"... | 02:50 |
thumper | related to CMR | 02:50 |
anastasiamac | so mayb our failures are similar but mayb not... | 02:50 |
anastasiamac | ouch, cmr bugs are scary :) | 02:50 |
thumper | that I'm not sure whether it has real world impact or not | 02:50 |
* anastasiamac looks in direction of FTP and wallyworld :D | 02:50 | |
wallyworld | huh? | 02:51 |
anastasiamac | wallyworld: nothing | 02:51 |
anastasiamac | thumper and i are having fun with watcher intermittent test failures :D ignore me | 02:51 |
thumper | wallyworld: the multiwatcher interaction with CMR is questionable | 02:51 |
wallyworld | multiwatcher does report on remote apps | 02:53 |
thumper | yes, but it seems more by good luck | 02:56 |
thumper | wallyworld: did you want to chat about caas presence at some stage? | 02:58 |
wallyworld | nah, fixed it | 02:58 |
wallyworld | just testing | 02:58 |
veebers | sigh, I almost tried to put my glass of water in my pocket so I could carry my muesli bar back to the office :-| | 03:11 |
anastasiamac | veebers: it does get better... at least it was not a hot drink like coffee or tea | 03:12 |
veebers | hah ^_^ true that | 03:12 |
wallyworld | thumper: fyi, the presence fix https://github.com/juju/juju/pull/9150 | 03:13 |
anastasiamac | thumper: https://github.com/juju/juju/pull/9151 (one test fix) ... m chasing the 2nd one | 03:19 |
anastasiamac | turns out we r just way too efficient now sometimes | 03:20 |
* thumper looks at both | 03:26 | |
thumper | I'm testing a fix for mine too | 03:27 |
thumper | anastasiamac: I think my fix would be more appropriate | 03:28 |
thumper | anastasiamac: I think yours is adjusting the timing by side-effect | 03:28 |
thumper | the start sync method doesn't do any syncing with the underlying txn watcher | 03:28 |
* thumper sighs | 03:28 | |
thumper | mine just failed too | 03:28 |
thumper | FFS | 03:29 |
thumper | I made the race much smaller... but it is still there | 03:29 |
* thumper thinks some more | 03:29 | |
thumper | testing async code is hard... | 03:30 |
anastasiamac | thumper: k, m chasing the 2nd failure... m sure that the 1st failure is not with code but with test setup... | 03:30 |
anastasiamac | thumper: hence, the sync felt appropriate | 03:30 |
thumper | the StartSync doesn't do anything for the JujuConnSuite | 03:30 |
thumper | except poking the presence worker | 03:31 |
* thumper thinks | 03:31 | |
thumper | and something else | 03:31 |
* thumper goes to look at the something else | 03:31 | |
anastasiamac | thumper: k | 03:31 |
thumper | pingBatcher | 03:32 |
anastasiamac | thumper: what about it? | 03:32 |
thumper | that is the other thing StartSync pokes | 03:32 |
thumper | presenceWatcher and pingBatcher | 03:32 |
thumper | nothing to do with the normal watchers | 03:32 |
anastasiamac | thumper: right. so the first failure was because we were creating a machine, setting harvest mode and removing in hopes that harvest mode will b respected... occasionally, and now more often, harvest mode was not set when we came to remove... hence we failed... | 03:33 |
* thumper nods | 03:33 | |
anastasiamac | thumper: as soon as sync was added before removal, the failure disappeared | 03:33 |
thumper | but that was just due to a change in timing | 03:34 |
thumper | if you added sleep 10ms it would probably do the same | 03:34 |
thumper | we work really hard to have workers work asynchronously | 03:34 |
thumper | then want control in tests | 03:34 |
anastasiamac | thumper: k... can we ho? | 03:35 |
thumper | sure | 03:35 |
veebers | wallyworld, kelvinliu__ : any idea what might cause the error; pod has unbound PersistentVolumeClaims? | 04:10 |
wallyworld | if the underlying volume cannot be created | 04:11 |
veebers | wallyworld: ok, so I did create-storage-pool, is it likely something aws related? Perhaps previously storage bits weren't cleaned up? | 04:13 |
wallyworld | new volumes are created on demand | 04:13 |
wallyworld | did you deploy the aws-integrator? | 04:13 |
wallyworld | and used juju trust? | 04:13 |
wallyworld | kubectl get all,pv,pvc | 04:14 |
wallyworld | will show status of volumes and claims | 04:14 |
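The checks wallyworld suggests can be sketched as a short diagnostic sequence. The namespace and pod names here are placeholders (a Juju k8s model gets a namespace named after the model), not values from the log:

```shell
# Diagnostic sketch for the "pod has unbound PersistentVolumeClaims"
# warning. "mariadb" as namespace and "mariadb-0" as pod name are
# illustrative placeholders; substitute your model's namespace.

# Status of pods, persistent volumes, and claims, as suggested above:
kubectl -n mariadb get all,pv,pvc

# Events for a pending pod usually name the unbound claim directly:
kubectl -n mariadb describe pod mariadb-0
```

If the claims stay `Pending`, the usual cause in this exchange is that the cluster cannot create the underlying volume, e.g. because `juju trust aws-integrator` has not been run.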
kelvinliu__ | veebers, Is the aws-integrator/0 in active status? | 04:14 |
veebers | wallyworld: ah hah right, no I didn't do the juju trust part. So this is the failure I was expecting to see right? | 04:14 |
wallyworld | yup | 04:15 |
wallyworld | that's what should be surfaced in juju status | 04:15 |
veebers | wallyworld: ok cool. I'm not seeing it surfaced, need to debug why | 04:15 |
wallyworld | veebers: fyi, "lost" status fix just landed | 04:26 |
veebers | wallyworld: yay, thanks! | 04:28 |
veebers | gah, why is "Running machine config. script" taking so damn long in aws/ap-southeast-1. It would be quicker to deploy locally in lxd :-| | 04:41 |
veebers | wallyworld (sorry to pester) Am I reading this right in that the storage error is stopping the operator pod from being deployed and thus the updateStateUnits et al. won't be in operation? (https://pastebin.canonical.com/p/rQrDsx7KRM/) | 04:55 |
wallyworld | yes, that's right. but you should be able to deploy the operator without any storage unless there's a bug | 04:57 |
wallyworld | if there is a bug and the operator does need storage, you could always create the mariadb storage pool with a dud provisioner | 04:58 |
wallyworld | that should induce an error in deploying the mariadb unit | 04:58 |
veebers | wallyworld: hmm, so I had created both operator-storage and k8s-ebs before deploying mariadb (still without having run juju trust for the k8s cluster) | 05:02 |
wallyworld | when you run juju trust the storage will come good and thing will provision | 05:02 |
wallyworld | so you can create a new different storage pool with a dud provisioner | 05:03 |
veebers | wallyworld: right, but the intention is to be able to surface the fact that the storage is borked right? | 05:03 |
wallyworld | and deploy a new mariadb with an alias using that dud pool | 05:03 |
veebers | ah shoot I also (somehow) misspelled the image path (caas-operator-image-path=veebers/caas-operator...) :-\ | 05:04 |
wallyworld | that would explain things a bit | 05:04 |
wallyworld | you shouldn't need to create a storage pool for the operator | 05:04 |
wallyworld | hence you can leave off the trust step | 05:04 |
wallyworld | and the operator will deploy | 05:04 |
veebers | but, it's not trying to install that as far as I can tell. At any rate I'll fix that and re-deploy | 05:04 |
wallyworld | and the app itself will fail | 05:04 |
veebers | ack | 05:05 |
veebers | argh, it's still complaining about storage with the proper image url | 05:07 |
veebers | wallyworld: does that suggest a bug where juju is putting storage constraints on the operator pod that shouldn't be there? | 05:08 |
wallyworld | i'd have to see the error. but you can deploy the operator with storage and poison the app storage pool to get by | 05:09 |
veebers | wallyworld: deploy op with storage as in run 'juju trust aws-integrator'? | 05:11 |
wallyworld | yeah | 05:11 |
veebers | ack ok cheers | 05:11 |
wallyworld | just set up the mariadb storage pool with a typo in the provisioner attribute | 05:11 |
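The "dud pool" trick wallyworld describes might look like the following. The pool name, the alias, and the deliberately broken provisioner value are all made up for illustration; only the general shape follows the commands quoted earlier in the log:

```shell
# Sketch of deliberately poisoning a storage pool so unit storage
# provisioning fails and the failure surfaces in juju status.
# "bad-ebs", "mariadb-dud", and the misspelled provisioner are
# illustrative, not from the log.
juju create-storage-pool bad-ebs kubernetes \
    storage-class=juju-ebs \
    storage-provisioner=kubernetes.io/aws-ebs-TYPO \
    parameters.type=gp2

# Deploy under an alias using the dud pool, per the suggestion above:
juju deploy cs:~wallyworld/mariadb-k8s mariadb-dud --storage database=10M,bad-ebs
```

The point of the exercise is that the unit's storage can never provision, which should induce exactly the kind of error veebers is trying to see reflected in status.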
veebers | ah right, ack will do | 05:12 |
veebers | I've just enabled trust, waiting for the scheduling to succeed | 05:12 |
stickupkid | jam: I just want to amend some stuff in here, before we merge https://github.com/juju/juju/pull/9148# | 08:22 |
jam | stickupkid: sorry about that, I had 2 PR up, and accidentally submitted the wrong one | 09:09 |
stickupkid | jam: haha, you can merge away now :p | 09:10 |
manadart | Need a review: https://github.com/juju/juju/pull/9153 | 09:35 |
manadart | Small change, with easy Q/A. | 09:35 |
stickupkid | manadart: looking | 09:46 |
manadart | stickupkid: Ta. | 09:48 |
stickupkid | manadart: done | 09:48 |
manadart | Cheers. | 09:49 |
manadart | stickupkid: As discussed - https://github.com/juju/juju/pull/9155 | 13:24 |
stickupkid | manadart: nice, will have a look now | 13:28 |
wallyworld | babbageclunk: have you tried bootstrapping lately? | 22:28 |
babbageclunk | wallyworld: not today | 22:29 |
babbageclunk | wallyworld: y? | 22:29 |
wallyworld | since late yesterday it's hung for me | 22:29 |
wallyworld | just wondering if it's just me | 22:30 |
veebers | wallyworld: in aws I see "Running machine config. script" take *ages* | 22:31 |
wallyworld | for me on aws or lxd it just hangs at that point | 22:31 |
babbageclunk | wallyworld: ok, having a go myself after pushing this change. | 22:37 |
wallyworld | ok, let's see how it goes | 22:38 |
veebers | wallyworld, babbageclunk: I got a successful bootstrap, took almost 40 minutes | 22:45 |
babbageclunk | crazy | 22:45 |
wallyworld | there's got to be something that's changed. it could be slow apt get of image updates or mongo or something | 22:46 |
veebers | maybe cloud-init taking a while when it's apt installing? | 22:46 |
veebers | heh | 22:46 |
thumper | wallyworld: I've worked out this bug, but would like to talk to you if you have a chance | 22:46 |
wallyworld | sure, give me 5 | 22:46 |
wallyworld | otp | 22:46 |
thumper | ack | 22:49 |
thumper | wallyworld: actually, never mind | 23:01 |
wallyworld | thumper: sorry, still in 1:1 | 23:01 |
babbageclunk | wallyworld: bootstrap was about normal speed for me | 23:13 |
wallyworld | damn ok | 23:13 |
veebers | babbageclunk: where were you bootstrapping to? | 23:14 |
veebers | into? at? into probably | 23:15 |
babbageclunk | veebers: localhost. | 23:15 |
babbageclunk | I'll try aws | 23:15 |
babbageclunk | ooh, meeting | 23:15 |
thumper | wallyworld: this one is for you https://github.com/juju/juju/pull/9156 | 23:25 |
wallyworld | ok, will look after standup | 23:25 |
veebers | babbageclunk: which region did you bootstrap aws? I'm using ap-southeast-1 | 23:27 |
babbageclunk | I used ap-southeast-2 | 23:28 |
wallyworld | thumper: lgtm, nice pickup | 23:35 |
thumper | wallyworld: took me a while because I had the assumption that the initial state was wrong and we weren't waiting | 23:37 |
wallyworld | seems obvious now | 23:37 |
thumper | but it turned out the initial state was right, and a subsequent update fubared it | 23:37 |
wallyworld | always is after the fact | 23:37 |
thumper | sure is | 23:37 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!