/srv/irclogs.ubuntu.com/2018/02/09/#maas.txt

tasker	my maas server is running out of disk space. I don't want to turn off the maas-proxy service, but I do want to clean it up. is there an "official" way to do this? or can I simply "rm -r /var/spool/maas-proxy/*"?	03:02
=== frankban\|afk is now known as frankban
xygnal	roaksoax: please give me more detail about your larger unimpacted environment where this issue is not seen. what kind of hardware capacity? what sysctl settings on the OS?	15:11
roaksoax	xygnal: i don't have the details in hand as i wasn't given that information. But 32GB of ram seems plenty to me. I have a feeling that this could be related to it running on vmware	15:23
roaksoax	that said	15:24
roaksoax	how UI intensive are you ?	15:24
xygnal	we are very UI intensive. we have as many as probably 4-6 people refreshing at the same time impatiently, because refresh takes a long time.	15:55
xygnal	like i have implied before, if the first scan of nodes is slow, any further scans and reloads-of-scans just cause memory to go craaaaazy until it's killed	15:57
xygnal	and that source reason of the first scan being SLOW is likely why we see it trigger so often	15:58
roaksoax	xygnal: so the devices list has been tested with 8000+ nodes	15:58
roaksoax	the machine listing with 2000+ "fake nodes"	15:58
xygnal	and what data do these fake nodes have in them, that a real node would have, such as all of the logs of its commissions?	15:59
xygnal	are those not normal, regular things to have in a node history? a real node history?	15:59
roaksoax	xygnal: the only difference is power parameters	15:59
roaksoax	but everything else is filled	15:59
roaksoax	e.g. commissioning data, events, hardware testing data, etc	16:00
xygnal	ah if they are not checking power then i would expect them to simply breeze by at high speeds	16:00
xygnal	no hardware, no slowdowns	16:00
xygnal	that doesn't seem like very effective testing to me :/	16:00
roaksoax	xygnal: so why dont you try to start a couple workers ?	16:00
roaksoax	xygnal: or at least an extra worker ?	16:00
roaksoax	xygnal: in 2.3 that's all done in systemd	16:00
xygnal	we can do that? i thought 4 was the limit	16:00
roaksoax	xygnal: we dont support more workers on 2.3, but its been done in the past afaik	16:01
roaksoax	xygnal: 2.4 will introduce dynamic workers, up to 8 workers	16:01
xygnal	i'll look into adding a worker in systemd	16:03
xygnal	would that actually make any difference to scans, or just UI response?	16:03
roaksoax	xygnal: it should spread the load more, we dont pin specific workers to specific services	16:03
roaksoax	at least not on 2.3	16:03
roaksoax	2.4 will have some worker separation between what each worker does	16:04
roaksoax	xygnal: also, i would be interested in knowing what data is being sent over the websocket	16:04
xygnal	I provided screen shots of that in the bug report	16:04
xygnal	but mike confirmed we cannot export the logs for web socket	16:04
roaksoax	xygnal: yea we can't but I mean, see how big the data being sent is	16:05
xygnal	it didnt look that big to me	16:05
xygnal	look at the attached screen shots :)	16:05
roaksoax	xygnal: have the bug link in hand ?	16:05
xygnal	if you want network traces, or database dumps, core dumps from memory kills, just tell us what to gather to get you deeper	16:06
roaksoax	what i'm more interested in knowing is what data is loading over the websocket and how big it is	16:06
roaksoax	for example, it could be loading data for the 500 machines	16:07
roaksoax	instead of loading data only for the machines that you can see	16:07
roaksoax	although that should have gotten fixed	16:07
xygnal	https://bugs.launchpad.net/maas/+bug/1744765	16:07
roaksoax	maybe the hardware testing is loading more data	16:07
xygnal	it looks to be grabbing them 50 at a time	16:07
xygnal	from what i saw in the websocket calls	16:07
roaksoax	xygnal: https://i355451027.restricted.launchpadlibrarian.net/355451027/Screen%20Shot%202018-02-01%20at%202.42.35%20PM.png?token=PqMK4FCvf7Dfg9cp83g88PzFwD0K4hMd	16:08
roaksoax	xygnal: in that screenshot, the above has a length of 75489	16:08
xygnal	what is in that request	16:08
xygnal	what is the payload to make it so big	16:08
xygnal	or rather, what CAN it be that would make it so big	16:09
roaksoax	xygnal: i would like to see the expanded output	16:09
roaksoax	to know for sure	16:09
roaksoax	xygnal: but in the one that's already expanded	16:09
xygnal	not sure how to do that since it was not letting me export	16:09
roaksoax	xygnal: you can see it seems to be for various machines	16:09
xygnal	web sockets does not support export to file	16:09
roaksoax	xygnal: right, that's fine screenshots are fine	16:10
xygnal	will see if i can copy it, i thought i had trouble getting it to LET me	16:10
roaksoax	but for example, in that 75k length one	16:10
xygnal	what you want a 10 page screen shot?	16:10
xygnal	;)	16:10
roaksoax	xygnal: the data will be organized per machine, so first things first would be to see for how many machines its showing that data	16:10
xygnal	it looked like it was 50 machines at a time, in those requests	16:11
xygnal	when i was expanding and digging around	16:11
roaksoax	e.g. if it is showing for 590... even though the UI is only rendering 10, then that seems like a bug	16:11
roaksoax	xygnal: and in the 75k one, for each machine, what data is being sent	16:11
roaksoax	so i would need to know those two things	16:11
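The two things asked for above (how many machines one frame covers, and what is sent per machine) could be checked with a short script once a frame has been copied out of the browser. This is a minimal sketch, assuming the frame is a JSON object whose `result` key holds a list of per-machine objects, each carrying a `system_id`. Both key names and the function name `summarize_frame` are assumptions for illustration, not confirmed MAAS websocket fields.

```python
import json


def summarize_frame(raw_frame, list_key="result", id_key="system_id"):
    """Summarize one saved websocket frame.

    list_key and id_key are hypothetical defaults; adjust them to whatever
    the real frame contains.
    """
    message = json.loads(raw_frame)
    machines = message.get(list_key, []) if isinstance(message, dict) else []

    per_machine = {}   # serialized size of each machine entry, in characters
    per_field = {}     # serialized size of each field, summed over machines
    for index, machine in enumerate(machines):
        if not isinstance(machine, dict):
            continue
        name = str(machine.get(id_key, index))
        per_machine[name] = len(json.dumps(machine))
        for field, value in machine.items():
            per_field[field] = per_field.get(field, 0) + len(json.dumps(value))

    def by_size(sizes):
        # Largest entries first, so the heavy machines/fields stand out.
        return sorted(sizes.items(), key=lambda item: item[1], reverse=True)

    return {
        "frame_length": len(raw_frame),
        "machine_count": len(per_machine),
        "machines_by_size": by_size(per_machine),
        "fields_by_size": by_size(per_field),
    }
```

Comparing `machine_count` with the number of rows the UI is actually rendering would show whether a frame carries more machines than are visible, and `fields_by_size` would point at which per-machine data accounts for most of the length.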
xygnal	I will see what I can get	16:12
roaksoax	thanks	16:14
xygnal	btw, when this happens, we dont see all of the twistd3 processes going nuts at the same time.	16:15
xygnal	it's usually one or two processes that just grow grow grow in cpu and memory	16:15
xygnal	so i dont think threading is going to do much	16:15
xygnal	if you think our commission logs could be part of the problem, is there a quick database query you could propose to see just how much of that data we have?	16:16
roaksoax	i dont think the commissioning logs are the problem actually, since I believe you applied a fix in the websockets to not load the whole file	16:18
roaksoax	but rather, if virtual scrolling is working as expected, it should not be loading the data from the 500 machines	16:19
roaksoax	only from the ones you see rendered	16:19
roaksoax	xygnal: https://stackoverflow.com/questions/29953531/how-to-save-websocket-frames-in-chrome	16:20
xygnal	nice find :)	16:23
xygnal	we did apply the fix as proposed, but we backed it out in prod after it had no effect.	16:23
xygnal	fyi	16:24
roaksoax	yeah that was only for machine details	16:37
roaksoax	not really for the listing	16:37
=== frankban is now known as frankban\|afk
xygnal	roaksoax: any chance increased threads may cause problems connecting to rack controllers? none of mine can connect now.	19:56
mup	Bug #1748538 opened: [2.4] Updating the boot source can cause duplicate entries in the boot source cache <MAAS:Triaged> <https://launchpad.net/bugs/1748538>	20:32
xygnal	roaksoax: something caused all the rackd's to hang, so i restarted their service.. I tried that code in the inspector console, it fails. syntax is not right. not sure how to write proper syntax to do this.	20:45
mup	Bug #1748542 opened: [2.4, API] Pods create do not document the parameters needed per type <doc> <pod> <trivial> <MAAS:Triaged> <https://launchpad.net/bugs/1748542>	21:11
xygnal	roaksoax: bug updated with requested WS traces	22:20
