/srv/irclogs.ubuntu.com/2018/02/09/#maas.txt

taskermy maas server is running out of disk space. I don't want to turn off the maas-proxy service, but I do want to clean it up. Is there an "official" way to do this? Or can I simply "rm -r /var/spool/maas-proxy/*"?03:02
=== frankban|afk is now known as frankban
xygnalroaksoax: please give me more detail about your larger unimpacted environment where this issue is not seen.  what kind of hardware capacity?  what sysctl settings on the OS?15:11
roaksoaxxygnal: i don't have the details in hand as i wasn't given that information. But 32GB of ram seems plenty to me. I have a feeling that this could be related to it running on vmware15:23
roaksoaxthat said15:24
roaksoaxhow UI intensive are you ?15:24
xygnalwe are very UI intensive.  we have as many as probably 4-6 people refreshing at the same time impatiently, because refresh takes a long time.15:55
xygnallike i have implied before, if the first scan of nodes is slow, any further scans and reloads-of-scans just cause memory to go craaaaazy until its killed15:57
xygnaland that source reason of the first scan being SLOW is likely why we see it trigger so often15:58
roaksoaxxygnal: so the devices list has been tested with 8000+ nodes15:58
roaksoaxthe machine listing with 2000+ "fake nodes"15:58
xygnaland what data do these fake nodes have in them, that a real node would have, such as all of the logs of its commissions?15:59
xygnalare those not normal, regular things to have in a node history? a real node history?15:59
roaksoaxxygnal: the only difference is power parameters15:59
roaksoaxbut everything else is filled15:59
roaksoaxe.g. commissioning data, events, hardware testing data, etc16:00
xygnalah if they are not checking power then i would expect them to simply breeze by at high speeds16:00
xygnalno hardware, no slowdowns16:00
xygnalthat doesnt seem like very effective testing to me :/16:00
roaksoaxxygnal: so why dont you try to start a couple workers ?16:00
roaksoaxxygnal: or at least an extra worker ?16:00
roaksoaxxygnal: in 2.3 that's all done in systemd16:00
xygnalwe can do that? i thought 4 was the limit16:00
roaksoaxxygnal: we dont support more workers on 2.3, but its been done in the past afaik16:01
roaksoaxxygnal: 2.4 will introduce dynamic workers, up to 8 workers16:01
xygnali'll look into adding a worker in systemd16:03
xygnalwould that actually make any difference to scans or just UI response?16:03
roaksoaxxygnal: it should spread the load more, we dont pin specific workers to specific services16:03
roaksoaxat least not on 2.316:03
roaksoax2.4 will have some worker separation between what each worker does16:04
roaksoaxxygnal: also, i would be interested in knowing what data is being sent over the websocket16:04
xygnalI provided screen shots of that in the bug report16:04
xygnalbut mike confirmed we cannot export the logs for web socket16:04
roaksoaxxygnal: yea we can't but I mean, see how big the data being sent is16:05
xygnalit didnt look that big to me16:05
xygnallook at the attached screen shots :)16:05
roaksoaxxygnal: have the bug link in hand ?16:05
xygnalif you want network traces, or database dumps, core dumps from memory kills, just tell us what to gather to get you deeper16:06
roaksoaxwhat i'm more interested in knowing is what data is loading over the websocket and how big it is16:06
roaksoaxfor example, it could be loading data for the 500 machines16:07
roaksoaxinstead of loading data only for the machines that you can see16:07
roaksoaxalthough that should have gotten fixed16:07
xygnalhttps://bugs.launchpad.net/maas/+bug/174476516:07
roaksoaxmaybe the hardware testing is loading more data16:07
xygnalit looks to be grabbing them 50 at a time16:07
xygnalfrom what i saw in the websocket calls16:07
roaksoaxxygnal: https://i355451027.restricted.launchpadlibrarian.net/355451027/Screen%20Shot%202018-02-01%20at%202.42.35%20PM.png?token=PqMK4FCvf7Dfg9cp83g88PzFwD0K4hMd16:08
roaksoaxxygnal: in that screenshot, the above has a length of 7548916:08
xygnalwhat is *in* that request16:08
xygnalwhat is the payload to make it so big16:08
xygnalor rather, what CAN it be that would make it so big16:09
roaksoaxxygnal: i would like to see the expanded output16:09
roaksoaxto know for sure16:09
roaksoaxxygnal: but in the one that's already expanded16:09
xygnalnot sure how to do that since it was not letting me export16:09
roaksoaxxygnal: you can see it seems to be for various machines16:09
xygnalweb sockets does not support export to file16:09
roaksoaxxygnal: right, that's fine screenshots are fine16:10
xygnalwill see if i can copy it, i thought i had trouble getting it to LET me16:10
roaksoaxbut for example, in that 75k length one16:10
xygnalwhat you want a 10 page screen shot?16:10
xygnal;)16:10
roaksoaxxygnal: the data will be organized per machine, so first things first would be to see for how many machines its showing that data16:10
xygnalit looked like it was 50 machines at a time, in those requests16:11
xygnalwhen i was expanding and digging around16:11
roaksoaxe.g. if it is showing for 590... even though the UI is only rendering 10, then that seems like a bug16:11
roaksoaxxygnal: and in the 75k one, per each machine, what data is being sent16:11
roaksoaxso i would need to know those two things16:11
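
[Editor's sketch] The two questions above (how many machines a frame covers, and which per-machine data makes it large) could be answered from a copied websocket frame with something like the following. This is only a sketch under assumptions: it assumes the frame is JSON, that the machine records are a list under a "result" key, and that each record carries a "system_id" field. The function name summarize_frame is hypothetical and not part of MAAS.

```python
import json
from collections import Counter


def summarize_frame(raw, id_key="system_id"):
    """Summarize one copied websocket frame (hypothetical helper, not a MAAS API)."""
    message = json.loads(raw)
    # Assumption: machine records are a list under "result"; if the frame is
    # already a bare list, use it directly.
    records = message.get("result", []) if isinstance(message, dict) else message
    if not isinstance(records, list):
        records = [records]

    field_chars = Counter()
    machine_ids = set()
    for record in records:
        if not isinstance(record, dict):
            continue
        if id_key in record:
            machine_ids.add(record[id_key])
        # Measure how many characters each field contributes once serialized.
        for field, value in record.items():
            field_chars[field] += len(json.dumps(value))

    return {
        "frame_chars": len(raw),                       # total frame length
        "machines": len(machine_ids),                  # distinct machines in the frame
        "largest_fields": field_chars.most_common(5),  # heaviest fields first
    }
```

Running summarize_frame on the text of the ~75k frame would show whether it covers roughly 50 machines or many more, and which fields dominate its size.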
xygnalI will see what I can  get16:12
roaksoaxthanks16:14
xygnalbtw, when this happens, we dont see all of the twistd3 processes going nuts at the same time.16:15
xygnalit's usually one or two processes that just grow grow grow in cpu and memory16:15
xygnalso i dont think threading is going to do much16:15
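
[Editor's sketch] To confirm which one or two processes are growing, resident memory per process can be sampled from /proc on Linux. This is a minimal sketch, assuming the processes show up under the name "twistd3" as described above (the kernel-reported name may differ on a given install). The helper names are hypothetical.

```python
import os


def parse_status(text):
    """Parse the 'Key: value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def rss_kib(status):
    """Return resident set size in KiB; VmRSS looks like '2048 kB'."""
    return int(status.get("VmRSS", "0 kB").split()[0])


def processes_by_rss(name, proc_root="/proc"):
    """List (pid, rss_kib) for processes with the given name, largest first."""
    rows = []
    if not os.path.isdir(proc_root):
        return rows
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                status = parse_status(handle.read())
        except OSError:
            continue  # the process exited or is not readable
        if status.get("Name") == name:
            rows.append((int(entry), rss_kib(status)))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling processes_by_rss("twistd3") every few seconds while several people refresh the UI would show whether the growth stays confined to one or two PIDs.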
xygnalif you think our commission logs could be part of the problem, is there a quick database query you could propose to see just how much of that data we have?16:16
roaksoaxi dont think the commissioning logs are the problem actually, since I believe you applied a fix in the websockets to not load the whole file16:18
roaksoaxbut rather, if virtual scrolling is working as expected, it should not be loading the data from the 500 machines16:19
roaksoaxonly from the ones you see rendered16:19
roaksoaxxygnal: https://stackoverflow.com/questions/29953531/how-to-save-websocket-frames-in-chrome16:20
xygnalnice find :)16:23
xygnalwe did apply the fix that was proposed, but we backed it out in prod after it had no effect.16:23
xygnalfyi16:24
roaksoaxyeah that was only for machine details16:37
roaksoaxnot really for the listing16:37
=== frankban is now known as frankban|afk
xygnalroaksoax: any chance increased threads may cause problems connecting to rack controllers? none of mine can connect now.19:56
mupBug #1748538 opened: [2.4] Updating the boot source can cause duplicate entries in the boot source cache <MAAS:Triaged> <https://launchpad.net/bugs/1748538>20:32
xygnalroaksoax:  something caused all the rackd's to hang, so i restarted their service.  I tried that code in the inspector console, it fails. syntax is not right.  not sure how to write proper syntax to do this.20:45
mupBug #1748542 opened: [2.4, API] Pods create do not document the parameters needed per type <doc> <pod> <trivial> <MAAS:Triaged> <https://launchpad.net/bugs/1748542>21:11
xygnalroaksoax:  bug updated with requested WS traces22:20
