[03:02] <tasker> my maas server is running out of disk space. I don't want to turn off the maas-proxy service, but I do want to clean it up. is there an "official" way to do this? or can I simply "rm -r /var/spool/maas-proxy/*"?
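The question above never gets an answer in the log. maas-proxy is a squid-based caching proxy, so its spool is a rebuildable cache; below is a minimal Python sketch of the "clear the spool contents" approach the questioner proposes, assuming `/var/spool/maas-proxy` is the cache path on your install (an assumption — and note squid may log errors or need a restart while its cache directories are rebuilt):

```python
import shutil
from pathlib import Path

def clear_spool(spool_dir):
    """Remove the contents of a cache/spool directory, keeping the
    directory itself (and its ownership/permissions) intact.
    Returns the number of top-level entries removed."""
    removed = 0
    for entry in Path(spool_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # recurse into cache subdirectories
        else:
            entry.unlink()         # plain files and symlinks
        removed += 1
    return removed

# Hypothetical usage -- adjust the path to your installation:
# clear_spool("/var/spool/maas-proxy")
```

This is equivalent to the `rm -r /var/spool/maas-proxy/*` the questioner suggests, just keeping the spool directory's ownership intact, which matters if squid has to recreate its cache layout afterwards.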
[15:11] <xygnal> roaksoax: please give me more detail about your larger unimpacted environment where this issue is not seen.  what kind of hardware capacity?  what sysctl settings on the OS?
[15:23] <roaksoax> xygnal: i don't have the details at hand as I wasn't given that information. But 32GB of ram seems plenty to me. I have a feeling that this could be related to it running on vmware
[15:24] <roaksoax> that said
[15:24] <roaksoax> how UI intensive are you ?
[15:55] <xygnal> we are very UI intensive.  we have as many as probably 4-6 people refreshing at the same time impatiently, because refresh takes a long time.
[15:57] <xygnal> like i have implied before, if the first scan of nodes is slow, any further scans and reloads-of-scans just cause memory to go craaaaazy until it's killed
[15:58] <xygnal> and that source reason of the first scan being SLOW is likely why we see it trigger so often
[15:58] <roaksoax> xygnal: so the devices list has been tested with 8000+ nodes
[15:58] <roaksoax> the machine listing with 2000+ "fake nodes"
[15:59] <xygnal> and what data do these fake nodes have in them, that a real node would have, such as all of the logs of its commissions?
[15:59] <xygnal> are those not normal, regular things to have in a node history? a real node history?
[15:59] <roaksoax> xygnal: the only difference is power parameters
[15:59] <roaksoax> but everything else is filled
[16:00] <roaksoax> e.g. commissioning data, events, hardware testing data, etc
[16:00] <xygnal> ah if they are not checking power then i would expect them to simply breeze by at high speeds
[16:00] <xygnal> no hardware, no slowdowns
[16:00] <xygnal> that doesn't seem very effective testing to me :/
[16:00] <roaksoax> xygnal: so why dont you try to start a couple workers ?
[16:00] <roaksoax> xygnal: or at least an extra worker ?
[16:00] <roaksoax> xygnal: in 2.3 that's all done in systemd
[16:00] <xygnal> we can do that? i thought 4 was the limit
[16:01] <roaksoax> xygnal: we dont support more workers on 2.3, but its been done in the past afaik
[16:01] <roaksoax> xygnal: 2.4 will introduce dynamic workers, up to 8 workers
[16:03] <xygnal> i'll look into adding a worker in systemd
[16:03] <xygnal> would that actually cause any difference in scans, or just UI response?
[16:03] <roaksoax> xygnal: it should spread the load more, we dont pin specific workers to specific services
[16:03] <roaksoax> at least not on 2.3
[16:04] <roaksoax> 2.4 will have some worker separation between what each worker does
[16:04] <roaksoax> xygnal: also, i would be interested in knowing what data is being sent over the websocket
[16:04] <xygnal> I provided screen shots of that in the bug report
[16:04] <xygnal> but mike confirmed we cannot export the logs for web socket
[16:05] <roaksoax> xygnal: yea we can't but I mean, see how big the data being sent is
[16:05] <xygnal> it didnt look that big to me
[16:05] <xygnal> look at the attached screen shots :)
[16:05] <roaksoax> xygnal: have the bug link in hand ?
[16:06] <xygnal> if you want network traces, or database dumps, core dumps from memory kills, just tell us what to gather to get you deeper
[16:06] <roaksoax> what i'm more interested in knowing is what data is loading over the websocket and how big it is
[16:07] <roaksoax> for example, it could be loading data for the 500 machines
[16:07] <roaksoax> instead of loading data only for the machines that you can see
[16:07] <roaksoax> although that should have gotten fixed
[16:07] <xygnal> https://bugs.launchpad.net/maas/+bug/1744765
[16:07] <roaksoax> maybe the hardware testing is loading more data
[16:07] <xygnal> it looks to be grabbing them 50 at a time
[16:07] <xygnal> from what i saw in the websocket calls
[16:08] <roaksoax> xygnal: https://i355451027.restricted.launchpadlibrarian.net/355451027/Screen%20Shot%202018-02-01%20at%202.42.35%20PM.png?token=PqMK4FCvf7Dfg9cp83g88PzFwD0K4hMd
[16:08] <roaksoax> xygnal: in that screenshot, the above has a length of 75489
[16:08] <xygnal> what is *in* that request
[16:08] <xygnal> what is the payload to make it so big
[16:09] <xygnal> or rather, what CAN it be that would make it so big
[16:09] <roaksoax> xygnal: i would like to see the expanded output
[16:09] <roaksoax> to know for sure
[16:09] <roaksoax> xygnal: but in the one that's already expanded
[16:09] <xygnal> not sure how to do that since it was not letting me export
[16:09] <roaksoax> xygnal: you can see it seems to be for various machines
[16:09] <xygnal> web sockets does not support export to file
[16:10] <roaksoax> xygnal: right, that's fine screenshots are fine
[16:10] <xygnal> will see if i can copy it, i thought i had trouble getting it to LET me
[16:10] <roaksoax> but for example, in that 75k length one
[16:10] <xygnal> what you want a 10 page screen shot?
[16:10] <xygnal> ;)
[16:10] <roaksoax> xygnal: the data will be organized per machine, so first things first would be to see for how many machines its showing that data
[16:11] <xygnal> it looked like it was 50 machines at a time, in those requests
[16:11] <xygnal> when i was expanding and digging around
[16:11] <roaksoax> e.g. if it is showing for 590... even though the UI is only rendering 10, then that seems like a bug
[16:11] <roaksoax> xygnal: and in the 75k one, per each machine, what data is being sent
[16:11] <roaksoax> so i would need to know those two things
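The two checks roaksoax asks for — how many machines one frame covers, and which per-machine fields dominate its size — can be done mechanically once a frame's text is copied out of devtools. A sketch assuming the frame is a JSON object whose `result` member is a list of per-machine objects (an assumption about the frame shape, not a documented MAAS format — adjust to what you actually see):

```python
import json

def summarize_frame(frame_text):
    """Given the text of one websocket frame (copied from devtools),
    report how many machine records it carries and which fields
    contribute most to its size.  Assumes a {"result": [{...}, ...]}
    shape; a single-object result is treated as one machine."""
    msg = json.loads(frame_text)
    machines = msg.get("result", [])
    if isinstance(machines, dict):
        machines = [machines]
    field_bytes = {}
    for machine in machines:
        for key, value in machine.items():
            # size of each field as serialized JSON, summed across machines
            field_bytes[key] = field_bytes.get(key, 0) + len(json.dumps(value))
    top = sorted(field_bytes.items(), key=lambda kv: -kv[1])
    return len(machines), top

# Example with a synthetic two-machine frame:
count, top = summarize_frame(
    '{"result": [{"hostname": "n1", "events": ["e"]},'
    ' {"hostname": "n2", "events": ["e", "e2"]}]}'
)
```

If `count` comes back as 500-odd while the UI only renders a handful of rows, that points at the virtual-scrolling bug roaksoax describes; if `count` is ~50, the paging xygnal observed is working and the per-field totals show what is bloating each record.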
[16:12] <xygnal> I will see what I can  get
[16:14] <roaksoax> thanks
[16:15] <xygnal> btw, when this happens, we dont see all of the twistd3 processes going nuts at the same time.
[16:15] <xygnal> it's usually one or two processes that just grow grow grow in cpu and memory
[16:15] <xygnal> so i dont think threading is going to do much
[16:16] <xygnal> if you think our commission logs could be part of the problem, is there a quick database query you could propose to see just how much of that data we have?
[16:18] <roaksoax> i dont think the commissioning logs are the problem actually, since I believe you applied a fix in the websockets to not load the whole file
[16:19] <roaksoax> but rather, if virtual scrolling is working as expected, it should not be loading the data from the 500 machines
[16:19] <roaksoax> only from the ones you see rendered
[16:20] <roaksoax> xygnal: https://stackoverflow.com/questions/29953531/how-to-save-websocket-frames-in-chrome
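Once frames can be read off the devtools frames list (per the link above), checking "how big the data being sent is" reduces to flagging oversized responses, like the 75489-length one from the screenshot. A small illustrative helper (hypothetical, not part of MAAS):

```python
def flag_large_frames(frames, threshold=50_000):
    """frames: list of (direction, length) tuples as read off the
    devtools frames list, e.g. ("recv", 75489).  Returns the received
    frames whose payload length exceeds `threshold` bytes."""
    return [f for f in frames if f[0] == "recv" and f[1] > threshold]

# Example: the 75489-byte response would be flagged, the others not.
big = flag_large_frames([("send", 120), ("recv", 75489), ("recv", 4301)])
```

Any frame this flags is a candidate for the per-machine breakdown roaksoax asked for.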
[16:23] <xygnal> nice find :)
[16:23] <xygnal> we did apply a fix that was proposed, but we backed it out in prod after it did not have an effect.
[16:24] <xygnal> fyi
[16:37] <roaksoax> yeah that was only for machine details
[16:37] <roaksoax> not really for the listing
[19:56] <xygnal> roaksoax: any chance increased threads may cause problems connecting to rack controllers? none of mine can connect now.
[20:32] <mup> Bug #1748538 opened: [2.4] Updating the boot source can cause duplicate entries in the boot source cache <MAAS:Triaged> <https://launchpad.net/bugs/1748538>
[20:45] <xygnal> roaksoax: something caused all the rackds to hang, so i restarted their service.  I tried that code in the inspector console; it fails, the syntax is not right.  not sure how to write proper syntax to do this.
[21:11] <mup> Bug #1748542 opened: [2.4, API] Pods create do not document the parameters needed per type <doc> <pod> <trivial> <MAAS:Triaged> <https://launchpad.net/bugs/1748542>
[22:20] <xygnal> roaksoax:  bug updated with requested WS traces