tasker | my maas server is running out of disk space. I don't want to turn off the maas-proxy service, but I do want to clean it up. is there an "official" way to do this? or can I simply "rm -r /var/spool/maas-proxy/*"? | 03:02 |
---|---|---|
=== frankban|afk is now known as frankban | ||
xygnal | roaksoax: please give me more detail about your larger unimpacted environment where this issue is not seen. what kind of hardware capacity? what sysctl settings on the OS? | 15:11 |
roaksoax | xygnal: i don't have the details in hand as I wasn't given that information. But 32GB of ram seems plenty to me. I have a feeling that this could be related to it running on vmware | 15:23 |
roaksoax | that said | 15:24 |
roaksoax | how UI intensive are you ? | 15:24 |
xygnal | we are very UI intensive. we have as many as probably 4-6 people refreshing at the same time impatiently, because refresh takes a long time. | 15:55 |
xygnal | like i have implied before, if the first scan of nodes is slow, any further scans and reloads-of-scans just cause memory to go craaaaazy until its killed | 15:57 |
xygnal | and that source reason of the first scan being SLOW is likely why we see it trigger so often | 15:58 |
roaksoax | xygnal: so the devices list has been tested with 8000+ nodes | 15:58 |
roaksoax | the machine listing with 2000+ "fake nodes" | 15:58 |
xygnal | and what data do these fake nodes have in them, that a real node would have, such as all of the logs of its commissions? | 15:59 |
xygnal | are those not normal, regular things to have in a node history? a real node history? | 15:59 |
roaksoax | xygnal: the only difference is power parameters | 15:59 |
roaksoax | but everything else is filled | 15:59 |
roaksoax | e.g. commissioning data, events, hardware testing data, etc | 16:00 |
xygnal | ah if they are not checking power then i would expect them to simply breeze by at high speeds | 16:00 |
xygnal | no hardware, no slowdowns | 16:00 |
xygnal | that doesn't seem like very effective testing to me :/ | 16:00 |
roaksoax | xygnal: so why dont you try to start a couple workers ? | 16:00 |
roaksoax | xygnal: or at least an extra worker ? | 16:00 |
roaksoax | xygnal: in 2.3 that's all done in systemd | 16:00 |
xygnal | we can do that? i thought 4 was the limit | 16:00 |
roaksoax | xygnal: we dont support more workers on 2.3, but its been done in the past afaik | 16:01 |
roaksoax | xygnal: 2.4 will introduce dynamic worker up to 8 workers | 16:01 |
xygnal | i'll look into adding a worker in systemd | 16:03 |
xygnal | would that actually make any difference to scans, or just UI response? | 16:03 |
roaksoax | xygnal: it should spread the load more, we dont pin specific workers to specific services | 16:03 |
roaksoax | at least not on 2.3 | 16:03 |
roaksoax | 2.4 will have some worker separation between what each worker does | 16:04 |
roaksoax | xygnal: also, i would be interested in knowing what data is being sent over the websocket | 16:04 |
xygnal | I provided screen shots of that in the bug report | 16:04 |
xygnal | but mike confirmed we cannot export the logs for web socket | 16:04 |
roaksoax | xygnal: yea we can't but I mean, see how big the data being sent is | 16:05 |
xygnal | it didnt look that big to me | 16:05 |
xygnal | look at the attached screen shots :) | 16:05 |
roaksoax | xygnal: have the bug link in hand ? | 16:05 |
xygnal | if you want network traces, or database dumps, core dumps from memory kills, just tell us what to gather to get you deeper | 16:06 |
roaksoax | what i'm more interested in knowing is what data is loading over the websocket and how big it is | 16:06 |
roaksoax | for example, it could be loading data for the 500 machines | 16:07 |
roaksoax | instead of loading data only for the machines that you can see | 16:07 |
roaksoax | although that should have gotten fixed | 16:07 |
xygnal | https://bugs.launchpad.net/maas/+bug/1744765 | 16:07 |
roaksoax | maybe the hardware testing is loading more data | 16:07 |
xygnal | it looks to be grabbing them 50 at a time | 16:07 |
xygnal | from what i saw in the websocket calls | 16:07 |
roaksoax | xygnal: https://i355451027.restricted.launchpadlibrarian.net/355451027/Screen%20Shot%202018-02-01%20at%202.42.35%20PM.png?token=PqMK4FCvf7Dfg9cp83g88PzFwD0K4hMd | 16:08 |
roaksoax | xygnal: in that screenshot, the above has a length of 75489 | 16:08 |
xygnal | what is *in* that request | 16:08 |
xygnal | what is the payload to make it so big | 16:08 |
xygnal | or rather, what CAN it be that would make it so big | 16:09 |
roaksoax | xygnal: i would like to see the expanded output | 16:09 |
roaksoax | to know for sure | 16:09 |
roaksoax | xygnal: but in the one that's already expanded | 16:09 |
xygnal | not sure how to do that since it was not letting me export | 16:09 |
roaksoax | xygnal: you can see it seems to be for various machines | 16:09 |
xygnal | web sockets does not support export to file | 16:09 |
roaksoax | xygnal: right, that's fine screenshots are fine | 16:10 |
xygnal | will see if i can copy it, i thought i had trouble getting it to LET me | 16:10 |
roaksoax | but for example, in that 75k length one | 16:10 |
xygnal | what you want a 10 page screen shot? | 16:10 |
xygnal | ;) | 16:10 |
roaksoax | xygnal: the data will be organized per machine, so first things first would be to see for how many machines its showing that data | 16:10 |
xygnal | it looked like it was 50 machines at a time, in those requests | 16:11 |
xygnal | when i was expanding and digging around | 16:11 |
roaksoax | e.g. if it is showing for 590... even though the UI is only rendering 10, then that seems like a bug | 16:11 |
roaksoax | xygnal: and in the 75k one, for each machine, what data is being sent | 16:11 |
roaksoax | so i would need to know those two things | 16:11 |
xygnal | I will see what I can get | 16:12 |
roaksoax | thanks | 16:14 |
xygnal | btw, when this happens, we dont see all of the twistd3 processes going nuts at the same time. | 16:15 |
xygnal | it's usually one or two processes that just grow grow grow in cpu and memory | 16:15 |
xygnal | so i dont think threading is going to do much | 16:15 |
xygnal | if you think our commission logs could be part of the problem, is there a quick database query you could propose to see just how much of that data we have? | 16:16 |
roaksoax | i dont think the commissioning logs are the problem actually, since I believe you applied a fix in the websockets to not load the whole file | 16:18 |
roaksoax | but rather, if virtual scrolling is working as expected, it should not be loading the data from the 500 machines | 16:19 |
roaksoax | only from the ones you see rendered | 16:19 |
roaksoax | xygnal: https://stackoverflow.com/questions/29953531/how-to-save-websocket-frames-in-chrome | 16:20 |
xygnal | nice find :) | 16:23 |
xygnal | we did apply the fix that was proposed, but we backed it out in prod after it had no effect. | 16:23 |
xygnal | fyi | 16:24 |
roaksoax | yeah that was only for machine details | 16:37 |
roaksoax | not really for the listing | 16:37 |
=== frankban is now known as frankban|afk | ||
xygnal | roaksoax: any chance increased threads may cause problems connecting to rack controllers? none of mine can connect now. | 19:56 |
mup | Bug #1748538 opened: [2.4] Updating the boot source can cause duplicate entries in the boot source cache <MAAS:Triaged> <https://launchpad.net/bugs/1748538> | 20:32 |
xygnal | roaksoax: something caused all the rackd's to hang, so i restarted their service.. I tried that code in the inspector console, it fails. syntax is not right. not sure how to write proper syntax to do this. | 20:45 |
mup | Bug #1748542 opened: [2.4, API] Pods create do not document the parameters needed per type <doc> <pod> <trivial> <MAAS:Triaged> <https://launchpad.net/bugs/1748542> | 21:11 |
xygnal | roaksoax: bug updated with requested WS traces | 22:20 |
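
The two things roaksoax asked for above (how many machines one websocket frame carries, and what data is sent per machine) could be checked offline once a frame has been copied out of the browser. What follows is a minimal sketch only, not MAAS code: it assumes the frame is JSON text whose machine list sits under a `result` key as a list of objects. That key name, the `summarize_frame` helper, and the sample data are illustrative assumptions and may not match the real payload.

```python
import json
from collections import Counter


def summarize_frame(frame_text, result_key="result"):
    """Return (frame length, machine count, bytes per field) for one frame.

    Assumption: the frame is JSON and the machine list lives under
    `result_key`. Adjust the key to whatever the real payload uses.
    """
    payload = json.loads(frame_text)
    machines = payload.get(result_key) or []
    if isinstance(machines, dict):
        # A single-object result is treated as a one-machine list.
        machines = [machines]
    field_sizes = Counter()
    for machine in machines:
        if not isinstance(machine, dict):
            continue
        for field, value in machine.items():
            # Serialized size of each field, summed across all machines.
            field_sizes[field] += len(json.dumps(value))
    return len(frame_text), len(machines), field_sizes


# Made-up sample frame, only to show the shape of the report.
sample = json.dumps({
    "request_id": 1,
    "result": [
        {"system_id": "abc123", "hostname": "node-1", "events": ["e"] * 20},
        {"system_id": "def456", "hostname": "node-2", "events": ["e"] * 5},
    ],
})

length, count, sizes = summarize_frame(sample)
print(f"frame length: {length}, machines: {count}")
for field, size in sizes.most_common(3):
    print(f"{field}: {size} bytes")
```

Running this on the 75k frame would show whether it covers roughly 50 machines or many more, and which per-machine fields account for most of the length.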