/srv/irclogs.ubuntu.com/2018/09/08/#ubuntu-server.txt

=== med_ is now known as medberry
LeMikeHello there. I want to debug what is taking so long in mysqld. One server is 120 times slower than 2 other server (1 weaker, 1 stronger). So I strace mysql and see some pread but lots of io_getevents and io_submit. Is this normal? How do I find the device or process that causes the problem?08:33
sarnoldstrace is the wrong tool for the job08:34
sarnoldthis is a decent starting point http://www.brendangregg.com/linuxperf.html08:35
LeMikeThanks. This is the best help so far :D dtrace or where could I start? I just found out that iotop has a "Netlink error" but the memtester passed.08:35
LeMikesarnold: thanks I read this08:35
sarnoldLeMike: linux's perf tool / or the iovisor collection of tools is your best bet, no dtrace on linux08:36
sarnoldalright, time to bail, have fun :)08:39
LeMikesarnold: thanks. I already like perf a lot :D read a report and liked to investigate even more but somehow the problem is gone. I have a good guess that the hoster finally found a thing on his side and shamefully fixed it silently. I hunt this shit since yesterday.08:52
RoyKLeMike: try slow query log11:29
RoyKLeMike: and make sure you install sysstat first so you can monitor the system while running the query11:29
RoyKLeMike: to see what's slowing things down - usually cpu or i/o11:29
LeMikeRoyK: Slow query log just shows the query and how long it took. With that I came to the conclusion to look into mySQL and IO. Thanks14:33
LeMikeI continue some other day because after a limit of 4 hours I have to stop investigating (as the customer does not want to pay more xD )14:35
_KaszpiR_LeMike what's the specs of the servers, and any noticeable mysql setting set?14:41
_KaszpiR_if they fixed it silently then I suspect that if this is on virtual platform then memory baloon driver/high cpu steal/horrid disk io perf (cause array rebuilds etc)14:43
RoyKLeMike: you should see where the load is high easily with sar14:46
LeMikeRoyK: I dont get it. This seems more like recording a timespan than a single process. And what if some buffer is full or some process is just waiting for stupid reasons?17:53
RoyKit's recording a timespan, yes17:59
RoyKbut then again, using things like top or htop can tell you where the problem lies17:59
RoyKiotop perhaps too18:00
RoyKif you have a truckload of different services on the same server, all competing, you should be virtualising more18:01
RoyKif the machine is dedicated to mysql, it should be trivial to find the bottleneck18:02
blackflowif I may interject for a moment.... how would virt help here. if anything it'd just slow things down with extra overhead.18:02
blackflowLeMike: you can check for two things, iowait and disk latency18:04
RoyKblackflow: it'll slow things down a few percent, but it'll definetely make it easier to monitor18:05
RoyKblackflow: throw a ton of services on the same machine and "something" is slow, well, then everything's slow18:05
blackflowpretty sure it'd be the same thing under a VM. the host kernel has to juggle processes and serialize IO in both cases18:07
blackflowbut I agree about monitoring, somewhat. should be possible with cgroups to some extent too.18:07
RoyKif vitualised, monitoring on the guest will show you the bottleneck18:08
LeMikeblackflow: I let it run with "iostat -d 1" and got somewhat like 2348 kB write every few seconds. iowait is 0,85% but I guess I need to reboot. dunno how to check latency18:13
blackflowLeMike: https://www.kernel.org/doc/Documentation/block/stat.txt18:13
blackflowread ticks / read I/Os = average wait (latency) for read18:14
blackflowsame for write.18:14
LeMikeuh. can I truncate this file while the server runs? :) this would be nice18:16
RoyKLeMike: how much memory? what is the iops count? what sort of storage does it have?18:16
RoyKwhat about cpu load?18:16
RoyKswap usage?18:16
RoyKsar will give you paging statistics18:17
RoyKhave you tuned mysql/mariadb in any way?18:17
RoyKhow large is the database's dataset?18:17
blackflowwhat about the db itself... effective indexes? does it access tuples sequentially or indexed?18:17
RoyKlots and lots of questions ;)18:17
blackflowindeedy.18:17
RoyKmysql allows you to run "explain <query>" to see if indices are used or not18:18
RoyKquite useful thing18:19
blackflowLeMike: I'd recommend you activate some graph baesed monitoring on that server. A picture is worth a thousand eyes, and problems are easier to spot. I use Munin for that, infact, that's how I know about that latency measure. It's from Munin's diskstats plugin.18:19
blackflow*a thousand words   (lol)18:19
blackflowI'm listening to TechSnap podcast and they're talking about "many eyes in open source security", so the fingers typed that :)18:20
LeMike4G ram (more than 3G free also in swap free), cpu idles, iotop does not work :/ the table is 276M18:21
LeMikenot touched mysql config (once for more cache etc but reset it already)18:21
blackflow276M is a lot, esp if the DB has to sift through the rows sequentially because it's badly indexed. a single select can take seconds....18:22
blackflow(I'm assuming you meant bytes, because 270M rows would be gigabytes most likely...)18:22
RoyKand then - mysql doesn't scale too well18:22
* RoyK only uses mysql at gunpoint18:23
blackflowyeah I don't know much about MySQL. PostgreSQL is my poison and I monitor all that with Munin - tuple counts, sequential vs indexed access, average query time, size, stuff...18:23
blackflowDB monitoring and statistics is a life saver.18:24
RoyKI'd choose postgres over mysql any day18:24
blackflowaye!18:24
LeMike(megabytes) Same DB and query on my laptop takes 1-2 seconds. server needs more than 60 seconds18:24
RoyKbut then - some systems like wordpress don't work too well on postgres because those wp guys haven't understood a damn thing about abstraction18:25
RoyKLeMike: what sort of storage?18:25
LeMikeI look up some monitoring and gather statistics. To bad I got perf.data here and was hoping to just read through and see the big time waster18:25
blackflowif MySQL is anything liek postgres,   EXPLAIN ANALYZE <query> is a treasure trove of data bout the performance of that query. number of iterations, indexes used, loops, hashes, everything.18:26
LeMikeRoyK: You mean InnoDB? Or filesystem?18:26
RoyKdisk18:26
blackflowRoyK: I was so pleasantly suprised when I installed GitLab and saw it developed specifically for Postgres (and recommending Postgres over the dolphin). So rare to see something written for Postgres.  Ohh ohh, and roundcube.18:27
LeMikeRoyK: zfs18:27
RoyKLeMike: what sort of disks?18:27
blackflowoh wait, that changes things a bit. did you tune it for mysql?18:27
blackflowLeMike: iirc MySQL recordsize is recommended to be 16k. Postgres 8k. Now, depending on your use case and workload, that might mean a lot, or nothing at all (eg. my use case allows larger record size -- with postgres -- which benefits compressratio)18:28
LeMikeRoyK: HDD. if that isn the answer then help pls18:29
RoyKLeMike: what sort of hdd and how many of them?18:30
RoyKLeMike: there's no straight answer here, but as blackflow says, the zfs blocksize for that dataset may be important18:30
RoyKLeMike: http://open-zfs.org/wiki/Performance_tuning#MySQL18:32
LeMikecant access shit with lshw. just one HDD. Got 50GB out of 3TB for this machine it seems.18:34
LeMikeoh. 2x TOSHIBA DT01ACA3 in /proc/scsi/scsi. A raid? I dont see the block size. fdisk does not work and I dunno how18:45
_KaszpiR_zfs on one disk?18:52
_KaszpiR_show us  'lsblk'  output18:53
RoyKLeMike: zfs get18:54
tomreynOS + zfs + mysql on 4GB RAM just sounds wrong.18:58
tomreynunless its a *very* light-weight DB workload you'll want more RAM.18:59
tomreyndisks dont matter for a database server unless you designed it wrong, provided too little RAM / didn't tune it for the RAM it has available.19:00
RoyKzfs isn't that heavy on memory - that's a myth - that is - unless you turn on dedup19:02
RoyKbut then, that's praying to all known gods for trouble19:02
LeMike_KaszpiR_: got it here https://pastebin.com/p3BsHefU19:04
RoyKLeMike: pastebin output of "zpool status", "zfs list" and "zfs get all", please19:04
LeMikeRoyK: no pool, no dataset, zfs get all has no output (I am in PVE it seems, I am not the hoster or have access to the host)19:09
RoyKthen you're not running zfs19:12
LeMikedf -Th said so19:15
RoyKpastebin that19:17
RoyKkeep in mind you need "sudo" in front of those zfs/zpool commands19:17
LeMikeI am groot!19:18
LeMikehttps://pastebin.com/tqEXHQs719:18
RoyKthat doesn't make sense19:18
LeMikeEither way this feels like a direction that is not really giving me an answer what specific device or thingy is taking so long . Like "the buffer is too low" or "sync takes too long" or "Raid is broken" etc.19:20
tomreynit's probably a container and ZFS is managed outside of it19:24
LeMikeOkay. This is reaching 1k € almost. I give up, tell the customer and let him kick the hoster in his nuts. I think that my app has nothin to do with being so slow :P19:24
tomreyndid you run mysqltuner?19:25
tomreyndoing so (or knowing what needs to be done OTOH) is the minimum i'd expect from a contractor in charge of managing a DB19:26
LeMikethanks, those tests are all okay. and I go to sleep now. This took the whole day and I am totally not a SysOp, just a simple dev. But I learned things. Thanks a lot! :)19:39
_KaszpiR_1k EUR and answer is 'this is a vm provider issue, lol'19:40
blackflowRoyK: fwiw, zfs/zpool commands no longer need sudo for read only tasks  (eg list, status, get, ...)19:40
RoyKblackflow: possibly19:40
blackflowno longer = as of 0.7.x and Bionic19:40
_KaszpiR_LeMike I suggest installing some monitoring tools (newrelic is the easiest but free tier limits view stats for 24h afair)19:41
RoyKstill, zfs list should show the dataset(s) and zpool list should show the pools etc19:41
RoyKif they don't, something is messed up19:41
_KaszpiR_zpool list without sudo will return permission denied19:41
RoyKLeMike: have you upgraded something lately?19:41
blackflow_KaszpiR_: not on Bionic19:42
RoyKzfs can be a bit touchy in that respect19:42
blackflowalso... newrelic? why on earth.... there's plenty of nice, FOSS-y and not SaaS monitoring things. Munin, Nagios, Cacti, ...19:42
_KaszpiR_I know, but that was just an idea of getting thing super fast and easy without the whole mess required to set up other tools19:43
_KaszpiR_munin would be quite fast, though19:43
blackflowyeah. Munin is great, thats what I use.19:44
RoyKnagios went out of style some years back ;)19:44
blackflowheh19:44
blackflowwhat is the monitoring tool of the year now, Zabbix?19:44
RoyKI use zabbix19:44
_KaszpiR_but most of those paid projects hav eone advantage - you get a shitload of integrated monitoring stuff from the box, without much extra configs19:44
RoyKand munin19:44
RoyKwhy not both19:44
_KaszpiR_collectd + graphite + grafana / munin (wanna migrate from munin, though)19:45
RoyKmunin is very nice to start with19:49
_KaszpiR_it works very well for say... 15 hosts19:49
_KaszpiR_but thend to be lacking if you need to aggregate stuff19:49
RoyKworks fine with a hundred machines too, but it doesn't scale too well above that19:49
RoyKmunin is old and rather simple19:50
RoyKbut very easy to setup19:50
_KaszpiR_and about zabbix, I'm not a fan of it (but used only 2.x AFAIR and nothign else, and it was pretty nightmare)19:50
RoyK_KaszpiR_: we're at 3.4 now - quite a bit has hppened since v2 ;)19:51
_KaszpiR_I remember that the logic behind zabbix was pretty 'wtf' and it required A LOT of extra customization to get any app working, which was major pain19:54
RoyKit still seems russian19:55
RoyKbut you get used to ut ;)19:55
RoyKs/ut/it/19:55
_KaszpiR_something like 'send any metric to server' was just impossible, because it had to exist on zabbix or it was dropped19:55
_KaszpiR_graphite as it's own issues but I've found it easier to manage with those19:56
RoyKbut then - what else is there to choose?19:56
RoyKgraphite only draws graphs19:56
RoyKit doesn't have alerts and triggers etc19:56
_KaszpiR_yeah19:57
_KaszpiR_still you can query those and have different alerting system19:57
_KaszpiR_grafana has alerting now19:57
_KaszpiR_from other solutiuons - prometheus is pretty decent19:58
RoyKwe chose zabbix at work - some 350 servers, 60/40 linux/windows19:59

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!