/srv/irclogs.ubuntu.com/2018/09/08/#ubuntu-server.txt

=== med_ is now known as medberry
LeMike	Hello there. I want to debug what is taking so long in mysqld. One server is 120 times slower than 2 other server (1 weaker, 1 stronger). So I strace mysql and see some pread but lots of io_getevents and io_submit. Is this normal? How do I find the device or process that causes the problem?	08:33
sarnold	strace is the wrong tool for the job	08:34
sarnold	this is a decent starting point http://www.brendangregg.com/linuxperf.html	08:35
LeMike	Thanks. This is the best help so far :D dtrace or where could I start? I just found out that iotop has a "Netlink error" but the memtester passed.	08:35
LeMike	sarnold: thanks I read this	08:35
sarnold	LeMike: linux's perf tool / or the iovisor collection of tools is your best bet, no dtrace on linux	08:36
sarnold	alright, time to bail, have fun :)	08:39
LeMike	sarnold: thanks. I already like perf a lot :D read a report and liked to investigate even more but somehow the problem is gone. I have a good guess that the hoster finally found a thing on his side and shamefully fixed it silently. I hunt this shit since yesterday.	08:52
RoyK	LeMike: try slow query log	11:29
RoyK	LeMike: and make sure you install sysstat first so you can monitor the system while running the query	11:29
RoyK	LeMike: to see what's slowing things down - usually cpu or i/o	11:29
LeMike	RoyK: Slow query log just shows the query and how long it took. With that I came to the conclusion to look into mySQL and IO. Thanks	14:33
LeMike	I continue some other day because after a limit of 4 hours I have to stop investigating (as the customer does not want to pay more xD )	14:35
_KaszpiR_	LeMike what's the specs of the servers, and any noticeable mysql setting set?	14:41
_KaszpiR_	if they fixed it silently then I suspect that if this is on virtual platform then memory baloon driver/high cpu steal/horrid disk io perf (cause array rebuilds etc)	14:43
RoyK	LeMike: you should see where the load is high easily with sar	14:46
LeMike	RoyK: I dont get it. This seems more like recording a timespan than a single process. And what if some buffer is full or some process is just waiting for stupid reasons?	17:53
RoyK	it's recording a timespan, yes	17:59
RoyK	but then again, using things like top or htop can tell you where the problem lies	17:59
RoyK	iotop perhaps too	18:00
RoyK	if you have a truckload of different services on the same server, all competing, you should be virtualising more	18:01
RoyK	if the machine is dedicated to mysql, it should be trivial to find the bottleneck	18:02
blackflow	if I may interject for a moment.... how would virt help here. if anything it'd just slow things down with extra overhead.	18:02
blackflow	LeMike: you can check for two things, iowait and disk latency	18:04
RoyK	blackflow: it'll slow things down a few percent, but it'll definetely make it easier to monitor	18:05
RoyK	blackflow: throw a ton of services on the same machine and "something" is slow, well, then everything's slow	18:05
blackflow	pretty sure it'd be the same thing under a VM. the host kernel has to juggle processes and serialize IO in both cases	18:07
blackflow	but I agree about monitoring, somewhat. should be possible with cgroups to some extent too.	18:07
RoyK	if vitualised, monitoring on the guest will show you the bottleneck	18:08
LeMike	blackflow: I let it run with "iostat -d 1" and got somewhat like 2348 kB write every few seconds. iowait is 0,85% but I guess I need to reboot. dunno how to check latency	18:13
blackflow	LeMike: https://www.kernel.org/doc/Documentation/block/stat.txt	18:13
blackflow	read ticks / read I/Os = average wait (latency) for read	18:14
blackflow	same for write.	18:14
LeMike	uh. can I truncate this file while the server runs? :) this would be nice	18:16
RoyK	LeMike: how much memory? what is the iops count? what sort of storage does it have?	18:16
RoyK	what about cpu load?	18:16
RoyK	swap usage?	18:16
RoyK	sar will give you paging statistics	18:17
RoyK	have you tuned mysql/mariadb in any way?	18:17
RoyK	how large is the database's dataset?	18:17
blackflow	what about the db itself... effective indexes? does it access tuples sequentially or indexed?	18:17
RoyK	lots and lots of questions ;)	18:17
blackflow	indeedy.	18:17
RoyK	mysql allows you to run "explain <query>" to see if indices are used or not	18:18
RoyK	quite useful thing	18:19
blackflow	LeMike: I'd recommend you activate some graph baesed monitoring on that server. A picture is worth a thousand eyes, and problems are easier to spot. I use Munin for that, infact, that's how I know about that latency measure. It's from Munin's diskstats plugin.	18:19
blackflow	*a thousand words (lol)	18:19
blackflow	I'm listening to TechSnap podcast and they're talking about "many eyes in open source security", so the fingers typed that :)	18:20
LeMike	4G ram (more than 3G free also in swap free), cpu idles, iotop does not work :/ the table is 276M	18:21
LeMike	not touched mysql config (once for more cache etc but reset it already)	18:21
blackflow	276M is a lot, esp if the DB has to sift through the rows sequentially because it's badly indexed. a single select can take seconds....	18:22
blackflow	(I'm assuming you meant bytes, because 270M rows would be gigabytes most likely...)	18:22
RoyK	and then - mysql doesn't scale too well	18:22
* RoyK only uses mysql at gunpoint		18:23
blackflow	yeah I don't know much about MySQL. PostgreSQL is my poison and I monitor all that with Munin - tuple counts, sequential vs indexed access, average query time, size, stuff...	18:23
blackflow	DB monitoring and statistics is a life saver.	18:24
RoyK	I'd choose postgres over mysql any day	18:24
blackflow	aye!	18:24
LeMike	(megabytes) Same DB and query on my laptop takes 1-2 seconds. server needs more than 60 seconds	18:24
RoyK	but then - some systems like wordpress don't work too well on postgres because those wp guys haven't understood a damn thing about abstraction	18:25
RoyK	LeMike: what sort of storage?	18:25
LeMike	I look up some monitoring and gather statistics. To bad I got perf.data here and was hoping to just read through and see the big time waster	18:25
blackflow	if MySQL is anything liek postgres, EXPLAIN ANALYZE <query> is a treasure trove of data bout the performance of that query. number of iterations, indexes used, loops, hashes, everything.	18:26
LeMike	RoyK: You mean InnoDB? Or filesystem?	18:26
RoyK	disk	18:26
blackflow	RoyK: I was so pleasantly suprised when I installed GitLab and saw it developed specifically for Postgres (and recommending Postgres over the dolphin). So rare to see something written for Postgres. Ohh ohh, and roundcube.	18:27
LeMike	RoyK: zfs	18:27
RoyK	LeMike: what sort of disks?	18:27
blackflow	oh wait, that changes things a bit. did you tune it for mysql?	18:27
blackflow	LeMike: iirc MySQL recordsize is recommended to be 16k. Postgres 8k. Now, depending on your use case and workload, that might mean a lot, or nothing at all (eg. my use case allows larger record size -- with postgres -- which benefits compressratio)	18:28
LeMike	RoyK: HDD. if that isn the answer then help pls	18:29
RoyK	LeMike: what sort of hdd and how many of them?	18:30
RoyK	LeMike: there's no straight answer here, but as blackflow says, the zfs blocksize for that dataset may be important	18:30
RoyK	LeMike: http://open-zfs.org/wiki/Performance_tuning#MySQL	18:32
LeMike	cant access shit with lshw. just one HDD. Got 50GB out of 3TB for this machine it seems.	18:34
LeMike	oh. 2x TOSHIBA DT01ACA3 in /proc/scsi/scsi. A raid? I dont see the block size. fdisk does not work and I dunno how	18:45
_KaszpiR_	zfs on one disk?	18:52
_KaszpiR_	show us 'lsblk' output	18:53
RoyK	LeMike: zfs get	18:54
tomreyn	OS + zfs + mysql on 4GB RAM just sounds wrong.	18:58
tomreyn	unless its a very light-weight DB workload you'll want more RAM.	18:59
tomreyn	disks dont matter for a database server unless you designed it wrong, provided too little RAM / didn't tune it for the RAM it has available.	19:00
RoyK	zfs isn't that heavy on memory - that's a myth - that is - unless you turn on dedup	19:02
RoyK	but then, that's praying to all known gods for trouble	19:02
LeMike	_KaszpiR_: got it here https://pastebin.com/p3BsHefU	19:04
RoyK	LeMike: pastebin output of "zpool status", "zfs list" and "zfs get all", please	19:04
LeMike	RoyK: no pool, no dataset, zfs get all has no output (I am in PVE it seems, I am not the hoster or have access to the host)	19:09
RoyK	then you're not running zfs	19:12
LeMike	df -Th said so	19:15
RoyK	pastebin that	19:17
RoyK	keep in mind you need "sudo" in front of those zfs/zpool commands	19:17
LeMike	I am groot!	19:18
LeMike	https://pastebin.com/tqEXHQs7	19:18
RoyK	that doesn't make sense	19:18
LeMike	Either way this feels like a direction that is not really giving me an answer what specific device or thingy is taking so long . Like "the buffer is too low" or "sync takes too long" or "Raid is broken" etc.	19:20
tomreyn	it's probably a container and ZFS is managed outside of it	19:24
LeMike	Okay. This is reaching 1k € almost. I give up, tell the customer and let him kick the hoster in his nuts. I think that my app has nothin to do with being so slow :P	19:24
tomreyn	did you run mysqltuner?	19:25
tomreyn	doing so (or knowing what needs to be done OTOH) is the minimum i'd expect from a contractor in charge of managing a DB	19:26
LeMike	thanks, those tests are all okay. and I go to sleep now. This took the whole day and I am totally not a SysOp, just a simple dev. But I learned things. Thanks a lot! :)	19:39
_KaszpiR_	1k EUR and answer is 'this is a vm provider issue, lol'	19:40
blackflow	RoyK: fwiw, zfs/zpool commands no longer need sudo for read only tasks (eg list, status, get, ...)	19:40
RoyK	blackflow: possibly	19:40
blackflow	no longer = as of 0.7.x and Bionic	19:40
_KaszpiR_	LeMike I suggest installing some monitoring tools (newrelic is the easiest but free tier limits view stats for 24h afair)	19:41
RoyK	still, zfs list should show the dataset(s) and zpool list should show the pools etc	19:41
RoyK	if they don't, something is messed up	19:41
_KaszpiR_	zpool list without sudo will return permission denied	19:41
RoyK	LeMike: have you upgraded something lately?	19:41
blackflow	_KaszpiR_: not on Bionic	19:42
RoyK	zfs can be a bit touchy in that respect	19:42
blackflow	also... newrelic? why on earth.... there's plenty of nice, FOSS-y and not SaaS monitoring things. Munin, Nagios, Cacti, ...	19:42
_KaszpiR_	I know, but that was just an idea of getting thing super fast and easy without the whole mess required to set up other tools	19:43
_KaszpiR_	munin would be quite fast, though	19:43
blackflow	yeah. Munin is great, thats what I use.	19:44
RoyK	nagios went out of style some years back ;)	19:44
blackflow	heh	19:44
blackflow	what is the monitoring tool of the year now, Zabbix?	19:44
RoyK	I use zabbix	19:44
_KaszpiR_	but most of those paid projects hav eone advantage - you get a shitload of integrated monitoring stuff from the box, without much extra configs	19:44
RoyK	and munin	19:44
RoyK	why not both	19:44
_KaszpiR_	collectd + graphite + grafana / munin (wanna migrate from munin, though)	19:45
RoyK	munin is very nice to start with	19:49
_KaszpiR_	it works very well for say... 15 hosts	19:49
_KaszpiR_	but thend to be lacking if you need to aggregate stuff	19:49
RoyK	works fine with a hundred machines too, but it doesn't scale too well above that	19:49
RoyK	munin is old and rather simple	19:50
RoyK	but very easy to setup	19:50
_KaszpiR_	and about zabbix, I'm not a fan of it (but used only 2.x AFAIR and nothign else, and it was pretty nightmare)	19:50
RoyK	_KaszpiR_: we're at 3.4 now - quite a bit has hppened since v2 ;)	19:51
_KaszpiR_	I remember that the logic behind zabbix was pretty 'wtf' and it required A LOT of extra customization to get any app working, which was major pain	19:54
RoyK	it still seems russian	19:55
RoyK	but you get used to ut ;)	19:55
RoyK	s/ut/it/	19:55
_KaszpiR_	something like 'send any metric to server' was just impossible, because it had to exist on zabbix or it was dropped	19:55
_KaszpiR_	graphite as it's own issues but I've found it easier to manage with those	19:56
RoyK	but then - what else is there to choose?	19:56
RoyK	graphite only draws graphs	19:56
RoyK	it doesn't have alerts and triggers etc	19:56
_KaszpiR_	yeah	19:57
_KaszpiR_	still you can query those and have different alerting system	19:57
_KaszpiR_	grafana has alerting now	19:57
_KaszpiR_	from other solutiuons - prometheus is pretty decent	19:58
RoyK	we chose zabbix at work - some 350 servers, 60/40 linux/windows	19:59

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!