MySQL Forums
Forum List  »  Replication

Slow slaves (even when replicating nothing)
Posted by: Alexander Javoronkov
Date: March 10, 2017 07:46AM

Hi,

Right now I've got a star setup with one "central" server and 39 "local" servers (one for each of my company's branches). The "central" server is the only one used in production environment ("local" servers are supposed to be idle for now), the future plan is to move applications to "local" servers (after making sure that replication architecture is stable).

All servers are virtual, dedicated to MySQL (provided with ample resources - 4cpu/16gb for "central" and 2cpu/8gb for "local"s). All versions are 5.7.17, GTID/enforce_consistency, RowBasedReplication, log-bin, log-slave-updates; I don't need log-slave-updates on the "central" server, but this bug bites me:
https://bugs.mysql.com/bug.php?id=84973

There are 39 channels on the "central" (one for each "local" server) and 1 channel on each "local" server pointing to the "central" server.

Almost all of the production databases are ignored on the slave servers (using the "replicate-ignore-db"), the slaves ("local" servers) are essentially receiving binlogs from "central" server, but almost no transactions are executed on the "local" servers (we've got literally 1-2 insert/update/delete "central" transactions per day that have to be replicated to "local" servers).

The binlogs on the "central" are growing quite fast (approx. 100MB per hour), since we've got a lot of insert/update/delete activity during the day; again, they all should be ignored on the "local" servers for now.

* THE PROBLEM *

All "local" servers are having a hard time trying to catch up with the "central". As far as I understand, the problem has nothing to do with the IO_THREAD, I see the SQL_THREAD (or some other internal MySQL component) as the main culprit.

SHOW SLAVE STATUS for the "local" shows that relay log is processed in rather slow fashion: approximately 15 kb/s for Exec_Master_Log_Pos and 1 transaction in several seconds (Executed_Gtid_Set, string for "central").

CPU load is 1%, iowait is 0%, idle is 99%.

From my point of view (that's my very first attempt at replication), all "local" servers should run through these binlogs really fast (since all transactions belong to ignored databases); instead, MySQL employs a very tedious routine like "read transaction from the relay log, open its own binlog, decide to ignore transaction's contents, update gtid, write to binlog and so on. Is there some dangerous option like "update GTID/write to binlog once for every N transactions"?

* THE QUESTION *

Where should I start to look to make my "local" servers catch up with the "central"?

* THE CLUES *

Own binlog for "local" shows nothing but that (in cycle): SET @@SESSION.GTID_NEXT (gtid of the "central" server), SET TIMESTAMP, BEGIN, SET TIMESTAMP, COMMIT. I guess this "dummy sequence" is normal (since almost all production databases are ignored by the replicate-ignore-db directives).

PhpMyAdmin thinks that the "handler_read_rnd_next" counter looks suspiciously big and it's growing (approx. 800 per second).

I guess the only table that is constantly updating is "mysql.slave_relay_log_info".

* CONFIGS *

All configs are almost the same for "central" and "local" servers.

Common configs:

gtid-mode=ON
enforce-gtid-consistency=ON
relay-log-info-repository=TABLE
master-info-repository=TABLE
log-bin
log-slave-updates
max_binlog_size = 100M
expire_logs_days = 10
max_heap_table_size = 2048M
binlog-ignore-db=mysql
replicate-ignore-db=mysql
relay_log_space_limit=150000000
binlog-cache-size=32M
binlog-stmt-cache-size=32M

Having browsed through the documentation, I've implemented a bunch of evil options that are targeted at maximizing i/o performance (and risk of losing data in case of power failure, too) - that really helped to nullify the iowait state on the "local" servers:

sync-binlog=0
sync-relay-log=0
sync-relay-log-info=0
binlog-group-commit-sync-delay=60000
binlog-group-commit-sync-no-delay-count=100000
innodb_flush_log_at_trx_commit=0
innodb_flush_log_at_timeout=2700
innodb_flush_neighbors=0
innodb_flush_sync=0
innodb_purge_threads=1

"central" specifics:

key_buffer_size = 1024M
sort_buffer_size = 128M
read_buffer_size = 64M
join_buffer_size = 256M
table_open_cache = 2048
tmp_table_size=128M

Options: ReplyQuote


Subject
Views
Written By
Posted
Slow slaves (even when replicating nothing)
1526
March 10, 2017 07:46AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.