Multi-Primary Group Replication hangs on query too large
Hi,
i've set up 3 nodes, mysql 5.7.20 to act as multi-primary group replication. The nodes are geographically relocated in different places, they run under docker and communicate over vpn. No SSL involved.
I'm trying to dump a database from a single instance and recreate it to the group replication, via mysqldump. I use --compress --no-create-info --set-gtid-purged=OFF --add-locks --max-allowed-packet=2147483648 --quick --skip-opt --insert-ignore --skip-extended-insert -R , therefore producing a series of inserts, usually one row per query. At a certain point the insertion stops and the unique query running is stuck in "query end" state.
this is the table scheme interested:
CREATE TABLE `params` (
`id` bigint(20) unsigned NOT NULL,
`label` varchar(45) DEFAULT 'no_name',
`paramtype_id` int(10) unsigned NOT NULL,
`data` mediumblob,
`context_name` varchar(50) DEFAULT NULL,
`context_value` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `param_type_idx` (`paramtype_id`),
KEY `param_context_idx` (`context_name`,`context_value`)
) ENGINE=InnoDB DEFAULT CHARSET=latin
this is the configuration of the nodes:
[mysqld]
pid-file = /tmp/mysqld-2.pid
socket = /tmp/mysqld-2.sock
datadir = /tmp/mysqldata
log-error = /tmp/error-2.log
log-error-verbosity = 3
bind-address = 0.0.0.0
symbolic-links=0
port = 3307
report_host = 'mysql-2'
sql_mode = ''
gtid_mode = ON
enforce_gtid_consistency = ON
master_info_repository = TABLE
relay_log_info_repository = TABLE
binlog_checksum = NONE
log_slave_updates = ON
log_bin = binlog
binlog_format = ROW
transaction_write_set_extraction = XXHASH64
loose-group_replication_bootstrap_group = OFF
loose-group_replication_start_on_boot = OFF
loose-group_replication_ssl_mode = DISABLED
loose-group_replication_recovery_use_ssl = 0
loose-group_replication_group_name = "5AA8FEEE-3539-4308-9546-6D54F1949F46"
loose-group_replication_ip_whitelist = "10.0.0.0/16,0.0.0.0/16"
loose-group_replication_group_seeds = "10.0.1.101:33010,10.0.0.101:33010,10.0.3.104:33010"
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON
server_id = 2
loose-group_replication_local_address = "mysql-2:33010"
innodb_flush_log_at_trx_commit = 2
innodb_log_file_size="512MB"
innodb_flush_method = "O_DIRECT"
innodb_flush_log_at_timeout=1800
binlog_row_image=minimal
slave_parallel_workers=10
innodb_buffer_pool_size = 12884901888
slave_parallel_type = 'LOGICAL_CLOCK'
slave_preserve_commit_order=1
transaction_isolation='READ-COMMITTED'
max_allowed_packet = 4294967296
i isolated the problematic query, it is trying to insert a row of about 7.3Mb size. Trying to execute the query alone, after ~100 seconds the query is in "query end" state, the log repeat indefinitely the two messages:
2018-01-03T17:16:40.979825Z 0 [Warning] Plugin group_replication reported: 'The member with address mysql-3:33006 has already sent the stable set. Therefore discarding the second message.'
2018-01-03T17:16:52.913010Z 0 [Warning] Plugin group_replication reported: 'The member with address mysql-1:33006 has already sent the stable set. Therefore discarding the second message.'
also, the network is busy sending the same query over and over again. I let the query run for 8 hours, and it just didn't stop, always repeating itself to the network. The only way to halt the process is stopping the group_replication plugin.
I'm pretty sure i miss some parameter and i have to tune it better. during my tries, i discovered that increasing innodb_buffer_pool_size had some sort of effect for smaller queries. I increased it a lot but then i'm stuck again. Any suggestion?