MySQL Forums

MySQL 5.7 Group Replication multi-master read-only after one server is corrupted
Posted by: chliny chen
Date: January 16, 2017 10:56AM

I followed the steps below and then hit the problem described in the title:
1 Set up a multi-master MySQL group replication with 3 servers (A, B, C), with group_replication_start_on_boot=on.
2 Keep writing data to server A (in my case, I load a file previously produced by mysqldump into server A's MySQL).
3 Physically power off server B so that the MySQL data on server B becomes corrupted, then reboot server B.
4 After server B has booted, restart MySQL.
5 After MySQL has started:

On server B:
MySQL tries to join the group because group_replication_start_on_boot is on. The query "SELECT * FROM performance_schema.replication_group_members" shows server B as "RECOVERING" at first, but after a while it shows "ERROR".
Here is the error log on server B:
2017-01-16T15:29:34.699725Z 0 [Note] Plugin group_replication reported: 'Starting group replication recovery with view_id 14845797131005440:9'
2017-01-16T15:30:45.845109Z 7 [ERROR] Error in Log_event::read_log_event(): 'Event too small', data_len: 0, event_type: 0
2017-01-16T15:30:45.845127Z 7 [ERROR] Error reading relay log event for channel 'group_replication_applier': slave SQL thread aborted because of I/O error
2017-01-16T15:30:45.845143Z 7 [ERROR] Slave SQL for channel 'group_replication_applier': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594
2017-01-16T15:30:45.845160Z 7 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'FIRST' position 315
2017-01-16T15:30:45.845175Z 7 [ERROR] Plugin group_replication reported: 'The applier thread execution was aborted. Unable to process more transactions, this member will now leave the group.'
2017-01-16T15:30:45.845195Z 4 [ERROR] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.'
2017-01-16T15:30:45.845236Z 4 [ERROR] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2017-01-16T15:30:45.845307Z 11 [ERROR] Plugin group_replication reported: 'Can't evaluate the group replication applier execution status. Group replication recovery will shutdown to avoid data corruption.'
2017-01-16T15:30:45.845324Z 11 [ERROR] Plugin group_replication reported: 'Fatal error during the Recovery process of Group Replication. The server will leave the group.'
2017-01-16T15:30:45.845339Z 11 [Warning] Plugin group_replication reported: 'Skipping leave operation: concurrent attempt to leave the group is on-going.'
2017-01-16T15:30:45.845379Z 4 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
2017-01-16T15:30:45.847937Z 0 [Note] Plugin group_replication reported: 'getstart group_id 4317e324'
2017-01-16T15:30:48.853138Z 0 [Note] Plugin group_replication reported: 'state 4330 action xa_terminate'
2017-01-16T15:30:48.853317Z 0 [Note] Plugin group_replication reported: 'new state x_start'
2017-01-16T15:30:48.853327Z 0 [Note] Plugin group_replication reported: 'state 4257 action xa_exit'
2017-01-16T15:30:48.853368Z 0 [Note] Plugin group_replication reported: 'Exiting xcom thread'
2017-01-16T15:30:48.853375Z 0 [Note] Plugin group_replication reported: 'new state x_start'
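
To make a log like the one above easier to read, here is a minimal Python sketch that splits each line into timestamp, thread id, level and message, and picks out the first ERROR entry. The function names and the regular expression are illustrative assumptions based only on the line layout visible in this log, not an official parser. In the log above, the first ERROR comes from thread 7 and reports 'Event too small' with data_len: 0 on the 'group_replication_applier' channel, which would be consistent with a relay log truncated by the power loss, although the log alone does not prove that.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log above:
#   "<UTC timestamp ending in Z> <thread id> [<Level>] <message>"
LINE_RE = re.compile(r"^(\S+Z)\s+(\d+)\s+\[(\w+)\]\s+(.*)$")


def parse_error_log(lines):
    """Return (timestamp, thread_id, level, message) tuples for lines that match."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            timestamp, thread_id, level, message = match.groups()
            entries.append((timestamp, int(thread_id), level, message))
    return entries


def first_error(entries):
    """Return the earliest entry whose level is ERROR, or None if there is none."""
    for entry in entries:
        if entry[2] == "ERROR":
            return entry
    return None


# Three lines copied from the log above, one per level.
sample = [
    "2017-01-16T15:29:34.699725Z 0 [Note] Plugin group_replication reported: "
    "'Starting group replication recovery with view_id 14845797131005440:9'",
    "2017-01-16T15:30:45.845109Z 7 [ERROR] Error in Log_event::read_log_event(): "
    "'Event too small', data_len: 0, event_type: 0",
    "2017-01-16T15:30:45.845339Z 11 [Warning] Plugin group_replication reported: "
    "'Skipping leave operation: concurrent attempt to leave the group is on-going.'",
]

entries = parse_error_log(sample)
levels = Counter(entry[2] for entry in entries)
root = first_error(entries)
print(levels)
print(root[1], root[3])
```

Running the same functions over the full error log file (for example, over the lines of an open file object) gives a per-level count and the earliest failure, which is usually the most useful line to quote when reporting the issue.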

On servers A and C:
The query "SELECT * FROM performance_schema.replication_group_members" shows server B as "RECOVERING" at first, and then server B disappears from the list after a while.
The MySQL error logs on servers A and C show no errors.
Once server B has disappeared from the group, write queries on servers A and C hang (maybe the title is wrong: MySQL didn't throw any read-only errors, it just held the write queries without ever returning).
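
As a sanity check on the membership described above, here is a small Python sketch of the majority arithmetic for a group. It assumes that every member listed in replication_group_members counts toward the group size and that only members in state UNREACHABLE are excluded from the reachable count. That is a simplification of how the group communication layer decides on a majority, and the function names are hypothetical.

```python
def majority_size(group_size):
    """Smallest number of members that forms a strict majority."""
    return group_size // 2 + 1


def has_quorum(member_states):
    """True if the reachable members form a strict majority of the listed members.

    Assumption: any state other than UNREACHABLE counts as reachable.
    """
    reachable = sum(1 for state in member_states if state != "UNREACHABLE")
    return reachable >= majority_size(len(member_states))


# View from server A while B is still recovering.
during_recovery = ["ONLINE", "RECOVERING", "ONLINE"]
# View from server A after B has disappeared from the member list.
after_b_left = ["ONLINE", "ONLINE"]
# Hypothetical case for contrast: A cannot reach B or C.
isolated = ["ONLINE", "UNREACHABLE", "UNREACHABLE"]

print(has_quorum(during_recovery))
print(has_quorum(after_b_left))
print(has_quorum(isolated))
```

Under these assumptions, A and C keep a majority both while B is recovering and after it leaves, so a simple loss of quorum does not by itself explain the hanging writes. Checking whether any member is reported as UNREACHABLE on A or C at the moment the writes hang would help narrow this down.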

However, I can't always reproduce the problem. I repeated the steps above 10 times: in 8 runs the problem reproduced and the group became read-only; in the other 2 runs, server B's status was also "ERROR", but the group could still be written to as usual.

I wonder whether "the group holding write queries after one server is corrupted" is a feature or a bug in MGR. And is there any way to work around it, so that the group can be written to again?

Thank you for your help!

