MySQL Forums

Issue reconnecting group replication servers after connection issue.
Posted by: Kirk Schnable
Date: April 08, 2019 10:04PM

Hello,

I administer a three-server multi-primary MySQL Community Server (8.0.15) Group Replication cluster, and I recently ran into an issue after one server was disconnected by a network connection problem.

In the past, if a server got knocked out of the cluster and wouldn't rejoin, I would rebuild it from scratch, starting with no data, and connect it to the group. After some time, it would replicate all of the database information and become a healthy member of the cluster. I have done this a handful of times and have step-by-step documentation, which worked until recently.

Now it seems I can no longer do this: the cluster has existed for a while, and the binlogs no longer go back to the beginning of time. I get an error that the master is missing binlog data and that I need to replicate the transactions from elsewhere. So I decided to try importing a .SQL dump from a cluster member and resuming group replication afterward.
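
The "missing binlog data" condition can be stated in terms of GTID sets: a joining server can catch up from a donor's binary logs only if every transaction the donor has already purged (its gtid_purged) is already present on the joiner (its gtid_executed). The following is a minimal Python sketch of that comparison, not anything taken from MySQL itself. The function names are made up for illustration, and the short "aaaa" identifiers stand in for real server or group UUIDs. On a live server, the built-in GTID_SUBSET() SQL function performs the same kind of check.

```python
def merge(spans):
    """Merge overlapping or adjacent inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def parse_gtid_set(text):
    """Parse 'uuid:1-5:8,uuid2:3' into {uuid: [(start, end), ...]}."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        spans = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '8' is the interval 8-8.
            spans.append((int(start), int(end or start)))
    return {uuid: merge(spans) for uuid, spans in result.items()}


def is_subset(inner, outer):
    """True if every transaction in `inner` is also in `outer`."""
    for uuid, spans in inner.items():
        outer_spans = outer.get(uuid, [])
        for start, end in spans:
            if not any(lo <= start and end <= hi for lo, hi in outer_spans):
                return False
    return True


def can_recover_from_binlog(donor_purged, joiner_executed):
    """Hypothetical helper: can the joiner catch up from the donor's binlogs?

    It can only if the joiner already has everything the donor has purged.
    """
    return is_subset(parse_gtid_set(donor_purged),
                     parse_gtid_set(joiner_executed))


# Placeholder values, not taken from a real cluster.
print(can_recover_from_binlog("", ""))                      # nothing purged yet
print(can_recover_from_binlog("aaaa:1-100", ""))            # empty joiner
print(can_recover_from_binlog("aaaa:1-100", "aaaa:1-250"))  # restored from dump
```

With an empty joiner, the check fails as soon as the donor has purged anything, which matches the behaviour described above: an empty server could join while the binlogs were complete, and cannot once old binlogs are gone.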

Unfortunately this is not working. I have completed these steps:
- Cleared all MySQL data starting from an empty server.
- Imported a recent .SQL dump from another server on the replication group.
- Attempted to connect to the cluster (CHANGE MASTER with the rpl_user and group_replication_recovery channel specified).
- START GROUP_REPLICATION;

My thinking is that it should resume replication from the last transaction GTID in the .SQL dump.
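
Whether the restored server resumes from the dump's position depends on the dump carrying its GTID state. mysqldump can record this as a SET @@GLOBAL.GTID_PURGED statement (controlled by its --set-gtid-purged option). The sketch below, in Python, pulls that value out of dump text so it can be compared with the donor's gtid_purged. The helper name and the sample dump lines are hypothetical, and the statement format is assumed from typical mysqldump output rather than taken from this thread.

```python
import re

# Matches: SET @@GLOBAL.GTID_PURGED='...';
# and also the variant with a version comment before the quoted value.
GTID_PURGED_RE = re.compile(
    r"SET\s+@@GLOBAL\.GTID_PURGED\s*=\s*(?:/\*.*?\*/\s*)?'([^']*)'",
    re.IGNORECASE | re.DOTALL,
)


def gtid_purged_from_dump(dump_text):
    """Return the GTID set recorded in a dump, or None if it has none."""
    match = GTID_PURGED_RE.search(dump_text)
    if match is None:
        return None
    # The value may be wrapped across lines; drop all whitespace.
    return re.sub(r"\s+", "", match.group(1))


# Hypothetical dump fragments with placeholder GTIDs.
with_gtids = (
    "SET @@SESSION.SQL_LOG_BIN= 0;\n"
    "SET @@GLOBAL.GTID_PURGED=/*!80000 '+'*/ 'aaaa:1-100,\n"
    "bbbb:1-7';\n"
)
without_gtids = "CREATE TABLE t (id INT PRIMARY KEY);\n"

print(gtid_purged_from_dump(with_gtids))
print(gtid_purged_from_dump(without_gtids))
```

If the helper returns None, the dump carries no GTID state, and the restored server would not know where to resume from.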

Now I am getting the following error connecting to the cluster:
2019-04-09T02:51:32.775763Z 40 [ERROR] [MY-013328] [Repl] Plugin group_replication reported: 'The certification information could not be set in this server: 'Certification information is too large for transmission.''
2019-04-09T02:51:32.775789Z 40 [ERROR] [MY-011624] [Repl] Plugin group_replication reported: 'Error when processing certification information in the recovery process'
2019-04-09T02:51:32.775798Z 40 [ERROR] [MY-011620] [Repl] Plugin group_replication reported: 'Fatal error during the recovery process of Group Replication. The server will leave the group.'


I have been unable to find any detailed documentation on what this error means. I feel like I am the only person on the entire Internet to encounter it, at least looking at Google results.

My educated guess was that it's related to max_allowed_packet, but I set both max_allowed_packet and slave_max_allowed_packet to 1 GB and the same error still occurs.

How can I get group replication going again now from this point?

There is hardly anything in the logs on the working server; it only says that it is removing this server from the group, with no indication of why.


Also, if my process is bad, please advise how I should be doing this.

Frankly, most of the documentation out there on Group Replication seems to be very entry-level "how to start your cluster" type instructions, and I question how many people are even using this in production environments at this point...

Let's say I buy another server and want to expand my cluster: what is the proper way to join a new server to the cluster now that joining an empty server with no data no longer works? Is there a guide on this process that I may be missing?


Thank you!
Kirk
