MySQL Forums
Forum List  »  Replication

Re: InnoDB Cluster Distributed Recovery seemingly stuck forever
Posted by: Raphaël G
Date: May 05, 2021 01:54PM

Here is the extended output of mysqlsh Cluster.status() of one the secondary nodes stuck in recovery:

"10.0.20.22:3306": {
"address": "10.0.20.22:3306",
"applierWorkerThreads": 1,
"fenceSysVars": [
"read_only",
"super_read_only"
],
"memberId": "0afbe53e-229d-11eb-b975-fa163ed33e44",
"memberRole": "SECONDARY",
"memberState": "RECOVERING",
"mode": "R/O",
"readReplicas": {},
"recovery": {
"recoveryChannel": {
"applierStatus": "APPLIED_ALL",
"applierThreadState": "Waiting for an event from Coordinator",
"receiverStatus": "ON",
"receiverThreadState": "Waiting for master to send event",
"source": "10.0.20.20:3306"
},
"state": "ON"
},
"recoveryStatusText": "Distributed recovery in progress",
"role": "HA",
"status": "RECOVERING",
"version": "8.0.23"
}



It reports having no transaction to apply yet remains in recovery. The applier & recovery channels are doing nothing :

mysql> SELECT * FROM performance_schema.replication_applier_status_by_worker\G
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
WORKER_ID: 1
THREAD_ID: 25851
SERVICE_STATE: ON
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION:
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_recovery
WORKER_ID: 1
THREAD_ID: 25894
SERVICE_STATE: ON
LAST_ERROR_NUMBER: 0
LAST_ERROR_MESSAGE:
LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION:
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION:
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 0000-00-00 00:00:00.000000
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
2 rows in set (0.00 sec)



PROCESSLIST reports threads waiting for master :

mysql> show processlist;
+-------+-----------------+-------------------+------+---------+---------+--------------------------------------------------------+----------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-------+-----------------+-------------------+------+---------+---------+--------------------------------------------------------+----------------------------------+
| 6 | event_scheduler | localhost | NULL | Daemon | 3964 | Waiting on empty queue | NULL |
| 25801 | system user | | NULL | Connect | 698 | waiting for handler commit | Group replication applier module |
| 25805 | system user | | NULL | Query | 697 | Slave has read all relay log; waiting for more updates | NULL |
| 25806 | system user | | NULL | Connect | 2440393 | Waiting for an event from Coordinator | NULL |
| 25845 | system user | connecting host | NULL | Connect | 694 | Waiting for master to send event | NULL |
| 25846 | system user | | NULL | Query | 692 | Slave has read all relay log; waiting for more updates | NULL |
| 25847 | system user | | NULL | Connect | 694 | Waiting for an event from Coordinator | NULL |
+-------+-----------------+-------------------+------+---------+---------+--------------------------------------------------------+----------------------------------+



The secondary node's log reports being properly connected to the master for recovery but then nothing more :

2021-05-05T19:48:27.299501Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.23' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.
2021-05-05T19:48:27.308696Z 14 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2021-05-05T19:48:29.589018Z 2 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address 10.0.20.20:3306.'
2021-05-05T19:48:30.589442Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2021-05-05T19:48:30.589824Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 10.0.20.22:3306, 10.0.20.20:3306 on view 16178027833824519:46.'
2021-05-05T19:48:30.620914Z 51 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='10.0.20.20', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='10.0.20.20', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2021-05-05T19:48:30.684776Z 52 [Warning] [MY-010897] [Repl] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2021-05-05T19:48:30.688387Z 52 [System] [MY-010562] [Repl] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_3@10.0.20.20:3306',replication started in log 'FIRST' at position 4

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: InnoDB Cluster Distributed Recovery seemingly stuck forever
436
May 05, 2021 01:54PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.