Troubleshooting InnoDB Cluster mysql-router containers as they are in an unhealthy state
Posted by: Faisal Basha
Date: June 14, 2024 10:30AM
Hi,
I have been facing an issue with my InnoDB Cluster setup of 3 nodes. Each node runs two containers, using the images mysql/mysql-server:5.7 and mysql/mysql-router:latest.
These nodes are VMs on an ESXi host that were restarted after a crash. Since restarting the containers, I have not been able to get the router container into a healthy state; it just keeps restarting.
So I exec into the vantage-mysql container and run mysqlsh to administer the database. On the master node, using the JS console, I cannot get the cluster state; here is what I always see:
MySQL 1XX.1XX.XX.XX:3306 ssl JS > var cluster = dba.getCluster('prodCluster_new');
Dba.getCluster: This function is not available through a session to an instance belonging to an unmanaged replication group (RuntimeError)
I tried to reboot the cluster from a complete outage, but here is what I got:
MySQL 1XX.1XX.XX.XX:3306 ssl JS > var cluster = dba.rebootClusterFromCompleteOutage();
Dba.rebootClusterFromCompleteOutage: This function is not available through a session to an instance belonging to an unmanaged replication group (RuntimeError)
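As far as I understand, this error means the instance is running group replication but mysqlsh does not see it as part of a managed InnoDB Cluster, i.e. it cannot find or read the cluster metadata. A check along these lines (schema and table names assumed from the standard InnoDB Cluster metadata layout, so only a sketch) should confirm whether the metadata schema is still there:

-- Check whether the InnoDB Cluster metadata schema survived the crash
-- (schema/table names assumed from the standard metadata layout):
SHOW DATABASES LIKE 'mysql_innodb_cluster_metadata';
SELECT cluster_name FROM mysql_innodb_cluster_metadata.clusters;

In the meantime, here is what the group members look like from the master in the SQL console: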
MySQL 1XX.1XX.XX.XX:3306 ssl SQL > SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | b0d2cc2d-2993-11ef-b92f-00505680c619 | 1XX.1XX.XX.XX | 3306 | ONLINE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
1 row in set (0.0004 sec)
So from the above, only one member, the master, is ONLINE.
From another member, I then configured replication towards the master and started group replication; here is what I got:
MySQL 1XX.1XX.XX.XX:3306 ssl SQL > CHANGE MASTER TO MASTER_HOST='1XX.1XX.XX.XX', MASTER_USER='repl', MASTER_PASSWORD='password', MASTER_LOG_FILE='binlog.000442', MASTER_LOG_POS=295030;
Query OK, 0 rows affected, 1 warning (0.0136 sec)
Note (code 1760): Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
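The note above suggests passing credentials to START SLAVE instead. For group replication recovery specifically, I believe the credentials are normally scoped to the recovery channel rather than the default channel, roughly like this (a sketch of the standard CHANGE MASTER ... FOR CHANNEL syntax, not what I actually ran):

-- Recovery credentials are normally bound to the group_replication_recovery
-- channel instead of the default replication channel (sketch only):
CHANGE MASTER TO
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'password'
  FOR CHANNEL 'group_replication_recovery';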
MySQL 1XX.1XX.XX.XX:3306 ssl SQL > START GROUP_REPLICATION;
Query OK, 0 rows affected, 1 warning (6.0626 sec)
Warning (code 1681): 'group_replication_allow_local_disjoint_gtids_join' is deprecated and will be removed in a future release.
MySQL 192.168.XX.XX:3306 ssl SQL > SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
| group_replication_applier | 9cc4dc4f-2993-11ef-b5a9-005056806695 | 192.168.40.47 | 3306 | RECOVERING |
| group_replication_applier | a97b2419-2993-11ef-b86e-005056809797 | 192.168.40.46 | 3306 | RECOVERING |
| group_replication_applier | b0d2cc2d-2993-11ef-b92f-00505680c619 | 192.168.40.45 | 3306 | ONLINE |
+---------------------------+--------------------------------------+---------------+-------------+--------------+
The same happens on the other two nodes: they always stay in the RECOVERING state, whereas the master is always ONLINE.
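Since the joining members never leave RECOVERING, I assume the actual error is on the group_replication_recovery channel; a check along these lines (the standard performance_schema replication tables in 5.7) should surface the last connection and applier errors:

-- Last I/O error on the recovery channel of a joining member:
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
  FROM performance_schema.replication_connection_status
 WHERE CHANNEL_NAME = 'group_replication_recovery';

-- And the applier side of the same channel:
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
  FROM performance_schema.replication_applier_status_by_worker
 WHERE CHANNEL_NAME = 'group_replication_recovery';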
When I run SHOW SLAVE STATUS on the master node, here is what I get:
Fatal error: The slave I/O thread stops because master and slave have equal MySQL server ids; these ids must be different for replication to work (or the --replicate-same-server-id option must be used on slave but this does not always make sense; please check the manual before using it).
So I checked the server_id on each node; they are all unique:
MySQL 1XX.1XX.XX.XX:3306 ssl SQL > SHOW VARIABLES LIKE 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 45 |
+---------------+-------+
1 row in set (0.0029 sec)
MySQL 1XX.1XX.XX.XX:3306 ssl SQL > SHOW VARIABLES LIKE 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 46 |
+---------------+-------+
1 row in set (0.0028 sec)
MySQL 1XX.1XX.XX.XX:3306 ssl SQL > SHOW VARIABLES LIKE 'server_id';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| server_id | 47 |
+---------------+-------+
1 row in set (0.0033 sec)
For each node, 1XX.1XX.XX.XX represents a different IP address; I did not want to reveal the actual IP addresses here.
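For completeness, these are the additional settings I plan to compare across the three nodes (server_uuid, report_host, and the group_replication_* names below are standard MySQL system variables; output omitted here):

-- server_id is unique per node, but as far as I know recovery also depends on
-- server_uuid and the host each member reports, so comparing these on every node:
SELECT @@server_id, @@server_uuid, @@report_host, @@report_port;

-- Group replication settings that should be consistent (group name, seeds)
-- or node-specific (local address) across the three members:
SHOW VARIABLES WHERE Variable_name IN
  ('group_replication_group_name',
   'group_replication_local_address',
   'group_replication_group_seeds',
   'group_replication_bootstrap_group',
   'group_replication_start_on_boot');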