MySQL Operator not self-healing innodb cluster
Posted by: Keegan Bantom
Date: April 09, 2024 02:47AM
Hello, we've been experimenting with the community edition of the MySQL Operator and deployed an InnoDB Cluster. Everything ran smoothly at first, but recently two MySQL pods failed to start, and MySQL Router could no longer connect to our MySQL instance. As a result, our applications lost access to the database.
Further investigation showed that data had not been fully replicated across the pods.
Has anyone encountered a similar issue? Shouldn't the operator handle such operations/failures?
Additionally, any suggestions on resolving this problem would be helpful. Thank you in advance.
Snippet from our mysql-operator logs:
[2024-04-09 08:31:19,327] kopf.objects [INFO ] mysql: all={<MySQLPod mysql-1>, <MySQLPod mysql-2>, <MySQLPod mysql-0>} members={<MySQLPod mysql-0>, <MySQLPod mysql-2>, <MySQLPod mysql-1>} online=set() offline={<MySQLPod mysql-0>, <MySQLPod mysql-2>, <MySQLPod mysql-1>} unsure=set()
[2024-04-09 08:31:19,730] kopf.objects [INFO ] cluster probe: status=ClusterDiagStatus.OFFLINE online=[]
[2024-04-09 08:31:19,932] kopf.objects [INFO ] Cluster cannot be restored because there are unreachable pods: retrying after 5 seconds
[2024-04-09 08:31:21,994] kopf.objects [WARNING ] Patching failed with inconsistencies: (('remove', ('status', 'kopf'), {'dummy': '2024-04-09T08:31:21.670928'}, None),)
[2024-04-09 08:31:22,113] kopf.objects [INFO ] Handler 'on_pod_event' succeeded.
[2024-04-09 08:31:23,402] kopf.objects [WARNING ] Patching failed with inconsistencies: (('remove', ('status', 'kopf'), {'dummy': '2024-04-09T08:31:23.217513'}, None),)
[2024-04-09 08:31:23,523] kopf.objects [INFO ] Handler 'on_pod_event' succeeded.
[2024-04-09 08:31:23,642] kopf.objects [ERROR ] Handler 'on_pod_delete' failed temporarily: mysql busy. lock_owner=mysql-0 owner_context=n/a lock_created_at=2024-04-09T08:31:22.539191
[2024-04-09 08:31:23,805] kopf.objects [WARNING ] Patching failed with inconsistencies: (('remove', ('status', 'kopf'), {'progress': {'on_pod_delete': {'started': '2024-02-18T09:29:44.888978', 'stopped': None, 'delayed': '2024-04-09T08:31:33.643064', 'purpose': 'delete', 'retries': 276494, 'success': False, 'failure': False, 'message': 'mysql busy. lock_owner=mysql-0 owner_context=n/a lock_created_at=2024-04-09T08:31:22.539191', 'subrefs': None}}}, None),)
[2024-04-09 08:31:23,923] kopf.objects [INFO ] Handler 'on_pod_event' succeeded.
[2024-04-09 08:31:25,026] kopf.objects [INFO ] mysql busy. lock_owner=mysql-0 owner_context=n/a lock_created_at=2024-04-09T08:31:22.539191: retrying after 10 seconds
[2024-04-09 08:31:25,877] kopf.objects [INFO ] Could not connect to mysql-2.mysql-instances.infra.svc.cluster.local:3306: error=MySQL Error (2003): mysqlsh.connect_dba: Can't connect to MySQL server on 'mysql-2.mysql-instances.infra.svc.cluster.local:3306' (113)
[2024-04-09 08:31:26,033] kopf.objects [INFO ] mysql-2.mysql-instances.infra.svc.cluster.local:3306: pod.phase=Running deleting=True
[2024-04-09 08:31:26,035] kopf.objects [INFO ] diag instance mysql-2 --> InstanceDiagStatus.OFFLINE quorum=None gtid_executed=None
[2024-04-09 08:31:29,376] kopf.objects [INFO ] Could not connect to mysql-0.mysql-instances.infra.svc.cluster.local:3306: error=MySQL Error (2003): mysqlsh.connect_dba: Can't connect to MySQL server on 'mysql-0.mysql-instances.infra.svc.cluster.local:3306' (113)
[2024-04-09 08:31:29,619] kopf.objects [INFO ] mysql-0.mysql-instances.infra.svc.cluster.local:3306: pod.phase=Running deleting=True
[2024-04-09 08:31:29,621] kopf.objects [INFO ] diag instance mysql-0 --> InstanceDiagStatus.OFFLINE quorum=None gtid_executed=None
[2024-04-09 08:31:29,982] kopf.objects [INFO ] get_cluster() error for mysql-1.mysql-instances.infra.svc.cluster.local:3306: error=Shell Error (51314): Dba.get_cluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active)
[2024-04-09 08:31:29,986] kopf.objects [INFO ] diag instance mysql-1 --> InstanceDiagStatus.OFFLINE quorum=None gtid_executed=09eb7e8d-821a-11ee-88eb-5a412442a166:1-21,
0a387b50-821a-11ee-89ad-3a757c649278:1-10,
476b478a-821a-11ee-9b2b-5a412442a166:1-1368775:2340682-2340727,
476bbbd0-821a-11ee-9b2b-5a412442a166:1-44,
9bf7ae20-af14-11ee-9d0f-9e5a8ec4b826:1-537496:1000478-1001192,
9bf7ba05-af14-11ee-9d0f-9e5a8ec4b826:1-57
[2024-04-09 08:31:29,988] kopf.objects [INFO ] mysql: all={<MySQLPod mysql-2>, <MySQLPod mysql-0>, <MySQLPod mysql-1>} members={<MySQLPod mysql-0>, <MySQLPod mysql-2>, <MySQLPod mysql-1>} online=set() offline={<MySQLPod mysql-0>, <MySQLPod mysql-2>, <MySQLPod mysql-1>} unsure=set()
[2024-04-09 08:31:30,275] kopf.objects [INFO ] cluster probe: status=ClusterDiagStatus.OFFLINE online=[]
[2024-04-09 08:31:30,277] kopf.objects [INFO ] ATTEMPTING CLUSTER REPAIR
[2024-04-09 08:31:30,421] kopf.objects [ERROR ] Handler 'on_pod_delete' failed temporarily: Cluster cannot be restored because there are unreachable pods
[2024-04-09 08:31:30,688] kopf.objects [WARNING ] Patching failed with inconsistencies: (('remove', ('status', 'kopf'), {'progress': {'on_pod_delete': {'started': '2024-03-08T09:21:31.041514', 'stopped': None, 'delayed': '2024-04-09T08:31:35.421718', 'purpose': 'delete', 'retries': 255087, 'success': False, 'failure': False, 'message': 'Cluster cannot be restored because there are unreachable pods', 'subrefs': None}}}, None),)
[2024-04-09 08:31:30,809] kopf.objects [INFO ] Handler 'on_pod_event' succeeded.
[2024-04-09 08:31:34,119] kopf.objects [WARNING ] Patching failed with inconsistencies: (('remove', ('status', 'kopf'), {'dummy': '2024-04-09T08:31:33.644117'}, None),)
[2024-04-09 08:31:34,237] kopf.objects [INFO ] Handler 'on_pod_event' succeeded.
[2024-04-09 08:31:35,431] kopf.objects [INFO ] mysql busy. lock_owner=mysql-2 owner_context=n/a lock_created_at=2024-04-09T08:31:34.541457: retrying after 10 seconds
[2024-04-09 08:31:35,458] kopf.objects [INFO ] get_cluster() error for mysql-1.mysql-instances.infra.svc.cluster.local:3306: error=Shell Error (51314): Dba.get_cluster: This function is not available through a session to a standalone instance (metadata exists, instance belongs to that metadata, but GR is not active)
[2024-04-09 08:31:35,463] kopf.objects [INFO ] diag instance mysql-1 --> InstanceDiagStatus.OFFLINE quorum=None gtid_executed=09eb7e8d-821a-11ee-88eb-5a412442a166:1-21,
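Since the question mentions incomplete replication, one way to check it is to compare the gtid_executed values that the operator prints in its "diag instance" lines. The following is only a minimal sketch in Python, not part of the operator: the helper names (parse_gtid_set, count_transactions, covers) are hypothetical, and it assumes plain untagged GTID sets of the form uuid:interval[:interval...], as seen in the log above. In this excerpt only mysql-1 reports a value (mysql-0 and mysql-2 show gtid_executed=None), so a real comparison would need the other instances to be reachable first.

```python
import re


def parse_gtid_set(text):
    """Parse a gtid_executed string into {source_uuid: [(start, end), ...]}.

    Assumes untagged GTIDs (uuid:1-21 or uuid:1-10:20-25); entries are
    comma-separated and may be wrapped across lines, as in the log.
    """
    result = {}
    for entry in re.split(r",\s*", text.strip()):
        if not entry:
            continue  # tolerate a trailing comma from a truncated log line
        uuid, *intervals = entry.strip().split(":")
        ranges = result.setdefault(uuid, [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return result


def count_transactions(gtid_set):
    """Total number of transactions described by a parsed GTID set."""
    return sum(end - start + 1
               for ranges in gtid_set.values()
               for start, end in ranges)


def covers(big, small):
    """Return True if every transaction in `small` also appears in `big`."""
    for uuid, ranges in small.items():
        have = sorted(big.get(uuid, []))
        for start, end in ranges:
            pos = start  # first transaction number not yet known to be covered
            for h_start, h_end in have:
                if h_start <= pos <= h_end:
                    pos = h_end + 1
                if pos > end:
                    break
            if pos <= end:
                return False
    return True


if __name__ == "__main__":
    # gtid_executed reported for mysql-1 in the log excerpt above.
    mysql_1 = parse_gtid_set("""
        09eb7e8d-821a-11ee-88eb-5a412442a166:1-21,
        0a387b50-821a-11ee-89ad-3a757c649278:1-10,
        476b478a-821a-11ee-9b2b-5a412442a166:1-1368775:2340682-2340727,
        476bbbd0-821a-11ee-9b2b-5a412442a166:1-44,
        9bf7ae20-af14-11ee-9d0f-9e5a8ec4b826:1-537496:1000478-1001192,
        9bf7ba05-af14-11ee-9d0f-9e5a8ec4b826:1-57
    """)
    print(count_transactions(mysql_1))
```

If the other pods' values can be collected once they are reachable, covers(a, b) indicates whether instance a has applied everything that instance b has. An instance whose set covers all the others would be the one holding the most complete data.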