I am desperate, having spent several weeks experimenting without a reliable solution.
Our hosting provider told us that the hypervisors running our VMs are experiencing network congestion. The result is short connection dropouts (usually between 1 and 10 seconds). We can resolve this particular instance; my question is how to protect MySQL clusters in this kind of environment.
I have described some of the configuration changes on my blog (it's quite long, so here is just the link:
https://magicofsecurity.com/mysql8-cluster-and-networking-problems/ ). Some of the configuration options we tried:
group_replication_autorejoin_tries
group_replication_recovery_retry_count
group_replication_member_expel_timeout
While these improved things, the cluster's stability was still very fragile.
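For anyone unfamiliar with these options, here is a minimal sketch of how they can be set at runtime on MySQL 8. The values are illustrative placeholders (assumptions), not the settings from our cluster, and would need tuning against the length of the dropouts.

```sql
-- Illustrative values only (assumptions, not our production settings).

-- Seconds a suspected member is kept in the group before it is expelled,
-- counted after the initial failure-detection period.
SET PERSIST group_replication_member_expel_timeout = 30;

-- Number of times an expelled member tries to rejoin the group on its own.
SET PERSIST group_replication_autorejoin_tries = 3;

-- Number of times a joining member retries connecting to a donor
-- during distributed recovery.
SET PERSIST group_replication_recovery_retry_count = 10;
```

`SET PERSIST` applies the change immediately and keeps it across restarts; the same names can instead go into the `[mysqld]` section of the server configuration file.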
I can see two possible directions:
1. Improve the resilience of the cluster through configuration changes.
2. Give up on the cluster and replace it with replication and a manual (or HAProxy-triggered) change of the master.
Any thoughts would be very much appreciated!
Dan