I was reading this article
https://dev.mysql.com/blog-archive/automatic-member-fencing-with-offline_mode-in-group-replication/ about member fencing in MySQL Group Replication, and while trying it out I bumped into something that looks unexpected at first glance.
I have a cluster of 3 instances configured as a Group Replication InnoDB Cluster, with the exit state action set to ABORT_SERVER (a sketch of my setup follows the timeline below). I then isolate one of the nodes at the network level. Here is what I observe, in chronological order:
1. I make an insert on the primary; it hangs, and I cancel it after a while.
2. After a while, the primary's logs show that the isolated member has been removed from the group.
3. I insert a new row into my table on the primary.
4. I read the same table from the isolated secondary: the read succeeds, but the data is stale!
5. After a while, the isolated secondary reports that it cannot leave the group gracefully, and a SHUTDOWN signal is sent to the server.
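For context, this is roughly how the relevant settings look on each member and how I create the isolation. This is a sketch of my setup; the table, host names, and the firewall approach are placeholders for whatever your environment uses:

```sql
-- Fence by aborting mysqld once the member concludes it must leave the group:
SET PERSIST group_replication_exit_state_action = 'ABORT_SERVER';

-- Membership view as seen from any member:
SELECT MEMBER_HOST, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

-- The isolation itself is done outside MySQL; in my test it was a firewall
-- rule on the isolated node along the lines of (placeholder peer IPs):
--   iptables -A INPUT  -s <peer_ip> -j DROP
--   iptables -A OUTPUT -d <peer_ip> -j DROP
```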
The exit state action is ABORT_SERVER. I understand that this action is supposed to run only after some delay, during which the isolated replica establishes that it is in the minority partition and needs to leave the group.
However, if the time it takes the replica to realize it should fence itself is longer than the time it takes the primary to expel that replica (which is the case above), then this is **not** actual fencing, and I get a read-after-write discrepancy during this in-between window.
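To make the window concrete: while the primary has already expelled the isolated member, that member itself still happily serves reads. Something like this on the isolated secondary during the window, if my reading of the member states is right:

```sql
-- Run on the isolated secondary during the in-between window.

-- The local member typically still sees itself as ONLINE while its peers
-- appear UNREACHABLE (it has not yet concluded it is in the minority):
SELECT MEMBER_HOST, MEMBER_STATE
FROM performance_schema.replication_group_members;

-- Yet plain reads still succeed and return pre-partition (stale) data:
SELECT * FROM mydb.mytable;  -- mydb.mytable is a placeholder for my test table
```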
Is there a way, or a set of settings, to ensure actual member fencing in this Group Replication scenario?
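For reference, these are the knobs I have found so far, though none of them seems to fully close the window on its own (this is my reading of the documentation; the values are illustrative):

```sql
-- How long a member that sees the majority as unreachable waits before
-- giving up and running the exit state action (default 0 = wait forever):
SET PERSIST group_replication_unreachable_majority_timeout = 10;

-- How long the rest of the group waits before expelling a silent member.
-- If I understand correctly, raising this delays expulsion and so narrows
-- the stale-read window, while lowering it widens the window:
SET PERSIST group_replication_member_expel_timeout = 5;

-- Consistency level: with 'BEFORE', a read first waits for all preceding
-- group transactions to be applied, so a minority-partitioned member
-- should block or fail instead of returning stale rows:
SET PERSIST group_replication_consistency = 'BEFORE';
```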
I am happy to post more details and logs for my scenario if this helps.