exitStateAction is never called
Posted by: IGG t
Date: November 23, 2021 10:28AM

I have a three-node cluster (n1, n2, n3). Each database node is on a separate VM (same hypervisor).

It is set up with the following options on each node:

- "autoRejoinTries", 1
- "exitStateAction", "OFFLINE_MODE"
- "group_replication_member_expel_timeout" = 30
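
As a rough sketch, the three settings above can be written out as the server-side statements they would correspond to, which makes it easier to compare the nodes. This assumes that "autoRejoinTries" and "exitStateAction" map to the system variables group_replication_autorejoin_tries and group_replication_exit_state_action; the helper name to_set_persist is made up for illustration, and the script only builds SQL text without connecting to any server.

```python
# Values are the ones listed in the post. The variable names for the first
# two options are an assumed mapping from the AdminAPI option names.
OPTIONS = {
    "group_replication_autorejoin_tries": 1,
    "group_replication_exit_state_action": "OFFLINE_MODE",
    "group_replication_member_expel_timeout": 30,
}


def to_set_persist(options):
    """Build one SET PERSIST statement per option (hypothetical helper)."""
    statements = []
    for name, value in options.items():
        # Quote string values, leave integers bare.
        literal = f"'{value}'" if isinstance(value, str) else str(value)
        statements.append(f"SET PERSIST {name} = {literal};")
    return statements


STATEMENTS = to_set_persist(OPTIONS)

# A single query that reads the same three variables back on a node.
CHECK_QUERY = "SELECT " + ", ".join(f"@@{name}" for name in OPTIONS) + ";"

if __name__ == "__main__":
    for statement in STATEMENTS:
        print(statement)
    print(CHECK_QUERY)
```

Running the check query on n1, n2 and n3 in turn would show whether all three nodes really carry the same values.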

n1 is the Primary.

My understanding from the documentation is that should a node (n3) disappear, e.g. the network between n3 and the other nodes goes down, the following should happen:

5 seconds pass
A 'suspicion' is raised, and n1 and n2 mark the missing node as "unreachable"
30 seconds pass (group_replication_member_expel_timeout)
n1 and n2 expel n3 from the cluster
n3 tries to rejoin the cluster once (autoRejoinTries) and fails
n3 calls exitStateAction, which puts it into offline mode
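
The sequence above can be sketched as a small timeline calculation. It models only the understanding described in this post, not what the server actually does. The function name expected_timeline is hypothetical, and the rejoin and exit steps are left untimed because the post gives no durations for them.

```python
DETECTION_DELAY_S = 5   # "5 seconds pass" before the suspicion is raised
EXPEL_TIMEOUT_S = 30    # group_replication_member_expel_timeout
AUTO_REJOIN_TRIES = 1   # autoRejoinTries


def expected_timeline(detection_delay_s, expel_timeout_s, rejoin_tries):
    """Return (seconds_after_outage, event) pairs; None means 'not timed'."""
    suspicion_at = detection_delay_s
    expelled_at = suspicion_at + expel_timeout_s
    timeline = [
        (0, "network between n3 and n1/n2 goes down"),
        (suspicion_at, "n1 and n2 mark n3 as unreachable"),
        (expelled_at, "n1 and n2 expel n3"),
    ]
    # The post states no duration for the rejoin attempts, so none is assumed.
    for attempt in range(1, rejoin_tries + 1):
        timeline.append((None, f"n3 rejoin attempt {attempt} fails"))
    timeline.append((None, "n3 runs exitStateAction (OFFLINE_MODE)"))
    return timeline


TIMELINE = expected_timeline(DETECTION_DELAY_S, EXPEL_TIMEOUT_S, AUTO_REJOIN_TRIES)

if __name__ == "__main__":
    for when, event in TIMELINE:
        label = f"t+{when}s" if when is not None else "later"
        print(f"{label:>6}  {event}")
```

With the values from this setup, the expulsion is expected 35 seconds after the outage starts (5 + 30), with the rejoin attempt and exitStateAction following at some point after that.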

But this doesn't seem to happen; in fact, exitStateAction never seems to get called.

Is this right? It means that, in the event of a problem, I am left with a single node serving stale data to anyone who can connect to it until such time as the issue is fixed and it rejoins the cluster.

The only way I can get exitStateAction to trigger is as follows:

- Set "autoRejoinTries" to 0.
- Remove the network interface for n3, thus simulating a network outage between nodes.
- Wait for the required 30 seconds to pass.
- Reconnect the network interface.
- Stop/start Group Replication.

Then and only then does the exitStateAction fire and put the node into OFFLINE_MODE.

Unfortunately, the cluster has now allowed it to rejoin, meaning the routers will try to send connections there, and 50% of my non-admin connections are now getting failure errors.

This doesn't seem right to me. Have I misunderstood, or is this actually not working properly?

N.B. I raised a bug on this (105576), where it was suggested that it was the wrong "type" of network failure.
