Re: How to shorten the time after data node switchover
Hi Xielei,
Your problem is likely that the data node on host B kills itself, since the management server on host A is probably the current arbitrator.
The time you see for the "switchover" is actually the time it takes for the DB on host B to detect that it should kill itself + the time to restart + the time it waits for the DB on host A to also start + the time to complete startup. Alternatively, the DB on host A may start up before the one on host B.
To avoid the cluster shutting down on a single node failure, it is recommended to place the arbitrator on a third host. As long as the cluster does not shut down on node failure, applications should notice no more than that sessions against host A are disconnected and that some transactions on sessions against host B fail.
The management servers act as arbitrators by default, but only one can be active at a time. To ensure that only the management server on the third host acts as arbitrator, set ArbitrationRank=0 for the management servers on hosts A and B.
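A minimal config.ini sketch of that setup might look like the fragment below. The node IDs and host names (hostA, hostB, hostC) are hypothetical placeholders, and the data node and SQL node sections are left out; only the ArbitrationRank settings matter here (0 = never arbitrator, 1 = high priority).

```ini
# Hypothetical fragment: node IDs and host names are placeholders.
[ndb_mgmd]
NodeId=49
HostName=hostA
# Never act as arbitrator
ArbitrationRank=0

[ndb_mgmd]
NodeId=50
HostName=hostB
# Never act as arbitrator
ArbitrationRank=0

[ndb_mgmd]
NodeId=51
# The third host, independent of both data nodes
HostName=hostC
# High priority: this is the arbitrator that gets chosen
ArbitrationRank=1
```

Configuration changes like this normally only take effect after the management servers have reloaded the configuration and the nodes have been restarted (a rolling restart). Afterwards, the ndbinfo query shown further down should report the management server on the third host as the arbitrator.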
If the cluster has more than two data nodes, the surviving nodes can typically decide without involving the arbitrator whether they should continue or kill themselves.
But with two data nodes, as in your case, the arbitrator is needed.
When host A dies, the data node on host B detects that it has lost its connection to the DB on host A, but it cannot tell whether this was caused by a host failure or a network failure. To prevent both data nodes from potentially continuing on their own (the so-called split-brain scenario), the data node asks a third party, the arbitrator, whether it should continue or kill itself.
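To make the reasoning in the last three paragraphs concrete, here is a simplified Python model. It is an assumption-based sketch, not NDB source code, and the real check may consider more factors: the function and class names are hypothetical, node groups are modeled as sets of node IDs, and the arbitrator is modeled as granting the first request it receives and refusing everyone else.

```python
from enum import Enum
from typing import List, Optional, Set


class Verdict(Enum):
    CONTINUE = "continue"              # survivors may carry on alone
    SHUT_DOWN = "shut down"            # survivors must kill themselves
    ASK_ARBITRATOR = "ask arbitrator"  # the other side might also be viable


def partition_verdict(node_groups: List[Set[int]], alive: Set[int]) -> Verdict:
    """Simplified model (assumption) of how surviving data nodes judge a partition."""
    # A node group with no surviving member means part of the data is gone.
    if any(not (group & alive) for group in node_groups):
        return Verdict.SHUT_DOWN
    # A node group that survived completely cannot exist on the other side,
    # so the other side cannot be viable and split brain is impossible.
    if any(group <= alive for group in node_groups):
        return Verdict.CONTINUE
    # Every node group is only partially present: the other side could hold
    # a full copy of the data as well, so a third party must decide.
    return Verdict.ASK_ARBITRATOR


class Arbitrator:
    """Toy arbitrator (assumption): grants the first request, refuses the rest."""

    def __init__(self) -> None:
        self.winner: Optional[int] = None

    def request(self, node_id: int) -> bool:
        if self.winner is None:
            self.winner = node_id
        return self.winner == node_id


def decide(node_groups: List[Set[int]], alive: Set[int], node_id: int,
           arbitrator: Optional[Arbitrator]) -> Verdict:
    verdict = partition_verdict(node_groups, alive)
    if verdict is not Verdict.ASK_ARBITRATOR:
        return verdict
    # No reachable arbitrator (e.g. it ran on the failed host): split brain
    # cannot be ruled out, so the node kills itself.
    if arbitrator is None:
        return Verdict.SHUT_DOWN
    return Verdict.CONTINUE if arbitrator.request(node_id) else Verdict.SHUT_DOWN
```

In this model, two data nodes in one node group (`[{1, 2}]`) always end up in ASK_ARBITRATOR, so if the arbitrator was on the failed host A, `decide([{1, 2}], {2}, 2, None)` returns SHUT_DOWN for the node on host B. With four data nodes in two node groups, a single failure leaves one node group complete, so the survivors continue without asking anyone.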
To see which node is the current arbitrator, you can run:
mysql> SELECT arbitrator FROM ndbinfo.arbitrator_validity_summary;
Note that having StartPartitionedTimeout=1000 increases the risk of split-brain scenarios; the recommendation is to turn it off (StartPartitionedTimeout=0). That will require either manual intervention if the cluster shuts down, or some external mechanism that decides when a data node should start without waiting for the other data node (ndbmtd --nowait-nodes=<nodeid-of-permanent-down-node>).
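A minimal config.ini sketch of that recommendation, assuming it is applied to all data nodes through the [ndbd default] section (placing it there is an illustrative choice; individual [ndbd] sections would also work):

```ini
[ndbd default]
# 0 = wait indefinitely instead of starting a possibly partitioned cluster
StartPartitionedTimeout=0
```

With this setting, a surviving data node whose partner is permanently down has to be started explicitly with ndbmtd --nowait-nodes=<nodeid-of-permanent-down-node>, either by an operator or by the external mechanism mentioned above.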
Also note that MySQL Cluster 7.5 is no longer supported; several newer releases are available.
Regards,
Mauritz
Thread listing (Subject, Views, Posted):
58 views, posted January 08, 2025 04:33AM
16 views, posted January 08, 2025 06:58PM
Re: How to shorten the time after data node switchover, 8 views, posted January 10, 2025 12:42PM
Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.