Hey guys,
I have recently been reading this article by Frazer Clement:
https://messagepassing.blogspot.com/2011/12/eventual-consistency-in-mysql-cluster.html. It has an excellent description of 'epoch', but there is one sentence I am not sure I understand: "Epoch boundaries act as markers in the flow of row events generated by each node, which are then used as consistent points to recover to."
I'll use the following scenario to explain my confusion. Suppose that at some point the NDB kernel requests all the nodes to persist the last second of data changes to disk, and that before this request all data changes up to and including epoch 100 have already been persisted. After the request, some nodes do the job, and eventually their changes for epochs 101-110 are flushed to the REDO log; others don't, for example because of a busy disk subsystem. Then a system-wide crash strikes.
My question is: in the subsequent system-wide recovery, are all the nodes recovered uniformly to epoch 100, or is each node recovered to the largest epoch in its own REDO log, so that some nodes are recovered to epoch 100 and others to epoch 110?
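To make the two possibilities concrete, here is a toy Python sketch. It is not NDB code and says nothing about what NDB actually does; the node names and the `recover_uniform` / `recover_per_node` helpers are made up purely to illustrate the scenario above.

```python
# Toy model of the crash scenario; NOT real NDB behaviour or API.
# Highest epoch each (hypothetical) node had flushed to its REDO log
# when the system-wide crash happened.
last_flushed_epoch = {"node1": 110, "node2": 110, "node3": 100, "node4": 100}


def recover_uniform(flushed):
    """First case: every node recovers to the newest epoch that ALL nodes have on disk."""
    common = min(flushed.values())
    return {node: common for node in flushed}


def recover_per_node(flushed):
    """Second case: each node recovers to the newest epoch in its own REDO log."""
    return dict(flushed)


case1 = recover_uniform(last_flushed_epoch)
case2 = recover_per_node(last_flushed_epoch)

print("first case :", case1)   # all nodes end up at the same epoch
print("second case:", case2)   # nodes end up at different epochs
```

In the first case the cluster comes back at a single epoch everywhere; in the second case different nodes would hold data from different epochs.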
To me, Frazer Clement's statement seems to imply the first case, but I would like to confirm this.