Re: Power failure and innodb
Hi Tobia,
I am really wondering how InnoDB could possibly roll back several days worth of transactions. Usually, if the system contains even one long-running transaction, the purge subsystem will be effectively disabled, and you should eventually run out of undo log space.
In InnoDB, a transaction commit is a simple operation that flags the undo log header of the transaction as committed. This operation is covered by a redo log (ib_logfile* files). On transaction commit, the InnoDB redo logs will be flushed (written and synced to disk) when you are using the default setting of innodb_flush_log_at_trx_commit=1 (which is what you listed).
With the MySQL binlog, things get a little more complex. The MySQL binlog acts as a distributed transaction coordinator, and InnoDB can be one of the distributed transactions. Internally, InnoDB would execute the same code as XA PREPARE and XA COMMIT. The redo log for setting XA PREPARE must be flushed synchronously, but the XA COMMIT can in this case use less durable semantics, because MySQL would consult the binlog file on recovery to decide whether to issue XA COMMIT or XA ROLLBACK.
I am not deeply familiar with the MySQL binlog, but I would assume that at most a few transactions could require binlog-based XA COMMIT or XA ROLLBACK recovery after a power failure. This is assuming that the binlog will be flushed to the file system in a reasonable time.
Internally, InnoDB has two kinds of undo logs: insert_undo (covering INSERT of completely new records) and update_undo (covering everything else). On commit, the insert_undo records will be freed. The update_undo logs will be kept around for a little longer, in case some older transaction needs to see old versions of records (MVCC). The update_undo logs will eventually be freed by the purge subsystem, once all old transactions are gone.
If the transactions really were rolled back, they should have never been committed. After commit, it will be more or less impossible to roll back, because the undo log pages will be reused for something else.
However, if you had lots of transactions in XA PREPARE state, this would mean that new transactions at the default REPEATABLE READ isolation level would not see the data, as it was not committed yet. Only READ UNCOMMITTED transactions would be able to see them. Furthermore, you should have hit locking conflicts, because the XA PREPARE transactions are holding implicit locks on all records that they inserted, deleted or updated.
Can you try to find out what really happened? Or can you try to repeat the scenario?
Note that on some hardware, fsync() can return before all the data has actually been written to the persistent storage medium. You might be affected by this and some reordering of writes by the operating system. Was there anything written to the server error log? Any tables corrupted?
Best regards,
Marko Mäkelä
InnoDB developer at Oracle