Replication ignore table issue
Hi all,
I am having trouble with replication after a hardware upgrade. I have to say that I initially used mysql (years ago), all worked fine, I then used Mariadb for a while without issues, but now returned to mysql, mainly because it is default in the linux distribution I am provided with on the new hardware. Replication worked there for a few weeks, but now it is stuck and I have no idea what to do.
The setup it one master, one slave. The master is running fine and logging as required.
The slave server runs fine as well, no issues with the mysql instance other than the replication. The Slave_IO_Running says yes running and the relay log files are up to date.
Problem is the Slave_SQL_Running says no, and there is an error. If I do:
select * from performance_schema.replication_applier_status_by_worker;
which seems to be the best way to get a meaningful error message, I get:
Worker 1 failed executing transaction 'ANONYMOUS' at source log binlog.000021, end_log_pos 36111459; Could not execute Update_rows event on table nmrshiftdb.SESSIONS; Can't find record in 'SESSIONS', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's source log binlog.000021, end_log_pos 36111459
There is other stuff, but this is the core message, I believe. SESSIONS is a table name in my database.
If I take the binlog from master and look at it, the command at that position is a commit. The command before is an "update SESSIONS", and before that there is begin.
From this, I take that the update is the problem. This statement works on the server, and the tables are identical on master and slave.
The first issue is: This table should be ignored. If I do a "show slave status", I get "x.y,x.SESSIONS" in "Replicate_Ignore_Table", where x is my database name and y is another table. It seems the ignore table is ignored (haha). Is there any explanation for this? Something I can do? I am lost here. I tried leaving out the database name from ignore table and played with it, but nothing seems to work, sessions is still executed, or attempted to be executed. Plus it seems it worked initially, since the table is heavily used, I can't imagine that it was not used when the replication worked initially.
Secondly (and this is more an observation) looking at the binlog, the update statement is not the original one, but it has all columns in the update and the where statement. Since the table is different on master and slave (that's why I use ignore), it makes sense the replication "can't find the record". I wasn't aware replication works like this, I thought it logs the original statement (which is "update SESSIONS set x=... where y=...", whereas the logged statement has where x=...,y=...,z=...). I can see that if the table would be the same, the replicated statement would work, so this is not really the issue here.
So in short, are there any ideas why my ignore table doesn't work? If you need more information, please ask.
Thanks