I'm trying to set up MySQL replication using one slave. The slave appears to be replicating fine, and then the following error occurs:
"Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave."
The dump that I imported into the slave to get it started was created using the following command on the master:
mysqldump --opt --allow-keywords -q -uroot -ppassword dbname > E:\Backups\dbname.sql
The script that performs this backup also logs the master's current binary log position. I then took the following steps to start replication on the slave:
1. STOP SLAVE;
2. DROP DATABASE dbname;
3. SOURCE dbname.sql; (... waited a few hours for the 10gb dump to import)
4. RESET SLAVE;
5. CHANGE MASTER TO MASTER_HOST='[masterhostname]', MASTER_USER='[slaveusername]', MASTER_PASSWORD='[slaveuserpassword]', MASTER_PORT=[port], MASTER_LOG_FILE='[masterlogfile]', MASTER_LOG_POS=[masterlogposition];
6. START SLAVE;
After about a day of replication working fine, it failed again at 3:43 AM. The first thing that appeared in MySQL's error log was the error above. Then another generic error appeared after with the same timestamp:
"Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '[masterlogfile]' position [masterlogpos]"
For more logging information, I had set up a batch script to run "SHOW SLAVE STATUS" and "SHOW FULL PROCESSLIST" every hour. Here are the results before and after the failure:
http://pastebin.com/raw.php?i=13gATNRC
I tried following the instructions from the error and ran mysqlbinlog on the slave's relay log with a start_position thousands of statements before, and stop_position thousands of statements after the point of failure, and redirected the output to a text file. I did not see any corruption errors on the command line or in the log file. This is what the log file said around the point of failure:
http://pastebin.com/raw.php?i=Gh3SB1cB
Interesting that it's logging an Invalid floating point operation at that point, but I'm not sure how that could cause replication to break at that position. I ran mysqlbinlog on the master's binary log found in SHOW SLAVE STATUS from above, and did not see any errors on the command line (but did not get a chance to open the 100mb log file that was generated since I didn't want to bog down the production server).
So right now I'm at a loss for what else to try. I'm basically just looking for any insights as to what might be going wrong or any suggestions for what steps to take next. Thanks!