Thanks for noticing this problem.
I have tried to resolve the last stack trace I got but I guess it has to be recent for getting anything useful. The only thing I got was:
0x8072d44 + 134688068
0x826ca28 + 136759848
0x8295a9f + 136927903
0x8295813 + 136927251
0x8251526 + 136647974
0x8250bb8 + 136645560
0x80f0547 + 135202119
0x80ec622 + 135185954
0x80ebc23 + 135183395
0x807198e + 134683022
0x8071c98 + 134683800
0x8071ce5 + 134683877
0x826a1dc + 136749532
0x82a008a + 136970378
I will try to get a new stack trace. But the thing is that I only receive a stack trace when it crashes due to exceeding the max_bin_log_size. Recently I have never let it reach that limit cause the crashes makes it more difficult for me to get it on it's feet again. But the problem we have is that "FLUSH LOGS" doesn't work either. For us to manage to rotate the relay log we have to do it like this:
1) STOP SLAVE;
2) FLUSH LOGS;
3) Restart MySQL
If we do it that way the error log looks like this:
-------------
050512 2:38:40 Slave I/O thread killed while reading event
050512 2:38:40 Slave I/O thread exiting, read up to log 'freebsds-bin.019', position 42271333
050512 2:38:40 Error reading relay log event: slave SQL thread was killed
050512 2:38:53 Could not use ensim-relay-bin for logging (error 13). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
050512 2:42:42 /usr/sbin/mysqld: Normal shutdown
050512 2:42:42 InnoDB: Starting shutdown...
050512 2:42:44 InnoDB: Shutdown completed
050512 2:42:44 /usr/sbin/mysqld: Shutdown Complete
050512 02:42:44 mysqld ended
050512 02:42:46 mysqld started
050512 2:42:46 Warning: Asked for 196608 thread stack, but got 126976
050512 2:42:46 InnoDB: Started
/usr/sbin/mysqld: ready for connections.
Version: '4.0.23-standard' socket: '/home/virtual/FILESYSTEMTEMPLATE/.mysqlsock/mysql.sock' port: 3306 Official MySQL RPM
050512 2:42:46 Slave SQL thread initialized, starting replication in log 'freebsds-bin.019' at position 42271333, relay log './ensim-relay-bin.020' position: 581495788
050512 2:42:46 Slave I/O thread: connected to master 'repl@217.198.149.24:3306', replication started in log 'freebsds-bin.019' at position 42271333
-----------
And that works fine. The old relay log is deleted and a new is created just like it should.
But if we skip the first step STOP SLAVE and just do FLUSH LOGS we get the following error log:
------------------
050509 16:48:44 Could not use ensim-relay-bin for logging (error 13). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
050509 16:48:44 next log error: -1 offset: 22 log: ./ensim-relay-bin.017
050509 16:48:44 Error reading relay log event: Error purging processed log
050509 16:48:44 Slave: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 0
050509 16:48:44 Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'freebsds-bin.018' position 534507105
--------------------
To recover from this we have to delete the file "relay-log.info" and then restart MySQL. Then the old relay log remains but a new relay log is created and that's ok.
If we restart MySQL without removing the "relay-log.info" we seem to get different results in the error log probably depending on if it gets queries from the master at the moment or not. I did this once during busy hours and then I got a stack trace in the error log.
-----------
050509 16:48:50 /usr/sbin/mysqld: Normal shutdown
050509 16:48:50 Slave I/O thread killed while reading event
050509 16:48:50 Slave I/O thread exiting, read up to log 'freebsds-bin.018', position 534517823
mysqld got signal 11;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.
key_buffer_size=8388600
read_buffer_size=131072
max_used_connections=18
max_connections=100
threads_connected=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_connections = 225791 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
thd=(nil)
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xbfe5f498, backtrace may not be correct.
Stack range sanity check OK, backtrace follows:
0x8072d44
0x826ca28
0x8295a9f
0x8295813
0x8251526
0x8250bb8
0x80f0547
0x80ec622
0x80ebc23
0x807198e
0x8071c98
0x8071ce5
0x826a1dc
0x82a008a
New value of fp=(nil) failed sanity check, terminating stack trace!
Please read
http://dev.mysql.com/doc/mysql/en/Using_stack_trace.html and follow instructions on how to resolve the stack trace. Resolved
stack trace is much more helpful in diagnosing the problem, so please do
resolve it
The manual page at
http://www.mysql.com/doc/en/Crashing.html contains
information that should help you find out what is causing the crash.
Number of processes running now: 0
050509 16:48:50 mysqld restarted
050509 16:48:50 Warning: Asked for 196608 thread stack, but got 126976
050509 16:48:51 InnoDB: Started
050509 16:48:51 Could not use ensim-relay-bin for logging (error 13). Turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.
050509 16:48:51 Failed in open_log() called from init_relay_log_info()
050509 16:48:51 Failed to initialize the master info structure
/usr/sbin/mysqld: ready for connections.
Version: '4.0.23-standard' socket: '/home/virtual/FILESYSTEMTEMPLATE/.mysqlsock/mysql.sock' port: 3306 Official MySQL RPM
----------------
By the way that is the last stack trace I've gotten and the one I tried to resolve.
But as I said this was made during busy hours. When I tried to recreate the same thing when it was not busy and it got very few queries from the master then I didn't get a stack trace.
The thing with the corrupted master.info where we get part of a query in the file master.info seems to only appear if you after the relay log reached the max_bin_log_limit and the replication crashed and you try to do START SLAVE. Which might suggest that it probably has something to do with that it can't use more memory. So that is not the core of the problem.
The main issue for us is that FLUSH LOGS doesn't work and that the relay logs doesn't rotate when they reach the size limit for relay logs.
I will now let the relay logs reach the size limit and let it crash to get a usable stack trace and try to resolve it. I will post it here then.
Regards
Iwarson