Hello Folks,
I recently started upgrading all the slave servers from MySQL 5.5.37 to MySQL 5.6.21, and on the very first one I hit a problem that makes me hesitant to continue the upgrade at this time.
The story behind it: after stopping replication and the MySQL instance, I removed all the MySQL 5.5.37 RPM packages and installed MySQL 5.6.21 from Oracle's YUM repository. I added some new variables for crash-safe replication (master_info_repository, relay_log_info_repository, relay_log_recovery) and those used to configure the InnoDB buffer pool to dump/reload its content. Relying on the existing replication coordinates, I started replication, which worked for about 8 hours, and then 5.6 had its first crash with signal 11. I commented out all the per-client variables and tried again: another crash when issuing the START SLAVE command.
Since the problem seemed tied to the moment I issue START SLAVE, I removed skip-slave-start and started MySQL 5.6 again. It entered a restart loop while trying to start the slave, and my final action was:
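For reference, a minimal sketch of what such additions could look like as a my.cnf fragment. The three replication settings match the values shown in the my_print_defaults output in this post; the two InnoDB buffer pool options are the standard 5.6 names and are an assumption here, since the exact lines used are not shown. The scratch file and the variable name cnf_fragment are only illustrative.

```shell
# Write the fragment to a scratch file for review before merging it into my.cnf.
# cnf_fragment is a hypothetical variable name; mktemp picks the actual path.
cnf_fragment=$(mktemp)
cat > "$cnf_fragment" <<'EOF'
[mysqld]
# Crash-safe replication (values as in the my_print_defaults output)
master_info_repository    = TABLE
relay_log_info_repository = TABLE
relay_log_recovery        = 1
# Assumed: standard 5.6 options for dumping/reloading the buffer pool
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_load_at_startup  = 1
EOF
cat "$cnf_fragment"
```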
[bianchi@ndbx ~]$ sudo killall -9 mysqld_safe
[bianchi@ndbx ~]$ sudo killall -9 mysqld
Server configs:
[bianchi@ndbx ~]$ cat /proc/meminfo | grep MemTotal
MemTotal: 65955736 kB
[bianchi@ndbx ~]$ cat /proc/cpuinfo | grep processor | wc -l
32
This is the very first time MySQL 5.6.21 crashed after a START SLAVE command:
09:25:33 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=4294967296
read_buffer_size=4194304
max_used_connections=3
max_threads=3000
thread_count=3
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 41099413 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7fe9a8000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fea607c17e0 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8dbbb5]
/usr/sbin/mysqld(handle_fatal_signal+0x494)[0x665f24]
/lib64/libpthread.so.0(+0xf710)[0x7ff761812710]
/lib64/libc.so.6(+0x7966b)[0x7ff76050466b]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7ff7605056b1]
/usr/sbin/mysqld(my_malloc+0x32)[0x8d74d2]
/usr/sbin/mysqld(alloc_root+0xcb)[0x8d305b]
/usr/sbin/mysqld(_ZN9base_list9push_backEPv+0x20)[0x60cb50]
/usr/sbin/mysqld(_Z10MYSQLparseP3THD+0x18f64)[0x7aa6f4]
/usr/sbin/mysqld(_Z9parse_sqlP3THDP12Parser_stateP19Object_creation_ctx+0x72)[0x6dae42]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0xc9)[0x6e5f49]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj+0x74c)[0x87b61c]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x74)[0x87fb54]
/usr/sbin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0x263)[0x8aefd3]
/usr/sbin/mysqld[0x8b0cf2]
/usr/sbin/mysqld(handle_slave_sql+0xb09)[0x8b27c9]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xb00b1a]
/lib64/libpthread.so.0(+0x79d1)[0x7ff76180a9d1]
/lib64/libc.so.6(clone+0x6d)[0x7ff76057386d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fe998ede03c): is an invalid pointer
Connection ID (thread ID): 6
Status: NOT_KILLED
The manual page at
http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
141008 06:25:35 mysqld_safe Number of processes running now: 0
141008 06:25:35 mysqld_safe mysqld restarted
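The mangled C++ frames in the backtrace above are easier to read once demangled. A minimal sketch, assuming c++filt from GNU binutils is installed on the machine where the log is read; the frame names are copied from the backtrace, and the variable names are only illustrative.

```shell
# Demangle the replication-related frames copied from the backtrace above.
# Assumes GNU binutils' c++filt is on the PATH.
frames='_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcj
_ZN9Log_event11apply_eventEP14Relay_log_info
_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info'

# c++filt reads one mangled symbol per line and prints the readable form.
demangled=$(printf '%s\n' "$frames" | c++filt)
printf '%s\n' "$demangled"
```

The readable names show the slave SQL thread applying a Query_log_event when the crash happened inside malloc.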
Then I commented out all the per-client variables in my.cnf and tried START SLAVE again:
2014-10-08 13:29:37 34495 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.21' socket: '/var/mysql/logs/mysql.sock' port: 3306 MySQL Community Server (GPL)
16:29:37 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=151
thread_count=2
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 68245 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f3c18000990
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f3c216097e0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x8dbbb5]
/usr/sbin/mysqld(handle_fatal_signal+0x494)[0x665f24]
/lib64/libpthread.so.0(+0xf710)[0x7f47f6039710]
/usr/sbin/mysqld(_ZN10Copy_field13get_copy_funcEP5FieldS1_+0x3a)[0x7cb35a]
/usr/sbin/mysqld(_ZN10Copy_field3setEP5FieldS1_b+0x11e)[0x7cb9fe]
/usr/sbin/mysqld(_Z10unpack_rowPK14Relay_log_infoP5TABLEjPKhPK9st_bitmapPS5_PmS5_+0x26c)[0x8989ac]
/usr/sbin/mysqld(_ZN14Rows_log_event24do_index_scan_and_updateEPK14Relay_log_info+0x100)[0x873c50]
/usr/sbin/mysqld(_ZN14Rows_log_event14do_apply_eventEPK14Relay_log_info+0x852)[0x877c32]
/usr/sbin/mysqld(_ZN9Log_event11apply_eventEP14Relay_log_info+0x74)[0x87fb54]
/usr/sbin/mysqld(_Z26apply_event_and_update_posPP9Log_eventP3THDP14Relay_log_info+0x263)[0x8aefd3]
/usr/sbin/mysqld[0x8b0cf2]
/usr/sbin/mysqld(handle_slave_sql+0xb09)[0x8b27c9]
/usr/sbin/mysqld(pfs_spawn_thread+0x12a)[0xb00b1a]
/lib64/libpthread.so.0(+0x79d1)[0x7f47f60319d1]
/lib64/libc.so.6(clone+0x6d)[0x7f47f4d9a86d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED
The manual page at
http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
141008 13:29:37 mysqld_safe Number of processes running now: 0
141008 13:29:37 mysqld_safe mysqld restarted
Then, to confirm that it was caused by START SLAVE (or some knock-on effect of it), I removed skip-slave-start so that replication would start together with mysqld. mysqld *went crazy* and entered a loop, restarting all the time.
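One way to put a number on the restart loop is to count the "mysqld restarted" lines that mysqld_safe writes to the error log, as seen at the end of both crash excerpts. A minimal sketch; count_restarts is a hypothetical helper name, and the example path is the one from the log-error setting on this server.

```shell
# Count how many times mysqld_safe reported a restart in an error log.
# count_restarts is a hypothetical helper; pass the error log path as $1.
count_restarts() {
    grep -c 'mysqld_safe mysqld restarted' "$1"
}

# Example against this server's error log (path taken from --log-error):
# count_restarts /var/mysql/logs/ndbx.com.br.err
```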
The output of my_print_defaults mysqld (keep in mind that this configuration was carried over from MySQL 5.5):
[bianchi@ndbx logs]$ my_print_defaults mysqld
--user=mysql
--port=3306
--basedir=/usr
--datadir=/var/mysql/datadir
--socket=/var/mysql/logs/mysql.sock
--pid-file=/var/mysql/logs/ndbx.com.br.pid
--log-error=/var/mysql/logs/ndbx.com.br.err
--log-warnings=2
--slow-query-log-file=/var/mysql/logs/ndbx.com.br.slow
--tmpdir=/var/mysql/tmp
--skip-external-locking
--skip-name-resolve
--explicit_defaults_for_timestamp
--old_password=1
--secure_auth=0
--table_definition_cache=20000
--table_open_cache=20000
--query_cache_type=0
--query_cache_size=0
--query_cache_limit=0
--innodb_log_group_home_dir=/var/mysql/logs
--innodb_log_files_in_group=10
--innodb_log_file_size=128M
--innodb_buffer_pool_size=44G
--innodb_buffer_pool_instances=8
--innodb_log_buffer_size=128M
--innodb_flush_method=O_DSYNC
--innodb_open_files=100000
--innodb_read_io_threads=8
--innodb_write_io_threads=8
--innodb_purge_threads=1
--innodb_flush_log_at_trx_commit=0
--server-id=207
--bind-address=[...]
--relay_log=/var/mysql/logs/relay.log
--relay_log_index=/var/mysql/logs/relay-index.log
--relay_log_info_file=relay-log.info
--relay_log_purge=1
--relay_log_recovery=1
--relay_log_space_limit=0
--relay_log_info_repository=TABLE
--max_relay_log_size=256M
--master_info_repository=TABLE
--long_query_time=5
I will keep working on it, but if you guys have any clue about this, it would be very welcome.
PS: I also tried restarting replication after a RESET SLAVE, but got the same behavior.
Thanks!!
Wagner Bianchi - +55 31 8654-9510
Profile: bit.ly/toG94v
Blog: wagnerbianchi.com/blog
Twitter: @wagnerbianchijr
Skype: wbianchijr