Hi,
I have a MySQL master/slave setup with row-based replication. At times of higher load, replication halts and every command that touches the slave threads hangs: STOP SLAVE, SHOW SLAVE STATUS, and the init.d stop script all hang and never return. Replication is only restored when I kill -9 the mysql process and restart it.
I believe the hangs are triggered by the FLUSH LOCAL TABLES WITH READ LOCK issued by our backup snapshot script (ec2-consistent-snapshot), which runs every 2 hours. While things are in the hung state, SHOW PROCESSLIST shows 'Waiting for commit lock'. Killing this command allows me to stop mysql cleanly, but a restart is still required to get replication going again. I should also note that ec2-consistent-snapshot runs fine most of the time and only breaks replication occasionally: over the last week it has broken about 5 times, at random times throughout the day.
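For anyone trying to confirm the same state on their own slave, a query along these lines should list the stuck threads without scanning the full SHOW PROCESSLIST output by eye. This is only a sketch: it assumes the state text matches 'Waiting for commit lock' as quoted above, and the thread id in the commented KILL line is a placeholder, not a value from our server.

```sql
-- List threads currently blocked on the commit lock, longest wait first.
-- information_schema.PROCESSLIST exposes the same rows as SHOW PROCESSLIST.
SELECT id, user, command, time, state, info
FROM information_schema.PROCESSLIST
WHERE state LIKE 'Waiting for commit lock%'
ORDER BY time DESC;

-- Once the blocking statement has been identified, it can be killed by id.
-- 12345 is a hypothetical thread id used for illustration only.
-- KILL 12345;
```

The TIME column gives the number of seconds each thread has spent in its current state, which helps separate a brief wait during a normal snapshot from a real hang.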
I've seen this bug report: http://bugs.mysql.com/bug.php?id=68460. However, it states that the bug was fixed in 5.6.13, and we're running 5.6.17.
MySQL Ver 14.14 Distrib 5.6.17 for Linux (x86_64)
Running Amazon's Linux distribution on an m1.large EC2 instance with RAID0 EBS data volumes
Slave cnf config below:
--------------------------
port=3306
socket=/var/run/mysqld/mysqld.sock
datadir="/data/mysql"
lower_case_table_names=1
default-storage-engine=INNODB
max_connections=400
query_cache_size=176M
table_open_cache=1520
tmp_table_size=63M
binlog-format=ROW
event_scheduler=ON
tmpdir=/data/mysql/temp
long_query_time=5
slow_query_log
log-queries-not-using-indexes
innodb_additional_mem_pool_size=11M
innodb_flush_log_at_trx_commit=1
innodb_support_xa=1
innodb_log_buffer_size=10M
innodb_buffer_pool_size=5000M
innodb_log_file_size=107M
innodb_thread_concurrency=10
innodb_file_per_table
innodb_autoextend_increment=200M
innodb_data_home_dir = "/data/mysql"
innodb_log_group_home_dir = "/data/mysql"
innodb_log_files_in_group = 2
max_allowed_packet=500M
#replication stuff
server-id=20
log-bin=1
show-slave-auth-info=1
expire_logs_days=7
relay_log=/data/mysql/relay-bin
log_slave_updates
skip_slave_start
read_only