blank group_replication_group_seeds after persisting configuration
Posted by: Dayo Lasode
Date: September 08, 2017 06:28AM
Hi,
I'm hoping someone else has seen this behavior with InnoDB Cluster.
I have a three-node cluster (MySQL 5.7.19 and MySQL Shell 1.0.10), created via the MySQL Shell methods and defined as below:
mysql-js> c.status()
{
    "clusterName": "devclust",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "dbbox1:1485",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "dbbox1:1485": {
                "address": "dbbox1:1485",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "dbbox2:1485": {
                "address": "dbbox2:1485",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "dbbox3:1485": {
                "address": "dbbox3:1485",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    }
}
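For completeness, the cluster was created with the standard shell calls, roughly as follows (the clusteradmin account below is a placeholder for the admin user actually used):

root@dbbox1:~# mysqlsh --uri clusteradmin@dbbox1:1485
mysql-js> var c = dba.createCluster('devclust')
mysql-js> c.addInstance('clusteradmin@dbbox2:1485')
mysql-js> c.addInstance('clusteradmin@dbbox3:1485')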
As advised by the documentation, after creating the cluster I persisted the configuration on every node, which should enable members to rejoin automatically on restart:
root@dbbox1:~# mysqlsh --interactive -f /tmp/configLocal.js
Validating instance...
The instance 'localhost:1485' is valid for Cluster usage
You can now use it in an InnoDB Cluster.
{
"status": "ok"
}
root@dbbox2:~# mysqlsh --interactive -f /tmp/configLocal.js
Validating instance...
The instance 'localhost:1485' is valid for Cluster usage
You can now use it in an InnoDB Cluster.
{
"status": "ok"
}
root@dbbox3:~# mysqlsh --interactive -f /tmp/configLocal.js
Validating instance...
The instance 'localhost:1485' is valid for Cluster usage
You can now use it in an InnoDB Cluster.
{
"status": "ok"
}
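For reference, /tmp/configLocal.js is essentially just a call to dba.configureLocalInstance(); on each box it looks something like this (credentials and my.cnf path are placeholders for the real values):

// persist the group replication settings into the local option file
dba.configureLocalInstance('root@localhost:1485', {password: '****', mycnfPath: '/etc/mysql/my.cnf'})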
Unfortunately, when a single node is then restarted to simulate a failure, it fails to reconnect to the cluster. The error log of the restarted node is below (showing only the relevant sections):
2017-09-08T11:57:40.982789Z 0 [Note] Plugin group_replication reported: 'Initialized group communication with configuration: group_replication_group_name: "4d3473df-948b-11e7-85ba-022185d04910"; group_replication_local_address: "dbbox3:11485"; group_replication_group_seeds: ""; group_replication_bootstrap_group: false; group_replication_poll_spin_loops: 0; group_replication_compression_threshold: 1000000; group_replication_ip_whitelist: "AUTOMATIC"'
2017-09-08T11:57:40.983450Z 5 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 125, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2017-09-08T11:57:41.010447Z 4 [ERROR] Plugin group_replication reported: '[GCS] Unable to join the group: peers not configured. '
2017-09-08T11:57:41.010625Z 4 [ERROR] Plugin group_replication reported: 'Error on group communication initialization methods, killing the Group Replication applier'
2017-09-08T11:57:41.011094Z 8 [Note] Slave SQL thread for channel 'group_replication_applier' exiting, replication stopped in log 'FIRST' at position 0
2017-09-08T11:57:41.012092Z 5 [Note] Plugin group_replication reported: 'The group replication applier thread was killed'
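The empty seed list from the log can also be confirmed directly on the restarted node; the value comes back empty, matching the GCS message above:

mysql> SHOW GLOBAL VARIABLES LIKE 'group_replication_group_seeds';
+-------------------------------+-------+
| Variable_name                 | Value |
+-------------------------------+-------+
| group_replication_group_seeds |       |
+-------------------------------+-------+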
It seems the replication peers are not defined at restart. Looking at the persisted configuration in each node's my.cnf, the group_replication_group_seeds variable is blank:
disabled_storage_engines = MyISAM,BLACKHOLE,FEDERATED,CSV,ARCHIVE
group_replication_allow_local_disjoint_gtids_join = OFF
group_replication_allow_local_lower_version_join = OFF
group_replication_auto_increment_increment = 7
group_replication_bootstrap_group = OFF
group_replication_components_stop_timeout = 31536000
group_replication_compression_threshold = 1000000
group_replication_enforce_update_everywhere_checks = OFF
group_replication_flow_control_applier_threshold = 25000
group_replication_flow_control_certifier_threshold = 25000
group_replication_flow_control_mode = QUOTA
group_replication_force_members
group_replication_group_name = 4d3473df-948b-11e7-85ba-022185d04910
group_replication_group_seeds
group_replication_gtid_assignment_block_size = 1000000
group_replication_ip_whitelist = AUTOMATIC
group_replication_local_address = dbbox3:11485
group_replication_poll_spin_loops = 0
group_replication_recovery_complete_at = TRANSACTIONS_APPLIED
group_replication_recovery_reconnect_interval = 60
group_replication_recovery_retry_count = 10
group_replication_recovery_ssl_ca
group_replication_recovery_ssl_capath
group_replication_recovery_ssl_cert
group_replication_recovery_ssl_cipher
group_replication_recovery_ssl_crl
group_replication_recovery_ssl_crlpath
group_replication_recovery_ssl_key
group_replication_recovery_ssl_verify_server_cert = OFF
group_replication_recovery_use_ssl = ON
group_replication_single_primary_mode = ON
group_replication_ssl_mode = REQUIRED
group_replication_start_on_boot = ON
group_replication_transaction_size_limit = 0
group_replication_unreachable_majority_timeout = 0
auto_increment_increment = 1
auto_increment_offset = 2
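As a manual workaround I can populate the seed list myself in each node's my.cnf before restarting, e.g. on dbbox3 (assuming the other nodes use the same group replication port pattern as group_replication_local_address above):

group_replication_group_seeds = dbbox1:11485,dbbox2:11485

That should give a restarted member peers to contact, but I'd expect the shell to persist this automatically.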
Is this the expected behavior or is there another way to make this stick?