
Re: 5.1.6-alpha NDB: Could not get apply status share
Posted by: jim shnider
Date: March 28, 2006 04:44PM

My previous post was somewhat naive, but I am still having a problem.

I read a concurrent thread: 'Mysql Cluster: Unable to create table.. Table exists error (ERROR 1050)' by Annapoorani SundarRajan and Gabriel Harriman. At the end of this thread, Gabe explains the cause of his problem: that ndbd was not fully started, but hung in one of its startup phases.

His solution was to reinitialize the ndbd filesystem, then wait to start mysqld until all data nodes were 'started'. This allowed his mysqld to create tables.
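If I understood Gabe correctly, that boils down to roughly the following sequence (the hosts and paths here are from my own setup, not his, so adjust as needed):

<quote>
# on each data node: wipe and rebuild the ndb filesystem
ndbd --initial

# on the mgm host: keep checking until every data node reports 'started'
ndb_mgm -e "all status"

# only then start the SQL node(s)
mysqld_safe &
</quote>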

I was distracted because the API node was not registering as 'connected' with the mgm console. I missed that the data node was still doing its 'starting' thing, and was stuck in Phase 1.

These log entries are generated after starting 'ndbd' while the mgm node is active:

(ndbd log - /var/lib/mysql-cluster/ndb_2_out.log)
<quote>
2006-03-28 15:43:35 [ndbd] INFO -- Angel pid: 3666 ndb pid: 3667
2006-03-28 15:43:35 [ndbd] INFO -- NDB Cluster -- DB node 2
2006-03-28 15:43:35 [ndbd] INFO -- Version 5.0.19 --
2006-03-28 15:43:35 [ndbd] INFO -- Configuration fetched at 192.168.0.20 port 1186
2006-03-28 15:43:36 [ndbd] INFO -- Start initiated (version 5.0.19)
</quote>
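(FWIW, I keep a second terminal tailing that out-file while the node starts, e.g.:)

<quote>
tail -f /var/lib/mysql-cluster/ndb_2_out.log
</quote>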

At this point, the mgm console reports:
<quote>
ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 3 node(s)
id=2 @192.168.0.212 (Version: 5.0.19, starting, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 192.168.0.214)
id=4 (not connected, accepting connect from 192.168.0.216)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.0.20 (Version: 5.1.7)

[mysqld(API)] 3 node(s)
id=5 (not connected, accepting connect from any host)
id=6 (not connected, accepting connect from any host)
id=7 (not connected, accepting connect from any host)

ndb_mgm> 2 status
Node 2: starting (Phase 1) (Version 5.0.19)
</quote>

and the mgmd writes a few lines to its log:

(ndb_mgmd log - $work_dir/ndb_1_cluster.log)
<quote>
2006-03-28 15:46:02 [MgmSrvr] INFO -- Shutdown complete
2006-03-28 15:46:51 [MgmSrvr] INFO -- NDB Cluster Management Server. Version 5.1.7 (beta)
2006-03-28 15:46:51 [MgmSrvr] INFO -- Id: 1, Command port: 1186
2006-03-28 15:47:31 [MgmSrvr] INFO -- Mgmt server state: nodeid 2 reserved for ip 192.168.0.212, m_reserved_nodes 0000000000000006.
2006-03-28 15:47:31 [MgmSrvr] INFO -- Node 1: Node 2 Connected
2006-03-28 15:47:32 [MgmSrvr] INFO -- Mgmt server state: nodeid 2 freed, m_reserved_nodes 0000000000000002.
2006-03-28 15:48:04 [MgmSrvr] INFO -- Node 2: Start phase 1 completed
</quote>

Perhaps fortuitously, I became distracted by other matters while preparing this post. When I returned, I wanted to verify my expectation (from past trials) that the mgm console would report an error when trying to stop the data node:

<quote>
ndb_mgm> 2 stop
Node 2 has shutdown.

ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 3 node(s)
id=2 @192.168.0.212 (Version: 5.0.19, starting, Nodegroup: 0, Master)
id=3 (not connected, accepting connect from 192.168.0.214)
id=4 (not connected, accepting connect from 192.168.0.216)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.0.20 (Version: 5.1.7)

[mysqld(API)] 3 node(s)
id=5 (not connected, accepting connect from any host)
id=6 (not connected, accepting connect from any host)
id=7 (not connected, accepting connect from any host)

ndb_mgm> 2 status
Node 2: starting (Phase 2) (Version 5.0.19)
</quote>

The error I expected was not reported, and the stop operation still failed (the console claimed Node 2 had shut down, but 'show' says otherwise), but the data node had managed to move on to Phase 2 (either because it had been left alone for a long time or because it received the 'stop' command...).

FYI: 'restart' (but not 'stop') works most of the time when the data node is still in Phase 1, and 'killall ndbd' stops ndbd and notifies the mgm node that it is stopping (due to signal 15).

After killall, the ndbd manages to log:
<quote>
2006-03-28 15:20:26 [ndbd] INFO -- Received signal 15. Performing stop.
2006-03-28 15:20:26 [ndbd] INFO -- Shutdown initiated
2006-03-28 15:20:26 [ndbd] INFO -- Shutdown completed - exiting
2006-03-28 15:20:26 [ndbd] INFO -- Angel shutting down
2006-03-28 15:20:26 [ndbd] INFO -- Node 2: Node shutdown completed. Initiated by signal 15.
</quote>
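(For the record, killall is just delivering SIGTERM to both processes; assuming the angel/ndb pids from the out-log earlier in this post, the equivalent targeted command would be something like:)

<quote>
kill -15 3666 3667    # angel pid and ndb pid from ndb_2_out.log
</quote>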

Now I am tempted to just leave the ndbd alone and see if it eventually gets all the way through to 'started'...
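While I wait, I'm polling with a crude loop instead of retyping '2 status' in the console (assuming ndb_mgm is run on the mgm host, so it finds localhost:1186 by default):

<quote>
while true; do ndb_mgm -e "2 status"; sleep 30; done
</quote>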

Quite clearly, I am struggling with the mysteriousness of all this.

Comments? Suggestions?
