Re: Can't connect to MySQL server on '...' (4)
Yes, it is a Linux server, currently running a 2.6.21 kernel. During peak time I captured some of the traffic between my client and the server. I have to say it is fairly heavy. But when I compared it against the times at which my clients got the mysql_real_connect error, I found an error in the network communication.
This is the related conversation, which ends with the error:
54111 15:52:20.857297 192.168.0.6 49508 192.168.0.7 3306 TCP [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSV=959831467 TSER=0 WS=6
54116 15:52:20.857552 192.168.0.7 3306 192.168.0.6 49508 TCP [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSV=660587887 TSER=959831467 WS=6
54117 15:52:20.857564 192.168.0.6 49508 192.168.0.7 3306 TCP [RST] Seq=1 Win=0 Len=0
(I compared the SYN,ACK response from the server with another conversation that was handled correctly; the packets are essentially the same.)
It looks like the client really is closing the TCP connection prematurely. I looked at the code of mysql_real_connect. I am using the libmysqlclient_r (thread-safe) library, version 5.1.31. I found one thing which can cause it. Because I set options.connect_timeout, the socket is set to O_NONBLOCK and the connect() call in client.c:my_connect will always fail with EINPROGRESS. The function then continues (and ends) by calling wait_for_data with the specified timeout:
return wait_for_data(fd, timeout);
}
This is the original wait_for_data():
static int wait_for_data(my_socket fd, uint timeout)
{
#ifdef HAVE_POLL
struct pollfd ufds;
int res;
ufds.fd= fd;
ufds.events= POLLIN | POLLPRI;
if (!(res= poll(&ufds, 1, (int) timeout*1000)))
{
errno= EINTR;
return -1;
}
if (res < 0 || !(ufds.revents & (POLLIN | POLLPRI)))
return -1;
return 0;
... continues with part for systems with no poll()
So when poll() returns -1, wait_for_data returns -1 and mysql_real_connect fails too. But poll() can return -1 not only on a "hard" error but also when a signal arrives and the process or thread has to run the corresponding signal handler. In that case poll() returns -1 with errno set to EINTR, and the caller should call poll() again instead of returning an error.
I changed it to:
static int wait_for_data(my_socket fd, uint timeout)
{
#ifdef HAVE_POLL
struct pollfd ufds;
int res;
ufds.fd= fd;
ufds.events= POLLIN | POLLPRI;
ufds.revents = 0;
do {
res = poll(&ufds, 1, (int) timeout*1000);
} while(res < 0 && errno == EINTR);
if (res <= 0 || !(ufds.revents & (POLLIN | POLLPRI))) /*timeout or error*/
return -1;
return 0;
And now, when I simulate the error (forking into two processes, the first connecting to MySQL, the second sending signals to the first), it looks good. I still have to wait for experience from the real system. But because it is a multi-threaded system, there are many signals there.
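For anyone who wants to see the EINTR behaviour without a MySQL server, here is a minimal sketch of the same kind of simulation. It is not my actual test program: the names poll_interrupted_by_signal and on_sigusr1 are made up for illustration, and a pipe stands in for the MySQL socket (an assumption, just to have a descriptor that never becomes readable). The child process sends SIGUSR1 to the parent while the parent sits in poll().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: only records that the signal arrived. */
static void on_sigusr1(int signo)
{
  (void) signo;
  got_signal = 1;
}

/* Returns 1 if poll() failed with EINTR after the signal was handled,
   0 if it did not, -1 if the setup itself failed. */
static int poll_interrupted_by_signal(void)
{
  int pipefd[2];
  struct sigaction sa;
  struct pollfd ufds;
  struct timespec delay = { 0, 200 * 1000 * 1000 };  /* 200 ms */
  pid_t child;
  int res, saved_errno;

  if (pipe(pipefd) < 0)
    return -1;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_sigusr1;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, NULL) < 0)
    return -1;

  child = fork();
  if (child < 0)
    return -1;
  if (child == 0)
  {
    /* Child: give the parent time to enter poll(), then signal it. */
    nanosleep(&delay, NULL);
    kill(getppid(), SIGUSR1);
    _exit(0);
  }

  /* Parent: nothing is ever written to the pipe, so only the signal
     (or the 5 second timeout) can end this call. */
  ufds.fd = pipefd[0];
  ufds.events = POLLIN | POLLPRI;
  ufds.revents = 0;
  res = poll(&ufds, 1, 5000);
  saved_errno = errno;

  waitpid(child, NULL, 0);
  close(pipefd[0]);
  close(pipefd[1]);

  return (res < 0 && saved_errno == EINTR && got_signal) ? 1 : 0;
}
```

With the original wait_for_data() logic this interrupted poll() would be reported as a failure, even though nothing is wrong with the descriptor.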
I have already looked at the other poll() calls in the client; there is another strange one in net.c:net_data_is_ready.
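One caveat with a plain retry loop: every retry starts poll() again with the full timeout, so with many signals the total wait can exceed connect_timeout. A possible refinement, which is only a sketch and not part of the patch above, is to retry with the time that is left. The helper names poll_retry_eintr and now_ms are hypothetical, and I am assuming clock_gettime() with CLOCK_MONOTONIC is available on the platform.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Hypothetical helper: monotonic clock in milliseconds. */
static long long now_ms(void)
{
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: poll() one descriptor, retrying on EINTR with
   only the remaining part of the timeout.
   Returns >0 if the descriptor is ready, 0 on timeout,
   -1 on any error other than EINTR (errno is left as poll() set it). */
static int poll_retry_eintr(struct pollfd *ufds, int timeout_ms)
{
  long long deadline;
  int remaining = timeout_ms;
  int res;

  assert(timeout_ms >= 0);
  deadline = now_ms() + timeout_ms;

  for (;;)
  {
    res = poll(ufds, 1, remaining);
    if (res >= 0 || errno != EINTR)
      return res;

    /* Interrupted by a signal: wait only for what is left. */
    remaining = (int) (deadline - now_ms());
    if (remaining < 0)
      remaining = 0;
  }
}
```

In wait_for_data() this would be called as poll_retry_eintr(&ufds, (int) timeout*1000), followed by the same check of ufds.revents as before.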