web123456

zookeeper Unable to read additional data from client sessionid

2016-08-18 16:25:11,412 [myid:3] - INFO  [:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.9.40:47606 (no session established for client)
2016-08-18 16:25:14,410 [myid:3] - INFO  [:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@192] - Accepted socket connection from /192.168.9.40:47610
2016-08-18 16:25:14,410 [myid:3] - WARN  [:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket
        at (:230)
        at (:203)
        at (:745)
2016-08-18 16:25:14,410 [myid:3] - INFO  [:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.9.40:47610 (no session established for client)

After carefully checking the contents in the log, I found that many connections were established and immediately disconnected:

Judging from the time of log output, it is broken in seconds.


Check network connection status using netstat

Check the connection status on the client's server:

netstat -antp | grep 2181  

I found that there are many connections with TIME_WAIT state:

tcp        0      0 192.168.9.45:44269         192.168.9.40:2181         TIME_WAIT   -                     
tcp        0      0 192.168.9.45:43646         192.168.9.40:2181         TIME_WAIT   -                     
tcp        0      0 192.168.9.45:44184         192.168.9.40:2181         TIME_WAIT   -                     
tcp        0      0 192.168.9.45:44026         192.168.9.40:2181         TIME_WAIT   -                     
tcp        0      0 192.168.9.45:43766         192.168.9.40:2181         TIME_WAIT   -    


However, the process number cannot be seen in the connection with TIME_WAIT status. After searching and studying the parameters of netstat, I found that there was no way to output the pid of the connection with the TIME_WAIT status, so I had to try other methods.

Then use jstack -l pid to view the process's thread stack, and no exception was found. I saw several threads connected to zookeeper, but they were also normal.
For example /usr/jdk64/jdk1.7.0_67/bin/jstat -gcutil 221470 1000

I checked the machine's IO, CPU, and memory, and there were no exceptions.

I found no useful information, so I had to study the parameters of netstat:
I found that using netstat -ae output some information:

tcp        0      0 192.168.9.45:41772     192.168.9.40:eforward     TIME_WAIT   root       0            
tcp        0      0 192.168.9.45:41412     192.168.9.40:eforward     TIME_WAIT   root       0            
tcp        0      0 192.168.9.45:24226     192.168.9.40:2181         TIME_WAIT   root       0            
tcp        0      0 192.168.9.45:24623     192.168.9.40:2181         TIME_WAIT   root       0  


Start adding Zookeeper's link settings:
tickTime=2000
Change to tickTime=10000
Observation has no effect.


So I tried to stop the java program and found that there was no TIME_WAIT connection.
However, it was confirmed that it was a problem with the Java application, so I restarted the Java application, but no TIME_WAIT situation was found again.

Reference: /hengyunabc/article/details/41450003