One Thousand and One Solutions for Oracle Fusion Middleware
One Thousand and One Solutions for Oracle Fusion Middleware is a series of articles about my experiences with different technologies in the Oracle Fusion Middleware (OFM) area. In each article of One Thousand and One Solutions for OFM, I try to summarize one specific issue and then present a solution that can help you. Moreover, I point you to available references regarding the issue, so you can study them on your own whenever you have a little bit of time.
WebLogic Server 10.3, Cluster, Exception: Too many open files, Socket, Linux, SLES, Error Message: <BEA-002616> <Failed to listen on channel “Default”…>
<Critical> <Server> <BEA-002616> <Failed to listen on channel “Default” on IP_NR:Port, failure count: 1, failing for 0 seconds, java.net.SocketException: Too many open files>
WebLogic’s description of this message, from: http://docs.oracle.com/cd/E23549_01/apirefs.1111/e14397/Server.html
Critical: Failed to listen on channel “channel” on listenAddress:port, failure count: fails1, failing for secs2 seconds, e3
- Description: The server listener will retry the listen after a short delay.
- Cause: The server got an exception while trying to accept client connections. It will try to backoff to aid recovery.
- Action: The OS limit for the number of open file descriptor (FD limit) needs to be increased. Tune OS parameters that might help the server to accept more client connections (e.g. TCP accept back log).
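On Linux, the kernel-side limits that this Action refers to can be inspected read-only via /proc. A minimal sketch (the /proc paths are standard Linux; the appropriate values depend on your distribution and Oracle’s sizing recommendations):

```shell
# Inspect Linux kernel limits relevant to "Too many open files"
# (read-only; no root required)
cat /proc/sys/fs/file-max          # system-wide maximum number of file handles
cat /proc/sys/fs/file-nr           # allocated, unused, maximum file handles
cat /proc/sys/net/core/somaxconn   # ceiling for the TCP accept backlog
```

A persistent change is usually made in /etc/sysctl.conf (e.g. a fs.file-max entry) and applied with sysctl -p; consult your distribution’s documentation for the recommended values.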
More detail and Background information:
Which OS parameters to tune depends on the operating system you use. For Linux, kernel parameters may need adjustment; the exact details depend on the distribution. There are also individual user limits for open files: you can use the ulimit -a command, run as the OS user that owns the server process, to display them.
In the output, look for the “open files” entry (or a similar one). Please consider that on AIX, the per-user limits apply as well as a system-wide OS configuration for the total number of open files allowed on the host.
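As a sketch of the per-user side on Linux (the user name “weblogic” and the value 65600 below are examples, not taken from this case):

```shell
# Show the soft and hard limits for open file descriptors (nofile)
ulimit -Sn
ulimit -Hn

# A shell may raise its own soft limit up to the hard limit:
ulimit -Sn "$(ulimit -Hn)"

# A permanent change is typically made in /etc/security/limits.conf, e.g.:
#   weblogic  soft  nofile  65600
#   weblogic  hard  nofile  65600
# where "weblogic" is the example OS user that owns the server process.
```

Note that limits.conf entries take effect on the next login of that user; a running Managed Server must be restarted from a session that already has the new limit.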
Here is an example of how you can check the configuration and isolate the source of the error.
1- Check Linux configuration (e.g. on host XXX):
> ulimit -a
core file size (blocks, -c) 1
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127575
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) 13887980
open files (-n) 65600 → OK
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 127575
virtual memory (kbytes, -v) 17749200
file locks (-x) unlimited
The open file descriptor limit is at 65600 as recommended by Oracle.
2- Check currently open files
- Find PID of Managed Server
> ps -fea|grep myManagedServer
- Check open files
Please use lsof (list open files), which lists information about files opened by processes.
You can see the list of open files via (here the PID is, e.g., 1234):
> lsof -p 1234
To find out the number of open files:
> lsof -p 1234 | wc -l
In this case, we observed that the application had about 13,000 connections in “CLOSE_WAIT” state. This usually happens when the application doesn’t close its connections properly: the remote end has shut down, and the local socket is waiting to be closed.
That is why the process was reaching the 65600 limit.
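A sketch of how such CLOSE_WAIT sockets can be counted. The PID defaults to the current shell so the commands run as-is; replace it with your Managed Server’s PID. The use of ss is an assumption for modern Linux; older SLES releases would use netstat instead:

```shell
#!/bin/sh
# Count open file descriptors and CLOSE_WAIT sockets for a process.
# PID defaults to the current shell; replace it with the Managed Server PID.
PID=${1:-$$}

# Open file descriptors, read from /proc (roughly what `lsof -p $PID | wc -l`
# reports, minus lsof's header and non-fd rows)
echo "open fds: $(ls /proc/$PID/fd | wc -l)"

# CLOSE_WAIT sockets held by this process, via lsof
echo "CLOSE_WAIT (lsof): $(lsof -p "$PID" 2>/dev/null | grep CLOSE_WAIT | wc -l)"

# All CLOSE_WAIT TCP sockets on the host; ss replaces netstat on modern
# Linux (on older SLES: netstat -tan | grep CLOSE_WAIT | wc -l)
echo "CLOSE_WAIT (host): $(ss -tan state close-wait 2>/dev/null | tail -n +2 | wc -l)"
```

If the per-process CLOSE_WAIT count keeps growing between runs, the leak is in that process rather than in a peer.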
After further analysis of this case, it became clear that the issue was a result of the WebLogic cluster configuration:
the “open files” were sockets for cluster communication between the two nodes, waiting to be closed. See also the related message <BEA-003108> <Unicast receive error: java.io.EOFException>.
- Additional information regarding <BEA-002616> is available on My Oracle Support (MOS): WebLogic Server Support Pattern: How To Troubleshoot Too Many Open Files Problems [ID 867492.1]