
weblogic.server.ServerLifecycleException: Cannot get to the relevant ServerRuntimeMBean for server MGSRV

Recently, I got the following error:

weblogic.server.ServerLifecycleException: Can not get to the relevant ServerRuntimeMBean for server MGSRV.

weblogic.management.scripting.ScriptException: Error occured while performing shutdown : Error shutting down the server : Can not get to the relevant ServerRuntimeMBean for server MGSRV.

Use dumpStack() to view the full stacktrace

 

It happened when I tried to shut down a managed server with the WebLogic Server script “stopManagedWebLogic.sh”. In addition, Node Manager was not able to shut down the managed server from the Administration Console.

While investigating the issue, I came across the following Oracle document:

WebLogic Managed Server shutdown failing when using SSL t3s Admin Server address as ADMIN_URL (Doc ID 851065.1)

The main reason for this issue is: “JMX clients using secure protocols were not able to invoke operations on MBeans registered in the Domain Runtime MBeanServer of the AdminServer. Authentication of JMX clients was not correctly performed.”

 

It appears to be a bug (Bug 8359946):

“Not able to shutdown the managed serves using t3s protocol in the WLST scripts. The same script with t3 protocol is working fine. Adminserver is shutting down fine with either protocol. Not able to shutdown the managed server using WLST. Getting, connecting to successfully connected to Admin Server ‘adminserver’ that belongs to domain…”

You can reproduce it with the following steps:

1) Create a domain for WLS 10.3 with 1 Admin and 1 Managed server.

2) Enable SSL and configure the Demo Identity and Demo Trust keystores on both the Admin and the Managed server.

3) Set ADMIN_URL=t3s://localhost:7002 and add -Dweblogic.security.TrustKeyStore=DemoTrust.

4) Start the Admin server.

5) Start the Managed server.

6) Stop the Managed server. The shutdown fails with the error shown above.

 

What is the solution?

Quick and dirty solution:

You can kill the managed server process with kill -9 <pid> of MGSRV and then start and stop it from the Administration Console through Node Manager.

  • Advantage: You have a uniform start/stop method for the managed servers via the Administration Console.
  • Disadvantage: Automatic start/stop of the managed server(s) via scripts and the operating system is not possible.
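A minimal sketch of how the <pid> for the kill -9 step can be found on Linux. It assumes MGSRV was started with -Dweblogic.Name=MGSRV on its java command line, as the standard WebLogic start scripts do; the function name find_wls_pid is hypothetical. Unlike ps -fea | grep, it reads /proc directly and compares whole arguments, so it does not match its own search command or a server whose name merely starts with MGSRV.

```shell
#!/bin/sh
# Sketch (Linux only, hypothetical helper): print the PID(s) of the process
# whose command line contains the argument -Dweblogic.Name=<server>.
find_wls_pid() {
    name=$1
    for dir in /proc/[0-9]*; do
        # /proc/<pid>/cmdline stores the arguments separated by NUL bytes.
        # Turn them into lines and require one argument to match exactly,
        # so MGSRV does not also match MGSRV2. Errors from processes that
        # exit during the scan are discarded.
        if { tr '\0' '\n' < "$dir/cmdline"; } 2>/dev/null \
                | grep -qxF -- "-Dweblogic.Name=$name"; then
            echo "${dir#/proc/}"
        fi
    done
}
```

For example, pid=$(find_wls_pid MGSRV) stores the PID; check that $pid is not empty and belongs to the right server before running kill -9 $pid.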

Oracle solution:

The bug is fixed in 12.1.1. For earlier WebLogic Server versions, patches are available:

 

WLS Version    Patch Number
10.0.1         Patch 8589531
10.0.2         Patch 8359946
10.3.0         Patch 8359946

 

Please note that patches are applied per WLS installation and not per domain. That is, if you apply this patch to one WLS installation, all servers of all domains in that installation will have the patch. On the other hand, if a managed server of the domain runs on another machine with its own WLS installation, you need to install the patch on that machine as well. Generally, patches can only be applied while the server is not running, because WLS locks the needed files while it is running. If you are nevertheless able to apply a patch while WLS is running, you must restart WLS before the patch takes effect.


BEA-003108: Unicast receive error: java.io.EOFException

One Thousand and One Solutions for Oracle Fusion Middleware

One Thousand and One Solutions for Oracle Fusion Middleware is a series of articles about my experiences with different technologies in Oracle Fusion Middleware (OFM) area. In the articles of One Thousand and One Solutions for OFM, I try to summarize one exact issue and then present a solution that can help you. Moreover, I introduce you to available references regarding the issue. You can study them on your own whenever you have a little bit of time.

Technology, KeyWords:

WebLogic Server 10.3, Cluster, Unicast, Exception; java.io.EOFException

Error Message:

<BEA-003108> <Unicast receive error : java.io.EOFException>

Problem:

If I restart one Managed Server in a Cluster, I get the following error:

<Error> <Cluster> <BEA-003108> <Unicast receive error : java.io.EOFException
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at java.io.ObjectInputStream$BlockDataInputStream.readLong(ObjectInputStream.java:2799)
    at java.io.ObjectInputStream.readLong(ObjectInputStream.java:960)
    at weblogic.cluster.HeartbeatMessage.readExternal(HeartbeatMessage.java:55)
Truncated. see log file for complete stacktrace>

Description

Oracle's description of the message, from: http://docs.oracle.com/cd/E17904_01/apirefs.1111/e14397/ClusterExtension.html

BEA-003108

Error: Unicast receive error: e

  •  Description: An error occurred while trying to receive a message over the cluster broadcast.
  •  Cause: An error occurred while trying to receive a message over the cluster broadcast.
  •  Action: Make sure that the NIC is functioning properly. If you believe no environment problems exist, contact Oracle Customer Support and provide the stack trace for further analysis.

More detail and Background information:

Note: Sometimes you also get the error message <BEA-002616> <Failed to listen on channel “Default”…>, which can be a consequence of error BEA-003108.

The issue <BEA-003108> <Unicast receive error : java.io.EOFException> occurs when a unicast cluster is used and the cluster debug flags DebugCluster and/or DebugClusterHeartBeats are turned on for some of the managed servers, but not for all of them.

The issue also occurs when the debug flags are turned on for the servers that already existed in the cluster, but not for newly added servers.

Solution

To resolve this issue, the cluster debug flags must be set consistently: either disabled on all servers of the cluster or enabled on all of them. In a production environment, debugging should be disabled; enable the cluster debug flags only while debugging a problem with the unicast cluster.

Solution via AdminServer

Click on Domain Structure and select your Server:


Select the debugging tab.

Select the items that you want to enable or disable, e.g. disable WebLogic debugging by clicking “Disable” (German: „deaktivieren“).


Check the log files to verify whether the issue is solved.
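A small shell sketch can count the BEA-003108 entries per server log and show the most recent one. It assumes the default log location $DOMAIN_HOME/servers/<server>/logs/<server>.log; the function name check_bea_003108 is hypothetical. Since old entries stay in the log, compare the last reported entry with the time you changed the debug flags.

```shell
#!/bin/sh
# Sketch (hypothetical helper): for each log file given as an argument,
# print how many BEA-003108 entries it contains and, if any, the last one.
check_bea_003108() {
    for log in "$@"; do
        if [ ! -r "$log" ]; then
            printf '%s: not readable\n' "$log"
            continue
        fi
        # grep -c prints 0 and exits with status 1 when nothing matches
        count=$(grep -c 'BEA-003108' "$log" || true)
        printf '%s: %s\n' "$log" "$count"
        if [ "$count" -gt 0 ]; then
            printf '  last: %s\n' "$(grep 'BEA-003108' "$log" | tail -n 1)"
        fi
    done
}
```

Usage: check_bea_003108 "$DOMAIN_HOME"/servers/*/logs/*.log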

BEA-002616: Failed to listen on channel on listenAddress:port

Technology, KeyWords:

WebLogic Server 10.3, Cluster, Exception; Too many open files, Socket, Linux, SLES, Error Message: <BEA-002616> <Failed to listen on channel “Default”…>

 Problem:

<Critical> <Server> <BEA-002616> <Failed to listen on channel “Default” on IP_NR:Port, failure count: 1, failing for 0 seconds, java.net.SocketException: Too many open files>

 Description

Oracle's description of the message, from: http://docs.oracle.com/cd/E23549_01/apirefs.1111/e14397/Server.html

BEA-002616

Critical: Failed to listen on channel “channel” on listenAddress:port, failure count: fails1, failing for secs2 seconds, e3

  •  Description: The server listener will retry the listen after a short delay.
  •  Cause: The server got an exception while trying to accept client connections. It will try to backoff to aid recovery.
  •  Action: The OS limit for the number of open file descriptor (FD limit) needs to be increased. Tune OS parameters that might help the server to accept more client connections (e.g. TCP accept back log).

 More detail and Background information:

How to “tune OS parameters” depends on the OS you use. On Linux, kernel parameters need adjustment; the exact details depend on the distribution. In addition, there are per-user limits for open files. You can use the `ulimit -a` command to find out the limits for the user that owns the WebLogic Server process.

In the output, look for the relevant parameters, e.g. the “open files” entry or a similar one. Note that on AIX, both the user limits and a system-wide OS setting for the total number of open files allowed on the host apply.

Solution

Here is an example of how you can check the limits and isolate the cause of the error.

1- Check the Linux configuration (e.g. on host XXX):

> ulimit -a

core file size          (blocks, -c) 1
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127575
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 13887980
open files                      (-n) 65600   (OK)
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127575
virtual memory          (kbytes, -v) 17749200
file locks                      (-x) unlimited

The open file descriptor limit is at 65600 as recommended by Oracle.

2- Check the currently open files

  • Find the PID of the managed server:

> ps -fea | grep myManagedServer

  • Check the open files:

Use lsof (list open files), which lists information about the files opened by processes.

You can list the open files of the process with (here, for example, the PID is 1234):

> lsof -p 1234

To find out the number of open files:

> lsof -p 1234 | wc -l
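lsof -p also prints a header line and entries that are not file descriptors (cwd, txt, memory-mapped libraries), so lsof -p 1234 | wc -l slightly overstates the count. On Linux, a minimal sketch can read /proc instead: /proc/<pid>/fd has one entry per open descriptor, and /proc/<pid>/limits shows the “Max open files” limit that applies to the running process, which can differ from what ulimit -a reports in your own shell. The function name fd_usage is hypothetical; run it as the owner of the process or as root.

```shell
#!/bin/sh
# Sketch (Linux only, hypothetical helper): count the descriptors a process
# really has open and show the "Max open files" soft limit of that process.
fd_usage() {
    pid=$1
    # Every entry in /proc/<pid>/fd is one open descriptor (file, socket, pipe)
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # /proc/<pid>/limits line: "Max open files  <soft>  <hard>  files"
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid=$pid open=$open limit=$soft"
}
```

For example, fd_usage 1234 prints a line of the form pid=1234 open=<count> limit=<soft limit>.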

In this case, we observed that the application had 13,000 connections in “CLOSE_WAIT” state. This usually happens when the application does not close its connections properly:

CLOSE_WAIT

The remote end has shut down, waiting for the socket to close.

That is why the process was reaching the 65600 limit.
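To see how many sockets are in CLOSE_WAIT, a small sketch can summarize the TCP states in the lsof output. It assumes lsof's usual format, in which a TCP socket line has TCP in the NODE column and the connection state in parentheses as the last field; the function name tcp_states is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): summarize TCP states in `lsof -p <pid>`
# output read from stdin, e.g. a line such as
#   java 1234 oracle 57u IPv4 ... 0t0 TCP host1:7001->host2:40001 (CLOSE_WAIT)
tcp_states() {
    awk 'NF >= 3 && $(NF-2) == "TCP" && $NF ~ /^\(.*\)$/ {
             state = substr($NF, 2, length($NF) - 2)   # drop "(" and ")"
             count[state]++
         }
         END { for (s in count) print s, count[s] }' | sort
}
```

Usage: lsof -p 1234 | tcp_states prints one line per TCP state with its count, so a large CLOSE_WAIT figure stands out immediately.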

***

After further analysis of this case, it became clear that the issue was a result of the WebLogic cluster configuration.

The “open files” were sockets of the cluster communication between two nodes that were waiting to be closed. See <BEA-003108> <Unicast receive error: java.io.EOFException>.

References