| Message |
Experienced a %n1 ms communication delay (probable remote GC) with Member %s |
| Parameters |
%n1 - the latency in milliseconds of the communication delay; %s - the full Member information |
| Severity |
2-Warning, 5-Debug Level 5, or 6-Debug Level 6, depending on the length of the delay |
| Cause |
This node detected a delay in receiving acknowledgment packets from the specified node, and has determined that it is likely due to a remote GC (rather than a local GC). This message indicates that the overdue acknowledgment has been received from the specified node, and that the node has likely emerged from its GC. |
| Action |
Prolonged and frequent GCs can adversely affect cluster performance and availability. If these warnings are seen frequently, review your JVM heap and GC configuration and tuning. See the performance tuning guide for more details. |
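To judge whether these warnings are "frequent", the log can be scanned for the message text. The following is a minimal sketch, assuming the message appears in the log exactly as documented above; the function name `summarize_gc_delays` and the sample lines are hypothetical, and any timestamp or severity prefix on real log lines depends on your logging configuration.

```python
import re

# Matches the documented message text: %n1 is the delay in ms,
# and everything after "with " is treated as the Member information.
GC_DELAY = re.compile(
    r"Experienced a (\d+) ms communication delay "
    r"\(probable remote GC\) with (.+)"
)

def summarize_gc_delays(lines):
    """Return {member_info: (count, longest_delay_ms)} for matching lines."""
    stats = {}
    for line in lines:
        match = GC_DELAY.search(line)
        if match is None:
            continue
        delay_ms = int(match.group(1))
        member = match.group(2).strip()
        count, longest = stats.get(member, (0, 0))
        stats[member] = (count + 1, max(longest, delay_ms))
    return stats

# Hypothetical sample lines; real Member information carries more fields.
sample = [
    "Experienced a 4172 ms communication delay (probable remote GC) with Member(Id=7)",
    "Experienced a 950 ms communication delay (probable remote GC) with Member(Id=7)",
    "Service DistributedCache joined the cluster with senior service member 1",
]
for member, (count, longest) in summarize_gc_delays(sample).items():
    print(member, count, longest)
```

A member that shows up repeatedly with long delays is a reasonable first candidate for a heap and GC review.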
| Message |
Failed to satisfy the variance: allowed=%n1 actual=%n2 |
| Parameters |
%n1 - the maximum allowed latency in milliseconds; %n2 - the actual latency in milliseconds |
| Severity |
3-Informational or 5-Debug Level 5 depending on the message frequency |
| Cause |
One of the first steps in the Coherence cluster discovery protocol is the calculation of the clock difference between the new and the senior nodes. This step assumes a relatively small latency for peer-to-peer round-trip UDP communications between the nodes. By default, the configured maximum allowed latency (the value of the "maximum-time-variance" configuration element) is 16 milliseconds. Failure to satisfy that latency causes this message to be logged and increases the latency threshold, which will be reflected in a follow-up message. |
| Action |
If the latency consistently stays very high (over 100 milliseconds), consult your network administrator and run the Datagram Test. |
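The reason a small round-trip latency matters for a clock difference calculation can be illustrated with a generic round-trip estimate. This is a sketch of the textbook approach (half the round trip is attributed to each direction), not Coherence's implementation; the function name `estimate_clock_offset` and all timestamps are hypothetical, and only the 16 ms default comes from the text above.

```python
def estimate_clock_offset(sent_at_ms, senior_time_ms, received_at_ms,
                          max_variance_ms=16):
    """Estimate (senior clock - local clock) from one request/response pair.

    sent_at_ms and received_at_ms are local clock readings; senior_time_ms
    is the senior's clock reading carried in the response. Returns None
    when the round trip exceeds max_variance_ms, because the estimate's
    error can be as large as half the round trip.
    """
    round_trip_ms = received_at_ms - sent_at_ms
    if round_trip_ms > max_variance_ms:
        return None  # analogous to "Failed to satisfy the variance"
    # Assume the response spent half the round trip in flight.
    return senior_time_ms + round_trip_ms / 2 - received_at_ms

# A 10 ms round trip is accepted; a 40 ms round trip is rejected.
print(estimate_clock_offset(1000, 5010, 1010))
print(estimate_clock_offset(1000, 5010, 1040))
```

The longer the round trip, the less is known about when the senior actually read its clock, which is why a high latency makes the estimate unusable until the allowed threshold is raised.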
| Message |
Created a new cluster "%s1" with Member(%s2) |
| Parameters |
%s1 - the cluster name; %s2 - the full Member information |
| Severity |
3-Informational |
| Cause |
This Coherence node attempted to join an existing cluster for the configured amount of time (specified by the "multicast-listener/join-timeout-milliseconds" element), but did not receive any responses from any other node. As a result, it created a new cluster with the specified name (either configured by the "member-identity/cluster-name" element or calculated based on the multicast listener address and port or the "well-known-address" list). The Member information includes the node id, creation timestamp, unicast address and port, location, process id, role, etc. |
| Action |
None, if this node is expected to be the first node in the cluster. Otherwise, the operational configuration has to be reviewed to determine the reason that this node does not join the existing cluster. |
| Message |
This Member(%s1) joined cluster "%s2" with senior Member(%s3) |
| Parameters |
%s1 - the full Member information for this node; %s2 - the cluster name; %s3 - the full Member information for the cluster senior node |
| Severity |
3-Informational |
| Cause |
This Coherence node has joined an existing cluster. |
| Action |
None, if this node is expected to join an existing cluster. Otherwise, identify the running cluster and consider corrective actions. |
| Message |
Member(%s) joined Cluster with senior member %n |
| Parameters |
%s - the full Member information for a new node that joined the cluster this node belongs to; %n - the node id of the cluster senior node |
| Severity |
5-Debug Level 5 |
| Cause |
A new node has joined an existing Coherence cluster. |
| Action |
None. |
| Message |
Member(%s) left Cluster with senior member %n |
| Parameters |
%s - the full Member information for a node that left the cluster; %n - the node id of the cluster senior node |
| Severity |
5-Debug Level 5 |
| Cause |
A node has left the cluster. This departure could be caused by a programmatic shutdown, process termination (normal or abnormal), or a communication failure (e.g. a network disconnect or a very long GC pause). This message reports the node's departure. |
| Action |
None, if the node departure was intentional. Otherwise, the departed node logs should be analyzed. |
| Message |
MemberLeft notification for Member %n received from Member(%s) |
| Parameters |
%n - the node id of the departed node; %s - the full Member information for the node that reported the departure |
| Severity |
5-Debug Level 5 |
| Cause |
When a Coherence node terminates, this departure is detected by some nodes earlier than others. Most commonly, a node connected via the TCP ring connection ("TCP ring buddy") would be the first to detect it. This message provides the information about the node that detected the departure first. |
| Action |
None, if the node departure was intentional. Otherwise, the logs for both the departed and the detecting nodes should be analyzed. |
| Message |
Service %s joined the cluster with senior service member %n |
| Parameters |
%s - the service name; %n - the senior service member id |
| Severity |
5-Debug Level 5 |
| Cause |
When a clustered service starts on a given node, Coherence initiates a handshake protocol between all cluster nodes running the specified service. This message serves as an indication that this protocol has been initiated. If the senior node is not known at this time, it will be shown as "n/a". |
| Action |
None. |
| Message |
This node appears to have partially lost the connectivity: it receives responses from MemberSet(%s1) which communicate with Member(%s2), but is not responding directly to this member; that could mean that either requests are not coming out or responses are not coming in; stopping cluster service. |
| Parameters |
%s1 - set of members that can communicate with the member indicated in %s2; %s2 - member that can communicate with set of members indicated in %s1 |
| Severity |
1-Error |
| Cause |
The communication link between this member and the member indicated by %s2 has been broken. However, the set of witnesses indicated by %s1 report no communication issues with %s2. It is therefore assumed that this node is in a state of partial failure, thus resulting in the shutdown of its cluster threads. |
| Action |
Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency). |
| Message |
validatePolls: This senior encountered an overdue poll, indicating a dead member, a significant network issue or an Operating System threading library bug (e.g. Linux NPTL): Poll |
| Parameters |
none |
| Severity |
2-Warning |
| Cause |
When a node joins a cluster, it performs a handshake with each cluster node. A missing handshake response prevents this node from joining the service. The log message following this one will indicate the corrective action taken by this node. |
| Action |
If this message reoccurs, further investigation into the root cause may be warranted. |
| Message |
Received panic from senior Member(%s1) caused by Member(%s2) |
| Parameters |
%s1 - the cluster senior member as known by this node; %s2 - a member claiming to be the senior member |
| Severity |
1-Error |
| Cause |
This occurs after a cluster is split into multiple cluster islands (usually due to a network link failure). When a link is restored and the corresponding island seniors see each other, the panic protocol is initiated to resolve the conflict. |
| Action |
If this issue occurs frequently, the root cause of the cluster split should be investigated. |
|
| Message |
Member %n1 joined Service %s with senior member %n2 |
| Parameters |
%n1 - an id of the Coherence node that joins the service; %s - the service name; %n2 - the senior node for the service |
| Severity |
5-Debug Level 5 |
| Cause |
When a clustered service starts on any cluster node, Coherence initiates a handshake protocol between all cluster nodes running the specified service. This message serves as an indication that the specified node has successfully completed the handshake and joined the service. |
| Action |
None. |
| Message |
Member %n1 left Service %s with senior member %n2 |
| Parameters |
%n1 - an id of the Coherence node that leaves the service; %s - the service name; %n2 - the senior node for the service |
| Severity |
5-Debug Level 5 |
| Cause |
When a clustered service terminates on some cluster node, all other nodes that run this service are notified about this event. This message serves as an indication that the specified clustered service at the specified node has terminated. |
| Action |
None. |
| Message |
Service %s: received ServiceConfigSync containing %n entries |
| Parameters |
%s - the service name; %n - the number of entries in the service configuration map |
| Severity |
5-Debug Level 5 |
| Cause |
As a part of the service handshake protocol between all cluster nodes running the specified service, the service senior member updates every new node with the full content of the service configuration map. For the partitioned cache services that map includes the full partition ownership catalog and internal ids for all existing caches. That same message is sent in the case of an abnormal service termination at the senior node, when a new node assumes the service seniority. This message serves as an indication that the specified node has received that configuration update. |
| Action |
None. |
| Message |
TcpRing: connecting to member %n using TcpSocket{%s} |
| Parameters |
%s - the full information for the TcpSocket that serves as a TcpRing connector to another node; %n - the node id to which this node has connected |
| Severity |
5-Debug Level 5 |
| Cause |
For quick process termination detection, Coherence uses a feature called TcpRing, which is a sparse collection of TCP/IP-based connections between different nodes in the cluster. Each node in the cluster is connected to at least one other node, which (if at all possible) is running on a different physical box. This connection is not used for any data transfer; only trivial "heartbeat" communications are sent once per second over each link. This message indicates that the connection between this node and the specified node has been initialized. |
| Action |
None. |
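The "different physical box" preference described above can be illustrated with a small sketch. This is not Coherence's selection algorithm, which is not described here; it only shows one way such a preference could work. The function name `pick_buddy`, the ordering by node id, and the host names are all assumptions made for illustration.

```python
def pick_buddy(node_id, members):
    """Pick the node that node_id would connect to in a simple ring.

    members maps node id -> host name. Candidates are visited in ring
    order starting after node_id; the first one on a different host wins,
    and the next node in the ring is the fallback when every member
    shares the same host. Returns None for a single-node cluster.
    """
    ids = sorted(members)
    if len(ids) < 2:
        return None
    start = ids.index(node_id)
    ring = ids[start + 1:] + ids[:start]  # everyone else, in ring order
    for candidate in ring:
        if members[candidate] != members[node_id]:
            return candidate
    return ring[0]

# Hypothetical cluster: nodes 1 and 2 share a host, node 3 is elsewhere.
cluster = {1: "host-a", 2: "host-a", 3: "host-b"}
for node in cluster:
    print(node, "->", pick_buddy(node, cluster))
```

Preferring a buddy on another machine means that the loss of a whole box is still noticed by a surviving node.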
| Message |
Rejecting connection to member %n using TcpSocket{%s} |
| Parameters |
%n - the node id that tries to connect to this node; %s - the full information for the TcpSocket that serves as a TcpRing connector to another node |
| Severity |
4-Debug Level 4 |
| Cause |
Sometimes the TCP Ring daemons running on different nodes could attempt to join each other or the same node at the same time. In this case, the receiving node may determine that such a connection would be redundant and reject the incoming connection request. This message is logged by the rejecting node when this happens. |
| Action |
None. |
| Message |
Timeout while delivering a packet; requesting the departure confirmation for Member(%s1) by MemberSet(%s2) |
| Parameters |
%s1 - the full Member information for a node that this node failed to communicate with; %s2 - the full information about the "witness" nodes that are asked to confirm the suspected member departure |
| Severity |
2-Warning |
| Cause |
Coherence uses UDP for all data communications (mostly peer-to-peer unicast), which by itself does not have any delivery guarantees. Those guarantees are built into the cluster management protocol used by Coherence (TCMP). The TCMP daemons are responsible for acknowledgment (ACK or NACK) of all incoming communications. If one or more packets are not acknowledged within the ACK interval ("ack-delay-milliseconds"), they are resent. This repeats until the packets are finally acknowledged or the timeout interval elapses ("timeout-milliseconds"). At this time, this message is logged and the "witness" protocol is engaged, asking other cluster nodes whether or not they experience similar communication delays with the non-responding node. The witness nodes are chosen based on their roles and location. |
| Action |
Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency). |
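The resend-until-acknowledged-or-timeout cycle described in the Cause above can be sketched as a small simulation. This is a simplified model, not TCMP itself: it checks for an acknowledgment only at resend boundaries, and the function name `deliver` and the interval and timeout values are hypothetical rather than Coherence defaults.

```python
def deliver(ack_arrives_at_ms, resend_interval_ms, timeout_ms):
    """Simulate delivery of one packet and return (outcome, resend_count).

    ack_arrives_at_ms is when the acknowledgment would arrive, measured
    from the first send, or None if it never arrives.
    """
    if resend_interval_ms <= 0:
        raise ValueError("resend_interval_ms must be positive")
    elapsed_ms = 0
    resends = 0
    while True:
        elapsed_ms += resend_interval_ms
        if ack_arrives_at_ms is not None and ack_arrives_at_ms <= elapsed_ms:
            return "acknowledged", resends
        if elapsed_ms >= timeout_ms:
            # This is the point at which the warning above would be
            # logged and the witness protocol engaged.
            return "timeout", resends
        resends += 1  # still unacknowledged: send the packet again

print(deliver(450, 200, 1000))   # a late acknowledgment
print(deliver(None, 200, 1000))  # no acknowledgment at all
```

The model makes the trade-off visible: a shorter resend interval produces more resends before the same timeout, while a longer timeout delays the point at which witnesses are consulted.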
| Message |
This node appears to have become disconnected from the rest of the cluster containing %n nodes. All departure confirmation requests went unanswered. Stopping cluster service. |
| Parameters |
%n - the number of other nodes in the cluster this node was a member of |
| Severity |
1-Error |
| Cause |
Sometimes a node that lives within a valid Java process stops communicating with other cluster nodes. Possible reasons include: a) a network failure; b) an extremely long GC pause; c) a swapped-out process. In that case, other cluster nodes may choose to revoke the cluster membership from the paused node and completely shun any further communication attempts by that node, causing this message to be logged when the process attempts to resume cluster communications. |
| Action |
Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency). |
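The witness-related messages in this section (partially lost connectivity, departure confirmation request, and disconnected from the rest of the cluster) can be read as outcomes of one decision made by the node that timed out. The sketch below is an interpretation of those descriptions, not Coherence's code. The function name `resolve_suspicion` and the returned strings are hypothetical, and the handling of mixed witness answers (any witness that still reaches the suspect counts against this node) is an assumption.

```python
def resolve_suspicion(witness_reports):
    """Decide what a node does after failing to reach a suspect member.

    witness_reports maps witness id -> True if that witness can still
    communicate with the suspect, False if it cannot, or None if the
    witness never answered the confirmation request.
    """
    answers = [r for r in witness_reports.values() if r is not None]
    if not answers:
        # All confirmation requests went unanswered.
        return "stop: this node is disconnected from the cluster"
    if any(answers):
        # Witnesses see the suspect, so the fault is on this node's side.
        return "stop: this node has partially lost connectivity"
    # Assumption: witnesses agree, so the suspect's departure is confirmed.
    return "suspect departure confirmed"

print(resolve_suspicion({2: None, 3: None}))
print(resolve_suspicion({2: True, 3: True}))
print(resolve_suspicion({2: False, 3: False}))
```

In the first two outcomes it is the deciding node that stops its cluster service, which matches the 1-Error severity of the corresponding messages.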
| Message |
A potential communication problem has been detected. A packet has failed to be delivered (or acknowledged) after %n1 seconds, although other packets were acknowledged by the same cluster member (Member(%s1)) to this member (Member(%s2)) as recently as %n2 seconds ago. Possible causes include network failure, poor thread scheduling (see FAQ if running on Windows), an extremely overloaded server, a server that is attempting to run its processes using swap space, and unreasonably lengthy GC times. |
| Parameters |
%n1 - The number of seconds a packet has failed to be delivered or acknowledged; %s1 - the recipient of the packets indicated in the message; %s2 - the sender of the packets indicated in the message; %n2 - the number of seconds since a packet was delivered successfully between the two members indicated above |
| Severity |
2-Warning |
| Cause |
Possible causes are indicated in the text of the message. |
| Action |
If this issue occurs frequently, the root cause should be investigated. |
| Message |
Node %s1 is not allowed to create a new cluster; WKA list: [%s2] |
| Parameters |
%s1 - Address of node attempting to join cluster; %s2 - List of WKA addresses |
| Severity |
1-Error |
| Cause |
The cluster is configured to use WKA, and there are no nodes present in the cluster that are in the WKA list. |
| Action |
Ensure that at least one node in the WKA list exists in the cluster, or add this node's address to the WKA list. |
| Message |
This member is configured with a compatible but different WKA list then the senior Member(%s). It is strongly recommended to use the same WKA list for all cluster members. |
| Parameters |
%s - the senior node of the cluster |
| Severity |
2-Warning |
| Cause |
The WKA list on this node differs from the WKA list on the senior node. |
| Action |
Ensure that every node in the cluster has the same WKA list. |
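Both WKA checks can be run against collected configuration before a rollout. The sketch below assumes the WKA lists have already been extracted from each node's operational configuration into plain Python lists; the function names, node names, and addresses are hypothetical, and the comparison ignores list order, which is an assumption about what counts as "the same" list.

```python
def may_create_cluster(node_address, wka_list):
    """A node may start a new cluster only if its address is in the WKA list."""
    return node_address in wka_list

def members_with_different_wka(senior_wka, member_wka_lists):
    """Return the names of members whose WKA list differs from the senior's.

    member_wka_lists maps a member name to its configured WKA addresses.
    """
    senior = set(senior_wka)
    return sorted(
        name for name, wka in member_wka_lists.items() if set(wka) != senior
    )

# Hypothetical configuration collected from three nodes.
senior_list = ["10.0.0.1", "10.0.0.2"]
configured = {
    "node-a": ["10.0.0.2", "10.0.0.1"],  # same addresses, different order
    "node-b": ["10.0.0.1"],              # missing one address
    "node-c": ["10.0.0.1", "10.0.0.2"],
}
print(may_create_cluster("10.0.0.3", senior_list))
print(members_with_different_wka(senior_list, configured))
```

A node reported by the second check may still join the cluster, but it is the kind of mismatch that the warning above recommends removing.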
| Message |
UnicastUdpSocket failed to set receive buffer size to %n1 packets (%n2 bytes); actual size is %n3 packets (%n4 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance. |
| Parameters |
%n1 - the number of packets that will fit in the buffer that Coherence attempted to allocate; %n2 - the size of the buffer Coherence attempted to allocate; %n3 - the number of packets that will fit in the actual allocated buffer size; %n4 - the actual size of the allocated buffer |
| Severity |
2-Warning |
| Cause |
See OS Performance Tuning |
| Action |
See OS Performance Tuning |
|