TCMP Log Messages


Message Experienced a %n1 ms communication delay (probable remote GC) with Member %s
Parameters %n1 - the latency in milliseconds of the communication delay; %s the full Member information
Severity 2-Warning, 5-Debug Level 5, or 6-Debug Level 6, depending on the length of the delay
Cause This node detected a delay in receiving acknowledgment packets from the specified node, and has determined that it is likely due to a remote GC (rather than a local GC). This message indicates that the overdue acknowledgment has been received from the specified node, and that it has likely emerged from its GC.
Action Prolonged and frequent GCs can adversely affect cluster performance and availability. If these warnings are seen frequently, review your JVM heap and GC configuration and tuning. See the performance tuning guide for more details.
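To judge whether these warnings are "seen frequently", it can help to count them per member. The following is a minimal sketch, not a Coherence tool: it assumes the message text appears in the log exactly as in the template above, and the sample lines (including the `Member(Id=...)` layout) are hypothetical.

```python
import re
from collections import Counter

# Pattern built from the message template above. The member text is captured
# loosely because the exact layout of the full Member information may vary.
GC_DELAY = re.compile(
    r"Experienced a (?P<ms>\d+) ms communication delay "
    r"\(probable remote GC\) with (?P<member>Member.*)"
)


def summarize_gc_delays(lines):
    """Return {member text: (occurrences, longest delay in ms)}."""
    counts = Counter()
    longest = {}
    for line in lines:
        match = GC_DELAY.search(line)
        if match is None:
            continue  # not a remote-GC delay message
        member = match.group("member").strip()
        delay_ms = int(match.group("ms"))
        counts[member] += 1
        longest[member] = max(longest.get(member, 0), delay_ms)
    return {member: (counts[member], longest[member]) for member in counts}


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "Experienced a 4172 ms communication delay (probable remote GC) with Member(Id=7)",
        "Experienced a 950 ms communication delay (probable remote GC) with Member(Id=7)",
        "Experienced a 1200 ms communication delay (probable remote GC) with Member(Id=3)",
    ]
    for member, (count, worst) in summarize_gc_delays(sample).items():
        print(member, count, worst)
```

A member that shows up repeatedly with long delays is a reasonable first candidate for the heap and GC review suggested in the Action.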
Message Failed to satisfy the variance: allowed=%n1 actual=%n2
Parameters %n1 - the maximum allowed latency in milliseconds; %n2 - the actual latency in milliseconds
Severity 3-Informational or 5-Debug Level 5 depending on the message frequency
Cause One of the first steps in the Coherence cluster discovery protocol is the calculation of the clock difference between the new and the senior nodes. This step assumes a relatively small latency for peer-to-peer round-trip UDP communications between the nodes. By default, the configured maximum allowed latency (the value of the "maximum-time-variance" configuration element) is 16 milliseconds. Failure to satisfy that latency causes this message to be logged and increases the latency threshold, which will be reflected in a follow-up message.
Action If the latency consistently stays very high (over 100 milliseconds), consult your network administrator and run the Datagram Test.
Message Created a new cluster "%s1" with Member(%s2)
Parameters %s1 - the cluster name; %s2 - the full Member information
Severity 3-Informational
Cause This Coherence node attempted to join an existing cluster for the configured amount of time (specified by the "multicast-listener/join-timeout-milliseconds" element), but did not receive any responses from any other node. As a result, it created a new cluster with the specified name (either configured by the "member-identity/cluster-name" element or calculated based on the multicast listener address and port or the "well-known-address" list). The Member information includes the node id, creation timestamp, unicast address and port, location, process id, role, etc.
Action None, if this node is expected to be the first node in the cluster. Otherwise, the operational configuration has to be reviewed to determine the reason that this node does not join the existing cluster.
Message This Member(%s1) joined cluster "%s2" with senior Member(%s3)
Parameters %s1 - the full Member information for this node; %s2 - the cluster name; %s3 - the full Member information for the cluster senior node
Severity 3-Informational
Cause This Coherence node has joined an existing cluster.
Action None, if this node is expected to join an existing cluster. Otherwise, identify the running cluster and consider corrective actions.
Message Member(%s) joined Cluster with senior member %n
Parameters %s - the full Member information for a new node that joined the cluster this node belongs to; %n - the node id of the cluster senior node
Severity 5-Debug Level 5
Cause A new node has joined an existing Coherence cluster.
Action None.
Message Member(%s) left Cluster with senior member %n
Parameters %s - the full Member information for a node that left the cluster; %n - the node id of the cluster senior node
Severity 5-Debug Level 5
Cause A node has left the cluster. This departure could be caused by a programmatic shutdown, process termination (normal or abnormal), or a communication failure (e.g. a network disconnect or a very long GC pause). This message reports the node's departure.
Action None, if the node departure was intentional. Otherwise, the departed node's logs should be analyzed.
Message MemberLeft notification for Member %n received from Member(%s)
Parameters %n - the node id of the departed node; %s - the full Member information for the node that reported the departure
Severity 5-Debug Level 5
Cause When a Coherence node terminates, its departure is detected by some nodes earlier than others. Most commonly, a node connected via the TCP ring connection ("TCP ring buddy") would be the first to detect it. This message provides the information about the node that detected the departure first.
Action None, if the node departure was intentional. Otherwise, the logs for both the departed and the detecting nodes should be analyzed.
Message Service %s joined the cluster with senior service member %n
Parameters %s - the service name; %n - the senior service member id
Severity 5-Debug Level 5
Cause When a clustered service starts on a given node, Coherence initiates a handshake protocol between all cluster nodes running the specified service. This message serves as an indication that this protocol has been initiated. If the senior node is not known at this time, it will be shown as "n/a".
Action None.
Message This node appears to have partially lost the connectivity: it receives responses from MemberSet(%s1) which communicate with Member(%s2), but is not responding directly to this member; that could mean that either requests are not coming out or responses are not coming in; stopping cluster service.
Parameters %s1 - set of members that can communicate with the member indicated in %s2; %s2 - member that can communicate with set of members indicated in %s1
Severity 1-Error
Cause The communication link between this member and the member indicated by %s2 has been broken. However, the set of witnesses indicated by %s1 report no communication issues with %s2. It is therefore assumed that this node is in a state of partial failure, thus resulting in the shutdown of its cluster threads.
Action Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency).
Message validatePolls: This senior encountered an overdue poll, indicating a dead member, a significant network issue or an Operating System threading library bug (e.g. Linux NPTL): Poll
Parameters none
Severity 2-Warning
Cause When a node joins a cluster, it performs a handshake with each cluster node. A missing handshake response prevents this node from joining the service. The log message following this one will indicate the corrective action taken by this node.
Action If this message reoccurs, further investigation into the root cause may be warranted.
Message Received panic from senior Member(%s1) caused by Member(%s2)
Parameters %s1 - the cluster senior member as known by this node; %s2 - a member claiming to be the senior member
Severity 1-Error
Cause This occurs after a cluster is split into multiple cluster islands (usually due to a network link failure). When the link is restored and the corresponding island seniors see each other, the panic protocol is initiated to resolve the conflict.
Action If this issue occurs frequently, the root cause of the cluster split should be investigated.
Message Member %n1 joined Service %s with senior member %n2
Parameters %n1 - an id of the Coherence node that joins the service; %s - the service name; %n2 - the senior node for the service
Severity 5-Debug Level 5
Cause When a clustered service starts on any cluster node, Coherence initiates a handshake protocol between all cluster nodes running the specified service. This message serves as an indication that the specified node has successfully completed the handshake and joined the service.
Action None.
Message Member %n1 left Service %s with senior member %n2
Parameters %n1 - the id of the Coherence node that leaves the service; %s - the service name; %n2 - the senior node for the service
Severity 5-Debug Level 5
Cause When a clustered service terminates on some cluster node, all other nodes that run this service are notified about this event. This message serves as an indication that the specified clustered service at the specified node has terminated.
Action None.
Message Service %s: received ServiceConfigSync containing %n entries
Parameters %s - the service name; %n - the number of entries in the service configuration map
Severity 5-Debug Level 5
Cause As a part of the service handshake protocol between all cluster nodes running the specified service, the service senior member updates every new node with the full content of the service configuration map. For the partitioned cache services that map includes the full partition ownership catalog and internal ids for all existing caches. That same message is sent in the case of an abnormal service termination at the senior node, when a new node assumes the service seniority. This message serves as an indication that the specified node has received that configuration update.
Action None.
Message TcpRing: connecting to member %n using TcpSocket{%s}
Parameters %s - the full information for the TcpSocket that serves as a TcpRing connector to another node; %n - the node id to which this node has connected
Severity 5-Debug Level 5
Cause For quick process termination detection, Coherence utilizes a feature called TcpRing, which is a sparse collection of TCP/IP-based connections between different nodes in the cluster. Each node in the cluster is connected to at least one other node, which (if at all possible) is running on a different physical box. This connection is not used for any data transfer; only trivial "heartbeat" communications are sent once a second over each link. This message indicates that the connection between this node and the specified node has been initialized.
Action None.
Message Rejecting connection to member %n using TcpSocket{%s}
Parameters %n - the node id that tries to connect to this node; %s - the full information for the TcpSocket that serves as a TcpRing connector to another node
Severity 4-Debug Level 4
Cause Sometimes the TCP Ring daemons running on different nodes could attempt to join each other or the same node at the same time. In this case, the receiving node may determine that such a connection would be redundant and reject the incoming connection request. This message is logged by the rejecting node when this happens.
Action None.
Message Timeout while delivering a packet; requesting the departure confirmation for Member(%s1) by MemberSet(%s2)
Parameters %s1 - the full Member information for a node that this node failed to communicate with; %s2 - the full information about the "witness" nodes that are asked to confirm the suspected member departure
Severity 2-Warning
Cause Coherence uses UDP for all data communications (mostly peer-to-peer unicast), which by itself does not have any delivery guarantees. Those guarantees are built into the cluster management protocol used by Coherence (TCMP). The TCMP daemons are responsible for acknowledgment (ACK or NACK) of all incoming communications. If one or more packets are not acknowledged within the ACK interval ("ack-delay-milliseconds"), they are resent. This repeats until the packets are finally acknowledged or the timeout interval elapses ("timeout-milliseconds"). At this time, this message is logged and the "witness" protocol is engaged, asking other cluster nodes whether or not they experience similar communication delays with the non-responding node. The witness nodes are chosen based on their roles and location.
Action Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency).
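The resend-until-acknowledged behavior described in the Cause for the packet timeout message above can be sketched as a toy model. This is not Coherence's implementation: the function name, the simulated clock, and the numeric values are assumptions made for illustration, and the two interval parameters merely stand in for the "ack-delay-milliseconds" and "timeout-milliseconds" settings named in the text.

```python
def deliver(ack_arrives_at_ms, ack_interval_ms, timeout_ms):
    """Toy model of reliable delivery over an unreliable transport.

    The packet is resent once per ACK interval until an acknowledgment
    arrives or the timeout elapses. ack_arrives_at_ms is the simulated
    time at which the ACK arrives, or None if it never arrives.
    Returns (outcome, number_of_resends).
    """
    elapsed_ms = 0
    resends = 0
    while elapsed_ms < timeout_ms:
        elapsed_ms += ack_interval_ms  # wait one ACK interval
        if ack_arrives_at_ms is not None and ack_arrives_at_ms <= elapsed_ms:
            return ("acknowledged", resends)
        if elapsed_ms < timeout_ms:
            resends += 1  # still unacknowledged: resend the packet
    # Timeout reached: this is the point at which the warning would be
    # logged and the witness nodes asked to confirm the departure.
    return ("timeout: request departure confirmation", resends)


if __name__ == "__main__":
    # Illustrative values only; they are not Coherence defaults.
    print(deliver(ack_arrives_at_ms=40, ack_interval_ms=10, timeout_ms=100))
    print(deliver(ack_arrives_at_ms=None, ack_interval_ms=10, timeout_ms=50))
```

The model shows why this warning implies a sustained outage rather than a single lost packet: every resend within the timeout interval must also have gone unacknowledged.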
Message This node appears to have become disconnected from the rest of the cluster containing %n nodes. All departure confirmation requests went unanswered. Stopping cluster service.
Parameters %n - the number of other nodes in the cluster this node was a member of
Severity 1-Error
Cause Sometimes a node that lives within a valid Java process stops communicating with other cluster nodes. Possible reasons include: a) a network failure; b) an extremely long GC pause; c) a swapped-out process. In that case, other cluster nodes may choose to revoke the cluster membership from the paused node and completely shun any further communication attempts by that node, causing this message to be logged when the process attempts to resume cluster communications.
Action Corrective action is not necessarily required, since the rest of the cluster presumably is continuing its operation and this node may recover and rejoin the cluster. On the other hand, it may warrant an investigation into root causes of the problem (especially if it is recurring with some frequency).
Message A potential communication problem has been detected. A packet has failed to be delivered (or acknowledged) after %n1 seconds, although other packets were acknowledged by the same cluster member (Member(%s1)) to this member (Member(%s2)) as recently as %n2 seconds ago. Possible causes include network failure, poor thread scheduling (see FAQ if running on Windows), an extremely overloaded server, a server that is attempting to run its processes using swap space, and unreasonably lengthy GC times.
Parameters %n1 - The number of seconds a packet has failed to be delivered or acknowledged; %s1 - the recipient of the packets indicated in the message; %s2 - the sender of the packets indicated in the message; %n2 - the number of seconds since a packet was delivered successfully between the two members indicated above
Severity 2-Warning
Cause Possible causes are indicated in the text of the message.
Action If this issue occurs frequently, the root cause should be investigated.
Message Node %s1 is not allowed to create a new cluster; WKA list: [%s2]
Parameters %s1 - Address of node attempting to join cluster; %s2 - List of WKA addresses
Severity 1-Error
Cause The cluster is configured to use WKA, and there are no nodes present in the cluster that are in the WKA list.
Action Ensure that at least one node in the WKA list exists in the cluster, or add this node's address to the WKA list.
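The rule behind this error can be illustrated with a simplified decision function. It is a sketch under stated assumptions, not Coherence's discovery logic: the function name and the `reachable_members` argument are hypothetical, real discovery happens over the network, and the comma separator used when formatting the WKA list is a guess.

```python
def startup_decision(node_address, wka_list, reachable_members):
    """Simplified model of what a starting node may do under WKA.

    reachable_members holds the addresses of already-running cluster
    members that this node can see (hypothetical input).
    """
    if any(address in wka_list for address in reachable_members):
        return "join existing cluster"
    if node_address in wka_list:
        return "create new cluster"
    # Neither condition holds: format the error from the template above.
    return "Node %s is not allowed to create a new cluster; WKA list: [%s]" % (
        node_address,
        ", ".join(wka_list),
    )


if __name__ == "__main__":
    wka = ["10.0.0.1", "10.0.0.2"]  # illustrative addresses
    print(startup_decision("10.0.0.9", wka, reachable_members=[]))
    print(startup_decision("10.0.0.1", wka, reachable_members=[]))
    print(startup_decision("10.0.0.9", wka, reachable_members=["10.0.0.2"]))
```

Both remedies in the Action map onto the two branches: starting a WKA node first makes the join branch succeed, and adding this node's address to the list makes the create branch succeed.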
Message This member is configured with a compatible but different WKA list then the senior Member(%s). It is strongly recommended to use the same WKA list for all cluster members.
Parameters %s - the senior node of the cluster
Severity 2-Warning
Cause The WKA list on this node is different than the WKA list on the senior node.
Action Ensure that every node in the cluster has the same WKA list.
Message UnicastUdpSocket failed to set receive buffer size to %n1 packets (%n2 bytes); actual size is %n3 packets (%n4 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.
Parameters %n1 - the number of packets that will fit in the buffer that Coherence attempted to allocate; %n2 - the size of the buffer Coherence attempted to allocate; %n3 - the number of packets that will fit in the actual allocated buffer size; %n4 - the actual size of the allocated buffer
Severity 2-Warning
Cause See OS Performance Tuning
Action See OS Performance Tuning
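The gap between the requested and the actual buffer size can be observed outside Coherence with a plain UDP socket. The sketch below is independent of Coherence and uses only the standard Python socket module; the function name and the 16 MB request are arbitrary choices for illustration. How the operating system reports the value varies (Linux, for example, reports double the size it granted).

```python
import socket


def request_receive_buffer(requested_bytes):
    """Ask the OS for a UDP receive buffer and return the size it reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        except OSError:
            # Some platforms reject an oversized request outright instead
            # of capping it; the current buffer size is then left unchanged.
            pass
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 16 * 1024 * 1024  # arbitrary illustrative size
    actual = request_receive_buffer(requested)
    print("requested", requested, "bytes; OS reports", actual, "bytes")
```

If the reported size is far below the request, the OS-level maximum socket buffer size is the limit that the warning refers to.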