Data Affinity

Understanding Affinity

Data affinity ensures that a group of related cache entries is contained within a single cache partition. As a result, all of the related data is managed on a single primary cache node, without compromising fault tolerance.

Affinity may span multiple caches, as long as they are managed by the same cache service (which will generally be the case). For example, in a master-detail pattern such as "Order-LineItem", an Order object may be co-located with the entire collection of LineItem objects associated with it.
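A minimal, self-contained sketch of cross-cache co-location, under stated assumptions. The key classes OrderId and LineItemId, the PARTITION_COUNT of 257 and the partitionFor function (hash of the associated key modulo the partition count) are all hypothetical, and this is not Coherence's actual partitioning algorithm. The sketch only shows that when every cache on one service maps an associated key to a partition the same way, an Order in one cache and its LineItems in another land in the same partition. Records and pattern matching require Java 16 or later.

```java
// Hypothetical key classes for the "Order-LineItem" example.
record OrderId(long id) {}
record LineItemId(long orderId, int lineNumber) {}

class CrossCacheAffinity {
    // Illustrative partition count; an assumption, not a Coherence default.
    static final int PARTITION_COUNT = 257;

    // Association: a LineItemId is placed with its parent OrderId;
    // any other key is placed according to itself.
    static Object associatedKey(Object key) {
        if (key instanceof LineItemId li) {
            return new OrderId(li.orderId());
        }
        return key;
    }

    // Simplified partition function shared by every cache on the same service.
    // NOT Coherence's real algorithm: it only shows that the partition depends
    // on the associated key alone.
    static int partitionFor(Object key) {
        return Math.floorMod(associatedKey(key).hashCode(), PARTITION_COUNT);
    }

    public static void main(String[] args) {
        OrderId order = new OrderId(1234); // key in a hypothetical "orders" cache
        System.out.println(order + " -> partition " + partitionFor(order));
        for (int line = 1; line <= 3; line++) {
            LineItemId item = new LineItemId(1234, line); // key in a "line items" cache
            System.out.println(item + " -> partition " + partitionFor(item));
        }
    }
}
```

All four printed partition numbers are identical, because each LineItemId resolves to an OrderId equal to the Order's own key.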

The benefit is two-fold. First, only a single cache node is required to manage queries and transactions against a set of related items. Second, all concurrency operations can be managed locally, avoiding the need for clustered synchronization.

A number of standard Coherence operations can benefit from affinity, including cache queries, ${xhtml} operations and the getAll/putAll/removeAll methods.

Data affinity is specified in terms of entry keys (not values). As a result, the association information must be present in the key class. Similarly, the association logic applies to the key class, not the value class.

Specifying Affinity

Affinity is specified in terms of a relationship to a partitioned key. In the Order-LineItem example above, the Order objects would be partitioned normally, and the LineItem objects would be associated with the appropriate Order object.

The association does not need to be tied directly to the actual parent key; it only needs to be a functional mapping of the parent key. It could be a single field of the parent key (even if that field is non-unique), or an integer hash of the parent key. All that matters is that every child key of a given parent returns the same associated key. The associated key does not need to be an actual cache key; it is simply a "group id". This can help minimize the size impact on child key classes that do not already contain the parent key information: because the group id is derived data, its size can be chosen explicitly, and it does not affect the behavior of the key.

Note that making the association too general (mapping too many keys to the same group id) can cause a "lumpy" distribution. In the extreme case, if all child keys return the same associated key regardless of their parent key, they are all assigned to a single partition and are not spread across the cluster.
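To illustrate the trade-off, the sketch below counts how many partitions a set of line item keys occupies under three candidate associations: a field of the parent key, an integer hash of that field, and a single shared group id. The LineItemId class, the partition count of 257 and the partitionFor function are hypothetical simplifications, not Coherence's implementation. Java 16 or later is required for records.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

class AssociationSpread {
    // Illustrative partition count (assumption).
    static final int PARTITION_COUNT = 257;

    // Hypothetical child key that carries its parent's order id.
    record LineItemId(long orderId, int lineNumber) {}

    // Group id = a field of the parent key.
    static final Function<LineItemId, Object> BY_ORDER_ID = k -> k.orderId();
    // Group id = an integer hash of the parent key (also a valid functional mapping).
    static final Function<LineItemId, Object> BY_ORDER_HASH = k -> Long.hashCode(k.orderId());
    // Too general: every child key returns the same group id.
    static final Function<LineItemId, Object> SAME_FOR_ALL = k -> "ALL";

    // Simplified partition function (not Coherence's actual algorithm).
    static int partitionFor(Object groupId) {
        return Math.floorMod(groupId.hashCode(), PARTITION_COUNT);
    }

    // Number of distinct partitions used by `orders` orders with
    // `linesPerOrder` line items each, under the given association.
    static int partitionsUsed(Function<LineItemId, Object> association,
                              int orders, int linesPerOrder) {
        Set<Integer> used = new HashSet<>();
        for (long order = 0; order < orders; order++) {
            for (int line = 0; line < linesPerOrder; line++) {
                used.add(partitionFor(association.apply(new LineItemId(order, line))));
            }
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println("by order id:   " + partitionsUsed(BY_ORDER_ID, 1000, 5));
        System.out.println("by order hash: " + partitionsUsed(BY_ORDER_HASH, 1000, 5));
        System.out.println("same for all:  " + partitionsUsed(SAME_FOR_ALL, 1000, 5));
    }
}
```

Under these assumptions the program prints 257, 257 and 1. Associating by the order id, or by an integer hash of it, spreads 1,000 orders over every partition while still keeping each order's line items together. A single shared group id puts all 5,000 line items into one partition.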

There are two ways to ensure that a set of cache entries is co-located, described below. Note that association is based on the cache key, not the value; otherwise, updating a cache entry could cause it to change partitions. Also note that while the Order is co-located with its child LineItems, Coherence does not currently support composite operations that span multiple caches (for example, updating the Order and its collection of LineItems within a single invocation request ${xhtml}).

Specifying Data Affinity with a KeyAssociation

For application-defined keys, the class of the cache key may implement the KeyAssociation interface, whose getAssociatedKey() method returns the key with which the entry should be co-located.
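A minimal sketch of a key class that carries its own association. To keep it compilable without the Coherence library, it declares a local stand-in interface with the same single method as com.tangosol.net.cache.KeyAssociation. A real key class would implement the Coherence interface instead. LineItemId is hypothetical, and the sketch assumes the Order cache is keyed by the same Long order id that getAssociatedKey() returns.

```java
import java.io.Serializable;
import java.util.Objects;

// Local stand-in declaring the single method of Coherence's
// com.tangosol.net.cache.KeyAssociation (assumption: used only so this
// sketch compiles on its own).
interface KeyAssociation {
    Object getAssociatedKey();
}

// Hypothetical key of a LineItem cache entry. The parent Order's id is part
// of the key, so the association can be derived from the key alone.
final class LineItemId implements KeyAssociation, Serializable {
    private final long orderId;
    private final int lineNumber;

    LineItemId(long orderId, int lineNumber) {
        this.orderId = orderId;
        this.lineNumber = lineNumber;
    }

    long getOrderId() {
        return orderId;
    }

    // Every LineItem of the same Order returns the same associated key.
    // Assumption: the Order cache is keyed by this same Long order id.
    @Override
    public Object getAssociatedKey() {
        return getOrderId();
    }

    // The association does not change the key's identity:
    // equality still uses both fields.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof LineItemId)) {
            return false;
        }
        LineItemId other = (LineItemId) o;
        return orderId == other.orderId && lineNumber == other.lineNumber;
    }

    @Override
    public int hashCode() {
        return Objects.hash(orderId, lineNumber);
    }
}

class KeyAssociationDemo {
    public static void main(String[] args) {
        LineItemId first = new LineItemId(1234L, 1);
        LineItemId second = new LineItemId(1234L, 2);
        // Distinct keys that share one associated key (the Order id).
        System.out.println(first.equals(second));
        System.out.println(first.getAssociatedKey() + " " + second.getAssociatedKey());
    }
}
```

Running KeyAssociationDemo prints false followed by 1234 1234. The two keys remain distinct cache keys but share one associated key.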

Specifying Data Affinity with a KeyAssociator

Applications may also provide a custom KeyAssociator, which computes the associated key for a given cache key outside of the key class.
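A minimal sketch of the associator approach, in which the key classes stay unaware of affinity. The interface below is a local stand-in for com.tangosol.net.partition.KeyAssociator, reduced to its getAssociatedKey(Object) method so the sketch compiles without Coherence. The real interface also declares init(PartitionedService). OrderId, LineItemId and LineItemAssociator are hypothetical names. Records and pattern matching require Java 16 or later.

```java
// Local stand-in for Coherence's com.tangosol.net.partition.KeyAssociator.
// The real interface also declares init(PartitionedService); it is omitted
// here so the sketch compiles on its own.
interface KeyAssociator {
    Object getAssociatedKey(Object key);
}

// Hypothetical key classes; neither implements any association interface.
record OrderId(long id) {}
record LineItemId(long orderId, int lineNumber) {}

// The association logic lives in the associator, not in the key classes.
class LineItemAssociator implements KeyAssociator {
    @Override
    public Object getAssociatedKey(Object key) {
        if (key instanceof LineItemId li) {
            // Co-locate each LineItem with its parent Order.
            return new OrderId(li.orderId());
        }
        // Orders (and any other keys) are partitioned on their own key.
        return key;
    }
}

class KeyAssociatorDemo {
    public static void main(String[] args) {
        KeyAssociator associator = new LineItemAssociator();
        System.out.println(associator.getAssociatedKey(new LineItemId(1234, 7)));
        System.out.println(associator.getAssociatedKey(new OrderId(1234)));
    }
}
```

Both lines print OrderId[id=1234], so a line item and its order resolve to the same associated key.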

The key associator may be configured for a NamedCache through the key-associator element of the cache's distributed-scheme element.

Example of Using Affinity

An example of using affinity for efficient querying (${xhtml}) and cache access (${xhtml}).
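The following sketch shows why affinity makes such access efficient. It uses a hypothetical in-memory model of a partitioned LineItem cache rather than the Coherence API. Because every line item of an order is placed by its order id, a query for one order only needs to scan the single partition that owns that id, and the result matches a scan of every partition. In Coherence itself, a query can be directed at the owner of a given key by wrapping a filter in com.tangosol.util.filter.KeyAssociatedFilter, which this sketch does not use. The class names and the partition count are assumptions. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical key and value classes for the LineItem cache.
record LineItemId(long orderId, int lineNumber) {}
record LineItem(String product, int quantity) {}

// In-memory model of a partitioned cache: one map per partition.
// A simulation for illustration only, not the Coherence API.
class PartitionedLineItems {
    private final List<Map<LineItemId, LineItem>> partitions = new ArrayList<>();

    PartitionedLineItems(int partitionCount) {
        for (int p = 0; p < partitionCount; p++) {
            partitions.add(new HashMap<>());
        }
    }

    // Placement uses the associated key (the order id), not the whole key.
    private int partitionFor(long orderId) {
        return Math.floorMod(Long.hashCode(orderId), partitions.size());
    }

    void put(LineItemId key, LineItem value) {
        partitions.get(partitionFor(key.orderId())).put(key, value);
    }

    // Affinity-aware query: scans only the partition that owns the order.
    // Other orders may share that partition, hence the orderId filter.
    Map<LineItemId, LineItem> lineItemsFor(long orderId) {
        Map<LineItemId, LineItem> result = new HashMap<>();
        partitions.get(partitionFor(orderId)).forEach((key, value) -> {
            if (key.orderId() == orderId) {
                result.put(key, value);
            }
        });
        return result;
    }

    // Without affinity, the same query would have to visit every partition.
    Map<LineItemId, LineItem> lineItemsForFullScan(long orderId) {
        Map<LineItemId, LineItem> result = new HashMap<>();
        for (Map<LineItemId, LineItem> partition : partitions) {
            partition.forEach((key, value) -> {
                if (key.orderId() == orderId) {
                    result.put(key, value);
                }
            });
        }
        return result;
    }
}

class AffinityQueryDemo {
    public static void main(String[] args) {
        PartitionedLineItems cache = new PartitionedLineItems(31);
        for (long order = 1; order <= 100; order++) {
            for (int line = 1; line <= 3; line++) {
                cache.put(new LineItemId(order, line), new LineItem("product-" + line, line));
            }
        }
        Map<LineItemId, LineItem> items = cache.lineItemsFor(42);
        System.out.println(items.size());
        System.out.println(items.equals(cache.lineItemsForFullScan(42)));
    }
}
```

AffinityQueryDemo prints 3 and true. The single-partition query finds all three line items of order 42, the same set that a full scan returns.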