If you have read Domain Driven Design
by Eric Evans [DDD], much of the information in this section will
simply refresh your knowledge of the domain model building blocks
described by Eric, with a Coherence twist. If you haven't, this section
will give you enough background information to allow you to identify
various domain objects within the application.
Rich versus Anemic domain models
The main argument Eric Evans makes in Domain Driven Design
is that your domain objects should be used to implement business logic
of your application, not just to hold data. While this logically makes
perfect sense, it is not what most developers are used to, primarily
because such an architecture was discouraged by the J2EE spec, which pretty
much required you to turn your domain objects into property bags
and implement all the logic in a higher-level service layer using
session EJBs.
If you still develop applications that way, you
might be wondering whether Coherence requires you to use rich domain objects.
The short answer is: no, it does not. You can use any object with
Coherence, as long as it is serializable, so you can easily use anemic
domain objects as well.
The question, however, is why you would want to do
that. If you are already creating custom classes for your objects, you
might as well take the extra step and implement related behavior within
them as well, instead of moving all the logic to a higher-level service
layer. Otherwise, as Martin Fowler points out in his article Anemic Domain Model (http://martinfowler.com/bliki/AnemicDomainModel.html), you will be "robbing yourself blind and paying all the costs of a domain model without yielding any of the benefits".
After all, there isn't much you can achieve with an
anemic domain object that you can't do using a more generic and readily
available data structure, such as a map, so why even bother with custom
objects if they are just dumb data holders?
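The difference is easy to see in code. The following is a hedged sketch (the class names are hypothetical, not from the book) contrasting an anemic property bag with a rich object that owns its invariants:

```java
// Anemic: a property bag; all logic lives in some external service layer.
class AnemicAccount {
    private long balance;                        // balance in cents
    public long getBalance() { return balance; }
    public void setBalance(long balance) { this.balance = balance; }
}

// Rich: the entity itself enforces the business rule.
class RichAccount {
    private long balance;                        // balance in cents
    public RichAccount(long initialBalance) { balance = initialBalance; }
    public long withdraw(long amount) {
        if (amount > balance) {
            throw new IllegalStateException("insufficient funds");
        }
        return balance -= amount;                // returns the new balance
    }
}
```

With the anemic version, nothing stops a caller from setting the balance to an invalid value; with the rich version, the overdraft rule cannot be bypassed.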
In Domain Driven Design, Eric identifies a number of core building blocks of the domain model, such as entities, aggregates, value objects, services, factories, and repositories.
The first three represent data objects in the model, and as such are
the most important domain model artifacts from our perspective: they are
what we will be storing in Coherence.
1. Entities and aggregates
An entity is an object that has an identity. The identity can be either a natural attribute of the object or a surrogate attribute that is generated by the system when
the entity is first created. Regardless of the type of identity, what
is important is that once an identity is assigned to an entity, it
remains the same throughout its lifetime.
An aggregate is a special, composite entity type, which represents a containment relationship between an aggregate root and dependent weak entities.
For example, an order contains one or more line items, and while both
the order and each individual line item are entities in their own right,
a single line item is only meaningful within the larger context of an
order.
Entities and aggregates are the most important types
of domain objects from the Coherence perspective, as they usually have
a one-to-one mapping to Coherence caches.
One of the most common
mistakes that beginners make is to treat Coherence as an in-memory
database and create caches that are too finely grained. For example,
they might configure one cache for orders and a separate cache for line
items.
While this makes perfect sense when using a
relational database, it isn't the best approach when using Coherence.
Aggregates represent units of consistency from a business perspective,
and the easiest way to achieve atomicity and consistency when using
Coherence is to limit the scope of mutating operations to a single cache
entry. Because of this, you should almost always store whole aggregates
as individual cache entries. In the previous example that we used, an
order and all of its line items would be stored as a single cache entry
in the orders cache.
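To make the "whole aggregate per entry" idea concrete, here is a minimal sketch that uses a plain Map as a stand-in for the orders cache (the Order and LineItem classes here are simplified assumptions, not the book's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Dependent weak entity: meaningful only inside an Order.
class LineItem {
    final String product;
    final int quantity;
    LineItem(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

// Aggregate root: owns its line items outright.
class Order {
    final Long id;
    private final List<LineItem> items = new ArrayList<>();
    Order(Long id) { this.id = id; }
    void addItem(LineItem item) { items.add(item); }  // mutation stays inside the aggregate
    int itemCount() { return items.size(); }
}

class OrderCacheDemo {
    // Plain Map standing in for a Coherence NamedCache keyed by order id:
    // the whole aggregate, line items included, lives under a single key.
    static final Map<Long, Order> orders = new ConcurrentHashMap<>();
}
```

Because every mutation of the aggregate replaces a single entry under a single key, atomicity comes for free.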
One exception to this rule might be the case when the
aggregate root contains an unbounded, continuously growing collection of
dependent entities, such as the Account and Transaction
items in our domain model. In this case, it makes sense to separate
dependent entities into their own cache, in order to avoid infinite
growth of the aggregate object and to allow different caching policies
to be used (for example, we might decide to keep all the accounts in the
cache at all times, but only the last 60 days of transactions for each
account, in order to keep the amount of memory used by transactions
relatively constant over time).
Implementing entities
The domain model for our banking application contains three entities so far: Customer, Account, and Transaction. The last two form an aggregate, with the Account as aggregate root.
Because an entity is such an important type of object
within a Coherence application, we will define an interface that all
our entities have to implement:
public interface Entity<T> {
    T getId();
}
The Entity interface is very simple, but it
makes the fact that entities have an identity explicit. This is not
strictly required, but it will come in handy on many occasions, such as
when we implement repositories for our entities, as you'll see in a bit.
Entity implementation is quite simple for the most
part: you define the attributes as you normally would and implement the
necessary operations. In the case of the Account class, this might lead you to create something along these lines:
public class Account
implements Entity<Long>, Serializable {
// data members
private final Long m_id;
private final Long m_customerId;
private String m_description;
private Money m_balance;
private int m_lastTransactionId;
// dependencies
private transient CurrencyConverter m_currencyConverter;
private transient TransactionRepository m_transactionRepository;
// constructor, getters and setters omitted for brevity
...
// core logic
public Money withdraw(Money amount, String description)
throws InsufficientFundsException {
Money balance = m_balance;
if (!balance.isSameCurrency(amount)) {
CurrencyConversion conversion =
getCurrencyConverter().convert(amount, getCurrency());
amount = conversion.getConvertedAmount();
description += " (" +
conversion.getOriginalAmount() + " @ " +
conversion.getExchangeRate() + ")";
}
if (amount.greaterThan(balance)) {
throw new InsufficientFundsException(balance, amount);
}
m_balance = balance = balance.subtract(amount);
postTransaction(TransactionType.WITHDRAWAL, description, amount, balance);
return balance;
}
public Money deposit(Money amount, String description) {
// omitted for brevity (similar to withdraw)
}
protected void postTransaction(TransactionType type,
String description,
Money amount, Money balance) {
Transaction transaction =
Transaction.create(m_id, ++m_lastTransactionId,
type, description,
amount, balance);
getTransactionRepository().save(transaction);
}
}
As you can see, except for the fact that we've implemented the Entity interface and made the class Serializable,
there is nothing particularly interesting about this class. The logic
within it is expressed using concepts from a domain and there is
absolutely nothing that ties it to Coherence.
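The logic above leans on a Money value object with operations such as isSameCurrency, subtract, and greaterThan. The book's actual class is not shown here, but a minimal immutable sketch of the assumed shape might look like this:

```java
import java.math.BigDecimal;
import java.util.Currency;

// Minimal immutable Money sketch (an assumed shape; the real class may
// differ). BigDecimal avoids binary floating-point rounding errors.
class Money {
    private final BigDecimal amount;
    private final Currency currency;

    Money(BigDecimal amount, Currency currency) {
        this.amount = amount;
        this.currency = currency;
    }
    Money(long amount, Currency currency) {
        this(BigDecimal.valueOf(amount), currency);
    }

    boolean isSameCurrency(Money other) {
        return currency.equals(other.currency);
    }
    Money subtract(Money other) {
        return new Money(amount.subtract(other.amount), currency);
    }
    boolean greaterThan(Money other) {
        return amount.compareTo(other.amount) > 0;
    }
    BigDecimal getAmount() { return amount; }
    Currency getCurrency() { return currency; }
}
```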
However, we are not done yet, as there are a few more things to consider.
Identity management
If an entity has a natural attribute that can be used
to uniquely identify an instance of an entity, it is usually best to
use that attribute as an identity. Unfortunately, many entities do not
have such an attribute, in which case a surrogate identity must be
generated by the system and assigned to the entity instance.
Most databases provide a built-in mechanism for this
purpose. For example, SQL Server allows you to define a numeric field
that is automatically incremented when a new record is inserted into the
table, while Oracle has a sequence mechanism, which allows you to get
the next number for the named sequence object and use it within your INSERT statement. Another option is to generate and use a GUID (Globally Unique Identifier)
object as an identity, which might be the best (or even required)
option for scenarios where replication and synchronization across
multiple independent data stores is required.
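For reference, generating such a surrogate identifier is a one-liner with the JDK's built-in UUID class:

```java
import java.util.UUID;

class SurrogateIdDemo {
    // Generates a random (version 4) UUID to use as a surrogate identity.
    static UUID newId() {
        return UUID.randomUUID();
    }
}
```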
When you use the identity generation features of your
database, you essentially let it handle all the grunt work for you and
your biggest problem becomes how to obtain the generated identifier from
the database and update your in-memory object to reflect it.
Coherence, on the other hand, forces you to define an
object's identity up front. Because identity is typically used as a
cache key, it is impossible to put an object into the cache unless you
have a valid identifier for it. Unfortunately, while Coherence allows
you to use UUIDs (Universally Unique Identifiers)
as object identifiers and even provides an excellent, platform-independent
implementation of UUID, it does not have an out-of-the-box
mechanism for sequential identifier generation. However, it is not too
difficult to implement one, and the Coherence Tools open source project I
mentioned earlier provides one such implementation in the form of the SequenceGenerator class.
The SequenceGenerator is very simple to use.
All you need to do is create an instance of it, passing the sequence name
and the number of identifiers the client should allocate on each call to
the server (a variation of the Hi/Lo algorithm). The generator uses a
Coherence cache internally to keep track of all the sequences, which
allows it to be used from any cluster member. It is also thread-safe and
intended to be shared by instances of an entity that it creates
identifiers for, so you will typically create it as a static final field:
public class Account
implements Entity<Long>, Serializable {
private static final IdentityGenerator<Long> s_idGen =
SequenceGenerator.create("account.id", 20);
...
}
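To illustrate the Hi/Lo idea behind SequenceGenerator (a simplified sketch, not the Coherence Tools implementation), the client reserves a block of identifiers with a single call to the server and then hands them out locally; here an AtomicLong stands in for the cluster-wide sequence entry:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the Hi/Lo algorithm: one "server" round trip reserves a whole
// block of ids; subsequent ids are handed out locally until the block is
// exhausted, at which point a new block is reserved.
class HiLoGenerator {
    private final AtomicLong server;  // stands in for the sequence entry in a Coherence cache
    private final int blockSize;
    private long next;                // next id to hand out (exclusive of the one returned)
    private long max;                 // upper bound of the currently reserved block

    HiLoGenerator(AtomicLong server, int blockSize) {
        this.server = server;
        this.blockSize = blockSize;
    }

    synchronized long generateIdentity() {
        if (next >= max) {                       // block exhausted: reserve a new one
            next = server.getAndAdd(blockSize);
            max  = next + blockSize;
        }
        return ++next;
    }
}
```

With a block size of 20, only one server round trip is needed per 20 generated identifiers, which is what makes the approach cheap enough to use from any cluster member.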
Creating entity instances
Now that we have an identity generator, we should
ensure that whenever a new object is created it is assigned a unique
identity. While we could do this in a constructor, the idiom I like to
use is to keep the constructor private and to provide a static factory
method that is used to create new entity instances:
public class Account
implements Entity<Long>, Serializable {
...
private Account(Long id, Long customerId,
String description, Money balance) {
m_id = id;
m_customerId = customerId;
m_description = description;
m_balance = balance;
}
static Account create(Customer customer,
String description,
Currency currency) {
return new Account(s_idGen.generateIdentity(),
customer.getId(),
description,
new Money(0, currency));
}
...
}
This way a single constructor can be used to properly
initialize an object instance not only during the initial creation, but
also when the object is loaded from a persistent store or deserialized,
as we'll see shortly.
Managing entity relationships
One thing you might've noticed in the previous examples is that the Account does not have a direct reference to a Customer. Instead, we only store the Customer's identifier as part of the Account's state and use it to obtain the customer when necessary:
public class Account
implements Entity<Long>, Serializable {
private final Long m_customerId;
...
public Customer getCustomer() {
return getCustomerRepository()
.getCustomer(m_customerId);
}
}
This is a common pattern when using Coherence, as
identity lookups from a cache are cheap operations, especially if we
configure near caching for the customers cache in this example. By doing
this, we ensure that a Customer, which can be shared by several Account instances, is always obtained from the authoritative source, and we avoid the issues that would be caused if the shared Customer instance were serialized as part of each Account object that references it.
On the other hand, this is only one side of the
relationship. How would we model a one-to-many relationship, such as the
relationship between a Customer and several Account instances, or an Account and several Transaction instances?
There are two possible approaches. The first one is to query the cache on the many side of the relationship. For example, we could query the accounts
cache for all the accounts that have a specific customer id. This is
essentially the same approach you use with a relational database when
you query a child table based on the foreign key that identifies the
parent.
However, with Coherence you also have another option
that will yield significantly better performance: you can store the
identifiers of the child objects within the parent, and simply perform a getAll operation against the underlying Coherence cache when you need to retrieve them:
public class Customer
implements Entity<Long>, Serializable {
private Collection<Long> m_accountIds;
...
public Collection<Account> getAccounts() {
return getAccountRepository()
.getAccounts(m_accountIds);
}
}
This approach makes sense when the number of child
objects is finite and you don't need to constrain the results in some
other way. Neither of these is true for the getTransactions method of the Account class: the transaction collection will likely grow indefinitely, and the results of the getTransactions call need to be constrained by a time period. In this case, a query against the transactions cache is a better approach.
Leaky abstractions
Notice that in the previous example, I passed a collection of account ids directly to the getAccounts repository method, which leaks the fact that we are doing a bulk identity lookup from the underlying store.
This might make it difficult to implement a
repository for a store that doesn't support such an operation, or might
force us to implement it in a suboptimal manner. For example, if we had
to implement the same repository for a relational database, our only
option would be to use an IN clause when selecting from a child
table. While this is not the end of the world, a more natural and
better performing approach would be to query the child table on the
foreign key.
We can make that possible by modifying the repository interface to expose the getAccountsForCustomer method that accepts a Customer
instance instead of a collection of account ids. That way the Coherence
repository implementation would be able to perform identity lookup and
the database repository implementation could execute the query on the
foreign key.
The downside of such a change is that we would have to expose a getter for m_accountIds
field to the outside world, which would break encapsulation.
Considering that repositories tend to be leaky abstractions anyway, and
that they are rarely implemented for more than one specific persistence
technology, the benefits of such a change are questionable.
Dealing with dependencies
Both examples in the previous section had an external
dependency on a repository, which raises the question of how these
dependencies are provided to entities, and by whom.
In a conventional application you could use Spring in
combination with AspectJ or Dependency Injection features of your ORM
to inject necessary dependencies into entities. However, implementing
either of these approaches in a distributed system can be tricky, due to
the fact that most repository implementations are not serializable.
The pattern I like to use is to lazily initialize dependencies by looking them up from a Registry:
private transient CustomerRepository m_customerRepository;
protected CustomerRepository getCustomerRepository() {
if (m_customerRepository == null) {
m_customerRepository =
RepositoryRegistry.getCustomerRepository();
}
return m_customerRepository;
}
public void setCustomerRepository(CustomerRepository customerRepository) {
m_customerRepository = customerRepository;
}
In this example, the m_customerRepository field is lazily initialized by retrieving a CustomerRepository instance from a RepositoryRegistry.
The registry itself is a singleton that simply wraps the Spring
application context, which enables easy configuration of the concrete
repository implementations to use.
Finally, the setter allows injection of fakes or
mocks within unit tests, which significantly simplifies testing by not
requiring the registry to be configured.
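Putting the pattern together, here is a hedged, Spring-free sketch of the registry idea (the simplified CustomerRepository signature and the CustomerRef class are assumptions for illustration, not the book's code):

```java
// Simplified repository interface for the sketch; the real one would
// return a Customer entity rather than a String.
interface CustomerRepository {
    String getCustomer(Long id);
}

// Static holder that the application wires up at startup; the book's
// version wraps a Spring application context instead of a plain field.
class RepositoryRegistry {
    private static CustomerRepository customerRepository;

    static void setCustomerRepository(CustomerRepository repository) {
        customerRepository = repository;
    }
    static CustomerRepository getCustomerRepository() {
        return customerRepository;
    }
}

// Entity-side usage: the dependency is transient (never serialized) and
// is looked up lazily from the registry on first use.
class CustomerRef {
    private final Long customerId;
    private transient CustomerRepository repository;

    CustomerRef(Long customerId) { this.customerId = customerId; }

    CustomerRepository getRepository() {
        if (repository == null) {
            repository = RepositoryRegistry.getCustomerRepository();
        }
        return repository;
    }

    String getCustomer() {
        return getRepository().getCustomer(customerId);
    }
}
```

Because the field is transient and re-resolved lazily, the pattern works no matter which cluster node deserializes the entity.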
Specifying data affinity
In some cases you might want to tell Coherence to
store related objects together. For example, if we had a way to ensure
that all the transactions for any given account are stored within the
same cache partition, we would be able to optimize the query that
returns transactions for an account by telling Coherence to only search
that one partition. That means that in a well-balanced cluster with a
million transactions in a cache and a thousand partitions, we would only
need to search one thousandth of the data, or 1,000 transactions, to
find the ones we need.
While it is not possible to tell Coherence explicitly
where to put individual cache entries, there is a way to specify which
objects should be collocated within the same partition.
Coherence uses the cache entry key (or entity
identifier, depending on how you look at it) to determine which node and
cache partition an entry should be stored on. If you want to ensure that
two entries are stored within the same partition, all you need to do is
tell Coherence how to associate their keys.
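The following sketch (not Coherence's actual partitioning code) illustrates the principle: the partition is computed from the associated key rather than from the key itself, so all of an account's transactions land in the same partition as the account:

```java
// Simplified composite key for a transaction; the associated key is the
// parent account's identifier.
class TxKey {
    final long accountId;
    final long txNumber;
    TxKey(long accountId, long txNumber) {
        this.accountId = accountId;
        this.txNumber = txNumber;
    }
    Object getAssociatedKey() { return accountId; }
}

// Toy partitioner: real Coherence partitioning is more involved, but the
// principle is the same; derive the partition from the associated key.
class Partitioner {
    final int partitionCount;
    Partitioner(int partitionCount) { this.partitionCount = partitionCount; }

    int partitionFor(Object key) {
        Object effective = (key instanceof TxKey)
                ? ((TxKey) key).getAssociatedKey()   // use the parent's id
                : key;
        return Math.floorMod(effective.hashCode(), partitionCount);
    }
}
```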
You can achieve this in two different ways: by implementing the KeyAssociation interface on the key class itself, or by implementing a separate KeyAssociator class and configuring the cache service to use it.
Both approaches require that you implement custom
classes for your related objects' keys, typically as value objects
containing the identifier of the parent object you want to associate
with in addition to the object's own identifier. For example, in order
to associate Transaction instances with the Account they belong to, we can implement a custom identity class as follows:
public class Transaction
implements Entity<Id>, Serializable {
...
public static class Id implements Serializable, KeyAssociation {
private Long m_accountId;
private Long m_txNumber;
public Id(Long accountId, Long txNumber) {
m_accountId = accountId;
m_txNumber = txNumber;
}
public Object getAssociatedKey() {
return m_accountId;
}
public boolean equals(Object o) {
...
}
public int hashCode() {
...
}
}
}
The previous example uses the first of the two approaches, the KeyAssociation interface. That interface defines a single method, getAssociatedKey, which in this case returns the identifier of the parent Account instance.
The second approach requires you to implement key association logic in a separate class:
public class TransactionAssociator implements KeyAssociator {
    public void init(PartitionedService partitionedService) {
    }
    public Object getAssociatedKey(Object key) {
        // assumes an accountId accessor on Transaction.Id
        return ((Transaction.Id) key).getAccountId();
    }
}
If you choose this approach, you will also need to configure the transactions cache to use the TransactionAssociator:
<distributed-scheme>
  <!-- ... -->
  <key-associator>
    <class-name>TransactionAssociator</class-name>
  </key-associator>
</distributed-scheme>
Regardless of how you establish the association between your entities, Coherence will use the value returned by the getAssociatedKey
method instead of the key itself to determine the storage partition for
an object. This will ensure that all transactions for an account are
stored within the same partition as the account itself.
Key association is not limited to aggregates and can
be used to ensure that any related entities are collocated within the
same partition. However, separately stored weak entities are usually
very good candidates for key association, so you should keep that in
mind when designing your domain model.
One potential issue with data
affinity is that it might prevent Coherence from fully balancing the
cluster. For example, if some accounts have many transactions and some
only a few, you could run out of memory on one node even though there is
plenty of room in the cluster as a whole. Because of this, you will
only want to use data affinity if the associated objects are naturally
well-balanced.