If you have read Domain Driven Design
by Eric Evans [DDD], much of the information in this section will
simply refresh your knowledge of the domain model building blocks
described by Eric, with a Coherence twist. If you haven't, this section
will give you enough background information to allow you to identify
various domain objects within the application.
Rich versus Anemic domain models
The main argument Eric Evans makes in Domain Driven Design
is that your domain objects should be used to implement business logic
of your application, not just to hold data. While this logically makes
perfect sense, it is not what most developers are used to, primarily
because such an architecture was discouraged by the J2EE spec, which pretty
much required you to turn your domain objects into property bags
and implement all the logic in a higher-level service layer using
session EJBs.
If you still develop applications that way, you
might be wondering whether Coherence requires you to use rich domain objects.
The short answer is: no, it does not. You can use any object with
Coherence, as long as it is serializable, so you can easily use anemic
domain objects as well.
The question, however, is why you would want to do
that. If you are already creating custom classes for your objects, you
might as well take the extra step and implement related behavior within
them as well, instead of moving all the logic to a higher-level service
layer. Otherwise, as Martin Fowler points out in his article Anemic Domain Model (http://martinfowler.com/bliki/AnemicDomainModel.html), you will be "robbing yourself blind and paying all the costs of a domain model without yielding any of the benefits".
After all, there isn't much you can achieve with an
anemic domain object that you can't do using a more generic and readily
available data structure, such as a map, so why even bother with custom
objects if they are just dumb data holders?
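The difference is easy to see in code. The following is a hedged sketch (the class names are hypothetical, not from the book) contrasting an anemic property bag with a rich object that owns its invariants:

```java
// Anemic: a property bag; all logic lives in some external service layer.
class AnemicAccount {
    private long balance;                        // balance in cents
    public long getBalance() { return balance; }
    public void setBalance(long balance) { this.balance = balance; }
}

// Rich: the entity itself enforces the business rule.
class RichAccount {
    private long balance;                        // balance in cents
    public RichAccount(long initialBalance) { balance = initialBalance; }
    public long withdraw(long amount) {
        if (amount > balance) {
            throw new IllegalStateException("insufficient funds");
        }
        return balance -= amount;                // returns the new balance
    }
}
```

With the anemic version, nothing stops a caller from setting the balance to an invalid value; with the rich version, the overdraft rule cannot be bypassed.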
In Domain Driven Design, Eric identifies a number of core building blocks of the domain model, such as entities, aggregates, value objects, services, factories, and repositories.
The first three represent data objects in the model, and as such are
the most important domain model artifacts from our perspective: they are
what we will be storing in Coherence.
1. Entities and aggregates
An entity is an object that has an identity. The identity can be either a natural attribute of the object or a surrogate attribute that is generated by the system when
the entity is first created. Regardless of the type of identity, what
is important is that once an identity is assigned to an entity, it
remains the same throughout its lifetime.
An aggregate is a special, composite entity type, which represents a containment relationship between an aggregate root and dependent weak entities.
For example, an order contains one or more line items, and while both
the order and each individual line item are entities in their own right,
a single line item is only meaningful within the larger context of an
order.
Entities and aggregates are the most important types
of domain objects from the Coherence perspective, as they usually have
a one-to-one mapping to Coherence caches.
One of the most common
mistakes that beginners make is to treat Coherence as an in-memory
database and create caches that are too finely grained. For example,
they might configure one cache for orders and a separate cache for line
items.
While this makes perfect sense when using a
relational database, it isn't the best approach when using Coherence.
Aggregates represent units of consistency from a business perspective,
and the easiest way to achieve atomicity and consistency when using
Coherence is to limit the scope of mutating operations to a single cache
entry. Because of this, you should almost always store whole aggregates
as individual cache entries. In the previous example that we used, an
order and all of its line items would be stored as a single cache entry
in the orders cache.
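To make the "whole aggregate per entry" idea concrete, here is a minimal sketch that uses a plain Map as a stand-in for the orders cache (the Order and LineItem classes here are simplified assumptions, not the book's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Dependent weak entity: meaningful only inside an Order.
class LineItem {
    final String product;
    final int quantity;
    LineItem(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }
}

// Aggregate root: owns its line items outright.
class Order {
    final Long id;
    private final List<LineItem> items = new ArrayList<>();
    Order(Long id) { this.id = id; }
    void addItem(LineItem item) { items.add(item); }  // mutation stays inside the aggregate
    int itemCount() { return items.size(); }
}

class OrderCacheDemo {
    // Plain Map standing in for a Coherence NamedCache keyed by order id:
    // the whole aggregate, line items included, lives under a single key.
    static final Map<Long, Order> orders = new ConcurrentHashMap<>();
}
```

Because every mutation of the aggregate replaces a single entry under a single key, atomicity comes for free.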
One exception to this rule might be the case when the
aggregate root contains an unbounded, continuously growing collection of
dependent entities, such as the Account and Transaction
items in our domain model. In this case, it makes sense to separate
dependent entities into their own cache, in order to avoid infinite
growth of the aggregate object and to allow different caching policies
to be used (for example, we might decide to keep all the accounts in the
cache at all times, but only the last 60 days of transactions for each
account, in order to keep the amount of memory used by transactions
relatively constant over time).
Implementing entities
The domain model for our banking application contains three entities so far: Customer, Account, and Transaction. The last two form an aggregate, with the Account as aggregate root.
Because an entity is such an important type of object
within a Coherence application, we will define an interface that all
our entities have to implement:
public interface Entity<T> {
    T getId();
}
The Entity interface is very simple, but it
makes the fact that entities have an identity explicit. This is not
strictly required, but it will come in handy on many occasions, such as
when we implement repositories for our entities, as you'll see in a bit.
Entity implementation is quite simple for the most
part: you define the attributes as you normally would and implement the
necessary operations. In the case of the Account class, this might lead you to create something along these lines:
public class Account
implements Entity<Long>, Serializable {
// data members
private final Long m_id;
private final Long m_customerId;
private String m_description;
private Money m_balance;
private int m_lastTransactionId;
// dependencies
private transient CurrencyConverter m_currencyConverter;
private transient TransactionRepository m_transactionRepository;
// constructor, getters and setters omitted for brevity
...
// core logic
public Money withdraw(Money amount, String description)
throws InsufficientFundsException {
Money balance = m_balance;
if (!balance.isSameCurrency(amount)) {
CurrencyConversion conversion =
getCurrencyConverter().convert(amount, getCurrency());
amount = conversion.getConvertedAmount();
description += " (" +
conversion.getOriginalAmount() + " @ " +
conversion.getExchangeRate() + ")";
}
if (amount.greaterThan(balance)) {
throw new InsufficientFundsException(balance, amount);
}
m_balance = balance = balance.subtract(amount);
postTransaction(TransactionType.WITHDRAWAL, description, amount, balance);
return balance;
}
public Money deposit(Money amount, String description) {
// omitted for brevity (similar to withdraw)
}
protected void postTransaction(TransactionType type,
String description,
Money amount, Money balance) {
Transaction transaction =
Transaction.create(m_id, ++m_lastTransactionId,
type, description,
amount, balance);
getTransactionRepository().save(transaction);
}
}
As you can see, except for the fact that we've implemented the Entity interface and made the class Serializable,
there is nothing particularly interesting about this class. The logic
within it is expressed using concepts from a domain and there is
absolutely nothing that ties it to Coherence.
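The logic above leans on a Money value object with operations such as isSameCurrency, subtract, and greaterThan. The book's actual class is not shown here, but a minimal immutable sketch of the assumed shape might look like this:

```java
import java.math.BigDecimal;
import java.util.Currency;

// Minimal immutable Money sketch (an assumed shape; the real class may
// differ). BigDecimal avoids binary floating-point rounding errors.
class Money {
    private final BigDecimal amount;
    private final Currency currency;

    Money(BigDecimal amount, Currency currency) {
        this.amount = amount;
        this.currency = currency;
    }
    Money(long amount, Currency currency) {
        this(BigDecimal.valueOf(amount), currency);
    }

    boolean isSameCurrency(Money other) {
        return currency.equals(other.currency);
    }
    Money subtract(Money other) {
        return new Money(amount.subtract(other.amount), currency);
    }
    boolean greaterThan(Money other) {
        return amount.compareTo(other.amount) > 0;
    }
    BigDecimal getAmount() { return amount; }
    Currency getCurrency() { return currency; }
}
```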
However, we are not done yet, as there are a few more things to consider.
Identity management
If an entity has a natural attribute that can be used
to uniquely identify an instance of an entity, it is usually best to
use that attribute as an identity. Unfortunately, many entities do not
have such an attribute, in which case a surrogate identity must be
generated by the system and assigned to the entity instance.
Most databases provide a built-in mechanism for this
purpose. For example, SQL Server allows you to define a numeric field
that is automatically incremented when a new record is inserted into the
table, while Oracle has a sequence mechanism, which allows you to get
the next number for the named sequence object and use it within your INSERT statement. Another option is to generate and use a GUID (Globally Unique Identifier)
object as an identity, which might be the best (or even required)
option for scenarios where replication and synchronization across
multiple independent data stores is required.
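For reference, generating such a surrogate identifier is a one-liner with the JDK's built-in UUID class:

```java
import java.util.UUID;

class SurrogateIdDemo {
    // Generates a random (version 4) UUID to use as a surrogate identity.
    static UUID newId() {
        return UUID.randomUUID();
    }
}
```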
When you use the identity generation features of your
database, you essentially let it handle all the grunt work for you and
your biggest problem becomes how to obtain the generated identifier from
the database and update your in-memory object to reflect it.
Coherence, on the other hand, forces you to define an
object's identity up front. Because identity is typically used as a
cache key, it is impossible to put an object into the cache unless you
have a valid identifier for it. Unfortunately, while Coherence allows
you to use UUIDs (Universally Unique Identifiers)
as object identifiers and even provides an excellent, platform-independent
implementation of UUID, it does not have an out-of-the-box
mechanism for sequential identifier generation. However, it is not too
difficult to implement one, and the Coherence Tools open source project I
mentioned earlier provides one such implementation in the form of the SequenceGenerator class.
The SequenceGenerator is very simple to use.
All you need to do is create an instance of it, passing the sequence name
and the number of identifiers the client should allocate on each call to
the server (a variation of the Hi/Lo algorithm). The generator uses a
Coherence cache internally to keep track of all the sequences, which
allows it to be used from any cluster member. It is also thread-safe and
intended to be shared by instances of an entity that it creates
identifiers for, so you will typically create it as a static final field:
public class Account
implements Entity<Long>, Serializable {
private static final IdentityGenerator<Long> s_idGen =
SequenceGenerator.create("account.id", 20);
...
}
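To illustrate the Hi/Lo idea behind SequenceGenerator (a simplified sketch, not the Coherence Tools implementation), the client reserves a block of identifiers with a single call to the server and then hands them out locally; here an AtomicLong stands in for the cluster-wide sequence entry:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the Hi/Lo algorithm: one "server" round trip reserves a whole
// block of ids; subsequent ids are handed out locally until the block is
// exhausted, at which point a new block is reserved.
class HiLoGenerator {
    private final AtomicLong server;  // stands in for the sequence entry in a Coherence cache
    private final int blockSize;
    private long next;                // next id to hand out (exclusive of the one returned)
    private long max;                 // upper bound of the currently reserved block

    HiLoGenerator(AtomicLong server, int blockSize) {
        this.server = server;
        this.blockSize = blockSize;
    }

    synchronized long generateIdentity() {
        if (next >= max) {                       // block exhausted: reserve a new one
            next = server.getAndAdd(blockSize);
            max  = next + blockSize;
        }
        return ++next;
    }
}
```

With a block size of 20, only one server round trip is needed per 20 generated identifiers, which is what makes the approach cheap enough to use from any cluster member.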
Creating entity instances
Now that we have an identity generator, we should
ensure that whenever a new object is created it is assigned a unique
identity. While we could do this in a constructor, the idiom I like to
use is to keep the constructor private and to provide a static factory
method that is used to create new entity instances:
public class Account
implements Entity<Long>, Serializable {
...
private Account(Long id, Long customerId,
String description, Money balance) {
m_id = id;
m_customerId = customerId;
m_description = description;
m_balance = balance;
}
static Account create(Customer customer,
String description,
Currency currency) {
return new Account(s_idGen.generateIdentity(),
customer.getId(),
description,
new Money(0, currency));
}
...
}
This way a single constructor can be used to properly
initialize an object instance not only during the initial creation, but
also when the object is loaded from a persistent store or deserialized,
as we'll see shortly.
Managing entity relationships
One thing you might've noticed in the previous examples is that the Account does not have a direct reference to a Customer. Instead, we only store the Customer's identifier as part of the Account's state and use it to obtain the customer when necessary:
public class Account
implements Entity<Long>, Serializable {
private final Long m_customerId;
...
public Customer getCustomer() {
return getCustomerRepository()
.getCustomer(m_customerId);
}
}
This is a common pattern when using Coherence, as
identity lookups from a cache are cheap operations, especially if we
configure near caching for the customers cache in this example. By doing
this, we ensure that a Customer, which can be shared by several Account instances, is always obtained from the authoritative source, and we avoid the issues that would be caused if the shared Customer instance were serialized as part of each Account object that references it.
On the other hand, this is only one side of the
relationship. How would we model a one-to-many relationship, such as the
relationship between a Customer and several Account instances, or an Account and several Transaction instances?
There are two possible approaches. The first one is to query the cache on the many side of the relationship. For example, we could query the accounts
cache for all the accounts that have a specific customer id. This is
essentially the same approach you use with a relational database when
you query a child table based on the foreign key that identifies the
parent.
However, with Coherence you also have another option
that will yield significantly better performance: you can store the
identifiers of the child objects within the parent, and simply perform a getAll operation against the underlying Coherence cache when you need to retrieve them:
public class Customer
implements Entity<Long>, Serializable {
private Collection<Long> m_accountIds;
...
public Collection<Account> getAccounts() {
return getAccountRepository()
.getAccounts(m_accountIds);
}
}
This approach makes sense when the number of child
objects is finite and you don't need to constrain the results in some
other way. Neither of these is true for the getTransactions method of the Account class: the transaction collection will likely grow indefinitely, and the results of the getTransactions call need to be constrained by a time period. In this case, a query against the transactions cache is a better approach.
Leaky abstractions
Notice that in the previous example, I passed a collection of account ids directly to the getAccounts repository method, which leaks the fact that we are doing a bulk identity lookup from the underlying store.
This might make it difficult to implement a
repository for a store that doesn't support such an operation, or might
force us to implement it in a suboptimal manner. For example, if we had
to implement the same repository for a relational database, our only
option would be to use an IN clause when selecting from a child
table. While this is not the end of the world, a more natural and
better performing approach would be to query the child table on the
foreign key.
We can make that possible by modifying the repository interface to expose the getAccountsForCustomer method that accepts a Customer
instance instead of a collection of account ids. That way the Coherence
repository implementation would be able to perform identity lookup and
the database repository implementation could execute the query on the
foreign key.
The downside of such a change is that we would have to expose a getter for m_accountIds
field to the outside world, which would break encapsulation.
Considering that repositories tend to be leaky abstractions anyway, and
that they are rarely implemented for more than one specific persistence
technology, the benefits of such a change are questionable.
Dealing with dependencies
Both examples in the previous section had an external
dependency on a repository, which raises the question of how these
dependencies are provided to entities, and by whom.
In a conventional application you could use Spring in
combination with AspectJ or Dependency Injection features of your ORM
to inject necessary dependencies into entities. However, implementing
either of these approaches in a distributed system can be tricky, due to
the fact that most repository implementations are not serializable.
The pattern I like to use is to lazily initialize dependencies by looking them up from a Registry:
private transient CustomerRepository m_customerRepository;
protected CustomerRepository getCustomerRepository() {
if (m_customerRepository == null) {
m_customerRepository =
RepositoryRegistry.getCustomerRepository();
}
return m_customerRepository;
}
public void setCustomerRepository(CustomerRepository customerRepository) {
m_customerRepository = customerRepository;
}
In this example, the m_customerRepository field is lazily initialized by retrieving a CustomerRepository instance from a RepositoryRegistry.
The registry itself is a singleton that simply wraps the Spring
application context, which enables easy configuration of the concrete
repository implementations to use.
Finally, the setter allows injection of fakes or
mocks within unit tests, which significantly simplifies testing by not
requiring the registry to be configured.
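Putting the pattern together, here is a hedged, Spring-free sketch of the registry idea (the simplified CustomerRepository signature and the CustomerRef class are assumptions for illustration, not the book's code):

```java
// Simplified repository interface for the sketch; the real one would
// return a Customer entity rather than a String.
interface CustomerRepository {
    String getCustomer(Long id);
}

// Static holder that the application wires up at startup; the book's
// version wraps a Spring application context instead of a plain field.
class RepositoryRegistry {
    private static CustomerRepository customerRepository;

    static void setCustomerRepository(CustomerRepository repository) {
        customerRepository = repository;
    }
    static CustomerRepository getCustomerRepository() {
        return customerRepository;
    }
}

// Entity-side usage: the dependency is transient (never serialized) and
// is looked up lazily from the registry on first use.
class CustomerRef {
    private final Long customerId;
    private transient CustomerRepository repository;

    CustomerRef(Long customerId) { this.customerId = customerId; }

    CustomerRepository getRepository() {
        if (repository == null) {
            repository = RepositoryRegistry.getCustomerRepository();
        }
        return repository;
    }

    String getCustomer() {
        return getRepository().getCustomer(customerId);
    }
}
```

Because the field is transient and re-resolved lazily, the pattern works no matter which cluster node deserializes the entity.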
Specifying data affinity
In some cases you might want to tell Coherence to
store related objects together. For example, if we had a way to ensure
that all the transactions for any given account are stored within the
same cache partition, we would be able to optimize the query that
returns transactions for an account by telling Coherence to only search
that one partition. That means that in a well-balanced cluster with a
million transactions in a cache and a thousand partitions, we would only
need to search one thousandth of the data, or 1,000 transactions, to
find the ones we need.
While it is not possible to tell Coherence explicitly
where to put individual cache entries, there is a way to specify which
objects should be collocated within the same partition.
Coherence uses the cache entry key (or entity
identifier, depending on how you look at it) to determine which node and
cache partition an entry should be stored on. If you want to ensure that
two entries are stored within the same partition, all you need to do is
tell Coherence how to associate their keys.
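The following sketch (not Coherence's actual partitioning code) illustrates the principle: the partition is computed from the associated key rather than from the key itself, so all of an account's transactions land in the same partition as the account:

```java
// Simplified composite key for a transaction; the associated key is the
// parent account's identifier.
class TxKey {
    final long accountId;
    final long txNumber;
    TxKey(long accountId, long txNumber) {
        this.accountId = accountId;
        this.txNumber = txNumber;
    }
    Object getAssociatedKey() { return accountId; }
}

// Toy partitioner: real Coherence partitioning is more involved, but the
// principle is the same; derive the partition from the associated key.
class Partitioner {
    final int partitionCount;
    Partitioner(int partitionCount) { this.partitionCount = partitionCount; }

    int partitionFor(Object key) {
        Object effective = (key instanceof TxKey)
                ? ((TxKey) key).getAssociatedKey()   // use the parent's id
                : key;
        return Math.floorMod(effective.hashCode(), partitionCount);
    }
}
```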
You can achieve this in two different ways: by implementing the KeyAssociation interface on the key class itself, or by implementing a separate KeyAssociator class and configuring the cache service to use it.
Both approaches require that you implement custom
classes for your related objects' keys, typically as value objects
containing the identifier of the parent object you want to associate
with in addition to the object's own identifier. For example, in order
to associate Transaction instances with the Account they belong to, we can implement a custom identity class as follows:
public class Transaction
implements Entity<Id>, Serializable {
...
public static class Id implements Serializable, KeyAssociation {
private Long m_accountId;
private Long m_txNumber;
public Id(Long accountId, Long txNumber) {
m_accountId = accountId;
m_txNumber = txNumber;
}
public Object getAssociatedKey() {
return m_accountId;
}
public boolean equals(Object o) {
...
}
public int hashCode() {
...
}
}
}
The previous example uses the first of the two approaches, the KeyAssociation interface. That interface defines a single method, getAssociatedKey, which in this case returns the identifier of the parent Account instance.
The second approach requires you to implement key association logic in a separate class:
public class TransactionAssociator implements KeyAssociator {
    public void init(PartitionedService partitionedService) {
    }
    public Object getAssociatedKey(Object key) {
        // assumes an accountId accessor on Transaction.Id
        return ((Transaction.Id) key).getAccountId();
    }
}
If you choose this approach, you will also need to configure the transactions cache to use the TransactionAssociator:
<distributed-scheme>
  <!-- ... -->
  <key-associator>
    <class-name>TransactionAssociator</class-name>
  </key-associator>
</distributed-scheme>
Regardless of how you establish the association between your entities, Coherence will use the value returned by the getAssociatedKey
method instead of the key itself to determine the storage partition for
an object. This will ensure that all transactions for an account are
stored within the same partition as the account itself.
Key association is not limited to aggregates and can
be used to ensure that any related entities are collocated within the
same partition. However, separately stored weak entities are usually
very good candidates for key association, so you should keep that in
mind when designing your domain model.
One potential issue with data
affinity is that it might prevent Coherence from fully balancing the
cluster. For example, if some accounts have many transactions and some
only a few, you could run out of memory on one node even though there is
plenty of room in the cluster as a whole. Because of this, you will
only want to use data affinity if the associated objects are naturally
well-balanced.