One of the most important choices you need
to make for your domain objects (from the Coherence perspective) is how
they will be serialized. Coherence works just fine with objects that
simply implement the java.io.Serializable interface, so that is
typically the easiest way to try things out. However, it is also the
slowest way to serialize your objects, as it depends heavily on
reflection. It also adds a lot of overhead to the object's
serialized binary form, because it embeds metadata such as the full class
name, field names, and field types.
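The overhead is easy to observe with nothing but the JDK. The following is a minimal sketch, not a benchmark: SamplePoint and SerializationOverheadDemo are hypothetical names made up for this illustration, and the exact byte count will vary with the class and package name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical domain class: its actual data is two ints, or 8 bytes.
class SamplePoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    SamplePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class SerializationOverheadDemo {
    // Serializes an object with standard Java serialization and returns the bytes.
    static byte[] javaSerialize(Object o) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = javaSerialize(new SamplePoint(1, 2));
        // ISO-8859-1 maps each byte to one char, so the raw stream can be searched as text.
        String raw = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println("payload: 8 bytes, serialized: " + bytes.length + " bytes");
        System.out.println("class name embedded: " + raw.contains("SamplePoint"));
    }
}
```

Running main prints the serialized size next to the 8 bytes of real data, and confirms that the class name travels with every serialized instance.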
Serialization performance will impact most operations
against a distributed cache, while the size of the serialized binary
form will not only have an impact on network throughput, but more
importantly, it will ultimately determine how much data you can store
within a cluster of a certain size. Or, looking at it from a different
angle, it will determine how many servers, how much RAM and how many
Coherence licenses you will need in order to manage your data.
Coherence provides several serialization mechanisms
that are significantly faster than the standard Java serialization and
typically also result in a much smaller serialized binary form. The
reason why there are several of them is that they were introduced one by
one, in an effort to improve performance. The latest one, called Portable Object Format or POF,
was introduced in Coherence 3.2 in order to allow .NET clients to
access data within a Coherence cluster, but it has since become the
recommended serialization format for pure Java applications as well.
1. POF basics
POF is an extremely compact, platform-independent
binary serialization format. It can be used to serialize almost any
Java, .NET, or C++ object into a POF value.
A POF value is a binary structure containing a type identifier
and a value. The type identifier is an integer, where numbers less than zero are used for the intrinsic types, while numbers greater than zero can be used for custom user types.
User types are what we are interested in most, as all
the domain objects we will create within an application are considered
user types. The value of a user type is encoded within the POF stream as
a list of indexed attributes,
where each data member of a user type is encoded by specifying its
index within the type. The attribute value is then encoded as a POF
value, defined previously.
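To make the idea of indexed attributes concrete, here is a toy encoder and decoder built only on java.nio.ByteBuffer. It is emphatically not the real POF wire format: the ToyAccount class, its layout (a 4-byte type identifier followed by one-byte attribute indexes and fixed-width values) and the constants are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Toy illustration only: this is NOT the actual POF binary layout.
class ToyAccount {
    static final int TYPE_ID = 1000;    // hypothetical user type identifier
    static final int IDX_ID = 0;        // attribute indexes shared by writer and reader
    static final int IDX_BALANCE = 1;
    static final int ENCODED_SIZE = 4 + (1 + 8) + (1 + 8);

    final long id;
    final long balanceInCents;

    ToyAccount(long id, long balanceInCents) {
        this.id = id;
        this.balanceInCents = balanceInCents;
    }

    // Writes the type identifier, then each attribute as (index, value).
    byte[] write() {
        return ByteBuffer.allocate(ENCODED_SIZE)
                .putInt(TYPE_ID)
                .put((byte) IDX_ID).putLong(id)
                .put((byte) IDX_BALANCE).putLong(balanceInCents)
                .array();
    }

    // Reads the attributes back in the same order, using the same indexes.
    static ToyAccount read(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        if (in.getInt() != TYPE_ID) {
            throw new IllegalStateException("unexpected type identifier");
        }
        long id = readLongAt(in, IDX_ID);
        long balance = readLongAt(in, IDX_BALANCE);
        return new ToyAccount(id, balance);
    }

    private static long readLongAt(ByteBuffer in, int expectedIndex) {
        int index = in.get();
        if (index != expectedIndex) {
            throw new IllegalStateException(
                    "expected attribute " + expectedIndex + " but found " + index);
        }
        return in.getLong();
    }
}
```

No attribute names appear in the encoded bytes, so the stream stays small, but nothing in it tells a reader what index 0 or index 1 means. That knowledge lives entirely in the writing and reading code, which must agree.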
The fact that attribute indexes are used instead of
attribute names makes POF very compact and fast, but it puts the burden on
the serializer implementation to ensure that attributes are written to
and read from the POF stream in the same order, using the same indexes.
This decision, as well as the decision to use an
integer type identifier instead of a class name to represent the type of
the value, was made consciously in order to make POF platform
independent: a Java class name is meaningless to a .NET client and
vice versa, and attribute names might be as well. The consequence is
that, unlike many other serialization formats, POF is not
self-describing by design, and it requires an
external means of correlating platform-independent user type identifiers
with platform-specific classes.
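Conceptually, that external means is a two-way lookup table between type identifiers and classes, maintained separately on each platform. The sketch below shows the idea in plain Java. ToyTypeRegistry is a hypothetical class written for this illustration and is not part of Coherence, which ships its own mechanism for this purpose.

```java
import java.util.HashMap;
import java.util.Map;

// Toy registry: maps platform-independent type identifiers to Java classes and back.
class ToyTypeRegistry {
    private final Map<Integer, Class<?>> classesById = new HashMap<>();
    private final Map<Class<?>, Integer> idsByClass = new HashMap<>();

    void register(int typeId, Class<?> type) {
        // Negative identifiers are treated as reserved for intrinsic types.
        if (typeId < 0) {
            throw new IllegalArgumentException("negative type identifiers are reserved");
        }
        if (classesById.containsKey(typeId) || idsByClass.containsKey(type)) {
            throw new IllegalArgumentException("type identifier or class already registered");
        }
        classesById.put(typeId, type);
        idsByClass.put(type, typeId);
    }

    // Used when writing: which identifier goes into the stream for this class?
    int typeIdOf(Class<?> type) {
        Integer id = idsByClass.get(type);
        if (id == null) {
            throw new IllegalArgumentException("unknown user type: " + type.getName());
        }
        return id;
    }

    // Used when reading: which local class does this identifier stand for?
    Class<?> classOf(int typeId) {
        Class<?> type = classesById.get(typeId);
        if (type == null) {
            throw new IllegalArgumentException("unknown type identifier: " + typeId);
        }
        return type;
    }
}
```

A .NET or C++ client would keep an equivalent table that maps the same identifiers to its own classes, which is what lets the binary form stay free of platform-specific names.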
A brief history of POF
Back in December of 2005, during the first The Spring
Experience conference in Miami, I was working with Rob Harrop on the
interoperability solution that would allow Spring.NET clients to
communicate with Spring-managed Java services on the server. We had
several working implementations, including SOAP web services and a
custom IIOP implementation for .NET, but we weren't really happy with
any of them, as they either had significant limitations, required too
much configuration, were just plain slow, or all of the above, as was
the case with SOAP web services.
What we wanted was something that was easy to
configure, didn't impose inheritance requirements on our services and
was as fast as it could be. The only option we saw was to implement a
custom binary serialization mechanism that would be platform
independent, but neither of us was brave enough to start working on it.
The very next week I was at JavaPolis in Antwerp,
Belgium, listening to Cameron Purdy's talk on Coherence. One of the
things he mentioned was how Java serialization is extremely slow and how
Tangosol's proprietary ExternalizableLite serialization
mechanism is some ten to twelve times faster. With Spring interop still
fresh in mind, I approached Cameron after the talk and asked him if it
would be possible to port ExternalizableLite to .NET. He just looked at me and said: "We need to talk."
Well, we did talk, and what I learned was that
Tangosol wanted to implement the .NET client for Coherence, as many
customers were asking for it, and that in order to do that they needed a
platform-independent serialization format and serializer
implementations in both Java and .NET. A few months later, I received an
e-mail with a serialization format specification, a complete Java POF
implementation, and a question: "Can you implement this in .NET for us?"
Over the next six months or so we implemented both
POF and the full-blown .NET client for Coherence. All I can say is that
the experience for me was very intense, and was definitely one of those
humbling projects where you realize how little you really know. Working
with Cameron and Jason Howes on POF and Coherence for .NET was a lot of
fun and a great learning experience.
I knew it would be that way, though, as soon as I saw the following sentence in the POF specification:
In other words, PIF-POF is
explicitly not intended to be able to answer all questions, nor to be
all things to all people. If there is an 80/20 rule and a 90/10 rule,
PIF-POF is designed for the equivalent of a 98/2 rule: it should suffice
for all but the designs of an esoteric and/or convoluted mind.
2. POF context
A POF context
provides a way to assign POF type identifiers to your custom types.
There are two implementations that ship with Coherence, and you are free
to implement your own if neither fits the bill, although that is highly
unlikely.
The first implementation is SimplePofContext, which allows you to register user types programmatically by calling the registerUserType method. This method takes three arguments: a POF type identifier, the class of the user type, and the POF serializer to use.
The last argument, POF serializer, can be an instance of any class that implements the com.tangosol.io.pof.PofSerializer interface. You can implement the serializer yourself, or you can implement the com.tangosol.io.pof.PortableObject interface within your data objects and use the built-in PortableObjectSerializer, as in the following example:
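import com.tangosol.io.pof.PortableObjectSerializer;
import com.tangosol.io.pof.SimplePofContext;

SimplePofContext ctx = new SimplePofContext();

// Account and Transaction implement PortableObject
ctx.registerUserType(1000, Account.class,
                     new PortableObjectSerializer(1000));
ctx.registerUserType(1001, Transaction.class,
                     new PortableObjectSerializer(1001));

// Customer is handled by its own PofSerializer implementation
ctx.registerUserType(1002, Customer.class,
                     new Customer.Serializer());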
Regardless of the option chosen, you will also have
to implement the actual serialization code that reads and writes
the object's attributes from/to a POF stream. We will get to that shortly,
but now let's take a look at the other implementation of the PofContext interface, the ConfigurablePofContext class.
ConfigurablePofContext
The ConfigurablePofContext allows you to
define mappings of user types to POF type identifiers in an external
configuration file and is most likely what you will be using within your
applications.
The POF configuration file is an XML file that has the following format:
<!DOCTYPE pof-config SYSTEM "pof-config.dtd">
<pof-config>
  <user-type-list>
    <include>otherPofConfig</include>
    <user-type>
      <type-id>typeId</type-id>
      <class-name>userTypeClass</class-name>
      <serializer>
        <class-name>serializerClass</class-name>
        <init-params>...</init-params>
      </serializer>
    </user-type>
    ...
  </user-type-list>
</pof-config>
The include element allows us to import user
type definitions from another file. This enables us to separate POF
configuration into multiple files in order to keep those files close to
the actual types they are configuring, and to import all of them into
the main POF configuration file that the application will use.
The serializer definition within the user-type element is optional; if it is not specified, PortableObjectSerializer will be used. For example, if we were to create a configuration file for the same user types we registered manually with the SimplePofContext in the previous example, it would look like this:
<!DOCTYPE pof-config SYSTEM "pof-config.dtd">
<pof-config>
  <user-type-list>
    <user-type>
      <type-id>1000</type-id>
      <class-name>sample.domain.Account</class-name>
    </user-type>
    <user-type>
      <type-id>1001</type-id>
      <class-name>sample.domain.Transaction</class-name>
    </user-type>
    <user-type>
      <type-id>1002</type-id>
      <class-name>sample.domain.Customer</class-name>
      <serializer>
        <class-name>sample.domain.Customer$Serializer</class-name>
      </serializer>
    </user-type>
  </user-type-list>
</pof-config>
There is another thing worth pointing out regarding user type registration within a POF context.
You have probably noticed that I used type
identifiers of 1000 and greater, even though any positive integer can be
used. The reason for this is that the numbers below 1000 are reserved
for various user types within Coherence itself, such as filter and entry
processor implementations.
All internal user types are configured in the coherence-pof-config.xml file within coherence.jar, and you should import their definitions into your main POF configuration file using an include element:
<include>coherence-pof-config.xml</include>
Finally, it is worth noting that even though POF has been
the recommended serialization format since Coherence 3.4, it is not
enabled within the cluster by default, for backwards compatibility
reasons. To enable it, you need to either configure it on a
per-service basis within the cache configuration file, or enable it globally
by specifying the following system properties:
-Dtangosol.pof.enabled=true
-Dtangosol.pof.config=my-pof-config.xml
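When command-line flags are inconvenient, for example in a unit test, the same two properties can be set from code. This is only a sketch under an assumption: it presumes the properties are read when Coherence initializes, so the call has to happen before any Coherence class is used. PofSystemProperties is a hypothetical helper name, and the command-line form shown above remains the straightforward route.

```java
// Hypothetical helper: sets the two system properties named in the text.
class PofSystemProperties {
    // Assumption: must be called before Coherence is initialized to have any effect.
    static void enablePof(String pofConfigFile) {
        System.setProperty("tangosol.pof.enabled", "true");
        System.setProperty("tangosol.pof.config", pofConfigFile);
    }

    public static void main(String[] args) {
        enablePof("my-pof-config.xml");
        System.out.println(System.getProperty("tangosol.pof.config"));
    }
}
```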