Oracle Coherence 3.5: Implementing object serialization (part 1) - POF basics, POF context

5/11/2013 2:23:14 AM

One of the most important choices you need to make for your domain objects (from the Coherence perspective) is how they will be serialized. Coherence works just fine with objects that simply implement the java.io.Serializable interface, so that is typically the easiest way to try things out. However, it is also the slowest way to serialize your objects, because it relies heavily on reflection. It also adds a lot of overhead to the object's serialized binary form, because it embeds information such as the fully qualified class name, field names, and field types.
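To see the overhead described above, here is a small comparison that uses only the JDK. SamplePoint is a hypothetical class, and the exact byte counts depend on the JVM's serialization stream format, so the sketch only compares relative sizes and checks that the class name appears in the Java-serialized bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical two-field class used only for this size comparison.
class SamplePoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    SamplePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class SerializationOverhead {
    // Standard Java serialization: besides the data, the stream carries a
    // class descriptor (class name, serialVersionUID, field names and types).
    static byte[] javaSerialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Hand-written encoding: just the two int values, 4 bytes each, relying
    // on writer and reader agreeing on the order of the fields.
    static byte[] manualSerialize(SamplePoint p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.x);
            out.writeInt(p.y);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the given ASCII text occurs anywhere in the serialized bytes.
    static boolean containsText(byte[] data, String text) {
        return new String(data, StandardCharsets.ISO_8859_1).contains(text);
    }
}
```

For a two-int object the Java-serialized form is several times larger than the 8-byte hand-written form, and almost all of the difference is class metadata rather than data.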

Serialization performance affects most operations against a distributed cache, while the size of the serialized binary form affects not only network throughput but, more importantly, how much data you can store in a cluster of a given size. Looked at from a different angle, it determines how many servers, how much RAM, and how many Coherence licenses you need in order to manage your data.

Coherence provides several serialization mechanisms that are significantly faster than standard Java serialization and typically produce a much smaller serialized binary form as well. There are several of them because they were introduced one at a time, each in an effort to improve performance. The latest one, called Portable Object Format or POF, was introduced in Coherence 3.2 to allow .NET clients to access data within a Coherence cluster, but it has since become the recommended serialization format for pure Java applications as well.

1. POF basics

POF is an extremely compact, platform-independent binary serialization format. It can be used to serialize almost any Java, .NET, or C++ object into a POF value.

A POF value is a binary structure containing a type identifier and a value. The type identifier is an integer: numbers less than zero are used for intrinsic types, while numbers greater than zero can be used for custom user types.

User types are what we are most interested in, because all the domain objects we create within an application are considered user types. The value of a user type is encoded in the POF stream as a list of indexed attributes, where each data member of the user type is identified by its index within the type. Each attribute value is in turn encoded as a POF value, as defined above.

The fact that attribute indexes are used instead of attribute names makes POF very compact and fast, but it puts the burden on the serializer implementation to ensure that attributes are written to and read from the POF stream in the same order, using the same indexes.
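The encoding described in the last few paragraphs can be illustrated with a toy writer and reader. This is a sketch only, assuming a made-up layout of (index, type id, value) triples with made-up negative type ids for the intrinsic types; it is not the real POF wire format, which uses packed integers and the identifiers defined by the POF specification. ToyPof, Writer, and Reader are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy, POF-inspired encoding that illustrates indexed attributes.
// NOT the real POF wire format.
class ToyPof {
    // Illustrative intrinsic type ids: negative, as in POF, but made up.
    static final int TYPE_INT = -1;
    static final int TYPE_STRING = -2;

    // Writes each attribute as (index, typeId, value); no field names.
    static final class Writer {
        private final ByteBuffer buf = ByteBuffer.allocate(1024); // fixed size for brevity
        private int lastIndex = -1;

        private void header(int index, int typeId) {
            // This sketch requires strictly increasing indexes, so the reader
            // can consume attributes in the order they were written.
            if (index <= lastIndex) {
                throw new IllegalStateException("index " + index + " written out of order");
            }
            lastIndex = index;
            buf.putInt(index).putInt(typeId);
        }

        Writer writeInt(int index, int value) {
            header(index, TYPE_INT);
            buf.putInt(value);
            return this;
        }

        Writer writeString(int index, String value) {
            byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
            header(index, TYPE_STRING);
            buf.putInt(utf8.length).put(utf8);
            return this;
        }

        byte[] toBytes() {
            return Arrays.copyOf(buf.array(), buf.position());
        }
    }

    // Reads attributes back; each read must name the index (and implied
    // type) that the writer used at that position in the stream.
    static final class Reader {
        private final ByteBuffer buf;

        Reader(byte[] data) {
            buf = ByteBuffer.wrap(data);
        }

        private void expect(int index, int typeId) {
            int actualIndex = buf.getInt();
            int actualType = buf.getInt();
            if (actualIndex != index || actualType != typeId) {
                throw new IllegalStateException("expected attribute " + index + "/" + typeId
                        + " but found " + actualIndex + "/" + actualType);
            }
        }

        int readInt(int index) {
            expect(index, TYPE_INT);
            return buf.getInt();
        }

        String readString(int index) {
            expect(index, TYPE_STRING);
            byte[] utf8 = new byte[buf.getInt()];
            buf.get(utf8);
            return new String(utf8, StandardCharsets.UTF_8);
        }
    }
}
```

Writing `writeInt(0, 42)` and `writeString(1, "Alice")` and then calling `readInt(0)` and `readString(1)` round-trips both values, while trying to read attribute 0 as a string fails. That failure is the kind of writer/reader mismatch the paragraph above warns about; nothing in the stream names the fields, so only the agreed indexes tie the bytes back to the object.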

This decision, as well as the decision to use an integer type identifier instead of a class name to represent the type of the value, was made deliberately in order to make POF platform independent: a Java class name is meaningless to a .NET client and vice versa, and the same might be true of attribute names. The consequence is that, unlike many other serialization formats, POF is by design not self-describing, and it requires an external means of correlating platform-independent user type identifiers with platform-specific classes.

A brief history of POF

Back in December 2005, during the first edition of The Spring Experience conference in Miami, I was working with Rob Harrop on the interoperability solution that would allow Spring.NET clients to communicate with Spring-managed Java services on the server. We had several working implementations, including SOAP web services and a custom IIOP implementation for .NET, but we weren't really happy with any of them: each one either had significant limitations, required too much configuration, was just plain slow, or all of the above, as was the case with SOAP web services.

What we wanted was something that was easy to configure, didn't impose inheritance requirements on our services and was as fast as it could be. The only option we saw was to implement a custom binary serialization mechanism that would be platform independent, but neither of us was brave enough to start working on it.

The very next week I was at JavaPolis in Antwerp, Belgium, listening to Cameron Purdy's talk on Coherence. One of the things he mentioned was how Java serialization is extremely slow and how Tangosol's proprietary ExternalizableLite serialization mechanism is some ten to twelve times faster. With Spring interop still fresh in my mind, I approached Cameron after the talk and asked him whether it would be possible to port ExternalizableLite to .NET. He just looked at me and said: "We need to talk."

Well, we did talk, and what I learned was that Tangosol wanted to implement a .NET client for Coherence, as many customers were asking for it, and that in order to do that they needed a platform-independent serialization format and serializer implementations in both Java and .NET. A few months later, I received an e-mail with a serialization format specification, a complete Java POF implementation, and a question: "Can you implement this in .NET for us?"

Over the next six months or so we implemented both POF and the full-blown .NET client for Coherence. All I can say is that the experience for me was very intense, and was definitely one of those humbling projects where you realize how little you really know. Working with Cameron and Jason Howes on POF and Coherence for .NET was a lot of fun and a great learning experience.

Then again, I knew it would be that way as soon as I saw the following sentence in the POF specification:

In other words, PIF-POF is explicitly not intended to be able to answer all questions, nor to be all things to all people. If there is an 80/20 rule and a 90/10 rule, PIF-POF is designed for the equivalent of a 98/2 rule: it should suffice for all but the designs of an esoteric and/or convoluted mind.

2. POF context

A POF context provides a way to assign POF type identifiers to your custom types. Coherence ships with two implementations, and you are free to implement your own if neither fits the bill, although that is highly unlikely to be necessary.

The first implementation is SimplePofContext, which allows you to register user types programmatically by calling its registerUserType method. This method takes three arguments: the POF type identifier, the class of the user type, and the POF serializer to use.

The last argument, the POF serializer, can be an instance of any class that implements the com.tangosol.io.pof.PofSerializer interface. You can implement the serializer yourself, or you can implement the com.tangosol.io.pof.PortableObject interface in your data objects and use the built-in PortableObjectSerializer, as in the following example:

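import com.tangosol.io.pof.PortableObjectSerializer;
import com.tangosol.io.pof.SimplePofContext;

SimplePofContext ctx = new SimplePofContext();

// Account and Transaction implement PortableObject, so the built-in
// PortableObjectSerializer delegates to their readExternal/writeExternal
ctx.registerUserType(1000, Account.class,
                     new PortableObjectSerializer(1000));
ctx.registerUserType(1001, Transaction.class,
                     new PortableObjectSerializer(1001));

// Customer is handled by its own PofSerializer, a nested class
ctx.registerUserType(1002, Customer.class,
                     new Customer.Serializer());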
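Because POF is not self-describing, both sides of the wire must agree on the mapping that a POF context maintains. The stdlib-only sketch below shows the bookkeeping involved: type id to class, class to type id, and type id to serializer. ToyPofContext and ToySerializer are hypothetical names; they are not the Coherence PofContext or PofSerializer API, and rejecting duplicate registrations is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical serializer abstraction (NOT com.tangosol.io.pof.PofSerializer).
interface ToySerializer {
    byte[] serialize(Object value);
}

// Hypothetical registry illustrating what a POF context keeps track of.
class ToyPofContext {
    private final Map<Integer, Class<?>> classById = new HashMap<>();
    private final Map<Class<?>, Integer> idByClass = new HashMap<>();
    private final Map<Integer, ToySerializer> serializerById = new HashMap<>();

    void registerUserType(int typeId, Class<?> clazz, ToySerializer serializer) {
        if (typeId < 0) {
            throw new IllegalArgumentException("negative ids denote intrinsic types: " + typeId);
        }
        if (classById.containsKey(typeId) || idByClass.containsKey(clazz)) {
            throw new IllegalArgumentException("already registered: " + typeId + " / " + clazz.getName());
        }
        classById.put(typeId, clazz);
        idByClass.put(clazz, typeId);
        serializerById.put(typeId, serializer);
    }

    // Writing side: which type id goes into the stream for this object?
    int typeIdFor(Object value) {
        Integer id = idByClass.get(value.getClass());
        if (id == null) {
            throw new IllegalArgumentException("unknown user type: " + value.getClass().getName());
        }
        return id;
    }

    // Reading side: which class does this type id map to?
    Class<?> classFor(int typeId) {
        return lookup(classById, typeId);
    }

    // Either side: which serializer handles this type id?
    ToySerializer serializerFor(int typeId) {
        return lookup(serializerById, typeId);
    }

    private static <V> V lookup(Map<Integer, V> map, int typeId) {
        V value = map.get(typeId);
        if (value == null) {
            throw new IllegalArgumentException("unknown type id: " + typeId);
        }
        return value;
    }
}
```

A .NET client would hold an equivalent table mapping the same integers to its own classes, which is exactly the "external means of correlating" identifiers with platform-specific types.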

Regardless of which option you choose, you still have to implement the actual serialization code that reads an object's attributes from, and writes them to, a POF stream. We will get to that shortly, but first let's take a look at the other implementation of the PofContext interface, the ConfigurablePofContext class.

ConfigurablePofContext

The ConfigurablePofContext allows you to define mappings of user types to POF type identifiers in an external configuration file and is most likely what you will be using within your applications.

The POF configuration file is an XML file that has the following format:

<!DOCTYPE pof-config SYSTEM "pof-config.dtd">

<pof-config>

    <user-type-list>

        <include>otherPofConfig</include>

        <user-type>
            <type-id>typeId</type-id>
            <class-name>userTypeClass</class-name>
            <serializer>
                <class-name>serializerClass</class-name>
                <init-params>...</init-params>
            </serializer>
        </user-type>

        ...


    </user-type-list>

</pof-config>

The include element allows us to import user type definitions from another file. This lets us split the POF configuration into multiple files, keep each file close to the types it configures, and import all of them into the main POF configuration file that the application uses.

The serializer definition within the user-type element is optional; if it is not specified, PortableObjectSerializer will be used. For example, if we were to create a configuration file for the same user types we registered manually with the SimplePofContext in the previous example, it would look like this:

<!DOCTYPE pof-config SYSTEM "pof-config.dtd">

<pof-config>

    <user-type-list>

        <user-type>
            <type-id>1000</type-id>
            <class-name>
                sample.domain.Account
            </class-name>
        </user-type>

        <user-type>
            <type-id>1001</type-id>
            <class-name>
                sample.domain.Transaction
            </class-name>
        </user-type>

        <user-type>
            <type-id>1002</type-id>
            <class-name>
                sample.domain.Customer
            </class-name>
            <serializer>
                <class-name>
                    sample.domain.Customer$Serializer
                </class-name>
            </serializer>
        </user-type>

    </user-type-list>

</pof-config>
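To make the defaulting rule concrete, the following sketch reads the user-type entries of a configuration like the one above using only the JDK's DOM parser. PofConfigReader is a hypothetical helper, not how ConfigurablePofContext is implemented; it ignores include elements and simply substitutes PortableObjectSerializer when no serializer element is present. Note that it trims element text, since class names are often wrapped across lines as in the example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Coherence: lists the user types declared
// in a pof-config document. It does not resolve <include> elements.
class PofConfigReader {
    // Applied when <serializer> is omitted, as the text describes.
    static final String DEFAULT_SERIALIZER = "com.tangosol.io.pof.PortableObjectSerializer";

    static final class UserType {
        final String className;
        final String serializerClassName;

        UserType(String className, String serializerClassName) {
            this.className = className;
            this.serializerClassName = serializerClassName;
        }
    }

    static Map<Integer, UserType> read(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve pof-config.dtd to an empty document instead of fetching it.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            Map<Integer, UserType> result = new LinkedHashMap<>();
            NodeList userTypes = doc.getElementsByTagName("user-type");
            for (int i = 0; i < userTypes.getLength(); i++) {
                Element userType = (Element) userTypes.item(i);
                int typeId = Integer.parseInt(text(child(userType, "type-id")));
                String className = text(child(userType, "class-name"));
                Element serializer = child(userType, "serializer");
                String serializerClass = serializer == null
                        ? DEFAULT_SERIALIZER
                        : text(child(serializer, "class-name"));
                result.put(typeId, new UserType(className, serializerClass));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read POF configuration", e);
        }
    }

    // First direct child element with the given tag, or null if absent.
    private static Element child(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Element text with surrounding whitespace (such as line breaks) removed.
    private static String text(Element e) {
        return e.getTextContent().trim();
    }
}
```

Looking up direct children rather than all descendants matters here, because the serializer element has its own class-name child that must not be confused with the user type's class name.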

There is another thing worth pointing out regarding user type registration within a POF context.

You have probably noticed that I used type identifiers of 1000 and greater, even though any positive integer can be used. The reason is that numbers below 1000 are reserved for user types defined within Coherence itself, such as filter and entry processor implementations.
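If you want to catch accidental use of the reserved range early, for example in a unit test over your own type registrations, a tiny guard is enough. TypeIdRules and checkApplicationTypeId are hypothetical names, and the boundaries simply encode the convention described in this section: negative ids are intrinsic and ids below 1000 belong to Coherence.

```java
// Hypothetical guard, not a Coherence API.
class TypeIdRules {
    static final int FIRST_APPLICATION_TYPE_ID = 1000;

    static void checkApplicationTypeId(int typeId) {
        if (typeId < 0) {
            throw new IllegalArgumentException(typeId + " is in the intrinsic (negative) range");
        }
        if (typeId < FIRST_APPLICATION_TYPE_ID) {
            throw new IllegalArgumentException(typeId + " is reserved for Coherence's own user types");
        }
    }
}
```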

All internal user types are configured in the coherence-pof-config.xml file within coherence.jar, and you should import their definitions into your main POF configuration file using an include element:

<include>coherence-pof-config.xml</include>

Finally, it is worth noting that even though POF has been the recommended serialization format since Coherence 3.4, it is not enabled within the cluster by default, for backward compatibility reasons. To enable it, you need to either configure it on a per-service basis within the cache configuration file, or enable it globally by specifying the following system properties:

-Dtangosol.pof.enabled=true 
-Dtangosol.pof.config=my-pof-config.xml
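The same two properties can also be set from Java code, which is sometimes convenient in tests or embedded launchers. This is a minimal sketch that assumes the code runs before Coherence reads its configuration (for example, as the first statement in main); EnablePof and enablePofGlobally are hypothetical names.

```java
// Sets the same properties as the -D flags above. Assumption: this runs
// before any Coherence service reads its configuration; setting them later
// is not expected to affect services that have already started.
class EnablePof {
    static void enablePofGlobally(String pofConfigFile) {
        System.setProperty("tangosol.pof.enabled", "true");
        System.setProperty("tangosol.pof.config", pofConfigFile);
    }
}
```

Passing the flags on the command line remains the simpler option for production servers, since it keeps the serialization choice out of the code.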
