Implementing serialization for Evolvable objects
The Evolvable interface simply defines the
information that class instances need to be able to provide in order for the
class to support schema evolution. The rest of the work is performed by
a serializer that knows how to use that information to support
serialization across multiple versions of a class.
The easiest way to add schema evolution support to
your application is to use an out-of-the-box serializer that implements
the necessary logic. One such serializer is the PortableObjectSerializer we discussed earlier, and it makes schema evolution a breeze. You simply implement both PortableObject and Evolvable interfaces within your class (or even simpler, a convenience EvolvablePortableObject interface), and the serializer takes care of the rest.
However, if you follow my earlier advice and
implement external serializers for your domain objects, you need to
handle object evolution yourself.
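Coherence exposes this contract as the Evolvable interface (in the com.tangosol.io package). The sketch below is a self-contained stand-in that mirrors the methods our serializers rely on; note that byte[] is used here in place of Coherence's Binary type, and the EvolvableCustomer class with its version numbers is invented purely for illustration:

```java
import java.util.Arrays;

// Stand-in for Coherence's com.tangosol.io.Evolvable interface.
// (The real interface uses com.tangosol.util.Binary for the remainder;
// byte[] substitutes for it so this sketch is self-contained.)
interface Evolvable {
    int getImplVersion();             // schema version this class implements
    int getDataVersion();             // version read from the POF stream
    void setDataVersion(int version);
    byte[] getFutureData();           // unknown attributes from a newer version
    void setFutureData(byte[] data);
}

// A minimal evolvable class: version bookkeeping only, no real attributes.
class EvolvableCustomer implements Evolvable {
    private int dataVersion;
    private byte[] futureData;

    public int getImplVersion()            { return 2; }  // this code knows schema v2
    public int getDataVersion()            { return dataVersion; }
    public void setDataVersion(int v)      { dataVersion = v; }
    public byte[] getFutureData()          { return futureData; }
    public void setFutureData(byte[] data) { futureData = data; }
}

public class EvolvableDemo {
    public static void main(String[] args) {
        EvolvableCustomer c = new EvolvableCustomer();

        // Simulate deserializing a v3 object with unknown trailing attributes.
        c.setDataVersion(3);
        c.setFutureData(new byte[] {42, 7});

        // A serializer would write max(implVersion, dataVersion) = 3
        // and carry the unknown attributes along unchanged.
        System.out.println(Math.max(c.getImplVersion(), c.getDataVersion()));
        System.out.println(Arrays.toString(c.getFutureData()));
    }
}
```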
The algorithm to implement is fairly simple. When deserializing an object, we need to:

1. Read the data version from the POF stream and set the dataVersion attribute.
2. Read object attributes as usual.
3. Read the remaining attributes, if any, from the POF stream and set the futureData attribute.

The last step is only meaningful when we are deserializing a newer object version. In all other cases futureData will be null.
When serializing an object, we need to do the exact opposite for steps 2 and 3, but the first step is slightly different:

1. Set the version of the POF stream to the greater of the implementation version and the data version.
2. Write object attributes as usual.
3. Write the future data into the POF stream.
The reason we write the greater of the implementation and data versions
in the first step is that we always want to have the latest possible
version in the POF stream. If we deserialized a newer version of an
object, we need to ensure that its version is written into the POF
stream when we serialize the object again, as we will be including its
original data in the POF stream as well. On the other hand, if we
deserialized an older version, we should write the new version, containing
the new attributes, when serializing the object again.
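The version-resolution rule boils down to a single Math.max call. A minimal sketch, where the versionToWrite helper and the concrete version numbers are invented for illustration:

```java
public class VersionResolution {
    // The version written to the POF stream: the greater of the version this
    // class implements and the version that was read from the stream.
    static int versionToWrite(int implVersion, int dataVersion) {
        return Math.max(implVersion, dataVersion);
    }

    public static void main(String[] args) {
        // A node running class version 2 re-serializes a v3 object it cannot
        // fully understand: the stream keeps advertising v3.
        System.out.println(versionToWrite(2, 3)); // 3

        // A node running class version 2 re-serializes a v1 object: the data
        // is upgraded to v2 on the way back out.
        System.out.println(versionToWrite(2, 1)); // 2

        // Same version on both sides: nothing changes.
        System.out.println(versionToWrite(2, 2)); // 2
    }
}
```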
This is actually the key element of the schema
evolution strategy in Coherence: it allows us to upgrade the cluster
node by node, upgrading the data stored within the cluster along the way.
Imagine that you have a ten-node Coherence cluster
that you need to upgrade. You can shut a single node down, upgrade it
with new JAR files and restart it. Because the data is partitioned
across the cluster and there are backup copies available, the loss of a
single node is irrelevant—the cluster will repartition itself, backup
copies of the data will be promoted to primary copies, and the
application or applications using the cluster will be oblivious to the
loss of a node.
When an upgraded node rejoins the cluster, it will
become responsible for some of the data partitions. As the data it
manages is deserialized, instances of new classes will be created and
the new attributes will be either calculated or defaulted to their
initial values. When those instances are subsequently serialized and
stored in the cluster, their version is set to the latest implementation
version and any node or client application using one of the older
versions of the class will use the futureData attribute to preserve new attributes.
As you go through the same process with the remaining
nodes, more and more data will be incrementally upgraded to the latest
class version, until eventually all the data in the cluster uses the
current version.
What is important to note is that client applications
do not need to be upgraded to the new classes at the same time. They
can continue to use the older versions of the classes; they will simply
store the future data as a binary blob on reads and include it in the POF
stream on writes. As a matter of fact, you can have ten different
applications, each using a different version of the data classes, and
they will all continue to work just fine, as long as all the classes are
evolvable.
Now that we have the theory covered, let's see how we would actually implement a serializer for our Customer class to support evolution.
public class CustomerSerializer implements PofSerializer {

    public void serialize(PofWriter writer, Object o)
            throws IOException {
        Customer c = (Customer) o;

        int dataVersion = Math.max(c.getImplVersion(),
                                   c.getDataVersion());
        writer.setVersionId(dataVersion);

        writer.writeLong(0, c.getId());
        writer.writeString(1, c.getName());
        writer.writeString(2, c.getEmail());
        writer.writeObject(3, c.getAddress());
        writer.writeCollection(4, c.getAccountIds());

        writer.writeRemainder(c.getFutureData());
    }

    public Object deserialize(PofReader reader)
            throws IOException {
        Long id = reader.readLong(0);
        String name = reader.readString(1);
        String email = reader.readString(2);
        Address address = (Address) reader.readObject(3);
        Collection<Long> accountIds =
                reader.readCollection(4, new ArrayList<Long>());

        Customer c = new Customer(id, name, email, address, accountIds);
        c.setDataVersion(reader.getVersionId());
        c.setFutureData(reader.readRemainder());
        return c;
    }
}
The version-related code is simple, but it is immediately obvious that it has nothing to do with the Customer class per se, as it only depends on the methods defined by the Evolvable interface. As such, it simply begs for refactoring into an abstract base class that we can reuse for all of our serializers:
public abstract class AbstractPofSerializer<T>
        implements PofSerializer {

    protected abstract void serializeAttributes(T obj, PofWriter writer)
            throws IOException;

    protected abstract void deserializeAttributes(T obj, PofReader reader)
            throws IOException;

    protected abstract T createInstance(PofReader reader)
            throws IOException;

    @SuppressWarnings("unchecked")
    public void serialize(PofWriter writer, Object obj)
            throws IOException {
        T instance = (T) obj;

        boolean isEvolvable = obj instanceof Evolvable;
        Evolvable evolvable = null;
        if (isEvolvable) {
            evolvable = (Evolvable) obj;
            int dataVersion = Math.max(evolvable.getImplVersion(),
                                       evolvable.getDataVersion());
            writer.setVersionId(dataVersion);
        }

        serializeAttributes(instance, writer);

        Binary futureData = isEvolvable
                            ? evolvable.getFutureData()
                            : null;
        writer.writeRemainder(futureData);
    }

    public Object deserialize(PofReader reader)
            throws IOException {
        T instance = createInstance(reader);

        boolean isEvolvable = instance instanceof Evolvable;
        Evolvable evolvable = null;
        if (isEvolvable) {
            evolvable = (Evolvable) instance;
            evolvable.setDataVersion(reader.getVersionId());
        }

        deserializeAttributes(instance, reader);

        Binary futureData = reader.readRemainder();
        if (isEvolvable) {
            evolvable.setFutureData(futureData);
        }
        return instance;
    }
}
The only thing worth pointing out is that both the createInstance and deserializeAttributes methods read attributes from the POF stream. The difference between the two is that createInstance
should read only the attributes that are necessary for instance
creation, such as constructor or factory method arguments. All other
object attributes should be read from the stream and set within the deserializeAttributes method.
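To make this split concrete without a Coherence dependency, the same pattern can be sketched over plain java.io streams: createInstance reads only the constructor argument, while deserializeAttributes reads the rest and sets it on the instance. The AbstractSerializer base class, the simplified Customer, and the field layout below are stand-ins invented for illustration, not Coherence APIs:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative analogue of AbstractPofSerializer's two-phase deserialization.
abstract class AbstractSerializer<T> {
    protected abstract T createInstance(DataInput in) throws IOException;
    protected abstract void deserializeAttributes(T obj, DataInput in)
            throws IOException;

    public T deserialize(DataInput in) throws IOException {
        T instance = createInstance(in);      // constructor arguments only
        deserializeAttributes(instance, in);  // everything else
        return instance;
    }
}

// Simplified stand-in: one constructor argument, one settable attribute.
class Customer {
    final long id;   // required by the constructor
    String email;    // settable after construction

    Customer(long id) { this.id = id; }
}

class CustomerSerializer extends AbstractSerializer<Customer> {
    protected Customer createInstance(DataInput in) throws IOException {
        // Read only what the constructor needs.
        return new Customer(in.readLong());
    }

    protected void deserializeAttributes(Customer c, DataInput in)
            throws IOException {
        // Read the remaining attributes and set them on the instance.
        c.email = in.readUTF();
    }
}

public class SplitDemo {
    public static void main(String[] args) throws IOException {
        // Write a customer record, then read it back through the serializer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeLong(42L);
        out.writeUTF("jane@example.com");

        Customer c = new CustomerSerializer().deserialize(
                new DataInputStream(
                        new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(c.id + " " + c.email); // 42 jane@example.com
    }
}
```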