As anyone who has worked with HL7 knows, preserving the order of a
sequence of messages is a major issue. BizTalk provides a mechanism
called Ordered Delivery, which can be enabled on a port inside an
orchestration or on a messaging port. In short, this setting forces the
port to deliver messages out of the Messagebox in the order in which
they were received, so that the first-in, first-out (FIFO) pattern is
followed for messages arriving on a port. In BizTalk 2004, this
mechanism was available only with the MSMQT transport adapter. In
BizTalk 2009, ordered delivery is an adapter-agnostic option and even
extends to custom adapters.
1. Building a Resequencer
Ordered delivery guarantees order in and out of the Messagebox. However,
before you can consider the ordering problem solved, there are two
show-stopping issues that you need to deal with:
- Ordered delivery is a major performance bottleneck. Overall solution
  throughput drops drastically when you use the default End Point
  Manager (EPM) ordered delivery, because the BizTalk engine essentially
  has to "single-thread" the messages as they travel from the port to
  their destination. Every message that arrives on the port is dequeued,
  transformed, and delivered one at a time. In many high-throughput
  scenarios, the default ordered delivery pattern is simply not an
  option for this reason.
- Ordered delivery assumes the messages arrive in the correct order. In
  many situations this isn't the case, and the default ordered delivery
  pattern doesn't work.
What is needed to implement proper ordering is a Resequencer pattern.
The job of the resequencer is to examine incoming messages, check each
message's position in the sequence (e.g., the current message is 7 of
9), and release the messages in the proper order as they arrive. To
implement such resequencing in BizTalk, you need the components listed
in the following subsections, along with some base assumptions.
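The core idea can be illustrated outside BizTalk with a minimal Python sketch. It is not BizTalk code: the function name `release_trace` and the assumption that sequences start at 1 are hypothetical, chosen only for illustration. It buffers out-of-order arrivals and releases each message as soon as it becomes the next one in the logical sequence.

```python
def release_trace(arrivals, first_number=1):
    """Return (arrived, released) pairs showing what each arrival lets us send.

    Assumption: sequence numbers are consecutive integers starting at
    first_number, and each number arrives exactly once.
    """
    pending = set()            # received but not yet sendable
    next_needed = first_number
    trace = []
    for number in arrivals:
        pending.add(number)
        released = []
        # Release every buffered message that is now next in line.
        while next_needed in pending:
            pending.remove(next_needed)
            released.append(next_needed)
            next_needed += 1
        trace.append((number, released))
    return trace


for arrived, released in release_trace([3, 5, 1, 2, 4]):
    print(f"received {arrived}: send {released}")
```

With the arrival order 3, 5, 1, 2, 4, nothing is released until message 1 arrives. Message 2 then releases both 2 and the buffered 3, and message 4 releases 4 and the buffered 5.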
2. Resequencer Assumptions
Like most patterns, the Resequencer pattern is based on a number of assumptions:
- Assuming the messages are arriving out of order, there is a way to
  examine the incoming message and know:
  - What number the message is in the sequence to be received
  - Whether the current fragment is the last in the sequence (through a
    flag somewhere in the message payload), or the total number of
    messages
- Once the messages are received into your resequencer, you can start
  sending messages out immediately so long as you can preserve the
  order. For example, assume the messages arrive at your orchestration
  in the following order:
  - 3, 5, 1, 2, 4, 8, 9, 11, 23
  The following diagram illustrates this concept, because it can get a
  little confusing. Once you receive the third message, which is the
  first message in the logical sequence, you can send it. You then
  receive the fourth message, which is logical sequence number 2, and
  can send it immediately as well. The resequencer then looks through
  the previously received messages and finds logical sequence numbers 3
  and 5. It immediately sends number 3, since it is next in the logical
  sequence, and then waits for number 4, which is the next message to be
  sent but has not yet been received.
- The resequencer is stateful and exists for the life of the sequence.
  It terminates itself once the last message in the sequence has been
  received and sent. This is a Singleton pattern and may cause
  performance issues over time.
- The message sequence is atomic. If a message in the sequence cannot be
  sent, the sequence stops until the issue is fixed.
- In cases where multiple instances of the resequencer are running
  (i.e., processing multiple distinct sequences), there exists a way to
  uniquely identify each sequence based on the data in the message. For
  example, in cases where messages are arriving in distinct interchanges
  (not from a disassembler nor from multiple message parts), there is a
  way to distinguish which sequence the message belongs to.
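The last assumption, that interleaved messages can be assigned to the right sequence, can be sketched as follows. This is an illustrative Python model only: in BizTalk the routing is done by correlation rather than by user code, and the names `SequenceState` and `route`, along with the tuple layout of a message, are hypothetical.

```python
from collections import defaultdict


class SequenceState:
    """Per-sequence state: buffered messages and the next number to send."""

    def __init__(self):
        self.pending = {}      # sequence number -> message body
        self.next_needed = 1   # assumption: every sequence starts at 1


def route(messages):
    """Resequence interleaved messages from several independent sequences.

    Each message is a (sequence_id, sequence_number, body) tuple.
    Returns the messages in the order they would be sent.
    """
    states = defaultdict(SequenceState)   # one state object per sequence ID
    sent = []
    for seq_id, number, body in messages:
        state = states[seq_id]
        state.pending[number] = body
        # Only this sequence's own counter decides what can be released.
        while state.next_needed in state.pending:
            ready = state.pending.pop(state.next_needed)
            sent.append((seq_id, state.next_needed, ready))
            state.next_needed += 1
    return sent
```

Each sequence ID gets its own buffer and counter, so a gap in one sequence never delays messages that belong to another.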
3. BizTalk Components Needed
To implement the resequencer, you will need the following BizTalk components:
- Schema to describe the inbound message
- Custom property schema to hold three properties:
  - SequenceID (a GUID that uniquely identifies the sequence)
  - SequenceNumber (identifies the current message as number XXXX of YYYY in the sequence)
  - LastMessageInSequence (a Boolean that indicates that the current message is the last in the sequence)
- Custom inbound receive pipeline with a custom pipeline component:
  - The pipeline component is responsible for probing the incoming
    message to determine whether it can handle it, and for checking the
    message for a unique sequence ID and a sequence number. Upon finding
    these, it promotes the values to the message context
    programmatically. We will call this component the Resequencing
    Decoder.
- Orchestration using the Convoy pattern with correlation:
  - The orchestration is initiated by the receipt of the first message
    received in the sequence. (Note: this message doesn't necessarily
    need to be the logical first message to be sent.)
  - The orchestration stores the inbound message in a SortedList object.
    The key for the sorted list is the sequence number.
  - After receiving the first message, the orchestration listens for
    further incoming messages and adds them to the SortedList. Upon the
    receipt of each message, it checks the next sequence number to be
    sent against the list of currently received messages. If the
    required message hasn't been received yet, it continues to listen
    for more messages.
  - When the required message arrives, it is immediately sent out via
    the orchestration with a delivery notification.
  - Upon receipt of the delivery notification, the orchestration
    searches through the SortedList of messages to see whether the next
    sequence number has been received. If it hasn't, it listens for more
    messages. If it has been received, it is immediately sent, and the
    loop starts over again.
  - The orchestration uses a correlation set that is initialized by the
    receipt of the first message. The set is correlated on the promoted
    SequenceID property, which was promoted in the custom pipeline
    component.
  - When a message arrives that has the LastMessageInSequence property
    set to True, the orchestration stores this message's sequence number
    in a private variable. When this sequence number is successfully
    delivered, the orchestration exits the receive messages loop and
    finishes normally.
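The decoding step performed by the Resequencing Decoder can be sketched in Python to show its logic. This is not the BizTalk pipeline component API: the function `decode`, the root element name `Fragment`, and the child element names are all assumptions made for this example, and a plain dictionary stands in for the message context.

```python
import xml.etree.ElementTree as ET


def decode(raw_xml):
    """Return a dict of values to promote, or None if the message is not ours.

    Assumption: messages look like
    <Fragment><SequenceID>..</SequenceID><SequenceNumber>..</SequenceNumber>
    <LastMessageInSequence>true|false</LastMessageInSequence></Fragment>
    """
    # Probe: reject anything that is not well-formed or not the expected type.
    try:
        root = ET.fromstring(raw_xml)
    except ET.ParseError:
        return None
    if root.tag != "Fragment":
        return None

    seq_id = root.findtext("SequenceID")
    seq_no = root.findtext("SequenceNumber")
    if not seq_id or not seq_no:
        return None
    try:
        number = int(seq_no)
    except ValueError:
        return None

    # A missing flag is treated as "not the last message".
    last_flag = root.findtext("LastMessageInSequence", default="false")
    return {
        "SequenceID": seq_id.strip(),
        "SequenceNumber": number,
        "LastMessageInSequence": last_flag.strip().lower() == "true",
    }
```

Returning `None` for unrecognized input mirrors the probing step: a message the decoder cannot handle is left alone rather than promoted with partial values.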
The high-level architecture diagram for this pattern is shown in Figure 6-5.
4. Building the Solution
In the orchestration snippet shown in Figure 1, the key areas to observe
are the first Receive shape and the receiving loop. The first Receive
shape initializes the correlation set, which uses the
PropertySchema.SequenceID property that you defined and promoted within
your custom pipeline upon receipt of the message. The IsLastFragment
Decide shape checks the Boolean IsLastMessage property using an XPath
expression. If the message is not the last fragment, the Expression
shape adds it to a SortedList variable and sets a private integer
variable that stores the SequenceNumber of the message just received.
The loop illustrated in Figure 2 is responsible for receiving incoming
messages as they are processed. The second Receive shape is a follower
of the correlation set that was initialized by the first Receive shape.
From this point on, this is a typical Convoy pattern implementation.
When the next message is received, its SequenceNumber is checked against
the internal variable that holds the next required SequenceNumber. The
next required sequence number is simply the last in-order received
sequence number incremented by 1. If the received message does not have
the required SequenceNumber, it is added to the SortedList object. If it
does, it is immediately sent, and the SortedList is checked for the next
lowest received sequence number to see whether it should be sent as
well. This repeats until every message that can be sent has been sent.
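The receive loop and its exit condition can be modeled with a short Python sketch. It is an approximation of the orchestration logic, not BizTalk code: the class name `Resequencer`, the `send` callback, and the use of a plain dictionary in place of the SortedList are assumptions for illustration, and delivery notifications are reduced to the `send` call returning normally.

```python
class Resequencer:
    """Model of the receive loop for a single sequence."""

    def __init__(self, send, first_number=1):
        self._send = send              # callable that delivers one message
        self._buffer = {}              # stands in for the SortedList
        self._next_needed = first_number
        self._last_number = None       # known once the flagged message arrives
        self.finished = False

    def receive(self, number, body, is_last=False):
        if is_last:
            # Remember where the sequence ends; it may arrive out of order.
            self._last_number = number
        if number != self._next_needed:
            self._buffer[number] = body    # not needed yet, keep listening
            return
        self._deliver(body)
        # Send any buffered messages that are now next in line.
        while self._next_needed in self._buffer:
            self._deliver(self._buffer.pop(self._next_needed))

    def _deliver(self, body):
        self._send(body)
        if self._next_needed == self._last_number:
            self.finished = True       # last message delivered, loop can exit
        self._next_needed += 1


sent = []
resequencer = Resequencer(sent.append)
for number, is_last in [(3, False), (5, True), (1, False), (2, False), (4, False)]:
    resequencer.receive(number, f"message {number}", is_last)
```

Note that the message flagged as last (number 5 here) arrives second, so the loop cannot finish when the flag is seen. It finishes only when that sequence number has actually been delivered.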
The final step in the process, once all the messages have been received
and sent in the proper order, is to perform cleanup (see Figure 3). In
this pattern, the received messages were stored to disk in a temporary
location as they were received. Cleanup is an optional step. The stored
copies are useful when you are debugging and want to see how many
messages were received and verify that all of them were sent out. The
last step in this orchestration is to delete those messages from the
temporary location once the resequencer has finished.
Also note the Catch block illustrated in Figure 3. The Resequencer
pattern as described does not implement handling for the scenario in
which a message in the sequence cannot be delivered. In most cases, the
implementation would be a simple Terminate shape or a Throw Exception
shape, depending on the requirements. In some cases, it may be possible
to recover from a failed delivery. If so, you could implement offline
message storage to disk and insert a Suspend shape. Logic would be
needed to restart the orchestration, remember which messages have
already been received and sent, and resend the message that was in
error.
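One way to reason about the recoverable case is the Python sketch below. It is a simplified model and not a BizTalk implementation: the class `ResumableResequencer`, its `resume` method, and the in-memory dictionary (standing in for the offline storage to disk) are hypothetical. A failed send suspends the sequence while keeping the failed message and all later arrivals, so that sending can pick up again at the message that was in error.

```python
class ResumableResequencer:
    """Resequencer that stops on a failed send and can be resumed later."""

    def __init__(self, send, first_number=1):
        self._send = send
        self._pending = {}             # assumption: in memory, not on disk
        self._next_needed = first_number
        self.suspended = False

    def receive(self, number, body):
        # Arrivals are always recorded, even while the sequence is suspended.
        self._pending[number] = body
        if not self.suspended:
            self._drain()

    def resume(self):
        """Retry from the message that failed, once the issue is fixed."""
        self.suspended = False
        self._drain()

    def _drain(self):
        while self._next_needed in self._pending:
            body = self._pending[self._next_needed]
            try:
                self._send(body)
            except Exception:
                # The sequence is atomic: keep the message, send nothing
                # after it. A broad catch is used here only for brevity.
                self.suspended = True
                return
            del self._pending[self._next_needed]
            self._next_needed += 1
```

Because a message is removed from the buffer only after `send` returns normally, the sketch always knows which messages have been sent and which one to retry.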