1. Problem
1.1. Context
Your component is part of or contains a layered software stack and has to appropriately handle non-fatal domain or system errors.
1.2. Summary
Non-fatal errors must be resolved
appropriately so that an application or service can continue to provide
the functionality the user expects.
There
should be a clear separation between code for error handling and the
normal flow of execution to reduce development time and maintenance
costs by reducing code complexity and increasing software clarity.
Error-handling code should not have a significant impact at run time during normal operation.
1.3. Description
Most well-written software is layered so that
high-level abstractions depend on low-level abstractions. For instance, a
browser interacts with an end user through its UI code at the highest
levels of abstraction. Each time the end user requests a new web page,
that request is given to the HTTP subsystem which in turn hands the
request to the IP stack at a lower level. At any stage, an error might
be encountered that needs to be dealt with appropriately. If the error
is fatal in some way, then there's little you can do but panic, as described in Fail Fast.
Non-fatal error conditions, however, are more likely
to be encountered during the operation of your application or service
and your code needs to be able to handle them appropriately. Such errors
can occur at any point in the software stack, whether you are an
application developer or a device creator. If your software is not
written to handle errors, or handles them inappropriately, then the user experience will be poor.
As an illustration, the following example of bad code
handles a system error by ignoring it and trying to continue anyway.
Similarly it handles a recoverable missing file domain error by
panicking, which causes the application to terminate with no opportunity
to save the user's data or to try an alternative file. Even if you try
to set things up so that the file is always there, this is very easily
broken by accident:
void CEngine::Construct(const TDesC& aFileName)
{
    RFs fs;
    fs.Connect(); // Bad code - ignores possible system error
    RFile file;
    TInt err = file.Open(fs, aFileName, EFileRead); // Could return KErrNotFound
    ASSERT(err == KErrNone); // Bad code - panics on a recoverable domain error
    ...
}
In general, it is not possible for code running at
the lowest layers of your application or service to resolve domain or
system errors. Consider a function written to open a file. If the open
fails, should the function retry the operation? Not if the user entered
the filename incorrectly; but if the file is located on a remote file
system off the device that does not have guaranteed availability, then
it would probably be worth retrying. However, the function cannot tell
which situation it is in because its level of abstraction is too low.
The point you should take away from this is that
it may not be possible for code at one layer to take action to handle
an error by itself because it lacks the context present at higher layers
within the software.
When an error occurs that cannot be resolved at the
layer in which it occurred, the only option is to return it up the call
stack until it reaches a layer that does have the context necessary to
resolve the error. For example, if the engine of an application
encounters a disk full error when trying to save the user's data, it is
not able to start deleting files to make space without consulting the
end user. So instead it escalates the error upwards to the UI layer so
that the end user can be informed.
It would be inappropriate to use Fail Fast to resolve such errors by panicking. Whilst it does have
the advantage of resolving the current error condition in a way which
ensures that the integrity of data stored on the phone is unlikely to be
compromised, it is too severe a reaction to system or domain errors that should be expected to occur at some point.
Unfortunately, the C++ function call mechanism
provides no distinct support for error propagation by itself, which
encourages the development of ad-hoc solutions such as returning an
error code from a function. All callers of the function then need to
check this return code to see if it is an error or a valid return, which
can clutter up their code considerably.
As an example of where this problem does not arise,
consider the error-handling logic in the TCP network protocol. The TCP
specification requires a peer to retransmit data if it detects that a segment
has been lost. This is so that application code does not have to deal
with the unreliability of networks such as Ethernet. Since the protocol
has all the information it needs to resolve the error in the layer in
which it occurred, no propagation is required.
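To make the contrast concrete, here is a rough sketch in portable standard C++ (not Symbian OS or TCP code) of a layer that has all the context it needs: a dropped packet can simply be resent, so the layer retries internally and its caller never sees the transient failures. `Link`, `SendReliably()` and `KNotDelivered` are hypothetical names, and the retry limit is an arbitrary assumption.

```cpp
#include <cassert>
#include <functional>

// Hypothetical unreliable link: returns true if the packet got through.
using Link = std::function<bool(int aPacket)>;

// Hypothetical result meaning "could not resolve locally; escalate".
constexpr int KNotDelivered = -1;

// A layer with full context for this error: a dropped packet can simply be
// resent, so it retries and never propagates the transient failures.
// Returns the number of transmissions used, or KNotDelivered.
int SendReliably(const Link& aLink, int aPacket, int aMaxAttempts)
{
    for (int attempt = 1; attempt <= aMaxAttempts; ++attempt)
    {
        if (aLink(aPacket))
        {
            return attempt;  // delivered; the caller never saw the drops
        }
    }
    return KNotDelivered;    // out of options: only now must it be escalated
}
```

Only once the retries are exhausted does this layer run out of ways to resolve the error itself, and at that point the failure would have to be escalated like any other.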
1.4. Example
An example of the problem we wish to solve is an
application that transmits data via UDP using the Communication
Infrastructure subsystem of Symbian OS, colloquially known as Comms-Infras.
The application is likely to be split up into at least two layers with
the higher or UI layer responsible for dealing with the end user and the
lower or engine layer responsible for the communications channel.
The engine layer accesses the UDP protocol via the RSocket
API, but how should the engine layer handle the errors that it will
receive from this API? How should it handle errors which occur during an
attempt to establish a connection?
Clearly it should take steps to protect its own integrity and release
any resources it allocated for the connection attempt that failed. But
to maintain correct layering the engine shouldn't know whether the
connection was attempted as a result of an end user request or because
some background task was being performed. In the latter case, notifying
the end user would be confusing, since they would have no idea what the
error meant as they wouldn't have initiated the connection attempt.
The
engine cannot report errors to the end user: not only does it not know
whether this is appropriate, but doing so would also violate the layering
of the application and make future maintenance more difficult.
Ignoring
errors is also not an option since this might be an important operation
which the user expects to run reliably. Ignoring an error might even
cause the user's data to be corrupted,
so it is important that the whole application is designed to use the most
appropriate error-handling strategy to resolve any problems.
For system errors, such as KErrNoMemory
resulting from the failure to allocate a resource, you might think that a
valid approach to resolving this error would be to try to free up any
unnecessary memory. This would resolve the error within the engine with
no need to involve the application at all. But how would you choose
which memory to free? Clearly all memory has been allocated for a reason
and most of it to allow client requests to be serviced. Perhaps caches
and the like can be reduced in size but that will cause operations to
take longer. This might be unacceptable if the application or service
has real-time constraints that need to be met.
2. Solution
Lower-level components should not try to handle
domain or system errors silently unless they have the full context and
can do so successfully with no unexpected impact on the layers above.
Instead lower layers should detect errors and pass them upwards to the
layer that is capable of correctly resolving them.
Symbian OS provides direct support for escalating errors upwards in the form of the leave and trap
operations. These allow an error to be propagated up the call stack by a
leave and trapped by the layer that has sufficient context to resolve
it. This mechanism is directly analogous to exception handling in C++
and Java.
Symbian OS does not explicitly use the standard C++
exception-handling mechanism, for historical reasons. When Symbian OS,
or EPOC32 as it was then known, was first established, the compilers
available at that time had poor or non-existent support for exceptions.
You can use C++ exceptions within code based on Symbian OS. However,
there are a number of difficulties with mixing leave and trap operations
with C++ exceptions, so using C++ exceptions is not recommended.
2.1. Structure
The most basic structure for this pattern (see Figure 1) revolves around the following two concepts:
The caller
is the higher-layer component that makes a function call to another
component. This component is aware of the motivation for the function
call and hence how to resolve any system or domain errors that might
occur.
The callee
is the lower-layer component on which the function call is made. This
component is responsible for attempting to satisfy the function call if
possible and detecting any system or domain errors that occur whilst
doing so. If an error is detected, the function escalates it to
the caller through the leave operation; otherwise the function returns
as normal.
An important point to remember when creating a
function is that there should be only one way for it to report errors.
Either return an error code or leave, but do not do both, as this forces
the caller to handle each of the ways an error can be reported
separately. This results in more complex code in the caller and usually
combines the disadvantages of both approaches!
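A minimal sketch of the difference, written in standard C++ rather than Symbian OS code: C++ exceptions stand in for leaves, and `Leave`, `FindMixed()`, `FindL()`, `CallMixed()` and `CallSinglePath()` are hypothetical names. The error values are illustrative.

```cpp
#include <cassert>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNone = 0;
constexpr int KErrNotFound = -1;
constexpr int KErrNoMemory = -4;

// Bad: reports "not found" by return value but "no memory" by leaving.
int FindMixed(bool aFound, bool aOutOfMemory)
{
    if (aOutOfMemory) throw Leave{KErrNoMemory};
    return aFound ? KErrNone : KErrNotFound;
}

// Good: one error path. A normal return always means success.
void FindL(bool aFound, bool aOutOfMemory)
{
    if (aOutOfMemory) throw Leave{KErrNoMemory};
    if (!aFound) throw Leave{KErrNotFound};
}

// The caller of the mixed function must check two separate paths.
int CallMixed(bool aFound, bool aOutOfMemory)
{
    try
    {
        int err = FindMixed(aFound, aOutOfMemory);
        if (err != KErrNone) return err;  // path 1: return code
    }
    catch (const Leave& l)
    {
        return l.code;                    // path 2: leave
    }
    return KErrNone;
}

// The caller of the single-path function needs only one trap.
int CallSinglePath(bool aFound, bool aOutOfMemory)
{
    try
    {
        FindL(aFound, aOutOfMemory);
    }
    catch (const Leave& l)
    {
        return l.code;
    }
    return KErrNone;
}
```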
Note that all leaves must have a corresponding trap
harness. This is to ensure that errors are appropriately handled in all
situations.
A common strategy for resolving errors is to simply report the problem,
in a top-level trap harness, to the end user with a simple message
corresponding to the error code.
2.2. Dynamics
Normally, a whole series of caller–callee pairs are
chained together in a call stack. In such a situation, when a leave
occurs, the call stack is unwound until control is returned to the
closest trap. This allows an error to be easily escalated upwards
through more than one component since any component that doesn't want to
handle the error simply doesn't trap it (see Figure 2).
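The unwinding can be modelled in portable C++, with exceptions standing in for leaves and a try/catch block standing in for the closest trap. This is a sketch under those assumptions, not Symbian OS code, and all names are hypothetical. Note that the middle layer contains no error-handling code at all.

```cpp
#include <cassert>
#include <string>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrDiskFull = -26;

// Lowest layer: detects the error but lacks the context to resolve it.
void WriteBlockL(bool aDiskFull)
{
    if (aDiskFull) throw Leave{KErrDiskFull};
}

// Middle layer: no error-handling code. A leave passes straight through.
void SaveDocumentL(bool aDiskFull, int& aBlocksWritten)
{
    WriteBlockL(aDiskFull);
    ++aBlocksWritten;  // skipped if WriteBlockL() leaves
}

// Top layer: the closest trap, with the context to tell the user.
std::string OnSaveCommand(bool aDiskFull, int& aBlocksWritten)
{
    try
    {
        SaveDocumentL(aDiskFull, aBlocksWritten);
        return "Saved";
    }
    catch (const Leave& l)
    {
        if (l.code == KErrDiskFull) return "Disk full: delete some files and retry";
        return "Save failed";
    }
}
```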
The most important decision to make is where you
should place your trap harnesses. Having coarse-grained units of
recovery has the advantage of fewer trap harnesses and their associated
recovery code but with the disadvantage that the recovery code may be
general and complex. There is also the danger that a small error leads
to catastrophic results for the end user. For instance, if not having
enough memory to apply bold formatting in a word processor resulted in a
leave that unwound the entire call stack, this might terminate the
application without giving the end user the opportunity to save and
hence they might lose their data! On the other hand, units of recovery
that are too fine-grained result in many trap harnesses and lots of
recovery code, with individual attention required to deal with each
error case, as well as a potentially significant increase in the size
of your executable.
Unlike many other operating systems, Symbian OS is largely
event-driven so the current call stack often just handles a tiny event,
such as a keystroke or a byte received. Thus trying to handle every
entry point with a separate trap is impractical. Instead leaves are
typically handled in one of three places:
Many threads have a top-level trap which is
used as a last resort to resolve errors, minimizing the amount of
error-handling code needed elsewhere in the component. In particular, the Symbian OS application
framework provides such a top-level trap for applications. If a leave
does occur in an application, the CEikAppUi::HandleError() virtual function is called allowing applications to provide their own error-handling implementation.
Traps are placed in a RunL() implementation when using Active Objects to handle the result of an asynchronous service call. Alternatively, you can let RunL() leave and handle the error in the corresponding RunError() function.
Trap
harnesses can be nested so you do not need to rely on just having a
top-level trap. This allows independent sub-components to do their own
error handling if necessary. You should consider inserting a trap at the
boundary of a component or layer. This can be useful if you wish to
attempt to resolve any domain errors specific to your component or layer
before they pass out of your control.
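The first of these placements, a top-level trap around each event, might be modelled roughly as follows. This is a standard C++ sketch, not the Symbian OS application framework: exceptions stand in for leaves, and `EventLoop`, `RecordingLoop` and the `HandleError()` hook are hypothetical names that only loosely echo CEikAppUi::HandleError().

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNoMemory = -4;

// Hypothetical event-driven thread: each event runs inside its own
// top-level trap, so a leave abandons only the current event and the
// loop carries on. HandleError() is the last-resort hook.
class EventLoop
{
public:
    virtual ~EventLoop() = default;

    void Run(const std::vector<std::function<void()>>& aEvents)
    {
        for (const auto& handleEventL : aEvents)
        {
            try
            {
                handleEventL();
            }
            catch (const Leave& l)
            {
                HandleError(l.code);
            }
        }
    }

protected:
    virtual void HandleError(int aError) = 0;
};

// Example subclass that simply records the errors it was given.
class RecordingLoop : public EventLoop
{
public:
    std::vector<int> iErrors;

protected:
    void HandleError(int aError) override { iErrors.push_back(aError); }
};
```

Because the trap sits around each small event rather than around the whole program, a leave while handling one keystroke costs only that keystroke.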
2.3. Implementation
Leaves
A leave is triggered by calling one of the User leave functions defined in e32std.h and exported by euser.dll.
By calling one of these functions you indicate to Symbian OS that you
cannot finish the current operation you are performing or return
normally because an error has occurred. In response, Symbian OS searches
up through the call stack looking for a trap harness to handle the
leave. Whilst doing so Symbian OS automatically cleans up objects pushed
onto the cleanup stack by lower-level functions.
The main leave function is User::Leave(TInt aErr)
where the single integer parameter indicates the type of error and is
equivalent to a throw statement in C++. By convention, negative integers
are used to represent errors. There are a few helper functions that can
be used in place of User::Leave():
User::LeaveIfError(TInt aReason) leaves if the reason code is negative or returns the reason if it is zero or positive.
User::LeaveIfNull(TAny* aPtr) leaves with KErrNoMemory if aPtr is null.
new(ELeave) CObject() is an overload of the new operator that automatically leaves with KErrNoMemory if there is not enough memory to allocate the object on the heap.
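To show the semantics just described without the Symbian OS SDK, the helpers can be modelled in standard C++ with an exception type that carries only an integer code. `Leave`, `LeaveWith()`, `LeaveIfError()`, `LeaveIfNull()` and `Trapped()` are hypothetical stand-ins, not the real User functions, and the error values are illustrative.

```cpp
#include <cassert>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNone = 0;
constexpr int KErrNotFound = -1;
constexpr int KErrNoMemory = -4;

// Models User::Leave(): abandon the current operation with an error code.
[[noreturn]] void LeaveWith(int aErr)
{
    throw Leave{aErr};
}

// Models User::LeaveIfError(): negative codes leave; zero or positive
// values are handed back so that a positive result can still be used.
int LeaveIfError(int aReason)
{
    if (aReason < 0) LeaveWith(aReason);
    return aReason;
}

// Models User::LeaveIfNull(): a null pointer becomes a KErrNoMemory leave.
void LeaveIfNull(const void* aPtr)
{
    if (aPtr == nullptr) LeaveWith(KErrNoMemory);
}

// Helper: runs aFunc and reports the leave code, or KErrNone.
template <typename F>
int Trapped(F aFunc)
{
    try
    {
        aFunc();
    }
    catch (const Leave& l)
    {
        return l.code;
    }
    return KErrNone;
}
```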
Here is an example where a function leaves if it
couldn't establish a connection to the file server due to some system
error or because it couldn't find the expected file. In each case, the
higher layers are given the opportunity to resolve the error:
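void CEngine::ConstructL(const TDesC& aFileName)
{
    RFs fs;
    User::LeaveIfError(fs.Connect()); // Escalate any system error
    CleanupClosePushL(fs); // Ensure fs is closed if a later leave occurs
    RFile file;
    User::LeaveIfError(file.Open(fs, aFileName, EFileRead)); // e.g. KErrNotFound
    CleanupClosePushL(file);
    ...
}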
By convention, the names of functions which can leave should always be suffixed with an 'L' (e.g., ConstructL()) so that a caller is aware the function may not return normally. Such a function is frequently referred to as a leaving function. Note that this rule applies to any function which calls a leaving function, even if it does not call User::Leave()
itself. The function implicitly has the potential to leave because
un-trapped leaves are propagated upward from any functions it calls.
Unfortunately, you need to remember that this is only a
convention and is not enforced by the compiler, so an 'L' function is
not always equivalent to a leaving function. However, static analysis
tools such as epoc32\tools\leavescan.exe exist to help you with
this. These tools parse your source code to evaluate your use of trap
and leave operations and can tell you if you're violating the
convention. They also check that all leaves have a trap associated with
them to help you avoid USER 175 panics.
Traps
A trap harness is declared by using one of the TRAP macros defined in e32cmn.h. These macros will catch any leave from any function invoked within a TRAP macro. The main trap macro is TRAP(ret, expression), where expression is a call to a leaving function and ret is a pre-existing TInt variable. If a leave reaches the trap, the operating system assigns the error code to ret; if the expression completes normally, either because nothing left or because any leave was trapped at a lower level in the call stack, ret is set to KErrNone to indicate that no error occurred.
As the caller of a function within a trap you should
not need to worry about resources allocated by the callee. This is
because the leaving mechanism is integrated with the Symbian OS cleanup
stack. Any objects allocated by the callee and pushed onto the cleanup
stack are deleted prior to the operating system invoking the trap
harness.
In addition to the basic TRAP macro, Symbian OS defines the following similar macros:
TRAPD(ret, expression) – the same as TRAP except that it automatically declares ret as a TInt variable on the stack for you (hence the 'D' suffix) for convenience.
TRAP_IGNORE(expression) – simply traps expression and ignores whether or not any errors occurred.
Here is an example of using a trap macro:
void CMyComponent::Draw()
{
    TRAPD(err, iMyClass->AllocBufferL());
    if (err < KErrNone)
    {
        DisplayErrorMsg(err);
        User::Exit(err);
    }
    ... // Continue as normal
}
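The behaviour of TRAP, TRAPD and TRAP_IGNORE can be approximated with function templates in standard C++. This sketch assumes C++ exceptions as the leave mechanism; `Trap()`, `TrapD()` and `TrapIgnore()` are hypothetical names, not the Symbian OS macros, and they take a callable rather than an arbitrary expression.

```cpp
#include <cassert>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNone = 0;
constexpr int KErrNoMemory = -4;

// Models TRAP(ret, expression): aRet receives the leave code, or KErrNone
// if the callable completed without a leave reaching this trap.
template <typename F>
void Trap(int& aRet, F aExpression)
{
    try
    {
        aExpression();
        aRet = KErrNone;
    }
    catch (const Leave& l)
    {
        aRet = l.code;
    }
}

// Models TRAPD: like Trap(), but declares the result variable for you.
template <typename F>
int TrapD(F aExpression)
{
    int ret = KErrNone;
    Trap(ret, aExpression);
    return ret;
}

// Models TRAP_IGNORE: any leave is trapped and discarded.
template <typename F>
void TrapIgnore(F aExpression)
{
    int ignored = KErrNone;
    Trap(ignored, aExpression);
    (void)ignored;
}
```

Because each trap only sees leaves that actually reach it, a TrapD() wrapped around code that uses TrapIgnore() internally reports KErrNone, matching the rule for leaves trapped lower in the call stack.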
Intermediate Traps
If a function traps a leave but then determines from
the error code that it is unable to resolve that specific error, it
needs to escalate the error further upwards. This can be achieved by
calling User::Leave() again with the same error code.
TRAPD(err, iBuffer = iMyClass->AllocBufferL());
if (err < KErrNone)
{
    if (err == KErrNoMemory)
    {
        // Resolve error
    }
    else
    {
        User::Leave(err); // Escalate the error further up the call stack
    }
}
Trapping and leaving again is normally done only if a
function can resolve just a subset of the possible errors and
wishes to trap some while escalating others. This should be done
sparingly since every intermediate trap increases the cost of the entire
leave operation as the stack unwind has to be restarted.
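A sketch of such an intermediate trap in standard C++ follows, with exceptions standing in for leaves and a bare `throw;` standing in for calling User::Leave() again with the same code. `Cache`, `AllocBufferL()` and `PrepareL()` are hypothetical, and emptying a cache is just one assumed way of resolving KErrNoMemory.

```cpp
#include <cassert>
#include <vector>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNotFound = -1;
constexpr int KErrNoMemory = -4;

// Hypothetical cache whose contents can be discarded to free memory.
struct Cache
{
    std::vector<int> items;
};

// Hypothetical allocation: leaves with KErrNotFound for an unknown buffer
// and with KErrNoMemory while the cache holds more than two items.
void AllocBufferL(const Cache& aCache, bool aKnown)
{
    if (!aKnown) throw Leave{KErrNotFound};
    if (aCache.items.size() > 2) throw Leave{KErrNoMemory};
}

// Intermediate trap: resolves KErrNoMemory by emptying the cache and
// retrying once, and escalates every other error with the same code.
void PrepareL(Cache& aCache, bool aKnown)
{
    try
    {
        AllocBufferL(aCache, aKnown);
    }
    catch (const Leave& l)
    {
        if (l.code != KErrNoMemory) throw;  // re-leave, unchanged
        aCache.items.clear();               // resolve: free some memory
        AllocBufferL(aCache, aKnown);       // a second leave escapes
    }
}
```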
Additional Restrictions on Using Trap–Leave Operations
You should not call a leaving function from within a constructor.
This
is because if a constructor leaves, the destructor of the partially
constructed object is not called, so any resources it has already
acquired, such as heap-allocated member objects, are leaked.
You also should not allow a leave to escape from a destructor.
Essentially
this means that it is permissible to call leaving functions within a
destructor so long as they are trapped before the destructor completes.
There are two reasons for this. The first is that the leave and trap
mechanisms are implemented in terms of C++ exceptions, and hence when a
leave occurs the call stack is unwound. In doing so, the destructors are
called for objects that have been placed on the call stack. If the
destructors of these objects leave, then an abort may occur on some
platforms, as Symbian OS does not support leaves occurring whilst a
leave is already being handled.
The
second reason is that, in principle, a destructor should never fail. If
a destructor can leave, it suggests that the code has been poorly
architected. It also implies that part of the destruction process might
fail, potentially leading to memory or handle leaks. One approach to
solving this is to introduce 'two-phase destruction' where some form of ShutdownL() function is called prior to deleting the object. For further information on this, see the Symbian Developer Library.
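Both restrictions can be illustrated with one small model in portable C++: two-phase construction keeps the constructor infallible and does the leaving work in a ConstructL(), and two-phase destruction moves the fallible part of tear-down into a ShutdownL(). This is a sketch, not Symbian OS code: exceptions stand in for leaves, a std::unique_ptr plays the role the cleanup stack would play for a CBase-derived object in NewL(), and `CEngineModel`, `Buffer` and the failure flags are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNoMemory = -4;
constexpr int KErrDiskFull = -26;

int gLiveBuffers = 0;  // counts live buffers so that leaks are visible

struct Buffer
{
    Buffer() { ++gLiveBuffers; }
    ~Buffer() { --gLiveBuffers; }
};

class CEngineModel
{
public:
    // Two-phase construction: the object is fully constructed and owned
    // (here by unique_ptr) before anything that can leave is attempted.
    static std::unique_ptr<CEngineModel> NewL(bool aAllocFails)
    {
        std::unique_ptr<CEngineModel> self(new CEngineModel());
        self->ConstructL(aAllocFails);  // on a leave, self deletes the object
        return self;
    }

    // Two-phase destruction: the fallible part of tear-down is a separate
    // leaving function that the owner can call, and trap, before deletion.
    void ShutdownL(bool aDiskFull)
    {
        if (aDiskFull) throw Leave{KErrDiskFull};  // e.g. flushing data failed
        iSaved = true;
    }

    // The destructor only releases resources and never leaves.
    ~CEngineModel()
    {
        delete iSecond;
        delete iFirst;
    }

    CEngineModel(const CEngineModel&) = delete;
    CEngineModel& operator=(const CEngineModel&) = delete;

    bool IsSaved() const { return iSaved; }

private:
    CEngineModel() = default;  // phase 1: no allocation, cannot fail

    void ConstructL(bool aAllocFails)  // phase 2: may leave
    {
        iFirst = new Buffer();
        if (aAllocFails) throw Leave{KErrNoMemory};  // simulated failure
        iSecond = new Buffer();
    }

    Buffer* iFirst = nullptr;
    Buffer* iSecond = nullptr;
    bool iSaved = false;
};
```

Had iFirst been allocated in the constructor itself and the constructor then left, the destructor would never run and that buffer would leak; splitting construction avoids this.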
2.4. Consequences
Positives
Errors can be handled more appropriately in the
layer that understands them than by attempting to resolve them
immediately where they occur.
Escalating an
error to a design layer with sufficient context to handle it ensures
that the error is handled correctly. If this is not done and an attempt
is made to handle an error at too low a level, your options for handling
the error are narrowed to a few possibilities which are likely to be
unsuitable.
The
low-level code could retry the failed operation; it could silently
ignore the error (not normally practical, but there may be circumstances
when ignoring certain errors is harmless); or it could use Fail Fast. None of these strategies is particularly desirable, especially the use of Fail Fast, which should be reserved for faults rather than the domain or system errors that we are dealing with here.
In
order to handle an error correctly without escalating it, the component
would probably be forced to commit layering violations, e.g., by
calling up into the user interface from lower-level code. This mixing of
GUI and service code causes problems with encapsulation and portability
as well as decreasing your component's maintainability. This pattern
neatly avoids all these issues.
Less
error-handling code needs to be written, which means the development
costs and code size are reduced as well as making the component more
maintainable.
When using this pattern, you do not
need to write explicit code to check return codes because the leave and
trap mechanism takes care of the process of escalating the error and
finding a function higher in the call stack which can handle it for you.
You do not need to write code to free resources allocated by the
function if an error occurs because this is done automatically by the
cleanup stack prior to the trap harness being invoked. This is
especially true if you use a single trap harness at the top of a call
stack which is handling an event.
Runtime performance may be improved.
Use
of leave–trap does not require any logic to be written to check for
errors except where trap harnesses are located. Functions which call
leaving functions but do not handle the leaves themselves do not have to
explicitly propagate errors upwards. This means that efficiency during
normal operation improves because there is no need to check return
values to see if a function call failed or to perform manual cleanup.
Negatives
Traps and leaves are not as flexible as the C++ exception mechanism. A leave can only escalate a single TInt value and hence can only convey error values without any additional context information.
In
addition, a trap harness cannot be used to catch selected error values.
If this is what you need to do then you have to trap all errors and
leave again for those that you can't resolve at that point which is
additional code and a performance overhead for your component.
Runtime performance may get worse when handling errors.
A
leave is more expensive in terms of CPU usage, compared to returning an
error code from a function, due to the cost of the additional machinery
required to manage the data structures associated with traps and
leaves. In the Symbian OS v9 Application Binary Interface (ABI), this
overhead is currently minimal because the C++ compiler's
exception-handling mechanism is used to implement the leave–trap
mechanism which is usually very efficient in modern compilers.
It
is best to use leaves to escalate errors which are not expected to
occur many times a second. Out-of-memory and disk-full errors are a good
example of non-fatal errors which are relatively infrequent but need to
be reported and where a leave is usually the most effective mechanism.
Frequent leaves can become a very noticeable performance bottleneck.
Leaves also do not work well as a general reporting mechanism for
conditions which are not errors. For example, it would not be
appropriate to leave from a function that checks for the presence of a
multimedia codec capability when that capability is not present. This is
inefficient and leads to code bloat due to the requirement on the
caller to add a trap to get the result of the check.
Leaves
should not be used in real-time code because the leave implementation
does not make any real-time guarantees due to the fact that it involves
cleaning up any items on the cleanup stack and freeing resources,
usually an unbounded operation.
Without additional support, leaves can only be used to escalate errors within the call stack of a single thread.
This
pattern cannot be used when writing code that forms part of the Symbian
OS kernel, such as device drivers, because the leave–trap operations
are not available within the kernel.
3. Example Resolved
In the example, an application wished to send data to
a peer via UDP. To do this, it was divided into two layers: the UI,
dealing with the end user, and the engine, dealing with Comms-Infras.
Engine Layer
To achieve this, we need to open a UDP socket to be able to communicate with the peer device. The RSocket::Open() function opens a socket and RSocket::Connect()
sets the peer as its default destination (UDP itself is connectionless). These operations will fail if Comms-Infras
has insufficient resources or the network is unavailable. The engine
cannot resolve these errors because it is located at the bottom layer of
the application design and does not have the context to try to
transparently recover from an error without potentially adversely
affecting the end user. In addition, it has no way of releasing
resources to resolve local resource contention errors because they are
owned and used by other parts of the application it does not have access
to.
We could implement escalation of the errors by using function return codes as follows:
TInt CEngine::SendData(const TDesC8& aData)
{
    // Open the socket server and create a socket
    RSocketServ serv;
    TInt err = serv.Connect();
    if (err < KErrNone)
    {
        return err;
    }
    RSocket sock;
    err = sock.Open(serv, KAfInet, KSockDatagram, KProtocolInetUdp);
    if (err < KErrNone)
    {
        serv.Close();
        return err;
    }
    // Connect to the local host (loopback address).
    TInetAddr addr;
    addr.Input(_L("127.0.0.1"));
    addr.SetPort(KTelnetPort);
    TRequestStatus status;
    sock.Connect(addr, status);
    User::WaitForRequest(status);
    if (status.Int() < KErrNone)
    {
        sock.Close();
        serv.Close();
        return status.Int();
    }
    // Send the data in a UDP packet.
    sock.Send(aData, 0, status);
    User::WaitForRequest(status);
    sock.Close();
    serv.Close();
    return status.Int();
}
However, as you can see, the error-handling code is
interleaved with the normal flow of execution, making it more difficult
to maintain. A better approach would be to use the Symbian OS
error-handling facilities, resulting in a much more compact
implementation:
void CEngine::SendDataL(const TDesC8& aData)
{
    // Open the socket server and create a socket
    RSocketServ serv;
    User::LeaveIfError(serv.Connect());
    CleanupClosePushL(serv);
    RSocket sock;
    User::LeaveIfError(sock.Open(serv, KAfInet, KSockDatagram, KProtocolInetUdp));
    CleanupClosePushL(sock);
    // Connect to the local host (loopback address).
    TInetAddr addr;
    addr.Input(_L("127.0.0.1"));
    addr.SetPort(KTelnetPort);
    TRequestStatus status;
    sock.Connect(addr, status);
    User::WaitForRequest(status);
    User::LeaveIfError(status.Int());
    // Send the data in a UDP packet.
    sock.Send(aData, 0, status);
    User::WaitForRequest(status);
    User::LeaveIfError(status.Int());
    CleanupStack::PopAndDestroy(2); // sock and serv
}
Note that in the above we rely on the fact that RSocket::Close() does not leave. This is because we use CleanupClosePushL() to tell the cleanup stack to call Close() on both the RSocketServ and RSocket
objects if a leave occurs while they're on the cleanup stack. This is a
common property of Symbian OS cleanup functions, such as Close(),
Release() and Stop(): there is nothing useful that the caller can do if one of them fails, so they must handle any errors silently themselves.
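Why a cleanup function must not fail can be seen in a small standard C++ model in which a scope guard plays the part of CleanupClosePushL() followed by CleanupStack::PopAndDestroy(). `RHandleModel`, `CloseGuard` and `SendDataModelL()` are hypothetical, and exceptions stand in for leaves. Close() is invoked while a leave is unwinding the stack as well as on normal exit, so it has to be incapable of failing.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNotReady = -18;

// Hypothetical handle. Close() returns nothing and must not fail, because
// it can be called while a leave is already unwinding the stack.
struct RHandleModel
{
    std::vector<std::string>* iClosed;  // records the order of Close() calls
    std::string iName;
    void Close() noexcept { iClosed->push_back(iName); }
};

// Scope guard standing in for CleanupClosePushL() plus a later
// CleanupStack::PopAndDestroy(): Close() runs on normal exit and on a leave.
class CloseGuard
{
public:
    explicit CloseGuard(RHandleModel& aHandle) : iHandle(aHandle) {}
    ~CloseGuard() { iHandle.Close(); }
    CloseGuard(const CloseGuard&) = delete;
    CloseGuard& operator=(const CloseGuard&) = delete;

private:
    RHandleModel& iHandle;
};

// Mirrors the shape of SendDataL(): two handles, then a step that may leave.
void SendDataModelL(std::vector<std::string>& aClosed, bool aConnectFails)
{
    RHandleModel serv{&aClosed, "serv"};
    CloseGuard servGuard(serv);
    RHandleModel sock{&aClosed, "sock"};
    CloseGuard sockGuard(sock);
    if (aConnectFails) throw Leave{KErrNotReady};  // simulated Connect() failure
}
```

As with the cleanup stack, handles are closed in reverse order of acquisition on both paths.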
UI Layer
In this case, the application implementation relies
on the application framework to provide the top-level trap harness to
catch all errors escalated upwards by the engine. When the trap catches an
error, the framework calls the CEikAppUi::HandleError()
virtual function. By default, this displays the error that occurred in
an alert window to the end user. If you've put everything on the cleanup
stack then this may be all you need to do. However, an alternative is
to override the function and provide a different implementation. Note
that HandleError() is called with a number of parameters in addition to the basic error:
TErrorHandlerResponse CEikAppUi::HandleError(TInt aError,
                                             const SExtendedError& aExtErr,
                                             TDes& aErrorText,
                                             TDes& aContextText)
These parameters are filled in by the application
framework and go some way to providing extra context that might be
needed when resolving the error at the top of the application's call
stack. By relying on this, the lower layers of the application can
escalate any errors upwards to the top layer in the design to handle the
error. Use of this pattern enables errors to be resolved appropriately
and minimizes the amount of error-handling code which needs to be
written.
4. Other Known Uses
This pattern is used extensively within Symbian OS so here are just a couple of examples:
RArray
This is just one of many classes exported from euser.dll
that can report errors by leaving. Basic data structures like
these don't have any knowledge of why they're being used, so they can't
resolve any errors. Interestingly, this class provides both a leaving and
a return-code variant of many of its functions, such as AppendL() and
Append(). This is so that it can be used kernel-side, where leaves cannot
be used, and user-side, where leaves should be used to simplify the
calling code as much as possible.
CommsDat
CommsDat
is a database engine for communication settings such as network and
bearer information. It is rare for an error to occur when accessing a
CommsDat record unless the record is missing. Hence by using leaves to
report errors, its clients can avoid having to write excessive amounts
of error-handling code. The use of the leave mechanism also frees up the
return values of functions in the APIs so that they can be used for
passing actual data.
5. Variants and Extensions
Escalating Errors over a Process Boundary
A
limitation of the basic trap and leave operations is that they just
escalate errors within a single thread. However, the Symbian OS
Client–Server framework extends the basic mechanism so that if a leave
occurs on the server side it is trapped within the framework. The error
code is then used to complete the IPC message which initiated the
operation that failed. When received on the client side this may be
converted into a leave and continue being escalated up the client call
stack. However, this is dependent on the implementation of the
client-side DLL for the server.
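A much-simplified, single-process model of that round trip is sketched below in standard C++. No real IPC or Symbian OS Client–Server API is involved: `MessageModel`, `ServerDispatch()` and `ClientRequestL()` are hypothetical, exceptions stand in for leaves, and the direct function call stands in for the message crossing the process boundary.

```cpp
#include <cassert>
#include <functional>

struct Leave { int code; };  // hypothetical stand-in for a leave
constexpr int KErrNone = 0;
constexpr int KErrNotFound = -1;

// Hypothetical IPC message that the server completes with a single code.
struct MessageModel
{
    int iCompletion = KErrNone;
    void Complete(int aReason) { iCompletion = aReason; }
};

// Server side: the framework traps any leave from the service function and
// uses the code to complete the message. No exception crosses the boundary.
void ServerDispatch(MessageModel& aMessage, const std::function<void()>& aServiceL)
{
    try
    {
        aServiceL();
        aMessage.Complete(KErrNone);
    }
    catch (const Leave& l)
    {
        aMessage.Complete(l.code);
    }
}

// Client side: this client DLL chooses to turn an error completion back
// into a leave, so escalation continues up the client's call stack.
void ClientRequestL(const std::function<void()>& aServiceL)
{
    MessageModel message;
    ServerDispatch(message, aServiceL);  // stands in for the IPC round trip
    if (message.iCompletion < 0) throw Leave{message.iCompletion};
}
```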
Escalating Errors without Leaving
Whilst
this pattern is normally implemented using the leave and trap
mechanisms, these are not an essential part of the pattern. In situations
where the higher layer didn't originate the current call stack, it may
not be possible to use a leave to escalate the error back to it. Instead,
the error will need to be passed upwards via an explicit function call
which informs the higher layer of the error that needs to be handled.
Examples of this commonly occur when an asynchronous request fails,
since by the time the response comes back the original call
stack no longer exists; for instance, this occurs in Coordinator and Episodes.
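A rough standard C++ sketch of escalation through an explicit callback follows. `MEngineObserver`, `EngineModel` and `UiModel` are hypothetical names that loosely follow the Symbian OS M-class naming convention, and `DeliverCompletions()` stands in for the active scheduler later running the engine's RunL().

```cpp
#include <cassert>
#include <queue>

constexpr int KErrNone = 0;
constexpr int KErrNotReady = -18;

// Hypothetical observer interface implemented by the higher layer.
class MEngineObserver
{
public:
    virtual ~MEngineObserver() = default;
    virtual void SendComplete(int aError) = 0;
};

// Engine with an asynchronous request. The call stack that issued the
// request is gone when the result arrives, so the error is escalated
// with an explicit callback rather than a leave.
class EngineModel
{
public:
    explicit EngineModel(MEngineObserver& aObserver) : iObserver(aObserver) {}

    // Issues the request and returns at once (simulated outcome).
    void SendData(bool aWillFail)
    {
        iPending.push(aWillFail ? KErrNotReady : KErrNone);
    }

    // Stands in for the active scheduler later running RunL().
    void DeliverCompletions()
    {
        while (!iPending.empty())
        {
            int result = iPending.front();
            iPending.pop();
            iObserver.SendComplete(result);  // escalate upwards explicitly
        }
    }

private:
    MEngineObserver& iObserver;
    std::queue<int> iPending;
};

// Hypothetical UI layer that receives and records the outcome.
class UiModel : public MEngineObserver
{
public:
    int iCalls = 0;
    int iLastError = KErrNone;
    void SendComplete(int aError) override
    {
        ++iCalls;
        iLastError = aError;
    }
};
```

The observer interface keeps the layering intact: the engine depends only on an abstract callback, not on the concrete UI class.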