Friday, September 14, 2012

The Transient Fault Handling Application Block

While looking for some good error handling strategies for cloud solutions, I found the following Microsoft Patterns & Practices Application Block: The Transient Fault Handling Application Block.

From the site:

The Microsoft Enterprise Library Transient Fault Handling Application Block lets developers make their applications more resilient by adding robust transient fault handling logic. Transient faults are errors that occur because of some temporary condition such as network connectivity issues or service unavailability. Typically, if you retry the operation that resulted in a transient error a short time later, you find that the error has disappeared.

Different services can have different transient faults, and different applications require different fault handling strategies. The Transient Fault Handling Application Block encapsulates information about the transient faults that can occur when you use the following Windows Azure services in your application:

  • SQL Azure
  • Windows Azure Service Bus
  • Windows Azure Storage
  • Windows Azure Caching Service

The Transient Fault Handling Application Block enables the developer to select from the following retry strategies:

  • Incremental
  • Fixed interval
  • Exponential back-off

The Enterprise Library Transient Fault Handling Application Block includes the following features:

  • You can select from an extensible collection of error detection strategies for cloud-based services, and an extensible collection of retry strategies.
  • You can use the graphical Enterprise Library configuration tool to manage configuration settings.
  • You can extend the block by adding error detection strategies for other services or by adding custom retry strategies.

The Transient Fault Handling Application Block uses detection strategies to identify all known transient error conditions. You can use one of the built-in detection strategies for SQL Azure, Windows Azure Storage, Windows Azure Caching, or the Windows Azure Service Bus. You can also define detection strategies for any other services that your application uses.

Some possible retry strategies:

  • Fixed interval: Retry four times at one-second intervals.
  • Incremental interval: Retry four times, waiting one second before the first retry, then two seconds before the second retry, then three seconds before the third retry, and four seconds before the fourth retry.
  • Exponential back-off: Retry four times, waiting two seconds before the first retry, then four seconds before the second retry, then eight seconds before the third retry, and sixteen seconds before the fourth retry.
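To make the three schedules concrete, here is a minimal sketch that computes the wait before each retry. It is plain C# and does not use the application block; the class and method names (RetrySchedules, FixedDelays, IncrementalDelays, ExponentialDelays) are made up for illustration. As far as I know, the block's own exponential back-off strategy also takes minimum and maximum bounds and adds some randomization, which this idealized version leaves out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: hypothetical helpers, not part of the application block.
public static class RetrySchedules
{
    // Fixed interval: the same delay before every retry.
    public static IEnumerable<TimeSpan> FixedDelays(int retryCount, TimeSpan interval)
    {
        return Enumerable.Range(0, retryCount).Select(i => interval);
    }

    // Incremental: initial, initial + increment, initial + 2 * increment, ...
    public static IEnumerable<TimeSpan> IncrementalDelays(
        int retryCount, TimeSpan initial, TimeSpan increment)
    {
        return Enumerable.Range(0, retryCount)
            .Select(i => initial + TimeSpan.FromTicks(increment.Ticks * i));
    }

    // Exponential: first, 2 * first, 4 * first, ... (the delay doubles each time).
    public static IEnumerable<TimeSpan> ExponentialDelays(int retryCount, TimeSpan first)
    {
        return Enumerable.Range(0, retryCount)
            .Select(i => TimeSpan.FromTicks(first.Ticks << i));
    }
}
```

With four retries, FixedDelays with a one-second interval gives 1, 1, 1, 1 seconds; IncrementalDelays starting at one second with a one-second increment gives 1, 2, 3, 4 seconds; and ExponentialDelays starting at two seconds gives 2, 4, 8, 16 seconds, matching the three bullets above.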

And here is a sample:

using System;
using Microsoft.Practices.TransientFaultHandling;
using Microsoft.Practices.TransientFaultHandling.RetryStrategies;
using Microsoft.Practices.EnterpriseLibrary.WindowsAzure.TransientFaultHandling.AzureStorage;

// Define your retry strategy: retry 3 times, 1 second apart.
var retryStrategy = new FixedInterval(3, TimeSpan.FromSeconds(1));

// Define your retry policy using the retry strategy and the Windows Azure storage
// transient fault detection strategy.
var retryPolicy =
    new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);

// Do some work that may result in a transient fault.
try
{
    // Call a method that uses Windows Azure storage and which may
    // throw a transient exception. The action is re-run on each retry.
    retryPolicy.ExecuteAction(
        () =>
        {
            // this.queue is assumed to be a queue field on the enclosing class.
            this.queue.CreateIfNotExist();
        });
}
catch (Exception)
{
    // All of the retries failed.
}
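
The sample above relies on the built-in storage detection strategy. For a service the block does not cover, a custom detection strategy might look like the sketch below. It assumes the ITransientErrorDetectionStrategy interface and the same Microsoft.Practices.TransientFaultHandling namespace used in the sample (later Enterprise Library releases use a different namespace, so adjust the using line to your version). MyServiceErrorDetectionStrategy, RetryDemo, and the choice of which exceptions count as transient are made up for illustration. The Run method simulates an operation that fails twice with a transient error and then succeeds, which is also one simple way to exercise a policy without a real outage.

```csharp
using System;
using System.Net;
using Microsoft.Practices.TransientFaultHandling;

// Hypothetical strategy: treats timeouts and web-level failures as transient.
public class MyServiceErrorDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex)
    {
        return ex is TimeoutException || ex is WebException;
    }
}

public static class RetryDemo
{
    // Runs an action that throws a simulated transient fault on its first
    // two attempts, then succeeds. Returns the total number of attempts.
    public static int Run()
    {
        // Up to 3 retries; a short interval keeps the demo fast.
        var policy = new RetryPolicy<MyServiceErrorDetectionStrategy>(
            new FixedInterval(3, TimeSpan.FromMilliseconds(10)));

        int attempts = 0;
        policy.ExecuteAction(() =>
        {
            attempts++;
            if (attempts < 3)
            {
                throw new TimeoutException("Simulated transient fault");
            }
        });

        return attempts;
    }
}
```

Here the policy should absorb the two simulated timeouts, so Run returns 3. Throwing an exception type that the strategy does not treat as transient (for example InvalidOperationException) should instead surface on the first attempt without any retry.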

2 comments:

Alexandre Brisebois said...

This is interesting, but the queue already implements retry logic.

Isn't this a little overkill when the queue's retry logic already wraps the work in a Task of its own?

I'm just trying to figure out the best way to go about it...

Any insight is greatly appreciated.

Shweta said...

Good post. Please can you show a snippet on how the code you have posted can be tested for different types of transient faults.

Thanks in advance.

-Shweta