We are using the ESB Exception Notification (aka ESB.AlertService) Windows service in conjunction with the ESB Portal website.  On occasion, we have a problem where the service indefinitely sends out duplicate emails for the same alert.  In the server’s Application Event Log, we see the error: “An exception of type ‘System.Data.StrongTypingException’ occurred and was caught.”  The log entry also includes “The value for column ‘To’ in table ‘AlertEmail’ is DBNull.”

We are allowing the service to pull user email addresses from Active Directory by configuring the LDAP Root under Alert Queue Options to LDAP://DC=company, DC=com.  With Active Directory you don’t need to specify a server name in your LDAP path.  Just point to the domain itself and Windows will figure out which domain controller to contact.

The vast majority of the rows in AlertEmail contain the correct email address in the To column, but every once in a while there is a NULL.  Looking at the service code (QueueGenerator.cs), we can see that the email address in CustomEmail is always used first, if one was provided when the alert subscription was created.  We do not set this value, so the code next attempts to pull the email address from Active Directory using the GetEmailAddress() method (ActiveDirectoryHelper.cs).

To reduce the number of AD queries, the code caches email addresses using the Enterprise Library caching block.  The cached entries expire after a configurable interval, which defaults to 1 minute.  If the username is already in the cache, then the corresponding email address is returned.  Otherwise, the code looks up the username in AD, grabs the email address and caches it.  The lookup code throws an exception if it doesn’t get back a valid email address, so that path doesn’t explain how we got a NULL email address.

The problematic code is the cache lookup:

if (CacheMgr.Contains(name))
{
    Trace.WriteLine("Reusing email address for " + name + " from cache.");
    return (string)CacheMgr.GetData(name);
}

This is a classic race condition.  The code checks to see if the username is in the cache, then runs a Trace.WriteLine(), then asks for the cached data associated with the username.  In the time between the Contains() and the GetData() calls, the cached data can expire and drop out of the cache, in which case GetData() will return null.  Most of the time it gets lucky and the data is still cached.  This probably explains how we sometimes get NULL values in the database.

The fix is straightforward because GetData() returns null when the requested data is not in the cache, so a single call can replace the Contains()/GetData() pair:

string cachedEmail = (string)CacheMgr.GetData(name);

if (!string.IsNullOrEmpty(cachedEmail))
{
    Trace.WriteLine("Reusing email address for " + name + " from cache.");
    return cachedEmail;
}

The new version of the code eliminates the race condition and should prevent us from ever seeing NULL values in the database.
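
The difference between the two patterns can be reproduced deterministically with a small stand-in cache. This is only a sketch: FakeExpiringCache, its hand-advanced clock, the CacheRaceDemo class and the jsmith@example.com address are all made up for illustration, and the stand-in does not model how the Enterprise Library caching block actually expires or scavenges entries. The betweenReads callback plays the role of the Trace.WriteLine() call, during which the entry can expire.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the cache manager. It mirrors only the
// Contains/GetData/Add shape; time is a field advanced by hand.
public class FakeExpiringCache
{
    private class Entry
    {
        public object Value;
        public int ExpiresAt;
    }

    private readonly Dictionary<string, Entry> items = new Dictionary<string, Entry>();

    public int Now; // fake clock, in seconds

    public void Add(string key, object value, int lifetimeSeconds)
    {
        items[key] = new Entry { Value = value, ExpiresAt = Now + lifetimeSeconds };
    }

    // Returns null when the key is missing or its entry has expired.
    public object GetData(string key)
    {
        Entry entry;
        if (!items.TryGetValue(key, out entry) || Now >= entry.ExpiresAt)
        {
            return null;
        }
        return entry.Value;
    }

    public bool Contains(string key)
    {
        return GetData(key) != null;
    }
}

public static class CacheRaceDemo
{
    // Marker meaning "not cached, fall through to the directory lookup".
    public const string Miss = "(cache miss)";

    // Original pattern: two reads with a gap between them.
    public static string CheckThenGet(FakeExpiringCache cache, string name, Action betweenReads)
    {
        if (cache.Contains(name))
        {
            betweenReads(); // the entry may expire here
            return (string)cache.GetData(name); // can be null despite the check
        }
        return Miss;
    }

    // Fixed pattern: one read, then decide based on the value in hand.
    public static string SingleGet(FakeExpiringCache cache, string name, Action betweenReads)
    {
        string cachedEmail = (string)cache.GetData(name);
        betweenReads(); // expiry after the read no longer matters
        if (!string.IsNullOrEmpty(cachedEmail))
        {
            return cachedEmail;
        }
        return Miss;
    }

    public static void Main()
    {
        FakeExpiringCache cache = new FakeExpiringCache();
        cache.Add("jsmith", "jsmith@example.com", 60); // 1-minute lifetime
        cache.Now = 59; // one second before expiry

        string racy = CheckThenGet(cache, "jsmith", delegate { cache.Now = 60; });
        Console.WriteLine("check-then-get returned null: " + (racy == null));

        cache.Now = 59;
        string safe = SingleGet(cache, "jsmith", delegate { cache.Now = 60; });
        Console.WriteLine("single get returned: " + safe);
    }
}
```

In this sketch the two-read version reports a cache hit and then hands back null, which is the value that would end up in the To column. The single-read version returns either the address it actually read or the miss marker, never null.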

I also created a bug report on Microsoft Connect.