Thursday, October 10, 2013

Using sequential guids as identifiers in Entity Framework

The default strategy for ID fields in Entity Framework is to use IDENTITY columns, which means the database generates the identity value for you. The disadvantage of this approach is that an entity has no ID until it has been inserted, so an extra roundtrip to the database is required before you know the value.

An alternative approach is the use of GUIDs. They make it easy to generate a unique identity value on the client without the need to connect to the database. However, GUIDs come with their own set of disadvantages. First, they require extra space (16 bytes, versus 4 bytes for an int). Second, they lead to fragmented indexes, because the generated values are random and new rows end up scattered across the index instead of being appended at the end.

To combine the best of both worlds, you can use sequential GUIDs. Using sequential GUIDs in NHibernate is easy thanks to the built-in guid.comb strategy (read more about it here). But how can we do this in Entity Framework?

I found two possible solutions:

Use NewSequentialId()

One option is to still specify your ID field as an Identity column in Entity Framework, but update your database to use the newsequentialid() SQL function to provide a default value. The disadvantage of this approach is that you still need to go to the database to get your ID value.
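As a rough sketch of what this could look like: the entity below marks its Guid key as database generated using the standard data annotation attributes, and the comment shows the kind of default constraint you would add on the SQL Server side. The Customer class, the table name and the constraint name are made up for illustration.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

// Hypothetical entity, used for illustration only.
public class Customer
{
    // Tell Entity Framework that the database assigns this value on insert,
    // so it reads the generated Guid back after saving.
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public Guid Id { get; set; }

    public string Name { get; set; }
}

// On the database side, the column needs a default that produces
// sequential values. Assuming a table called dbo.Customers:
//
//   ALTER TABLE dbo.Customers
//       ADD CONSTRAINT DF_Customers_Id DEFAULT NEWSEQUENTIALID() FOR Id;
```

Because the value is still produced by SQL Server, the entity only gets its ID after the insert has been executed.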

A good article explaining this approach can be found here:

Replicate the Guid.Comb implementation from NHibernate

Another option (and the one that I prefer) is to copy the Guid.Comb implementation from NHibernate and embed it in your Entity Framework application.
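A minimal sketch of such a generator is shown below. It is modeled on the commonly used COMB technique (a random GUID whose last 6 bytes are overwritten with a timestamp) and is not a verbatim copy of the NHibernate source. The SequentialGuid class and the NewComb method are names chosen for this example. The timestamp goes into the last 6 bytes because SQL Server compares those bytes first when ordering uniqueidentifier values.

```csharp
using System;

// Hypothetical helper, modeled on the COMB approach.
public static class SequentialGuid
{
    private static readonly DateTime BaseDate = new DateTime(1900, 1, 1);

    public static Guid NewComb()
    {
        return NewComb(DateTime.UtcNow);
    }

    // Overload taking the timestamp explicitly, which makes the ordering easy to test.
    public static Guid NewComb(DateTime timestamp)
    {
        // Start from a regular random GUID.
        byte[] guidBytes = Guid.NewGuid().ToByteArray();

        // Days since the base date and the time of day in units of roughly
        // 1/300th of a second (the resolution of the SQL Server datetime type).
        int days = (timestamp - BaseDate).Days;
        long timeUnits = (long)(timestamp.TimeOfDay.TotalMilliseconds / 3.333333);

        byte[] daysBytes = BitConverter.GetBytes(days);
        byte[] timeBytes = BitConverter.GetBytes(timeUnits);

        // Store the most significant byte first so that a byte-wise
        // comparison follows chronological order.
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(daysBytes);
            Array.Reverse(timeBytes);
        }

        // Overwrite the last 6 bytes: 2 bytes of days followed by 4 bytes of time.
        Array.Copy(daysBytes, daysBytes.Length - 2, guidBytes, guidBytes.Length - 6, 2);
        Array.Copy(timeBytes, timeBytes.Length - 4, guidBytes, guidBytes.Length - 4, 4);

        return new Guid(guidBytes);
    }
}
```

You could then assign the key in the entity's constructor, for example Id = SequentialGuid.NewComb(). With this approach you would likely also want to configure the key as not database generated (DatabaseGeneratedOption.None), so that Entity Framework sends your value instead of expecting the database to provide one. Note that two values generated within the same 1/300th of a second share the same timestamp part and are ordered only by their random bytes.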


Anonymous said...

Generating sequential GUIDs in code seems like it should solve most of your fragmentation issues. However, you must be careful, because you cannot control the order in which Entity Framework sends inserts to the database. Consider a situation where you generate 5000 entities with sequential GUIDs and add them to your context. When the context generates insert statements, it will generate them in whatever order it wants (not ordered by the GUID). The result is a very fragmented index.

Anonymous said...

Well, that's not true. The index will not be fragmented. The DB generates and manages the index based on the associated column values; the order of the inserts in time is irrelevant.

The only impact of the random insert order would be performance. The 5000 inserts would take a lot of time as the index would be sorted on every insert.