Fun With ORM Includes!

This is a cautionary tale about how an ORM (in this case, Entity Framework) can screw your performance if you don’t pay attention to the queries it generates. In particular, watch out for recursive includes.

Here’s a simplified version of the original problem:

var person = await DataContext.Set<Person>()
    .Include(p => p.Groups)
    .Include(p => p.Manager)
    .Where(p => p.Id == personId)
    .SingleOrDefaultAsync();

So here’s the intent, and the bug. The code above is supposed to say ‘load the person with the specified ID, and include their groups and manager’.

And here’s the problem. EF sees that you want to load Person.Manager. Person.Manager is a Person, so it’ll need to include the manager’s groups too. But ah! Person.Manager has a Manager, which has a Manager, which has a Manager. EF goes batshit at this point and generates over 90 joins:

person join person join person join person join person ....

And that query goes off, loads far too much data, and takes far too much time: on the order of seconds, for a query that should run in a few milliseconds.
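If you’re on EF Core 5.0 or later, you don’t need an external tool just to read the SQL a query will produce: `ToQueryString()` returns it without running the query. The sketch below assumes the Microsoft.EntityFrameworkCore.Sqlite provider and a hypothetical `Person`/`Group`/`DataContext` model standing in for the real one. The SQL you get depends on your model, provider and EF version, so don’t expect it to reproduce the exact join count described above.

```csharp
// Minimal sketch, assuming EF Core 5.0+ and the Microsoft.EntityFrameworkCore.Sqlite package.
// Person, Group and DataContext are a hypothetical model, not the post's real schema.
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int? ManagerId { get; set; }   // self-referencing foreign key
    public Person? Manager { get; set; }  // the "recursive" navigation
    public List<Group> Groups { get; set; } = new();
}

public class Group
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Person> Members { get; set; } = new(); // makes Person/Group many-to-many
}

public class DataContext : DbContext
{
    public DataContext(DbContextOptions<DataContext> options) : base(options) { }
    public DbSet<Person> People => Set<Person>();
    public DbSet<Group> Groups => Set<Group>();
}

public static class SqlPreview
{
    // Builds the query from the post and returns its SQL text.
    public static string GetSql(int personId)
    {
        var options = new DbContextOptionsBuilder<DataContext>()
            .UseSqlite("Data Source=:memory:") // a provider is needed to generate SQL
            .Options;
        using var db = new DataContext(options);

        var query = db.Set<Person>()
            .Include(p => p.Groups)
            .Include(p => p.Manager)
            .Where(p => p.Id == personId);

        // ToQueryString() generates the SQL for debugging; it does not execute the query.
        return query.ToQueryString();
    }

    public static void Main() => Console.WriteLine(GetSql(42));
}
```

For a running application, EF Core 5.0+ can also log every command it sends via `optionsBuilder.LogTo(Console.WriteLine)`, which is a quick way to spot a query that has grown well past a page of SQL.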

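The fix I put in was to split this into two much faster queries, something like this:

// Query 1: the person and their groups; deliberately no Include() of Manager.
var person = await DataContext.Set<Person>()
    .Include(p => p.Groups)
    .Where(p => p.Id == personId)
    .SingleOrDefaultAsync();

// Query 2: only if the person exists and has a manager, load the manager and their groups.
if (person != null && person.ManagerId.HasValue)
{
    var managerId = person.ManagerId.Value;
    // Attach the manager to the person so the caller can use it.
    person.Manager = await DataContext.Set<Person>()
        .Include(p => p.Groups)
        .Where(p => p.Id == managerId)
        .SingleOrDefaultAsync();
}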

So we hit the database twice, but with much cheaper queries. This may not be perfect (I’d like to get everything in one shot), but I couldn’t think of a better approach.
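One option worth measuring, assuming EF Core 5.0 or later: name the single level you want with `ThenInclude`, so one query asks for the person, their groups, their manager and the manager’s groups, and nothing further up the chain. The sketch below uses the SQLite provider with a hypothetical model and demo data. Whether this avoids the explosion described above on your EF version and schema is something to check against the generated SQL rather than assume.

```csharp
// Minimal sketch, assuming EF Core 5.0+ and the Microsoft.EntityFrameworkCore.Sqlite package.
// The model and the demo data are hypothetical.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int? ManagerId { get; set; }
    public Person? Manager { get; set; }
    public List<Group> Groups { get; set; } = new();
}

public class Group
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<Person> Members { get; set; } = new();
}

public class DataContext : DbContext
{
    public DataContext(DbContextOptions<DataContext> options) : base(options) { }
    public DbSet<Person> People => Set<Person>();
    public DbSet<Group> Groups => Set<Group>();
}

public static class OneShot
{
    // One round trip: the person, their groups, their manager, and the manager's groups.
    // The include path stops at one level of Manager.
    public static async Task<Person?> LoadAsync(DataContext db, int personId) =>
        await db.Set<Person>()
            .Include(p => p.Groups)
            .Include(p => p.Manager)
                .ThenInclude(m => m!.Groups) // '!' only silences the nullable warning
            .Where(p => p.Id == personId)
            .SingleOrDefaultAsync();

    public static async Task<Person?> DemoAsync()
    {
        // An in-memory SQLite database lives only as long as this connection stays open.
        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();
        var options = new DbContextOptionsBuilder<DataContext>().UseSqlite(connection).Options;

        int devId;
        using (var seed = new DataContext(options))
        {
            seed.Database.EnsureCreated();
            var ceo = new Person { Name = "Ceo" };
            var boss = new Person { Name = "Boss", Manager = ceo, Groups = { new Group { Name = "Leads" } } };
            var dev = new Person { Name = "Dev", Manager = boss, Groups = { new Group { Name = "Devs" } } };
            seed.AddRange(ceo, boss, dev);
            seed.SaveChanges();
            devId = dev.Id;
        }

        // A fresh context, so only what the query loads is tracked.
        using var db = new DataContext(options);
        return await LoadAsync(db, devId);
    }

    public static async Task Main()
    {
        var person = await DemoAsync();
        Console.WriteLine($"{person?.Name} reports to {person?.Manager?.Name}, " +
            $"who is in {person?.Manager?.Groups.Count} group(s)");
    }
}
```

If even that single query’s joins get too wide, EF Core 5.0+ also offers `.AsSplitQuery()`, which loads each included collection with its own SQL statement; that automates roughly what the two-query fix above does by hand.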

Here’s the takeaway, though. It’s almost never right to use a recursive include (like Person.Manager) unless you actually want to load complete branches or ancestor paths. When mixed with other includes (like Person.Groups), it leads to an explosion in the complexity of the query. So:

  • Measure the performance of any queries you write. Use the free Stackify Prefix or some other tool to make sure your queries are small-numbers-of-milliseconds fast. If not, question whether you could make them faster.
  • Use Stackify Prefix to examine the generated SQL. Sure, EF can be a bit verbose, but any time it generates more than a page of SQL, question whether the query is as simple as it could be.
  • Don’t ask for it if you don’t need it. Don’t Include() in data just in case.
  • Two very cheap queries beat one awful one. EF’s only so clever, and it doesn’t handle lots of includes well, so don’t be afraid of two efficient queries when EF goes mental.
  • On the other hand, don’t take that as permission to write an N+1 bug. This is where you load a ‘master’ object, then iterate through its children, loading each one with its own query. This will kill your server, and it’s the most common database performance bug written in enterprise software.
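The N+1 pattern is easier to recognise side by side with its fix. This is a minimal sketch assuming EF Core 5.0+ with the SQLite provider and a hypothetical `Person`/`DataContext` model and demo data. Both methods return the distinct manager names for a list of people, but `NPlusOneAsync` issues one query per person, while `BatchedAsync` issues two queries in total by collecting the manager IDs and fetching them with a single `Contains()` filter.

```csharp
// Minimal sketch, assuming EF Core 5.0+ and the Microsoft.EntityFrameworkCore.Sqlite package.
// The Person/DataContext model and the demo data are hypothetical.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Data.Sqlite;
using Microsoft.EntityFrameworkCore;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public int? ManagerId { get; set; }
    public Person? Manager { get; set; }
}

public class DataContext : DbContext
{
    public DataContext(DbContextOptions<DataContext> options) : base(options) { }
    public DbSet<Person> People => Set<Person>();
}

public static class ManagerLoading
{
    // N+1: one query for the people, then one more query per person. Avoid this.
    public static async Task<List<string>> NPlusOneAsync(DataContext db, List<int> personIds)
    {
        var people = await db.People.Where(p => personIds.Contains(p.Id)).ToListAsync();
        var names = new List<string>();
        foreach (var person in people)
        {
            if (person.ManagerId is int managerId)
            {
                // A separate database round trip for every person in the list.
                var manager = await db.People.SingleAsync(m => m.Id == managerId);
                names.Add(manager.Name);
            }
        }
        return names.Distinct().OrderBy(n => n).ToList();
    }

    // Batched: two queries in total, however many people are in the list.
    public static async Task<List<string>> BatchedAsync(DataContext db, List<int> personIds)
    {
        var managerIds = await db.People
            .Where(p => personIds.Contains(p.Id) && p.ManagerId != null)
            .Select(p => p.ManagerId!.Value)
            .Distinct()
            .ToListAsync();
        // Contains() on a local list becomes one SQL query with an IN-style filter
        // (the exact SQL varies by EF version).
        return await db.People
            .Where(m => managerIds.Contains(m.Id))
            .Select(m => m.Name)
            .OrderBy(n => n)
            .ToListAsync();
    }

    public static async Task<(List<string> NPlusOne, List<string> Batched)> DemoAsync()
    {
        // An in-memory SQLite database lives only as long as this connection stays open.
        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();
        var options = new DbContextOptionsBuilder<DataContext>().UseSqlite(connection).Options;
        using var db = new DataContext(options);
        db.Database.EnsureCreated();

        var ann = new Person { Name = "Ann" };
        var ben = new Person { Name = "Ben" };
        var team = new List<Person>
        {
            new Person { Name = "Cat", Manager = ann },
            new Person { Name = "Dan", Manager = ann },
            new Person { Name = "Eve", Manager = ben },
        };
        db.People.AddRange(team); // Ann and Ben are added too, via the Manager navigation
        await db.SaveChangesAsync();

        var ids = team.Select(p => p.Id).ToList();
        return (await NPlusOneAsync(db, ids), await BatchedAsync(db, ids));
    }

    public static async Task Main()
    {
        var (nPlusOne, batched) = await DemoAsync();
        Console.WriteLine(string.Join(", ", nPlusOne)); // Ann, Ben
        Console.WriteLine(string.Join(", ", batched));  // Ann, Ben
    }
}
```

Both give the same answer here; the difference is that the batched version’s query count stays fixed as the list grows, while the N+1 version’s grows with it.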