I have a complex IQueryable that I would like EF to populate with a single database query so that I can use it with deferred execution. Please consider the following example.
For these models:
public enum AlphaState { Unknown = '\0', Good = 'G', Bad = 'B' }
[Table("MY_ALPHA")]
public class Alpha
{
[Key]
[Column("alpha_index")]
public long Index { get; set; }
[Column("alpha_id")] // user-editable field that users call ID
public string AlphaId { get; set; }
[Column("deleted")]
public char? Deleted { get; set; }
[Column("state")]
public AlphaState State { get; set; }
[InverseProperty("Alpha")]
public ICollection<Bravo> Bravos { get; set; }
}
[Table("MY_BRAVO")]
public class Bravo
{
[Key]
[Column("bravo_index")]
public long BravoIndex { get; set; }
[ForeignKey("Alpha")]
[Column("alpha_index")] // actually a 1:0..1 relationship
public long? AlphaIndex { get; set; }
public virtual Alpha Alpha { get; set; }
[InverseProperty("Bravo")]
public ICollection<Charlie> Charlies { get; set; }
}
[Table("MY_CHARLIE_VIEW")]
public class Charlie
{
[Key]
[Column("charlie_index")]
public int CharlieIndex { get; set; }
[Column("deleted")]
public char? Deleted { get; set; }
[Column("created_at")]
public DateTime CreatedAt { get; set; }
[ForeignKey("Bravo")]
[Column("bravo_index")]
public long BravoIndex { get; set; }
public virtual Bravo Bravo { get; set; }
[ForeignKey("Delta")]
[Column("delta_index")]
public long DeltaIndex { get; set; }
public virtual Delta Delta { get; set; }
[InverseProperty("Charlie")]
public virtual ICollection<Delta> AllDeltas { get; set; }
}
[Table("MY_DELTA")]
public class Delta
{
[Key]
[Column("delta_index")]
public long DeltaIndex { get; set; }
[ForeignKey("Charlie")]
[Column("charlie_index")]
public long CharlieIndex { get; set; }
public virtual Charlie Charlie { get; set; }
[InverseProperty("Delta")] // actually a 1:0..1 relationship
public ICollection<Echo> Echoes { get; set; }
}
public enum EchoType { Unknown = 0, One = 1, Two = 2, Three = 3 }
[Table("MY_ECHOES")]
public class Echo
{
[Key]
[Column("echo_index")]
public int EchoIndex { get; set; }
[Column("echo_type")]
public EchoType Type { get; set; }
[ForeignKey("Delta")]
[Column("delta_index")]
public long DeltaIndex { get; set; }
public virtual Delta Delta { get; set; }
}
...consider this query:
IQueryable<Alpha> result = context.Alphas.Where(a => a.State == AlphaState.Good)
.Where(a => !a.Deleted.HasValue)
.Where(a => a.Bravos.SelectMany(b => b.Charlies)
.Where(c => !c.Deleted.HasValue)
.Where(c => c.Delta.Echoes.Any())
.OrderByDescending(c => c.CreatedAt).Take(1)
.Any(c => c.Delta.Echoes.Any(e => e.Type == EchoType.Two)))
var query = result as System.Data.Objects.ObjectQuery;
string queryString = query.ToTraceString();
NOTE: Charlie is actually a view on a table; Delta has an FK to Charlie's table, but the view gives a fake FK for the most recent Delta linked to that Charlie, so the model uses that because the plan is to use EF only for querying, never for updating.
I would like this query to be populated by a single database query, but as written that's not what's happening. How can I modify this query to get the same results but have EF simply build the condition into the results
IQueryable instead of pre-fetching data for it?
How I Know It's Using Two Queries
I know for sure that it's splitting into multiple queries because, for reasons beyond the scope of this question, I deliberately gave the context a bad connection string. result
is an IQueryable, so it should be using deferred execution and not actually attempt to retrieve any data until it's used, but I'm getting a connection failed exception as soon as I declare it.
Background
We have an existing database structure, database access layer, and several hundred thousand lines of code using said structure & DAL. We'd like to add a UI to allow users to build complex queries of their own, and EF seemed like a good way to build an underlying model for that. We've never used EF before, though, so the Powers That Be have declared that it cannot ever connect to the database; we should use EF to generate an IQueryable, extract the query string from it, and use our existing DAL to run the query.
How I Know It's Using Two Queries
What you're observing isn't EF starting to run your query. After you assign a query to your result
variable, you still have a query definition, not a result set. If you attach a profiler to your database, you'll see that no SELECT
statement has executed for your query.
So, why is as connection being made to the database? The reason is that the first time you build a query for a given derived DbContext
type, EF builds and caches its in-memory model for that type. It does this by applying various conventions to the types, properties and attributes that you've defined. In theory this process doesn't need to connect to the database, but the provider for SQL Server does so anyway. It does this to determine the version of SQL Server you're using so it can work out whether it can use more recent SQL Server features in the model that it builds.
It's interesting that this model is cached for the type, not the instance of the context. You can see this by disposing of your context
then creating a new one and repeating the lines of code that build a query. In the second instance you wouldn't see a connection to the database at all, since EF would use its cached model for your context type.
the Powers That Be have declared that it cannot ever connect to the database; we should use EF to generate an IQueryable, extract the query string from it, and use our existing DAL to run the query.
Since you need to avoid having EF connect to the database at all, you can see my post here which contains details of how you can provide this information up-front in code instead.
Also, be aware that there's another reason why EF might connect to your server the first time it encounters your DbContext
type: initialization. Unless you've disabled initialization (with something like Database.SetInitializer<MyContext>(null)
) it will check that the database exists and try to create it if not.
Note that you can call ToString()
directly on an EF code first query to get the T-SQL query. You don't need to go through the intermediate ObjectQuery.ToTraceString()
method, which is actually part of the legacy EF API. However, both of these methods are intended for debugging and logging purposes. It is rather unusual to use EF for building queries but not executing them. You are likely to encounter problems with this approach—most obviously when EF determines that it should generate a parameterized query. Also, there's no guarantee that different versions of EF will generate similar T-SQL for the same input, so your code may end up being rather brittle as you update to newer EF versions. Make sure you have plenty of tests!
If you are concerned about having users connect to the database directly—and that's an entirely legitimate security concern—you might consider an alternative approach. I haven't much experience with it, but it seems that OData might be a good fit. This allows you to build queries on a client, serialize them over a remote connection, then re-create them on your server. Then, on the server, you can execute them against your database. Your client needs to know nothing about the database whatsoever.
If you do decide (or are instructed) to persevere with the approach you detailed in your question, do spend the time learning how to profile a SQL Server connection. This will be an absolutely essential tool for you in working out how EF is translating your queries.