I have a query that looks something like this:
private static IQueryable<MultiframeModule> WhereAllFramesProperties(this IQueryable<MultiframeModule> query, ICollection<Frame> frames)
{
return frames.Aggregate(query, (q, frame) =>
{
return q.Where(p => p.Frames.Any(i => i.FrameData.ShaHash == frame.FrameData.ShaHash));
});
}
MultiframeModule
and Frame
have an many-to-many relation.
With that query I want to find a MultiframeModule
that contains all frames inside the frames
collection I sent as a parameter, for that I check the ShaHash
parameter.
If frames
contains 2 frames, then the generated SQL would be something like that:
SELECT
"Extent1"."MultiframeModuleId",
"Extent1"."FrameIncrementPointer",
"Extent1"."PageNumberVector"
FROM
"public"."MultiframeModule" AS "Extent1"
WHERE
EXISTS
(
SELECT
1 AS "C1"
FROM
"public"."Frame" AS "Extent2"
INNER JOIN
"public"."FrameData" AS "Extent3"
ON "Extent2"."FrameData_FrameDataId" = "Extent3"."FrameDataId"
WHERE
"Extent1"."MultiframeModuleId" = "Extent2"."MultiframeModule_MultiframeModuleId"
AND "Extent3"."ShaHash" = @p__linq__0
)
AND EXISTS
(
SELECT
1 AS "C1"
FROM
"public"."Frame" AS "Extent4"
INNER JOIN
"public"."FrameData" AS "Extent5"
ON "Extent4"."FrameData_FrameDataId" = "Extent5"."FrameDataId"
WHERE
"Extent1"."MultiframeModuleId" = "Extent4"."MultiframeModule_MultiframeModuleId"
AND "Extent5"."ShaHash" = @p__linq__1
)
LIMIT 2
-- p__linq__0: '0' (Type = Int32, IsNullable = false)
-- p__linq__1: '0' (Type = Int32, IsNullable = false)
But, if I have more frames, say 200 for example, then the call would throw an exception:
Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
With stacktrace:
at Npgsql.ReadBuffer.<Ensure>d__27.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlConnector.<DoReadMessage>d__157.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlConnector.<ReadMessage>d__156.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlConnector.<ReadExpecting>d__163`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlDataReader.<NextResult>d__32.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Npgsql.NpgsqlDataReader.NextResult()
at Npgsql.NpgsqlCommand.<Execute>d__71.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlCommand.<ExecuteDbDataReader>d__92.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ValueTaskAwaiter`1.GetResult()
at Npgsql.NpgsqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext,TResult](TTarget target, Func`3 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed)
at System.Data.Entity.Infrastructure.Interception.DbCommandDispatcher.Reader(DbCommand command, DbCommandInterceptionContext interceptionContext)
at System.Data.Entity.Core.EntityClient.Internal.EntityCommandDefinition.ExecuteStoreCommands(EntityCommand entityCommand, CommandBehavior behavior)
So, is there some obvious reason why my query is failing? And how can I improve it to be able to do the query successfully?
As far as I can tell, the problem is caused by the too many subqueries in the generated SQL query.
In my test environment, SqlServer (LocalDB) simply refuses to execute the generated query with the reason as being too complex. PostgreSQL was able to execute it (after setting the CommandTimeout
to 0) in ~4min.
The solution is to find equivalent construct which does not generate many subqueries. What I usually use in such case in the counting the distinct matches and comparing it to the criteria count approach.
It can be implemented in two ways.
(1) This is applicable only for conditions of type property == valueN
. In such case, counting the distinct matches can be done like this (in pseudo code):
obj.Collection
.Select(elem => elem.Property)
.Distinct()
.Count(value => values.Contains(values))
Applying it to your sample:
private static IQueryable<MultiframeModule> WhereAllFramesProperties(this IQueryable<MultiframeModule> query, ICollection<Frame> frames)
{
var values = frames.Select(e => e.FrameData.ShaHash);
var count = frames.Count();
return query.Where(p => p.Frames.Select(e => e.FrameData.ShaHash)
.Distinct().Count(v => values.Contains(v)) == count);
}
(2) This is applicable for any type of condition. In this case the match is identified by its index, which requires building dynamically a selector expression like this:
Condition0 ? 0 : Condition1 ? 1 : ... ConditionN-1 ? N - 1 : -1
and the distinct match count would be
obj.Collection
.Select(selector)
.Distinct()
.Count(i => i >= 0)
Applying it to your sample:
private static IQueryable<MultiframeModule> WhereAllFramesProperties(this IQueryable<MultiframeModule> query, ICollection<Frame> frames)
{
var parameter = Expression.Parameter(typeof(MultiframeModuleFrame), "e");
var body = frames.Select((frame, index) =>
{
Expression<Func<Frame, bool>> predicate = e => e.FrameData.ShaHash == frame.FrameData.ShaHash;
return new
{
Condition = predicate.Body.ReplaceParameter(predicate.Parameters[0], parameter),
Value = Expression.Constant(index)
};
})
.Reverse()
.Aggregate((Expression)Expression.Constant(-1), (next, item) =>
Expression.Condition(item.Condition, item.Value, next));
var selector = Expression.Lambda<Func<Frame, int>>(body, parameter);
var count = frames.Count();
return query.Where(p => p.Frames.AsQueryable().Select(selector)
.Distinct().Count(i => i >= 0) == count);
}
where the ReplaceParameter
is the following custom extension method:
public static partial class ExpressionUtils
{
public static Expression ReplaceParameter(this Expression expression, ParameterExpression source, Expression target)
{
return new ParameterReplacer { Source = source, Target = target }.Visit(expression);
}
class ParameterReplacer : ExpressionVisitor
{
public ParameterExpression Source;
public Expression Target;
protected override Expression VisitParameter(ParameterExpression node)
{
return node == Source ? Target : base.VisitParameter(node);
}
}
}
The generated SQL contains a huge CASE WHEN
expression (unfortunately doubled in the WHERE
clause), but a single subquery, and is accepted and successfully executed in both SqlServer and PostgreSQL (in the later case in less ~2 sec under the same condition as the original test - 1K records in both tables, 1M links, 200 conditions).