Search code examples
c#sql-serverlinqentity-framework-core.net-8.0

LINQ query using .All() returns fewer than expected results


I have written a LINQ query where I am searching an SQL database with a number of strings. For demonstration purposes, I'll hard-code the search strings as ["term1", "term2"]:

var terms = ["term1", "term2"];
var queryableTerms = terms.AsQueryable();

var results = await(
  from row in _context.table
    where
      row.col1 != null && row.col2 != null && row.col3 != null && row.col4 != null
      && queryableTerms.All(term =>
        row.col1.Contains(term)
        || row.col2.Contains(term)
        || row.col3.Contains(term)
        || row.col4.Contains(term))
    select row
  )
.ToListAsync();
...

This code returns around 60 results. However, if I define terms in reverse as ["term2", "term1"], I get around 250 results. What could cause this to happen?

Additionally, if I explicitly check all the columns like this:

...
   && (row.col1.Contains(terms[0])
      || row.col2.Contains(terms[0])
      || row.col3.Contains(terms[0])
      || row.col4.Contains(terms[0]))
   && (row.col1.Contains(terms[1])
      || row.col2.Contains(terms[1])
      || row.col3.Contains(terms[1])
      || row.col4.Contains(terms[1]))
...

I get around 350 results from the database.

What am I doing wrong? I have had other people check my actual code and there is nothing wrong. The issue seems to be with the behaviour of LINQ. Is this something that can be fixed by using method syntax instead?

Thanks for reading.

EDIT: Here is the SQL that LINQ translates to

SELECT *
FROM [dbo].[table] AS [c]
WHERE [c].[col1] IS NOT NULL AND [c].[col2] IS NOT NULL AND [c].[col3] IS NOT NULL AND [c].[col4] IS NOT NULL AND NOT EXISTS (
    SELECT 1
    FROM (VALUES (CAST('term1' AS char(12))), ('term2')) AS [v]([Value])
    WHERE NOT (CHARINDEX([v].[Value], UPPER([c].[col1])) > 0 OR [v].[Value] LIKE '' OR CHARINDEX([v].[Value], UPPER([c].[col2])) > 0 OR [v].[Value] LIKE '' OR CHARINDEX([v].[Value], UPPER([c].[col3])) > 0 OR [v].[Value] LIKE '' OR CHARINDEX([v].[Value], UPPER([c].[col4])) > 0 OR [v].[Value] LIKE ''))

Same as the above reformatted for improved readability:

SELECT *
FROM dbo.table AS c
WHERE c.col1 IS NOT NULL
AND c.col2 IS NOT NULL
AND c.col3 IS NOT NULL
AND c.col4 IS NOT NULL
AND NOT EXISTS (
    SELECT 1
    FROM (
        VALUES
            (CAST('term1' AS char(12))),
            ('term2')
    ) AS v(Value)
    WHERE NOT (
        CHARINDEX(v.Value, UPPER(c.col1)) > 0
        OR v.Value LIKE ''
        OR CHARINDEX(v.Value, UPPER(c.col2)) > 0
        OR v.Value LIKE ''
        OR CHARINDEX(v.Value, UPPER(c.col3)) > 0
        OR v.Value LIKE ''
        OR CHARINDEX(v.Value, UPPER(c.col4)) > 0
        OR v.Value LIKE ''
    )
)

EDIT: Here is some information about the data returned by the two orders of terms: In the interest of clarity, lets assume all matches occur in col1. If the terms are as such ["term1", "term2"], all results are something like as follows:

"abcterm1 term2"
"123term1 abcterm2"
"term1 term2"
"term123 term2"
"abcterm1 term2"
"term1 term2"
etc...

on the other hand, if the terms are reversed: ["term2", "term1"], the results look something like this:

"term2 term1"
"term2 abcterm1"
"term234 term1"
"123term2 12term1"
etc...

Solution

  • After a lot of help from @TN, which you can see in the comments of the original post, I had the idea to try adding .Trim() to term in every .Contains():

    queryableTerms.All(term =>
            row.col1.Contains(term.Trim())
            || row.col2.Contains(term.Trim())
            || row.col3.Contains(term.Trim())
            || row.col4.Contains(term.Trim()))
    

    And this gave me my expected 355 results. So now the code works at least.

    This doesn't explain where the cast to a char(12) in the translated SQL comes from though.