Search code examples
c#linqentity-frameworkoracle11godp.net

Weird behavior in Entity Framework Linq in string EndsWith method


Background

I have a table that contains only one column: Name. There are only four rows in it, say

| Name      |
| test1.com |
| test2.com |
| test3.com |
| test4.com |

Problem

If I query

var email = "[email protected]";
Table.Where(x => email.EndsWith(x.Name));

I'll get an empty list. but If I query all rows first and calculate Where in memory like this

var email = "[email protected]";
Table.ToList().Where(x => email.EndsWith(x.Name));

I'll get a list contains only test2.com which is correct.

The generated SQL for the first query is

SELECT "Extent1"."Name" AS "Name"
FROM "USER"."Table" "Extent1"
WHERE (( NVL(INSTR(REVERSE(:p__linq__0), REVERSE("Extent1"."Name")), 0) ) = 1)

I've tried replacing :p__linq__0 with '[email protected]' and running the query in the SQLDeveloper, the result is correct.

More Information

If I change EndsWith() to Contains(), the problem will be gone. Here is the generated SQL for Contains()

SELECT "Extent1"."Name" AS "Name"
FROM "USER"."Table" "Extent1"
WHERE (( NVL(INSTR(:p__linq__0, "Extent1"."Name"), 0) ) > 0)

Do you have any idea what's wrong with EndsWith or REVERSE method?

Environment

  • EF5.0
  • .NET4.5
  • Oracle11g
  • ODP.NET11.2 Release 3

Solution

  • This line concerns me and is a common pitfall with people using EF:

    Table.ToList().Where(x => email.EndsWith(x.Name));
    

    The part Table.ToList() is the worst part because this will actually materialise the entire table into memory and then perform the EndsWith in C#. This line:

    Table.Where(x => email.EndsWith(x.Name));
    

    I would caution this approach just on general principle as it will be horrendously slow when the table grows to reasonable size. You can do the heavy lifting before the query hits the database by splitting out the domain from the the email as you construct the query:

    var email = "[email protected]";
    
    /* You should null check this of course and not just assume a match was found */
    var domain = Regex.Match(email , "@(.*)").Groups[1].Value; 
    
    /* Note: ToList() materialisation happens at the end */
    var result = Table.Where(x => x.Name == domain).ToList();  
    

    Furthermore, if you need to match on the domain names of a column storing emails, then my preferred approach would be to split the email and store the domain name in a separate column that you index and just match on, this will scale and be a heck of a lot easier to manage. Remember that these days data is cheap... especially compared to non-indexable table scans.

    Also remember (for both scenarios) that your database is set to CI (case insensitive)