Search code examples
sqlpostgresqlentity-framework-corecollation

Postgresql with accent-insensitive and case-insensitive pattern matching


Is there a simpler way to make a AI and CI search with Entity Frameworkd and Postgres than use multipe EF.Functions?

EF.Functions.ILike(
        EF.Functions.Unaccent(u.Name), 
        EF.Functions.Unaccent($"%{value}%")
    )

Solution

  • Unfortunately no, at least not for pattern matching with ILIKE that you're showing.

    For sorting and direct comparisons, you can set up an accent- and case-insensitive collation: there's an example in PostgreSQL doc.

    CREATE COLLATION ignore_accent_case (
       provider = icu
     , deterministic = false
     , locale = 'und-u-ks-level1');
    SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
    SELECT 'z' = 'Z' COLLATE ignore_accent_case; -- true
    

    You need to either set it as the column's default, or use it in your comparison - the former is preferable since your index will likely use the default, which the condition in the query needs to match to benefit from the index. Here's an example from npgsql doc:

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.HasCollation("ignore_accent_case", locale: "und-u-ks-level1", provider: "icu", deterministic: false);
    
        modelBuilder.Entity<Customer>().Property(c => c.Name)
            .UseCollation("ignore_accent_case");
    }
    

    This would make the name always compare and sort the way you want.

    Pattern matching in Postgres is an exception as of version 17 - the chapter 9.7. Pattern Matching welcomes you with a warning about exactly that:

    The pattern matching operators of all three kinds do not support nondeterministic collations. If required, apply a different collation to the expression to work around this limitation.

    The three kinds are LIKE/~~ (and ILIKE/~~*), SIMILAR TO and POSIX ~/~*.

    As pointed out by @Laurenz Albe below, that feature is in the works for PostgreSQL 18:

    Technically you could incorporate the case and accent variants into your pattern, but I wouldn't do that. As underlined by @Panagiotis Kanavos, performance will suffer and the pattern will get ugly, unreadable, unreliable and hard to maintain. In case of your ILIKE, it'll also affect the query structure because its syntax doesn't support alternations, so you'd have to connect multiple ILIKEs in a chain of ORd conditions.