c# sorting lambda expression letters

OrderBy ignoring accented letters


I want a method like OrderBy() that always ignores accents when sorting, treating accented letters as their non-accented equivalents. I tried to override OrderBy(), but it seems I can't because it is a static method.

So now I want to create a custom extension method that wraps OrderBy(), like this:

public static IOrderedEnumerable<TSource> ToOrderBy<TSource, TKey>(
    this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    if(source == null)
        return null;

    var seenKeys = new HashSet<TKey>();

    var culture = new CultureInfo("pt-PT");
    return source.OrderBy(element => seenKeys.Add(keySelector(element)), 
                          StringComparer.Create(culture, false));
} 

However, I'm getting this error:

Error 2 The type arguments for method 'System.Linq.Enumerable.OrderBy<TSource,TKey>(System.Collections.Generic.IEnumerable<TSource>, System.Func<TSource,TKey>, System.Collections.Generic.IComparer<TKey>)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

It seems it doesn't like StringComparer. How can I solve this?

Note:

I already tried to use RemoveDiacritics() from here, but I don't know how to use that method in this case. So I tried something like this instead, which seems nice too.


Solution

  • Solved! I was getting that error because StringComparer is an IComparer<string>, so the key that OrderBy() sorts on has to be a string. In my first attempt the key was the bool returned by seenKeys.Add(), so the compiler could not infer TKey.

    So I convert the key to a string with ToString() and pass it through RemoveDiacritics(), so that accented letters are compared as their non-accented equivalents.

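    // SafeAny() and Utils are my own helpers; SafeAny() is not shown here and
    // presumably returns false for a null or empty sequence.
    public static IOrderedEnumerable<TSource> ToOrderBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        if(!source.SafeAny())
            return null;
    
        // Sort on the key's string form with the diacritics stripped.
        return source.OrderBy(element => Utils.RemoveDiacritics(keySelector(element).ToString()));
    }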
    

    To guarantee that RemoveDiacritics() works correctly, I added an HtmlDecode() call, so that HTML entities are decoded before the accents are stripped.

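    // Needs: using System.Globalization; using System.Net; using System.Text;
    public static string RemoveDiacritics(string text)
    {
        // Nothing to strip; this also avoids a NullReferenceException on Normalize().
        if(text == null)
            return null;
    
        // Decode HTML entities (e.g. "&eacute;") into real characters first.
        text = WebUtility.HtmlDecode(text);
    
        // FormD splits each accented letter into a base letter plus combining marks.
        string formD = text.Normalize(NormalizationForm.FormD);
        StringBuilder sb = new StringBuilder();
    
        foreach (char ch in formD)
        {
            // Keep everything except the combining (non-spacing) marks.
            UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
            if (uc != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(ch);
            }
        }
    
        // Recompose whatever is left.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }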
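
The type-inference point above suggests a different route from the one in this answer: if the key selector is declared as Func<TSource, string>, a string comparer can be passed to OrderBy() directly. The following is only a minimal sketch of that idea. The names AccentInsensitiveSort and OrderByIgnoringAccents are hypothetical, the invariant-culture default is an assumption, and it relies on CompareOptions.IgnoreNonSpace rather than RemoveDiacritics(), so it does not decode HTML entities.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class AccentInsensitiveSort
{
    // Hypothetical helper, not part of the original post.
    // The key type is fixed to string, so it matches the comparer's type
    // (IComparer<string>) and the compiler can infer TSource on its own.
    public static IOrderedEnumerable<TSource> OrderByIgnoringAccents<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, string> keySelector,
        CultureInfo culture = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));

        // Assumption: the invariant culture is an acceptable default.
        CompareInfo compareInfo = (culture ?? CultureInfo.InvariantCulture).CompareInfo;

        // IgnoreNonSpace tells the comparison to skip combining marks such as accents.
        IComparer<string> comparer = Comparer<string>.Create(
            (a, b) => compareInfo.Compare(a, b, CompareOptions.IgnoreNonSpace));

        return source.OrderBy(keySelector, comparer);
    }
}

public static class Demo
{
    public static void Main()
    {
        var cities = new[] { "Évora", "Beja", "Águeda", "Coimbra" };

        // Prints the cities one per line, in accent-insensitive order.
        foreach (string city in cities.OrderByIgnoringAccents(c => c))
            Console.WriteLine(city);
    }
}
```

On a runtime with full culture data, the accented names should sort next to their unaccented neighbours (Águeda before Beja, Évora after Coimbra). To match the question, new CultureInfo("pt-PT") can be passed as the third argument. Unlike the answer's version, this sketch throws on a null source instead of returning null, and it only accepts string keys.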