c# regex uppercase lowercase

Lowercase the second match in a combination of words using Regex.Replace


When formatting a person's last name (I know this is a terrible job), I want to lowercase the second match in a combination of any of the following words: Van, Den, Der, In, de, het. The pattern should repeat if it occurs again after a '-' (combined family names).

Wanted results:
Van Den Broek => Van den Broek
Derksen-van 't schip => Derksen-Van 't Schip
In Het Lid-Van De Boer => In het Lid-Van de Boer

I've tried capitalizing the first letters and lowercasing the letter after an apostrophe using the code below. However, producing the results above with Regex is still a bridge too far for me.

var formattedLastName = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(lastName); 
formattedLastName = Regex.Replace(formattedLastName, @"('\w\b)", (Match match) => match.ToString().ToLower());

Solution

  • You can achieve your expected output using

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Globalization;
    
    public class Test
    {
        public static void Main()
        {
            var strings = new List<string> { "Van Den Broek", "Derksen-van 't schip", "In Het Lid-Van De Boer"};
            var textInfo = new CultureInfo("en-US", false).TextInfo;
            var pattern = new Regex(@"\b(Van|Den|Der|In|de|het)\b(?:\s+(\w+))?", RegexOptions.Compiled|RegexOptions.IgnoreCase);
            foreach (var s in strings)
                Console.WriteLine(pattern.Replace(s, m => textInfo.ToTitleCase(m.Groups[1].Value) + 
                   (m.Groups[2].Success ? $" {m.Groups[2].Value.ToLower()}" : "")));
        }
    }
    

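    See the online demo, yielding the output below (note that schip stays lowercase, because only the listed words and a word directly following them are changed):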

    Van den Broek
    Derksen-Van 't schip
    In het Lid-Van de Boer
    

    The \b(Van|Den|Der|In|de|het)\b(?:\s+(\w+))? regex matches a whole word from the Van, Den, Der, In, de, het list (case-insensitively, because of RegexOptions.IgnoreCase), capturing it into Group 1. It then optionally matches one or more whitespace characters followed by any word, which is captured into Group 2.

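    The match is replaced with Group 1 converted to title case (note the use of System.Globalization.TextInfo.ToTitleCase) and, if Group 2 matched, a single space followed by the Group 2 value converted to lower case. Any run of whitespace between the two words therefore collapses to one space.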
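One thing to be aware of: Group 2 in the pattern above is any word, so an input such as "Van Broek" would come out as "Van broek". The following is a minimal sketch of a stricter variant, not the answer author's method. It assumes the question's own two steps (title-casing, then lowercasing the letter after an apostrophe) are applied first, and then lowercases a listed word only when it directly follows another listed word. The `LastNameFormatter` class, the `Format` method, and the choice of `CultureInfo.InvariantCulture` are illustrative assumptions.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class LastNameFormatter
{
    // Word list taken from the question.
    private const string Particles = "van|den|der|in|de|het";

    // A single letter right after an apostrophe, e.g. the T in 'T.
    private static readonly Regex ApostropheLetter = new Regex(@"'\w\b");

    // A listed word that is preceded by another listed word plus whitespace.
    // The lookbehind checks the preceding word without consuming it.
    private static readonly Regex ParticleAfterParticle = new Regex(
        @"(?<=\b(?:" + Particles + @")\s+)(?:" + Particles + @")\b",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string Format(string lastName)
    {
        // Step 1 (from the question): capitalize the first letter of each word.
        var textInfo = CultureInfo.InvariantCulture.TextInfo;
        var result = textInfo.ToTitleCase(lastName);

        // Step 2 (from the question): lowercase the letter after an apostrophe.
        result = ApostropheLetter.Replace(result, m => m.Value.ToLowerInvariant());

        // Step 3: lowercase only a listed word that follows another listed word.
        return ParticleAfterParticle.Replace(result, m => m.Value.ToLowerInvariant());
    }
}
```

With this sketch, `LastNameFormatter.Format("Derksen-van 't schip")` is expected to return `Derksen-Van 't Schip`, and "Van Broek" is left unchanged because Broek is not in the list. Whether three or more listed words in a row should all be lowercased after the first is not specified in the question; this version lowercases every listed word that follows another one.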