When formatting a person's last name (I know this is a terrible job), I want to lowercase the second word whenever two of the following words appear in a row: Van, Den, Der, In, de, het. The same rule should apply again after a '-' (combined family names).
Wanted results:
Van Den Broek => Van den Broek
Derksen-van 't schip => Derksen-Van 't Schip
In Het Lid-Van De Boer => In het Lid-Van de Boer
I've tried capitalizing the first letters and lowercasing the letter after ' using the code below. However, producing the results above with a regex is still a bridge too far for me.
var formattedLastName = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(lastName);
formattedLastName = Regex.Replace(formattedLastName, @"('\w\b)", (Match match) => match.ToString().ToLower());
You can achieve your expected output using a regex with a match evaluator:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Globalization;

public class Test
{
    public static void Main()
    {
        var strings = new List<string> { "Van Den Broek", "Derksen-van 't schip", "In Het Lid-Van De Boer" };
        var textInfo = new CultureInfo("en-US", false).TextInfo;
        // Group 1: a word from the list; Group 2 (optional): the word that follows it
        var pattern = new Regex(@"\b(Van|Den|Der|In|de|het)\b(?:\s+(\w+))?", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        foreach (var s in strings)
        {
            // Title-case Group 1 and, if present, append Group 2 in lower case
            Console.WriteLine(pattern.Replace(s, m => textInfo.ToTitleCase(m.Groups[1].Value) +
                (m.Groups[2].Success ? $" {m.Groups[2].Value.ToLower()}" : "")));
        }
    }
}
See the online demo yielding:
Van den Broek
Derksen-Van 't schip
In het Lid-Van de Boer
The \b(Van|Den|Der|In|de|het)\b(?:\s+(\w+))? regex matches a word from the Van, Den, Der, In, de and het list, capturing it into Group 1, and then optionally matches one or more whitespace characters followed by any word, which is captured into Group 2. Because of RegexOptions.IgnoreCase, the listed words are matched in any letter case.
The match is replaced with Group 1 turned to title case (note the use of System.Globalization.TextInfo.ToTitleCase) and, if Group 2 matched, a space and the Group 2 value turned to lower case. Text outside the match, such as schip in the second example, is left as it was in the input.
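Note that Group 2 accepts any word, so an input such as Van Broek would come out as Van broek. Below is a minimal sketch of a stricter variant, assuming only words from the same list should be lowercased in second position. The LastNameFormatter class and its Format method are names made up for this illustration, not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LastNameFormatter
{
    // Same word list as above; Group 2 is now restricted to this list too.
    private const string Particles = "Van|Den|Der|In|de|het";

    private static readonly Regex Pattern = new Regex(
        $@"\b({Particles})\b(?:\s+({Particles})\b)?",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    private static readonly TextInfo EnUs = new CultureInfo("en-US", false).TextInfo;

    public static string Format(string lastName)
    {
        return Pattern.Replace(lastName, m =>
            // Lowercase first: ToTitleCase leaves all-uppercase words such as "VAN" unchanged.
            EnUs.ToTitleCase(EnUs.ToLower(m.Groups[1].Value)) +
            (m.Groups[2].Success ? " " + EnUs.ToLower(m.Groups[2].Value) : ""));
    }

    public static void Main()
    {
        Console.WriteLine(Format("Van Den Broek"));          // Van den Broek
        Console.WriteLine(Format("Van Broek"));              // Van Broek
        Console.WriteLine(Format("In Het Lid-Van De Boer")); // In het Lid-Van de Boer
    }
}
```

With this pattern, a listed word followed by an ordinary name part only has its own casing normalized; the name part itself is not touched.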