c# regex string

Split String At Every Non-Letter/Non-Number Character


Imagine a string that contains special characters (for example $§%%,.) mixed with numbers and letters.

I want to get the chunks of letters and numbers from an arbitrary string as an array of strings.

A regex seems like a good fit, but I don't know how to express "letters and numbers" as a pattern.

// examples: input -> expected output
"abc" -> {"abc"}
"ab .c" -> {"ab", "c"}
"ab123,cd2,  ,,%&$§56" -> {"ab123", "cd2", "56"}

// my attempt (placeholder pattern, does not compile)
string input = "jdahs32455$§&%$§df233§$fd";
string[] output = input.Split(Regex("makejunksfromstring"));

Solution

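  • To extract chunks of one or more letters/digits, match the chunks directly rather than splitting on everything else. Depending on which characters should count, you may use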

    [A-Za-z0-9]+   # ASCII-only letters/digits
    [\p{L}0-9]+    # Any Unicode letters and ASCII-only digits
    [\p{L}\p{N}]+  # Any Unicode letters/numbers (\p{N} also covers e.g. ² and ½; use \p{Nd} for decimal digits only)
    


    C# usage (needs using System.Linq; and using System.Text.RegularExpressions;):

    string[] output = Regex.Matches(input, @"[\p{L}\p{N}]+").Cast<Match>().Select(x => x.Value).ToArray();
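
For reference, here is a minimal self-contained sketch that runs the Unicode pattern against the three examples from the question. The class and method names (ChunkDemo, ExtractChunks, SplitChunks) are hypothetical and chosen only for illustration. SplitChunks is an assumed alternative that stays closer to the Split attempt in the question: it splits on runs of everything that is not a letter or number, then filters out the empty strings that Regex.Split returns when the input starts or ends with a separator.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChunkDemo
{
    // Hypothetical helper: returns every run of Unicode letters/numbers.
    public static string[] ExtractChunks(string input) =>
        Regex.Matches(input, @"[\p{L}\p{N}]+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    // Hypothetical alternative: split on runs of all other characters,
    // then drop the empty entries produced by leading/trailing separators.
    public static string[] SplitChunks(string input) =>
        Regex.Split(input, @"[^\p{L}\p{N}]+")
             .Where(s => s.Length > 0)
             .ToArray();

    public static void Main()
    {
        // The three inputs from the question.
        foreach (var sample in new[] { "abc", "ab .c", "ab123,cd2,  ,,%&$§56" })
        {
            Console.WriteLine(string.Join(" | ", ExtractChunks(sample)));
        }
        // prints:
        // abc
        // ab | c
        // ab123 | cd2 | 56
    }
}
```

Both helpers should give the same result for these inputs. Matching is usually the simpler choice here because it never produces empty entries, so no filtering step is needed.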