c# regex xamarin.android

Regex match with Arabic


I have a text in Arabic and I want to use Regex to extract numbers from it. Here is my attempt.

String:

"ما المجموع:

1+2"

Match match = Regex.Match(text, "المجموع: ([^\\r\\n]+)", RegexOptions.IgnoreCase);

It never matches: match.Success is always false, and match.Groups[1].Value is always empty.

expected output:

match.Groups[1].Value // returns 1+2

Solution

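  • The regex you wrote matches the word, then a colon, then a single space, and then one or more characters other than CR and LF (in a regular string literal, "\\r\\n" reaches the regex engine as \r\n; only in a verbatim @"..." string would it mean backslash, r and n). In your text the colon is followed by line breaks rather than a space, so the pattern can never match.

    You want to match the whole line after the word, the colon and any number of whitespace characters: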

    var text = "ما المجموع:\n1+2";
    var result = Regex.Match(text, @"المجموع:\s*(.+)").Groups[1].Value; // Regex.Match never returns null, so ?. is not needed
    Console.WriteLine(result); // => 1+2
    


    Other possible patterns:

    @"المجموع:\r?\n(.+)" // Requires exactly one CRLF or LF line ending after the colon
    @"المجموع:\n(.+)"    // Requires exactly one LF line ending after the colon
    

    Also, if you run the regex against a long multiline text with CRLF endings, it makes sense to replace .+ with [^\r\n]+. In a .NET regex, . matches any character except LF (\n), so it also matches CR (\r), and the captured value would end with a stray \r.
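
The points above can be checked with a small console sketch. The class name SumLineDemo, the helper Capture, and the CRLF sample text are made up for illustration and are not part of the original question. It assumes a plain .NET console project and compares the original pattern with the suggested ones on LF and CRLF input.

```csharp
using System;
using System.Text.RegularExpressions;

static class SumLineDemo
{
    // Hypothetical helper: returns capture group 1, or "" when the pattern does not match.
    public static string Capture(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        string lf = "ما المجموع:\n1+2";
        // Assumed sample with Windows line endings and an extra trailing line.
        string crlf = "ما المجموع:\r\n1+2\r\nnext line";

        // Original pattern: requires a literal space after the colon, so nothing is captured.
        Console.WriteLine(Capture(lf, "المجموع: ([^\\r\\n]+)").Length);  // 0

        // \s* consumes the line break(s) between the colon and the value.
        Console.WriteLine(Capture(lf, @"المجموع:\s*(.+)"));              // 1+2

        // With CRLF input, . stops at \n but not at \r: the capture is "1+2\r" (4 chars).
        Console.WriteLine(Capture(crlf, @"المجموع:\s*(.+)").Length);     // 4

        // [^\r\n]+ stops before the CR, so the capture is clean.
        Console.WriteLine(Capture(crlf, @"المجموع:\s*([^\r\n]+)"));      // 1+2
    }
}
```

If the input may come from either platform, the [^\r\n]+ variant is probably the safer default, since it behaves the same on LF and CRLF text.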