I am trying to match only the words in a string using a regex. The string contains the Turkish characters çğıİöşü.
I tried the regex pattern \b[\wçğıİöşü]+\b, but it doesn't work entirely.
In the picture above, I expected the pattern to match Behiç and Güneş completely, but as you can see it only matches Behi and Güne. What is the correct pattern to match Behiç and Güneş?
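You get this result because the default flavor on Regex101 is PCRE (PHP), with Unicode support turned off. In that mode, \w and \b do not treat ç or ş as word characters, so the final \b cannot match after them and the engine backtracks to the last ASCII letter, giving Behi and Güne. If you switch the flavor to Python, you will see the behavior you expect.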
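Python has no PCRE mode, so the sketch below only approximates the difference. A bytes pattern run on UTF-8 bytes stands in for "Unicode off": \w and \b become ASCII-only and each Turkish letter becomes two separate bytes. A str pattern uses Python 3's default Unicode matching. The helper names match_as_bytes and match_as_text are hypothetical, chosen for illustration.

```python
import re

PATTERN = r"\b[\wçğıİöşü]+\b"
TEXT = "Behiç Güneş"

def match_as_bytes(pattern, text):
    # Byte-oriented matching (rough analogy for PCRE without Unicode):
    # \w and \b are ASCII-only, and ç/ş are each two non-word bytes.
    found = re.findall(pattern.encode("utf-8"), text.encode("utf-8"))
    return [m.decode("utf-8") for m in found]

def match_as_text(pattern, text):
    # str patterns in Python 3 are Unicode-aware by default.
    return re.findall(pattern, text)

print(match_as_bytes(PATTERN, TEXT))  # ['Behi', 'Güne']
print(match_as_text(PATTERN, TEXT))   # ['Behiç', 'Güneş']
```

The byte version reproduces the truncated matches from the question. The str version returns the whole words.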
Alternatively, keep the PCRE flavor and just turn on Unicode (UTF-8) support; that should solve your problem.
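To see the effect of the Unicode switch alone, here is a small Python sketch. It uses Python's re.ASCII flag as a stand-in for "Unicode support off"; this is Python's switch, not Regex101's PCRE option. With Unicode matching on, the explicit Turkish letters in the character class are redundant, because \w already covers them.

```python
import re

TEXT = "Behiç Güneş"
PATTERN = r"\b[\wçğıİöşü]+\b"

# re.ASCII restricts \w and \b to [a-zA-Z0-9_], similar to Unicode support off.
ascii_only = re.findall(PATTERN, TEXT, flags=re.ASCII)

# Default str matching is Unicode-aware, so \b sees ç and ş as word characters.
unicode_aware = re.findall(PATTERN, TEXT)

# With Unicode on, a plain \w already matches the Turkish letters.
plain_word = re.findall(r"\b\w+\b", TEXT)

print(ascii_only)     # ['Behi', 'Güne']
print(unicode_aware)  # ['Behiç', 'Güneş']
print(plain_word)     # ['Behiç', 'Güneş']
```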