I have simple regex which founds some word in text:
var patern = new RegExp("\bsomething\b", "gi");
This match word in text with spaces or punctuation around.
So it match:
I have something.
But doesn't match:
I havesomething.
what is fine and exactly what I need.
But I have issue with for example Arabic language. If I have regex:
var patern = new RegExp("\bرياضة\b", "gi");
and text:
رياضة أنا أحب رياضتي وأنا سعيد حقا هنا لها حبي
The keyword which I am looking for is at the end of the text.
But this doesn't work, it just doesn't find it.
It works if I remove \b
from regex:
var patern = new RegExp("رياضة", "gi");
But that is now what I want, because I don't want to find it if it's part of another word like in english example above:
I havesomething.
So I really have low knowledge about regex and if anyone can help me to work this with english and languages like arabic.
We have first to understand what does \b
mean:
\b is an anchor that matches at a position that is called a "word boundary".
In your case, the word boundaries that you are looking for are not having other Arabic letters.
To match only Arabic letters in Regex, we use unicode:
[\u0621-\u064A]+
Or we can simply use Arabic letters directly
[ء-ي]+
The code above will match any Arabic letters. To make a word boundary out of it, we could simply reverse it on both sides:
[^ء-ي]ARABIC TEXT[^ء-ي]
The code above means: don't match any Arabic characters on either sides of an Arabic word which will work in your case.
Consider this example that you gave us which I modified a little bit:
أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا
If we are trying to match only رياض
, this word will make our search match also رياضة
, رياضيات
, and رياضتي
. However, if we add the code above, the match will successfully be on رياض
only.
var x = " أنا أحب رياضتي رياض رياضة رياضيات وأنا سعيد حقا هنا ";
x = x.replace(/([^ء-ي]رياض[^ء-ي])/g, '<span style="color:red">$1</span>');
document.write (x);
If you would like to account for أآإا
with one code, you could use something like this [\u0622\u0623\u0625\u0627]
or simply list them all between square brackets [أآإا]
. Here is a complete code
var x = "أنا هنا وانا هناك .. آنا هنا وإنا هناك";
x = x.replace(/([أآإا]نا)/g, '<span style="color:red">$1</span>');
document.write (x);
Note: If you want to match every possible Arabic characters in Regex including all Arabic letters أ ب ت ث ج
, all diacritics َ ً ُ ٌ ِ ٍ ّ
, and all Arabic numbers ١٢٣٤٥٦٧٨٩٠
, use this regex: [،-٩]+
Useful link about the ranking of Arabic characters in Unicode: https://en.wikipedia.org/wiki/Arabic_script_in_Unicode