java, regex, arabic, persian

Regular expression for Persian (Arabic) letters without any numbers


In Java, I'm looking for a regular expression that accepts any Persian (or Arabic) letter but rejects Persian (or Arabic) digits. To match only letters, I found a regular expression that works well:

[\u0600-\u065F\u066A-\u06EF\u06FA-\u06FF]

It is correct and works for me. However, \\p{L}+ can be used as a regular expression that accepts letters from every language, and for my case (Arabic and Persian) I can adapt it and use [\\p{InArabic}]+$.

But [\\p{InArabic}]+$ accepts not only all Arabic (Persian) letters but also Arabic digits, such as ۱ and ۲.
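A minimal sketch that reproduces this behaviour. The class name, the helper `matchesInArabic`, and the sample strings are only illustrative; the strings are written with Unicode escapes to avoid right-to-left rendering issues.

```java
final class InArabicDigitsDemo {
    // The block-based pattern discussed above.
    static boolean matchesInArabic(String s) {
        // String.matches requires the whole input to match the pattern.
        return s.matches("[\\p{InArabic}]+$");
    }

    public static void main(String[] args) {
        String word = "\u0633\u0644\u0627\u0645";   // Persian word made of letters only
        String wordWithDigits = word + "\u06F1\u06F2"; // same word followed by the digits ۱ and ۲

        System.out.println(matchesInArabic(word));           // true
        System.out.println(matchesInArabic(wordWithDigits)); // true, digits are accepted as well
    }
}
```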

So my question is: how can I modify [\\p{InArabic}]+$ so that it accepts only letters and not digits? In other words, how can I restrict [\\p{InArabic}]+$ so that it rejects any numbers?

Please note that the Persian (Arabic) digits look like this: ۱ ۲ ۳ ۴ ۵ ۶ ۷ ۸ ۹ ۰


Solution

  • You can use the following regex:

    "[\\p{InArabic}&&\\PN]"
    

    \p{InArabic} matches any character in the Unicode block Arabic (U+0600 to U+06FF).

    \PN matches any character that does not belong to any Number category (note the capital P).

    Intersecting the two sets gives the desired result: both digit ranges (U+0660 to U+0669 and U+06F0 to U+06F9) are excluded.

    Testing code

    // Print every code point of the Arabic block (in hex) and whether the regex accepts it.
    for (int i = 0x600; i <= 0x6ff; i++) {
        String c = String.valueOf((char) i);
        System.out.println(Integer.toHexString(i) + " " + c.matches("[\\p{InArabic}&&\\PN]"));
    }
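To validate a whole string rather than a single character, the same character class can be given a `+` quantifier. The sketch below is one possible way to wrap it, not the only one: the class and method names are hypothetical. It also shows a stricter variant, `[\\p{InArabic}&&\\p{L}]`, which is an assumption about what "letters only" should mean, since the `\PN` version still accepts non-numeric characters of the block such as the Arabic comma (U+060C).

```java
import java.util.regex.Pattern;

final class ArabicLetterCheck {
    // Arabic block minus every Number category, one or more characters (the regex from the answer).
    static final Pattern NO_NUMBERS = Pattern.compile("[\\p{InArabic}&&\\PN]+");

    // Assumed stricter variant: only characters of the Letter category inside the Arabic block.
    static final Pattern LETTERS_ONLY = Pattern.compile("[\\p{InArabic}&&\\p{L}]+");

    static boolean hasNoNumbers(String s) {
        return NO_NUMBERS.matcher(s).matches();
    }

    static boolean isLettersOnly(String s) {
        return LETTERS_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "\u0633\u0644\u0627\u0645"; // letters only
        String wordWithDigit = word + "\u06F1";   // ends with the digit ۱
        String wordWithComma = word + "\u060C";   // ends with the Arabic comma (punctuation)

        System.out.println(hasNoNumbers(word) + " " + isLettersOnly(word));                   // true true
        System.out.println(hasNoNumbers(wordWithDigit) + " " + isLettersOnly(wordWithDigit)); // false false
        System.out.println(hasNoNumbers(wordWithComma) + " " + isLettersOnly(wordWithComma)); // true false
    }
}
```

Note that the stricter variant also rejects combining marks such as the short-vowel signs, which belong to the Mark category rather than Letter. If vocalized text must be accepted, adding `\\p{M}` to the class would be one way to allow them.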