I want to know whether a text contains any Urdu or Arabic letter. I am using the condition below, which produces wrong results when special characters appear. What is the right way to do this? Is there a library for it, or what is the right regex?
if (cap.replaceAll("\\s+", "").matches("[A-Za-z]+")
        || cap.replaceAll("\\s+", "").matches("[A-Za-z0-9]+")) {
    Log.d("isUrdu", "false");
    caption.setTypeface(Typeface.DEFAULT);
    caption.setTextSize(16);
} else {
    Log.d("isUrdu", "True");
    /* if (Build.VERSION.SDK_INT > Build.VERSION_CODES.JELLY_BEAN_MR1) {*/
    caption.setTypeface(typeface);
    caption.setTextSize(20);
    /* }*/
}
According to the Wikipedia article on the Urdu alphabet, Urdu uses characters from the following Unicode ranges:
U+0600 to U+06FF
U+0750 to U+077F
U+FB50 to U+FDFF
U+FE70 to U+FEFF
To match a character from the Arabic block (U+0600 to U+06FF), you may use the \p{InArabic} Unicode block class.
So, you may use
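if (cap.matches("(?s).*[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF].*")) {
    /* There is an Urdu character */
} else if (cap.matches("(?s).*\\p{InArabic}.*")) {
    /* The string contains an Arabic character.
       Note: \p{InArabic} is the U+0600-U+06FF block, which the first test
       already covers, so as written this branch is never reached. */
} else {
    /* Neither Arabic nor Urdu chars detected */
}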
Note that (?s) enables the DOTALL modifier so that . can match line break characters, too.
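To see why the DOTALL modifier matters with matches, here is a minimal sketch. The class and method names (DotAllDemo, withoutDotAll, withDotAll) are made up for illustration and are not part of any library.

```java
// Hypothetical helper class, for illustration only.
final class DotAllDemo {
    private DotAllDemo() {}

    // Without (?s), "." does not match line terminators, and matches()
    // must cover the whole input, so multi-line input cannot match.
    static boolean withoutDotAll(String text) {
        return text.matches(".*\\p{InArabic}.*");
    }

    // With (?s), "." also matches line terminators, so the Arabic
    // character may sit on any line of the input.
    static boolean withDotAll(String text) {
        return text.matches("(?s).*\\p{InArabic}.*");
    }
}
```

For an input such as "first line\n" followed by an Arabic word, the first method should return false and the second true, because only the second pattern lets . step over the line break.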
For better performance with matches, you may use negated character classes instead of the first .*, so that the regex engine does not have to run to the end of the string and backtrack: "(?s)[^\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]*[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF].*" and "(?s)\\P{InArabic}*\\p{InArabic}.*", respectively.
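Note that you may also use the shorter patterns "[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]" and "\\p{InArabic}" with Matcher#find(), which looks for a match anywhere in the input instead of requiring the whole string to match.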
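A minimal sketch of the Matcher#find() variant, assuming you only need a yes/no answer. The class name ArabicScriptDetector and the method containsArabicScript are hypothetical names chosen for this example; the ranges are the ones listed above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class ArabicScriptDetector {
    // Same ranges as listed above; compiled once and reused for every call.
    private static final Pattern ARABIC_SCRIPT = Pattern.compile(
            "[\\u0600-\\u06FF\\u0750-\\u077F\\uFB50-\\uFDFF\\uFE70-\\uFEFF]");

    private ArabicScriptDetector() {}

    // Returns true if at least one character falls into the ranges above.
    // find() scans for a match anywhere, so no ".*" or (?s) is needed.
    static boolean containsArabicScript(String text) {
        return text != null && ARABIC_SCRIPT.matcher(text).find();
    }
}
```

In the code from the question, the condition could then be written as if (ArabicScriptDetector.containsArabicScript(cap)) with the custom typeface applied in that branch. Keep in mind that these ranges are shared by Arabic, Urdu, Persian and other languages written in the Arabic script, so this check detects the script rather than the language.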