javascript arabic

JavaScript string split issue in Arabic/Persian strings


I am trying to split a string containing two quoted Persian phrases using the JavaScript string split method, but it is not splitting properly.

var test = '"حسن روحانی"،"حسن+روحانی"';
var tmpkeywords =  test.split(',');
console.log(tmpkeywords);

The split result should look like this:

[""حسن روحانی""،""حسن+روحانی""] 

but instead it comes out as [""حسن روحانی"،"حسن+روحانی"↵"]. It works fine with English characters or numbers.

My fiddle: https://jsfiddle.net/tueo3sfa/1/


Solution

  • Your string "حسن روحانی"،"حسن+روحانی" does not contain the character "," (U+002C COMMA) but "،" (U+060C ARABIC COMMA). That is why split finds no separator and returns the whole original string as a single array element.

    To get what you want, you need to split by "،" (U+060C ARABIC COMMA):

    var test = '"حسن روحانی"،"حسن+روحانی"';
    // Split on U+060C ARABIC COMMA, not U+002C COMMA
    var tmpkeywords = test.split('\u060C');
    console.log(tmpkeywords);
    

    Also note that many other scripts have their own comma characters. If you need to handle them generically, you may want to list them all explicitly (raw, unfiltered list: http://www.fileformat.info/info/unicode/char/search.htm?q=comma&han=Y&preview=entity) or, if applicable, use Unicode character classes (for example, splitting on punctuation characters; see also http://inimino.org/~inimino/blog/javascript_cset for an example).
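
As a minimal sketch of the "specify them all" approach, the snippet below splits on a regular expression that lists a few comma code points. The helper name splitOnCommas and the constant COMMA_PATTERN are hypothetical, not part of the original answer, and the set of commas is illustrative rather than complete; extend it with whatever your input may contain.

```javascript
// Hypothetical helper, not from the original answer.
// The character class lists a few comma code points (illustrative, incomplete):
//   U+002C COMMA, U+060C ARABIC COMMA,
//   U+3001 IDEOGRAPHIC COMMA, U+FF0C FULLWIDTH COMMA
const COMMA_PATTERN = /[\u002C\u060C\u3001\uFF0C]/;

function splitOnCommas(text) {
  // String.prototype.split accepts a RegExp, so any listed comma separates items.
  return text.split(COMMA_PATTERN);
}

const test = '"حسن روحانی"،"حسن+روحانی"';

// Check which comma the string actually contains.
console.log(test.includes('\u002C')); // false
console.log(test.includes('\u060C')); // true

const parts = splitOnCommas(test);
console.log(parts.length); // 2
```

An explicit list is probably the safer choice for this particular input: a general punctuation class such as /\p{P}/u would also match the double quotes inside the string and split on them.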