
How to detect text language with jQuery and XRegExp to display mixed RTL and LTR text correctly


I'm trying to display a Twitter feed in a WordPress site. My client tweets in English and in Arabic and sometimes in a combination of the two languages. I need to detect the language and add the class 'rtl' to Arabic tweets and also those tweets where the content is predominately in Arabic. I'm using a plugin which strips the Twitter iso_language_code metadata.

When attempting this on a previous development site a few years ago, I remember successfully using a variation of Tristan's solution found here:

How to detect that text typed in text-area is RTL

Unfortunately it no longer seems to work.

Tristan's jsfiddle no longer works either.

I'm using this resource:

http://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-min.js

and this script:

    jQuery(document).ready(function($) {
        $('p').each(function() {
            if (isRTL($(this).text())) {
                $(this).addClass('rtl');
            }
        });

        function isRTL(str) {
            var isArabic = XRegExp('[\\p{Arabic}]');
            var isLatin = XRegExp('[\\p{Latin}]');
            var partLatin = 0;
            var partArabic = 0;
            var rtlIndex = 0;
            var isRTL = false;

            for (var i = 0; i < str.length; i++) {
                if (isLatin.test(str[i])) {
                    partLatin++;
                }
                if (isArabic.test(str[i])) {
                    partArabic++;
                }
            }
            rtlIndex = partArabic / (partLatin + partArabic);
            if (rtlIndex > .5) {
                isRTL = true;
            }

            return isRTL;
        }
    });

Can anyone help me with where I'm going wrong?

Many thanks,

Phil


Update


I've managed to get a partial solution working:

    jQuery(document).ready(function($) {
        var arabic = /[\u0600-\u06FF]/;

        $('p').each(function() {
            if (arabic.test($(this).text())) {
                $(this).addClass('rtl').attr('style', 'text-align:right;direction:rtl');
            } else {
                $(this).addClass('ltr').attr('style', 'text-align:left;direction:ltr');
            }
        });
    });

My apologies in advance - I'm very much a beginner at this.

I've done a jsfiddle here:

http://jsfiddle.net/philnicholl/4xn6jftw

This works if the text is all Arabic or all English, but a single word of Arabic in an English tweet is enough to flag the whole tweet as RTL.
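
The single-word problem follows from the pattern being unanchored: `test` returns true as soon as one character anywhere in the string falls in the range U+0600 to U+06FF. A minimal check in plain JavaScript, without jQuery and using made-up sample strings rather than real tweets, might look like this:

```javascript
// Same pattern as in the script above.
var arabic = /[\u0600-\u06FF]/;

// Hypothetical sample strings, not taken from the real feed.
var samples = [
    'Hello World',
    'مرحبا بالعالم',
    'Meet me at the سوق tomorrow'
];

// test() only asks "is there at least one Arabic character?",
// so the mostly English third string is flagged as well.
var flagged = samples.map(function (text) {
    return arabic.test(text);
});

console.log(flagged);
```

The first string gives false and the other two give true, even though the third contains only one Arabic word.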

Bizarrely, when I added this script to a real-world WordPress test, it produced exactly the opposite of what I wanted: Arabic paragraphs and tweets were given the LTR class and styling, and English text was given RTL.

Reversing the if else gives the right result.

Any help would be greatly appreciated.

Thank you again.

Phil


Solution

  • You can use a regular expression to check whether the text starts with an Arabic letter (the pattern below is anchored with ^, so it only tests the first character):

    // JavaScript
    $('p').each(function() {
        if (isRTL($(this).text())) {
            $(this).addClass('rtl');
        }
    });

    // True when the first character is in the Arabic block (U+0600 to U+06FF).
    function isRTL(str) {
        return /^[\u0600-\u06FF]/.test(str);
    }

    /* CSS */
    p.rtl {
        direction: rtl;
    }
    p.ltr {
        direction: ltr;
    }

    <!-- HTML -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
    <p>Hello World</p>
    <p>مرحبا بالعالم</p>
    <p>Hello World مرحبا بالعالم</p>
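
The anchored test above only looks at the first character, so it does not cover tweets that are predominantly Arabic but begin with a Latin word, a mention or a hashtag. One possible sketch, assuming a browser or Node version that supports ES2018 Unicode property escapes (`\p{Script=...}` with the `u` flag), counts the letters of each script and compares the Arabic share against a threshold. The function name `isMostlyArabic` and the default threshold of 0.5 are illustrative choices, not part of the original answer.

```javascript
// Sketch: decide direction by the share of Arabic letters among
// Arabic + Latin letters. Assumes ES2018 Unicode property escapes.
// isMostlyArabic and the 0.5 default threshold are hypothetical choices.
function isMostlyArabic(str, threshold) {
    if (threshold === undefined) {
        threshold = 0.5;
    }
    // match() returns null when nothing matches, hence the fallback.
    var arabicCount = (str.match(/\p{Script=Arabic}/gu) || []).length;
    var latinCount = (str.match(/\p{Script=Latin}/gu) || []).length;
    var total = arabicCount + latinCount;

    // No letters from either script (digits, emoji, empty string):
    // treat the text as LTR instead of dividing by zero.
    if (total === 0) {
        return false;
    }
    return arabicCount / total > threshold;
}

console.log(isMostlyArabic('Hello World'));           // false
console.log(isMostlyArabic('مرحبا بالعالم'));          // true
console.log(isMostlyArabic('Hello مرحبا بالعالم'));    // true
```

In the page, this function could stand in for `isRTL` inside the `each` loop. Raising the threshold makes the check stricter, so a tweet needs a larger share of Arabic letters before it gets the `rtl` class.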