jquery, regex, arabic, right-to-left

Adapt regex of scroll to ID jQuery script to include Arabic characters


I've adapted a scroll-to-ID script to look for h2 tags in the page content, take each heading's text, hyphenate it, and set that as the h2's ID. It then builds a table of contents (TOC) in the sidebar: an h3 heading followed by a list of scroll-to-ID links.

    if ($('h2').length > 0) {
        var ToC = '<h3 class="on-this-page">On this page</h3>' + '<nav role="navigation" class="table-of-contents widget_pages widget">' + '<ul>';
        var el, title, titleFull, hyphenatedTitle;
        $('h2').not(".not-on-this-page, .hotspot-title").each(function() {
            el = $(this);
            titleFull = el.text();
            title = el.text().replace(/[^\w\s]/g, ''); // strip everything except word characters and whitespace
            hyphenatedTitle = title.replace(/ /g, '-'); // spaces become hyphens
            el.attr('id', hyphenatedTitle);
            var link = '#' + el.attr('id');
            var newLine = '<li>' + '<a href="' + link + '"' + ' rel="m_PageScroll2id"' + '>' + titleFull + '</a>' + '</li>';
            ToC += newLine;
        }); // end each
        ToC += '</ul>' + '</nav>';
        $('.on-this-page-toc-wrapper').append(ToC); // show table of contents in element with class "on-this-page-toc-wrapper"
    }

I'm now building a site in English and Arabic.

The script works only intermittently in Arabic. On inspection, it seems to be creating IDs made up of nothing but hyphens (one for each space between words) instead of the Arabic words joined by hyphens. As a result, the script works when the h2s on a page have different numbers of words, but not when two h2s have the same number of words, because they end up with the same ID. I'm pretty sure the problem lies with the regex in this line:

    title = el.text().replace(/[^\w\s]/g, '');
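
For illustration, here is a small check of what that pattern does outside the page. The `strip` helper and the sample strings are made up for this sketch and are not taken from the actual site; it only assumes a JavaScript engine where `\w` is ASCII-only, which is the standard behaviour.

```javascript
// Hypothetical helper that mirrors the two replace() calls in the script above.
const strip = (text) => text.replace(/[^\w\s]/g, '').replace(/ /g, '-');

// English letters are "word characters", so they survive.
console.log(strip('About us'));

// Arabic letters are not matched by \w, so only the space is left,
// and two different two-word headings collapse to the same ID.
console.log(strip('من نحن'));
console.log(strip('اتصل بنا'));
```

Both Arabic headings come out as a single hyphen, which would explain why headings with the same word count clash.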

How do I adapt this script so that it works with both my English and Arabic headings?

Many thanks for your help.


Solution

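  • The problem is that you want to remove all punctuation from the string, but in JavaScript \w only matches ASCII letters, digits and the underscore ([A-Za-z0-9_]), so [^\w\s] also matches, and removes, Arabic letters and numbers.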

    You can fix the above code using

    title = el.text().replace(/[\p{P}\p{S}]+/gu, '');
    

    where /[\p{P}\p{S}]+/gu matches one or more Unicode punctuation or symbol characters. The u flag is required for the \p{...} property escapes to work.

    You may test it at regex101.com.
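
As a rough sketch of how the suggested pattern behaves, the ID-building step can be pulled into a small function and tried on sample text. The `slugify` name, the `trim()` call and the `\s+` replacement are assumptions added for this example (the original script replaces single spaces only), and the sample strings are invented.

```javascript
// Hypothetical helper, not part of the original script:
// builds an ID from heading text in any script (Latin, Arabic, ...).
function slugify(text) {
  return text
    .replace(/[\p{P}\p{S}]+/gu, '') // drop Unicode punctuation and symbols (needs the u flag)
    .trim()                         // assumption: ignore leading/trailing whitespace
    .replace(/\s+/g, '-');          // assumption: collapse each whitespace run to one hyphen
}

console.log(slugify('Hello, world!')); // Hello-world
console.log(slugify('من نحن؟'));       // the two Arabic words joined by a hyphen
```

Inside the each() loop this would be used as `hyphenatedTitle = slugify(el.text());`. Note that two headings with identical text would still produce the same ID, so this does not guarantee uniqueness on its own.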