
Latin characters in a JavaScript regexp


How can I include Latin characters like ČčĆ抚Đđ in this JavaScript regexp?

var regex = new RegExp('\\b' + this.value, "i");
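A minimal reproduction of the problem in plain JavaScript, without jQuery (the function name startsWithBoundary is hypothetical). It builds the regexp the same way as the line above, '\\b' + query with the "i" flag, and shows that no boundary is found before Č, while one is found in the middle of the word:

```javascript
// Hypothetical helper: builds the regexp exactly like the question does.
// Without the "u" flag, \b is defined in terms of \w, and \w only matches
// the ASCII characters [A-Za-z0-9_].
function startsWithBoundary(text, query) {
  return new RegExp('\\b' + query, 'i').test(text);
}

console.log(startsWithBoundary('Cevapi', 'c'));     // true: 'C' is a word character
console.log(startsWithBoundary('Čevapi', 'č'));     // false: 'Č' is not \w, so no boundary before it
console.log(startsWithBoundary('Čevapi', 'evapi')); // true: \b matches between 'Č' and 'e'
```

So the filter both misses labels that start with Č and matches labels in the middle of a word.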

UPDATE:

I have this code for filtering checkbox labels, but it doesn't work well when the input contains Č, č or ć:

function listFilter(list, input) {
    var $lbs = list.find('.css-label');

    function filter(){
        var regex = new RegExp('\\b' + this.value);
        var $els = $lbs.filter(function(){
            return regex.test($(this).text());
        });
        $lbs.not($els).hide().prev().hide();
        $els.show().prev().show();
    }

    input.keyup(filter).change(filter);
}

jQuery(function($){
    listFilter($('#list'), $('.search-filter'));
});

Here is a fiddle: DEMO


Solution

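  • The problem with your regexp is that the word boundary \b isn't detected next to those characters. In JavaScript, \b is defined in terms of \w, which only matches the ASCII characters A-Z, a-z, 0-9 and _, so a letter such as Č is treated as a non-word character (\w and \W have the same ASCII-only behaviour).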

    I'd suggest starting with

    new RegExp('(^|[\\s\\.])' + this.value, "i")
    

    and then adding to [\\s\\.] any other characters you need to treat as word separators.

    If you can't define the set of possible word boundaries in advance, you'd be better off using a library that produces Unicode-aware regular expressions. Some are listed in this related question.
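A sketch of the suggested approach as plain functions. The names escapeRegExp, startsWordExplicit and startsWordUnicode are hypothetical, and escaping the input is an addition (the question passes this.value to RegExp unescaped, so input such as "(" would throw). The second function swaps in a different technique, ES2018 Unicode property escapes with a lookbehind, which avoids listing separators by hand but assumes a modern engine:

```javascript
// Hypothetical helper: escapes RegExp syntax characters so user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Approach from the answer: the query must follow the start of the text
// or one of an explicit set of separator characters (here: whitespace and '.').
function startsWordExplicit(text, query) {
  return new RegExp('(^|[\\s\\.])' + escapeRegExp(query), 'i').test(text);
}

// Alternative (assumes an ES2018+ engine with lookbehind and \p{...} support):
// the query must not be preceded by a letter, a combining mark, a digit or '_'.
function startsWordUnicode(text, query) {
  return new RegExp('(?<![\\p{L}\\p{M}\\p{N}_])' + escapeRegExp(query), 'iu').test(text);
}

const labels = ['Čevapi', 'Đuveč', 'Šopska salata', 'Pljeskavica'];
console.log(labels.filter(l => startsWordExplicit(l, 'č')));    // [ 'Čevapi' ]
console.log(labels.filter(l => startsWordUnicode(l, 'š')));     // [ 'Šopska salata' ]
console.log(labels.filter(l => startsWordUnicode(l, 'evapi'))); // [] (preceded by the letter Č)
```

Either predicate could stand in for the regex.test(...) call inside the question's $lbs.filter callback. The explicit version works in older browsers; the Unicode version needs no separator list but will throw a SyntaxError in engines without lookbehind or property escapes.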