angularjs regex angular-filters capitalize

Capitalize Cyrillic strings with JavaScript


I'm writing an AngularJS filter that capitalizes the first letter of each word. It works well with a-zA-Z letters, but my strings also contain Cyrillic characters, and I would like the filter to handle those too.

var strLatin = "this is some string";
var strCyrillic = "това е някакъв низ";

var newLatinStr = strLatin.replace(/\b[\wа-яА-Я]/g, function(l){ 
    return l.toUpperCase();
});

var newCyrillicStr = strCyrillic.replace(/\b[\wа-яА-Я]/g, function(l){ 
    return l.toUpperCase();
});

Here is a CodePen example: http://codepen.io/brankoleone/pen/GNxjRM
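To make the symptom concrete, here is a small check of what the code above produces. It is only a sketch: the shared rx variable and the upper helper are names introduced for illustration, and the explanation in the comments relies on the standard definition of \b and \w in JavaScript regular expressions without the u or v flags.

```javascript
// \b is defined in terms of \w, and \w here covers only [A-Za-z0-9_].
// Cyrillic letters are therefore "non-word" characters, so a string made of
// Cyrillic letters and spaces contains no \b position at all.
var strLatin = "this is some string";
var strCyrillic = "това е някакъв низ";
var rx = /\b[\wа-яА-Я]/g;

// Hypothetical helper, same body as the callbacks in the question.
function upper(l) {
    return l.toUpperCase();
}

var newLatinStr = strLatin.replace(rx, upper);
var newCyrillicStr = strCyrillic.replace(rx, upper);

console.log(newLatinStr);      // "This Is Some String"
console.log(newCyrillicStr);   // unchanged: "това е някакъв низ"
console.log(/\w/.test("т"));   // false
```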


Solution

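  • In JavaScript, \b is defined in terms of \w, which matches only ASCII letters, digits and the underscore, so it finds no boundary next to a Cyrillic letter. You need a custom word boundary, which you can build with capturing groups: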

    var strLatin = "this is some string";
    var strCyrillic = "това е някакъв низ";
    var block = "\\w\\u0400-\\u04FF";
    var rx = new RegExp("([^" + block + "]|^)([" + block + "])", "g");
    
    var newLatinStr = strLatin.replace(rx, function($0, $1, $2){ 
        return $1+$2.toUpperCase();
    });
    console.log(newLatinStr);
    var newCyrillicStr = strCyrillic.replace(rx, function($0, $1, $2){ 
        return $1+$2.toUpperCase();
    });
    console.log(newCyrillicStr);

    Details:

    • The block contains all ASCII letters, digits and the underscore (\w), plus every character in the basic Cyrillic range (\u0400-\u04FF). If you need more, see the "Cyrillic script in Unicode" Wikipedia article and extend the ranges accordingly. If you only want to match Russian letters, А-ЯЁёа-я is enough: use var block = "\\wА-ЯЁёа-я";
    • The final regex captures into Group 1 either the start of the string or any char that is not in the block, and then captures into Group 2 a char that is in the block. The callback returns Group 1 unchanged, followed by Group 2 in upper case.
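
As an alternative to listing code point ranges by hand, the same two-group idea can be sketched with Unicode property escapes. This is a different technique from the one in the answer above, and it assumes a runtime that supports ES2018 (the u flag with \p{...}). The function name capitalizeWords is hypothetical.

```javascript
// Hypothetical helper; not part of the original answer.
function capitalizeWords(str) {
    // Group 1: start of string, or any char that is not a letter, digit or underscore.
    // Group 2: a letter in any script (\p{L} needs the u flag).
    var rx = /(^|[^\p{L}\p{N}_])(\p{L})/gu;
    return str.replace(rx, function (match, before, letter) {
        return before + letter.toUpperCase();
    });
}

console.log(capitalizeWords("this is some string")); // "This Is Some String"
console.log(capitalizeWords("това е някакъв низ"));  // "Това Е Някакъв Низ"
```

In an AngularJS app, a function like this could be returned from a filter factory, for example app.filter('capitalize', function () { return capitalizeWords; }), where app and the filter name are placeholders.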