regex perl centos6.5 perl5.10

Matching the boundary of a Russian word with \b


Is this a bug, or am I doing something wrong? I am trying to match Russian swear words in a multiplayer game chat log on CentOS 6.5 with the stock Perl 5.10.1.

# echo блядь | perl -ne 'print if /\bбля/'

# echo блядь | perl -ne 'print if /бля/'
блядь

# echo $LANG
en_US.UTF-8

Why doesn't the first command print anything?


Solution

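  • You have to tell Perl that the source code contains UTF-8 (use utf8), and that the input (-CI) and output (-CO) are UTF-8 encoded. Otherwise Perl treats both the pattern and the input as raw bytes, and the leading byte of a UTF-8 encoded Cyrillic letter is not a word character under byte semantics, so \b finds no boundary before it: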

    echo 'помёт' | perl -CIO -ne 'use utf8; print if /\bпомё/'
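To make the difference visible, here is a minimal sketch that runs the same pattern once on raw bytes and once on decoded characters. The helper names match_bytes and match_chars are hypothetical and exist only for this illustration. It assumes a perl binary on PATH and a script file saved as UTF-8. -Mutf8 is the command-line form of use utf8, and -CS marks STDIN, STDOUT and STDERR as UTF-8, a superset of the -CIO used above.

```shell
# Byte semantics: nothing is decoded, so \b is tested against the raw
# UTF-8 bytes of the input, none of which count as word characters.
match_bytes() {
    perl -ne 'print if /\bпомё/'
}

# Character semantics: -Mutf8 decodes the pattern in the -e code,
# -CS decodes STDIN and encodes STDOUT/STDERR as UTF-8.
match_chars() {
    perl -Mutf8 -CS -ne 'print if /\bпомё/'
}

printf 'помёт\n' | match_bytes    # prints nothing
printf 'помёт\n' | match_chars    # prints: помёт
```

With decoding in place, \b behaves as expected for Cyrillic text: it matches at the start of помёт but not in the middle of a longer word such as непомёт.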