Tags: sed, locale, capitalize

Capitalize names containing international letters like éèàö


My attempts on RHEL 6.3:

$ export LC_ALL=fr_FR.utf-8
$ sed 's/ \([a-zA-Zé]\)\([^ ]*\) /[\u\1\L\2\E] /g' <<< " hélène  NOËL  étienne "
 hélène  NOËL  étienne

$ export LC_ALL=C
$ sed 's/ \([a-zA-Zé]\)\([^ ]*\) /[\u\1\L\2\E] /g' <<< " hélène  NOËL  étienne "
[Hÿlÿne] [Noÿl] [ÿtienne]

$ sed --version
GNU sed version 4.2.1
[...]

Is sed able to output the following?

[Hélène] [Noël] [Étienne]

Solution

  • Is this OK for you?

    kent$  echo " hélène  NOËL  étienne "|sed -r 's/(\S)(\S+)/[\U\1\L\2]/g'
     [Hélène]  [Noël]  [Étienne] 
    

    My sed version is a bit different from yours, but I think the line should run there too:

    kent$  sed --version |head -1
    sed (GNU sed) 4.2.2
    

    I've added my locale settings, in case you want to know:

    kent$  echo $LANG
    en_US.utf8
    
    kent$  locale
    LANG=en_US.utf8
    LC_CTYPE="en_US.utf8"
    LC_NUMERIC="en_US.utf8"
    LC_TIME="en_US.utf8"
    LC_COLLATE="en_US.utf8"
    LC_MONETARY="en_US.utf8"
    LC_MESSAGES="en_US.utf8"
    LC_PAPER="en_US.utf8"
    LC_NAME="en_US.utf8"
    LC_ADDRESS="en_US.utf8"
    LC_TELEPHONE="en_US.utf8"
    LC_MEASUREMENT="en_US.utf8"
    LC_IDENTIFICATION="en_US.utf8"
    LC_ALL=
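
Since the result seems to depend on the locale sed runs under, here is a minimal sketch that wraps the one-liner from the answer in a function and forces a UTF-8 locale for the sed call only. The function name `capitalize_words` and the way the locale is picked from `locale -a` are assumptions for illustration, not part of the original answer. It relies on GNU sed (`\S`, `\U`, `\L` are GNU extensions) and on at least one UTF-8 locale being installed.

```shell
#!/bin/sh
# Sketch only: capitalize_words and utf8_locale are hypothetical names.

# Pick the first installed locale whose name ends in utf8 / UTF-8.
# If none is found, the variable is empty and sed uses the current locale.
utf8_locale=$(locale -a 2>/dev/null | grep -i -E 'utf-?8$' | head -n 1)

capitalize_words() {
    # (\S)  first non-space character of a word, upper-cased by \U
    # (\S+) rest of the word, lower-cased by \L
    # Note: \S+ needs at least one more character, so one-letter
    # words are left untouched, exactly as in the original one-liner.
    LC_ALL="$utf8_locale" sed -r 's/(\S)(\S+)/[\U\1\L\2]/g'
}

# Same input as in the question.
echo " hélène  NOËL  étienne " | capitalize_words
```

Setting `LC_ALL` as a prefix on the sed command keeps the change local to that one call, so the rest of the shell session keeps its own locale settings.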