
How will the cut options -b and -c differ once internationalization is supported?


`-b BYTE-LIST'
`--bytes=BYTE-LIST'
     Select for printing only the bytes in positions listed in
     BYTE-LIST.  Tabs and backspaces are treated like any other
     character; they take up 1 byte.  If an output delimiter is
     specified, (see the description of `--output-delimiter'), then
     output that string between ranges of selected bytes.

`-c CHARACTER-LIST'
`--characters=CHARACTER-LIST'
     Select for printing only the characters in positions listed in
     CHARACTER-LIST.  The same as `-b' for now, but
     internationalization will change that.  Tabs and backspaces are
     treated like any other character; they take up 1 character.  If an
     output delimiter is specified, (see the description of
     `--output-delimiter'), then output that string between ranges of
     selected bytes.

The description for -c says: The same as `-b' for now, but internationalization will change that.

I assume that characters in some languages are encoded as multiple bytes, and that this is when -c and -b will behave differently. Is that correct?


Solution

  • Yes. Here is a test, assuming a UTF-8 locale and a cut that handles multibyte characters:

    $ cat a
    200
    bést
    203
    -Ümlaut
    $ cut -b2-3 a
    00
    é           <---- é takes 2 bytes in UTF-8
    03
    Ü           <---- Ü takes 2 bytes in UTF-8
    $ cut -c2-3 a
    00
    és
    03
    Üm
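
    To see why the two options diverge, you can inspect the raw bytes directly. The following is a minimal sketch, assuming UTF-8 encoding; the octal escapes \303\251 spell é so the snippet does not depend on how the file is saved, and the variable names (line, bytes, b_hex, c_hex) are only illustrative.

```shell
# "bést" spelled with octal escapes: \303\251 is é in UTF-8 (bytes c3 a9).
line='b\303\251st\n'

# wc -c counts bytes in every locale: 4 characters plus the newline occupy 6 bytes.
bytes=$(printf "$line" | wc -c | tr -d ' ')

# cut -b2-3 selects bytes 2 and 3, which are exactly the two bytes of é.
b_hex=$(printf "$line" | cut -b2-3 | od -An -tx1 | tr -d ' \n')

# In the C locale every byte counts as one character, so -c picks the same two bytes.
c_hex=$(printf "$line" | LC_ALL=C cut -c2-3 | od -An -tx1 | tr -d ' \n')

echo "bytes=$bytes b_hex=$b_hex c_hex=$c_hex"
```

    Both hex dumps show c3 a9 followed by the newline 0a. A multibyte-aware cut -c2-3 running in a UTF-8 locale would instead return és (c3 a9 73), as in the transcript above; a cut that still treats -c like -b returns only é.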