Tags: regex, perl, sed, unicode, posix

Substitute special characters with blank spaces


Do you have any idea how to translate all special characters into blank spaces? The input looks like this:

^@^@^@^@<9C>^G^@^@*+^@^@ABD
^@^@^@^@*+^@^@<DC>_^@^@ASD
^@^@^@^@*+^@^@<DC>_^@^@ASaa
^@^@^@^@<80><C2>^A^@<C2>p^A^@ABD

Output of od:

0000000 040136 040136 040136 040136 034474 037103 043536 040136
0000020 040136 025452 040136 040136 041101 005104 040136 040136
0000040 040136 040136 025452 040136 040136 042074 037103 057137
0000060 057100 040500 042102 057012 057100 057100 057100 025100
0000100 057053 057100 036100 041504 057476 040136 040136 041101
0000120 005104 040136 040136 040136 040136 034074 037060 041474
0000140 037062 040536 040136 041474 037062 057160 057101 040500
0000160 042102 000012
0000163

Output of cat -vET:

^@^@^@^@<9C>^G^@^@*+^@^@ABD$
^@^@^@^@*+^@^@<DC>_^@^@ABD$
^@^@^@^@*+^@^@<DC>_^@^@ABD$
^@^@^@^@<80><C2>^A^@<C2>p^A^@ABD$

I've tried

LC_ALL=C sed -e 's/[^[:blank:][:print:]]//g'
sed -r 's/[^[:print:]]//g'

I also tried the approach from https://unix.stackexchange.com/questions/336677/sed-and-remove-string-between-two-patterns, but the output is not as expected.

Expected output:

ABD
ASD
ASaa
ABD

Solution

  • There is no generic definition of a "special character", so you need to specify what to keep and remove all other characters:

    $string =~ s/[^a-zA-Z0-9_,.-]//g;  # extend as needed: spell out what to keep
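To check the whitelist idea against the question's data, here is a minimal sketch. The byte strings are my reconstruction of three lines from the cat -v dump (^@ as NUL, ^G as BEL, ^A as \x01, <9C> as byte 0x9C, and so on), and keep_whitelist is just a hypothetical helper name.

```perl
use strict;
use warnings;

# Keep only an explicit whitelist: ASCII letters, digits, underscore,
# comma, period and hyphen. Everything else is deleted.
sub keep_whitelist {
    my ($s) = @_;
    $s =~ s/[^a-zA-Z0-9_,.-]//g;
    return $s;
}

# Sample lines rebuilt (by assumption) from the cat -v dump in the question.
my @lines = (
    "\0\0\0\0\x9C\a\0\0*+\0\0ABD",
    "\0\0\0\0*+\0\0\xDC_\0\0ASD",
    "\0\0\0\0\x80\xC2\x01\0\xC2p\x01\0ABD",
);

print keep_whitelist($_), "\n" for @lines;
```

This prints ABD, _ASD and pABD: the underscore and the p are in the whitelist, so they survive, which is why the list has to be tailored to the data.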
    

    You can also use the word-character class \w (letters, digits and underscore):

    $string =~ s/[^\w,.-]//g;
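Whether \w covers only ASCII depends on Perl's character-set rules. A small sketch, assuming Perl 5.14 or later for the /a modifier; the helper names and the sample text are hypothetical.

```perl
use strict;
use warnings;

# \w-based whitelist under Perl's default rules. Once a string holds
# characters above 0xFF (e.g. decoded UTF-8 text), Unicode rules apply
# and letters such as U+00EF count as word characters.
sub keep_word_default {
    my ($s) = @_;
    $s =~ s/[^\w,.-]//g;
    return $s;
}

# Same pattern with /a (Perl 5.14+): \w is restricted to ASCII [A-Za-z0-9_].
sub keep_word_ascii {
    my ($s) = @_;
    $s =~ s/[^\w,.-]//ga;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';

my $text = "na\x{EF}ve \x{263A}, ok.";   # U+263A makes this a character string
print keep_word_default($text), "\n";   # keeps the U+00EF letter
print keep_word_ascii($text), "\n";     # ASCII word characters only
```

The first line printed is naïve,ok. and the second nave,ok. For raw bytes read without a decoding layer, the default rules already treat bytes above 0x7F as non-word characters (unless a locale or the unicode_strings feature is in effect), so /a mainly matters for decoded text.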
    

    where I've only given a few punctuation characters as an example.

    The POSIX character classes, which you're trying to use, also work:

    $string =~ s/[^[:alnum:][:punct:]]//g;
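Unlike the explicit list, [:punct:] keeps every ASCII punctuation and symbol character, including the *+ in the sample data. A sketch on the first sample line (my reconstruction of the cat -v dump); keep_alnum_punct is a hypothetical name, and /a needs Perl 5.14+.

```perl
use strict;
use warnings;

# Keep ASCII letters, digits and punctuation; delete everything else.
# /a pins the POSIX classes to ASCII; for plain byte strings like this
# one the result is the same without it.
sub keep_alnum_punct {
    my ($s) = @_;
    $s =~ s/[^[:alnum:][:punct:]]//ga;
    return $s;
}

my $line = "\0\0\0\0\x9C\a\0\0*+\0\0ABD";   # first sample line
print keep_alnum_punct($line), "\n";        # '*' and '+' are [:punct:]
```

This prints *+ABD, so if the asterisk and plus sign are "special" for you, [:punct:] is too generous and an explicit list is the better fit.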
    

    or use Perl's \p{...} property names for the same classes:

    $string =~ s/[^\p{PosixAlnum}\p{PosixPunct}]//g;
    

    where you can of course use actual Unicode properties as well, listed on the comprehensive page linked above. Be careful with the syntax; see the "POSIX Character Classes" section in perlrecharclass.
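The Posix-prefixed properties match only ASCII, while their XPosix counterparts are the full-Unicode versions. A sketch with hypothetical helper names and a made-up test string, assuming Perl 5.14 or later:

```perl
use strict;
use warnings;

# \p{PosixAlnum} and \p{PosixPunct} only ever match ASCII characters.
sub keep_posix_props {
    my ($s) = @_;
    $s =~ s/[^\p{PosixAlnum}\p{PosixPunct}]//g;
    return $s;
}

# The XPosix variants are the full-Unicode counterparts.
sub keep_xposix_props {
    my ($s) = @_;
    $s =~ s/[^\p{XPosixAlnum}\p{XPosixPunct}]//g;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';

my $text = "A\x{100}\x{263A}!";        # A, A-macron, smiley, '!'
print keep_posix_props($text), "\n";   # A!
print keep_xposix_props($text), "\n";  # A, A-macron and '!'
```

The smiley (a symbol, not punctuation) is dropped by both; only the XPosix version keeps the non-ASCII letter.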

    Or perhaps you really mean to remove the non-printable characters:

    $string =~ s/[^[:print:]]//g;
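Since the title asks for blank spaces rather than deletion, here is a sketch of both variants on the first sample line (again my reconstruction of the cat -v dump): delete, or collapse each run of non-printable characters into a single blank. The /a modifier (Perl 5.14+) is my addition to keep [:print:] ASCII-only.

```perl
use strict;
use warnings;

my $line = "\0\0\0\0\x9C\a\0\0*+\0\0ABD";   # first sample line

# Delete every character outside ASCII [:print:] (space through '~').
(my $deleted = $line) =~ s/[^[:print:]]//ga;

# Title's literal request: turn each run of such characters into one blank.
(my $blanked = $line) =~ s/[^[:print:]]+/ /ga;

print "[$deleted]\n";   # [*+ABD]
print "[$blanked]\n";   # [ *+ ABD]
```

Dropping the + after the class would instead give one blank per removed character.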
    

    To use this as a command-line program (a "one-liner"):

    perl -lwpe's/\W//g' file > new_file   # -l chomps each line first, so \W does not delete the newline
    

    to save the output as new_file, or

    perl -i.bak -lwpe's/[^[:print:]]//g' file
    

    to edit the file in place (drop .bak if you don't want a backup).

    If the input is piped from another program:

    echo input | perl -lwpe's/[^\w,.-]//g'
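The -p switch wraps the code in a read-and-print loop over the input lines, and -l chomps each line before the code runs and adds the newline back on output. That matters here because the trailing newline matches neither \w nor [:print:], so without -l it is deleted and all lines run together. A rough sketch of that loop, with a hypothetical clean_lines helper and in-memory filehandles standing in for real files or a pipe:

```perl
use strict;
use warnings;

# Roughly what `perl -lpe 's/[^[:print:]]//g'` does: read line by line,
# strip the newline (-l), clean the line, then print it with "\n" again.
sub clean_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/[^[:print:]]//ga;
        print {$out} $line, "\n";
    }
}

# Demo with in-memory filehandles; pass \*STDIN / \*STDOUT for a pipe.
my $data = "\0\0\0\0\x9C\a\0\0*+\0\0ABD\n\0\0\0\0*+\0\0\xDC_\0\0ASD\n";
open my $in,  '<', \$data   or die "cannot open input: $!";
open my $out, '>', \my $buf or die "cannot open output: $!";
clean_lines($in, $out);
close $out;
print $buf;   # "*+ABD\n*+_ASD\n"
```

Calling clean_lines(\*STDIN, \*STDOUT) turns the same sketch into a filter for piped input.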