vim sed utf-8 bytecode

How do I remove specific byte sequences with sed and vim using hex addresses?


I've got a string that looks like this in vim:

PFLUGERVILLE TX 7x691 227 12515 <83>¨¨ x Research Boulevard

For reference, in vim:

ga Print the ascii value of the character under the cursor in decimal, hexadecimal and octal.

g8 Print the hex values of the bytes used in the character under the cursor, assuming it is in UTF-8 encoding. This also shows composing characters. The value of 'maxcombine' doesn't matter.

If I put the cursor on <83> and type ga, I get this:

<<83>> 131, Hex 0083, Octal 203

If I type g8, I get:

c2 83

I would have thought that

sed -e's/\x00\x83//g' ./file.csv

would work to remove the character, but no joy.


Solution

  • Not

    sed -e's/\x00\x83//g' ./file.csv
    

    but

    LC_CTYPE=C sed -e's/\x83//g' ./file.csv
    

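    You have to set LC_CTYPE=C so that sed matches raw bytes instead of characters in your locale's encoding, and drop the leading \x00. The 0083 that ga prints is the code point padded to four hex digits; there is no 00 byte in the file. Note that this pattern removes only the single byte 0x83. If the file on disk is really UTF-8, the character is stored as the two bytes c2 83 that g8 shows, and the pattern would need to be \xc2\x83 so that no stray c2 byte is left behind.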
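
To check which case applies before running the substitution, here is a minimal sketch. It assumes GNU sed (the \xHH escapes are a GNU extension), and the sample file names are made up for illustration. It uses LC_ALL=C rather than LC_CTYPE=C because a value of LC_ALL already set in the environment overrides LC_CTYPE.

```shell
# Assumption: GNU sed; the sample-*.txt and cleaned-*.txt names are hypothetical.

# Case 1: single-byte file (e.g. Latin-1). The character is the lone byte 0x83 (octal 203).
printf 'A\203B\n' > sample-latin1.txt
od -An -tx1 sample-latin1.txt        # bytes: 41 83 42 0a
LC_ALL=C sed -e 's/\x83//g' sample-latin1.txt > cleaned-latin1.txt

# Case 2: UTF-8 file. U+0083 is stored as the two bytes c2 83 (octal 302 203).
printf 'A\302\203B\n' > sample-utf8.txt
od -An -tx1 sample-utf8.txt          # bytes: 41 c2 83 42 0a
LC_ALL=C sed -e 's/\xc2\x83//g' sample-utf8.txt > cleaned-utf8.txt
```

Running the single-byte pattern against a UTF-8 file is risky, because 0x83 is also a continuation byte of other characters (for example c3 83), so it is worth looking at the od output first. Inside vim itself, the pattern atom \%u0083 matches by code point, so :%s/\%u0083//g should remove the character without having to reason about bytes.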