Tags: bash, text, ascii, 7-bit

Convert text to 7-bit ASCII from command-line


I'm on OS X 10.5.5 (though I guess that doesn't matter much).

I have a set of text files containing fancy characters such as curly double quotes, single-character ellipses ("…"), etc.

I need to convert these files to good old plain 7-bit ASCII, preferably without losing the characters' meaning (that is, ellipses should become three periods, curly quotes should become ordinary " characters, and so on).

Can anyone recommend a smart command-line (bash) tool or script to do that?


Solution

  • The Elinks web browser converts Unicode characters to their ASCII equivalents, giving things like "--" for "—" and "..." for "…". There is a Python module, python-elinks, that uses the same conversion table, and it would be trivial to turn it into a shell filter like this:

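    #!/usr/bin/env python2
    # Python 2 filter: reads UTF-8 text on stdin, writes 7-bit ASCII on stdout.
    import sys
    import elinks  # importing it registers the 'elinks' error handler used below

    for line in sys.stdin:
        line = line.decode('utf-8')  # raw bytes -> unicode
        # Non-ASCII characters are replaced using the Elinks conversion table.
        sys.stdout.write(line.encode('ascii', 'elinks'))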
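If python-elinks isn't installed, a rough stand-alone alternative is to combine a small hand-written replacement table with Unicode NFKD normalization from the Python 3 standard library. This is only a minimal sketch and not the Elinks table: the `REPLACEMENTS` mapping and the `to_ascii` name are my own assumptions, the table covers just a few common typographic characters, and anything that is neither mapped nor decomposable into ASCII is silently dropped.

```python
#!/usr/bin/env python3
"""Hypothetical stand-alone filter: UTF-8 on stdin -> 7-bit ASCII on stdout."""
import sys
import unicodedata

# Small hand-picked table (an assumption, NOT the Elinks conversion table):
# typographic characters -> ASCII look-alikes that keep their meaning.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}
TABLE = str.maketrans(REPLACEMENTS)


def to_ascii(text: str) -> str:
    # 1. Apply the explicit table first so these characters keep their meaning.
    text = text.translate(TABLE)
    # 2. NFKD splits accented letters into base letter + combining mark
    #    (e.g. "é" -> "e" + U+0301) and expands compatibility characters.
    text = unicodedata.normalize("NFKD", text)
    # 3. Drop whatever is still outside 7-bit ASCII (combining marks, etc.).
    return text.encode("ascii", "ignore").decode("ascii")


def main() -> None:
    # Read raw bytes and decode as UTF-8 explicitly, independent of the locale.
    for raw in sys.stdin.buffer:
        sys.stdout.write(to_ascii(raw.decode("utf-8")))


if __name__ == "__main__":
    main()
```

Saved as, say, `toascii.py` (a hypothetical name) and made executable, it would be used like `./toascii.py < fancy.txt > plain.txt`. Applying the table before normalizing matters because curly quotes have no ASCII decomposition and would otherwise simply disappear.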