python string encode

Encoding strings in Python


I am trying to encode a piece of text that I am getting from an Excel document. It contains all sorts of awkward characters such as quotation marks, backslashes, parentheses, etc. What is the proper way to convert it to a Python-compatible string so I can process it and store it in a variable?

ExampleText = "MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER POWDER ACTUATED FASTENERS (P.A.F.S) SPACED ON 8" CENTERS FOR BEARING WALLS, AND AT 12" O.C. FOR NON-LOAD BEARING WALLS (U.N.O.), WITH 1 1/2" MINIMUM PENETRATION INTO CONCRETE. AT X-BRACED SHEAR WALLS, TRACK SHALL BE ATTACHED PER DETAILS.  At Infinity Shear Panels (ISP’S) attach to slab w/ 0.145" x 1 1/2” powder actuated fasteners spaced on 4” centers (HILTI DS 37 P10 or equal) -OR- (6) 3/8" DIA. 2205 expansion anchors w/ 2 1/2" min. embedment - OR-Simpson "Titen" screws  @ 6" o.c."

I tried: str(ExampleText) but it obviously fails.

Thank you for your help!

P.S. Here's the error that I get: UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')

P.P.S. I am on IronPython 2.7. I know, a bummer :-(


Solution

  • You can use the escape() function from the re module:

    >>> import re
    >>> re.escape(ExampleText)
        '\\"MINIMUM\\ TRACK\\ FASTENING\\ SHALL\\ BE\\ 0\\.145\\"\\ DIAMETER ...'
    >>> ExampleText = ExampleText.decode('string_escape')
    >>> ExampleText
        '"MINIMUM TRACK FASTENING SHALL BE 0.145" DIAMETER ...'
    

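    In Python 2.7, the escape() function puts a single backslash in front of every non-alphanumeric character; the backslashes appear doubled above only because the interpreter prints the repr of the string. Note that escape() returns a new string and leaves ExampleText itself unchanged. This should handle your input string well.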
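
To make the behaviour above concrete, here is a minimal sketch rather than a definitive fix. The two sample strings are hypothetical excerpts shortened from the question's text, and the explicit encode() step is an assumption about what the asker may ultimately need; it is not part of the original answer. The exact set of characters that re.escape() touches also differs between Python versions, so no output is shown.

```python
import re

# Hypothetical excerpts from the question's text (not the full string).
ascii_part = u'BE 0.145" DIAMETER (P.A.F.S)'
curly_part = u'1 1/2\u201d'  # ends with a right curly quote (U+201D)

# re.escape() prefixes special characters with ONE backslash each.
escaped = re.escape(ascii_part)
print(escaped)        # the actual characters in the string
print(repr(escaped))  # repr() is what makes the backslashes look doubled

# Escaping does not deal with non-ASCII characters. Converting text to
# bytes is a separate, explicit step (assumption: UTF-8 is acceptable).
utf8_bytes = curly_part.encode("utf-8")
ascii_bytes = curly_part.encode("ascii", "replace")  # non-ASCII becomes "?"
print(repr(utf8_bytes))
print(repr(ascii_bytes))
```

If the goal is only to process the text and keep it in a variable, keeping it as a Unicode string and encoding it just before writing it out may be enough; escaping is mainly useful when the text is going to be used inside a regular expression.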