Tags: python, python-3.x, input, unicode, unicode-literals

input() and literal unicode parsing


input() treats a backslash as a literal backslash, so I am unable to parse input that contains Unicode escape sequences.

What I mean:

Pasting a string like "\uXXXX\uXXXX\uXXXX" into an input() call is interpreted as "\\uXXXX\\uXXXX\\uXXXX", but I want each \uXXXX to be read as a single escape sequence rather than as a literal backslash followed by separate characters.
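
To make the difference concrete, here is a small sketch. It uses a raw string as a stand-in for what input() would return after pasting \u0041; the variable names are only illustrative.

```python
# A raw string stands in for what input() returns when the user pastes \u0041.
pasted = r"\u0041"
# The same text written as an ordinary string literal in source code.
literal = "\u0041"

print(len(pasted))        # six characters: backslash, 'u', '0', '0', '4', '1'
print(len(literal))       # one character: 'A'
print(pasted == literal)  # the two strings are not equal
```

The escape is only interpreted when Python parses a string literal in source code; text read at runtime is taken character by character.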

Does anyone know how, or whether it is possible, to make this happen?

Edit: I am taking input as above and converting it to ASCII, as shown below.

import unicodedata

def Reveal(unicodeSol):
    solution = unicodedata.normalize('NFKD', unicodeSol).encode('ascii', 'ignore')
    print(solution)

while True:
    UserInput = input("Paste Now: ")
    Reveal(UserInput)

Per the answer I marked, a correct solution would be:

import unicodedata
import ast

def Reveal(unicodeSol):
    solution = unicodedata.normalize('NFKD', unicodeSol).encode('ascii', 'ignore')
    print(solution)

while True:
    UserInput = ast.literal_eval('"{}"'.format(input("Paste Now: ")))
    Reveal(UserInput)
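
As a rough check of what this pipeline does, the sketch below runs the same steps on a fixed sample instead of live input. The helper name reveal and the sample text are assumptions made for illustration, and the raw string again stands in for pasted input.

```python
import ast
import unicodedata

def reveal(text: str) -> bytes:
    # NFKD splits accented characters into a base letter plus combining marks;
    # encoding to ASCII with 'ignore' then drops the non-ASCII marks.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore')

pasted = r"caf\u00e9"                              # what input() would return
decoded = ast.literal_eval('"{}"'.format(pasted))  # escapes now interpreted

print(reveal(pasted))   # backslash sequence survives as plain ASCII text
print(reveal(decoded))  # b'cafe'
```

Note that the result is a bytes object; calling .decode('ascii') on it would give a str if that is what is needed.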

Solution

  • If you can be sure that the input will not contain quotes, you can turn it into a string literal by adding a double quote at each end, and then use ast.literal_eval() to evaluate that literal into a string. Example:

    import ast
    inp = input("Input : ")
    res = ast.literal_eval('"{}"'.format(inp))
    

    If the input can contain double quotes, replace each one with r'\"' before evaluating the result with ast.literal_eval().
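
    A minimal sketch of that quote-escaping variant follows. The function name parse_escapes and the fallback of returning the text unchanged on malformed input are assumptions, not part of the original answer. It also does not handle input that already contains a backslash directly before a double quote; such input falls through to the fallback.

```python
import ast

def parse_escapes(raw: str) -> str:
    # Escape double quotes so they cannot close the literal early.
    quoted = '"{}"'.format(raw.replace('"', r'\"'))
    try:
        return ast.literal_eval(quoted)
    except (SyntaxError, ValueError):
        # Malformed escapes (e.g. a trailing backslash) are left untouched.
        return raw

print(parse_escapes(r'\u0041\u0042'))  # AB
print(parse_escapes('say "hi"'))       # say "hi"
```

    Because ast.literal_eval() only evaluates literals, pasted text cannot run arbitrary code the way eval() would allow.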