Tags: python, string, python-unicode, unicode-escapes

Why does Python 3.8 interpret a string like "\x48" as "H"? I want to write it literally as "\x48" in a string


I am using Python to write a simple program that reads a sample C++ ".cpp" file as a string and finds all the string literals declared in it. I then want to replace each string with its equivalent hex escape codes; for example, "H" becomes "\x48".

My code is

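import re

f = open("sample.cpp", "r")
f1 = f.read()
regex = r"\"(?:(?:(?!(?<!\\)\").)*)\""
# Collect every double-quoted literal (the quotes are part of each match)
find = re.findall(regex, f1)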

ii=0
for str2021 in find:
   print("Output Of Encode=")
   str2021="".join(r'\x{0:x}'.format(ord(c)) for c in find[ii])
   print (str2021)
   ii=ii+1

subst='\x22\x48\x22'
result = re.sub(regex,subst, f1, 0)
if result:
  print("substituted op=")
  print (result)

Now when I print result, it shows "H" instead of "\x22\x48\x22". How can I force the literal form in Python 3.8?

Also, if I do it like this:

result = re.sub(regex, str2021, f1, 0)

it raises an error:

raise s.error('bad escape %s' % this, len(this))
re.error: bad escape \x at position 0

I want to iterate so that every string the regex finds in the .cpp file is automatically converted to its equivalent hex escape codes, like this:

Sample.cpp:

string a="abc"; string b="H";

It should change the .cpp file to this:

string a="\x61\x62\x63"; string b="\x48";

Kindly suggest a solution.


Solution

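  • You can use a raw string, so that Python does not turn \x22 and \x48 into the characters " and H when it parses the literal: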

    subst = r'\x22\x48\x22'
    

    Then also change your re.sub call, because re.sub processes backslash escapes in the replacement string as well:

    re.sub(regex, re.escape(subst), f1, 0)
    

    The Python documentation recommends escaping only the backslashes in a replacement string, though:

    re.sub(regex, subst.replace("\\", r"\\"), f1, 0)
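To convert every literal automatically rather than substituting one fixed string, one option is to pass a function as the replacement. re.sub inserts a function's return value as-is, without processing backslash escapes, so no extra escaping is needed. The following is a minimal sketch, not a definitive implementation: it reuses the regex from the question, and the names to_hex_escapes and hex_encode_literals are hypothetical helpers introduced here for illustration.

```python
import re

# Pattern taken from the question: a double-quoted literal ending at the
# first quote that is not preceded by a backslash.
regex = r"\"(?:(?:(?!(?<!\\)\").)*)\""


def to_hex_escapes(match):
    # Hypothetical helper: rebuild one matched literal as \xNN escapes.
    body = match.group(0)[1:-1]  # drop the surrounding quotes
    encoded = "".join(r"\x{:02x}".format(ord(c)) for c in body)
    return '"' + encoded + '"'


def hex_encode_literals(source):
    # Hypothetical helper: because the replacement is a function, re.sub
    # uses its return value literally and never sees a "bad escape".
    return re.sub(regex, to_hex_escapes, source)


sample = 'string a="abc"; string b="H";'
print(hex_encode_literals(sample))
# prints: string a="\x61\x62\x63"; string b="\x48";
```

This sketch assumes plain ASCII literals with no existing escape sequences. A literal that already contains something like \n would have its backslash and letter encoded separately, which changes its meaning, and the pattern also matches quoted text outside string declarations, such as #include "header.h" lines, so the input may need filtering first.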