How to tell Python that a string is actually a bytes object, without converting it?

I have a txt file which contains a line:

 '        6: "\\351\\231\\220\\346\\227\\266\\345\\205\\215\\350\\264\\271"'

The content inside the double quotes is actually a sequence of octal-escaped bytes, but each escape is written with two backslashes.

After reading the line in, I used a regex to extract the content inside the double quotes.

c = re.search(r': "(.+)"', line).group(1)

After that, I have two problems:

First, I need to replace each pair of escape characters with a single one.

Second, I need to tell Python that the str object c is actually a bytes object.

I have not managed to do either.

I have tried:

re.sub('\\', '\', line)
re.sub(r'\\', '\', line)
re.sub(r'\\', r'\', line)

All failed.
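
For reference, a minimal sketch of why these calls fail and what a working replacement could look like. The literals '\' and r'\' are syntax errors, because the backslash swallows the closing quote, and the pattern '\\' is a single backslash, which is an incomplete regex. The variable `doubled` is a made-up sample that assumes the text really contains two backslashes per escape.

```python
import re

# Hypothetical sample: every escape is written with two real backslashes.
doubled = r"\\351\\231\\220"

# The one-character pattern '\\' (a lone backslash) is not a valid regex.
try:
    re.compile("\\")
    pattern_ok = True
except re.error:
    pattern_ok = False

# "\\\\" is a two-character string (two backslashes); "\\" is one backslash.
collapsed = doubled.replace("\\\\", "\\")

# The regex equivalent: r"\\\\" matches two backslashes, r"\\" inserts one.
collapsed_re = re.sub(r"\\\\", r"\\", doubled)

print(pattern_ok)
print(collapsed)
```

For a fixed substring like this, `str.replace` is probably the simpler choice, since it avoids regex escaping entirely.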

A bytes object can easily be defined with the 'b' prefix.

c = b'\351\231\220\346\227\266\345\205\215\350\264\271'

How do I change the type of a string variable to bytes? I don't think this is an encode-and-decode thing.

I googled a lot but found no answers. Maybe I used the wrong keywords.

Does anyone know how to do this, or another way to get what I want?

Solution

This is always a little confusing. I assume your bytes object should represent a string like:

b = b'\351\231\220\346\227\266\345\205\215\350\264\271'
b.decode()
# '限时免费'

To get that with your escaped string, you could use the codecs module (note that codecs.escape_decode is not part of its documented API) and try:

import re
import codecs

line =  '        6: "\\351\\231\\220\\346\\227\\266\\345\\205\\215\\350\\264\\271"'
c = re.search(r': "(.+)"', line).group(1)

codecs.escape_decode(bytes(c, "utf-8"))[0].decode("utf-8")
# '限时免费'

giving the same result.
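
If you would rather not rely on an undocumented function, the same conversion can be sketched with `re` alone. This is only one possible approach: the helper name `decode_octal_escapes` is made up for illustration, and it assumes every escape is a backslash followed by exactly three octal digits no larger than 377, as in your sample.

```python
import re

def decode_octal_escapes(text: str, encoding: str = "utf-8") -> str:
    """Turn literal escapes such as \\351 into single bytes, then decode."""
    raw = text.encode(encoding)
    # Replace each backslash + three octal digits with the one byte it names.
    data = re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        raw,
    )
    return data.decode(encoding)

# c holds real backslashes followed by digits, as extracted from the line.
c = r"\351\231\220\346\227\266\345\205\215\350\264\271"
result = decode_octal_escapes(c)
print(result)
# '限时免费'
```

Other escapes such as \n or \x41 are left untouched by this version, whereas codecs.escape_decode handles them too.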