The control characters I'm talking about are listed here: http://ascii.cl/control-characters.htm
I need each control character as the actual single-character entity, not as its ASCII code or as the plain-text name of its symbol.
As shown above in both the Sublime and Notepad text editors, I need the actual symbols, not their ASCII codes. So I need the characters as shown in the second invalid_chrs_list.
Is there a way to get these symbols, a file somewhere online, or a site that I can copy paste them from?
Edit:
    # Invalid characters' ASCII codes are listed here (http://ascii.cl/control-characters.htm)
    #invalid_chrs_list = [0,1,2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26,27] # ASCII codes
    #invalid_chrs_list = ['', ''] # real characters for ASCII codes 3 and 17 - NEED THE REST - Can't post these characters into Stack Overflow, so just pretend they're there like in my screenshot.
    invalid_chrs_list = ['\x00','\x01','\x02','\x03','\x04','\x05','\x06','\x07','\x08','\x10','\x11','\x12','\x13','\x14','\x15','\x16','\x17','\x18','\x19','\x1a','\x1b'] # escaped
    with open(file, 'rb') as f:
        # Iterate through the rows
        for row in f:
            # Catch invalid characters
            for char in row:
                if char in invalid_chrs_list: # <--- MAKE THIS FASTER
                    print ('found')
                    break
Alternate for loop, which would be faster if the check worked:
    for char in invalid_chrs_list:
        if char in row:
I've tried using ord(char) and chr(char) in the line if char in invalid_chrs_list: with each of the lists, but I'm not sure how to compare them to each other to verify a match.
Edit - Solution: The list in the code below is the correct list; it is not necessary to use the literals I showed in my images.
I was looking in the wrong place for the answer, thank you to @Peteris for pointing me in the right direction.
I needed either to switch the file mode to text ('r') or to encode the character I'm checking with char.encode() so that it is compared properly. In binary mode each row is a bytes object, so the str entries in the list have to be converted to bytes before the comparison. In my case I need to open the file in binary mode, so I went with char.encode().
    invalid_chrs_list = ['\x00','\x01','\x02','\x03','\x04','\x05','\x06','\x07','\x08','\x10','\x11','\x12','\x13','\x14','\x15','\x16','\x17','\x18','\x19','\x1a','\x1b']
    with open('test.txt', 'rb') as f:
        # Iterate through the rows
        for row in f:
            for char in invalid_chrs_list:
                if char.encode() in row:
                    print ('found')
                    break
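Since the original goal was to make the check faster, one possible variation is to skip the per-row encode() calls and compare integer byte values directly, because iterating over a bytes object in Python 3 yields ints. This is only a sketch under that assumption; INVALID_BYTES and has_invalid_byte are hypothetical names, not part of the code above, and I have not measured it against the loop above.

```python
# Hypothetical helper names; the code points match invalid_chrs_list
# from the question (0x00-0x08 and 0x10-0x1b).
INVALID_BYTES = frozenset(list(range(0x00, 0x09)) + list(range(0x10, 0x1C)))


def has_invalid_byte(row: bytes) -> bool:
    """Return True if the row contains at least one of the invalid bytes."""
    # Iterating over bytes yields ints in Python 3, so the set of ints
    # can be checked against the row without any encoding step.
    return not INVALID_BYTES.isdisjoint(row)
```

Inside the for row in f: loop this would be used as if has_invalid_byte(row): print('found').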
You could write a tiny program that simply outputs the bytes you want to a file, converting them to bytes from their ASCII codes.
But I'd bet that you don't really want to copy/paste them as literal characters in your code; it can't work that way for, e.g., the newline character and others. ASCII codes or escaped representations are the proper way to go.
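The tiny program mentioned above could look roughly like this. It is a sketch assuming Python 3; the function name write_control_chars and the constant INVALID_CODES are made up for illustration, and the codes are the ones from the list in the question.

```python
# ASCII codes taken from the question's first invalid_chrs_list:
# 0-8 and 16-27.
INVALID_CODES = list(range(0, 9)) + list(range(16, 28))


def write_control_chars(path, codes=INVALID_CODES):
    """Write each ASCII code in `codes` to `path` as one raw byte."""
    with open(path, "wb") as f:
        # bytes() turns a list of ints in range(256) into raw bytes.
        f.write(bytes(codes))
```

Calling write_control_chars('control_chars.bin') (the file name is just an example) produces a file containing the 21 raw control characters, which can then be opened in an editor that displays them.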