
Need .txt file that lists all ASCII Control Characters as single character entities/symbols


Control characters I'm talking about can be found here: http://ascii.cl/control-characters.htm

I need the control characters as actual single characters, not as their ASCII codes or as the plain-text names of their symbols.

See below... [screenshots a and b]

As shown above in both the Sublime Text and Notepad editors, I need the actual symbols, not their ASCII codes. So I need the characters as shown in the second invalid_chrs_list.

Is there a way to get these symbols, a file somewhere online, or a site that I can copy paste them from?

Edit:

# Invalid characters' ASCII codes are listed here (http://ascii.cl/control-characters.htm)
#invalid_chrs_list = [0,1,2,3,4,5,6,7,8,16,17,18,19,20,21,22,23,24,25,26,27] # ASCII codes
#invalid_chrs_list = ['', ''] # real characters for ASCII codes 3 and 17 - NEED THE REST - Can't post these characters into stackoverflow so just pretend they're there like in my screenshot.
invalid_chrs_list = ['\x00','\x01','\x02','\x03','\x04','\x05','\x06','\x07','\x08','\x10','\x11','\x12','\x13','\x14','\x15','\x16','\x17','\x18','\x19','\x1a','\x1b'] # escaped

with open(file, 'rb') as f:
    # Iterate through the rows
    for row in f:
        # Catch invalid characters
        for char in row:
            if char in invalid_chrs_list: # <--- MAKE THIS FASTER
                print ('found')
                break

Alternate for loop, which would be faster if the check worked:

for char in invalid_chrs_list:
    if char in row:
        print ('found')
        break

I've tried using ord(char) and chr(char) in the check "if char in invalid_chrs_list:" against each of the lists, but I'm not sure how to compare the values to verify a match.
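
For context, here is a minimal sketch of why the comparison can fail, assuming Python 3: iterating over a row read in binary mode yields integers rather than one-character strings, so those integers never equal the escaped string literals. Comparing against a set of integer codes avoids the mismatch. The names INVALID_CODES and row_has_invalid are made up for this illustration.

```python
# Assumes Python 3, where iterating over a bytes object yields ints (0-255).
# Codes 0-8 and 16-27, the same codes as in the question's list.
INVALID_CODES = frozenset(list(range(0, 9)) + list(range(16, 28)))


def row_has_invalid(row: bytes) -> bool:
    """Return True if any byte in row is one of the listed control codes."""
    return any(byte in INVALID_CODES for byte in row)


print(row_has_invalid(b"plain text\n"))   # False
print(row_has_invalid(b"bad\x03byte\n"))  # True
```

Set membership is a constant-time lookup on average, so this may also help with the speed concern, though that has not been measured here.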

Edit - Solution: The list in the code below is the correct list; it is not necessary to use the literals I showed in my images.

I was looking in the wrong place for the answer; thank you to @Peteris for pointing me in the right direction.

I needed either to switch the file mode to text ('r') or to encode the character I'm checking with char.encode(), so that it is compared against the bytes of the row properly. In my case I need to open the file in binary mode, so I went with char.encode().

    invalid_chrs_list = ['\x00','\x01','\x02','\x03','\x04','\x05','\x06','\x07','\x08','\x10','\x11','\x12','\x13','\x14','\x15','\x16','\x17','\x18','\x19','\x1a','\x1b']

    with open('test.txt', 'rb') as f:
        # Iterate through the rows
        for row in f:
            # Check each row for any of the invalid characters
            for char in invalid_chrs_list:
                if char.encode() in row:
                    print ('found')
                    break
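
As a possible alternative to looping over the list for every row, a single bytes regular expression can cover the same codes in one pass per row. This is only a sketch under the assumption that the rows are bytes objects (as they are when the file is opened with 'rb'); INVALID_RE and find_invalid_rows are hypothetical names, and the sample rows are made up.

```python
import re

# Byte-level pattern for codes 0-8 and 16-27, the same codes as invalid_chrs_list.
INVALID_RE = re.compile(rb"[\x00-\x08\x10-\x1b]")


def find_invalid_rows(lines):
    """Yield (row_number, offset) of the first listed control byte in each row."""
    for number, row in enumerate(lines, start=1):
        match = INVALID_RE.search(row)
        if match:
            yield number, match.start()


# Made-up sample rows; a file object opened with 'rb' could be passed instead.
sample = [b"clean row\n", b"has \x11 here\n", b"\x00leading\n"]
print(list(find_invalid_rows(sample)))  # [(2, 4), (3, 0)]
```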

Solution

  • Make a tiny program that simply writes the bytes you want to a file, converting them to bytes from their ASCII codes.

    But I'd bet that you don't really want to copy/paste them as literal characters into your code. It can't work that way for e.g. the newline character and others; ASCII codes or escaped representations are the proper way to go.
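
A sketch of what such a tiny program might look like, assuming Python 3 and the same codes as in the question (0-8 and 16-27). The function name write_control_chars and the file name control_chars.txt are hypothetical; the temporary directory is used only so the example cleans up after itself.

```python
import os
import tempfile

# Codes 0-8 and 16-27, matching the list in the question.
CODES = list(range(0, 9)) + list(range(16, 28))


def write_control_chars(path):
    """Write each code as a single raw byte to the file at path."""
    with open(path, "wb") as out:
        out.write(bytes(CODES))


# Write to a throwaway location and read the bytes back to check them.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "control_chars.txt")  # hypothetical file name
    write_control_chars(target)
    with open(target, "rb") as check:
        data = check.read()

print(len(data))  # 21
```

To keep the file, call write_control_chars with any path of your choosing instead of the temporary one.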