
Figure out bytes content


I am working on a compound file that contains several streams, and I can't figure out the content of each stream. I don't know whether the bytes are text, MP3, or video. Is there a way to tell what type of data bytes like these hold? For example:

b'\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0bz\xcc\xc9\xc8\xc0\xc0\x00\xc2?\x82\x1e<\x0ec\xbc*8\x19\xc8i\xb3W_\x0b\x14bH\x00\xb2-\x99\x18\x18\xfe\x03\x01\x88\xcf\xc0\x01\xc4\xe1\x0c\xf9\x0cE\x0c\xd9\x0c\xc5\x0c\xa9\x0c%\x0c\x86`\xcd \x0c\x020\x1a\x00\x00\x00\xff\xff\x02\x080\x00\x96L~\x89W\x00\x00\x00\x00\x80(\\B\xefI;\x9e}p\xfe\x1a\xb2\x9b>(\x81\x86/=\xc9xH0:Pwb\xb7\xdck-\xd2F\x04\xd7co'

Solution

  • Yes, there is a way to figure out the content of each stream. Most file formats have a signature, in addition to the file extension, which is not reliable: it might be removed or falsely added.

    So what is the signature?

    In computing, a file signature is data used to identify or verify the contents of a file. In particular, it may refer to:

    • File magic number: bytes within a file used to identify the format of the file; generally a short sequence of bytes (most are 2-4 bytes long) placed at the beginning of the file; see list of file signatures

    • File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file contents, generally against transmission errors or malicious attacks. The signature can be included at the end of the file or in a separate file.

    I used the magic number. To define the term "magic number", I'm copying this from Wikipedia:

    In computer programming, the term magic number has multiple meanings. It could refer to one or more of the following:

    • Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
    • A constant numerical or text value used to identify a file format or protocol; for files, see List of file signatures
    • Distinctive unique values that are unlikely to be mistaken for other meanings (e.g., Globally Unique Identifiers)

    In the second sense, it is a fixed sequence of bytes, like

    PNG (89 50 4E 47 0D 0A 1A 0A) 
    

    or

    BMP (42 4D)
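
As a minimal sketch of how such a check might look in Python: the helper name `identify_by_prefix` and the small `SIGNATURES` table are made up for illustration, and only the PNG and BMP byte values come from the text above. It assumes the signature sits at the very start of the data.

```python
# Hypothetical lookup table holding only the two signatures quoted above.
SIGNATURES = {
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "BMP": bytes.fromhex("424D"),
}


def identify_by_prefix(data):
    """Return the name of the first signature that `data` starts with, else None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


print(identify_by_prefix(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(identify_by_prefix(b"BM\x36\x00\x00\x00"))
print(identify_by_prefix(b"hello"))
```

A short signature such as BMP's two bytes can match by coincidence, so a hit is a hint rather than proof.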
    

    So how do you find the magic number of a file?

    In the article "Investigating File Signatures Using PowerShell", the writer created a wonderful PowerShell function to get the magic number. He also mentioned a tool; I'm copying this from his article:

    PowerShell V5 brings in Format-Hex, which can provide an alternative approach to reading the file and displaying the hex and ASCII value to determine the magic number.

    From the Format-Hex help, I'm copying this description:

    The Format-Hex cmdlet displays a file or other input as hexadecimal values. To determine the offset of a character from the output, add the number at the leftmost of the row to the number at the top of the column for that character.

    This cmdlet can help you determine the file type of a corrupted file or a file which may not have a file name extension. Run this cmdlet, and then inspect the results for file information.

    This tool is also a very good way to get the magic number of a file.
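
If PowerShell is not at hand, a rough Python analogue of that kind of dump can be sketched as below. This is not Format-Hex itself: the function name `hex_dump` and its `width` and `limit` parameters are assumptions chosen for illustration, and the layout only loosely resembles the cmdlet's output.

```python
def hex_dump(data, width=16, limit=64):
    """Return a hex/ASCII dump of the first `limit` bytes, `width` bytes per row."""
    head = data[:limit]
    pad = width * 3 - 1  # room for `width` two-digit values separated by spaces
    rows = []
    for offset in range(0, len(head), width):
        chunk = head[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08X}  {hex_part:<{pad}}  {ascii_part}")
    return "\n".join(rows)


print(hex_dump(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
```

For a file on disk, the same function could be fed the result of reading the first few dozen bytes in binary mode.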

    Another tool is an online hex editor, but to be honest I didn't understand how to use it.

    Now we have the magic number, but how do we know what type of data that file or stream holds? That is the real question. Luckily, there are many databases of these magic numbers. Let me list some:

    1. File Signatures
    2. FILE SIGNATURES TABLE
    3. List of file signatures

    For example, the first database has a search capability. Just enter the magic number with no spaces and search.
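
To get the magic number in that no-spaces form from raw bytes, something like the following could be used. The helper name `magic_for_search` and the default length of 8 bytes are illustrative assumptions; how many leading bytes a given database expects may differ.

```python
def magic_for_search(data, length=8):
    """Hex string of the first `length` bytes, uppercase and without spaces."""
    return data[:length].hex().upper()


print(magic_for_search(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(magic_for_search(b"BM\x36\x00", length=2))
```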


    After that you may find the type. Yes, may. There is a good chance that you won't directly find the file type in question.

    I faced this and solved it by testing the streams against specific signatures. For example, I searched a stream for the PNG signature:

    def GetPngStartingOffset(arr):
    
        # Targeted magic number for PNG (89 50 4E 47 0D 0A 1A 0A)
        pngSignature = b'\x89PNG\r\n\x1a\n'
    
        # bytes() accepts a bytes object as well as a list of ints (0-255),
        # and find() checks every position, including the end of the data.
        startingOffset = bytes(arr).find(pngSignature)
    
        # As before, 0 is returned when the signature is not found, so a
        # result of 0 means either "found at the start" or "not found".
        if startingOffset == -1:
            return 0
        return startingOffset
    

    Once this function finds the magic number, I split the stream at that offset and open the PNG file:

        import io
        from PIL import Image  # Pillow

        arr = stream.read()
        # Keep everything from the PNG signature to the end of the stream
        pngBytes = bytes(arr[GetPngStartingOffset(arr):])
        image = Image.open(io.BytesIO(pngBytes))
        image.show()
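
The same idea can be extended to several formats at once. The sketch below is only an illustration: `find_signatures` and the `SIGNATURES` table are hypothetical names, and the gzip value (1F 8B) is added here and is not part of the answer above. It searches for each signature anywhere in the data, not just at the start.

```python
# Illustrative table; the gzip entry (1F 8B) is an addition for this example.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "bmp": b"BM",
    "gzip": b"\x1f\x8b",
}


def find_signatures(data):
    """Return sorted (offset, name) pairs for the first hit of each known signature."""
    hits = []
    for name, magic in SIGNATURES.items():
        offset = data.find(magic)
        if offset != -1:
            hits.append((offset, name))
    return sorted(hits)


# The first bytes of the stream shown in the question.
sample = (b"\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00"
          b"\x00\x00\x00\x0bz\xcc\xc9\xc8\xc0\xc0")
print(find_signatures(sample))  # [(8, 'gzip')]
```

On the question's bytes this reports a gzip-style signature at offset 8, which suggests the stream may carry gzip-compressed data after an 8-byte prefix. A two-byte match can happen by chance, so the result should be confirmed by actually trying to decode the data.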
    

    In the end, this is not an end-to-end solution, but it is a way to figure out the content of streams. Thanks for reading, and thanks to @Robert Columbia for his patience.