
How to search for a unique sequence in binary data?


I am trying to read a binary file with a header. I know that certain information is stored right after a unique byte sequence, 02 06 08 22 02 02 08 00. How can I find the position of this sequence?

I can use

String StreamReadAsText( ScriptObject stream, Number encoding, Number count )

to read the binary file one character at a time, but I guess that is pretty silly and slow.

Besides, how do I compare the result of StreamReadAsText() when the output is not actual text (values between 00 and 1F in the ASCII table)?

Also, how do I read the binary file as int8 values (the same size as a character in a string)? For example, read 02, then 06, then 08, and so on.

Any help is welcome and appreciated.

Regards,

Roger


Solution

  • You are already on the right track with reading the file using the streaming commands. However, why would you want to read the stream as text? You can read the stream as any (supported) number type, using a TagGroup object as a proxy with TagGroupReadTagDataFromStream().

    There is actually an example in the F1 help-section where the streaming commands are listed, which I'm just copying here.

     Object stream = NewStreamFromBuffer( NewMemoryBuffer( 256 ) )
     TagGroup tg = NewTagGroup();
    
     Number stream_byte_order = 1; // 1 == bigendian, 2 == littleendian
     Number v_uint32_0, v_uint32_1, v_sint32_0, v_uint16_0, v_uint16_1
    
     // Create the tags and initialize with default values
     tg.TagGroupSetTagAsUInt32( "UInt32_0", 0 )
     tg.TagGroupSetTagAsUInt32( "UInt32_1", 0 )
     tg.TagGroupSetTagAsLong( "SInt32_0", 0 )
     tg.TagGroupSetTagAsUInt16( "UInt16_0", 0 )
     tg.TagGroupSetTagAsUInt16( "UInt16_1", 0 )
    
     // Stream the data into the tags   
     TagGroupReadTagDataFromStream( tg, "UInt32_0", stream, stream_byte_order );
     TagGroupReadTagDataFromStream( tg, "UInt32_1", stream, stream_byte_order );
     TagGroupReadTagDataFromStream( tg, "SInt32_0", stream, stream_byte_order );
     TagGroupReadTagDataFromStream( tg, "UInt16_0", stream, stream_byte_order );
     TagGroupReadTagDataFromStream( tg, "UInt16_1", stream, stream_byte_order );
    
    // Show the taggroup, if you want
    // tg.TagGroupOpenBrowserWindow("AuxTags",0)
    
     // Get the data from the tags
     tg.TagGroupGetTagAsUInt32( "UInt32_0", v_uint32_0 )
     tg.TagGroupGetTagAsUInt32( "UInt32_1", v_uint32_1 )
     tg.TagGroupGetTagAsLong( "SInt32_0", v_sint32_0 )
     tg.TagGroupGetTagAsUInt16( "UInt16_0", v_uint16_0 )
     tg.TagGroupGetTagAsUInt16( "UInt16_1", v_uint16_1 )
    

    There is already a post on this site about searching for a pattern within a stream: "Find a pattern image (binary file)". It shows how to use a stream to search within an image, but you can of course use the file stream directly.


    As an alternative, you can read a whole array from the stream with ImageReadImageDataFromStream after preparing a suitable image beforehand. You can then use image expressions to search for the location. Here is an example:

    // Example of reading the first X bytes of a file
    // as uInt16 data
    
    image ReadHeaderAsUint16( string filePath, number nBytes )
    {
        number kEndianness = 0 // Default byte order of the current platform
        if ( !DoesFileExist( filePath ) ) 
            Throw( "File '" + filePath + "' not found." )
        number fileID = OpenFileForReading( filePath )
        object fStream = NewStreamFromFileReference( fileID, 1 )
        if ( nBytes > fStream.StreamGetSize() ) 
            Throw( "File '" + filePath + "' has less than " + nBytes + " bytes." )
    
        image buff := IntegerImage( "Header", 2, 0, nBytes/2 )  // UINT16 array of suitable size
        ImageReadImageDataFromStream( buff, fStream, kEndianness )
        return buff 
    }
    
    number FindSignature( image header, image search )
    {
        // 1D images only
        if (        ( header.ImageGetNumDimensions() != 1 ) \
                ||  ( search.ImageGetNumDimensions() != 1 ) )
            Throw( "Only 1D images supported" )
    
        number sx = search.ImageGetDimensionSize( 0 ) 
        number hx = header.ImageGetDimensionSize( 0 )
        if ( hx < sx )
            return -1
    
        // Create a mask of possible start locations
        number startV = search.getPixel( 0, 0 )
        image mask = (header == startV) ? 1 : 0
    
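        // Check all candidate start positions, beginning with the first
        number mx, my
        while( max( mask, mx, my ) )
        {
            // Compare only if the pattern still fits into the header
            if ( mx + sx <= hx )
            {
                // Absolute differences, so that positive and negative
                // deviations cannot cancel out to a false match
                if ( 0 == sum( abs( header[0,mx,1,mx+sx] - search ) ) )
                    return mx
            }
            // No match here: remove this candidate from the mask
            mask.SetPixel( mx, 0, 0)
        }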
        return -1
    }
    
    // Example
    // 1) Load file header as image (up to the size you want )
    string path = GetApplicationDirectory( "open_save", 0 )
    number maxHeaderSize = 200
    if ( !OpenDialog( NULL, "Select file to open", path, path ) ) Exit(0)
    image headerImg := ReadHeaderAsUint16( path, maxHeaderSize  )
    headerImg.ShowImage()
    
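    // 2) define search-header as image
    //    Note: the header was read as UInt16, so each value here is compared
    //    against one 16-bit element (two file bytes), not against a single byte.
    image search := [8]: { 02, 06, 08, 22, 02, 02, 08, 00 }
    // MatrixPrint( search )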
    
    // 3) search for it in the header
    number foundAt = FindSignature( headerImg, search )
    if ( -1 == foundAt ) 
        Throw( "The file header does not contain the search pattern." )
    else
        OKDialog( "Found the search pattern at offset: " + ( foundAt * 2 ) + " bytes" )  // 2 bytes per UInt16 element
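
    The question asks about single bytes, whereas the example above reads 16-bit values. A byte-wise variant might look like the following sketch. It reuses the commands from the example above. The function names ReadHeaderAsUInt8 and FindByteSequence are hypothetical. Two assumptions are made: that the sequence in the question is a hexadecimal dump (so 22 is written as decimal 34), and that IntegerImage with a data size of 1 and a sign flag of 0 yields unsigned 8-bit pixels.

    ```dm-script
    // Read the first nBytes of a file as unsigned 8-bit values (one pixel per byte)
    image ReadHeaderAsUInt8( string filePath, number nBytes )
    {
        if ( !DoesFileExist( filePath ) )
            Throw( "File '" + filePath + "' not found." )
        number fileID = OpenFileForReading( filePath )
        object fStream = NewStreamFromFileReference( fileID, 1 )
        if ( nBytes > fStream.StreamGetSize() )
            Throw( "File '" + filePath + "' has less than " + nBytes + " bytes." )

        // Assumption: data size 1 and sign flag 0 give an unsigned 1-byte image
        image buff := IntegerImage( "Header", 1, 0, nBytes )
        ImageReadImageDataFromStream( buff, fStream, 0 )  // byte order does not matter for single bytes
        return buff
    }

    // Return the byte offset of the first match, or -1 if there is none
    number FindByteSequence( image data, image pattern )
    {
        number n = data.ImageGetDimensionSize( 0 )
        number m = pattern.ImageGetDimensionSize( 0 )
        for ( number i = 0; i <= n - m; i++ )
        {
            // The sum of absolute differences is zero only if every byte matches
            if ( 0 == sum( abs( data[0,i,1,i+m] - pattern ) ) )
                return i
        }
        return -1
    }

    // Usage: the sequence from the question as decimal values (hex 22 == decimal 34)
    string path = GetApplicationDirectory( "open_save", 0 )
    if ( !OpenDialog( NULL, "Select file to open", path, path ) ) Exit(0)
    image bytes := ReadHeaderAsUInt8( path, 200 )
    image pattern := [8]: { 2, 6, 8, 34, 2, 2, 8, 0 }
    number byteOffset = FindByteSequence( bytes, pattern )
    if ( -1 == byteOffset )
        OKDialog( "The search pattern was not found." )
    else
        OKDialog( "Found the search pattern at byte offset: " + byteOffset )
    ```

    The plain loop tests every start position, so a match at an odd byte offset is found as well, which reading the header as UInt16 would miss. For a header of a few hundred bytes the cost of the loop should be negligible.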