
Is there an equivalent to mmap.mmap.rfind in C#?


While looking at memory-mapped files in C#, I had difficulty identifying how to search a file quickly, both forward and in reverse. My goal is to rewrite the following Python function in C#, but I could not find anything like the find and rfind methods used below. Is there a way in C# to quickly search a memory-mapped file using a particular substring?

#! /usr/bin/env python3
import mmap
import pathlib


# noinspection PyUnboundLocalVariable
def drop_last_line(path):
    with path.open('r+b') as file:
        with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as search:
            for next_line in b'\r\n', b'\r', b'\n':
                if search.find(next_line) >= 0:
                    break
            else:
                raise ValueError('cannot find any line delimiters')
            end_1st = search.rfind(next_line)
            end_2nd = search.rfind(next_line, 0, end_1st - 1)
        file.truncate(0 if end_2nd < 0 else end_2nd + len(next_line))

Solution

  • Is there a way in C# to quickly search a memory-mapped file using a particular substring?

    Do you know of any way to memory-map an entire file in C# and then treat it as a byte array?

    Yes, it's quite easy to map an entire file into a view and then read it into a single byte array, as the following code shows:

    static void Main(string[] args)
    {
        var sourceFile = new FileInfo(@"C:\Users\Micky\Downloads\20180112.zip");
        int length = (int) sourceFile.Length;  // length of target file; the int cast assumes it is under 2 GB
    
        // Create the memory-mapped file.
        using (var mmf = MemoryMappedFile.CreateFromFile(sourceFile.FullName,
                                                         FileMode.Open, 
                                                         "ImgA"))
        {
            var buffer = new byte[length]; // allocate a buffer with the same size as the file
    
            using (var accessor = mmf.CreateViewAccessor())
            {
                var read = accessor.ReadArray(0, buffer, 0, length); // read the whole thing
            }
    
            // let's try searching for a known byte sequence.  Change this to suit your file
            var target = new byte[] { 71, 213, 62, 204, 231 };
    
            var foundAt = IndexOf(buffer, target);
    
        }
    }
    

    I couldn't seem to find any byte searching method in Marshal or Array but you can use this search algorithm courtesy of Social MSDN as a start:

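    private static int IndexOf2(byte[] input, byte[] pattern)
    {
        if (pattern.Length == 0) return 0;

        byte firstByte = pattern[0];
        int  index     = Array.IndexOf(input, firstByte);

        // Try each occurrence of the first byte as a candidate match start.
        while (index >= 0 && index <= input.Length - pattern.Length)
        {
            int i = 1;
            while (i < pattern.Length && input[index + i] == pattern[i]) i++;
            if (i == pattern.Length) return index;   // every byte matched

            // Mismatch: move on to the next occurrence of the first byte.
            index = Array.IndexOf(input, firstByte, index + 1);
        }

        return -1;
    }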
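
    On newer runtimes (.NET Core 2.1 and later, or .NET Framework with the System.Memory package), the span extension methods IndexOf and LastIndexOf in System.MemoryExtensions can search a byte sequence directly, which is probably the closest built-in match for Python's find and rfind. Below is a minimal sketch, assuming such a runtime. The ByteSearch class and the Find/RFind names are hypothetical wrappers, not library APIs; they mimic Python's start/end arguments for non-negative values only.

    ```csharp
    using System;

    public static class ByteSearch
    {
        // Roughly bytes.find(pattern, start, end): first match that lies
        // entirely inside [start, end), or -1 if there is none.
        public static int Find(ReadOnlySpan<byte> data, ReadOnlySpan<byte> pattern, int start, int end)
        {
            start = Math.Max(start, 0);
            end = Math.Min(end, data.Length);
            if (start > end) return -1;

            int hit = data.Slice(start, end - start).IndexOf(pattern);
            return hit < 0 ? -1 : hit + start;   // translate back to an offset in data
        }

        // Roughly bytes.rfind(pattern, start, end): last match that lies
        // entirely inside [start, end), or -1 if there is none.
        public static int RFind(ReadOnlySpan<byte> data, ReadOnlySpan<byte> pattern, int start, int end)
        {
            start = Math.Max(start, 0);
            end = Math.Min(end, data.Length);
            if (start > end) return -1;

            int hit = data.Slice(start, end - start).LastIndexOf(pattern);
            return hit < 0 ? -1 : hit + start;
        }
    }
    ```

    With the buffer and target arrays from the example above, ByteSearch.RFind(buffer, target, 0, buffer.Length) would give the offset of the last occurrence of target; byte arrays convert implicitly to ReadOnlySpan<byte>.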
    

    A more verbose alternative, also courtesy of Social MSDN (same link), checks every possible start position:

    public static int IndexOf(byte[] arrayToSearchThrough, byte[] patternToFind)
    {
        if (patternToFind.Length > arrayToSearchThrough.Length)
            return -1;
        // <= so that a match ending at the very last byte is not skipped
        for (int i = 0; i <= arrayToSearchThrough.Length - patternToFind.Length; i++)
        {
            bool found = true;
            for (int j = 0; j < patternToFind.Length; j++)
            {
                if (arrayToSearchThrough[i + j] != patternToFind[j])
                {
                    found = false;
                    break;
                }
            }
            if (found)
            {
                return i;
            }
        }
        return -1;
    }
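
    Putting the pieces together, the Python function from the question might be ported roughly as follows. This is a sketch under stated assumptions rather than a definitive implementation: it targets a runtime with span support (.NET Core 2.1 or later), assumes the file is non-empty and under 2 GB because it is copied into one byte array as above, and uses the hypothetical names LineTools and DropLastLine. It keeps the question's end_1st - 1 bound as written but does not reproduce Python's wrap-around for negative indexes. The mapping is disposed before truncating, since an open mapping can prevent the file from being resized.

    ```csharp
    using System;
    using System.IO;
    using System.IO.MemoryMappedFiles;

    public static class LineTools
    {
        // Candidate line delimiters, in the same order the Python code tries them.
        private static readonly byte[][] Delimiters =
        {
            new byte[] { (byte)'\r', (byte)'\n' },
            new byte[] { (byte)'\r' },
            new byte[] { (byte)'\n' },
        };

        public static void DropLastLine(string path)
        {
            int length = checked((int)new FileInfo(path).Length);
            if (length == 0)
                throw new InvalidDataException("cannot map an empty file");

            var buffer = new byte[length];

            // Map the file, copy the view into the buffer, then release the mapping.
            using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
            using (var accessor = mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read))
            {
                accessor.ReadArray(0, buffer, 0, length);
            }

            ReadOnlySpan<byte> data = buffer;

            // Pick the first delimiter style that occurs anywhere in the file.
            ReadOnlySpan<byte> delimiter = default;
            foreach (byte[] candidate in Delimiters)
            {
                if (data.IndexOf(candidate) >= 0)
                {
                    delimiter = candidate;
                    break;
                }
            }
            if (delimiter.IsEmpty)
                throw new InvalidDataException("cannot find any line delimiters");

            // Last delimiter, then the last one before it (the rfind calls in Python).
            int end1st = data.LastIndexOf(delimiter);
            int end2nd = data.Slice(0, Math.Max(end1st - 1, 0)).LastIndexOf(delimiter);
            long newLength = end2nd < 0 ? 0 : end2nd + delimiter.Length;

            // Truncate the file just after the second-to-last delimiter.
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write))
            {
                stream.SetLength(newLength);
            }
        }
    }
    ```

    For a file containing the three lines a, b and c, each terminated by \n, the call LineTools.DropLastLine(path) would leave only the first two lines.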