Search code examples
delphidelphi-7binaryfiles

Delphi How to search in binary file faster?


I have a binary file (2.5 MB) and I want to find position of this sequence of bytes: CD 09 D9 F5. Then I want to write some data after this position and also overwrite old data (4 KB) with zeros.

Here is how I do it now but it is a bit slow.

ProcessFile(dataToWrite: string);
var
  fileContent: string;
  f: file of char;
  c: char;
  n, i, startIndex, endIndex: integer;
begin
  AssignFile(f, 'file.bin');
  reset(f);
  n := FileSize(f);
  while n > 0 do
  begin
    Read(f, c);
    fileContent := fileContent + c;
    dec(n);
  end;
  CloseFile(f);

  startindex := Pos(Char($CD)+Char($09)+Char($D9)+Char($F5), fileContent) + 4;
  endIndex := startIndex + 4088;

  Seek(f, startIndex);

  for i := 1 to length(dataToWrite) do
    Write(f, dataToWrite[i]);

  c := #0;
  while (i < endIndex) do
  begin
    Write(f, c); inc(i);
  end;

  CloseFile(f);
end;

Solution

  • Your code to read the entire file into a string is very wasteful. Pascal I/O uses buffering so I don't think it's the byte by byte aspect particularly. Although one big read would be better. The main problem will be the string concatenation and the extreme heap allocation demand required to concatenate the string, one character at a time.

    I'd do it like this:

    function LoadFileIntoString(const FileName: string): string;
    var
      Stream: TFileStream;
    begin
      Stream := TFileStream.Create(FileName, fmOpenRead);
      try
        SetLength(Result, Stream.Size);//one single heap allocation
        Stream.ReadBuffer(Pointer(Result)^, Length(Result));
      finally
        Stream.Free;
      end;
    end;
    

    That alone should make a big difference. When it comes to writing the file, a similar use of strings will be much faster. I've not attempted to decipher the writing part of your code. Writing the new data, and the block of zeros again should be batched up to as few separate writes as possible.

    If ever you find that you need to read or write very small blocks to a file, then I offer you my buffered file streams: Buffered files (for faster disk access).

    The code could be optimised further to read only a portion of the file, and search until you find the target. You may be able to avoid reading the entire file that way. However, I suspect that these changes will make enough of a difference.