
C# - remove blocks of bytes in large binary files


I want a fast way in C# to remove blocks of bytes at different places in a binary file between 500 MB and 1 GB in size. The start offsets and lengths of the blocks to remove are stored in arrays:

int[] rdiDataOffset= {511,15423,21047};
int[] rdiDataSize={102400,7168,512};

EDIT: This is a piece of my code. It does not work correctly unless I set the buffer size to 1:

while(true){
    if (rdiDataOffset.Contains((int)fsr.Position))
    {
        int idxval = Array.IndexOf(rdiDataOffset, (int)fsr.Position, 0, rdiDataOffset.Length);
        int oldRFSRPosition = (int)fsr.Position;
        size = rdiDataSize[idxval];
        fsr.Seek(size, SeekOrigin.Current);

    }
    int bufferSize = size == 0 ? 2048 : size;
    if ((size>0) && (bufferSize > (size))) bufferSize = (size);
    if (bufferSize > (fsr.Length - fsr.Position)) bufferSize = (int)(fsr.Length - fsr.Position);
    byte[] buffer = new byte[bufferSize];
    int nofbytes = fsr.Read(buffer, 0, buffer.Length);
    fsr.Flush();
    if (nofbytes < 1)
    {
        break;
    }
}

Solution

  • A simple algorithm for doing this uses a temp file (it could be done in place as well, but that is riskier if something goes wrong).

    1. Create a new file and call SetLength to set the stream size (if this is too slow you can use interop to call SetFileValidData). This ensures that there is room for the temp file while you are doing the copy.

    2. Sort your removal list in ascending order.

    3. Read from the current location (starting at 0) to the first removal point. The source file should be opened without granting Write share permissions (you don't want someone mucking with it while you are editing it).

    4. Write that content to the new file (you will likely need to do this in chunks).

    5. Skip over the data not being copied.

    6. Repeat from step 3 until done.

    7. You now have two files, the old one and the new one; replace as necessary. If this is really critical data you might want to look at a transactional approach (either one you implement yourself or something like NTFS transactions).

    8. Consider a new design. If this is something you need to do frequently then it might make more sense to have an index in the file (or near the file) which contains a list of inactive blocks - then when necessary you can compress the file by actually removing blocks ... or maybe this IS that process.
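Steps 1 to 6 above could be sketched roughly as follows. This is a minimal illustration, not a tuned implementation: the class name `BlockRemover`, the method `CopyWithoutBlocks`, and the 1 MiB buffer size are all made up for the example. It assumes the offsets are positions in the original file, and it treats overlapping blocks as one merged region. The arrays are `long[]` rather than the `int[]` used in the question.

```csharp
using System;
using System.IO;

static class BlockRemover
{
    // Hypothetical helper: copies sourcePath to destPath while omitting the
    // byte ranges [offsets[i], offsets[i] + sizes[i]) of the source file.
    public static void CopyWithoutBlocks(string sourcePath, string destPath,
                                         long[] offsets, long[] sizes)
    {
        // Step 2: sort the removal list by ascending offset.
        // Work on copies so the caller's arrays are left untouched.
        long[] starts = (long[])offsets.Clone();
        long[] lengths = (long[])sizes.Clone();
        Array.Sort(starts, lengths);

        byte[] buffer = new byte[1 << 20]; // 1 MiB chunks (assumed size, tune as needed)

        // FileShare.Read lets others read the source but not write to it.
        using (var src = new FileStream(sourcePath, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var dst = new FileStream(destPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            // Step 1: reserve room for the new file up front.
            dst.SetLength(src.Length);

            for (int i = 0; i < starts.Length; i++)
            {
                // Steps 3 and 4: copy everything up to the next removal point.
                long start = Math.Min(starts[i], src.Length);
                if (start > src.Position)
                    CopyBytes(src, dst, start - src.Position, buffer);

                // Step 5: skip the block (never seek backwards on overlap).
                long end = Math.Min(starts[i] + lengths[i], src.Length);
                if (end > src.Position)
                    src.Position = end;
            }

            // Copy whatever follows the last removed block.
            CopyBytes(src, dst, src.Length - src.Position, buffer);

            // Trim the reserved space down to the bytes actually written.
            dst.SetLength(dst.Position);
        }
    }

    // Copies exactly count bytes from src to dst in buffer-sized chunks.
    private static void CopyBytes(Stream src, Stream dst, long count, byte[] buffer)
    {
        while (count > 0)
        {
            int read = src.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
            if (read == 0) throw new EndOfStreamException();
            dst.Write(buffer, 0, read);
            count -= read;
        }
    }
}
```

With the arrays from the question, a call might look like `BlockRemover.CopyWithoutBlocks(path, tempPath, Array.ConvertAll(rdiDataOffset, x => (long)x), Array.ConvertAll(rdiDataSize, x => (long)x))`, where `path` and `tempPath` are placeholders. For step 7, `File.Replace(tempPath, path, backupPath)` is one way to swap the new file in while keeping a backup of the old one.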