I'm working with an arcane data collection filesystem. It's got a block describing the files and their exact offsets on disk, so I know each file's start byte, end byte and length in bytes. The goal is to grab one file from the physical disk. They're big files, so performance is paramount.
Here's what "works," but very inefficiently:
    import shutil, io

    def start_copy(startpos, endpos, filename="C:\\out.bin"):
        with open(r"\\.\PhysicalDrive1", 'rb') as src_f:
            src_f.seek(startpos)
            flength = endpos - startpos
            print("Starting copy of "+filename+" ("+str(flength)+"B)")
            with open(filename, 'wb') as dst_f:
                shutil.copyfileobj( io.BytesIO(src_f.read(flength)), dst_f )
            print("Finished copy of "+filename)
This is slow: io.BytesIO(src_f.read(flength)) technically works, but it reads the entire file into memory before writing to the destination file, so it takes much longer than it should.
Copying directly from src_f to dst_f won't work: (I assume) the end position can't be specified, so the copy doesn't stop.
Here are some questions, each of which could be a solution to this:
- Is there a built-in function or external tool (for example, one callable via subprocess) that takes start/end byte arguments?
- Is there a file-like object that copyfileobj can use, which references just a portion of another file-like object?
- Can an EOF be simulated when an io object seeks past a certain end point?
- Can copyfileobj be forced to naturally stop at a given byte offset of the drive (a sort of "fake EOF")?

The obvious way to do this is to just write to the file.
The whole point of copyfileobj is that it buffers the data for you. If you have to read the whole file into a BytesIO, you're just buffering the BytesIO, which is pointless.
So, just loop, reading a decent-sized buffer from src_f and writing it to dst_f, until you reach flength bytes.
If you look at the shutil source (which is linked from the shutil docs), there's no magic inside copyfileobj; it's a trivial function. As of 3.6 (and I think it's been completely unchanged since shutil was added somewhere around 2.1…), it looks like this:
    def copyfileobj(fsrc, fdst, length=16*1024):
        """copy data from file-like object fsrc to file-like object fdst"""
        while 1:
            buf = fsrc.read(length)
            if not buf:
                break
            fdst.write(buf)
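One thing the listing makes visible: the length parameter is only the chunk size passed to each read call, not a limit on the total bytes copied, so it can't be used to stop the copy early. A minimal sketch with in-memory streams (the names src and dst are just for illustration):

```python
import io
import shutil

src = io.BytesIO(b"0123456789")
dst = io.BytesIO()

# The third argument sets how many bytes each read() asks for;
# the loop still runs until the source reports end of file.
shutil.copyfileobj(src, dst, 4)

print(dst.getvalue())  # all ten bytes, not just the first four
```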
You can do the same thing, just keeping track of bytes read and stopping at flength:
    def copypartialfileobj(fsrc, fdst, size, length=16*1024):
        """copy size bytes from file-like object fsrc to file-like object fdst"""
        written = 0
        while written < size:
            buf = fsrc.read(min(length, size - written))
            if not buf:
                break
            fdst.write(buf)
            written += len(buf)
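To show how this might slot into the question's code, here is a rough sketch. The helper copy_range and the temporary file paths are hypothetical names made up for the example, and returning the byte count from copypartialfileobj is an addition so the caller can detect a short copy. It has only been exercised against an ordinary file standing in for the drive, so treat behavior on a raw device such as \\.\PhysicalDrive1 as an assumption.

```python
import os
import tempfile


def copypartialfileobj(fsrc, fdst, size, length=16 * 1024):
    """Copy up to size bytes from fsrc to fdst; return the number copied."""
    written = 0
    while written < size:
        buf = fsrc.read(min(length, size - written))
        if not buf:  # source ended before size bytes were available
            break
        fdst.write(buf)
        written += len(buf)
    return written


def copy_range(src_path, dst_path, startpos, endpos):
    """Copy bytes [startpos, endpos) of src_path into dst_path (hypothetical helper)."""
    with open(src_path, "rb") as src_f, open(dst_path, "wb") as dst_f:
        src_f.seek(startpos)
        return copypartialfileobj(src_f, dst_f, endpos - startpos)


# Demo: a regular temporary file plays the role of the physical drive.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "disk.img")
dst_path = os.path.join(workdir, "out.bin")
with open(src_path, "wb") as f:
    f.write(bytes(range(256)) * 1024)  # 256 KiB of known data

copied = copy_range(src_path, dst_path, startpos=1000, endpos=101000)
print(copied)
```

Only one chunk of at most length bytes is held in memory at a time, so memory use stays flat regardless of how large the requested range is. If the chunk size matters for throughput on a particular drive, the length argument is the knob to experiment with.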