From what I understand, the buffer_size argument to io.BufferedReader is supposed to control the size of the read buffer passed to the underlying raw stream. However, I'm not seeing that behavior. Instead, when I reader.read() the entire file, io.DEFAULT_BUFFER_SIZE is used and buffer_size is ignored. When I reader.read(length), length is used as the buffer size, and the buffer_size argument is again ignored.

Minimal example:
import io

class MyReader(io.RawIOBase):
    def __init__(self, length):
        self.length = length
        self.position = 0

    def readinto(self, b):
        print('read buffer length: %d' % len(b))
        length = min(len(b), self.length - self.position)
        self.position += length
        b[:length] = b'a' * length  # must be bytes, not str, in Python 3
        return length

    def readable(self):
        return True

    def seekable(self):
        return False
print('# read entire file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read()))

print('\n# read part of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(10000)))

print('\n# read beyond end of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(30000)))
Outputs:
# read entire file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
output length: 20000
# read part of file
read buffer length: 10000
output length: 10000
# read beyond end of file
read buffer length: 30000
read buffer length: 10000
output length: 20000
Am I misunderstanding how the BufferedReader is supposed to work?
The point of io.BufferedReader is to keep an internal buffer, and buffer_size sets the size of that buffer. The buffer is used to satisfy smaller reads, avoiding many separate read calls to a slower I/O device.

The buffer does not, however, try to limit the size of reads!
From the io.BufferedReader documentation:
When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads.
The object inherits from io.BufferedIOBase, which states:

The main difference with RawIOBase is that methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call.
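That guarantee can be observed directly with a raw stream that only ever returns short reads. This is a sketch, not from the original post; ShortReader is a made-up helper that hands back at most 10 bytes per readinto() call, yet a single buffered read(50) still returns all 50 bytes:

```python
import io

class ShortReader(io.RawIOBase):
    """Raw stream that returns at most 10 bytes per readinto() call."""
    def __init__(self, length):
        self.length = length
        self.position = 0

    def readable(self):
        return True

    def readinto(self, b):
        n = min(len(b), 10, self.length - self.position)
        b[:n] = b'a' * n
        self.position += n
        return n

reader = io.BufferedReader(ShortReader(100), buffer_size=16)
data = reader.read(50)  # buffered layer loops over several short raw reads
print(len(data))        # 50 -- the full requested amount
```

The buffered layer keeps calling the raw stream until the request is satisfied or the raw stream signals EOF by returning 0.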
Because you called .read() on the object, larger blocks are read from the wrapped object to read all data to the end. The internal buffer that the BufferedReader() instance holds doesn't come into play here; you asked for all the data, after all.
The buffer would come into play if you read in smaller blocks:
>>> reader = io.BufferedReader(MyReader(2048), buffer_size=512)
>>> __ = reader.read(42) # initial read, fill buffer
read buffer length: 512
>>> __ = reader.read(123) # within the buffer, no read to underlying file needed
>>> __ = reader.read(456) # deplete buffer, another read needed to re-fill
read buffer length: 512
>>> __ = reader.read(123) # within the buffer, no read to underlying file needed
>>> __ = reader.read() # read until end, uses larger blocks to read from wrapped file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
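You can also observe the internal buffer with peek(), which is part of the standard BufferedReader API: it triggers at most one raw read of up to buffer_size bytes and does not advance the stream position. A sketch, reusing the toy raw stream from the question (with the bytes-literal fix):

```python
import io

class MyReader(io.RawIOBase):
    """Same toy raw stream as in the question, using a bytes literal."""
    def __init__(self, length):
        self.length = length
        self.position = 0

    def readable(self):
        return True

    def readinto(self, b):
        print('read buffer length: %d' % len(b))
        n = min(len(b), self.length - self.position)
        self.position += n
        b[:n] = b'a' * n
        return n

reader = io.BufferedReader(MyReader(2048), buffer_size=512)
buffered = reader.peek(4)  # at most one raw read; fills the internal buffer
data = reader.read(4)      # served from the buffer, no raw read needed here
print('peeked: %d, read: %d' % (len(buffered), len(data)))
```

Note that peek() may return more bytes than requested (everything currently buffered), which is another way of seeing that buffer_size governs the buffer, not the size of your reads.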