Tags: python, amazon-s3, avro, bytesio

Reading in-memory Avro file from S3: 'AttributeError:'


I'm trying to read Avro files stored in S3 by a vendor and write to a DW. See code below. (Was roughly working from this S/O thread.)

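import io
from avro.datafile import DataFileReader
from avro.io import DatumReader

obj = obj.get()  # obj is the S3 object (e.g. a boto3 s3.Object)
raw_bytes = obj["Body"].read()
avro_bytes = io.BytesIO(raw_bytes)
reader = DataFileReader(avro_bytes, DatumReader())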

The code is tripped up at the last line, where I get the error: AttributeError: '_io.StringIO' object has no attribute 'mode'

That error comes from this spot in the source code, where DataFileReader is initialized.

def __init__(self, reader: IO[AnyStr], datum_reader: avro.io.DatumReader) -> None:
    if "b" not in reader.mode:
        warnings.warn(avro.errors.AvroWarning(f"Reader binary data from a reader {reader!r} that's opened for text"))
    bytes_reader = getattr(reader, "buffer", reader)

I've tried using avro_bytes as StringIO as well to see if that would help, but it didn't.

Any ideas how to get past that AttributeError?


Solution

  • This is a bug in avro 1.11.0. It has been fixed, but a release containing the fix hasn't been published yet: https://issues.apache.org/jira/browse/AVRO-3252.

    To resolve this, you can do one of the following:

    1. Wait until the new version is released.
    2. Patch DataFileReader.__init__ so that it skips the reader.mode check.
    3. Instead of using BytesIO directly, make your own wrapper object that mimics BytesIO but has a mode attribute.
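
As a rough sketch of option 3, a small BytesIO subclass that carries a mode attribute should be enough to get past the check quoted in the question. The names ModeBytesIO and read_avro_records are made up for this illustration and are not part of avro; the helper assumes the avro package is installed and imports it only when called.

```python
import io


class ModeBytesIO(io.BytesIO):
    """Hypothetical wrapper: a BytesIO that also exposes a file-like mode."""

    # DataFileReader only checks that "b" is in reader.mode,
    # so a constant class attribute is sufficient.
    mode = "rb"


def read_avro_records(raw_bytes):
    """Hypothetical helper: decode all records from in-memory Avro bytes."""
    # Imported lazily so the class above can be used without avro installed.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    reader = DataFileReader(ModeBytesIO(raw_bytes), DatumReader())
    try:
        return list(reader)
    finally:
        reader.close()


# The wrapper behaves like a normal BytesIO, plus the extra attribute.
buf = ModeBytesIO(b"abc")
print("b" in buf.mode)
```

In the question's code this would replace io.BytesIO(raw_bytes) with ModeBytesIO(raw_bytes). Once a release containing the fix is available, upgrading is the simpler route and the wrapper can be dropped.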