Trying to read an xlsx file from an S3 bucket. Lambda function is called but I get the following 'seek' error
[ERROR] UnsupportedOperation: seek
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 19, in lambda_handler
df = pd.read_excel(xl_file\['Body'\], dtype_backend='pyarrow', header=\[1,2\])
File "/opt/python/pandas/io/excel/\_base.py", line 495, in read_excel
io = ExcelFile(
File "/opt/python/pandas/io/excel/\_base.py", line 1550, in __init__
ext = inspect_excel_format(
File "/opt/python/pandas/io/excel/\_base.py", line 1406, in inspect_excel_format
stream.seek(0)
...and here is my lambda function (the part that is relevant). Header is spread across rows 1 and 2. Code runs fine in it's local form (i.e. when pulling xlsx file from a local directory)
import json
import boto3
import io
import csv
import pandas as pd
s3Client = boto3.client('s3')
def lambda_handler(event, context):
bucket = event['Records'][0]['s3']['bucket']['name']
key = event['Records'][0]['s3']['object']['key']
xl_file = s3Client.get_object(Bucket = bucket, Key = key)
df = pd.read_excel(xl_file['Body'], dtype_backend='pyarrow', header=[1,2])
Working code as follows
import boto3
import io
import csv
import pandas as pd
from io import BytesIO
s3Client = boto3.client('s3')
def lambda_handler(event, context):
bucket = event['Records'][0]['s3']['bucket']['name']
key = event['Records'][0]['s3']['object']['key']
xl_file = s3Client.get_object(Bucket = bucket, Key = key)
xl_file_body = io.BytesIO(xl_file['Body'].read())
df = pd.read_excel(xl_file_body, dtype_backend='pyarrow', header=[1,2])