I am trying to write a unit test for a function that uses `pd.read_parquet()`, and I am struggling to make it work. Here is my code:
```python
from moto import mock_aws
import pandas as pd
import pytest
import datetime as dt
import boto3
from my_module import foo


@pytest.fixture
def mock_df():
    cols = [
        "timestamp",
        "value"
    ]
    values = [
        [dt.datetime(2024, 1, 1, 0), 2.57],
        [dt.datetime(2024, 1, 1, 1), 1.41],
        [dt.datetime(2024, 1, 1, 2), 2.06],
    ]
    df = pd.DataFrame(values, columns=cols)
    return df


@mock_aws
def test_download(mock_df):
    bucket_name = "test-input-bucket"
    s3 = boto3.resource("s3", region_name="us-east-1")
    s3.create_bucket(Bucket=bucket_name)
    key1 = "s3://test-input-bucket/path/to/data.parquet"
    mock_df.to_parquet(key1)  # code fails already here
    foo()  # uses pd.read_parquet()
```
But I am getting this error:

```
OSError: When initiating multiple part upload for key 'path/to/data.parquet' in bucket 'test-input-bucket': AWS Error INVALID_ACCESS_KEY_ID during CreateMultipartUpload operation: The AWS Access Key Id you provided does not exist in our records.
```
I get the same error whether I use `to_parquet` or `read_parquet`. Everything works fine if I use something different for the upload and download, for example:

```python
s3_bucket.put_object(Key=key1, Body=mock_df.to_parquet())
```
However, replacing the pandas functions is not possible in my situation, so I need a way to mock S3 while still using them. Is there a way to make moto work with these functions?
EDIT: I am using these versions:

- boto3 1.28.64
- botocore 1.31.64
- moto 5.0.3
This fixed the issue on my end, though I am not 100% sure it applies to every case.
On our end, we also narrowed the issue down to using `to_parquet` or `read_parquet` in tests.
Using `fastparquet` as the engine (`engine='fastparquet'`) seemed to solve it, but for us that wasn't always possible.
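If, as in the question, the `to_parquet`/`read_parquet` calls themselves cannot be changed, one way to try the fastparquet route from the test side is pandas' global `io.parquet.engine` option: calls that keep the default `engine="auto"` look it up. This is a minimal sketch, assuming `fastparquet` is installed (and `s3fs` for `s3://` paths); it only switches the engine and does not by itself guarantee that moto intercepts the S3 traffic. `foo` is the function under test from the question.

```python
import pandas as pd

# to_parquet()/read_parquet() default to engine="auto", which reads this
# option, so inside the block they use fastparquet instead of pyarrow.
with pd.option_context("io.parquet.engine", "fastparquet"):
    print(pd.get_option("io.parquet.engine"))  # prints: fastparquet
    # foo()  # the question's function under test would run here

# On exit, the previous value of the option is restored.
print(pd.get_option("io.parquet.engine"))
```

In a pytest suite, the same `with` block can wrap the `foo()` call inside the test, or live in a fixture that yields inside it.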
For some reason, `pyarrow` insists on credentials, whereas other packages don't care. Setting fake credentials and making sure they are in place before the connection is created did the trick for us.

So something like this:
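```python
import os
import boto3
import pytest
from moto import mock_aws

@pytest.fixture
def aws_credentials():
    """Mocked AWS Credentials for moto."""
    os.environ["AWS_ACCESS_KEY_ID"] = "testing"
    os.environ["AWS_SECRET_ACCESS_KEY"] = "testing"
    os.environ["AWS_SECURITY_TOKEN"] = "testing"
    os.environ["AWS_SESSION_TOKEN"] = "testing"

@pytest.fixture
def s3_client(aws_credentials):
    # The fake credentials are set first; then the mock starts and the client is created
    with mock_aws():
        conn = boto3.client("s3", region_name="us-east-1")
        yield conn

def test_file(s3_client, mock_df):
    s3_client.create_bucket(Bucket="testbucket")
    # write and read the parquet file here, as in the question
```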
This allowed us to access the bucket. Let me know if it works on your end as well.
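One side effect of the `aws_credentials` fixture above is that the fake values stay in `os.environ` after the test finishes, so they can leak into unrelated tests. Below is a minimal sketch of a variant that uses only the standard library's `unittest.mock.patch.dict`, which restores the previous environment on exit. `FAKE_AWS_ENV` and `fake_aws_credentials` are hypothetical names, and the values simply mirror the ones in the answer.

```python
import os
from unittest import mock

# Hypothetical name: the same dummy values the aws_credentials fixture uses.
FAKE_AWS_ENV = {
    "AWS_ACCESS_KEY_ID": "testing",
    "AWS_SECRET_ACCESS_KEY": "testing",
    "AWS_SECURITY_TOKEN": "testing",
    "AWS_SESSION_TOKEN": "testing",
}


def fake_aws_credentials():
    """Return a context manager that sets the fake AWS variables and
    restores the previous contents of os.environ when it exits."""
    return mock.patch.dict(os.environ, FAKE_AWS_ENV)


if __name__ == "__main__":
    with fake_aws_credentials():
        print(os.environ["AWS_ACCESS_KEY_ID"])  # prints: testing
```

Inside a pytest fixture this becomes `with fake_aws_credentials(): yield`, so the variables exist only for the duration of the test. pytest's built-in `monkeypatch.setenv` gives the same set-and-restore behavior.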