Tags: python, amazon-s3, boto3, dask, python-s3fs

How to read parquet file from s3 using dask with specific AWS profile


How can I read a parquet file on S3 using Dask with a specific AWS profile (stored in a credentials file)? Dask uses s3fs, which uses boto3. This is what I have tried:

>>> import os
>>> import s3fs
>>> import boto3
>>> import dask.dataframe as dd

>>> os.environ['AWS_SHARED_CREDENTIALS_FILE'] = "~/.aws/credentials"

>>> fs = s3fs.S3FileSystem(anon=False, profile_name="some_user_profile")
>>> fs.exists("s3://some.bucket/data/parquet/somefile")
True
>>> df = dd.read_parquet('s3://some.bucket/data/parquet/somefile')
NoCredentialsError: Unable to locate credentials
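
The exists() call works because fs was built with the profile, but dd.read_parquet constructs its own S3FileSystem internally and never sees that instance, hence the NoCredentialsError. As a minimal sketch, assuming a recent Dask (the filesystem keyword arrived around 2023.6) and a recent s3fs (where the session keyword is profile rather than profile_name), the hand-built filesystem can also be passed in directly; the bucket, path, and profile names are placeholders:

>>> import s3fs
>>> import dask.dataframe as dd
>>> # Build the filesystem with the profile once, then hand it to Dask
>>> # so read_parquet does not create its own credential-less instance.
>>> fs = s3fs.S3FileSystem(profile="some_user_profile")
>>> df = dd.read_parquet("s3://some.bucket/data/parquet/somefile",
...                      filesystem=fs)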

Solution

  • Never mind, that was easy, but I did not find any reference online, so here it is: the profile goes into storage_options, which Dask passes down to s3fs:

    >>> import os
    >>> import dask.dataframe as dd
    >>> os.environ['AWS_SHARED_CREDENTIALS_FILE'] = "/path/to/credentials"

    >>> df = dd.read_parquet('s3://some.bucket/data/parquet/somefile',
    ...                      storage_options={"profile_name": "some_user_profile"})
    >>> df.head()
    # works
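
    One caveat: newer s3fs releases (0.5+, I believe) renamed the session keyword from profile_name to profile, so on a current stack the storage_options key would look like the sketch below; check which one your s3fs version expects:

    >>> # On newer s3fs the storage_options key is "profile", not "profile_name".
    >>> df = dd.read_parquet('s3://some.bucket/data/parquet/somefile',
    ...                      storage_options={"profile": "some_user_profile"})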