Adding tags to S3 objects using awswrangler?


I'm using awswrangler to write Parquet files to S3, and I usually tag all my objects for access and cost control, but I couldn't find a way to do that directly with awswrangler. I'm currently testing with the code below:

import awswrangler as wr
import boto3
import pandas as pd

# Boto session
session = boto3.Session(profile_name='my_profile')

# Dummy pandas dataframe
d = {'col1': [1, 2], 'col2': [3, 4]}
df_pandas = pd.DataFrame(data=d)

wr.s3.to_parquet(df=df_pandas, path='s3://my-bucket/path/', boto3_session=session)

Is there a way to add tags to the objects that .to_parquet writes to S3?


Solution

  • I just figured out that awswrangler has a parameter called s3_additional_kwargs, which passes additional arguments to the S3 requests that awswrangler makes for you. You can send tags the same way as in boto3, as a URL-encoded query string such as 'Key1=value1&Key2=value2'.

    Below is an example of how to add tags to your objects:

    import awswrangler as wr
    import boto3
    import pandas as pd
    
    # Tagging
    tag_set = 'Key1=value1&Key2=value2'
    
    # Boto session
    session = boto3.Session(profile_name='my_profile')
    
    # Dummy pandas dataframe
    d = {'col1': [1, 2], 'col2': [3, 4]}
    df_pandas = pd.DataFrame(data=d)
    
    wr.s3.to_parquet(df=df_pandas, path='s3://my-bucket/path/', s3_additional_kwargs={'Tagging': tag_set}, boto3_session=session)
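
If the tags live in a dict rather than a hand-written string, a small helper can build the 'Key1=value1&Key2=value2' form and percent-encode any special characters. This is only a sketch: build_tagging and read_tags are hypothetical names, not part of awswrangler or boto3, and read_tags assumes you pass it a boto3 S3 client (for example session.client('s3')) so you can check the tags after the write.

```python
from urllib.parse import quote, urlencode


def build_tagging(tags):
    """Turn a dict of tags into a URL-encoded 'Key1=value1&Key2=value2' string.

    Hypothetical helper for illustration; not part of awswrangler or boto3.
    """
    # S3 allows at most 10 tags per object, so fail early instead of at upload.
    if len(tags) > 10:
        raise ValueError("S3 objects can carry at most 10 tags")
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return urlencode(tags, quote_via=quote)


def read_tags(s3_client, bucket, key):
    """Return the tags of one object as a dict, using boto3's get_object_tagging.

    s3_client is assumed to be a boto3 S3 client.
    """
    response = s3_client.get_object_tagging(Bucket=bucket, Key=key)
    return {tag["Key"]: tag["Value"] for tag in response["TagSet"]}


tag_set = build_tagging({"Key1": "value1", "Key2": "value2"})
print(tag_set)
```

The resulting string can then be passed exactly as in the example above, i.e. s3_additional_kwargs={'Tagging': tag_set}.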