python, amazon-web-services, amazon-s3, amazon-redshift, boto

Unload to S3 with Python using IAM Role credentials


In Redshift, I run the following to unload data from a table into a file in S3:

unload('select * from table')
to 's3://bucket/unload/file_'
iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>'

I would like to do the same in Python. Any suggestions on how to replicate this? I saw examples using an access key and secret, but that is not an option for me; I need to use role-based credentials on a non-public bucket.


Solution

  • You will need two sets of credentials: IAM credentials, supplied via an IAM role, so that Redshift can access the S3 bucket, and Redshift database credentials to execute SQL commands.

    Create a Python program that connects to Redshift, in much the same way as you would connect to another database such as SQL Server, and execute your query. This program needs Redshift login credentials (username and password), not IAM credentials.

    The IAM credentials for S3 are assigned as a role to Redshift so that Redshift can store the results on S3. This is the iam_role 'arn:aws:iam::<aws-account-id>:role/<role_name>' part of the Redshift query in your question.

    You do not need boto3 (or boto) to access Redshift, unless you plan to actually interface with the Redshift API (which does not access the database stored inside Redshift).

    Here is an example Python program to access Redshift (credit due to Varun Verma).

    There are other examples on the Internet to help you get started.

    ############ REQUIREMENTS ####################
    # sudo apt-get install python-pip 
    # sudo apt-get install libpq-dev
    # sudo pip install psycopg2
    # sudo pip install sqlalchemy
    # sudo pip install sqlalchemy-redshift
    ##############################################
    
    import sqlalchemy as sa
    from sqlalchemy.orm import sessionmaker
    
    #>>>>>>>> MAKE CHANGES HERE <<<<<<<<<<<<< 
    DATABASE = "dbname"
    USER = "username"
    PASSWORD = "password"
    HOST = "host"
    PORT = ""
    SCHEMA = "public"      #default is "public" 
    
    ####### connection and session creation ############## 
    connection_string = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER,PASSWORD,HOST,str(PORT),DATABASE)
    engine = sa.create_engine(connection_string)
    session = sessionmaker()
    session.configure(bind=engine)
    s = session()
    SetPath = "SET search_path TO %s" % SCHEMA
    s.execute(SetPath)
    ###### All Set Session created using provided schema  #######
    
    ################ write queries from here ###################### 
    query = "unload('select * from table') to 's3://bucket/unload/file_' iam_role 'arn:aws:iam:<aws-account-id>:role/<role_name>';"
    rr = s.execute(query)
    all_results =  rr.fetchall()
    
    def pretty(all_results):
        for row in all_results :
            print "row start >>>>>>>>>>>>>>>>>>>>"
            for r in row :
                print " ----" , r
            print "row end >>>>>>>>>>>>>>>>>>>>>>"
    
    
    pretty(all_results)
    
    
    ########## close session in the end ###############
    s.close()
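
    Once the UNLOAD has run, you can verify the output files from Python with boto3, which resolves role-based credentials automatically (for example from an EC2 instance profile or an assumed role), so no access key or secret appears in the code. Below is a minimal sketch, assuming the bucket name and prefix from the question:

    import boto3
    
    # boto3 picks up credentials from the environment, an instance profile, or
    # an assumed role automatically; nothing is hard-coded here.
    s3 = boto3.client("s3")
    
    # List the part files that the UNLOAD wrote under the given prefix.
    response = s3.list_objects_v2(Bucket="bucket", Prefix="unload/file_")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])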