Search code examples
pythonamazon-web-servicesboto3amazon-quicksight

How to schedule or automate dataset refresh in aws quicksight


What are the options available to schedule or automate refresh of a quicksight SPICE dataset?

Are there any APIs available to automate spice datatset refresh? preferably using python.


Solution

  • You have two options,

    - Using API services available in the latest version of boto3

    Use 'create_ingestion' method to initiate dataset refresh, and use 'describe_ingestion' to check the status of refresh

    import boto3
    import time
    import sys
    client = boto3.client('quicksight')
    response = client.create_ingestion(DataSetId='<dataset-id>',IngestionId='<ingetion-id>',AwsAccountId='<aws-account-id>')
    while True:
        response = client.describe_ingestion(DataSetId='<dataset-id>',IngestionId='<ingetion-id>',AwsAccountId='<aws-account-id>')
            if response['Ingestion']['IngestionStatus'] in ('INITIALIZED', 'QUEUED', 'RUNNING'):
                time.sleep(10) #change sleep time according to your dataset size
            elif response['Ingestion']['IngestionStatus'] == 'COMPLETED':
                print("refresh completed. RowsIngested {0}, RowsDropped {1}, IngestionTimeInSeconds {2}, IngestionSizeInBytes {3}".format(
                    response['Ingestion']['RowInfo']['RowsIngested'],
                    response['Ingestion']['RowInfo']['RowsDropped'],
                    response['Ingestion']['IngestionTimeInSeconds'],
                    response['Ingestion']['IngestionSizeInBytes']))
                break
            else:
                print("refresh failed! - status {0}".format(response['Ingestion']['IngestionStatus']))
                sys.exit(1)
    

    DataSetId of dataset can be found from aws URI or use 'list_data_sets' method to list all datasets and get DataSetId from the field ['DataSetSummaries']['DataSetId'] method call response

    IngestionId - set unique id, I used current time in epoch [str(int(time.time()))]

    - Schedule refresh using schedule option in quicksight dataset

    You can schedule refreshes for 'hourly', 'daily', 'weekly' or 'monthly' cadence using schedule option in quicksight-dataset

    enter image description here