
How to change name of a table created by AWS Glue crawler using boto3


I'm trying to change the name of a table created by an AWS Glue crawler using boto3. Here is the code:

import boto3

database_name = "eventbus"
table_name = "enrollment_user_enroll_cancel_1_0_0"
new_table_name = "enrollment_user_enroll_cancel"

client = boto3.client("glue", region_name='us-west-1')
response = client.get_table(DatabaseName=database_name, Name=table_name)
table_input = response["Table"]
table_input["Name"] = new_table_name
print(table_input)
print(table_input["Name"])

table_input.pop("CreatedBy")
table_input.pop("CreateTime")
table_input.pop("UpdateTime")
client.create_table(DatabaseName=database_name, TableInput=table_input)

I'm getting the error below:

botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in TableInput: "DatabaseName", must be one of: Name, Description, Owner, LastAccessTime, LastAnalyzedTime, Retention, StorageDescriptor, PartitionKeys, ViewOriginalText, ViewExpandedText, TableType, Parameters
Unknown parameter in TableInput: "IsRegisteredWithLakeFormation", must be one of: Name, Description, Owner, LastAccessTime, LastAnalyzedTime, Retention, StorageDescriptor, PartitionKeys, ViewOriginalText, ViewExpandedText, TableType, Parameters

Could you please let me know the resolution for this issue? Thanks!


Solution

  • To get rid of the botocore.exceptions.ParamValidationError thrown by client.create_table, you need to delete the keys the error complains about (DatabaseName and IsRegisteredWithLakeFormation) from table_input, the same way you did with CreatedBy, CreateTime and UpdateTime:

    ...
    
    table_input.pop("DatabaseName")
    table_input.pop("IsRegisteredWithLakeFormation")
    
    client.create_table(DatabaseName=database_name, TableInput=table_input)
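Popping keys one by one breaks whenever get_table() returns a field you did not expect (depending on the botocore version, the Table structure can also contain keys such as CatalogId or VersionId). A minimal sketch of the opposite approach: keep only the keys TableInput accepts. It assumes the allowed keys are exactly those listed in the ParamValidationError above; newer botocore versions may accept a few more. to_table_input and TABLE_INPUT_KEYS are hypothetical names.

```python
from typing import Any, Dict

# Assumption: the keys TableInput accepts, copied from the
# ParamValidationError message above. Newer botocore versions may accept more.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters",
)


def to_table_input(table: Dict[str, Any], new_name: str) -> Dict[str, Any]:
    """Build a TableInput dict from a get_table() "Table" value.

    Only whitelisted keys are copied, so read-only fields such as
    DatabaseName, CreatedBy or IsRegisteredWithLakeFormation are dropped.
    The original dict is not modified.
    """
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    table_input["Name"] = new_name
    return table_input
```

With the question's variables, the call becomes `client.create_table(DatabaseName=database_name, TableInput=to_table_input(response["Table"], new_table_name))`.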
    

    If your original table had partitions that you want to add to the new table, you need to use a similar approach. First, retrieve the metadata of those partitions, e.g. with get_partitions() or batch_get_partition().

    Note: depending on which one you choose, you will need to pass different parameters. There is also a limit on how many partitions a single request returns; if I remember correctly, it is around 200 or so. On top of that, you may need a paginator to list all of the available partitions, which is the case whenever the table has more partitions than a single response returns.
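Pagination here means calling get_partitions() repeatedly and passing back the NextToken from each response until a response no longer contains one; the get_partitions paginator does exactly this for you. A minimal sketch of the manual loop, assuming a boto3 Glue client is passed in; get_all_partitions is a hypothetical helper name.

```python
from typing import Any, Dict, List


def get_all_partitions(glue_client: Any, database_name: str,
                       table_name: str) -> List[Dict[str, Any]]:
    """Return every partition of a Glue table by following NextToken by hand."""
    partitions: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"DatabaseName": database_name, "TableName": table_name}
    while True:
        page = glue_client.get_partitions(**kwargs)
        partitions.extend(page["Partitions"])
        token = page.get("NextToken")
        if not token:
            # No token means this was the last page
            return partitions
        # Request the next page, continuing where the previous response stopped
        kwargs["NextToken"] = token
```

Usage: `partitions = get_all_partitions(client, database_name, table_name)`. In practice the paginator is shorter; the loop only shows what it does under the hood.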

    In general, I would suggest something like this:

    paginator = client.get_paginator('get_partitions')
    response = paginator.paginate(
        DatabaseName=database_name,
        TableName=table_name
    )
    
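    partitions = []
    for page in response:
        for p in page['Partitions']:
            partition = p.copy()
            # Remove the read-only fields that PartitionInput does not accept
            # (create_partition() / batch_create_partition() reject them)
            for key in ("DatabaseName", "TableName", "CreationTime"):
                partition.pop(key, None)
            partitions.append(partition)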
    

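    Now you are ready to add the retrieved partitions to the new table, with either create_partition() (one partition per request) or batch_create_partition() (several partitions per request).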

    I'd suggest using batch_create_partition(); however, it limits how many partitions can be created in a single request, so you may need to split the list into chunks.
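Chunking can be sketched as below. It assumes a batch size of 100 (my understanding of the BatchCreatePartition limit; check the current Glue API reference) and that PartitionInput accepts only Values, LastAccessTime, StorageDescriptor, Parameters and LastAnalyzedTime, so other keys returned by get_partitions are dropped. copy_partitions, BATCH_SIZE and PARTITION_INPUT_KEYS are hypothetical names.

```python
from typing import Any, Dict, List

# Assumption: maximum number of PartitionInput structures per
# batch_create_partition() call; verify against the Glue API reference.
BATCH_SIZE = 100

# Assumption: the keys PartitionInput accepts; everything else returned by
# get_partitions() (DatabaseName, TableName, CreationTime, ...) is dropped.
PARTITION_INPUT_KEYS = (
    "Values", "LastAccessTime", "StorageDescriptor", "Parameters", "LastAnalyzedTime",
)


def copy_partitions(glue_client: Any, database_name: str, new_table_name: str,
                    partitions: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Create the partitions on new_table_name in chunks; return per-partition errors."""
    inputs = [{k: v for k, v in p.items() if k in PARTITION_INPUT_KEYS}
              for p in partitions]
    errors: List[Dict[str, Any]] = []
    for start in range(0, len(inputs), BATCH_SIZE):
        response = glue_client.batch_create_partition(
            DatabaseName=database_name,
            TableName=new_table_name,
            PartitionInputList=inputs[start:start + BATCH_SIZE],
        )
        # Failures for individual partitions do not raise an exception;
        # they are reported in the "Errors" list of the response.
        errors.extend(response.get("Errors", []))
    return errors
```

With the partitions collected above: `errors = copy_partitions(client, database_name, new_table_name, partitions)`. An empty list means every partition was created.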