Tags: amazon-web-services, aws-glue, aws-glue-data-catalog

How to create a data catalog in AWS Glue externally?


I want to create a data catalog in AWS Glue externally. Is there any way to do that?


Solution

  • The AWS Glue Data Catalog stores metadata about data sources such as S3 and DynamoDB. Instead of using crawlers or the AWS Console, you can populate the catalog directly through the AWS Glue API, which has operations for each of its structures (databases, tables, and so on). AWS provides SDKs for several languages, e.g. boto3 for Python, which has an easy-to-use API. So as long as you know the structure of your data, you can define it yourself with methods such as create_database and create_table:

    Create Database definition:

    from pprint import pprint
    import boto3
    
    client = boto3.client('glue')
    response = client.create_database(
        DatabaseInput={
            'Name': 'my_database',  # Required
            'Description': 'Database created with boto3 API',
            'Parameters': {
                'my_param_1': 'my_param_value_1'
            },
        }
    )
    pprint(response)
    
    # Output
    {
        'ResponseMetadata': {
            'HTTPHeaders': {
                'connection': 'keep-alive',
                'content-length': '2',
                'content-type': 'application/x-amz-json-1.1',
                'date': 'Fri, 11 Oct 2019 12:37:12 GMT',
                'x-amzn-requestid': '12345-67890'
            },
            'HTTPStatusCode': 200,
            'RequestId': '12345-67890',
            'RetryAttempts': 0
        }
    }
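
Calling create_database a second time with the same name fails, so a script that populates the catalog repeatedly may want to tolerate that case. A minimal sketch, assuming a Glue client like the one created above; the helper name ensure_database is hypothetical and not part of boto3:

```python
def ensure_database(client, name, description=None):
    """Create a Glue database unless it already exists.

    `client` is assumed to be a boto3 Glue client (boto3.client('glue')).
    Returns True if the database was created, False if it was already there.
    `ensure_database` is an illustrative helper, not a boto3 function.
    """
    database_input = {"Name": name}
    if description:
        database_input["Description"] = description
    try:
        client.create_database(DatabaseInput=database_input)
        return True
    except client.exceptions.AlreadyExistsException:
        # Glue reports a name clash with this modeled exception.
        return False
```

With a real client this would be called as ensure_database(client, 'my_database', 'Database created with boto3 API'). Any other error (missing permissions, invalid name) is left to propagate.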
    
    


    Create Table definition:

    response = client.create_table(
        DatabaseName='my_database',
        TableInput={
            'Name': 'my_table',
            'Description': 'Table created with boto3 API',
            'StorageDescriptor': {
                'Columns': [
                    {
                        'Name': 'my_column_1',
                        'Type': 'string',
                        'Comment': 'This is very useful column',
                    },
                    {
                        'Name': 'my_column_2',
                        'Type': 'string',
                        'Comment': 'This is not as useful',
                    },
                ],
                'Location': 's3://some/location/on/s3',
            },
            'Parameters': {
                'classification': 'json',
                'typeOfData': 'file',
            }
        }
    )
    
    pprint(response)
    
    # Output
    {
        'ResponseMetadata': {
            'HTTPHeaders': {
                'connection': 'keep-alive',
                'content-length': '2',
                'content-type': 'application/x-amz-json-1.1',
                'date': 'Fri, 11 Oct 2019 12:38:57 GMT',
                'x-amzn-requestid': '67890-12345'
            },
            'HTTPStatusCode': 200,
            'RequestId': '67890-12345',
            'RetryAttempts': 0
        }
    }
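
When there are many tables to register, writing each TableInput by hand gets repetitive. The sketch below builds the same structure as the create_table call above from a plain mapping of column names to types. The helper name build_table_input and its parameters are hypothetical, and it only fills in the fields used in this answer:

```python
def build_table_input(name, columns, location, classification="json"):
    """Return a TableInput dict suitable for Glue's create_table.

    `columns` maps column name -> type string, e.g. {"my_column_1": "string"}.
    `build_table_input` is an illustrative helper, not part of boto3.
    """
    return {
        "Name": name,
        "StorageDescriptor": {
            # One column definition per entry, in insertion order.
            "Columns": [
                {"Name": col_name, "Type": col_type}
                for col_name, col_type in columns.items()
            ],
            "Location": location,
        },
        "Parameters": {"classification": classification},
    }
```

The result can then be passed on as client.create_table(DatabaseName='my_database', TableInput=build_table_input('my_table', {'my_column_1': 'string', 'my_column_2': 'string'}, 's3://some/location/on/s3')).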
    
