
AWS Glue Search Option


I'm currently using the AWS Glue Data Catalog to organize my database. After I set up the connection and ran my crawler to gather information, I could see the generated metadata.

One feature that would be nice to have is the ability to SEARCH the entire Data Catalog for ONE column name. For example, if I have 5 tables in my Data Catalog and one of those tables happens to have a field "age", I'd like to be able to see that table.

I was also wondering whether I can search on the "comment" field that every column of a table in the AWS Glue Data Catalog has.

Hope to get some help!


Solution

  • You can do that with the AWS Glue API. For example, you can use the Python SDK, boto3, and its get_tables() method to retrieve the metadata of all tables in a particular database. Have a look at the Response Syntax returned by get_tables(); you then only need to parse the response, for example:

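    import boto3
    
    glue_client = boto3.client('glue')
    
    # get_tables() returns results in pages (with a NextToken when there
    # are more), so a paginator is used to make sure no table is missed.
    paginator = glue_client.get_paginator('get_tables')
    
    for page in paginator.paginate(DatabaseName='__SOME_NAME__'):
        for table in page['TableList']:
            columns = table['StorageDescriptor']['Columns']
            for col in columns:
                col_name = col['Name']
                # 'Comment' is optional; columns without one have no such key
                col_comment = col.get('Comment', '')
    
                # Here you do search for what you need, e.g. by name:
                if col_name == 'age':
                    print(table['Name'])
    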

    Note: if a table is partitioned, its partition keys (the partitioning columns) are not included in StorageDescriptor['Columns'], so inside the same loop you also need to search through them:

    # Partition keys are stored separately from the regular columns
    columns_as_partitions = table.get('PartitionKeys', [])
    for col in columns_as_partitions:
        col_name = col['Name']
        col_comment = col.get('Comment', '')
    
        # Here you do search for what you need
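To search the whole Data Catalog rather than one database, the same idea can be wrapped in a function that also pages through get_databases(). This is a minimal sketch under some assumptions: `iter_columns` and `search_catalog` are hypothetical helper names, the column-name match is exact but case-insensitive, and the comment match is a case-insensitive substring test. It relies only on the boto3 Glue paginators for `get_databases` and `get_tables`, and it takes the client as a parameter so it can be tried without AWS credentials.

```python
from typing import Iterator, List, Optional, Tuple


def iter_columns(glue_client) -> Iterator[Tuple[str, str, dict]]:
    """Yield (database, table, column) for every column in the catalog.

    Regular columns come from StorageDescriptor['Columns']; partition keys
    come from PartitionKeys. Paginators follow NextToken automatically.
    """
    db_paginator = glue_client.get_paginator("get_databases")
    table_paginator = glue_client.get_paginator("get_tables")
    for db_page in db_paginator.paginate():
        for database in db_page["DatabaseList"]:
            db_name = database["Name"]
            for table_page in table_paginator.paginate(DatabaseName=db_name):
                for table in table_page["TableList"]:
                    descriptor = table.get("StorageDescriptor", {})
                    for col in descriptor.get("Columns", []):
                        yield db_name, table["Name"], col
                    for col in table.get("PartitionKeys", []):
                        yield db_name, table["Name"], col


def search_catalog(
    glue_client,
    column_name: Optional[str] = None,
    comment_text: Optional[str] = None,
) -> List[Tuple[str, str, str]]:
    """Return (database, table, column name) for every matching column.

    column_name: exact, case-insensitive match on the column name.
    comment_text: case-insensitive substring match on the column comment.
    If both are given, a column must satisfy both filters.
    """
    matches = []
    for db_name, table_name, col in iter_columns(glue_client):
        name = col["Name"]
        comment = col.get("Comment", "")  # Comment is optional
        if column_name is not None and name.lower() != column_name.lower():
            continue
        if comment_text is not None and comment_text.lower() not in comment.lower():
            continue
        matches.append((db_name, table_name, name))
    return matches
```

For example, `search_catalog(boto3.client('glue'), column_name='age')` would list every (database, table, column) where an `age` column appears, and `comment_text='...'` searches the column comments instead. Passing the client in, rather than creating it inside the function, keeps the search logic independent of credentials and region settings.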