I'm currently using AWS Glue Data Catalog to organize my database. Once I set up the connection and sent my crawler to gather information, I was able to see the formulated metadata.
One feature that would be nice to have is the ability to search the entire data catalog for a single column name. For example, if I have 5 tables in my data catalog and one of those tables happens to have a field "age", I'd like to be able to find that table.
I was also wondering whether I can search on the "comment" field that every column of a table has in the AWS Glue Data Catalog.
Hope to get some help!
You can do that with the AWS Glue API. For example, you can use the Python SDK, boto3, and its get_tables() method to retrieve all the metadata about the tables in a particular database. Have a look at the Response Syntax returned by get_tables(); then you only need to parse it, for example:
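import boto3

glue_client = boto3.client('glue')

# get_tables() returns results in pages, so use a paginator to see every table
paginator = glue_client.get_paginator('get_tables')
for page in paginator.paginate(DatabaseName='__SOME_NAME__'):
    for table in page['TableList']:
        # StorageDescriptor can be absent for some table types
        columns = table.get('StorageDescriptor', {}).get('Columns', [])
        for col in columns:
            col_name = col['Name']
            # 'Comment' is optional, so avoid a KeyError when it is missing
            col_comment = col.get('Comment', '')
            # Here you search for what you need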
Note: if you have a table with partitioning (artificial columns), then you would also need to search through the partition keys of each table:
columns_as_partitions = table.get('PartitionKeys', [])
for col in columns_as_partitions:
    col_name = col['Name']
    col_comment = col.get('Comment', '')  # 'Comment' is optional
    # Here you search for what you need
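To tie the pieces together, here is a minimal sketch of a search over the whole catalog rather than a single database. The helper names matching_columns and search_catalog and their parameters are my own, not part of boto3. The sketch assumes you pass in a Glue client created with boto3.client('glue'), and it only relies on the get_databases and get_tables paginators. Matching is case-insensitive: exact on the column name, substring on the comment.

```python
def matching_columns(table, name=None, comment_contains=None):
    """Yield (column_name, comment) pairs of one table that pass the filters."""
    # Regular columns live under StorageDescriptor, partition columns under PartitionKeys.
    columns = list(table.get('StorageDescriptor', {}).get('Columns', []))
    columns += table.get('PartitionKeys', [])
    for col in columns:
        comment = col.get('Comment', '')  # 'Comment' is optional in the response
        if name is not None and col['Name'].lower() != name.lower():
            continue
        if comment_contains is not None and comment_contains.lower() not in comment.lower():
            continue
        yield col['Name'], comment


def search_catalog(glue_client, name=None, comment_contains=None):
    """Return (database, table, column, comment) tuples for every match in the catalog."""
    hits = []
    for db_page in glue_client.get_paginator('get_databases').paginate():
        for db in db_page['DatabaseList']:
            table_pages = glue_client.get_paginator('get_tables').paginate(
                DatabaseName=db['Name']
            )
            for table_page in table_pages:
                for table in table_page['TableList']:
                    for col_name, comment in matching_columns(table, name, comment_contains):
                        hits.append((db['Name'], table['Name'], col_name, comment))
    return hits
```

With real credentials you would call it as search_catalog(boto3.client('glue'), name='age'), or with comment_contains='some text' to search the comments instead. Taking the client as a parameter keeps the search logic easy to try out against a stub before pointing it at a real catalog.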