Search code examples
mysqlsqldatabasegoogle-bigquerydata-warehouse

Extract/Get the table name and their respective columns used in a view from a query in BigQuery


Let us take the following MYSQL query written in BigQuery for creating a view.

SELECT a.col1,
       a.col2,
       a.col3,
       b.col1,
       b.col2,
       b.col3,
       c.col1,
       c.col2
FROM project_name_dataset_table_a a
INNER JOIN project_name_dataset_table_b b ON a.col1 = b.col1
INNER JOIN project_name_dataset_table_c c ON a.col2 = c.col2

Below is the required output format or similar. Final Format Required or any similar will be fine

Basically i have to prepare a document which will contain all the information of the views created and the tables and their respective columns used in it.


Solution

  • I have created a script for the usage at my end you can tweak the regex according to you

    import re
    
    query = """
    SELECT a.col1, a.col2, a.col3, b.col1, b.col2, b.col3, c.col1, c.col2 
    FROM `project_name.dataset.table_a` a
    JOIN `project_name.dataset.table_b` b
    ON ...
    JOIN `project_name.dataset.table_c` c
    ON ...
    """
    tabsandcols = re.findall(r'([A-Za-z0-9\-\_]*)\.([a-zA-Z0-9\.\_]*)', query)
    
    alias_names = re.findall(r'\`([A-Za-z\-\_]*)\.([a-zA-Z\.\_]*)\` ([a-z]{1,3})', query)
    
    dic = {}
    
    print(alias_names)
    
    for i, j, k in alias_names:
        if k in dic:
            pass
        else:
            dic[k] = j
    
    l = set()
    for i, j in tabsandcols:
        if i != "project_name" and dic.get(i, "") != "":
            l.add(dic.get(i, "") + " " + j)
    
    for i in l:
        print("project_name", i)
    

    It will provide following output

    project_name dataset.table_a col1
    project_name dataset.table_b col2
    project_name dataset.table_c col1
    project_name dataset.table_c col2
    project_name dataset.table_a col3
    project_name dataset.table_a col2
    project_name dataset.table_b col1
    project_name dataset.table_b col3