Search code examples
pythonclouderaimpala

Cloudera Impala INVALIDATE METADATA list of tables


I am using pandas and pyodbc to pull data from CDH6 Impala. The tables are ingested daily and every other day I need to go in and manually invalidate metadata on several tables. I've been trying to automate this with Python, to have the code iterate through each table in the list. But since we retrieve a None type object with the statement, iteration doesn't work.

Any ideas?

import pyodbc
import pandas as pd

connString = 'connection string'

tables = ["INVALIDATE METADATA master.table1", "INVALIDATE METADATA master.table2" etc]

for t in tables:
    if tables != None:
        try:
            pd.read_sql_query(tables, con=pyodbc.connect(connString, autocommit=True))
        except:
            print('Unsuccessful')
    else:
        if tables is None:
            pd.read_sql_query(tables, con=pyodbc.connect(connString, autocommit=True))
            print('Success with none')

Solution

  • I think you can automate this easily. I prefer using pyodbc.

    1. first get list of tables using show tables in schema.
    2. Iterate through the list, create invalidate metadata statement and issue it.
    import pyodbc
    
    connp = pyodbc.connect(conn_string, autocommit=True)
    cursor = connp.cursor()
    #Create SQL to fetch all tables from schema_xx. You can use argument instead of schema_xx too.
    sql = "show tables in schema_xx"  
    cursor.execute(sql)
    
    for row in cursor.fetchall():
          #Create and execute invalidate metadata statement one by one
          sql ="INVALIDATE METADATA schema_xx."+row[0]
          cursor.execute(sql)
    
    connp.close()