Search code examples
pythonpandasdatabaseimpala

How to show column name from impala datamart


Basically, my company using apache impala for datamart.

connection args are define before and connection is define as conn = connect(host=host_name, port=port, user=user, password=password, database=database)

Everytime I need a table I need to call a function like this

def impala_connection(host, port, user, password, database):
    conn = connect(host=host_name, port=port, user=user, password=password, database=database)
    cursor = conn.cursor()
    cursor.execute('SELECT * from table1')
    results = cursor.fetchall()
    return results
table1 = pd.DataFrame(impala_connection(host_name, port, user,password, database))

when I call table1 my dataframe will look like this

      0     1    2    3    4
0     a     b    c    d    e
1     e     f    g    g    i

Actually the table has column name, how to show that?


Solution

  • Read a limit lines from database use pd.read_sql. and from df.columns to get the table column names.

    import pandas as pd
    from sqlalchemy import create_engine
    
    conn = create_engine(
        'impala://{user}:{password}@{host}:{port}/{database}'.format(
            host=host,    # your host
            port=port,
            database=database,
            user=user,
            password=password
        )
    
    sql_read = lambda sql: pd.read_sql(sql, conn)
    sql_execute = lambda sql: pd.io.sql.execute(sql, conn)
    
    sql = '''
    select *
    from table1
    limit 1
    '''
    df = sql_read(sql)
    print(df.columns)