Basically, my company using apache impala for datamart.
connection args are define before and connection is define as conn = connect(host=host_name, port=port, user=user, password=password, database=database)
Everytime I need a table I need to call a function like this
def impala_connection(host, port, user, password, database):
conn = connect(host=host_name, port=port, user=user, password=password, database=database)
cursor = conn.cursor()
cursor.execute('SELECT * from table1')
results = cursor.fetchall()
return results
table1 = pd.DataFrame(impala_connection(host_name, port, user,password, database))
when I call table1
my dataframe will look like this
0 1 2 3 4
0 a b c d e
1 e f g g i
Actually the table has column name, how to show that?
Read a limit lines from database use pd.read_sql
. and from df.columns
to get the table column names.
import pandas as pd
from sqlalchemy import create_engine
conn = create_engine(
'impala://{user}:{password}@{host}:{port}/{database}'.format(
host=host, # your host
port=port,
database=database,
user=user,
password=password
)
sql_read = lambda sql: pd.read_sql(sql, conn)
sql_execute = lambda sql: pd.io.sql.execute(sql, conn)
sql = '''
select *
from table1
limit 1
'''
df = sql_read(sql)
print(df.columns)