Search code examples
pythonconnectionodbcteradata

Connecting Python with Teradata using Teradata module


I have installed python 2.7.0 and Teradata module on Windows 7. I am not able to connect and query TD from python.

pip install Teradata

Now I want to import teradata module in my source code and perform operations like -

  1. Firing queries to teradata and get result set.
  2. Check if connection is made to teradata.

Please help me writing code for the same as I am new to Python and there is no information available with me to connect to teradata.


Solution

  • There are a number of ways to connect to Teradata and export table to Pandas. Here are four+:

    Using teradata module

    # You can install teradata via PIP: pip install teradata
    # to get a list of your odbc drivers names, you could do: teradata.tdodbc.drivers
    # You don’t need to install teradata odbc driver if using method='rest'.     
    # See sending data from df to teradata for connection example 
    
    import teradata
    import pandas as pd
    
    host,username,password = 'HOST','UID', 'PWD'
    #Make a connection
    udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)
    
    
    with udaExec.connect(method="odbc",system=host, username=username,
                                password=password, driver="DRIVERNAME") as connect:
    
        query = "SELECT * FROM DATABASEX.TABLENAMEX;"
    
        #Reading query to df
        df = pd.read_sql(query,connect)
        # do something with df,e.g.
        print(df.head()) #to see the first 5 rows
    

    Using TeradataSQL

    from @ymzkala : This package doesn't require you to install Teradata drivers (other than this package).

    # Installing python -m pip install teradatasql
    
    import teradatasql
    
    with teradatasql.connect(host='host', user='username', password='password') as connect:
        df = pd.read_sql(query, connect)
    

    Using pyodbc module

    import pyodbc
    
     #You can install teradata via PIP: pip install pyodbc
     #to get a list of your odbc drivers names, you could do: pyodbc.drivers()
    
    #Make a connection
    link = 'DRIVER={DRIVERNAME};DBCNAME={hostname};UID={uid};PWD={pwd}'.format(
                          DRIVERNAME=DRIVERNAME,hostname=hostname,  
                          uid=username, pwd=password)
    with pyodbc.connect(link,autocommit=True) as connect:
    
        #Reading query to df
        df = pd.read_sql(query,connect)
    

    Using sqlalchemy Module

     #You can install sqlalchemy via PIP: pip install sqlalchemy-teradata
     #Note: It is not pip install sqlalchemy. If you already have sqlalchemy, you still need sqlalchemy-teradata to get teradata dialects
    
    from sqlalchemy import create_engine
    
    #Make a connection
    
    link = 'teradata://{username}:{password}@{hostname}/?driver={DRIVERNAME}'.format(
                   username=username,hostname=hostname,DRIVERNAME=DRIVERNAME)
    
    with create_engine(link) as connect:
    
        #Reading query to df
        df = pd.read_sql(query,connect)
    

    There is a fifth way, using giraffez module. I enjoy using this module as it come with MLOAD, FASTLOAD, BULKEXPORT etc. The only issue for beginners is its requirements (e.g C/C++ compiler ,Teradata CLIv2 and TPT API headers/lib files).

    Note: Updated 13-07-2018, using of context manager to ensure closing of sessions

    Update: 31-10-2018: Using teradata to send data from df to teradata

    We can send data from df to Teradata. Avoiding 'odbc' 1 MB limit and odbc driver dependency, we can use 'rest' method. We need host ip_address, instead of driver argument. NB: The order of columns in df should match the order of columns in Teradata table.

    import teradata
    import pandas as pd
    
    # HOST_IP can be found by executing *>>nslookup viewpoint* or *ping  viewpoint* 
    udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False) 
    with udaExec.connect(method="rest",system="DBName", username="UserName",
                          password="Password", host="HOST_IP_ADDRESS") as connect:
    
        data = [tuple(x) for x in df.to_records(index=False)]
    
        connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True)
    

    Using 'odbc', you have to chunk your data to less than 1MB chunks to avoid "[HY001][Teradata][ODBC Teradata Driver] Memory allocation error" error: E.g.

    import teradata
    import pandas as pd
    import numpy as np
    
    udaExec = teradata.UdaExec (appName="test", version="1.0", logConsole=False)
    
    with udaExec.connect(method="odbc",system="DBName", username="UserName",
                          password="Password", driver="DriverName") as connect:
    
        #We can divide our huge_df to small chuncks. E.g. 100 churchs
        chunks_df = np.array_split(huge_df, 100)
    
        #Import chuncks to Teradata
        for i,_ in enumerate(chunks_df):
    
            data = [tuple(x) for x in chuncks_df[i].to_records(index=False)]
            connect.executemany("INSERT INTO DATABASE.TABLEWITH5COL values(?,?,?,?,?)",data,batch=True)