Search code examples
pandaspython-2.7teradatateradata-sql-assistant

Load Teradata table from excel using tpt loader


I have developed python and used pandas module to write excel. While executing command print(df1.columns), I get dtype as 'Object'.

and using same excel to load in Teradata table using TPT script and getting below error

FILE_READER[1]: TPT19108 Data Format 'DELIMITED' requires all 'VARCHAR/JSON/JSON BY NAME/CLOB BY NAME/BLOB BY NAME/XML BY NAME/XML/CLOB' schema.

Using Description in TPT:-

 DEFINE SCHEMA Teradata__DATA
 DESCRIPTION 'SCHEMA OF Teradata data'
 (
 Issue_Key VARCHAR(255),
 Log_Date VARDATE(10) FORMATIN ('YYYY-MM-DD') FORMATOUT ('YYYY-MM-DD'),
 User_Name VARCHAR(255),
 Time_Spent NUMBER(10,2)

Please help in resolving the failure message. Error might be due different Datatype or due to defined delimeter as "TAB". Please suggest if any other reason is causing this failure.

CODE

df = pd.read_excel('Time_Log_Source_2019-05-30.xlsx', sheet_name='Sheet1', dtype=str)
print("Column headings:")
print(df.columns)
df = pd.DataFrame(df,columns=['Issue Key', 'Log Date', 'User', 'Time Spent(Sec)'])
df['Log Date'] = df['Log Date'].str[:10]
df['Time Spent(Sec)'] = df['Time Spent(Sec)'].astype(int)/3600
print(df)
df.to_excel("Time_Log_Source_2019-05-30_output.xlsx")
df1 = pd.read_excel('Time_Log_Source_2019-05-30_output.xlsx', sheet_name='Sheet1',dtype=str)
df1['Issue Key'] = df1['Issue Key'].astype('str')
df1['Log Date'] = df1['Log Date'].astype('str')
df1['User'] = df1['User'].astype('str')
df1['Time Spent(Sec)'] = df1['Time Spent(Sec)'].astype('str')
df1.to_excel("Time_Log_Source_2019-05-30_output.xlsx",startrow=0, startcol=0, index=False)
print(type(df1['Time Spent(Sec)']))
print(df.columns)
print(df1.columns)

Result

Index([u'Issue Key', u'Log Date', u'User', u'Time Spent(Sec)'], dtype='object')
Index([u'Issue Key', u'Log Date', u'User', u'Time Spent(Sec)'], dtype='object')

Solution

  • A TPT Schema describes fields in client-side records, not columns in the database table. You would need to change the schema to say the (input) Time_Spent is VARCHAR.

    But TPT does not natively read .xlsx files. Consider using to_csv instead of to_excel.