Search code examples
pythonsqlexceldatetimecell-formatting

Unable to read date time values in the correct format from excel file and save it in a database using python


I have a piece of code in python that reads from a excel file and saves into redshift database.

import psycopg2
def from_redshift():
    book = xlrd.open_workbook("excelfile.xlsx")
    sheet = book.sheet_by_index(0)

    con = psycopg2.connect(dbname='dbname', host='something.com', port=portnum, user='username', password='password')
    cursor=con.cursor()

    query = """INSERT INTO table_name (col1, col2, col3, start_date, update_date) VALUES (%s, %s, %s, %s, %s)"""
    for r in range(1, sheet.nrows):
        col1 = sheet.cell(r,0).value
        col2 = sheet.cell(r,1).value

        col3 = sheet.cell(r,2).value
        start_date     = sheet.cell(r,3).value
        update_date = sheet.cell(r,4).value

        # Assign values from each row
        values = (col1, col2, col3, start_date, update_date)

        # Execute sql Query
        cursor.execute(query, values)
        print("Executed")
    # Close the cursor
    cursor.close()

The code works fine in reading and inserting into the database, but my question is that the 'start_date' and the 'update_date' fields are of datetime in the database, so when I try to insert then, it gives me error that the values from these two columns are not in the right format, and when I changed these two columns to varchar in the database, it inserts these values are some weird number like 23.12345 (something like that).

The values in these two column look like YYYY-MM-DD HH:MM:[SS] (custom format).

How do I get these date time values in the database correctly?

    # Commit the transaction
    con.commit()
    con.close()

Solution

  • From the documentation on xlrd

    To read the date values you can use the xldate_as_tuple function

    Because the dates are stored as numbers within the excel file format

    I have't tested this, but with your code:

    def from_redshift():
        book = xlrd.open_workbook("excelfile.xlsx")
        sheet = book.sheet_by_index(0)
    
        for r in range(1, sheet.nrows):
            start_date     = xldate_as_tuple(sheet.cell(r,3).value, book.datemode)
            start_date = datetime.datetime(*start_date)
    

    BTW, if your method name is any indication of what you are doing. If you are loading this data into AWS Redshift, copying from a CSV file is always faster, easier and generally recommended over executing inserts from excel data like this.