Search code examples
pythonpandasdataframe

Python pandas: how to specify data types when reading an Excel file?


I am importing an excel file into a pandas dataframe with the pandas.read_excel() function.

One of the columns is the primary key of the table: it's all numbers, but it's stored as text (the little green triangle in the top left of the Excel cells confirms this).

However, when I import the file into a pandas dataframe, the column gets imported as a float. This means that, for example, '0614' becomes 614.

Is there a way to specify the datatype when importing a column? I understand this is possible when importing CSV files but couldn't find anything in the syntax of read_excel().

The only solution I can think of is to add an arbitrary letter at the beginning of the text (converting '0614' into 'A0614') in Excel, to make sure the column is imported as text, and then chopping off the 'A' in python, so I can match it to other tables I am importing from SQL.


Solution

  • You just specify converters. I created an excel spreadsheet of the following structure:

    names   ages
    bob     05
    tom     4
    suzy    3
    

    Where the "ages" column is formatted as strings. To load:

    import pandas as pd
    
    df = pd.read_excel('Book1.xlsx',sheetname='Sheet1',header=0,converters={'names':str,'ages':str})
    >>> df
           names ages
       0   bob   05
       1   tom   4
       2   suzy  3