
Import pandas dataframe column as string not int


I would like to import the following CSV column as strings, not as int64. pandas' read_csv automatically converts it to int64, but I need this column as strings.

ID
00013007854817840016671868
00013007854817840016749251
00013007854817840016754630
00013007854817840016781876
00013007854817840017028824
00013007854817840017963235
00013007854817840018860166

from pandas import read_csv

df = read_csv('sample.csv')

df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

Unfortunately using converters gives the same result.

df = read_csv('sample.csv', converters={'ID': str})
df.ID
>>

0   -9223372036854775808
1   -9223372036854775808
2   -9223372036854775808
3   -9223372036854775808
4   -9223372036854775808
5   -9223372036854775808
6   -9223372036854775808
Name: ID

Solution

  • To reiterate, this works in pandas >= 0.9.1:

    In [2]: read_csv('sample.csv', dtype={'ID': object})
    Out[2]: 
                               ID
    0  00013007854817840016671868
    1  00013007854817840016749251
    2  00013007854817840016754630
    3  00013007854817840016781876
    4  00013007854817840017028824
    5  00013007854817840017963235
    6  00013007854817840018860166
    

    I'm also creating an issue about detecting integer overflows.

    EDIT: See resolution here: https://github.com/pydata/pandas/issues/2247
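
The repeated -9223372036854775808 in the question is the smallest value a signed 64-bit integer can hold, which is consistent with the overflow mentioned above. As a rough check in plain Python (no pandas involved; the constant names are only illustrative), the first ID from the sample is far outside the int64 range:

```python
# First ID from the sample, parsed as an arbitrary-precision Python int.
# Note that the leading zeros are already lost at this point.
value = int("00013007854817840016671868")

INT64_MAX = 2**63 - 1   # 9223372036854775807
INT64_MIN = -(2**63)    # -9223372036854775808, the value shown in the question

# The ID is roughly 1.3e22, while int64 tops out near 9.2e18.
fits_in_int64 = INT64_MIN <= value <= INT64_MAX
print(fits_in_int64)  # False
```

So even without the overflow, an integer column could not preserve these IDs, because the leading zeros would be dropped.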

    Update, since this helps others:

    To read all columns as str, you can do this (from a comment):

    pd.read_csv('sample.csv', dtype = str)
    

    To read only selected columns as str, you can do this:

    import pandas as pd

    # list of column names that need to be strings
    lst_str_cols = ['prefix', 'serial']
    # use a dictionary comprehension to build the dict of dtypes
    dict_dtypes = {x: 'str' for x in lst_str_cols}
    # pass the dict as the dtype argument
    pd.read_csv('sample.csv', dtype=dict_dtypes)
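
For anyone who wants to try this without a file on disk, here is a minimal, self-contained sketch using two of the IDs from the question. The in-memory CSV and the extra `count` column are assumptions made up for the illustration; they are not part of the original sample.csv.

```python
import io

import pandas as pd

# Two IDs from the question plus a hypothetical numeric column.
csv_text = (
    "ID,count\n"
    "00013007854817840016671868,1\n"
    "00013007854817840016749251,2\n"
)

# Only ID is forced to str; the other column is still inferred as usual.
df = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

first_id = df["ID"].iloc[0]
print(first_id)  # 00013007854817840016671868
```

The leading zeros survive because the value is never parsed as a number, while `count` is still read as an integer.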