Search code examples
pythonpandascsvstringio

Strange output using Pandas when converting StringIO with read_csv


When the pandas CSV reader function 'read_csv" is used to convert StringIO values strange characters ('.1') are being appended at the end of the second field when delimiting certain fields. The desired results is the first test, but all the fields do not have spaces after the delimiter (','). Splitting "1.5M, 1.5M" should always return "1.5M", but when there is no spaces it returns the second field with "1.5M.1" (adding '.1' at the end of the field). Is there a way to resolve this issue?

>>>import pandas as pd
>>>from io import StringIO
>>>pd.read_csv(StringIO("1.5M, 1.5M"))
Empty DataFrame
Columns: [1.5M,  1.5M]
Index: []
>>> pd.read_csv(StringIO("1.5M,1.5M"))
Empty DataFrame
Columns: [1.5M, 1.5M.1]
Index: []
>>>

Solution

  • Notice in the first example with the space, your dataframe has zero rows and your column names include the space in the second column.

     df = pd.read_csv(StringIO("1.5M, 1.5M"))
     df.columns
    
     Index(['1.5M', ' 1.5M'], dtype='object')
    

    In the second case, zero rows also, but you have duplicate column names without the space.

     df = pd.read_csv(StringIO("1.5M,1.5M"))
     df.columns
    
     Index(['1.5M', '1.5M.1'], dtype='object')
    

    Hence, Pandas adds the '.1' to the duplicated column name.

    However, if you want this '1.5M' as data in the dataframe and not as column headings.

    Use

    df = pd.read_csv(StringIO("1.5M, 1.5M"), header=None)
    

    OR, it doesn't make a difference in this case:

    df = pd.read_csv(StringIO("1.5M,1.5M"), header=None)
    

    Output:

          0     1
    0  1.5M  1.5M