Search code examples
pythonexcelpandascsvlibreoffice

Cannot import comma delimited quoted file in Pandas


I am trying to import this data to Pandas. It imports correctly in Libreoffice with a header and 2 lines. However, in Pandas, it does not seem to import. There are some newline characters inside that should be ignored. But in Pandas, they are being treated as a new line instead of single field with \n inside it. Has anyone faced such issue. I've tried setting quotechar and sep parameters in Pandas but cannot import it.

Thanks in advance

data is here

This should be imported as 2 rows in Pandas. However, it is being split into multiple rows.

Output in pandas is this


Solution

  • Your problem is that you have " as escape char (\") in your text that panda should ignore.

    e.g.

    7/20/16: Decreased STS, active flexion to 130, abduction to 100, weak right handed grip. Lack 6-8 \" IR on right, only to waist.

    The \" after Lack 6-8 shouldn't be interpreted as quote character.

    You have to tell pandas that.

    That should work:

    import pandas as pd
    
    df = pd.read_csv("resources/data_to_post.csv", quotechar='"', escapechar='\\')
    print(df)
    

    Output

       id  ...                                      PlanGenerated
    0   1  ...  A course of physical therapy was ordered. Mobi...
    1   2  ...  The patient is instructed to return if pain or...
    
    [2 rows x 17 columns]
    

    only two rows then 6 as before.