Tags: python, pandas, csv, import, comments

Keep CSV file's comment lines in pandas?


I have just started delving into the world of Pandas, and the first strange CSV file I've found is one where there are two lines of comments (with different column widths) right at the beginning.

sometext, sometext2
moretext, moretext1, moretext2
*header*
actual data ---
---------------

I know how to skip these lines with skiprows or header=, but, instead, how would I retain these comments while using read_csv? Sometimes comments are necessary as file meta information, and I do not want to throw them away.


Solution

  • Pandas is designed to read structured data.

    For unstructured lines such as these comments, read them yourself with the built-in open and the csv module:

    import csv

    with open('file.csv', newline='') as f:
        reader = csv.reader(f)
        row1 = next(reader)  # first comment line, as a list of fields
        row2 = next(reader)  # second comment line, as a list of fields
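
To keep the comments and still load the table, one option is to read the comment lines from an open handle and then pass that same handle to read_csv, which continues from the current position. This is a minimal sketch, not part of the original answer: the helper name read_csv_keep_comments and the n_comment_lines parameter are hypothetical, and an in-memory io.StringIO with made-up data rows stands in for the real file.

```python
import io

import pandas as pd


def read_csv_keep_comments(handle, n_comment_lines=2):
    """Return (comments, DataFrame) from an open text handle.

    Hypothetical helper: assumes the first n_comment_lines lines are
    free-form comments and the header row follows immediately after.
    """
    # Consume the leading comment lines; the handle then points at the header.
    comments = [handle.readline().rstrip("\n") for _ in range(n_comment_lines)]
    # read_csv reads from the handle's current position onward.
    df = pd.read_csv(handle)
    return comments, df


# Stand-in for the real file; the data rows are invented for illustration.
sample = io.StringIO(
    "sometext, sometext2\n"
    "moretext, moretext1, moretext2\n"
    "a,b\n"
    "1,2\n"
    "3,4\n"
)
comments, df = read_csv_keep_comments(sample)
```

With a real file, the same helper could be called inside a with open('file.csv') as f: block, passing f as the handle.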
    

    You can attach strings to the DataFrame as a plain attribute, like this:

    df.comments = 'My Comments'
    

    But note:

    While you can attach attributes to a DataFrame, operations performed on it (such as groupby, pivot, join or loc, to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.
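
The caveat can be seen in a small sketch. The frame and its values below are invented for illustration. The last lines use DataFrame.attrs, a dictionary for metadata that is not mentioned in the original answer; this assumes pandas 1.0 or later, where it is documented as experimental, so how it behaves across operations should be checked against the installed version.

```python
import pandas as pd

# Toy frame standing in for the parsed CSV data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Plain attribute, as in the answer. A string is used here because pandas
# warns when a list-like value is assigned to a new attribute name.
df.comments = "My Comments"

# Boolean filtering returns a new DataFrame; the ad-hoc attribute is not
# carried over to it.
filtered = df[df["a"] > 1]
has_comments = hasattr(filtered, "comments")

# Alternative (assumption: pandas >= 1.0, experimental API): keep the
# comment lines in the attrs dictionary of the DataFrame.
df.attrs["comments"] = ["sometext, sometext2", "moretext, moretext1, moretext2"]
```

Here has_comments ends up False, which is the loss of metadata the note describes. Keeping the comments in a separate variable next to the DataFrame avoids the problem entirely.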