python, pandas, comments

comment='#' in pandas explanation


Can anyone explain how comment='#' works when reading a CSV file with pandas, as in pd.read_csv(..., comment='#', ...)? Sample code is below.

import pandas as pd

# file_messy and file_clean are file paths assumed to be defined elsewhere

# Read the raw file as-is: df1
df1 = pd.read_csv(file_messy)

# Print the output of df1.head()
print(df1.head(5))

# Read in the file with the correct parameters: df2
df2 = pd.read_csv(file_messy, delimiter=' ', header=3, comment='#')

# Print the output of df2.head()
print(df2.head())

# Save the cleaned up DataFrame to a CSV file without the index
df2.to_csv(file_clean, index=False)

Solution

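  • The comment argument tells read_csv to stop parsing a line at the given character, so a line that starts with that character is skipped entirely. Here is an example: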

    from io import StringIO

    import pandas as pd

    csv_string = """col1;col2;col3
    1;4.4;99
    #2;4.5;200
    3;4.7;65"""
    
    # Without comment argument
    print(pd.read_csv(StringIO(csv_string), sep=";"))
    #   col1  col2  col3
    # 0    1   4.4    99
    # 1   #2   4.5   200
    # 2    3   4.7    65
    
    # With comment argument
    print(pd.read_csv(StringIO(csv_string), 
                      sep=";", comment="#")) 
    #    col1  col2  col3
    # 0     1   4.4    99
    # 1     3   4.7    65
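Two further details may matter for the call in the question, which combines header=3 with comment='#'. As far as the pandas documentation describes it, fully commented lines are ignored when counting the header row, and a comment character in the middle of a line discards only the rest of that line. The snippet below is a minimal sketch of both behaviours; the sample data and the "logger" line are made up for illustration and are not taken from the original file.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: one leading comment line and one inline comment
csv_string = """# exported by a hypothetical logger
col1;col2;col3
1;4.4;99# inline note
3;4.7;65"""

# The leading comment line is skipped before the header is located,
# so header=0 still refers to the "col1;col2;col3" line.
df = pd.read_csv(StringIO(csv_string), sep=";", comment="#", header=0)

print(df.columns.tolist())
print(df)
```

Note that a comment character placed before the last field would cut off the remaining fields of that row, so inline comments are only safe at the end of a line.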