How does comment='#' work in pandas read_csv?

Can anyone explain how the comment='#' argument works when reading a CSV file with pandas, as in pd.read_csv(..., comment='#', ...)? Sample code is below.

import pandas as pd

# file_messy and file_clean are assumed to be file paths defined elsewhere

# Read the raw file as-is: df1
df1 = pd.read_csv(file_messy)

# Print the output of df1.head()
print(df1.head(5))

# Read in the file with the correct parameters: df2
df2 = pd.read_csv(file_messy, delimiter=' ', header=3, comment='#')

# Print the output of df2.head()
print(df2.head())

# Save the cleaned up DataFrame to a CSV file without the index
df2.to_csv(file_clean, index=False)

Solution

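The comment argument tells read_csv to stop parsing a line at the given character; a line that begins with that character is skipped entirely. Here is an example: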

from io import StringIO

import pandas as pd

csv_string = """col1;col2;col3
1;4.4;99
#2;4.5;200
3;4.7;65"""

# Without comment argument: "#2" is read as data, so col1 becomes a string column
print(pd.read_csv(StringIO(csv_string), sep=";"))
#   col1  col2  col3
# 0    1   4.4    99
# 1   #2   4.5   200
# 2    3   4.7    65

# With comment argument: the line starting with "#" is skipped
print(pd.read_csv(StringIO(csv_string), sep=";", comment="#"))
#    col1  col2  col3
# 0     1   4.4    99
# 1     3   4.7    65
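
The example above only covers a line that starts with the comment character. As a minimal sketch of two further cases, the snippet below adds a commented line above the header and a trailing note after the data on one row. The extra lines in csv_string are made up for illustration and are not from the question's file. According to the pandas documentation, fully commented lines are not counted by the header argument, which is probably relevant to the header=3 call in the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical input: a commented line before the header, a trailing
# comment after the data on one row, and a fully commented row
csv_string = """# made-up note above the header
col1;col2;col3
1;4.4;99 # made-up trailing note
#2;4.5;200
3;4.7;65"""

# The first line is skipped, so the default header lookup still finds
# "col1;col2;col3"; everything after "#" on the third line is dropped
df = pd.read_csv(StringIO(csv_string), sep=";", comment="#")
print(df)
```

If the file contains "#" inside real data values, this argument would cut those values short, so it is worth checking the raw file first.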