Here is a simple example:
import pandas as pd
from io import StringIO
s = """a b c
------------
A1 1 2
A-2 -NA- 3
------------
B-1 2 -NA-
------------
"""
df = pd.read_csv(StringIO(s), sep=r'\s+', comment='-')
df
    a    b    c
0  A1  1.0  2.0
1   A  NaN  NaN
2   B  NaN  NaN
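For lines that contain the comment character but do not start with it, pandas treats everything from the first - to the end of the line as a comment. That is why A-2 and B-1 are truncated to A and B, and the remaining fields become NaN.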
My question is as above.
Less important, and just out of curiosity: can pandas handle two different kinds of comment lines, those starting with # and those starting with -? The following attempt:
import pandas as pd
from io import StringIO
s = """a b c
# comment line
------------
A1 1 2
A2 -NA- 3
------------
B1 2 -NA-
------------
"""
df = pd.read_csv(StringIO(s), sep=r'\s+', comment='#-')
df
raises
ValueError: Only length-1 comment characters supported
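One possible workaround for the length-1 restriction is to drop the unwanted lines in plain Python before pandas sees them. This is only a sketch, and it assumes that no real data row begins with # or -. The name comment_prefixes is illustrative and is not a pandas option.

```python
import pandas as pd
from io import StringIO

s = """a b c
# comment line
------------
A1 1 2
A2 -NA- 3
------------
B1 2 -NA-
------------
"""

# Illustrative helper name (not part of pandas): prefixes that mark a
# whole line as a comment. str.startswith accepts a tuple of prefixes.
comment_prefixes = ("#", "-")

# Keep only the lines that do not start with one of the prefixes.
# Assumption: no genuine data row starts with "#" or "-".
kept = [line for line in s.splitlines() if not line.startswith(comment_prefixes)]

# No comment= argument is needed, so "-" inside a field is left alone.
df = pd.read_csv(StringIO("\n".join(kept)), sep=r"\s+")
print(df)
```

Because the filtering only looks at the start of each line, a - inside a field such as -NA- is not treated as a comment marker.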
Another solution: you can preprocess the text before passing it to .read_csv, here by removing any run of two or more dashes at the start of a line. For example:
import re
import pandas as pd
from io import StringIO
s = """a b c
# comment line
------------
A1 1 2
A-2 -NA- 3
------------
B-1 2 -NA-
------------
"""
df = pd.read_csv(
    StringIO(re.sub(r"^-{2,}", "", s, flags=re.M)), sep=r"\s+", comment="#"
)
print(df)
Prints:
     a     b     c
0   A1     1     2
1  A-2  -NA-     3
2  B-1     2  -NA-
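In the output above, -NA- is kept as a literal string, so columns b and c are read as text. If -NA- is meant to mark missing values (an assumption; the question does not say so explicitly), the same preprocessing can be combined with the na_values parameter of read_csv. A minimal sketch, with cleaned as an illustrative variable name:

```python
import re
import pandas as pd
from io import StringIO

s = """a b c
# comment line
------------
A1 1 2
A-2 -NA- 3
------------
B-1 2 -NA-
------------
"""

# Blank out separator lines: a run of two or more dashes at line start.
# The resulting empty lines are skipped by read_csv by default.
cleaned = re.sub(r"^-{2,}", "", s, flags=re.M)

# Assumption: "-NA-" denotes a missing value, so parse it as NaN.
df = pd.read_csv(
    StringIO(cleaned), sep=r"\s+", comment="#", na_values="-NA-"
)
print(df)
```

With this, columns b and c should come back as numeric columns with NaN in place of -NA-, while values such as A-2 in column a stay intact.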