Handle comment lines when reading CSV with pandas

Here is a simple example:

import pandas as pd
from io import StringIO
s = """a   b   c
------------
A1    1    2
A-2  -NA-  3
------------
B-1   2   -NA-
------------
"""
df = pd.read_csv(StringIO(s), sep=r'\s+', comment='-')
df

    a    b    c
0  A1  1.0  2.0
1   A  NaN  NaN
2   B  NaN  NaN

For lines that contain the comment character but do not start with it, pandas treats everything from the first - onward as a comment, so A-2 is read as A and the rest of the row is lost.
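To make the truncation concrete, here is a rough pure-Python model of what the comment option does to each line. It is only an illustration of the observed behaviour, not pandas's actual parser, and strip_comment is a hypothetical helper name:

```python
def strip_comment(line: str, comment: str = "-") -> str:
    """Drop everything from the first comment character onward."""
    return line.split(comment, 1)[0]


rows = ["A1    1    2", "A-2  -NA-  3", "------------", "B-1   2   -NA-"]

# Split what is left of each line into whitespace-separated fields.
kept = [strip_comment(row).split() for row in rows]

# A line that starts with the comment character leaves nothing behind.
kept = [fields for fields in kept if fields]

print(kept)  # [['A1', '1', '2'], ['A'], ['B']]
```

The short rows ['A'] and ['B'] correspond to the rows pandas pads with NaN in the output above.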

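My question: how can I make pandas skip only the lines that start with the comment character, while leaving a - inside a data line untouched?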

Not important, but out of curiosity: can pandas handle two different types of comment lines, for example lines starting with # or with -?

import pandas as pd
from io import StringIO
s = """a   b   c
# comment line
------------
A1   1    2
A2  -NA-  3
------------
B1   2   -NA-
------------
"""
df = pd.read_csv(StringIO(s), sep=r'\s+', comment='#-')
df

This raises ValueError: Only length-1 comment characters supported

Solution

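One option is to preprocess the text before passing it to .read_csv: strip the dashed separator lines with a regular expression and leave the # lines to the comment parameter. For example: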

import re
import pandas as pd
from io import StringIO


s = """a   b   c
# comment line
------------
A1    1    2
A-2  -NA-  3
------------
B-1   2   -NA-
------------
"""

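# Remove any run of two or more dashes at the start of a line. The separator
# lines become blank, and read_csv skips blank lines by default.
cleaned = re.sub(r"^-{2,}", "", s, flags=re.M)

# The "#" lines are handled by the regular single-character comment option.
df = pd.read_csv(StringIO(cleaned), sep=r"\s+", comment="#")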
print(df)

Prints:

     a     b     c
0   A1     1     2
1  A-2  -NA-     3
2  B-1     2  -NA-
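
If more than one kind of comment line is involved, the same preprocessing idea can be written as a plain line filter instead of a regex. The sketch below is only one way to do it: drop_comment_lines and its prefixes argument are hypothetical names, and the "--" prefix is an assumption chosen so that a data line beginning with a single - (a negative number, say) is kept.

```python
def drop_comment_lines(text, prefixes=("#", "--")):
    """Remove lines whose first non-blank characters match any prefix."""
    kept = [
        line
        for line in text.splitlines()
        if not line.lstrip().startswith(tuple(prefixes))
    ]
    return "\n".join(kept) + "\n"


s = """a   b   c
# comment line
------------
A1    1    2
A-2  -NA-  3
------------
B-1   2   -NA-
------------
"""

cleaned = drop_comment_lines(s)
print(cleaned)
```

The cleaned text can then be read with pd.read_csv(StringIO(cleaned), sep=r"\s+"), with no comment argument needed. If -NA- marks a missing value in your data (an assumption on my part), adding na_values=["-NA-"] should turn those cells into NaN.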