Search code examples
pythonexcelpandasnanxlsx

Python recognizing hyphens (-) as 0, want NANs though


I have an xlsx file where there are hyphens for some values. Upon clicking on the hyphens in the Excel sheet, they turn into 0 I noticed, too.

I am bringing this xlsx file into Python and even if I put na_values='-'... etc. at the very beginning of reading the file (pd.read_excel), Python will not recognize the hyphens by anything else other than 0. For my use case, this is wrong.

How can I make these hyphens into NANs?


Solution

  • This seems to be an Excel caused problem. Excel often formats zeros as "-" in specific formats, such as currency. If they are showing up as 0s in Excel when you click them, then they are actually 0s, not the display format, e.g., "-".

    This raises the question, are the dashes distinct from what you believe are actual true 0s? If so, this must be distinguished somehow. If not and all 0s are actually NaN, replace the 0s with NaN.

    However, from what you've shared thus far, it sounds like you may have fallen victim to someone else's poorly formatted Excel workbook.