Tags: python, pandas, decimal, xlm

pandas.read_html does not support a decimal comma


I was reading an xlm file with pandas.read_html and it works almost perfectly. The problem is that the file uses commas as decimal separators instead of dots (the default in read_html).

I could easily replace the commas with dots in one file, but I have almost 200 files with that format. With pandas.read_csv you can define the decimal separator, but for some reason pandas.read_html only lets you define the thousands separator.
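For comparison, this is what the read_csv option looks like. A minimal sketch with made-up, semicolon-separated sample data (the column names and values are illustrative only), where "." groups thousands and "," marks decimals:

```python
from io import StringIO

import pandas as pd

# Illustrative sample data: "." groups thousands, "," is the decimal separator
raw = "item;price\nwidget;1.234,56\ngadget;7,5\n"

# read_csv accepts both separators, so the prices are parsed as floats
df = pd.read_csv(StringIO(raw), sep=";", thousands=".", decimal=",")
print(df.dtypes)
print(df)
```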

Any guidance on this? Is there another way to automate the comma-to-dot replacement before the file is opened by pandas? Thanks in advance!
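One way to automate the replacement before pandas sees the files is a plain-text pass over each file. This is a sketch, not a tested recipe: it assumes the files are UTF-8 text and that every number uses "." for thousands and "," for decimals. The names to_dot_decimals and convert_folder, and the "*.xlm" pattern, are hypothetical. It rewrites any digit-comma-digit sequence anywhere in the file, not only inside table cells, so check the output on a sample first:

```python
import re
from pathlib import Path

# Assumed number format: "1.234,56" (dot = thousands, comma = decimal)
_THOUSANDS_DOT = re.compile(r"(?<=\d)\.(?=\d{3}(?!\d))")  # dot followed by exactly 3 digits
_DECIMAL_COMMA = re.compile(r"(?<=\d),(?=\d)")             # comma between two digits


def to_dot_decimals(text: str) -> str:
    """Rewrite '1.234,56' as '1234.56' wherever such a number appears in text."""
    text = _THOUSANDS_DOT.sub("", text)   # drop thousands separators first
    return _DECIMAL_COMMA.sub(".", text)  # then turn the decimal comma into a dot


def convert_folder(src_dir: str, dst_dir: str, pattern: str = "*.xlm") -> None:
    """Write a converted copy of every file matching pattern into dst_dir (hypothetical helper)."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob(pattern):
        text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        (out / path.name).write_text(to_dot_decimals(text), encoding="utf-8")
```

The converted copies can then be read with pandas.read_html as usual.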


Solution

  • Thanks @zhqiat. I think upgrading pandas to version 0.19 would solve the problem. Unfortunately, I couldn't find an easy way to do that: the only upgrade tutorial I found was for Ubuntu, and I'm on Windows XP.

    In the end I chose the workaround, using the method posted here: converting every column, one by one, to a numeric pandas Series.

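    import pandas as pd
    for col in result.columns:  # result = pd.read_html(...)[0], numbers still stored as text
        result[col] = pd.to_numeric(result[col].str.replace(".", "", regex=False).str.replace(",", ".", regex=False))
    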

    I know this solution isn't the best, but it works. Thanks!
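On pandas 0.19 or later, passing decimal="," and thousands="." to pandas.read_html should make this workaround unnecessary. For older versions, the column-by-column conversion can be wrapped in a helper that only converts columns whose values all parse as numbers. A minimal sketch: comma_decimals_to_float is a hypothetical name, and it assumes the numbers are still strings like "1.234,56" when the helper runs (read_html's default thousands="," can otherwise strip the decimal commas before you see them):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def comma_decimals_to_float(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with '1.234,56'-style text columns converted to floats."""
    out = df.copy()
    for col in out.columns:
        if is_numeric_dtype(out[col]):
            continue  # already parsed as numbers, nothing to fix
        cleaned = (
            out[col]
            .astype(str)
            .str.replace(".", "", regex=False)   # drop thousands separators
            .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
        )
        converted = pd.to_numeric(cleaned, errors="coerce")
        # Keep the conversion only if no non-missing value was lost to NaN
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    return out


df = pd.DataFrame({"item": ["widget", "gadget"], "price": ["1.234,56", "7,5"]})
print(comma_decimals_to_float(df))
```

Text columns such as "item" fail the all-numeric check and are left as they are, so the helper can be applied to every table without listing the numeric columns by hand.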