python pandas csv number-formatting

Pandas to_csv: how to format both int and float variables with a comma as the thousands separator


All the data I crunch needs to be reported with a comma as the thousands separator. I only need the separator once the data is written to a .csv file; inside my pandas dataframe, I want to keep the values as int or float.

floats

I can output the floats to zero decimal with:

 df.to_csv(r'c:\Awesome\Groovy.csv', float_format='%.0f')

but whenever I try to add a comma as the thousands separator to the float format, it doesn't work:

 df.to_csv(r'c:\Awesome\Groovy.csv', float_format=':,.0f')  ## WRONG
 df.to_csv(r'c:\Awesome\Groovy.csv', float_format='{:,.0f}'.format)  ## WRONG
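
A possible explanation, sketched in plain Python with no pandas involved: the '%.0f' string that works above is a printf-style format, and Python's printf-style formatting has no flag for thousands grouping. The ',' option belongs to the str.format mini-language instead. The sample value below is made up for illustration.

```python
value = 1234567.891  # made-up sample value

# printf-style formatting: rounds to zero decimals, but cannot group digits
percent_style = '%.0f' % value

# str.format mini-language: ',' asks for a thousands separator
grouped = '{:,.0f}'.format(value)

print(percent_style)
print(grouped)
```

So a string such as ':,.0f' is neither a valid printf-style format nor a complete str.format template, which would be consistent with the first failing attempt.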

ints

My plan for the ints is to first convert them to float in the dataframe and then format them with .to_csv. Any thoughts?
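
One caveat with routing ints through float, shown here as a small plain-Python sketch: a 64-bit float cannot represent every integer above 2**53, so very large ints can change value on the way. The number below is picked only to make the effect visible; typical report figures are far smaller and would not be affected.

```python
big = 2**53 + 1  # smallest positive int that a 64-bit float cannot hold exactly

# Formatting after a float conversion silently changes the last digit
via_float = '{:,.0f}'.format(float(big))

# Formatting the int directly keeps every digit
direct = '{:,}'.format(big)

print(via_float)
print(direct)
```

Since '{:,}' works on ints as they are, the float conversion step may not be needed at all.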


Solution

  • This might be too kludgy for you, but here goes. Using the approach from this answer, https://stackoverflow.com/a/4205875/42346, we can add the commas ourselves:

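    import re

    def thous(x, sep=',', dot='.'):
        # Split the string form of x into integer and fractional parts
        num, _, frac = str(x).partition(dot)
        # Reverse the integer part, add sep after every 3 digits that are
        # followed by another digit, then reverse back
        num = re.sub(r'(\d{3})(?=\d)', r'\1' + sep, num[::-1])[::-1]
        if frac:
            num += dot + frac
        return num

    # The column now holds strings, so do this only on data headed for export
    df['my_column'] = df['my_column'].apply(thous)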
    

    Using some sample data I had lying around from another SO question:

    >>> df = pd.DataFrame({'date':['2017-03-10','2017-03-11','2017-03-12'],
                           'activate_time':['2017-03-10 12:13:30','2017-03-11 13:57:49','2017-03-12 14:28:05'],
                           'mycol':[1234.56789,9876.54321,1111111.11111]})
    >>> df['mycol'] = df['mycol'].apply(lambda x: thous(x))
    >>> df
              activate_time        date            mycol
    0   2017-03-10 12:13:30  2017-03-10      1,234.56789
    1   2017-03-11 13:57:49  2017-03-11      9,876.54321
    2   2017-03-12 14:28:05  2017-03-12  1,111,111.11111
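
Since the question asks to keep int and float values inside the dataframe, one variation is to format a copy just before writing. The following is only a sketch under assumptions: the helper name format_for_export and the columns units and revenue are made up for illustration, and it uses Python's built-in ',' format option in place of the regex helper.

```python
import io

import pandas as pd


def format_for_export(df, float_fmt='{:,.0f}', int_fmt='{:,}'):
    """Hypothetical helper: return a copy with numeric columns as grouped strings."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            # ints are formatted directly, with no detour through float
            out[col] = out[col].map(int_fmt.format)
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = out[col].map(float_fmt.format)
    return out


# Made-up sample data
df = pd.DataFrame({'units': [1234, 9876543],
                   'revenue': [1234.56789, 1111111.11111]})

# Write to an in-memory buffer here; a file path works the same way
buffer = io.StringIO()
format_for_export(df).to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# The original dataframe still holds numbers
print(df.dtypes)
```

Because the formatted values contain commas, to_csv wraps them in double quotes under its default quoting, so whatever reads the file needs to honour quoted fields. Missing values in float columns would come out as the text 'nan' with this sketch.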