python, string, pandas, locale

Using locale.atof() to convert decimal commas in a dataframe column before calling pivot_table


Say I have a dataframe that looks as follows. The values in the column Value are decimals that use a comma as the decimal separator.

df.head()

       ID   Key      Value
0  A0AVT1  MAHA    4842000
1  A0FGR8  MAHA    3522710
2  A0JLT2  MAHA    283,433
3  A0JNW5  MAHA  356,09677
4  A0MZ66   CEB       37,5
5  A0PJW6   CEB  487,03677
6    A1AG   CEB  10,625567
7  A1L0T0   HAC         12
8  A1L390   HAC     63,946
9  A1X283   HAC     138,25

I want to use pandas pivot_table to reshape the above dataframe, using ID as the index and Key as the columns, with the values taken from the column Value. So I tried the following one-liner:

df2.reset_index().pivot_table(values='Value',index='ID',columns='Key')

However, the above one-liner throws this data error:

~/software/anaconda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in _cython_agg_blocks(self, how, alt, numeric_only, min_count)
   4042 
   4043         if len(new_blocks) == 0:
-> 4044             raise DataError('No numeric types to aggregate')
   4045 
   4046         # reset the locs in the blocks to correspond to our

DataError: No numeric types to aggregate

I have also tried to use the locale module to convert the decimal commas in the Value column of my dataframe df. Here is what I tried:

import locale
locale.setlocale(locale.LC_ALL, 'de_DE') #Germany
df.Value.astype(str).apply(locale.atof)

It throws this error:

TypeError: data type not understood

I have also tried astype(float). It did not change anything.

Any help/suggestions are much appreciated! Thank you.


Solution

  • The universal way to set the locale correctly is to let the system determine it from the environment:

    locale.setlocale(locale.LC_NUMERIC, '')
    

    On my machine, this yields:

    >>> locale.setlocale(locale.LC_NUMERIC, '')
    'de_DE.UTF-8'
    >>> df.Value.apply(locale.atof)
    0    4.842000e+06
    1    3.522710e+06
    2    2.834330e+02
    3    3.560968e+02
    4    3.750000e+01
    5    4.870368e+02
    6    1.062557e+01
    7    1.200000e+01
    8    6.394600e+01
    9    1.382500e+02
    

    If you want to set the locale explicitly, you'll have to use different locale strings for Linux and Windows:

    Linux:

    locale.setlocale(locale.LC_NUMERIC, 'de_DE.UTF8')   # or 'de_DE.UTF-8'
    

    Windows:

    locale.setlocale(locale.LC_NUMERIC, 'German') # or 'de' or 'deu' (case insensitive)
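
To tie the pieces together, here is a minimal sketch that tries the locale strings listed above in turn, converts the Value column, and then runs the pivot from the question. The helper names set_german_numeric_locale and parse_decimal_comma are hypothetical and not part of any library. The fallback is a plain string replacement rather than locale.atof; it is only there so the example still runs on a machine without a German locale installed, and it assumes the values contain no thousands separators (as in the sample data). The dataframe is a shortened copy of the one in the question.

```python
import locale

import pandas as pd

# Locale strings from the answer above: Linux spellings first, then Windows.
CANDIDATES = ("de_DE.UTF-8", "de_DE.UTF8", "German")


def set_german_numeric_locale(candidates=CANDIDATES):
    """Hypothetical helper: return the first name setlocale accepts, else None."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
            return name
        except locale.Error:
            continue  # this spelling is not available here, try the next one
    return None


def parse_decimal_comma(text):
    """Locale-free fallback; assumes no thousands separators in the input."""
    return float(text.replace(",", "."))


# Shortened copy of the question's data; Value holds strings, not numbers.
df = pd.DataFrame({
    "ID": ["A0AVT1", "A0JLT2", "A0MZ66", "A1AG", "A1L0T0", "A1X283"],
    "Key": ["MAHA", "MAHA", "CEB", "CEB", "HAC", "HAC"],
    "Value": ["4842000", "283,433", "37,5", "10,625567", "12", "138,25"],
})

chosen = set_german_numeric_locale()
parse = locale.atof if chosen else parse_decimal_comma

# Assign the result back so the column really becomes float64.
df["Value"] = df["Value"].apply(parse)

# With a numeric Value column, pivot_table has something to aggregate.
wide = df.pivot_table(values="Value", index="ID", columns="Key")
print(wide)
```

Note that locale.setlocale changes the setting for the whole process, so in a larger program it may be preferable to save the previous value and restore it after the conversion.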