Round floats in pandas multiindex DataFrame


I am given a pandas multiindex DataFrame indexed with floats. Consider the following example:

import numpy as np
import pandas as pd

arrays = [[0.21,0.21,0.21,0.22,0.22,0.22,0.23,0.23,0.23],
          [0.81,0.8200000000000001,0.83,0.81,0.8200000000000001,0.83,0.81,0.8200000000000001,0.83]]
df = pd.DataFrame(np.random.randn(9, 2), index=arrays)

df

#               0           1
# 0.21  0.81    -2.234036   -0.145643
#       0.82    0.367248    -1.471617
#       0.83    -0.764520   0.686241
# 0.22  0.81    1.380429    1.546513
#       0.82    1.230707    1.826980
#       0.83    -1.198403   0.377323
# 0.23  0.81    -0.418367   -0.125763
#       0.82    0.682860    -0.119080
#       0.83    -1.802418   0.357573

I receive the DataFrame in this form. If I try to retrieve the entry df.loc[(0.21, 0.82)], I get an error, because the index does not actually contain 0.82 but 0.8200000000000001. I don't know in advance where in the index such values occur.

How can I address this problem? My idea is to round both levels of the multiindex to the number of significant decimals, which is 2 in this case. How can that be done, and is there a better solution?


Solution

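  • Consider using integers instead: multiply the floating-point values by 100 (or 1000), round to the nearest whole number, and convert to ints:

    df.index = pd.MultiIndex.from_product([
                 (df.index.levels[0] * 100).round().astype(int),
                 (df.index.levels[1] * 100).round().astype(int)])
    

    Rounding before the cast matters because astype(int) truncates: a value such as 0.29 * 100 evaluates to 28.999999999999996 and would otherwise become 28. Note that from_product rebuilds the index as the full Cartesian product of the levels, which matches this example but not an index with missing or reordered combinations. Integers compare exactly, unlike floating-point numbers, so you can now use df.loc[(21, 82)] to access your data.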