excel, pandas, integer, floating

Pandas converts integer numbers to real numbers when reading from Excel


I recently started exploring Python for analyzing Excel data. I have an Excel file with two worksheets, each containing one matrix (m = 1000 rows and n = 999 columns). The elements of the two matrices are related: one matrix holds displacement values and the other holds the force value corresponding to each displacement. The displacements and corresponding forces were obtained from m = 1000 numerical simulations and n = 999 increments. Is it possible to identify the force values that correspond only to displacement values that are integers? Or, as an alternative, is it possible to replace all the decimal numbers in the displacement matrix with 0? I tried reading the Excel file into a pandas DataFrame, but all values in the displacement matrix are shown as "real numbers" (e.g. the numbers "1", "2", "3", etc. from Excel are shown with a decimal point as "1.", "2.", "3." in Python). Thank you for your attention.


Solution

  • Let's work through an example on a smaller scale (3 × 3).

    I prepared an Excel file with 2 sheets and read them:

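    import numpy as np   # needed for the np.nonzero step
    import pandas as pd

    # Read each worksheet into its own DataFrame
    displ = pd.read_excel('Input_2.xlsx', sheet_name='Displ')
    forces = pd.read_excel('Input_2.xlsx', sheet_name='Forces')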
    

    Both DataFrames contain:

    displ                      forces
         C1    C2    C3              C1     C2     C3
    0  10.0  12.1  11.3        0  120.1  130.2  140.3
    1  12.5  13.0  13.5        1  150.4  160.5  170.6
    2  12.6  13.6  13.8        2  180.7  190.8  200.9
    

    To identify the elements of displ that hold whole numbers (they are still floats, but with a fractional part equal to 0.0), you can run:

    displ.mod(1.0) == 0.0
    

    and you will get:

          C1     C2     C3
    0   True  False  False
    1  False   True  False
    2  False  False  False
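
The exact comparison above only matches values whose fractional part is exactly 0.0. If the simulated displacements carry rounding noise, a tolerance-based test may be more robust. The following is a minimal sketch: the sample data, the noisy value 13.0000000001, and the threshold `tol` are assumptions made for illustration, not values from the question.

```python
import pandas as pd

# Hypothetical displacements; 13.0000000001 stands in for a value with rounding noise.
displ = pd.DataFrame({
    "C1": [10.0, 12.5, 12.6],
    "C2": [12.1, 13.0000000001, 13.6],
    "C3": [11.3, 13.5, 13.8],
})

# Exact test, as in the answer: the fractional part must be exactly zero.
exact = displ.mod(1.0) == 0.0

# Tolerant test: distance to the nearest integer is below an assumed threshold.
tol = 1e-6
near_int = (displ - displ.round()).abs() < tol

print(exact)
print(near_int)
```

Here `exact` flags only the 10.0 entry, while `near_int` also flags the noisy 13.0000000001. A suitable `tol` depends on the precision of the simulation output.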
    

    To get the corresponding force values, with NaN everywhere else, you can run:

    forces.where(displ.mod(1.0) == 0.0)
    

    getting:

          C1     C2  C3
    0  120.1    NaN NaN
    1    NaN  160.5 NaN
    2    NaN    NaN NaN
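
To try the selection without an Excel file, the two small DataFrames can be built in memory. This is a sketch that reuses the sample values shown above; the names `is_int`, `selected` and `hits` are chosen here for illustration.

```python
import pandas as pd

# Same 3 x 3 sample data as in the answer, built in memory instead of read from Excel.
displ = pd.DataFrame({
    "C1": [10.0, 12.5, 12.6],
    "C2": [12.1, 13.0, 13.6],
    "C3": [11.3, 13.5, 13.8],
})
forces = pd.DataFrame({
    "C1": [120.1, 150.4, 180.7],
    "C2": [130.2, 160.5, 190.8],
    "C3": [140.3, 170.6, 200.9],
})

# Boolean mask: True where the displacement has no fractional part.
is_int = displ.mod(1.0) == 0.0

# Keep forces where the mask is True, NaN elsewhere.
selected = forces.where(is_int)

# Long format: one entry per (row, column) hit, with the NaN entries dropped.
hits = selected.stack().dropna()
print(hits)
```

The long format can be convenient for the full 1000 × 999 matrices, where most entries of `selected` would otherwise be NaN.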
    

    Another option is to get the positions in displ where the element has a zero fractional part. np.nonzero is a NumPy function, so it operates on the underlying NumPy array and returns integer (zero-based) positions rather than index labels:

    ind = np.nonzero((displ.mod(1.0) == 0.0).values)
    

    The result is:

    (array([0, 1], dtype=int64), array([0, 1], dtype=int64))
    

    so it is a 2-tuple of indices:

    • row indices,
    • column indices.

    You can use the same positions to retrieve the matching elements from forces, again through the underlying NumPy array:

    forces.values[ind]
    

    The result is:

    array([120.1, 160.5])
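
Since `np.nonzero` returns positions rather than labels, it can help to map them back to the row and column labels. The sketch below uses the same sample data built in memory; the `result` table and its column names are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

displ = pd.DataFrame({
    "C1": [10.0, 12.5, 12.6],
    "C2": [12.1, 13.0, 13.6],
    "C3": [11.3, 13.5, 13.8],
})
forces = pd.DataFrame({
    "C1": [120.1, 150.4, 180.7],
    "C2": [130.2, 160.5, 190.8],
    "C3": [140.3, 170.6, 200.9],
})

# Zero-based row and column positions of whole-number displacements.
rows, cols = np.nonzero((displ.mod(1.0) == 0.0).to_numpy())

# Translate positions back to labels and pair each displacement with its force.
result = pd.DataFrame({
    "row": displ.index[rows],
    "column": displ.columns[cols],
    "displacement": displ.to_numpy()[rows, cols],
    "force": forces.to_numpy()[rows, cols],
})
print(result)
```

Each row of `result` then identifies one simulation (row), one increment (column), and the displacement and force found there.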
    

    To replace "integer" elements of displ with zeroes, you can run:

    displ.mask(displ.mod(1.0) == 0.0, 0, inplace=True)
    

    Now displ contains:

         C1    C2    C3
    0   0.0  12.1  11.3
    1  12.5   0.0  13.5
    2  12.6  13.6  13.8
    

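    Note that the replaced elements are still float zeros (0.0, not 0). This is how pandas works: each column has a single dtype that must fit every element in that column, and here that dtype is float because the columns also hold non-integer values. The same rule explains why whole numbers from Excel are displayed as "1.0", "2.0", "3.0".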
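
The one-dtype-per-column rule can be checked on a small DataFrame built in memory. This is a minimal sketch; the column names `all_int` and `some_frac` are made up for the demonstration.

```python
import pandas as pd

# One column of whole numbers only, one column mixing whole and fractional values.
mixed = pd.DataFrame({
    "all_int": [1, 2, 3],
    "some_frac": [1, 2.5, 3],
})

# The first column gets an integer dtype; the second is promoted to float
# because a single dtype has to hold 2.5 as well as 1 and 3.
print(mixed.dtypes)

# A float column that holds only whole numbers can be cast to integers explicitly.
whole = pd.Series([1.0, 2.0, 3.0])
as_int = whole.astype("int64")
print(as_int)
```

Casting like this only makes sense when every value in the column is a whole number; on a column such as `some_frac`, `astype("int64")` would truncate 2.5 to 2.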