
Python Pandas DataFrame format() not updating df value


When attempting to update the format of a column which contains floats or strings, the column values only update for some input files and not others.

Here is the code:

    try:
        print('{:.2e}'.format(cell_counts.iat[0,1]))
        cell_counts.iat[0,1] = '{:.2e}'.format(cell_counts.iat[0,1])
        print(cell_counts.iat[0,1])
    except ValueError:
        cell_counts.iat[0,1] = cell_counts.iat[0,1]

    for x in range(0,8):
        try:
            cell_counts.iat[x,2] = '{:.2e}'.format(cell_counts.iat[x,2])
        except ValueError:
            cell_counts.iat[x,2] = cell_counts.iat[x,2]
    
    for x in range(0,8):
        try:
            cell_counts.iat[x,5] = '{:.2e}'.format(cell_counts.iat[x,5])
        except ValueError:
            cell_counts.iat[x,5] = cell_counts.iat[x,5]

    try:
        cell_counts.at[0,'Average Cells (Dead or Live)'] = '{:.2e}'.format(cell_counts.at[0,'Average Cells (Dead or Live)'])
        cell_counts.at[4,'Average Cells (Dead or Live)'] = '{:.2e}'.format(cell_counts.at[4,'Average Cells (Dead or Live)'])
    except ValueError:
        cell_counts.at[0,'Average Cells (Dead or Live)'] = cell_counts.at[0,'Average Cells (Dead or Live)']
        cell_counts.at[4,'Average Cells (Dead or Live)'] = cell_counts.at[4,'Average Cells (Dead or Live)']


    try:
        cell_counts.at[0,'Standard Deviation'] = '{:.2e}'.format(cell_counts.at[0,'Standard Deviation'])
        cell_counts.at[4,'Standard Deviation'] = '{:.2e}'.format(cell_counts.at[4,'Standard Deviation'])
    except ValueError:
        cell_counts.at[0,'Standard Deviation'] = cell_counts.at[0,'Standard Deviation']
        cell_counts.at[4,'Standard Deviation'] = cell_counts.at[4,'Standard Deviation']

    try:
        cell_counts.at[0,'Calculated Cell Suspension'] = '{:.2e}'.format(cell_counts.at[0,'Calculated Cell Suspension'])
        cell_counts.at[4,'Calculated Cell Suspension'] = '{:.2e}'.format(cell_counts.at[4,'Calculated Cell Suspension'])
    except ValueError:
        cell_counts.at[0,'Calculated Cell Suspension'] = cell_counts.at[0,'Calculated Cell Suspension']
        cell_counts.at[4,'Calculated Cell Suspension'] = cell_counts.at[4,'Calculated Cell Suspension']

    try:
        cell_counts.at[0,'Cell Recovery'] = '{:.2e}'.format(cell_counts.at[0,'Cell Recovery'])
        cell_counts.at[4,'Cell Recovery'] = '{:.2e}'.format(cell_counts.at[4,'Cell Recovery'])
    except ValueError:
        cell_counts.at[0,'Cell Recovery'] = cell_counts.at[0,'Cell Recovery']
        cell_counts.at[4,'Cell Recovery'] = cell_counts.at[4,'Cell Recovery']

The format string is correct: checking it with a print statement shows the correctly formatted value, and the assignment even works for some files. Here is one of the outputs (screenshot):

The formatting works for some of the columns but not others. At the top of the screenshot we can see that the formatting does what we want, but the value is not updated after being stored at the desired location. I know that some of the code uses iat and some uses at; I am trying everything I can. Is this a situation where I am updating a view/copy of the dataframe?

[Here](https://i.sstatic.net/uvRtI.png) is the output using different input data files, where the formatting behaves as expected. I have also tried using if isinstance instead of try/except, with the same results.

Any help is appreciated. I will also note that I am clearly not a pro at this.

EDIT: After attempting the suggestion by Serge below, I am still getting the same results:

    def format_to_scientific(value):
        try:
            return '{:.2e}'.format(float(value))
        except (ValueError, TypeError):
            return value


    for col_index in [1,2,5]:
        for row_index in range(0,8):
            cell_counts.iat[row_index,col_index] = format_to_scientific(cell_counts.iat[row_index,col_index])

    target_cells = [
        (0, 'Average Cells (Dead or Live)'),
        (4, 'Average Cells (Dead or Live)'),
        (0, 'Standard Deviation'),
        (4, 'Standard Deviation'),
        (0, 'Calculated Cell Suspension'),
        (4, 'Calculated Cell Suspension'),
        (0, 'Cell Recovery'),
        (4, 'Cell Recovery')
    ]
    for row_index, col_name in target_cells:
        cell_counts.at[row_index, col_name] = format_to_scientific(cell_counts.at[row_index, col_name])

Output: the columns at the beginning are not formatted, while the columns at the end are.

Checking where we are going wrong:

    for col_index in [1,2,5]:
        for row_index in range(0,8):
            return_value = format_to_scientific(cell_counts.iat[row_index,col_index])
            print(f'Formatted value:{return_value}')
            cell_counts.iat[row_index,col_index] = return_value
            print(cell_counts.iat[row_index,col_index])

Output with prints, and the resulting dataframe: the formatting works, but after assignment the dataframe values have not been updated.
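One thing that may be worth checking at this point is the dtype of each column. This is a hypothesis, not something the outputs above confirm: if a column was read in as float64, then depending on the pandas version, a numeric-looking string such as '1.23e+05' written into a single cell may be parsed straight back into a float, whereas an object column (one that already holds strings) keeps the string. The sketch below uses a made-up frame (demo, all_numeric and mixed are placeholder names, not part of the real data) and a plain NumPy array to show that round-trip.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for cell_counts: one purely numeric column and one
# column that mixes numbers with text.
demo = pd.DataFrame({
    "all_numeric": [123456.0, 7890.0],
    "mixed": [123456.0, "n/a"],
})

# Inspect how pandas stored each column.
dtypes = demo.dtypes
print(dtypes)

# A float64 NumPy array cannot hold a string: assigning a numeric-looking
# string to one element converts it back into a float.
values = np.array([123456.0, 7890.0])
formatted = "{:.2e}".format(values[0])
values[0] = formatted
print(formatted, "->", repr(values[0]))
```

If the columns that refuse to update turn out to be float64 and the ones that do update are object, that would be consistent with this explanation.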


Solution

  • The issue was that, when formatting row by row, the try/except approach did not update any column in which no exception was raised, that is, columns whose values were all numeric. This behavior is interesting to me and I still do not fully understand it. The fix: first try to format the entire column using map(). If that raises an exception, fall back to formatting row by row.

    for col_index in ['Average Cells per Replicate', 'Standard dev per Replicate', 'Included Average Cells/Replicate']:
        try:
            viability_out[col_index] = viability_out[col_index].map('{:.2e}'.format)
        except (ValueError, TypeError):
            for row_index in range(0,8):
                viability_out.at[row_index,col_index] = format_to_scientific(viability_out.at[row_index,col_index])
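As a variation on the approach above, the fallback loop can be avoided by mapping a tolerant formatter over the whole column, so that mixed columns are handled in one pass. This is a minimal sketch, assuming the format_to_scientific helper from the question; the frame named sample and its values are hypothetical data, not the real viability_out.

```python
import pandas as pd


def format_to_scientific(value):
    """Return value in scientific notation, or unchanged if it is not numeric."""
    try:
        return "{:.2e}".format(float(value))
    except (ValueError, TypeError):
        return value


# Hypothetical sample data standing in for the real frame.
sample = pd.DataFrame({
    "Average Cells per Replicate": [123456.0, 7890.0, 1500000.0],
    "Standard dev per Replicate": [1234.5, "n/a", 42.0],
})

for col in sample.columns:
    # Assigning a whole new column replaces the old one, so a float64 column
    # becomes a column of strings instead of being written cell by cell.
    sample[col] = sample[col].map(format_to_scientific)

print(sample)
```

Because each column is replaced as a whole, the result does not depend on how single-cell assignment treats a string written into a float column. Non-numeric entries such as "n/a" pass through unchanged.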