I wrote a function:
def main_table(data, gby_lst, col):
    df = data.groupby(gby_lst)[col].describe()
    df = df.reset_index()
    for i in ['25%', '50%', '75%', 'std', 'min', 'max', 'mean']:
        df[i] = df[i].apply(lambda x: float("{:.2f}".format(x)))
    df['Mean ± SD'] = (df[['mean', 'std']]
                       .apply(lambda row: ' ± '.join(row.values.astype(str)), axis=1)
                       )
    df['Median (IQR)'] = (df['50%'].astype(str) + ' (' + df[['25%', '75%']].apply(lambda row: ' - '.join(row.values.astype(str)),
                                                                                   axis=1) + ')'
                          )
    df['Range'] = (df[['min', 'max']]
                   .apply(lambda row: ' - '.join(row.values.astype(str)), axis=1)
                   )
    summary_list = gby_lst + ['Mean ± SD', 'Median (IQR)', 'Range']
    return df.loc[:, summary_list]
But the output drops trailing zeros. For example, I want 3.40 ± 5.55, but the function currently gives me 3.4 ± 5.55.
How can I fix it?
Change the line from:
df[i] = df[i].apply(lambda x: float("{:.2f}".format(x)))
To this:
df[i] = df[i].apply(lambda x: "{:.2f}".format(x))
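A Python float stores only a numeric value, not a number of decimal places, so 3.40 and 3.4 are the same float. In the first line, "{:.2f}".format(x) does produce a string with exactly 2 decimal places, but wrapping it in float() converts it back to a number and the formatting is lost. When the column is later converted with astype(str), the float is rendered in its shortest form, 3.4. Using just the formatter {:.2f} keeps the values as strings with exactly 2 decimal places, and the later joins preserve them. Note that these columns then hold strings rather than numbers, which is fine here because they are only used to build the summary text.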
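The difference can be seen without pandas. The following is a minimal sketch using made-up sample values (3.4 and 5.553 are illustrative, not taken from the question's data); it compares the round trip through float() with keeping the formatted string.

```python
# Illustrative values only; not from the original data.
value = 3.4

# Round-tripping through float() keeps the value but discards the formatting.
round_tripped = float("{:.2f}".format(value))
print(str(round_tripped))  # 3.4

# Keeping the formatted string preserves both decimal places.
formatted = "{:.2f}".format(value)
print(formatted)  # 3.40

# Building a "Mean ± SD" style label from formatted strings.
mean, std = 3.4, 5.553
label = " ± ".join("{:.2f}".format(v) for v in (mean, std))
print(label)  # 3.40 ± 5.55
```

The same reasoning applies to the DataFrame columns: once they hold formatted strings, joining them keeps the trailing zeros.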