I have a pandas DataFrame with 6 rows and 11 columns, where each cell contains a float64 array holding a single value. This is what I get after converting the DataFrame to a dictionary:
{'AO': {"W": [-0.09898120815033484],
        "X": [0.025084149326805416],
        "Y": [-0.043670609717370634],
        "Z": [-0.07389705882352943],
        "A": [-0.018586460390565218],
        "B": [-0.11756766854090006]},
 'DR': {"W": [0.8163265306122449],
        "X": [1.0814940577249577],
        "Y": [0.8759551706571573],
        "Z": [0.8828522920203735],
        "A": [0.9473403118991668],
        "B": [0.7733390301217689]},
 'DP': {"W": [-0.14516129032258063],
        "X": [0.05955334987593053],
        "Y": [-0.10348491287717809],
        "Z": [-0.0856079404466501],
        "A": [-0.043931563001247564],
        "B": [-0.1890928533238282]},
 'PD': {"W": [-0.1255102040816326],
        "X": [0.09129967776584313],
        "Y": [-0.13698152666434293],
        "Z": [-0.03421052631578947],
        "A": [-0.0456818488984998],
        "B": [-0.1711920529801324]}}
Here the row index labels are W, X, Y, Z, A, and B. I want to get rid of the NumPy array wrapper in each cell and flatten the DataFrame so that each cell holds only the int/float value. How can I do this?
Use applymap:
df = df.applymap(lambda x: x[0])
df
Output:
         AO        DR        DP        PD
W -0.098981  0.816327 -0.145161 -0.125510
X  0.025084  1.081494  0.059553  0.091300
Y -0.043671  0.875955 -0.103485 -0.136982
Z -0.073897  0.882852 -0.085608 -0.034211
A -0.018586  0.947340 -0.043932 -0.045682
B -0.117568  0.773339 -0.189093 -0.171192
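A note for newer pandas releases: `DataFrame.applymap` was deprecated in pandas 2.1 and renamed to `DataFrame.map`. The following is a minimal sketch of the same unwrapping that picks whichever method is available. The helper name `unwrap_cells` and the sample values are illustrative assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd


def unwrap_cells(df):
    """Replace each single-element array (or list) cell with its first element."""
    # DataFrame.map is the pandas >= 2.1 name; older versions only have applymap.
    method = df.map if hasattr(pd.DataFrame, "map") else df.applymap
    return method(lambda cell: cell[0])


# Made-up sample: one-element lists per cell, like the dictionary in the question.
wrapped = pd.DataFrame(
    {"AO": [[-0.5], [0.25]], "DR": [[0.75], [1.5]]},
    index=["W", "X"],
)

flat = unwrap_cells(wrapped)
print(flat)
```

The lambda is the same `x[0]` lookup as above, so it works for one-element lists and one-element NumPy arrays alike.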
Timing information via perfplot:
from itertools import chain

import numpy as np
import pandas as pd
import perfplot

np.random.seed(5)


def gen_data(n):
    # Build an n x 4 frame of random floats, then wrap every cell
    # in a one-element array to mimic the data in the question.
    return pd.DataFrame(np.random.random(size=(n, 4)),
                        columns=['AO', 'DR', 'DP', 'PD']) \
        .applymap(lambda x: np.array([x]))


def chain_comprehension(df):
    # Flatten each row of one-element arrays into a plain list of floats.
    return pd.DataFrame([list(chain(*i)) for i in df.values], index=df.index,
                        columns=df.columns)


def apply_map(df):
    # Take the first (only) element of every cell.
    return df.applymap(lambda x: x[0])


if __name__ == '__main__':
    out = perfplot.bench(
        setup=gen_data,
        kernels=[
            chain_comprehension,
            apply_map
        ],
        labels=[
            'chain_comprehension',
            'apply_map'
        ],
        n_range=[2 ** k for k in range(25)],
        equality_check=None  # do not compare kernel outputs
    )
    out.save('perfplot_results.png', transparent=False)
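
Because the benchmark passes `equality_check=None`, perfplot does not compare the outputs of the two kernels. A quick sanity check on a small frame, sketched below, can confirm that both approaches return the same values. The helpers `elementwise` and `make_wrapped` are hypothetical names introduced here, and the fallback from `DataFrame.map` to `applymap` is an assumption made so that the sketch runs on both older and newer pandas.

```python
from itertools import chain

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def elementwise(df, func):
    # DataFrame.map is the pandas >= 2.1 name for applymap; fall back on older versions.
    method = df.map if hasattr(pd.DataFrame, "map") else df.applymap
    return method(func)


def make_wrapped(n, seed=5):
    # Hypothetical helper: n rows of random floats, plus a copy in which
    # every cell is wrapped in a one-element array.
    rng = np.random.default_rng(seed)
    plain = pd.DataFrame(rng.random((n, 4)), columns=["AO", "DR", "DP", "PD"])
    return plain, elementwise(plain, lambda x: np.array([x]))


def chain_comprehension(df):
    return pd.DataFrame([list(chain(*row)) for row in df.values],
                        index=df.index, columns=df.columns)


def apply_map(df):
    return elementwise(df, lambda x: x[0])


plain, wrapped = make_wrapped(8)
via_chain = chain_comprehension(wrapped)
via_map = apply_map(wrapped)

# Both kernels should agree with each other and with the unwrapped source values.
assert_frame_equal(via_chain, via_map, check_dtype=False)
assert_frame_equal(via_map, plain, check_dtype=False)
```

If either assertion fails, the timing comparison would be measuring two different computations.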