pandas dataframe numpy scikit-learn sklearn-pandas

Flatten all cells from single-element float64 arrays to scalars in a Pandas DataFrame

I have a Pandas DataFrame with 6 rows and 11 columns in which every cell contains a float64 NumPy array holding a single value.

And this is what I get after transforming the dataframe to a dictionary:

{'AO': {"W": [-0.09898120815033484],
 "X": [0.025084149326805416],
 "Y": [-0.043670609717370634],
 "Z": [-0.07389705882352943],
 "A": [-0.018586460390565218],
 "B": [-0.11756766854090006]},
'DR': {"W": [0.8163265306122449],
 "X": [1.0814940577249577],
 "Y": [0.8759551706571573],
 "Z": [0.8828522920203735],
 "A": [0.9473403118991668],
 "B": [0.7733390301217689]},
'DP': {"W": [-0.14516129032258063],
 "X": [0.05955334987593053],
 "Y": [-0.10348491287717809],
 "Z": [-0.0856079404466501],
 "A": [-0.043931563001247564],
 "B": [-0.1890928533238282]},
'PD': {"W": [-0.1255102040816326],
 "X": [0.09129967776584313],
 "Y": [-0.13698152666434293],
 "Z": [-0.03421052631578947],
 "A": [-0.0456818488984998],
 "B": [-0.1711920529801324]}}

The row index labels are W, X, Y, Z, A and B. I want to get rid of the NumPy array wrapper around every cell and flatten the DataFrame so that each cell holds just the plain scalar (float) value. How can I do this?
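To reproduce the setup, a small frame of one-element float64 arrays can be rebuilt from the dictionary above. This is a sketch that assumes each cell really is a 1-element NumPy array (the dictionary prints them as lists); only two rows of two columns are copied, and the names `floats` and `wrap` are illustrative.

```python
import numpy as np
import pandas as pd

# Scalars copied from two of the columns in the dictionary above (lists unwrapped).
floats = pd.DataFrame({
    "AO": {"W": -0.09898120815033484, "X": 0.025084149326805416},
    "DR": {"W": 0.8163265306122449, "X": 1.0814940577249577},
})

# Wrap every value in a 1-element float64 array to mimic the original cells.
# DataFrame.map exists from pandas 2.1; older versions only have applymap.
wrap = floats.map if hasattr(pd.DataFrame, "map") else floats.applymap
df = wrap(lambda x: np.array([x], dtype=np.float64))

print(df.loc["W", "AO"])  # [-0.09898121]
```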

Solution

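Use applymap to pull the single element out of every cell (in pandas 2.1 applymap was deprecated in favour of DataFrame.map, which takes the same function):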

df = df.applymap(lambda x: x[0])
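A version-agnostic variant could call `DataFrame.map` where it exists (pandas 2.1+) and fall back to `applymap` otherwise. This is a sketch assuming every cell is a non-empty sequence; `unwrap_cells` is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def unwrap_cells(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace each 1-element array cell with its element."""
    # Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
    elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
    return elementwise(lambda x: x[0])


# Small demo frame whose cells are 1-element float64 arrays.
floats = pd.DataFrame({"AO": [-0.25, 0.5], "DR": [0.75, 1.0]}, index=["W", "X"])
wrap = floats.map if hasattr(pd.DataFrame, "map") else floats.applymap
wrapped = wrap(lambda x: np.array([x]))

flat = unwrap_cells(wrapped)
print(flat)
```

Checking the class with `hasattr(pd.DataFrame, "map")` rather than the instance avoids accidentally matching a column that happens to be named `map`.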

df:

         AO        DR        DP        PD
W -0.098981  0.816327 -0.145161 -0.125510
X  0.025084  1.081494  0.059553  0.091300
Y -0.043671  0.875955 -0.103485 -0.136982
Z -0.073897  0.882852 -0.085608 -0.034211
A -0.018586  0.947340 -0.043932 -0.045682
B -0.117568  0.773339 -0.189093 -0.171192
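For large frames the per-cell Python lambda is the bottleneck. One possible alternative (not part of the answer above and not included in the benchmark) is to join all cell arrays with a single `np.concatenate` call and reshape back to the frame's shape. This sketch assumes every cell holds exactly one element and that the frame is non-empty (`np.concatenate` raises on an empty input); `unwrap_vectorized` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def unwrap_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: flatten a frame whose cells are all 1-element arrays."""
    cells = df.to_numpy().ravel()  # row-major 1-D object array of the cell arrays
    # Guard the assumption; otherwise the reshape below could misalign values.
    if any(len(c) != 1 for c in cells):
        raise ValueError("every cell must hold exactly one element")
    # One NumPy call joins all cells; reshape restores rows x columns.
    values = np.concatenate(cells).reshape(df.shape)
    return pd.DataFrame(values, index=df.index, columns=df.columns)


floats = pd.DataFrame(np.random.random(size=(5, 4)),
                      columns=["AO", "DR", "DP", "PD"])
wrap = floats.map if hasattr(pd.DataFrame, "map") else floats.applymap
wrapped = wrap(lambda x: np.array([x]))

flat = unwrap_vectorized(wrapped)
```

Because `ravel` and `reshape` both use row-major order, each value lands back in the row and column it came from.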

Timing information via perfplot, comparing applymap with a list comprehension that flattens each row using itertools.chain:

from itertools import chain

import numpy as np
import pandas as pd
import perfplot

np.random.seed(5)


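def gen_data(n):
    # n rows x 4 columns, each cell wrapped in a 1-element float64 array
    return pd.DataFrame(np.random.random(size=(n, 4)),
                        columns=['AO', 'DR', 'DP', 'PD']) \
        .applymap(lambda x: np.array([x]))


def chain_comprehension(df):
    # Chain each row's arrays into one list of floats, then rebuild the frame
    return pd.DataFrame([list(chain(*i)) for i in df.values], index=df.index,
                        columns=df.columns)


def apply_map(df):
    # Take the single element out of every cell
    return df.applymap(lambda x: x[0])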


if __name__ == '__main__':
    out = perfplot.bench(
        setup=gen_data,
        kernels=[
            chain_comprehension,
            apply_map
        ],
        labels=[
            'chain_comprehension',
            'apply_map'
        ],
        n_range=[2 ** k for k in range(25)],
        equality_check=None  # skip perfplot's comparison of kernel outputs
    )
    out.save('perfplot_results.png', transparent=False)
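Since `equality_check=None` tells perfplot not to compare the kernels' outputs, it may be worth confirming separately that the two approaches agree. The sketch below re-declares the functions from the script so it runs on its own; the `elementwise` helper is an assumption added only so the code works both before and after the pandas 2.1 rename of `applymap` to `map`.

```python
from itertools import chain

import numpy as np
import pandas as pd


def elementwise(df, func):
    # DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
    if hasattr(pd.DataFrame, "map"):
        return df.map(func)
    return df.applymap(func)


def gen_data(n):
    floats = pd.DataFrame(np.random.random(size=(n, 4)),
                          columns=["AO", "DR", "DP", "PD"])
    return elementwise(floats, lambda x: np.array([x]))


def chain_comprehension(df):
    return pd.DataFrame([list(chain(*row)) for row in df.values],
                        index=df.index, columns=df.columns)


def apply_map(df):
    return elementwise(df, lambda x: x[0])


np.random.seed(5)
for n in (1, 10, 1000):
    df = gen_data(n)
    # Raises AssertionError if values, dtypes, index or columns differ.
    pd.testing.assert_frame_equal(chain_comprehension(df), apply_map(df))
```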