
Pandas Dataframe to Numpy Vstack Array by Unique Column Value


I have a DataFrame with the following structure:

import numpy as np
import pandas as pd

data = {'Group':['1', '1', '2', '2', '3', '3'], 'Value':[1, 2, 3, 4, 5, 6]} 
df = pd.DataFrame(data) 

I need to convert that DataFrame (which has about 4000 values per unique group and 1000 groups) to a NumPy array like the following, with the order preserved:

array([[1, 2], [3, 4], [5, 6]])

Additionally, 99% of the groups have the same number of values, but some have different counts. If the shorter groups could be padded up to the maximum count, I would not lose any data.

At the moment I iterate through the unique 'Group' values and numpy.vstack them together, which is slow and far from elegant.
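For comparison, a loop like the one described probably looks something like the sketch below. This is a guess at the pattern, not the asker's actual code. Each pass scans the whole frame with a boolean mask and `np.vstack` copies the growing result, so the work grows roughly quadratically with the number of groups. `np.vstack` also raises a `ValueError` as soon as two groups have different lengths, which is why this approach cannot pad.

```python
import numpy as np
import pandas as pd

data = {'Group': ['1', '1', '2', '2', '3', '3'], 'Value': [1, 2, 3, 4, 5, 6]}
df = pd.DataFrame(data)

# Guessed reconstruction of the slow loop: one full-frame mask and one vstack per group.
result = None
for group in df['Group'].unique():  # unique() keeps the order of first appearance
    row = df.loc[df['Group'] == group, 'Value'].to_numpy()
    # Start with a 1 x n array, then re-copy everything on every vstack.
    result = row[np.newaxis, :] if result is None else np.vstack([result, row])

print(result)
```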


Solution

  • This is just a pivot:

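    (df.assign(col=df.groupby('Group').cumcount())  # position of each row within its group
      .pivot(index='Group', columns='col', values='Value')  # one row per group, one column per position
      .values  # underlying NumPy array
    )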
    

    Output:

    array([[1, 2],
           [3, 4],
           [5, 6]], dtype=int64)
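The same pivot also covers the padding request: any position a shorter group lacks becomes NaN, which turns the result into float64. Two details are worth checking on real data. First, `pivot` sorts its index, so string labels such as '10' and '2' come out in lexicographic order rather than in order of appearance; reindexing with `df['Group'].unique()` restores appearance order. Second, `fillna` can replace the NaN padding with a sentinel if an integer array is needed. The data below is made up for illustration, and the choice of 0 as the pad value is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up data: group '10' is one value short, and '10' sorts before '2' as a string.
df = pd.DataFrame({'Group': ['2', '2', '10', '3', '3'],
                   'Value': [1, 2, 3, 4, 5]})

wide = (df.assign(col=df.groupby('Group').cumcount())  # 0, 1, ... within each group
          .pivot(index='Group', columns='col', values='Value'))

# pivot returns a sorted index; put the rows back in order of first appearance.
wide = wide.reindex(df['Group'].unique())

padded_nan = wide.to_numpy()                         # missing positions are NaN (float64)
padded_zero = wide.fillna(0).astype(int).to_numpy()  # alternatively pad with a sentinel

print(padded_nan)
print(padded_zero)
```

Using NaN keeps padding distinguishable from real values; a sentinel like 0 is only safe if it can never occur in the data.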