
Reshaping DataFrame with pandas


I'm working with pandas in Python. I collect data indexed by timestamps, and it arrives in several different ways.

This means a given index may have only 2 features available (with NaN for the others, which is expected) or all of them; it depends.

My problem comes up when I add data that has multiple values for the same index. See the example below.

Imagine this is the set we're adding new data to:

Index col1 col2
    1   a    A
    2   b    B
    3   c    C

This is the data we will add:

Index new col 
    1      z    
    1      y    

Then the result is this :

Index col1 col2 new col
    1   a    A    NaN
    1   NaN  NaN  z
    1   NaN  NaN  y
    2   b    B    NaN
    3   c    C    NaN

Instead, I would like the result to be:

Index col1 col2 new col1 new col2
    1   a    A    z        y
    2   b    B    NaN      NaN
    3   c    C    NaN      NaN

Instead of several rows for the same index within one feature, I want one row per index with several features.

I don't know if this is understandable. Another way to say it: I want the number of values per timestamp to equal the number of features, not the number of rows for that index.


Solution

  • This solution assumes the data you need to add is a pandas Series.

    Original df:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame(np.random.randint(0,3,size=(3,3)),columns = list('ABC'),index = [1,2,3])
    

    Data to add (series):

    s = pd.Series(['x','y'],index = [1,1])
    

    Solution:

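    # Number each repeat of an index label (1, 2, ...), move that counter
    # into the index, then unstack it so every repeat gets its own column.
    df.join(s.to_frame()
            .assign(cc = lambda x: x.groupby(level=0)
                    .cumcount().add(1))
            .set_index('cc',append=True)[0]
            .unstack()
            .rename('New Col{}'.format,axis=1))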
    

    Output (the values in A, B and C are random, so yours will differ):

       A  B  C New Col1 New Col2
    1  1  2  2        x        y
    2  0  1  2      NaN      NaN
    3  2  2  0      NaN      NaN
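
For reference, here is a minimal self-contained sketch of the same idea applied to the data from the question. It is not the code from the answer above: it swaps the `set_index`/`unstack` chain for `DataFrame.pivot`, which should give the same wide layout. The helper name `widen_duplicates`, its `prefix` parameter, and the temporary column names `idx`, `pos` and `val` are made up for illustration, and the new data is assumed to be a Series whose index repeats.

```python
import pandas as pd


def widen_duplicates(s, prefix="new col"):
    """Spread values that share an index label across numbered columns.

    Hypothetical helper: assumes `s` is a Series with a single-level index.
    """
    tmp = pd.DataFrame({
        "idx": s.index.to_numpy(),
        # 1 for the first value of a label, 2 for the second, and so on
        "pos": s.groupby(level=0).cumcount().add(1).to_numpy(),
        "val": s.to_numpy(),
    })
    # One row per index label, one column per repeat position
    wide = tmp.pivot(index="idx", columns="pos", values="val")
    wide.columns = [f"{prefix}{n}" for n in wide.columns]
    wide.index.name = s.index.name
    return wide


# Data from the question
df = pd.DataFrame({"col1": list("abc"), "col2": list("ABC")}, index=[1, 2, 3])
new = pd.Series(["z", "y"], index=[1, 1])

# Left join on the index keeps every row of df; labels that are
# missing from `new` get NaN in the added columns.
result = df.join(widen_duplicates(new))
print(result)
```

With the question's data, row 1 should come out as a, A, z, y, and rows 2 and 3 should hold NaN in both new columns. If some index labels repeat more often than others, the shorter ones are simply padded with NaN on the right.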