Search code examples
python-3.xpandasdataframepivotpandas-groupby

Transform a dataframe using pivot


I am trying to transform a dataframe using pivot. Since the column contains duplicate entries, i tried to add a count column following what's suggested here (Question 10 posted in this answer).

import pandas as pd
from pprint import pprint


if __name__ == '__main__':
    d = {
        't': [0, 1, 2, 0, 1, 2, 0, 2, 0, 1],
        'input': [2, 2, 2, 2, 2, 2, 4, 4, 4, 4],
        'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
        'value': [0.1, 0.2, 0.3, 1, 2, 3, 1, 2, 1, 1],
    }
    df = pd.DataFrame(d)
    df = df.drop('t', axis=1)
    df.insert(0, 'count', df.groupby('input').cumcount())
    pd.pivot(df, index='count', columns='type', values='value')

But I still get the same error raise ValueError("Index contains duplicate entries, cannot reshape") ValueError: Index contains duplicate entries, cannot reshape.

Could someone please suggest how to resolve this error?


Solution

  • As far as you have more then one value associated with 'A' and 'B' you have to aggregate values somehow.

    So if I've understood your issue right possible solution is the following:

    #pip install pandas
    
    import pandas as pd
    
    d = {
            't': [0, 1, 2, 0, 1, 2, 0, 2, 0, 1],
            'input': [2, 2, 2, 2, 2, 2, 4, 4, 4, 4],
            'type': ['A', 'A', 'A', 'B', 'B', 'B', 'A', 'A', 'B', 'B'],
            'value': [0.1, 0.2, 0.3, 1, 2, 3, 1, 2, 1, 1],
        }
    
    df = pd.DataFrame(d)
    df
    

    enter image description here

    # I've used aggfunc='sum' argument for example, the default value is 'mean'
    pd.pivot_table(df, index='t', columns='type', values='value', aggfunc='sum')
    

    Returns

    enter image description here