Search code examples
pythonpandaspivotmelt

Converting rows to wide columns based on duplicated ids in another column in pandas


My question is similar to this, this, and this questions.

But still cannot resolve it.

I have a dataframe with duplicated ids

ID  Publication_type
1   Journal          
1   Clinical study   
1   Guideline        
2   Journal          
2   Letter           

I want to make it wide, but I do not know how many publication type will I have - maybe 2, maybe 20. Thus, I do not know how many columns wide will I need. The max size of wide columns for publication_type must be not be more than the number of types for each id.

Expected output

 ID Publication_type1 Publication_type2 Publication_type 3    etc
 1  Journal           Clinical Study    Guideline
 2  Journal           Letter            NaN

For now I do not need to put the same publication type into the same column. I do not need all articles in the same column. Thanks!


Solution

  • You can group by ID, aggregate via list, and then create a new DataFrame from the results:

    col = 'Publication_type'
    new_df = pd.DataFrame(df.groupby('ID')[col].agg(lambda x: x.tolist()).tolist()).replace({None: np.nan})
    new_df.columns = [f'{col}{i}' for i in new_df.columns + 1]
    new_df['ID'] = df['ID'].drop_duplicates().reset_index(drop=True)
    

    Output:

    >>> df
      Publication_type1 Publication_type2 Publication_type3  ID
    0           Journal    Clinical-study         Guideline   1
    1           Journal            Letter               NaN   2