Tags: python, pandas, pandas-groupby

Pandas sorted groupby returns a Series whose data cannot be accessed


Sample Data:

    user_id     content_id      date
0   user_44289  cont_3375_16_10 2020-03-06
1   user_44289  cont_1195_1_8   2019-04-18
2   user_44289  cont_3470_2_15  2021-09-18
3   user_44289  cont_310_25_9   2020-09-08
4   user_44289  cont_4350_1_3   2021-06-25
5   user_40584  cont_1399_27_6  2018-11-14
6   user_40584  cont_1808_2_4   2021-05-07
7   user_40584  cont_2615_7_24  2021-10-14

Using the pandas query below, I group and sort the data. It returns all_users_list, which is of type pandas.core.series.Series:

all_users_list = final_data.sort_values(by=['user_id','date','content_id'], ascending=False).groupby(['user_id','date','content_id'], sort=False)['user_id','content_id','date'].apply(list)

Output:

user_id     date        content_id    
user_99974  2021-10-09  cont_4104_7_52    [user_id, content_id, date]
            2021-10-04  cont_2253_6_4     [user_id, content_id, date]
            2021-08-30  cont_2311_4_4     [user_id, content_id, date]
            2021-07-22  cont_676_5_31     [user_id, content_id, date]
            2021-05-28  cont_2456_6_1     [user_id, content_id, date]
                                                                ...             
user_10013  2018-12-04  cont_2597_6_8     [user_id, content_id, date]
            2018-09-11  cont_2233_3_8     [user_id, content_id, date]
            2018-08-13  cont_300_1_1      [user_id, content_id, date]
            2018-04-10  cont_2244_16_1    [user_id, content_id, date]
            2018-02-03  cont_3189_6_12    [user_id, content_id, date]
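A likely explanation, sketched on a few rows copied from the sample data (dates kept as ISO strings, which sort chronologically): when `apply(list)` receives each group as a DataFrame, `list()` iterates over that DataFrame, and iterating over a DataFrame yields its column labels rather than its values. Selecting a single column instead makes each group a Series, and `list()` of a Series returns the values. This sketch groups by `user_id` only, on the assumption that the goal is one list of content per user.

```python
import pandas as pd

# A few rows copied from the question's sample data.
final_data = pd.DataFrame({
    "user_id": ["user_44289", "user_44289", "user_40584", "user_40584"],
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_1399_27_6", "cont_1808_2_4"],
    "date": ["2020-03-06", "2019-04-18", "2018-11-14", "2021-05-07"],
})

# Iterating over a DataFrame yields its column labels, which is what
# list() returns when each group handed to apply() is a DataFrame.
print(list(final_data))  # ['user_id', 'content_id', 'date']

# Selecting one column makes each group a Series; list() of a Series
# yields its values. sort=False keeps the order produced by sort_values.
content_per_user = (
    final_data.sort_values(["user_id", "date"], ascending=False)
    .groupby("user_id", sort=False)["content_id"]
    .apply(list)
)
print(content_per_user["user_40584"])  # ['cont_1808_2_4', 'cont_1399_27_6']
```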

But I need to access the data of the three columns user_id, content_id and date from all_users_list:

result = all_users_list.values.tolist()
result[0:10]

It always returns the data below, but I need the actual values displayed above, grouped by "user_id", "date" and "content_id":

[['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date'],
 ['user_id', 'content_id', 'date']]
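One way to get rows shaped like the expected output, sketched on three sample rows (dates assumed to be ISO-format strings): call `.to_numpy().tolist()` on the sorted DataFrame itself rather than on a grouped Series. If the rows are also needed per user, iterating over a `groupby` yields `(key, group)` pairs that can be collected into a dict; the name `rows_per_user` is only for illustration.

```python
import pandas as pd

# Three rows copied from the question's sample data.
final_data = pd.DataFrame({
    "user_id": ["user_44289", "user_44289", "user_40584"],
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_1399_27_6"],
    "date": ["2020-03-06", "2019-04-18", "2018-11-14"],
})

sorted_data = final_data.sort_values(["user_id", "date"], ascending=False)

# Each row as [user_id, content_id, date], in sorted order.
result = sorted_data[["user_id", "content_id", "date"]].to_numpy().tolist()
print(result[0])  # ['user_44289', 'cont_3375_16_10', '2020-03-06']

# The same rows grouped per user: {user_id: [[content_id, date], ...]}.
rows_per_user = {
    user: group[["content_id", "date"]].to_numpy().tolist()
    for user, group in sorted_data.groupby("user_id", sort=False)
}
print(rows_per_user["user_40584"])  # [['cont_1399_27_6', '2018-11-14']]
```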

Please help with this. Thanks

Update:

def getContent(user):
  indices = np.where(result == 'user_10013')
  return result[indices][1] ## this should return the list of content_id for the retrieved user_id 'user_10013'

But printing result always displays ['user_id', 'content_id', 'date'].
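A working version of the lookup, as a hedged sketch: comparing the Python list `result` to a string with `==` gives a single `False` rather than an element-wise mask, so `np.where` has nothing to find. Filtering the DataFrame with a boolean mask on `user_id` avoids that. The function name `get_content` and its `(df, user)` signature are hypothetical, and the frame is rebuilt from the eight sample rows.

```python
import pandas as pd

# The eight sample rows from the question.
final_data = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15",
                   "cont_310_25_9", "cont_4350_1_3", "cont_1399_27_6",
                   "cont_1808_2_4", "cont_2615_7_24"],
    "date": ["2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
             "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14"],
})
# Parse dates so sorting is chronological regardless of string format.
final_data["date"] = pd.to_datetime(final_data["date"])


def get_content(df: pd.DataFrame, user: str) -> list:
    """Hypothetical helper: content_id values for one user, newest first."""
    user_rows = df[df["user_id"] == user]  # boolean mask, one row per match
    return user_rows.sort_values("date", ascending=False)["content_id"].tolist()


print(get_content(final_data, "user_40584"))
# ['cont_2615_7_24', 'cont_1808_2_4', 'cont_1399_27_6']
```

An unknown user simply yields an empty list, since the mask selects no rows.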


Solution

  • Do you want something like:

    out = df.sort_values('date', ascending=False).groupby('user_id').agg(list)
    print(out)
    
    # Output
                                                       content_id                                               date
    user_id                                                                                                         
    user_40584    [cont_2615_7_24, cont_1808_2_4, cont_1399_27_6]               [2021-10-14, 2021-05-07, 2018-11-14]
    user_44289  [cont_3470_2_15, cont_4350_1_3, cont_310_25_9,...  [2021-09-18, 2021-06-25, 2020-09-08, 2020-03-0...
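To read values back out of `out`, index it by user and column: each cell holds an ordinary Python list, and `to_dict("index")` turns the whole frame into nested dicts keyed by `user_id`. A sketch that rebuilds `out` from the eight sample rows (dates kept as ISO strings, which sort chronologically):

```python
import pandas as pd

# The eight sample rows from the question.
df = pd.DataFrame({
    "user_id": ["user_44289"] * 5 + ["user_40584"] * 3,
    "content_id": ["cont_3375_16_10", "cont_1195_1_8", "cont_3470_2_15",
                   "cont_310_25_9", "cont_4350_1_3", "cont_1399_27_6",
                   "cont_1808_2_4", "cont_2615_7_24"],
    "date": ["2020-03-06", "2019-04-18", "2021-09-18", "2020-09-08",
             "2021-06-25", "2018-11-14", "2021-05-07", "2021-10-14"],
})

out = df.sort_values("date", ascending=False).groupby("user_id").agg(list)

# Each cell is a plain list, addressed by user_id (row) and column name.
contents = out.loc["user_40584", "content_id"]
dates = out.loc["user_40584", "date"]
print(contents)  # ['cont_2615_7_24', 'cont_1808_2_4', 'cont_1399_27_6']

# Or convert everything to {user_id: {"content_id": [...], "date": [...]}}.
by_user = out.to_dict("index")
print(by_user["user_44289"]["date"][0])  # 2021-09-18
```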