I have an SFrame on which I want to do a groupby with some operator on a column. However, this returns an SFrame containing only the key columns I specified. How can I run the operation on some columns but still keep all the other columns?
As I understand your question, you want to run operations on a column without losing the original columns. The example below may illustrate. Suppose we have a movie dataset as an SFrame sf:
movieId userId actors rating
102 10 A,B,C 5
204 8 B,C,D 4
333 3 K,L,M 3
204 11 P,Q,R 1
423 3 K,B,C 4
533 31 K,A,C 2
633 3 P,L,A 3
...
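In the above SFrame, user 3 gave multiple ratings, so you might compute each user's mean rating as
# agg is the aggregate module of your SFrame package, e.g. import turicreate.aggregate as agg
rating_stats = sf.groupby(key_columns='userId', operations={'mean_rating': agg.MEAN('rating')})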
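Then you may want to add the resulting column to the original SFrame without affecting the columns already present. Because rating_stats has one row per user while sf has one row per rating, join it back on the key column rather than assigning the column directly:
sf = sf.join(rating_stats, on='userId', how='left')
The existing columns of sf are unchanged, and a new mean_rating column is added.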
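To make the row counts concrete, here is a minimal plain-Python sketch of the same "aggregate per user, then attach the result to every original row" idea. It deliberately does not use the SFrame API; the helper names mean_rating_by_user and attach_mean_rating are made up for illustration, and the rows are the seven sample rows from the table above.

```python
from collections import defaultdict

# Sample rows mirroring the movie table above.
rows = [
    {"movieId": 102, "userId": 10, "actors": "A,B,C", "rating": 5},
    {"movieId": 204, "userId": 8, "actors": "B,C,D", "rating": 4},
    {"movieId": 333, "userId": 3, "actors": "K,L,M", "rating": 3},
    {"movieId": 204, "userId": 11, "actors": "P,Q,R", "rating": 1},
    {"movieId": 423, "userId": 3, "actors": "K,B,C", "rating": 4},
    {"movieId": 533, "userId": 31, "actors": "K,A,C", "rating": 2},
    {"movieId": 633, "userId": 3, "actors": "P,L,A", "rating": 3},
]


def mean_rating_by_user(rows):
    """Group ratings by userId and return {userId: mean rating} (one entry per user)."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[row["userId"]].append(row["rating"])
    return {user: sum(vals) / len(vals) for user, vals in ratings.items()}


def attach_mean_rating(rows):
    """Return new rows with a mean_rating field looked up by userId (a left join on the key)."""
    means = mean_rating_by_user(rows)
    # dict(row, ...) copies each row, so the original rows are left untouched.
    return [dict(row, mean_rating=means[row["userId"]]) for row in rows]


enriched = attach_mean_rating(rows)
```

The grouped result has one entry per user (five here), while enriched still has all seven rows and all original columns. That mismatch in length is why the aggregated values are looked up by key instead of being copied across position by position.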
So, to answer your question: when you use groupby(), it's better to keep a separate SFrame dedicated to that operation. You can then use it on its own, add its results to the original SFrame, include the remaining columns in the groupby() result, or use join on the resulting SFrame. Repeatedly modifying the original SFrame just to run operations on it is not good practice.
Also note that for a column holding multiple entities, like actors in this SFrame, using the stack method before groupby() can make the data easier to work with. I hope that helps.
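As a rough illustration of why stacking helps, the plain-Python sketch below turns each comma-separated actors value into one row per actor and then averages ratings per actor. It is not the SFrame API: stack_actors and mean_rating_by_actor are hypothetical helpers, and it uses only the first three sample rows. As far as I know, SFrame's stack works on list, array, or dict columns, so a comma-separated string column would need to be split into a list first.

```python
from collections import defaultdict

# First three rows of the movie table above.
rows = [
    {"movieId": 102, "userId": 10, "actors": "A,B,C", "rating": 5},
    {"movieId": 204, "userId": 8, "actors": "B,C,D", "rating": 4},
    {"movieId": 333, "userId": 3, "actors": "K,L,M", "rating": 3},
]


def stack_actors(rows):
    """Emit one row per (original row, actor) pair, replacing 'actors' with 'actor'."""
    stacked = []
    for row in rows:
        for actor in row["actors"].split(","):
            new_row = {k: v for k, v in row.items() if k != "actors"}
            new_row["actor"] = actor.strip()
            stacked.append(new_row)
    return stacked


def mean_rating_by_actor(stacked_rows):
    """Group the stacked rows by actor and return {actor: mean rating}."""
    ratings = defaultdict(list)
    for row in stacked_rows:
        ratings[row["actor"]].append(row["rating"])
    return {actor: sum(vals) / len(vals) for actor, vals in ratings.items()}


stacked = stack_actors(rows)
actor_means = mean_rating_by_actor(stacked)
```

Once every actor sits in its own row, a grouping by actor is an ordinary single-key aggregation, which is the situation groupby() is designed for.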