python dataframe group-by

Python explain groupby


I'm working on a data mining analysis. In it, the groupby function is used like this:

df.groupby('tshirts')['id'].count()

What does ['id'] really do? I understand that the function is grouping by tshirts, but I don't understand what the brackets do.

Can you explain it to me, please? And if you can give me an example, I would appreciate it.

Best regards.

P.S.: df is a DataFrame.


Solution

  • The square brackets after groupby() select the column(s) that the following function (in your case count()) is applied to. In your example, the code groups the rows by tshirts and then counts the non-null values in the 'id' column within each group, so you get one count per distinct tshirts value.

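    If your code were df.groupby(['tshirts'])[['id', 'size']].count(), it would group by tshirts and then apply count() to both the id and size columns. Note the double brackets: to select several columns, pass them as a list (recent pandas versions reject the bare 'id', 'size' form).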

    Generally, the basic template is df.groupby([columns to group by])[[columns to apply the function to]].function()

    If you want a different function for each column, pass a dictionary to agg(): df.groupby([...]).agg({'col1': 'count', 'col2': 'sum'})
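
Since the question asks for an example, here is a minimal sketch of the three patterns described above. The data is made up for illustration, and the price column is an assumption that does not appear in the original question; one id is deliberately missing to show that count() skips null values.

```python
import pandas as pd

# Hypothetical data; None marks a missing id
df = pd.DataFrame({
    "tshirts": ["red", "red", "blue", "blue", "blue"],
    "id": [1, 2, 3, None, 5],
    "size": ["S", "M", "M", "L", "L"],
    "price": [10.0, 12.0, 9.0, 11.0, 8.0],  # assumed column, for the agg example
})

# One column selected: the result is a Series indexed by tshirts.
# count() ignores the missing id, so blue -> 2 and red -> 2.
id_counts = df.groupby("tshirts")["id"].count()

# Several columns selected with a list: the result is a DataFrame
# with one count column per selected column.
multi_counts = df.groupby("tshirts")[["id", "size"]].count()

# A different function per column, passed to agg() as a dictionary.
summary = df.groupby("tshirts").agg({"id": "count", "price": "sum"})

print(id_counts)
print(multi_counts)
print(summary)
```

If you want the number of rows per group regardless of missing values, size() is the usual alternative to count().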