DataFrame groupby function returning tuple from column instead of the value


Here is my pandas DataFrame:

    id_country  txt_template_1      txt_template_2      id_set  id_question  txt_question
0   NEUTRAL     template neutral 1  template neutral 2  1       1            1_1
1   NEUTRAL     template neutral 1  template neutral 2  1       2            1_2
2   NEUTRAL     template neutral 1  template neutral 2  1       3            1_3
3   NEUTRAL     template neutral 1  template neutral 2  1       4            1_4
4   NEUTRAL     template neutral 1  template neutral 2  2       1            2_1
5   NEUTRAL     template neutral 1  template neutral 2  2       2            2_2
6   NEUTRAL     template neutral 1  template neutral 2  2       3            2_3
7   NEUTRAL     template neutral 1  template neutral 2  2       4            2_4
8   FRA         template FRA 1      template FRA 2      1       1            1_1
9   FRA         template FRA 1      template FRA 2      1       2            1_2
10  FRA         template FRA 1      template FRA 2      1       3            1_3
11  FRA         template FRA 1      template FRA 2      1       4            1_4
12  FRA         template FRA 1      template FRA 2      2       1            2_1
13  FRA         template FRA 1      template FRA 2      2       2            2_2
14  FRA         template FRA 1      template FRA 2      2       3            2_3
15  FRA         template FRA 1      template FRA 2      2       4            2_4

Here is my function so far:

def ask_question(df):
  grouped_country = df.groupby(['id_country'])

  # loop through each group of country
  for country_id, group_country_df in grouped_country:
    grouped_id_set = group_country_df.groupby(['id_set'])

    # loop through each group of id_set
    for set_id, group_set_df in grouped_id_set:
      print(set_id)

The output of print(set_id) is:

(1,)
(2,)
(1,)
(2,)
(1,)
(2,)

It seems that group_country_df.groupby(['id_set']) is turning each id_set value into a tuple, but as I understand it, it shouldn't.

What am I getting wrong? And how can I make sure that set_id is indeed the value of id_set and not a tuple?


Solution

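  • You are grouping by a list (group_country_df.groupby(['id_set'])). When the key is a list, pandas returns each group key as a tuple while you iterate, even if the list contains a single column (this is the behavior from pandas 2.0 onward; 1.5 issued a FutureWarning about it). That is why your for loop receives (1,) instead of 1.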

    Only use the column name:

    # ...
    grouped_id_set = group_country_df.groupby('id_set')
    # ...
    

    Example output:

    1
    2
    1
    2
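
To see the difference side by side, here is a minimal sketch that runs the same nested loop with both key forms. The DataFrame is a cut-down stand-in for the one in the question, and collect_set_ids is a hypothetical helper written for this illustration only. It assumes pandas 2.0 or later; on older versions the list form may still yield plain values.

```python
import pandas as pd

# Cut-down stand-in for the question's DataFrame (assumed data, fewer rows/columns).
df = pd.DataFrame({
    "id_country": ["NEUTRAL"] * 4 + ["FRA"] * 4,
    "id_set": [1, 1, 2, 2, 1, 1, 2, 2],
    "id_question": [1, 2, 1, 2, 1, 2, 1, 2],
})


def collect_set_ids(frame, key):
    """Hypothetical helper: group by country, then by `key`, and return the inner group keys."""
    keys = []
    for _, country_df in frame.groupby("id_country"):
        for set_id, _ in country_df.groupby(key):
            keys.append(set_id)
    return keys


# Column name passed as a string: each key is the plain id_set value.
scalar_keys = collect_set_ids(df, "id_set")

# Column name wrapped in a list: on pandas >= 2.0 each key is a 1-tuple.
list_keys = collect_set_ids(df, ["id_set"])

print(scalar_keys)
print(list_keys)
```

If the list form has to stay (for example because the key list is built dynamically), the key can be unpacked directly in the loop header on pandas 2.0 or later: for (set_id,), group_set_df in grouped_id_set.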