Here is my pandas DataFrame:
    id_country      txt_template_1      txt_template_2  id_set  id_question  txt_question
 0     NEUTRAL  template neutral 1  template neutral 2       1            1           1_1
 1     NEUTRAL  template neutral 1  template neutral 2       1            2           1_2
 2     NEUTRAL  template neutral 1  template neutral 2       1            3           1_3
 3     NEUTRAL  template neutral 1  template neutral 2       1            4           1_4
 4     NEUTRAL  template neutral 1  template neutral 2       2            1           2_1
 5     NEUTRAL  template neutral 1  template neutral 2       2            2           2_2
 6     NEUTRAL  template neutral 1  template neutral 2       2            3           2_3
 7     NEUTRAL  template neutral 1  template neutral 2       2            4           2_4
 8         FRA      template FRA 1      template FRA 2       1            1           1_1
 9         FRA      template FRA 1      template FRA 2       1            2           1_2
10         FRA      template FRA 1      template FRA 2       1            3           1_3
11         FRA      template FRA 1      template FRA 2       1            4           1_4
12         FRA      template FRA 1      template FRA 2       2            1           2_1
13         FRA      template FRA 1      template FRA 2       2            2           2_2
14         FRA      template FRA 1      template FRA 2       2            3           2_3
15         FRA      template FRA 1      template FRA 2       2            4           2_4
Here is my function so far:
def ask_question(df):
    grouped_country = df.groupby(['id_country'])
    # loop through each group of country
    for country_id, group_country_df in grouped_country:
        grouped_id_set = group_country_df.groupby(['id_set'])
        # loop through each group of id_set
        for set_id, group_set_df in grouped_id_set:
            print(set_id)
The output of print(set_id) gives me the following:
(1,)
(2,)
(1,)
(2,)
(1,)
(2,)
[]
It seems like group_country_df.groupby(['id_set']) is creating a tuple from the id_set values of the DataFrame, but from my understanding it shouldn't.
What am I getting wrong? And how can I make sure that set_id is indeed the value of id_set and not a tuple?
You are grouping by a list (group_country_df.groupby(['id_set'])), so pandas treats the key as multi-level, even though there is only a single level, and yields each group key as a one-element tuple in your for loop.
Only use the column name:
# ...
grouped_id_set = group_country_df.groupby('id_set')
# ...
Example output:
1
2
1
2
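To see the difference side by side, here is a minimal sketch. The small df below is a made-up stand-in for your data (only id_country and id_set are kept), and the tuple keys for the list form assume pandas 2.0 or later; older versions may still return scalars there.

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for the DataFrame in the question.
df = pd.DataFrame({
    "id_country": ["NEUTRAL", "NEUTRAL", "FRA", "FRA"],
    "id_set": [1, 2, 1, 2],
})

# Grouping by a column name (a plain string) yields scalar group keys.
scalar_keys = [key for key, _ in df.groupby("id_set")]

# Grouping by a one-element list yields one-element tuples
# (assumption: pandas 2.0 or later).
list_keys = [key for key, _ in df.groupby(["id_set"])]

# If the list form has to stay, unwrap the key defensively so the
# code works whether the key arrives as a tuple or as a scalar.
unwrapped = [key[0] if isinstance(key, tuple) else key for key in list_keys]

print(scalar_keys)
print(list_keys)
print(unwrapped)
```

If you would rather keep the list form, unpacking in the loop header should also work on pandas 2.0 or later: for (set_id,), group_set_df in grouped_id_set.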