I'm trying to create a function that pulls the first row where the Expert_Grading_Score column equals 1, based on conditions on other columns (subject and treatment). I created two lists with all unique values from the `subj` and `Treatment` columns:
```python
subj_list = df_sorted['subj'].unique().tolist()
print(subj_list)
# Output: [1001, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022]

Treatment_list = df_sorted['Treatment'].unique().tolist()
print(Treatment_list)
# Output: ['MPPDDu', 'Product A', 'Product C', 'Product D', 'Product E', 'Product F', 'Product G', 'Product H', 'Std S2', 'Product B', 'Product I', 'Product J']
```
Then I wrote a nested for loop that goes through each treatment for every subject and uses iloc to select the first matching row:
```python
score = 1
for idx, subject in enumerate(subj_list):
    for idx2, treatment in enumerate(Treatment_list):
        selected_subsite = df_sorted[(df_sorted.subj == subject) & (df_sorted.Treatment == treatment) & (df_sorted.Expert_Grading_Score == score)].iloc[0]
        print(selected_subsite)
```
At this point I get the output I want, but I get an `IndexError` when I reach Product B for subject 1001. After checking the master datasheet, I realized that some subjects are missing certain treatments, in this case Product B for subject 1001. I tried adding a `continue` statement with conditionals to skip the missing data:
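A minimal reproduction of the error, using hypothetical toy data in place of `df_sorted`: when no row matches all three conditions, the boolean filter returns an empty DataFrame, and calling `.iloc[0]` on an empty DataFrame raises `IndexError`.

```python
import pandas as pd

# Hypothetical toy data: subject 1001 has no "Product B" rows
df_sorted = pd.DataFrame({
    "subj": [1001, 1001, 1003],
    "Treatment": ["Product A", "Product A", "Product B"],
    "Expert_Grading_Score": [0, 1, 1],
})

mask = (
    (df_sorted.subj == 1001)
    & (df_sorted.Treatment == "Product B")
    & (df_sorted.Expert_Grading_Score == 1)
)
subset = df_sorted[mask]
print(len(subset))  # 0 rows match

caught = False
try:
    subset.iloc[0]  # positional index 0 does not exist in an empty frame
except IndexError as err:
    caught = True
    print("IndexError:", err)
```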
```python
score = 1
for idx, subject in enumerate(subj_list):
    for idx2, treatment in enumerate(Treatment_list):
        if treatment != '':
            selected_subsite = df_sorted[(df_sorted.subj == subject) & (df_sorted.Treatment == treatment) & (df_sorted.Expert_Grading_Score == score)].iloc[0]
            print(selected_subsite)
        if treatment == '':
            continue
```
But I got the same error. How can I adjust this code so that it skips missing data points and keeps running for the rest? Any advice would be much appreciated!
[Image: example of the input dataframe (left) and output dataframe (right)]
Replace your `if` checks with a `try`/`except` block inside the inner loop, as shown below:
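```python
score = 1
for subject in subj_list:
    for treatment in Treatment_list:
        try:
            selected_subsite = df_sorted[(df_sorted.subj == subject) & (df_sorted.Treatment == treatment) & (df_sorted.Expert_Grading_Score == score)].iloc[0]
            print(selected_subsite)
        except IndexError:
            # No row matches this subject/treatment/score combination; skip it
            continue
```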
Your code iterates over the unique values of the Treatment column, so `treatment` is never an empty string: the first `if` is always true, the `continue` branch never runs, and `.iloc[0]` is still called on selections that have no rows.
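If you prefer an explicit check over catching the exception, here is a sketch of the same loop that tests the filtered result with `DataFrame.empty` before calling `.iloc[0]`. The toy `df_sorted` is hypothetical, and collecting the rows into a list (`selected`) and then a DataFrame (`result`) is an assumption about what you want to do with them.

```python
import pandas as pd

# Hypothetical toy data standing in for df_sorted
df_sorted = pd.DataFrame({
    "subj": [1001, 1001, 1001, 1001, 1003],
    "Treatment": ["Product A", "Product A", "Product C", "Product A", "Product B"],
    "Expert_Grading_Score": [0, 1, 1, 1, 1],
})
subj_list = df_sorted["subj"].unique().tolist()
Treatment_list = df_sorted["Treatment"].unique().tolist()

score = 1
selected = []
for subject in subj_list:
    for treatment in Treatment_list:
        subset = df_sorted[
            (df_sorted.subj == subject)
            & (df_sorted.Treatment == treatment)
            & (df_sorted.Expert_Grading_Score == score)
        ]
        # Check the filtered rows, not the treatment name, before indexing
        if subset.empty:
            continue
        selected.append(subset.iloc[0])  # first matching row as a Series

# Each Series keeps its original row label, which becomes the index here
result = pd.DataFrame(selected)
print(result)
```

Checking `subset.empty` makes the "no data" case visible in the code, while `try`/`except` keeps the happy path shorter; both skip the same combinations.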
With the `try`/`except` version, when no rows match the current `treatment` and `subject` variables, the `IndexError` exception is caught and the loop continues with the next combination.
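If you don't need the loops at all, a vectorized alternative (a different technique from the `try`/`except` loop) is to filter on the score once and keep the first row of each `subj`/`Treatment` group with `groupby(...).head(1)`. Combinations with no matching rows simply don't appear in the result, so there is no exception to handle. The data below is a hypothetical stand-in for `df_sorted`.

```python
import pandas as pd

# Hypothetical toy data standing in for df_sorted
df_sorted = pd.DataFrame({
    "subj": [1001, 1001, 1001, 1001, 1003],
    "Treatment": ["Product A", "Product A", "Product C", "Product A", "Product B"],
    "Expert_Grading_Score": [0, 1, 1, 1, 1],
})

score = 1
# Keep only rows with the target score, then the first row of each (subj, Treatment) group
first_hits = (
    df_sorted[df_sorted["Expert_Grading_Score"] == score]
    .groupby(["subj", "Treatment"])
    .head(1)
)
print(first_hits)
```

`head(1)` keeps the original row order and index labels; calling `drop_duplicates(subset=["subj", "Treatment"])` on the same filtered frame should return the same rows.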