python pandas dataframe numpy data-cleaning

How to clean survey data in pandas

Input:

Output:

here's the data:

d = {'Morning': ["Didn't answer", "Didn't answer", "Didn't answer", 'Morning', "Didn't answer"], 'Afternoon': ["Didn't answer", 'Afternoon', "Didn't answer", 'Afternoon', "Didn't answer"], 'Night': ["Didn't answer", 'Night', "Didn't answer", 'Night', 'Night'], 'Sporadic': ["Didn't answer", "Didn't answer", 'Sporadic', "Didn't answer", "Didn't answer"], 'Constant': ["Didn't answer", "Didn't answer", "Didn't answer", 'Constant', "Didn't answer"]}

         Morning      Afternoon          Night       Sporadic       Constant
0  Didn't answer  Didn't answer  Didn't answer  Didn't answer  Didn't answer
1  Didn't answer      Afternoon          Night  Didn't answer  Didn't answer
2  Didn't answer  Didn't answer  Didn't answer       Sporadic  Didn't answer
3        Morning      Afternoon          Night  Didn't answer       Constant
4  Didn't answer  Didn't answer          Night  Didn't answer  Didn't answer

I want the output to be:

d = {"Time of the day": ["Didn't answer", "['Afternoon', 'Night']", "Sporadic", "['Morning', 'Afternoon', 'Night', 'Constant']", "Night"]}

                                 Time of the day
0                                  Didn't answer
1                         ['Afternoon', 'Night']
2                                       Sporadic
3  ['Morning', 'Afternoon', 'Night', 'Constant']
4                                          Night

so if there's no answer in every column in a row, the value in the new data frame would be "Didn't answer" and if there's at least one answer like "night", the value in the new data frame would be "night" and if there are multiple answers like " Morning", "Night" the value in the new data frame would be a list of the answers

Solution

You can use:

df["ToD"] = (df.replace("Didn't answer", np.nan).stack().groupby(level=0)
               .apply(lambda x: [i for i in x] if len(x) > 1 else x.iloc[0])
               .reindex(df.index, fill_value="Didn't answer"))

Output:

>>> df["ToD"]
0                            Didn't answer
1                       [Afternoon, Night]
2                                 Sporadic
3    [Morning, Afternoon, Night, Constant]
4                                    Night
Name: ToD, dtype: object