Search code examples
pythonpandasdataframeplotlydata-analysis

How to get rid of certain rows when calculating mean with pandas making a chart with plotly


So iam trying to make a bar chart with plotly and pandas, to plot the mean rating for different categories of a data frame I have. Each category has a rating of 1-5, and categories are the columns that have "rating" at the end of them. I can get it to work all fine. However, something I want to account for is that in my categories (different columns in the dataframe) they have a value of -1 if that category was not rated. I want to know when I calculate the mean and go to plot the chart, how can I ensure that the -1 one values are not being accounted during the mean calculations?

my code

# Plot to find mean rating for different categories

# Take columns we are interested in and stack them into column named 'category'
# This will allow us to group by category and calculate mean rating
dfm = pd.melt(df, id_vars=["id", "course", "date", "overall_rating", "job_prospects_desc", "course_lecturer_desc", "facilities_desc", "student_support_desc", "local_life_desc"],
              value_vars=["job_prospects_rating", "course_lecturer_rating", "facilities_rating", "student_support_rating", "local_life_rating"],
              var_name ='Category')

# Group by category and calculate mean rating
dfg = dfm.groupby(['Category']).mean().reset_index()

print(df)

fig2 = px.bar(dfg, x = 'Category', y = 'value', color = 'Category',
             category_orders = {'Category':['job_prospects_rating','course_lecturer_rating','facilities_rating','student_support_rating','local_life_rating']},
             color_discrete_map = {
                    'job_prospects_rating' : 'light blue',
                    'course_lecturer_rating' : 'blue',
                    'facilities_rating' : 'pink',
                    'student_support_rating' : 'purple',
                    'local_life_rating' : 'violet'},
             title="Mean Rating For Different Student Categories At The University of Lincoln")

fig2.update_yaxes(title = 'Mean rating (1-5)')
fig2.show()

Solution

  • df = pd.DataFrame(data= np.array([[0,0,0,1,1,1],[1,2,-1,4,5,6],[7,8,9,10,-1,12]]).T, columns = ['Category', 'A', 'B'])
    df1 = df.applymap(lambda x: x if x!= -1 else np.NaN)
    df1.groupby(['Category']).mean()
    

    The logic is extremely simple: substitute those '-1' with NaN and forget about them