Tags: python, pandas, plotly, line, average

How to plot average value lines and not every single value in Plotly


First of all, sorry if what I am writing here is not up to Stack Overflow standards; I am trying my best.

I have a dataframe with around 18k rows and 89 columns with information about football players.

For example, I need to plot a line graph to visualize the relationship between a player's age and overall rating.

But when I plot a line for that with:

import plotly.express as px

fig = px.line(df, x="Age", y="Overall")
fig.show()

This is the result:

[Image: Bad Result]

This is obviously not a good visualization.

I want to plot the average rating for every age, so it's a single line that shows the relationship between age and overall rating. Is there a built-in function for plotting this, or do I have to prepare the data myself?


Solution

  • It sounds like you want to groupby() on "age" and then take the mean of "overall" to build a summary dataframe before passing it to the plotting function.

    Roughly,

    import pandas as pd
    
    data = {
        "age": [1, 1, 2, 2, 3, 3],
        "overall": [50, 100, 1, 1, 600, 700],
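        # extra column, included to show how to select only the column to average
        "irrelevant": [1, 1, 1, 1, 1, 1]
    }
    
    df = pd.DataFrame(data)
    # group the rows by age, then average only the "overall" column;
    # the result is a Series indexed by age
    new_df = df.groupby('age')['overall'].mean()
    new_df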
    
    # age
    # 1     75.0
    # 2      1.0
    # 3    650.0
    # Name: overall, dtype: float64
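
To carry the averaged result back into the plot from the question, something like the following might work. It is only a sketch: the toy values are made up, the "Age" and "Overall" column names are taken from the question's px.line call, and the helper names average_by_age and plot_average_line are hypothetical.

```python
import pandas as pd

# Toy stand-in for the player data (values are made up for illustration);
# "Age" and "Overall" follow the column names used in the question.
players = pd.DataFrame({
    "Age": [20, 20, 21, 21, 22],
    "Overall": [60, 70, 65, 75, 80],
})


def average_by_age(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per age with the mean overall rating."""
    # as_index=False keeps "Age" as a regular column instead of the index,
    # so it can be passed to px.line as the x column.
    return df.groupby("Age", as_index=False)["Overall"].mean()


def plot_average_line(avg: pd.DataFrame):
    """Draw a single line through the per-age averages."""
    # Imported here so the aggregation step can run without Plotly installed.
    import plotly.express as px

    return px.line(avg, x="Age", y="Overall")


avg = average_by_age(players)
```

Calling plot_average_line(avg).show() should then draw one point per age, connected by a single line, instead of one point per player.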
    

    Alternatively, you can use a scatter plot if you're comfortable letting the individual points show the trend instead. Scatter plots are sometimes useful here because a line of averages can be based on very different sample sizes at each point on the x axis, so a line alone can hide information.
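
One possible middle ground, not part of the answer above, is to keep the per-age averages but carry the number of players behind each one, so the sample size stays visible. The sketch below assumes the same "Age" and "Overall" columns as the question; the toy values, the helper names, and the mean_overall and n_players column names are all hypothetical.

```python
import pandas as pd

# Toy stand-in for the player data (values are made up for illustration).
players = pd.DataFrame({
    "Age": [20, 20, 21, 21, 22],
    "Overall": [60, 70, 65, 75, 80],
})


def summarize_by_age(df: pd.DataFrame) -> pd.DataFrame:
    """Return the mean rating and the number of players for each age."""
    # Named aggregation: each keyword becomes an output column,
    # built from a (source column, aggregation function) pair.
    return df.groupby("Age", as_index=False).agg(
        mean_overall=("Overall", "mean"),
        n_players=("Overall", "count"),
    )


def plot_summary(summary: pd.DataFrame):
    """Scatter the averages, sizing each marker by its sample size."""
    # Imported here so the aggregation step can run without Plotly installed.
    import plotly.express as px

    return px.scatter(summary, x="Age", y="mean_overall", size="n_players")


summary = summarize_by_age(players)
```

With marker size tied to n_players, an average built from only a handful of players looks visibly smaller than one built from hundreds, which a plain line of means would not show.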