First of all, sorry if what I am writing here is not up to Stack Overflow standards; I am trying my best.
I have a DataFrame with around 18k rows and 89 columns containing information about football players.
For example, I need to plot a line graph to visualize the relationship between a player's age and overall rating.
However, when I plot it with:
import plotly.express as px
fig = px.line(df, x="Age", y="Overall")
fig.show()
The result is obviously not a good visualization.
I want to plot the average rating for every age, so that it's a single line showing the relationship between age and overall rating. Is there a simple plotting function for this, or do I have to prepare the aggregated data myself?
It sounds like what you want to do here is groupby() on "age" and then take the mean of "overall", building the aggregated data before passing it to the plotting function.
Roughly,
import pandas as pd
data = {
"age": [1, 1, 2, 2, 3, 3],
"overall": [50, 100, 1, 1, 600, 700],
# clarifies how to select the correct column to average
"irrelevant": [1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
new_df = df.groupby('age')['overall'].mean()
print(new_df)
# age
# 1 75.0
# 2 1.0
# 3 650.0
# Name: overall, dtype: float64
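To get from that Series to the plot in the question, one option is to turn it back into a DataFrame with reset_index() and pass that to px.line. This is a minimal sketch, assuming plotly.express is installed and the question's column names "Age" and "Overall"; the helper names average_by and plot_average and the sample rows are hypothetical.

```python
import pandas as pd


def average_by(df: pd.DataFrame, x: str, y: str) -> pd.DataFrame:
    """Return one row per distinct value of `x`, holding the mean of `y`."""
    # groupby sorts the keys by default, so the line is drawn in x order;
    # reset_index() turns the grouped Series back into columns x and y.
    return df.groupby(x)[y].mean().reset_index()


def plot_average(df: pd.DataFrame, x: str = "Age", y: str = "Overall"):
    """Hypothetical helper: draw the averaged values as a single line."""
    import plotly.express as px  # imported here; only needed for plotting

    return px.line(average_by(df, x, y), x=x, y=y)


# Hypothetical rows using the question's column names
players = pd.DataFrame({
    "Age": [20, 20, 25, 25, 30],
    "Overall": [60, 70, 80, 84, 75],
    "Name": ["A", "B", "C", "D", "E"],
})

avg = average_by(players, "Age", "Overall")
print(avg)
```

On the real 18k-row DataFrame, plot_average(df).show() should then give one point per age joined by a single line, instead of a line through every individual row.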
Alternatively, you can use a scatter plot if you're comfortable letting the individual points show the trend instead. Scatter plots are sometimes better here because each age may have a very different number of players behind its average, and a line of averages hides those differing sample sizes, so you lose information.
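If you do keep the line of averages, one way to avoid losing that information is to compute the number of rows behind each mean and, optionally, drop ages with too few players. This is a sketch rather than part of the original answer; the toy data and the cut-off of 2 (MIN_COUNT) are arbitrary assumptions.

```python
import pandas as pd

# Toy data with deliberately uneven group sizes (age 2 has a single row)
df = pd.DataFrame({
    "age": [1, 1, 1, 2, 3, 3],
    "overall": [50, 100, 60, 1, 600, 700],
})

# One row per age with both the average and the number of rows behind it
summary = df.groupby("age")["overall"].agg(["mean", "count"]).reset_index()
print(summary)

# Hypothetical cut-off: keep only ages backed by at least MIN_COUNT rows
MIN_COUNT = 2
reliable = summary[summary["count"] >= MIN_COUNT]
print(reliable)
```

With plotly, px.scatter(df, x="age", y="overall") would show the raw points; the count column could also be passed as size= to a px.scatter of summary so that averages based on few players appear visibly smaller.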