Tags: python, pandas, bar-chart, standard-deviation

Plot of the standard deviations by species in Python


I'm trying to make a standard-deviation plot by species, but the resulting graph shows bars of exactly the same height for every species, which doesn't make much sense. Is this happening because of something I'm doing wrong, or something I'm failing to do beforehand?

I also don't understand why the bars reach about 14 when there are 50 samples per species.

from sklearn.datasets import load_iris
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Map the integer targets 0, 1, 2 to the species names
iris_df['species_id'] = iris.target
iris_df['species_id'] = iris_df['species_id'].replace([0, 1, 2], iris.target_names)
# Row position of each sample: 0, 1, ..., 149
iris_df['x_pos'] = np.arange(len(iris_df))
print(iris_df)

plt.figure(figsize=(10, 5))
# One bar per species; bar height = np.std of x_pos within that species
ax = sns.barplot(x="species_id", y="x_pos", data=iris_df, estimator=np.std)
ax.set_xlabel("Species", fontsize=10)
ax.set_ylabel("Standard deviation", fontsize=10)
ax.set_title("Standard Deviation of Species", fontsize=15)
plt.show()
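The bar heights can be checked without seaborn by computing the same per-species statistic directly with pandas. This is a minimal reproduction, assuming (as in the question) 150 rows sorted by species with 50 rows each; it rebuilds only the `species_id` and `x_pos` columns so it runs without scikit-learn. `np.std` defaults to `ddof=0`, while pandas' `.std()` defaults to `ddof=1`, so both variants are shown.

```python
import numpy as np
import pandas as pd

# Assumption: same layout as the question's iris_df, i.e. 150 rows sorted by
# species with 50 rows each. Only the columns needed for the check are rebuilt.
species = np.repeat(["setosa", "versicolor", "virginica"], 50)
df = pd.DataFrame({"species_id": species, "x_pos": np.arange(len(species))})

# What sns.barplot(..., y="x_pos", estimator=np.std) summarizes per bar:
# the spread of the row positions within each species.
pop_std = df.groupby("species_id")["x_pos"].std(ddof=0)     # same formula as np.std
sample_std = df.groupby("species_id")["x_pos"].std(ddof=1)  # pandas' default

print(pop_std.round(2))
print(sample_std.round(2))

# Closed form for n consecutive integers (ddof=0): sqrt((n**2 - 1) / 12)
n = 50
closed_form = np.sqrt((n**2 - 1) / 12)
print(round(closed_form, 2))
```

Every species gets the same value (about 14.43 with `ddof=0`, about 14.58 with `ddof=1`), matching the closed form.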

Solution

  • x_pos increases by 1 for each row. The dataset is ordered by species and has 50 measurements per species, so each species covers a block of 50 consecutive x_pos values, and every such block has the same standard deviation.

    The following plot helps explain why:

    sns.scatterplot(x='x_pos', y=1, hue='species_id', data=iris_df)
    


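    The standard deviation of the integers 0 to 49 is the same as that of 50 to 99 (and of 100 to 149), because adding a constant to every value shifts the mean but not the spread. For n consecutive integers it is sqrt((n^2 - 1) / 12), which is about 14.4 for n = 50 (about 14.6 with the sample formula, ddof=1). That is why every bar reaches roughly 14: the bar height is a spread of row positions, not a count of samples.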

    A more informative plot shows the standard deviation of an actual feature for each species, for example sepal length:

    ax = sns.barplot(
        x='species_id',
        y='sepal length (cm)',
        data=iris_df,
        estimator=np.std
    )
    ax.set_xlabel('Species', fontsize=10)
    ax.set_ylabel('Standard deviation of sepal length (cm)', fontsize=10)
    ax.set_title('StdDev of Sepal Length', fontsize=15)
    

    [Figure: iris dataset, standard deviation of sepal length by species]
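To compare the spread of every feature at once instead of drawing one barplot per feature, the per-species standard deviations can be tabulated with `groupby`. This is a minimal sketch: the helper names `species_std` and `to_long` are hypothetical, and the toy frame only stands in for the question's `iris_df` (its values are made up) so the block runs without scikit-learn. `ddof=0` is chosen to match `np.std`.

```python
import pandas as pd


def species_std(df, features, by="species_id", ddof=0):
    """Hypothetical helper: one row per species, one column per feature.

    ddof=0 matches np.std's default; pandas' own default would be ddof=1.
    """
    return df.groupby(by)[list(features)].std(ddof=ddof)


def to_long(std_table, by="species_id"):
    """Hypothetical helper: one row per (species, feature) pair."""
    return std_table.reset_index().melt(id_vars=by, var_name="feature", value_name="std")


# Toy stand-in for iris_df (made-up values, NOT the iris measurements).
toy = pd.DataFrame({
    "species_id": ["a", "a", "b", "b"],
    "sepal length (cm)": [1.0, 3.0, 2.0, 2.0],
    "petal length (cm)": [0.0, 4.0, 1.0, 5.0],
})

table = species_std(toy, ["sepal length (cm)", "petal length (cm)"])
long_df = to_long(table)
print(table)
print(long_df)
```

With the question's data the call would be `species_std(iris_df, iris.feature_names)`; the long-format frame can then feed a single grouped plot, for example `sns.barplot(data=long_df, x="feature", y="std", hue="species_id")`.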