python pandas scatter-plot line-plot

Draw scatter plot with lines to see increasing/decreasing trend for each class


I have a dataframe with this structure:

country    strength    index_name     index
africa     0.75        5A             5
boston     0.65        5A             5
tga        0.89        5A             5
ollaw      0.45        5A             5

africa     0.69        80A            80
boston     0.35        80A            80
tga        0.81        80A            80
ollaw      0.33        80A            80
pica       0.29        80A            80

africa     0.70        150A           150
boston     0.47        150A           150
tga        0.40        150A           150
FSD        0.90        150A           150

We can see that the strength for africa changes from 0.75 (in 5A) to 0.69 (in 80A) to 0.70 (in 150A). The other cities/countries show similar increases or decreases across the different index_name values. Some countries may be present in one index_name and absent from another.

I am trying to draw a scatter plot with the country name on each point and with lines connecting each country's points across all index_name values.

Something like this: [image]

Could this be done with sns?


Solution

  • With the dataframe you provided:

    import pandas as pd
    
    df = pd.DataFrame(
        {
            "country": [
                "africa",
                "boston",
                "tga",
                "ollaw",
                "africa",
                "boston",
                "tga",
                "ollaw",
                "pica",
                "africa",
                "boston",
                "tga",
                "FSD",
            ],
            "strength": [
                0.75,
                0.65,
                0.89,
                0.45,
                0.69,
                0.35,
                0.81,
                0.33,
                0.29,
                0.7,
                0.47,
                0.4,
                0.9,
            ],
            "index_name": [
                "5A",
                "5A",
                "5A",
                "5A",
                "80A",
                "80A",
                "80A",
                "80A",
                "80A",
                "150A",
                "150A",
                "150A",
                "150A",
            ],
            "index": [5, 5, 5, 5, 80, 80, 80, 80, 80, 150, 150, 150, 150],
        }
    )
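
Because some countries are missing from some index_name groups, it can help to look at the data in wide form before plotting. The following is a minimal sketch, assuming one row per (country, index) pair as in the sample data; the names `wide` and `single_point` are only illustrative.

```python
import pandas as pd

# Same sample data as in the question (index_name omitted, it mirrors "index")
df = pd.DataFrame(
    {
        "country": ["africa", "boston", "tga", "ollaw",
                    "africa", "boston", "tga", "ollaw", "pica",
                    "africa", "boston", "tga", "FSD"],
        "strength": [0.75, 0.65, 0.89, 0.45,
                     0.69, 0.35, 0.81, 0.33, 0.29,
                     0.70, 0.47, 0.40, 0.90],
        "index": [5, 5, 5, 5, 80, 80, 80, 80, 80, 150, 150, 150, 150],
    }
)

# One row per index value, one column per country; absent pairs become NaN
wide = df.pivot(index="index", columns="country", values="strength")
print(wide)

# Countries observed at only one index value cannot form a line segment
counts = wide.notna().sum()
single_point = counts[counts == 1].index.tolist()
print(single_point)
```

Countries listed in `single_point` will show up in a line plot as a lone marker rather than a line.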
    

    Here is one way to do it (using a Jupyter notebook):

    from matplotlib import pyplot as plt
    
    # Sort so that each country's points are joined in order of increasing index
    df = df.sort_values(by=["country", "index", "strength"]).reset_index(drop=True)
    
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(7, 4))
    
    # One line per country; a country with a single row shows as a lone marker
    for country, group in df.groupby("country"):
        ax.plot(group["index"], group["strength"], ".", linestyle="-")
        # Label every point with its country name
        for x, y in zip(group["index"], group["strength"]):
            ax.text(x, y, country)
    
    # In a notebook, a bare figure as the last expression displays it
    fig
    

    Output:

    [image]