sns barplot condition highlighting the wrong value


I am trying to highlight a specific city (a categorical value) in my seaborn bar plot, but every time I pass it a condition on x, the wrong bar is highlighted. In the example below, I am trying to highlight Los Angeles, but San Francisco is highlighted instead.

    import os
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import statsmodels.api as sm 
    from statsmodels.formula.api import ols
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import LogisticRegression
    from sklearn import metrics
    
    # Part A: What Predicts the Long-term Home Price Appreciation of a City?
    
    # Set up directory for data imports
    os.chdir("***")
    df = pd.read_excel('W02b_homeprice.xlsx') 
    # NOTE: Data was modified to remove "(Metropolitan Statistical Area)" from the entire MSA column for easier data manipulation
    
    # Plotting Correlation Chart for all the variables
    
    # Replicating Figure 1: Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.
    
    df30 = df[df['pop03']>=1700000]
    plt.figure(figsize=(30,20))
    plt.title("Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.", fontsize = 30)
    cols = ['red' if x == "Los Angeles" else 'green' for x in df30.MSA]
     # NOTE: Return to this - something is off with the x that is being highlighted!
    sns.barplot(x="MSA", y="homepriceg", data=df30, palette = cols, 
                order=df30.sort_values("homepriceg", ascending=False).MSA) 
    plt.xlabel("")
    plt.ylabel("%", size=30)
    plt.xticks(fontsize=20, rotation=60)
    plt.yticks(fontsize=20)
    sns.set_style("whitegrid")
    plt.show()

As you can see, my code currently highlights the "San Francisco" bar instead of "Los Angeles", and I am not sure what I am doing wrong. I have tried other cities and it still highlights the wrong one, which is what makes this confusing to debug. I am new to seaborn and Python.

[Image: bar plot that highlights the wrong city]


Solution

  • You are assigning colors in an old, deprecated way. Current seaborn versions expect a hue column for coloring. If the color depends on the x-values, you can set hue='MSA' (the same column as x=). Instead of passing the colors as a list, you can create a dictionary that maps each city name to a color.

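    In your code, the colors in the list are applied to the bars from left to right. The bar order came from sort_values(), but the color list was built in the original row order of the DataFrame. The red entry therefore landed on whichever city ended up, after sorting, at the position Los Angeles had in the unsorted data.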
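
The mismatch described above can be reproduced without any plotting. The following is a minimal sketch using plain Python lists and made-up sample values (four cities, not the data from the question); the sorted list stands in for the order returned by sort_values(), and the positional pairing stands in for how a list palette is applied to the bars.

```python
# Made-up sample data (not from the question's spreadsheet).
cities = ["Denver", "Los Angeles", "San Francisco", "Seattle"]
growth = [2.2, 3.0, 5.0, 6.3]

# Color list built in the original row order, as in the question.
cols = ["red" if city == "Los Angeles" else "green" for city in cities]

# Bar order after sorting by value, descending (stand-in for sort_values()).
plot_order = [city for _, city in sorted(zip(growth, cities), reverse=True)]

# A list palette is matched to the bars by position, left to right.
applied = dict(zip(plot_order, cols))
highlighted = [city for city, color in applied.items() if color == "red"]

print(plot_order)   # ['Seattle', 'San Francisco', 'Los Angeles', 'Denver']
print(highlighted)  # ['San Francisco']
```

Los Angeles sits at index 1 in the unsorted list, so the red color goes to whatever city is at index 1 after sorting, which in this sample is San Francisco.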

    By default, the x tick labels are aligned via the center of the label text. With rotated labels, right aligning (ha='right') looks nicer. plt.tight_layout() recalculates the white space so that all labels fit.

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    
    # create some dummy test data similar to the given data
    cities = ['Long Beach', 'Irvine', 'San Diego', 'Ontario', 'San Jose', 'San Francisco',
              'Anaheim', 'Glendale', 'Huntington Beach', 'Chula Vista', 'Riverside',
              'Rancho Cucamonga', 'Fontana', 'San Bernardino', 'Modesto', 'Santa Clarita',
              'Santa Ana', 'Fresno', 'Fremont', 'Bakersfield', 'Oxnard', 'Oceanside',
              'Stockton', 'Los Angeles', 'Elk Grove', 'Santa Rosa', 'Moreno Valley',
              'Oakland', 'Sacramento', 'Garden Grove']
    df30 = pd.DataFrame({'MSA': cities, 'homepriceg': np.random.randint(10000, 100000, len(cities))})
    
    sns.set_style("whitegrid") # should be called before creating the figure
    plt.figure(figsize=(12, 8))
    colors = {x: 'crimson' if x == 'Los Angeles' else 'limegreen' for x in df30['MSA']}
    sns.barplot(x='MSA', y='homepriceg', hue='MSA', data=df30, palette=colors,
                order=df30.sort_values('homepriceg', ascending=False)['MSA'])
    plt.xticks(fontsize=12, rotation=60, ha='right')
    plt.yticks(fontsize=12)
    plt.xlabel('')
    plt.tight_layout() # fit labels nicely into the plot
    plt.show()
    

    [Image: sns.barplot with one bar highlighted]

    PS: The dictionary for the colors now looks like:

    {'Long Beach': 'limegreen', 'Irvine': 'limegreen', 'San Diego': 'limegreen', ... 
     'Los Angeles': 'crimson', ...
    }
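
Because a dictionary is looked up by city name rather than by position, the highlight no longer depends on the order of the bars. A small sketch of that idea follows; the helper name highlight_palette and the four sample cities are hypothetical and only serve as illustration.

```python
def highlight_palette(categories, target, highlight="crimson", base="limegreen"):
    """Map every category name to a color; only `target` gets `highlight`."""
    return {name: (highlight if name == target else base) for name in categories}


# Made-up sample categories (hypothetical, for illustration only).
cities = ["Denver", "Los Angeles", "San Francisco", "Seattle"]
palette = highlight_palette(cities, "Los Angeles")

# Any plotting order gives the same color per city, since lookup is by name.
plot_order = ["Seattle", "San Francisco", "Los Angeles", "Denver"]
bar_colors = [palette[name] for name in plot_order]

print(bar_colors)  # ['limegreen', 'limegreen', 'crimson', 'limegreen']
```

Assuming the same df30 as above, the result of highlight_palette(df30['MSA'], 'Los Angeles') could be passed as palette= together with hue='MSA', just like the colors dictionary in the answer's code.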