I am trying to highlight a specific city (a categorical value) in my seaborn barplot, but every time I give it the condition on x, it highlights the wrong bar. In the example below, I am trying to highlight Los Angeles, but San Francisco is highlighted instead.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
<h3> Part A: What Predicts the Long-term Home Price Appreciation of a City? </h3>
# Set up directory for data imports
os.chdir("***")
df = pd.read_excel('W02b_homeprice.xlsx')
# NOTE: Data was modified to remove "(Metropolitan Statistical Area)" from the entire MSA column for easier data manipulation
Plotting Correlation Chart for all the variables
Replicating Figure 1: Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.
df30 = df[df['pop03']>=1700000]
plt.figure(figsize=(30,20))
plt.title("Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.", fontsize = 30)
cols = ['red' if x == "Los Angeles" else 'green' for x in df30.MSA]
# NOTE: Return to this - something is off with the x that is being highlighted!
sns.barplot(x="MSA", y="homepriceg", data=df30, palette = cols,
order=df30.sort_values("homepriceg", ascending=False).MSA)
plt.xlabel("")
plt.ylabel("%", size=30)
plt.xticks(fontsize=20, rotation=60)
plt.yticks(fontsize=20)
sns.set_style("whitegrid")
plt.show()
As you can see, my code currently highlights the "San Francisco" bar instead of "Los Angeles", and I am not sure what I am doing wrong. I have tried other cities and it still highlights the wrong one, which is what makes this confusing to debug. I am new to seaborn and Python.
You are using an old, deprecated way to assign colors: passing palette without setting hue. In current seaborn versions, you are expected to use a hue column for coloring. If the color depends on the x-values, you can set hue='MSA' (the same column as x=). Instead of giving the colors as a list, you can create a dictionary that maps each city name to a color.
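A minimal pure-Python sketch of why a dictionary is safer, assuming a hypothetical three-city list (not the question's data). Seaborn looks up a dict palette by hue level, so the color follows the city name rather than its position:

```python
# Hypothetical list of city names (stand-in for df30['MSA'])
cities = ["San Francisco", "Los Angeles", "Chicago"]

# Map each city name to a color; the highlight is keyed by name, not by position
colors = {city: "crimson" if city == "Los Angeles" else "limegreen" for city in cities}

# Two different bar orders, e.g. the original rows vs. sorted by value
original_order = cities
sorted_order = ["Los Angeles", "San Francisco", "Chicago"]

for order in (original_order, sorted_order):
    # A name-keyed lookup gives Los Angeles the same color in either order
    print([(city, colors[city]) for city in order])
```

Whatever order the bars end up in, Los Angeles is looked up by name and always gets 'crimson'.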
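In your code, the color list is applied to the bars left to right, in plotting order. The bar order comes from sort_values(), but the color list was built from the original (unsorted) row order of df30. The 'red' entry therefore lands on whichever bar is drawn at the position Los Angeles had in the unsorted data, which in your case is San Francisco.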
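The mismatch can be reproduced without plotting. This sketch uses hypothetical data (three cities with made-up values), pairs each bar position in the sorted order with the color from a list built in the original row order, and then shows the list-based fix of building the colors from the same sorted order:

```python
import pandas as pd

# Hypothetical data; the values are made up for illustration
df = pd.DataFrame({
    "MSA": ["San Francisco", "Los Angeles", "Chicago"],
    "homepriceg": [120, 150, 40],
})

# Colors built in the ORIGINAL row order (as in the question)
cols_original = ["red" if x == "Los Angeles" else "green" for x in df.MSA]

# Bars are drawn in the SORTED order
order = df.sort_values("homepriceg", ascending=False).MSA.tolist()

# The list is consumed left to right: the i-th sorted bar gets the i-th color
mismatched = list(zip(order, cols_original))
print(mismatched)
# [('Los Angeles', 'green'), ('San Francisco', 'red'), ('Chicago', 'green')]

# List-based fix: build the colors from the same sorted order used for the bars
cols_sorted = ["red" if x == "Los Angeles" else "green" for x in order]
print(list(zip(order, cols_sorted)))
# [('Los Angeles', 'red'), ('San Francisco', 'green'), ('Chicago', 'green')]
```

With a list, the colors have to be regenerated every time the bar order changes; the dictionary keyed by city name avoids that bookkeeping.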
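By default, the x tick labels are centered on their ticks, so with a 60° rotation each label appears shifted relative to its bar. Right-aligning them (ha='right') puts the end of each label at its tick, which looks nicer. plt.tight_layout() recalculates the white space so that all labels fit into the figure.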
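A small matplotlib-only sketch of the label alignment, using hypothetical city names and made-up values; the Agg backend is selected only so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical long category names and made-up values
names = ["Riverside-San Bernardino", "Los Angeles", "San Francisco"]
values = [30, 50, 40]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(names, values)

# Rotate and right-align the labels so each label ends at its tick
plt.xticks(rotation=60, ha="right")
fig.tight_layout()  # recompute margins so the rotated labels are not clipped
```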
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some dummy test data similar to the question's data
cities =['Long Beach', 'Irvine', 'San Diego', 'Ontario', 'San Jose', 'San Francisco', 'Anaheim', 'Glendale', 'Huntington Beach', 'Chula Vista', 'Riverside', 'Rancho Cucamonga', 'Fontana', 'San Bernardino', 'Modesto', 'Santa Clarita', 'Santa Ana', 'Fresno', 'Fremont', 'Bakersfield', 'Oxnard', 'Oceanside', 'Stockton', 'Los Angeles', 'Elk Grove', 'Santa Rosa', 'Moreno Valley', 'Oakland', 'Sacramento', 'Garden Grove']
df30 = pd.DataFrame({'MSA': cities, 'homepriceg': np.random.randint(10000, 100000, len(cities))})
sns.set_style("whitegrid") # should be called before creating the figure
plt.figure(figsize=(12, 8))
colors = {x: 'crimson' if x == 'Los Angeles' else 'limegreen' for x in df30['MSA']}
sns.barplot(x='MSA', y='homepriceg', hue='MSA', data=df30, palette=colors,
order=df30.sort_values('homepriceg', ascending=False)['MSA'])
plt.xticks(fontsize=12, rotation=60, ha='right')
plt.yticks(fontsize=12)
plt.xlabel('')
plt.tight_layout() # fit labels nicely into the plot
plt.show()
PS: The dictionary for the colors now looks like:
{'Long Beach': 'limegreen', 'Irvine': 'limegreen', 'San Diego': 'limegreen', ...
'Los Angeles': 'crimson', ...
}