My shapefile has some missing values (represented by NaN) in certain columns (for example, GDP). When I plot without handling those missing values, the legend does not come out the way I want.
So I replaced the missing values with the string "missing" and redid the plot. Not surprisingly, I got the error TypeError: '<' not supported between instances of 'str' and 'float'.
My questions are:
1. How does GeoPandas treat missing values? Does it store them as strings or as some other data type?
2. How can I keep those missing values and redo the plot so that the legend includes a label for them?
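Regarding question 1, a quick pandas-only check suggests what is going on. A numeric column with missing entries keeps a float dtype and uses NaN (itself a float) as the marker. Swapping NaN for a string leaves a column that mixes str and float, and any sort over it fails with the same message. The helper names `describe_missing` and `sort_with_placeholder` are hypothetical. That the plotting code fails because of exactly this kind of sort is an assumption, although the error text matches.

```python
import numpy as np
import pandas as pd


def describe_missing(values):
    """Return the dtype pandas picks and how many entries it treats as missing."""
    s = pd.Series(values)
    return str(s.dtype), int(s.isna().sum())


def sort_with_placeholder(values, placeholder="missing"):
    """Replace NaN with a string placeholder, then try to sort the mixed values.

    Returns (sorted_list, None) on success or (None, error_message) on failure.
    """
    s = pd.Series(values)
    # object column holding floats plus the placeholder string
    mixed = s.astype(object).where(s.notna(), placeholder)
    try:
        return sorted(mixed.tolist()), None
    except TypeError as err:
        return None, str(err)


print(describe_missing([1.5, np.nan, 3.0]))
print(sort_with_placeholder([1.5, np.nan, 3.0]))
```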
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import mapclassify as mc  # older PySAL releases exposed this as pysal.viz.mapclassify
from matplotlib.colors import rgb2hex
from matplotlib.colors import ListedColormap
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before matplotlib 3.6
# gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# generate random data
gdf['random'] = np.random.normal(100, 10, len(gdf))
# set 40 distinct, randomly chosen rows to missing
gdf.loc[np.random.choice(gdf.index, 40, replace=False), 'random'] = np.nan
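In GeoPandas (as in pandas), missing numeric values are stored as NaN, which is itself a float, so the column keeps a float dtype. Replacing NaN with a string gives a column that mixes str and float, and the plotting code fails as soon as it compares the two. The basic idea here is to create a categorical (string) column based on whichever classification method you want to use for your numerical data (e.g., quantiles or percentiles). We then plot that string column, which lets us pass a customized colormap with grey reserved for the missing values.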
# categorize the numerical column
k = 5
quantiles = mc.Quantiles(gdf.random.dropna(), k=k)
gdf['random_cat'] = quantiles.find_bin(gdf.random).astype('str')
gdf.loc[gdf.random.isnull(), 'random_cat'] = 'No Data'
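If you would rather not depend on mapclassify for this step, pandas' own `pd.qcut` can produce a similar string column. This is a swapped-in technique, not what the code above does. `qcut` leaves missing inputs as NaN so they can be relabelled afterwards, but its bin edges are not guaranteed to match `mc.Quantiles` exactly. The helper name `categorize_with_missing` is hypothetical.

```python
import numpy as np
import pandas as pd


def categorize_with_missing(values, k=5, missing_label="No Data"):
    """Label each value with its quantile bin ('0'..'k-1'); NaN gets missing_label."""
    s = pd.Series(values, dtype="float64")
    # labels=False returns integer bin indices; missing inputs stay NaN
    codes = pd.qcut(s, q=k, labels=False)
    return codes.map(lambda c: missing_label if pd.isna(c) else str(int(c)))


values = list(range(1, 11)) + [np.nan]
print(categorize_with_missing(values).tolist())
```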
# add grey to a colormap to represent missing value
cmap = plt.get_cmap('Blues', k)  # plt.cm.get_cmap was removed in matplotlib 3.9
cmap_list = [rgb2hex(cmap(i)) for i in range(cmap.N)]
cmap_list.append('grey')
cmap_with_grey = ListedColormap(cmap_list)
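To check that the extended colormap really has k + 1 entries with grey last, the same steps can be wrapped in a small helper. `cmap_with_missing` is a hypothetical name, and the sketch uses `plt.get_cmap(name, lut)` rather than `plt.cm.get_cmap`, which recent matplotlib releases no longer provide.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; only colormaps are built here
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap, rgb2hex, to_hex


def cmap_with_missing(name="Blues", k=5, missing_color="grey"):
    """Sample k colours from a named colormap and append one colour for missing data."""
    base = plt.get_cmap(name, k)  # colormap resampled to k discrete entries
    colors = [rgb2hex(base(i)) for i in range(base.N)]
    colors.append(missing_color)  # last slot is reserved for 'No Data'
    return ListedColormap(colors)


cmap = cmap_with_missing("Blues", 5)
print(cmap.N, to_hex(cmap(cmap.N - 1)))
```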
# plot map
fig, ax = plt.subplots(figsize=(12, 10))
gdf.plot(column='random_cat', edgecolor='k', cmap=cmap_with_grey,
         legend=True, legend_kwds=dict(loc='center left'),
         ax=ax)
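The grey entry only lines up with 'No Data' because of the order in which the categories are coloured. As far as I can tell, geopandas assigns colours to a plain string column in the sorted order of its unique values. The sketch below (hypothetical helper `category_order`, using `np.unique` as a stand-in for that sorting) shows why '0' to '4' come before 'No Data', and why k of 10 or more would need zero-padded labels to keep the classes in numeric order.

```python
import numpy as np


def category_order(labels):
    """Unique labels in the sorted order np.unique returns them."""
    return [str(x) for x in np.unique(np.asarray(labels, dtype=str))]


print(category_order(["2", "No Data", "0", "4", "1", "3"]))
# plain strings sort lexicographically, so '10' lands before '9'
print(category_order(["10", "9", "No Data"]))
# zero-padding restores numeric order
print(category_order([f"{i:02d}" for i in (10, 9)] + ["No Data"]))
```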
# get all upper bounds in the quantiles category
upper_bounds = quantiles.bins
# get and format all bounds
bounds = []
for index, upper_bound in enumerate(upper_bounds):
    if index == 0:
        lower_bound = gdf.random.min()
    else:
        lower_bound = upper_bounds[index-1]
    bound = f'{lower_bound:.2f} - {upper_bound:.2f}'
    bounds.append(bound)
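The bounds loop can also be written with `zip`, pairing each upper bound with the previous one and starting the first class at the data minimum. `format_bounds` is a hypothetical helper with the same behaviour, shown here on made-up bounds rather than real quantile output.

```python
import numpy as np


def format_bounds(upper_bounds, data_min, decimals=2):
    """Turn class upper bounds into 'lower - upper' labels; the first class starts at data_min."""
    lowers = [data_min] + list(upper_bounds[:-1])
    return [f"{lo:.{decimals}f} - {hi:.{decimals}f}"
            for lo, hi in zip(lowers, upper_bounds)]


print(format_bounds(np.array([1.0, 2.5, 4.0]), 0.0))
```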
# get all the legend labels
legend_labels = ax.get_legend().get_texts()
# replace the numerical legend labels
for bound, legend_label in zip(bounds, legend_labels):
    legend_label.set_text(bound)
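Because `zip` stops at the shorter sequence, the k formatted bounds replace only the first k legend entries, and the trailing 'No Data' label survives. A matplotlib-only sketch with a stand-in legend (three hypothetical line entries in place of the map's patches; `relabel_legend` is a hypothetical helper) shows the effect.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt


def relabel_legend(ax, new_labels):
    """Overwrite legend texts in order; extra legend entries keep their text."""
    texts = ax.get_legend().get_texts()
    for text, label in zip(texts, new_labels):
        text.set_text(label)
    return [t.get_text() for t in texts]


fig, ax = plt.subplots()
for label in ["0", "1", "No Data"]:
    ax.plot([0, 1], [0, 1], label=label)
ax.legend()
result = relabel_legend(ax, ["0.00 - 1.00", "1.00 - 2.00"])
print(result)
plt.close(fig)
```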
You may want to take a look at the following posts:
- format/round numerical legend label in GeoPandas
- Extract matplotlib colormap in hex-format
- Matplotlib.colors.ListedColormap in python
- Change main plot legend label text
Update (as of geopandas 0.8.1): you can now simply pass a missing_kwds argument to the plot function:
fig, ax = plt.subplots(figsize=(12, 10))
missing_kwds = dict(color='grey', label='No Data')
gdf.plot(column='random', scheme='Quantiles', k=5,
         legend=True, legend_kwds=dict(loc='center left'),
         missing_kwds=missing_kwds, ax=ax)
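A self-contained way to see missing_kwds at work, without downloading a dataset or installing mapclassify, is a tiny synthetic GeoDataFrame. The four unit squares and the 'low'/'high' categories below are made up for illustration, and a categorical column stands in for the quantile scheme. The label given in missing_kwds should appear as the last legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed in a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# four hypothetical unit squares, one of which has no category
gdf = gpd.GeoDataFrame(
    {"category": ["low", "high", None, "low"]},
    geometry=[box(i, 0, i + 1, 1) for i in range(4)],
)

fig, ax = plt.subplots()
gdf.plot(column="category", categorical=True, legend=True,
         missing_kwds=dict(color="grey", label="No Data"), ax=ax)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
plt.close(fig)
```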