
Plotting missing values with GeoPandas


My shapefile has missing values (represented by NaN) in certain columns (for example, GDP). When I plot without handling those missing values, the legend looks like this:

[Image: screenshot of the plot legend]

That is not what I want, so I replaced the missing values with the string "missing" and plotted again. Not surprisingly, I got this error: TypeError: '<' not supported between instances of 'str' and 'float'.

My questions are: 1. How does GeoPandas treat missing values? Does it store them as strings or as some other data type? 2. How can I keep the missing values and redo the plot so that the legend shows them as missing?


Solution
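
On the first question: a GeoDataFrame keeps its non-geometry columns as ordinary pandas columns, so a missing number is normally the float NaN rather than a string. The sketch below uses a made-up three-value series as a stand-in for a column such as GDP (the names `gdp` and `mixed` are illustrative only) and shows why swapping NaN for a string produces the kind of comparison error quoted in the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a numeric attribute column such as GDP (values are made up).
gdp = pd.Series([1.5, np.nan, 2.5])

# The column is float64, and the missing entry is itself a float (NaN).
column_dtype = str(gdp.dtype)
missing_is_float = isinstance(gdp.iloc[1], float)

# Replacing NaN with a string yields an object column that mixes str and float.
mixed = gdp.astype(object).where(gdp.notna(), "missing")

# Anything that has to order the values (sorting, classification) then fails.
try:
    sorted(mixed)
    sort_error = None
except TypeError as exc:
    sort_error = str(exc)

print(column_dtype, missing_is_float, mixed.dtype)
print(sort_error)
```

So the missing values are best left as NaN in the numeric column and handled at plotting time, which is what the rest of this answer does.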

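    import numpy as np
    import matplotlib.pyplot as plt
    import geopandas as gpd
    import mapclassify as mc  # older PySAL releases: import pysal.viz.mapclassify as mc
    from matplotlib.colors import rgb2hex
    from matplotlib.colors import ListedColormap
    plt.style.use('seaborn')  # renamed 'seaborn-v0_8' in matplotlib >= 3.6
    
    # gpd.datasets was removed in geopandas 1.0; on newer versions, read a local copy of the data instead
    gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
    # generate random data
    gdf['random'] = np.random.normal(100, 10, len(gdf))
    # assign missing values to up to 40 rows (np.random.choice samples with replacement)
    gdf.loc[np.random.choice(gdf.index, 40), 'random'] = np.nan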
    

    The basic idea is to create a categorical (string) column from whichever classification method (e.g., quantiles, percentiles) you want to use for your numerical data. We then plot that string column, which lets us pass a customized colormap with an extra grey color to represent missing values.
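
As a toy illustration of that idea, here is a NumPy-only sketch on a small made-up array. It mirrors the classify-then-label step in spirit only; it is not how mapclassify is implemented, and the variable names are hypothetical.

```python
import numpy as np

# Made-up values with two missing entries.
values = np.array([3.0, np.nan, 7.0, 1.0, 9.0, np.nan, 5.0])
k = 2  # number of quantile classes

# Compute the upper edge of each class from the non-missing values only.
valid = values[~np.isnan(values)]
edges = np.quantile(valid, np.linspace(0, 1, k + 1)[1:])

# Assign each value to the first class whose upper edge is >= the value.
bin_index = np.digitize(values, edges, right=True)

# Turn the class index into a string label and mark missing values explicitly.
labels = np.where(np.isnan(values), "No Data", bin_index.astype(str))

print(edges.tolist())
print(labels.tolist())
```

Because the result is a plain string column, "No Data" becomes just one more category, and it can be given its own color in the colormap.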

    # categorize the numerical column
    k = 5
    quantiles = mc.Quantiles(gdf.random.dropna(), k=k)
    gdf['random_cat'] = quantiles.find_bin(gdf.random).astype('str')
    
    gdf.loc[gdf.random.isnull(), 'random_cat'] = 'No Data'
    
    # add grey to a colormap to represent missing value
    cmap = plt.get_cmap('Blues', k)  # plt.cm.get_cmap was removed in matplotlib 3.9
    cmap_list = [rgb2hex(cmap(i)) for i in range(cmap.N)]
    cmap_list.append('grey')
    cmap_with_grey = ListedColormap(cmap_list)
    
    # plot map
    fig, ax = plt.subplots(figsize=(12, 10))
    gdf.plot(column='random_cat', edgecolor='k', cmap=cmap_with_grey,
             legend=True, legend_kwds=dict(loc='center left'),
             ax=ax)
    
    # get all upper bounds in the quantiles category
    upper_bounds = quantiles.bins
    # get and format all bounds
    bounds = []
    for index, upper_bound in enumerate(upper_bounds):
        if index == 0:
            lower_bound = gdf.random.min()
        else:
            lower_bound = upper_bounds[index-1]
    
        bound = f'{lower_bound:.2f} - {upper_bound:.2f}'
        bounds.append(bound)
    
    # get all the legend labels
    legend_labels = ax.get_legend().get_texts()
    # replace the numerical legend labels
    for bound, legend_label in zip(bounds, legend_labels):
        legend_label.set_text(bound)
    

    [Image: resulting map, with a quantile legend and a grey "No Data" class]

    You may want to take a look at the following posts:

    format/round numerical legend label in GeoPandas

    Extract matplotlib colormap in hex-format

    Matplotlib.colors.ListedColormap in python

    Change main plot legend label text


    Update as of geopandas 0.8.1:

    You can now simply pass a missing_kwds argument to the plot function:

    fig, ax = plt.subplots(figsize=(12, 10))
    
    missing_kwds = dict(color='grey', label='No Data')
    
    gdf.plot(column='random', scheme='Quantiles', k=5,
             legend=True, legend_kwds=dict(loc='center left'),
             missing_kwds=missing_kwds, ax=ax)