Plotly Choroplethmapbox not showing all polygons


I'm having an odd issue with Plotly; the images below give some context:

This is the map made with Bokeh

This is the map made with Plotly

The same transformation steps are applied to both versions; however, for some reason Plotly excludes some of the shapes.

These are the transformation steps I am using:

import pandas as pd
import plotly.io as pio
import plotly.graph_objs as go
import json
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely import wkt
from bokeh.plotting import save, figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar
from bokeh.io import show, output_file
from bokeh.palettes import brewer

df_test = pd.read_csv(f'{filepath}')
df_blocks = pd.read_csv(f'{filepath}')
group_2 = df_test[['geo_name', 'edited_characteristics', 'total', 'male', 'female']]
group_2 = group_2.pivot(index='geo_name', columns='edited_characteristics', values=['total', 'male', 'female'])
cat = 'Total - Low-income status in 2015 for the population in private households to whom low-income concepts are applicable - 100% data'
group_2['LIM 0-17 percent'] = (
        group_2[( 'total', f'{cat}//0 to 17 years')] /
        group_2[( 'total', cat)]
        )
group_2.reset_index(inplace=True)
g2 = group_2[['geo_name', 'LIM 0-17 percent']]
g2 = g2.rename(columns={'geo_name': 'DAUID'})  # avoids inplace on a slice (SettingWithCopyWarning)
df_g2 = pd.merge(g2, df_blocks, on='DAUID')
df_g2['geometry'] = df_g2['geometry'].apply(wkt.loads)

# The {'init': 'epsg:...'} dict form is deprecated in recent pyproj/geopandas
geo_df_g2 = gpd.GeoDataFrame(df_g2, geometry='geometry', crs='EPSG:3347')
geo_df_g2 = geo_df_g2.to_crs('EPSG:4326')
geo_df_g2 = geo_df_g2[geo_df_g2[('LIM 0-17 percent', '')] < 1]
mean = geo_df_g2[('LIM 0-17 percent', '')].mean()
std = geo_df_g2[('LIM 0-17 percent', '')].std()
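# Keep only areas more than one standard deviation away from the mean
geo_df_g2 = geo_df_g2[(geo_df_g2[('LIM 0-17 percent', '')] < mean - std) |
                      (geo_df_g2[('LIM 0-17 percent', '')] > mean + std)]
# Flatten the (name, '') tuple labels left over from the pivot
geo_df_g2.columns = [x[0] if isinstance(x, tuple) else x
                     for x in geo_df_g2.columns]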
geo_df_g2 = geo_df_g2.loc[:, ~geo_df_g2.columns.duplicated()]
geo_df_g2_j = geo_df_g2.copy()
geo_df_g2_j['DAUID'] = geo_df_g2_j['DAUID'].astype(str)
geo_df_g2_j.set_index('DAUID', inplace=True)
geo_df_g2_json = json.loads(geo_df_g2_j.to_json())
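A quick sanity check at this point may help: geopandas' `to_json()` writes each row's index into the feature's `id` (unless `drop_id=True`), and the index here is `DAUID`, so a DAUID that appears on several rows yields several features sharing one `id`. Below is a minimal stdlib sketch that counts repeated ids in a FeatureCollection dict such as `geo_df_g2_json`; the helper name and the toy data are hypothetical.

```python
from collections import Counter


def duplicate_feature_ids(feature_collection):
    """Return {id: count} for every feature id that appears more than once."""
    counts = Counter(
        feature.get("id") for feature in feature_collection["features"]
    )
    return {fid: n for fid, n in counts.items() if n > 1}


# Toy FeatureCollection (made-up data): two features share the id "001"
toy = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "001", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature", "id": "001", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 0]]]}},
        {"type": "Feature", "id": "002", "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[2, 0], [3, 0], [3, 1], [2, 0]]]}},
    ],
}

print(duplicate_feature_ids(toy))  # {'001': 2}
```

On the real data, a non-empty result from `duplicate_feature_ids(geo_df_g2_json)` would mean some DAUIDs map to more than one polygon.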

USING PLOTLY

# .copy() so the astype assignment below does not act on a view of the original frame
geo_df_g2 = geo_df_g2[['DAUID', 'LIM 0-17 percent']].copy()
geo_df_g2['DAUID'] = geo_df_g2['DAUID'].astype(str)
fig = go.Figure(go.Choroplethmapbox(geojson=geo_df_g2_json,
                                    locations=geo_df_g2['DAUID'],
                                    z=geo_df_g2['LIM 0-17 percent'],
                                    colorscale='Viridis',
                                    zauto=True,
                                    marker_opacity=0.5,
                                    marker_line_width=0.5)
                )
fig.update_layout(mapbox_style='white-bg',
                  #mapbox_accesstoken=mapbox_token,
                  mapbox_zoom=12,
                  mapbox_center={'lat': 45.41117, 'lon': -75.69812})
fig.update_layout(margin={'r':0, 't':0, 'l':0, 'b':0})
pio.renderers.default = 'browser'
fig.show()
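Choroplethmapbox only draws a location when it finds a GeoJSON feature whose id equals it. By default that is the feature's top-level `id`; the trace's `featureidkey` attribute (e.g. `featureidkey='properties.DAUID'`) points the lookup elsewhere. Locations with no matching feature are skipped without an error, so one cheap check is to list them. A minimal stdlib sketch, with a hypothetical helper name and toy data:

```python
def unmatched_locations(locations, feature_collection, key="id"):
    """Return the locations that no feature in the collection matches.

    `key` is a dotted path ("id", "properties.DAUID", ...), the same form
    Plotly's `featureidkey` accepts. Matching is by plain equality, so the
    string "1001" and the integer 1001 do not match each other.
    """

    def feature_key(feature):
        # Walk the dotted path; give None if any step is missing
        value = feature
        for part in key.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        return value

    known_ids = {feature_key(f) for f in feature_collection["features"]}
    return [loc for loc in locations if loc not in known_ids]


toy = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "001", "properties": {"DAUID": "001"}, "geometry": None},
        {"type": "Feature", "id": "002", "properties": {"DAUID": "002"}, "geometry": None},
    ],
}

print(unmatched_locations(["001", "003"], toy))  # ['003']
print(unmatched_locations([1, 2], toy))  # [1, 2] (ints never equal the string ids)
print(unmatched_locations(["002"], toy, key="properties.DAUID"))  # []
```

This is also why casting `DAUID` to `str` on both sides, as above, matters.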

USING BOKEH

json_data = json.dumps(geo_df_g2_json)

geosource = GeoJSONDataSource(geojson=json_data)
palette = brewer['YlGnBu'][8]
palette = palette[::-1]
color_mapper = LinearColorMapper(palette = palette, low = 0, high = 40)
tick_labels = {'0': '0%', '5': '5%', '10': '10%', '15': '15%', '20': '20%',
               '25': '25%', '30': '30%', '35': '35%', '40': '>40%'}
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,
                     width=500, height=20, border_line_color=None,
                     location=(0, 0), orientation='horizontal',
                     major_label_overrides=tick_labels)
# plot_width/plot_height were removed in Bokeh 3.0; width/height also work in 2.x
p = figure(title='LIM', height=600, width=950, toolbar_location=None)
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.patches('xs', 'ys', source=geosource,
          fill_color={'field': 'LIM 0-17 percent', 'transform': color_mapper},
          line_color='black', line_width=0.25, fill_alpha=1)
p.add_layout(color_bar, 'below')  # the color bar is otherwise never attached to the plot
output_file('test_bokeh.html')
show(p)

As you can see, both use the same projection, the same dataframe transformations, and the same categories. Is there a way to fix this?

TIA

EDIT: The shapes are in the correct positions; a lot of them are simply missing from the plot.

UPDATE: While checking whether other Plotly modules could solve the problem, I narrowed the issue down somewhat. In Plotly's Scattermapbox tutorial, the way the mapbox layers are added revealed the underlying problem better than the Choroplethmapbox tutorial did. Apparently Plotly (or Mapbox) does not recognize several groups of nearby points as the coordinates of a polygon, so it leaves them out unless you explicitly ask for them. You do this by setting a mapbox layer's 'type' to 'fill', 'line', or 'circle'. This, of course, leads to another issue: the newly revealed shapes are not colored or labeled the same way as the original polygons, since they were not drawn by default.

Here is a code sample that shows the problem of the polygon points not forming a complete shape:

fig = go.Figure(go.Choroplethmapbox(geojson=geo_df_g2_json,
                                    locations=geo_df_g2['DAUID'],
                                    z=geo_df_g2['LIM 0-17 percent'],
                                    below='traces',
                                    colorscale='Viridis',
                                    zauto=True,
                                    marker_opacity=0.5,
                                    marker_line_width=0.5))
fig.update_layout(
    mapbox={
        'style': 'carto-positron',
        'center': {'lat': 45.41117, 'lon': -75.69812},
        'zoom': 12,
        'layers': [{
            'source': {
                'type': 'FeatureCollection',
                'features': geo_df_g2_json['features'],
            },
            'type': 'fill',
            'below': 'traces',
            'color': 'lightblue',
        }],
    },
    margin={'l': 0, 'r': 0, 'b': 0, 't': 0})
fig.show()

To clarify my intent, there are two questions I'm trying to answer:

  1. Why does Plotly transform some polygon coordinates to a shape and others to just the individual points?

  2. Is there a workaround to fill the shapes revealed by the 'fill' layer above according to their 'z' value?
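For question 2, one possible workaround (a sketch, not something from the original post) is to give up per-feature coloring in the trace and instead split the features into z-value bins, adding one 'fill' layer per bin with its own 'color'. Because mapbox layers draw the raw GeoJSON, every polygon appears, duplicates included; the trade-off is that layers get no hover labels and no colorbar. The helper name, bin edges, and colors are assumptions:

```python
def binned_fill_layers(feature_collection, z_by_id, edges, colors, opacity=0.5):
    """Build one Plotly mapbox 'fill' layer dict per z-value bin.

    edges  -- ascending bin edges, e.g. [0.0, 0.25, 0.5, 1.0]
    colors -- one color string per bin, so len(colors) == len(edges) - 1
    Features with no z value, or a z outside [edges[0], edges[-1]], are skipped.
    """
    if len(colors) != len(edges) - 1:
        raise ValueError("need exactly one color per bin")
    buckets = [[] for _ in colors]
    for feature in feature_collection["features"]:
        z = z_by_id.get(feature.get("id"))
        if z is None:
            continue
        for i in range(len(colors)):
            is_last = i == len(colors) - 1
            # Bins are half-open [lo, hi); the last one also includes its upper edge
            if edges[i] <= z < edges[i + 1] or (is_last and z == edges[i + 1]):
                buckets[i].append(feature)
                break
    return [
        {
            "source": {"type": "FeatureCollection", "features": features},
            "type": "fill",
            "color": color,
            "opacity": opacity,
            "below": "traces",
        }
        for features, color in zip(buckets, colors)
        if features  # skip empty bins
    ]


toy = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": fid, "properties": {},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}}
        for fid in ["a", "b", "c", "d"]  # "d" has no z value below
    ],
}
layers = binned_fill_layers(toy, {"a": 0.1, "b": 0.6, "c": 1.0},
                            edges=[0.0, 0.5, 1.0], colors=["#41b6c4", "#253494"])
print([(layer["color"], len(layer["source"]["features"])) for layer in layers])
# [('#41b6c4', 1), ('#253494', 2)]
```

Hypothetical usage with the names from the question: build `z_by_id = dict(zip(geo_df_g2['DAUID'], geo_df_g2['LIM 0-17 percent']))`, then pass the result to `fig.update_layout(mapbox_layers=...)`.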


Solution

  • I found out what was causing the polygons to disappear. Plotly works from GeoJSON rather than directly from geopandas dataframes, and (I believe that's the reason) it has stricter requirements on how the data is structured. Other libraries like Bokeh, contextily, or geopandas draw every polygon row they are given, whereas Plotly matches each value in `locations` to a single feature id. In my case, each 'id' had multiple sub-ids, each with its own polygon coordinates, so Plotly would just pick one of them when plotting. The rest were still in the GeoJSON, which is why they only showed up when I used the 'fill' layer option. Here is a rough example of what my dataframe looked like:

    DAUID DBUID Total geometry
    001   00101 5     Polygon(x1, y1)
    001   00102 5     Polygon(x2, y2)
    001   00103 5     Polygon(x3, y3)
    

    So while the primary id and the total values stayed constant, the geometries did not. I found this out by accident while writing a color mapper, when I noticed I had duplicate entries for the DAUID. In the end, it was my fault for not using the correct database.

    It looks like Plotly will be introducing geopandas support soon, so I would be curious to see if it resolves edge cases like this.
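Given that diagnosis, one way to keep the block-level geometries and still get one shape per DAUID is to merge every polygon sharing an id into a single MultiPolygon feature before handing the GeoJSON to Plotly. With geopandas, `geo_df_g2.dissolve(by='DAUID')` does something similar but also unions the shapes; the stdlib sketch below only collects them. The helper name is hypothetical, and taking properties from the first feature of each group assumes (as in the table above) that values like Total are constant within a DAUID.

```python
def merge_features_by_id(feature_collection):
    """Collapse features that share an id into one MultiPolygon feature each.

    Polygons are collected, not unioned, so internal edges between sub-areas
    remain. Properties come from the first feature seen for each id.
    """
    merged = {}  # id -> merged feature; dicts keep first-seen order
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            polygons = [geometry["coordinates"]]
        elif geometry["type"] == "MultiPolygon":
            polygons = list(geometry["coordinates"])
        else:
            raise ValueError(f"unsupported geometry type: {geometry['type']}")
        fid = feature.get("id")
        if fid not in merged:
            merged[fid] = {
                "type": "Feature",
                "id": fid,
                "properties": dict(feature.get("properties") or {}),
                "geometry": {"type": "MultiPolygon", "coordinates": []},
            }
        merged[fid]["geometry"]["coordinates"].extend(polygons)
    return {"type": "FeatureCollection", "features": list(merged.values())}


# Toy version of the DAUID/DBUID table above (made-up coordinates)
blocks = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "001", "properties": {"DBUID": dbuid, "Total": 5},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[x, 0], [x + 1, 0], [x + 1, 1], [x, 0]]]}}
        for x, dbuid in enumerate(["00101", "00102", "00103"])
    ],
}
areas = merge_features_by_id(blocks)
print(len(areas["features"]))  # 1
print(len(areas["features"][0]["geometry"]["coordinates"]))  # 3
```

Passing the merged collection as `geojson=`, with `locations` deduplicated (e.g. `geo_df_g2.drop_duplicates('DAUID')`), should then give Plotly exactly one feature per DAUID.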