Why can't I plot a heatmap of UK counties with Python?


I am trying to create a heatmap on a map of UK counties. I have been trying for several weeks without success.

Here is a sample of the data I have:

county_name normalized_count    latitude    longitude
20  Army    2.146520    51.5074 -0.1278
31  Bedfordshire    17.960133   52.1387 -0.4669
35  Berks & Bucks   20.326667   51.4664 -0.9731
46  Birmingham  6.846455    52.4862 -1.8904
57  Cambridgeshire  21.258065   52.2053 0.1218
68  Cheshire    12.232529   53.2326 -2.6103
79  Cornwall    16.530534   50.2660 -5.0527
90  Cumberland  8.604545    54.6633 -3.1828
101 Derbyshire  14.582173   53.1224 -1.5564
112 Devon   12.771468   50.7156 -3.5309
123 Dorset  10.192857   50.7488 -2.3445
134 Durham  8.644309    54.7761 -1.5733
145 East Riding 7.403646    53.7577 -0.7798
156 Essex   8.894873    51.7340 0.4691
167 Gloucestershire 12.392987   51.8642 -2.2382
177 Guernsey    43.000000   49.4482 -2.5895
188 Hampshire   12.022789   51.0577 -1.3080
199 Herefordshire   12.397727   52.0765 -2.6544
210 Hertfordshire   10.315981   51.7670 -0.2087
221 Huntingdonshire 21.110000   52.3302 -0.1759
232 Isle of Man 36.088235   54.2361 -4.5481
243 Jersey  44.766667   49.2144 -2.1312
254 Kent    12.353401   51.2787 0.5216
265 Lancashire  12.315197   53.7632 -2.7034
287 Lincolnshire    10.234957   53.2344 -0.5384
298 Liverpool   7.028662    53.4084 -2.9916
309 London  5.495880    51.5074 -0.1278
320 Manchester  8.221386    53.4808 -2.2426
331 Middlesex   9.389552    51.5537 -0.3177
342 Norfolk 14.648590   52.6309 1.2974
353 North Riding    10.763959   54.2641 -1.3157
364 Northamptonshire    11.497890   52.2405 -0.9027
375 Northumberland  8.936893    55.1790 -1.8262
386 Nottinghamshire 11.402954   53.1004 -1.0502
397 Oxfordshire 20.697802   51.7519 -1.2578
406 RAF 2.134831    52.3610 -0.7020
408 Royal Navy  0.196319    50.7989 -1.0912
419 Sheffield & Hallamshire 7.512452    53.3811 -1.4701
430 Shropshire  14.255435   52.7070 -2.7549
441 Somerset    12.581616   51.1000 -3.0000
452 Staffordshire   7.322162    52.8054 -2.1177
463 Suffolk 15.604027   52.1872 0.9708
474 Surrey  9.643979    51.3148 -0.5599
485 Sussex  20.224490   50.9097 -0.2079
505 West Riding 8.980066    53.8008 -1.5491
516 Westmorland 8.141026    54.4493 -2.7515
527 Wiltshire   14.117438   51.3524 -1.9930
538 Worcestershire  13.275410   52.1920 -2.2216

I obtained the latitude and longitude for each county from ChatGPT and put them in a dictionary, which I then mapped onto my data. This is the code:

county_coordinates = {
   'Amateur Football Alliance': (51.509865, -0.118092),
   'Army': (51.5074, -0.1278),
   'Bedfordshire': (52.1387, -0.4669),
   'Berks & Bucks': [(51.4664, -0.9731), (51.8099, -1.0201)],  # Berkshire and Buckinghamshire
   'Birmingham': (52.4862, -1.8904),
   'Cambridgeshire': (52.2053, 0.1218),
   'Cheshire': (53.2326, -2.6103),
   'Cornwall': (50.2660, -5.0527),
   'Cumberland': (54.6633, -3.1828),
   'Derbyshire': (53.1224, -1.5564),
   'Devon': (50.7156, -3.5309),
   'Dorset': (50.7488, -2.3445),
   'Durham': (54.7761, -1.5733),
   'East Riding': (53.7577, -0.7798),
   'Essex': (51.7340, 0.4691),
   'Gloucestershire': (51.8642, -2.2382),
   'Guernsey': (49.4482, -2.5895),
   'Hampshire': (51.0577, -1.3080),
   'Herefordshire': (52.0765, -2.6544),
   'Hertfordshire': (51.7670, -0.2087),
   'Huntingdonshire': (52.3302, -0.1759),
   'Isle of Man': (54.2361, -4.5481),
   'Jersey': (49.2144, -2.1312),
   'Kent': (51.2787, 0.5216),
   'Lancashire': (53.7632, -2.7034),
   'Leicestershire:  ': [(52.6369, -1.1398), (52.6712, -0.7558)],  # Leicestershire and Rutland
   'Lincolnshire': (53.2344, -0.5384),
   'Liverpool': (53.4084, -2.9916),
   'London': (51.5074, -0.1278),
   'Manchester': (53.4808, -2.2426),
   'Middlesex': (51.5537, -0.3177),
   'Norfolk': (52.6309, 1.2974),
   'Northamptonshire': (52.2405, -0.9027),
   'North Riding': (54.2641, -1.3157),
   'Northumberland': (55.1790, -1.8262),
   'Nottinghamshire': (53.1004, -1.0502),
   'Oxfordshire': (51.7519, -1.2578),
   'RAF': (52.3610, -0.7020),
   'Royal Navy': (50.7989, -1.0912),
   'Sheffield & Hallamshire': [(53.3811, -1.4701), (53.3640, -1.5189)],  # Sheffield and Hallamshire
   'Shropshire': (52.7070, -2.7549),
   'Somerset': (51.1000, -3.0000),
   'Staffordshire': (52.8054, -2.1177),
   'Suffolk': (52.1872, 0.9708),
   'Surrey': (51.3148, -0.5599),
   'Sussex': (50.9097, -0.2079),
   'The FA': (51.5560, -0.2797),  # Football Association
   'West Riding': (53.8008, -1.5491),
   'Westmorland': (54.4493, -2.7515),
   'Wiltshire': (51.3524, -1.9930),
   'Worcestershire': (52.1920, -2.2216)
}

# Add latitude and longitude columns to the normalized_df
normalized_df['latitude'] = normalized_df['county_name'].map(lambda x: county_coordinates.get(x, (None, None))[0])
normalized_df['longitude'] = normalized_df['county_name'].map(lambda x: county_coordinates.get(x, (None, None))[1])
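
One thing to watch with this mapping: entries such as 'Berks & Bucks' hold a list of two (lat, lon) pairs, so indexing with [0] and [1] returns whole tuples for those counties but plain floats for the rest. A minimal sketch of one way around this, assuming it is acceptable to place a combined county at the average of its points (the helper name to_single_point is hypothetical):

```python
def to_single_point(value):
    """Collapse a (lat, lon) pair, or a list of such pairs, into one (lat, lon) pair."""
    if isinstance(value, list):
        lats = [lat for lat, lon in value]
        lons = [lon for lat, lon in value]
        return (sum(lats) / len(lats), sum(lons) / len(lons))
    return value


# Two entries copied from the dictionary above
sample_coordinates = {
    'Bedfordshire': (52.1387, -0.4669),
    'Berks & Bucks': [(51.4664, -0.9731), (51.8099, -1.0201)],
}

# Every value is now a single (lat, lon) pair
flat_coordinates = {name: to_single_point(v) for name, v in sample_coordinates.items()}
print(flat_coordinates)
```

With every value reduced to a single pair, the two .map(...) lines above give a float (or None) for every row, so no later tuple-unpacking step should be needed.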

Here is the code I have tried:

# Load the shapefile
gb_100km = gpd.read_file('mapfiles/gb_100km.shp')

This is what my gb_100km looks like:

CELLCODE    EOFORIGIN   NOFORIGIN   geometry
0   100kmE27N29 2700000 2900000 POLYGON ((2700000 2900000, 2700000 3000000, 28...
1   100kmE27N30 2700000 3000000 POLYGON ((2700000 3000000, 2700000 3100000, 28...
2   100kmE28N28 2800000 2800000 POLYGON ((2800000 2800000, 2800000 2900000, 29...
3   100kmE28N29 2800000 2900000 POLYGON ((2800000 2900000, 2800000 3000000, 29...
4   100kmE28N30 2800000 3000000 POLYGON ((2800000 3000000, 2800000 3100000, 29...
... ... ... ... ...
139 100kmE38N44 3800000 4400000 POLYGON ((3800000 4400000, 3800000 4500000, 39...
140 100kmE38N45 3800000 4500000 POLYGON ((3800000 4500000, 3800000 4600000, 39...
141 100kmE39N35 3900000 3500000 POLYGON ((3900000 3500000, 3900000 3600000, 40...
142 100kmE39N36 3900000 3600000 POLYGON ((3900000 3600000, 3900000 3700000, 40...
143 100kmE39N37 3900000 3700000 POLYGON ((3900000 3700000, 3900000 3800000, 40...

This is the rest of the code:

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point


# Load the UK shapefile (replace with your actual path)
uk_shapefile = gpd.read_file('mapfiles/gb_100km.shp')

# Drop rows with missing latitude or longitude
normalized_df_with_latlong = normalized_df_with_latlong.dropna(subset=['latitude', 'longitude'])
# Extract latitude and longitude values if they are tuples
normalized_df_with_latlong['latitude'] = [lat for lat, lon in normalized_df_with_latlong['latitude']]
normalized_df_with_latlong['longitude'] = [lon for lat, lon in normalized_df_with_latlong['longitude']]

# Convert latitude and longitude to floats
#normalized_df_with_latlong['latitude'] = normalized_df_with_latlong['latitude'].astype(float)
#normalized_df_with_latlong['longitude'] = normalized_df_with_latlong['longitude'].astype(float)

# Create a GeoDataFrame from your data
counties = gpd.GeoDataFrame(
    normalized_df_with_latlong,
    geometry=gpd.points_from_xy(normalized_df_with_latlong.longitude, normalized_df_with_latlong.latitude),
    crs='EPSG:4326'
)

# Ensure both GeoDataFrames use the same CRS
counties = counties.to_crs(uk_shapefile.crs)

# Create a figure and axes
fig, ax = plt.subplots(1, 1, figsize=(10, 6))

# Plot the UK shapefile
uk_shapefile.plot(ax=ax, color='lightgray', edgecolor='black')

# Plot the normalized counts as a heatmap
counties.plot(
    column='normalized_count', 
    ax=ax, 
    cmap='Reds', 
    legend=True, 
    markersize=50, 
    alpha=0.7
)

# Set title and labels
ax.set_title('Normalized Counts by Case Type in UK')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')

# Adjust layout and show the plot
plt.tight_layout()
plt.show()

I am trying to understand why I cannot get this map plotted and why I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[77], line 18
     16 normalized_df_with_latlong = normalized_df_with_latlong.dropna(subset=['latitude', 'longitude'])
     17 # Extract latitude and longitude values if they are tuples
---> 18 normalized_df_with_latlong['latitude'] = [lat for lat, lon in normalized_df_with_latlong['latitude']]
     19 normalized_df_with_latlong['longitude'] = [lon for lat, lon in normalized_df_with_latlong['longitude']]
     21 # Convert latitude and longitude to floats
     22 #normalized_df_with_latlong['latitude'] = normalized_df_with_latlong['latitude'].astype(float)
     23 #normalized_df_with_latlong['longitude'] = normalized_df_with_latlong['longitude'].astype(float)
     24 
     25 # Create a GeoDataFrame from your data

TypeError: cannot unpack non-iterable float object

This was my latest error. I had other errors before it, but I think I have resolved them.
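
The traceback is consistent with the latitude column already holding plain floats for most rows: iterating over the column yields one value at a time, and a single float cannot be unpacked into lat and lon. A small sketch that reproduces the message and shows a tolerant alternative; the helper first_component is hypothetical and not part of pandas:

```python
def first_component(value):
    """Return value[0] for a tuple or list, otherwise the value itself."""
    if isinstance(value, (tuple, list)):
        return value[0]
    return value


# Mixed column contents, as the [0] indexing shown earlier can produce
latitude_values = [52.1387, (51.4664, -0.9731)]

try:
    # Same pattern as the failing line: each element is unpacked into two names
    [lat for lat, lon in latitude_values]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Tolerant version: floats pass through, tuples contribute their first item
cleaned = [first_component(v) for v in latitude_values]
print(cleaned)
```

If the columns already contain only floats, the two list comprehensions in the original code can simply be dropped.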


Solution

  • The counties would not plot because I was merging on the wrong shapefile: gb_100km.shp holds 100 km grid cells, not county boundaries.

    GADM provides several shapefiles, one per administrative level, so the file has to match the column being merged on, which in my case is the county. For example:

    gadm41_GBR_0.shp
    gadm41_GBR_1.shp
    gadm41_GBR_2.shp
    gadm41_GBR_3.shp
    gadm41_GBR_4.shp
    

    These files can be found here: https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_GBR_shp.zip

    And this is my entire code:

    import pandas as pd
    import geopandas as gpd
    import matplotlib.pyplot as plt
    # Load the UK shapefile (make sure this is the correct path to your UK counties shapefile)
    uk_shapefile_path = "shp_files/gadm41_GBR_3.shp"
    uk = gpd.read_file(uk_shapefile_path)

    # Merge the shapefile with the data on the county name
    # (df is the DataFrame holding county_name and normalized_count).
    # You may need to change 'NAME_2' to whichever shapefile column holds the county names.
    merged = uk.merge(df, left_on='NAME_2', right_on='county_name', how='left')
    # Plot the heatmap of counties with the normalized count as the color scale
    fig, ax = plt.subplots(1, 1, figsize=(10, 10))
    merged.boundary.plot(ax=ax, linewidth=1, color='black')
    merged.plot(column='normalized_count', ax=ax, cmap='YlOrBr', legend=False)
    
    
    ax.set_xticks([])
    ax.set_yticks([])
    plt.title('XXXX by Counties')
    plt.show()
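
Because the merge uses how='left', any shapefile region whose NAME_2 has no exact match in county_name ends up with NaN and is typically left uncoloured, while any county in the data that is absent from the shapefile is dropped without warning. A rough way to list the mismatches before merging, sketched with plain sets (the function name and the example names are illustrative, not taken from the GADM file):

```python
def unmatched_names(data_names, shape_names):
    """Return (names only in the data, names only in the shapefile), each sorted."""
    data_set = {str(n).strip() for n in data_names}
    shape_set = {str(n).strip() for n in shape_names}
    return sorted(data_set - shape_set), sorted(shape_set - data_set)


# Illustrative values only; in practice pass df['county_name'] and uk['NAME_2']
only_in_data, only_in_shape = unmatched_names(
    ['Kent', 'Essex', 'West Riding'],
    ['Kent', 'Essex', 'Norfolk'],
)
print(only_in_data)   # data rows that will never be drawn
print(only_in_shape)  # regions that will have no value
```

Names that appear only in the data (historic counties or organisations such as 'Army' or 'RAF', for instance) would need to be renamed or mapped by hand before they can show up on the map.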