python pandas geography cartopy

Plotting pandas csv data onto cartopy map


I'm new to Python and coding generally, so please forgive any obvious mistakes. I'm having some trouble plotting urban population data (the number of people living in urban areas per country, in thousands) onto a cartopy map. The population data is in a CSV file indexed by country, latitude, longitude, and year, with years from 1950 to 2050. I've used def to write a function so I can enter a year and get a plot of that year's population by country. I want the markers to be proportional in size to the population of each country. However, the marker size plotted seems to be proportional to the position of the country in the list, so that countries near the top of the list (which is arranged alphabetically) have smaller markers; e.g. Brazil has a small marker despite having a large urban population. Any help would be greatly appreciated. Here's the code:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import cartopy.crs as ccrs
import cartopy.feature as cfeature

country_urban_pop = pd.read_csv('/Users/myusername/Desktop/urbanisation_data.csv')

def urban_pop_plot(year):
    lat, lon = country_urban_pop['latitude'], country_urban_pop['longitude']
    population = country_urban_pop[year]
    fig = plt.figure(figsize=(20, 16))
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.add_feature(cfeature.LAND)
    ax.add_feature(cfeature.OCEAN)
    ax.add_feature(cfeature.COASTLINE)
    ax.add_feature(cfeature.BORDERS, linestyle=':')
    ax.add_feature(cfeature.LAKES, alpha=0.5)
    ax.add_feature(cfeature.RIVERS)
    ax.coastlines()
    ax.set_global()
    ax.gridlines()
    ax.stock_img()
    plt.scatter(lon, lat, transform=ccrs.PlateCarree(), \
        label=None, c=population, cmap='Oranges', linewidth=0, alpha=0.5)
    plt.axis(aspect='equal')
    plt.xlabel('longitude')
    plt.ylabel('latitude')
    plt.colorbar(label='population')
    plt.clim(0, 10)

urban_pop_plot('1950') 

Solution

  • matplotlib.pyplot.scatter accepts a parameter "s", a scalar or an array giving the marker size in points squared, i.e. the marker area (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.scatter.html). The code in the question never passes "s", so the population is only mapped to color through "c".

    Assuming that "population" contains the annual population, cast it to a numpy array with population_array = np.array(population) and normalize it so that the resulting values make sense as marker sizes. A good starting point is to rescale it to values between 0 and 1 and then multiply by a suitable scalar. Guidance on normalizing data: https://stackoverflow.com/a/41532180/8766814.
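
    A minimal sketch of that rescaling step, in case it helps. The helper name marker_sizes, the min_size and max_size defaults, and the example numbers are all made up for illustration, not taken from the question's data; the only matplotlib detail relied on is that "s" is interpreted as an area in points squared.

```python
import numpy as np


def marker_sizes(population, min_size=10.0, max_size=1000.0):
    """Min-max scale population values to marker areas (points**2).

    Hypothetical helper: the name and the default size range are
    assumptions chosen for illustration, not part of matplotlib.
    """
    values = np.asarray(population, dtype=float)
    lo, hi = np.nanmin(values), np.nanmax(values)
    if hi == lo:
        # All values identical: avoid dividing by zero, use a mid-sized marker.
        return np.full(values.shape, (min_size + max_size) / 2.0)
    scaled = (values - lo) / (hi - lo)  # rescaled to the range [0, 1]
    return min_size + scaled * (max_size - min_size)


# Made-up populations (in thousands), smallest to largest.
example_population = [500, 20000, 36000]
sizes = marker_sizes(example_population)

# Inside the question's urban_pop_plot, the result would be passed as "s":
#   plt.scatter(lon, lat, s=marker_sizes(population), c=population,
#               cmap='Oranges', alpha=0.5, transform=ccrs.PlateCarree())
```

    Keeping a non-zero min_size means the smallest country still gets a visible marker. Note that np.asarray(..., dtype=float) raises a ValueError if the column was read from the CSV as text rather than numbers, which is worth ruling out before tuning the sizes.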