Search code examples
python-3.xgisprojectionnetcdfcartopy

Globcolour data and projection error in Python


I'm having trouble displaying some data from Globcolour (1), due to the projection used with the matplotlib and cartopy definition of the image.

I downloaded a Total Suspended Matter image in NetCDF format (here is the data enter link description here), and when I tried to display it, along with a coastline from the cartopy package, there is a notorious gap between the coastline and the data. As you can see below, the pixels should be next to the coastline (black line), and not surpassed into the land (yellow pixels in the flags image) enter image description here

This shouldn't happen. I check using QGIS and loading directly the netcdf file the coastline is set correctly.

Initially I used a PlateeCarrer projection for the image, considering that if the image was in WGS84 they would match, but clearly they don't. I've tried using the transform option in the matplotlib function but I haven't made it work. Either the gap remains, or the coordinates of the figure change to projected ones and my data (which is in geographical coordinates) disappear.

The attributes of the NetCDF file are:

  'grid_type': 'Equirectangular',
 'spatial_resolution': 4.6383123,
 'nb_equ_bins': 55,
 'registration': 5,
 'lat_step': 0.041666668,
 'lon_step': 0.041666668,
 'earth_radius': 6378.137,
 'max_north_grid': 11.124998,
 'max_south_grid': 9.27,
 'max_west_grid': -86.25,
 'max_east_grid': -83.97,
 'northernmost_latitude': 11.124998,
 'southernmost_latitude': 9.249998,
 'westernmost_longitude': -86.25,
 'easternmost_longitude': -84.0,
 'nb_grid_bins': 2475,
 'nb_bins': 2475,
 'pct_bins': 100.0,
 'nb_valid_bins': 1089,
 'pct_valid_bins': 44.0,
 'netcdf_version': '4.3.3.1 of Jul  8 2016 18:15:50 $',
 'DPM_reference': 'GC-UD-ACRI-PUG',
 'IODD_reference': 'GC-UD-ACRI-PUG'}

The code that I'm using to plot the image is:

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import cartopy.crs as ccrs
import dill as pickel



def paint_maps(df_std=None, fecha=1, attributes=None,
               savefol='/media/felipe/TOSHIBA EXT/iMARES/Investigacion/2019_MariculturaPacifico/DB/figures/',
               disp_fig=0):

    """Función para dibujar los datos contenidos en los archivos netCDF de SST, Salinidad y propiedad ópticas del agua.
    Recibe el dataframe con la información en formato de Pandas Dataframe, y selecciona según una fecha establecida,
    el conjunto de datos con coordenadas Lat-Lon que debe dibujar. Esos los dibuja y transforma a formato raster. Unido
    se dibuja también la línea de costa proveniente de un archivo shapefile. La función dibuja toda la información
    contenida en el dataframe aportado (datos, anomalías, flags, y cualquier otro dato que tenga.

    Recibe:
        df_std: dataframe con la información a dibujar. Debe venir indexado por fecha, lat y lon.

        fecha: día que se elige dibujar. Formato string 'yyyymmdd'. Valor 1 significa que grafica el valor promedio de todas las fechas en cada
            píxel. Promedio simple ignorando NaN's

        attributes: diccionario con los atributos del netcdf de donde se obtiene nombre de variable y unidades. Creado
        con open_netcdf.py

        savefol: carpeta donde se guardan las imágenes dibujadas

        disp_fig: booleano para imprimir figura en pantalla.


    Devuelve:
            Nada. Solo crea y guarda figuras"""

    # Identifica la fecha solicitada (cuando se ha especificado) y confirma que sea parte del registro. Extrae la
    # información del Dataframe en la fecha que se solicitó, o calcula el promedio de todas las fechas para graficar
    # el valor promedio.
    if fecha != 1:

        if isinstance(fecha, str):
            fecha = pd.to_datetime(fecha + '120000')
        else:
            print('La fecha indicada no está en formato String. Reinicie la ejecución.')

        try:
            idx = pd.IndexSlice
            df_map = df_std.loc[idx[:, :, fecha], :]
        except:
            print('Se generó un error. Posiblemente fecha no está dentro del registro. La fecha debe estar entre el ' + df_std.index[0][-1].strftime('%d/%m/%Y') + ' y el ' + df_std.index[-1][-1].strftime('%d/%m/%Y'))
            raise
    else:
        df_map = df_std.groupby(['lat', 'lon']).mean()

    # Reestructura la información para tenerla en forma de matriz y dibujarla de forma más simple. Extrae los valores y
    # las latitudes y longitudes correspondientes, así como los valores de la variable y sus flags.
    df_map2 = df_map.unstack(level=0)

    vari = df_map2['mean_val'].values

    flags = df_map2['flag_val'].values

    lat = df_map2['mean_val'].columns.get_level_values('lat')
    lon = df_map2['mean_val'].index.get_level_values('lon')

    # Extrae de los atributos del netcdf el nombre de la variable a graficar y las unidades
    variable_str = attributes['variable']['long_name']

    variable_units = attributes['variable']['units']

    # Dibuja el mapa que se haya seleccionado según fecha (valor promedio del valor o fecha específica)
    fig, ax = plt.subplots(1, 2, figsize=(10, 10), subplot_kw={'projection': ccrs.PlateCarree()})

    extend = [lon[1], lon[-1], lat[1], lat[-1]]

    # Primera figura. Variable a graficar. Usa línea de costa del cartopy y coloca una leyenda abajo
    ax[0].set_extent(extend)
    ax[0].coastlines(resolution='10m')


    #cs = ax[0].pcolormesh(lon, lat, vari.T)

    cs = ax[0].pcolormesh(lon, lat, vari.T, transform=ccrs.PlateCarree())
    ax[0].set_title(variable_str)
    cax, kw = matplotlib.colorbar.make_axes(ax[0], location='bottom', pad=0.05, shrink=0.7)
    out = fig.colorbar(cs, cax=cax, extend='both', **kw)
    out.set_label('Units: '+variable_units, size=10)

    # Segunda figura. Flags de la figura. Usa la leyenda directamente de los datos usados.
    ax[1].set_extent(extend)
    ax[1].coastlines(resolution='10m')
    cs2 = ax[1].pcolormesh(lon, lat, flags.T)
    ax[1].set_title('Flags')
    cax, kw = matplotlib.colorbar.make_axes(ax[1], location='bottom', pad=0.05, shrink=0.7)
    out = fig.colorbar(cs2, cax=cax, extend='both', **kw)
    out.set_label('Flags', size=10)

    # Salva la figura
    plt.savefig(savefol+variable_str+'.jpg', bbox_inches='tight')

    with open(savefol+'fig_'+variable_str+'.pickel', 'wb') as f:
        pickel.dump(fig, f)


    # Imprime figura si se elige opción con disp_fig
    if disp_fig == 1:
        plt.show()

    return

It receives the data as a Pandas dataframe. The NetCDF was opened using xarray.open_dataset and then transforming it to Pandas with to_dataframe()

I'm using Python 3.7 in Ubuntu.

Last thing. When loading the cartopy.crs package, this error occurs:

ERROR 1: PROJ: proj_create_from_database: Open of /home/felipe/anaconda3/envs/personal/share/proj failed

Could it be affecting?


Solution

  • we answered to Felipe by email, I copy/paste here:

    A small Python script to create a map on your area from a TSM GlobColour Product (I used a monthly product to have a good coverage):

        import netCDF4 as nc
        import numpy as np
        import matplotlib.pyplot as plt
        import cartopy.crs as ccrs
    
    
        fig, ax = plt.subplots(figsize=(5, 5), subplot_kw=dict(projection=ccrs.PlateCarree()))
    
        # my region of interest
        ax.set_extent([-86, -84, 9, 11])
    
        ax.coastlines(resolution='10m', color='red')
    
        nc_dst = nc.Dataset('L3m_20100101-20100131__GLOB_4_AV-MER_TSM_MO_00.nc')
        # extent of the product
        data_extent = [nc_dst.max_west_grid, nc_dst.max_east_grid,
                       nc_dst.max_south_grid, nc_dst.max_north_grid]
        data = nc_dst.variables['TSM_mean'][:]
        flags = nc_dst.variables['TSM_flags'][:]
        land = flags & 8 # LAND == 3rd bit == 2^3 == 8
        data_noland = np.ma.masked_where(land, data)
    
        ax.imshow(data_noland, origin='upper', extent=data_extent)
        plt.savefig('TSM_noland.png')
    
        ax.imshow(data, origin='upper', extent=data_extent)
        plt.savefig('TSM.png')
    

    I think you are facing to 2 problems:

    1) Our products may overlap some land areas because of the Level-3 rebinning during the GlobColour processing: if a 4km pixel has only the corner on the water we will fill the full pixel. We keep them because they may be usefull for some needs (for example areas where the land/water limit is varying), but in the quality flags we provide a LAND mask which could be used to remove these pixels. You can also use your own LAND mask if you prefer. The Python example below shows how to use the LAND mask.

    2) I suspect that your Python code introduces an east/south shift of at least half a pixel maybe because the lat/lon arrays are for the center of each pixel but the extent needed by cartopy is the exterior limit.

    GlobColour flags are defined in the Product User Guide http://www.globcolour.info/CDR_Docs/GlobCOLOUR_PUG.pdf page 76.

    The GlobColour Team