Search code examples
pythonsdmx

Download OECD API data using Python and SDMX


I'm trying to execute the following SDMX query using Python from the OECD's database.

https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/LAND_COVER_FUA/AUS+AUT.FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT/all?startTime=1992&endTime=2019

I've looked around but can't get this to work with the existing packages and tutorials out there (I've looked at this link for example).

Would you be able to help? I can't seem to get the various libraries (pandasdmx, cif, etc.) to work with it.

Thanks so much in advance!


Solution

  • Using sdmx1:

    import sdmx
    
    OECD = sdmx.Client("OECD_JSON")
    key = dict(
        COU="AUS AUT".split(),
        FUA=[],
        VARIABLE="FOREST GRSL WETL SHRUBL SPARSE_VEGETATION CROPL URBAN BARE WATER".split(),
        MEAS="THOUSAND_SQKM PCNT".split(),
    )
    
    # Assemble into a string
    key_str = ".".join("+".join(values) for values in key.values())
    print(f"{key_str = }")
    
    # Commented: these keys are invalid
    # key_str = "AUS+AUT.FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT"
    # key_str = "AUS+AUT.FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT."
    
    # Retrieve a data message
    dm = OECD.data(
        "LAND_COVER_FUA",
        key=key_str,
        params=dict(startPeriod=1992, endPeriod=2019),
    )
    
    # Retrieve the first data set in the message
    ds = dm.data[0]
    
    # Convert to pandas
    print(sdmx.to_pandas(ds))
    

    This gives output like:

    $ python q.py 
    key_str = 'AUS+AUT..FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT'
    
    COU  FUA      VARIABLE  MEAS           TIME_PERIOD
    AUS  AUS01    BARE      THOUSAND_SQKM  1992           0.000618
                                           2004           0.000618
                                           2015           0.000618
                                           2018           0.000541
                                           2019           0.000541
                                                            ...   
    AUT  AT005L4  WATER     THOUSAND_SQKM  1992           0.001314
                                           2004           0.001314
                                           2015           0.001314
                                           2018           0.001314
                                           2019           0.002319
    Name: value, Length: 2160, dtype: float64
    

    A few key points to understand:

    • The URL you give (with /GetData/) indicates an SDMX 2.0 API. These are very old and few tools support them.

    • OECD provides both an SDMX-ML 2.1 and an SDMX-JSON API.

      • The package documentation above includes some notes on these.
      • This particular data flow (LAND_COVER_FUA) does not appear to be available from the "OECD" (SDMX-ML) source, only from the "OECD_JSON" one.
      • The package can only automatically construct a key from a dict when SDMX-ML is available. Since we have to use the JSON endpoint, we manually construct the key.
    • Parameters like "startTime=1992" are outdated/incorrect; the current form for SDMX 2.1 and later is "startPeriod=1992".

    • Your key only has 3 parts, but this data flow has 4 dimensions. If you uncomment the first line above that sets key_str explicitly, the web service responds:

      Semantic Error - Wrong number of non-time dimensions provided. Expected 4. Query contains 3

      If we randomly guess and put an extra period at the end (second commented key_str), we get the error:

      Semantic Error - Dimension 'FUA' does not contain code(s) 'FOREST,GRSL,WETL,SHRUBL,SPARSE_VEGETATION,CROPL,URBAN,BARE,WATER'

      This indicates that (a) the missing dimension is the second one and (b) it has an id "FUA". So we insert this in key. Notice that the resulting, valid key string contains +AUT..FOREST+. These two consecutive periods (..) indicate "No specific labels/return all labels for this dimension."