I'm trying to execute the following SDMX query using Python from the OECD's database.
https://stats.oecd.org/restsdmx/sdmx.ashx/GetData/LAND_COVER_FUA/AUS+AUT.FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT/all?startTime=1992&endTime=2019
I've looked around but can't get this to work with the existing packages and tutorials out there (I've looked at this link for example).
Would you be able to help? I can't seem to get the various libraries (pandasdmx
, cif
, etc.) to work with it.
Thanks so much in advance!
Using sdmx1
:
import sdmx
OECD = sdmx.Client("OECD_JSON")
key = dict(
COU="AUS AUT".split(),
FUA=[],
VARIABLE="FOREST GRSL WETL SHRUBL SPARSE_VEGETATION CROPL URBAN BARE WATER".split(),
MEAS="THOUSAND_SQKM PCNT".split(),
)
# Assemble into a string
key_str = ".".join("+".join(values) for values in key.values())
print(f"{key_str = }")
# Commented: these keys are invalid
# key_str = "AUS+AUT.FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT"
# key_str = "AUS+AUT.FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT."
# Retrieve a data message
dm = OECD.data(
"LAND_COVER_FUA",
key=key_str,
params=dict(startPeriod=1992, endPeriod=2019),
)
# Retrieve the first data set in the message
ds = dm.data[0]
# Convert to pandas
print(sdmx.to_pandas(ds))
This gives output like:
$ python q.py
key_str = 'AUS+AUT..FOREST+GRSL+WETL+SHRUBL+SPARSE_VEGETATION+CROPL+URBAN+BARE+WATER.THOUSAND_SQKM+PCNT'
COU FUA VARIABLE MEAS TIME_PERIOD
AUS AUS01 BARE THOUSAND_SQKM 1992 0.000618
2004 0.000618
2015 0.000618
2018 0.000541
2019 0.000541
...
AUT AT005L4 WATER THOUSAND_SQKM 1992 0.001314
2004 0.001314
2015 0.001314
2018 0.001314
2019 0.002319
Name: value, Length: 2160, dtype: float64
A few key points to understand:
The URL you give (with /GetData/
) indicates an SDMX 2.0 API. These are very old and few tools support them.
OECD provides both an SDMX-ML 2.1 and an SDMX-JSON API.
LAND_COVER_FUA
) does not appear to be available from the "OECD" (SDMX-ML) source, only from the "OECD_JSON" one.Parameters like "startTime=1992" are outdated/incorrect; the current form for SDMX 2.1 and later is "startPeriod=1992".
Your key only has 3 parts, but this data flow has 4 dimensions. If you uncomment the first line above that sets key_str
explicitly, the web service responds:
Semantic Error - Wrong number of non-time dimensions provided. Expected 4. Query contains 3
If we randomly guess and put an extra period at the end (second commented key_str
), we get the error:
Semantic Error - Dimension 'FUA' does not contain code(s) 'FOREST,GRSL,WETL,SHRUBL,SPARSE_VEGETATION,CROPL,URBAN,BARE,WATER'
This indicates that (a) the missing dimension is the second one and (b) it has an id "FUA". So we insert this in key
. Notice that the resulting, valid key string contains +AUT..FOREST+
. These two consecutive periods (..
) indicate "No specific labels/return all labels for this dimension."