Search code examples
pythonpandasdataframepython-datetimeeconomics

pandas.to_datetime issue identifying correct format when using HKMA's Open API


I am trying to identify the appropriate format used in my obtained dataframe, but I am having trouble finding anything that works.

The issue is that the format contains yearly figures for which the month is assumed to be some sort of zero-padded zeroth month. For example, the yearly nominal GDP is reported as 2014-00 instead of the usual 2014-01.

Thus, when I use,

df['end_of_month'] =pandas.to_datetime(df['end_of_month'], format="%Y-%m")

I get that:

ValueError: time data 2014-00 doesn't match format specified

For your consideration, here is the dataframe:

end_of_month  nominal_gdp
0       2014-00    2260005.0
1       2015-00    2398280.0
2       2016-00    2490617.0
3       2017-00    2662836.0
4       2018-00    2842883.0
5       2018-09     726352.0
6       2018-10          NaN
7       2018-11          NaN
8       2018-12     754904.0
9       2019-01          NaN
10      2019-02          NaN
11      2019-03     712514.0
12      2019-04          NaN
13      2019-05          NaN
14      2019-06     698044.0
15      2019-07          NaN
16      2019-08          NaN
17      2019-09     722831.0
18      2019-10          NaN
19      2019-11          NaN

For anyone interested or whomever might face a similar issue, the data was obtained from the Hong Kong Monetary Authority, using their Open API initiative. For more info visit HKMA's documentation.

Specifically, this issue arises when using the Economic Statistics dataset, which can be found in the following page of the documentation.


Solution

  • It appears that I have managed to find a solution for the issue. Here is the link where I found how to solve it: Handling multiple datetime formats with pd.to_datetime

    Here is the line I used:

    df['end_of_month'] = pandas.to_datetime(df['end_of_month'], format='%Y-%m',errors='coerce').fillna(pandas.to_datetime(df['end_of_month'], format='%Y-00',errors='coerce'))
    

    It simply "fills" the coerced rows with a different format. The format I used for the yearly figures whose zero-padded zeroth month has no meaning, is : "%Y-00" since we can diregard the "-00" which has no meaning for the yearly frequency values.