I am trying to identify the appropriate format used in my obtained dataframe, but I am having trouble finding anything that works.
The issue is that the format contains yearly figures for which the month is assumed to be some sort of zero-padded zeroth month. For example, the yearly nominal GDP is reported as 2014-00 instead of the usual 2014-01.
Thus, when I use,
df['end_of_month'] =pandas.to_datetime(df['end_of_month'], format="%Y-%m")
I get that:
ValueError: time data 2014-00 doesn't match format specified
For your consideration, here is the dataframe:
end_of_month nominal_gdp
0 2014-00 2260005.0
1 2015-00 2398280.0
2 2016-00 2490617.0
3 2017-00 2662836.0
4 2018-00 2842883.0
5 2018-09 726352.0
6 2018-10 NaN
7 2018-11 NaN
8 2018-12 754904.0
9 2019-01 NaN
10 2019-02 NaN
11 2019-03 712514.0
12 2019-04 NaN
13 2019-05 NaN
14 2019-06 698044.0
15 2019-07 NaN
16 2019-08 NaN
17 2019-09 722831.0
18 2019-10 NaN
19 2019-11 NaN
For anyone interested or whomever might face a similar issue, the data was obtained from the Hong Kong Monetary Authority, using their Open API initiative. For more info visit HKMA's documentation.
Specifically, this issue arises when using the Economic Statistics dataset, which can be found in the following page of the documentation.
It appears that I have managed to find a solution for the issue. Here is the link where I found how to solve it: Handling multiple datetime formats with pd.to_datetime
Here is the line I used:
df['end_of_month'] = pandas.to_datetime(df['end_of_month'], format='%Y-%m',errors='coerce').fillna(pandas.to_datetime(df['end_of_month'], format='%Y-00',errors='coerce'))
It simply "fills" the coerced rows with a different format. The format I used for the yearly figures whose zero-padded zeroth month has no meaning, is : "%Y-00" since we can diregard the "-00" which has no meaning for the yearly frequency values.