gold = pd.read_csv('gold.csv')
gold['Date'] = gold['Date'].apply(pd.to_datetime, utc=True)
gold
Date Price
0 1950-01-01 00:00:00+00:00 34.730
1 1950-02-01 00:00:00+00:00 34.730
2 1950-03-01 00:00:00+00:00 34.730
3 1950-04-01 00:00:00+00:00 34.730
4 1950-05-01 00:00:00+00:00 34.730
... ... ...
842 2020-03-01 00:00:00+00:00 1593.764
843 2020-04-01 00:00:00+00:00 1680.030
844 2020-05-01 00:00:00+00:00 1715.697
845 2020-06-01 00:00:00+00:00 1734.032
846 2020-07-01 00:00:00+00:00 1840.807
gold.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 847 entries, 0 to 846
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 847 non-null datetime64[ns, UTC]
1 Price 847 non-null float64
dtypes: datetime64[ns, UTC](1), float64(1)
memory usage: 13.4 KB
gold = gold.set_index('Date')
gold.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 847 entries, 1950-01-01 00:00:00+00:00 to 2020-07-01 00:00:00+00:00
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Price 847 non-null float64
dtypes: float64(1)
memory usage: 13.2 KB
gold
Date Price
1950-01-01 00:00:00+00:00 34.730
1950-02-01 00:00:00+00:00 34.730
1950-03-01 00:00:00+00:00 34.730
1950-04-01 00:00:00+00:00 34.730
1950-05-01 00:00:00+00:00 34.730
... ...
2020-03-01 00:00:00+00:00 1593.764
2020-04-01 00:00:00+00:00 1680.030
2020-05-01 00:00:00+00:00 1715.697
2020-06-01 00:00:00+00:00 1734.032
2020-07-01 00:00:00+00:00 1840.807
years = pd.date_range('01/01/1950', '01/01/2020', freq='YS')
years
DatetimeIndex(['1950-01-01', '1951-01-01', '1952-01-01', '1953-01-01',
'1954-01-01', '1955-01-01', '1956-01-01', '1957-01-01',
'1958-01-01', '1959-01-01', '1960-01-01', '1961-01-01',
'1962-01-01', '1963-01-01', '1964-01-01', '1965-01-01',
'1966-01-01', '1967-01-01', '1968-01-01', '1969-01-01',
'1970-01-01', '1971-01-01', '1972-01-01', '1973-01-01',
'1974-01-01', '1975-01-01', '1976-01-01', '1977-01-01',
'1978-01-01', '1979-01-01', '1980-01-01', '1981-01-01',
'1982-01-01', '1983-01-01', '1984-01-01', '1985-01-01',
'1986-01-01', '1987-01-01', '1988-01-01', '1989-01-01',
'1990-01-01', '1991-01-01', '1992-01-01', '1993-01-01',
'1994-01-01', '1995-01-01', '1996-01-01', '1997-01-01',
'1998-01-01', '1999-01-01', '2000-01-01', '2001-01-01',
'2002-01-01', '2003-01-01', '2004-01-01', '2005-01-01',
'2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01',
'2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
'2014-01-01', '2015-01-01', '2016-01-01', '2017-01-01',
'2018-01-01', '2019-01-01', '2020-01-01'],
dtype='datetime64[ns]', freq='AS-JAN')
reindexed = gold.reindex(years)
reindexed
Price
1950-01-01 NaN
1951-01-01 NaN
1952-01-01 NaN
1953-01-01 NaN
1954-01-01 NaN
... ...
2016-01-01 NaN
2017-01-01 NaN
2018-01-01 NaN
2019-01-01 NaN
2020-01-01 NaN
When I plot reindexed there are no values for the price after reindexing. I am trying to get gold reindexed by the first day of each year from 1950 to 2020, and then plot them on a pandas generated line graph. I have looked at various posts from Stack Overflow with similar issues, but I am still not sure what I am doing wrong.
You need to use an UTC localized index, convert with tz_localize
:
reindexed = gold.reindex(years.tz_localize('utc'))
Output:
Price
1950-01-01 00:00:00+00:00 34.73
1951-01-01 00:00:00+00:00 NaN # NaN as the data is
1952-01-01 00:00:00+00:00 NaN # not in your example
1953-01-01 00:00:00+00:00 NaN
1954-01-01 00:00:00+00:00 NaN
... ...
2016-01-01 00:00:00+00:00 NaN
2017-01-01 00:00:00+00:00 NaN
2018-01-01 00:00:00+00:00 NaN
2019-01-01 00:00:00+00:00 NaN
2020-01-01 00:00:00+00:00 NaN
[71 rows x 1 columns]
You might also try to resample
getting the first
value per year:
resampled = gold.resample('YS').first()
Output:
Price
Date
1950-01-01 00:00:00+00:00 34.730
1951-01-01 00:00:00+00:00 NaN
1952-01-01 00:00:00+00:00 NaN
1953-01-01 00:00:00+00:00 NaN
1954-01-01 00:00:00+00:00 NaN
... ...
2016-01-01 00:00:00+00:00 NaN
2017-01-01 00:00:00+00:00 NaN
2018-01-01 00:00:00+00:00 NaN
2019-01-01 00:00:00+00:00 NaN
2020-01-01 00:00:00+00:00 1593.764
[71 rows x 1 columns]