
Return only the last day of the year with pandas?


I made an API GET request to the financialmodelingprep API for the historical close prices of a specified company's stock. It returns every recorded date. The problem is that I need only the last date of each of the last 5 years, in order to compare it with the financial statements. Does anyone know how to filter the dataset to get the last date of each year without specifying the exact date? The goal is to export the table to CSV and then combine it with the data for other companies.

Is there a better way to get the result I need?

import pandas as pd

symbols = ["MMM", "ABT", "ABBV", "ABMD", "ACN"]

API_KEY = 'my_key'
api_stock_price_url = ("https://financialmodelingprep.com/api/v3/historical-price-full/"
                       + symbols[0] + "?serietype=line&apikey=" + API_KEY)
company_stock_price = pd.read_json(api_stock_price_url)

# flattens the list of {"date": ..., "close": ...} records into columns
date_and_close = pd.json_normalize(company_stock_price["historical"])
company_stock_price["date"] = pd.to_datetime(date_and_close["date"])  # converts to datetime
company_stock_price["close"] = date_and_close["close"]

# uses the parsed dates as the index and deletes the unwanted column
company_stock_price = company_stock_price.set_index("date").drop(columns="historical")

The returned company_stock_price table:

               symbol       close
    date
    2020-12-04    MMM  172.460007
    2020-12-03    MMM  171.830002
    2020-12-02    MMM  171.850006
    2020-12-01    MMM  170.520004
    2020-11-30    MMM  172.729996
    ...           ...         ...
    1970-09-14    MMM    0.322600
    1970-09-11    MMM    0.321700
    1970-09-10    MMM    0.323500
    1970-09-09    MMM    0.324000
    1970-09-08    MMM    0.318800
    12675 rows × 2 columns
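Since the plan is to loop over many tickers, it may help to wrap the parsing step in a function. This is a minimal sketch, assuming the JSON payload has the shape the code above implies (a "symbol" string plus a "historical" list of {"date", "close"} records). The helper name payload_to_frame is hypothetical, and the payload is built inline from rows of the table above so the sketch runs offline.

```python
import pandas as pd


def payload_to_frame(payload: dict) -> pd.DataFrame:
    """Turn a historical-price payload into a date-indexed DataFrame (hypothetical helper)."""
    # Each record in "historical" is assumed to look like {"date": "YYYY-MM-DD", "close": float}
    prices = pd.DataFrame(payload["historical"], columns=["date", "close"])
    prices["date"] = pd.to_datetime(prices["date"])
    prices["symbol"] = payload["symbol"]
    # Index by real datetimes, oldest first, so date-based grouping behaves predictably
    return prices.set_index("date").sort_index()[["symbol", "close"]]


# Offline stand-in for the API response (values copied from the table above)
sample = {
    "symbol": "MMM",
    "historical": [
        {"date": "2020-12-04", "close": 172.460007},
        {"date": "2020-12-03", "close": 171.830002},
        {"date": "1970-09-08", "close": 0.318800},
    ],
}
frame = payload_to_frame(sample)
print(frame)
```

With a live key, the payload would come from something like requests.get(url, timeout=30).json() inside the loop over symbols.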

The desired output would look something like this:

               symbol       close
    date
    2020-12-31    MMM  172.460007
    2019-12-31    MMM  131.112123
    2018-12-31    MMM  123.123123
    2017-12-31    MMM  111.111111
    2016-12-31    MMM  101.111111

The problem is that I cannot specify the exact date, because for some of the S&P 500 companies (which I intend to loop over) the API response is missing the stock price for that date.
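One way around the missing dates is not to look for December 31 at all, but to take the last available row of each calendar year. This is a minimal sketch, assuming a DataFrame indexed by a DatetimeIndex as in the question; the rows are copied from the table above. It sorts the index, groups by index.year, and keeps the final row of each group with tail(1), so the actual last trading date is preserved.

```python
import pandas as pd

# Rows copied from the question's table; the index is a DatetimeIndex
prices = pd.DataFrame(
    {
        "symbol": "MMM",
        "close": [172.460007, 171.830002, 172.729996, 0.322600, 0.318800],
    },
    index=pd.to_datetime(
        ["2020-12-04", "2020-12-03", "2020-11-30", "1970-09-14", "1970-09-08"]
    ),
)
prices.index.name = "date"

# Sort oldest to newest, then keep the last row within each calendar year
ordered = prices.sort_index()
year_end = ordered.groupby(ordered.index.year).tail(1)

# Keep only the five most recent years, newest first
last_five = year_end.tail(5).iloc[::-1]
print(last_five)
```

An annual resample(...).last() would also work, but it labels each row with the period end rather than the real trading date, and it takes the last non-null value per column, so the groupby approach seems closer to what is needed for matching against financial statements.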


Solution

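  • df['date'] = pd.to_datetime(df['date'])
    df2 = df.loc[df.groupby(df['date'].dt.year)['date'].idxmax()].reset_index(drop=True)
    
            date symbol       close
    0 1970-09-14    MMM    0.322600
    1 2020-12-04    MMM  172.460007
    

    This assumes date is a regular column; if it is the index, as in the question, call df = df.reset_index() first. Here the DataFrame is grouped by the year of the date column, and idxmax returns the row label of the latest date in each year, so df.loc keeps those complete rows (symbol and close included). Then you can sort by date and take the last five rows:

    df2.sort_values('date').iloc[-5:]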
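To cover the original goal of combining several companies into one CSV, the per-company results can be stacked with pd.concat. This is a minimal sketch, assuming each company's prices are in a frame with date, symbol and close columns like df in the answer. The helper last_close_per_year and the frames dictionary are hypothetical, and only the MMM rows from the question are used rather than fetched data.

```python
import pandas as pd


def last_close_per_year(df: pd.DataFrame, years: int = 5) -> pd.DataFrame:
    """Return the latest row of each calendar year for the most recent `years` years."""
    df = df.assign(date=pd.to_datetime(df["date"]))
    # idxmax gives the row label of the latest date within each year
    latest = df.loc[df.groupby(df["date"].dt.year)["date"].idxmax()]
    return latest.sort_values("date").tail(years)


# Hypothetical input: one frame per ticker, here built from the question's sample rows
frames = {
    "MMM": pd.DataFrame(
        {
            "date": ["2020-12-04", "2020-11-30", "1970-09-14", "1970-09-08"],
            "symbol": "MMM",
            "close": [172.460007, 172.729996, 0.322600, 0.318800],
        }
    ),
}

combined = pd.concat(
    [last_close_per_year(frame) for frame in frames.values()], ignore_index=True
)
# to_csv with no path returns the CSV text; pass a filename to write a file instead
csv_text = combined.to_csv(index=False)
print(csv_text)
```

Because the symbol column travels with each row, the combined table stays unambiguous when more tickers are added to frames.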