I am using a public dataset (Coca-cola stock prices) from Kaggle: https://www.kaggle.com/datasets/kalilurrahman/coca-cola-stock-live-and-updated/data
I noticed that the dataset's frequency is daily, but some days are missing. In my copy (the dataset is updated frequently), there were 15311 entries, and when I resampled the dataframe daily (with the date as the index), there were 6900+ missing values, which also implies 6900+ missing days.
This is how I resampled the data to have a daily frequency:
import pandas as pd
data = pd.read_csv(filepath)
data_preprocessed = data.copy()
# Keep only the date part (drop the time component) and parse it
data_preprocessed['Date'] = pd.to_datetime(data['Date'].str.split(' ').str[0])
data_preprocessed.set_index('Date', inplace=True)
data_monthly = data_preprocessed.resample('D').last()  # resample to daily frequency
print(data_monthly.isnull().sum())  # Display how many missing days there are
I am conflicted on how I should approach these NaN values. Should I use ffill(), bfill(), interpolate(), or dropna() them all completely?
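For context, here is a toy series showing what each of those options would produce (a sketch with made-up values, not the real data):

```python
import pandas as pd

# Hypothetical daily series with a two-day gap
s = pd.Series([1.0, None, None, 4.0],
              index=pd.date_range('2022-01-01', periods=4, freq='D'))

print(s.ffill().tolist())        # forward fill: [1.0, 1.0, 1.0, 4.0]
print(s.bfill().tolist())        # backward fill: [1.0, 4.0, 4.0, 4.0]
print(s.interpolate().tolist())  # linear interpolation: [1.0, 2.0, 3.0, 4.0]
print(s.dropna().tolist())       # drop missing rows: [1.0, 4.0]
```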
Any responses will be highly appreciated. Thank you very much.
Yes, so the dataset has 15311 rows, and once resampled it has 22213 rows (22213 − 15311 = 6902 missing entries, matching the NaN count).
data_monthly[data_monthly['Open'].isna()]
output
Open High Low Close Volume Dividends Stock Splits
Date
1962-01-06 NaN NaN NaN NaN NaN NaN NaN
1962-01-07 NaN NaN NaN NaN NaN NaN NaN
1962-01-13 NaN NaN NaN NaN NaN NaN NaN
1962-01-14 NaN NaN NaN NaN NaN NaN NaN
1962-01-20 NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ...
2022-10-09 NaN NaN NaN NaN NaN NaN NaN
2022-10-15 NaN NaN NaN NaN NaN NaN NaN
2022-10-16 NaN NaN NaN NaN NaN NaN NaN
2022-10-22 NaN NaN NaN NaN NaN NaN NaN
2022-10-23 NaN NaN NaN NaN NaN NaN NaN
6902 rows × 7 columns
If you then look at how many days are missing from each year, it is about the same number every year:
data_monthly[data_monthly['Open'].isna()].index.year.value_counts().plot.bar(figsize=(10,6))
What it looks like is weekends and holidays: the stock does not trade on those days. So maybe do not resample at all.
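A quick way to confirm this pattern on a toy business-day series (hypothetical dates, not the Kaggle data): resampling trading days to a daily frequency introduces NaNs only on Saturdays and Sundays.

```python
import pandas as pd

# Two trading weeks of business days (Mon-Fri only)
idx = pd.bdate_range('2022-10-03', '2022-10-14')
prices = pd.Series(range(len(idx)), index=idx, dtype=float)

# Resampling to 'D' inserts the weekend dates as NaN rows
daily = prices.resample('D').last()
missing = daily[daily.isna()].index

print(sorted(missing.dayofweek.unique()))  # [5, 6] -> Saturday and Sunday
```

If the gaps really are just non-trading days, dropping them (or never resampling to 'D' in the first place) is usually the sensible choice for stock data.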