Goal: From an excel file, I want to get all the records which have dates that fall within a range and write them to a new excel file. The infile I'm working with has 500K+ rows and 21 columns.
What I've tried:
I've read the infile to a Pandas dataframe then returned the DatetimeIndex
. If I print the range
variable I get the desired records.
import pandas as pd
in_excel_file = r'path\to\infile.xlsx'
out_excel_file = r'path\to\outfile.xlsx'
df = pd.read_excel(in_excel_file)
range = (pd.date_range(start='1910-1-1', end='2021-1-1'))
print(range)
##prints
DatetimeIndex(['1990-01-01', '1990-01-02', '1990-01-03', '1990-01-04',
'1990-01-05', '1990-01-06', '1990-01-07', '1990-01-08',
'1990-01-09', '1990-01-10',
...
'2020-12-23', '2020-12-24', '2020-12-25', '2020-12-26',
'2020-12-27', '2020-12-28', '2020-12-29', '2020-12-30',
'2020-12-31', '2021-01-01'],
dtype='datetime64[ns]', length=11324, freq='D')
Where I'm having trouble is getting the above DatetimeIndex
to the outfile. The following gives me an error:
range.to_excel(out_excel_file, index=False)
AttributeError: 'DatetimeIndex' object has no attribute 'to_excel'
I'm pretty sure that when writing to excel it has to be a dataframe. So, my question is how do I get the range
variable to a dataframe object?
Goal: From an excel file, I want to get all the records which have dates that fall within a range and write them to a new excel file. The infile I'm working with has 500K+ rows and 21 columns.
You could use an indexing operation to select only the data you need from the original DataFrame and save the result in an Excel file.
In order to do that first you need to check if the date column from your original DataFrame is already converted to a datetime/date object:
import numpy as np
date_column = "date" # Suppose this is your date column name
if not np.issubdtype(df[date_column].dtype, np.datetime64):
df.loc[:, date_column] = pd.to_datetime(df[date_column], format="%Y-%m-%d")
Now you can use a regular indexing operation to get all values you need:
mask = (df[date_column] >= '1910-01-01') & (df[date_column] <= '2021-01-01') # Creates mask for date range
out_dataframe = df.loc[mask] # Here we select the indices using our mask
out_dataframe.to_excel(out_excel_file)