I imported a file using pandas. Data look as follows:
I coded to get the data of 'open' from first day of every year saved as start_open and last day of the year saved as end_open for 27 years. My code is as follows:
import pandas as pd
df = pd.read_csv(r'C:\Users\Shivank Chadda\Desktop\Data Analysis\BATS_SPY, 1D.csv')
df['time'] = pd.to_datetime(df['time'],unit='s').dt.normalize()
df['year'] = pd.DatetimeIndex(df['time']).year
sub_df=df[['year','open']]
n=1993
for i in sub_df['year']:
sub_93 = sub_df[(sub_df['year']==n) & (sub_df['year']<2022)]
start_open=sub_93.iloc[0]['open']
end_open=sub_93.iloc[-1]['open']
per= ((end_open-start_open)/start_open)*100
print('The value at the start of the year',n,'is:',start_open,'\nThe value at the end of year',n,' is:',end_open)
n+=1
i+=1
The code prints following
The value at the start of the year 1993 is: 43.9688
The value at the end of year 1993 is: 46.9375
The value at the start of the year 1994 is: 46.59375
The value at the end of year 1994 is: 46.20312
The value at the start of the year 1995 is: 45.70312
The value at the end of year 1995 is: 61.46875
The value at the start of the year 1996 is: 61.40625
The value at the end of year 1996 is: 75.28125
The value at the start of the year 1997 is: 74.375
The value at the end of year 1997 is: 96.875
(This continues until 2021)
With the following error:
File "C:\Users\Shivank Chadda\Desktop\Data Analysis\untitled7.py", line 16, in <module>
start_open=sub_93.iloc[0]['open']
File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1496, in _getitem_axis
self._validate_integer(key, axis)
File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1437, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
I have two questions
(1) How should I resolve this error?
(2) I want to get an array that contains year, start_open, end_open, and percentage rather than to print in sentence. If possible I would want to make a .csv of the data collected.
Please let me know what should be my next steps
I can't test it but error shows problem with IndexError
in
start_open = sub_93.iloc[0]['open']
so probably you get empty sub_93
and it doesn't have [0]
(and [-1]
).
You should check it and skip calculations
sub_93 = sub_df[(sub_df['year'] == n) & (sub_df['year'] < 2022)]
if len(sub_93) == 0:
print('No data for year', n)
else:
start_open = sub_93.iloc[0]['open']
end_open = sub_93.iloc[-1]['open']
per = ((end_open-start_open)/start_open)*100
print('The value at the start of the year', n, 'is:', start_open, '\nThe value at the end of year', n,'is:', end_open)
n += 1
EDIT:
Second problem - with creating list - seems so simple so I didn't even think about it.
before for
-loop create list results = []
.
inside for
-loop append values results.append([year, start_open, end_open, percentage])
and you get list with sublists.
You may convert it to pandas.DataFrame
and save it as CSV
# - before `for`-loop -
results = []
# - `for`-loop -
for i in sub_df['year']:
# ... code ...
results.append( [year, start_open, end_open, percentage] )
# - after `for`-loop -
df_results = pd.DataFrame(results, header=["Year", "Start", "End", "Percentage"])
#df_results.to_csv("output.csv", index=False)
df_results.to_csv("output.csv")