python arrays pandas numpy data-analysis

How to resolve IndexError and how to save 3 data computed in for loop to array/.csv?

I imported a file using pandas. Data look as follows:

My data looks like this

I coded to get the data of 'open' from first day of every year saved as start_open and last day of the year saved as end_open for 27 years. My code is as follows:

import pandas as pd
df = pd.read_csv(r'C:\Users\Shivank Chadda\Desktop\Data Analysis\BATS_SPY, 1D.csv')
df['time'] = pd.to_datetime(df['time'],unit='s').dt.normalize()
df['year'] = pd.DatetimeIndex(df['time']).year
sub_df=df[['year','open']]
n=1993
for i in sub_df['year']:
      sub_93 = sub_df[(sub_df['year']==n) & (sub_df['year']<2022)]
      start_open=sub_93.iloc[0]['open']
      end_open=sub_93.iloc[-1]['open']
      per= ((end_open-start_open)/start_open)*100
      print('The value at the start of the year',n,'is:',start_open,'\nThe value at the end of year',n,' is:',end_open)
      n+=1
      i+=1

The code prints following

The value at the start of the year 1993 is: 43.9688 

The value at the end of year 1993  is: 46.9375

The value at the start of the year 1994 is: 46.59375 

The value at the end of year 1994  is: 46.20312

The value at the start of the year 1995 is: 45.70312 

The value at the end of year 1995  is: 61.46875

The value at the start of the year 1996 is: 61.40625 

The value at the end of year 1996  is: 75.28125

The value at the start of the year 1997 is: 74.375 

The value at the end of year 1997  is: 96.875

(This continues until 2021)

With the following error:

  File "C:\Users\Shivank Chadda\Desktop\Data Analysis\untitled7.py", line 16, in <module>
    start_open=sub_93.iloc[0]['open']

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1496, in _getitem_axis
    self._validate_integer(key, axis)

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1437, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")

IndexError: single positional indexer is out-of-bounds

I have two questions

(1) How should I resolve this error?

(2) I want to get an array that contains year, start_open, end_open, and percentage rather than to print in sentence. If possible I would want to make a .csv of the data collected.

Please let me know what should be my next steps

Solution

I can't test it but error shows problem with IndexError in

start_open = sub_93.iloc[0]['open']

so probably you get empty sub_93 and it doesn't have [0] (and [-1]).

You should check it and skip calculations

  sub_93 = sub_df[(sub_df['year'] == n) & (sub_df['year'] < 2022)]

  if len(sub_93) == 0:
      print('No data for year', n)
  else:
      start_open = sub_93.iloc[0]['open']
      end_open = sub_93.iloc[-1]['open']
      per = ((end_open-start_open)/start_open)*100
      print('The value at the start of the year', n, 'is:', start_open, '\nThe value at the end of year', n,'is:', end_open)

  n += 1

EDIT:

Second problem - with creating list - seems so simple so I didn't even think about it.

before for-loop create list results = [].
inside for-loop append values results.append([year, start_open, end_open, percentage])

and you get list with sublists.

You may convert it to pandas.DataFrame and save it as CSV

# - before `for`-loop -

results = []

# - `for`-loop -

for i in sub_df['year']:
    # ... code ...

    results.append( [year, start_open, end_open, percentage] )

# - after `for`-loop -

df_results = pd.DataFrame(results, header=["Year", "Start", "End", "Percentage"])

#df_results.to_csv("output.csv", index=False)
df_results.to_csv("output.csv")