Search code examples
pythonarrayspandasnumpydata-analysis

How to resolve IndexError and how to save 3 data computed in for loop to array/.csv?


I imported a file using pandas. Data look as follows:

My data looks like this

I coded to get the data of 'open' from first day of every year saved as start_open and last day of the year saved as end_open for 27 years. My code is as follows:

import pandas as pd
df = pd.read_csv(r'C:\Users\Shivank Chadda\Desktop\Data Analysis\BATS_SPY, 1D.csv')
df['time'] = pd.to_datetime(df['time'],unit='s').dt.normalize()
df['year'] = pd.DatetimeIndex(df['time']).year
sub_df=df[['year','open']]
n=1993
for i in sub_df['year']:
      sub_93 = sub_df[(sub_df['year']==n) & (sub_df['year']<2022)]
      start_open=sub_93.iloc[0]['open']
      end_open=sub_93.iloc[-1]['open']
      per= ((end_open-start_open)/start_open)*100
      print('The value at the start of the year',n,'is:',start_open,'\nThe value at the end of year',n,' is:',end_open)
      n+=1
      i+=1

The code prints following

The value at the start of the year 1993 is: 43.9688 

The value at the end of year 1993  is: 46.9375

The value at the start of the year 1994 is: 46.59375 

The value at the end of year 1994  is: 46.20312

The value at the start of the year 1995 is: 45.70312 

The value at the end of year 1995  is: 61.46875

The value at the start of the year 1996 is: 61.40625 

The value at the end of year 1996  is: 75.28125

The value at the start of the year 1997 is: 74.375 

The value at the end of year 1997  is: 96.875

(This continues until 2021)

With the following error:

  File "C:\Users\Shivank Chadda\Desktop\Data Analysis\untitled7.py", line 16, in <module>
    start_open=sub_93.iloc[0]['open']

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 879, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1496, in _getitem_axis
    self._validate_integer(key, axis)

  File "C:\Users\Shivank Chadda\anaconda3\lib\site-packages\pandas\core\indexing.py", line 1437, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")

IndexError: single positional indexer is out-of-bounds

I have two questions

(1) How should I resolve this error?

(2) I want to get an array that contains year, start_open, end_open, and percentage rather than to print in sentence. If possible I would want to make a .csv of the data collected.

Please let me know what should be my next steps


Solution

  • I can't test it but error shows problem with IndexError in

    start_open = sub_93.iloc[0]['open']
    

    so probably you get empty sub_93 and it doesn't have [0] (and [-1]).

    You should check it and skip calculations

      sub_93 = sub_df[(sub_df['year'] == n) & (sub_df['year'] < 2022)]
    
      if len(sub_93) == 0:
          print('No data for year', n)
      else:
          start_open = sub_93.iloc[0]['open']
          end_open = sub_93.iloc[-1]['open']
          per = ((end_open-start_open)/start_open)*100
          print('The value at the start of the year', n, 'is:', start_open, '\nThe value at the end of year', n,'is:', end_open)
    
      n += 1
    

    EDIT:

    Second problem - with creating list - seems so simple so I didn't even think about it.

    • before for-loop create list results = [].

    • inside for-loop append values results.append([year, start_open, end_open, percentage])

    and you get list with sublists.

    You may convert it to pandas.DataFrame and save it as CSV

    # - before `for`-loop -
    
    results = []
    
    # - `for`-loop -
    
    for i in sub_df['year']:
        # ... code ...
    
        results.append( [year, start_open, end_open, percentage] )
    
    # - after `for`-loop -
    
    df_results = pd.DataFrame(results, header=["Year", "Start", "End", "Percentage"])
    
    #df_results.to_csv("output.csv", index=False)
    df_results.to_csv("output.csv")