Search code examples
pythoncsvpandasquandl

Iterate through EOD .csv's to create a historical running high, low in each file


I'm new to python, but am falling in love with it as my understanding gets better. That being said I've managed to piece together a script to dump the Quandl WIKI EOD database into CSV's for each respective stock. which I've included below (Python 2.7):

import pandas as pd
import Quandl

#this ignores the first symbol in tickers.txt, add a header!
symbols_list = list(pd.read_csv('tickers.txt', index_col=0).T)
print symbols_list

df = {}
for ticker in symbols_list:
    stock = str("WIKI/" + ticker + ".11")
    #print stock
    print str(ticker + ".csv")
    df[ticker] = Quandl.get(stock, authtoken="XXXXXXXXXXXXX", trim_start="2016-01-01")
    df[ticker].to_csv(str(ticker + ".csv"))

My next goal is to create a historical running high and low column for each stock csv. This would make it easy to create a historical "gas gauge" for the stock market similar to the ones on http://www.finviz.com/. The new high/new low gauge is extremely useful. When 2000+ stocks make a new low and only a few make a new high, it's easy to gauge just how hard a sell off is.

So far I've managed to iterate through each .csv in my working folder and keep a tally of each new high / new low. My hang up is on the proper way to enter this information into each csv as new columns. The EOD CSV's have ['Date', 'Adj. Close'] for headers in each, and am I looking to add ['High', 'Low'].

My code so far:

import os
import pandas as pd

csv = [f for f in os.listdir('.') if f.endswith('.csv')]
print csv
for i in csv:
    df = pd.read_csv(i, index_col=0, parse_dates=True)
    print df
    j=0
    k=0
    for i in df['Adj. Close']: #for each value in adj close
        if j==0:
            j=i
            k=i

        elif i>j:
            j=i #high

        elif i<k:
            k=i #low

        print j,k #print high & low

I really appreciate your help guys! All of the code snippets and answered questions have made learning python so much more enjoyable.

Here's the output of the script so far:

['A.csv', 'AA.csv']
            Adj. Close
Date                  
2016-01-04       40.69
2016-01-05       40.55
2016-01-06       40.73
2016-01-07       39.00
2016-01-08       38.59
2016-01-11       37.94
2016-01-12       38.19
2016-01-13       36.86
2016-01-14       37.61
2016-01-15       37.11
2016-01-19       37.20
2016-01-20       37.26
2016-01-21       37.23
2016-01-22       37.98
40.69 40.69
40.69 40.55
40.73 40.55
40.73 39.0
40.73 38.59
40.73 37.94
40.73 37.94
40.73 36.86
40.73 36.86
40.73 36.86
40.73 36.86
40.73 36.86
40.73 36.86
40.73 36.86
            Adj. Close
Date                  
2016-01-04        9.71
2016-01-05        9.27
2016-01-06        8.61
2016-01-07        8.27
2016-01-08        8.07
2016-01-11        8.00
2016-01-12        7.28
2016-01-13        7.13
2016-01-14        7.24
2016-01-15        6.90
2016-01-19        6.74
2016-01-20        6.74
2016-01-21        7.09
2016-01-22        6.87
9.71 9.71
9.71 9.27
9.71 8.61
9.71 8.27
9.71 8.07
9.71 8.0
9.71 7.28
9.71 7.13
9.71 7.13
9.71 6.9
9.71 6.74
9.71 6.74
9.71 6.74
9.71 6.74

This code worked beautifully:

import os
import pandas as pd

csv = [f for f in os.listdir('.') if f.endswith('.csv')]
print csv
for i in csv:
    df = pd.read_csv(i, index_col=0, parse_dates=True)
    #print df
    df['High'] = pd.rolling_max(df['Adj. Close'], window=df.shape[0], min_periods=1)
    df['Low'] = pd.rolling_min(df['Adj. Close'], window=df.shape[0], min_periods=1)
    #print df

    g = os.path.normcase(os.path.join(os.path.dirname(__file__), i)) #os.getcwd()
    print 'full path =', g
    with open(g , "w") as f:
        df.to_csv(f)

Now I'll see about creating a historical list of each day's new high/new low


Solution

  • You can use length of df - shape as window of rolling_min and rolling_max:

    print df
              Date  Adj. Close
    0   2016-01-04        9.71
    1   2016-01-05        9.27
    2   2016-01-06        8.61
    3   2016-01-07        8.27
    4   2016-01-08        8.07
    5   2016-01-11        8.00
    6   2016-01-12        7.28
    7   2016-01-13        7.13
    8   2016-01-14        7.24
    9   2016-01-15        6.90
    10  2016-01-19        6.74
    11  2016-01-20        6.74
    12  2016-01-21        7.09
    13  2016-01-22        6.87
    
    df['high'] = pd.rolling_max(df['Adj. Close'], window=df.shape[0], min_periods=1)
    df['low'] = pd.rolling_min(df['Adj. Close'], window=df.shape[0], min_periods=1)
    print df
              Date  Adj. Close  high   low
    0   2016-01-04        9.71  9.71  9.71
    1   2016-01-05        9.27  9.71  9.27
    2   2016-01-06        8.61  9.71  8.61
    3   2016-01-07        8.27  9.71  8.27
    4   2016-01-08        8.07  9.71  8.07
    5   2016-01-11        8.00  9.71  8.00
    6   2016-01-12        7.28  9.71  7.28
    7   2016-01-13        7.13  9.71  7.13
    8   2016-01-14        7.24  9.71  7.13
    9   2016-01-15        6.90  9.71  6.90
    10  2016-01-19        6.74  9.71  6.74
    11  2016-01-20        6.74  9.71  6.74
    12  2016-01-21        7.09  9.71  6.74
    13  2016-01-22        6.87  9.71  6.74