
Python pandas: append results from a multiprocessing Pool for loop to an existing DataFrame


I have a DataFrame called df3 with 5 columns.

I am fetching tick data from bittrex.com with a multiprocessing Pool and loading each result into a DataFrame called df2.

I reduced the processes to just 2 to simplify my code for this test.

Here is my code:

import pandas as pd
import json
import urllib.request
import os
from urllib import parse
import csv
import datetime
from multiprocessing import Process, Pool
import time

df3 = pd.DataFrame(columns=['tickers', 'RSIS', 'CCIS', 'ICH', 'SMAS'])
tickers = ["BTC-1ST", "BTC-ADA"]

def http_get(url):
    result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read()}
    return result

urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers ]

pool = Pool(processes=200)

results = pool.map(http_get, urls)

for result in results:
    j = json.loads(result['data'].decode())
    df2 = pd.DataFrame(data=j['result'])

    df2.rename(columns={'BV': 'BaseVolume', 'C': 'Close', 'H': 'High', 'L': 'Low', 'O': 'Open', 'T': 'TimeStamp',
                        'V': 'Volume'}, inplace=True)

    # Tenkan-sen (Conversion Line): (9-period high + 9-period low) / 2
    nine_period_high = df2['High'].rolling(window=50).max()
    nine_period_low = df2['Low'].rolling(window=50).min()
    df2['tenkan_sen'] = (nine_period_high + nine_period_low) / 2

    # Kijun-sen (Base Line): (26-period high + 26-period low) / 2
    period26_high = df2['High'].rolling(window=250).max()
    period26_low = df2['Low'].rolling(window=250).min()
    df2['kijun_sen'] = (period26_high + period26_low) / 2

    TEN30L = df2.loc[df2.index[-1], 'tenkan_sen']
    TEN30LL = df2.loc[df2.index[-2], 'tenkan_sen']
    KIJ30L = df2.loc[df2.index[-1], 'kijun_sen']
    KIJ30LL = df2.loc[df2.index[-2], 'kijun_sen']

    if (TEN30LL < KIJ30LL) and (TEN30L > KIJ30L):
        df3.at[ticker, 'ICH'] = 'BUY'
    elif (TEN30LL > KIJ30LL) and (TEN30L < KIJ30L):
        df3.at[ticker, 'ICH'] = 'SELL'
    else:
        df3.at[ticker, 'ICH'] = 'NO'

    pool.close()
    pool.join()
    print(df2)

My question: I always get the error NameError: name 'ticker' is not defined, which is driving me mad. Why do I get this error even though I already defined ticker as the loop variable in the line urls = ["https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin" for ticker in tickers ], and Python used it there successfully?

I googled for three days and tried several solutions without success.

Any ideas, please?


Solution

  • I don't think you are looking at the correct line; when I run your code, I get:

    NameError                                 Traceback (most recent call last)
    <ipython-input-1-fd766f4a9b8e> in <module>()
         49         df3.at[ticker, 'ICH'] = 'SELL'
         50     else:
    ---> 51         df3.at[ticker, 'ICH'] = 'NO'
         52 
         53     pool.close()
    

    So the error is at line 51, not at the line where you create the urls list. This makes sense, because ticker is not defined outside of the list comprehension on that line. The problem has nothing to do with your use of multiprocessing or pandas; it comes from Python's scoping rules: the loop variable of a list comprehension is not usable outside of it. It is hard to see how it could be, since it has iterated through several values, unless you were only interested in the last value it had, which is not what you want here.
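
The scoping rule can be checked in isolation with a few lines. This is a minimal sketch, assuming Python 3; the example.invalid URL and the names leaked and symbol are placeholders introduced here, not part of the original code.

```python
tickers = ["BTC-1ST", "BTC-ADA"]

# In Python 3 the loop variable `ticker` lives only inside the comprehension.
# (Placeholder URL, used only to mirror the shape of the original line.)
urls = ["https://example.invalid/ticks?marketName=" + ticker for ticker in tickers]

try:
    ticker  # looking the name up afterwards raises NameError
    leaked = True
except NameError:
    leaked = False

# An ordinary for loop behaves differently: its variable stays bound after
# the loop ends and holds the last value it took.
for symbol in tickers:
    pass

print(leaked)  # False
print(symbol)  # BTC-ADA
```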

    You'll probably have to keep track of the ticker throughout the fetching process, so you can relate the results to the right ticker in the end, something like:

    def http_get(ticker):
        url = "https://bittrex.com/Api/v2.0/pub/market/GetTicks?marketName=" + ticker + "&tickInterval=thirtyMin"
        result = {"url": url, "data": urllib.request.urlopen(url, timeout=60).read(), "ticker": ticker}
        return result
    
    pool = Pool(processes=200)
    
    results = pool.map(http_get, tickers)
    
    for result in results:
        j = json.loads(result['data'].decode())
        df2 = pd.DataFrame(data=j['result'])
        ticker = result['ticker']
        ...
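
The pattern of carrying the ticker along with each result can be tried without any network access. The sketch below rests on several assumptions: fetch_signal is a hypothetical stand-in for http_get plus the indicator logic, the BUY/NO rule is made up for illustration, and ThreadPool (same map interface as Pool) is swapped in so the snippet runs anywhere without a `__main__` guard.

```python
from multiprocessing.pool import ThreadPool

tickers = ["BTC-1ST", "BTC-ADA"]


def fetch_signal(ticker):
    """Hypothetical worker: returns its result together with the ticker."""
    # A real version would download the ticks for `ticker` and compute the
    # indicator; this made-up rule only produces something to store.
    signal = "BUY" if ticker.endswith("ADA") else "NO"
    return {"ticker": ticker, "ICH": signal}


with ThreadPool(processes=2) as pool:
    # map() returns results in the same order as the input list.
    results = pool.map(fetch_signal, tickers)

# Each result names its own ticker, so no outer loop variable is needed.
signals = {}
for result in results:
    signals[result["ticker"]] = result["ICH"]

print(signals)
```

With the real data, the same loop body could write into df3 instead of a dict, using result['ticker'] as the row label in place of the undefined ticker.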