Search code examples
pythonpandasdataframefinance

Making this pandas code as lean and speedy as possible? [iterating over large DataFrames and setting]


For context, my master dataset is a 24541 row x 1830 column DataFrame full of either NaN or a float (stock price). I am processing this DataFrame 11 times, each time setting values in a casted DataFrame with the same index and columns. An example of both DataFrames is below:

data = pd.DataFrame.from_csv(filepath)
data = pd.DataFrame(data=data, dtype=np.float64)

#dataset of daily prices
data.head()

Out[14]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-02    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-04    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-05    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN
1926-01-06    NaN    NaN    NaN    NaN  ...    NaN    NaN    NaN    NaN

[5 rows x 1830 columns]

MA_a_frame = pd.DataFrame(
        data=0,
        index=data.index, 
        columns=data.columns)

#bool DataFrame
MA_a_frame.head()

Out[15]: 
            49154  65541  32791  65568  ...  24563  81910  24571  90110
DATE                                    ...                            
1925-12-31      0      0      0      0  ...      0      0      0      0
1926-01-02      0      0      0      0  ...      0      0      0      0
1926-01-04      0      0      0      0  ...      0      0      0      0
1926-01-05      0      0      0      0  ...      0      0      0      0
1926-01-06      0      0      0      0  ...      0      0      0      0

[5 rows x 1830 columns]

The values in MA_a_frame (and 10 other identical DataFrames) are to be set to 1 if a certain condition in the DataFrame "data" is met. Namely, if the price in "data" is within 1% (argument is "j") of a calculated value in a completely different DataFrame generated in a previous function. So in total, each iteration will deal with up to 3 large DataFrames.

In terms of my iterator, I simply create two separate lists ("dates" and "securities") using data.columns and data.index. So I am essentially iterating over data's index and columns indirectly. Without further ado, here is the base of the code that is run a total of 11 times in my program (the part I am trying to speed up!):

def gen_a():

    for date in dates:

        for security in securities: 

            try: 

                if type(data.loc[date, security]) is not float:

                    pass
                    #lots of the data is NaN, so skip these altogether

                elif j > math.log(
                        MA_a_csv.loc[date, security]/
                        data.loc[date, security]) > -j:

                    MA_dict['a'].loc[date, security] = 1

                print(f'Passed {date}, {security}')

            except: 

                print(f'Failed {date}, {security}')

Now, the problem is one cycle of this code takes ~8 hours. Thus, I'm looking at nearly 90 hours per run. I have an academic paper due as a graduation requirement, and the deadline is really starting to scare me with these numbers! Assuming my output is perfect, things should be fine, but I would be eternally grateful if anyone had a suggestion that could cut the speed down. Otherwise, I may have to cut the range of the data down, reducing the power of my statistical analysis.

P.S. I am running this through Spyder on Windows 10 with an Intel i7 3970X. I do not have access to any other computing power. I considered GPU acceleration but my GPU is a GTX 670 which is not Pascal and thus incompatible with CuDF.

Edit:

Here's the bottom five rows of the data DataFrame:

s.head()
Out[16]: 
            49154      65541  32791  65568  ...  24563  81910  24571  90110
DATE                                        ...                            
2018-12-24  61.55  232.70000    NaN    NaN  ...    NaN  15.71    NaN    NaN
2018-12-26  65.11  244.59000    NaN    NaN  ...    NaN  16.48    NaN    NaN
2018-12-27  64.71  252.17999    NaN    NaN  ...    NaN  16.71    NaN    NaN
2018-12-28  64.96  249.64999    NaN    NaN  ...    NaN  16.55    NaN    NaN
2018-12-31  66.09  254.50000    NaN    NaN  ...    NaN  16.74    NaN    NaN

[5 rows x 1830 columns]

And here's a sample of one of the comparison DataFrames:

Out[23]: 
              49154       65541  32791  65568  ...  24563    81910  24571  90110
DATE                                           ...                              
2018-12-24  76.3430  258.376200    NaN    NaN  ...    NaN  19.8672    NaN    NaN
2018-12-26  75.9530  258.143600    NaN    NaN  ...    NaN  19.7980    NaN    NaN
2018-12-27  75.5552  258.127199    NaN    NaN  ...    NaN  19.7238    NaN    NaN
2018-12-28  75.1382  257.878799    NaN    NaN  ...    NaN  19.6440    NaN    NaN
2018-12-31  74.7716  257.683199    NaN    NaN  ...    NaN  19.5600    NaN    NaN

[5 rows x 1830 columns]

Edit 2:

By request, here is data.head().to_dict():

  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '44792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85753': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12044': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20239': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28433': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12052': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12060': {Timestamp('1925-12-31 00:00:00'): 326.0,
  Timestamp('1926-01-02 00:00:00'): 326.5,
  Timestamp('1926-01-04 00:00:00'): 325.0,
  Timestamp('1926-01-05 00:00:00'): 325.5,
  Timestamp('1926-01-06 00:00:00'): 326.25},
 '12062': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85792': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12067': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77605': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77606': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20263': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12073': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12076': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12079': {Timestamp('1925-12-31 00:00:00'): 117.5,
  Timestamp('1926-01-02 00:00:00'): 124.25,
  Timestamp('1926-01-04 00:00:00'): 127.125,
  Timestamp('1926-01-05 00:00:00'): 123.75,
  Timestamp('1926-01-06 00:00:00'): 124.5},
 '61241': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12095': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28484': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77644': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28505': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77659': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12124': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77661': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28513': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61284': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77668': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12140': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85869': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28548': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77702': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12167': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85908': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12183': {Timestamp('1925-12-31 00:00:00'): 78.5,
  Timestamp('1926-01-02 00:00:00'): 78.0,
  Timestamp('1926-01-04 00:00:00'): 77.5,
  Timestamp('1926-01-05 00:00:00'): 76.875,
  Timestamp('1926-01-06 00:00:00'): 76.5},
 '44951': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85913': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85914': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12191': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20386': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77730': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28580': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85926': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20394': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69550': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12212': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20407': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12220': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20415': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77768': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85963': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45014': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61399': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69607': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '85991': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20474': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20482': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86021': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45065': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12298': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69649': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12308': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20503': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45081': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86041': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12319': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20511': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12343': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12345': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20554': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12369': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20562': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86102': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20570': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86111': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12394': {Timestamp('1925-12-31 00:00:00'): 123.5,
  Timestamp('1926-01-02 00:00:00'): 124.0,
  Timestamp('1926-01-04 00:00:00'): 123.25,
  Timestamp('1926-01-05 00:00:00'): 123.5,
  Timestamp('1926-01-06 00:00:00'): 122.75},
 '36978': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86136': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28804': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86158': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12431': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '61583': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20626': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '77976': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '53401': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '86176': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12449': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '69796': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12456': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '45225': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '12458': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '20650': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 '28847': {Timestamp('1925-12-31 00:00:00'): nan,
  Timestamp('1926-01-02 00:00:00'): nan,
  Timestamp('1926-01-04 00:00:00'): nan,
  Timestamp('1926-01-05 00:00:00'): nan,
  Timestamp('1926-01-06 00:00:00'): nan},
 ...}

Unfortunately, I am out of space for this post, but MA_a_csv.head().to_dict() produces the same as above except all NaN rather than the one data point.


Solution

  • I made my own sample data generator based on the examples you gave. I think it fits what you have, but let me know if it doesn't. If the data matches, don't worry about the specifics of how I made it.

    rows = 6
    cols = 5
    np.random.seed(0)
    data = pd.DataFrame(np.random.rand(rows, cols) * 100, 
                      index=pd.DatetimeIndex(freq='d', start='1928-12-31', periods=rows))
    nan_cols = len(data.columns) // 2
    random_indices = zip(pd.Series(data.index.values[:-rows // 2])
                         .sample(nan_cols, random_state=1, replace=True), 
                         pd.Series(data.columns).sample(nan_cols, random_state=2))
    for row, col in random_indices:
        data.loc[:row, col] = np.nan
    
    MA_a_csv = data * (1 + (np.random.rand(rows, cols) / 50 
                            * np.random.choice([-1, 1], size=(rows, cols))))
    

    So data looks like

                        0          1          2          3          4
    1928-12-31  54.881350  71.518937        NaN  54.488318        NaN
    1929-01-01  64.589411  43.758721        NaN  96.366276  38.344152
    1929-01-02  79.172504  52.889492  56.804456  92.559664   7.103606
    1929-01-03   8.712930   2.021840  83.261985  77.815675  87.001215
    1929-01-04  97.861834  79.915856  46.147936  78.052918  11.827443
    1929-01-05  63.992102  14.335329  94.466892  52.184832  41.466194
    

    And MA_a_csv looks like

                        0          1          2          3          4
    1928-12-31  55.171734  72.626384        NaN  55.107778        NaN
    1929-01-01  63.791557  44.294412        NaN  98.185186  38.867028
    1929-01-02  78.603241  53.351780  57.597027  92.448175   7.008877
    1929-01-03   8.829794   2.013333  83.047291  77.324770  86.368349
    1929-01-04  98.977844  80.616881  45.235708  77.893620  11.876852
    1929-01-05  63.785651  14.522579  94.945445  52.671519  41.668902
    

    I ran it through something that looks like your gen_a, and then made a vectorized version that gets the same answer:

    logs = np.log(MA_a_csv / data)
    ans = ((j > logs) & (logs > -j)).replace({True: 1, False: 0})
    

    Where ans is

                0  1  2  3  4
    1928-12-31  1  0  0  0  0
    1929-01-01  0  0  0  0  0
    1929-01-02  1  1  0  1  0
    1929-01-03  0  1  1  1  1
    1929-01-04  0  1  0  1  1
    1929-01-05  1  0  1  1  1
    

    np.log can operate on the whole array at once, and pandas is probably doing something fancy to vectorize the greater-than comparisons as well. The & is a bit-wise and, so it's just checking that both conditions are true for each position.

    And this runs ~180 times faster than my version of gen_a, which didn't have the try/except or print statements, so it should be an even greater improvement for your code.

    You also don't need the .replace({True: 1, False: 0}) part - in Python 1 == True is True, as is 0 == False, so you should be able to use them interchangeably.

    Let me know if you have any issues with that. For further reading, I suggest Tom Augspurger's Modern Pandas articles - particularly applicable is the Fast Pandas section.