For context, my master dataset is a 24541 row x 1830 column DataFrame full of either NaN or a float (stock price). I am processing this DataFrame 11 times, each time setting values in a casted DataFrame with the same index and columns. An example of both DataFrames is below:
data = pd.DataFrame.from_csv(filepath)
data = pd.DataFrame(data=data, dtype=np.float64)
#dataset of daily prices
data.head()
Out[14]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
1925-12-31 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-02 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-04 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-05 NaN NaN NaN NaN ... NaN NaN NaN NaN
1926-01-06 NaN NaN NaN NaN ... NaN NaN NaN NaN
[5 rows x 1830 columns]
MA_a_frame = pd.DataFrame(
data=0,
index=data.index,
columns=data.columns)
#bool DataFrame
MA_a_frame.head()
Out[15]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
1925-12-31 0 0 0 0 ... 0 0 0 0
1926-01-02 0 0 0 0 ... 0 0 0 0
1926-01-04 0 0 0 0 ... 0 0 0 0
1926-01-05 0 0 0 0 ... 0 0 0 0
1926-01-06 0 0 0 0 ... 0 0 0 0
[5 rows x 1830 columns]
The values in MA_a_frame (and 10 other identical DataFrames) are to be set to 1 if a certain condition in the DataFrame "data" is met. Namely, if the price in "data" is within 1% (argument is "j") of a calculated value in a completely different DataFrame generated in a previous function. So in total, each iteration will deal with up to 3 large DataFrames.
In terms of my iterator, I simply create two separate lists ("dates" and "securities") using data.columns and data.index. So I am essentially iterating over data's index and columns indirectly. Without further ado, here is the base of the code that is run a total of 11 times in my program (the part I am trying to speed up!):
def gen_a():
for date in dates:
for security in securities:
try:
if type(data.loc[date, security]) is not float:
pass
#lots of the data is NaN, so skip these altogether
elif j > math.log(
MA_a_csv.loc[date, security]/
data.loc[date, security]) > -j:
MA_dict['a'].loc[date, security] = 1
print(f'Passed {date}, {security}')
except:
print(f'Failed {date}, {security}')
Now, the problem is one cycle of this code takes ~8 hours. Thus, I'm looking at nearly 90 hours per run. I have an academic paper due as a graduation requirement, and the deadline is really starting to scare me with these numbers! Assuming my output is perfect, things should be fine, but I would be eternally grateful if anyone had a suggestion that could cut the speed down. Otherwise, I may have to cut the range of the data down, reducing the power of my statistical analysis.
P.S. I am running this through Spyder on Windows 10 with an Intel i7 3970X. I do not have access to any other computing power. I considered GPU acceleration but my GPU is a GTX 670 which is not Pascal and thus incompatible with CuDF.
Edit:
Here's the bottom five rows of the data DataFrame:
s.head()
Out[16]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
2018-12-24 61.55 232.70000 NaN NaN ... NaN 15.71 NaN NaN
2018-12-26 65.11 244.59000 NaN NaN ... NaN 16.48 NaN NaN
2018-12-27 64.71 252.17999 NaN NaN ... NaN 16.71 NaN NaN
2018-12-28 64.96 249.64999 NaN NaN ... NaN 16.55 NaN NaN
2018-12-31 66.09 254.50000 NaN NaN ... NaN 16.74 NaN NaN
[5 rows x 1830 columns]
And here's a sample of one of the comparison DataFrames:
Out[23]:
49154 65541 32791 65568 ... 24563 81910 24571 90110
DATE ...
2018-12-24 76.3430 258.376200 NaN NaN ... NaN 19.8672 NaN NaN
2018-12-26 75.9530 258.143600 NaN NaN ... NaN 19.7980 NaN NaN
2018-12-27 75.5552 258.127199 NaN NaN ... NaN 19.7238 NaN NaN
2018-12-28 75.1382 257.878799 NaN NaN ... NaN 19.6440 NaN NaN
2018-12-31 74.7716 257.683199 NaN NaN ... NaN 19.5600 NaN NaN
[5 rows x 1830 columns]
Edit 2:
By request, here is data.head().to_dict():
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'44792': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85753': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20220': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12044': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20239': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28433': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12052': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12060': {Timestamp('1925-12-31 00:00:00'): 326.0,
Timestamp('1926-01-02 00:00:00'): 326.5,
Timestamp('1926-01-04 00:00:00'): 325.0,
Timestamp('1926-01-05 00:00:00'): 325.5,
Timestamp('1926-01-06 00:00:00'): 326.25},
'12062': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85792': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12067': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77605': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77606': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20263': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12073': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12076': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12079': {Timestamp('1925-12-31 00:00:00'): 117.5,
Timestamp('1926-01-02 00:00:00'): 124.25,
Timestamp('1926-01-04 00:00:00'): 127.125,
Timestamp('1926-01-05 00:00:00'): 123.75,
Timestamp('1926-01-06 00:00:00'): 124.5},
'61241': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12095': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28484': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53065': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20298': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77644': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28505': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53081': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77659': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12124': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77661': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28513': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61284': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77668': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12140': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85869': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20343': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28548': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77702': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12167': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85908': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12183': {Timestamp('1925-12-31 00:00:00'): 78.5,
Timestamp('1926-01-02 00:00:00'): 78.0,
Timestamp('1926-01-04 00:00:00'): 77.5,
Timestamp('1926-01-05 00:00:00'): 76.875,
Timestamp('1926-01-06 00:00:00'): 76.5},
'44951': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85913': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85914': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12191': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20386': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77730': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28580': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85926': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20394': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69550': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12212': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20407': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12220': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20415': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77768': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85963': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20431': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45014': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61399': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69607': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'85991': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53225': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20474': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20482': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86021': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45065': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12298': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69649': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12308': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20503': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45081': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86041': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12319': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20511': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12343': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12345': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20554': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12369': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20562': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86102': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20570': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86111': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12394': {Timestamp('1925-12-31 00:00:00'): 123.5,
Timestamp('1926-01-02 00:00:00'): 124.0,
Timestamp('1926-01-04 00:00:00'): 123.25,
Timestamp('1926-01-05 00:00:00'): 123.5,
Timestamp('1926-01-06 00:00:00'): 122.75},
'36978': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86136': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28804': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86158': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12431': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'61583': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20626': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'77976': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'53401': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'86176': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12449': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'69796': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12456': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'45225': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'12458': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'20650': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
'28847': {Timestamp('1925-12-31 00:00:00'): nan,
Timestamp('1926-01-02 00:00:00'): nan,
Timestamp('1926-01-04 00:00:00'): nan,
Timestamp('1926-01-05 00:00:00'): nan,
Timestamp('1926-01-06 00:00:00'): nan},
...}
Unfortunately, I am out of space for this post, but MA_a_csv.head().to_dict() produces the same as above except all NaN rather than the one data point.
I made my own sample data generator based on the examples you gave. I think it fits what you have, but let me know if it doesn't. If the data matches, don't worry about the specifics of how I made it.
rows = 6
cols = 5
np.random.seed(0)
data = pd.DataFrame(np.random.rand(rows, cols) * 100,
index=pd.DatetimeIndex(freq='d', start='1928-12-31', periods=rows))
nan_cols = len(data.columns) // 2
random_indices = zip(pd.Series(data.index.values[:-rows // 2])
.sample(nan_cols, random_state=1, replace=True),
pd.Series(data.columns).sample(nan_cols, random_state=2))
for row, col in random_indices:
data.loc[:row, col] = np.nan
MA_a_csv = data * (1 + (np.random.rand(rows, cols) / 50
* np.random.choice([-1, 1], size=(rows, cols))))
So data
looks like
0 1 2 3 4
1928-12-31 54.881350 71.518937 NaN 54.488318 NaN
1929-01-01 64.589411 43.758721 NaN 96.366276 38.344152
1929-01-02 79.172504 52.889492 56.804456 92.559664 7.103606
1929-01-03 8.712930 2.021840 83.261985 77.815675 87.001215
1929-01-04 97.861834 79.915856 46.147936 78.052918 11.827443
1929-01-05 63.992102 14.335329 94.466892 52.184832 41.466194
And MA_a_csv
looks like
0 1 2 3 4
1928-12-31 55.171734 72.626384 NaN 55.107778 NaN
1929-01-01 63.791557 44.294412 NaN 98.185186 38.867028
1929-01-02 78.603241 53.351780 57.597027 92.448175 7.008877
1929-01-03 8.829794 2.013333 83.047291 77.324770 86.368349
1929-01-04 98.977844 80.616881 45.235708 77.893620 11.876852
1929-01-05 63.785651 14.522579 94.945445 52.671519 41.668902
I ran it through something that looks like your gen_a
, and then made a vectorized version that gets the same answer:
logs = np.log(MA_a_csv / data)
ans = ((j > logs) & (logs > -j)).replace({True: 1, False: 0})
Where ans
is
0 1 2 3 4
1928-12-31 1 0 0 0 0
1929-01-01 0 0 0 0 0
1929-01-02 1 1 0 1 0
1929-01-03 0 1 1 1 1
1929-01-04 0 1 0 1 1
1929-01-05 1 0 1 1 1
np.log
can operate on the whole array at once, and pandas is probably doing something fancy to vectorize the greater-than comparisons as well. The &
is a bit-wise and, so it's just checking that both conditions are true for each position.
And this runs ~180 times faster than my version of gen_a
, which didn't have the try/except or print statements, so it should be an even greater improvement for your code.
You also don't need the .replace({True: 1, False: 0})
part - in Python 1 == True
is True, as is 0 == False
, so you should be able to use them interchangeably.
Let me know if you have any issues with that. For further reading, I suggest Tom Augspurger's Modern Pandas articles - particularly applicable is the Fast Pandas section.