python, pandas, loops, analysis

Copying existing columns as moving averages to a dataframe


I think I am overthinking this. I am trying to build rolling averages from existing pandas data frame columns without overwriting the original data. I iterate over the columns and, for each one, compute a rolling 7-day moving average (MA) that I want to store as a new column with the suffix _ma, alongside the original. I then want to compare the existing data to the 7-day MA and see how many standard deviations each value is from it. I can figure that part out; I am just trying to save the MA data as a new data frame.

I have

for column in original_data[ma_columns]:

    ma_df = pd.DataFrame(original_data[ma_columns].rolling(window=7).mean(), columns = str(column)+'_ma')

and I am getting the error: Index(...) must be called with a collection of some kind, 'Carrier_AcctPswd_ma' was passed

But if I iterate with

for column in original_data[ma_columns]:

    print('Column Name : ', str(column)+'_ma')
    print('Contents : ', original_data[ma_columns].rolling(window=7).mean())

I get the data I need: the moving average data frame.

My issue is just saving this as a new data frame, which I can then concatenate to the old one before doing my analysis.

EDIT

I have now been able to make a bunch of data frames, but concatenating them together is where the issue is:

for column in original_data[ma_columns]:

    MA_data = pd.DataFrame(original_data[column].rolling(window=7).mean())
    for i in MA_data:
        new = pd.concat(i)
        print(i)

<ipython-input-75-7c5e5fa775b3> in <module>
     17 #     print(type(MA_data))
     18     for i in MA_data:
---> 19         new = pd.concat(i)
     20         print(i)
     21 

~\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    279         verify_integrity=verify_integrity,
    280         copy=copy,
--> 281         sort=sort,
    282     )
    283 

~\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
    307                 "first argument must be an iterable of pandas "
    308                 "objects, you passed an object of type "
--> 309                 '"{name}"'.format(name=type(objs).__name__)
    310             )
    311 

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "str"

Solution

  • You should iterate over the column names and assign each resulting pandas Series as a new named column, for example:

    import pandas as pd
    
    original_data = pd.DataFrame({'A': range(100), 'B': range(100, 200)})
    
    ma_columns = ['A', 'B']
    
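    # Assign each rolling mean (a Series) directly as a new '<name>_ma' column;
    # the original columns themselves are left untouched.
    for column in ma_columns:
        new_column = column + '_ma'
        original_data[new_column] = original_data[column].rolling(window=7).mean()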
    
    print(original_data)
    

    Output dataframe:

        A    B  A_ma   B_ma
    0    0  100   NaN    NaN
    1    1  101   NaN    NaN
    2    2  102   NaN    NaN
    3    3  103   NaN    NaN
    4    4  104   NaN    NaN
    ..  ..  ...   ...    ...
    95  95  195  92.0  192.0
    96  96  196  93.0  193.0
    97  97  197  94.0  194.0
    98  98  198  95.0  195.0
    99  99  199  96.0  196.0
    
    [100 rows x 4 columns]
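
  • If you would rather keep the moving averages in a separate data frame and concatenate afterwards, as the question describes, the loop is not needed. Both errors in the question come from passing a single string where pandas expects a collection: `columns=` needs a list-like, and `pd.concat` needs a list of pandas objects (iterating over a data frame yields its column labels, which are strings). The following is only a sketch under the same toy data as above. The names `rolling_mean`, `ma_df`, `combined` and `z_scores` and the `_z` suffix are illustrative, and using the rolling standard deviation over the same 7-row window is an assumption about how the deviation should be measured.

```python
import pandas as pd

original_data = pd.DataFrame({'A': range(100), 'B': range(100, 200)})
ma_columns = ['A', 'B']

# Rolling mean of all selected columns at once (same column names as the source).
rolling_mean = original_data[ma_columns].rolling(window=7).mean()

# Separate data frame of moving averages: A -> A_ma, B -> B_ma.
ma_df = rolling_mean.add_suffix('_ma')

# pd.concat takes a list of pandas objects; axis=1 places them side by side.
# original_data itself is not modified.
combined = pd.concat([original_data, ma_df], axis=1)

# Assumption: distance from the MA measured in rolling standard deviations
# over the same window. The subtraction aligns on the shared column names.
rolling_std = original_data[ma_columns].rolling(window=7).std()
z_scores = ((original_data[ma_columns] - rolling_mean) / rolling_std).add_suffix('_z')

print(combined.tail())
print(z_scores.tail())
```

    Here `combined` has the columns A, B, A_ma and B_ma, while `original_data` keeps only A and B. The first six rows of the `_ma` and `_z` columns are NaN because the 7-row window is not yet full.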