python pandas append iteritems

iteritems() DataFrame output mechanics and append() output into a new DataFrame


Ok, enough is enough. I need help with this iteritems() and append() procedure...

Here we have some Time Series Price Data for barrels of Beer, and Whiskey...

            Beer    Whiskey
Date
1978-12-29  22.60   86.50
1979-01-02  22.68   86.52
1979-01-03  21.86   87.41
1979-01-04  22.32   87.54
1979-01-05  22.55   87.49
1979-01-08  22.31   87.21
1979-01-09  22.40   87.61
1979-01-10  22.07   87.64
1979-01-11  22.07   88.12
1979-01-12  21.76   88.04

What I am trying to do is create rolling 5-day return values from this data. I have been using the iteritems() function and I am getting the right numbers. The first part that I don't understand is why this function repeats the output as many times as there are columns in the DataFrame (not counting the index). This is the code and output...

for value in test.iteritems():
    print(((test - test.shift(5))*100)/test.shift(5))

OUTPUT

               Beer        Whiskey
Date                          
1978-12-29       NaN       NaN
1979-01-02       NaN       NaN
1979-01-03       NaN       NaN
1979-01-04       NaN       NaN
1979-01-05       NaN       NaN
1979-01-08 -1.283186  0.820809
1979-01-09 -1.234568  1.259824
1979-01-10  0.960659  0.263128
1979-01-11 -1.120072  0.662554
1979-01-12 -3.503326  0.628643
                Beer        Whiskey
Date                          
1978-12-29       NaN       NaN
1979-01-02       NaN       NaN
1979-01-03       NaN       NaN
1979-01-04       NaN       NaN
1979-01-05       NaN       NaN
1979-01-08 -1.283186  0.820809
1979-01-09 -1.234568  1.259824
1979-01-10  0.960659  0.263128
1979-01-11 -1.120072  0.662554
1979-01-12 -3.503326  0.628643

Any ideas why this exact output is repeated?

NEXT, I create a new DataFrame and I ask (very nicely!) to append this output into the dataframe. Here is the code...

for value in test.iteritems():
    df.append(((test - test.shift(5))*100)/test.shift(5))

This is the error I receive...


TypeError                                 Traceback (most recent call last)
<ipython-input-133-006bdc416056> in <module>()
      1 for value in test.iteritems():
----> 2     df.append(((test - test.shift(5))*100)/test.shift(5))

TypeError: append() missing 1 required positional argument: 'other'

My research says that this 'other' TypeError occurs when there is a reference missing in the code. I have tried different combinations of "key, value" to no avail. Further, the print function does not seem to have any issues. Please let me know if you have any ideas. Thanks in advance.


Solution

  • DataFrame.iteritems() iterates over (name, column) pairs, where each column is a pandas Series. You can check that with this example:

    for value in test.iteritems():
        print(value[0])
    

    This outputs

    Beer
    Whiskey
    

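    The loop body runs once per column, and because it never uses value, it computes and prints the whole frame on every pass. That's why you see multiple outputs of the same frame. A simple solution to your problem is to drop the loop: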

    returns = 100 * test.diff(5) / test.shift(5)
    print(returns)
                    Beer   Whiskey
    Date                          
    1978-12-29       NaN       NaN
    1979-01-02       NaN       NaN
    1979-01-03       NaN       NaN
    1979-01-04       NaN       NaN
    1979-01-05       NaN       NaN
    1979-01-08 -1.283186  0.820809
    1979-01-09 -1.234568  1.259824
    1979-01-10  0.960659  0.263128
    1979-01-11 -1.120072  0.662554
    1979-01-12 -3.503326  0.628643
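
For completeness, here is a minimal, self-contained sketch of the same calculation, assuming the sample prices from the question. It shows three things: the vectorised one-liner from above, the pct_change(5) shortcut, and a loop that actually uses the (name, column) pair it receives. Two points are assumptions about your environment rather than anything stated in the question: on pandas 2.0 and later, iteritems() and DataFrame.append() no longer exist, so the sketch uses items() and pd.concat() instead. The names pieces and looped are made up for illustration. As for the TypeError, one common way to get exactly that message is to call append() on the DataFrame class rather than on an instance (for example df = pd.DataFrame without parentheses); the question does not show how df was created, so that is only a guess.

```python
import pandas as pd

# Sample prices copied from the question.
dates = pd.to_datetime([
    "1978-12-29", "1979-01-02", "1979-01-03", "1979-01-04", "1979-01-05",
    "1979-01-08", "1979-01-09", "1979-01-10", "1979-01-11", "1979-01-12",
])
test = pd.DataFrame(
    {
        "Beer": [22.60, 22.68, 21.86, 22.32, 22.55,
                 22.31, 22.40, 22.07, 22.07, 21.76],
        "Whiskey": [86.50, 86.52, 87.41, 87.54, 87.49,
                    87.21, 87.61, 87.64, 88.12, 88.04],
    },
    index=pd.Index(dates, name="Date"),
)

# Vectorised 5-row percentage change for every column at once.
returns = 100 * test.diff(5) / test.shift(5)

# Same numbers via the built-in helper (the input has no missing values).
via_pct_change = 100 * test.pct_change(5)

# If a loop is really wanted, unpack and use the (name, column) pair.
# items() replaces iteritems(), which was removed in pandas 2.0.
pieces = {}
for name, column in test.items():
    pieces[name] = 100 * (column - column.shift(5)) / column.shift(5)

# Build the result once from the collected pieces. pd.concat stands in for
# DataFrame.append, which was also removed in pandas 2.0; note that append
# never modified df in place, it returned a new frame.
looped = pd.concat(pieces, axis=1)

print(returns.round(6))
```

All three frames hold the same values, so the loop adds nothing here beyond showing what the iterator yields. Collecting the pieces in a dict and concatenating once also avoids the copy that each append() call would have made.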