Search code examples
pythonpandasnumpydataframestocks

Split/Group pandas DataFrame by row, according to column's value continuity


I need some help to create distinct groups of rows, if a price cross a moving average (given by the changes in the "trend" column). I'm going to explain it by examples. The following is the data I have:

            close        avg      diff  trend
date                                        
2017-02-22  13.78  13.578652  0.201348     1
2017-02-23  13.80  13.580854  0.219146     1
2017-02-24  13.67  13.581741  0.088259     1
2017-03-01  13.65  13.582421  0.067579     1
2017-03-02  13.67  13.583292  0.086708     1
2017-03-03  13.60  13.583458  0.016542     1
2017-03-06  13.40  13.581633 -0.181633    -1
2017-03-07  13.48  13.580621 -0.100621    -1
2017-03-08  13.25  13.577332 -0.327332    -1
2017-03-09  12.95  13.571090 -0.621090    -1
2017-03-10  13.40  13.569387 -0.169387    -1
2017-03-13  13.35  13.567204 -0.217204    -1
2017-03-14  13.19  13.563451 -0.373451    -1
2017-03-15  13.85  13.566302  0.283698     1
2017-03-16  13.91  13.569722  0.340278     1
2017-03-17  13.40  13.568033 -0.168033    -1
2017-03-21  13.19  13.565079 -0.375079    -1
2017-03-22  12.95  13.558959 -0.608959    -1
2017-03-23  12.82  13.551606 -0.731606    -1
2017-03-24  12.93  13.545421 -0.615421    -1
2017-03-27  12.90  13.538999 -0.638999    -1
2017-03-28  13.20  13.535626 -0.335626    -1
2017-03-29  13.22  13.532485 -0.312485    -1
2017-03-30  13.20  13.529177 -0.329177    -1
2017-03-31  13.28  13.526698 -0.246698    -1
2017-04-03  13.27  13.524143 -0.254143    -1
2017-04-04  13.20  13.520918 -0.320918    -1
2017-04-05  13.30  13.518720 -0.218720    -1
2017-04-06  13.73  13.520822  0.209178     1
2017-04-07  13.61  13.521710  0.088290     1
2017-04-10  13.78  13.524280  0.255720     1
2017-04-11  13.66  13.525630  0.134370     1

I want distinct groups when the trend column equals to 1, and distinct groups for trend equals to -1. Like this:

            close        avg      diff  trend
date                                        
------- GROUP 1 for trend == 1: --------
2017-02-22  13.78  13.578652  0.201348     1
2017-02-23  13.80  13.580854  0.219146     1
2017-02-24  13.67  13.581741  0.088259     1
2017-03-01  13.65  13.582421  0.067579     1
2017-03-02  13.67  13.583292  0.086708     1
2017-03-03  13.60  13.583458  0.016542     1
------- GROUP 2 for trend == 1: --------
2017-03-15  13.85  13.566302  0.283698     1
2017-03-16  13.91  13.569722  0.340278     1
------- GROUP 3 for trend == 1: --------
2017-04-06  13.73  13.520822  0.209178     1
2017-04-07  13.61  13.521710  0.088290     1
2017-04-10  13.78  13.524280  0.255720     1
2017-04-11  13.66  13.525630  0.134370     1

And the same for trend == -1:

            close        avg      diff  trend
date                                        
------- GROUP 1 for trend == -1: --------
2017-03-06  13.40  13.581633 -0.181633    -1
2017-03-07  13.48  13.580621 -0.100621    -1
2017-03-08  13.25  13.577332 -0.327332    -1
2017-03-09  12.95  13.571090 -0.621090    -1
2017-03-10  13.40  13.569387 -0.169387    -1
2017-03-13  13.35  13.567204 -0.217204    -1
2017-03-14  13.19  13.563451 -0.373451    -1
------- GROUP 2 for trend == -1: --------
2017-03-17  13.40  13.568033 -0.168033    -1
2017-03-21  13.19  13.565079 -0.375079    -1
2017-03-22  12.95  13.558959 -0.608959    -1
2017-03-23  12.82  13.551606 -0.731606    -1
2017-03-24  12.93  13.545421 -0.615421    -1
2017-03-27  12.90  13.538999 -0.638999    -1
2017-03-28  13.20  13.535626 -0.335626    -1
2017-03-29  13.22  13.532485 -0.312485    -1
2017-03-30  13.20  13.529177 -0.329177    -1
2017-03-31  13.28  13.526698 -0.246698    -1
2017-04-03  13.27  13.524143 -0.254143    -1
2017-04-04  13.20  13.520918 -0.320918    -1
2017-04-05  13.30  13.518720 -0.218720    -1

Any hint about how to accomplish that will be welcomed. I would be delighted if there is a procedural solution (without using loops).


Solution

  • Using shift() and cumsum() and the fact that bools are equal to 1, you can build a group which checks for continuity:

    Code:

    df.groupby((df.trend != df.trend.shift()).cumsum())
    

    Test Code:

    df = pd.read_fwf(StringIO(u"""
        date         close        avg      diff  trend
        2017-03-01  13.65  13.582421  0.067579     1
        2017-03-02  13.67  13.583292  0.086708     1
        2017-03-03  13.60  13.583458  0.016542     1
        2017-03-06  13.40  13.581633 -0.181633    -1
        2017-03-07  13.48  13.580621 -0.100621    -1
        2017-03-08  13.25  13.577332 -0.327332    -1
        2017-03-09  12.95  13.571090 -0.621090     1
        2017-03-10  13.40  13.569387 -0.169387     1
        2017-03-13  13.35  13.567204 -0.217204    -1
        2017-03-14  13.19  13.563451 -0.373451    -1
        2017-03-15  13.85  13.566302  0.283698     1
        2017-03-16  13.91  13.569722  0.340278     1
        2017-03-17  13.40  13.568033 -0.168033     1"""),
                     header=1).set_index(['date'])
    
    for group in df.groupby((df.trend != df.trend.shift()).cumsum()):
        print(group)
    

    Results:

    (1,             close        avg      diff  trend
    date                                         
    2017-03-01  13.65  13.582421  0.067579      1
    2017-03-02  13.67  13.583292  0.086708      1
    2017-03-03  13.60  13.583458  0.016542      1)
    (2,             close        avg      diff  trend
    date                                         
    2017-03-06  13.40  13.581633 -0.181633     -1
    2017-03-07  13.48  13.580621 -0.100621     -1
    2017-03-08  13.25  13.577332 -0.327332     -1)
    (3,             close        avg      diff  trend
    date                                         
    2017-03-09  12.95  13.571090 -0.621090      1
    2017-03-10  13.40  13.569387 -0.169387      1)
    (4,             close        avg      diff  trend
    date                                         
    2017-03-13  13.35  13.567204 -0.217204     -1
    2017-03-14  13.19  13.563451 -0.373451     -1)
    (5,             close        avg      diff  trend
    date                                         
    2017-03-15  13.85  13.566302  0.283698      1
    2017-03-16  13.91  13.569722  0.340278      1
    2017-03-17  13.40  13.568033 -0.168033      1)