Search code examples
pythonpandasdataframealgorithmic-tradingtrading

Python Pandas Dataframe: How to get Trend Lines data from existing dataframe columns?


I would like to be able to create columns containing trend line data, from two points from my other columns, e.g. CLOSE price to CLOSE price 5 days later.

I would like the trend line to look forward and backwards, extending past the 2 points it is plotted from.

Right now I am NOT bothered about plotting them in mplfinance or matplotlib, I just want the trend line numbers so that I can work with them.

Any help is greatly appreciated :)


Solution

  • The basic idea is to determine the constants in the formula for a straight line, and use the formula to extend your trend line forward and/or backward. The formula for a straight line is:

    y = mx + b
    

    where m is the slope and b is the y-intercept (the value of y when x == 0).

    This formula assumes that both x and y are continuous.


    Given two points (two x,y pairs), x1,y1 and x2,y2, first calculate the slope:

    slope = (y2-y1) / (x2-x1)
    

    Then use either of the two points to calculate the y-intercept:

    y-intercept = y1 - slope*x1
    

    Now you can use y = mx + b to calculate the straight line y for all relevant values of x


    With time-series data, there are two problems with the above:


    The first problem is in a time series y is typically a float while x is typically some flavor of datetime object. This means the slope (y2-y1)/(x2-x1) would be a float divided by a DatetimeDelta, which most programming languages won't allow (unless you can override the division operator; but if so, with what algorithm?).

    The solution to this problem is to first convert the datetimes into a float representation that is proportional and continuous with respect to time (for example, UNIX time, or matplotlib datetime). Now you can calculate the slope and y-intercept.


    The second problem happens if your time series is discontinuous with time. This is common with trade data if you choose to ignore non-trading days. Ignoring non-trading days causes the x-axis to be discontinuous with time. This means that the slope calculation (y2-y1)/(x2-x1) will be incorrect.

    There are two solutions to this discontinuity problem:

    1. Don't worry about it because for short periods of time, even with some discontinuities, the slope calculation may be a close enough approximation.
    2. If the non-trading days are completely missing from your data frame, recognize that, while the data may be discontinuous with resepect to time, it is continuous with respect to the row number of your dataframe. Therefore the slope can be calculated using the dataframe row numbers in the denominator.

    A more detailed discussion, along with some code examples and plots, can be found by clicking here.

    hth