
Shapely can't find intersection points that definitely exist


In my code I have two lines (line1 and line2) that definitely intersect one another visually. They are simply moving-average lines for a stock, built from data scraped from yfinance, but when I run line1.intersection(line2), it returns no intersection points. I have attached some photos for clarity (note that line1 = SMA-50 and line2 = SMA-200 in the graph legend).

[Image: the visual intersection of SMA-50 and SMA-200]

This is the current section of code:

    import numpy as np
    import pandas as pd
    from shapely.geometry import LineString

    # sma50 and sma200 are the 50- and 200-day moving averages (pandas Series indexed by Date, computed earlier, not shown)
    new50 = sma50.to_frame().reset_index()  # set series to dataframe & preserve date as column
    new200 = sma200.to_frame().reset_index()

    new50.Date = pd.to_numeric(pd.to_datetime(new50.Date.dt.date))  # removing time from datetime, converting to numeric
    new200.Date = pd.to_numeric(pd.to_datetime(new200.Date.dt.date))
    new50.Close = round(new50.Close, 1)  # round to see if I can obtain intersections...
    new200.Close = round(new200.Close, 1)

    line1 = LineString(np.column_stack([new50.Close, new50.Date]))
    line2 = LineString(np.column_stack([new200.Close, new200.Date]))

    print(line1.intersection(line2))

This is what one of the dataframes (new200) looks like, and the response I get when printing line1.intersection(line2):

                        Date  Close
    0    1644796800000000000    NaN
    1    1644883200000000000    NaN
    2    1644969600000000000    NaN
    3    1645056000000000000    NaN
    4    1645142400000000000    NaN
    ..                   ...    ...
    495  1707091200000000000  442.7
    496  1707177600000000000  444.7
    497  1707264000000000000  446.9
    498  1707350400000000000  449.0
    499  1707436800000000000  451.3

    [500 rows x 2 columns]
    LINESTRING Z EMPTY

I have tried rounding the Close values to one decimal place, hoping that would produce an intersection, but it did not help. I also removed the time component from the datetimes, in case the data was too precise to intersect. I've also searched online for solutions, but haven't had much luck.

I have seen a potential solution using NumPy's np.diff(), which I will test next. However, I would like to understand why Shapely fails to recognise this intersection, or whether the mistake is my own.
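The np.diff() idea mentioned above usually works on the sign of the difference between the two moving averages rather than on geometry: wherever that sign flips between consecutive rows, the lines have crossed. A minimal sketch, assuming the two SMAs are pandas Series sharing the same DatetimeIndex; crossover_dates is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def crossover_dates(sma_fast: pd.Series, sma_slow: pd.Series) -> pd.Index:
    """Return the dates on which sma_fast crosses sma_slow (hypothetical helper)."""
    # Align both series on their dates and drop rows where either SMA is still NaN
    both = pd.concat({"fast": sma_fast, "slow": sma_slow}, axis=1).dropna()
    diff = both["fast"].to_numpy() - both["slow"].to_numpy()
    # +1 where fast >= slow, -1 where fast < slow (ties count as "above")
    side = np.where(diff >= 0, 1, -1)
    # np.diff is non-zero exactly where the side flips between consecutive rows
    flips = np.flatnonzero(np.diff(side) != 0)
    # The cross happens between rows i and i + 1; report the later date
    return both.index[flips + 1]


# Usage, assuming sma50 and sma200 share the same DatetimeIndex:
# crosses = crossover_dates(sma50, sma200)
```

This only reports the first row after each crossover; unlike Shapely, it does not interpolate the exact crossing point between two rows.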

Any help would be appreciated. Thanks all!


Solution

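  • NaN values are invalid as coordinates, and calling functions like intersection on invalid geometries results in undefined behaviour, which is probably what you are seeing. Your frames start with NaN Close values because a moving average has no value until its window is full, as the first rows of new200 show.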

    Normally, the following warning should be printed when the LineStrings are created, because of the NaN values:

    RuntimeWarning: invalid value encountered in linestrings
    

    If you use dropna() to remove the rows with invalid NaN values for coordinates before creating the linestrings, you should get the expected result:

    import numpy as np
    import pandas as pd
    from shapely import LineString
    
    dates = [
        1644796800000000000,
        1644883200000000000,
        1644969600000000000,
        1645056000000000000,
    ]
    new50 = pd.DataFrame({"Date": dates, "Close": [np.nan, 410.0, 420.0, 500.0]})
    new200 = pd.DataFrame({"Date": dates, "Close": [np.nan, 550.0, 500.0, 420.0]})
    
    line1 = LineString(np.column_stack([new50.Close, new50.Date]))
    line2 = LineString(np.column_stack([new200.Close, new200.Date]))
    print(f"original: {line1.intersection(line2)}")
    
    new50 = new50.dropna()
    new200 = new200.dropna()
    
    line1 = LineString(np.column_stack([new50.Close, new50.Date]))
    line2 = LineString(np.column_stack([new200.Close, new200.Date]))
    print(f"after dropna: {line1.intersection(line2)}")
    

    Result:

    original: LINESTRING EMPTY
    after dropna: POINT (460 1645012800000000000)
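Since Date was converted to nanoseconds since the epoch, the y coordinate of each intersection point is also a nanosecond count, which is hard to read. A minimal sketch for turning the intersection back into dates, assuming Shapely 2.0 or later (for `from shapely import LineString` and `shapely.get_coordinates`) and frames shaped like new50/new200 above (numeric Date, float Close); sma_line and crossing_points are hypothetical helper names:

```python
import numpy as np
import pandas as pd
import shapely
from shapely import LineString


def sma_line(df: pd.DataFrame) -> LineString:
    """Build a (Close, Date) LineString, skipping rows where Close is NaN."""
    clean = df.dropna(subset=["Close"])
    return LineString(np.column_stack([clean["Close"], clean["Date"]]))


def crossing_points(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Intersect two SMA lines and convert the y coordinates back to timestamps."""
    inter = sma_line(df_a).intersection(sma_line(df_b))
    # All vertices of the result as an (n, 2) array; an empty result gives n == 0
    coords = shapely.get_coordinates(inter)
    return pd.DataFrame({
        "Close": coords[:, 0],
        # y is nanoseconds since the epoch; round before casting to int64
        "Date": pd.to_datetime(np.rint(coords[:, 1]).astype("int64")),
    })
```

If the two lines overlap along a segment instead of crossing, `shapely.get_coordinates` also returns that segment's endpoints, so treat the output as candidate crossings rather than a strict list of single points.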