python pandas machine-learning k-means coordinate-systems

How do I apply K-means clustering to polygons given as sets of latitude and longitude pairs?


I have some geometry coordinates, e.g.:

POLYGON ((15927.81230000034 36864.30379999988, 15926.792399999686 36861.118400000036, 15923.173100000247 36862.27639999986, 15924.19299999997 36865.4617999997, 15927.81230000034 36864.30379999988))

I converted each pair into its latitude and longitude equivalent:

POLYGON = [(1.1603180714482149, 103.9129638389025), (1.160308848641466, 103.912935217908), (1.1602761166689228, 103.91294562159307), (1.1602853394755797, 103.91297424258724), (1.1603180714482149, 103.9129638389025)]

Normally for k means clustering

From what I understand, one polygon represents one building. So how do I convert one polygon, which consists of several lat/lon pairs, into a single lat/lon pair that represents the building?


Solution

  • If you're using shapely, you can convert the polygon's vertices into the corresponding X matrix like this:

    import numpy as np
    from shapely.geometry import Polygon
    
    polygon = Polygon([
        (1.1603180714482149, 103.9129638389025),
        (1.160308848641466, 103.912935217908),
        (1.1602761166689228, 103.91294562159307),
        (1.1602853394755797, 103.91297424258724),
        (1.1603180714482149, 103.9129638389025)
    ])
    
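    # exterior.xy holds the x and y coordinate sequences of the outer ring;
    # stack them and transpose to get an array of shape (n_vertices, 2).
    X = np.vstack(polygon.exterior.xy).T
    print(X)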
    

    Result:

    [[  1.16031807 103.91296384]
     [  1.16030885 103.91293522]
     [  1.16027612 103.91294562]
     [  1.16028534 103.91297424]
     [  1.16031807 103.91296384]]
    

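    This two-column array, with one row per vertex, is the input format that sklearn's KMeans expects. Note that the last row repeats the first because the polygon ring is closed.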
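
The array above holds the vertices of a single polygon, so clustering it directly would group corners rather than buildings. To get one point per building, as the question asks, one option is to take each polygon's centroid (shapely exposes it as `polygon.centroid`) and cluster those points instead. The following is only a minimal sketch under assumptions: the `square` helper and the four footprints are made up for illustration, and `n_clusters=2` is an arbitrary choice.

```python
import numpy as np
from shapely.geometry import Polygon
from sklearn.cluster import KMeans


def square(lat, lon, half=0.00002):
    """Hypothetical helper: a small square footprint centred on (lat, lon)."""
    return Polygon([
        (lat - half, lon - half),
        (lat - half, lon + half),
        (lat + half, lon + half),
        (lat + half, lon - half),
    ])


# Made-up building footprints: two close together, two further away.
buildings = [
    square(1.1603, 103.9129),
    square(1.1604, 103.9131),
    square(1.1650, 103.9200),
    square(1.1651, 103.9202),
]

# One row per building: the centroid of its footprint. The columns are in
# (lat, lon) order because that is the order the vertices were given in.
X = np.array([[b.centroid.x, b.centroid.y] for b in buildings])

# Cluster the buildings; n_clusters=2 is an assumption for this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

print(X)
print(labels)
```

KMeans uses Euclidean distance, so treating degrees of latitude and longitude as plain x/y values is only an approximation. Over a small area it may be acceptable; otherwise it is probably safer to compute the centroids and run the clustering on the original projected coordinates.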