python pandas machine-learning k-means coordinate-systems

How do I apply K-means clustering to polygons given as sets of latitude and longitude coordinates?

I have some geometry coordinates.

E.g. POLYGON ((15927.81230000034 36864.30379999988, 15926.792399999686 36861.118400000036, 15923.173100000247 36862.27639999986, 15924.19299999997 36865.4617999997, 15927.81230000034 36864.30379999988))

I converted each pair into its latitude/longitude equivalent:

POLYGON = [(1.1603180714482149, 103.9129638389025), (1.160308848641466, 103.912935217908), (1.1602761166689228, 103.91294562159307), (1.1602853394755797, 103.91297424258724), (1.1603180714482149, 103.9129638389025)]

Normally for k means clustering

From what I understand, one polygon represents one building. So how do I convert one polygon, which consists of several lat/lon pairs, into a single lat/lon point that represents the building?

Solution

If you're using shapely, you can convert the polygon's vertices into the corresponding X matrix like this:

import numpy as np
from shapely.geometry import Polygon

polygon = Polygon([
    (1.1603180714482149, 103.9129638389025),
    (1.160308848641466, 103.912935217908),
    (1.1602761166689228, 103.91294562159307),
    (1.1602853394755797, 103.91297424258724),
    (1.1603180714482149, 103.9129638389025)
])

# Stack the exterior ring's x and y sequences into an (n_points, 2) array
X = np.vstack(polygon.exterior.xy).T
print(X)

Result:

[[  1.16031807 103.91296384]
 [  1.16030885 103.91293522]
 [  1.16027612 103.91294562]
 [  1.16028534 103.91297424]
 [  1.16031807 103.91296384]]

This two-column array, with one row per point, is the input format that sklearn's KMeans expects.
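
Note that the array above has one row per vertex, not one row per building. If the goal is a single point per building, as the question asks, one option is to use each polygon's centroid and cluster those. The following is a minimal sketch under stated assumptions, not a definitive implementation: the `buildings` list, the offsets used to generate extra polygons, and the choice of `n_clusters=2` are all made up for illustration; only the first polygon comes from the question.

```python
import numpy as np
from shapely.geometry import Polygon
from sklearn.cluster import KMeans

# The polygon from the question, as (lat, lon) pairs.
base = [
    (1.1603180714482149, 103.9129638389025),
    (1.160308848641466, 103.912935217908),
    (1.1602761166689228, 103.91294562159307),
    (1.1602853394755797, 103.91297424258724),
    (1.1603180714482149, 103.9129638389025),
]

# Hypothetical extra buildings: shifted copies of the polygon above,
# invented only so that there is something to cluster.
offsets = [(0.0, 0.0), (0.0001, 0.0001), (0.01, 0.01), (0.0101, 0.0101)]
buildings = [
    [(lat + dlat, lon + dlon) for lat, lon in base]
    for dlat, dlon in offsets
]

# One row per building: the centroid of its polygon.
# The coordinate order follows the input, so the columns are (lat, lon) here.
centroids = np.array([
    [poly.centroid.x, poly.centroid.y]
    for poly in (Polygon(coords) for coords in buildings)
])

# Cluster the building centroids; n_clusters=2 is an arbitrary choice.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(centroids)
labels = kmeans.labels_
print(centroids.shape)
print(labels)
```

KMeans uses Euclidean distance, which is only an approximation on raw latitude/longitude degrees. If the original projected coordinates are in metres, computing the centroids on those instead may give more meaningful distances.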