I have GPS track data (longitudes and latitudes) for a bus route, split into two parts.
The first part contains the coordinates (longitude and latitude) of the journey's stations. These are the actual station coordinates, and each of them must be visited when the bus makes its journey.
The second part contains the recorded GPS coordinates (longitude and latitude), and it is probably about twice as large as the first part. Every time the bus makes its journey, it stops at these stations (whose coordinates have been given). I want to check whether the bus completed its journey by comparing the stations it visited (real-time coordinates) with the first part (scheduled coordinates).
The second part has almost twice as many coordinates, and many of them are very close to each other: about 5-8 coordinates represent the same station, e.g. (104578, 105888) and (104579, 105890).
What would be the right way to decide that a certain number of coordinates represent the same station? This problem can probably be solved using k-nearest neighbours or k-means somehow.
The problem may not be well defined, but I will try to explain more if asked.
Have you considered a simple thresholding approach, i.e. merging coordinates that are within a certain distance of each other? It seems you are well able to choose such a threshold.
The problem with clustering is that it will try to discover structure in your dataset.
What you seem to be interested in is simply merging objects that are within a certain distance of each other. There is no "structure" that you want to discover. You want to do preprocessing, not clustering.
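A minimal sketch of the thresholding idea in Python might look like the following. It assumes the coordinates are planar units (as the integer example in the question suggests), so that Euclidean distance is meaningful; for raw longitude/latitude in degrees you would probably want a geodesic distance such as haversine instead. The function names, the greedy "join the first group whose centroid is close enough" rule, and the threshold of 10 units are all illustrative assumptions, not something given in the question.

```python
import math


def centroid(group):
    """Mean position of a non-empty list of (x, y) points."""
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    return (sum(xs) / len(group), sum(ys) / len(group))


def merge_close_points(points, threshold):
    """Greedily merge points that lie within `threshold` of a group's centroid.

    Each point joins the first existing group whose current centroid is
    within `threshold`; otherwise it starts a new group. Returns one
    centroid per group, in order of first appearance.
    """
    groups = []
    for p in points:
        for g in groups:
            cx, cy = centroid(g)
            if math.hypot(p[0] - cx, p[1] - cy) <= threshold:
                g.append(p)
                break
        else:
            # No existing group was close enough: start a new one.
            groups.append([p])
    return [centroid(g) for g in groups]


def journey_completed(scheduled, visited, threshold):
    """True if every scheduled station has a visited point within `threshold`."""
    return all(
        any(math.hypot(s[0] - v[0], s[1] - v[1]) <= threshold for v in visited)
        for s in scheduled
    )


# Hypothetical data: the first two points are the ones from the question.
recorded = [
    (104578, 105888), (104579, 105890), (104577, 105889),
    (200000, 200000), (200001, 200002),
]
merged = merge_close_points(recorded, threshold=10)
scheduled = [(104578, 105889), (200000, 200000)]
print(merged)
print(journey_completed(scheduled, merged, threshold=10))
```

The greedy rule depends on the order of the points and on the threshold, so it is worth checking the merged result against a few stations you know. If the threshold is comfortably smaller than the distance between neighbouring stations, the order should matter very little.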