Computing the estimated mobility needs of a given territory for a bus route

My question is more conceptual than programming-related, but here we go.

The goal is to place an optimal set of stops for a bus route, but first we need to estimate the demand across the territory.

The input is not defined yet, but let's say we are given a set of weighted points of interest for the population of the territory.

We need to compute the mobility needs at every point along all the roads in the territory in order to place the stops mentioned above.

The approach I'm taking for this problem is:

For every point on the road, compute the sum of the weights of the N elements within a radius R (similar in spirit to the KNN algorithm).
After that, with KMeans (since it's distance-based) we could compute the K most relevant centroids in the territory to get a first sample of stops. Then optimize to fit the given optimization parameters, such as kilometres covered, route time, etc.

What do you think about this approach? Would you use other algorithms? Would you approach the problem differently?
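
The first step above can be sketched roughly as follows. This is only a minimal illustration, assuming planar coordinates (for example metres in a projected system); `WeightedPoint` and `needAt` are hypothetical names made up for this sketch, and a real implementation would probably use a spatial index instead of scanning every point.

```cpp
#include <cassert>
#include <vector>

// A point of interest with a demand weight (hypothetical type for this sketch).
struct WeightedPoint
{
    double x, y;   // planar coordinates, e.g. metres
    double weight; // relative importance of the point
};

// Sum of the weights of all points of interest within `radius` of (x, y).
double needAt(double x, double y,
              const std::vector<WeightedPoint> &pois,
              double radius)
{
    assert(radius >= 0);
    const double r2 = radius * radius;
    double total = 0;
    for (const auto &p : pois)
    {
        const double dx = p.x - x, dy = p.y - y;
        // compare squared distances so no square root is needed
        if (dx * dx + dy * dy <= r2)
            total += p.weight;
    }
    return total;
}
```

Calling `needAt` for each sampled road point gives the per-point need that the clustering step would then consume.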

In the end, the biggest mathematical challenge is to:

Define mobility needs in a given territory.
Place the stops based on the mobility needs.

Solution

Step 1: Define every potential stop.

In a city, this would be the center of each block. Are you happy to have users cross roads? If not, two stops in the center of each block, one on each side of the road. In rural areas, maybe every 250 meters along each road. Suburbs are more complex. Possibly identify main roads and place potential stops along those, every 250 meters.
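
For the rural case, placing candidates at a fixed spacing along a road could look something like the sketch below. It assumes a road is a single straight segment in planar coordinates (metres); `Point` and `stopsAlongRoad` are hypothetical names, and a curved road would need to be handled as a sequence of such segments.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

using Point = std::pair<double, double>; // (x, y) in metres

// Potential stops every `spacing` metres along the straight road from a to b,
// starting at a. The end point b is included only if it falls on the spacing.
std::vector<Point> stopsAlongRoad(const Point &a, const Point &b, double spacing)
{
    assert(spacing > 0);
    std::vector<Point> stops;
    const double dx = b.first - a.first, dy = b.second - a.second;
    const double length = std::hypot(dx, dy);
    if (length == 0)
    {
        stops.push_back(a); // degenerate road: a single candidate
        return stops;
    }
    const int count = static_cast<int>(std::floor(length / spacing));
    for (int i = 0; i <= count; i++)
    {
        const double t = i * spacing / length; // fraction of the way from a to b
        stops.push_back({a.first + t * dx, a.second + t * dy});
    }
    return stops;
}
```

For example, `stopsAlongRoad({0, 0}, {1000, 0}, 250)` yields five candidates, at 0, 250, 500, 750 and 1000 metres.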

Step 2:

Calculate the need at every potential stop.

Step 3:

Use K-Means to identify the N stops that will become actual stops ( see example implementation below ). N is an input. You might have to iterate over N values, calculating for each the worst served distance ( the largest distance from any location with need to its nearest actual stop ), until you find the N that gives an acceptable worst served distance.
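
The acceptance check for a given N might be sketched like this. It is an illustration only: `worstServedDistance` is a hypothetical helper, it uses straight-line distance rather than walking distance along roads, and it assumes the locations with need and the selected stops are available as plain coordinate pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;

// Largest straight-line distance from any location with need
// to its nearest actual stop.
double worstServedDistance(const std::vector<Point> &needLocations,
                           const std::vector<Point> &actualStops)
{
    assert(!actualStops.empty());
    double worst = 0;
    for (const auto &n : needLocations)
    {
        // distance from this need location to the closest stop
        double nearest = std::numeric_limits<double>::max();
        for (const auto &s : actualStops)
            nearest = std::min(nearest,
                               std::hypot(n.first - s.first, n.second - s.second));
        worst = std::max(worst, nearest);
    }
    return worst;
}
```

The outer loop would then start with a small N, run the stop selection, and increase N until this value drops below whatever walking distance is considered acceptable.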

Step 4:

Construct a graph of vertices ( actual stops ) and edges ( roads )

Step 5:

Use a travelling salesman ( Euclidean ) algorithm to calculate the bus route.
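
As a rough illustration of the route step, here is a nearest-neighbour ordering of the stops. Note that this is a simple heuristic swapped in for brevity, not an exact travelling salesman solver: it does not guarantee the shortest tour, and it uses straight-line distances instead of distances along the road graph. `nearestNeighbourTour` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Point = std::pair<double, double>;

// Order in which to visit the stops, beginning at stops[start] and always
// moving to the closest stop that has not been visited yet.
std::vector<std::size_t> nearestNeighbourTour(const std::vector<Point> &stops,
                                              std::size_t start = 0)
{
    if (stops.empty())
        return {};
    assert(start < stops.size());
    std::vector<bool> visited(stops.size(), false);
    std::vector<std::size_t> order;
    order.reserve(stops.size());
    std::size_t current = start;
    visited[current] = true;
    order.push_back(current);
    while (order.size() < stops.size())
    {
        double best = std::numeric_limits<double>::max();
        std::size_t next = current;
        for (std::size_t i = 0; i < stops.size(); i++)
        {
            if (visited[i])
                continue;
            const double d = std::hypot(stops[i].first - stops[current].first,
                                        stops[i].second - stops[current].second);
            if (d < best)
            {
                best = d;
                next = i;
            }
        }
        visited[next] = true;
        order.push_back(next);
        current = next;
    }
    return order;
}
```

The returned indices refer to positions in the input vector, so the result can be mapped straight back onto the actual stops. A 2-opt pass or a proper solver could then be used to improve the tour.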

Example implementation of using K-Means to identify the N stops that will become actual stops

This is a C++ implementation using the KMeans library from https://github.com/JamesBremner/KMeans

void cSolution::selectStops()
{
    // Construct the KMeans class
    KMeans KM;

    // loop over potential bus stops
    for (auto &ps : myPotentialStops)
    {
        //Each unit of need at the potential bus stop
        // is represented by "a need" at the location
        for (int k = 0; k < ps.myNeed; k++)
        {
            cDataPoint l(2);
            l.d[0] = ps.myLoc.first;
            l.d[1] = ps.myLoc.second;
            KM.Add(l);
        }
    }

    // initialize KMeans with the number of actual bus stops required
    KM.Init( myCountActualStops, false );

    // run KMeans algorithm to find clusters of need
    for( int kiter=0; kiter < 10; kiter++ )
    {
        KM.Assign();
        KM.MoveClustersToMean();
    }

    // Select bus stops nearest to cluster centers
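    for( auto& c : KM.clusters() )
    {
        float min = 1e10;   // smallest dist2() value found so far
        int nearest = -1;   // index of nearest potential stop, -1 if none found
        int ks = -1;
        for (auto &ps : myPotentialStops)
        {
            ks++;
            float td = dist2( 
                ps.myLoc,
                { c.center().d[0],  c.center().d[1]});
            if( td < min )
            {
                min = td;
                nearest = ks;
            }
        }
        // convert nearest potential bus stop to an actual bus stop.
        // ( guard against an empty list of potential stops )
        if( nearest >= 0 )
            myPotentialStops[nearest].myfActual = true;
    }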

}

To test this, I constructed a grid of roads with potential stops ( green dots ) in the middle of blocks and assigned semi-random needs to each potential stop ( numbers beside stops ). Running the above code produces this result ( red dots are the potential stops selected as actual stops ).

The complete code for this application is at https://github.com/JamesBremner/bussttop