Tags: cluster-analysis, k-means, cplex, opl

Problems with creating a mathematical clustering model with an additive criterion in CPLEX OPL Studio


I'm trying to create a model in CPLEX OPL Studio for clustering with an additive criterion, but I get a number of errors that I don't know how to fix, because I have very little experience with OPL Studio.

Initially I had a loss function that measures the deviation of a client from its cluster center. I then substituted it into the general loss function to obtain the objective. There is also a formula for calculating the cluster centers.
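
The formulas themselves did not survive in the post. Read back from the OPL code below, they are presumably the following; this is a reconstruction, and the notation $d_{ij}$ for `data[i][j]` is an assumption made here, not the asker's.

$$
\min_{x,\mu}\; \sum_{c=1}^{k} \sum_{i=1}^{n} \sum_{j=1}^{m} x_{ic}\,\bigl(d_{ij} - \mu_{cj}\bigr)^{2}
$$

$$
\sum_{c=1}^{k} x_{ic} = 1 \quad (i = 1,\dots,n), \qquad
\mu_{cj} = \frac{\sum_{i=1}^{n} x_{ic}\, d_{ij}}{\sum_{i=1}^{n} x_{ic}} \quad (c = 1,\dots,k;\; j = 1,\dots,m), \qquad
x_{ic} \in \{0, 1\}
$$

The ratio that defines $\mu_{cj}$ divides one expression in the decision variables by another, and that is the expression quoted in the error message further down.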

`   // Number of clients, number of features, and number of clusters
   int n = ...; // Number of clients
   int m = ...; // Number of features
   int k = ...; // Number of clusters

   // Client data: feature values for each client
   float data[i in 1..n][j in 1..m] = ...;

   // Binary variables: x[i][c] = 1 if client i is assigned to cluster c
   dvar boolean x[1..n][1..k];

   // Variables for the center of each cluster for each feature
   dvar float mu[1..k][1..m];

   // Model
   minimize
       sum(c in 1..k, i in 1..n, j in 1..m) x[i][c] * (data[i][j] - mu[c][j])^2;

   // Constraints
   subject to {
       // Each client belongs to exactly one cluster
        forall(i in 1..n)
           sum(c in 1..k) x[i][c] == 1;
    
       // Definition of cluster centers
       forall(c in 1..k, j in 1..m)
            mu[c][j] == sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
    }`

I tried to write code for these formulas, but ran into syntax problems. For example:

`CPLEX (default) failed to parse expression: forall(c in 1..3, j in 1..4) mu[c][j] == sum(i in 1..5) (x[i][c]*data[i][j]) / (sum(i in 1..5) x[i][c])`

It might be worth adding more constraints, but I'm a little confused.


Solution

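  • Within CPLEX I would rather use the Constraint Programming engine (CP Optimizer, selected with `using CP;`), which accepts the division of one decision expression by another that the default engine rejected in your model. CP Optimizer has no float decision variables, so `mu` becomes a decision expression (`dexpr`) instead of a `dvar`.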

    using CP;
    
    // Number of clients, number of features, and number of clusters
    int n = 3; // Number of clients
    int m = 4; // Number of features
    int k = 2; // Number of clusters
    
    // Client data: feature values for each client
    float data[i in 1..n][j in 1..m] = i*j;
    
    // Binary variables: x[i][c] = 1 if client i is assigned to cluster c
    dvar boolean x[1..n][1..k];
    
    // Center of each cluster for each feature, as a decision expression
    dexpr float mu[c in 1..k][j in 1..m] =
        sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
    
    // Model
    minimize
        sum(c in 1..k, i in 1..n, j in 1..m) x[i][c] * (data[i][j] - mu[c][j])^2;
    
    // Constraints
    subject to {
        // Each client belongs to exactly one cluster
        forall(i in 1..n)
            sum(c in 1..k) x[i][c] == 1;
    
        // Definition of cluster centers
        forall(c in 1..k, j in 1..m)
            mu[c][j] == sum(i in 1..n) x[i][c] * data[i][j] / sum(i in 1..n) x[i][c];
    }
    

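    This works fine.

    Or, with a better formulation that uses a single integer variable x[i] per client (the index of its cluster) instead of a boolean matrix, so that the "exactly one cluster" constraint is no longer needed: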

    using CP;
    
    // Number of clients, number of features, and number of clusters
    int n = 3; // Number of clients
    int m = 4; // Number of features
    int k = 2; // Number of clusters
    
    // Client data: feature values for each client
    float data[i in 1..n][j in 1..m] = i*j;
    
    // x[i] = index of the cluster that client i is assigned to
    dvar int x[1..n] in 1..k;
    
    // Center of each cluster for each feature, as a decision expression
    dexpr float mu[c in 1..k][j in 1..m] =
        sum(i in 1..n) (x[i]==c) * data[i][j] / sum(i in 1..n) (x[i]==c);
    
    // Model
    minimize
        sum(c in 1..k, i in 1..n, j in 1..m) (x[i]==c) * (data[i][j] - mu[c][j])^2;
    
    // Constraints
    subject to {
        // Definition of cluster centers
        forall(c in 1..k, j in 1..m)
            mu[c][j] == sum(i in 1..n) (x[i]==c) * data[i][j] / sum(i in 1..n) (x[i]==c);
    }
    

    See https://github.com/AlexFleischerParis/opltipsandtricks/blob/master/kmeans.mod
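
For an instance this small, the objective value reported by the solver can be cross-checked by plain enumeration outside CPLEX. The following is only a sketch in Python, not part of the OPL model or of the linked repository. It assumes the same synthetic data `data[i][j] = i*j`, uses integer cluster labels as in the second formulation, and skips assignments that leave a cluster empty (where the center would be a division by zero). The helper names `cost` and `brute_force` are hypothetical.

```python
from itertools import product

n, m, k = 3, 4, 2  # clients, features, clusters (same values as the OPL model)

# Same synthetic data as the OPL model: data[i][j] = i*j with 1-based i and j
data = [[i * j for j in range(1, m + 1)] for i in range(1, n + 1)]


def cost(labels):
    """Additive criterion for one assignment, or None if a cluster is empty."""
    total = 0.0
    for c in range(1, k + 1):
        members = [row for row, label in zip(data, labels) if label == c]
        if not members:
            return None  # center undefined: the model would divide by zero
        # Center of cluster c: per-feature mean over its members
        centre = [sum(column) / len(members) for column in zip(*members)]
        total += sum((value - mu) ** 2
                     for row in members
                     for value, mu in zip(row, centre))
    return total


def brute_force():
    """Try every labelling of the n clients and keep the first cheapest one."""
    best_cost, best_labels = None, None
    for labels in product(range(1, k + 1), repeat=n):
        current = cost(labels)
        if current is not None and (best_cost is None or current < best_cost):
            best_cost, best_labels = current, labels
    return best_cost, best_labels


best_cost, best_labels = brute_force()
print(best_cost, best_labels)  # 15.0 (1, 1, 2)
```

Enumeration visits k^n labellings, so it is only usable as a sanity check on toy data like this. The optimum is not unique here: grouping clients 2 and 3 together gives the same cost of 15, and swapping the two cluster labels does too.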