Search code examples
roptimizationcalibrationdemographics

Optimisation of matrix in R


I'm new to optimisation/calibration of models in R, but i'm eager to learn and really need some help. My question relates to demographic modelling.

I've done some research and found help here and here but neither have quite answered my question.

I have a matrix of scalars (propensities) where each column must total to 1. These propensities are used to estimate the number of households that would arise from a given population (persons by age). The propensities model tends to overestimate the number of households in history (for which I know the true number of households).

I want to calibrate the model to minimise the error in the number of households by tweaking the propensities such that the columns still add to 1 and propensities with an initial value of zero must remain zero.

Simple example:

  # Propensities matrix
  mtx <- matrix(c(0.00,  0.00,  0.85,  0.00,  0.15, 0.35,  0.45,  0.00,  
               0.20,  0.00, 0.65,  0.15,  0.00,  0.20,  0.00), ncol = 3)

  # Population by age cohort
  pop <- c(2600, 16200, 13400)

  # True number of households
  target <- c(7000, 4500, 5500)

  # Function to optimise
  hh <- function(mtx, pop, target) {
    # Estimate living arrangements
    x <- mtx %*% pop 
    # Estimate number of households using parent cohorts (1,2 and 4)
    x <- c(x[1,1]/2, x[2,1]/2, x[4,1]) - target
    return(x)
  }

I haven't included any of my code for the optimisation/calibration step as it would be embarrassing and I've haven't been able to get anything to work!

Ideally i will have one set of propensities that generalises well for lots of different regions at the end of this process. Any advice on how i should go about achieving it? Helpful links?

Update

The snippet of code below executes the local search method as suggested by Enrico.

library(tidyverse)
library(NMOF)

data <- list(mtx = matrix(c(0.00,  0.00,  0.90,  0.00,  0.10, 0.25,  0.50,  0.00,   
                            0.25,  0.00, 0.60,  0.20,  0.00,  0.20,  0.00), ncol = 3),
             pop = c(2600, 16200, 13400),
             target = c(7190, 4650, 5920))

# True mtx
mtx.true <- matrix(c(0.00,  0.00,  0.75,  0.00,  0.25, 0.35,  0.45,  0.00,
                     0.20,  0.00, 0.65,  0.15,  0.00,  0.20,  0.00), ncol = 3)

# Function to optimise
households <- function(x, data) {

  # Estimate living arrangements
  z <- x %*% data$pop 

  # Estimate number of households using parent cohorts (1,2 and 4)
  z <- c(z[1,1]/2, z[2,1]/2, z[4,1]) - data$target
  sum(abs(z))

}

# Local search function to perturb propensities
neighbour <- function(x, data) {

  # Choose random column from mtx
  i <- sample(1:ncol(x), 1)
  # Select two non-zero propensities from mtx column
  j <- which(x[, i] != 0) %>% sample(2, replace = FALSE)

  # Randomnly select one to perturb positively 
  x[j[1], i] <- 0.1 * (1 - x[j[1], i]) + x[j[1], i]
  # Perturb second propensity to ensure mtx column adds to 1
  x[j[2], i] <- x[j[2], i] + (1 - sum(x[,i]))

  x

}

# Local search algorithm inputs 
localsearch <- list(x0 = data$mtx,             
                    neighbour = neighbour,
                    nS = 50000,                
                    printBar = FALSE)

# Execute 
now <- Sys.time()
solution <- LSopt(OF = households, algo = localsearch, data)
#> 
#> Local Search.
#> Initial solution:  2695 
#> Finished.
#> Best solution overall: 425.25
Sys.time() - now
#> Time difference of 6.33272 secs

# Inspect propensity matrices
print(solution$xbest)
#>           [,1]   [,2] [,3]
#> [1,] 0.0000000 0.3925  0.6
#> [2,] 0.0000000 0.4250  0.2
#> [3,] 0.2937976 0.0000  0.0
#> [4,] 0.0000000 0.1825  0.2
#> [5,] 0.7062024 0.0000  0.0
print(mtx.true)
#>      [,1] [,2] [,3]
#> [1,] 0.00 0.35 0.65
#> [2,] 0.00 0.45 0.15
#> [3,] 0.75 0.00 0.00
#> [4,] 0.00 0.20 0.20
#> [5,] 0.25 0.00 0.00

Thanks!


Solution

  • I can only comment on the optimisation part.

    The code you have provided is sufficient; only your objective function evaluates to a vector. You will need to transform this vector into a single number that is to be minimised, such as the sum of squares or of absolute values.

    When it comes to methods, I would try heuristics; in fact, I would try a Local-Search method. These methods operate on the solution through functions which you define; thus, you may code your solution as a matrix. More specifically, you would need two functions: the objective function (which you essentially have) and a neighbourhood function, which takes as input a solution and modifies it. In your particular case, it could take a matrix, select two none-zero elements from one column, and increase one and decrease the other. Thus, the column sum would remain unchanged.

    Perhaps the tutorial http://enricoschumann.net/files/NMOF_Rmetrics2012.pdf is of interest, with R code http://enricoschumann.net/files/NMOF_Rmetrics2012.R .