
Simulation results not matching the theoretical answer


I am trying to model an M/M/K queue system by simulating it in R, with an arrival rate of 8, a service rate of 10, and 1 server. According to the theoretical formula, the average queue length at steady state should be rho/(1-rho), where rho = lambda/mu. In my case, this should give an average queue length of 4.
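
For reference, computing the theoretical value directly in R:

rho <- 8 / 10     # traffic intensity, lambda / mu
rho / (1 - rho)   # steady-state average queue length: 4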

First, I defined the queue parameters:

set.seed(123)
library(ggplot2)
library(tidyr)
library(dplyr)
library(gridExtra)

# Simulation parameters
lambda <- 8              # Arrival rate
mu <- 10                 # Service rate
sim_time <- 200          # Simulation time
k_minutes <- 15          # Threshold for waiting time
num_simulations <- 100   # Number of simulations to run
initial_queue_size <- 0  # Initial queue size
time_step <- 1           # Time step for discretization

servers <- c(1)          # Number of servers (k)

Next, I defined a function to perform a single simulation. My approach takes the current queue length, adds random arrivals, subtracts random departures, and then repeats this process for each time step:

# single simulation
run_simulation <- function(num_servers) {
    queue <- initial_queue_size
    processed <- 0
    waiting_times <- numeric(0)
    queue_length <- numeric(sim_time)
    processed_over_time <- numeric(sim_time)
    long_wait_percent <- numeric(sim_time)
    
    for (t in 1:sim_time) {
        # Process arrivals
        arrivals <- rpois(1, lambda * time_step)
        queue <- queue + arrivals
        
        # Process departures
        departures <- min(queue, rpois(1, num_servers * mu * time_step))
        queue <- queue - departures
        processed <- processed + departures
        
        # Update waiting times
        if (length(waiting_times) > 0) {
            waiting_times <- waiting_times + time_step
        }
        if (arrivals > 0) {
            waiting_times <- c(waiting_times, rep(0, arrivals))
        }
        if (departures > 0) {
            waiting_times <- waiting_times[-(1:departures)]
        }
        
        # Record metrics
        queue_length[t] <- queue
        processed_over_time[t] <- processed
        long_wait_percent[t] <- ifelse(length(waiting_times) > 0,
                                       sum(waiting_times > k_minutes) / length(waiting_times) * 100,
                                       0)
    }
    
    return(list(queue_length = queue_length, 
                processed_over_time = processed_over_time, 
                long_wait_percent = long_wait_percent))
}

I then perform multiple runs of the simulation:

results <- lapply(servers, function(s) {
    replicate(num_simulations, run_simulation(s), simplify = FALSE)
})

And finally, I tidy everything up into data frames:

# Function to create data frames for plotting
create_plot_data <- function(results, num_servers) {
    plot_data_queue <- data.frame(
        Time = rep(1:sim_time, num_simulations),
        QueueLength = unlist(lapply(results, function(x) x$queue_length)),
        Simulation = rep(1:num_simulations, each = sim_time),
        Servers = num_servers
    )
    
    plot_data_processed <- data.frame(
        Time = rep(1:sim_time, num_simulations),
        ProcessedOrders = unlist(lapply(results, function(x) x$processed_over_time)),
        Simulation = rep(1:num_simulations, each = sim_time),
        Servers = num_servers
    )
    
    plot_data_wait <- data.frame(
        Time = rep(1:sim_time, num_simulations),
        LongWaitPercent = unlist(lapply(results, function(x) x$long_wait_percent)),
        Simulation = rep(1:num_simulations, each = sim_time),
        Servers = num_servers
    )
    
    return(list(queue = plot_data_queue, processed = plot_data_processed, wait = plot_data_wait))
}

plot_data <- lapply(seq_along(servers), function(i) {
    create_plot_data(results[[i]], servers[i])
})

plot_data_queue <- do.call(rbind, lapply(plot_data, function(x) x$queue))
plot_data_processed <- do.call(rbind, lapply(plot_data, function(x) x$processed))
plot_data_wait <- do.call(rbind, lapply(plot_data, function(x) x$wait))

My Problem: When I calculate the average queue length, I get the following:

> mean(plot_data_queue$QueueLength)
[1] 2.46215

And this average does not match the theoretical answer.

Can someone please help me understand what is lacking in my approach and what I can do to fix it?


Solution

  • Let me preface this by stating that I'm not an R programmer. However, I am a retired professor with a 40-year research career in the field of simulation. I created and taught a graduate-level course in simulation output analysis for the 20 years prior to my retirement. In support of the discussion that follows, I cite sources from publications with premier reputations: Communications of the ACM, Management Science, and the Proceedings of the Winter Simulation Conference.

    Bottom Line Up Front: You can model the M/M/1 using fixed time increments, but for reasons outlined below you need to make the time steps at least a couple of orders of magnitude smaller than what you're currently using. The probability of having multiple arrivals or departures occur within the same time step should be close to zero to avoid introducing significant bias. Switching to an event-based modeling methodology avoids the distortions created by time-stepped modeling entirely. Whether you use time-stepped or event-based modeling, you need to delete some (hard to determine) amount of initial data to mitigate the effects of serial correlation and of starting the model in an atypical state. If you wish to move on to multi-server queueing models, or models using distributions other than exponential, event-based modeling will almost certainly be a necessity.

    If you want to skip theory and jump straight to the impact of various interventions, go down to the Addendum section at the end of this post.

    I will be illustrating some of these concepts using Ruby, not because I'm trying to dissuade you from using R but because Ruby is a language which reads like pseudo-code but actually runs. I hope you can see past the particular language choice to the underlying ideas. For anybody who wants more detail, please follow the links to various original sources.

    Discrete Event vs Time Step Modeling

    M/M/k queueing models are a classic example of discrete event systems, i.e., systems in which the time axis is continuous but the state changes only at a discrete set of points in time known as "events". Nothing happens between events, so a program which moves directly from each event to the next will be computationally more efficient than one which steps along the time axis in fixed increments, regardless of whether or not anything changed.

    The larger your time step is, the more events will have to be included within that step. Unfortunately, aggregating multiple state transitions within a time step discards important information about the order of events. Using time stepping for a discrete event system can introduce significant bias in your estimates of performance measures, and in the case of path-dependent models can even lead to completely different outcomes for the model. See "The Effects of Time Advance Mechanism..." by Al Rowaei, Buss, and Lieberman (2011) for more information. As a concrete example for your model, if an M/M/1 system has two arrivals and two departures within your chosen time step, your queue length statistic will appear to be unchanged. However, depending on the order in which those events actually occurred the queue length might have changed by ±2 within the time step interval. The larger your time step is relative to the rate of event occurrences, the more information you are masking. I ran more experiments than I have time/space/inclination to include, and this seems to be the main source of bias with your model's parameterization.

    Another complication is that time series information can involve two entirely different types of performance measures. Tally-based statistics such as average delay in system can be estimated by summing up delays for individual customers and dividing by the number of customers. Queue length is an interval-based statistic, an entirely different beast: you need to account not only for the length of the queue but also for how long it remained at each length. The correct way to estimate it is

    $$\bar{L} = \frac{1}{T} \int_0^T Q(t)\,dt$$

    where T is the time period being sampled and Q(t) is the length of the queue at any point in time t. If you use time-stepped increments you convert the measure to a tally-based statistic, since all time steps contribute the same duration, but at the cost of losing all information about queue length variations and their durations within each interval.
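
    As a rough illustration of that estimator in R (a minimal sketch, assuming hypothetical vectors t of event times and q of queue lengths):

    # Time-weighted average queue length from piecewise-constant data.
    # t and q are hypothetical: t = sorted event times, q = queue length
    # observed at each event, so q[i] applies over the interval [t[i], t[i+1]).
    durations <- diff(t)
    sum(head(q, -1) * durations) / (max(t) - min(t))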

    If you're still convinced you want to use time stepping to advance the simulation clock, you will need to adjust the size of the time increment to be small enough that there is a very low likelihood of having multiple events fall within each time interval. This will greatly reduce the impact of event ordering problems, but comes at the cost of increasing the number of iterations to simulate a fixed period of time.
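
    A quick R check shows how small the step must be. Here I count arrivals only; including departures as well (a combined event rate of λ + μ) makes the true probability of multiple events per step even higher:

    lambda <- 8
    dt <- c(1, 0.1, 0.01)       # candidate time step sizes
    1 - ppois(1, lambda * dt)   # P(2+ arrivals in one step): ~0.997, ~0.191, ~0.003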

    Generic Issues for all Simulation Time Series Data

    Prof. Richard Conway wrote a paper (Conway, R. W. 1963, October. "Some Tactical Problems in Digital Simulation". Management Science 10 (1): 47–61) in which he identified some common pitfalls in analyzing computer simulation output. A quick summary is that many computer simulations produce observations that are not mutually independent. The output is a time series and hence has serial correlation. This bites us in the backside in a couple of ways.

    The first issue is that traditional statistical techniques are based on the assumption that your data are independent and identically distributed, and yield incorrect results with correlated data. Standard errors can be off by orders of magnitude, depending on the magnitude of the correlations, and consequently sample size requirements are often much larger than for independent sampling. Queueing systems with high traffic intensities may require tens of thousands or even hundreds of thousands of observations to yield good estimates of steady state behaviors.

    The second issue that serial correlation causes is called "initialization bias". Computer programs need variables to be initialized. If your computer simulation is initialized to a "convenience state", such as starting with no customers in the queue, the first customer has probability 1 of having no delay prior to getting served. We know from queueing theory that once the system achieves steady state, the probability of arriving to an empty system is 1-ρ, where ρ is the traffic intensity. This atypically low waiting time, combined with serial correlation, means that an unknown number of subsequent customers will experience below average wait times and an estimate of the average wait time that includes this data will be biased on the low side. Two approaches to avoid this are 1) make massively long runs to overwhelm the bias; or 2) discard a sufficient amount of the initial data to remove the bias. Researchers have proposed a variety of techniques to deal with these issues in the 60+ years since Conway first noted them, but no single approach is considered definitive. All of the currently known techniques share a common behavior: queueing systems with higher traffic intensities will require both discarding more data to mitigate initialization bias, and substantially increasing run-lengths to improve the precision of point or interval estimates.
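
    In R, the deletion approach is a one-liner once a cutoff has been chosen, applied for example to the queue_length vector from your simulation (the cutoff below is purely illustrative; as noted, the right amount is hard to determine and grows with traffic intensity):

    warmup <- 50                       # illustrative cutoff, not a recommendation
    mean(queue_length[-(1:warmup)])    # average only the post-warm-up observations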

    Single Server Queues w/ Little's Law and Lindley's Recurrence

    Tally-based statistics are much easier to deal with than interval-based statistics, so my personal preference would be to simulate delays in system rather than queue lengths. The good news is that there's a straightforward conversion called Little's Law: L = λW, where L is the long term average number of people in the system, W is the long term average delay in system, and λ is the arrival rate. If we can get a good estimate of the average delay, we just scale it by the arrival rate to estimate average queue length (including the customer receiving service).
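
    As a quick sanity check with your rates: the theoretical average delay in system for the M/M/1 is W = 1/(μ - λ) = 1/(10 - 8) = 0.5, so Little's Law gives L = λW = 8 × 0.5 = 4, matching the ρ/(1-ρ) value you calculated.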

    For single server queueing systems, we can use Lindley's Recurrence to generate event-based data with a simple loop:

    require 'random_variates'
    require 'quickstats'
    
    def mm1(arrival_rate, service_rate, stop_time, truncate)
      # Initialize state vars
      arrive = depart = 0.0
    
      # Create arrival and service exponential distribution generators with specified rates
      arr_dist = RV::Exponential.new(rate: arrival_rate)
      svc_dist = RV::Exponential.new(rate: service_rate)
    
      # Create a tally statistics object
      stats = QuickStats.new
    
      while true do
        # Add inter-arrival time to previous arrival time to get next arrival time
        arrive += arr_dist.next
        # Exit the loop if we're past the stopping time 
        break if arrive > stop_time
        # Get to start service at the max of our arrival time or prior customer's departure
        start = [arrive, depart].max
        # Add service time to our start time to get our departure time
        depart = start + svc_dist.next
        # Tally our time in the system if our arrival time is after specified warmup period
        stats.new_obs(depart - arrive) if arrive > truncate
      end
      return stats.sample_mean
    end
    
    # Replicate the M/M/1 queue performance 100 times, truncating data prior
    # to time 50 and thus generating 200 time units of usable data.
    meta_stats = QuickStats.new
    100.times do
      meta_stats.new_obs(mm1(8, 10, 250, 50))
    end
    # Calculate a 95% confidence interval for the expected delay in system
    puts "E[W] = #{meta_stats.sample_mean} ± #{1.96 * meta_stats.std_err}"
    
    

    This produces outcomes such as E[W] = 0.5128982831931866 ± 0.022150081089025338. Multiplying by λ = 8 yields a result consistent with the theoretical value you calculated. Note that at higher traffic intensities, you'll definitely need to run longer and truncate more data to ameliorate Conway's issues. For example, my experience with M/M/1's is that for ρ=0.95, you would want to generate 50,000+ customers and truncate several thousand to avoid initial bias.

    The core logic is:

    arrive = arrive + arr_dist.next
    start = [arrive, depart].max
    depart = start + svc_dist.next
    time_in_sys = depart - arrive
    

    Translate to R syntax, wrap in a loop, and you've got the correct logic to compute a customer's delay in system for an M/M/1 queue. Apply suitable logic to discard some of the data at the beginning to yield unbiased estimates.
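
    For example, here is a minimal sketch of such a translation (the names are mine; the replication and truncation choices mirror the Ruby driver above):

    mm1_delay <- function(arrival_rate, service_rate, stop_time, truncate) {
        arrive <- 0
        depart <- 0
        delays <- numeric(0)
        repeat {
            # Add inter-arrival time to previous arrival time to get next arrival time
            arrive <- arrive + rexp(1, arrival_rate)
            if (arrive > stop_time) break
            # Service starts at the later of our arrival and the prior customer's departure
            start <- max(arrive, depart)
            depart <- start + rexp(1, service_rate)
            # Tally time in system only after the warm-up period
            if (arrive > truncate) delays <- c(delays, depart - arrive)
        }
        mean(delays)
    }

    # 100 replications; Little's Law (L = lambda * W) converts mean delay to queue length
    w <- replicate(100, mm1_delay(8, 10, 250, 50))
    8 * mean(w)   # should land near the theoretical value of 4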

    Generalizing to K Servers for K > 1

    Lindley's Recurrence doesn't apply for multi-server queueing systems. Instead, we need to use a priority queue to explicitly schedule events so they get executed in the proper order. For a FIFO M/M/k queueing system the events are:

    1. Arrival - a new customer arrives to the system, and is appended to the queue. The next arrival gets scheduled based on the inter-arrival time distribution, and if a server is available a begin service event is scheduled with no delay
    2. Begin Service - the customer at the front of the line is removed from the queue, the number of available servers is decremented, and an end service event is scheduled based on the service time distribution.
    3. End Service - the customer departs the system, and the number of available servers is incremented. If the queue is not empty, a begin service event is scheduled with no delay.

    Event scheduling involves placing the appropriate event notices in a priority queue ordered by the scheduled times of occurrence. Execution of events is performed using an "event loop", which polls the priority queue to determine which event in the pending set should be the next one, updates the simulation clock to the time of that event, and then performs the state transitions and any additional event scheduling associated with that event. Note that this is extremely efficient, since the program doesn't need to do any work between events.
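
    Here is a minimal sketch of that event loop in base R (my own construction, not a library; the pending-event set is a plain data frame polled with which.min rather than an efficient priority queue, which is slow but makes the mechanics explicit):

    mmk_delays <- function(arrival_rate, service_rate, k, max_customers) {
        # Pending-event set: one row per scheduled event notice
        events <- data.frame(time = rexp(1, arrival_rate), type = "arrival")
        schedule <- function(t, type) {
            events <<- rbind(events, data.frame(time = t, type = type))
        }
        available <- k         # idle servers
        queue <- numeric(0)    # arrival times of customers waiting in line
        started <- 0           # customers who have begun service
        delays <- numeric(0)   # delay in queue for each customer served
        while (started < max_customers) {
            i <- which.min(events$time)    # poll for the next pending event
            clock <- events$time[i]        # advance the clock to its time
            type <- events$type[i]
            events <- events[-i, ]
            if (type == "arrival") {
                queue <- c(queue, clock)
                schedule(clock + rexp(1, arrival_rate), "arrival")
                if (available > 0) schedule(clock, "begin_service")
            } else if (type == "begin_service") {
                delays <- c(delays, clock - queue[1])   # FIFO discipline
                queue <- queue[-1]
                available <- available - 1
                started <- started + 1
                schedule(clock + rexp(1, service_rate), "end_service")
            } else {                       # end_service
                available <- available + 1
                if (length(queue) > 0) schedule(clock, "begin_service")
            }
        }
        delays
    }

    # Example matching the rho = 0.95 parameterization further below:
    # mean(mmk_delays(19, 5, 4, 50000)[-(1:5000)])   # mean delay in queue, post warm-up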

    Scheduling relationships can be represented visually using an "event graph", a notation created by Prof. Lee Schruben (Schruben, L. W. 1983. "Simulation Modeling with Event Graphs". Commun. ACM 26 (11): 957–963).

    [Figure: G/G/k event graph showing the scheduling relationships between the arrival, begin service, and end service events]

    Note that this is not a flow chart. Because of the time delays, which are adjudicated by passing through the priority queue, execution of events is not sequential along the graph.

    With a suitable event scheduling library, this event graph translates very directly to working code:

    # GenericQueue.rb
    
    require 'simplekit'
    
    # Demonstration model of a G/G/k queueing system.  There are k servers
    # and both the arrival and service processes are generic distributions.
    class GGk
      include SimpleKit
    
      # Constructor - initializes the model parameters.
      # param: arrival_dist - A distribution object used to generate the inter-arrival times.
      # param: service_dist - A distribution object used to generate the service times.
      # param: max_servers - The total number of servers in the system.
      # param: customers - number of customers to process
      def initialize(arrival_dist, service_dist, max_servers, customers, lifo = false)
        @max_servers = max_servers
        @max_customers = customers
        @arrival_dist = arrival_dist
        @service_dist = service_dist
        @LIFO = lifo
      end
    
      # Initialize the model state and schedule any necessary events.
      # Note that this particular model will terminate after the specified
      # number of customers have been through the queue.
      def init
        @num_available_servers = @max_servers
        @queue = []
        @customers = 0
        schedule(:arrival, @arrival_dist.next)
      end
    
      # An arrival event increments the queue length, schedules the next
      # arrival, and schedules a beginService event if a server is available.
      def arrival
        @customers += 1
        @queue << model_time
        schedule(:arrival, @arrival_dist.next)
        schedule(:begin_service, 0.0) if @num_available_servers > 0
      end
    
      # Start service for the next eligible customer in line, removing that
      # customer from the queue and utilizing one of the available servers.
      # An endService will be scheduled.
      def begin_service
        if @LIFO
          arrival_time = @queue.pop
        else
          arrival_time = @queue.shift
        end
        @num_available_servers -= 1
        schedule(:end_service, @service_dist.next)
        printf "%f\n", model_time - arrival_time
        schedule(:halt, 0.0) if @customers >= @max_customers
      end
    
      # Frees up an available server, and schedules a beginService if
      # anybody is waiting in line.
      def end_service
        @num_available_servers += 1
        schedule(:begin_service, 0.0) unless @queue.empty?
      end
    end
    

    which can be parameterized and run, for example, with:

    require_relative 'GenericQueue.rb'
    require 'random_variates'
    
    if ARGV.length == 1
      N = ARGV.shift.to_i
      # Instantiate a GGk object with exponentials and run it.
      arr = RV::Exponential.new(rate: 19.0)
      svc = RV::Exponential.new(rate: 5.0)
      GGk.new(arr, svc, 4, N).run
    else
      STDERR.puts "Must supply number of customers as a cmd-line argument."
    end
    

    Addendum

    I'm going to contradict the statements made elsewhere about the validity of using Poisson processes to determine the queue length. If you were using discrete event modeling you would generate the times between events using the exponential distribution for M/M systems. However, you're using time-stepped modeling, and the Poisson distribution is the correct choice to describe how many exponentially distributed events occur in a fixed interval. Your use of the min function for the departures is a technique called "thinning", used for generating non-homogeneous Poisson random variates. It acts to thin out some "potential" departure events, conditioned on there having been enough arrivals to accommodate the proposed number of departures. This elimination turns out to be valid thanks to the memoryless property of the exponential distribution: when you discard an event, the time until the next candidate still has an exponential distribution, so it remains a Poisson process but with a de facto conditionally (and temporarily) modified rate. The proof is in the pudding: with suitable choices for the time step, run length, and truncation amount as discussed above, your Poisson-based time-stepped model can yield estimates consistent with theory. Note, however, that it will not work for non-exponential service times.

    I'm still not an R programmer, so I implemented your time-stepped approach in Ruby. I also figured that an independent implementation could confirm the approach since I don't feel qualified to comment on whether your code contains bugs or not.

    I finally broke out a block of time to run some systematic experimentation addressing your specific modeling methodology. To demonstrate how the bias of the estimated queue length is affected, I varied the following factors across the values in square brackets:

    • the size of the time step [1, 0.1, 0.01];
    • the length of the run [20, 200, 1000];
    • truncation time from the start of the run [0, 2, 10, 100]; and
    • the traffic intensity rho [0.8, 0.9, 0.95].

    This yields 108 design points (unique configurations), each of which was replicated 1000 times, for a grand total of 108,000 runs.

    For plotting purposes I converted the time step size to -log10(time step), so a time step of 1 maps to 0, 0.1 maps to 1, and 0.01 maps to 2. Since we know that smaller time steps should improve the accuracy of a time-stepped simulation, accuracy improves from left to right for all factor combinations studied.

    The truncation/run-length combination also needs some explanation. The system was actually run for truncation + run-length time units but data prior to the truncation time was discarded. The observations generated within the remaining run-length are the basis of the estimated queue length.

    The theoretical values of queue length for traffic intensities 0.8, 0.9, and 0.95 are 4, 9, and 19, respectively. This was converted into the performance measure bias = observed - theoretical. Combinations of the time step, run length, truncation time, and traffic intensity should ideally yield bias estimates of zero to be deemed acceptable.

    [Plot: bias vs. time step, categorized by run length, truncation, and traffic intensity]

    The plot shows confidence intervals for estimated bias for each combination of the factors described above. I will focus on the broad trends rather than try to discuss each of the 108 data points.

    In all cases, smaller time steps improved (reduced) the bias. We want our time steps to be sufficiently small that the probability of multiple events occurring within a single interval is close to zero.

    At lower traffic intensities, longer run lengths add to the model's run time while yielding diminishing returns, and longer runs also reduce the importance of initial truncation. However, as traffic intensity increases we need both longer runs and larger truncation amounts to achieve low bias.