Iterative quantile estimation in Matlab

I'm trying to implement an interative algorithm to estimate quantiles in data that is generated from a Monte-Carlo simulation. I want to make it iterative, because I have many iterations and variables so storing all data points and using Matlab's quantile function would take much of the memory that I actually need for the simulation.

I found some approaches based on the Robbin-Monro process, given by

Formula

The implementation with a control sequence c_t = c / t where c is constant is quite straight forward. In the cited paper, they show that c = 2 * sqrt(2 * pi) gives quite good results, at least for the median. But they also propose an adaptive approach based on an estimation of the histogram. Unfortunately, I haven't figured out how to implement this adaptation yet.

I tested the implementation with a constant c for three test samples with 10.000 data points. The value c = 2 * sqrt(2 * pi) did not work well for me, but c = 100 looks quite good for the test samples. However, this selction is not very robust and failed in the actual Monte-Carlo simulation giving results wide off the mark.

probabilities = [0.1, 0.4, 0.7];
controlFactor = 100;
quantile = zeros(size(probabilities));
indicator = zeros(size(probabilities));
for index = 1:length(data)
    control = controlFactor / index;
    indices = (data(index) >= quantile);
    indicator(indices) = probabilities(indices);
    indices = (data(index) < quantile);
    indicator(indices) = probabilities(indices) - 1;
    quantile = quantile + control * indicator;
end

Is there a more robust solution for iterative quantile estimation or does anyone have an implementation for an adaptive approach with small memory consumption?

Solution

After trying some of the adaptive iterative approaches that I found in literature without great success (not sure, if I did it right), I came up with a solution that gives me good results for my test samples and also for the actual Monte-Carlo-Simulation.

I buffer a subset of simulation results, compute the sample quantiles and average over all subset sample quantiles in the end. This seems to work quite well and without tuning many parameters. The only parameter is the buffer size which is 100 in my case.

The results converge quite fast and increasing sample size does not improve the results dramatically. There seems to be a small but constant bias that presumably is the averaged error of the subset sample quantiles. And that is the downside of my solution. By choosing the buffer size, one fixes the achievable accuracy. Increasing the buffer size reduces this bias. In the end, it seems to be a memory and accuracy tradeoff.

% Generate data
rng('default');
data = sqrt(0.5) * randn(10000, 1) + 5 * rand(10000, 1) + 10;

% Set parameters
probabilities = 0.2;

% Compute reference sample quantiles
quantileEstimation1 = quantile(data, probabilities);

% Estimate quantiles with computing the mean over a number of subset
% sample quantiles
subsetSize = 100;
quantileSum = 0;
for index = 1:length(data) / subsetSize;

    quantileSum = quantileSum + quantile(data(((index - 1) * subsetSize + 1):(index * subsetSize)), probabilities);

end
quantileEstimation2 = quantileSum / (length(data) / subsetSize);

% Estimate quantiles with iterative computation
quantileEstimation3 = zeros(size(probabilities));
indicator = zeros(size(probabilities));
controlFactor = 2 * sqrt(2 * pi);
for index = 1:length(data)

    control = controlFactor / index;
    indices = (data(index) >= quantileEstimation3);
    indicator(indices) = probabilities(indices);
    indices = (data(index) < quantileEstimation3);
    indicator(indices) = probabilities(indices) - 1;
    quantileEstimation3 = quantileEstimation3 + control * indicator;

end

fprintf('Reference result: %f\nSubset result: %f\nIterative result: %f\n\n', quantileEstimation1, quantileEstimation2, quantileEstimation3);