Search code examples
matlabvectorrepeatrun-length-encoding

Repeat elements of vector


I have a value vector A containing elements i, for example:

A = [0.1 0.2 0.3 0.4 0.5]; and say r = [5 2 3 2 1];

Now I want to create a new vector Anew containing r(i) repetitions of the values i in A, such that the first r(1)=5 items in Anew have value A(1) and the length of the new vector is sum(r). Thus:

Anew = [0.1 0.1 0.1 0.1 0.1 0.2 0.2 0.3 0.3 0.3 0.4 0.4 0.5]

I am sure this can be done with an elaborate for-loop combining e.g. repmat, but any chance someone knows how to do this in a smoother way?


Solution

  • As far as I'm aware, there is no equivalent function to do that in MATLAB, though R has rep that can do that for you.... so jealous.

    In any case, the only way I can suggest is to run a for loop with repmat as you suggested. However, you can perhaps do arrayfun instead if you want to do this as a one-liner... well it's technically two to do the post-processing required to get this into a single vector. As such, you can try this:

    Anew = arrayfun(@(x) repmat(A(x), r(x), 1), 1:numel(A), 'uni', 0);
    Anew = vertcat(Anew{:});
    

    This essentially does the for loop and concatenation of the replicated vectors with less code. We go through each pair of values in A and r and spit out replicated vectors. Each of them will be in a cell array, which is why vertcat is required to put it all into one vector.

    We get:

    Anew =
    
        0.1000
        0.1000
        0.1000
        0.1000
        0.1000
        0.2000
        0.2000
        0.3000
        0.3000
        0.3000
        0.4000
        0.4000
        0.5000
    

    Take note that other people have tried something similar to what you're doing in this post: A similar function to R's rep in Matlab. This is essentially mimicking R's way of doing rep, which is what you want to do!


    Alternative - Using for loops

    Because of @Divakar's benchmarking, I'm curious to see how pre-allocating the array, then using an actual for loop to iterate through A and r and populate it by indexing would benchmark. As such, the equivalent code to the above using for loops and indexing would be:

    Anew = zeros(sum(r), 1);
    counter = 1;
    for idx = 1 : numel(r)
        Anew(counter : counter + r(idx) - 1) = A(idx);
        counter = counter + r(idx);
    end
    

    We would need a variable that keeps track of where we need to insert elements in the array, which is stored in counter. We offset this by the total number of elements to replicate per number, which is stored in each value of r.

    As such, this method completely avoids using repmat and just uses indexing to generate our replicated vectors instead.


    Benchmarking (à la Divakar)

    Building on top of Divakar's benchmarking code, I actually tried running all of the tests on my machine, in addition to the for loop approach. I simply used his benchmarking code with the same test cases.

    These are the timing results I get per algorithm:

    Case #1 - N = 4000, max_repeat = 4000

    -------------------  With arrayfun
    Elapsed time is 1.202805 seconds.
    -------------------  With cumsum
    Elapsed time is 1.691591 seconds.
    -------------------  With bsxfun
    Elapsed time is 0.835201 seconds.
    -------------------  With for loop
    Elapsed time is 0.136628 seconds.
    

    Case #2 - N = 10000, max_repeat = 1000

    -------------------  With arrayfun
    Elapsed time is 2.117631 seconds.
    -------------------  With cumsum
    Elapsed time is 1.080247 seconds.
    -------------------  With bsxfun
    Elapsed time is 0.540892 seconds.
    -------------------  With for loop
    Elapsed time is 0.127728 seconds.
    

    In these cases, cumsum actually beats out arrayfun... which is what I originally expected. bsxfun beats everyone else out, except for the for loop. My guess is with the differing times in arrayfun between myself and Divakar, we are running our code on different architectures. I'm currently running my tests using MATLAB R2013a on a Mac OS X 10.9.5 MacBook Pro machine.

    As we can see, the for loop is much quicker. I know for a fact that when it comes to indexing operations in a for loop, the JIT kicks in and gives you better performance.