matlab for-loop vectorization sparse-matrix

Performance of vectorizing code to create a sparse matrix with a single 1 per row from a vector of indexes

I have a large column vector y containing integer values from 1 to 10. I wanted to convert it to a matrix where each row is full of 0s except for a 1 at the index given by the value at the respective row of y.

This example should make it clearer:

y = [3; 4; 1; 10; 9; 9; 4; 2; ...]

% gets converted to:

Y = [
    0 0 1 0 0 0 0 0 0 0;
    0 0 0 1 0 0 0 0 0 0;
    1 0 0 0 0 0 0 0 0 0;
    0 0 0 0 0 0 0 0 0 1;
    0 0 0 0 0 0 0 0 1 0;
    0 0 0 0 0 0 0 0 1 0;
    0 0 0 1 0 0 0 0 0 0;
    0 1 0 0 0 0 0 0 0 0;
    ...
    ]

I have written the following code for this (it works):

m = length(y);
Y = zeros(m, 10);
for i = 1:m
    Y(i, y(i)) = 1;
end

I know there are ways I could remove the for loop in this code (vectorizing). This post contains a few, including something like:

Y = full(sparse(1:length(y), y, ones(length(y),1)));

But I had to convert y to doubles to be able to use this, and the result is actually about 3x slower than my "for" approach, using 10.000.000 as the length of y.

Is it likely that doing this kind of vectorization will lead to better performance for a very large y? I've read many times that vectorizing calculations leads to better performance (not only in MATLAB), but this kind of solution seems to result in more calculations.
Is there a way to actually improve performance over the for approach in this example? Maybe the problem here is simply that acting on doubles instead of ints isn't the best thing for comparison, but I couldn't find a way to use sparse otherwise.

Solution

Here is a test to comapre:

function [t,v] = testIndicatorMatrix()
    y = randi([1 10], [1e6 1], 'double');
    funcs = {
        @() func1(y);
        @() func2(y);
        @() func3(y);
        @() func4(y);
    };

    t = cellfun(@timeit, funcs, 'Uniform',true);
    v = cellfun(@feval, funcs, 'Uniform',false);
    assert(isequal(v{:}))
end

function Y = func1(y)
    m = numel(y);
    Y = zeros(m, 10);
    for i = 1:m
        Y(i, y(i)) = 1;
    end
end

function Y = func2(y)
    m = numel(y);
    Y = full(sparse(1:m, y, 1, m, 10, m));
end

function Y = func3(y)
    m = numel(y);
    Y = zeros(m,10);
    Y(sub2ind([m,10], (1:m).', y)) = 1;
end

function Y = func4(y)
    m = numel(y);
    Y = zeros(m,10);
    Y((y-1).*m + (1:m).') = 1;
end

I get:

>> testIndicatorMatrix
ans =
    0.0388
    0.1712
    0.0490
    0.0430

Such a simple for-loop can be dynamically JIT-compiled at runtime, and would run really fast (even slightly faster than vectorized code)!