matlab parameter-passing pass-by-reference quicksort in-place

In-Place Quicksort in matlab

I wrote a small quicksort implementation in matlab to sort some custom data. Because I am sorting a cell-array and I need the indexes of the sort-order and do not want to restructure the cell-array itself I need my own implementation (maybe there is one available that works, but I did not find it).

My current implementation works by partitioning into a left and right array and then passing these arrays to the recursive call. Because I do not know the size of left and and right I just grow them inside a loop which I know is horribly slow in matlab.

I know you can do an in place quicksort, but I was warned about never modifying the content of variables passed into a function, because call by reference is not implemented the way one would expect in matlab (or so I was told). Is this correct? Would an in-place quicksort work as expected in matlab or is there something I need to take care of? What other hints would you have for implementing this kind of thing?

Solution

Implementing a sort on complex data in user M-code is probably going to be a loss in terms of performance due to the overhead of M-level operations compared to Matlab's builtins. Try to reframe the operation in terms of Matlab's existing vectorized functions.

Based on your comment, it sounds like you're sorting on a single-value key that's inside the structs in the cells. You can probably get a good speedup by extracting the sort key to a primitive numeric array and calling the builtin sort on that.

%// An example cell array of structs that I think looks like your input
c = num2cell(struct('foo',{'a','b','c','d'}, 'bar',{6 1 3 2}))
%// Let's say the "bar" field is what you want to sort on.
key = cellfun(@(s)s.bar, c) %// Extract the sort key using cellfun
[sortedKey,ix] = sort(key) %// Sort on just the key using fast numeric sort() builtin
sortedC = c(ix); %// ix is a reordering index in to c; apply the sort using a single indexing operation
reordering = cellfun(@(s)s.foo, sortedC)  %// for human readability of results

If you're sorting on multiple field values, extract all the m key values from the n cells to an n-by-m array, with columns in descending order of precedence, and use sortrows on it.

%// Multi-key sort
keyCols = {'bar','baz'};
key = NaN(numel(c), numel(keyCols));
for i = 1:numel(keyCols)
    keyCol = keyCols{i};
    key(:,i) = cellfun(@(s)s.(keyCol), c);
end
[sortedKey,ix] = sortrows(key);
sortedC = c(ix);
reordering = cellfun(@(s)s.foo, sortedC)

One of the keys to performance in Matlab is to get your data in primitive arrays, and use vectorized operations on those primitive arrays. Matlab code that looks like C++ STL code with algorithms and references to comparison functions and the like will often be slow; even if your code is good in O(n) complexity terms, the fixed cost of user-level M-code operations, especially on non-primitives, can be a killer.

Also, if your structs are homogeneous (that is, they all have the same set of fields), you can store them directly in a struct array instead of a cell array of structs, and it will be more compact. If you can do more extensive redesign, rearranging your data structures to be "planar-organized" - where you have a struct of arrays, reading across the ith elemnt of all the fields as a record, instead of an array of structs of scalar fields - could be a good efficiency win. Either of these reorganizations would make constructing the sort key array cheaper.