I have n cell arrays c1, c2, …, cn, having dimensions L1 × 1, L2 × 1, …, Ln × 1, respectively. (FWIW, each cell array contains elements of a single class, but this class may not be the same for all the arrays.)
I want to produce a dataset object representing the Cartesian product (aka "cross-join") of these n cell arrays.
I'm looking for a programmatic way to do this that will work for any n.
To be clear about what I mean by "Cartesian product" (or "cross-join"): I want to produce a dataset object containing n columns and L1 × L2 × … × Ln rows, one row for each possible combination of an entry from c1, an entry from c2, …, an entry from c(n-1), and an entry from cn. (It's OK to assume that none of c1, c2, …, cn contains duplicate entries. IOW, one may assume that every ci is equal to unique(ci).)
An example where n = 3 is given below; the desired result is the dataset object factors. (Of course, the names of factors's columns represent an additional parameter. Also, in this example, all the cell arrays contain strings, but, as already mentioned, in general, the different arrays will contain entries of different classes.)
>> c1
c1 =
'even'
'odd'
>> c2
c2 =
'green'
'red'
'yellow'
>> c3
c3 =
'clubs'
'diamonds'
'hearts'
'spades'
>> factors
factors =
Parity TrafficLight Suit
'even' 'red' 'spades'
'even' 'red' 'hearts'
'even' 'red' 'diamonds'
'even' 'red' 'clubs'
'even' 'yellow' 'spades'
'even' 'yellow' 'hearts'
'even' 'yellow' 'diamonds'
'even' 'yellow' 'clubs'
'even' 'green' 'spades'
'even' 'green' 'hearts'
'even' 'green' 'diamonds'
'even' 'green' 'clubs'
'odd' 'red' 'spades'
'odd' 'red' 'hearts'
'odd' 'red' 'diamonds'
'odd' 'red' 'clubs'
'odd' 'yellow' 'spades'
'odd' 'yellow' 'hearts'
'odd' 'yellow' 'diamonds'
'odd' 'yellow' 'clubs'
'odd' 'green' 'spades'
'odd' 'green' 'hearts'
'odd' 'green' 'diamonds'
'odd' 'green' 'clubs'
This works for any n. It makes use of cellfun, arrayfun and comma-separated lists. The Cartesian product is computed on indices (not on actual elements) using ndgrid, with fliplr to yield the order you want (first column varies slowest, last column varies fastest).
The result is given as a cell array with n columns. If you need it in the form of a dataset, define appropriate names and use cell2dataset to convert.
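To see why the index vectors are handed to ndgrid in reversed order, here is a minimal sketch with just two sets, of sizes 3 and 2. The variable names p, q and pairs are made up for this illustration. The first output of ndgrid varies fastest when the grids are read in linear order, so the index that should vary fastest (the last set's) is passed first, and the outputs are then swapped back.

```matlab
%// Two hypothetical sets: the first has 3 elements, the second has 2.
%// Pass the LAST set's index vector first, so that it varies fastest.
[q, p] = ndgrid(1:2, 1:3);

%// Put the outputs back in set order: p indexes set 1, q indexes set 2.
pairs = [p(:) q(:)]
%// pairs =
%//      1     1
%//      1     2
%//      2     1
%//      2     2
%//      3     1
%//      3     2
```

The first column changes slowest and the last column fastest, which is the row order used in the full solution.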
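c1 = {'even','odd'}; %// example data
c2 = {'green','red','yellow'};
c3 = {'clubs','diamonds','hearts','spades'};
sets = {c1, c2, c3}; %// can have an arbitrary number of c's
num = numel(sets); %// number of sets
nums = cellfun(@numel, sets); %// number of elements in each set
inds = cell(1,num); %// one index grid per set
vec = fliplr(arrayfun(@(n) 1:n, nums, 'UniformOutput', false)); %// index vectors, in reversed set order
[inds{:}] = ndgrid(vec{:}); %// Cartesian product of the indices
inds = fliplr(inds); %// back to set order: first set varies slowest
factors = arrayfun(@(n) {sets{n}{inds{n}}}, 1:num, 'UniformOutput', false); %// elements for each column
factors = cat(1, factors{:}).'; %// one row per combination, one column per set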
Result:
>> factors
factors =
'even' 'green' 'clubs'
'even' 'green' 'diamonds'
'even' 'green' 'hearts'
'even' 'green' 'spades'
'even' 'red' 'clubs'
'even' 'red' 'diamonds'
'even' 'red' 'hearts'
'even' 'red' 'spades'
'even' 'yellow' 'clubs'
'even' 'yellow' 'diamonds'
'even' 'yellow' 'hearts'
'even' 'yellow' 'spades'
'odd' 'green' 'clubs'
'odd' 'green' 'diamonds'
'odd' 'green' 'hearts'
'odd' 'green' 'spades'
'odd' 'red' 'clubs'
'odd' 'red' 'diamonds'
'odd' 'red' 'hearts'
'odd' 'red' 'spades'
'odd' 'yellow' 'clubs'
'odd' 'yellow' 'diamonds'
'odd' 'yellow' 'hearts'
'odd' 'yellow' 'spades'
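For the conversion to a dataset, the following is a rough sketch rather than a tested recipe. It assumes the Statistics Toolbox is available and uses a two-row stand-in for factors so that it runs on its own. The names variable is introduced here for illustration, with the column names taken from the question's example. By default cell2dataset treats the first row of its input as variable names, so the names are prepended as a header row.

```matlab
%// Small stand-in for the 24-by-3 cell array computed above
factors = {'even','green','clubs'; 'odd','red','spades'};

%// Column names from the question's example (an extra input to the problem)
names = {'Parity','TrafficLight','Suit'};

%// cell2dataset reads variable names from the first row by default,
%// so stack the names on top of the data before converting
ds = cell2dataset([names; factors]);
```

With the full factors array, the same call should give a dataset with 24 rows and the three named columns.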