I have a table which contains various data types. For instance,
chars = {'a'; 'bc'; 'de'};
nums = [1; 20; 300];
tbl = table(chars, nums);
I want to convert all numerical columns to character array variables. By this, I mean the result should be equivalent to if I had initialised the table using:
nums = {'1'; '20', '300'};
tbl = table(chars, nums);
I want to do this as fast as possible because I have a table with potentially millions of rows.
Current working solutions:
Firstly, I get the numerical columns, in the case of this example this will be column 2...
% Get the columns numbers which are numerical. This works fine, although a bit clunky.
numcols = find(varfun(@isnumeric, tbl(1,:), 'output', 'uniform'));
colnames = tbl.Properties.VariableNames(numcols); % Get the corresponding column names
Then, I have a couple of methods which convert those numerical columns to character array types, they both involve looping over the numerical columns and using some other loop-in-disguise ___fun
function...
arrayfun
for ii = 1:numel(colnames)
% This arrayfun is *slow* for large tables
tbl.(colnames{ii}) = arrayfun( @num2str, tbl.(colnames{ii}), 'uniformoutput', 0 );
end
num2cell
and cellfun
for ii = 1:numel(colnames)
% num2cell is relatively quick
tbl.(colnames{ii}) = num2cell( tbl.(colnames{ii}) );
% cellfun is about as slow as arrayfun, as might be expected
tbl.(colnames{ii}) = cellfun( @num2str, tbl.(colnames{ii}), 'uniformoutput', 0 );
end
Speed test below, note I'm only doing one column and 1e5 elements, whilst in reality I want to do many columns and potentially ten times more rows, so you can clearly see the speed issues...
% Setup
nums = (1:1e5).'; tbl = table(nums);
% Functions
f1 = @() arrayfun( @num2str, tbl.nums, 'uni', 0 );
f2 = @() cellfun( @num2str, num2cell(tbl.nums), 'uni', 0 );
% Timing
timeit(f1) % >> 5.15 sec
timeit(f2) % >> 5.16 sec
You can see the methods are basically equivalent, as probably expected from their similarity.
Does anyone know a faster way to convert all data in a table to character array variable type? I had thought to go via categoricals, but not sure how to proceed there.
I'd prefer a solution compatible with R2015b.
Note: It is relevant that my data contains mixed types, since I cannot (and wouldn't want to) use varfun( ... )
on the entire table.
You can use sprintf
to convert to large char array and then read that back using strread
to put in a cell array:
% Setup
nums = (1:1e5).'; tbl = table(nums);
% Functions
f1 = @() arrayfun( @num2str, tbl.nums, 'uni', 0 );
f2 = @() cellfun( @num2str, num2cell(tbl.nums), 'uni', 0 );
f3 = @() strread ( sprintf ( '%i\n', tbl.nums ), '%s', 'delimiter', '\n' );
f4 = @() textscan ( sprintf ( '%i\n', tbl.nums ), '%s', 'delimiter', '\n' );
% Timing
timeit(f1) %
timeit(f2) %
timeit(f3) %
timeit(f4) %
r1 = feval(f1);
r2 = feval(f2);
r3 = feval(f3);
r4 = feval(f4);
% check they are equal
isequal ( r1, r2 )
isequal ( r3, r1 )
isequal ( r4{1}, r1 )
On my computer with R2015b I get:
f1 -> 3.78 seconds
f2 -> 3.79 seconds
f3 -> 0.10 seconds
f4 -> 0.07 seconds
And as per the isequal checks they are all the same.
If your data is non integer -> you will need to change the sprintf statement, I put it as integer to ensure that the isequal verified that the results were the same.