I'm trying to load the following dataset:
Afghanistan,5,1,648,16,10,2,0,3,5,1,1,0,1,1,1,0,green,0,0,0,0,1,0,0,1,0,0,black,green
Albania,3,1,29,3,6,6,0,0,3,1,0,0,1,0,1,0,red,0,0,0,0,1,0,0,0,1,0,red,red
Algeria,4,1,2388,20,8,2,2,0,3,1,1,0,0,1,0,0,green,0,0,0,0,1,1,0,0,0,0,green,white
...
The problem is that it contains both integers and strings. I found some information on how to extract only the integers, but I haven't been able to find out whether there is a way to get all the data.
Is that possible?
If that is not possible, is there then any way to find the numbers on each line and throw everything else away without having to choose the columns?
I need this specifically because it seems I cannot use str2num on a whole line at a time.
Almost anything is possible; you just have to define your goal accurately.
Assuming that your database is stored as a text file, you can parse it line by line using textread, and then apply regexp to filter only the numerical fields (this does not require any prior knowledge of the columns):
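% Read the file into a cell array with one string per line
C = textread('database.txt', '%s', 'delimiter', '\n');
% Extract every run of digits from each line
C = cellfun(@(x) regexp(x, '\d+', 'match'), C, 'UniformOutput', false);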
The result here is a cell array of cell arrays of strings, where each string corresponds to a numerical field in a specific line.
Since the numbers are still stored as strings, you'd probably need to convert them to actual numerical values. There's a multitude of ways to do that, but one trick is to use str2num: it can convert a delimited string into an array of numbers. This means that if you concatenate all the strings of a specific line back into one string, with spaces in between, you can apply str2num to all of them at once, like so:
% Join the digit strings of each line with spaces and convert them in one call
C = cellfun(@(x) str2num(sprintf('%s ', x{:})), C, 'UniformOutput', false);
The resulting C is a cell array of vectors, each vector containing the values of all numerical fields in the corresponding line. To access a specific vector, you can use curly braces ({}). For instance, to access the numbers of the second line, you would use C{2}.
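As a small illustration of the two steps above, here is a sketch that runs them on a single line copied from the question instead of on the whole file. The variable names txt, tokens, nums and nums2 are made up for this example. It also shows str2double as a possible alternative to str2num, since str2double accepts a cell array of strings directly.

```matlab
% One line copied from the question (stands in for one element of C)
txt = 'Albania,3,1,29,3,6,6,0,0,3,1,0,0,1,0,1,0,red,0,0,0,0,1,0,0,0,1,0,red,red';

% Step 1: extract every run of digits as a string
tokens = regexp(txt, '\d+', 'match');

% Step 2: join the strings with spaces and convert them in one call
nums = str2num(sprintf('%s ', tokens{:}));

% Alternative conversion: str2double works on the cell array directly
nums2 = str2double(tokens);

% Check that both conversions produce the same row vector
same = isequal(nums, nums2);
```

Note that '\d+' only matches runs of digits, so this assumes the numeric fields are unsigned integers, as they are in the sample lines shown in the question.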
All the non-numerical fields are discarded in the process of parsing, of course. If you want to keep them as well, you should use a different regular expression with regexp.
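One possible way to keep the text fields too, as a rough sketch rather than the only approach: split each line on commas instead of matching digits, then let str2double decide which fields are numeric. str2double returns NaN for anything it cannot parse, so this assumes that no field is literally NaN, and that no text field is something str2double accepts as a number (Inf, for example). The names txt, fields, values, isText, words and nums are made up for this example.

```matlab
% One line copied from the question
txt = 'Afghanistan,5,1,648,16,10,2,0,3,5,1,1,0,1,1,1,0,green,0,0,0,0,1,0,0,1,0,0,black,green';

% Split the line into its comma-separated fields (cell array of strings)
fields = regexp(txt, ',', 'split');

% Try to convert every field; non-numeric fields become NaN
values = str2double(fields);

% Separate the two kinds of fields while keeping their original order
isText = isnan(values);
words  = fields(isText);    % e.g. the country name and the colours
nums   = values(~isText);   % the numeric fields as a row vector
```

The logical mask isText also records which columns held text, so the original layout of the line can be reconstructed later if needed.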
Good luck!