I have the two following tables of data, one named data1, the other named data2. The left-hand column is a categorical variable and the right hand column is frequency I would like to rewrite these tables but where there are missing categories in the left-hand column I would like it to put in the correct missing category and then put a '0' in the right-hand frequency column.
data1 = [
1 170
2 120
3 100
4 40
5 30
6 20
7 10
9 8
10 2
11 1
14 1
];
data2 = [
1 240
2 200
3 180
4 60
5 50
6 40
7 30
8 20
9 8
10 2
12 1
19 1
];
To be clearer I will explain with an example. In data1, 8 12 and 13 are missing in the left-hand column. I would like matlab to recreate this table but with 0 values for 8, 12 and 13 so it looks as follows. I would also like it to have additional empty categories after '14' because data2 is longer and has more categories. I have also included what data2 should look like with filled in values.
data1 = [
1 170
2 120
3 100
4 40
5 30
6 20
7 10
8 0
9 8
10 2
11 1
12 0
13 0
14 1
15 0
16 0
17 0
18 0
19 0
];
data2 = [
1 240
2 200
3 180
4 60
5 50
6 40
7 30
8 20
9 8
10 2
11 0
12 1
13 0
14 0
15 0
16 0
17 0
18 0
19 1
];
I have a handful of datasets which generally all start with 1,2,3,4,5...etc but then they all have slightly different categories on the left-hand column, because where values are missing it just omits the category rather than putting 0. How do i write a code so that it automatically fills in any blanks with a 0. It would be good if the code could identify what the 'highest' number of categories is amongst all the datasets and then fill in blanks based on this.
my aim is to put together a grouped bar chart with data series that are all the same length.
UPDATED OUTPUT WITH 3 DATASETS
this is what your AllJoins code outputs in my matlab:
A table1 table2 table3
__ ______ ______ ______
1 170 240 2400
2 120 200 2000
3 100 180 0
4 40 60 0
5 30 50 0
6 20 40 0
7 10 30 0
8 0 20 0
9 8 8 0
10 2 2 0
11 1 0 0
12 0 1 0
14 1 0 0
19 0 1 0
20 0 0 1800
I would like the code to fill in the missing consecutive numbers in column A so that it looks as follows:
A table1 table2 table3
__ ______ ______ ______
1 170 240 2400
2 120 200 2000
3 100 180 0
4 40 60 0
5 30 50 0
6 20 40 0
7 10 30 0
8 0 20 0
9 8 8 0
10 2 2 0
11 1 0 0
12 0 1 0
13 0 0 0
14 1 0 0
15 0 0 0
16 0 0 0
17 0 0 0
18 0 0 0
19 0 1 0
20 0 0 1800
You can convert the datasets to a table and then use outerjoin. Then you can replace the NaNs with whatever you want using fillmissing.
table1 = array2table(data1);
table1.Properties.VariableNames = {'A', 'B'};
table2 = array2table(data2);
table2.Properties.VariableNames = {'A', 'B'};
newTable = outerjoin(table1, table2, 'LeftKeys', {'A'}, 'RightKeys', {'A'}, 'MergeKeys', true)
which produces:
A B_table1 B_table2
__ ________ ________
1 170 240
2 120 200
3 100 180
4 40 60
5 30 50
6 20 40
7 10 30
8 NaN 20
9 8 8
10 2 2
11 1 NaN
12 NaN 1
14 1 NaN
19 NaN 1
And then get your zeros with newTable2 = fillmissing(newTable, 'constant', 0)
, which prints:
A B_table1 B_table2
__ ________ ________
1 170 240
2 120 200
3 100 180
4 40 60
5 30 50
6 20 40
7 10 30
8 0 20
9 8 8
10 2 2
11 1 0
12 0 1
14 1 0
19 0 1
UPDATE
To combine multiple tables, you can either nest the outerjoin or write a function to loop over it (see similar Matlab forum question). Here's an example.
Given data1
and data2
in OP, plus a new data3
:
data3 = [
1 2400
2 2000
20 1800
];
Contents of myscript.m
:
table1 = MakeTable(data1);
table2 = MakeTable(data2);
table3 = MakeTable(data3);
AllJoins = MultiOuterJoin(table1, table2, table3);
% Functions
function Table = MakeTable(Array)
Table = array2table(Array);
Table.Properties.VariableNames = {'A', 'B'}; % set your column names, e.g. {'freq', 'count'}
end
function Joined = MultiOuterJoin(varargin)
Joined = varargin{1};
Joined.Properties.VariableNames{end} = inputname(1); % set #2 column name to be based on table name
for k = 2:nargin
Joined = outerjoin(Joined, varargin{k}, 'LeftKeys', {'A'}, 'RightKeys', {'A'}, 'MergeKeys', true);
name = inputname(k);
Joined.Properties.VariableNames{end} = name; % set merged column name to be based on table name
end
end
Which returns AllJoins
:
A table1 table2 table3
__ ______ ______ ______
1 170 240 2400
2 120 200 2000
3 100 180 NaN
4 40 60 NaN
5 30 50 NaN
6 20 40 NaN
7 10 30 NaN
8 0 20 NaN
9 8 8 NaN
10 2 2 NaN
11 1 0 NaN
12 0 1 NaN
13 0 0 NaN
14 1 0 NaN
15 0 0 NaN
16 0 0 NaN
17 0 0 NaN
18 0 0 NaN
19 0 1 NaN
20 NaN NaN 1800