I have a crosstabulation problem. My data has 9600 rows, and here is a simplified sample, where I have removed several parameters:
Year Sex Area
1938 F 2
1938 M 2
1939 M 2
1939 M 5
1955 F 4
1957 F 4
1955 M 4
1943 M 6
1988 F 1
1988 M 2
1987 M 2
1955 M 3
1984 M 4
1984 F 5
1966 M 6
1955 F 8
1984 F 9
1955 M 3
1981 M 6
1938 F 6
I am looking for a table that crosstabulates my data like this, and I need the tally of cases Area~Year, not the aggregated sum of the parameter values.
Year Area=1 Area=2 Area=3 Area=3 Area=5 Area=6 Area=7 Area=8 Area=9
1938 7 0 1 9 2 8 1 14 4
1939 4 4 12 66 3 42 87 24 54
1940 9 33 1 1 6 22 1 5 15
1941 12 2 8 77 11 23 4 14 6
...
I know there are several ways to do this, and they're probably quite simple, but I can't find the right procedure.
Use dcast function
library(reshape2)
dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
Result:
Year 1 2 3 4 5 6 8 9
1 1938 0 2 0 0 0 1 0 0
2 1939 0 1 0 0 1 0 0 0
3 1943 0 0 0 0 0 1 0 0
4 1955 0 0 2 2 0 0 1 0
5 1957 0 0 0 1 0 0 0 0
6 1966 0 0 0 0 0 1 0 0
7 1981 0 0 0 0 0 1 0 0
8 1984 0 0 0 1 1 0 0 1
9 1987 0 1 0 0 0 0 0 0
10 1988 1 1 0 0 0 0 0 0
You can rename the columns further.
ct <- dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
colnames(ct) <- c(colnames(ct[1]), paste0("Area=", colnames(ct[-1])))
ct
Result:
Year Area=1 Area=2 Area=3 Area=4 Area=5 Area=6 Area=8 Area=9
1 1938 0 2 0 0 0 1 0 0
2 1939 0 1 0 0 1 0 0 0
3 1943 0 0 0 0 0 1 0 0
4 1955 0 0 2 2 0 0 1 0
5 1957 0 0 0 1 0 0 0 0
6 1966 0 0 0 0 0 1 0 0
7 1981 0 0 0 0 0 1 0 0
8 1984 0 0 0 1 1 0 0 1
9 1987 0 1 0 0 0 0 0 0
10 1988 1 1 0 0 0 0 0 0