Search code examples
rcrosstab

Crosstabulation based on values in two columns


I have a crosstabulation problem. My data has 9600 rows, and here is a simplified sample, where I have removed several parameters:

Year   Sex    Area
1938    F      2
1938    M      2
1939    M      2
1939    M      5
1955    F      4
1957    F      4
1955    M      4
1943    M      6
1988    F      1
1988    M      2
1987    M      2
1955    M      3
1984    M      4
1984    F      5
1966    M      6
1955    F      8
1984    F      9
1955    M      3
1981    M      6
1938    F      6

I am looking for a table that crosstabulates my data like this, and I need the tally of cases Area~Year, not the aggregated sum of the parameter values.

Year   Area=1  Area=2  Area=3  Area=3  Area=5  Area=6  Area=7  Area=8  Area=9
1938     7       0       1       9       2       8       1      14       4 
1939     4       4      12      66       3      42      87      24      54 
1940     9      33       1       1       6      22       1       5      15 
1941     12      2       8      77      11      23       4      14       6 
...

I know there are several ways to do this, and they're probably quite simple, but I can't find the right procedure.


Solution

  • Use dcast function

    library(reshape2)
    dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
    

    Result:

       Year 1 2 3 4 5 6 8 9
    1  1938 0 2 0 0 0 1 0 0
    2  1939 0 1 0 0 1 0 0 0
    3  1943 0 0 0 0 0 1 0 0
    4  1955 0 0 2 2 0 0 1 0
    5  1957 0 0 0 1 0 0 0 0
    6  1966 0 0 0 0 0 1 0 0
    7  1981 0 0 0 0 0 1 0 0
    8  1984 0 0 0 1 1 0 0 1
    9  1987 0 1 0 0 0 0 0 0
    10 1988 1 1 0 0 0 0 0 0
    

    You can rename the columns further.

    ct <- dcast(df, Year~Area, value.var = "Area", fun.aggregate = length)
    colnames(ct) <- c(colnames(ct[1]), paste0("Area=", colnames(ct[-1])))
    ct
    

    Result:

       Year Area=1 Area=2 Area=3 Area=4 Area=5 Area=6 Area=8 Area=9
    1  1938      0      2      0      0      0      1      0      0
    2  1939      0      1      0      0      1      0      0      0
    3  1943      0      0      0      0      0      1      0      0
    4  1955      0      0      2      2      0      0      1      0
    5  1957      0      0      0      1      0      0      0      0
    6  1966      0      0      0      0      0      1      0      0
    7  1981      0      0      0      0      0      1      0      0
    8  1984      0      0      0      1      1      0      0      1
    9  1987      0      1      0      0      0      0      0      0
    10 1988      1      1      0      0      0      0      0      0