I am trying to construct a contingency matrix of calls between a caller and a callee. My difficulty is that my variable caller_id contains five-digit values, and I need to group those values by whether they begin with 1, 2, or 3. For example, my data follows this pattern:
CALLER CALLEE
12345 1
23456 1
35643 2
Here the first digit of CALLER and the value of CALLEE can each be 1, 2, or 3, where 1 means white ethnicity, 2 means black ethnicity, and 3 means unknown. I then need to create a contingency matrix such as:
White Caller Black Caller
White Callee # of calls # of calls
Black Callee # of calls # of calls
Unknown Callee # of calls # of calls
If anyone has any advice on how I could go about separating the values and creating the matrix, it would be much appreciated. Thank you in advance.
With base R you may use
with(df, table(CALLER = substr(CALLER, 1, 1), CALLEE))
#       CALLEE
# CALLER 1 2
#      1 1 0
#      2 1 0
#      3 0 1
where substr(df$CALLER, 1, 1) extracts the first digit from df$CALLER (see ?substr) and table then gives the contingency table.
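If you also want the labelled layout from the question, with callees as rows and every category shown even when its count is zero, one possible sketch is to convert both columns to factors with explicit levels before calling table. The data frame below just re-creates the three sample rows from the question, and the labels vector and the names caller, callee, and tab are assumptions made for illustration.

```r
# Sample data re-created from the question (only the three rows shown there)
df <- data.frame(CALLER = c(12345, 23456, 35643), CALLEE = c(1, 1, 2))

# Assumed labels for the codes 1, 2, 3, based on the question's description
labs <- c("White", "Black", "Unknown")
codes <- as.character(1:3)

# First digit of CALLER; explicit levels keep categories that have no calls
caller <- factor(substr(df$CALLER, 1, 1), levels = codes, labels = labs)
callee <- factor(df$CALLEE, levels = codes, labels = labs)

# Rows are callees and columns are callers, as in the requested layout
tab <- table(Callee = callee, Caller = caller)
tab
```

Because the levels are fixed up front, the result is always a 3 x 3 table, so a category such as Unknown Callee still appears as a row of zeros when it has no calls in the data.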