Since my data is much more complicated, I made a smaller sample dataset (I left the reshape in to show how I generated the data).
set.seed(7)
x = rep(seq(2010,2014,1), each=4)
y = rep(seq(1,4,1), 5)
z = matrix(replicate(5, sample(c("A", "B", "C", "D"))))
temp_df = cbind.data.frame(x,y,z)
colnames(temp_df) = c("Year", "Rank", "ID")
head(temp_df)
require(reshape2)
dcast(temp_df, Year ~ Rank)
which results in...
> dcast(temp_df, Year ~ Rank)
Using ID as value column: use value.var to override.
Year 1 2 3 4
1 2010 D B A C
2 2011 A C D B
3 2012 A B D C
4 2013 D A C B
5 2014 C A B D
Now I essentially want to use a function like unique, but ignoring order to find where the first 3 elements are unique.
Thus in this case:
I would have A,B,C in row 5
I would have A,B,D in rows 1&3
I would have A,C,D in rows 2&4
Also I need counts of these "unique" events
Also 2 more things. First, my values are strings, and I need to leave them as strings. Second, if possible, I would have a column between year and 1 called Weighting, and then when counting these unique combinations I would include each's weighting. This isn't as important because all weightings will be small positive integer values, so I can potentially duplicate the rows earlier to account for weighting, and then tabulate unique pairs.
You could do something like this:
df <- dcast(temp_df, Year ~ Rank)
combos <- apply(df[, 2:4], 1, function(x) paste0(sort(x), collapse = ""))
combos
# 1 2 3 4 5
# "BCD" "ABC" "ACD" "BCD" "ABC"
For each row of the data frame, the values in columns 1, 2, and 3 (as labeled in the post) are sorted using sort
, then concatenated using paste0
. Since order doesn't matter, this ensures that identical cases are labeled consistently.
Note that the paste0
function is equivalent to paste(..., sep = "")
. The collapse
argument says to concatenate the values of a vector into a single string, with vector values separated by the value passed to collapse
. In this case, we're setting collapse = ""
, which means there will be no separation between values, resulting in "ABC"
, "ACD"
, etc.
Then you can get the count of each combination using table
:
table(combos)
# ABC ACD BCD
# 2 1 2