I have two data frames: codes and supply. Codes (shown below) is comprised of four fields: state,codetype,code,codetitle supply has 12 columns but three of them are state,codetype, and code
An example of this is below
state codetype code codetitle
32 15 123456 Something
32 15 123455 Something Else
32 10 123455 Something Else
From there, I use the following code to concatenate the item
supply1<- supply%>%mutate(supply1= paste0(state,codetype,code))
codes1<- codes%>%mutate(codes1= paste0(state,codetype,code))
My question is how do I find out what combinations of state,codetype,code are in supply1 but not codes1. I would use excel and the match function to do this but there are 1.9 million rows and that exceeds the capacity of Excel.
Have looked at documentation about antijoin. However, being that there is no common field such as ID, getting a bit confused.
library(dplyr)
anti_join(supply, codes, by = c("state", "codetype", "code"))
# state codetype code codetitle
# 1 34 15 123459 Something_4
codes$code_rn <- seq_len(nrow(codes))
supply$supply_rn <- seq_len(nrow(supply))
temp <- merge(codes, supply, by = c("state", "codetype", "code"))
temp
# state codetype code codetitle.x code_rn codetitle.y supply_rn
# 1 32 15 123455 Something_Else 2 Something_3 2
# 2 32 15 123456 Something 1 Something_2 1
supply[ !supply$supply_rn %in% temp$supply_rn, ]
# state codetype code codetitle supply_rn
# 3 34 15 123459 Something_4 3
(and some column clean-up)
This is effectively what you were starting with:
supply_id <- with(supply, paste(state, codetype, code, sep = "|"))
supply_id
# [1] "32 15 123456" "32 15 123455" "34 15 123459"
codes_id <- with(codes, paste(state, codetype, code, sep = "|"))
codes_in
# [1] "32|15|123456" "32|15|123455" "32|10|123455"
supply[!supply_id %in% codes_id,]
# state codetype code codetitle supply_rn
# 3 34 15 123459 Something_4 3
codes <- read.table(header = TRUE, text="
state codetype code codetitle
32 15 123456 Something
32 15 123455 Something_Else
32 10 123455 Something_Else")
supply <- read.table(header = TRUE, text="
state codetype code codetitle
32 15 123456 Something_2
32 15 123455 Something_3
34 15 123459 Something_4")