I have a data frame in which I need to go through each row manually and decide whether the records I've matched using the RecordLinkage package are really a match. Some pairs get a high match probability even though they aren't matches, simply because of spurious associations. I'd like to identify these quickly without exporting my data to a CSV and scrolling through it case by case. Instead, I'd like to iterate through the rows and, for each one, prompt the user (me) with the question "Is this a match (y/n)?", storing the answer (y or n) in a new column for that row.
This code reproduces a small example of the data:
id = c(1, 2, 3, 4)
loc1 = c("21ST AVE", "5TH ST", "HICKMAN ST", "GULF DR")
loc2 = c("21ST AVE BEACH ST", "5 EAST HARPER BLVD", "28 HARLEY ST", "1000 GULF DR")
day1 = c(12, 13, 14, 15)
day2 = c(12, 13, 14, 15)
time1 = c("20:52", "12:52", "15:35", "14:45")
time2 = c("20:52", "18:29", "03:55", "15:01")
df = data.frame(id, loc1, loc2, day1, day2, time1, time2)
This produces the following data frame:
  id       loc1               loc2 day1 day2 time1 time2
1  1   21ST AVE  21ST AVE BEACH ST   12   12 20:52 20:52
2  2     5TH ST 5 EAST HARPER BLVD   13   13 12:52 18:29
3  3 HICKMAN ST       28 HARLEY ST   14   14 15:35 03:55
4  4    GULF DR       1000 GULF DR   15   15 14:45 15:01
For each row, I'd like a prompt along these lines:
Is this a match (y/n)?
----------------------
  id     loc1              loc2 day1 day2 time1 time2
1  1 21ST AVE 21ST AVE BEACH ST   12   12 20:52 20:52
Answering y or n for each row would then give this result:
  id       loc1               loc2 day1 day2 time1 time2 match
1  1   21ST AVE  21ST AVE BEACH ST   12   12 20:52 20:52     y
2  2     5TH ST 5 EAST HARPER BLVD   13   13 12:52 18:29     n
3  3 HICKMAN ST       28 HARLEY ST   14   14 15:35 03:55     n
4  4    GULF DR       1000 GULF DR   15   15 14:45 15:01     y
I'm not even sure if this is a) possible, b) feasible, or c) the best way to go about it. Open to thoughts/suggestions. Thanks.
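First, define a function that prints each row, asks for an answer with readline(), and adds the answers to the data frame as a new match column: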
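checkRow <- function(df) {
  # Pre-allocate one answer slot per row
  answers <- character(nrow(df))
  # seq_len() avoids looping over c(1, 0) when df has zero rows
  for (i in seq_len(nrow(df))) {
    print(df[i, ])
    # readline() only waits for input in an interactive R session
    answers[i] <- readline("Is this a match? (y/n) ")
  }
  df$match <- answers
  df
}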
Then call it like this:
checked <- checkRow(df)
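If typos are a concern, a stricter variant can keep asking until it gets a y or n. This is a sketch, not part of the answer above: the name checkRowStrict and the ask argument are hypothetical, added so that another input function (for example, one that replays canned answers in a test) can stand in for readline().

```r
# Hypothetical variant of checkRow(): re-asks until the answer is y or n,
# and takes the prompt function as an argument so it can be swapped out.
checkRowStrict <- function(df, ask = readline) {
  # readline() returns "" immediately outside an interactive session,
  # which would make the repeat loop below spin forever
  if (!interactive() && identical(ask, readline)) {
    stop("checkRowStrict() needs an interactive session when ask = readline")
  }
  answers <- character(nrow(df))
  for (i in seq_len(nrow(df))) {
    print(df[i, ])
    repeat {
      # Trim whitespace and lower-case so " Y" and "y" are treated the same
      ans <- tolower(trimws(ask("Is this a match? (y/n) ")))
      if (ans %in% c("y", "n")) break
      message("Please answer y or n.")
    }
    answers[i] <- ans
  }
  df$match <- answers
  df
}
```

Called as `checked <- checkRowStrict(df)`, it behaves like `checkRow()` in an interactive session. Either version returns the data frame with a `match` column, so the confirmed pairs are `checked[checked$match == "y", ]`.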