I am writing an if statement that checks for duplicates. If there are any, I want execution to continue, but I also want to print a message indicating which values are duplicated. I tried message(), but I am unsure how to include the duplicated location values in it.
if (anyDuplicated(regionGroups$location) > 0) {
  duplicateRegions <- regionGroups[, 'count' := .N, by = location][count > 1, .SD[1], by = location][[1]]
  message("Location is not unique in the table regionGroups. There are length(duplicateRegions) duplicated locations, namely: duplicateRegions[1],duplicateRegions[2] ")
  regionGroups <- regionGroups[!duplicated(regionGroups$location)]
}
> anyDuplicated(regionGroups$location) > 0
[1] TRUE
> duplicateRegions
[1] 55100 26080
The desired output is:
Location is not unique in the table regionGroups. There are 2 duplicated locations, namely: 55100, 26080
The complication is that there may be many more duplicated regions, and their values will change.
QUESTION: How do I write the message() statement so that the output lists the values of duplicateRegions, however many there are?
Hi, does this work for you? I'm a little unsure why you need the if statement if execution continues either way; there appears to be no else branch required. Perhaps you left it out for simplicity.
Another point to note: duplicated() does not flag the first occurrence of a repeated value, only the later ones. So when you use:
regionGroups[!duplicated(regionGroups$location),]
it always keeps the first occurrence and drops all the later duplicates. That may be OK for you, but it's worth knowing.
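To make the first-occurrence behavior concrete, here is a small sketch with made-up values (not the asker's data). It also shows one way to flag every occurrence, by combining duplicated() with duplicated(fromLast = TRUE), in case you ever need all copies rather than just the later ones.

```r
# Made-up example values, not the asker's actual data
x <- c(55100, 26080, 55100, 10000, 26080)

# TRUE only for the second and later occurrences of a value
first_kept <- duplicated(x)
print(first_kept)
# [1] FALSE FALSE  TRUE FALSE  TRUE

# TRUE for every occurrence of a value that appears more than once
all_dups <- duplicated(x) | duplicated(x, fromLast = TRUE)
print(all_dups)
# [1]  TRUE  TRUE  TRUE FALSE  TRUE

# Subsetting with !duplicated() keeps the first occurrence of each value
deduped <- x[!duplicated(x)]
print(deduped)
# [1] 55100 26080 10000
```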
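Also, if you write namely: duplicateRegions[1],duplicateRegions[2]
in the message() call, you are assuming you know how many duplicates there will be, which is not the case. Note that message() pastes its arguments together with no separator, so the values have to be passed as separate arguments outside the quoted text, and a vector passed directly would run together (e.g. 5510026080). Instead, collapse the values into a single string with paste(dup_regions, collapse = ", "), so you don't need to worry about how many there are.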
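As a sketch of why collapsing handles any number of values, here is a small helper that builds the message text. The function name duplicate_message is hypothetical (not from the original answer), and taking unique() of the duplicated values is an assumption so that each location is counted once.

```r
# Hypothetical helper (not from the original answer): builds the message text
# for any number of duplicated values
duplicate_message <- function(values) {
  # Distinct values that occur more than once
  dup_values <- unique(values[duplicated(values)])
  paste0("Location is not unique in the table regionGroups. There are ",
         length(dup_values), " duplicated locations, namely: ",
         paste(dup_values, collapse = ", "))
}

# Two duplicated locations
msg_two <- duplicate_message(c(55100, 26080, 55100, 26080, 10000))
message(msg_two)
# Location is not unique in the table regionGroups. There are 2 duplicated locations, namely: 55100, 26080

# Three duplicated locations: the same code handles it unchanged
msg_three <- duplicate_message(c(1, 1, 2, 2, 3, 3))
message(msg_three)
```

If you prefer, toString(dup_values) is base R shorthand for paste(dup_values, collapse = ", ").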
if (any(duplicated(regionGroups$location))) {
  # Row indices of the second and later occurrences of each location
  dups <- which(duplicated(regionGroups$location))
  # Distinct locations that occur more than once (unique() so each is listed once)
  dup_regions <- unique(regionGroups$location[dups])
  message("Location is not unique in the table regionGroups. There are ",
          length(dup_regions), " duplicated locations, namely: ",
          paste(dup_regions, collapse = ", "))
  # Keep only the first occurrence of each location
  regionGroups <- regionGroups[!duplicated(regionGroups$location), ]
}
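One thing to watch: which(duplicated(...)) returns one index per extra row, so a location that appears three times would be listed twice and would inflate length(dups). A sketch on made-up data (the data.frame and its id column are hypothetical stand-ins for the asker's data.table) showing the difference unique() makes:

```r
# Hypothetical data: 55100 appears three times, 26080 twice
regionGroups <- data.frame(
  location = c(55100, 26080, 55100, 10000, 26080, 55100),
  id       = 1:6
)

dups <- which(duplicated(regionGroups$location))  # rows 3, 5, 6

# Without unique(): one entry per extra row, so 55100 is listed twice
without_unique <- paste(regionGroups$location[dups], collapse = ", ")
print(without_unique)
# [1] "55100, 26080, 55100"

# With unique(): one entry per duplicated location
with_unique <- paste(unique(regionGroups$location[dups]), collapse = ", ")
print(with_unique)
# [1] "55100, 26080"

# Keep the first occurrence of each location (rows with id 1, 2 and 4)
regionGroups <- regionGroups[!duplicated(regionGroups$location), ]
```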