Search code examples
rdataframedatasettidyversevisualization

How to only select rows that are duplicated in a column in a dataframe


I have joined two dataframes together and I am trying to select only the 'Branch Codes' that are duplicated.

I want to join the datasets 'BranchData' and 'BranchCode' so that any branch codes that are common to both datasets are included as well as those are not common to both datasets.

However, the last line of the code below does not seem to work!

BranchData$'Branch Code' <
as.numeric(BranchData$'Branch Code')
BranchCalls$'Branch Code' <- as.numeric(BranchCalls$'Branch Code')
BranchData <- na.omit(BranchData)
merged <- full_join(BranchData,BranchCalls)
merged <- merged %>% group_by(merged$`Branch Code`) %>% filter(n() >= 2)

Also, when I try to put the duplicates into groups, so that all the duplicates are together, but the following code doesn't seem to work!:

merged <- group_by(merged,merged$'Branch Code')

Minimal Reproducible Example:

    structure(list(`Branch Code` = c(401801, 436801, 403801, 164801, 
198801), `Location Type` = c("Urban", "Urban", "Urban Deprived", 
"Rural", "Urban"), Type = c("MAIN", "MAIN", "MAIN", "MAIN", "LM"
), Status = c("Open", "Open", "Open", "Open", "Open"), Segment = c("Agency", 
"Agency", "Agency", "Agency", "Agency"), `Multiple (partner that owns multiple branches)` = c("Multiple 11", 
"Multiple 11", "Multiple 12", "Multiple 13", "Multiple 13"), 
    RetailType = c("Books_Stationery", "Books_Stationery", "Convenience", 
    "Convenience", "Convenience"), `Volume of transactions` = c(2238, 
    1514, 1346, 1338, 625), `Open hours` = c(47.75, 50.2500000000001, 
    46.5, 48.25, 114.25), `X Pos` = c(394169, 393488, 394434, 
    392153, 393094), `Y Pos` = c(806326, 805877, 804347, 796902, 
    802789), Urbanity = c("Major Centre", "Major Centre", "High Density", 
    "Low Density", "Low Density"), `Case Reference Number` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), `Created On` = structure(c(NA_real_, NA_real_, NA_real_, 
    NA_real_, NA_real_), tzone = "UTC", class = c("POSIXct", 
    "POSIXt")), `Branch Type` = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), L1 = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), L2 = c(NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_), L3 = c(NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_), L4 = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `Case Type` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    )), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
), na.action = structure(c(`3` = 3L, `4` = 4L, `5` = 5L, `6` = 6L, 
`7` = 7L, `8` = 8L, `9` = 9L, `11` = 11L, `13` = 13L, `16` = 16L, 
`17` = 17L, `18` = 18L, `20` = 20L, `21` = 21L, `22` = 22L, `23` = 23L, 
`26` = 26L, `27` = 27L, `28` = 28L, `29` = 29L, `31` = 31L, `32` = 32L, 
`33` = 33L, `34` = 34L, `35` = 35L, `36` = 36L, `37` = 37L, `39` = 39L, 
`40` = 40L, `41` = 41L, `42` = 42L, `43` = 43L, `44` = 44L, `45` = 45L, 
`46` = 46L, `47` = 47L, `48` = 48L, `49` = 49L, `51` = 51L, `52` = 52L, 
`54` = 54L, `55` = 55L, `57` = 57L, `58` = 58L, `59` = 59L, `60` = 60L, 
`61` = 61L, `62` = 62L, `63` = 63L, `65` = 65L, `67` = 67L, `68` = 68L, 
`69` = 69L, `70` = 70L, `71` = 71L, `72` = 72L, `74` = 74L, `75` = 75L, 
`76` = 76L, `77` = 77L, `78` = 78L, `80` = 80L, `81` = 81L, `82` = 82L, 
`83` = 83L, `84` = 84L, `86` = 86L, `87` = 87L, `88` = 88L, `89` = 89L, 
`91` = 91L, `92` = 92L, `93` = 93L, `96` = 96L, `97` = 97L, `98` = 98L, 
`99` = 99L, `100` = 100L, `101` = 101L, `103` = 103L, `106` = 106L, 
`107` = 107L, `108` = 108L, `109` = 109L, `110` = 110L, `111` = 111L, 
`112` = 112L, `113` = 113L, `114` = 114L, `115` = 115L, `116` = 116L, 
`117` = 117L, `118` = 118L, `119` = 119L, `120` = 120L, `121` = 121L, 
`122` = 122L, `123` = 123L, `124` = 124L, `126` = 126L, `127` = 127L, 
`129` = 129L, `130` = 130L, `131` = 131L, `132` = 132L, `133` = 133L, 
`134` = 134L, `135` = 135L, `136` = 136L, `137` = 137L, `139` = 139L, 
`140` = 140L, `141` = 141L, `142` = 142L, `143` = 143L, `144` = 144L, 
`145` = 145L, `146` = 146L, `147` = 147L, `148` = 148L, `149` = 149L, 
`150` = 150L, `151` = 151L, `152` = 152L, `153` = 153L, `155` = 155L, 
`156` = 156L, `157` = 157L, `160` = 160L, `161` = 161L, `162` = 162L, 
`163` = 163L, `165` = 165L, `166` = 166L, `167` = 167L, `168` = 168L, 
`169` = 169L, `174` = 174L, `175` = 175L, `176` = 176L, `177` = 177L, 
`178` = 178L, `179` = 179L, `180` = 180L, `182` = 182L, `183` = 183L, 
`185` = 185L, `186` = 186L, `188` = 188L, `189` = 189L, `190` = 190L, 
`191` = 191L, `192` = 192L, `193` = 193L, `194` = 194L, `195` = 195L, 
`196` = 196L, `197` = 197L, `198` = 198L, `199` = 199L, `200` = 200L, 
`201` = 201L, `203` = 203L, `204` = 204L, `205` = 205L, `206` = 206L, 
`207` = 207L, `209` = 209L, `210` = 210L, `211` = 211L, `212` = 212L, 
`213` = 213L, `214` = 214L, `215` = 215L, `216` = 216L, `217` = 217L, 
`218` = 218L, `219` = 219L, `220` = 220L, `221` = 221L, `222` = 222L, 
`223` = 223L, `224` = 224L, `226` = 226L, `227` = 227L, `228` = 228L, 
`229` = 229L, `230` = 230L, `231` = 231L, `232` = 232L, `233` = 233L, 
`234` = 234L, `236` = 236L, `237` = 237L, `238` = 238L, `239` = 239L, 
`240` = 240L, `241` = 241L, `242` = 242L, `243` = 243L, `244` = 244L, 
`245` = 245L, `247` = 247L, `248` = 248L, `249` = 249L, `250` = 250L, 
`251` = 251L, `252` = 252L, `253` = 253L, `254` = 254L, `255` = 255L, 
`256` = 256L, `257` = 257L, `258` = 258L, `259` = 259L, `260` = 260L, 
`261` = 261L, `262` = 262L, `263` = 263L, `264` = 264L, `265` = 265L, 
`266` = 266L, `267` = 267L, `268` = 268L, `269` = 269L, `270` = 270L, 
`271` = 271L, `272` = 272L, `273` = 273L, `274` = 274L, `276` = 276L, 
`278` = 278L, `280` = 280L, `281` = 281L, `282` = 282L, `283` = 283L, 
`284` = 284L, `285` = 285L, `286` = 286L, `288` = 288L, `289` = 289L, 
`291` = 291L, `292` = 292L, `293` = 293L, `294` = 294L, `296` = 296L, 
`297` = 297L, `298` = 298L, `299` = 299L, `300` = 300L, `301` = 301L, 
`304` = 304L, `305` = 305L, `306` = 306L, `307` = 307L, `308` = 308L, 
`311` = 311L, `312` = 312L, `313` = 313L, `316` = 316L, `319` = 319L, 
`321` = 321L, `322` = 322L, `323` = 323L, `324` = 324L, `325` = 325L, 
`326` = 326L, `327` = 327L, `328` = 328L, `329` = 329L, `330` = 330L, 
`331` = 331L, `332` = 332L, `333` = 333L, `335` = 335L, `337` = 337L, 
`338` = 338L, `339` = 339L, `340` = 340L, `341` = 341L, `342` = 342L, 
`343` = 343L, `344` = 344L, `345` = 345L, `346` = 346L, `347` = 347L, 
`348` = 348L, `349` = 349L, `350` = 350L, `351` = 351L, `352` = 352L, 
`353` = 353L, `354` = 354L, `355` = 355L, `356` = 356L, `357` = 357L, 
`359` = 359L, `360` = 360L, `361` = 361L, `362` = 362L, `363` = 363L, 
`365` = 365L, `366` = 366L, `367` = 367L, `368` = 368L, `370` = 370L, 
`371` = 371L, `372` = 372L, `373` = 373L, `375` = 375L, `376` = 376L, 
`378` = 378L, `379` = 379L, `380` = 380L, `381` = 381L, `382` = 382L, 
`384` = 384L, `385` = 385L, `387` = 387L, `388` = 388L, `389` = 389L, 
`390` = 390L, `391` = 391L, `392` = 392L, `393` = 393L, `395` = 395L, 
`396` = 396L, `397` = 397L, `398` = 398L, `399` = 399L, `400` = 400L, 
`401` = 401L, `403` = 403L, `404` = 404L, `405` = 405L, `409` = 409L, 
`412` = 412L, `413` = 413L, `414` = 414L, `415` = 415L, `416` = 416L, 
`418` = 418L, `419` = 419L, `420` = 420L, `421` = 421L, `422` = 422L, 
`423` = 423L, `426` = 426L, `427` = 427L, `428` = 428L, `429` = 429L, 
`432` = 432L, `433` = 433L, `435` = 435L, `436` = 436L, `437` = 437L, 
`438` = 438L, `440` = 440L, `441` = 441L, `442` = 442L, `443` = 443L, 

I would be so grateful if anybody could give me a helping hand!

Thank you so much!


Solution

  • You can do it using table:

    merged %>% filter(table(`Branch Code`)[`Branch Code`] > 1)
    

    or using add_count:

    merged %>% add_count(`Branch Code`) %>% filter(n > 1)