r, duplicates, r-faq

Remove duplicated rows


I have read a CSV file into an R data.frame. Some of the rows have the same element in one of the columns. I would like to remove rows that are duplicates in that column. For example:

platform_external_dbus          202           16                     google        1
platform_external_dbus          202           16         space-ghost.verbum        1
platform_external_dbus          202           16                  localhost        1
platform_external_dbus          202           16          users.sourceforge        8
platform_external_dbus          202           16                    hughsie        1

I would like only one of these rows since the others have the same data in the first column.


Solution

  • Just subset your data frame to the columns that should define a duplicate, then call unique() on the result :D

    # in the above example, only the first three columns are needed
    deduped.data <- unique(yourdata[, 1:3])
    # the remaining columns are dropped, so they can no longer 'distinguish'
    # the rows: the rows are now identical and unique() keeps only one.
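
Note that subsetting to columns 1:3 also discards the other columns from the result. If the full row should be kept, one common alternative (not part of the answer above) is base R's duplicated(), which flags the second and later occurrences of a value. Below is a minimal sketch of both options on the rows from the question. The column names (repo, a, b, domain, count) are made up for illustration, since the question does not give any.

```r
# Rebuild the example rows from the question.
# Column names are hypothetical; the original post does not name them.
yourdata <- data.frame(
  repo   = rep("platform_external_dbus", 5),
  a      = rep(202, 5),
  b      = rep(16, 5),
  domain = c("google", "space-ghost.verbum", "localhost",
             "users.sourceforge", "hughsie"),
  count  = c(1, 1, 1, 8, 1),
  stringsAsFactors = FALSE
)

# Option 1 (the answer above): keep only columns 1 to 3, then drop
# rows that are identical across those columns.
deduped.data <- unique(yourdata[, 1:3])

# Option 2 (alternative): keep all columns, but only the first row
# seen for each value of column 1.
first.rows <- yourdata[!duplicated(yourdata[, 1]), ]

print(deduped.data)
print(first.rows)
```

Option 2 keeps whichever row comes first in the data frame, so sort the data beforehand if a particular row should survive.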