r list duplicates unique

Remove duplicates from a large list while keeping the element names in R


I have a very large list (1,582,238 elements) and I would like to delete all the duplicates while keeping the names of the numbers.

My list looks like this:

$`GUE/NGL.mepid`
[1] 197701
...
$`Verts/ALE.mepid`
[1] 197837

It is available here: https://github.com/JMcrocs/MEPVote/blob/master/MEPList.rds

When I use unique(mylist), I lose the names of the numbers:

[[1]]
[1] 197701

[[2]]
[1] 197533

[[3]]
[1] 197521

Sadly the list is too big to turn into a data.frame so I have not found a solution.

Please can you help me?

Best Regards,


Solution

  • Try this:

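    df <- readRDS('MEPList.rds')                      # named list read from the .rds file
    df1 <- as.data.frame(do.call(rbind, df))          # one row per element, values in column V1
    df2 <- df1[!duplicated(df1$V1), , drop = FALSE]   # keep the first occurrence of each value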
    

    Output:

    head(df2)
    
                        V1
    GUE.NGL.mepid   197701
    GUE.NGL.mepid.1 197533
    GUE.NGL.mepid.2 197521
    GUE.NGL.mepid.3 187917
    GUE.NGL.mepid.4 124986
    GUE.NGL.mepid.5 197529
    

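    The row names of df2 carry the original list names, but, as the output shows, "/" has become "." and repeated names have a numeric suffix (.1, .2, ...). You can then reformat rownames(df2) to get the names back.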
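
As an alternative sketch that skips the data frame entirely: assuming every element of the list is a single number (as in the sample shown in the question), you could flatten the values, flag repeats with duplicated(), and subset the original list so the names survive unchanged. The toy list below is illustrative only and stands in for the real data.

```r
# Toy stand-in for the real list; names and values are illustrative only.
mylist <- list(
  `GUE/NGL.mepid`   = 197701,
  `GUE/NGL.mepid`   = 197533,
  `Verts/ALE.mepid` = 197701,  # same value again under a different name
  `Verts/ALE.mepid` = 197837
)

# Assumption: each element holds exactly one number, so flattening is safe.
vals <- unlist(mylist, use.names = FALSE)

# Keep the first occurrence of each value; `[` on a list preserves names.
dedup <- mylist[!duplicated(vals)]

str(dedup)
```

With the real file, mylist would come from readRDS('MEPList.rds') instead. Because the subsetting is done on the list itself, names such as GUE/NGL.mepid keep their "/" and get no numeric suffix.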