Search code examples
c#linqunionplinq

Fast union of two big lists


I m using union on two big lists (over 1 million entries) and it s quite slow (a few minutes) I need the feature to remove duplicates so I cannot use concat and my lists are not sorted. Is there a faster way? Maybe using plinq?


Solution

  • You are not saying what are your items in the list, but one option is to use proper data structure for this task - you want to only keep unique items - it is definition of SET, so use HashSet.

    var hashSet = new HashSet<int>(list1);
    hashSet.UnionWith(list2);
    

    Also I measured time for code above vs Linq.Union:

    var list3 = list1.Union(list2).Distinct();
    

    And here is timing ( HashSet.UnionWith works almost twice faster):

    HashSet.UnionWith
    real    0m4.111s
    user    0m3.890s
    sys 0m0.132s
    
    real    0m4.562s
    user    0m4.074s
    sys 0m0.170s
    
    real    0m4.052s
    user    0m3.851s
    sys 0m0.129s
    
    real    0m4.003s
    user    0m3.814s
    sys 0m0.125s
    
    real    0m4.058s
    user    0m3.858s
    sys 0m0.126s
    
    
    Linq.Union.Distinct
    real    0m7.579s
    user    0m7.014s
    sys 0m0.428s
    
    real    0m7.498s
    user    0m6.965s
    sys 0m0.419s
    
    real    0m7.596s
    user    0m6.994s
    sys 0m0.412s
    
    real    0m7.446s
    user    0m6.917s
    sys 0m0.416s
    
    real    0m7.452s
    user    0m6.928s
    sys 0m0.403s