I m using union on two big lists (over 1 million entries) and it s quite slow (a few minutes)
I need the feature to remove duplicates so I cannot use concat and my lists are not sorted.
Is there a faster way? Maybe using plinq
?
You are not saying what are your items in the list, but one option is to use proper data structure for this task - you want to only keep unique items - it is definition of SET, so use HashSet.
var hashSet = new HashSet<int>(list1);
hashSet.UnionWith(list2);
Also I measured time for code above vs Linq.Union:
var list3 = list1.Union(list2).Distinct();
And here is timing ( HashSet.UnionWith works almost twice faster):
HashSet.UnionWith
real 0m4.111s
user 0m3.890s
sys 0m0.132s
real 0m4.562s
user 0m4.074s
sys 0m0.170s
real 0m4.052s
user 0m3.851s
sys 0m0.129s
real 0m4.003s
user 0m3.814s
sys 0m0.125s
real 0m4.058s
user 0m3.858s
sys 0m0.126s
Linq.Union.Distinct
real 0m7.579s
user 0m7.014s
sys 0m0.428s
real 0m7.498s
user 0m6.965s
sys 0m0.419s
real 0m7.596s
user 0m6.994s
sys 0m0.412s
real 0m7.446s
user 0m6.917s
sys 0m0.416s
real 0m7.452s
user 0m6.928s
sys 0m0.403s