Tags: ios, objective-c, nsarray, duplicates, nsorderedset

Objective-C: Remove duplicates efficiently from large data structures


I have an object containing an array of NSNumbers (indexes) and a parallel array of NSDictionaries (indexesTitles), where each dictionary holds some info about the index at the same position. For each entry in object.indexes I have to call a method that returns an array of results, associate the matching object.indexesTitles entry with every returned result, and save everything into a single array. At the end, I want to remove the duplicate results while preserving the associated title dictionaries, and I need to do it efficiently because I'm working with large arrays.

NSMutableArray *resultArray = [NSMutableArray array];
NSMutableArray *titlesArray = [NSMutableArray array];
NSUInteger i = 0; // position of the current index in object.indexes
for(NSNumber *index in object.indexes)
{
    NSArray *resultsIndexArray = [self methodThatReturnsAnArray];
    // title dictionary that corresponds to the current index
    NSDictionary *indexDictionary = [object.indexesTitles objectAtIndex:i];
    for(NSNumber *resultId in resultsIndexArray)
    {
        [titlesArray addObject:indexDictionary];
        [resultArray addObject:resultId];
    }
    i++;
}
// fullResultsArray is assumed to be an NSMutableArray declared elsewhere
[fullResultsArray addObject:titlesArray];
[fullResultsArray addObject:resultArray];

I've found that the most efficient way to remove duplicates is using an
NSOrderedSet like this:

NSOrderedSet *orderedSet = [NSOrderedSet orderedSetWithArray:resultArray];
resultArray = [orderedSet.array mutableCopy];

How can I remove the corresponding entries in titlesArray so that the association is preserved? I've also tried storing pairs like {resultId, titleDictionary} as NSDictionaries in an array, but I haven't found an efficient way to remove the dictionaries that share the same resultId; everything I tried was too slow.

Any suggestions?


Solution

  • It is not completely clear to me what your problem is, but maybe this will help:

    A good way to remove duplicates is not to add them in the first place. Replace:

    for(NSNumber *resultId in resultsIndexArray)
    {
        [titlesArray addObject:indexDictionary];
        [resultArray addObject:resultId];
    }
    

    with:

    for(NSNumber *resultId in resultsIndexArray)
    {
        // only add if resultId not already in resultArray
        if( ![resultArray containsObject:resultId] )
        {
            [titlesArray addObject:indexDictionary];
            [resultArray addObject:resultId];
        }
    }
    

    The containsObject: call on an NSArray requires a linear search, so this check gets slower as resultArray grows. If your data set is large, you might wish to change resultArray to an NSMutableSet, which does hashed lookups, and titlesArray to an NSMutableDictionary mapping from resultId to indexDictionary values.
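
    Here is a minimal sketch of that last suggestion, under some assumptions. DeduplicatedResults is a hypothetical helper name, and its two parameters stand in for object.indexesTitles and for the arrays your method returns for each index. I have also swapped NSMutableSet for NSMutableOrderedSet, since a plain set does not keep insertion order and you seem to want the results in the order they were found. When a result id shows up more than once, the first title seen for it wins.

    ```objective-c
    #import <Foundation/Foundation.h>

    // Hypothetical helper (not from the question): titles[i] is the dictionary
    // that belongs to resultsPerIndex[i]. Returns @[titlesArray, resultArray],
    // two parallel arrays with duplicate result ids removed.
    static NSArray<NSArray *> *DeduplicatedResults(NSArray<NSDictionary *> *titles,
                                                   NSArray<NSArray<NSNumber *> *> *resultsPerIndex)
    {
        // Ordered set: hashed membership test, keeps first-seen order.
        NSMutableOrderedSet<NSNumber *> *resultIds = [NSMutableOrderedSet orderedSet];
        // Maps each unique result id to the title dictionary it was first seen with.
        NSMutableDictionary<NSNumber *, NSDictionary *> *titleForResult =
            [NSMutableDictionary dictionary];

        [resultsPerIndex enumerateObjectsUsingBlock:^(NSArray<NSNumber *> *results,
                                                      NSUInteger i, BOOL *stop) {
            NSDictionary *indexDictionary = titles[i];
            for (NSNumber *resultId in results) {
                // Only add if resultId has not been seen yet.
                if (![resultIds containsObject:resultId]) {
                    [resultIds addObject:resultId];
                    titleForResult[resultId] = indexDictionary;
                }
            }
        }];

        // Rebuild the parallel titles array in the same order as the result ids.
        NSMutableArray<NSDictionary *> *titlesArray =
            [NSMutableArray arrayWithCapacity:resultIds.count];
        for (NSNumber *resultId in resultIds) {
            [titlesArray addObject:titleForResult[resultId]];
        }
        return @[titlesArray, resultIds.array];
    }
    ```

    For example, with titles A and B and results @[@1, @2] and @[@2, @3], this gives the result ids 1, 2, 3 paired with the titles A, A, B. Each membership check is a hash lookup rather than a scan of the whole array, which is where the saving on large inputs should come from.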

    HTH