objective-c, nsarray, nsset

Filter millions of strings and keep only the unique ones in Objective-C


I have a very large number of strings in 200 txt files, and I'm trying to filter them so that only the unique ones remain. I was thinking of using NSSet for this, but the initial files contain 300 million strings, and I can't load them all into an NSSet because initializing it takes a very long time.

Can anybody suggest a better approach or a workaround that could help me solve this problem?


Solution

  • Here is a solution that is cheap in both memory and CPU:

    You can use a SQLite database: create a table with a single string column declared as a unique key, and insert each string into it as you parse it.

    When you insert a string that is already in the table, the insertion is rejected, so at the end the table contains only unique strings.

    Write your code so that it keeps inserting the remaining strings when an insertion fails because the string already exists (duplicate key).

    Edit: also make sure this column is indexed, because you are dealing with a lot of entries. (In SQLite, declaring the column UNIQUE or PRIMARY KEY creates that index automatically.)
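
The approach above could be sketched roughly as follows with the SQLite C API called from Objective-C. This is only a minimal sketch under some assumptions: the helper names (`OpenUniqueStringStore`, `InsertStrings`, `CountUniqueStrings`), the table name `strings`, and the column name `value` are made up for illustration. It also uses `INSERT OR IGNORE` so that duplicates are skipped silently, rather than catching the duplicate-key error and continuing, and it wraps each batch in one transaction, which is not something the answer itself mentions.

```objc
#import <Foundation/Foundation.h>
#import <sqlite3.h>

// Hypothetical helper: opens (or creates) the database and a one-column table.
// Declaring the column PRIMARY KEY makes SQLite keep a unique index on it.
static sqlite3 *OpenUniqueStringStore(NSString *path) {
    sqlite3 *db = NULL;
    if (sqlite3_open(path.UTF8String, &db) != SQLITE_OK) {
        sqlite3_close(db);
        return NULL;
    }
    const char *sql =
        "CREATE TABLE IF NOT EXISTS strings (value TEXT NOT NULL PRIMARY KEY);";
    if (sqlite3_exec(db, sql, NULL, NULL, NULL) != SQLITE_OK) {
        sqlite3_close(db);
        return NULL;
    }
    return db;
}

// Hypothetical helper: inserts one batch of strings inside a single transaction.
// INSERT OR IGNORE skips rows that would violate the unique key instead of failing.
static BOOL InsertStrings(sqlite3 *db, NSArray<NSString *> *strings) {
    sqlite3_stmt *stmt = NULL;
    const char *sql = "INSERT OR IGNORE INTO strings (value) VALUES (?1);";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) != SQLITE_OK) {
        return NO;
    }
    BOOL ok = (sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL) == SQLITE_OK);
    for (NSString *s in strings) {
        if (!ok) break;
        // SQLITE_TRANSIENT tells SQLite to copy the bytes before we return.
        sqlite3_bind_text(stmt, 1, s.UTF8String, -1, SQLITE_TRANSIENT);
        ok = (sqlite3_step(stmt) == SQLITE_DONE);
        sqlite3_reset(stmt);
    }
    sqlite3_finalize(stmt);
    sqlite3_exec(db, ok ? "COMMIT;" : "ROLLBACK;", NULL, NULL, NULL);
    return ok;
}

// Hypothetical helper: returns the number of unique strings stored, or -1 on error.
static long long CountUniqueStrings(sqlite3 *db) {
    sqlite3_stmt *stmt = NULL;
    long long count = -1;
    const char *sql = "SELECT COUNT(*) FROM strings;";
    if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK) {
        if (sqlite3_step(stmt) == SQLITE_ROW) {
            count = sqlite3_column_int64(stmt, 0);
        }
    }
    sqlite3_finalize(stmt);
    return count;
}
```

In practice you would probably read each txt file in chunks and pass the lines to `InsertStrings` in batches of, say, a few thousand, so that the full set of 300 million strings never has to sit in memory at once.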