Search code examples
pythonlistlambdastartswith

Most efficient way to get first value that startswith of large list


I have a very large list with over a 100M strings. An example of that list look as follows:

l = ['1,1,5.8067',
     '1,2,4.9700',
     '2,2,3.9623',
     '2,3,1.9438',
     '2,7,1.0645', 
     '3,3,8.9331',
     '3,5,2.6772',
     '3,7,3.8107',
     '3,9,7.1008']

I would like to get the first string that starts with e.g. '3'.

To do so, I have used a lambda iterator followed by next() to get the first item:

next(filter(lambda i: i.startswith('3,'), l))
Out[1]: '3,3,8.9331'

Considering the size of the list, this strategy unfortunately still takes relatively much time for a process I have to do over and over again. I was wondering if someone could come up with an even faster, more efficient approach. I am open for alternative strategies.


Solution

  • I have no way of testing it myself but it is possible that if you will join all the strings with a char that is not in any of the string:

    concat_list = '$'.join(l)
    

    And now use a simple .find('$3,'), it would be faster. It might happen if all the strings are relatively short. Since now all the string is in one place in memory.


    If the amount of unique letters in the text is small you can use Abrahamson-Kosaraju method and het time complexity of practically O(n)


    Another approach is to use joblib, create n threads when the i'th thread is checking the i + k * n, when one is finding the pattern it stops the others. So the time complexity is O(naive algorithm / n).