python list-comprehension jupyter-lab for-in-loop

How come this List Comprehension does not produce the same results as this for/in loop?


I generate a List of random strings, then use both a for/in loop and a List Comprehension expression to find the longest string and the length of that string.

Both techniques compute the max length correctly, but sometimes the for/in loop finds the same longest word as the List Comprehension, sometimes not. Why? What's the logic error?

import random
import string
def cobble_large_dataset(dataset_number_of_elements):
    '''
    Build a list of Lists; each inner List holds a single String of 1-10 random characters
    '''
    myList = []         # Empty List   
    for i in range(0,dataset_number_of_elements):
        string_length = random.randint(1, 10)
        tmp = ''.join(random.choices(string.ascii_uppercase + string.digits, k=string_length))  # https://stackoverflow.com/questions/2257441/random-string-generation-with-upper-case-letters-and-digits
        tmp = [tmp]
        #print(tmp)
        myList.extend([tmp])
    return myList    

def list_comprehension_test(wordsList):
    '''
    Process a List of Lists using List Comprehension. 
    Each List in the List of Lists is a single String
    '''
    start_time = time.time()
   
    maximumWordLength, longest_word = max([(len(x[0]), x[0]) for x in wordsList]) # This works because x is a List of strings
    return ((time.time() - start_time), longest_word, maximumWordLength)

def brute_force_test(wordsList):
    '''
    Process a List of Lists using a brute-force for/in loop. 
    Each List in the List of Lists is a single String    
    '''
    start_time = time.time()
    maximumWordLength = 0
    for word in wordsList:
        tmp = word[0]
        #print(tmp)
        if (len(tmp) >= maximumWordLength):
            maximumWordLength = len(tmp)
            longest_word = tmp
            #print(tmp)
            #print(longest_word + " : " + str(maximumWordLength))
    return ((time.time() - start_time), longest_word, maximumWordLength)

import time
start_time = time.time()
dataset = cobble_large_dataset(100)
print (str(len(dataset)) + ' Strings generated in ' + str((time.time() - start_time)) + ' seconds.')

# Let's see if both techniques produce the same results:
result_brute_force = brute_force_test(dataset)
print('Results from Brute Force = ' + result_brute_force[1] + ', ' + str(result_brute_force[2]) + ' characters' )
result_list_comprehension = list_comprehension_test(dataset)
print('Results from List Comprehension = ' + result_list_comprehension[1] + ', ' + str(result_list_comprehension[2]) + ' characters' )
if (result_list_comprehension[1] == result_brute_force[1]):
    print("Techniques produced the same results.")
else:
    print("Techniques DID NOT PRODUCE the same results")

Solution

  • In your list comprehension case, max compares the whole (length, word) tuples. When several words tie for the maximum length, it breaks the tie by comparing the words themselves and returns the one that sorts last. The for-loop case only considers the length of each string. To get the equivalent behavior, tell max to operate on just the first item in each pair:

    maximumWordLength, longest_word = max(
        [(len(x[0]), x[0]) for x in wordsList],
        key=lambda pair: pair[0])  # Compare on the length only, not on the word
    

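    As others have already pointed out, you also want to change the >= comparison in the brute-force case to >. With >=, the loop keeps the last word of maximum length, whereas max returns the first one it encounters. If you make these two changes, you will get the same result from your two methods.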
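
To see the tie-breaking without relying on random data, here is a minimal sketch using a small fixed list with three words of the same maximum length. The names `words`, `pairs` and `longest_by_loop` are made up for this illustration and are not part of the original code; the loop is a simplified stand-in for brute_force_test with a switch between > and >=.

```python
# Hypothetical fixed dataset: three words tie at length 3.
words = [["MMM"], ["A"], ["ZZZ"], ["ABC"]]

pairs = [(len(w[0]), w[0]) for w in words]

# Plain max compares whole tuples: length first, then the word itself,
# so among the ties the word that sorts last wins.
tuple_max = max(pairs)                            # (3, 'ZZZ')

# With a key, only the length is compared and max returns the
# first maximal item it encounters.
keyed_max = max(pairs, key=lambda pair: pair[0])  # (3, 'MMM')


def longest_by_loop(words, keep_last_on_tie):
    """Simplified stand-in for the for/in loop in the question.

    keep_last_on_tie=True mimics the >= comparison,
    keep_last_on_tie=False mimics the > comparison.
    """
    best = ""
    for w in words:
        candidate = w[0]
        longer = len(candidate) > len(best)
        tied = len(candidate) == len(best)
        if longer or (tied and keep_last_on_tie):
            best = candidate
    return best


loop_ge = longest_by_loop(words, keep_last_on_tie=True)   # 'ABC' (last tie)
loop_gt = longest_by_loop(words, keep_last_on_tie=False)  # 'MMM' (first tie)

print(tuple_max, keyed_max, loop_ge, loop_gt)
```

All four report a length of 3, but only the keyed max and the > loop agree on the word, which is why both changes are needed.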