There is a list (price_list) of 10 million numbers as a price in range (0-100). I would like to do the following operations consequently:
Here is the sample code:
import time import numpy as np from functools import reduce with open('price_list.txt') as f: price_list = f.read().split('\n') price_list = np.array(price_list).astype(int) # convert string to int
I tried following options:
Using NumPy module (0.1069948673248291 seconds): total = np.sum(price_list[price_list < 30] * 1.05)
Using brute force loop (9.485718965530396 seconds)
Using filter, map, and reduce functions (10.078220844268799 seconds): total = reduce(lambda x,y: x+y,map(lambda x: x * 1.05, filter(lambda x: x < 30, price_list)) )
Using list comprehension (8.609008073806763 seconds): total= sum([x * 1.05 for x in price_list if x <30])
Using generator (8.780538320541382 seconds): total= sum((x * 1.05 for x in price_list if x <30))
It is obvious that the NumPy module is significantly fast in these operations. Is there any alternative solution for running this code faster using Python built-in functions and capabilities?
The Numpy is fast in the calculation but it is inefficient in converting the string list to an int list
To better compare the results, the running time can be divided into three categories for this problem:
This separation is required to clearly show the efficiency of python list vs NumPy array. Having a look to the code:
import time
import numpy as np
from functools import reduce
from collections import Counter
def print_result(total_price, duration, method):
print('{method:<35s} {0:>20.3f} (sec), total_price={1:<10.3f}'
.format(duration, total_price, method=method))
start = time.time()
with open('price_list.txt') as f:
price_list = f.read().split('\n')
print('Duration for reading from file: {0:<50.3f}'.format(time.time() - start))
start = time.time()
np_price_array = np.array(price_list).astype(int) # uing numpy array
print('Duration for converting to numpy int array: {0:<50.3f}'.format(
time.time() - start))
start = time.time()
price_list = list(map(int, price_list))
print('Duration for converting to python int list: {0:<50.3f}'.format(
time.time() - start))
start = time.time()
total_price = np.sum(np_price_array[np_price_array < 30] * 1.05)
print_result(total_price, time.time() - start, "NumPy function (numpy array)")
start = time.time()
total_price = sum([x for x in np_price_array if x < 30])*1.05
print_result(total_price, time.time() - start,
"List comprehension (numpy array)")
start = time.time()
total_price = sum((x for x in price_list if x < 30))*1.05
print_result(total_price, time.time() - start, "List generator (Python list)")
start = time.time()
total_price = sum([x for x in price_list if x < 30])*1.05
print_result(total_price, time.time() - start,
"List comprehension (Python list)")
start = time.time()
total_price = sum([float(x[0]) * float(x[1])
for x in Counter(price_list).items() if x[0] < 30]) * 1.05
print_result(total_price, time.time() - start, "Counter (Python list)")
start = time.time()
total_price = reduce(
lambda x, y: x+y, filter(lambda x: x < 30, price_list)) * 1.05
print_result(total_price, time.time() - start, "functions (Python list)")
Following chart is as the result of the code output: