Does anyone know how I can optimize this code to handle larger files? It works with smaller inputs, but I need it to run on a file with over 200,000 words. Any suggestions?
Thank you.
```python
import random
import re


def quick_sort(a, i, n):
    if n <= 1:
        return
    mid = (len(a)) // 2
    x = a[random.randint(0, len(a) - 1)]
    p = i - 1
    j = i
    q = i + n
    while j < q:
        if a[j] < x:
            p = p + 1
            a[j], a[p] = a[p], a[j]
            j = j + 1
        elif a[j] > x:
            q = q - 1
            a[j], a[q] = a[q], a[j]
        else:
            j = j + 1
    quick_sort(a, i, p - i + 1)
    quick_sort(a, q, n - (q - i))


file_name = input("Enter file name: ")
my_list = []
with open(file_name, 'r') as f:
    for line in f:
        line = re.sub('[!#?,.:";\']', '', line).lower()
        token = line.split()
        for t in token:
            my_list.append(t)
a = my_list
quick_sort(a, 0, len(my_list))
print("List After Calling Quick Sort: ", a)
```
Your random selection of an index to use for your pivot `x` is using the whole size of the input list `a`, not just the part you're supposed to be sorting on the current call. This means that very often your pivot won't be in the current section at all, so you won't be able to usefully reduce the problem (because all of the values will end up on the same side of the pivot). This leads to lots and lots of recursion, and for larger inputs you'll almost always hit Python's recursion limit.
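To see how rarely a whole-list pivot even lands in the section being sorted, here's a quick illustrative check (the list size and subrange are made up for the demo):

```python
import random

# Illustrative check (numbers are made up): how often does a pivot index
# drawn from the whole list land inside the subrange [i, i+n) that the
# current call is actually sorting?
a = list(range(1000))
i, n = 400, 10  # pretend this call is only sorting a[400:410]
trials = 10_000
hits = sum(i <= random.randint(0, len(a) - 1) < i + n for _ in range(trials))
print(f"pivot index inside the subrange: {hits}/{trials} trials")
# Expect roughly n/len(a) = 1% of trials; the other ~99% of calls pick a
# pivot value from outside the section and make little or no progress.
```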
The fix is simple: just change how you get `x`:

```python
x = a[random.randrange(i, i + n)]
```
I like `randrange` a lot better than `randint`, but you could use `randint(i, i + n - 1)` if you feel the other way.
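For completeness, here's a sketch of the whole function with that one change applied, plus a tiny smoke test (the sample word list is just for illustration):

```python
import random

def quick_sort(a, i, n):
    # Sorts the slice a[i:i+n] in place using three-way partitioning.
    if n <= 1:
        return
    # Pivot drawn from the current subrange only (the fix discussed above).
    x = a[random.randrange(i, i + n)]
    p = i - 1    # last index of the "< pivot" region
    j = i        # scan pointer
    q = i + n    # first index of the "> pivot" region
    while j < q:
        if a[j] < x:
            p += 1
            a[j], a[p] = a[p], a[j]
            j += 1
        elif a[j] > x:
            q -= 1
            a[j], a[q] = a[q], a[j]
        else:
            j += 1
    quick_sort(a, i, p - i + 1)    # recurse on the "<" region
    quick_sort(a, q, n - (q - i))  # recurse on the ">" region

words = ["pear", "apple", "fig", "apple", "cherry"]
quick_sort(words, 0, len(words))
print(words)  # ['apple', 'apple', 'cherry', 'fig', 'pear']
```

With equal elements excluded from both recursive calls and the pivot guaranteed to come from the current section, every call shrinks the problem, so a 200,000-word list should stay well within Python's recursion limit on average.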