I am writing a script to list the 20 largest files in a target directory. Once I have the files, I do some math on each size to give it the correct human-readable unit (Kb, Mb, or Gb).
However, this throws the sort out of order. How can I do this and keep the sort order intact?
#! /usr/bin/env python
import operator, os, sys

args = sys.argv
if len(args) != 2:
    print "You must enter one directory as an argument."
    sys.exit(1)
else:
    target = args[1]

# Map every file path under the target directory to its size in bytes.
data = {}
for root, dirs, files in os.walk(target):
    for name in files:
        filename = os.path.join(root, name)
        if os.path.exists(filename):
            size = float(os.path.getsize(filename))
            data[filename] = size

# List of (filename, size) tuples, largest first, trimmed to 20 entries.
sorted_data = sorted(data.iteritems(), key=operator.itemgetter(1), reverse=True)
total = str(len(sorted_data))
while len(sorted_data) > 20:
    sorted_data.pop()

final_data = {}
for name in sorted_data:
    # Keep the size numeric so the comparisons below behave as intended.
    size = name[1]
    if size >= 1024:
        size = round(float(size) / 1024, 2)
        if size >= 1024:
            size = round(size / 1024, 2)
            if size >= 1024:
                size = round(size / 1024, 2)
                size = str(size) + "Gb"
            else:
                size = str(size) + "Mb"
        else:
            size = str(size) + "Kb"
    final_data[name] = size

print "The 20 largest files are:\n"
for name in final_data:
    print str(final_data[name]) + " " + str(name)
print "\nThere are a total of " + total + " files located in " + target
Your problem is that you create a brand new dictionary to store the modified file size data. Because that dictionary holds only the formatted strings rather than the numeric sizes, and because dictionaries don't store their entries in any fixed order, you lose your sort order. But it's simple to recover: iterate over sorted_data instead of over final_data, using final_data to look up the human-readable file sizes. So something like this:
for item in sorted_data:
    # final_data is keyed by the (filename, size) tuples taken from sorted_data
    print item[0], final_data[item]
But an even better solution would be to put your human-readable string generating code into a function!
def human_readable_size(size):
    # logic to convert size
    return hr_size
Now you don't even have to create a dictionary:
for filename, size in sorted_data:
    print filename, human_readable_size(size)
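For completeness, here is one way the placeholder body of human_readable_size could be filled in. Treat it as a sketch rather than the definitive version: it follows the thresholds from your script (divide by 1024 until the value drops below 1024, stopping at Gb), but the "b" suffix for files under 1 Kb and the sample (filename, size) pairs are my own assumptions. It is written so that it runs under both Python 2 and Python 3.

```python
def human_readable_size(size):
    """Return size (a byte count) as a short string such as '1.5Kb'."""
    size = float(size)
    # Step through the units, dividing by 1024 until the value fits.
    for unit in ("b", "Kb", "Mb"):
        if size < 1024:
            return str(round(size, 2)) + unit
        size /= 1024
    # Anything that is still >= 1024 Mb is reported in Gb.
    return str(round(size, 2)) + "Gb"


# Hypothetical sample data standing in for sorted_data:
# (filename, size_in_bytes) pairs, already sorted largest first.
sorted_data = [
    ("big.iso", 3221225472.0),
    ("movie.mp4", 5242880.0),
    ("notes.txt", 1536.0),
    ("tiny.cfg", 12.0),
]

# Iterating over the sorted list keeps the order; each size is
# formatted on the fly, so no second dictionary is needed.
lines = [human_readable_size(size) + " " + filename
         for filename, size in sorted_data]
print("\n".join(lines))
```

Because the formatting happens only at print time, the numeric sizes in sorted_data are never overwritten, and the sort order cannot be disturbed.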