Search code examples
pythonsorting

Array of strings sorted with less specific match first


I have a list of files named like:

file1.jpg
file2.jpg
.
.
.

where occasionally there is a duplicate such that the correct order should be:

fileN.jpg
fileN 1.jpg
fileN 2.jpg

not:

fileN 1.jpg
fileN 2.jpg
fileN.jpg

Note in my use case, the string fileN can contain special characters (including spaces). With the following python script, I get almost what I want, but with the incorrect ordering for the special duplicate case (i.e, the latter example above instead of the one before it).

import os;
files = os.listdir(os.path.curdir)
files.sort()
for file in files:
    print(file)

How do I get the correct ordering, as shown in my 2nd code snippet? (I'm sure there is a more elegant CS name for the sorting I am looking for - that terminology would also be appreciated. I imagine something like this question has been asked before, but I am having trouble formulating the right query.)


Solution

  • sort takes an optional key function which lets you control the sort order.

    What you can do is split the files into name and extension with .split('.'). This turns file1 1.jpg into ['file 1', 'jpg']—not permanently; just temporarily for the purposes of sorting. Python will naturally put ['file1', 'jpg'] ahead of ['file1 1', 'jpg'], giving you the sort order you desire.

    >>> files.sort(key=lambda name: name.split('.'))
    >>> files
    ['file1.jpg', 'file1 1.jpg', 'file1 2.jpg', 'file2.jpg', 'file2 1.jpg', 'file2 2.jpg']