I have written a shell script that gets all the file names from a folder, and all its sub-folders, and copies them to the clipboard after sorting (removing all paths; I just need a simple file list of the thousands of randomly named files within).
What I can’t figure out is how to get the sort command to sort properly, meaning the way a spreadsheet or the Mac Finder would sort things:
Underscores > numbers > letters (regardless of case)
Anyone know how to do this? sort -n only works for files starting with numbers. sort -f was close, but it separated the lower case and capitals in a weird way, and anything starting with a number was all over the place. sort -V was the closest, but anything starting with an underscore went to the bottom instead of the top… I’m about to lose my mind. 🤣
I’ve been trying to figure this out for a week, and no combination of anything I have tried gets the sort command to actually, ya know, sort properly.
Help?
If I understand the problem correctly, you want the "natural sort order" as described in Natural sort order - Wikipedia, Sorting for Humans : Natural Sort Order, and macos - How does finder sort folders when they contain digits and characters?.
Using Linux sort(1) you need the -V (--version-sort) option for "natural" sort. You also need the -f (--ignore-case) option to disregard the case of letters. So, assuming that the file names are stored one-per-line in a file called files.txt, you can produce a list (mostly) sorted in the way that you want with:
sort -Vf files.txt
However, sort -Vf sorts underscores after digits and letters on my system. I've tried using different locales (see How to set locale in the current terminal's session?), but with no success. I can't see a way to change this with sort options (but I may be missing something).
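If you want to check how your own sort treats these characters, a quick test along these lines may help. The sample names are made up for illustration, and the resulting order depends on your sort implementation and version, so I'm not showing an expected output:

```shell
# Made-up sample names: one starting with an underscore, one with a digit,
# and two starting with letters of different case.
printf '%s\n' 'b.txt' '_a.txt' 'C.txt' '1.txt' | LC_ALL=C sort -Vf
```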
The characters . and ~ seem to consistently sort before numbers and letters with sort -V. A possible hack to work around the problem is to swap underscore with one of them, sort, and then swap again. For example:
tr '_~' '~_' <files.txt | LC_ALL=C sort -Vf | tr '_~' '~_'
seems to do what you want on my system. I've explicitly set the locale for the sort command with LC_ALL=C ... so it should behave the same on other systems. (See Why doesn't sort sort the same on every machine?.)
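To tie this back to the original script, here is a rough sketch of how the swap trick might be wrapped up together with the file listing. The function names list_names and natural_sort are my own inventions, not anything from your script, and the sketch assumes a sort that supports -V (such as GNU sort) and file names that contain no newlines:

```shell
#!/bin/sh
# Sketch only: list_names and natural_sort are hypothetical helper names.
# Assumes a sort with -V support and no newlines in file names.

# Print the bare name (path stripped) of every regular file under a directory.
list_names() {
    find "$1" -type f | sed 's|.*/||'
}

# Natural, case-insensitive sort with underscores first:
# swap '_' and '~' so underscores borrow the early sort position of '~',
# sort, then swap back so the names come out unchanged.
natural_sort() {
    tr '_~' '~_' | LC_ALL=C sort -Vf | tr '_~' '~_'
}

# Example: sorted list of every file name under the current directory.
list_names . | natural_sort
```

Because the swap is symmetric, a name that really does contain a ~ comes back intact; it just sorts where an underscore otherwise would. On a Mac you could presumably pipe the final output into pbcopy to put it on the clipboard.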