I have a huge number of files to sort, all named using some terrible convention.
Here are some examples:
(4)_mr__mcloughlin____.txt
12__sir_john_farr____.txt
(b)mr__chope____.txt
dame_elaine_kellett-bowman____.txt
dr__blackburn______.txt
Each of these names is supposed to be a different person (speaker). Someone in another IT department produced these from a ton of XML files using some script, but the naming is unfathomably stupid, as you can see.
I need to sort literally tens of thousands of these files. There are multiple files of text for each person, each with something stupid making the filename different, be it more underscores or some random number. They need to be sorted by speaker.
This would be easier with a script to do most of the work; then I could just go back and merge folders that should be under the same name.
There are a number of ways I was thinking about doing this.
I plan on using Perl, but I can try a new language if it's worth it. I'm not sure how to go about reading in each filename in a directory, one at a time, into a string for parsing into an actual name. I'm not completely sure how to parse with regex in Perl either, but that might be googleable.
For the sorting, I was just going to use the shell command:
`cp filename.txt /example/destination/filename.txt`
but only because that's all I know, so it's easiest.
I don't even have a pseudocode idea of what I'm going to do, so if someone knows the best sequence of actions, I'm all ears. I guess I am looking for a lot of help, and I am open to any suggestions. Many, many thanks to anyone who can help.
B.
I hope I've understood your question correctly; it's a bit ambiguous, IMHO. This code is untested, but it should do what I think you want.
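    use strict;
    use warnings;
    use File::Copy;

    # Reduce a filename to a bare, lower-case speaker name.
    sub sanitize {
        local $_ = lc shift;
        s/\.txt$//;                    # drop the extension (dot escaped)
        s/\(\w+\)|\d+//g;              # drop "(4)", "(b)" and bare numbers
        s/[ _]+/ /g;                   # runs of underscores become one space
        s/\b(?:dame|dr|mr|sir)\b//g;   # drop titles; must run after the line
                                       # above, since "_" counts as a word
                                       # character and would defeat \b
        s/ +/ /g;
        s/^ | $//g;
        return $_;
    }

    # Copy each file into a directory named after its speaker.
    sub sort_files_to_dirs {
        my @files = @_;
        for my $filename (@files) {
            my $dirname = sanitize($filename);
            mkdir $dirname or die "mkdir $dirname: $!" unless -d $dirname;
            copy($filename, "$dirname/$filename") or die "copy $filename: $!";
        }
    }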
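To answer the part about reading filenames from a directory one at a time: below is a minimal sketch that lists a directory with `opendir`/`readdir` and does a dry run, grouping files by speaker without copying anything. The names `speaker_key` and `group_by_speaker` are made up for this example, and the title list (dame, dr, mr, sir) is only an assumption based on the sample filenames, so extend it to match your data.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a raw filename to a lower-case speaker key.
sub speaker_key {
    my ($name) = @_;
    $name = lc $name;
    $name =~ s/\.txt$//;                    # drop the extension
    $name =~ s/\(\w+\)|\d+/ /g;             # drop "(4)", "(b)" and bare numbers
    $name =~ s/_+/ /g;                      # underscores become spaces
    $name =~ s/\b(?:dame|dr|mr|sir)\b//g;   # assumed title list, extend as needed
    $name =~ s/\s+/ /g;                     # collapse runs of whitespace
    $name =~ s/^ | $//g;                    # trim both ends
    return $name;
}

# Hypothetical helper: map speaker key => list of .txt files in $dir.
sub group_by_speaker {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my %files_for;
    while (defined(my $entry = readdir $dh)) {
        # Skip ".", "..", subdirectories and anything that is not a .txt file.
        next unless $entry =~ /\.txt$/ && -f "$dir/$entry";
        push @{ $files_for{ speaker_key($entry) } }, $entry;
    }
    closedir $dh;
    return \%files_for;
}

# Dry run: print a per-speaker file count for each directory on the command line.
for my $dir (@ARGV) {
    my $groups = group_by_speaker($dir);
    for my $speaker (sort keys %$groups) {
        printf "%s: %d file(s)\n", $speaker, scalar @{ $groups->{$speaker} };
    }
}
```

Run it as `perl group.pl /path/to/files` (the script name is arbitrary) and look over the list of speakers before copying anything. Near-duplicates such as spelling variants will still show up as separate keys, so some manual merging afterwards is to be expected.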