Map multiple columns from multiple files which are slightly different

I am looking for a good practical method of tackling metadata normalization between multiple files that have slightly different schema's for a batch ETL job in Talend.

I have a few hundred historical reports (around 25K to 200K records each) with about 100 to 150 columns per excel file. Most of the column names are the same for all the files (98% overlap) however there are subtle evil differences:

Different Column orders
Different Column names (sometimes using and sometimes not using abbreviations)
Different counts of columns
Sometimes columns have spaces between words, sometimes, dots, dashes or underscores
etc.

Short of writing a specialized application or brute forcing all the files by manually correcting them, are there any good free tools or methods that would provide a diff and correction between file column names in an intelligent or semi-automated fashion?

Solution

You could use Talend Open Studio to achieve that. But I do see one caveat.

The official way

In order to make Talend understand your Excel files, you will need to first load it's metadata. The caveat is that you will need to load all metadata by hand (one by one). In the free version of Talend (Open Studio Data), there is no support for dynamic metadata.
Using components like tMap you can then map your input metadata into your desired output metadata (could be a Excel file or a Database or something else). During this step you can shape your input data into your desired output (fixing / ignoring / transforming it / etc).

The unofficial way

There seems to exist a user contributed component that offers support the Excel dynamic metadata. I did not test it, but it worth trying : http://www.talendforge.org/exchange/?eid=663&product=tos&action=view&nav=1,1,1

This can evolve as components are released, updated frequently. My answer is about the status as it is on version 5.3.1