I'm trying to use FINDSTR to search through a folder full of text files, using a text file of strings, then output to results.txt
The text file of strings contains 3,200 lines, each containing an author's name and the associated book title. Examples:
George Orwell 1984
H. G. Wells War of the Worlds
Isaac Asimov I, Robot
I also have a folder containing a dozen text lists of ebook filenames (some of the lists have over 500K lines), for example:
George Orwell - 1984 (epub).rar
H G Wells - War of the Worlds (pdf).rar
Isaac Asimov - [Robot 0.1] - I, Robot (Mobi).rar
I need to search the text files of filenames for the 3,200 authors and titles, and output the results to a third text list.
The filenames also contain other stuff like series info, format, etc., so I'm looking for any lines that contain those authors' names and titles but are not exact matches to the search strings, as in my examples above.
This is what I've tried. It matches exact strings OK but I cannot see how to make it find the filenames that contain other stuff as well as all the words in the search strings.
findstr /g:C:\strings.txt *.txt >>C:\results.txt
Can anyone please help me out with the code? Thanks.
This find in files requires a regular expression search because the strings in strings.txt do not exist 1:1 in the *.txt files.
It is necessary to change the strings in strings.txt from
George Orwell 1984
H. G. Wells War of the Worlds
Isaac Asimov I, Robot
to
George.*Orwell.*1984
H.*G.*Wells.*War.*of.*the.*Worlds
Isaac.*Asimov.*I.*Robot
This can be done by opening strings.txt in a text editor with Perl regular expression support and running, from the top of the file, a Perl regular expression Replace All with the search string [^\w\r\n]+ and the replace string .* (a period followed by an asterisk). The search expression matches one or more characters that are not a word character, a carriage return or a line-feed, so every run of spaces and punctuation between two words is replaced by .* which matches any sequence of characters.
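If no suitable text editor is at hand, the same replacement can be scripted. The following is only a minimal sketch in Python, not something FINDSTR itself provides: the function names to_findstr_pattern and convert_file are made up for illustration, and the UTF-8 encoding is an assumption that may need changing to match how strings.txt was actually saved.

```python
import re

# One or more characters that are not a word character, CR or LF
# (the same search expression as used in the text editor).
SEPARATORS = re.compile(r"[^\w\r\n]+")


def to_findstr_pattern(line: str) -> str:
    """Replace every run of spaces/punctuation with '.*' (hypothetical helper)."""
    return SEPARATORS.sub(".*", line.strip())


def convert_file(src_path: str, dst_path: str) -> None:
    """Write one converted pattern per non-empty line of src_path to dst_path.

    Assumption: both files are UTF-8 encoded text.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            pattern = to_findstr_pattern(line)
            if pattern:  # skip blank lines
                dst.write(pattern + "\n")


if __name__ == "__main__":
    for sample in ("George Orwell 1984",
                   "H. G. Wells War of the Worlds",
                   "Isaac Asimov I, Robot"):
        print(to_findstr_pattern(sample))
```

Calling convert_file with the path of the original list and the path of a new file would produce the converted list without touching the original.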
Then it is possible to use:
findstr /I /R /G:C:\Temp\strings.txt *.txt >>C:\Temp\results.txt
strings.txt and results.txt should not be in the current directory containing the *.txt files searched by FINDSTR, or a file extension other than .txt should be used for these two files. Otherwise the wildcard pattern *.txt matches them as well and FINDSTR searches them too.
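Before running the command on lists with hundreds of thousands of lines, the converted patterns can be checked against a few sample filenames. The sketch below only imitates a case-insensitive regular expression search using Python's re module, which is a different regex engine from FINDSTR's; for simple patterns made of words and .* the results should agree, but that is an assumption. The function name matching_lines and the fourth sample filename are made up for illustration.

```python
import re


def matching_lines(patterns, lines):
    """Return the lines matched by at least one pattern, ignoring case."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [line for line in lines if any(rx.search(line) for rx in compiled)]


patterns = [
    "George.*Orwell.*1984",
    "H.*G.*Wells.*War.*of.*the.*Worlds",
    "Isaac.*Asimov.*I.*Robot",
]

filenames = [
    "George Orwell - 1984 (epub).rar",
    "H G Wells - War of the Worlds (pdf).rar",
    "Isaac Asimov - [Robot 0.1] - I, Robot (Mobi).rar",
    "George Orwell - Animal Farm (epub).rar",  # made-up line that should not match
]

for name in matching_lines(patterns, filenames):
    print(name)
```

The first three filenames are printed and the fourth is not, because none of the patterns can be matched by it.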