Tags: windows, batch-file, duplicates, findstr

Search for a duplicate word inside a .txt file using a batch file


I did some research on Google to find the answer to my question. The only thing I found that was similar to my question was on this site: Search for a word inside .txt file using batch file

I made a batch file that creates a .txt file with 8 lines, like this:

    Hello
    Mate
    How
    Are
    You
    Doing
    Bye
    Bye

I want to make a batch file that can detect duplicate words inside the text file. In this file, it must detect Bye.

I want the same batch file that creates the 8-line .txt file to also detect the duplicate word Bye in it. After some research, I came to the conclusion that this should be possible with findstr.

Can findstr detect that the word Bye occurs twice?

In my batch file I want to get a report like this:

    echo In the text file you made %isn't or there is% a double word

so I want the result of findstr stored in the variable %isn't or there is%.

Sorry if this is a bad question, but I'm new to Stack Overflow and I'm Dutch (a lot of this text was translated with Google Translate). If this is a bad question, can you explain how I can make it clearer?


Solution

  • Stephan's answer works, but it prints every duplicated word as many times as it appears. It is also fairly inefficient, because it reads the entire file once for every line in the file.

    Here is a fairly simple pure batch solution that prints each duplicated word only once. The task is much simpler if you use SORT to group all the duplicates together. However, the Windows SORT command ignores case, so the IF comparison must also ignore case. This solution reads the data only twice, regardless of file size: once by SORT, and once more when FOR /F reads the sorted output.

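    @echo off
    setlocal enableDelayedExpansion
    
    rem SORT groups identical words together, ignoring case,
    rem so each word only needs to be compared with the previous one.
    set "prev="
    set "dup="
    for /f "delims=" %%W in ('sort test.txt') do (
      rem Quote both sides, the usual defensive form for IF string comparisons
      if /i "%%W"=="!prev!" (
        rem Print a duplicate only the first time it repeats
        if not defined dup echo(%%W
        set "dup=1"
      ) else set "dup="
      set "prev=%%W"
    )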
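    To check the logic outside cmd.exe, here is a rough JavaScript sketch of the same sort-then-compare-neighbours idea (JavaScript because it is close to the JScript that hybrid tools such as JREPL.BAT run on). It is an illustration under stated assumptions, not a drop-in replacement: the name `findDuplicateLines` is hypothetical, lines are split on CRLF or LF, empty lines are dropped as FOR /F drops them, and the comparison is case-insensitive to mirror SORT and IF /I.

```javascript
// Hypothetical helper mirroring the batch loop: sort, then compare each line
// with the previous one, reporting each duplicated line only once.
const toKey = (s) => s.toLowerCase(); // case-insensitive, like SORT and IF /I

function findDuplicateLines(text) {
  // Split on CRLF or LF and drop empty lines (FOR /F skips them too).
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);

  // Case-insensitive sort groups equal words next to each other.
  const sorted = [...lines].sort((a, b) => {
    const x = toKey(a);
    const y = toKey(b);
    return x < y ? -1 : x > y ? 1 : 0;
  });

  const duplicates = [];
  let prev = null; // plays the role of !prev!
  let dup = false; // plays the role of the dup flag
  for (const line of sorted) {
    if (prev !== null && toKey(line) === toKey(prev)) {
      if (!dup) duplicates.push(line); // first repeat of this word only
      dup = true;
    } else {
      dup = false;
    }
    prev = line;
  }
  return duplicates;
}

// The 8-line example file from the question, with Windows line endings.
const text = ["Hello", "Mate", "How", "Are", "You", "Doing", "Bye", "Bye"].join("\r\n");
console.log(findDuplicateLines(text)); // [ 'Bye' ]
```

    For the yes/no report the question asks for, checking whether the returned array is empty is enough.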
    

    If you want the word comparison to be case-sensitive, the above algorithm requires a case-sensitive SORT routine. I've written JSORT.BAT to do just that (among other things). It is pure script (a hybrid of JScript and batch) that runs natively on Windows.
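    Why a case-sensitive comparison needs a case-sensitive sort can be shown with a small JavaScript experiment (the helper `duplicatesAfterSort` and the sample words are made up for illustration). Under a case-insensitive order, "Bye" and "bye" compare as equal, so nothing forces the two "Bye" lines to sit next to each other; JavaScript's stable sort leaves them interleaved here. A case-sensitive order (JavaScript's default code-unit comparison) puts them together, so the neighbour check finds them. This only illustrates the principle and says nothing about how JSORT.BAT is implemented.

```javascript
// Hypothetical helper: sort, then report each line that equals its
// predecessor, once per run. The neighbour check is case-sensitive (===).
function duplicatesAfterSort(lines, compare) {
  const sorted = [...lines].sort(compare);
  const result = [];
  for (let i = 1; i < sorted.length; i++) {
    const repeats = sorted[i] === sorted[i - 1];
    const alreadyReported = i >= 2 && sorted[i] === sorted[i - 2];
    if (repeats && !alreadyReported) result.push(sorted[i]);
  }
  return result;
}

// Case-insensitive ordering, similar in spirit to Windows SORT.
const ignoreCase = (a, b) => {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  return x < y ? -1 : x > y ? 1 : 0;
};

const words = ["Bye", "bye", "Bye"];

// "Bye" and "bye" tie, and the stable sort keeps the input order,
// so the two "Bye" entries are never adjacent.
console.log(duplicatesAfterSort(words, ignoreCase)); // []

// Passing undefined uses the default code-unit order, which is case-sensitive:
// the sorted array is ["Bye", "Bye", "bye"].
console.log(duplicatesAfterSort(words, undefined)); // [ 'Bye' ]
```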

    But if you are willing to use a JScript/batch hybrid, the solution becomes very simple once you add my JREPL.BAT regular expression find/replace utility. The /M option lets me search for repeated words across newlines.

    jsort test.txt | jrepl "^(.+)$(\r?\n\1$)+" $1 /jmatch /m
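    To see what the jrepl search pattern matches, the same regular expression can be tried in JavaScript, whose RegExp syntax is close to the JScript engine JREPL.BAT uses. This sketch rests on assumptions: that /M makes jrepl search the whole input as one string with `^` and `$` matching at line boundaries (modelled here with the `m` flag), and that /JMATCH outputs only the `$1` value of each match. The helper name `repeatedLines` is hypothetical.

```javascript
// Same pattern as the jrepl command. Flags (assumed to model jrepl /M):
//   g: find every match, m: ^ and $ also match at line breaks.
const repeatedLine = /^(.+)$(\r?\n\1$)+/gm;

// Hypothetical helper: returns capture group 1 of every match, i.e. one entry
// per run of identical consecutive lines (what the $1 replacement emits).
function repeatedLines(sortedText) {
  return Array.from(sortedText.matchAll(repeatedLine), (m) => m[1]);
}

// Input must already be sorted so that equal lines are adjacent.
const sorted = ["Are", "Bye", "Bye", "Doing", "Hello", "How", "Mate", "You"].join("\r\n");
console.log(repeatedLines(sorted)); // [ 'Bye' ]
```

    Because `\1$` must be followed by a line break or the end of the input, "Bye" followed by "Byebye" is not reported: the backreference has to match a whole line.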
    

    There is significant initialization time to start the JScript engine, so this solution is a bit slower than the pure batch solution if the file is small. But if the file is large, it is much faster than the pure batch solution.