batch-file for-loop filter duplicates findstr

Deleting duplicate text lines using a batch file

I am creating a text file that lists the file extension of each file in a folder. Because this writes one line per file, I want to get rid of the duplicates.

After a bit of searching, I figured I should use findstr to overwrite the initial file with a new version that has a specific extension removed (after I write that extension to the filtered file).

for %%A in (*.*) do echo %%~xA >> initial.txt
for /F %%B in (initial.txt) do (
    echo %%B >> filtered.txt
    for /F %%C in (initial.txt) do findstr /v %%C initial.txt > initial.txt
)

but it leaves the initial file empty (as expected) while still copying every single line to the filtered.txt file. I'd be very glad for some help.

Solution

You could create the file filtered.txt directly; there is no need for initial.txt:

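rem Create (or truncate) filtered.txt so that find has a file to search.
> "filtered.txt" rem/
for %%A in ("*.*") do (
    rem Look for the current extension in filtered.txt, ignoring case.
    > nul find /I "%%~xA" "filtered.txt"
    rem ErrorLevel 1 or higher means no match was found, so the extension is new.
    if ErrorLevel 1 (
        >> "filtered.txt" echo %%~xA
    )
)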

Here I am using find rather than findstr because only simple literal strings are to be searched. find (like findstr) sets the ErrorLevel to 0 if at least one match is encountered, and to 1 if none has been found.

Depending on the returned ErrorLevel, the currently iterated file extension in %%~xA is echoed and redirected into filtered.txt or not. So if filtered.txt already contains the current item, it is not echoed, but if no match is encountered, the item is appended to the file.

The first line creates an empty file filtered.txt (or truncates an existing one) so that find does not fail on its first execution.
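
One caveat: find matches substrings, so an extension such as .c counts as already present once .cpp has been written to filtered.txt. A minimal sketch of a stricter variant follows, assuming findstr is acceptable after all; its /X switch requires the whole line to match, /L forces a literal search, and /C: supplies the search string. Skipping files that have no extension is an assumption on my part and not something the question asks for.

```batch
@echo off
rem Sketch only: whole-line matching instead of substring matching.
> "filtered.txt" rem/
for %%A in ("*.*") do (
    rem Assumption: files without an extension are skipped.
    if not "%%~xA"=="" (
        rem /I ignore case, /X whole line must match, /L literal, /C: search string.
        > nul findstr /I /X /L /C:"%%~xA" "filtered.txt"
        if ErrorLevel 1 (
            >> "filtered.txt" echo %%~xA
        )
    )
)
```

The redirection is placed before echo, as in the code above, so that no trailing space ends up in filtered.txt; a trailing space would prevent /X from matching the line later.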

If you want to use the file initial.txt anyway, you could do the following:

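rem Append the extension of every file to initial.txt (may contain duplicates).
>> "initial.txt" (
    for %%A in ("*.*") do echo %%~xA
)
rem Create (or truncate) filtered.txt so that find has a file to search.
> "filtered.txt" rem/
rem Each line of initial.txt holds one extension.
for /F "usebackq eol=| delims=" %%A in ("initial.txt") do (
    > nul find /I "%%~xA" "filtered.txt"
    rem Append the extension only if find did not locate it.
    if ErrorLevel 1 (
        >> "filtered.txt" echo %%~xA
    )
)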

This code is almost the same as above; the only differences are the prior creation of initial.txt (potentially containing duplicates) and the enumeration of its content (by for /F) rather than of the current directory directly.
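
The for /F options used above may need a word of explanation, so here is a small sketch that only prints each line of initial.txt. usebackq makes for /F treat the quoted "initial.txt" as a file name rather than as a literal string, delims= disables splitting so that the whole line ends up in the loop variable, and eol=| replaces the default end-of-line character ; with |, which cannot occur in file names. The variable name %%L is chosen for illustration only.

```batch
@echo off
rem Sketch only: read initial.txt line by line and print each line unchanged.
rem Note that for /F skips empty lines.
for /F "usebackq eol=| delims=" %%L in ("initial.txt") do (
    echo(%%L
)
```

echo( is used instead of echo so that a line consisting only of spaces, or of a word like on or off, is printed as is rather than being taken as an argument to echo.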