Tags: batch-file, for-loop, duplicates, nested-loops, no-duplicates

Batch filter duplicate lines and write to a new file (semi-finished)


I have successfully made a script that filters out duplicate lines in a file and saves the results to a semicolon-separated variable (sort of an "array"). I could not find any really good existing solution for this.

@echo off
setlocal enabledelayedexpansion

rem test.txt contains:
rem 2007-01-01
rem 2007-01-01
rem 2007-01-01
rem 2008-12-12
rem 2007-01-01
rem 2009-06-06
rem ... and so on

set file=test.txt

for /f "Tokens=* Delims=" %%i in ('type %file%') do (
    set read=%%i
    set read-array=!read-array!;!read!
)

rem removes the leading ";"
set read-array=!read-array:*;=!
echo !read-array!

for /f "Tokens=* Delims=" %%i in ('type %file%') do (
    set dupe=0
    rem searches array for the current read line (%%i) and if it does exist, it deletes ALL occurrences of it
    echo !read-array! | find /i "%%i" >nul && set dupe=1
    if ["!dupe!"] EQU ["1"] (
        set read-array=!read-array:%%i;=!
        set read-array=!read-array:;%%i=!
    )
    rem searches array for the current read line (%%i) and if it does not exist, it adds it once
    echo !read-array! | find /i "%%i" >nul || set read-array=!read-array!;%%i
)

rem results: no duplicates
echo !read-array!

The contents of !read-array! are now 2008-12-12;2007-01-01;2009-06-06

I now want to take each item out of the array and write it to a new file, one item per line. Example:

2008-12-12
2007-01-01
2009-06-06

So this is what I've come up with so far.

The problem I'm having is that the second for loop doesn't accept the !loop! variable in its Tokens option when it is nested. It does, however, accept %loop% when it is not nested. The reason I'm doing it this way is that !read-array! may have an unknown number of items, so I count them as well. Any ideas?

rem count items in array
set c=0
for %%i in (!read-array!) do set /a c+=1

echo %c% items in array
for /l %%j in (1,1,%c%) do (
    set loop=%%j
    for /f "Tokens=!loop! Delims=;" %%i in ("!read-array!") do (
        echo %%i
        rem echo %%i>>%file%
    )
)
exit /b

Solution

  • At the end of your first section, when !read-array! contains 2008-12-12;2007-01-01;2009-06-06, you can split the elements of your "list" directly with a plain for, because the standard separators in Batch files are, besides spaces, the comma, the semicolon and the equal sign:

    for %%i in (%read-array%) do echo %%i
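
    The same loop can also write the items to a new file, one per line, which is what the question asks for. This is only a minimal sketch: unique.txt is a hypothetical output name, the list is hard-coded for illustration, and it assumes the values contain no spaces, wildcards (* or ?) or other special characters:

    ```batch
    @echo off
    setlocal
    rem Sample list for illustration; the real script builds it from test.txt
    set "read-array=2008-12-12;2007-01-01;2009-06-06"
    rem unique.txt is a hypothetical output name.
    rem Redirecting the whole parenthesized loop opens the file only once.
    (for %%i in (%read-array%) do echo %%i) > "unique.txt"
    endlocal
    ```

    Redirecting the whole loop with a single > avoids having to delete the file first, which a per-line >> would require.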
    

    However, may I suggest a simpler method?

    Why not define a "real" array that uses the line itself as the subscript? This way, repeated lines store their value in the same array element. At the end, just display the values of the resulting elements:

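    @echo off
    rem setlocal keeps the array from leaking into later runs in the same session
    setlocal
    set file=test.txt
    rem Use each line as the subscript, so repeated lines overwrite the same element
    for /F "Delims=" %%i in (%file%) do (
        set read-array[%%i]=%%i
    )
    rem del %file%
    rem SET lists every variable whose name starts with "read-array[" as name=value
    for /F "Tokens=2 Delims==" %%i in ('set read-array[') do (
        echo %%i
        rem echo %%i>>%file%
    )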
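
    One thing to keep in mind is that SET lists variables sorted by name, so the method above prints the values in sorted order rather than in the order they appear in the file. If first-seen order matters, a possible variation is to use the array only as a "seen" marker and write each line the first time it shows up. This is a sketch under assumptions: unique.txt and the seen[] name are made up for illustration, and the lines are assumed to contain no spaces or special characters:

    ```batch
    @echo off
    setlocal
    set "file=test.txt"
    rem unique.txt is a hypothetical output name
    set "out=unique.txt"
    rem Create or empty the output file
    type nul > "%out%"
    for /F "usebackq delims=" %%i in ("%file%") do (
        rem Only the first occurrence of a line finds its marker undefined
        if not defined seen[%%i] (
            set "seen[%%i]=1"
            >>"%out%" echo %%i
        )
    )
    endlocal
    ```

    Because IF DEFINED is evaluated when the line runs, this variant does not need delayed expansion.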
    

    EDIT: Alternative solution

    There is another method that assembles a semicolon-separated list of values, as you proposed. In this case each value is first removed from the previous list content and then immediately appended again, so at the end of the loop each value is present just once.

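    @echo off
    setlocal EnableDelayedExpansion
    set file=test.txt
    for /F "Delims=" %%i in (%file%) do (
        rem Remove any earlier copy of the value, then append it once
        set read-array=!read-array:;%%i=!;%%i
    )
    rem del %file%
    rem The leading ";" is harmless: a plain for treats it as a separator
    for %%i in (%read-array%) do (
        echo %%i
        rem echo %%i>> %file%
    )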
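
    As for the nested-loop problem in the question: the options string of for /f is parsed before delayed expansion takes place, which is why "Tokens=!loop!" is rejected while %loop% works outside the block. If you still want to pick tokens by number, one common workaround is to move the inner for /f into a subroutine and pass the number as a parameter. A minimal sketch, where :printToken is a hypothetical label and the list is hard-coded for illustration:

    ```batch
    @echo off
    setlocal
    rem Sample list for illustration; the real script builds it from test.txt
    set "read-array=2008-12-12;2007-01-01;2009-06-06"

    rem Count the items (a plain for splits on the semicolons)
    set c=0
    for %%i in (%read-array%) do set /a c+=1

    for /l %%j in (1,1,%c%) do call :printToken %%j
    exit /b

    :printToken
    rem %1 is substituted when this line is read, so for /f sees a literal number
    for /f "tokens=%1 delims=;" %%i in ("%read-array%") do echo %%i
    exit /b
    ```

    The plain for shown earlier is still simpler, since it needs neither the count nor the subroutine.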