Search code examples
loopscsvbatch-fileautomationpreprocessor

Batch file: count duplicate ids and write them in column of csv


I am currently trying to automate the preprocessing process on a csv file via a batch file. I have the following table:

id;street;name;nrOfIds
4014001;T1;example1;0
4014002;B2;example2;0
4014003;B3;example3;0
4014004;L1;example4;0
4015001;M3;example5;0
4015002;B9;example6;0
4016001;T4;example7;0
4016002;L2;example8;0
4016003;L1;example9;0

The first row "id" holds the id of the entry which is made unique by the last 3 digts (for example 001, 002, 003, ...). The digits before the last three digits are not unique. As you can see in the result table, I want to count how often the first part of the ID (so the part before the last three digits) exists in the table and I want to write the sum into the third column named "nrOfIds". The result table then should look like this:

id;street;name;nrOfIds
4014001;T1;example1;4
4014002;B2;example2;4
4014003;B3;example3;4
4014004;L1;example4;4
4015001;M3;example5;2
4015002;B9;example6;2
4016001;T4;example7;3
4016002;L2;example8;3
4016003;L1;example9;3

For example, the part before the last three digits of the first line (4014) exists exactly 4 times in the whole table, so I write 4 in the "nrOfIds" column and so on.

The code used for this looks like this:

@echo off
setlocal enabledelayedexpansion

for /F "tokens=1-3* delims=;" %%a in (%PREPROCESSING_INPUT_PATH%%INPUT_FILENAME%) do (
   (echo %%a;%%b;%%c)> "%PREPROCESSING_INPUT_PATH%%OUTPUT_FILENAME%" & goto :file
)
:file
(for /F "skip=1 tokens=1-3* delims=;" %%a in (%PREPROCESSING_INPUT_PATH%%INPUT_FILENAME%) do (
  REM count ids (like 4014, 4015, ... and write sum into "nrOfIds" column
  ) 
) >> %PREPROCESSING_OUTPUT_PATH%%OUTPUT_FILENAME%


pause

Any suggestions on how to do this? Thank you very much in advance! Your help is greatly appreciated.


Solution

  • Pretty similar to the previous answer I posted, here we just use find /C to identify the number of occurrences of the last 3 digits of the ID:

    @echo off
    setlocal enabledelayedexpansion
    set "infile=z:\folder31\testcsv.csv"
    set "outfile=%PREPROCESSING_OUTPUT_PATH%testOutput.csv"
    for /f "usebackq delims=" %%a in ("%infile%") do (
        (echo %%a)>"%outfile%" & goto :file
    )
    :file
    (for /f "skip=1 usebackq tokens=1-4*delims=;" %%a in ("%infile%") do (
        set "match=%%a"
        for /f %%i in ('findstr /B "!match:~0,-3!" "%infile%" ^| find /C "!match:~0,-3!"') do (
        set /a _cnt=%%i
        echo %%a;%%b;%%c;!_cnt!
      )
     )
    )>>"%outfile%"
    

    Debug version:

    @echo off
    setlocal enabledelayedexpansion
    set "infile=%PREPROCESSING_INPUT_PATH%%INPUT_FILENAME%"
    set "outfile=%PREPROCESSING_OUTPUT_PATH%%OUTPUT_FILENAME%"
    for /f "usebackq delims=" %%a in ("%infile%") do (
        (echo %%a) & goto :file
    )
    :file
    (for /f "skip=1 usebackq tokens=1-4*delims=;" %%a in ("%infile%") do (
         set "match=%%a"
     for /f %%i in ('findstr /B "!match:~0,-3!" "%infile%" ^|find /C "!match:~0,-3!"') do (
         set /a _cnt=%%i
         echo %%a;%%b;%%c;!_cnt!
      )
     )
    )
    pause