windows batch-file findstr

How to sort PDF files based on their content using CMD


I have about 200 PDF documents that are saved to a folder every day, and I have to sort them based on their content. (Every PDF document contains the string "X_P1" or "X_P2".)

My first step is to convert each .pdf file to a .txt file using XPDF:

for /r %%i in (*.pdf) do "C:\Users\xxx\pdftotext.exe" -layout "%%i"

So I end up with 200 PDF files and 200 text files in a folder.

Looks like this:

p100.pdf
p100.txt
p101.pdf
p101.txt
...

For the next step, I thought of searching each .txt file for the string "X_P1" with FINDSTR and saving the name of the matching file in a variable (e.g. p100). After that, I would move all files whose name matches the variable to a folder.

I'm not very familiar with batch/PowerShell, so how can I work with the result from FINDSTR? I thought of maybe using the error levels: if I get ERRORLEVEL 0, move the file to folder 1.


Solution

  • (All pdf-documents have the string "X_P1" or "X_P2" in it)

    Sort them based on their content.

    Convert each .pdf to text. If the text contains "X_P1", move the PDF to xp1; if not, move it to xp2.

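    @ECHO OFF
    SETLOCAL ENABLEEXTENSIONS
    REM begin debug
    REM del "%userprofile%\Desktop\*.pdf" 2>nul
    REM rd /q /s xp1 2>nul
    REM rd /q /s xp2 2>nul
    REM copy /y "%userprofile%\Desktop\New Folder\*.pdf" "%userprofile%\Desktop\" 1>nul
    REM end debug
    REM Create the target folders; ignore the error if they already exist.
    md xp1 2>nul
    md xp2 2>nul
    REM "delims=" and the quotes keep file names that contain spaces intact.
    for /f "delims=" %%a in ('dir /b *.pdf') do (
        REM Extract the text of the current PDF into a scratch file.
        pdftotext.exe -raw "%%a" tmp.txt
        REM String found: move the PDF to xp1; otherwise move it to xp2.
        find "X_P1" tmp.txt >nul && (move "%%a" xp1) || (move "%%a" xp2)
    )
    del tmp.txt 2>nul
    exit /b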
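
The approach described in the question, keeping the .txt files produced by pdftotext and branching on the result of FINDSTR, could be sketched roughly as follows. This is a minimal sketch rather than a tested drop-in: it assumes every .txt file sits next to a .pdf with the same base name in the current folder, and it reuses the xp1 and xp2 folder names from the script above. FINDSTR sets ERRORLEVEL to 0 when the string is found and to 1 when it is not.

```batch
@echo off
setlocal EnableExtensions
rem Assumption: p100.txt belongs to p100.pdf, and both are in the current folder.
md xp1 2>nul
md xp2 2>nul
for %%i in (*.txt) do (
    rem /l = literal search, /c: = search for this exact string; output is discarded.
    findstr /l /c:"X_P1" "%%i" >nul
    rem "if errorlevel 1" is true when the exit code is 1 or higher (no match).
    if errorlevel 1 (
        move "%%~ni.pdf" xp2 >nul
        move "%%i" xp2 >nul
    ) else (
        move "%%~ni.pdf" xp1 >nul
        move "%%i" xp1 >nul
    )
)
exit /b
```

Here `%%~ni` expands to the file name without its extension, so no separate variable is needed to hold the name. Using `if errorlevel 1` instead of `%errorlevel%` avoids the delayed-expansion problem inside a parenthesized `for` body.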