I have about 200 PDF documents that are saved to a folder every day, and I have to sort them based on their content. (All pdf-documents have the string "X_P1" or "X_P2" in it)
My first step is to convert the .pdf files to .txt files using XPDF:
for /r %%i in (*pdf) do "C:\Users\xxx\pdftotext.exe" -layout "%%i"
So I end up with 200 PDF files and 200 text files in a folder.
Looks like this:
p100.pdf
p100.txt
p101.pdf
p101.txt
...
So for the next step I thought of searching for the string "X_P1" in the .txt file with FINDSTR and saving the filename as a variable (e.g. p100). Next step: move all files whose name is the same as the variable to a folder.
I'm not very familiar with batch/PowerShell, so how can I work with the result from FINDSTR? I thought of maybe using the errorlevels: if I get ERRORLEVEL 0, move to folder 1.
(All pdf-documents have the string "X_P1" or "X_P2" in it)
Sort them based on their content.
Convert each .pdf to text. If "X_P1" is found in the text, move the PDF to xp1; if not, move it to xp2.
@ECHO OFF
SETLOCAL ENABLEEXTENSIONS
REM begin debug
REM del "%userprofile%\Desktop\*.pdf" 2>nul
REM rd /q /s xp1 2>nul
REM rd /q /s xp2 2>nul
REM copy /y "%userprofile%\Desktop\New Folder\*.pdf" "%userprofile%\Desktop\" 1>nul
REM end debug
md xp1 2>nul
md xp2 2>nul
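REM "delims=" and the quotes keep file names that contain spaces intact
for /f "delims=" %%a in ('dir /b *.pdf') do (
    pdftotext.exe -raw "%%a" tmp.txt
    REM find sets errorlevel 0 on a match: && runs on a hit, || on a miss
    find "X_P1" tmp.txt > nul && move "%%a" xp1 || move "%%a" xp2
)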
del tmp.txt
exit /b
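If you would rather keep the .txt files that your own pdftotext loop already produces, the FINDSTR idea from the question can be sketched roughly as follows. This is only a sketch under a few assumptions: every .pdf already has a matching .txt with the same base name in the current folder, and xp1 and xp2 are the target folder names used above. FINDSTR with /m prints only the names of the files that contain the string, so there is no need to test the errorlevel by hand.

```batch
@echo off
setlocal enableextensions
REM Sketch only: assumes pdftotext has already written one .txt per .pdf
REM into the current folder, and that xp1/xp2 are the target folders.
md xp1 2>nul
md xp2 2>nul
REM findstr /m lists just the names of the files containing the string;
REM /c:"..." searches for the string literally.
REM %%~nf is the file name without its extension (p100.txt gives p100),
REM so "%%~nf.*" matches both p100.pdf and p100.txt.
for /f "delims=" %%f in ('findstr /m /c:"X_P1" *.txt 2^>nul') do move "%%~nf.*" xp1 >nul
for /f "delims=" %%f in ('findstr /m /c:"X_P2" *.txt 2^>nul') do move "%%~nf.*" xp2 >nul
endlocal
exit /b
```

Each FOR /F runs its FINDSTR command to completion before the loop body starts, so moving files inside the loop does not disturb the search. A file that happens to contain both strings ends up in xp1, because it has already been moved by the time the second search runs.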