windows, batch-file, cmd

Copy pdf files based on the content (keyword)


I am trying to write a cmd script that scans through PDF files, copies the ones containing a certain keyword, and saves the copies to a separate folder. Below is my code, but it doesn't work:

@echo off

set "source=C:\instructions"
set "target=C:\instructions\cafe"
set "string=cafe"

set "logfile=logs.txt"

call :main >> "%logfile%"

pause

goto :EOF

:main

for /r "%source%\%dt%" %%a in ("*.pdf") do (
    find /c /i "%string%" "%%~a" 1>&2
    if not errorlevel 1 (
        set /p "out=%%~a / " <nul
        if exist "%target%\%%~nxa" (
            echo:Already exists
        ) ELSE (
            copy "%%~a" "%target%"
            if errorlevel 1 (
                echo:Failed
            ) ELSE (
                echo:Success
            )
        )
    )
)

goto :EOF

Could anyone help me with this please?


Solution

  • find only searches the raw bytes of a PDF, so if the text is stored compressed, encoded or encrypted, the keywords may not be found. To get around that limitation, Windows has content indexing, which for PDF needs an iFilter; this is usually provided by the default PDF reader (avoid installing more than one). If you did not install one from Adobe, SumatraPDF, Tracker PDF-XChange or Foxit Reader, you will find a good (free but limited) one at https://www.pdflib.com/download/tet-pdf-ifilter/

    Assuming the text is detectable:

    Your main issue is the common need for setlocal enabledelayedexpansion. There are a few others (for example, the target folder may not exist), so I suggest you stop hiding the messages while you test. I have corrected only the main problem:
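
    To show what delayed expansion changes, here is a minimal standalone sketch. The variable name count is only for illustration and is not part of your script. Inside a parenthesised block, %var% is substituted once, when the whole block is parsed, while !var! is read each time the line runs:

    ```bat
    @echo off
    setlocal enabledelayedexpansion

    REM count is a hypothetical variable used only for this demonstration
    set "count=0"

    for %%n in (1 2 3) do (
        set /a count+=1
        REM the percent form was fixed when the block was parsed
        REM the delayed form is read again on every pass
        echo percent form: %count%   delayed form: !count!
    )

    endlocal
    ```

    On every pass the percent form should still show 0, while the delayed form shows 1, 2 and 3. Note that the if errorlevel 1 form is evaluated when the line runs, so it is references written as %errorlevel% that need this treatment inside a block.

    With that in mind, the corrected script: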

    @echo off
    
    REM use delayed expansion for testing !errorlevel!
    setlocal enabledelayedexpansion
    
    set "source=C:\instructions"
    set "target=C:\instructions\cafe"
    set "string=cafe"
    set "logfile=logs.txt"
    
    call :main >> "%logfile%"
    
    pause
    
    goto :EOF
    
    :main
    REM %dt% is never set, so it expands to nothing. Is it needed?
    
    for /r "%source%\%dt%" %%a in ("*.pdf") do (
        find /c /i "%string%" "%%~a" 1>&2
        REM your test here needs changing to this
        if !errorlevel! == 0 (
            set /p "out=%%~a / " <nul
            if exist "%target%\%%~nxa" (
                echo:Already exists
            ) ELSE (
                copy "%%~a" "%target%"
                REM your test here needs changing to this
                if !errorlevel! == 1 (
                    echo:Failed
                ) ELSE (
                    echo:Success
                )
            )
        )
    )
    
    goto :EOF
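
    For the other issues mentioned above, here is a hedged sketch rather than a finished script. It assumes the same paths as in your question, creates the target folder if it is missing, and skips files already sitting in the target folder (which lies inside the source tree, so for /r would otherwise scan the copies too). It uses findstr instead of find, which is my substitution, and it is still a plain byte search, so it has the same limitation with compressed or encrypted text:

    ```bat
    @echo off
    setlocal

    REM Same paths and keyword as in the question
    set "source=C:\instructions"
    set "target=C:\instructions\cafe"
    set "string=cafe"

    REM Create the target folder first so that copy has somewhere to write
    if not exist "%target%\" md "%target%"

    for /r "%source%" %%a in (*.pdf) do (
        REM Skip files located directly in the target folder
        if /i not "%%~dpa"=="%target%\" (
            REM findstr /m prints only the name of a matching file
            findstr /i /m /c:"%string%" "%%~a" >nul
            if not errorlevel 1 (
                if not exist "%target%\%%~nxa" copy "%%~a" "%target%" >nul
            )
        )
    )

    endlocal
    ```

    While testing, remove the >nul redirections so that any error messages from findstr or copy stay visible.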