Batch file for generating automatic local PDF filenames with wkhtmltopdf


I have a simple batch file that uses wkhtmltopdf to create PDF files from a set of archived URLs.

The batch file currently contains the following commands:

    start
    cd C:\Program Files\wkhtmltopdf\bin
    start wkhtmltopdf.exe https://web.archive.org/web/20200524/website.org/article-may-2020-title "C:/Desktop/pdfs/file1.pdf"
    pause

This works as expected on Windows 10: it generates a single PDF file in the location above, but the filename is whatever is hard-coded in the command.

What I want to achieve is to take the article slug from the URL and use it as the filename of the locally generated PDF.

That is, from the URL above, take the part after website[.]org/, which is article-may-2020-title, so that the locally saved file is automatically named "C:/Desktop/pdfs/article-may-2020-title.pdf".

Can this be done with a batch file? Or is it easier to do with a PowerShell script? If so, any hints are appreciated.

Thanks.


Solution

  • The following commented batch file could be used:

    @echo off
    setlocal EnableExtensions DisableDelayedExpansion
    set "ProgramDirectory=%ProgramFiles%\wkhtmltopdf\bin"
    set "OutputDirectory=%ProgramDirectory%\pdfs"
    
    set "ListFile=%~1"
    rem Is the batch file started without any argument?
    if not defined ListFile goto GetListFile
    
    rem The batch file is started with an argument being interpreted as
    rem file name of the urls list file which is checked for existence.
    if exist "%ListFile%" for %%I in ("%ListFile%") do set "ListFile=%%~fI" & goto ProcessList
    echo ERROR: File "%ListFile%" not found!& goto EndBatch
    
    :GetListFile
    rem Use urls.txt in the current directory as urls list file if it exists.
    if exist urls.txt for %%I in (urls.txt) do set "ListFile=%%~fI" & goto ProcessList
    
    rem Use urls.txt in program files directory of wkhtmltopdf as urls list file.
    if exist "%ProgramDirectory%\urls.txt" set "ListFile=%ProgramDirectory%\urls.txt" & goto ProcessList
    echo ERROR: No file urls.txt found!& goto EndBatch
    
    :ProcessList
    rem Change the current directory to program files directory of wkhtmltopdf.
    cd /D "%ProgramDirectory%" 2>nul
    if errorlevel 1 echo ERROR: Directory "%ProgramDirectory%" does not exist!& goto EndBatch
    
    rem Check the existence of program file wkhtmltopdf.exe.
    if not exist "%ProgramDirectory%\wkhtmltopdf.exe" echo ERROR: File "%ProgramDirectory%\wkhtmltopdf.exe" not found!& goto EndBatch
    
    rem Create the output directory and check if that is done successfully.
    md "%OutputDirectory%" 2>nul
    if not exist "%OutputDirectory%\" echo ERROR: Failed to create directory "%OutputDirectory%"!& goto EndBatch
    
    echo Processing the urls in file: "%ListFile%"
    for /F useback^ delims^=^ eol^= %%I in ("%ListFile%") do "%ProgramDirectory%\wkhtmltopdf.exe" "%%~I" "%OutputDirectory%\%%~nxI.pdf"
    
    :EndBatch
    endlocal
    echo(
    pause
    

    The program files directory of wkhtmltopdf is defined in the third line.

    The output directory for the PDF files is defined in the fourth line.
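
    By default the output directory is a subdirectory of the program files directory, which standard users usually cannot write to without elevated rights; the batch file reports an error if the directory cannot be created. A minimal sketch of an alternative fourth line, assuming the PDF files should go into a folder named pdfs on the current user's desktop (this folder choice is an assumption and not part of the original batch file):

    ```batch
    rem Assumption: store the PDF files in folder "pdfs" on the user's desktop.
    rem The desktop folder may be located elsewhere on some systems.
    set "OutputDirectory=%UserProfile%\Desktop\pdfs"
    ```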

    The batch file can be started with an argument, which is interpreted as the name of the file containing the URLs. Without an argument, the batch file looks for a file named urls.txt in the current directory, which can be any directory. If that file does not exist, it finally looks for urls.txt in the program files directory of wkhtmltopdf.
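
    The ways of starting the batch file can be sketched as follows, typed in a command prompt window. The path C:\Temp\MakePdfs.cmd and the folder C:\Temp\archive are hypothetical names used only for illustration:

    ```batch
    rem Assumption: the batch file above is saved as C:\Temp\MakePdfs.cmd.

    rem Case 1: the urls list file is passed as argument.
    rem The quotes are needed because the file name contains a space.
    "C:\Temp\MakePdfs.cmd" "C:\Temp\archive\my urls.txt"

    rem Case 2: no argument, so urls.txt in the current directory is used
    rem if it exists there, otherwise urls.txt in the program files
    rem directory of wkhtmltopdf.
    cd /D "C:\Temp\archive"
    "C:\Temp\MakePdfs.cmd"
    ```

    The surrounding quotes of the argument are removed by %~1 in the batch file, so the list file name is stored without them.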

    The main command line is the FOR command line, which processes all non-empty lines in the urls list file. The empty list of string delimiters (delims=) turns off the default splitting of each line on spaces and tabs, and the empty end of line character (eol=) makes sure that really every non-empty line is processed, including lines beginning with a semicolon.

    Instead of useback^ delims^=^ eol^=, the option string "usebackq delims=" could also be used. It processes all non-empty lines in the urls list file except those beginning with a semicolon, because the default end of line character ; stays in effect. In other words, a URL in the list file can be commented out by putting ; at the beginning of its line.
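
    A sketch of that alternative FOR command line, reusing the variable names of the batch file above; only the options string differs from the original line:

    ```batch
    rem Lines beginning with a semicolon are skipped because the
    rem end of line character keeps its default value ; here.
    for /F "usebackq delims=" %%I in ("%ListFile%") do "%ProgramDirectory%\wkhtmltopdf.exe" "%%~I" "%OutputDirectory%\%%~nxI.pdf"
    ```

    With this line, an entry in the list file such as ;https://web.archive.org/web/20200524/website.org/article-may-2020-title would be ignored, while the same entry without the leading semicolon would be converted.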

    The string after the last / in each URL is used as the file name of the PDF file, because %%~nxI treats the URL like a file path and expands to its last component.
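
    How the file name is derived can be tried in isolation with a small batch file. This is only an illustration using the example URL from the question; it does not call wkhtmltopdf:

    ```batch
    @echo off
    rem Illustration only: show the URL and the PDF file name derived from it.
    rem %%~I is the URL without surrounding quotes.
    rem %%~nxI is the part after the last slash (name and extension).
    for %%I in ("https://web.archive.org/web/20200524/website.org/article-may-2020-title") do (
        echo URL:       %%~I
        echo File name: %%~nxI.pdf
    )
    ```

    URLs ending with / or containing characters such as ? are not covered by this sketch and may need extra handling.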

    To understand the commands used and how they work, open a command prompt window, run the following commands there, and read the help pages displayed for each command entirely and carefully.

    • call /? ... explains %~1
    • cd /?
    • echo /?
    • endlocal /?
    • for /?
    • goto /?
    • if /?
    • md /?
    • pause /?
    • rem /?
    • set /?
    • setlocal /?

    See also "Single line with multiple commands using Windows batch file" for an explanation of the operator &.