I'm looking to create an automated PowerShell script, run from Task Scheduler, to do a mass rename of auto-generated PDFs and then save them to a second folder. The original name is irrelevant but is generally in the form 0013238974.pdf. Each file needs to be renamed based on text contained within the file. Example:
TEXT TEXT TEXT
$ACCT_ID
TEXT TEXT TEXT
Thus the new name of the file would need to be $ACCT_ID.pdf, and the file would then be saved in the new destination. I've got no problem with the move; that's just a simple
Get-ChildItem -Path C:\Original\PDF\Generation\Folder -Include *.pdf -Recurse |
    Copy-Item -Destination C:\The\Folder\I\Need\Them\In
But I'm stumped after that when it comes to extracting the information from the already generated PDF and saving the renamed version as $ACCT_ID.pdf.
I considered running it through a separate PDF print command instead of open/resave, but that doesn't solve my $ACCT_ID extraction problem.
Thanks for any insight on this.
There isn't any built-in functionality for reading PDF files in PowerShell, so your best bet is to use a third-party .NET component. There are several commercial options and at least a few free, open-source alternatives.
Here are a few lines of example code using iTextSharp to read the PDF:
Add-Type -Path .\itextsharp.dll
$pdfReader = New-Object iTextSharp.text.pdf.PdfReader("C:\file.pdf")
$textFromFirstPage = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdfReader, 1)
$pdfReader.Dispose()
How you go about finding your account ID after that depends, of course, on the text of your files.
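As a rough sketch of how the pieces might fit together, the snippet below combines the iTextSharp calls above with a regular expression and Copy-Item. The function names Get-AccountId and Copy-PdfByAccountId are hypothetical, and the pattern assumes the account ID is a run of six or more digits sitting on its own line of the first page; that is only a guess at the layout, so adjust it to match your real files.

```powershell
# Hypothetical helper: return the first account ID found in a block of text.
# ASSUMPTION: the ID is 6+ digits alone on a line. Change $Pattern to suit your PDFs.
function Get-AccountId {
    param(
        [string]$Text,
        [string]$Pattern = '(?m)^\s*(\d{6,})\s*$'
    )
    if ($Text -match $Pattern) {
        return $Matches[1]
    }
    return $null
}

# Hypothetical wrapper: read page 1 of every PDF under $Source and copy each
# file to $Destination as <account ID>.pdf.
function Copy-PdfByAccountId {
    param(
        [string]$Source,
        [string]$Destination,
        [string]$ITextSharpPath = '.\itextsharp.dll'  # assumed location of the DLL
    )
    Add-Type -Path $ITextSharpPath

    Get-ChildItem -Path $Source -Filter *.pdf -Recurse | ForEach-Object {
        $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $_.FullName
        try {
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, 1)
        }
        finally {
            # Release the file handle even if text extraction throws.
            $reader.Dispose()
        }

        $acctId = Get-AccountId -Text $text
        if ($acctId) {
            Copy-Item -LiteralPath $_.FullName -Destination (Join-Path $Destination "$acctId.pdf")
        }
        else {
            Write-Warning "No account ID found in $($_.FullName)"
        }
    }
}
```

With the folders from the question, the call would look something like Copy-PdfByAccountId -Source C:\Original\PDF\Generation\Folder -Destination C:\The\Folder\I\Need\Them\In. Files where no ID is found are left alone and reported with a warning, so nothing is copied under a wrong name. If two PDFs can contain the same account ID, the later copy will replace the earlier one, so you may want to add a check for that.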