The other day I helped a coworker with a script he was working on. He needed to read text from a PDF with PowerShell. I had done this in the past with AutoIt, but that wasn’t going to be an option this time. There are a lot of posts about this online, but they almost all lead to iText 7. I don’t know if my coworker and I are just dumb, but we could not get their module installed. I did end up finding a different way to get this done. You really just need one DLL that contains the library for working with PDF files. I can’t upload it here, but you can get it easily.
Get DLL
You can download the .dll file from this site. (UPDATE 10-19-21: the original .dll is no longer at the original link. Another commenter pointed out that it is on GitHub: https://github.com/itext/itextsharp/releases/tag/5.5.13.1. I also created a share link for the .dll I use for this, and I know this one works. That link is here.) When you get to the site, click the “Download Archive” button. This will give you a zip file. Extract it, then inside the folder open sourceCode, Main, Libraries. There you will find itextsharp.dll. Copy this file to C:\PS\ (this is where our script will look).
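Before moving on, it can help to confirm that the DLL actually loads. Several readers in the comments hit a "0x80131515" error from Add-Type, and one fixed it by clicking “Unblock” in the file properties. Here is a minimal sketch of that check. The function name Test-ITextSharpDll is made up for this post, and I'm assuming the DLL sits in C:\PS\ as described above; treat it as a starting point, not a guaranteed fix.

```powershell
# Hypothetical helper (the name is invented for this sketch).
# Returns $true if the DLL at $Path can be loaded, otherwise $false.
function Test-ITextSharpDll {
    param(
        [string]$Path = 'C:\PS\itextsharp.dll'
    )

    # Bail out early if the file is not where we expect it
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        Write-Warning "DLL not found at $Path"
        return $false
    }

    # Same effect as clicking "Unblock" in the file properties dialog
    Unblock-File -LiteralPath $Path

    try {
        # -ErrorAction Stop turns a load failure into a catchable error
        Add-Type -Path $Path -ErrorAction Stop
        return $true
    }
    catch {
        Write-Warning "Could not load ${Path}: $($_.Exception.Message)"
        return $false
    }
}
```

Run Test-ITextSharpDll with no arguments to check the default location. If it returns False even after unblocking, the copy of the DLL itself may be bad, so try downloading it again.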
Read text from PDF file
I made this into a function so it is easy to use in a larger script. Here it is:
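function convert-PDFtoText {
    param(
        [Parameter(Mandatory=$true)][string]$file
    )
    # Load the iTextSharp library that was copied to C:\PS\
    Add-Type -Path "C:\ps\itextsharp.dll"
    # Open the PDF file
    $pdf = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $file
    # Pages are numbered from 1; output the text of each page in order
    for ($page = 1; $page -le $pdf.NumberOfPages; $page++){
        $text=[iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf,$page)
        Write-Output $text
    }
    # Close the reader to release the file
    $pdf.Close()
}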
This is an example of how to run it and display the results on the screen.
$file = "C:\Path\To\PDF.pdf"
convert-PDFtoText $file
In this example we store the text in a variable for later use.
$file = "C:\Path\To\PDF.pdf"
$text = convert-PDFtoText $file
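Since the function writes one string per page, the variable ends up holding the pages in order. As a rough sketch of what "later use" could look like, here is a small helper that reports which pages contain a given pattern. The name Find-PageWithText is made up for this example and is not part of iTextSharp.

```powershell
# Hypothetical helper (the name is invented for this sketch).
# Takes the per-page strings and returns the 1-based numbers of the
# pages whose text matches the pattern.
function Find-PageWithText {
    param(
        [string[]]$Pages,
        [Parameter(Mandatory=$true)][string]$Pattern
    )

    for ($i = 0; $i -lt $Pages.Count; $i++) {
        # -match does a case-insensitive regular expression match
        if ($Pages[$i] -match $Pattern) {
            $i + 1   # array index 0 is page 1
        }
    }
}
```

To use it with the function above, wrap the call in @() so a one-page PDF still gives you an array: $pages = @(convert-PDFtoText $file), then Find-PageWithText -Pages $pages -Pattern 'invoice'.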
Really nice, simple solution for PDF text ingest!
There is a small typo in your example code:
$tex=[iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf,$page)
Should read:
$text=[iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf,$page)
Thanks again 🙂
Good catch! I updated the example. Thanks for the heads up and I’m glad this helped you out!
Very good article.
Great Job !!!
This is exactly what I was looking for!! Thank you for sharing.
Dear Chris,
I have been looking for something like that but for some reason I am stuck 🙁
If I understood correctly, I have to create the PS folder and copy itextsharp.dll to it. After doing some research, I unblocked the file through its properties, but I am stuck with the command.
When I try to run:
$text = convert-PDFtoText $file
I end up with an error:
Add-Type : Could not load file or assembly ‘file:///C:\ps\itextsharp.dll’ or one of its dependencies. Operation is not
supported. (Exception from HRESULT: 0x80131515)
At line:5 char:2
+ Add-Type -Path “C:\ps\itextsharp.dll”
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Add-Type], FileLoadException
+ FullyQualifiedErrorId : System.IO.FileLoadException,Microsoft.PowerShell.Commands.AddTypeCommand
New-Object : Cannot find type [iTextSharp.text.pdf.pdfreader]: verify that the assembly containing this type is loaded.
At line:6 char:9
+ $pdf = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $fi …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidType: (:) [New-Object], PSArgumentException
+ FullyQualifiedErrorId : TypeNotFound,Microsoft.PowerShell.Commands.NewObjectCommand
You cannot call a method on a null-valued expression.
At line:11 char:2
+ $pdf.Close()
+ ~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Do I need to copy other files into the PS folder or what am I doing wrong?
Thanx for the help,
Mike
Hmm, I just followed this guide on another PC and it just worked… Are you running PowerShell as admin? This is the line in the function that is broken for you. If you can get this to run, the rest of the script should work: Add-Type -Path "C:\ps\itextsharp.dll"
The only thing I put in the C:\ps folder was that itextsharp.dll, so maybe the dll itself you have is bad? Here is the one I just tested: https://1drv.ms/u/s!Al3V0Ewdxn5Kk9t0b2xpNFlSHCwF5w?e=iYTy7Z
I just needed to click “Unblock” in the file properties dialog in Windows Explorer.
Awesome!
Sorry used the wrong name… Skylar of course, my bad
Hello, what was the mistake exactly? I am having the exact same issue here.
He never replied to say whether what I sent fixed the problem, but I’m pretty sure he just has a bad copy of itextsharp.dll. Did you try the one I linked for him? The easiest way to test whether the dll is working is to run this command:
Add-Type -Path "C:\ps\itextsharp.dll"
If you don’t get an error, the rest of the script should work.
I followed the instructions, added the dll, and ran PowerShell as admin.
ran: Add-Type -Path "C:\PS\itextsharp.dll"
My error is:
Add-Type : Could not load file or assembly ‘file:///C:\PS\itextsharp.dll’ or one of its dependencies. Operation is not supported. (Exception from HRESULT: 0x80131515)
At line:1 char:1
+ Add-Type -Path “C:\PS\itextsharp.dll”
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Add-Type], FileLoadException
+ FullyQualifiedErrorId : System.IO.FileLoadException,Microsoft.PowerShell.Commands.AddTypeCommand
The source link I had in the original article is no longer valid. Another commenter found it on GitHub, though. Here is the link: https://github.com/itext/itextsharp/releases/tag/5.5.13.1
The issue was a bad dll. Get it from “https://github.com/itext/itextsharp/releases/tag/5.5.13.1”
Thanks Chris! I have been looking for a new place to download the .dll for everyone since the source I had archived it. Thanks again!
Hi Skylar,
I am having some issues running the script, which I think come from the DLL. I get the same error if I just run Add-Type -Path "C:\path\itextsharp.dll"
—————————————–
Add-Type : Could not load file or assembly ‘file:///C:\Temp\Testxx\itextsharp.dll’ or one of its dependencies. Operation
is not supported. (Exception from HRESULT: 0x80131515)
At line:1 char:1
+ Add-Type -Path “C:\Temp\Testxx\itextsharp.dll”
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Add-Type], FileLoadException
+ FullyQualifiedErrorId : System.IO.FileLoadException,Microsoft.PowerShell.Commands.AddTypeCommand
————————————————
Any idea?
Many thanks
The source link I had in the original article is no longer valid. Another commenter found it on github though here is the link: https://github.com/itext/itextsharp/releases/tag/5.5.13.1
I’m getting the same error Mike and Aaron received. When I click the link to the .dll you listed in the comments, not the original link, it does not work. Can someone give me a working .dll??
The source link I had in the original article is no longer valid. Another commenter found it on github though here is the link: https://github.com/itext/itextsharp/releases/tag/5.5.13.1
Your link https://archive.codeplex.com/?p=itspdfservice zip file does NOT contain that dll.
The source link I had in the original article is no longer valid. Another commenter found it on github though here is the link: https://github.com/itext/itextsharp/releases/tag/5.5.13.1