Read text from a PDF with PowerShell

The other day I helped a coworker with a script he was working on. He needed to read text from a PDF with PowerShell. I had done this in the past with AutoIt, but that wasn’t going to be an option this time. There are a lot of posts about this online, but they almost all lead to iText 7. I don’t know if my coworker and I are just dumb, but we could not get their module installed. I did end up finding a different way to get this done: all you really need is the DLL that contains the library for dealing with PDF files. I can’t upload it here, but you can get it easily.

Get DLL

You can download the .dll file from this site. (UPDATE 10-19-21: The .dll is no longer available at the original link. Another commenter pointed out that it’s on GitHub here. I also created a share link for the .dll I use for this, and I know this one works. That link is here.) When you get to the site, click the “Download Archive” button. This will give you a zip file. Extract it, then inside the folder open sourceCode, Main, Libraries. There you will find itextsharp.dll. Copy this file to C:\PS\ (this is where our script will look).
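
Several commenters below hit a “Could not load file or assembly … (Exception from HRESULT: 0x80131515)” error on the Add-Type line, and one reported that unblocking the downloaded DLL fixed it. Here is a rough sketch of doing the same check from PowerShell. It assumes the DLL is at C:\PS\itextsharp.dll, and Test-PdfLibrary is a made-up helper name for this example, not something that ships with iTextSharp.

```powershell
# Hypothetical helper (not part of the original script):
# unblock the DLL, try to load it, and report whether that worked.
function Test-PdfLibrary {
    param(
        [string]$Path = "C:\PS\itextsharp.dll"
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        Write-Warning "DLL not found at $Path"
        return $false
    }
    # Clear the "downloaded from the internet" mark Windows puts on the file
    Unblock-File -LiteralPath $Path
    try {
        Add-Type -Path $Path -ErrorAction Stop
        return $true
    }
    catch {
        Write-Warning "Could not load ${Path}: $($_.Exception.Message)"
        return $false
    }
}

Test-PdfLibrary
```

If this returns True, the function in the next section should be able to load the library. If it still fails after unblocking, the copy of the DLL itself may be bad.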

Read text from PDF file

I made this into a function so it is easy to use in a larger script. Here it is:

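function convert-PDFtoText {
	param(
		[Parameter(Mandatory=$true)][string]$file
	)
	# Load the iTextSharp library (change the path if you saved the DLL somewhere else)
	Add-Type -Path "C:\PS\itextsharp.dll"
	$pdf = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $file
	try {
		# Pages are numbered starting at 1; output the text of each page in order
		for ($page = 1; $page -le $pdf.NumberOfPages; $page++) {
			$text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf, $page)
			Write-Output $text
		}
	}
	finally {
		# Release the file even if extraction fails part way through
		$pdf.Close()
	}
}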

This is an example of how to run it and display the results on the screen.

$file = "C:\Path\To\PDF.pdf"

convert-PDFtoText $file

With this example we set the text into a variable for later use.

$file = "C:\Path\To\PDF.pdf"
$text = convert-PDFtoText $file
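
Since the function writes one string per page, the variable holds an array of pages for a multi-page PDF and a single string for a one-page PDF. Here is a small sketch of what you might do with the result. The sample strings stand in for real PDF text, the search word and output file name are just placeholders, and wrapping the call in @( ) is my suggestion so one-page files behave the same as multi-page ones.

```powershell
# Stand-in for the real call; with the DLL in place you would use:
#   $pages = @(convert-PDFtoText "C:\Path\To\PDF.pdf")
$pages = @("Invoice 1001`nTotal: 50.00", "Notes on page two")

# Report which pages mention a word (page numbers start at 1)
$hits = for ($i = 0; $i -lt $pages.Count; $i++) {
    if ($pages[$i] -match 'Invoice') { $i + 1 }
}
Write-Output "Found on page(s): $($hits -join ', ')"

# Join every page into one string and save it as a text file
$allText = $pages -join [Environment]::NewLine
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) "pdf-text.txt"
Set-Content -Path $outFile -Value $allText
```

Without the @( ), indexing into a one-page result would give you single characters instead of pages.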

My name is Skylar Pearce. I have been working as a System Administrator since 2013, as well as doing some side consulting work. During my career I have worked with everything from Active Directory and vCenter to configuring routers, switches, and phone systems, documenting and scripting my way through the whole thing. I have a Security+ certification and am currently working on my PenTest+. Throughout my career I have gained almost all of my knowledge from blogs like this. It is now time for me to pay it back. Over the years I have gathered scripts and tricks that I will share on this site. A lot of the posts here will be mainly reference posts; some will be full-on how-tos. I am happy to go into more depth on any of the topics I go over here, just leave a comment on a post. I will do my best to post once a day on weekdays, but as I run out of ideas it may slow down. My WordPress skills are still growing, so the site will likely get better over time as I learn. You can reach me at contact@allthesystems.com or on LinkedIn.

21 comments

Chris Campbell

Really nice, simple solution for PDF text ingest!

There is a small typo in your example code:
$tex=[iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf,$page)

Should read:
$text=[iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf,$page)

Thanks again 🙂

    Skylar Pearce

    Good catch! I updated the example. Thanks for the heads up and I’m glad this helped you out!

Timoteo Brito

Very good article.
Great Job !!!

Jim

This is exactly what I was looking for!! Thank you for sharing.

Mike

Dear Chris,
I have been looking for something like that but for some reason I am stuck 🙁
If I understood correctly, I have to create the PS folder and copy the itextsharp.dll to it. Doing some research, I unlocked the file through the properties but I am stuck with the command.

When I try to run:

$text = convert-PDFtoText $file

I end up with an error:

Add-Type : Could not load file or assembly 'file:///C:\ps\itextsharp.dll' or one of its dependencies. Operation is not
supported. (Exception from HRESULT: 0x80131515)
At line:5 char:2
+ Add-Type -Path "C:\ps\itextsharp.dll"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Add-Type], FileLoadException
+ FullyQualifiedErrorId : System.IO.FileLoadException,Microsoft.PowerShell.Commands.AddTypeCommand

New-Object : Cannot find type [iTextSharp.text.pdf.pdfreader]: verify that the assembly containing this type is loaded.
At line:6 char:9
+ $pdf = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $fi …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidType: (:) [New-Object], PSArgumentException
+ FullyQualifiedErrorId : TypeNotFound,Microsoft.PowerShell.Commands.NewObjectCommand

You cannot call a method on a null-valued expression.
At line:11 char:2
+ $pdf.Close()
+ ~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull

Do I need to copy other files into the PS folder or what am I doing wrong?
Thanx for the help,
Mike

    Skylar Pearce

    Hmm, I just followed this guide on another PC and it just worked… Are you running PowerShell as admin? This is the line in the function that is failing for you. If you can get this to run, the rest of the script should work: Add-Type -Path "C:\ps\itextsharp.dll"

    The only thing I put in the C:\ps folder was that itextsharp.dll, so maybe the dll you have is bad? Here is the one I just tested: https://1drv.ms/u/s!Al3V0Ewdxn5Kk9t0b2xpNFlSHCwF5w?e=iYTy7Z

      Anonymous

      I just needed to click “Unblock” in the file properties dialog in Windows Explorer.

        Skylar Pearce

        Awesome!

Mike

Sorry, used the wrong name… Skylar of course, my bad

    Aaron

    Hello, what was the mistake exactly? I am having the exact same issue here.

      Skylar Pearce

      He never replied back to say if what I sent fixed the problem, but I’m pretty sure he just has a bad copy of itextsharp.dll. Did you try the one I linked for him? The easiest way to test if the dll is working is to run this command:

      Add-Type -Path "C:\ps\itextsharp.dll"

      If you don’t get an error, the rest of the script should work.

        Richard

        I followed the instructions, added the dll, and ran PowerShell as admin.

        Ran: Add-Type -Path "C:\PS\itextsharp.dll"

        My error is:
        Add-Type : Could not load file or assembly 'file:///C:\PS\itextsharp.dll' or one of its dependencies. Operation is not supported. (Exception from HRESULT: 0x80131515)
        At line:1 char:1
        + Add-Type -Path "C:\PS\itextsharp.dll"
        + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo : NotSpecified: (:) [Add-Type], FileLoadException
        + FullyQualifiedErrorId : System.IO.FileLoadException,Microsoft.PowerShell.Commands.AddTypeCommand

chris wieman

The issue was a bad dll. Get it from “https://github.com/itext/itextsharp/releases/tag/5.5.13.1”

    Skylar Pearce

    Thanks Chris! I have been looking for a new place to download the .dll for everyone since the source I had archived it. Thanks again!

Allferry

Hi Skylar,

I am having some issues running the script, which I think come from the DLL. I get the same error if I just run Add-Type -Path "C:\path\itextsharp.dll"
—————————————–
Add-Type : Could not load file or assembly 'file:///C:\Temp\Testxx\itextsharp.dll' or one of its dependencies. Operation
is not supported. (Exception from HRESULT: 0x80131515)
At line:1 char:1
+ Add-Type -Path "C:\Temp\Testxx\itextsharp.dll"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Add-Type], FileLoadException
+ FullyQualifiedErrorId : System.IO.FileLoadException,Microsoft.PowerShell.Commands.AddTypeCommand
————————————————

Any idea?
Many thanks

Alan

I’m getting the same error Mike and Aaron received. When I click the link to the .dll you listed in the comments, not the original link, it does not work. Can someone give me a working .dll??

Joe Bruns

Your link https://archive.codeplex.com/?p=itspdfservice zip file does NOT contain that dll.