c# pdf text-files

Convert a pdf file to text in C#


I need to convert a .pdf file to a .txt file.

How can I do this in C#?


Solution

  • Ghostscript can do this. The command below extracts the text from a PDF file into a text file; run it from a command line first to check that it works for you:

    gswin32c.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "test.pdf" -c quit >"test.txt"
    

    For details on using Ghostscript from C#, see the CodeProject article "Convert PDF to Image Using Ghostscript API".
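
To run that same command from C#, one option is to launch the Ghostscript executable as a child process and capture its standard output, since the `>` redirection is a shell feature and is not available when the process is started directly. The following is a minimal sketch, not a definitive implementation: the class name `PdfTextExtractor` and its methods are hypothetical, and it assumes `gswin32c.exe` is on the PATH and that Ghostscript can locate `ps2ascii.ps`, as in the command above.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; the names here are illustrative, not part of any Ghostscript API.
static class PdfTextExtractor
{
    // Builds the same switches as the command line shown above (without the ">" redirect).
    public static string BuildArguments(string pdfPath) =>
        "-q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE " +
        "-c save -f ps2ascii.ps \"" + pdfPath + "\" -c quit";

    // Assumption: gsExe resolves to the Ghostscript console executable on this machine.
    public static void Convert(string pdfPath, string txtPath, string gsExe = "gswin32c.exe")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExe,
            Arguments = BuildArguments(pdfPath),
            RedirectStandardOutput = true, // replaces the shell's ">" redirection
            UseShellExecute = false,       // required when redirecting output
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Read all output before waiting, so a full pipe cannot block Ghostscript.
            string text = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    "Ghostscript exited with code " + process.ExitCode);

            File.WriteAllText(txtPath, text);
        }
    }
}
```

It could then be called as `PdfTextExtractor.Convert("test.pdf", "test.txt");`. Pass a full path as the third argument if the executable is not on the PATH.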