Multibyte characters reading problem in IronPdf


I am trying out IronPDF. I want to read PDF metadata with IronPDF and insert it into a database. However, some "ı" characters in the metadata are not read correctly by IronPDF; spaces are left in place of these characters. Here is my code sample:

using IronPdf;

var md = PdfDocument.FromFile("___PATH OF PDF FILE___");
var article_title = md.MetaData.Title; // the "ı" characters come back as blanks here

When I copy and paste the string into Notepad++, it gives a result like this:

[screenshot]

And here is a screenshot of the application view:

[screenshot]

Is there a way to solve this problem, or is this a bug in IronPDF? If everything goes well, I am thinking of buying it. But if it fails on the first try, I will continue with iTextSharp.

EDIT: First of all, my apologies: Windows caught me by surprise. I struggled all day to get a new system up, and unfortunately Visual Studio etc. are still not installed. I have added one of the files I had problems with below; the IronPDF version appears as 2019.7.0.0.

PDF file: https://yadi.sk/d/HwP9JWRWTzMlSA


Solution

  • First of all, since you haven't provided a sample PDF to work with, I googled some Turkish PDF documents whose metadata contains Turkish characters. This is the file that I came up with: link

    [screenshot]

    As you can see above, the Author metadata field contains the Turkish character ı.

    Then I created a .NET Fiddle to test this file using IronPDF (with the latest available version, since you haven't specified one): sample using IronPDF

    The output from this sample is ElifCakroglu, which shows the exact same symptom when copied into Notepad++: [screenshot]

    Playing with the encodings did not help resolve this issue. So I created another .NET Fiddle to test your alternative, iTextSharp: sample using iTextSharp

    This time everything worked as it should: ElifCakıroglu

    Note: I also tried creating a Word 2016 document, saving it as a PDF, and using that file with the above samples. For some reason neither sample worked: the file was not accepted as a valid PDF. After that I tried an online PDF document validator, which reported the file as fine. Then I used an online converter to change the PDF version with the default settings and used the output PDF with both samples. Surprisingly, both of them worked correctly.

    My conclusion is that iTextSharp works consistently with both documents whose metadata contains Turkish characters, while IronPDF read only one of the two correctly.
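    Since the symptom only shows up after pasting into Notepad++, it may help to look at the raw characters of the returned string directly. The following is a minimal sketch using only the .NET base class library; the class and method names (MetadataDiagnostics, DescribeCodeUnits, ContainsDotlessI) and the sample strings are hypothetical and only stand in for whatever value the library returned. It prints every UTF-16 code unit, so you can see whether the ı (U+0131) was dropped or replaced by something else, for example a space or a control character.

```csharp
using System;
using System.Linq;

public static class MetadataDiagnostics
{
    // Formats every UTF-16 code unit of the value as "U+XXXX", separated by spaces.
    public static string DescribeCodeUnits(string value)
    {
        return string.Join(" ", value.Select(c => "U+" + ((int)c).ToString("X4")));
    }

    // True when the value contains the Turkish dotless i (U+0131).
    public static bool ContainsDotlessI(string value)
    {
        return value.IndexOf('\u0131') >= 0;
    }

    public static void Main()
    {
        // Hypothetical sample values; replace them with the strings your code actually read.
        string expected = "Cak\u0131roglu";
        string observed = "Cak roglu"; // assumption: the missing character shows up as a space

        Console.WriteLine(DescribeCodeUnits(expected));
        Console.WriteLine(DescribeCodeUnits(observed));
        Console.WriteLine(ContainsDotlessI(observed));
    }
}
```

    Running the same helper on the metadata value returned by each library should make it clear whether the character is lost while the PDF is read or only later, for example when the string is displayed or written to the database.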