java pdf pdf-generation pdfbox

Writing Arabic with PDFBox using the correct character presentation forms, without the letters being separated


I'm trying to generate a PDF that contains Arabic text using Apache PDFBox, but the text comes out as separated characters. PDFBox treats the given Arabic string as a sequence of general "official" Unicode characters, which are equivalent to the isolated forms of the Arabic letters.

Here is an example.
Target text (the expected output in the PDF file): جملة بالعربي
What I actually get in the PDF file:

[image: incorrect text]

I tried a few approaches, but none of them worked. Here are some of them:
1. Converting the String to a stream of bits and trying to extract the right values
2. Treating the String as a sequence of UTF-8 and UTF-16 bytes and extracting the values from them

One approach seemed very promising: reading the Unicode value of each character. But it only yields the general "official" code point. Here is what I mean:

System.out.println(Integer.toHexString("كلمة".charAt(1)));

The output is 644, but I expected fee0: this character sits in the middle of the word, so I should get its medial presentation form, U+FEE0.

So what I want is a method that produces the correct contextual code point, not just the official one.

The leftmost column of the first table on the following page shows the general Unicode code points:
Arabic Unicode Tables Wikipedia
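
To make the distinction concrete, here is a minimal sketch of how a contextual form can be derived from the general code point. It is a toy illustration only: the class name `ToyArabicShaper` and its four-letter table are assumptions made up for this example, and it relies on the layout of the Arabic Presentation Forms-B block, where the forms of one letter are stored in the order isolated, final, initial, medial. A real shaper has to handle every letter, ligatures and diacritics.

```java
import java.util.Map;

// Toy illustration only: supports four letters and is not a real Arabic shaper.
class ToyArabicShaper {
    // Isolated presentation form of each supported letter. The other forms
    // follow it in the order: final (+1), initial (+2), medial (+3).
    private static final Map<Character, Character> ISOLATED_FORM = Map.of(
            '\u0643', '\uFED9',  // kaf
            '\u0644', '\uFEDD',  // lam
            '\u0645', '\uFEE1',  // meem
            '\u0629', '\uFE93'); // teh marbuta (has only isolated and final forms)

    // Teh marbuta connects only to the letter before it, never to the next one.
    private static boolean joinsToNext(char c) {
        return ISOLATED_FORM.containsKey(c) && c != '\u0629';
    }

    static String shape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Character base = ISOLATED_FORM.get(c);
            if (base == null) {          // unknown character: copy unchanged
                out.append(c);
                continue;
            }
            boolean joinedToPrev = i > 0 && joinsToNext(text.charAt(i - 1));
            boolean joinedToNext = joinsToNext(c)
                    && i + 1 < text.length()
                    && ISOLATED_FORM.containsKey(text.charAt(i + 1));
            int offset;
            if (joinedToPrev && joinedToNext) {
                offset = 3;              // medial
            } else if (joinedToNext) {
                offset = 2;              // initial
            } else if (joinedToPrev) {
                offset = 1;              // final
            } else {
                offset = 0;              // isolated
            }
            out.append((char) (base + offset));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String word = "\u0643\u0644\u0645\u0629"; // the word from the question
        System.out.println(Integer.toHexString(word.charAt(1)));        // 644
        System.out.println(Integer.toHexString(shape(word).charAt(1))); // fee0
    }
}
```

The lam in the second position has a joining letter on both sides, so the sketch picks the medial form, which is the fee0 value expected above.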


Solution

  • Here is code that works. Download a sample font, e.g. trado.ttf.

    EDIT: I have since been using the Amiri font, which can be downloaded from the aliftype/amiri GitHub repository.

    Make sure the pdfbox-app and icu4j JAR files are on your classpath.

    import java.io.File;
    import java.io.IOException;

    import com.ibm.icu.text.ArabicShaping;
    import com.ibm.icu.text.ArabicShapingException;
    import com.ibm.icu.text.Bidi;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDPageContentStream;
    import org.apache.pdfbox.pdmodel.font.PDType0Font;

    public class Main {
        public static void main(String[] args) throws IOException {
            // The font must contain glyphs for the Arabic presentation forms.
            File fontFile = new File("Amiri-Regular.ttf");
            try (PDDocument doc = new PDDocument()) {
                PDPage page = new PDPage();
                doc.addPage(page);
                try (PDPageContentStream writer = new PDPageContentStream(doc, page)) {
                    writer.beginText();
                    writer.setFont(PDType0Font.load(doc, fontFile), 20);
                    writer.newLineAtOffset(0, 700);
                    String s = "جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
                    // Hand PDFBox text that is already shaped and in visual order.
                    writer.showText(bidiReorder(s));
                    writer.endText();
                }
                doc.save(new File("File_Test.pdf"));
            }
        }

        private static String bidiReorder(String text) {
            try {
                // Replace each letter with its contextual presentation form.
                String shaped = new ArabicShaping(ArabicShaping.LETTERS_SHAPE).shape(text);
                // Default to right-to-left if the text has no strong direction (was the literal 127).
                Bidi bidi = new Bidi(shaped, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
                bidi.setReorderingMode(Bidi.REORDER_DEFAULT); // was the literal 0
                // Convert logical order to visual order, mirroring paired characters (was the literal 2).
                return bidi.writeReordered(Bidi.DO_MIRRORING);
            } catch (ArabicShapingException e) {
                // Fall back to the unshaped text if shaping fails.
                return text;
            }
        }
    }
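
The reordering half of `bidiReorder` can be hard to picture, so here is a rough sketch of what "logical order to visual order" means. It deliberately uses the JDK's own `java.text.Bidi` instead of ICU4J so that it runs without extra JAR files; the class name `VisualOrderSketch` and the method `toVisualOrder` are made up for this illustration. Unlike ICU's `writeReordered`, it does no mirroring and no shaping, and reversing a run character by character would misplace combining marks, so treat it as an explanation rather than a replacement.

```java
import java.text.Bidi;

// Illustration only: shows the reordering step using the JDK's Bidi class.
class VisualOrderSketch {
    static String toVisualOrder(String logical) {
        // Resolve embedding levels; default to RTL if there is no strong character.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // An odd level marks a right-to-left run: its characters are
            // displayed in reverse of their logical order.
            boolean rightToLeft = (levels[i] & 1) == 1;
            runs[i] = rightToLeft ? new StringBuilder(run).reverse().toString() : run;
        }
        // Put the runs themselves into left-to-right display order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        String logical = "\u0643\u0644\u0645\u0629"; // kaf, lam, meem, teh marbuta
        StringBuilder hex = new StringBuilder();
        for (char ch : toVisualOrder(logical).toCharArray()) {
            hex.append(Integer.toHexString(ch)).append(' ');
        }
        System.out.println(hex.toString().trim()); // 629 645 644 643
    }
}
```

For a purely Arabic string the result is simply the reversed string, which is what a renderer that lays glyphs out left to right in the order given needs in order to show the word correctly.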