Search code examples
c#pdfencodingitext

Saving pdf with cyrillic names of form fields


I have dozens pdf files where I need to change part of field name one to another.
The problem is that developers use mapping with cyrillic names (russian).
I use itext7 library.
I get field names with GetFieldName(), then SetFieldName().
But field names have wrong encoding in new files.
I tried to use Encoding like this:


    Byte[] newNameBytes = Encoding.GetEncoding(1251).GetBytes(newName);
    string utf8NewName = Encoding.GetEncoding(1200).GetString(newNameBytes);
    textField.SetFieldName(utf8NewName);

Tried different types (UTF8, Unicode, CP1251, default) and nothing.
Everything that I achieved is different kinds of unreadable field names in new file.
I found how to set styles, fonts but it's about text inside field.

I guess iText doesn't recognize any chars except latin...
Any suggestions are welcome.

Old file:

old file

New file with renamed fields:

new file

What I mean:

pdf property

Code I wrote:


    string source = @"C:\Test\old.pdf";
    string destination = @"C:\Test\new.pdf";
    
    // Old and new field name
    var names = new Dictionary<string, string>()
    {
        { "Заемщик>Рабочий телефон", "Заемщик>Телефон организации" },
        { "Заемщик>Компания", "Заемщик>Название организации" },
    };
    
    var document = new PdfDocument(new PdfReader(source), new PdfWriter(destination));
    var form = PdfAcroForm.GetAcroForm(document, false);
    var fields = form.GetFormFields();
    
    foreach (var name in names)
    {
        // Find all names contain same value;
        var fieldsByName = (from f in fields
                            where f.Key.Contains(name.Key)
                            select f).ToList();
    
        foreach (var field in fieldsByName)
        {
            // if multiple fields exist with same name. I don't know how to operate properly kids (child fields).
            if (field.Value is PdfTextFormField textField)
            {
                string oldName = textField.GetFieldName().ToString();
                string newName = oldName.Replace(name.Key, name.Value);
                textField.SetFieldName(newName);
            }
        }
    }
    document.Close();


Solution

  • There is a bug in SetFieldName apparently.

    textField.SetFieldName("Заемщик>Телефон организации");
    textField.GetFieldName(); // returns '05<I8:>"5;5D>= >@30=870F88'
    

    The solution is to use lower level functions from PdfDictionary.

    Replace

    textField.SetFieldName(newName);
    

    with

    textField.GetPdfObject().Put("T", new PdfString(newName, "UnicodeBig"));
    

    or

    var aPdfDictionary = textField.GetPdfObject();
    var aPdfString = new PdfString(newName, "UnicodeBig");
    aPdfDictionary.Put("T", aPdfString);
    

    and

    textField.GetFieldName()
    

    to test result.