Tags: php, parsing, text-parsing, pdftk

Parsing pdftk dump_data_fields using PHP?


I need some advice on the best way to parse the output of pdftk dump_data_fields in PHP.

In particular, I need to extract FieldName, FieldNameAlt and, optionally, FieldMaxLength and the FieldStateOption values. Here is a sample of the output:

    FieldType: Text
    FieldName: TestName1
    FieldNameAlt: TestName1
    FieldFlags: 29360128
    FieldJustification: Left
    FieldMaxLength: 5
    ---
    FieldType: Button
    FieldName: TestName3
    FieldFlags: 0
    FieldJustification: Left
    FieldStateOption: Off
    FieldStateOption: Yes
    ---
    ...

Solution

  • Would something like this suffice?

    $handle = fopen("/tmp/bla.txt", "r");
    if ($handle) {
        $output = array();
        while (($line = fgets($handle)) !== false) {
            if (trim($line) === "---") {
                // Block completed; process it
                if (count($output) > 0) {
                    print_r($output);
                }
                $output = array();
                continue;
            }
            // Process contents of data block. Split on the first colon only,
            // so values that themselves contain ':' (e.g. URLs) stay intact.
            $parts = explode(":", $line, 2);
            if (count($parts) === 2) {
                $key = trim($parts[0]);
                $value = trim($parts[1]);
                if (isset($output[$key])) {
                    // Repeated key (e.g. FieldStateOption): store the extra
                    // values as FieldStateOption1, FieldStateOption2, ...
                    $i = 1;
                    while (isset($output[$key.$i])) $i++;
                    $output[$key.$i] = $value;
                }
                else {
                    $output[$key] = $value;
                }
            }
            else {
                // handle malformed input (blank lines also end up here)
            }
        }
    
        // process final block
        if (count($output) > 0) {
            print_r($output);
        }
        fclose($handle);
    }
    else {
        // error while opening the file
    }
    

    This gives you the following output:

    Array
    (
        [FieldType] => Text
        [FieldName] => TestName1
        [FieldNameAlt] => TestName1
        [FieldFlags] => 29360128
        [FieldJustification] => Left
        [FieldMaxLength] => 5
    )
    Array
    (
        [FieldType] => Button
        [FieldName] => TestName3
        [FieldFlags] => 0
        [FieldJustification] => Left
        [FieldStateOption] => Off
        [FieldStateOption1] => Yes
    )
    

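    Fishing out individual values is then as easy as the following, placed wherever a completed block is processed (i.e. in place of the print_r() calls):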

    echo $output["FieldName"];
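    The same idea can be wrapped in reusable functions that return only the properties the question asks for. This is a sketch under stated assumptions, not part of the original answer: parseDumpDataFields() and pickWanted() are hypothetical names, the pdftk output is assumed to be already available as a string, and repeated keys such as FieldStateOption are collected into a list instead of being numbered FieldStateOption1, FieldStateOption2, ...

```php
<?php
// Hypothetical helper: parses text printed by `pdftk <file> dump_data_fields`
// into one associative array per field (assumes the text is already a string).
function parseDumpDataFields(string $text): array
{
    $fields = [];
    $current = [];
    foreach (preg_split('/\R/', $text) as $line) {
        if (trim($line) === '---') {
            // Record separator: keep the finished record, if any
            if ($current !== []) {
                $fields[] = $current;
            }
            $current = [];
            continue;
        }
        // Split on the first colon only, so values containing ':' survive
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // blank or malformed line
        }
        $key = trim($parts[0]);
        $value = trim($parts[1]);
        if (!array_key_exists($key, $current)) {
            $current[$key] = $value;
        } else {
            // Repeated key (e.g. FieldStateOption): collect values in a list
            $current[$key] = array_merge((array) $current[$key], [$value]);
        }
    }
    // The last record is not necessarily followed by a separator
    if ($current !== []) {
        $fields[] = $current;
    }
    return $fields;
}

// Hypothetical helper: keeps only the properties the question needs. Missing
// optional values become null (FieldMaxLength) or an empty list (state options).
function pickWanted(array $field): array
{
    return [
        'FieldName'        => $field['FieldName'] ?? null,
        'FieldNameAlt'     => $field['FieldNameAlt'] ?? null,
        'FieldMaxLength'   => isset($field['FieldMaxLength'])
            ? (int) $field['FieldMaxLength']
            : null,
        // (array) turns a single option string into a one-element list
        'FieldStateOption' => (array) ($field['FieldStateOption'] ?? []),
    ];
}

$sample = <<<TXT
FieldType: Text
FieldName: TestName1
FieldNameAlt: TestName1
FieldFlags: 29360128
FieldJustification: Left
FieldMaxLength: 5
---
FieldType: Button
FieldName: TestName3
FieldFlags: 0
FieldJustification: Left
FieldStateOption: Off
FieldStateOption: Yes
---
TXT;

$wanted = array_map('pickWanted', parseDumpDataFields($sample));
print_r($wanted);
```

    Collecting repeated keys into a list means callers never have to probe for numbered keys. The input string could come from file_get_contents() on a saved dump or, assuming pdftk is on the PATH, from something like shell_exec('pdftk ' . escapeshellarg($pdfPath) . ' dump_data_fields'), where $pdfPath is a placeholder.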