Parsing a colon-separated string into JSON in Node/JavaScript


I'm working on a Node server, and I used pdftk to extract field data from a PDF to be injected with form data.

I'm trying to get a JSON object of field names to iterate over, but I can't seem to get it right. It will likely need a loop, because the output changes based on which fields are in the PDF.

This is the string I have from the output.

---
FieldType: Text
FieldName: topmostSubform[0].Page1[0].p1-t17[0]
FieldFlags: 8388608
FieldValue:
FieldJustification: Center
FieldMaxLength: 10
---
FieldType: Text
FieldName: topmostSubform[0].Page1[0].p1-t20[0]
FieldFlags: 8388608
FieldValue:
FieldJustification: Center
FieldMaxLength: 10
---
FieldType: Button
FieldName: topmostSubform[0].Page1[0].p1-cb7[0]
FieldFlags: 0
FieldValue:
FieldJustification: Left
FieldStateOption: 1
FieldStateOption: Off
---
FieldType: Text
FieldName: topmostSubform[0].Page1[0].p1-t38[0]
FieldFlags: 8388608
FieldValue:
FieldJustification: Center
---
FieldType: Text
FieldName: topmostSubform[0].Page1[0].p1-t50[0]
FieldFlags: 8388608
FieldValue:
FieldJustification: Left

Would you recommend regex? What is the best way to go about this problem?


Solution

  • Use split repeatedly to break it down into its components:

    var input = '---\n\
    FieldType: Text\n\
    FieldName: topmostSubform[0].Page1[0].p1-t17[0]\n\
    FieldFlags: 8388608\n\
    FieldValue:\n\
    FieldJustification: Center\n\
    FieldMaxLength: 10\n\
    ---\n\
    FieldType: Text\n\
    FieldName: topmostSubform[0].Page1[0].p1-t20[0]\n\
    FieldFlags: 8388608\n\
    FieldValue:\n\
    FieldJustification: Center\n\
    FieldMaxLength: 10\n\
    ---\n\
    FieldType: Button\n\
    FieldName: topmostSubform[0].Page1[0].p1-cb7[0]\n\
    FieldFlags: 0\n\
    FieldValue:\n\
    FieldJustification: Left\n\
    FieldStateOption: 1\n\
    FieldStateOption: Off\n\
    ---\n\
    FieldType: Text\n\
    FieldName: topmostSubform[0].Page1[0].p1-t38[0]\n\
    FieldFlags: 8388608\n\
    FieldValue:\n\
    FieldJustification: Center\n\
    ---\n\
    FieldType: Text\n\
    FieldName: topmostSubform[0].Page1[0].p1-t50[0]\n\
    FieldFlags: 8388608\n\
    FieldValue:\n\
    FieldJustification: Left';
    
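    var fields = [];
    // Each field record is introduced by a '---' separator line.
    var field_strings = input.split(/[\r\n]*---[\r\n]*/);
    for (var i = 0; i < field_strings.length; i++) {
        if (field_strings[i].trim() === '') { // Skip blank field at beginning
            continue;
        }
        var obj = {};
        var props_strings = field_strings[i].split(/\r?\n/);
        for (var j = 0; j < props_strings.length; j++) {
            // Split on the first colon only, so values that contain ':' stay intact.
            var sep = props_strings[j].indexOf(':');
            if (sep === -1) { // Skip lines that are not "key: value" pairs
                continue;
            }
            var key = props_strings[j].slice(0, sep).trim();
            // Note: a repeated key (e.g. FieldStateOption) keeps only its last value.
            obj[key] = props_strings[j].slice(sep + 1).trim();
        }
        fields.push(obj);
    }
    console.log(fields);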
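
One thing to watch: the loop above keeps only the last value when a key repeats, as FieldStateOption does in the Button field. If you need every value, one option is to collect repeats into an array. The following is only a sketch under that assumption; `parseFieldDump` is a made-up helper name, not part of pdftk or Node, and it assumes each record starts with a `---` line as in the output shown in the question.

```javascript
// Hypothetical helper: turn "Key: Value" text with '---' separators
// into an array of plain objects. Repeated keys become arrays.
function parseFieldDump(text) {
    var fields = [];
    var current = null; // record being filled; created lazily
    var lines = text.split(/\r?\n/);
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i].trim();
        if (line === '---') { // separator: the next pair starts a new record
            current = null;
            continue;
        }
        var sep = line.indexOf(':'); // first colon only
        if (sep === -1) { // blank or malformed line
            continue;
        }
        if (current === null) {
            current = {};
            fields.push(current);
        }
        var key = line.slice(0, sep).trim();
        var value = line.slice(sep + 1).trim();
        if (Object.prototype.hasOwnProperty.call(current, key)) {
            // Second and later occurrences: promote to (or extend) an array.
            current[key] = [].concat(current[key], value);
        } else {
            current[key] = value;
        }
    }
    return fields;
}

// Usage with a shortened sample of the output from the question.
var sample = [
    '---',
    'FieldType: Button',
    'FieldName: topmostSubform[0].Page1[0].p1-cb7[0]',
    'FieldStateOption: 1',
    'FieldStateOption: Off',
    '---',
    'FieldType: Text',
    'FieldValue:'
].join('\n');
var parsed = parseFieldDump(sample);
console.log(JSON.stringify(parsed, null, 2));
```

`JSON.stringify` is only needed if you want the JSON text itself; to iterate over field names you can loop over the returned array directly and read each object's `FieldName`.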