ibm-cloud ibm-watson document-conversion

Having trouble getting usable results from Watson's Document Conversion service


When I try to convert this document

https://public.dhe.ibm.com/common/ssi/ecm/po/en/poq12347usen/POQ12347USEN.PDF

with Watson's Document Conversion service, all I get is four answer units, one for each level-4 heading. What I really need is 47 answer units, one for each FAQ question. How can I achieve this?


Solution

  • For a document like this one, a custom configuration can often produce more usable results. Pass the configuration to Document Conversion as a config form part on the request.

    See the documentation (https://www.ibm.com/watson/developercloud/doc/document-conversion/customizing.shtml) for details on the available options. For this particular document, the following configuration, which assigns heading levels by font size and boldness, seems to give improved results:

    {
      "conversion_target": "ANSWER_UNITS",
      "pdf": {
        "heading": {
          "fonts": [
            {"level": 1, "min_size": 14, "max_size": 80},
            {"level": 2, "min_size": 11, "max_size": 12, "bold": true},
            {"level": 3, "min_size": 9, "max_size": 11, "bold": true}
          ]
        }
      }
    }
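As a rough sketch of how the config form part might be sent, the Python below (standard library only) builds a multipart/form-data POST that carries the configuration as a `config` part and the PDF as a `file` part. The endpoint URL, the `version` date, and the Basic-auth header are assumptions based on the service's old v1 API, and the service has since been retired, so treat this as illustrative only. The request is built but not sent; `build_multipart` and `build_request` are hypothetical helper names.

```python
import json
import uuid
import urllib.request

# The custom configuration from the answer, as a Python dict.
CONFIG = {
    "conversion_target": "ANSWER_UNITS",
    "pdf": {
        "heading": {
            "fonts": [
                {"level": 1, "min_size": 14, "max_size": 80},
                {"level": 2, "min_size": 11, "max_size": 12, "bold": True},
                {"level": 3, "min_size": 9, "max_size": 11, "bold": True},
            ]
        }
    },
}

# ASSUMPTION: endpoint and version date of the (now retired) v1 service.
ENDPOINT = ("https://gateway.watsonplatform.net/document-conversion/api/v1/"
            "convert_document?version=2015-12-15")


def build_multipart(config, pdf_bytes, filename="document.pdf"):
    """Encode a JSON 'config' part and a PDF 'file' part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = [
        (b'Content-Disposition: form-data; name="config"\r\n'
         b"Content-Type: application/json\r\n\r\n"
         + json.dumps(config).encode("utf-8")),
        (f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/pdf\r\n\r\n").encode("utf-8") + pdf_bytes,
    ]
    body = b""
    for part in parts:
        # Each part is preceded by a boundary line and followed by CRLF.
        body += b"--" + boundary.encode("ascii") + b"\r\n" + part + b"\r\n"
    # The closing boundary has a trailing "--".
    body += b"--" + boundary.encode("ascii") + b"--\r\n"
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(config, pdf_bytes, auth_header):
    """Return an unsent POST request; pass it to urllib.request.urlopen() to send."""
    body, content_type = build_multipart(config, pdf_bytes)
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": content_type, "Authorization": auth_header},
        method="POST",
    )
```

Usage might look like `req = build_request(CONFIG, open("POQ12347USEN.PDF", "rb").read(), "Basic <base64 credentials>")` followed by `urllib.request.urlopen(req)`, where the credentials are placeholders for your own service credentials. Building the body by hand keeps the sketch dependency-free; a client library such as `requests` would handle the multipart encoding for you.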