
Implement a Condition in an Azure AI Search Skillset


We are utilizing Azure skills to identify the language code and then perform OCR on documents with an unknown language code. Could someone provide guidance on how to implement this conditional code? I'm encountering errors and might not be applying it correctly.


Solution

  • You don't need a separate document extraction step. By default the indexer extracts the content and metadata and exposes them under the /document context.

    To get the language code, use the field path /document/metadata_language .

    Pass this path as an input to the conditional skill, then run OCR on the skill's output.

    Change your conditional skill as shown below. The condition reads the language code from /document/metadata_language. For the whenTrue input you can also use /document/content instead of /document/normalized_images/*.

    {
      "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
      "name": "Language Check",
      "description": "Check if language code is 'Unknown'",
      "context": "/document",
      "inputs": [
        {
          "name": "condition",
          "source": "= $(/document/metadata_language) == '(Unknown)'"
        },
        {
          "name": "whenTrue",
          "source": "/document/normalized_images/*"
        },
        {
          "name": "whenFalse",
          "source": "= null"
        }
      ],
      "outputs": [
        {
          "name": "output",
          "targetName": "imagesForOcr"
        }
      ]
    },
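
    If you generate the skillset from a script, a small helper can keep the condition expression well-formed. The following is only a sketch: the function name build_conditional_skill and its parameters are hypothetical, and it simply assembles the same skill definition as a Python dict. The one rule it encodes is the expression syntax: a leading "=", the path wrapped in "$( )", and the string literal in single quotes.

```python
import json


def build_conditional_skill(language_path="/document/metadata_language",
                            unknown_value="(Unknown)",
                            when_true_source="/document/normalized_images/*",
                            target_name="imagesForOcr"):
    """Return a ConditionalSkill definition as a plain dict (hypothetical helper)."""
    # A leading "=" marks an expression, "$(path)" reads a value from the
    # enrichment tree, and string literals are wrapped in single quotes.
    condition = f"= $({language_path}) == '{unknown_value}'"
    return {
        "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
        "name": "Language Check",
        "description": "Check if language code is 'Unknown'",
        "context": "/document",
        "inputs": [
            {"name": "condition", "source": condition},
            {"name": "whenTrue", "source": when_true_source},
            {"name": "whenFalse", "source": "= null"},
        ],
        "outputs": [{"name": "output", "targetName": target_name}],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the skillset's "skills" array.
    print(json.dumps(build_conditional_skill(), indent=2))
```

    The printed JSON is one element of the skillset's "skills" array; the helper does not call the service.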
    

    Note: Make sure the indexer configuration is set to extract content and metadata. Below is the indexer definition.

    {
      "@odata.context": "https://jgsaisearch.search.windows.net/$metadata#indexers/$entity",
      "@odata.etag": "\"0x8DC91AC57C625BF\"",
      "name": "azureblob-indexer",
      "description": "",
      "dataSourceName": "ds",
      "skillsetName": "skillset1718943474465",
      "targetIndexName": "azureblob-index",
      "disabled": null,
      "schedule": null,
      "parameters": {
        "batchSize": null,
        "maxFailedItems": 0,
        "maxFailedItemsPerBatch": 0,
        "base64EncodeKeys": null,
        "configuration": {
          "dataToExtract": "contentAndMetadata",
          "parsingMode": "default"
        }
      },
      "fieldMappings": [
        {
          "sourceFieldName": "metadata_storage_path",
          "targetFieldName": "metadata_storage_path",
          "mappingFunction": {
            "name": "base64Encode",
            "parameters": null
          }
        }
      ],
      "outputFieldMappings": [
        {
          "sourceFieldName": "/document/myLanguageCode",
          "targetFieldName": "lang_code"
        }
      ],
      "cache": null,
      "encryptionKey": null
    }
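
    As a quick sanity check before you run the indexer, you could inspect the definition in a script. This is a minimal sketch, not part of any SDK: check_indexer is a hypothetical helper. It assumes that "contentAndMetadata" is the default when dataToExtract is not set, and that /document/normalized_images/* is only populated when the indexer configuration sets imageAction to generate normalized images.

```python
def check_indexer(indexer, uses_normalized_images=True):
    """Return a list of likely problems in an indexer definition dict."""
    params = indexer.get("parameters") or {}
    config = params.get("configuration") or {}
    problems = []

    # Assumption: an unset dataToExtract falls back to "contentAndMetadata".
    data_to_extract = config.get("dataToExtract") or "contentAndMetadata"
    if data_to_extract != "contentAndMetadata":
        problems.append("dataToExtract should be 'contentAndMetadata'")

    # Without a skillset attached to the indexer, no skills run at all.
    if not indexer.get("skillsetName"):
        problems.append("skillsetName is missing")

    # Assumption: normalized images need imageAction set to one of the
    # generateNormalizedImages* values; "none" or unset produces no images.
    image_action = config.get("imageAction") or "none"
    if uses_normalized_images and not image_action.startswith("generateNormalizedImage"):
        problems.append("imageAction is not set, so normalized_images will be empty")

    return problems
```

    An empty list means none of these checks failed. If your whenTrue source is /document/content, pass uses_normalized_images=False.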
    
