Using date metadata from DOCX and PDF files in Azure Cognitive Search

I'm uploading a lot of DOCX and PDF files to Blob Storage to be used in Azure Cognitive Search. I'm using it to experiment with some AI capabilities, and it works well, but I would like to try filtering on freshness. I'm not sure how the metadata for these files (e.g., 'author', 'date', 'title') can be added through a skill. Any advice would be appreciated. Thanks

  {
    "@odata.context": ... ,
    "@odata.etag": ... ,
    "name": "freshness",
    "description": "Skillset to chunk documents and generate embeddings",
    "skills": [
      {
        "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
        "name": "#3",
        "description": "Extracts metadata from the document",
        "context": "/document",
        "inputs": [
          {
            "name": "metadata_creation_date",
            "source": "/document/metadata_creation_date"
          }
        ],
        "outputs": [
          {
            "name": "output",
            "targetName": "creationDate"
          }
        ]
      }
    ],
    "cognitiveServices": null,
    "knowledgeStore": null,
    "indexProjections": {
      "selectors": [
        {
          "targetIndexName": "freshness",
          "parentKeyFieldName": "parent_id",
          "sourceContext": "/document/pages/*",
          "mappings": [
            {
              "name": "creationDate",
              "source": "/document/creationDate",
              "sourceContext": null,
              "inputs": []
            }
          ]
        }
      ],
      "parameters": {
        "projectionMode": "skipIndexingParentDocuments"
      }
    },
    "encryptionKey": null
  }


  • If you already have an index, you can add a new field of type Edm.DateTimeOffset to it.
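
As a rough sketch of that first step, the new field could be described like this before it is added to the index's `fields` array. The field name `creationDate` is borrowed from the question's skillset, and the `sortable` flag is an assumption added here; only the `Edm.DateTimeOffset` type comes from the answer itself.

```python
import json

# Hypothetical field definition for the index's "fields" array.
# "creationDate" is the name used in the question's skillset; adjust it to your index.
creation_date_field = {
    "name": "creationDate",
    "type": "Edm.DateTimeOffset",
    "filterable": True,  # lets the field be used in $filter expressions
    "sortable": True,    # assumption: convenient for ordering results by recency
}

field_json = json.dumps(creation_date_field, indent=2)
print(field_json)
```

The printed JSON is only the field entry; it would still need to be merged into the full index definition.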

    After creating the field, map it in the indexer's fieldMappings:

    "fieldMappings": [
      {
        "sourceFieldName": "metadata_storage_path",
        "targetFieldName": "metadata_storage_path",
        "mappingFunction": {
          "name": "base64Encode",
          "parameters": null
        }
      }
    ]

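The mapping shown in the answer covers only the storage path. To carry the document date into the new field, a second mapping is probably needed; the sketch below appends one to an indexer definition held as a plain dictionary. It assumes the source property is `metadata_creation_date` (the name the question's skillset reads from) and that the target field is called `creationDate`; `add_field_mapping` is a hypothetical helper, not part of any SDK.

```python
import json


def add_field_mapping(indexer: dict, source: str, target: str) -> dict:
    """Return a copy of the indexer definition with one extra field mapping."""
    updated = dict(indexer)
    mappings = list(indexer.get("fieldMappings", []))
    mappings.append({"sourceFieldName": source, "targetFieldName": target})
    updated["fieldMappings"] = mappings
    return updated


# The existing mapping from the answer.
indexer = {
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "metadata_storage_path",
            "mappingFunction": {"name": "base64Encode", "parameters": None},
        }
    ]
}

# Assumed names: metadata_creation_date (source) and creationDate (target).
updated_indexer = add_field_mapping(indexer, "metadata_creation_date", "creationDate")
print(json.dumps(updated_indexer, indent=2))
```

The helper copies the mapping list rather than mutating it, so the original definition stays available for comparison.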

    When importing data, you can make the field filterable in the Customize target index step.

    Check the Filterable option for the new date field.
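
Once the field is filterable, a freshness filter is just an OData comparison against a cutoff date. The sketch below builds such an expression for use as the `$filter` value of a search request. The field name `creationDate` and the helper `freshness_filter` are assumptions for illustration, and the 30-day window is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def freshness_filter(field: str, days: int, now: datetime) -> str:
    """Build an OData filter that keeps documents created in the last `days` days."""
    # Pass a timezone-aware `now`; it is converted to UTC for the literal.
    cutoff = (now - timedelta(days=days)).astimezone(timezone.utc)
    return f"{field} ge {cutoff.strftime('%Y-%m-%dT%H:%M:%SZ')}"


# Fixed reference time so the result is reproducible.
example_now = datetime(2024, 6, 30, 12, 0, 0, tzinfo=timezone.utc)
example_filter = freshness_filter("creationDate", 30, example_now)
print(example_filter)  # creationDate ge 2024-05-31T12:00:00Z
```

In practice `now` would be `datetime.now(timezone.utc)`, and the resulting string would be passed as the filter of the query.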
