
Azure AI Search using blob storage - can't get past a Base64 decode issue


I am successfully using Azure AI Search with a data source pointing at a blob container inside an Azure Storage account. Everything works as expected: data source, index, indexer and skillset.

There is one issue I cannot solve (I have spent a lot of time searching for a solution and have tried various fixes recommended by others, but nothing resolves it). My REST API search endpoint returns results, and when I decode the Base64 strings in metadata_storage_path manually with an online Base64 decoder, they convert correctly to valid URLs that point to my files in Azure Storage. Here is one of the Base64 strings:

aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTAucG5n0

And here it is decoded manually:

https://rdmc01devazuresearchsa.blob.core.windows.net/rdmc01-dev-docs/10.png

Here are the full REST API search results:

    {
        "@odata.context": "https://rdmc01-dev-azure-search-service.search.windows.net/indexes('azureblob-index')/$metadata#docs(*)",
        "@odata.count": 4,
        "value": [
            {
                "@search.score": 8.4224205,
                "language": "English",
                "organizations": [
                    "Microsoft",
                    "Open source",
                    "FEDORA",
                    "Centos",
                    "Linux Foundation"
                ],
                "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTYuZG9jeA2",
                "metadata_storage_name": "16.docx"
            },
            {
                "@search.score": 6.806098,
                "language": "English",
                "organizations": [],
                "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTAucG5n0",
                "metadata_storage_name": "10.png"
            },
            {
                "@search.score": 6.806098,
                "language": "English",
                "organizations": [],
                "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvbW9sbGllLnBuZw2",
                "metadata_storage_name": "mollie.png"
            },
            {
                "@search.score": 6.7477694,
                "language": "English",
                "organizations": [],
                "metadata_storage_path": "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTQuanBn0",
                "metadata_storage_name": "14.jpg"
            }
        ]
    }

However, when I decode them in .NET (C#), I get the following error:

    FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.

Any help would be great as I have run out of ideas.


Solution

  • The error is caused by padding: Convert.FromBase64String requires the length of a Base64 string to be a multiple of 4, and none of the four returned values meets that.

    Use the sample code below:

    using System;
    using System.Text;

    public class Program
    {
        public static void Main()
        {
            string base64String = "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvbW9sbGllLnBuZw2";

            // Base64 input must have a length that is a multiple of 4,
            // so append '=' only when the length falls short of that
            int rem = base64String.Length % 4;
            if (rem != 0)
            {
                base64String += new string('=', 4 - rem);
            }

            Console.WriteLine(base64String);
            Console.WriteLine(Encoding.UTF8.GetString(Convert.FromBase64String(base64String)));
        }
    }
    

    This code appends '=' characters until the length is a multiple of 4.


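    It works for all the file paths provided except 10.png and 14.jpg. Those two values are not corrupted, though: the trailing digit on every value is deliberate. By default, the indexer's base64Encode field mapping uses HttpServerUtility.UrlTokenEncode, which replaces '+' and '/' with '-' and '_', strips the '=' padding, and appends a single digit (0, 1 or 2) recording how many padding characters were removed. The values for 16.docx and mollie.png end in 2 (two '=' removed), while 10.png and 14.jpg end in 0 (no padding). That is why padding to a multiple of 4 only appears to work for the first two (the digit is decoded as if it were data) and fails for the other two.

    Removing the last character 0 resolves errors for both files. In general, drop the last character, map '-' and '_' back to '+' and '/', append as many '=' as that digit indicates, and then call Convert.FromBase64String.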

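A self-contained C# sketch of that general decoding step follows. HttpServerUtility.UrlTokenDecode lives in System.Web, which is not available in modern .NET, so this re-implements the token format described above by hand. The UrlToken class and its method names are hypothetical helpers, not part of any Azure SDK, and the sketch assumes the field mapping uses the default base64Encode behaviour (useHttpServerUtilityUrlTokenEncode left at true).

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes values produced in the HttpServerUtility.UrlTokenEncode
// format (assumed here to be what the default base64Encode field mapping emits).
public static class UrlToken
{
    // Token layout:
    //  - URL-safe Base64 body ('-' instead of '+', '_' instead of '/') with '=' removed
    //  - one trailing digit (0, 1 or 2) = number of '=' padding characters removed
    public static byte[] Decode(string token)
    {
        if (string.IsNullOrEmpty(token))
            throw new ArgumentException("Token is empty.", nameof(token));

        int padding = token[token.Length - 1] - '0';
        if (padding < 0 || padding > 2)
            throw new FormatException("Last character must be the padding count 0, 1 or 2.");

        // Copy everything except the trailing digit, leaving room for the padding
        var base64 = new StringBuilder(token, 0, token.Length - 1, token.Length - 1 + padding);
        base64.Replace('-', '+').Replace('_', '/'); // back to the standard Base64 alphabet
        base64.Append('=', padding);                // restore the stripped padding

        return Convert.FromBase64String(base64.ToString());
    }

    // metadata_storage_path holds a UTF-8 URL, so decode the bytes as UTF-8 text
    public static string DecodeToString(string token) =>
        Encoding.UTF8.GetString(Decode(token));
}

public static class Program
{
    public static void Main()
    {
        // metadata_storage_path values copied from the search results in the question
        string[] paths =
        {
            "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTYuZG9jeA2",
            "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTAucG5n0",
            "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvbW9sbGllLnBuZw2",
            "aHR0cHM6Ly9yZG1jMDFkZXZhenVyZXNlYXJjaHNhLmJsb2IuY29yZS53aW5kb3dzLm5ldC9yZG1jMDEtZGV2LWRvY3MvMTQuanBn0",
        };

        foreach (string path in paths)
        {
            Console.WriteLine(UrlToken.DecodeToString(path));
        }
    }
}
```

Run against the four values above, this prints the four blob URLs, for example https://rdmc01devazuresearchsa.blob.core.windows.net/rdmc01-dev-docs/10.png for the value ending in 0. Reading the padding count from the trailing digit, rather than padding to a multiple of 4, is what keeps the digit from being decoded as data.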