c# azure azure-cognitive-services text-analytics-api

Azure KeyPhrase API returning 400 at times


I'm getting mixed results with the Azure KeyPhrase API: sometimes the call succeeds (a 200 response), and other times I get a 400 Bad Request. To test the service, I'm sending the contents of an Azure PDF on their NoSQL service.

The documentation says that each document may be up to 5k characters. I started with 5k, but to rule that limit out I'm now capping each document at 1k characters.

How can I get more information about the cause of the failure? I've already checked the Portal, but there's not much detail there.

I am using this endpoint: https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases

Some sample failures:

  • {"documents":[{"language":"en","id":1,"text":"David Chappell Understanding NoSQL on Microsoft Azure Sponsored by Microsoft Corporation Copyright © 2014 Chappell & Associates"}]}

  • {"documents":[{"language":"en","id":1,"text":"3 Relational technology has been the dominant approach to working with data for decades. Typically accessed using Structured Query Language (SQL), relational databases are incredibly useful. And as their popularity suggests, they can be applied in many different situations. But relational technology isn’t always the best approach. Suppose you need to work with very large amounts of data, for example, too much to store on a single machine. Scaling relational technology to work effectively across many servers (physical or virtual) can be challenging. Or suppose your application works with data that’s not a natural fit for relational systems, such as JavaScript Object Notation (JSON) documents. Shoehorning the data into relational tables is possible, but a storage technology expressly designed to work with this kind of information might be simpler. NoSQL technologies have been created to address problems like these. As the name suggests, the label encompasses a variety of storage"}]}

Added my quick-and-dirty PoC code:

List<string> sendRequest(object data)
    {
        string url = "https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases";
        string key = "api-code-here";
        string hdr = "Ocp-Apim-Subscription-Key";
        var wc = new WebClient();
        wc.Headers.Add(hdr, key);
        wc.Headers.Add(HttpRequestHeader.ContentType, "application/json");

        TextAnalyticsResult results = null;

        string json = JsonConvert.SerializeObject(data);
        try
        {
            var bytes = Encoding.Default.GetBytes(json);
            var d2 = wc.UploadData(url, bytes);
            var dataString = Encoding.Default.GetString(d2);
            results = JsonConvert.DeserializeObject<TextAnalyticsResult>(dataString);                
        }
        catch (Exception ex)
        {
            var s = ex.Message;
        }
        System.Threading.Thread.Sleep(125);

        if (results != null && results.documents != null)
            return results.documents.SelectMany(x => x.keyPhrases).ToList();
        else
            return new List<string>();
    }

Called by:

foreach (var k in vals)
        {
            data.documents.Clear();
            int countSpaces = k.Count(Char.IsWhiteSpace);
            if (countSpaces > 3)
            {
                if (k.Length > maxLen)
                {
                    var v = k;
                    while (v.Length > maxLen)
                    {
                        var tmp = v.Substring(0, maxLen);
                        var idx = tmp.LastIndexOf(" ");
                        tmp = tmp.Substring(0, idx).Trim();
                        data.documents.Add(new
                        {
                            language = "en",
                            id = data.documents.Count() + 1,
                            text = tmp
                        });
                        v = v.Substring(idx + 1).Trim();

                        phrases.AddRange(sendRequest(data));
                        data.documents.Clear();
                    }

                    data.documents.Add(new
                    {
                        language = "en",
                        id = data.documents.Count() + 1,
                        text = v
                    });
                    phrases.AddRange(sendRequest(data));
                    data.documents.Clear();
                }
                else
                {
                    data.documents.Add(new
                    {
                        language = "en",
                        id = 1,
                        text = k
                    });

                    phrases.AddRange(sendRequest(data));
                    data.documents.Clear();
                };
            }             
        }

Solution

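  • Both failing samples contain non-ASCII characters (© and ’). On .NET Framework, Encoding.Default is the system's ANSI code page, so those characters are sent as bytes that are not valid UTF-8, which the service most likely rejects with a 400. Try changing this line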

    var bytes = Encoding.Default.GetBytes(json);
    

    to

    var bytes = Encoding.UTF8.GetBytes(json);
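
To see why the encoding matters, and to get more detail out of a failed call, a minimal sketch along these lines may help. The class and method names (KeyPhraseDiagnostics, IsValidUtf8, EncodeLatin1, ReadErrorBody) are hypothetical and not part of any Azure SDK. ISO-8859-1 is used here only as a stand-in for a typical ANSI code page, since it encodes © as the same single byte (0xA9) that Windows-1252 does. The assumption that the service puts a readable error message in the 400 response body is not confirmed by the question.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helpers for illustration only.
public static class KeyPhraseDiagnostics
{
    // Returns true when the bytes decode cleanly as UTF-8.
    public static bool IsValidUtf8(byte[] bytes)
    {
        // throwOnInvalidBytes: true makes the decoder throw instead of
        // silently substituting U+FFFD for bad sequences.
        var strict = new UTF8Encoding(false, true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Stand-in for Encoding.Default on a machine with a single-byte
    // ANSI code page (assumption: the behaviour for '©' is the same).
    public static byte[] EncodeLatin1(string text)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(text);
    }

    // Reads the body of a failed HTTP response, if there is one.
    // Falls back to the exception message when no response is attached.
    public static string ReadErrorBody(WebException ex)
    {
        if (ex.Response == null)
            return ex.Message;

        using (var stream = ex.Response.GetResponseStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In sendRequest, catching WebException before the general Exception and logging KeyPhraseDiagnostics.ReadErrorBody(ex) would surface whatever the service returned with the 400, rather than only the generic "Bad Request" message. Checking IsValidUtf8 on the bytes before calling UploadData is a quick way to confirm whether a given document would be affected.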