Tags: python, python-3.x, azure, azure-cognitive-services, text-analytics-api

Azure Cognitive Services: Problem with Text Analytics PII Endpoint in Python SDK


I'm trying to perform more in-depth PII detection, because the standard code found here: https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/personally-identifiable-information/quickstart?pivots=programming-language-python fails to find some more specific entities (French registration plate numbers, for example).

Everything works fine when I use the standard endpoint: 'https://whatever.cognitiveservices.azure.com/'

However, when I switch to 'https://whatever.cognitiveservices.azure.com/text/analytics/v3.1/entities/recognition/pii?piiCategories=default,FRDriversLicenseNumber' (an example found here: https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/personally-identifiable-information/how-to-call ), I get a 404 error.

I believe it might be a Python SDK issue, because when I try the same request in the API console, it works just fine: https://westus2.dev.cognitive.microsoft.com/docs/services/TextAnalytics-v3-1/operations/EntitiesRecognitionPii

The code:

key = "key"
endpoint = "https://whatever.cognitiveservices.azure.com/text/analytics/v3.1/entities/recognition/pii?piiCategories=default,FRDriversLicenseNumber/"

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Authenticate the client using your key and endpoint 
def authenticate_client():
    ta_credential = AzureKeyCredential(key)
    text_analytics_client = TextAnalyticsClient(
            endpoint=endpoint, 
            credential=ta_credential)
    return text_analytics_client

client = authenticate_client()

# Example method for detecting sensitive information (PII) from text 
def pii_recognition_example(client):
    documents = [
        "The employee's SSN is 859-98-0987.",
        "The employee's phone number is 555-555-5555."
    ]
    response = client.recognize_pii_entities(documents, language="en")
    result = [doc for doc in response if not doc.is_error]
    for doc in result:
        print("Redacted Text: {}".format(doc.redacted_text))
        for entity in doc.entities:
            print("Entity: {}".format(entity.text))
            print("\tCategory: {}".format(entity.category))
            print("\tConfidence Score: {}".format(entity.confidence_score))
            print("\tOffset: {}".format(entity.offset))
            print("\tLength: {}".format(entity.length))
pii_recognition_example(client)

Solution

  • The MS docs do not state this yet, but the endpoint given to the SDK client should be kept simple, with no REST path or query string:

    endpoint = "https://<name>.cognitiveservices.azure.com"

    The category details are passed to client.recognize_pii_entities() through the categories_filter argument instead.

    The code below works just fine:

     key = "key"
     endpoint = "https://<name>.cognitiveservices.azure.com"
        
     from azure.ai.textanalytics import TextAnalyticsClient
     from azure.core.credentials import AzureKeyCredential
        
     # Authenticate the client using your key and endpoint 
     def authenticate_client():
         ta_credential = AzureKeyCredential(key)
         text_analytics_client = TextAnalyticsClient(
                 endpoint=endpoint, 
                 credential=ta_credential)
         return text_analytics_client
        
     client = authenticate_client()
        
     # Example method for detecting sensitive information (PII) from text 
     def pii_recognition_example(client):
         documents = [
             "The employee's SSN is 859-98-0987.",
             "The employee's phone number is 555-555-5555."
         ]
         response = client.recognize_pii_entities(documents, language="en", categories_filter=["default", "FRDriversLicenseNumber"])
         result = [doc for doc in response if not doc.is_error]
         for doc in result:
             print("Redacted Text: {}".format(doc.redacted_text))
             for entity in doc.entities:
                 print("Entity: {}".format(entity.text))
                 print("\tCategory: {}".format(entity.category))
                 print("\tConfidence Score: {}".format(entity.confidence_score))
                 print("\tOffset: {}".format(entity.offset))
                 print("\tLength: {}".format(entity.length))
     pii_recognition_example(client)
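
If you are starting from a full REST URL like the one in the question, the split described above can be sketched with the standard library alone. This is only an illustration: split_rest_url is a hypothetical helper, not part of the Azure SDK, and it assumes the categories arrive in a piiCategories query parameter as a comma-separated list, as in the question's URL.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit


def split_rest_url(url):
    """Split a full REST URL into (base endpoint, list of PII categories).

    Hypothetical helper for illustration; not part of the Azure SDK.
    """
    parts = urlsplit(url)
    # Keep only scheme and host; drop the REST path, query and fragment.
    endpoint = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    # Assumption: categories are a comma-separated piiCategories value.
    raw = parse_qs(parts.query).get("piiCategories", [""])[0]
    categories = [name for name in raw.split(",") if name]
    return endpoint, categories


rest_url = (
    "https://whatever.cognitiveservices.azure.com"
    "/text/analytics/v3.1/entities/recognition/pii"
    "?piiCategories=default,FRDriversLicenseNumber"
)
endpoint, categories = split_rest_url(rest_url)
print(endpoint)    # https://whatever.cognitiveservices.azure.com
print(categories)  # ['default', 'FRDriversLicenseNumber']
```

The first value is what goes into TextAnalyticsClient(endpoint=...), and the second is what goes into categories_filter, as in the working code above. The likely reason for the original 404 is that the client builds the request path itself, so an endpoint that already contains the path and query ends up as an invalid URL.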