Tags: azure-language-understanding, summarization, azure-ai

Increase summary length using MS Azure-AI services


Recently, I have been using Azure AI cognitive services to summarize text with both document summarization and conversation summarization. But the summaries produced by both are too short.


Solution

  • According to the documentation, an extractive summary can contain at most 20 sentences (the maximum value of max_sentence_count is 20).

    If you want a summary with more than 20 sentences, split your document into parts and summarize each part separately.

    Example: if your document is long, split it by topic or according to your requirements, then summarize each part.

    The document I am working with has a length of 4779.

    First, summarize the whole document; then split it and summarize the pieces.

    Here, I am using the Python SDK to perform extractive summarization.

    Code:

    import os

    # This example requires environment variables named "LANGUAGE_KEY" and "LANGUAGE_ENDPOINT"
    key = os.environ["LANGUAGE_KEY"]
    # The endpoint has the form "https://<cognitive_name>.cognitiveservices.azure.com/"
    endpoint = os.environ["LANGUAGE_ENDPOINT"]
    
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential
    
    # Authenticate the client using your key and endpoint 
    def authenticate_client():
        ta_credential = AzureKeyCredential(key)
        text_analytics_client = TextAnalyticsClient(
                endpoint=endpoint, 
                credential=ta_credential)
        return text_analytics_client
    
    client = authenticate_client()
    
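    # Example method for summarizing text.
    # doc is a list of strings; each element is summarized as a separate document.
    def sample_extractive_summarization(client, doc):
        # max_sentence_count caps the summary of each document at 20 sentences
        poller = client.begin_extract_summary(documents=doc, max_sentence_count=20)

        document_results = poller.result()
        for result in document_results:
            if result.is_error:
                # Report failed documents instead of crashing on a missing attribute
                print(f"Error: {result.error.code} - {result.error.message}")
                continue
            # Number of sentences in this document's summary
            print(len(result.sentences))

    # document is a list holding the full text as its first element
    sample_extractive_summarization(client, document)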
    

    Output before chunking the document: the summary contains the maximum of 20 sentences.

    Next, chunk the document and summarize each chunk.

    Code for chunking:

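    def chunk_string(string, chunk_size):
        # Split the string into consecutive pieces of at most chunk_size characters
        chunks = []
        for i in range(0, len(string), chunk_size):
            chunks.append(string[i:i+chunk_size])
        return chunks
    
    chunk_size = 1000
    # document is a list holding the full text as its first element
    chunks = chunk_string(document[0], chunk_size)
    
    # Each chunk is summarized as a separate document, with up to 20 sentences each
    sample_extractive_summarization(client,chunks)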
    

    Here, I am chunking with a chunk size of 1000 characters.

    The output after chunking shows the number of summary sentences for each chunk. If you add those counts, you get 25 sentences.

    With the help of chunking, you can increase the summary length.

    Note: I split purely by character index here, which can cut a sentence in half at a chunk boundary. In your case, chunk in a way that makes sense for your document, such as splitting it by topic.
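
As a rough illustration of the note above, here is a minimal sketch of chunking that only breaks between sentences, plus a helper that joins the per-chunk summaries into one longer text. The names chunk_by_sentences and merge_chunk_summaries are hypothetical helpers of my own, not part of the Azure SDK, and the regex sentence splitter is deliberately naive (it will mis-split abbreviations such as "e.g."). The merge helper assumes each result object exposes is_error and a sentences list whose items have a text attribute, as the extractive summary results do.

```python
import re
from typing import List


def chunk_by_sentences(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking only between sentences.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    # Naive splitter (an assumption): a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def merge_chunk_summaries(results) -> str:
    """Join the summary sentences of every successful chunk result, in chunk order."""
    parts = []
    for result in results:
        # Skip chunks the service could not summarize.
        if getattr(result, "is_error", False):
            continue
        parts.extend(sentence.text for sentence in result.sentences)
    return " ".join(parts)
```

With the client from the answer, this could be used as chunks = chunk_by_sentences(document[0]), then results = client.begin_extract_summary(documents=chunks, max_sentence_count=20).result(), and finally print(merge_chunk_summaries(results)). The service also limits how many documents one request may contain, so a very long text may need its chunks sent in several batches.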