
Python program to identify and count the acronyms in a text file


I have tried the code below; any suggestions and help are appreciated. Specifically, I want to write a Python program that identifies the acronyms in a text file and counts them. The output should list every acronym present in the specified text file and how many times each one occurs.

*Note: The code below does not give the desired output. Any help or suggestions are appreciated.

Link to the text file, if you want to have a look: https://drive.google.com/file/d/1zlqsmJKqGIdD7qKicVmF0W6OgF5-g7Qk/view?usp=sharing

This text file contains various acronyms. I basically want to write a Python script that identifies those acronyms and counts how many times each one occurs. The acronyms vary: they can be two or more letters long, and they can be written in lowercase or capital letters. For further reference about the acronyms, please have a look at the text file on Google Drive.

Any updated code is also appreciated.

acronyms = 0 # number of acronyms

#open file File.txt in read mode with name file
with open('Larex_text_file.txt', "r", errors ='ignore') as file:
    text = str(file.read())
    import re

    print(re.sub("([a-zA-Z]\.*){2,}s?", "", text))

    for line in text: # for every line in file
        for word in line.split(' '): # for every word in line
            if word.isupper(): # if word is all uppercase letters
                acronyms+=1

print("Number of acronyms:", acronyms) #print number of acronyms

Solution

  • Answer to the question:

    acronyms = 0       # total number of acronym occurrences
    acronym_word = []  # every acronym occurrence found in the file

    # open Larex_text_file.txt in read mode with name file
    with open('Larex_text_file.txt', "r", errors='ignore') as file:
        text = file.read()

    for word in text.split():  # split on any whitespace, including newlines
        # keep purely alphabetic, all-uppercase words of two or more letters;
        # single characters such as "A" or "I" are not acronyms
        if word.isupper() and word.isalpha() and len(word) > 1:
            acronyms += 1
            acronym_word.append(word)  # store every acronym found in the file

    print("Number of acronyms:", acronyms)

    uniqWords = sorted(set(acronym_word))  # remove duplicates and sort the acronyms
    for word in uniqWords:
        print(word, ":", acronym_word.count(word))
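
A possible variation on the answer above: splitting on whitespace and then checking isalpha() skips acronyms that touch punctuation, such as "(NASA)" or "ESA.". The sketch below uses a regular expression with word boundaries and collections.Counter instead. The pattern, the function names count_acronyms and count_acronyms_in_file, and the sample sentence are assumptions for illustration, not part of the original answer. It also only detects uppercase acronyms; lowercase ones cannot be told apart from ordinary words without a list of known acronyms.

```python
import re
from collections import Counter

# Assumed definition of an acronym: a standalone run of two or more
# capital letters. Lowercase acronyms are NOT detected by this pattern.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")


def count_acronyms(text):
    """Return a Counter mapping each acronym to its number of occurrences."""
    return Counter(ACRONYM_RE.findall(text))


def count_acronyms_in_file(path):
    """Read a text file (ignoring undecodable bytes) and count its acronyms."""
    with open(path, "r", errors="ignore") as file:
        return count_acronyms(file.read())


if __name__ == "__main__":
    # Hypothetical sample text; replace with
    # count_acronyms_in_file('Larex_text_file.txt') for the real file.
    sample = "NASA and ESA signed a deal. (NASA) said the EU paid. A plan."
    counts = count_acronyms(sample)
    print("Number of acronyms:", sum(counts.values()))
    for word in sorted(counts):
        print(word, ":", counts[word])
```

Counter keeps the per-acronym totals in one pass, which avoids calling list.count() once per unique word as the loop above does.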