I have tried the code below; any suggestions and help are appreciated. To be more specific, I want to create a Python program that identifies and counts the acronyms in a text file. The output should list every acronym present in the specified text file and how many times each one occurs.
Note: the code below does not give the desired output.
Link to the text file, if you want to have a look: https://drive.google.com/file/d/1zlqsmJKqGIdD7qKicVmF0W6OgF5-g7Qk/view?usp=sharing
This text file contains various acronyms. I want to write a Python script that identifies those acronyms and counts how many times each one occurs. The acronyms are of various types: they can be two or more letters long, and they can be written in either lowercase or uppercase letters. For further reference, please have a look at the text file on Google Drive.
Any updated code is also appreciated.
acronyms = 0 # number of acronyms
#open file File.txt in read mode with name file
with open('Larex_text_file.txt', "r", errors ='ignore') as file:
    text = str(file.read())
import re
print(re.sub("([a-zA-Z]\.*){2,}s?", "", text))
for line in text: # for every line in file
    for word in line.split(' '): # for every word in line
        if word.isupper(): # if word is all uppercase letters
            acronyms+=1
print("Number of acronyms:", acronyms) #print number of acronyms
Answer to the question:
acronyms = 0       # total number of acronym occurrences
acronym_word = []  # every acronym occurrence found in the file

# open Larex_text_file.txt in read mode
with open('Larex_text_file.txt', "r", errors='ignore') as file:
    text = file.read()

for word in text.split():  # split on any whitespace, including newlines
    # treat a word as an acronym if it is all uppercase letters;
    # single-character words are ignored because they are not acronyms
    if word.isupper() and word.isalpha() and len(word) > 1:
        acronyms += 1
        acronym_word.append(word)  # store every acronym found in the file

uniqWords = sorted(set(acronym_word))  # remove duplicate words and sort the list of acronyms
for word in uniqWords:
    print(word, ":", acronym_word.count(word))
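
As a possible alternative, here is a minimal sketch that uses a regular expression together with collections.Counter, so that acronyms next to punctuation such as "(NASA)" or "CPUs;" are still found. It rests on assumptions that are not confirmed by the question: an acronym is taken to be a standalone run of two or more capital letters, optionally followed by a plural "s". Lowercase acronyms are not detected, since they cannot be told apart from ordinary words without a known list. The names ACRONYM_RE, count_acronyms and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Assumption: an acronym is a standalone run of 2+ capital letters,
# optionally followed by a lowercase plural "s" (e.g. "CPUs" counts as "CPU").
ACRONYM_RE = re.compile(r"\b([A-Z]{2,})s?\b")


def count_acronyms(text):
    """Return a Counter mapping each acronym to its number of occurrences."""
    return Counter(ACRONYM_RE.findall(text))


# Illustrative sample text; replace it with the contents of the real file.
sample = "NASA and ESA launched it. (NASA) built two CPUs; a CPU is small."
counts = count_acronyms(sample)

for word, n in sorted(counts.items()):
    print(word, ":", n)
```

To run it on the real file, read the file into a string as in the code above and pass that string to count_acronyms instead of the sample.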