I need to write a Python program that reads a given file, say acronyms.txt, and returns the percentage of lines that contain at least one three-letter acronym. For example:
NSW is a very large state.
It's bigger than TAS.
but WA is the biggest!
For this file it should return 66.7%, since two of the three lines contain a three-letter acronym (WA has only two letters). As you can see, the result is rounded to one decimal place. I'm not very familiar with regex, but I think it would be the simplest approach.
EDIT:
I have finished the code, but I need it to recognize acronyms with dots between the letters, e.g. N.S.W should be recognized as an acronym. How do I do this?
Any help would be appreciated!
You can do something like:
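import re

total_lines = 0
matched_lines = 0
with open("acronyms.txt") as f:  # closes the file when done
    for line in f:
        total_lines += 1
        # bool is 0 or 1, so this counts lines with at least one match
        matched_lines += bool(re.search(r"\b[A-Z]{3}\b", line))
print("%.1f%%" % (matched_lines / total_lines * 100))  # one decimal place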
Note the '\b' in the search pattern: it matches the empty string at the beginning or end of a word. This prevents unwanted matches on acronyms longer than three letters ('asdf ASDF asdf') or on capital letters inside a word ('asdfASDasdf').
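For the dotted acronyms asked about in the edit, one option is to make a dot optional between each of the three letters. This is a minimal sketch, assuming acronyms are written in capital letters only; note that it also accepts mixed forms such as N.SW. The lookbehind (?<!\w\.) and lookahead (?!\.?\w) keep it from picking three letters out of a longer acronym such as N.S.W.A or ASDF. The helper name acronym_percentage is hypothetical, and it rounds with round() to one decimal place as the question requires.

```python
import re

# Three capital letters, each optionally separated by a dot (NSW, N.S.W, N.S.W.).
# \b and (?<!\w\.) stop a match from starting inside a longer undotted or
# dotted run; (?!\.?\w) stops it from ending inside one.
ACRONYM = re.compile(r"(?<!\w\.)\b[A-Z](?:\.?[A-Z]){2}(?!\.?\w)")


def acronym_percentage(lines):
    """Return the percentage of lines containing a three-letter acronym,
    rounded to one decimal place (0.0 if there are no lines)."""
    total = 0
    matched = 0
    for line in lines:
        total += 1
        if ACRONYM.search(line):
            matched += 1
    if total == 0:
        return 0.0  # avoid dividing by zero on an empty file
    return round(matched / total * 100, 1)


sample = [
    "NSW is a very large state.",
    "It's bigger than TAS.",
    "but WA is the biggest!",
]
print("%.1f%%" % acronym_percentage(sample))  # prints 66.7%
```

To use it on the file itself, pass the open file object (for example inside with open("acronyms.txt") as f:), since iterating over a file yields one line at a time just like the list above.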