python file for-loop extract txt

How to use a "for loop" in Python to extract the year and firm name (for earnings call transcripts) from a txt file


I have a txt file of this type:

Thomson Reuters StreetEvents Event Transcript
E D I T E D   V E R S I O N

Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT

================================================================================
Corporate Participants
================================================================================

My txt file is saved at C:\sam\2003-Sep-10-ABM.N-140985434256-Transcript.txt.

I want to extract only the transcript year (2003) and the firm name (ABM Industries). I used the code below, but it returned every four-digit year in the file.

Code:

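import re

# Read the whole transcript into one string; "with" closes the file afterwards
with open("C:\\sam\\2003-Sep-10-ABM.N-140985434256-Transcript.txt", "r") as f:
    content = f.read()

# Matches every four-digit number in the file, not just the year in the title
pattern = r"\d{4}"
years = re.findall(pattern, content)
for year in years:
    print(year)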

My Output: 2003 2003 2003 2003 2002 2003 2002 2003 2003 2002 2003 2002 2002 2003 2002 2002 2002 2002 2002 2003 2003 2003 2004 2003 2003 2003 2004 2019

Expected Output: 2003 ABM Industries


Solution

  • If I understand you correctly, this should work:

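    import re
    content = """Q3 2003 ABM Industries Earnings Conference Call
    SEPTEMBER 10, 2003 / 1:00PM GMT"""
    # Four-digit year followed by two words (assumes a two-word firm name)
    pattern = r"\d{4}\s\w+\s\w+"
    # findall returns every match; [0] keeps the first one (the title line)
    result = re.findall(pattern, content)[0]
    print(result)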
    

    Output: 2003 ABM Industries
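
The pattern above only fits firm names of exactly two words. If the title line always looks like the sample ("Q3 2003 ABM Industries Earnings Conference Call"), a slightly more general sketch could capture the year and the firm name separately and then loop over several transcript files. The title format, the helper names extract_year_and_firm and scan_folder, and the folder layout are assumptions for illustration, not something confirmed by the question.

```python
import re
from pathlib import Path

# Assumed title format: "Q<1-4> <year> <firm name> Earnings Conference Call"
TITLE_RE = re.compile(
    r"^Q[1-4]\s+(?P<year>\d{4})\s+(?P<firm>.+?)\s+Earnings Conference Call\s*$",
    re.MULTILINE,
)


def extract_year_and_firm(text):
    """Return (year, firm) from the first matching title line, or None."""
    match = TITLE_RE.search(text)
    if match is None:
        return None
    return match.group("year"), match.group("firm")


def scan_folder(folder):
    """Yield (file name, year, firm) for each .txt file with a matching title."""
    for path in sorted(Path(folder).glob("*.txt")):
        result = extract_year_and_firm(path.read_text(errors="replace"))
        if result is not None:
            yield (path.name, *result)


sample = """Thomson Reuters StreetEvents Event Transcript
E D I T E D   V E R S I O N

Q3 2003 ABM Industries Earnings Conference Call
SEPTEMBER 10, 2003 / 1:00PM GMT
"""
print(extract_year_and_firm(sample))  # ('2003', 'ABM Industries')
```

To run it over a folder of transcripts, something like `for name, year, firm in scan_folder("C:\\sam"): print(year, firm)` should work. Titles that do not end in "Earnings Conference Call" are skipped, so the pattern would need adjusting for other event types.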