
Extract hostnames and datetimes from a text file in Python


I'd like to extract hostnames and datetimes from a text file using Python. Below is the text. I need to extract the date after 'notAfter=' and the hostname after 'UnitId:' into a dictionary that maps each hostname to its datetime.

- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/1
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/0
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/2

Solution

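  • A fairly simple regex does it: notAfter=(.*)\n\s+UnitId: (.*). The first group captures the rest of the notAfter line, \n\s+ skips the line break and the indentation before UnitId:, and the second group captures the unit name.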

    import re
    
    content = """- Stdout: |
        notAfter=Jun  2 10:15:03 2031 GMT
      UnitId: octavia/1
    - Stdout: |
        notAfter=Jun  2 10:15:03 2031 GMT
      UnitId: octavia/0
    - Stdout: |
        notAfter=Jun  2 10:15:03 2031 GMT
      UnitId: octavia/2"""
    
    # re.findall returns one (datetime, hostname) tuple per match
    results = [{'datetime': dt, 'hostname': host}
               for dt, host in re.findall(r"notAfter=(.*)\n\s+UnitId: (.*)", content)]
    print(results)
    
    # [{'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/1'}, 
    #  {'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/0'}, 
    #  {'datetime': 'Jun  2 10:15:03 2031 GMT', 'hostname': 'octavia/2'}]
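The comprehension above produces a list of dicts, whereas the question asks for one dictionary with the datetime attached to each hostname. Below is a minimal sketch of that, assuming the output has been saved to a text file and that month names are in English (the %b directive is locale-dependent). The function names parse_cert_expiry and load_cert_expiry are hypothetical; the sketch swaps re.findall for re.finditer with named groups and converts each date string into a datetime with datetime.strptime:

```python
import re
from datetime import datetime, timezone

# Same idea as the pattern above, with named groups; \s* before the newline
# tolerates trailing spaces, and \S+ stops the hostname at the first whitespace.
PATTERN = re.compile(r"notAfter=(?P<dt>.+?)\s*\n\s+UnitId:\s*(?P<host>\S+)")


def parse_cert_expiry(text):
    """Return {hostname: expiry datetime in UTC} for every notAfter/UnitId pair."""
    expiry = {}
    for match in PATTERN.finditer(text):
        # A space in the format matches one or more spaces, so the padded
        # day in "Jun  2" parses; %Z accepts "GMT" but yields a naive datetime.
        parsed = datetime.strptime(match.group("dt"), "%b %d %H:%M:%S %Y %Z")
        # Later entries for the same hostname overwrite earlier ones.
        expiry[match.group("host")] = parsed.replace(tzinfo=timezone.utc)
    return expiry


def load_cert_expiry(path):
    """Read the saved output from a text file and parse it."""
    with open(path, encoding="utf-8") as fh:
        return parse_cert_expiry(fh.read())


if __name__ == "__main__":
    sample = """- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/1
- Stdout: |
    notAfter=Jun  2 10:15:03 2031 GMT
  UnitId: octavia/0
"""
    for host, expires in parse_cert_expiry(sample).items():
        print(host, expires.isoformat())
    # octavia/1 2031-06-02T10:15:03+00:00
    # octavia/0 2031-06-02T10:15:03+00:00
```

With the file saved as, say, units.txt (a hypothetical name), load_cert_expiry("units.txt") returns the dictionary directly. Keeping datetime objects rather than strings makes it easy to compare expiry dates, for example against datetime.now(timezone.utc).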