Python pattern match from a file


Experts, I am just trying to match a pattern in my raw data file so that I can list the services that are not running in HTML format.

I took some help from Googling and am using something like the code below, but it's not working. Any help on this would be greatly appreciated.

code:

Html_file= open("result1.html","w")
html_str = """
<table border=1>
     <tr>
       <th bgcolor=fe9a2e>Hostname</th>
       <th bgcolor=fe9a2e>Service</th>
     </tr>
"""
Html_file.write(html_str)
fh=open(sys.argv[1],"r")
for line in fh:
        pat_match=re.match("^HostName:"\s+(.*?)\".*", line)
        pat_match1=re.match("^Service Status:"\s+(.*not.*?)\".*", line)
        if pat_match:
                Html_file.write("""<TR><TD bgcolor=fe9a2e>""" + pat_match.group(1) + """</TD>\n""")
        elif pat_match1:
                Html_file.write("""<TR><TD><TD>""" + pat_match1.group(2) + """</TD></TD></TR>\n""")

raw data:

HostName: dbfoxn001.examle.com
Service Status:  NTP is Running on the host dbfoxn001.examle.com
Service Status:  NSCD is not Running on the host dbfoxn001.examle.com
Service Status:  SSSD is Running on the host dbfoxn001.examle.com
Service Status:  Postfix  is Running on the host dbfoxn001.examle.com
Service Status:  Automount is Running on the host dbfoxn001.examle.com
HostName: dbfoxn002.examle.com                   SSH Authentication failed

Required Result:

Hostname                        Service
dbfoxn001.examle.com            NSCD is not Running on the host dbfoxn001.examle.com

Solution

  • Your first problem is that your regex is not properly embedded in a string. You need to either escape or remove the offending "s.

    Other than that, the actual regex doesn't really match your input data (for example, you are trying to match some "s which aren't in your input data). I have rewritten the regexes as follows:

    ^HostName:\s*(.+)
    ^Service Status:\s*(.+is not Running.*)
    

    You can try them here and here.
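
The two patterns can be checked locally against a few lines of the raw data. This is only a minimal sketch: the names `HOST_RE`, `STATUS_RE`, `sample_lines` and `classify` are made up for illustration and are not part of the original code.

```python
import re

# Patterns from the answer, written as raw strings so "\s" reaches the regex engine intact.
HOST_RE = re.compile(r"^HostName:\s*(.+)")
STATUS_RE = re.compile(r"^Service Status:\s*(.+is not Running.*)")

# Sample lines copied from the raw data in the question.
sample_lines = [
    "HostName: dbfoxn001.examle.com",
    "Service Status:  NTP is Running on the host dbfoxn001.examle.com",
    "Service Status:  NSCD is not Running on the host dbfoxn001.examle.com",
]


def classify(line):
    """Return ("host", value), ("down", value), or None for lines of no interest."""
    host = HOST_RE.match(line)
    if host:
        return ("host", host.group(1))
    status = STATUS_RE.match(line)
    if status:
        return ("down", status.group(1))
    return None


results = [classify(line) for line in sample_lines]
for line, result in zip(sample_lines, results):
    print(result, "<-", line)
```

Only the `HostName:` line and the `is not Running` line produce a match; the `NTP is Running` line is ignored because the status pattern requires the literal text `is not Running`.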

    Lastly, your Python code does not seem to generate the sort of HTML you want. My assumption of how the HTML for your sample table should look is as follows:

    <table border=1>
      <tr>
        <th bgcolor=fe9a2e>Hostname</th>
        <th bgcolor=fe9a2e>Service</th>
      </tr>
      <tr>
        <td>dbfoxn001.examle.com</td>
        <td>NSCD is not Running on the host dbfoxn001.examle.com</td>
      </tr>
    </table>
    

    To that end, I have stored the hostname in its own variable rather than writing it to the file immediately, and I write it out each time a "not Running" status is parsed. I have also added the missing final </table> and closed the open files:

    import sys
    import re
    
    result = open("result1.html", "w")
    table_header = """
    <table border=1>
         <tr>
           <th bgcolor=fe9a2e>Hostname</th>
           <th bgcolor=fe9a2e>Service</th>
         </tr>
    """
    result.write(table_header)
    input_file = open(sys.argv[1], "r")
    hostname = ""  # most recent HostName seen; empty until the first one is read
    for line in input_file:
            # Raw strings keep "\s" from being treated as a string escape.
            host_match = re.match(r"^HostName:\s*(.+)", line)
            status_match = re.match(r"^Service Status:\s*(.+is not Running.*)", line)
            if host_match:
                    # Remember the host; it is written only alongside a failing service.
                    hostname = host_match.group(1)
            elif status_match:
                    result.write("""<tr><td>""" + hostname + """</td><td>""" + status_match.group(1) + """</td></tr>\n""")
    result.write("""</table>""")
    input_file.close()
    result.close()
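
For comparison, here is a slightly more defensive variant of the same idea, offered as a sketch and not as the definitive way to do it. It makes a few assumptions that go beyond the original answer: the hostname is captured with `\S+`, so trailing text such as "SSH Authentication failed" is dropped; values are passed through `html.escape` before being written; and the helper names `build_table` and `write_report` are hypothetical.

```python
import html
import re

# Assumption: a hostname contains no whitespace, so \S+ stops before any trailing note.
HOST_RE = re.compile(r"^HostName:\s*(\S+)")
STATUS_RE = re.compile(r"^Service Status:\s*(.+is not Running.*)")


def build_table(lines):
    """Return an HTML table listing every 'is not Running' status with its host."""
    rows = []
    hostname = "unknown"  # placeholder if a status line appears before any HostName line
    for line in lines:
        host_match = HOST_RE.match(line)
        status_match = STATUS_RE.match(line)
        if host_match:
            hostname = host_match.group(1)
        elif status_match:
            rows.append(
                "  <tr><td>{}</td><td>{}</td></tr>".format(
                    html.escape(hostname),
                    html.escape(status_match.group(1).strip()),
                )
            )
    header = (
        "<table border=1>\n"
        "  <tr><th bgcolor=fe9a2e>Hostname</th><th bgcolor=fe9a2e>Service</th></tr>"
    )
    return "\n".join([header] + rows + ["</table>"])


def write_report(input_path, output_path):
    """Read the raw data file and write the HTML table; 'with' closes both files."""
    with open(input_path) as source, open(output_path, "w") as result:
        result.write(build_table(source) + "\n")
```

It could be called as `write_report(sys.argv[1], "result1.html")` after `import sys`. Keeping the parsing in `build_table` means it can be tried on a plain list of strings without touching any files.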