python regex inverse-match

Inverse Match Help in Python


Hello, I am looking to trim a McAfee log file by removing all of the "is OK" lines and other reported instances that I am not interested in seeing. Previously we used a shell script that took advantage of grep's -v option, but now we want to write a Python script that works on both Linux and Windows. After a couple of attempts I got a regex to work in an online regex builder, but I am having a difficult time implementing it in my script.

Edit: I want to remove the "is OK", "is a broken", "is a block", and "file could not be opened" lines, so that I am left with a file containing only the problems I am interested in. Something like this in shell:

grep -v "is OK" ${OUTDIR}/${OUTFILE} | grep -v "is a broken" | grep -v "file could not be opened" | grep -v "is a block" > ${OUTDIR}/${OUTFILE}.trimmed 2>&1

I read in and search through the file here:

import re

f2 = open(outFilePath)
contents = f2.read()
print contents
p = re.compile("^((?!(is OK)|(file could not be opened)| (is a broken)|(is a block)))*$", re.MULTILINE | re.DOTALL)
m = p.findall(contents)
print len(m)
for iter in m:
    print iter
f2.close()

A sample of the file I am trying to search:

eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016

AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.


No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
/tmp/tmp.BQshVRSiBo ... is OK.
/tmp/keyring-F6vVGf/socket ... file could not be opened.
/tmp/keyring-F6vVGf/socket.ssh ... file could not be opened.
/tmp/keyring-F6vVGf/socket.pkcs11 ... file could not be opened.
/tmp/yum.log ... is OK.
/tmp/tmp.oW75zGUh4S ... is OK.
/tmp/.X11-unix/X0 ... file could not be opened.
/tmp/tmp.LCZ9Ji6OLs ... is OK.
/tmp/tmp.QdAt1TNQSH ... is OK.
/tmp/ks-script-MqIN9F ... is OK.
/tmp/tmp.mHXPvYeKjb/mcupgrade.conf ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uninstall-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/mcscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/install-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/readme.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan_secure ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/signlic.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/liblnxfv.so.4 ... is OK.

But I am not getting the correct output. I have also tried removing both the MULTILINE and DOTALL options and still do not get the correct result. Below is the output when running with DOTALL and MULTILINE.

9
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')

Any help would be much appreciated!! Thanks!!


Solution

  • Perhaps think simpler, line by line:

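    import re
    import sys

    # Any line containing one of these phrases is dropped,
    # which has the same effect as the chained grep -v calls.
    pattern = re.compile(r"is OK|file could not be opened|is a broken|is a block")

    # The log file path is passed as the first command-line argument.
    with open(sys.argv[1]) as handle:
        for line in handle:
            # Write only the lines that match none of the phrases.
            if not pattern.search(line):
                sys.stdout.write(line)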
    

    Outputs:

    eth0
    10.0.11.196
    00:0C:29:AF:6A:A7
    parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
    temp XML output is: /tmp/HIQZRq7t2R
    McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
    Copyright (C) 2014 McAfee, Inc.
    (408) 988-3832 LICENSED COPY - April 03 2016
    
    AV Engine version: 5700.7163 for Linux64.
    Dat set version: 8124 created Apr 3 2016
    Scanning for 670707 viruses, trojans and variants.
    
    
    No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE
    
    No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS
    
    No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
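
If the goal is to reproduce the shell pipeline more closely, including writing the result to a separate trimmed file, the same line-by-line idea can be wrapped in a small function. The following is only a sketch: the names `UNWANTED`, `keep_line`, `trim_log`, and `INVERSE` are made up for illustration and do not come from the original post. It also includes one possible whole-file regex, for comparison with the attempt in the question. That attempt returned tuples of empty strings because `findall` returns the capture groups when a pattern contains them, and the pattern had no `.*` to consume the line text. The version here uses non-capturing groups and a lookahead of the form `(?!.*phrase)` instead.

```python
import re

# Phrases that mark lines to drop (taken from the grep -v pipeline in the question).
UNWANTED = ("is OK", "file could not be opened", "is a broken", "is a block")


def keep_line(line):
    """Return True when the line contains none of the unwanted phrases."""
    return not any(marker in line for marker in UNWANTED)


def trim_log(in_path, out_path):
    """Copy in_path to out_path, skipping unwanted lines.

    Returns the number of lines that were kept.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if keep_line(line):
                dst.write(line)
                kept += 1
    return kept


# Whole-file alternative: a negative lookahead checked at the start of each line.
# The inner group is non-capturing, so findall returns whole lines, not tuples.
INVERSE = re.compile(
    r"^(?!.*(?:is OK|file could not be opened|is a broken|is a block)).*$",
    re.MULTILINE,
)
```

Calling `trim_log(path, path + ".trimmed")` would mirror the `> ${OUTDIR}/${OUTFILE}.trimmed` redirect. Plain substring checks avoid the regex entirely, which is usually enough when the phrases are fixed strings. `INVERSE.findall(contents)` is the closest match to the original approach, but it requires reading the whole file into memory first.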