python regex text-formatting

Problems printing searches not found


In the code below, the program takes a string from the user, converts it to its hex encoding and to its ASCII character codes, and searches every .log and .txt file in a given directory for the string in plain, hex, or ASCII-code form. When the string is found, the program prints the line number, the form that matched, and the file path. I would also like it to report the files that were searched without a match: for each of those, print the file path and the string that was searched for. I'm a newbie, so please don't be frustrated with the simplicity of the problem. I'm still learning. Thanks. Code below:

 elif searchType =='2':
      print "\nDirectory to be searched: " + directory
      print "\nFile result2.log will be created in: c:\Temp_log_files."
      paths = "c:\\Temp_log_files\\result2.log"
      temp = file(paths, "w")
      userstring = raw_input("Enter a string name to search: ")
      userStrHEX = userstring.encode('hex')
      userStrASCII = ''.join(str(ord(char)) for char in userstring)
      regex = re.compile(r"(%s|%s|%s)" % ( re.escape( userstring ), re.escape( userStrHEX ), re.escape( userStrASCII )))
      goby = raw_input("Press Enter to begin search (search ignores whitespace)!\n")


      def walk_dir(directory, extensions=""):
          for path, dirs, files in os.walk(directory):
             for name in files:
                if name.endswith(extensions):
                   yield os.path.join(path, name)

      whitespace = re.compile(r'\s+')
      for line in fileinput.input(walk_dir(directory, (".log", ".txt"))):
          result = regex.search(whitespace.sub('', line))
          if result:
              template = "\nLine: {0}\nFile: {1}\nString Type: {2}\n\n"
              output = template.format(fileinput.filelineno(), fileinput.filename(), result.group())

              print output
              temp.write(output)
              break
          elif not result:
              template = "\nLine: {0}\nString not found in File: {1}\nString Type: {2}\n\n"
              output = template.format(fileinput.filelineno(), fileinput.filename(), result.group())

              print output
              temp.write(output)

      else:          
          print "There are no files in the directory!!!"

Solution

  • Folks, I think user706808 wants to search for all occurrences of searchstring in each file and:

    • if the string IS found in a file, then on a per-LINE basis, print the line number and file pathname for each occurrence
    • if the string is NOT found in a file, then on a per-FILE basis, print the file pathname (but not the contents) and searchstring. The easiest way to do that is to keep a count (or boolean) of occurrences per file (nOccurrences), then print the no-match message at the end of the file if the count is still 0, while the pathname is still in scope.

    Can you confirm? Assuming that's what you want, all you need to change is to split up this megaline of code...

    for line in fileinput.input(walk_dir(directory, (".log", ".txt"))):
    

    into...

    for curPathname in walk_dir(directory, (".log", ".txt")):
        nOccurrences = 0
        for line in fileinput.input(curPathname):
            result = regex.search(whitespace.sub('', line))
            if result:
                # ... format, print, and write the match as before ...
                nOccurrences += 1  # counts matching lines; several matches on one line count once
            # No 'elif not result' branch here: "not found" is decided per file, not per line
        # We only get here once we reach EOF for curPathname
        if nOccurrences == 0:
            output = "\nString not found in File: {0}\nString searched: {1}\n\n".format(curPathname, userstring)
            print output
            temp.write(output)
        # else you could print "found %d occurrences of %s in ..."
    

    Sound good?

    By the way, inside the loop you can now use curPathname wherever you previously called fileinput.filename().
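The per-file loop described above can be sketched as a self-contained function. This is only a sketch under a few assumptions: it uses Python 3 syntax (the question's code is Python 2), and the names search_tree, first_only, hits, and misses are made up for illustration. One detail to watch: if you keep the original break after the first match, fileinput still considers the current file active, and the next fileinput.input() call raises RuntimeError. Calling fileinput.close() after each file avoids that.

```python
import fileinput
import os
import re


def walk_dir(directory, extensions=""):
    """Yield the path of every file under `directory` whose name ends with `extensions`."""
    for path, _dirs, files in os.walk(directory):
        for name in sorted(files):  # sorted only to make the order predictable
            if name.endswith(extensions):
                yield os.path.join(path, name)


def search_tree(directory, regex, extensions=(".log", ".txt"), first_only=False):
    """Return (hits, misses).

    hits   -- list of (pathname, line number, matched text), one per matching line
    misses -- list of pathnames that were searched but contained no match
    """
    whitespace = re.compile(r"\s+")
    hits, misses = [], []
    for cur_pathname in walk_dir(directory, extensions):
        n_occurrences = 0
        for line in fileinput.input(cur_pathname):
            result = regex.search(whitespace.sub("", line))
            if result:
                hits.append((cur_pathname, fileinput.filelineno(), result.group()))
                n_occurrences += 1
                if first_only:
                    break  # leaves the file open; closed just below
        # Reset fileinput so the next input() call does not raise RuntimeError
        fileinput.close()
        if n_occurrences == 0:
            misses.append(cur_pathname)  # the per-file "not found" case
    return hits, misses
```

The caller can then print or log both lists: one message per entry in hits, and one "not found" message per entry in misses.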

    (Also, you might like to abstract this into a function find_occurrences(searchstring, pathname) that returns the int or Boolean 'nOccurrences'.)
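A minimal sketch of such a helper follows, again in Python 3 and under stated assumptions. build_pattern is a hypothetical name for the pattern-building step from the question, rewritten with binascii.hexlify because str.encode('hex') exists only in Python 2. As a variation on the int or Boolean return value suggested above, this find_occurrences returns a list of (line number, matched text) pairs: len() of the list gives the count, and an empty list is falsy, so it still works as a "found or not" test. It reads the file with plain open() rather than fileinput.

```python
import binascii
import re

WHITESPACE = re.compile(r"\s+")


def build_pattern(userstring):
    """Compile a regex matching the plain string, its hex digits, or its decimal character codes."""
    as_hex = binascii.hexlify(userstring.encode("utf-8")).decode("ascii")
    as_codes = "".join(str(ord(ch)) for ch in userstring)
    alternatives = (userstring, as_hex, as_codes)
    return re.compile("|".join(re.escape(alt) for alt in alternatives))


def find_occurrences(searchstring, pathname):
    """Return [(line number, matched text), ...] for every matching line in the file."""
    regex = build_pattern(searchstring)
    matches = []
    with open(pathname, "r") as handle:
        for lineno, line in enumerate(handle, start=1):
            # Whitespace is stripped before matching, as in the original code
            result = regex.search(WHITESPACE.sub("", line))
            if result:
                matches.append((lineno, result.group()))
    return matches
```

With this in place, the outer loop only has to call find_occurrences for each pathname and print either the matches or a single "not found" message for that file.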