How to use keys from a dictionary to search for strings?


I'm writing a program that edits a text file. I want the program to find duplicate strings and, for a string that appears on n lines, delete n - 1 of those lines.

Here is the script I have so far:

import re

fname = raw_input("File name - ")
fhand = open(fname, "r+")
fhand.read()


counts = {}
pattern = re.compile(pattern)

# This searches the file for duplicate strings and inserts them into a dictionary with a counter 
# as the value

for line in fhand:
    for match in pattern.findall(line):
        counts.setdefault(match, 0)
        counts[match] += 1

pvar = {}

# This creates a new dictionary which contains all of the keys in the previous dictionary with
# count > 1

for match, count in counts.items():
    if count > 1:
        pvar[match] = count

fhand.close()
count = 0

# Here I am trying to delete n - 1 instances of each string that was a key in the previous 
# dictionary

with open(fname, 'r+') as fhand:        
    for line in fhand:
        for match, count in pvar.items():
            if re.search(match, line) not in line: 
               continue
               count += 1
            else:
               fhand.write(line)
print count 
fhand.close()

How can I make the last bit of code work? Is it possible to use the keys from the dictionary to identify relevant lines and delete n-1 instances? Or am I doing it completely wrong?
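Yes, the keys can drive the deletion. Below is a minimal Python 3 sketch that follows the question's count-then-filter idea. It assumes duplicates are found with a regex such as `XYZ\[\d+:\d+\]`; the question never shows how `pattern` is defined, so that pattern and the helper names `drop_duplicate_lines` and `dedupe_file` are hypothetical. The sketch reads all lines into a list once, counts the matches, and keeps only the first line containing each duplicated match. Collecting the kept lines and rewriting the file afterwards avoids writing into a file while iterating over it in `r+` mode. It applies the "delete n - 1" description literally, so lines whose match is not duplicated are kept.

```python
import re
from collections import Counter

# Hypothetical pattern: the question never shows how `pattern` is defined
PATTERN = re.compile(r"XYZ\[\d+:\d+\]")


def drop_duplicate_lines(lines, pattern=PATTERN):
    """Keep the first line containing each duplicated match and drop the other n - 1.

    `lines` must be a list (it is iterated twice). Assumes at most one match
    per line, as in the sample data.
    """
    # Count every match across all lines (the question's `counts`)
    counts = Counter(m for line in lines for m in pattern.findall(line))
    # Matches that occur more than once (the question's `pvar`)
    duplicated = {m for m, n in counts.items() if n > 1}

    seen = set()
    kept = []
    for line in lines:
        matches = set(pattern.findall(line)) & duplicated
        if matches & seen:
            continue  # a line for this duplicate was already kept
        seen |= matches
        kept.append(line)
    return kept


def dedupe_file(fname):
    # Read all lines first, then rewrite the file from the kept lines
    with open(fname) as f:
        lines = f.readlines()
    with open(fname, "w") as f:
        f.writelines(drop_duplicate_lines(lines))


if __name__ == "__main__":
    sample = ["-=XYZ[0:2] &\n", "-=XYZ[0:2] &\n", "-=XYZ[3:5] &\n", "=XYZ[6:8] &\n"]
    print("".join(drop_duplicate_lines(sample)), end="")
```

`dedupe_file` overwrites its input, so it is worth trying on a copy of the file first.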

EDIT: Here is a sample from the file. It is supposed to be a list with each 'XYZ' entry on its own line, preceded by two whitespace characters; the formatting got a bit messed up here, my apologies.

INPUT

-=XYZ[0:2] &
-=XYZ[0:2] &
-=XYZ[3:5] &
=XYZ[6:8] &
=XYZ[9:11] &
=XYZ[12:14] & 
-=XYZ[15:17] &
=XYZ[18:20] &
=XYZ[21:23] &

OUTPUT

=XYZ[0:2]

EDIT

Also, could anyone explain why the last part of the code doesn't return anything?
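A likely explanation, assuming `pattern` is defined somewhere the question doesn't show (as posted, `re.compile(pattern)` would raise `NameError`): `fhand.read()` reads the whole file and throws the result away, leaving the file position at the end. The `for line in fhand` loop that fills `counts` then sees no lines, so `counts` and `pvar` stay empty, the last loop never reaches its inner `for`, and `print count` prints 0. The sketch below reproduces this with `io.StringIO` standing in for the file. It also shows that once `pvar` is non-empty, `re.search(match, line) not in line` raises `TypeError`, because `re.search` returns a match object or `None`, not a string.

```python
import io
import re

# io.StringIO stands in for the opened text file
fhand = io.StringIO("-=XYZ[0:2] &\n-=XYZ[0:2] &\n=XYZ[6:8] &\n")

fhand.read()                # reads (and discards) everything; position is now at the end
after_read = list(fhand)    # iterating from the end yields no lines
print(after_read)           # []

fhand.seek(0)               # rewind to the start of the "file"
after_seek = list(fhand)
print(len(after_seek))      # 3

# re.search returns a match object or None, never a string,
# so using its result as the left operand of `in` on a str raises TypeError
line = "=XYZ[6:8] &"
try:
    re.search("XYZ", line) not in line
    raised_type_error = False
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)
```

Dropping the `fhand.read()` call (or calling `fhand.seek(0)` before the loop) lets the counting loop see the lines. For the comparison, the dictionary keys are literal matched text such as `XYZ[0:2]`, where `[0:2]` would be read as a regex character class, so a plain substring test `if match in line:` (or `re.search(re.escape(match), line)`) is the safer check.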


Solution

  • Here is an approach that uses a dictionary instead of a regex (because of the dictionary, the output lines are unordered, which probably does not matter here):

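    #!/usr/bin/env python

    res = {}
    with open("input.txt") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Key: the text before '[' with the '-' and '=' markers removed
            key = line.split('[')[0].replace('-', '').replace('=', '')
            if key in res:
                continue  # a line with this key was already kept
            res[key] = line
            # res[key] = line.replace('&', '').strip()
    # Use '\n', not os.linesep: text-mode output already converts '\n'
    # to the platform line ending
    print('\n'.join(res.values()))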
    

    This does not remove the trailing ampersand. To remove it, uncomment this line:

    res[key] = line.replace('&', '').strip()
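For Python 3, a sketch along the same lines that also keeps the lines in their original order: since Python 3.7, plain `dict`s preserve insertion order, so the first line kept for each key stays in file order. The key rule (text before `[` with `-` and `=` removed) is the one used above; the function name `dedupe` and the `strip_ampersand` flag are illustrative, not part of the original answer.

```python
def dedupe(lines, strip_ampersand=False):
    """Keep the first line for each key, in original order (dicts are ordered in 3.7+)."""
    res = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Key: the text before '[' with the '-' and '=' markers removed
        key = line.split("[")[0].replace("-", "").replace("=", "")
        if key in res:
            continue  # a line with this key was already kept
        if strip_ampersand:
            line = line.replace("&", "").strip()
        res[key] = line
    return list(res.values())


if __name__ == "__main__":
    # Sample lines from the question; an open file works the same way:
    # with open("input.txt") as f: kept = dedupe(f)
    sample = ["-=XYZ[0:2] &", "-=XYZ[0:2] &", "-=XYZ[3:5] &", "=XYZ[6:8] &"]
    print("\n".join(dedupe(sample, strip_ampersand=True)))
```

Because every sample line shares the key `XYZ`, only the first line survives; files with several different identifiers keep one line per identifier.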