python regex file-io soundex

Printing the contents and index location of lines in one file by matching them against another file using Python


I'm new to Python. I want to print the contents of a file whose lines look like this:

Mashed Potatoes , topped with this and that ...................... 9.99$

In general, each line has the form:

Product_name , description ......................... price

I want to match it against a second file that contains only the product names:

Mashed Potatoes

Past

Caesar Salad

and so on.

The contents of the first file are not in a uniform order, which is why I'm trying a search, match, and print approach.

I hope the problem is clear.

This is what I have tried:

    import re

    content_file = open('/Users/ashishyadav/Downloads/pdfminer-20110515/samples/te.txt', "r")
    product_list = open('/Users/ashishyadav/Desktop/AQ/te.txt', "r")
    output = open("output.txt", "w")
    line = content_file.read().lower().strip()
    for prod in product_list:
        for match in re.finditer(prod.lower().strip(), line):
            s = match.start()
            e = match.end()
            print >>output, match.group(), "\t",
            print >>output, '%d:%d' % (s, e), "\n",

My code matches each name from the product list file against the full content file, but it gives me only the index span of the product name, not the description and price.

What I want is an index span that runs from the product name through the price.

For example, for the entry from Mashed Potatoes through 9.99$ I expect [0:58], but I'm only getting [0:14].

Is there also a way to print the description and price using the same approach?

Thanks in advance.


Solution

    • Read the whole "second file" into a set X.
    • Read the "first" file line by line.
    • For each line, extract the part before the comma.
    • If this part is in the set X, print whatever is desired.

    In Python, the steps look like this:

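    # Read the whole "second file" into a set X, one name per line.
    # Strip the trailing newline and lowercase so lookups match.
    with open('foo') as fp:
        names = set(line.strip().lower() for line in fp if line.strip())

    # Read the "first" file line by line.
    with open('bar') as fp:
        for line in fp:

            # For each line, extract the part before the comma,
            # normalized the same way as the names in the set.
            name = line.split(',')[0].strip().lower()

            # If this part is in the set X, print whatever is desired.
            if name in names:
                print(line.rstrip('\n'))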
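
The steps above print the matching lines but not the index span the question asks for. As a rough sketch, assuming every entry sits on a single line in the form `name , description ...... price` and that prices look like `9.99$`, the regex built from each product name can be extended through the price, so that `match.start()` and `match.end()` cover the whole entry. The function name `find_product_spans`, the `PRICE` pattern, and the sample text are illustrative assumptions rather than part of the original code, and the sketch targets Python 3.

```python
import re

# Assumed price format: digits, a dot, two digits, then a dollar sign (e.g. 9.99$).
PRICE = r"\d+\.\d{2}\$"


def find_product_spans(content, product_names):
    """Return one dict per entry found: name, start, end, description, price."""
    found = []
    for raw_name in product_names:
        name = raw_name.strip()
        if not name:
            continue  # skip blank lines in the product list
        # re.escape keeps characters such as '.' or '(' in a name literal.
        # The description is matched lazily, the dot leaders are skipped,
        # and the match ends at the first price on the same line.
        pattern = re.compile(
            re.escape(name)
            + r"[ \t]*,[ \t]*(?P<description>[^\n]*?)[ \t.]*(?P<price>"
            + PRICE
            + r")",
            re.IGNORECASE,
        )
        for match in pattern.finditer(content):
            found.append({
                "name": name,
                "start": match.start(),
                "end": match.end(),
                "description": match.group("description"),
                "price": match.group("price"),
            })
    return found


# Hypothetical sample data in the format described in the question.
first_line = "Mashed Potatoes , topped with this and that ...................... 9.99$"
second_line = "Caesar Salad , romaine and croutons ......... 7.50$"
sample_content = first_line + "\n" + second_line + "\n"
sample_names = ["Mashed Potatoes\n", "\n", "Pasta\n", "Caesar Salad\n"]

spans = find_product_spans(sample_content, sample_names)
for item in spans:
    print("%s\t%d:%d\t%s\t%s" % (
        item["name"], item["start"], item["end"],
        item["description"], item["price"],
    ))
```

Slicing the content with `content[start:end]` gives back the full entry text, so the same span can be written to an output file as in the original attempt. If the price format in the real file differs, only the `PRICE` pattern should need to change.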