Tags: regex, string, excel, csv, cross-reference

Easiest way to cross-reference a CSV file with a text file for common strings


I have a list of strings in a CSV file and another text file that I would like to search for these strings. The CSV file contains only the strings I am interested in, but the text file has a lot of other text interspersed among them (the strings are ID numbers from a protein database). What is the easiest way to go about this? I want to check the text file for the presence of every string in the CSV file. I am working in a research lab at a top university, so you would be aiding cutting-edge research!

Thanks :)


Solution

  • I would use Python for this. To print the matching lines, you could do this:

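    import csv

    # Build a set of search strings from the first column of the CSV,
    # skipping blank rows and stray whitespace around each ID.
    with open("strings.csv", newline="") as csvfile:
        reader = csv.reader(csvfile)
        searchstrings = {row[0].strip() for row in reader if row and row[0].strip()}

    with open("text.txt") as txtfile:
        # start=1 so the printed numbers match the line numbers an editor shows
        for number, line in enumerate(txtfile, start=1):
            for needle in searchstrings:
                # Plain substring test: "P123" also matches inside "P1234".
                if needle in line:
                    print("Line {0}: {1}".format(number, line.strip()))
                    break   # print each line only once, even if it contains several IDs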
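The loop above prints matching lines, but the question is really which IDs from the CSV appear in the text file at all. Below is a minimal sketch of that check, assuming the IDs are in the first CSV column, the text file fits in memory, and an ID should only count when it is not part of a longer token (so `P123` is not reported just because `P1234` is present). `load_ids` and `match_ids` are hypothetical helper names, and a single regular expression replaces the nested loop.

```python
import csv
import re


def load_ids(csv_path):
    """Read IDs from the first column of the CSV, skipping blank rows."""
    with open(csv_path, newline="") as csvfile:
        return {row[0].strip() for row in csv.reader(csvfile) if row and row[0].strip()}


def match_ids(ids, text):
    """Return (found, missing): IDs that occur in text as whole tokens, and IDs that do not.

    Assumption: an ID must not be directly preceded or followed by a
    letter, digit or underscore to count as a match.
    """
    if not ids:
        # An empty alternation would match the empty string everywhere.
        return set(), set()
    # Try longer IDs first, so that when one ID is a prefix of another
    # (e.g. "AB-1" and "AB-1-2") the longer one wins.
    alternatives = "|".join(map(re.escape, sorted(ids, key=len, reverse=True)))
    # (?<!\w) and (?!\w) stop "P123" from matching inside "P1234" or "XP123".
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    found = set(pattern.findall(text))
    return found, set(ids) - found
```

For example, pass `load_ids("strings.csv")` and the contents of `text.txt` to `match_ids`; `found` holds the IDs that occur and `missing` the ones that never do. Matching against one combined pattern means the text is read once, instead of testing every ID against every line.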