python, csv, text-files

I'm trying to find words from a text file in another text file


I built a simple graphical user interface (GUI) with basketball info to make it easier to find information about players. The GUI uses data scraped from various sources with the 'requests' library. It works well, but there is a problem: my code contains a hard-coded list of players that must be compared against the scraped data for everything to work. So whenever I want to add or remove a name, I have to open my IDE and edit the code directly, and I want to change that. Storing the player names in an external text file would make them much easier to manage.

# This is how the player list looks in the code:
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis',
              # ... and many others
              ]

# This is how the info in the scraped file looks:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

# The rest of the code works well; this is the final part, where it uses the list to write the players that were found in both files.

with open("freeze.csv",'r') as freeze:
    for word in basketball:
        if word in freeze:
            freeze.write(word)

# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate over it the same way.

# I tried different solutions, but none of them work for me:

with open('final_G_league.csv') as text,  open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip,filter_words))
    txt = next(text).split()
    out = [word  for word in txt if word not in st]

# This one gives me only the first line of the scraped text.

import csv

file1 = open("final_G_league.csv",'r')
file2 = open("freeze1.csv",'r')

data_read1= csv.reader(file1)
data_read2 = csv.reader(file2)

# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]

for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")

file1.close()
file2.close()

# This one prints lists with a bunch of names and then fails with a list index error (IndexError).

file1 = open('final_G_league.csv','r')
file2 = open('freeze_list.txt','r')

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if j in i:

# I also tried the answers in this post:
#https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file

Solution

  • Let's assume we have the following input files:

    freeze_list.txt - a comma-separated list of filter words (player names), each enclosed in quotes:

    'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
    'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
    'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'
    

    final_G_league.csv - the scraped lines that we want to filter using the words from freeze_list.txt:

    Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
    "Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
    "Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
    "Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
    "Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,
    

    I would split the script into code segments by responsibility, to make it more readable and manageable:

    1. Define constants (later you could make them parameters)
    2. Read filter words from a file
    3. Filter scraped lines
    4. Dump output to a file
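
    The four steps above could be sketched as small functions, with the step 1 constants turned into parameters of a run() function. This is only one possible layout: parse_filter_words, filter_lines and run are hypothetical names, and the names are parsed with ast.literal_eval (which accepts only literals) rather than eval:

```python
import ast


def parse_filter_words(text):
    """Step 2: turn "'Adebayo, Bam', 'Allen, Jarrett'" into a list of names."""
    # Wrapping the text in [] makes even a single name parse as a list.
    return ast.literal_eval('[' + text + ']')


def filter_lines(lines, filter_words):
    """Step 3: keep the lines that contain at least one filter word."""
    return [line for line in lines if any(word in line for word in filter_words)]


def run(filter_words_file, scraped_file, filtered_file):
    """Step 1 constants become parameters; step 4 writes the result."""
    with open(filter_words_file) as f:
        filter_words = parse_filter_words(f.read())
    with open(scraped_file) as f:
        matched_lines = filter_lines(f, filter_words)
    with open(filtered_file, "w") as f:
        f.writelines(matched_lines)
```

    Usage: run("freeze_list.txt", "final_G_league.csv", "freeze.csv"). Keeping parsing and filtering separate from the file handling also makes them easy to test with plain lists of strings.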

    The constants:

    FILTER_WORDS_FILE_NAME = "freeze_list.txt"
    SCRAPPED_FILE_NAME = "final_G_league.csv"
    FILTERED_FILE_NAME = "freeze.csv"
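
    Step 1 says the constants could later become parameters. One way to do that, sketched here with the standard argparse module, keeps the current file names as defaults; the option names --filter-words, --scraped and --output are made up for illustration:

```python
import argparse


def parse_args(argv=None):
    """Expose the three file names as optional command-line parameters."""
    parser = argparse.ArgumentParser(
        description="Keep scraped lines that mention a listed player.")
    parser.add_argument("--filter-words", default="freeze_list.txt",
                        help="file with the player names to look for")
    parser.add_argument("--scraped", default="final_G_league.csv",
                        help="scraped CSV file to filter")
    parser.add_argument("--output", default="freeze.csv",
                        help="file that receives the matching lines")
    return parser.parse_args(argv)


# [] means "use the defaults"; pass nothing (argv=None) to read sys.argv instead.
args = parse_args([])
FILTER_WORDS_FILE_NAME = args.filter_words
SCRAPPED_FILE_NAME = args.scraped
FILTERED_FILE_NAME = args.output
```

    In a real script you would call parse_args() with no argument, so that, for example, python script.py --output injured.csv changes only the output file name.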
    

    Read filter words from a file:

    import ast
    with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
        # Safer than eval(): only literals are parsed; [] keeps one name a list
        filter_words = ast.literal_eval('[' + filter_words_file.read() + ']')
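
    If you would rather not evaluate the file as Python at all, the standard csv module can split this format too: treat ' as the quote character and skip the space after each comma. A sketch, assuming no name contains an apostrophe (parse_quoted_names is a hypothetical helper):

```python
import csv


def parse_quoted_names(lines):
    """Split lines like  'Adebayo, Bam', 'Allen, Jarrett',  into names.

    lines can be an open file (opened with newline="") or a list of strings.
    Assumes no name contains an apostrophe, since ' is the quote character.
    """
    names = []
    # quotechar="'" keeps the comma inside each quoted name together;
    # skipinitialspace=True drops the blank that follows each separator.
    for row in csv.reader(lines, quotechar="'", skipinitialspace=True):
        # A comma at the end of a line produces an empty last field: skip it.
        names.extend(field.strip() for field in row if field.strip())
    return names
```

    Usage: with open(FILTER_WORDS_FILE_NAME, newline="") as f: filter_words = parse_quoted_names(f)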
    

    Filter lines from the scraped file:

    matched_lines = []
    with open(SCRAPPED_FILE_NAME) as scrapped_file:
        for line in scrapped_file:
            # Check if any of the keywords is found in the line
            for filter_word in filter_words:
                if filter_word in line:
                    matched_lines.append(line)
                    # Stop checking other words, both for performance and
                    # to avoid writing the same line to the output more than once
                    break
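
    Note that filter_word in line is a plain substring test, so a name could also match as part of a longer name or of the injury text. If you need whole-field matches instead, here is a sketch that parses each line with the csv module and checks its fields against a set. It assumes every record fits on one line, as in the sample above, and match_player_lines is a hypothetical name:

```python
import csv


def match_player_lines(lines, filter_words):
    """Return the lines whose CSV fields include one of the names exactly."""
    wanted = set(filter_words)  # set membership checks are fast
    matched = []
    for line in lines:
        # Parse this one line as CSV so "Forbes, Bryn" stays a single field,
        # no matter which column the name is in.
        fields = next(csv.reader([line]), [])
        if any(field in wanted for field in fields):
            matched.append(line)
    return matched
```

    Because it compares whole fields, a partial name such as Forbes on its own would no longer match "Forbes, Bryn".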
    

    Dump filtered lines into a file:

    with open(FILTERED_FILE_NAME, "w") as filtered_file:
        for line in matched_lines:
            filtered_file.write(line)
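
    Since matched_lines is only used to write the output, the filter and dump steps could also be merged into a single pass that writes each matching line as soon as it is found, instead of collecting the lines in a list first. A sketch with a hypothetical filter_stream helper that takes two already-open files:

```python
def filter_stream(scraped_file, filtered_file, filter_words):
    """Write each line of scraped_file that mentions a filter word.

    Both arguments are open file objects, so only the current line
    is held in memory.
    """
    written = 0
    for line in scraped_file:
        # any() stops at the first matching word, like the break above
        if any(word in line for word in filter_words):
            filtered_file.write(line)
            written += 1
    return written  # number of matching lines
```

    For example: with open(SCRAPPED_FILE_NAME) as src, open(FILTERED_FILE_NAME, "w") as dst: filter_stream(src, dst, filter_words)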
    

    After running the segments above in sequence, the output freeze.csv is:

    "Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
    

    Suggestion

    I'm not sure why you chose to store the filter words as a comma-separated list. I would prefer a plain list with one name per line.

    freeze_list.txt:

    Adebayo, Bam
    Allen, Jarrett
    Antetokounmpo, Giannis
    Butler, Jimmy
    Forbes, Bryn
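
    If the names currently live in the basketball list from the question, a one-off sketch like the following could write them into this format. export_names is a hypothetical helper; you would call it once, e.g. export_names(basketball, FILTER_WORDS_FILE_NAME):

```python
def export_names(names, path):
    """One-off export: write one player name per line to path."""
    with open(path, "w") as f:
        # A newline after every name, including the last, keeps the file line-oriented.
        f.writelines(name + "\n" for name in names)
```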
    

    The reading becomes straightforward:

    with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
        # Skip blank lines: an empty string is "in" every scraped line
        filter_words = [word.strip() for word in filter_words_file if word.strip()]
    

    The output freeze.csv is the same:

    "Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,