Tags: python, text-files, with-statement

Remove certain links from a text file by reading another text file


I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as some of the links in whitelist.txt.

I'm trying to read whitelist.txt, then read scrapedlist.txt, and write a new file, updatedlist2.txt, that contains every line of scrapedlist.txt that does not appear in whitelist.txt.

I'm pretty new to Python and still learning. I've searched for answers, and this is what I came up with:

def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()

    unique2 = set()

    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()

    unique3 = set()

    with open("updatedlist2.txt", "w") as whitelist_write2:
   
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)

I get the following error, and I'm also not sure whether I'm going about this the right way:

if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set

What should I change to achieve this, and is my approach correct?

EDIT:

whitelist.txt:

KUWAIT
ISRAEL
FRANCE

scrapedlist.txt:

USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE

updatedlist2.txt (this is how it should be):

USA
CANADA
GERMANY

Solution

  • Based on your description, I made the following changes to your code:

    1. readlines() is replaced with read().splitlines(). Both read the whole file and return one list item per line. The difference is that readlines() keeps the \n at the end of each item, while splitlines() strips it.
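    2. unique2 and unique3 are removed. unique2 was always an empty set, and unique2 not in line is what raised the TypeError: when the right operand of in is a string, the left operand must be a string too. unique3 only skipped repeated lines, so add it back if scrapedlist.txt can contain duplicates.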
    3. After the first two with blocks, whitelist_lines and scrapedlist_lines are lists of links. (In your code, the result of whitelist_read.readlines() was never assigned, so the whitelist was read and then discarded.) Since we want the lines of scrapedlist_lines that are not in whitelist_lines, the condition if unique2 not in line and line not in unique3: is changed to if line not in whitelist_lines:.
    4. The explicit close() calls are removed: in Python 2.5 and later, the with statement closes the file automatically when its block exits.
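    A quick way to see items 1 and 2 in action is the sketch below. It uses io.StringIO as an in-memory stand-in for a file (the sample text is made up from the question's data) to compare what readlines(), read().splitlines() and read().split("\n") return, and then reproduces the TypeError from the question.

```python
import io

# In-memory stand-in for a two-line file that ends with a newline
sample = "KUWAIT\nISRAEL\n"

print(io.StringIO(sample).readlines())          # ['KUWAIT\n', 'ISRAEL\n']
print(io.StringIO(sample).read().splitlines())  # ['KUWAIT', 'ISRAEL']
print(sample.split("\n"))                       # ['KUWAIT', 'ISRAEL', '']

# When the right operand of `in` is a string, the left operand must also
# be a string, so testing a set against a line raises a TypeError
try:
    set() in "KUWAIT\n"
except TypeError as exc:
    print(exc)  # 'in <string>' requires string as left operand, not set
```

    Note the trailing empty string from split("\n"): it is one reason splitlines() is usually the better choice for reading lines.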

    The final code is:

    with open("whitelist.txt", "r") as whitelist_read:
        # splitlines() drops the "\n" and leaves no trailing empty item
        whitelist_lines = whitelist_read.read().splitlines()

    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.read().splitlines()

    with open("updatedlist2.txt", "w") as whitelist_write2:
        for line in scrapedlist_lines:
            # Keep only the lines that are not whitelisted
            if line not in whitelist_lines:
                whitelist_write2.write(line + "\n")
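    If you need this more than once, the same logic can be wrapped in a function. The sketch below uses a hypothetical helper, remove_whitelisted (not part of the original answer), stores the whitelist in a set for faster lookups on large files, and keeps the idea behind your unique3 of writing each line only once. To avoid touching your real files, it runs on the sample data from the question's EDIT inside a temporary directory.

```python
import os
import tempfile


def remove_whitelisted(scraped_path, whitelist_path, output_path):
    """Hypothetical helper: copy lines of scraped_path to output_path,
    skipping lines found in whitelist_path. Order is preserved and
    repeated lines are written only once."""
    with open(whitelist_path, "r") as f:
        # A set gives fast membership tests, even for long whitelists
        whitelist = set(f.read().splitlines())

    seen = set()  # lines already written, like unique3 in the question
    with open(scraped_path, "r") as src, open(output_path, "w") as dst:
        for line in src.read().splitlines():
            if line not in whitelist and line not in seen:
                dst.write(line + "\n")
                seen.add(line)


# Try it on the question's sample data in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    paths = {name: os.path.join(tmp, name) for name in
             ("whitelist.txt", "scrapedlist.txt", "updatedlist2.txt")}
    with open(paths["whitelist.txt"], "w") as f:
        f.write("KUWAIT\nISRAEL\nFRANCE\n")
    with open(paths["scrapedlist.txt"], "w") as f:
        f.write("USA\nCANADA\nGERMANY\nKUWAIT\nISRAEL\nFRANCE\n")

    remove_whitelisted(paths["scrapedlist.txt"], paths["whitelist.txt"],
                       paths["updatedlist2.txt"])

    with open(paths["updatedlist2.txt"]) as f:
        result = f.read()

print(result, end="")  # USA, CANADA and GERMANY, one per line
```

    In your own script you would call it as remove_whitelisted("scrapedlist.txt", "whitelist.txt", "updatedlist2.txt").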