I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as the links that are in whitelist.txt.
I'm trying to read whitelist.txt and scrapedlist.txt and then write a new file, updatedlist2.txt, containing everything in scrapedlist.txt minus the entries in whitelist.txt.
I'm pretty new to Python, so still learning. I've searched for answers, and this is what I came up with:
def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()
    unique2 = set()
    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()
    unique3 = set()
    with open("updatedlist2.txt", "w") as whitelist_write2:
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)
I get this error and I'm also not sure if I'm doing it the right way:
if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set
How can I achieve the result described above, and is my approach right?
EDIT:
whitelist.txt:
KUWAIT
ISRAEL
FRANCE
scrapedlist.txt:
USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE
updatedlist2.txt (this is how it should be):
USA
CANADA
GERMANY
Based on your description, I made a few changes to your code:

- readlines() is replaced with read().splitlines(). Both read the whole file and return a list with one item per line. The difference is that readlines() keeps the \n at the end of each item.
- unique2 and unique3 are removed. unique2 was never filled (the result of whitelist_read.readlines() was discarded), and the test unique2 not in line checks a set against a string, which is what raises the TypeError. unique3 only guarded against writing the same line twice, which your expected output does not seem to need.
- whitelist_lines and scrapedlist_lines are two lists that contain the links. We need the lines of scrapedlist_lines that are not in whitelist_lines, so the condition if unique2 not in line and line not in unique3: is changed to if line not in whitelist_lines:.

The final code is:
# Read both files into lists of lines, without the trailing "\n".
with open("whitelist.txt", "r") as whitelist_read:
    whitelist_lines = whitelist_read.read().splitlines()
with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()
# Write only the scraped lines that are not whitelisted.
with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        if line not in whitelist_lines:
            whitelist_write2.write(line + "\n")
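
If scrapedlist.txt can grow large or contain repeated links, a set-based variant may be worth considering. The following is only a sketch under a few assumptions that are not in the question: the helper name filter_scraped and its parameters are made up for illustration, blank lines are skipped, and surrounding whitespace is stripped before comparing. It keeps the original order and writes each remaining entry once, which seems to be what unique3 was meant to do.

```python
from pathlib import Path


def filter_scraped(whitelist_path, scraped_path, output_path):
    """Write entries of scraped_path that are not in whitelist_path, once each.

    Hypothetical helper for illustration; returns the list of kept entries.
    """
    # A set makes each membership check fast, unlike scanning a list.
    whitelist = {
        line.strip() for line in Path(whitelist_path).read_text().splitlines()
    }

    seen = set()  # entries already kept, so duplicates are written only once
    kept = []
    for line in Path(scraped_path).read_text().splitlines():
        entry = line.strip()
        # Skip blank lines, whitelisted entries, and repeats.
        if not entry or entry in whitelist or entry in seen:
            continue
        seen.add(entry)
        kept.append(entry)

    # One entry per line, each terminated by a newline.
    Path(output_path).write_text("".join(entry + "\n" for entry in kept))
    return kept
```

Calling filter_scraped("whitelist.txt", "scrapedlist.txt", "updatedlist2.txt") with the sample files from the question should leave USA, CANADA and GERMANY in updatedlist2.txt. If the order of the output does not matter and duplicates are not a concern, a plain set difference would be shorter still.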