python · web-scraping · context-manager · beautifulsoup

Reading and appending files context manager: Doesn't seem to read, only writes


I am trying to read from and append to a file, but when I use a context manager it doesn't seem to work.

In this code I am trying to get all links on a site that contain one of the items in my 'serien' list. For each matching link, I first check whether it is already in the file; if it is, the script should not append it again. But it does.

My guess is that I am either not using the right mode or that I somehow messed up the context manager. Or am I completely wrong?

import requests
from bs4 import BeautifulSoup


serien = ['izombie', 'grandfathered', 'new-girl']
serien_links = []


#Gets chapter links
def episode_links(index_url):
    r = requests.get(index_url)
    soup = BeautifulSoup(r.content, 'lxml')
    links = soup.find_all('a')
    url_list = []
    for url in links:
        url_list.append((url.get('href')))
    return url_list

urls_unfiltered = episode_links('http://watchseriesus.tv/last-350-posts/')
with open('link.txt', 'a+') as f:
    for serie in serien:
        for x in urls_unfiltered:
            #check whether link is already in file. If not write link to file
            if serie in x and serie not in f.read():
                f.write('{}\n'.format(x))

This is my first time using context managers. Tips would be appreciated.

Edit: here is a similar project without a context manager. There I also tried using a context manager, but gave up after hitting the same problem.

file2_out = open('url_list.txt', 'a') #local url list for chapter check
for x in link_list:
    #Checking chapter existence in folder and downloading chapter
    if x not in open('url_list.txt').read(): #Is url of chapter in local url list?
        #push = pb.push_note(get_title(x), x)
        file2_out.write('{}\n'.format(x)) #adding downloaded chapter to local url list
        print('{} saved.'.format(x))


file2_out.close()

And with context manager:

with open('url_list.txt', 'a+') as f:
    for x in link_list:
        #Checking chapter existence in folder and downloading chapter
        if x not in f.read(): #Is url of chapter in local url list?
            #push = pb.push_note(get_title(x), x)
            f.write('{}\n'.format(x)) #adding downloaded chapter to local url list
            print('{} saved.'.format(x))

Solution

  • As @martineau mentioned, f.read() consumes the whole stream, so every subsequent call returns an empty string. On top of that, 'a+' mode opens the file with the position at the end, so even the first read returns nothing until you seek back to the start. Try the code below: it rewinds the file, reads its contents into a list once, and all later membership checks happen against that list.

    import requests
    from bs4 import BeautifulSoup
    
    serien = ['izombie', 'grandfathered', 'new-girl']
    serien_links = []
    
    
    # Gets chapter links
    def episode_links(index_url):
        r = requests.get(index_url)
        soup = BeautifulSoup(r.content, 'lxml')
        links = soup.find_all('a')
        url_list = []
        for url in links:
            url_list.append((url.get('href')))
        return url_list
    
    
    urls_unfiltered = episode_links('http://watchseriesus.tv/last-350-posts/')
    with open('link.txt', 'a+') as f:
        f.seek(0)  # 'a+' positions at end of file, so rewind before reading
        cont = f.read().splitlines()
        for serie in serien:
            for x in urls_unfiltered:
                # check whether link is already in file. If not write link to file
                if (serie in x) and (x not in cont):
                    f.write('{}\n'.format(x))
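The file-position behaviour behind this bug can be seen in isolation. A minimal sketch (the file name and its contents are just placeholders for the demo): in 'a+' mode the position starts at the end of the file, so a read returns an empty string until you seek(0).

```python
import os
import tempfile

# Create a small demo file with one known line.
path = os.path.join(tempfile.gettempdir(), 'aplus_demo.txt')
with open(path, 'w') as f:
    f.write('izombie-episode-1\n')

with open(path, 'a+') as f:
    # 'a+' opens with the position at the end of the file,
    # so reading immediately yields an empty string.
    first_read = f.read()        # ''
    f.seek(0)                    # rewind to the start
    second_read = f.read()       # 'izombie-episode-1\n'

os.remove(path)
```

This is why `serie not in f.read()` was always True in the original loop: the check compared against an empty string, so every matching link got appended again.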