
How to use a URL list (.txt) in Python correctly?


I need to build an availability checker for an arbitrary number of sites. How can I make the script read the URLs from a .txt file and loop over each of them?

My approximate non-working code:

import requests
urls_list = open('G:\\urls_list.txt', 'r+')
for url in urls_list:
    response = requests.get(url)
    if response.status_code != 200:
        print('Not active'.format(url))



Solution

  • This happens because, when iterating with for url in urls_list: ..., each url string still ends with the newline character \n from the file.

    Use str.rstrip to remove it.

    Also, prefer a context manager when reading files (with open("urls.txt") as f); it closes the file for you.

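    import requests
    
    with open("urls.txt") as f:
        # str.rstrip removes the trailing newline from each line
        for url in map(str.rstrip, f):
    
            print("-" * 34)
            # the f"{name = }" form needs Python 3.8 or later
            print(f"{url = }")
    
            try:
                response = requests.get(url)
            except requests.ConnectionError as e:
                # e.g. DNS failure or refused connection: report it and move on
                print(e)
                continue
    
            print(f"{response.status_code = }")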
    
    Output:
    
    ----------------------------------
    url = 'https://github.com/'
    response.status_code = 200
    ----------------------------------
    url = 'https://githubb.com/'
    HTTPSConnectionPool(host='githubb.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f26a6758f70>: Failed to establish a new connection: [Errno -2] Name or service not known'))
    ----------------------------------
    url = 'https://www.gnu.org/software/bash/manual/bash.html'
    response.status_code = 200
    ----------------------------------
    url = 'https://www.gnu.org/software/bash/manual/bashh.html'
    response.status_code = 404
    ----------------------------------
    url = 'https://stackoverflow.com/'
    response.status_code = 200
    ----------------------------------
    url = 'https://stackoverfloww.com/'
    response.status_code = 200
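
For a checker that runs over a long list, it may help to go a little further than the snippet above. The following is only a sketch under a few assumptions: the helper names read_urls, check_url and check_all, the injectable get parameter, and the 5-second timeout are all made up for illustration and are not part of the original answer. It skips blank lines in the file, passes a timeout so that one unresponsive site cannot stall the whole run, and catches requests.RequestException, the base class that also covers timeouts and malformed URLs in addition to connection errors.

```python
import requests


def read_urls(path):
    """Return the non-empty, whitespace-stripped lines of a text file."""
    with open(path, encoding="utf-8") as f:
        # strip() removes the trailing newline and any surrounding spaces;
        # the filter drops blank lines so they are never sent to requests.
        return [line.strip() for line in f if line.strip()]


def check_url(url, get=requests.get, timeout=5.0):
    """Return the HTTP status code for url, or None if the request failed.

    `get` is a hypothetical hook: any callable with the same shape as
    requests.get, which lets the function be tried without network access.
    """
    try:
        response = get(url, timeout=timeout)
    except requests.RequestException as e:
        # Base class of ConnectionError, Timeout, MissingSchema, etc.
        print(f"{url}: request failed ({e})")
        return None
    return response.status_code


def check_all(path, get=requests.get):
    """Check every URL listed in the file and return {url: status or None}."""
    results = {}
    for url in read_urls(path):
        status = check_url(url, get=get)
        results[url] = status
        if status == 200:
            print(f"{url}: active")
        elif status is not None:
            print(f"{url}: not active (status {status})")
    return results
```

Calling check_all("urls.txt") would then print one line per URL. Note that a status of 200 only says that some server answered: in the output above, the mistyped https://stackoverfloww.com/ also returned 200, so a stricter checker might additionally inspect response.url or the page content.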