Tags: python, python-3.x, csv, web-scraping, return

Why should I use `return` when the script is functional without that?


I've written a Python script that parses movie names and their release years from multiple pages of a torrent site and writes them to a CSV file. It runs without errors and writes the data without any issues.

I did all of this without the line `return itemlist` in my get_data() function. Because I wanted write_data() to be fully independent, it writes to the CSV file straight from the module-level list itemlist, which is defined just below the variable URLS.

If I keep the existing design intact, do I need the line `return itemlist`, which is currently commented out? If so, why?

import requests
from bs4 import BeautifulSoup
import csv

URLS = ["https://yts.am/browse-movies?page={}".format(page) for page in range(1,6)]
itemlist = []

def get_data(links):
    for url in links:
        res = requests.get(url)
        soup = BeautifulSoup(res.text,"lxml")
        for record in soup.select('.browse-movie-bottom'):
            items = {}
            items["Name"] = record.select_one('.browse-movie-title').text
            items["Year"] = record.select_one('.browse-movie-year').text
            itemlist.append(items)
    # return itemlist

def write_data():
    with open("outputfile.csv","w", newline="") as f:
        writer = csv.DictWriter(f,['Name','Year'])
        writer.writeheader()
        for data in itemlist:
            writer.writerow(data)

if __name__ == '__main__':
    get_data(URLS)
    write_data()

Solution

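  • With the existing design you don't need that line, because your get_data is meant to modify a list from the outer scope rather than return one. Without a return statement the function simply returns None, which is fine here because the caller ignores the result.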

    The drawback is that if you ever rename itemlist, you have to rename it in both get_data and write_data (in every function that uses it).

    You would need return itemlist if you defined write_data as

    def write_data(some_list):
        ...
    

    and use it as

    if __name__ == '__main__':
        write_data(get_data(URLS))
    

    In this case write_data receives the list returned by get_data, and you don't need to define itemlist = [] outside get_data; create it inside get_data instead.
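
To make the return-based design concrete, here is a minimal sketch of how the two functions could fit together. It is not the original scraper: SAMPLE_PAGES is a hypothetical stand-in for the pages fetched with requests and BeautifulSoup, so the example runs offline, and write_data takes an open file object as a second argument (also an assumption) so the output can go to an in-memory buffer.

```python
import csv
import io

# Hypothetical stand-in for the scraped pages, so the sketch runs offline.
# In the real script these records would come from requests + BeautifulSoup.
SAMPLE_PAGES = {
    "page-1": [("Movie A", "2018"), ("Movie B", "2017")],
    "page-2": [("Movie C", "2016")],
}


def get_data(links):
    """Build and return a fresh list; nothing outside the function is modified."""
    itemlist = []  # local now, no module-level list needed
    for url in links:
        for name, year in SAMPLE_PAGES[url]:
            itemlist.append({"Name": name, "Year": year})
    return itemlist


def write_data(some_list, f):
    """Write whatever rows it is given; it does not care where they came from."""
    writer = csv.DictWriter(f, ["Name", "Year"])
    writer.writeheader()
    writer.writerows(some_list)


if __name__ == "__main__":
    buffer = io.StringIO(newline="")
    write_data(get_data(["page-1", "page-2"]), buffer)
    print(buffer.getvalue())
```

One practical difference from the global-list version: calling get_data twice here gives two independent lists, whereas appending to a module-level itemlist would accumulate the rows from both calls.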