def convert():
    for url in url_list:
        news=Article(url)
        news.download()
        while news.download_state != 2:
            time.sleep(1)
        news.parse()
        l.append(
            {'Title':news.title, 'Text': news.text.replace('\n',' '), 'Date':news.publish_date, 'Author':news.authors}
        )

convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2'+'.csv',encoding='utf-8', index=False)
The function convert() goes through a list of URLs and processes each one. Each URL links to an article. I fetch the important attributes of each article, such as the author and the text, and store them in a data frame. After that, I write the data frame to a CSV file. The script ran for about 5 hours, as there were 589 URLs in url_list, but I still didn't get the CSV file. Can somebody spot where I am going wrong?
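Your function probably stops here:

while news.download_state != 2:
    time.sleep(1)

The loop waits for the download state to change, but that never happens. news.download() is a blocking call, so the state is already final when the loop starts; if a download did not succeed, the state never becomes 2 and the loop sleeps forever. Your function should also return a list.

Something like this should work: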
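import pandas as pd
from newspaper import Article

def convert():
    l = []  # one dict per article
    for url in url_list:  # url_list is defined elsewhere, as in the question
        news = Article(url)
        news.download()  # blocks until the download finishes or fails
        news.parse()
        l.append(
            {'Title': news.title, 'Text': news.text.replace('\n', ' '), 'Date': news.publish_date, 'Author': news.authors}
        )
    return l

l = convert()
df = pd.DataFrame.from_dict(l)
df.to_csv('Amazon_try2'+'.csv', encoding='utf-8', index=False)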
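
With 589 URLs, a few downloads will probably fail, and one bad URL should not stop or hang the whole run. Below is a minimal sketch of one way to bound the retries and skip the URLs that keep failing. The parameters make_article, max_attempts and delay are names I made up for illustration; make_article is whatever builds the article object (for example the Article class). Catching Exception broadly is an assumption: I believe parse() raises when the download did not succeed, but check the exception type for your newspaper version.

```python
import time


def convert(url_list, make_article, max_attempts=3, delay=1.0):
    """Return (rows, failed_urls). Each URL is tried at most max_attempts times."""
    rows, failed = [], []
    for url in url_list:
        for attempt in range(max_attempts):
            news = make_article(url)
            try:
                news.download()
                news.parse()
            except Exception:
                # Assumption: a failed download or parse surfaces as an exception.
                if attempt + 1 < max_attempts:
                    time.sleep(delay)  # short pause before the next attempt
                continue
            rows.append({
                'Title': news.title,
                'Text': news.text.replace('\n', ' '),
                'Date': news.publish_date,
                'Author': news.authors,
            })
            break
        else:
            # No attempt succeeded: record the URL and move on instead of hanging.
            failed.append(url)
    return rows, failed
```

You would call it as rows, failed = convert(url_list, Article), build the data frame from rows as before, and look at failed afterwards to see which URLs were skipped.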