I've been playing around with Flask and created a simple web-scraping app that collects news articles and displays them in an HTML table using the DataTables jQuery plugin. The stockNews(ticker) function works great when ticker is hard-coded. However, I wanted to take it one step further and read ticker from user input via an HTML form. This is where I'm having issues: I've looked for tutorials and read through the Flask documentation, but I can't get the app to work as expected.
What I'm trying to achieve:
Run stockNews(ticker) when the user submits a stock ticker via the HTML form, and show the results at /<ticker> as a new URL.
What currently happens:
AttributeError: 'NoneType' object has no attribute 'find_all'
This is caused by my function running before the user has entered a stock ticker in the HTML form.
app.py
import pandas as pd
import datetime as dt
import time
import requests
from tabulate import tabulate
from bs4 import BeautifulSoup
from flask import Flask, render_template, request
import json
app = Flask(__name__)
@app.route('/', methods=['GET', 'POST'])
def index():
    # retrieves user input
    ticker = request.form.get('ticker')
    return render_template('index.html', ticker=ticker)

# scrapes stock news from finviz
@app.route('/<ticker>', methods=['GET', 'POST'])
def stockNews(ticker):
    url = 'https://finviz.com/quote.ashx?t=' + ticker
    html = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(html.text, 'lxml')
    # finds news table within finviz website
    match = soup.find('table', class_="fullview-news-outer")
    dates = []
    time = []
    # appends dates in html to list
    for d in match.find_all("td", width="130"):
        if len(d.text.split(' ')) == 2:
            dates.append(d.text.split(' ')[0])
            time.append(d.text.split(' ')[1])
        elif len(d.text.split(' ')) == 1:
            dates.append('None')
            time.append(d.text.split(' ')[0])
    # uses an assignment expression to replace 'None' with previous element in list
    dates = [current:=d if d != 'None' else current for d in dates]
    articles = []
    # appends new title to titles list
    for t in match.find_all("a", class_="tab-link-news"):
        match.find(class_='tab-link-news')['class'] = "news-link"
        articles.append(str(t))
    df_news = pd.DataFrame(list(zip(dates, time, articles)), columns=['Date', 'Time', 'Article'])
    # formats Date column/datetime string in dataframe
    df_news['Date'] = pd.to_datetime(df_news['Date'], errors='ignore').dt.strftime('%Y-%m-%d')
    json_news = json.loads(df_news.to_json(orient='records'))
    return render_template('index.html', json_news=json_news)
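Finviz appears to print the date only on the first headline of each day, which is why the loop in stockNews splits each timestamp cell and then forward-fills the missing dates. A minimal standalone sketch of that step, assuming cell texts shaped like "Jul-10-20 08:15PM" or "07:30AM" (the helper name split_timestamps and the sample values are hypothetical):

```python
from typing import List, Optional, Tuple


def split_timestamps(cells: List[str]) -> Tuple[List[str], List[str]]:
    """Split Finviz-style timestamp cells into parallel date and time lists.

    Assumption: a cell is either 'DATE TIME' (first headline of a day) or
    just 'TIME' (later headlines on the same day), as the question's loop expects.
    """
    dates: List[str] = []
    times: List[str] = []
    current_date: Optional[str] = None
    for text in cells:
        parts = text.split()  # split() also drops leading/trailing whitespace
        if len(parts) == 2:
            current_date, clock = parts
        elif len(parts) == 1:
            clock = parts[0]
        else:
            continue  # skip empty or unexpected cells
        if current_date is None:
            # A time with no preceding date cannot be forward-filled.
            raise ValueError(f"no date seen before {text!r}")
        dates.append(current_date)  # forward-fill the most recent date
        times.append(clock)
    return dates, times


dates, times = split_timestamps(["Jul-10-20 08:15PM", "07:30AM", "Jul-09-20 06:00PM"])
print(dates)  # ['Jul-10-20', 'Jul-10-20', 'Jul-09-20']
print(times)  # ['08:15PM', '07:30AM', '06:00PM']
```

Unlike the assignment-expression version, this fills dates inside the same loop and raises a ValueError if the very first cell has no date, instead of failing on an unbound name.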
index.html
{% extends 'base.html' %}
{% block title %}
<title>Stock Info</title>
{% endblock %}
{% block body %}
<center>
<form method="POST">
<input name="ticker">
<input type="submit">
</form>
</center>
<div class="container">
<h1 class="header">News</h1>
<table class="table table-striped table-sm" id='news' style="width: 100%;">
<thead style='position: relative; width: 100%;'>
<tr>
<th>Date</th>
<th>Time</th>
<th>Article</th>
</tr>
</thead>
</table>
</div>
<script>
var news = {{ json_news | safe }};
$(document).ready(function() {
$('#news').DataTable( {
"data": news,
"scrollY": 600,
"paging": false,
"scrollCollapse": true,
"order": [[ 0, "desc" ]],
"columns": [
{ "data": "Date" },
{ "data": "Time" },
{ "data": "Article" },
]
} )
} );
</script>
{% endblock %}
base.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- CSS only -->
<script src="https://code.jquery.com/jquery-3.5.1.js" integrity="sha256-QWo7LDvxbWT2tbbQ97B53yJnYU3WhH/C8ycbRAkjPDc=" crossorigin="anonymous"></script>
<link href="https://stackpath.bootstrapcdn.com/bootswatch/4.5.0/slate/bootstrap.min.css" rel="stylesheet" integrity="sha384-idNH3UIOiZbCf8jxqu4iExnH34y5UovfW/Mg8T5WfNvoJolDvknoNqR69V2OexgF" crossorigin="anonymous">
<link href="https://cdn.datatables.net/1.10.21/css/dataTables.bootstrap4.min.css" rel="stylesheet"/>
<link href="{{ url_for('static', filename='css/main.css') }}" rel="stylesheet" type="text/css">
<link href="https://cdn.jsdelivr.net/npm/simplebar@latest/dist/simplebar.css" rel="stylesheet"/>
{% block title %} {% endblock %}
</head>
<body>
{% block body %} {% endblock %}
<!-- JS, Popper.js, and jQuery -->
<script src="https://cdn.jsdelivr.net/npm/popper.js@1.16.0/dist/umd/popper.min.js" integrity="sha384-Q6E9RHvbIyZFJoft+2mJbHaEWldlvI9IOYy5n3zV9zzTtmI3UksdQRVvoxMfooAo" crossorigin="anonymous"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js" integrity="sha384-OgVRvuATP1z7JjHLkuOU7Xw704+h835Lr+6QL9UvYjZE3Ipu6Tp75j7Bh/kR0JKI" crossorigin="anonymous"></script>
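<script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.21/js/jquery.dataTables.js"></script>
<script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.21/js/dataTables.bootstrap4.min.js"></script>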
</body>
</html>
ATTRIBUTE ERROR
I think the way you are trying to get the td's is the issue; something like this may be easier:
match = soup.find('table', {'class': 'fullview-news-outer'})
if match is not None:  # find() returns None when the table isn't on the page
    rows = match.find_all('tr')
    for row in rows:
        k = row.find_all('td')  # k is the list of td cells in this row
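The key point is that find() returns None when nothing matches (for example, for an unknown ticker or a changed page layout), and calling find_all on None is exactly what raises the AttributeError in the question. A minimal sketch that guards against it, assuming BeautifulSoup 4 is installed, that the news lives in the fullview-news-outer table, and that each row holds a timestamp td followed by a headline td (the function name parse_news_rows and the sample HTML are hypothetical):

```python
from typing import List, Tuple

from bs4 import BeautifulSoup


def parse_news_rows(html: str) -> List[Tuple[str, str]]:
    """Return (timestamp, headline) pairs, or [] if the news table is missing."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, so lxml is not required
    table = soup.find("table", class_="fullview-news-outer")
    if table is None:
        # Without this check, table.find_all(...) would raise
        # AttributeError: 'NoneType' object has no attribute 'find_all'
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < 2:
            continue  # skip rows without both a timestamp cell and a headline cell
        rows.append((cells[0].get_text(strip=True), cells[1].get_text(strip=True)))
    return rows


sample = """
<table class="fullview-news-outer">
  <tr><td>Jul-10-20 08:15PM</td><td><a class="tab-link-news" href="#">Headline A</a></td></tr>
  <tr><td>07:30AM</td><td><a class="tab-link-news" href="#">Headline B</a></td></tr>
</table>
"""
print(parse_news_rows(sample))
print(parse_news_rows("<html><body>no table here</body></html>"))  # []
```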
ROUTING
You are missing one main thing that I think could solve your problem. When a route accepts several methods, such as GET and POST, the view function has to handle each of them, for example by checking request.method:
@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == "POST":
        ticker = request.form['ticker']
        # data processing or loading tables etc.
        return render_template('index.html', ticker=ticker)
    else:
        # normal page load
        return render_template("index.html", ticker=None)
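To get the /<ticker> URL the question asks for, one common pattern is Post/Redirect/Get: the form POSTs to /, the view redirects to /<ticker>, and only that second view does the scraping, so it never runs without a ticker. A minimal sketch, assuming Flask is installed; get_news is a hypothetical stand-in for the scraping in stockNews, and the inline PAGE template replaces index.html only to keep the example self-contained:

```python
from flask import Flask, redirect, render_template_string, request, url_for

app = Flask(__name__)

# Inline template standing in for index.html; the form always posts back to "/".
PAGE = """
<form method="POST" action="{{ url_for('index') }}">
  <input name="ticker"><input type="submit">
</form>
{% if ticker is none %}No news{% else %}<p>{{ ticker }}: {{ news|length }} articles</p>{% endif %}
"""


def get_news(ticker):
    """Hypothetical stand-in for the scraping done in stockNews()."""
    return [{"Date": "2020-07-10", "Time": "08:15PM", "Article": ticker + " headline"}]


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        ticker = request.form.get("ticker", "").strip().upper()
        if ticker:
            # Post/Redirect/Get: send the browser to /<ticker>, e.g. /AAPL
            return redirect(url_for("news", ticker=ticker))
    # Plain GET or an empty submission: render the form without scraping anything
    return render_template_string(PAGE, ticker=None)


@app.route("/<ticker>")
def news(ticker):
    # Only reached once a ticker is in the URL, so the scraper never gets None
    return render_template_string(PAGE, ticker=ticker, news=get_news(ticker))
```

Start it with flask run (or add app.run(debug=True) under a __main__ guard) and submit aapl; the browser should land on /AAPL. In the real app, news() would call the scraping code and pass json_news to index.html instead.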
You may also want to handle the case where ticker is None with an if statement in your HTML:
{% if ticker is none %} No news {% endif %}
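The same applies to the script block in index.html: when index() renders the page without json_news, {{ json_news | safe }} renders as var news = ; which is a JavaScript syntax error. A minimal sketch using Jinja's built-in tojson filter with an empty-list fallback, shown with plain jinja2 (assumed installed; Flask ships it) so it runs outside a Flask app:

```python
from jinja2 import Environment

env = Environment(autoescape=True)

# (json_news or []) falls back to an empty list when json_news is missing or None,
# and tojson always emits a valid JSON literal instead of an empty string.
script = env.from_string("var news = {{ (json_news or []) | tojson }};")

print(script.render())  # var news = [];
print(script.render(json_news=[{"Date": "2020-07-10", "Time": "08:15PM"}]))
```

In index.html the line would become var news = {{ (json_news or []) | tojson }}; so DataTables starts with an empty table on the first load.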