I am creating a scraper for MSN Money. I obtain the values from the site and run them through a couple of for loops to sort them by year. When my for loops finish, all the values are those from the 2018 data set. What's wrong with my code?
from urllib.request import urlopen
from bs4 import BeautifulSoup
from lxml import etree
values = {}
values_by_year = {}
counter = 2013
dict_index = 0
temp = ''
url = "https://www.msn.com/en-us/money/stockdetails/financials/nas-googl/fi-a1u3rw?symbol=GOOGL&form=PRFIHQ"
tree = etree.HTML(urlopen(url).read())
for section in tree.xpath('//*[@id="table-content-area"]'):
    for i in range(2, 32):
        for x in section.xpath('./div/div/div[1]/div/ul[%s]/li[1]/p/text()'
                               % (i)):
            if i == 6:
                values[i] = 0
            else:
                values[x] = 0
for x in range(2015, 2019):
    values_by_year[x] = values
for section in tree.xpath('//*[@id="table-content-area"]'):
    for i in range(2, 32):
        for y in range(1, 6):
            for value in section.xpath(
                    './div/div/div[1]/div/ul[%s]/li[%s]/p/text()' % (i, y)):
                if y == 1:
                    temp = value
                else:
                    print("value is ", counter+y, "y is ", y)
                    values_by_year[counter+y][temp] = value
print(values_by_year[2016])
print("\n------\n")
print(values_by_year[2017])
I receive no error messages. My expected result is for the program to output a dictionary named values_by_year that contains 4 keys, one for each year. Each year maps to a dictionary of the values corresponding to that year. For example, "Period End Date" for 2015 would be 12/31/2015, and for 2016 it would be 12/31/2016.
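For reference, this is roughly the shape I'm expecting (only one field shown per year, just as an illustration):

values_by_year = {
    2015: {"Period End Date": "12/31/2015"},
    2016: {"Period End Date": "12/31/2016"},
    2017: {"Period End Date": "12/31/2017"},
    2018: {"Period End Date": "12/31/2018"},
}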
The specific problem in your code is this:
for x in range(2015, 2019):
    values_by_year[x] = values

That sets the keys 2015 through 2018 to refer to the same dict of values, not copies. So when you do:

values_by_year[counter+y][temp] = value

you're not just modifying the dict associated with counter+y, but the one associated with all the keys you initialized.
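You can see the effect in isolation with a quick sketch that has nothing to do with the scraping itself (the variable names here are just for illustration; the field name and years are taken from your example):

# One shared dict assigned to every year key.
defaults = {"Period End Date": 0}
by_year = {}
for year in range(2015, 2019):
    by_year[year] = defaults          # every key points at the SAME dict object
by_year[2016]["Period End Date"] = "12/31/2016"
print(by_year[2015]["Period End Date"])  # prints 12/31/2016, not 0
print(by_year[2015] is by_year[2018])    # True: all four keys share one dict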
The minimalist fix is to change:

for x in range(2015, 2019):
    values_by_year[x] = values

to:

for x in range(2015, 2019):
    values_by_year[x] = values.copy()

so you get your defaults initialized as expected, but insert (shallow) copies of the default dict (which, since the values in it are ints, is enough).
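The same sketch with the copy in place behaves the way you expect:

# A fresh dict per year, so each year can be filled in independently.
defaults = {"Period End Date": 0}
by_year = {}
for year in range(2015, 2019):
    by_year[year] = defaults.copy()   # shallow copy: new dict per key
by_year[2016]["Period End Date"] = "12/31/2016"
print(by_year[2015]["Period End Date"])  # still 0
print(by_year[2015] is by_year[2018])    # False: independent dicts

An equivalent way to write the fix, if you prefer it, is a dict comprehension: values_by_year = {x: values.copy() for x in range(2015, 2019)}.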