I have the code below to count occurrences of a specific list of words in a financial statement (US SEC EDGAR 10-K) text file. I would highly appreciate it if anyone could help me with this. I have manually cross-checked and found the words in the document, but my code is not finding any of them. I am using Python 3.5.3. Thanks in advance.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib.request as urllib2
import time
import csv
import sys
CIK = '0001018724'
Year = '2013'
string_match1 = 'edgar/data/1018724/0001193125-13-028520.txt'
url3 = 'https://www.sec.gov/Archives/' + string_match1
response3 = urllib2.urlopen(url3)
words = [
'anticipate',
'believe',
'depend',
'fluctuate',
'indefinite',
'likelihood',
'possible',
'predict',
'risk',
'uncertain',
]
count = {} # is a dictionary data structure in Python
for elem in words:
    count[elem] = 0
for line in response3:
    elements = line.split()
    for word in words:
        count[word] = count[word] + elements.count(word)
print(CIK)
print(Year)
print(url3)
print(count)
Here is the script output:
0001018724
2013
https://www.sec.gov/Archives/edgar/data/1018724/0001193125-13-028520.txt
{
'believe': 0,
'likelihood': 0,
'anticipate': 0,
'fluctuate': 0,
'predict': 0,
'risk': 0,
'possible': 0,
'indefinite': 0,
'depend': 0,
'uncertain': 0,
}
A simplified version of your code seems to work in Python 3.7 with the requests library:
import requests
url = 'https://www.sec.gov/Archives/edgar/data/1018724/0001193125-13-028520.txt'
response = requests.get(url)
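words = ['anticipate', 'believe', 'depend', 'fluctuate', 'indefinite',
         'likelihood', 'possible', 'predict', 'risk', 'uncertain']  # word list from the question
info = response.text  # decoded body as str, so str.count() can match the words
count = {}  # maps each word to its number of occurrences
for elem in words:
    # substring count: 'risk' also matches inside 'risks'
    count[elem] = info.count(elem)
print(count)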
Output:
{'anticipate': 9, 'believe': 32, 'depend': 39, 'fluctuate': 4, 'indefinite': 15, 'likelihood': 15, 'possible': 25, 'predict': 6, 'risk': 55, 'uncertain': 38}
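As for why the code in the question reports zeros: in Python 3, iterating over the object returned by urlopen() yields bytes, so line.split() produces a list of bytes tokens, and elements.count(word) compares them with str words, which never compare equal. Below is a minimal sketch of one way to keep the line-by-line approach by decoding each line first. The helper name count_words, the UTF-8 decoding, and the choice to match whole lowercase words are my assumptions, not part of the original code, and whole-word counts will differ from the substring counts shown above.

```python
import re


def count_words(lines, words):
    """Count whole-word, case-insensitive occurrences of `words` in byte lines.

    `count_words` is a hypothetical helper; `lines` is any iterable of bytes,
    such as the response object returned by urllib.request.urlopen().
    """
    wanted = set(words)
    count = {word: 0 for word in words}
    for raw in lines:
        # Decode bytes to str (UTF-8 is an assumption) and normalise case.
        text = raw.decode('utf-8', errors='replace').lower()
        # Split into alphabetic tokens so punctuation does not stick to words.
        for token in re.findall(r'[a-z]+', text):
            if token in wanted:
                count[token] += 1
    return count


# A bytes token never equals a str word, so the original count stays at 0.
print([b'risk'].count('risk'))  # 0

sample = [b'We believe the risk is possible.\n',
          b'Risk factors: risks depend on demand.\n']
print(count_words(sample, ['believe', 'risk', 'depend']))
# {'believe': 1, 'risk': 2, 'depend': 1}
```

With the question's variables this would be called as count_words(response3, words). Note that sec.gov may reject requests that lack a descriptive User-Agent header, in which case the request itself fails before any counting happens.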