I found some posts about this subject and tried them, but I can't get them to work.
My code is:
#!/usr/bin/python
# -*- coding: utf-8 -*-
# Tested Python version: 2.7.12
#
# Run "./script.py [inputfile.txt] [outputfile.txt]"
#
# Exit codes:
# 1 - Python version not tested
# 2 - Wrong number command-line arguments
# 3 - Input file, with this name, does not exist
# 4 - Output file, with this name, already exists
# 5 - Problem with input file
# 6 - Problem with output file
import os, sys
import urllib2, re
# Check python version
req_version = (2, 7)
if not sys.version_info[:2] == req_version:
    print '...'
    print 'Not tested Python version (2.7).'
    print 'Your Python version: ', sys.version_info[:2]
    print '...'
    sys.exit(1)
# Check command-line arguments
if len(sys.argv) < 3:
    print '...'
    print 'Missing command-line argument(s).'
    print 'Argument list:', str(sys.argv)
    print '...'
    sys.exit(2)
# Check if files exist
if not os.path.exists(sys.argv[1]):
    print '...'
    print 'Input file %s was not found.' % sys.argv[1]
    print '...'
    sys.exit(3)
if os.path.exists(sys.argv[2]):
    print '---'
    print 'Output file %s already exists.' % sys.argv[2]
    print '---'
    sys.exit(4)
# Read input file line by line, make a list of URL-s and write the
# results to output file
inputfile = sys.argv[1]
outputfile = sys.argv[2]
print '---'
print 'Reading input file %s ..' % inputfile
print '---'
results = []
try:
    with open(inputfile, 'r') as in_f:
        for line in in_f:
            url = line.strip().split(',')[0]
            word = line.strip().split(',')[1]
            site = urllib2.urlopen(url).read()
            print 'Found "%s" on "%s" ->' % (word, url)
            # matches = re.search(word)
            # if re.search(word, url):
            # if len(matches) == 0:
            if site.find(word) != -1:
                print 'YES'
                results.append('.'.join(url, word + ' YES')))
            else:
                print 'NO'
                results.append('.'.join(url, word + ' NO')))
except:
    print 'Error reading the file'
    sys.exit(5)
#if not inputfile.closed:
#    inputfile.close()
print '>>>' + inputfile + ' closed: ' + inputfile.closed
print '...'
print 'Writing results to output file %s ..' % outputfile
print '...'
try:
    with open(outputfile, 'w'):
        for item in results:
            outputfile.write((results) + '\n')
        print '>>>' + outputfile.read()
except:
    print 'Error writing to file'
    sys.exit(6)
#if not outputfile.closed:
#    outputfile.close()
print '>>>' + outputfile + ' closed: ' + outputfile.closed
print ''
print '>>> End of script <<<'
print ''
When I run ./script.py inputfile_name.txt outputfile_name.txt, the except branch for reading the input file is triggered and the terminal shows:
...
Reading input file inputfile_name.txt ..
...
Error reading the file
Could somebody please point out the likely fault in my code? I can't figure it out.
EDIT: I moved the variables (url, word, site) into the for block and added a print after them. The script prints the url and word of the first line, but it does not print the "Found ..." % (word, url) line after that. If I remove the print of url and word, the script goes to the except branch right away.
EDIT2: I made the changes suggested by user Oluwafemi Sule. The script now works until the input file has multiple words (a sentence) after the URL; then it goes to the except branch again.
The error in your code comes from calling append on the results list with the wrong number of arguments:
results.append(url, word + ' YES')
list.append takes exactly one argument, so this can be rewritten to append a single string that joins url, word and the verdict with commas:
results.append(','.join((url, word, 'YES')))
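A minimal sketch of that fix, written so it runs under both Python 2.7 and Python 3. The helper name build_result and the example.com URLs are illustrative placeholders, not part of the question's script. It also catches the TypeError explicitly and prints it, which shows the message that the bare except: in the question hides:

```python
# Sketch: why the original append failed, and the one-argument fix.
# build_result is a hypothetical helper, not part of the question's script.

def build_result(url, word, found):
    """Return one comma-delimited line: url,word,YES or url,word,NO."""
    verdict = 'YES' if found else 'NO'
    # str.join takes a single iterable, so the three parts go in one tuple
    return ','.join((url, word, verdict))


results = []
try:
    # list.append accepts exactly one argument, so this raises TypeError
    results.append('http://example.com', 'python YES')
except TypeError as e:
    # Printing the exception shows the real cause a bare "except:" hides
    print('Real error: %s' % e)

results.append(build_result('http://example.com', 'python', True))
results.append(build_result('http://example.com', 'perl', False))
print(results)
```

Catching a specific exception (or at least printing it) instead of using a bare except: makes mistakes like this visible, rather than reporting every failure as "Error reading the file".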
BONUS:
The following code block:
url = line.strip().split(',')[0]
word = line.strip().split(',')[1]
can be rewritten as:
url, word = line.strip().split(',')
so the line is split only once.
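One caveat, offered as an assumption because the question does not show its input file: split(',') returns more than two fields whenever the search text itself contains a comma, and the two-name unpacking then raises ValueError, which the bare except: reports as a read error. Splitting only on the first comma avoids that case. A sketch (parse_line and the sample line are hypothetical):

```python
# Sketch: unpacking one "url,text" line, assuming the URL comes first and
# everything after the first comma is the text to search for.

def parse_line(line):
    """Split a line into (url, text) on the first comma only."""
    # A maxsplit of 1 keeps any further commas inside the search text
    url, word = line.strip().split(',', 1)
    return url, word


line = 'http://example.com,hello, world\n'
try:
    url, word = line.strip().split(',')  # three fields -> ValueError
except ValueError as e:
    print('Unpacking failed: %s' % e)

print(parse_line(line))
```

A line with no comma at all (for example a blank trailing line) would still raise ValueError here, so skipping empty lines before parsing may also be worth considering.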
The following blocks can be removed, because the with statement (a context manager) closes the file automatically when its block exits:
if not inputfile.closed:
    inputfile.close()
print '>>>' + inputfile + ' closed: ' + inputfile.closed
And
if not outputfile.closed:
    outputfile.close()
print '>>>' + outputfile + ' closed: ' + outputfile.closed
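A small check of that claim, using a temporary file in place of the real input file (the path is a placeholder). It also shows that the filename passed to open() is only a string with no closed attribute, so inputfile.closed in the lines above would raise AttributeError:

```python
import os
import tempfile

# Create a throwaway file to stand in for the real input file
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'r') as in_f:
    print('closed inside with: %s' % in_f.closed)  # False
print('closed after with: %s' % in_f.closed)       # True

# The filename itself is a str, which has no "closed" attribute
print(hasattr(path, 'closed'))                     # False

os.remove(path)
```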
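Lastly, nothing is ever written to the output file: with open(outputfile, 'w'): never binds the opened file to a name (for example as out_f), so outputfile.write(...) calls write on the filename string, which raises an AttributeError.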
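A sketch of the write step with the file object bound as out_f. It assumes results already holds strings in the url,word,VERDICT form produced by the joined append above; the sample values and the temporary output path are placeholders. It writes item rather than (results), since adding a list to a string raises TypeError, and it reopens the file in read mode, because a handle opened with 'w' cannot be read from:

```python
import os
import tempfile

# Placeholder results in the url,word,VERDICT format
results = ['http://example.com,python,YES', 'http://example.com,perl,NO']

# Hypothetical output path in a fresh temporary directory (file does not exist yet)
tmpdir = tempfile.mkdtemp()
outputfile = os.path.join(tmpdir, 'outputfile_name.txt')

# outputfile is only the filename; all writes go through the handle out_f
with open(outputfile, 'w') as out_f:
    for item in results:
        out_f.write(item + '\n')  # one result per line

# Reopen in read mode to check what was written
with open(outputfile, 'r') as in_f:
    contents = in_f.read()
print(contents)

# Clean up the temporary file and directory
os.remove(outputfile)
os.rmdir(tmpdir)
```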