Used the following to fix the problems (for the remaining issues, will change my code around). Sorry for the improper code format in my initial post.
import csv, re, mechanize
htmlML = br.response().read()  # br: a mechanize.Browser() that has already opened the page (setup not shown)
#escaping ? fixed the regex match
patMemberName = re.compile(r'<a href=/foo.php\?XID=(\d+) ><font color=#000000><b>(.*) </b>')
searchMemberName = re.findall(patMemberName,htmlML)
MembersCsv = 'path-to-csv'
MemberWriter = csv.writer(open(MembersCsv, 'wb')) #adding b fixed the \n in csv
for i in searchMemberName:
    MemberWriter.writerow(i)
    print (i)
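For anyone running this on Python 3, where opening the file with 'wb' no longer works with csv.writer, here is a minimal sketch of the same write step. It assumes Python 3; the sample rows and the temporary file path are made up for illustration and stand in for the findall results and path-to-csv.

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for the (XID, name) tuples returned by re.findall
rows = [('123', 'user'), ('456', 'another user')]

# Hypothetical output path; replace with the real CSV location
csv_path = os.path.join(tempfile.mkdtemp(), 'members.csv')

# On Python 3 the csv module expects a text-mode file opened with newline='',
# which plays the role that the 'b' flag played on Python 2
with open(csv_path, 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

# Read the file back to check that no blank lines ended up between rows
with open(csv_path, newline='') as f:
    written = list(csv.reader(f))
print(written)
```

The with block also closes the file, so the rows are flushed to disk before anything reads them.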
Thank you for your time
For question 1), you have to escape the ? in the pattern, because an unescaped ? is a quantifier that makes the preceding character optional.
import re
htmlML = '<a href=/foo.php?XID=123 ><font color=#000000><b>user</b>'
patMemberID = re.compile(r'<a href=/foo.php\?XID=(\d*) ><font color=#000000><b>user</b>')
searchMemberID = re.findall(patMemberID, htmlML)
print (len(searchMemberID))  # number of matches found
for i in searchMemberID:
    print (i)
Then the 123 can be extracted from the string: findall returns ['123'].
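To see why the escape matters, here is a small sketch comparing the unescaped and escaped forms on the same sample string. It is written for Python 3; the shortened patterns and the variable names are chosen only for this illustration, and re.escape is shown as one optional way to build the literal part without escaping by hand.

```python
import re

html = '<a href=/foo.php?XID=123 ><font color=#000000><b>user</b>'

# Unescaped: "p?" means "an optional p", so the pattern expects "XID" right
# after "/foo.ph" or "/foo.php" and never matches the literal "?" in the HTML
unescaped = re.findall(r'/foo.php?XID=(\d+)', html)

# Escaped: "\?" matches a literal question mark
escaped = re.findall(r'/foo\.php\?XID=(\d+)', html)

# re.escape escapes the special characters in a literal string automatically
prefix = re.escape('/foo.php?XID=')
auto_escaped = re.findall(prefix + r'(\d+)', html)

print(unescaped)
print(escaped)
print(auto_escaped)
```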
Question 2a)
You can use (.*?) to replace some string; the ? means a non-greedy match, so .* stops at the first point where the rest of the pattern can match.
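A short sketch of the difference, assuming Python 3 and a made-up input with two names on one line (the string and the variable names are hypothetical, not taken from the real page):

```python
import re

# Hypothetical input: two bold member names on the same line
html = '<b>alice </b><b>bob </b>'

# Greedy: .* grabs as much as possible, so the match runs to the last " </b>"
greedy = re.findall(r'<b>(.*) </b>', html)

# Non-greedy: .*? grabs as little as possible, so each name is matched separately
non_greedy = re.findall(r'<b>(.*?) </b>', html)

print(greedy)
print(non_greedy)
```

With the greedy form, the two names come back fused into one match; the non-greedy form returns them as separate items.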