python, regex, csv, screen-scraping

Regex and CSV issues in Python 2.7


I used the following to fix the problems (for the remaining issues, I will restructure my code). Sorry for the improper code formatting in my initial post.

import csv, re, mechanize

# br is assumed to be a mechanize.Browser() that has already opened the page (setup not shown)
htmlML = br.response().read()

# escaping ? fixed the regex match; \d+ captures the numeric XID
patMemberName = re.compile(r'<a href=/foo.php\?XID=(\d+) ><font color=#000000><b>(.*) </b>')
searchMemberName = re.findall(patMemberName, htmlML)

MembersCsv = 'path-to-csv'

# adding b (binary mode) fixed the extra \n in the csv
with open(MembersCsv, 'wb') as csvFile:
    MemberWriter = csv.writer(csvFile)
    for i in searchMemberName:
        MemberWriter.writerow(i)
        print (i)

Thank you for your time


Solution

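  • For question 1), you have to escape the ? in the pattern. An unescaped ? is a quantifier that makes the preceding character optional, so it never matches the literal ? in the URL.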

    import re
    
    htmlML = '<a href=/foo.php?XID=123 ><font color=#000000><b>user</b>'
    # raw string; \? matches a literal question mark
    patMemberID = re.compile(r'<a href=/foo.php\?XID=(\d*) ><font color=#000000><b>user</b>')
    
    searchMemberID = re.findall(patMemberID, htmlML)
    print len(searchMemberID)
    
    for i in searchMemberID:
        print (i)
    

    With the ? escaped, findall returns one match and 123 is extracted from the string.

    Question 2a)

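    You can use (.*?) in place of the literal text (such as user) to capture it. The ? after * means a non-greedy match, so the group stops at the first </b> rather than the last.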
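A minimal sketch of both points, in case it helps to see them side by side. The sample HTML and the names alice and bob are made up for illustration; only the link format comes from the question. The code is written to run on Python 2.7 and Python 3.

```python
import re

# Hypothetical sample: two member links on one line (names are invented).
html = ('<a href=/foo.php?XID=123 ><font color=#000000><b>alice</b>'
        '<a href=/foo.php?XID=456 ><font color=#000000><b>bob</b>')

# Unescaped "?" makes the preceding "p" optional, so the literal "?"
# in the HTML is never matched and findall finds nothing.
unescaped = re.findall(r'<a href=/foo\.php?XID=(\d+) >', html)

# Escaped "\?" matches the literal question mark.
escaped = re.findall(r'<a href=/foo\.php\?XID=(\d+) >', html)

# Greedy (.*) runs to the last </b>, swallowing both links into one match.
greedy = re.findall(r'XID=(\d+) ><font color=#000000><b>(.*)</b>', html)

# Non-greedy (.*?) stops at the first </b>, giving one tuple per link.
non_greedy = re.findall(r'XID=(\d+) ><font color=#000000><b>(.*?)</b>', html)

print(unescaped)
print(escaped)
print(len(greedy))
print(non_greedy)
```

Here the greedy pattern returns a single match whose second group contains everything up to the final </b>, while the non-greedy pattern returns one (XID, name) pair per link, which is the shape csv.writer expects for writerow.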