I have scraped tweets stored in a MySQL database. I can connect to it and query the column that contains the tweets' text. Now I want to parse that text and extract the hashtags into a CSV file.
So far, I have this code, which works up to the final loop:
import re
import MySQLdb
# connects to database
mydb = MySQLdb.connect(host='****',
                       user='****',
                       passwd='****',
                       db='****')
cursor = mydb.cursor()
# queries for column with tweets text
getdata = 'SELECT text FROM bitscrape'
cursor.execute(getdata)
results = cursor.fetchall()
for i in results:
    hashtags = re.findall(r"#(\w+)", i)
    print hashtags
I get the following error: TypeError: expected string or buffer. The problem is in the line hashtags = re.findall(r"#(\w+)", i).
Any suggestions?
Thanks!
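cursor.fetchall() returns a sequence of tuples, one tuple per row, even when the query selects a single column. re.findall() expects a string, so take the first element from each row and pass that to findall():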
for row in results:
    hashtags = re.findall(r"#(\w+)", row[0])
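Since the goal is a CSV file, here is a minimal sketch of how the same loop could feed the standard csv module. It assumes Python 3, and it uses a hard-coded sample_rows list as a stand-in for cursor.fetchall() so it runs without a database. The names extract_hashtags, write_hashtags, and sample_rows are made up for illustration, and skipping NULL text values is an assumption about how you would want to handle them.

```python
import csv
import io
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(rows):
    """Yield each hashtag found in the first column of every row."""
    for row in rows:
        text = row[0]  # each row is a tuple, even for a single column
        if text is None:  # assumption: NULL text values are skipped
            continue
        for tag in HASHTAG_RE.findall(text):
            yield tag


def write_hashtags(rows, out):
    """Write one hashtag per CSV row to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["hashtag"])  # header row
    for tag in extract_hashtags(rows):
        writer.writerow([tag])


# Stand-in for cursor.fetchall(); replace with the real query results.
sample_rows = [("Loving #python and #MySQL",), ("no tags here",), (None,)]

buf = io.StringIO()
write_hashtags(sample_rows, buf)
print(buf.getvalue())
```

To write to disk instead of an in-memory buffer, you could open a file with open("hashtags.csv", "w", newline="") (the filename is only an example) and pass it as out, with cursor.fetchall() as rows.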
Hope that helps.