I'm quite familiar with Python and only know the basics of R, so for a class that requires "use of R", I'm leaning heavily on the reticulate library.
I've used this a number of times over the past month or two without issues; however, today I defined a class. I instantiated the class without issues, but when I tried to call a method, it returned the error: AttributeError: 'TweetGrabber' object has no attribute 'user_search'
I'll break my code up into what has worked and what has not, starting with the working:
library('reticulate')
## See the below link to download Python if NOT installed locally.
# https://www.anaconda.com/distribution/
py_config()
use_python(python = '/usr/local/bin/python3')
py_available()
py_install("tweepy")
### === Starts Python environment within R! ===
repl_python()
class TweetGrabber(): # Wrapper for Twitter API.
    def __init__(self):
        import tweepy
        self.tweepy = tweepy
        myApi = 'my_key'
        sApi = 'my_s_key'
        at = 'my_at'
        sAt = 'my_s_at'
        auth = tweepy.OAuthHandler(myApi, sApi)
        auth.set_access_token(at, sAt)
        self.api = tweepy.API(auth)

    def strip_non_ascii(self,string):
        ''' Returns the string without non ASCII characters'''
        stripped = (c for c in string if 0 < ord(c) < 127)
        return ''.join(stripped)

    def keyword_search(self,keyword,csv_prefix):
        import csv
        API_results = self.api.search(q=keyword,rpp=1000,show_user=True)
        with open(f'{csv_prefix}.csv', 'w', newline='') as csvfile:
            fieldnames = ['tweet_id', 'tweet_text', 'date', 'user_id', 'follower_count',
                          'retweet_count','user_mentions']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            for tweet in API_results:
                text = self.strip_non_ascii(tweet.text)
                date = tweet.created_at.strftime('%m/%d/%Y')
                writer.writerow({
                    'tweet_id': tweet.id_str,
                    'tweet_text': text,
                    'date': date,
                    'user_id': tweet.user.id_str,
                    'follower_count': tweet.user.followers_count,
                    'retweet_count': tweet.retweet_count,
                    'user_mentions':tweet.entities['user_mentions']
                })

    def user_search(self,user,csv_prefix):
        import csv
        API_results = self.tweepy.Cursor(self.api.user_timeline,id=user).items()
        with open(f'{csv_prefix}.csv', 'w', newline='') as csvfile:
            fieldnames = ['tweet_id', 'tweet_text', 'date', 'user_id', 'user_mentions', 'retweet_count']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            for tweet in API_results:
                text = self.strip_non_ascii(tweet.text)
                date = tweet.created_at.strftime('%m/%d/%Y')
                writer.writerow({
                    'tweet_id': tweet.id_str,
                    'tweet_text': text,
                    'date': date,
                    'user_id': tweet.user.id_str,
                    'user_mentions':tweet.entities['user_mentions'],
                    'retweet_count': tweet.retweet_count
                })

t = TweetGrabber() # Instantiates the class we've designed
This next line is what triggers the error.
t.user_search(user='Telsa',csv_prefix='tesla_tweets') # Find and save to csv Tesla tweets
Of note, I've run this code in plain Python and it works like a charm. The goal is just a simple API wrapper (for the tweepy API wrapper) so that I can grab tweets and store them in a csv with one line of code.
I am aware that there are Twitter APIs in the R world. I'm working on a compressed timeline, so I'm trying to avoid learning twitteR unless that's the only option. If it's really an issue, I can remove the class architecture and call the functions directly.
I'm puzzled why reticulate can handle so much yet falls short of executing class methods. Is there an issue in my code? Does this go beyond what reticulate is scoped to do?
TL;DR: In the REPL, empty lines mark the end of the class body. What follows is defined in the global scope rather than in the class scope.
It seems that whatever content follows the repl_python() call is pasted directly into the reticulate REPL (stripping excess indentation). Here an empty line denotes the end of the class definition. The code for your __init__ is followed by an empty line, and hence the class definition ends there. The functions that follow are not defined in class scope but in the global scope. Consider the following example, where I paste some sample code for a class:
> library('reticulate')
> repl_python()
Python 3.8.1 (/home/a_guest/miniconda3/envs/py38/bin/python3)
Reticulate 1.14 REPL -- A Python interpreter in R.
>>> class Foo:
...     def __init__(self):
...         self.x = 1
...
>>> def get_x(self):
...     return self.x
...
>>>
As you can see from the >>> prompt that follows the code for the __init__ function, the REPL returns to global scope. This is because the preceding line is empty. One difference from the standard Python REPL is that the latter would complain about the mismatched indentation of the functions that follow. Let's check the class defined above:
>>> Foo.get_x
AttributeError: type object 'Foo' has no attribute 'get_x'
>>> get_x
<function get_x at 0x7fc7fd490430>
Evidently, get_x has been defined in the global scope.
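The same blank-line rule can be reproduced outside reticulate with the standard library's code.InteractiveConsole, which mimics the interactive interpreter. This is only a sketch: the get_x lines are dedented by hand to imitate what the reticulate REPL appears to do with pasted code, and the names ns, pasted and more are made up for the illustration.

```python
import code

# Hypothetical namespace standing in for the REPL's global scope.
ns = {"__name__": "__console__"}
console = code.InteractiveConsole(locals=ns)

pasted = [
    "class Foo:",
    "    def __init__(self):",
    "        self.x = 1",
    "",                   # empty line: the class body ends here
    "def get_x(self):",   # dedented by hand (assumption, see above)
    "    return self.x",
    "",                   # empty line: ends the function definition
]

# push() returns True while the console is still waiting for more input.
more = [console.push(line) for line in pasted]

print(more)
print(hasattr(ns["Foo"], "get_x"))  # is get_x a method of Foo?
print("get_x" in ns)                # or a function in the global scope?
```

The flag returned by push() drops to False at each empty line, which is the point where the pending block is compiled and executed.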
The solution is either to remove the empty lines or to make them non-empty by adding spaces. So for example:
class Foo:
    def __init__(self):
        self.x = 1
    def get_x(self):
        return self.x
Or using spaces:
class Foo:
    def __init__(self):
        self.x = 1
    # this line contains some spaces
    def get_x(self):
        return self.x
The number of spaces is not important; the line just must not be empty.
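If the class is long, padding the empty lines by hand gets tedious. A small helper along these lines could do it before pasting. This is a minimal sketch: pad_blank_lines is a hypothetical name, not part of reticulate, and it assumes that only empty lines followed by an indented line need padding, so that empty lines before top-level statements stay empty.

```python
def pad_blank_lines(source: str) -> str:
    """Put a single space on empty lines that sit inside an indented block."""
    lines = source.split("\n")
    padded = []
    for i, line in enumerate(lines):
        if line.strip() == "":
            # Look ahead to the next non-blank line, if there is one.
            following = next((l for l in lines[i + 1:] if l.strip()), "")
            # Pad only when the block continues, i.e. the next line is indented.
            padded.append(" " if following[:1] in (" ", "\t") else "")
        else:
            padded.append(line)
    return "\n".join(padded)


sample = (
    "class Foo:\n"
    "    def __init__(self):\n"
    "        self.x = 1\n"
    "\n"
    "    def get_x(self):\n"
    "        return self.x\n"
    "\n"
    "foo = Foo()\n"
)

print(pad_blank_lines(sample))
```

Note that the helper is line-based, so it would also pad empty lines inside a multi-line string literal; for code like the class above that does not matter.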