Tags: python, r, oop, tweepy, reticulate

r reticulate OOP methods failure


I'm quite familiar with Python and only know the basics of R, so for a class that requires "use of R", I'm leaning heavily on the reticulate library.

I've used this a number of times over the past month or two without issues; however, today I defined a class. I instantiated the class without issues but when I tried to call a method it returned the error AttributeError: 'TweetGrabber' object has no attribute 'user_search'

I'll break my code up into what has worked and what has not, starting with the working:

library('reticulate')

## See the below link to download Python if NOT installed locally.
# https://www.anaconda.com/distribution/

py_config()
use_python(python = '/usr/local/bin/python3')
py_available()
py_install("tweepy")

### === Starts Python environment within R! ===
repl_python()

class TweetGrabber(): # Wrapper for Twitter API.

  def __init__(self):
    import tweepy
    self.tweepy = tweepy
    myApi = 'my_key'
    sApi = 'my_s_key'
    at = 'my_at'
    sAt = 'my_s_at'
    auth = tweepy.OAuthHandler(myApi, sApi)
    auth.set_access_token(at, sAt)
    self.api = tweepy.API(auth)


  def strip_non_ascii(self,string):
    ''' Returns the string without non ASCII characters'''
    stripped = (c for c in string if 0 < ord(c) < 127)
    return ''.join(stripped)

  def keyword_search(self,keyword,csv_prefix):
    import csv        
    API_results = self.api.search(q=keyword,rpp=1000,show_user=True)

    with open(f'{csv_prefix}.csv', 'w', newline='') as csvfile:
      fieldnames = ['tweet_id', 'tweet_text', 'date', 'user_id', 'follower_count',
                'retweet_count','user_mentions']
      writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
      writer.writeheader()

      for tweet in API_results:
        text = self.strip_non_ascii(tweet.text)
        date = tweet.created_at.strftime('%m/%d/%Y')        
        writer.writerow({
          'tweet_id': tweet.id_str,
          'tweet_text': text,
          'date': date,
          'user_id': tweet.user.id_str,
          'follower_count': tweet.user.followers_count,
          'retweet_count': tweet.retweet_count,
          'user_mentions':tweet.entities['user_mentions']
          })        

  def user_search(self,user,csv_prefix):
    import csv
    API_results = self.tweepy.Cursor(self.api.user_timeline,id=user).items()

    with open(f'{csv_prefix}.csv', 'w', newline='') as csvfile:
      fieldnames = ['tweet_id', 'tweet_text', 'date', 'user_id', 'user_mentions', 'retweet_count']
      writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
      writer.writeheader()

      for tweet in API_results:
        text = self.strip_non_ascii(tweet.text)
        date = tweet.created_at.strftime('%m/%d/%Y')        
        writer.writerow({
        'tweet_id': tweet.id_str,
        'tweet_text': text,
        'date': date,
        'user_id': tweet.user.id_str,
        'user_mentions':tweet.entities['user_mentions'],
        'retweet_count': tweet.retweet_count
          })


t = TweetGrabber() # Instantiates the class we've designed

This next line is what triggers the error.

t.user_search(user='Tesla',csv_prefix='tesla_tweets') # Find and save to csv Tesla tweets

Of note, I've run this code in Python and it works like a charm. The goal is just a simple API wrapper (for the tweepy API wrapper) so that I can grab and store tweets in a csv with 1 line of code.

I am aware that there are Twitter APIs in the R world. I'm working on a compressed timeline, so I'm trying to avoid learning twitteR unless that's the only option. If it's really an issue, I can remove the class architecture and call the functions directly.

I'm puzzled why reticulate can handle so much yet falls short of executing class methods. Is there an issue in my code? Does this go beyond what reticulate is scoped to do?


Solution

  • TL;DR: In the REPL, empty lines mark the end of the class body. What follows is defined in the global scope rather than in the class scope.


    It seems that whatever follows repl_python() is pasted directly into the reticulate REPL (with excess indentation stripped). In the REPL, an empty line denotes the end of the class definition. An empty line follows the code for your __init__, so the class definition ends there. The functions after it are defined not in the class scope but in the global scope. Consider the following example, where I paste some sample code for a class:

    > library('reticulate')
    > repl_python()
    Python 3.8.1 (/home/a_guest/miniconda3/envs/py38/bin/python3)
    Reticulate 1.14 REPL -- A Python interpreter in R.
    >>> class Foo:
    ...     def __init__(self):
    ...         self.x = 1
    ... 
    >>>     def get_x(self):
    ...         return self.x
    ... 
    >>>
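The same line-by-line behavior can be reproduced outside of R with Python's standard-library console class. This is only a sketch of the general mechanism (an empty line completes a compound statement in interactive mode); it uses code.InteractiveConsole and is not reticulate's own REPL implementation, which may differ in details. The names ns and pasted are illustrative.

```python
import code

# Namespace the console executes into (plays the role of the global scope).
ns = {}
console = code.InteractiveConsole(locals=ns)

# Lines as they would arrive one at a time when pasted into a REPL.
pasted = [
    "class Foo:",
    "    def __init__(self):",
    "        self.x = 1",
    "",  # the empty line: completes the class statement
]

# push() returns True while the console is still waiting for more input
# and False once the buffered statement was complete and has been run.
needs_more = [console.push(line) for line in pasted]
print(needs_more)  # [True, True, True, False]

# The class now exists, but only with what came before the empty line.
print(hasattr(ns["Foo"], "get_x"))  # False
```

Once push() returns False the class body is closed, so anything pasted afterwards can no longer become part of Foo.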
    

    As you can see from the >>> prompt that follows the code for the __init__ function, the REPL has returned to the global scope. This is because the preceding line is empty. One difference from the standard Python REPL is that the latter would complain about the mismatched indentation of the functions that follow. Let's check the class defined above:

    >>> Foo.get_x
    AttributeError: type object 'Foo' has no attribute 'get_x'
    >>> get_x
    <function get_x at 0x7fc7fd490430>
    

    Clearly, get_x has been defined in the global scope.
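If the session is already in this state, one possible workaround (not part of the original answer, just a sketch in plain Python) is to attach the stray global function to the class after the fact. The snippet below recreates the situation by defining get_x at module level and then assigns it as a class attribute:

```python
class Foo:
    def __init__(self):
        self.x = 1

# Recreates what happened in the REPL: get_x ended up outside the class.
def get_x(self):
    return self.x

# Not a method yet: the class namespace does not contain it.
print("get_x" in vars(Foo))  # False

# Assigning a plain function to a class attribute turns it into a method.
Foo.get_x = get_x
print(Foo().get_x())  # 1
```

This only patches the current session; fixing the pasted source as described under Solutions is the more durable fix.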

    Solutions

    The solution is either to remove the empty lines or to make them non-empty by adding spaces. So for example:

    class Foo:
        def __init__(self):
            self.x = 1
        def get_x(self):
            return self.x
    

    Or using spaces:

    class Foo:
        def __init__(self):
            self.x = 1
                              # this line contains some spaces
        def get_x(self):
            return self.x
    

    The number of spaces is not important; the line just must not be empty.
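For longer classes, padding the empty lines by hand is tedious. The sketch below shows one way to automate it before pasting; fill_blank_lines is a hypothetical helper written for this illustration, not part of reticulate or of the original answer. It gives each empty line the indentation of the next non-empty line, so empty lines before top-level code stay empty and still end the block. It does not parse the source, so it would also pad empty lines inside multi-line strings.

```python
def fill_blank_lines(source: str) -> str:
    """Pad each empty line with the indentation of the next non-empty line."""
    lines = source.splitlines()
    result = []
    for i, line in enumerate(lines):
        if line.strip():
            # Non-empty line: keep as-is.
            result.append(line)
            continue
        # Empty line: look ahead for the next line that has content.
        indent = ""
        for nxt in lines[i + 1:]:
            if nxt.strip():
                indent = nxt[: len(nxt) - len(nxt.lstrip())]
                break
        result.append(indent)
    return "\n".join(result)


src = (
    "class Foo:\n"
    "    def __init__(self):\n"
    "        self.x = 1\n"
    "\n"
    "    def get_x(self):\n"
    "        return self.x\n"
)
padded = fill_blank_lines(src)
# The formerly empty line now holds four spaces.
print(repr(padded.split("\n")[3]))  # '    '
```

The padded text defines the same class as the original source; only the whitespace on the separator lines changes.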