Tags: python, twitter, tweepy, twitterapi-python

Feature extraction with tweet IDs


I am trying to extract information about a tweet from its ID. From the ID, I want to get the tweet's creation date, its text, the user's location, followers, friends, favorites, profile description and verified status, and the tweet's language, but I'm having trouble doing it. Below are the steps I follow.

I wrote the following code. First, I have the tweet IDs in a txt file, which I read as follows:

# Read txt file
txt = '/content/drive/MyDrive/Mini-proyecto Texto/archivo.txt'
with open(txt) as archivo:
  lines = archivo.readlines()

Next, I add each of the IDs to a list:

# Add the IDs to a list
IDs = []
for i in lines:
  IDs.append(i.rsplit())
  #print(i.rsplit())

IDs
#[['1206924075374956547'],
# ['1210912199402819584'],
# ['1210643148998938625'],
# ['1207776839697129472'],
# ['1203627609759920128'],
# ['1205895318212136961'],
# ['1208145724879364100'], ...

Finally, I extract the information I need as follows:

# Extract information from tweets
tweets_df2 = pd.DataFrame()
for i in IDs:
  try:
    info_tweet = api.get_status(i, tweet_mode="extended")  
  except:
    pass

  tweets_df2 = tweets_df2.append(pd.DataFrame({'ID': info_tweet.id,
                                             'Tweet': info_tweet.full_text,
                                             'Creado_tweet': info_tweet.created_at,
                                             'Locacion_usuario': info_tweet.user.location,
                                             'Seguidores_usuario': info_tweet.user.followers_count,
                                             'Amigos_usuario': info_tweet.user.friends_count,
                                             'Favoritos_usuario': info_tweet.user.favourites_count,
                                             'Descripcion_usuario': info_tweet.user.description,
                                             'Verificado_usuario': info_tweet.user.verified,
                                             'Idioma': info_tweet.lang}, index=[0]))
  tweets_df2 = tweets_df2.reset_index(drop=True)


tweets_df2

The following image is the output of the tweets_df2 variable, but I don't understand why the values are repeated over and over again. Does anyone know what's wrong with my code?

[Image: output of the tweets_df2 variable]

If you need the txt file, here is the Drive link: https://drive.google.com/file/d/1vyohQMpLqlKqm6b4iTItcVVqL2wBMXWp/view?usp=sharing

Thank you very much in advance for your time :3


Solution

  • Your code basically runs fine for me with a few adjustments. I noticed that your indentation is not correct and that your list of IDs is a list of lists (each with just one element) rather than a single flat list of IDs.
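To illustrate the list-of-lists point, here is a minimal sketch using two of the IDs from the question. It assumes each line of the file holds one ID followed by a newline, which is what `readlines()` would return. The names `nested` and `flat` are only for illustration.

```python
# Two lines as readlines() would return them, trailing newline included
lines = ["1206924075374956547\n", "1210912199402819584\n"]

# rsplit() with no arguments splits on whitespace and returns a list per line
nested = [line.rsplit() for line in lines]

# strip() removes surrounding whitespace and returns a plain string per line
flat = [line.strip() for line in lines]

print(nested)  # [['1206924075374956547'], ['1210912199402819584']]
print(flat)    # ['1206924075374956547', '1210912199402819584']
```

With the nested version, each `i` passed to `api.get_status` is a one-element list rather than an ID string.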

    Try this:

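    import pandas as pd
    import tweepy
    
    # api_key, api_key_secret, access_token, access_token_secret and bearer_token
    # must hold your own credentials.
    # Twitter API v2 client. It is not used below: `api` is reassigned to the
    # v1.1 API object, which is the one that provides get_status().
    api = tweepy.Client(consumer_key=api_key, 
                           consumer_secret=api_key_secret,
                           access_token=access_token, 
                           access_token_secret=access_token_secret,
                           bearer_token=bearer_token, 
                           wait_on_rate_limit=True,
                            )
    
    auth = tweepy.OAuth1UserHandler(
       api_key, api_key_secret, access_token, access_token_secret
    )
    
    api = tweepy.API(auth)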
    
    txt = "misocorpus-misogyny.txt"
    with open(txt) as archivo:
        lines = archivo.readlines()
    
    IDs = []
    for i in lines:
        IDs.append(i.strip())  # strip() removes the trailing \n and returns a string; split() would return a list
    
    tweets_df2 = pd.DataFrame()
    
    for i in IDs:
        try:
            info_tweet = api.get_status(i, tweet_mode="extended")  
        except:
            pass
        
        tweets_df2 = tweets_df2.append(pd.DataFrame({'ID': info_tweet.id,
                                                 'Tweet': info_tweet.full_text,
                                                 'Creado_tweet': info_tweet.created_at,
                                                 'Locacion_usuario': info_tweet.user.location,
                                                 'Seguidores_usuario': info_tweet.user.followers_count,
                                                 'Amigos_usuario': info_tweet.user.friends_count,
                                                 'Favoritos_usuario': info_tweet.user.favourites_count,
                                                 'Descripcion_usuario': info_tweet.user.description,
                                                 'Verificado_usuario': info_tweet.user.verified,
                                                 'Idioma': info_tweet.lang}, index=[0]))
        tweets_df2 = tweets_df2.reset_index(drop=True)
    
    tweets_df2
    

    Result:

        ID  Tweet   Creado_tweet    Locacion_usuario    Seguidores_usuario  Amigos_usuario  Favoritos_usuario   Descripcion_usuario     Verificado_usuario  Idioma
    0   1206924075374956547     Las feminazis quieren por poco que este chico ...   2019-12-17 13:08:17+00:00   Argentina   1683    2709    28982   El Progresismo es un Cáncer que quiere destrui...   False   es
    1   1210912199402819584     @CarlosVerareal @Galois2807 Los halagos con pi...   2019-12-28 13:15:40+00:00   Ecuador     398     1668    3123    Cuando te encuentres n una situación imposible...   False   es
    2   1210643148998938625     @drummniatico No se vaya asustar! Ese es el gr...   2019-12-27 19:26:34+00:00   Samborondon - Ecuador   1901    1432    39508   Todo se alinea a nuestro favor. 💙💙💙💙    False   es
    3   1210643148998938625     @drummniatico No se vaya asustar! Ese es el gr...   2019-12-27 19:26:34+00:00   Samborondon - Ecuador   1901    1432    39508   Todo se alinea a nuestro favor. 💙💙💙💙    False   es
    4   1203627609759920128     Mostritas #Feminazi amenazando como ellas sabe...   2019-12-08 10:49:19+00:00   Lima, Perú  2505    3825    45087   Latam News Report. Regional and World affairs....   False
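
Rows 2 and 3 of the result above are identical, and the ID that follows 1210643148998938625 in the question's list (1207776839697129472) does not appear. That is consistent with `api.get_status` raising an error for that ID: `except: pass` swallows the error, `info_tweet` still holds the previous tweet, and it is appended a second time. The same mechanism would explain repeated values in the original output. Below is a minimal sketch of one way to skip failed lookups instead. `collect_rows` and `fake_fetch` are hypothetical names, and the stub stands in for the real API call so the example runs without credentials; only three of the columns are shown.

```python
from types import SimpleNamespace


def collect_rows(ids, fetch):
    """Return one dict per tweet that `fetch` could retrieve, plus the IDs that failed."""
    rows, failed = [], []
    for tweet_id in ids:
        try:
            tweet = fetch(tweet_id)
        except Exception:
            # With Tweepy v4 this could likely be narrowed to tweepy.errors.TweepyException
            failed.append(tweet_id)
            continue  # skip this ID instead of reusing the previous tweet
        rows.append({
            "ID": tweet.id,
            "Tweet": tweet.full_text,
            "Idioma": tweet.lang,
            # add the remaining columns from the original code the same way
        })
    return rows, failed


def fake_fetch(tweet_id):
    """Stub standing in for the API call; pretends that ID "2" is unavailable."""
    if tweet_id == "2":
        raise LookupError("tweet not available")
    return SimpleNamespace(id=int(tweet_id), full_text=f"text {tweet_id}", lang="es")


rows, failed = collect_rows(["1", "2", "3"], fake_fetch)
print([r["ID"] for r in rows])  # [1, 3]
print(failed)                   # ['2']
```

With the real client, `fetch` could be `lambda i: api.get_status(i, tweet_mode="extended")`, and `pd.DataFrame(rows)` would build the frame in one step. That also avoids `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0.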