Tags: python, date, twitter, data-science, tweepy

Extract date from tweets (Tweepy, Python)


I'm new to Python and struggling a bit with this. The code below gets the text of tweets containing the hashtag #bitcoin, and I want to extract the date and author as well as the text. I've tried different things, but I'm stuck right now. I'd greatly appreciate any help with this.

import pandas as pd
import numpy as np
import tweepy

api_key = '*'
api_secret_key = '*'
access_token = '*'
access_token_secret = '*'

authentication = tweepy.OAuthHandler(api_key, api_secret_key)
authentication.set_access_token(access_token, access_token_secret)
api = tweepy.API(authentication, wait_on_rate_limit=True)

#Get tweets about Bitcoin and filter out any retweets
search_term = '#bitcoin -filter:retweets'
tweets = tweepy.Cursor(api.search_tweets, q=search_term, lang='en', since='2018-11-01', tweet_mode='extended').items(50)
all_tweets = [tweet.full_text for tweet in tweets]


df = pd.DataFrame(all_tweets, columns=['Tweets'])
df.head()

Solution

  • Calling dir(tweet) lists all the attributes and methods of the tweet object:

    author
    contributors
    coordinates
    created_at
    destroy
    display_text_range
    entities
    extended_entities
    favorite
    favorite_count
    favorited
    full_text
    geo
    id
    id_str
    in_reply_to_screen_name
    in_reply_to_status_id
    in_reply_to_status_id_str
    in_reply_to_user_id
    in_reply_to_user_id_str
    is_quote_status
    lang
    metadata
    parse
    parse_list
    place
    possibly_sensitive
    retweet
    retweet_count
    retweeted
    retweets
    source
    source_url
    truncated
    user
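A listing like the one above can be produced by filtering the output of dir() down to names without a leading underscore. This is only a minimal sketch: public_attributes is a hypothetical helper, and fake_tweet is a made-up stand-in object, since a real tweet only comes from the Tweepy cursor.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


# Hypothetical stand-in for a tweet object (values are invented).
fake_tweet = SimpleNamespace(
    full_text='example text',
    created_at='2022-03-26',
    lang='en',
)

# Print one attribute name per line, like the listing above.
print('\n'.join(public_attributes(fake_tweet)))
```
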
    

    The list includes created_at, so you can collect it together with the text:

    all_tweets = []
    
    for tweet in tweets:
        #print('\n'.join(dir(tweet)))
        all_tweets.append( [tweet.full_text, tweet.created_at] )
    
    df = pd.DataFrame(all_tweets, columns=['Tweets', 'Created At'])
    df.head()
    

    Result:

                                               Tweets                Created At
    0  @Ralvero Of course $KAWA ready for 100x 🚀#ETH ... 2022-03-26 13:51:06+00:00
    1  Pairs:1INCHUSDT \n SELL:1.58500\n Time :3/26/2...  2022-03-26 13:51:06+00:00
    2  @hotcrosscom @iSafePal 🌐 First LIVE Dapp: Cylu... 2022-03-26 13:51:04+00:00
    3  @Justdoitalex @Isabel_Schnabel Finally a truth...  2022-03-26 13:51:03+00:00
    4  #Bitcoin has rejected for the fourth time the ...  2022-03-26 13:50:55+00:00
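The question also asks for the author. The attribute list contains user and author, so a possible sketch is to read the handle from tweet.user.screen_name. The helper tweet_to_row and the stand-in objects below are hypothetical and only imitate the shape of a tweet, so the example runs without API credentials.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Collect text, creation time, and author handle from one tweet object."""
    return {
        'Tweets': tweet.full_text,
        'Created At': tweet.created_at,
        'Author': tweet.user.screen_name,
    }


# Hypothetical stand-in data; real objects would come from the Tweepy cursor.
fake_tweets = [
    SimpleNamespace(
        full_text='#Bitcoin example',
        created_at=datetime(2022, 3, 26, 13, 51, 6, tzinfo=timezone.utc),
        user=SimpleNamespace(screen_name='example_user'),
    ),
]

rows = [tweet_to_row(tweet) for tweet in fake_tweets]
print(rows[0]['Author'])
```

With real results, passing rows to pd.DataFrame(rows) should give a frame with the three columns Tweets, Created At, and Author.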
    

    Your code also has a problem with since: it seems this parameter was removed in Tweepy 3.8.

    See: Collect tweets in a specific time period in Tweepy, until and since doesn't work
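One workaround that does not depend on the search parameters is to filter on created_at after fetching. The sketch below assumes created_at is a timezone-aware datetime, as the +00:00 in the result above suggests. The function name, the cutoff value, and the stand-in tweets are made up for illustration.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def created_on_or_after(tweets, cutoff):
    """Keep only the tweets whose created_at is at or after the cutoff."""
    return [tweet for tweet in tweets if tweet.created_at >= cutoff]


# Hypothetical cutoff; it must be timezone-aware to compare with aware datetimes.
cutoff = datetime(2022, 3, 26, 13, 51, 0, tzinfo=timezone.utc)

# Hypothetical stand-in tweets with invented timestamps.
fake_tweets = [
    SimpleNamespace(
        full_text='newer',
        created_at=datetime(2022, 3, 26, 13, 51, 6, tzinfo=timezone.utc),
    ),
    SimpleNamespace(
        full_text='older',
        created_at=datetime(2022, 3, 26, 13, 50, 55, tzinfo=timezone.utc),
    ),
]

recent = created_on_or_after(fake_tweets, cutoff)
print([tweet.full_text for tweet in recent])
```

This only narrows down what the search already returned; it cannot bring back tweets the search itself did not deliver.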