python, selenium-webdriver, web-scraping, scrapy

How to Adjust Nitter Scraper to Print New Tweets in Real-Time?


I'm using the ntscraper library to fetch tweets from a specific user. Currently, the script fetches the most recent tweet, but it only sees tweets that already exist at the moment the script runs. Here's the code I'm using:

from ntscraper import Nitter
import pandas as pd

# Initialize the scraper
scraper = Nitter()

# Fetch the most recent tweet (limit to 1)
tweets_data = scraper.get_tweets("Vader_AI_", mode='user', number=1)

# Extract the latest tweet
if tweets_data and 'tweets' in tweets_data and len(tweets_data['tweets']) > 0:
    latest_tweet = tweets_data['tweets'][0]  # First tweet is the most recent
    print("Latest Tweet:")
    print(f"Text: {latest_tweet['text']}")
    print(f"Link: {latest_tweet['link']}")

    # Optional: Save to CSV
    df = pd.DataFrame([latest_tweet])
    df.to_csv('latest_tweet.csv', index=False)
    print("Latest tweet saved to latest_tweet.csv")
else:
    print("No tweets found.")

Is there a way to adjust this so that it continuously monitors the Twitter page and prints a new tweet in real-time as soon as it is posted? Essentially, I'd like the script to wait and detect new tweets instead of fetching older ones.

Would something like Selenium or Scrapy be necessary, or can this be achieved with ntscraper alone? I'm trying to avoid APIs.

Any suggestions on the best approach to implement this would be greatly appreciated.

Thank you.


Solution

  • Wrap that code in a function:

    def fetch(scraper):
    
        # Fetch the most recent tweet (limit to 1)
        tweets_data = scraper.get_tweets("Vader_AI_", mode='user', number=1)
        
        # Extract the latest tweet
        if tweets_data and 'tweets' in tweets_data and len(tweets_data['tweets']) > 0:
            latest_tweet = tweets_data['tweets'][0]  # First tweet is the most recent
            print("Latest Tweet:")
            print(f"Text: {latest_tweet['text']}")
            ...
    

    Now you can monitor for fresh tweets by calling that function in a loop, here once every 60 seconds:

    from time import sleep
    ...
    scraper = Nitter()
    
    while True:
        fetch(scraper)
        sleep(60)
    

    Notice that along with 'text' you get a timestamp field:

    >>> from pprint import pp
    >>> pp(tweets_data)
    {'tweets': [{'link': 'https://twitter.com/Vader_AI_/status/1877024486160998869#m',
                 'text': '$VU is currently trading at $7M MCap, up 21% in the last '
                         '24 hrs.',
                 'user': {'name': 'VaderAI',
                          'username': '@Vader_AI_',
                          'profile_id': '1863883494084276224',
                          'avatar': 'https://pbs.twimg.com/profile_images/1863883494084276224/MhLpfb2V_bigger.jpg'},
                 'date': 'Jan 8, 2025 · 4:08 PM UTC',
                 ...
    

    Use that 'date' field to suppress duplicates. Store it in a variable such as previous that outlives each call to fetch (for example, one kept alongside the polling loop), and report a tweet only when its 'date' differs from the stored value. Alternatively, compare whether the 'text' has changed.
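
The polling and duplicate-suppression steps above could be combined roughly as follows. This is a minimal sketch, not a tested ntscraper recipe: the helper names `latest_tweet` and `watch` and the `max_polls` parameter are hypothetical, and it compares the 'link' field rather than 'date' (an assumption on my part, since 'date' only has minute resolution while the link embeds the status ID). The scraper is passed in as an argument, so the only library call used is the `get_tweets(..., mode='user', number=1)` call from the question.

```python
from time import sleep


def latest_tweet(scraper, username):
    """Return the newest tweet dict for username, or None if nothing came back."""
    data = scraper.get_tweets(username, mode='user', number=1)
    if data and data.get('tweets'):
        return data['tweets'][0]  # first tweet is the most recent
    return None


def watch(scraper, username, interval=60, max_polls=None):
    """Poll every `interval` seconds and print each tweet only once.

    max_polls is a hypothetical knob for bounded runs; None means loop forever.
    Returns the list of tweets that were reported.
    """
    previous = None  # 'link' of the last tweet we reported
    reported = []
    polls = 0
    while max_polls is None or polls < max_polls:
        tweet = latest_tweet(scraper, username)
        # Report only when the newest tweet differs from the last one seen.
        if tweet is not None and tweet['link'] != previous:
            previous = tweet['link']
            print(f"{tweet['date']}: {tweet['text']}")
            print(f"Link: {tweet['link']}")
            reported.append(tweet)
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)
    return reported


# Intended use with the scraper from the question (runs until interrupted):
#     from ntscraper import Nitter
#     watch(Nitter(), "Vader_AI_")
```

Note that the first poll reports whatever tweet is already the latest; to print only tweets posted after startup, one option is to fetch once before the loop and seed `previous` with that tweet's link. Because only one tweet is requested per poll, several tweets posted within a single interval would show up as just the newest one.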