python web-scraping telegram reddit

Reddit Scraper and Telegram Bot


I want to scrape scientific news from the "Science" subreddit and broadcast it through a Telegram bot to my Telegram channel. I’ve written two simple Python fragments, one for each task. What is the best way to combine them into a single script, so that the bot automatically sends the scraped posts to the channel each time the program runs? Both scripts work fine individually. Please advise.

Reddit Scraper

import praw

# assigning Reddit API data
# see further instructions here --> https://www.reddit.com/prefs/apps
reddit = praw.Reddit(client_id='XXXX',
                     client_secret='XXXXXXXXXXXXXXXXXXXXXXX',
                     user_agent='science_bot',
                     username='XXXXXX',
                     password='XXXXXXXXXXXXXXXXXX')

# select a subreddit you want to use for scraping data
subreddit = reddit.subreddit('science')
print("\t", "Digest of the latest scientific news for today: \n")
# print the title and link of the five newest submissions
for submission in subreddit.new(limit=5):
    print(submission.title)
    print(submission.url, "\n")

Posting Telegram Bot

import requests

def telegram_bot_sendtext(bot_message):
    bot_token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
    bot_chatID = '@XXXXXX'
    send_url = 'https://api.telegram.org/bot' + bot_token + '/sendMessage'
    # pass the fields as params so that requests URL-encodes the message text
    params = {'chat_id': bot_chatID, 'parse_mode': 'Markdown', 'text': bot_message}

    response = requests.get(send_url, params=params)

    return response.json()


test = telegram_bot_sendtext("Testing my new Telegram bot.")
print(test)

Thanks in advance!


Solution

  • I've solved the problem using the following code structure. Kudos to @maxwell for a simple and elegant idea.

    import html
    import praw
    import telebot
    
    bot_token = 'XXXXXXXXXXXXXXXXXXXXXXXXX'
    bot_chatID = '@your_channel_name'
    bot = telebot.TeleBot(bot_token)
    
    reddit = praw.Reddit(client_id='XXXXXXXXXXXXXX',
                         client_secret='XXXXXXXXXXXXXXXXXXXXXXXX',
                         user_agent='your_bot_name',
                         username='your_reddit_username',
                         password='XXXXXXXXXXXXXX')
    
    def reddit_scraper():
        # collect the title and link of the five newest submissions
        news_data = []
        subreddit = reddit.subreddit('name_of_subreddit')
        for submission in subreddit.new(limit=5):
            data = {}
            data['title'] = submission.title
            data['link'] = submission.url
            news_data.append(data)
        return news_data
    
    def get_msg(news_data):
        # build one HTML-formatted message; escape the text so that
        # characters such as < and & do not break Telegram's HTML parsing
        msg = '\n\n\n'
        for news_item in news_data:
            title = html.escape(news_item['title'])
            link = html.escape(news_item['link'])
            msg += title+'\n[<a href="'+link+'">Read the full article --&gt;</a>]'
            msg += '\n\n'
    
        return msg
    
    # scrape once per run and send the digest only if something was found
    news_data = reddit_scraper()
    if len(news_data) > 0:
        msg = get_msg(news_data)
        status = bot.send_message(chat_id=bot_chatID, text=msg, parse_mode='HTML')
        if status:
            print(status)
    else:
        print('No updates.')
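
For anyone who would rather stay close to the two original fragments and avoid an extra bot library, here is a minimal sketch of the same combination. It is not the code from the answer above: the function names `build_digest`, `build_request` and `send_digest` are made up for illustration, and it swaps `requests` for the standard library's `urllib`, so it has no third-party dependencies of its own. It assumes the posts arrive as `(title, url)` pairs and that the message is sent with `parse_mode` set to `HTML`.

```python
import html
import json
import urllib.parse
import urllib.request


def build_digest(posts):
    """Turn (title, url) pairs into one HTML-formatted Telegram message."""
    parts = ["<b>Digest of the latest scientific news for today:</b>"]
    for title, url in posts:
        # Escape both fields so that &, < and > cannot break the HTML markup.
        safe_title = html.escape(title)
        safe_url = html.escape(url)
        parts.append(f'{safe_title}\n<a href="{safe_url}">Read the full article</a>')
    return "\n\n".join(parts)


def build_request(bot_token, chat_id, text):
    """Return the sendMessage URL and the URL-encoded POST body."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    fields = {"chat_id": chat_id, "text": text, "parse_mode": "HTML"}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return url, body


def send_digest(bot_token, chat_id, posts):
    """Send the digest to the channel and return Telegram's decoded JSON reply."""
    url, body = build_request(bot_token, chat_id, build_digest(posts))
    # Passing data makes urlopen issue a form-encoded POST request.
    with urllib.request.urlopen(url, data=body) as response:
        return json.load(response)
```

With the `reddit` object from the question, the pairs could be collected as `posts = [(s.title, s.url) for s in reddit.subreddit('science').new(limit=5)]` and then passed to `send_digest(bot_token, '@your_channel_name', posts)`. Keeping the formatting and the request building in separate functions means both can be checked without touching the network.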