Run python script every hour


I want to schedule my Python script to run every hour and save the data in an Elasticsearch index. To do that I wrote a function, set_interval, in a script that uses the tweepy library. But it doesn't work the way I need: it runs every minute and saves the data to the index. Even after I set the interval to 3600 seconds, it still runs every minute. I want it to run on an hourly basis.

How can I fix this? Here's my Python script:

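from datetime import datetime
from threading import Thread, Timer

import tweepy
from elasticsearch import helpers


def call_at_interval(time, callback, args):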
    while True:
        timer = Timer(time, callback, args=args)
        timer.start()
        timer.join()


def set_interval(time, callback, *args):
    Thread(target=call_at_interval, args=(time, callback, args)).start()


def get_all_tweets(screen_name):
    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    screen_name = ""

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        #print
        #"getting tweets before %s" % (oldest)

        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)

        # save most recent tweets
        alltweets.extend(new_tweets)

        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        #print
        #"...%s tweets downloaded so far" % (len(alltweets))

    outtweets = [{'ID': tweet.id_str, 'Text': tweet.text, 'Date': tweet.created_at, 'author': tweet.user.screen_name} for tweet in alltweets]

    def save_es(outtweets, es):  # Peps8 convention
        data = [  # Please without s in data
            {
                "_index": "index name",
                "_type": "type name",
                "_id": index,
                "_source": ID
            }
            for index, ID in enumerate(outtweets)
        ]
        helpers.bulk(es, data)

    save_es(outtweets, es)

    print('Run at:')
    print(datetime.now())
    print("\n")

    set_interval(3600, get_all_tweets(screen_name))

Solution

  • Why do you need so much complexity to run a task every hour? You can run the script hourly as shown below. Note that each cycle takes one hour plus the time the work itself takes:

    import time
    
    
    def do_some_work():
        print("Do some work")
        time.sleep(1)
        print("Some work is done!")
    
    
    if __name__ == "__main__":
        time.sleep(60)  # imagine you would like to start work in 1 minute first time
        while True:
            do_some_work()
            time.sleep(3600)  # do work every one hour
    

    If you want the work to start every hour regardless of how long it takes, run it in a separate thread:

    import time
    import threading
    
    
    def do_some_work():
        print("Do some work")
        time.sleep(4)
        print("Some work is done!")
    
    
    if __name__ == "__main__":
        time.sleep(60)  # imagine you would like to start work in 1 minute first time
        while True:
            thr = threading.Thread(target=do_some_work)
            thr.start()
            time.sleep(3600)  # do work every one hour 
    

    Here thr is expected to finish its work in less than 3600 seconds. If it does not, the next attempt starts anyway and you still get results, but runs overlap and a result may belong to a different attempt. See the example below:

    import time
    import threading
    
    
    class AttemptCount:
        def __init__(self, attempt_number):
            self.attempt_number = attempt_number
    
    
    def do_some_work(_attempt_number):
        print(f"Do some work {_attempt_number.attempt_number}")
        time.sleep(4)
        print(f"Some work is done! {_attempt_number.attempt_number}")
        _attempt_number.attempt_number += 1
    
    
    if __name__ == "__main__":
        attempt_number = AttemptCount(1)
        time.sleep(1)  # initial delay, shortened to 1 second for this demo
        while True:
            thr = threading.Thread(target=do_some_work, args=(attempt_number,))
            thr.start()
            time.sleep(1)  # interval shortened to 1 second so that runs overlap
    

    The result you'll get in this case is:

    Do some work 1
    Do some work 1
    Do some work 1
    Do some work 1
    Some work is done! 1
    Do some work 2
    Some work is done! 2
    Do some work 3
    Some work is done! 3
    Do some work 4
    Some work is done! 4
    Do some work 5
    Some work is done! 5
    Do some work 6
    Some work is done! 6
    Do some work 7
    Some work is done! 7
    Do some work 8
    Some work is done! 8
    Do some work 9
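
One way to keep attempts from piling up, offered as a sketch rather than a drop-in solution: skip a tick while the previous worker thread is still alive, and schedule each tick from a time.monotonic() deadline so the interval does not drift. The names run_periodically, interval_seconds and max_runs are made up for this illustration; max_runs exists only so the loop can be bounded when trying it out.

```python
import threading
import time


def run_periodically(task, interval_seconds, max_runs=None):
    """Start `task` in a thread once per interval, skipping a tick if the
    previous run has not finished. Returns (started, skipped) counts.

    Hypothetical helper; with max_runs=None it loops forever.
    """
    next_deadline = time.monotonic()
    worker = None
    started = 0
    skipped = 0
    ticks = 0
    while max_runs is None or ticks < max_runs:
        if worker is None or not worker.is_alive():
            worker = threading.Thread(target=task)
            worker.start()
            started += 1
        else:
            # Previous attempt is still running, so do not start another.
            skipped += 1
        ticks += 1
        # Advance the deadline by a fixed step instead of sleeping a fixed
        # amount, so time spent in this loop does not accumulate as drift.
        next_deadline += interval_seconds
        time.sleep(max(0.0, next_deadline - time.monotonic()))
    if worker is not None:
        worker.join()
    return started, skipped


# Example (runs until interrupted): run_periodically(do_some_work, 3600)
```

With this guard the attempt counter from the example above would no longer be shared between overlapping runs, at the cost of occasionally missing an hour when a run is slow.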

    I like using subprocess.Popen for such tasks: if the child process does not finish its work within one hour for any reason, you just terminate it and start a new one.
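
The subprocess approach could look roughly like the following. This is a minimal sketch under assumptions: the helper names run_with_timeout and run_every are invented here, and the command to run (for example a script called fetch_tweets.py) is hypothetical. It relies only on Popen.wait(timeout=...), Popen.terminate() and subprocess.TimeoutExpired from the standard library.

```python
import subprocess
import time


def run_with_timeout(command, timeout_seconds):
    """Run `command` as a child process and return its exit code,
    or None if it had to be terminated after `timeout_seconds`."""
    child = subprocess.Popen(command)
    try:
        return child.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        child.terminate()
        child.wait()  # reap the terminated child
        return None


def run_every(command, interval_seconds, max_runs=None):
    """Start `command` once per interval; a run that overruns the interval
    is terminated before the next one starts. Returns the number of runs.

    With max_runs=None this loops forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started_at = time.monotonic()
        run_with_timeout(command, interval_seconds)
        runs += 1
        # Sleep for whatever is left of the interval, if anything.
        remaining = interval_seconds - (time.monotonic() - started_at)
        if remaining > 0:
            time.sleep(remaining)
    return runs


# Example (hypothetical script name):
# run_every(["python", "fetch_tweets.py"], 3600)
```

Because the work runs in a separate process, a hung run cannot block the scheduler, which is harder to guarantee with threads.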

    You can also use cron to schedule a process to run every hour; a crontab schedule of "0 * * * *" runs a command at the start of each hour.
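
As for the script in the question, one likely cause of the unexpected timing is the call set_interval(3600, get_all_tweets(screen_name)): it calls get_all_tweets immediately and passes its return value (None) as the callback, instead of passing the function itself. As posted, that call also sits inside the body of get_all_tweets, so each run would trigger the next one straight away; that indentation may only be a paste artifact. A minimal sketch of a set_interval that takes the callable and its arguments separately, assuming a threading.Event is acceptable for stopping it:

```python
import threading


def set_interval(interval_seconds, callback, *args):
    """Call callback(*args) every `interval_seconds` in a background thread.

    Returns a threading.Event; setting it stops the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False when the timeout elapses and True once
        # the event has been set, so this doubles as sleep and stop check.
        while not stop.wait(interval_seconds):
            callback(*args)

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Pass the function and its arguments, do not call it:
# stop = set_interval(3600, get_all_tweets, "some_screen_name")
```

The worker here is a daemon thread, so the main program has to stay alive (for example by doing other work or joining on something) for the callback to keep running.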