I want scheduled to run my python script every hour and save the data in elasticsearch index. So that I used a function I wrote, set_interval which uses the tweepy library. But it doesn't work as I need it to work. It runs every minute and save the data in index. Even after the set that seconds equal to 3600 it runs in every minute. But I want to configure this to run on an hourly basis.
How can I fix this? Heres my python script:
def call_at_interval(time, callback, args):
while True:
timer = Timer(time, callback, args=args)
timer.start()
timer.join()
def set_interval(time, callback, *args):
Thread(target=call_at_interval, args=(time, callback, args)).start()
def get_all_tweets(screen_name):
# authorize twitter, initialize tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
screen_name = ""
# initialize a list to hold all the tweepy Tweets
alltweets = []
# make initial request for most recent tweets (200 is the maximum allowed count)
new_tweets = api.user_timeline(screen_name=screen_name, count=200)
# save most recent tweets
alltweets.extend(new_tweets)
# save the id of the oldest tweet less one
oldest = alltweets[-1].id - 1
# keep grabbing tweets until there are no tweets left to grab
while len(new_tweets) > 0:
#print
#"getting tweets before %s" % (oldest)
# all subsiquent requests use the max_id param to prevent duplicates
new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest)
# save most recent tweets
alltweets.extend(new_tweets)
# update the id of the oldest tweet less one
oldest = alltweets[-1].id - 1
#print
#"...%s tweets downloaded so far" % (len(alltweets))
outtweets = [{'ID': tweet.id_str, 'Text': tweet.text, 'Date': tweet.created_at, 'author': tweet.user.screen_name} for tweet in alltweets]
def save_es(outtweets, es): # Peps8 convention
data = [ # Please without s in data
{
"_index": "index name",
"_type": "type name",
"_id": index,
"_source": ID
}
for index, ID in enumerate(outtweets)
]
helpers.bulk(es, data)
save_es(outtweets, es)
print('Run at:')
print(datetime.now())
print("\n")
set_interval(3600, get_all_tweets(screen_name))
Why do you need so much complexity to do some task every hour? You can run script every one hour this way below, note that it is runned 1 hour + time to do work:
import time
def do_some_work():
print("Do some work")
time.sleep(1)
print("Some work is done!")
if __name__ == "__main__":
time.sleep(60) # imagine you would like to start work in 1 minute first time
while True:
do_some_work()
time.sleep(3600) # do work every one hour
If you want to run script exactly every one hour, do the following code below:
import time
import threading
def do_some_work():
print("Do some work")
time.sleep(4)
print("Some work is done!")
if __name__ == "__main__":
time.sleep(60) # imagine you would like to start work in 1 minute first time
while True:
thr = threading.Thread(target=do_some_work)
thr.start()
time.sleep(3600) # do work every one hour
In this case thr is supposed to finish it's work faster than 3600 seconds, though it does not, you'll still get results, but results will be from another attempt, see the example below:
import time
import threading
class AttemptCount:
def __init__(self, attempt_number):
self.attempt_number = attempt_number
def do_some_work(_attempt_number):
print(f"Do some work {_attempt_number.attempt_number}")
time.sleep(4)
print(f"Some work is done! {_attempt_number.attempt_number}")
_attempt_number.attempt_number += 1
if __name__ == "__main__":
attempt_number = AttemptCount(1)
time.sleep(1) # imagine you would like to start work in 1 minute first time
while True:
thr = threading.Thread(target=do_some_work, args=(attempt_number, ),)
thr.start()
time.sleep(1) # do work every one hour
The result you'll gey in the case is:
Do some work 1 Do some work 1 Do some work 1 Do some work 1 Some work is done! 1 Do some work 2 Some work is done! 2 Do some work 3 Some work is done! 3 Do some work 4 Some work is done! 4 Do some work 5 Some work is done! 5 Do some work 6 Some work is done! 6 Do some work 7 Some work is done! 7 Do some work 8 Some work is done! 8 Do some work 9
I like using subprocess.Popen for such tasks, if the child subprocess did not finish it's work within one hour due to any reason, you just terminate it and start a new one.
You also can use CRON to schedule some process to run every one hour.