python, reddit, praw

Why does Reddit cloudsearch return wrong results for a timestamp search?


I have a problem with this search:

list(r.search('timestamp:{}..{}'.format(ts1, ts2), sort='new', subreddit=subreddit, syntax='cloudsearch', limit=None))

It returns the ~1000 newest submissions with timestamps from ts1 (in my case, the subreddit's creation time) to ts2.

My script does the following:

  1. Get the newest submissions.
  2. Take the creation time of the second-newest submission and use it as ts2.
  3. Repeat the search with the new timestamp range.
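
The three steps can be sketched against an in-memory stand-in for the search, to pin down what the second search is expected to return. This is a minimal sketch: `Post`, `fake_search`, `timestamp_query` and `two_step_search` are hypothetical names, and `fake_search` assumes the `timestamp:ts1..ts2` range is inclusive on both ends and returns results newest first (like `sort='new'`). It does not call reddit or PRAW.

```python
from collections import namedtuple

# Hypothetical stand-in for a PRAW submission: only the fields the steps use
Post = namedtuple("Post", ["fullname", "created_utc"])


def timestamp_query(ts1, ts2):
    """Build a cloudsearch-style range query such as 'timestamp:100..200'."""
    return "timestamp:{}..{}".format(int(ts1), int(ts2))


def fake_search(posts, query):
    """Hypothetical in-memory search: inclusive timestamp range, newest first."""
    bounds = query.split(":", 1)[1]
    lo, hi = (int(x) for x in bounds.split(".."))
    hits = [p for p in posts if lo <= p.created_utc <= hi]
    return sorted(hits, key=lambda p: p.created_utc, reverse=True)


def two_step_search(search, ts1, ts2):
    """Run the question's three steps with any `search(query)` callable."""
    # 1. Get the newest submissions in the window.
    first = search(timestamp_query(ts1, ts2))
    # 2. Use the creation time of the second-newest submission, minus one
    #    second, as the new upper bound.
    ts2 = int(first[1].created_utc) - 1
    # 3. Repeat the search with the narrowed window.
    second = search(timestamp_query(ts1, ts2))
    return first, second


# Nine example posts; t3_1 is the newest (illustrative data, not from reddit)
posts = [Post("t3_{}".format(i), 1000 - i) for i in range(1, 10)]
first, second = two_step_search(lambda q: fake_search(posts, q), 0, 2000)
print([p.fullname for p in first])   # t3_1 through t3_9
print([p.fullname for p in second])  # t3_3 through t3_9
```

With nine posts numbered newest first, the second search returns posts 3 through 9, which is the behaviour the question expects from reddit.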

If the first search returns submissions 1, 2, 3, 4, 5, 6, 7, 8, 9, I expect the second search to return 3, 4, 5, 6, 7, 8, 9. Unfortunately it doesn't; instead I get something like 7, 8, 9. Any idea why?

Below are sample results and my script.

Results (first search, then the second search after the ### separator):

t3_4zh8zw, 1472107937.0
t3_4zgl1n, 1472096403.0
t3_4zgf34, 1472093883.0
t3_4zg8de, 1472091260.0
t3_4zfzun, 1472087983.0
t3_4zfysv, 1472087571.0
t3_4zf8hg, 1472077921.0
t3_4zf7g6, 1472077542.0
t3_4zf4p5, 1472076595.0
t3_4zf0d7, 1472075090.0
t3_4zeqeg, 1472071708.0
t3_4zeomz, 1472071134.0
t3_4zebse, 1472066994.0
t3_4zduso, 1472061376.0
t3_4zdtne, 1472061014.0
#######################
t3_4zebse, 1472066994.0
t3_4zduso, 1472061376.0
t3_4zdtne, 1472061014.0
t3_4zdipi, 1472057168.0
t3_4zdfj3, 1472056078.0
t3_4zd4v3, 1472052437.0
t3_4zd0l5, 1472051081.0
t3_4zctiu, 1472048701.0
t3_4zazqj, 1472016633.0
t3_4zawm3, 1472015079.0
t3_4zavyc, 1472014757.0
t3_4za5hb, 1472003960.0
t3_4z9ydt, 1472001398.0
t3_4z9xhx, 1472001065.0
t3_4z9ufa, 1471999935.0
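
The sample results already show how large the discrepancy is. The sketch below copies the first 15 rows of the first search and the newest row of the second search from the output above, then computes which posts were skipped and bounds the shift between the requested upper limit ts2 and the cutoff the second search actually applied. It only does arithmetic on the numbers shown. It assumes the first search's rows are complete and correctly ordered for this time range, and the variable names are illustrative.

```python
# (fullname, created_utc) pairs from the first search, copied from the results above
first = [
    ("t3_4zh8zw", 1472107937),
    ("t3_4zgl1n", 1472096403),
    ("t3_4zgf34", 1472093883),
    ("t3_4zg8de", 1472091260),
    ("t3_4zfzun", 1472087983),
    ("t3_4zfysv", 1472087571),
    ("t3_4zf8hg", 1472077921),
    ("t3_4zf7g6", 1472077542),
    ("t3_4zf4p5", 1472076595),
    ("t3_4zf0d7", 1472075090),
    ("t3_4zeqeg", 1472071708),
    ("t3_4zeomz", 1472071134),
    ("t3_4zebse", 1472066994),
    ("t3_4zduso", 1472061376),
    ("t3_4zdtne", 1472061014),
]
second_newest = 1472066994  # t3_4zebse, the first row returned by the second search

ts2 = first[1][1] - 1  # the upper bound the script used for the second search

# Posts the second search should have returned but did not
missing = [name for name, ts in first if second_newest < ts <= ts2]

# The effective cutoff lies between the newest returned post and the oldest
# missing one, so the shift between ts2 and that cutoff is bounded on both sides.
oldest_missing = min(ts for name, ts in first if name in missing)
low = ts2 - oldest_missing   # the shift is strictly greater than this
high = ts2 - second_newest   # the shift is at most this

print(len(missing), "missing posts")
print("shift between {:.2f} and {:.2f} hours".format(low / 3600, high / 3600))
```

This prints 10 missing posts (t3_4zgf34 through t3_4zeomz) and a shift between about 7.02 and 8.17 hours: the second search behaved as if ts2 had been moved back by several hours, not by the one second the script subtracted.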

Script:

import praw
import time

user_agent = 'cloudsearch-timestamp test'
r = praw.Reddit(user_agent=user_agent)

subreddit = r.get_subreddit('laptops')

# Search window: from just before the subreddit was created until now
ts1 = int(subreddit.created_utc) - 1
ts2 = int(time.time())

submissions = list(r.search('timestamp:{}..{}'.format(ts1, ts2), sort='new', subreddit=subreddit, syntax='cloudsearch', limit=None))

# Print the 15 newest results
for submission in submissions[:15]:
    print("{}, {}".format(submission.fullname, submission.created_utc))

# Narrow the window so it ends just before the second-newest submission
ts2 = int(submissions[1].created_utc) - 1

print('#######################')

# Repeat the search with the narrowed window
submissions = list(r.search('timestamp:{}..{}'.format(ts1, ts2), sort='new', subreddit=subreddit, syntax='cloudsearch', limit=None))

for submission in submissions[:15]:
    print("{}, {}".format(submission.fullname, submission.created_utc))

Solution

  • As far as I can gather, you shouldn't use created_utc with cloudsearch.

    If you change submission.created_utc to just submission.created, you will get exactly the behaviour you need.

    This is because cloudsearch uses the epoch time directly. There is no need to convert it to UTC or GMT, and doing so will have different effects depending on your timezone.
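
    Applying the answer's change to the script only affects the line that computes the next upper bound. A minimal sketch of that step, assuming each submission object exposes both `created` and `created_utc` (as the answer implies for the PRAW submissions in the script); `next_upper_bound` and `created_offset` are hypothetical helper names, and the `SimpleNamespace` object stands in for a real submission with made-up values.

```python
from types import SimpleNamespace


def next_upper_bound(submission, field="created"):
    """Upper bound for the next 'timestamp:ts1..ts2' query.

    Reads `created` by default, following the answer, instead of `created_utc`.
    """
    return int(getattr(submission, field)) - 1


def created_offset(submission):
    """Seconds between a submission's `created` and `created_utc` values."""
    return int(submission.created - submission.created_utc)


# Made-up stand-in for a submission, for illustration only
example = SimpleNamespace(created=1000.0, created_utc=900.0)
print(next_upper_bound(example))                 # 999
print(next_upper_bound(example, "created_utc"))  # 899
print(created_offset(example))                   # 100
```

    In the question's script, `ts2 = int(submissions[1].created_utc) - 1` would become `ts2 = next_upper_bound(submissions[1])`. Printing `created_offset` for a real submission would be a quick way to check whether the gap between the two fields matches the several-hour shift seen in the sample results.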