I am using the Twitter public stream API to search for some keywords. I am writing my script in Java and therefore I use twitter4j. Now I stumbled over the information about status deletion notices:
Status deletion notices (delete)
These messages indicate that a given Tweet has been deleted. Client code must honor these messages by clearing the referenced Tweet from memory and any storage or archive, even in the rare case where a deletion message arrives earlier in the stream that the Tweet it references.
https://dev.twitter.com/docs/streaming-apis/messages#Status_deletion_notices_delete
So I created methods to remove records from my database when such a notice occurs. Unfortunately such a notice never occurs. I searched to find out what I am doing wrong and found some posts in the twitter developer section concerning the same problem:
but unfortunately all these discussions got no answer. So for me it seems like I did no mistake with my code but twitter4j never sends me an deletion notice.
I want to respect the privacy of the twitter users - at least for legal reasons. So my question is:
One alternative seems to be to periodically iterate through all saved Tweets in my Database and request them from twitter to see whether I get a result back or not (so they were deleted). But this doesn't seem to be a practicable way because the data will get more and more and therefore at some point of time I will have limitations (in time, allowed twitter requests, ...). So what should I do?
Thanks in advance! Your help is greatly appreciated.
Ludwig
twitter4j v.3.0.6
Given the nature of the volume of tweets, it's unreasonable to assume that you would check to see if all the tweets are still there. You should make sure that you properly act on a delete notice from twitter. The onus is on them to actually send the delete notification.
That being said, I receive delete notifications from twitter. However, we aren't using the public stream, we are using sitestreams, which relies on authorizing specific social accounts and streaming all updates for those accounts (e.g. favorites, follows, blocks, tweets, retweets, etc) to us in realtime.
If you are doing a stream with filters, for example, it's probably not feasible (or at least very taxing) to run all deleted items through the same pipeline as new items. Or perhaps, to guess at which you were sent based on the times that you were running your filter.
As noted in the issue you linked to, the public streaming API will not necessarily send them out. I'd endeavor to handle them, and possibly provide a tool to manually remove any if a request comes in through another channel, but not worry too much about it, given that twitter doesn't provide the proper facility to be notified of such instances.