Tags: twitter, social-networking, text-mining, social-media

Databases for filtering pop culture quotes or celebrity-related data out of text (e.g., tweets)?


I am trying to mine social media data, such as tweets. However, social media data contain a lot of noise: for example, people discussing celebrities or quoting a movie, TV show, or song. In general, this is content that is not about the authors themselves or anybody they know personally.

So, are there any dynamic (i.e., automatically updated) databases of the most popular current celebrities? Quotes from movies they appear in, or lyrics of songs they sing, would also be relevant.


Solution

  • I don't think such a curated, automatically updated list exists. Smaller static lists do exist, for example the list of 100 top movie quotes on Wikipedia, but they are not kept current.

    One possibility is to filter out any items in your input that also appear on another social media site that tracks trends, such as Delicious. Unless trends are what you are looking for, something that rises to the top of two trending sites is likely just a trend.

    Delicious has a nice Python wrapper for its API.

    In Pythonic pseudocode,

     # social_media and delicious are placeholder objects, not real modules
     data = social_media.content
     data = [datum for datum in data if datum not in delicious.content_list]
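The pseudocode above tests exact membership, but a tweet rarely equals a trending item word for word. A minimal runnable sketch of the same idea follows. It assumes the trending phrases have already been fetched from some trend-tracking source and are held in a plain list. The function names `normalize` and `filter_trending` and the sample data are hypothetical and are not part of any Delicious API.

```python
import re


def normalize(text):
    """Lowercase the text and keep only word-like tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def filter_trending(posts, trending_phrases):
    """Return the posts that contain none of the trending phrases.

    Matching is case-insensitive and on whole words, so the phrase
    "force" does not match the word "forceful".
    """
    # Normalize the phrases once and drop any that end up empty.
    phrases = [p for p in (normalize(t) for t in trending_phrases) if p]
    kept = []
    for post in posts:
        # Pad with spaces so phrases only match on word boundaries.
        padded = f" {normalize(post)} "
        if not any(f" {phrase} " in padded for phrase in phrases):
            kept.append(post)
    return kept


# Hypothetical sample data: in practice, `trending` would come from
# whatever trend-tracking source you choose to query.
posts = [
    "May the Force be with you, always!",
    "Had lunch with my sister today",
    "Can't believe what Example Celebrity wore last night",
]
trending = ["may the force be with you", "Example Celebrity"]

print(filter_trending(posts, trending))
```

Only the post about lunch survives here, since the other two contain a trending phrase. Substring matching on normalized text is a deliberately simple choice; fuzzy matching or named-entity recognition would catch paraphrased quotes and name variants that this sketch misses.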