I want to extract Twitter feeds related to a keyword for the months of June and July using Apache Flume. Is this possible at all?
AFAIK, the TwitterSource from Cloudera only receives data as it is generated. I think something similar applies to the Twitter 1% firehose source.
Nevertheless, I see that the Twitter API can also work with timelines, so it would be a matter of modifying the TwitterSource source code.
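
For context, the Cloudera TwitterSource is typically wired into a Flume agent with a properties fragment along these lines. This is only a minimal sketch: the agent name TwitterAgent, the source name Twitter, the channel name MemChannel and the keyword values are placeholders, and the credential values have to be filled in with your own. As far as I know, the only filter it exposes is the keywords property applied to the live stream, with no date-range setting, which is why fetching past tweets would need a change in the source code.

```
# Hypothetical agent/source/channel names; replace with your own
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel

# OAuth credentials from your Twitter application (placeholders)
TwitterAgent.sources.Twitter.consumerKey = <your consumer key>
TwitterAgent.sources.Twitter.consumerSecret = <your consumer secret>
TwitterAgent.sources.Twitter.accessToken = <your access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <your access token secret>

# Comma-separated keywords used to filter the live stream (example values)
TwitterAgent.sources.Twitter.keywords = hadoop, big data

TwitterAgent.channels.MemChannel.type = memory
```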