
RapidMiner: Search Tweets according to date


I am a business administration student currently learning the basics of social media analytics for a research project. My aim at the moment is to track the use of a keyword in tweets. I downloaded RapidMiner and figured out how to search for keywords. However, is there a way to find out how often the keyword was used in a certain time frame? Can I filter the results so that, for example, only tweets from December 2017 containing my keyword are displayed?

Thank you very much for considering my question.


Solution

  • If you have your data extracted as a RapidMiner ExampleSet, you can use the Aggregate operator to count how often each keyword occurs. Alternatively, you can use the Filter Examples operator to show only the tweets containing the keyword and then count those. See the process below for a simple example; just copy and paste the XML into the process view of RapidMiner.

    Also feel free to ask follow-up questions, or to re-post this one, in the RapidMiner community forum.
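
The process below uses generic sample data and does not touch dates. As a rough illustration of the same filter-then-count logic applied to a time frame, here is a minimal sketch in plain Python, outside RapidMiner. The sample tweets, the field names `created_at` and `text`, and the helper names `filter_tweets` and `count_by_month` are all hypothetical and are not what RapidMiner's Twitter operators produce. Inside RapidMiner, the Filter Examples idea should carry over in the same way, assuming the tweet data includes a creation-date attribute.

```python
from datetime import datetime

# Hypothetical sample data; "created_at" and "text" are assumed field names.
tweets = [
    {"created_at": "2017-11-30 23:59:10", "text": "Learning RapidMiner today"},
    {"created_at": "2017-12-01 08:15:00", "text": "rapidminer makes text mining easy"},
    {"created_at": "2017-12-24 18:30:45", "text": "Holiday reading, no data tools"},
    {"created_at": "2017-12-31 23:59:59", "text": "Last RapidMiner process of the year"},
    {"created_at": "2018-01-01 00:00:00", "text": "New year, new RapidMiner project"},
]

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def filter_tweets(tweets, keyword, start, end):
    """Keep tweets that contain keyword (case-insensitive) with start <= date < end."""
    keyword = keyword.lower()
    selected = []
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], DATE_FORMAT)
        if start <= created < end and keyword in tweet["text"].lower():
            selected.append(tweet)
    return selected


def count_by_month(tweets, keyword):
    """Count tweets containing keyword, grouped by "YYYY-MM" (the group-by step)."""
    keyword = keyword.lower()
    counts = {}
    for tweet in tweets:
        if keyword in tweet["text"].lower():
            created = datetime.strptime(tweet["created_at"], DATE_FORMAT)
            month = created.strftime("%Y-%m")
            counts[month] = counts.get(month, 0) + 1
    return counts


# Filter first, then count: keyword tweets from December 2017 only.
december = filter_tweets(
    tweets, "rapidminer", datetime(2017, 12, 1), datetime(2018, 1, 1)
)
print(len(december))

# Group and count in one pass: keyword tweets per month.
print(count_by_month(tweets, "rapidminer"))
```

The end of the range is exclusive (1 January 2018), so a tweet posted in the last second of December is still included without having to spell out 23:59:59.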

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_direct_mailing_data" compatibility="8.0.001" expanded="true" height="68" name="Generate Direct Mailing Data" width="90" x="45" y="34">
        <description align="center" color="transparent" colored="false" width="126">Generic sample data.&lt;br&gt;We use the &amp;quot;sports&amp;quot; Attribute as key words</description>
      </operator>
      <operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="246" y="34"/>
      <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="340">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="sports.equals.athletics"/>
        </list>
        <description align="center" color="yellow" colored="true" width="126">Alternatively we can filter for a specific sport and then count.</description>
      </operator>
      <operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="715" y="340">
        <parameter key="use_default_aggregation" value="true"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="sports"/>
        <parameter key="default_aggregation_function" value="count"/>
        <list key="aggregation_attributes"/>
        <description align="center" color="yellow" colored="true" width="126">Type your comment</description>
      </operator>
      <operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="sports"/>
        <parameter key="default_aggregation_function" value="count"/>
        <list key="aggregation_attributes">
          <parameter key="sports" value="count"/>
        </list>
        <parameter key="group_by_attributes" value="sports"/>
        <description align="center" color="green" colored="true" width="126">The &amp;quot;group by&amp;quot; and the &amp;quot;aggregation&amp;quot; attributes are both set to &amp;quot;sports&amp;quot;</description>
      </operator>
      <connect from_op="Generate Direct Mailing Data" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Aggregate (2)" to_port="example set input"/>
      <connect from_op="Aggregate (2)" from_port="example set output" to_port="result 2"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>