I am a business administration student currently learning the basics of social media analytics for a research project. My aim at the moment is to track the use of a keyword in tweets. I downloaded RapidMiner and figured out how to search for keywords. However, is there a way to figure out how often the keyword was used in a certain time frame? Can I filter the results so that, for example, only tweets from December 2017 containing my keyword are displayed?
Thank you very much for considering my question.
If you have your data extracted as a RapidMiner ExampleSet, you can use the Aggregate operator to count how often each keyword occurs. Alternatively, you can use the Filter Examples operator to keep only the tweets containing the keyword and then count those. See the process below for a simple example. Just copy and paste the XML into the process view of RapidMiner.
Also feel free to ask follow-up questions, or re-post this one, in the RapidMiner community forum.
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_direct_mailing_data" compatibility="8.0.001" expanded="true" height="68" name="Generate Direct Mailing Data" width="90" x="45" y="34">
<description align="center" color="transparent" colored="false" width="126">Generic sample data.&lt;br&gt;We use the &quot;sports&quot; Attribute as key words</description>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="246" y="34"/>
<operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="340">
<list key="filters_list">
<parameter key="filters_entry_key" value="sports.equals.athletics"/>
</list>
<description align="center" color="yellow" colored="true" width="126">Alternatively we can filter for a specific sport and then count.</description>
</operator>
<operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="715" y="340">
<parameter key="use_default_aggregation" value="true"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="sports"/>
<parameter key="default_aggregation_function" value="count"/>
<list key="aggregation_attributes"/>
<description align="center" color="yellow" colored="true" width="126">Counts the examples that remain after the filter.</description>
</operator>
<operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="sports"/>
<parameter key="default_aggregation_function" value="count"/>
<list key="aggregation_attributes">
<parameter key="sports" value="count"/>
</list>
<parameter key="group_by_attributes" value="sports"/>
<description align="center" color="green" colored="true" width="126">The &quot;group by&quot; and the &quot;aggregation&quot; attributes are both set to &quot;sports&quot;</description>
</operator>
<connect from_op="Generate Direct Mailing Data" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Aggregate (2)" to_port="example set input"/>
<connect from_op="Aggregate (2)" from_port="example set output" to_port="result 2"/>
<connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
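The process above filters and counts by keyword only; it does not restrict the results to a time frame. As a rough illustration of the logic you would need for that (keep tweets in a date range that contain the keyword, then count them), here is a minimal Python sketch. It is not RapidMiner code. The sample tweets and the helper names `filter_tweets` and `count_per_month` are made up for illustration, and it assumes every tweet comes with a timestamp. In RapidMiner the same idea would presumably be an additional date condition in Filter Examples followed by Aggregate.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data: (timestamp, text) pairs standing in for exported tweets.
tweets = [
    (datetime(2017, 11, 30, 23, 59), "Last rapidminer post of November"),
    (datetime(2017, 12, 1, 8, 15), "Trying RapidMiner for the first time"),
    (datetime(2017, 12, 24, 18, 0), "Holiday reading, nothing technical"),
    (datetime(2017, 12, 31, 23, 59), "rapidminer wrap-up for the year"),
    (datetime(2018, 1, 2, 9, 30), "New year, new RapidMiner process"),
]


def filter_tweets(tweets, keyword, start, end):
    """Keep tweets containing keyword (case-insensitive) with start <= timestamp < end."""
    needle = keyword.lower()
    return [
        (ts, text)
        for ts, text in tweets
        if start <= ts < end and needle in text.lower()
    ]


def count_per_month(tweets, keyword):
    """Count tweets containing keyword, grouped by calendar month (YYYY-MM)."""
    needle = keyword.lower()
    return Counter(
        ts.strftime("%Y-%m") for ts, text in tweets if needle in text.lower()
    )


# Tweets from December 2017 that mention the keyword (end bound is exclusive).
december = filter_tweets(
    tweets, "rapidminer", datetime(2017, 12, 1), datetime(2018, 1, 1)
)
monthly = count_per_month(tweets, "rapidminer")

print(len(december))
print(dict(monthly))
```

Using an exclusive end bound (1 January 2018) avoids having to spell out the last second of December, and grouping by month gives the keyword frequency for every time frame in one pass.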