I have a RapidMiner flow that takes a dataset and clusters it. In the output I can see my role, but I can't figure out a way to count the role per cluster. How can I count the number of roles per cluster? I've looked at the Aggregate node, but my role isn't an available attribute.
Essentially, I'm trying to figure out if the clusters say anything about the role. I also use Weka, and they call this "Classes to clusters evaluation". It basically shows the class (or role) breakdown per cluster.
Only two attributes are available. My role isn't one of them.
There are 34 total attributes. I want to aggregate by ret_zpc
RapidMiner has the concept of roles. An attribute can be one of regular, id, cluster or label (and some others). There's even an operator, Set Role, that allows the role to be changed. Outside RapidMiner, role, label and class get used interchangeably.
For your question, the Aggregate operator is what you need. Assuming you have an attribute in your example set with role Cluster and another with role Label, you select these attributes as the ones to group by. For the aggregation attribute, choose another attribute and select count as the aggregation function.
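To make the grouping concrete, here is a minimal sketch in plain Python of what that group-by-and-count computes. It is not RapidMiner code; the cluster names, label values and the function name are made up for illustration, with each row standing in for one example's (cluster, label) pair.

```python
from collections import Counter

# Hypothetical (cluster, label) pairs, one per example. In the real data
# these would come from the attributes with role Cluster and role Label.
rows = [
    ("cluster_0", "A"),
    ("cluster_0", "B"),
    ("cluster_1", "A"),
    ("cluster_1", "A"),
    ("cluster_1", "B"),
]


def count_labels_per_cluster(rows):
    """Count how many examples fall into each (cluster, label) group."""
    return Counter(rows)


counts = count_labels_per_cluster(rows)

# One line per group, similar in spirit to the Aggregate result table.
for (cluster, label), n in sorted(counts.items()):
    print(cluster, label, n)
```

Each printed line corresponds to one group, which is the same breakdown Weka reports in its classes to clusters evaluation.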
In your case, the attributes you want are not being populated in the drop-downs, but they can still be used. You just have to type them in manually and explicitly add them to the selection criteria. Attributes can go missing like this when RapidMiner cannot see any metadata for them. If you change the Read CSV operator so that it has an explicit mapping, you should find that the attributes appear for selection.