I have a large dataset from which I need to produce specific charts. It is one of a number of datasets generated by my analytical equipment. I am currently writing a function that will analyse these datasets automatically, and to do this I want to use the column in the dataset named "Labels".
When I use the table() function to get the contents and frequencies of the "Labels" column, I get the following:
> table(datafile$Label)
Blank C1 C2 C3a C3b C3c C4 DI E1 E2 E3 High Low Medium Mid
11 9 9 9 9 9 9 3 9 9 9 3 3 3 13
P pH3 pH5 pH7 pH9 test Test
9 5 5 5 5 2 1
What I would like to do is to create a vector that I will also call "Labels" that will only contain the labels that occur with a frequency of five or more.
I am then thinking of using a for loop from 1 to length(Labels) and subsetting the data table with the rule datafile$Labels == Labels[n], where n = 1:length(Labels), to create a chart for each label of interest in turn.
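The loop described above could look roughly like this in R. This is a minimal sketch, not the poster's actual code: the small data frame stands in for the real file, the numeric column Value is a hypothetical measurement column to chart, the labels of interest are hard-coded, and label_charts.pdf is an illustrative output file name. seq_along(Labels) is used instead of 1:length(Labels), because 1:length() evaluates to c(1, 0) when the vector is empty and the loop would then run twice.

```r
# Toy stand-in for the real data; "Value" is a hypothetical measurement column
datafile <- data.frame(
  Labels = c("C1", "C1", "C2", "C2", "test"),
  Value  = c(1.2, 1.5, 2.1, 2.4, 9.9),
  stringsAsFactors = FALSE
)

# Labels of interest, hard-coded here for illustration
Labels <- c("C1", "C2")

# Send every chart to one PDF file (hypothetical file name)
pdf("label_charts.pdf")
for (n in seq_along(Labels)) {
  # Keep only the rows whose label matches the current label
  label_rows <- datafile[datafile$Labels == Labels[n], ]
  plot(label_rows$Value, main = Labels[n], ylab = "Value")
}
invisible(dev.off())
```

Each pass of the loop draws one page in the PDF, titled with the label it was built from.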
Is there a function that will extract the elements of the column that occur five or more times? For my example, the new "Labels" vector would be:
> Labels
[1] "Blank" "C1" "C2" "C3a" "C3b" "C3c" "C4" "E1" "E2" "E3" "Mid" "P" "pH3" "pH5" "pH7" "pH9"
All suggestions will be gratefully received.
Thank you.
We can subset the table with a logical condition on the counts and take the names of the entries that satisfy it:
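# Count how often each label occurs (R is case-sensitive: datafile, not dataFile)
tbl <- table(datafile$Labels)
# Keep labels occurring five or more times; ">" would drop the labels seen exactly 5 times
Labels <- names(tbl)[tbl >= 5]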
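As a quick check of the threshold, the sketch below rebuilds a Labels column with the same frequencies as the table in the question (a toy stand-in for the real datafile) and applies the condition. Using >= keeps the four pH labels that occur exactly five times, and "Mid" (13 occurrences) is kept as well. The order of the names follows the locale's sort order, so it may differ slightly between systems. The last line is an optional alternative to subsetting inside a for loop: split() builds one data frame per label, and indexing with Labels keeps only the labels of interest.

```r
# Rebuild a Labels column with the frequencies shown in the question
counts <- c(Blank = 11, C1 = 9, C2 = 9, C3a = 9, C3b = 9, C3c = 9, C4 = 9,
            DI = 3, E1 = 9, E2 = 9, E3 = 9, High = 3, Low = 3, Medium = 3,
            Mid = 13, P = 9, pH3 = 5, pH5 = 5, pH7 = 5, pH9 = 5,
            test = 2, Test = 1)
datafile <- data.frame(Labels = rep(names(counts), counts),
                       stringsAsFactors = FALSE)

tbl <- table(datafile$Labels)

# Labels that occur five or more times
Labels <- names(tbl)[tbl >= 5]
print(Labels)

# Optional: one data frame per label of interest, ready for charting
by_label <- split(datafile, datafile$Labels)[Labels]
```

Each element of by_label can then be passed to a plotting call, for example with lapply() or a loop over names(by_label).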