r, subset, data-extraction, frequency-analysis

Extracting the most frequently occurring elements from a column of a data file in R


I have a large dataset from which I need to produce specific charts. It is one of a number of datasets generated by my analytical equipment. I am currently writing a function that will analyse these datasets automatically, and to do this I can use the column in the dataset named "Labels".

When I use the table() function to get the contents and frequencies of the "Labels" column, I get the following:

> table(datafile$Label)

 Blank     C1     C2    C3a    C3b    C3c     C4     DI     E1     E2     E3   High    Low Medium    Mid 
    11      9      9      9      9      9      9      3      9      9      9      3      3      3     13 
     P    pH3    pH5    pH7    pH9   test   Test 
     9      5      5      5      5      2      1 

What I would like to do is create a vector, also called "Labels", that contains only the labels that occur five or more times.

I am then thinking of using a for loop over n = 1:length(Labels), subsetting the data table with the rule datafile$Labels == Labels[n], to create a chart for each label of interest in turn.

Is there a specific function that will extract the elements of the column that occur five or more times? From my example, my new "Labels" vector would be as follows:

> Labels

[1]    "Blank" "C1" "C2" "C3a" "C3b" "C3c" "C4" "E1" "E2" "E3" "P" "pH3" "pH5" "pH7" "pH9"

All suggestions will be gratefully received.

Thank you.


Solution

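  • We can subset the table with a logical condition and take the names of the entries that pass. Note that R is case-sensitive, so the object name must match the question's datafile, and that "five or more" needs >= rather than >.

    tbl <- table(datafile$Label)
    # keep the labels whose count is at least 5
    names(tbl)[tbl >= 5]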
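
To check the idea against the counts shown in the question, here is a minimal sketch that rebuilds a stand-in data frame and then loops over the frequent labels as the question proposes. The `counts` vector, the `Value` column, and the `subsets` list are assumptions made for illustration; the real data and the plotting call will differ.

```r
# Hypothetical stand-in for the real data: one row per observation,
# with label frequencies copied from the table() output in the question.
counts <- c(Blank = 11, C1 = 9, C2 = 9, C3a = 9, C3b = 9, C3c = 9, C4 = 9,
            DI = 3, E1 = 9, E2 = 9, E3 = 9, High = 3, Low = 3, Medium = 3,
            Mid = 13, P = 9, pH3 = 5, pH5 = 5, pH7 = 5, pH9 = 5,
            test = 2, Test = 1)
datafile <- data.frame(Label = rep(names(counts), times = counts),
                       stringsAsFactors = FALSE)
datafile$Value <- seq_len(nrow(datafile))  # placeholder measurement column

# Labels that occur five or more times.
tbl <- table(datafile$Label)
Labels <- names(tbl)[tbl >= 5]

# Subset the data once per frequent label; the chart call is left as a
# comment because the chart type is not specified in the question.
subsets <- list()
for (lab in Labels) {
  sub <- datafile[datafile$Label == lab, , drop = FALSE]
  subsets[[lab]] <- sub
  # plot(sub$Value, main = lab)
}
```

With these counts, `Labels` holds 16 names. It includes "Mid" (13 occurrences), which the desired output in the question does not list, and it includes the pH labels, which a strict `> 5` would drop. If the loop is only there to split the rows by label, `split(datafile, datafile$Label)[Labels]` should give the same subsets in one step.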