Search code examples
rdataframefiltersubsetdata-manipulation

Filtering Dataframe by keeping numeric values of a specific column only in R


I've the following dataframe

Group Cost Year
A 21 2017
A 22 2016
A closed 2015
B 12 2017
B 11 2016
C ended 2017
C closing 2016
C 13 2015
Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)

Anybody knows how I can filter the dataframe (df) based on keeping over the numeric values in the Cost column? The outcome should be the following table

Group Cost Year
A 21 2017
A 22 2016
B 12 2017
B 11 2016
C 13 2015

Thanks in advance!


Solution

  • You could use a regular expression to filter the relevant rows of your dataframe. The regular expression ^\\d+(\\.\\d+)?$ will check for character that contains only digits, possibly with . as a decimal separator (i.e. 2, 2.3). You could then convert the Cost column to numeric using as.numeric() if needed.

    See the example below:

    Group = c("A", "A", "A", "B", "B", "C", "C", "C")
    Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
    Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
    df = data.frame(Group, Cost, Year)
    
    
    df[grep(pattern = "^\\d+(\\.\\d+)?$", df[,"Cost"]), ]
    #>   Group Cost Year
    #> 1     A   21 2017
    #> 2     A   22 2016
    #> 4     B   12 2017
    #> 5     B   11 2016
    #> 8     C   13 2015
    

    Note that this technique works even if your Cost column is of factor class while using df[!is.na(as.numeric(df$Cost)), ] does not. For the latter you need to add as.character() first: df[!is.na(as.numeric(as.character(df$Cost))), ]. Both techniques keep factor levels.