r dataframe filter subset data-manipulation

Filtering Dataframe by keeping numeric values of a specific column only in R

I've the following dataframe

Group	Cost	Year
A	21	2017
A	22	2016
A	closed	2015
B	12	2017
B	11	2016
C	ended	2017
C	closing	2016
C	13	2015

Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)

Anybody knows how I can filter the dataframe (df) based on keeping over the numeric values in the Cost column? The outcome should be the following table

Group	Cost	Year
A	21	2017
A	22	2016
B	12	2017
B	11	2016
C	13	2015

Thanks in advance!

Solution

You could use a regular expression to filter the relevant rows of your dataframe. The regular expression ^\\d+(\\.\\d+)?$ will check for character that contains only digits, possibly with . as a decimal separator (i.e. 2, 2.3). You could then convert the Cost column to numeric using as.numeric() if needed.

See the example below:

Group = c("A", "A", "A", "B", "B", "C", "C", "C")
Cost = c(21,22,"closed", 12, 11,"ended", "closing", 13)
Year = c(2017,2016,2015,2017,2016,2017,2016,2015)
df = data.frame(Group, Cost, Year)


df[grep(pattern = "^\\d+(\\.\\d+)?$", df[,"Cost"]), ]
#>   Group Cost Year
#> 1     A   21 2017
#> 2     A   22 2016
#> 4     B   12 2017
#> 5     B   11 2016
#> 8     C   13 2015

Note that this technique works even if your Cost column is of factor class while using df[!is.na(as.numeric(df$Cost)), ] does not. For the latter you need to add as.character() first: df[!is.na(as.numeric(as.character(df$Cost))), ]. Both techniques keep factor levels.