
How to select only numbers from a dataframe in R using which()


I have a large data frame in R and am trying to run some statistical tests on certain columns, but the non-programmers who created the CSV file added a bunch of text notes that I need to ignore.

For example, a column might contain the values: 12, 20, 40, missing, 64, 32, no input, 45, 10

How do I select only the numbers using which()? This attempt failed: my_data_frame$Column.Title[which(is.numeric(my_data_frame$Column.Title))]

What do I change in the which function to only select the numbers and ignore the text? Thanks!


Solution

  • You can use the built-in as.numeric() converter to do something like this:

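    x <- my_data_frame$Column.Title
    # as.character() guards against factor columns, where as.numeric()
    # would return the internal level codes instead of the values
    xn <- suppressWarnings(as.numeric(as.character(x)))  # text entries become NA
    which(!is.na(xn))  # positions that hold real numbers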
    

    This won't distinguish NAs created by failed coercion from NA values that were already present in the column.

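    If there are only a few distinct "missing" placeholders, you could instead read the data in with read.csv(..., na.strings=c("NA","missing","no input")), so that the column is parsed as numeric directly, provided every other entry is a number.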
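A minimal sketch of both suggestions, run on the example values from the question. The trailing NA (added to show a pre-existing missing value) and the inline CSV text are made up for illustration, and Column.Title is just the placeholder name from the question. The coerced_na line is one way, not the only one, to separate text notes from NAs that were already in the data.

```r
# Example column from the question, as it might arrive from a CSV:
# a character vector, because of the text notes mixed in.
# The final NA is hypothetical, added to show a pre-existing missing value.
x <- c("12", "20", "40", "missing", "64", "32", "no input", "45", "10", NA)

# Approach 1: coerce to numeric; non-numeric text becomes NA.
# as.character() protects against factor columns, where as.numeric()
# would return the factor's level codes instead of the values.
xn <- suppressWarnings(as.numeric(as.character(x)))

idx <- which(!is.na(xn))   # positions holding real numbers
print(idx)
print(xn[idx])             # the usable numeric values

# Separate NAs created by failed coercion (the text notes)
# from NAs that were already missing in the original column.
coerced_na  <- is.na(xn) & !is.na(x)
original_na <- is.na(x)
print(x[coerced_na])       # "missing" "no input"
print(which(original_na))

# Approach 2: declare the known placeholder strings as NA while reading,
# so read.csv() parses the column as numbers directly.
# csv_text stands in for the real file (hypothetical contents).
csv_text <- "Column.Title\n12\n20\n40\nmissing\n64\n32\nno input\n45\n10"
df <- read.csv(text = csv_text, na.strings = c("NA", "missing", "no input"))
print(class(df$Column.Title))            # "integer"
print(mean(df$Column.Title, na.rm = TRUE))
```

With the second approach the stats functions can be called on the column as-is, passing na.rm = TRUE (or the equivalent option) where needed; the first approach is handier when the placeholder strings are too varied to list.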