Tags: r, time-series, large-data, coerce

Coercing multiple time-series columns to factors in large dataframe


I would like to know if there is an "easy/quick" way to convert character variables to factors.

I am aware that one could make a vector with the column names and then use lapply. However, I am working with a large data frame with more than 200 variables, so I would prefer not to write out 200+ names in a vector.

I am also aware that I can coerce the entire data frame using lapply, type.convert, and sapply, but since my time-series data contains both categorical and numerical variables, that does not suit me either.

Is there any way to use the column number in this? I.e. [ ,2:200]? I tried the following, but without any luck:

df[ ,2:30] <- lapply(df[ ,2:30], type.convert)
sapply(df, factor)

With the approach above, I would still need several such calls, but that would still be quicker than writing out all the variable names.

I also have a feeling a loop might be usable here, but I am not sure how to write it, or whether it is even a sensible way to do it.


Solution

  • Since you write that you need to convert (all?) character variables to factors, you could use mutate_if from dplyr:

    library(dplyr)
    # mutate_if returns a new data frame, so assign the result back
    df <- mutate_if(df, is.character, as.factor)
    

    With this you only operate on columns for which is.character returns TRUE, so you don't need to worry about the column positions or names.
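
    If you would rather stay in base R, or want to limit the conversion to a range of column positions as in the question, the same idea can be sketched as below. The data frame and its column names are made up for illustration, and the range 2:4 is only a stand-in for something like 2:200. Note that the attempt in the question has no effect on df because sapply(df, factor) returns a character matrix and its result is never assigned.

```r
# Hypothetical example data: column names and values are assumptions
df <- data.frame(
  id    = 1:4,
  group = c("a", "b", "a", "c"),
  state = c("on", "off", "on", "on"),
  value = c(1.5, 2.0, 3.2, 4.1),
  stringsAsFactors = FALSE
)

# Logical vector marking which columns are character
is_chr <- vapply(df, is.character, logical(1))

# Optionally restrict to a positional range (here columns 2 to 4)
in_range <- seq_along(df) %in% 2:4
target <- is_chr & in_range

# Convert only the selected columns; numeric columns are left untouched
df[target] <- lapply(df[target], as.factor)

str(df)
```

    Dropping the in_range step converts every character column regardless of position. In recent dplyr versions, mutate(df, across(where(is.character), as.factor)) is the recommended spelling of the mutate_if call above.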