r dataframe factors levels

How to fix 'invalid factor level'?


I can't get a call to mean() to work. I've tried factor(data$Date) on its own successfully: the console reports 890 entries with 51 levels.

Here is my code:

   data <- read.table("R/DATA.csv", sep = ";", header = TRUE, dec = ",")
   View(data)
   colnames(data)[1] <- "Date"
   eau <- data$"Tension"
   eaucalculee <- ( 0.000616 * eau - 0.1671) * 100
   data["Eau"] <- eaucalculee
   tata <- data.frame("Aucun","Augmentation","Interception")

   tata[1,1]<-mean(data$Eau[data$Date == levels(factor(data$Date))[1]& 
   data$Traitement == "Aucun"])

I would like the first row of the first column of the tata data frame to be filled with the mean, but instead I get this warning message:

   In `[<-.factor`(`*tmp*`, iseq, value = 8.6692) :
   invalid factor level, NA generated 

Could you help me, please?

You can find the CSV file here: https://drive.google.com/file/d/1zbA25vajouQ4MiUF72hbeV8qP9wlMqB9/view?usp=sharing

Thank you very much


Solution

  • I'm not sure the line tata <- data.frame("Aucun","Augmentation","Interception") does what you expected. If you inspect its result with View(tata), you will see a data frame with one row and three columns whose values are your three strings (converted to factors, as @s-brunel said). The column names were inferred from those values (X.Aucun., etc.). Assigning a number to a factor cell whose only level is "Aucun" is what triggers the "invalid factor level" warning. I guess you wanted instead to create a data frame whose column names are the given strings.
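
    To make this concrete, here is a minimal sketch that reproduces the warning without the CSV file. The names tata_bad and tata_ok are only illustrative, and stringsAsFactors = TRUE is set explicitly as an assumption about your setup: since R 4.0.0, data.frame() keeps strings as character by default, so the warning would not appear there without it.

    ```r
    # Force factor columns explicitly so the behaviour is the same on every
    # R version (assumption: the original session converted strings to factors).
    tata_bad <- data.frame("Aucun", "Augmentation", "Interception",
                           stringsAsFactors = TRUE)
    names(tata_bad)       # auto-generated names derived from the values
    class(tata_bad[[1]])  # "factor", whose only level is "Aucun"

    # 8.6692 is not an existing level, so R warns
    # "invalid factor level, NA generated" and stores NA instead.
    tata_bad[1, 1] <- 8.6692
    is.na(tata_bad[1, 1])  # TRUE

    # Probably what was intended: the strings as column names, numeric cells.
    tata_ok <- data.frame(Aucun = NA_real_,
                          Augmentation = NA_real_,
                          Interception = NA_real_)
    tata_ok[1, "Aucun"] <- 8.6692
    tata_ok
    ```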

    Suggested code, with comments

    data <- read.table("R/DATA.csv", sep = ";", header = TRUE, dec = ",")
    
    # The following is unnecessary since the first column is already named Date
    # colnames(data)[1] <- "Date"
    
    # No need to create your intermediate variables eau and eaucalculee: you can 
    # do it directly with the data frame columns
    data$Eau <- ( 0.000616 * data$Tension - 0.1671) * 100
    
    # No need to create your tata data frame before filling its actual content, you
    # can do it directly
    tata <- data.frame(
      Aucun = mean(data$Eau[
        data$Date == levels(factor(data$Date))[1] & data$Traitement == "Aucun"
        ])
      )
    # Placeholders: replace your_formula_here with the analogous mean() expression
    tata$Augmentation <- your_formula_here
    tata$Interception <- your_formula_here
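
    If the other two columns follow the same pattern as Aucun, one way to avoid repeating the expression is to loop over the treatment names. This is only a sketch: the toy data below stands in for DATA.csv with made-up values, and it assumes the Traitement column contains exactly the three labels shown.

    ```r
    # Toy data standing in for DATA.csv (values are made up for illustration).
    data <- data.frame(
      Date       = c("01/06", "01/06", "01/06", "01/06", "02/06", "02/06"),
      Traitement = c("Aucun", "Aucun", "Augmentation", "Interception",
                     "Aucun", "Interception"),
      Eau        = c(8, 10, 12, 14, 20, 30)
    )

    first_date  <- levels(factor(data$Date))[1]
    traitements <- c("Aucun", "Augmentation", "Interception")  # assumed labels

    # One mean per treatment, restricted to the first date.
    means <- sapply(traitements, function(tr) {
      mean(data$Eau[data$Date == first_date & data$Traitement == tr])
    })

    # Turn the named vector into a one-row data frame, one column per treatment.
    tata <- as.data.frame(as.list(means))
    tata
    ```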
    

    Note 1: The easiest way to reference a data frame column is with $ and you don't need to use any double quotes. You can also use [[ with the double quotes (equivalent), but beware of [ which will return a data frame with a single column:

    class(data$Date)
    # [1] "factor"
    class(data[["Date"]])
    # [1] "factor"
    class(data["Date"])
    # [1] "data.frame"
    class(data[ , "Date"])
    # [1] "factor"
    

    Note 2: Trying to reverse-engineer your code beyond the question you asked, maybe you want to compute the mean value of Eau for each combination of Date and Traitement. In that case, I would suggest using dplyr and tidyr, from the awesome tidyverse set of packages:

    # install.packages("tidyverse") # if you don't already have it
    library(tidyverse)
    
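    # Inside mutate(), columns are referenced by name, without data$
    data <- data %>% 
      mutate(Eau = (0.000616 * Tension - 0.1671) * 100)
    
    # R is case-sensitive: the column is Eau, not eau
    tata_vertical <- data %>% 
      group_by(Date, Traitement) %>% 
      summarise(mean_eau = mean(Eau))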
    View(tata_vertical)
    
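    # spread() still works; since tidyr 1.0.0 the recommended equivalent is
    # pivot_wider(names_from = Traitement, values_from = mean_eau)
    tata <- tata_vertical %>% spread(Traitement, mean_eau)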
    View(tata)
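
    If you would rather not install extra packages, a similar Date by Traitement table can be sketched in base R with tapply(). The toy data below is made up and only stands in for DATA.csv; the result is a matrix rather than a data frame, and any combination with no rows would show up as NA.

    ```r
    # Toy data standing in for DATA.csv (values are made up for illustration).
    data <- data.frame(
      Date       = c("01/06", "01/06", "01/06", "02/06", "02/06", "02/06"),
      Traitement = c("Aucun", "Aucun", "Interception",
                     "Aucun", "Interception", "Interception"),
      Tension    = c(400, 500, 600, 450, 550, 650)
    )
    data$Eau <- (0.000616 * data$Tension - 0.1671) * 100

    # Rows are dates, columns are treatments, cells are mean(Eau) per group.
    tata <- tapply(data$Eau, list(data$Date, data$Traitement), mean)
    tata
    ```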
    

    There is a lot of documentation at https://www.tidyverse.org/learn/