
Column I've created is apparently "undefined"?


My code is as below. I have created a column "degree" based on another column which contains integers from 1 to 5.

My code below seems to work, because the column appears to have been created successfully. However, when I run any code based on the "degree" column, for example str(my_data$degree), I get NULL.

my_data %>%
mutate(degree = case_when(edcat > 3 ~ "1",                                 
 edcat <=3 ~ "0") )

This is what I get when I use "degree" in any code, despite the fact that I can see the column has been successfully created:

Error in [.data.frame(my_data, , "degree"): undefined columns selected
Traceback:

1. factor(my_data[, "degree"])
2. my_data[, "degree"]
3. [.data.frame(my_data, , "degree")
4. stop("undefined columns selected")

Solution

  • mutate() returns a new data frame and does not change my_data in place, so the new column is lost unless you assign the result with <-, just as you would for any variable. It is better to save the result in a new data frame, so you can check it and keep a copy of the original (useful for a beginner comparing input and output). Here I save it in my_result:

    my_result<- my_data %>%
    mutate(degree = case_when(
     edcat > 3 ~ "1",                                 
     edcat <=3 ~ "0"))
    

    Or, if you will use the same data frame in later steps, overwrite it with my_data <-:

    my_data<- my_data %>%
        mutate(degree = case_when(
         edcat > 3 ~ "1",                                 
         edcat <=3 ~ "0"))
    

    With sample data for edcat:

    library(dplyr)  # provides %>%, mutate() and case_when()

    my_data <- data.frame(edcat = c(1, 2, 3, 5, 6, 8))
    my_data <- my_data %>%
      mutate(degree = case_when(
        edcat > 3  ~ "1",
        edcat <= 3 ~ "0"))
    
    my_data
    
      edcat degree
    1     1      0
    2     2      0
    3     3      0
    4     5      1
    5     6      1
    6     8      1
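
    To make the cause of the original error concrete, here is a minimal sketch using the same sample values. The names before and after are only illustrative. It checks whether "degree" exists in my_data after an unassigned pipe and again after an assigned one. When the column is missing, my_data$degree returns NULL and my_data[, "degree"] raises "undefined columns selected", which matches what the question reports.

    ```r
    library(dplyr)

    my_data <- data.frame(edcat = c(1, 2, 3, 5, 6, 8))

    # Without assignment, the mutated copy is only returned (and printed at the
    # console); my_data itself still has a single column.
    invisible(my_data %>%
      mutate(degree = case_when(edcat > 3 ~ "1", edcat <= 3 ~ "0")))
    before <- "degree" %in% names(my_data)  # FALSE

    # Assigning the result back makes the new column part of my_data.
    my_data <- my_data %>%
      mutate(degree = case_when(edcat > 3 ~ "1", edcat <= 3 ~ "0"))
    after <- "degree" %in% names(my_data)   # TRUE
    ```

    The same check, "degree" %in% names(my_data), is a quick way to confirm a column really exists before passing it to factor() or str().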
    

    Now you can use the column however you like, for example to count the rows for each value of degree:

    my_data%>%group_by(degree)%>%summarise(N=n())
    # A tibble: 2 x 2
      degree     N
      <chr>  <int>
    1 0          3
    2 1          3
    

    All of this is basic dplyr usage. A good resource for learning it is Hadley Wickham's R for Data Science.