Consider the following columns in my dataset:
df$PT: contains strings with a repeating pattern. Example: [1] "60D 0%" "5M 2%" "4 2ND M 5%" ...
df$date: a column of dates. Example: [1] "2021-01-18" "2021-01-18" "2021-01-18" ...
I managed to write a function that takes values from the columns above, performs some operations on them, and returns another date (let's call it date2). The function works fine (I tested it by passing its arguments manually):
function1 <- function(PT, date) {
  # if/else chain that generates date2 from PT and date
  # returns either date2 or NA, depending on the if/else conditions
}
So far so good. The problem comes when I try to use sapply to apply function1 to every element of df$PT and store the output (which I want to be either a single date or NA for each element of df$PT) in df$new_col, like this:
df$new_col <- sapply(df$PT,function1,date=df$date)
But instead of having the expected output in df$new_col in date format as:
date2a date2b date2c date2d ...
I get only the first output repeated everywhere, and as plain numbers instead of dates: 18705 18705 18705 18705 ...
What is going on, and how do I fix it so that df$new_col holds the correct date2 for each row?
Thank you for your help!
Because R is vectorized, you can create new data frame columns directly from existing columns, for example:
cars <- mtcars
cars$new <- ifelse(cars$cyl == 6 & cars$mpg > 20, "NewVal", NA)
head(cars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb    new
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4 NewVal
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4 NewVal
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   <NA>
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1 NewVal
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   <NA>
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   <NA>
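
If the if/else chain is hard to rewrite in vectorized form, a row-by-row call may be enough. Two things in sapply(df$PT, function1, date = df$date) are worth noting: the whole df$date column is passed to every call instead of one date per row, and sapply simplifies Date results to plain numbers (days since 1970-01-01, so 18705 is 2021-03-19). Below is a minimal sketch using mapply. The body of function1 here is a hypothetical stand-in, since the real one was not posted: it only handles "<n>D ..." patterns, which it treats as n days after the date, and returns NA otherwise. The sample data frame is also made up.

```r
# Hypothetical stand-in for the real function1 (not shown in the question).
# Assumption: "<n>D ..." means n days after `date`; anything else gives NA.
function1 <- function(PT, date) {
  if (grepl("^[0-9]+D ", PT)) {
    days <- as.numeric(sub("^([0-9]+)D .*$", "\\1", PT))
    as.Date(date) + days
  } else {
    as.Date(NA)
  }
}

# Made-up sample data; dates are kept as character strings and converted
# inside function1.
df <- data.frame(
  PT   = c("60D 0%", "5M 2%", "30D 1%"),
  date = c("2021-01-18", "2021-01-18", "2021-02-01"),
  stringsAsFactors = FALSE
)

# mapply() walks PT and date in parallel, one pair per call.
# Like sapply(), it simplifies the results to plain numbers
# (days since 1970-01-01), so the Date class is restored afterwards.
days_since_epoch <- mapply(function1, df$PT, df$date, USE.NAMES = FALSE)
df$new_col <- as.Date(days_since_epoch, origin = "1970-01-01")

print(df)
```

Note that ifelse() also drops the Date class, so a vectorized version that returns dates needs the same as.Date(..., origin = "1970-01-01") step at the end.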