I have a problem in R: I need to perform some operations on a list and build a new data frame from the resulting values. With a for loop this takes a very long time. I don't know how to avoid the for loop, or how to express the "if + case_when" logic without it.
The comments in the code below explain what I am doing and what happens.
Thanks a lot!!
# Loop over every row of the list "total"
for(i in 1:nrow(total)) {
  # Use total$Cad[[i]] to look up the matching rows in another list
  val1 <- posdi[posdi$cad == str_to_upper(total$Cad[[i]]),]
  # Keep only the rows of val1 whose "font" value equals "Taake"
  val2 <- val1[val1$font == "Taake",]
  # Format the date as a number (YYYYMMDD)
  thedate <- as.numeric(format(as.Date(total$TheDate[[i]], format="%Y-%m-%d"), '%Y%m%d'))
  # This is where I get stuck. I want an IF that runs a different
  # case_when depending on whether the value is between 1 and 5 or between 6 and 7
  if(total$dia[[i]] >= 1 & total$dia[[i]] <= 5) {
    fran = case_when(
      total$secs[[i]]>=0 & total$secs[[i]]<1.5 ~ 1,
      total$secs[[i]]>=1.5 & total$secs[[i]]<4 ~ 2,
      total$secs[[i]]>=4 & total$secs[[i]]<8 ~ 3,
      total$secs[[i]]>=8 & total$secs[[i]]<10 ~ 4)
  } else {
    fran = case_when(
      total$secs[[i]]>=0 & total$secs[[i]]<1.5 ~ 5,
      total$secs[[i]]>=1.5 & total$secs[[i]]<4 ~ 6,
      total$secs[[i]]>=4 & total$secs[[i]]<8 ~ 7,
      total$secs[[i]]>=8 & total$secs[[i]]<10 ~ 8)
  }
  # Finally, add "fran", the values computed above and some values from the "total" list to a new data frame
  datosTel[nrow(datosTel) + 1,] = c(val2$cad, str_to_upper(total$Camp[[i]]), total$numsem[[i]], thedate, total$diasem[[i]], fran, 0)
}
#It works with the "for" loop, but it takes a very long time (it goes row by row and the list has more than 200K rows).
#How can I do this without the for loop and still get the "if + case_when" right?
Thanks again and have a good day
As I said, my problems are the FOR loop and the IF and CASE_WHEN inside it, because I don't know how to write them without a loop.
The code inside your loop only touches the current element ([[i]]), and all operations you are performing are vectorised by default (except for the if, but we can replace that directly by if_else).
So you can replace the entire loop with a mutate or transmute statement (they do the same thing, except that transmute does not keep existing columns, so it seems more appropriate in your case).
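To make the difference concrete, here is a small sketch on a made-up table (toy and its columns a and b are hypothetical and not part of your data):

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- tibble(a = 1:3, b = c(10, 20, 30))

# mutate() keeps the existing columns and appends the new one
kept <- toy %>% mutate(total = a + b)
names(kept)      # "a" "b" "total"

# transmute() returns only the columns named in the call
dropped <- toy %>% transmute(total = a + b)
names(dropped)   # "total"
```

In recent dplyr versions, mutate(..., .keep = "none") should give a similar result, so either spelling ought to work here.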
Furthermore, you can simplify the if by merging the two branches and adding an offset that depends on total$dia.
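As a minimal sketch of the offset idea, using invented sample values for dia and secs (assumptions for illustration only): both branches of your if classify secs into the same four bands, and the else branch simply adds 4.

```r
library(dplyr)

# Hypothetical sample values, only for illustration
dia  <- c(1, 5, 6, 7)
secs <- c(0.5, 2, 5, 9)

# One shared classification for both branches (bands 1 to 4)
band <- case_when(
  secs >= 0   & secs < 1.5 ~ 1,
  secs >= 1.5 & secs < 4   ~ 2,
  secs >= 4   & secs < 8   ~ 3,
  secs >= 8   & secs < 10  ~ 4
)

# Days 1 to 5 keep the band as is; any other day shifts it by 4 (giving 5 to 8)
offset <- if_else(dia >= 1 & dia <= 5, 0, 4)

fran <- band + offset
fran   # 1 2 7 8
```

Because if_else works on whole vectors, no per-row if is needed.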
Lastly, your case_when expression happens to be expressible as a findInterval expression.
In the following I am assuming that datosTel is an empty table before your loop, and I am also making some assumptions about the column names, which you may need to adjust.
datosTel = total %>%
  transmute(
    cad = posdi$cad[posdi$cad == str_to_upper(Cad) & posdi$font == "Taake"],
    Camp = str_to_upper(Camp),
    numsem = numsem,
    thedate = as.numeric(format(as.Date(TheDate, format="%Y-%m-%d"), '%Y%m%d')),
    diasem = diasem,
    offset = if_else(dia >= 1 & dia <= 5, 0, 4),
    fran = offset + findInterval(secs, c(0, 1.5, 4, 8, 10, Inf)),
    LAST_COLUMN = 0
  ) %>%
  select(-offset)
(Replace LAST_COLUMN with the actual column name.)
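One thing to check when you adapt this: the cad line compares posdi$cad against the whole Cad column at once, so it depends on how the two vectors line up rather than looking each row up individually. If posdi and total have different numbers of rows, a join may be a safer way to express the lookup. The following is only a sketch under assumptions: the small posdi and total tables are made-up stand-ins, and I am assuming that rows without a "Taake" match should be dropped (use a left_join instead if they should be kept).

```r
library(dplyr)
library(stringr)

# Hypothetical stand-ins for posdi and total, only for illustration
posdi <- tibble(cad  = c("AAA", "AAA", "BBB"),
                font = c("Taake", "Other", "Taake"))
total <- tibble(Cad  = c("aaa", "bbb", "ccc"),
                secs = c(1, 5, 9))

# Keep only the "Taake" rows of the lookup table, one row per key
lookup <- posdi %>%
  filter(font == "Taake") %>%
  distinct(cad)

# Upper-case the key, then keep only the rows that have a match in lookup
matched <- total %>%
  mutate(cad = str_to_upper(Cad)) %>%
  semi_join(lookup, by = "cad")

matched$cad   # "AAA" "BBB"
```

The remaining columns can then be computed in a transmute on matched, in the same way as above.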
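The findInterval call is equivalent to the following for values of secs from 0 up to (but not including) 10. Outside that range the two differ: findInterval returns 0 for negative values and 5 for values of 10 or more, whereas case_when returns NA.
case_when(
  secs >= 0 & secs < 1.5 ~ 1,
  secs >= 1.5 & secs < 4 ~ 2,
  secs >= 4 & secs < 8 ~ 3,
  secs >= 8 & secs < 10 ~ 4
)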
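If you want to convince yourself of this, here is a quick check on a few hand-picked values (the test vector is made up and includes the interval boundaries plus some out-of-range inputs):

```r
library(dplyr)

# Hypothetical test values: boundaries, interior points and out-of-range inputs
secs <- c(-1, 0, 1.49, 1.5, 3.9, 4, 7.9, 8, 9.9, 10, 12)

by_interval <- findInterval(secs, c(0, 1.5, 4, 8, 10, Inf))

by_case <- case_when(
  secs >= 0   & secs < 1.5 ~ 1,
  secs >= 1.5 & secs < 4   ~ 2,
  secs >= 4   & secs < 8   ~ 3,
  secs >= 8   & secs < 10  ~ 4
)

# Both approaches agree for every value in [0, 10)
in_range <- secs >= 0 & secs < 10
all(by_interval[in_range] == by_case[in_range])   # TRUE

# They differ outside that range
by_interval[!in_range]   # 0 5 5
by_case[!in_range]       # NA NA NA
```

If your data can contain secs values outside 0 to 10, it is worth deciding explicitly what fran should be for them.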