I have a for loop im trying to convert to a mapply, as I have read that it is faster than for (for loop takes about 2 minutes).
The loop does this: it creates subsets filtering by the unique names of column "OrdenFab" and then, it keeps the unrepeated values on the "Valor" column.
Then it adds this filtered subset to a new data frame, and it keeps adding them all as the loop goes on, getting a filtered dataframe with no repeated values in column "Valor" for each unique value of the column "OrdenFab".
i<-unique(datapesomolde$OrdenFab)
datapesomoldefiltered<-data.frame()
for (j in i){
datapesomoldetemp<-datapesomolde%>%
filter(OrdenFab==j)%>%
filter(!duplicated(Valor))
datapesomoldefiltered<-rbind(datapesomoldefiltered,datapesomoldetemp)
}
Original dataframe is this one (first 20 rows, it has 20626):
> datapesomolde
PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
2 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
3 11013610 226548648 47.30000 2022-04-25 05:52:00 42.38 49.26 45.82
4 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
5 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
6 11013047 226548234 15.20000 2022-04-23 02:48:00 14.43 16.77 15.60
7 11013047 226548234 15.20000 2022-04-23 02:48:00 14.43 16.77 15.60
8 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
9 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
10 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
11 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
12 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
13 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
14 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
15 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
16 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
17 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
18 11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13 142.44
19 11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13 142.44
20 11013050 226547200 15.10000 2022-04-22 23:27:00 14.34 16.66 15.50
Filtered result is this one (first 10 rows):
>datapesomoldefiltered
PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
2 11013610 226548648 47.30000 2022-04-25 05:52:00 42.38 49.26 45.82
3 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
4 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
5 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
6 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
7 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
8 11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13 142.44
9 11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13 142.44
10 11013050 226547200 15.10000 2022-04-22 23:27:00 14.34 16.66 15.50
I'm struggling to convert it to mapply, I am getting a Matrix not a dataframe.
I have tried this:
i<-unique(datapesomolde$OrdenFab)
datapesomoldefiltered<-data.frame()
limpiarof<-function(i){
subset<-filter(datapesomolde,OrdenFab==i)
datapesomoldetemp<-filter(subset,!duplicated(subset$Valor))
return(datapesomoldefiltered<-rbind(datapesomoldefiltered,datapesomoldetemp))
}
datapesomoldefiltered<-mapply(limpiarof,i)
With this try I get a Matrix of 2.2GB, it just has the value of all the columns for each unique value of the "OrdenFab" column.
Can you help me please? Thanks in advance.
I would suggest solving this problem using a more abstract approach, using e.g. tidyverse
:
This should be much faster and clearer:
library(tidyverse)
datapesomoldefiltered <-
datapesomolde |>
group_by(OrdenFab) |>
distinct(Valor, .keep_all = TRUE) |>
ungroup()
datapesomoldefiltered