Search code examples
rpostgresqlsubsetdata-miningdata-wrangling

Grouping data by id in R


i would like to ask your help how to group the observations based on their id in the effective way using R software.

Here's the situation : I have a data frames in R called "ingredients" seems like below :

id ingredients
1 chicken
2 butter
3 rice
2 salt
1 lamb
1 shrimp
3 couscous

The idea is to group the observations based on their id.

Here is what I have done so far :

First, I extraced all the unique values of id :

list_id <- unique(ingredients$id)

list_id is a list of elements containing ("1", "2", "3")

Then, i create an empty list called ingredients_by_id :

ingredients_by_id <- list ()

Finally, I group the ingredients by their id

for(id in list_id){
ingredients_by_id[[id] <- subset(ingredients, id=id)
}

The list called ingredients_by_id contains 3 data frames (several data frames inside list). For example :

list[[1]] : | id | ingredients | | -------- | -------- | | 1 | chicken| | 1 | lamb | | 1 | shrimp |

list[[2]] : | id | ingredients | | -------- | -------- | | 2 | butter | | 2 | salt |

list[[3]] : | id | ingredients | | -------- | -------- | | 3 | rice | | 3 | couscous |

The problem is I have about 35000 id's and it takes a lot of time for me to grouping the observations based on their id. Because the worst fact is the first data frame (ingredients) contains about 25 millions of observations. This data is stored in PostgreSQL.

I expect to reduce the program execution time and make it faster as possible.

Could anyone help me to solve this problem ? I thank you in advance.

Regards,


Solution

  • Probably as simple as it gets in base R

    > by(df$ingredients,df$id,FUN=I)
    
    df$id: 1
    [1] "chicken" "lamb"    "shrimp" 
    ---------------------------------------------------------------------------------- 
    df$id: 2
    [1] "butter" "salt"  
    ---------------------------------------------------------------------------------- 
    df$id: 3
    [1] "rice"     "couscous"