
Using anonymous functions with lapply/sapply in R?


I'm trying to use sapply to take each item in a list (e.g. "Golf", "Malibu", "Corvette") and build a new list holding the highest value of a column (e.g. cars$saleprice) in the dataframe that the list was split from. I'm trying to use an anonymous function to do so, but I can't get that function to work.

The basic issue here is that I'm not very good at writing functions.

First, I took the original dataframe cars and used split to create a list of unique car names - I called this car_names.

Now, I'm trying to create a new list, using sapply, of the highest sale price of each type of car in the list. I'm sure I'm starting the thing correctly ...

price_list <- sapply(car_names, 

... but I can't for the life of me get an anonymous function to simply apply max to all instances of each car name in cars$saleprice.

I've tried a bunch of stuff, all of which has returned an error. Here's an example:

price_list <- sapply(car_names, function(x) {
    max(cars$saleprice[x])
})

Which returns:

Error in h115$nominate_dim1[x] : invalid subscript type 'list'

I'm sure this is trivially simple for even moderately experienced programmers, but I'm ... not one of those! I suspect that I'm pointing to something incorrectly, but I can't get past it. Any ideas?


Edit: Here's a reproducible example.

First, the "source" dataframe:

cars1 <- data.frame("car_names" = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf", "Malibu", "Malibu", "Malibu"),"saleprice" = c(32000,45000,72000,7500,16000,22000,33000,21000,26500))

Next, splitting the df by car_names:

cars1_split <- split(cars1, cars1$car_names)

Now, attempting to pass max to sapply and getting an error:

maxes <- sapply(cars1_split, function(x){
  max(cars1$saleprice[x])
})

Hopefully this gives you guys something to work with!


Solution

  • You have a few options here. Let's start with aggregate, which is not what you asked for, but I want to keep your attention high ;)

    aggregate(saleprice ~ car_names, cars1, max)
    #  car_names saleprice
    #1  Corvette     72000
    #2      Golf     22000
    #3    Malibu     33000
    

    This returns a data.frame (which you can easily split if you need a list).

    aggregate is similar to tapply, which comes next:

    tapply(cars1$saleprice, cars1$car_names, FUN = max)
    #Corvette     Golf   Malibu 
    #   72000    22000    33000
    

    Or try by with which.max, which returns the whole row holding the maximum rather than just the price:

    by(cars1, cars1$car_names, FUN = function(x) x[which.max(x$saleprice), ])
    #cars1$car_names: Corvette
    #  car_names saleprice
    #3  Corvette     72000
    #-------------------------------
    #cars1$car_names: Golf
    #  car_names saleprice
    #6      Golf     22000
    #-------------------------------
    #cars1$car_names: Malibu
    #  car_names saleprice
    #7    Malibu     33000
    

    Finally, you can also use lapply with split (by is roughly shorthand for this combination):

    lapply(split(cars1, cars1$car_names), function(x) x[which.max(x$saleprice), ])
    #$Corvette
    #  car_names saleprice
    #3  Corvette     72000
    
    #$Golf
    #  car_names saleprice
    #6      Golf     22000
    
    #$Malibu
    #  car_names saleprice
    #7    Malibu     33000
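
As for the sapply attempt in the question: split returns a list of data frames, so inside the anonymous function x is itself a data frame, and cars1$saleprice[x] tries to use that data frame (a list) as a subscript, which is what triggers the "invalid subscript type 'list'" error. A minimal sketch of the direct fix, using the question's cars1 data and assuming a plain named vector of maxima is what is wanted (maxes2 is just an illustrative name for the alternative):

```r
# Reproducible data from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette",
                "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# split() returns a named list of data frames, one per car name
cars1_split <- split(cars1, cars1$car_names)

# x is one of those data frames, so take the column from x
# instead of indexing cars1$saleprice with x
maxes <- sapply(cars1_split, function(x) max(x$saleprice))

# Alternative: split only the price column, then no anonymous function is needed
maxes2 <- sapply(split(cars1$saleprice, cars1$car_names), max)

maxes
```

Both calls give a named numeric vector with one maximum per car name; swap sapply for lapply if you would rather keep a list.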