I'm trying to use sapply to take each item in a list (e.g. "Golf","Malibu","Corvette") and create a new list with the highest value in the dataframe that list was split from (e.g. cars$sale_price). I'm trying to use an anonymous function to do so, but I can't get that function to work.
The basic issue here is that I'm not very good at writing functions.
First, I took the original dataframe cars and used split to create a list of unique car names - I called this car_names.
Now, I'm trying to create a new list, using sapply, of the highest sale price of each type of car in the list. I'm sure I'm starting the thing correctly ...
price_list <- sapply(car_names,
... but I can't for the life of me get an anonymous function to simply apply max to all instances of each car name in cars$sale_price.
I've tried a bunch of stuff, all of which has returned an error. Here's an example:
price_list <- sapply(car_names, function(x) {
max(cars$saleprice[x])
})
Which returns:
Error in h115$nominate_dim1[x] : invalid subscript type 'list'
I'm sure this is trivially simple for even moderately experienced programmers, but I'm ... not one of those! I suspect that I'm pointing to something incorrectly, but I can't get past it. Any ideas?
Edit: Here's a reproducible example.
First, the "source" dataframe:
cars1 <- data.frame("car_names" = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf", "Malibu", "Malibu", "Malibu"),"saleprice" = c(32000,45000,72000,7500,16000,22000,33000,21000,26500))
Next, splitting the df by car_names:
cars1_split <- split(cars1, cars1$car_names)
Now, attempting to pass max to sapply and getting an error:
maxes <- sapply(cars1_split, function(x){
max(cars1$saleprice[x])
})
Hopefully this gives you guys something to work with!
You have a few options here. Let's start with aggregate - not what you asked for, but I want to keep your attention high ;)
aggregate(saleprice ~ car_names, cars1, max)
# car_names saleprice
#1 Corvette 72000
#2 Golf 22000
#3 Malibu 33000
This returns a data.frame (which you can easily split if you need a list).
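If a list is what you need, one way to get it from the aggregate result is sketched below. This is only an illustration: the names res and price_list are made up for the example, and cars1 is rebuilt from the question so the snippet runs on its own.

```r
# Rebuild the example data from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# One row per car name, holding the maximum sale price
res <- aggregate(saleprice ~ car_names, cars1, max)

# Split the price column by car name to get a named list (hypothetical name)
price_list <- split(res$saleprice, res$car_names)
price_list$Golf
```

Splitting only the saleprice column keeps each list element a plain number; splitting res itself would give a list of one-row data frames instead.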
aggregate is similar to tapply, shown next:
tapply(cars1$saleprice, cars1$car_names, FUN = max)
#Corvette Golf Malibu
# 72000 22000 33000
Or try by and which.max, which returns the whole row holding each maximum:
by(cars1, cars1$car_names, FUN = function(x) x[which.max(x$saleprice), ])
#cars1$car_names: Corvette
# car_names saleprice
#3 Corvette 72000
#-------------------------------
#cars1$car_names: Golf
# car_names saleprice
#6 Golf 22000
#-------------------------------
#cars1$car_names: Malibu
# car_names saleprice
#7 Malibu 33000
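The by result prints nicely but is a list-like object rather than a data frame. If you would rather have a single data frame back, a common idiom is to bind the pieces together with do.call and rbind. A minimal sketch, assuming the cars1 data from the question (best_rows is just an illustrative name):

```r
# Rebuild the example data from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)

# For each car name, keep the row with the highest sale price
by_result <- by(cars1, cars1$car_names,
                FUN = function(x) x[which.max(x$saleprice), ])

# Stack the one-row data frames into a single data frame
best_rows <- do.call(rbind, by_result)
best_rows
```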
Finally, you can also use lapply and split (for which by is somewhat of a shorthand):
lapply(split(cars1, cars1$car_names), function(x) x[which.max(x$saleprice), ])
#$Corvette
# car_names saleprice
#3 Corvette 72000
#$Golf
# car_names saleprice
#6 Golf 22000
#$Malibu
# car_names saleprice
#7 Malibu 33000
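As for the sapply attempt in the question: each element of the split list is itself a data frame, so inside the anonymous function x is a data frame, not an index. That is why cars1$saleprice[x] complains about a subscript of type 'list'. Taking the column from x directly should do what you were after. A minimal sketch, reusing cars1 and cars1_split from the question:

```r
# Rebuild the example data from the question
cars1 <- data.frame(
  car_names = c("Corvette", "Corvette", "Corvette", "Golf", "Golf", "Golf",
                "Malibu", "Malibu", "Malibu"),
  saleprice = c(32000, 45000, 72000, 7500, 16000, 22000, 33000, 21000, 26500)
)
cars1_split <- split(cars1, cars1$car_names)

# x is the sub-data-frame for one car name, so take its saleprice column
maxes <- sapply(cars1_split, function(x) max(x$saleprice))
maxes
#Corvette     Golf   Malibu
#   72000    22000    33000
```

sapply simplifies the result to a named numeric vector; swap in lapply if you want a list instead.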