I have an array of retail item prices, and am trying to create a new column that indicates which store has the lowest price.
The rows are the items, while the columns are the retailer names.
Not all of the retailers carry each product, so I need to ignore cells where the value is NA (instead of treating NA values as the lowest-cost item).
Setting everything up, I have
library(dplyr)
# Data
itemName <- c('Apples', 'Bananas', 'Milk', 'Eggs', 'Cereal')
Aldi <- c(NA, 0.55, 1.95, 157.00, 1.10)
Walmart <- c(0.75, 0.69, 2.15, 192.00, 1.10)
Costco <- c(NA, NA, 0.16, 75.00, NA)
Creating the df with a column that identifies the lowest price
retailPRICE <- data.frame(itemName, Aldi, Walmart, Costco)%>%
mutate(cheapestPRICE = pmin(Aldi, Walmart, Costco, na.rm = TRUE))
I'm trying to create a column that looks at all the prices, and pulls in the name of the store with the cheapest price.
retailPRICE$cheapestSTORE <- names(retailPRICE)
[minCol(replace(retailPRICE, is.na(retailPRICE), 0),
ties.method = "first")]
| itemName | Aldi | Walmart | Costco | cheapestPRICE | cheapestSTORE |
| -------- | ------ | -------- | -------- | ------------- --------------
| Apples | NA | 0.75 | NA | 0.75 | Walmart |
| Bananas | 0.55 | 0.69 | NA | 0.55 | Aldi |
| Milk | 1.95 | 2.15 | 0.16 | 0.16 | Costco |
| Eggs | 157.00 | 192.00 | 75.00 | 75.00 | Costco |
| Cereal | 1.10 | 1.10 | NA | 1.10 | Aldi |
Attempting to create the cheapestSTORE column gives the following error: Error in FUN(left) : invalid argument to unary operator
Before I ran into the Error in FUN issue, I had a problem with how minCol
was identifying the lowest value column. It was seeing NAs as the lowest value, so it would pull in the name of that store. I need to be able to exclude any NA entries from the assessment of which column is being indexed.
| itemName | Aldi | Walmart | Costco | cheapestPRICE | cheapestSTORE |
| -------- | ------ | -------- | -------- | ------------- --------------
| Apples | NA | 0.75 | NA | 0.75 | Aldi |
| Bananas | 0.55 | 0.69 | NA | 0.55 | Costco |
| Milk | 1.95 | 2.15 | 0.16 | 0.16 | Costco |
| Eggs | 157.00 | 192.00 | 75.00 | 75.00 | Costco |
| Cereal | 1.10 | 1.10 | NA | 1.10 | Costco |
As you can see, the only row that is being evaluated correctly is Eggs. All of the other items are out of stock in at least one store, which is causing the cheapestSTORE column to identify wrong location as being the cheapest.
Can you please help me figure out what I should be doing when creating the cheapestSTORE column? Thank you!!
Because your data has prices in multiple columns, it is not tidy in the sense of Wickham 2014. I would suggest you reshape your data to long format first, so that each column represent a single variable: product, store and price.
library(dplyr); library(tidyr)
retailPRICE <- data.frame(itemName, Aldi, Walmart, Costco)
retailPRICElong <- retailPRICE |>
pivot_longer(-itemName, names_to = 'Store', values_to = 'Price') |>
drop_na()
We simply drop all the not-in-stock items by dropping NA
values.
Then these operations become much easier, and lead to code that is simple and easy to read:
summarise(
retailPRICElong,
cheapestPrice = min(Price),
cheapestStore = Store[which.min(Price)], # in the case of ties, first one
.by = itemName
)
# A tibble: 5 × 3 itemName cheapestPrice cheapestStore <chr> <dbl> <chr> 1 Apples 0.75 Walmart 2 Bananas 0.55 Aldi 3 Milk 0.16 Costco 4 Eggs 75 Costco 5 Cereal 1.1 Aldi
For this case in particular, all you would need is:
slice_min(retailPRICElong, Price, by = itemName)
which also includes ties by default
# A tibble: 6 × 3 itemName Store Price <chr> <chr> <dbl> 1 Apples Walmart 0.75 2 Bananas Aldi 0.55 3 Milk Costco 0.16 4 Eggs Costco 75 5 Cereal Aldi 1.1 6 Cereal Walmart 1.1