Search code examples
rdataframerowapplycol

Calculate formula over all rows and specific columns of dataframe


I have the following sample dataframe with prices of toys in different shops:

dfData <- data.frame(article = c("Fix", "Foxi", "Stan", "Olli", "Barbie", "Ken", "Hulk"),
                     priceToys1 = c(10, NA, 10.5, NA, 10.7, 11.2, 12.0),
                     priceAllToys = c(NA, 11.4, NA, 11.9, 11.7, 11.1, NA),
                     price123Toys = c(12, 12.4, 12.7, NA, NA, 11.0, 12.1))

Additionally I generate a min price column by adding:

dfData$MinPrice <- apply(dfData[, grep("price", colnames(dfData))], 1, FUN=min, na.rm = TRUE)

So I have this dataframe now:

#  article priceToys1 priceAllToys price123Toys MinPrice
#1     Fix       10.0           NA         12.0     10.0
#2    Foxi         NA         11.4         12.4     11.4
#3    Stan       10.5           NA         12.7     10.5
#4    Olli         NA         11.9           NA     11.9
#5  Barbie       10.7         11.7           NA     10.7
#6     Ken       11.2         11.1         11.0     11.0
#7    Hulk       12.0           NA         12.1     12.0

How do I get additional columns into the dataframe that tell me the factor of all prices relatively to the minimum price in percentage? The new column names should also include the shop name.

The result should look like this:

#  article priceToys1 PercToys1 priceAllToys PercAllToys price123Toys Perc123Toys MinPrice
#1     Fix       10.0     100.0           NA          NA         12.0       120.0     10.0
#2    Foxi         NA        NA         11.4       100.0         12.4       108.8     11.4
#3    Stan       10.5     100.0           NA          NA         12.7       121.0     10.5
#4    Olli         NA        NA         11.9       100.0           NA          NA     11.9
#5  Barbie       10.7     100.0         11.7       109.4           NA          NA     10.7
#6     Ken       11.2     101.8         11.1       100.9         11.0       100.0     11.0
#7    Hulk       12.0     100.0           NA          NA         12.1       100.8     12.0

Solution

  • Two possible solutions:

    1) With the data.table-package:

    # load the 'data.table'-package
    library(data.table)
    
    # get the columnnames on which to operate
    cols <- names(dfData)[2:4] # or: grep("price", names(dfData), value = TRUE)
    
    # convert dfData to a 'data.table'
    setDT(dfData)
    
    # compute the 'fraction'-columns
    dfData[, paste0('Perc', gsub('price','',cols)) := lapply(.SD, function(x) round(100 * x / MinPrice, 1))
           , .SDcols = cols][]
    

    which gives:

       article priceToys1 priceAllToys price123Toys MinPrice PercToys1 PercAllToys Perc123Toys
    1:     Fix       10.0           NA         12.0     10.0     100.0          NA       120.0
    2:    Foxi         NA         11.4         12.4     11.4        NA       100.0       108.8
    3:    Stan       10.5           NA         12.7     10.5     100.0          NA       121.0
    4:    Olli         NA         11.9           NA     11.9        NA       100.0          NA
    5:  Barbie       10.7         11.7           NA     10.7     100.0       109.3          NA
    6:     Ken       11.2         11.1         11.0     11.0     101.8       100.9       100.0
    7:    Hulk       12.0           NA         12.1     12.0     100.0          NA       100.8
    

    2) With base R:

    cols <- names(dfData)[2:4] # or: grep("price", names(dfData), value = TRUE)
    
    dfData[, paste0('Perc', gsub('price','',cols))] <- round(100 * dfData[, cols] / dfData$MinPrice, 1)
    

    which will get you the same result.