Search code examples
rdataframefinancerate

Get the rate of change by finding the change in price


UPDATE: I'm getting a strange result in the outcome. Occasionally, the earliest date of the result show after 2 or 3 etc times for example

Item Kg Date_1 Price_1 change_1 Date_2 Price_2 change_2
Apples 1 2022-02-01 1 NA 2022-02-16 2 1
Meat NA NA NA NA 2022-02-03 1 NA

As you can see, meat is showing no change at first, but the result is showing in the second one. This occurs throughout the program. Any idea why?

I am fairly new to programming. I am working on my portfolio, and am looking at a dataset regarding the price of food from distribution centers to a grocery store. What I am looking at is a set of data with the price, item, and date of transaction. What I am looking for is to find the rate of change from the distribution center to the store, and when it happened.

Note: the price of the item changes from the distribution center.

Here is an example of what I am looking at:

Date Item Price Kg
01.02.2022 Apple $1.00 1
02.02.2022 Meat $4.00 1
03.02.2022 Fish $3.00 1
03.02.2022 Bread $1.00 1
15.02.2022 Meat $5.00 1
15.02.2022 Meat $3.00 1
16.02.2022 Apple $2.00 1
20.02.2022 Fish $3.00 1
25.02.2022 Apple $0.50 1

As you can see, the price for the same quantity for the same product changes randomly over time. What I would like to analyse is:

  1. The rate of change per item
  2. When the change occured

This is the ideal outcome:

item kg 1st_price 1st_price_date 2nd_price 2nd_price_date amount_of_change
Apple 1 $1.00 01.02.2022 $2.00 16.02.2022 +$1.00
Meat 1 $4.00 02.02.2022 $5.00 15.02.2022 +$1.00
Bread 1 $1.00 03.02.2022 N/A N/A N/A
Fish 1 $3.00 03.02.2022 $3.00 20.02.2022 +$0.00

#Continuing the table below. These columns would go to the right of the columns above. #Unfortunetly, StackOverflow was not able to create a table with everything together. #total_change is for the entire period

item 3rd_price 3rd_price_date amount_of_change change_duration_period total_change
Apple $0.50 25.02.2022 -$1.50 01.02.2022-25.02.2022 -$0.50
Meat $3.00 15.02.2022 -$2.00 02.02.2022-1502.2022 -$1.00
Bread N/A N/A N/A 03.02.2022-03.02-2022 +$0.00
Fish $3.00 20.02.2022 +$0.00 03.02.2022-20.02.2022 +$0.00

As you can see, some items can have more price changes per month than others depending on the item. Some items have drastic changes, some have no changes at all.

Presuming there are over 14,000 unique items what would you recommend to gather the data an place them in a table as seen in the "Ideal outcome" section?

I am still new to programming, please don't be too harsh!

Thanks!


Solution

  • Something like this?

    library(tidyverse)
    
    df %>%
      # convert Date to a date, and Price to a number
      mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
             Price = parse_number(Price)) %>%
    
      # for each Item, arrange by Date, tally, and calc price change
      group_by(Item) %>%
      arrange(Date) %>%
      mutate(appearance = row_number(),
             change = Price - lag(Price)) %>%
      ungroup() %>%
    
      # use the tally to reshape wider the date, price and change
      pivot_wider(names_from = appearance, 
                  values_from = c(Date, Price, change),
                  names_vary = "slowest")
    

    Result

    # A tibble: 4 × 11
      Item     Kg Date_1     Price_1 change_1 Date_2     Price_2 change_2 Date_3     Price_3 change_3
      <chr> <int> <date>       <dbl>    <dbl> <date>       <dbl>    <dbl> <date>       <dbl>    <dbl>
    1 Apple     1 2022-02-01       1       NA 2022-02-16       2        1 2022-02-25     0.5     -1.5
    2 Meat      1 2022-02-02       4       NA 2022-02-15       5        1 2022-02-15     3       -2  
    3 Fish      1 2022-02-03       3       NA 2022-02-20       3        0 NA            NA        0  
    4 Bread     1 2022-02-03       1       NA NA              NA        0 NA            NA        0  
    

    Source data

    df <- data.frame(
      stringsAsFactors = FALSE,
                  Date = c("01.02.2022","02.02.2022",
                           "03.02.2022","03.02.2022","15.02.2022","15.02.2022",
                           "16.02.2022","20.02.2022","25.02.2022"),
                  Item = c("Apple","Meat","Fish",
                           "Bread","Meat","Meat","Apple","Fish","Apple"),
                 Price = c("$1.00","$4.00","$3.00",
                           "$1.00","$5.00","$3.00","$2.00","$3.00","$0.50"),
                    Kg = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
    )