I have to achieve 3 things :
Calculate Col : "Price" in even divided by the the numbers of row splitted.
# Dataset call df
Name Fix Price Date
Tom 600 500 2016-01-01
John 800 400 2016-01-03;2016-01-09
Mary 1100 1000 2016-01-04;2017-09-01;2017-08-10;2018-01-01
Joe 30 25 2016-01-04;2017-09-01
Paul 400 304 2015-01-02
Alex 505 238 2018-01-02;2018-02-02
# Targeted df
Name Fix Price Date
Tom 600 500 2016-01-01
John 800 400 2016-01-03;2016-01-09
Mary 1100 250 2016-01-04
Mary 0 250 2017-09-01
Mary 0 250 2017-08-10
Mary 0 250 2018-01-01
Joe 30 12.5 2016-01-04
Joe 0 12.5 2017-09-01
Paul 400 304 2015-01-02
Alex 505 238 2018-01-02;2018-02-02
I have an approach to achieve 1 and 3 , but i cannot make 2 that i don't know any the Diverse function of fill().
# Find the SPLIT TARGET first :
inds <- sapply(strsplit(df$Date, ";"), function(x)
length(unique(format(as.Date(x), "%Y"))) > 1) ### This approach actually
does not works in my actual data when over 1 Million rows , i chunk it into a smaller data
to fit this approach's limit.
library(tidyverse)
library(stringr)
#Select those indices
df[inds, ] %>% mutate(Price = Price / (str_count(Date,";") + 1)) %>%
separate_rows(Date, sep = ";") %>%
bind_rows(df[!inds,])
*Please remind you cannot use Col : " Name " for something as they just represent that the dataset has other values those have to copy by separate_rows
How can i clear point 2 ? Its drive me crazy .Thanks for any comments in advances
dt = read.table(text = "
Name Fix Price Date
Tom 600 500 2016-01-01
John 800 400 2016-01-03;2016-01-09
Mary 1100 1000 2016-01-04;2017-09-01;2017-08-10;2018-01-01
Joe 30 25 2016-01-04;2017-09-01
Paul 400 304 2015-01-02
Alex 505 238 2018-01-02;2018-02-02
", header=T, stringsAsFactors=F)
library(tidyverse)
library(lubridate)
dt %>%
separate_rows(Date, sep=";") %>% # separate dates
group_by(Name, year = year(ymd(Date))) %>% # for each Name and year of the date
summarise(Fix = unique(Fix), # keep Fix
Price = unique(Price), # keep Price
Date = paste0(Date, collapse = ";")) %>% # combine dates with same year
mutate(Fix = ifelse(row_number() > 1, 0, Fix), # update Fix values
Price = Price/length(Price)) %>% # divide Price by number of rows
ungroup() # forget the grouping
# # A tibble: 9 x 5
# Name year Fix Price Date
# <chr> <dbl> <dbl> <dbl> <chr>
# 1 Alex 2018 505 238 2018-01-02;2018-02-02
# 2 Joe 2016 30 12.5 2016-01-04
# 3 Joe 2017 0 12.5 2017-09-01
# 4 John 2016 800 400 2016-01-03;2016-01-09
# 5 Mary 2016 1100 333. 2016-01-04
# 6 Mary 2017 0 333. 2017-09-01;2017-08-10
# 7 Mary 2018 0 333. 2018-01-01
# 8 Paul 2015 400 304 2015-01-02
# 9 Tom 2016 600 500 2016-01-01