Search code examples
rdataframesplitlong-format-data

Split long format data frame by character indicator


I have data in long format, where data frames are essentially stacked on top of each other. I want to seperate them before changing to wide format. I need some way of telling R to recognize "Sell curve" and "Buy curve" as indicators that seperate data frames, while keeping the column names.

Columns <- c("Buy curve", "Price value", "Volume value", "Price value", "Volume value","Price value", "Volume value",
             "Sell curve", "Price value", "Volume value", "Price value", "Volume value","Price value", "Volume value")
Values <- c(NA, 0, 10, 3, 4, 5, 0, NA, 0, 0, 1, 2, 4, 8)

df1 <- data.frame(Columns, Values)

> df1
        Columns Values
1     Buy curve     NA
2   Price value      0
3  Volume value     10
4   Price value      3
5  Volume value      4
6   Price value      5
7  Volume value      0
8    Sell curve     NA
9   Price value      0
10 Volume value      0
11  Price value      1
12 Volume value      2
13  Price value      4
14 Volume value      8

In the table above, notice how "Buy curve" and "Sell curve" simply indicate the start of the two different data frames with similar columns.

To deal with a large amount of data in this format, I need some function that will automatically recognize the two different sets of data. The following code will show the result I want for the "Sell curve" data frame, and is the closest I've come to a solution.

library(dplyr)
Sell_Curve <- df1 %>% slice(which.max(df1 == "Sell curve") : n())

Of course, this simply discards the "Buy curve" which is not what I really want.


Solution

  • split(df1, cumsum(Columns == 'Sell curve'))
    $`0`
           Columns Values
    1    Buy curve     NA
    2  Price value      0
    3 Volume value     10
    4  Price value      3
    5 Volume value      4
    6  Price value      5
    7 Volume value      0
    
    $`1`
            Columns Values
    8    Sell curve     NA
    9   Price value      0
    10 Volume value      0
    11  Price value      1
    12 Volume value      2
    13  Price value      4
    14 Volume value      8