Tags: r, loops, conditional-statements, fill

Conditionally fill a data frame column based on ranges of values from two columns


I currently have this loop, which takes rows from a data frame (df_2) based on ranges of indices and combines them into a new data frame (df). The start and end indices of each section to include are taken from two columns of df_3.

for(i in 1:nrow(df_3)){
  # keep rows start[i] to end[i] of df_2 and append them to df
  if (i==1) df <- df_2[df_3$start[i]:df_3$end[i],]
  else df <- rbind(df,df_2[df_3$start[i]:df_3$end[i],])
}
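Growing df with rbind() inside the loop works, but each call copies everything accumulated so far. A common R idiom is to build a list of the row subsets and bind them once at the end. This is a minimal sketch, assuming start and end are valid positive row numbers of df_2; the toy data below is hypothetical:

```r
# Hypothetical toy data standing in for df_2 and df_3
df_2 <- data.frame(id = 1:12, x = letters[1:12])
df_3 <- data.frame(start = c(1, 5, 8), end = c(3, 6, 10))

# One data frame per section: rows start[i]..end[i] of df_2
# (drop = FALSE keeps the result a data frame even if df_2 has one column)
pieces <- lapply(seq_len(nrow(df_3)), function(i) {
  df_2[df_3$start[i]:df_3$end[i], , drop = FALSE]
})

# Bind all sections in a single call instead of growing df in a loop
df <- do.call(rbind, pieces)
```

The result has the same rows, in the same order, as the loop above.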

Each section has a value associated with it, stored in column 3 of df_3. I want to create a new column in df that repeats each section's value for every row belonging to that section.
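Because the loop appends the sections in the row order of df_3, and section i contributes end[i] - start[i] + 1 rows, one way to build the new column is with rep(). This is a minimal sketch, assuming start and end are valid positive row numbers of df_2. A start of 0 would be silently ignored by R's indexing, which would shorten that section by one row. The toy data and the column name value are hypothetical:

```r
# Hypothetical toy data; column 3 of df_3 holds each section's value
df_2 <- data.frame(id = 1:12, x = letters[1:12])
df_3 <- data.frame(start = c(1, 5, 8), end = c(3, 6, 10), value = c(10, 20, 30))

# The (corrected) loop from the question
for (i in 1:nrow(df_3)) {
  if (i == 1) df <- df_2[df_3$start[i]:df_3$end[i], ]
  else df <- rbind(df, df_2[df_3$start[i]:df_3$end[i], ])
}

# Section i contributed end[i] - start[i] + 1 rows, in df_3's row order,
# so repeating each value that many times lines up with df's rows
df$new_column <- rep(df_3[[3]], times = df_3$end - df_3$start + 1)
```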

I would really appreciate some assistance here. Feel free to ask for clarification; I was as succinct as I could make it!

As suggested by Joran, here are some examples:

DF

index  new_column
0     
1
2
3
4
5
6
7
8
9
10

DF_3

start  end  new_column_values
0      3    1
4      6    2
7      10   3
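To run the answer, the two example tables can be typed into R directly. This is a minimal sketch, assuming new_column starts out empty (NA) and using the column name end, which is what the answer refers to as DF_3$end:

```r
# Example data from the question; new_column starts out empty (NA)
DF <- data.frame(index = 0:10, new_column = NA)

# One row per section: inclusive start/end of the index range and its value
DF_3 <- data.frame(start = c(0, 4, 7),
                   end = c(3, 6, 10),
                   new_column_values = c(1, 2, 3))
```

The section lengths (4, 3 and 4 rows) add up to the 11 rows of DF, so every index falls in exactly one section.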

Solution

  • If I understand your question correctly, you might be able to use cut as follows:

    DF$new_column <- cut(DF$index, 
                         breaks = c(DF_3$start[1], DF_3$end), 
                         include.lowest = TRUE, 
                         labels = DF_3$new_column_values)
    DF
       index new_column
    1      0          1
    2      1          1
    3      2          1
    4      3          1
    5      4          2
    6      5          2
    7      6          2
    8      7          3
    9      8          3
    10     9          3
    11    10          3
    

    This uses only the information already in DF_3. We are basically creating a factor from DF$index whose levels are determined by the ranges in another data.frame. Thus, for cut, I've set breaks to a vector made of the first start value followed by all the end values, and I've set labels to the values of the new_column_values variable.

    Note that the resulting "new_column" is not (in the current form) a numeric variable, but a factor.
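The following sketch uses the example data to show which ranges cut builds from these breaks and how to get a numeric column instead of a factor. By default, cut's intervals are closed on the right (right = TRUE). include.lowest = TRUE also closes the first interval on the left, so index 0 is kept. The approach therefore relies on the sections being contiguous integer ranges, with each start one more than the previous end. For the numeric conversion, go through the labels with as.character(). Calling as.numeric() directly on a factor returns the internal level codes, which only match the labels here because the values happen to be 1, 2 and 3. The column name new_column2 is hypothetical and used only for comparison:

```r
DF <- data.frame(index = 0:10)
DF_3 <- data.frame(start = c(0, 4, 7), end = c(3, 6, 10),
                   new_column_values = c(1, 2, 3))
breaks <- c(DF_3$start[1], DF_3$end)   # 0, 3, 6, 10

# Without labels, the levels show the intervals: [0,3], (3,6] and (6,10]
intervals <- cut(DF$index, breaks = breaks, include.lowest = TRUE)
print(levels(intervals))

# Labelled factor as in the answer, converted to numeric via its labels
f <- cut(DF$index, breaks = breaks, include.lowest = TRUE,
         labels = DF_3$new_column_values)
DF$new_column <- as.numeric(as.character(f))

# Alternative: labels = FALSE returns integer interval codes,
# which can index new_column_values directly
# (new_column2 is a hypothetical name, used only for comparison)
codes <- cut(DF$index, breaks = breaks, include.lowest = TRUE, labels = FALSE)
DF$new_column2 <- DF_3$new_column_values[codes]
```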