r, dataframe, performance, automation

Creating an R workflow to loop through unique entries in a table and edit a data frame based on conditions


I am working with plant survey data in which each plot contains 5 nested subplots. Users record each plant species and the smallest subplot the species was found in (A being the smallest and E the largest). Because the subplots are nested, a species found in the smallest subplot (A) was technically found in all subplots (A:E). I need to create an automated workflow that considers each plant in each plot, looks at which subplot it was found in, and then fills in the appropriate 1's and 0's.

Here is code to produce an example table:

plants<- c("plant_a","plant_b")
subplot<- c("A","B","C","D","E")

df<- data.frame(plot=rep(1:2,times=1,each=10),
                plant=rep(plants,times=2,each=5),
                subplot=rep(subplot, times=4),
                occurence=NA)

df[c(1,7,13,19),4]<- 1

Since plant_a was recorded in subplot A in plot 1, subplots A:E for this species and plot combination need a 1 in the occurrence column. plant_b in plot 1 was found in subplot B, so B:E need a 1 for occurrence and subplot A needs a 0. In the real dataset I will have many plant species that occur in multiple plots, but in different subplots, hence my need for some kind of workflow.

To be honest, this level of data manipulation is beyond my limited skill level. I suspect I need a for loop to cycle through the unique plot/species combinations, but I get lost when trying to work out how to code the conditional logic for the subplots. Any and all help would be greatly appreciated!


Solution

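  • You can group_by the combinations of plot and plant, and then use the fill function from tidyr. By default, fill carries the last non-missing value downward, so within each group every row after the recorded subplot gets a 1. This relies on the rows being ordered A to E within each plot/plant group. replace_na then sets the remaining NAs (the subplots smaller than the recorded one) to 0.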

    library(dplyr)
    library(tidyr)
    
    df %>%
      group_by(plot, plant) %>%
      fill(occurence) %>%
      replace_na(list(occurence = 0))
    #> # A tibble: 20 × 4
    #> # Groups:   plot, plant [4]
    #>     plot plant   subplot occurence
    #>    <int> <chr>   <chr>       <dbl>
    #>  1     1 plant_a A               1
    #>  2     1 plant_a B               1
    #>  3     1 plant_a C               1
    #>  4     1 plant_a D               1
    #>  5     1 plant_a E               1
    #>  6     1 plant_b A               0
    #>  7     1 plant_b B               1
    #>  8     1 plant_b C               1
    #>  9     1 plant_b D               1
    #> 10     1 plant_b E               1
    #> 11     2 plant_a A               0
    #> 12     2 plant_a B               0
    #> 13     2 plant_a C               1
    #> 14     2 plant_a D               1
    #> 15     2 plant_a E               1
    #> 16     2 plant_b A               0
    #> 17     2 plant_b B               0
    #> 18     2 plant_b C               0
    #> 19     2 plant_b D               1
    #> 20     2 plant_b E               1
    

    Created on 2023-08-22 with reprex v2.0.2
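
If you would rather avoid extra packages, the same idea can be sketched in base R with ave and cummax. This is only a minimal sketch, assuming (as in the example data) that occurence holds only 1 or NA and that the subplot labels sort alphabetically from smallest to largest. The helper variable seen is a name made up for illustration.

```r
# Rebuild the example table from the question
plants  <- c("plant_a", "plant_b")
subplot <- c("A", "B", "C", "D", "E")

df <- data.frame(plot = rep(1:2, each = 10),
                 plant = rep(plants, times = 2, each = 5),
                 subplot = rep(subplot, times = 4),
                 occurence = NA)
df[c(1, 7, 13, 19), 4] <- 1

# Sort so that subplots run A..E within each plot/plant combination
# (assumption: labels sort alphabetically from smallest to largest)
df <- df[order(df$plot, df$plant, df$subplot), ]

# cummax() propagates NA, so turn the missing entries into 0 first
seen <- ifelse(is.na(df$occurence), 0, df$occurence)

# Running maximum within each plot/plant group: once a 1 has been seen,
# every larger subplot in that group also gets a 1
df$occurence <- ave(seen, df$plot, df$plant, FUN = cummax)

print(df)
```

Sorting explicitly removes the dependence on the input row order, and the running maximum plays the same role as the downward fill: everything before the recorded subplot stays 0 and everything from it onward becomes 1.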