Search code examples
rdataframedplyrrows

Create new rows based on values in a column and assign these new rows a new value?


I have a data frame with 732 unique observations of 25 variables. What I would like to do is, assuming the reference and joining MeSH ID are not the same, create new rows below where I move the reference ID into the same column as the Joining MeSH ID, deleting the reference MeSH ID column and assigning a LinScore of 1 to the new row containing the reference MeSH ID. So if a row in the table currently looks something like this:

Disease Joining.Mesh.ID Company MoA Reference.MeSH.ID LinScore
Acute Myeloid Leukemia D015470 GSK VGF-B agonist D007951 0.9625

I want the final product to look something like this:

Disease Mesh ID Company Mode of Action LinScore
Acute Myeloid Leukemia D015470 GSK VGF-B agonist 0.9625
Acute Myeloid Leukemia D007951 GSK VGF-B agonist 1

I've asked a previous question that relates to this, and have some code which can create the new rows using pivot_longer() in dplyr, but I don't know how to assign the LinScore of 1 to these new rows.

As always, I'd be grateful for any suggestions!


Solution

  • You can pivot_longer the columns that contain ID, then use if_else to set the LinScore equal to 1 for reference rows

    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    library(tidyr)
    
    df <- structure(list(Disease = "Acute Myeloid Leukemia", Joining.Mesh.ID = "D015470", 
                         Company = "GSK", MoA = "VGF-B agonist", Reference.MeSH.ID = "D007951", 
                         LinScore = 0.9625), row.names = c(NA, -1L), class = "data.frame")
    
    df %>% 
      pivot_longer(cols = ends_with('.ID'),
                    values_to = 'mesh_id') %>% 
      relocate(mesh_id, .after = Disease) %>%
      mutate(LinScore = if_else(grepl('Reference', name), 1, LinScore)) %>% 
      select(-name) 
    #> # A tibble: 2 x 5
    #>   Disease                mesh_id Company MoA           LinScore
    #>   <chr>                  <chr>   <chr>   <chr>            <dbl>
    #> 1 Acute Myeloid Leukemia D015470 GSK     VGF-B agonist    0.962
    #> 2 Acute Myeloid Leukemia D007951 GSK     VGF-B agonist    1
    

    Created on 2021-07-13 by the reprex package (v2.0.0)