
Using read.table has an issue importing in R


I believe this should be an easy question, but I can't seem to find what I am doing wrong. I am importing a .txt file and it is parsed correctly; however, I can't access the contents of each cell in the data frame as a string. I want them as strings because I'd like to make an array with all the values.

I've added the code below to reproduce the issue, with the exact same dataset.

data <- read.delim('https://acfdata.coworks.be/cancerdrugsdb.txt', header = TRUE)
data$Targets[1]

Results:

'CDK6; CDK4; CCND1; CCND3; CDKN2A; NRAS; CCND2; SMARCA4; KRAS'

class(data$Targets[1])
'character'

Wanted results

class(data$Targets[1]) = string

I've tried importing with various functions and have tried the toString() function, but the result is still a character. Maybe there is a different way to do this, but without the string I can't separate

'CDK6; CDK4; CCND1; CCND3; CDKN2A; NRAS; CCND2; SMARCA4; KRAS'

'CDK6, CDK4, CCND1, CCND3, CDKN2A, NRAS, CCND2, SMARCA4, KRAS'

Any help will be appreciated.

Ultimately, I want multiple arrays that have an entry per row.

Thanks again.


Solution

  • Are you trying to 'split' the Targets column into individual values? I.e.

    library(tidyverse)
    
    data <- read.delim('https://acfdata.coworks.be/cancerdrugsdb.txt', header = TRUE)
    
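    # Count the ";" separators in each row; the maximum plus one is the
    # largest number of targets held by any single row
    max_number_of_fields <- data %>%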
      mutate(Targets = str_count(string = Targets, pattern = ";")) %>%
      summarise(fields = max(Targets, na.rm = TRUE))
    max_number_of_fields$fields
    #> [1] 68
    
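    # Split Targets into one column per value, then reshape to long format
    # so that each row holds a single target
    long_df <- data %>%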
      relocate(Targets, .after = last_col()) %>%
      separate(Targets, into = paste0("Target_", 1:(max_number_of_fields$fields + 1))) %>%
      pivot_longer(-c(1:14),
                   values_to = "Targets") %>%
      filter(!is.na(Targets)) %>%
      select(-name)
    #> Warning: Expected 69 pieces. Missing pieces filled with `NA` in 283 rows [1, 2,
    #> 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
    
    select(long_df, c(Product, Targets))
    #> # A tibble: 2,923 × 2
    #>    Product     Targets
    #>    <chr>       <chr>  
    #>  1 Abemaciclib CDK6   
    #>  2 Abemaciclib CDK4   
    #>  3 Abemaciclib CCND1  
    #>  4 Abemaciclib CCND3  
    #>  5 Abemaciclib CDKN2A 
    #>  6 Abemaciclib NRAS   
    #>  7 Abemaciclib CCND2  
    #>  8 Abemaciclib SMARCA4
    #>  9 Abemaciclib KRAS   
    #> 10 Abiraterone CYP17A1
    #> # … with 2,913 more rows
    

    Created on 2022-03-22 by the reprex package (v2.0.1)
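
As a side note on the original question: `character` is R's string type, so `class(data$Targets[1])` returning `'character'` already means the value is a string and no conversion is needed. If only the splitting is required, a base R sketch along these lines may be enough. It assumes the values are separated by `;` followed by optional whitespace, and it uses a small hypothetical vector `targets` in place of `data$Targets` (the first entry is the value shown in the question, the second is illustrative only).

```r
# Hypothetical stand-in for data$Targets so the example runs without a download.
targets <- c(
  "CDK6; CDK4; CCND1; CCND3; CDKN2A; NRAS; CCND2; SMARCA4; KRAS",
  "CYP17A1"  # illustrative single-target entry
)

# "character" is R's string type, so this is already a string.
is.character(targets[1])

# Split every entry on ";" plus any whitespace that follows it.
# strsplit() returns a list with one character vector per input element,
# i.e. one "array" of targets per row.
target_list <- strsplit(targets, ";\\s*")
target_list[[1]]

# Re-join one row's values as a comma-separated string.
comma_string <- paste(target_list[[1]], collapse = ", ")
comma_string
```

With the real data, `strsplit(data$Targets, ";\\s*")` should give one vector per row of the data frame, which seems to match the "multiple arrays that have an entry per row" goal.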