Tags: r, dataframe, readfile

Read a continuous text file into a data.frame


I have a text file with a single column. It looks like this:

sample1

color 12
length 34
validity 90



sample2

color 15
length 20
validity 120



sample3

color 34
validity 79

There are 3 blank lines between samples, and 1 blank line between a sample ID and its attributes. Also, for sample3, the length record is missing.

I want to read this file into an R data.frame so that it looks like:

       sample1   sample2   sample3
color    12        15        34
length   34        20        NA
validity 90        120       79

Solution

  • You've got a data cleaning problem. Here is my solution for you.

    I copied your text into a blank TextEdit document on a Mac and saved it as file.txt. The code below relies on the records appearing in the same order as in your file:

    # Read one record per line; read.table() skips the blank separator lines by default.
    data <- unlist(read.table("file.txt", header=FALSE, sep="\t", stringsAsFactors=FALSE), use.names=FALSE)
    data
    
    sample_names <- data[grep("sample", data)]
    sample_names 
    ## [1] "sample1" "sample2" "sample3"
    
    color <- data[grep("color", data)]
    color
    ## [1] "color 12" "color 15" "color 34"
    
    # Named len rather than length to avoid masking base::length().
    len <- data[grep("length", data)]
    len # sample3 has no length record, so only two values come back
    ## [1] "length 34" "length 20"
    
    # Manual fix: appending NA is only correct because the gap is in the last sample.
    len <- c(len, NA)
    len
    ## [1] "length 34" "length 20" NA   
    
    validity <- data[grep("validity", data)]
    validity
    ## [1] "validity 90"  "validity 120" "validity 79" 
    
    ## Assemble into a character matrix (one row per attribute):
    assembled_df <- rbind(color, length = len, validity)
    colnames(assembled_df) <- sample_names #update column names
    assembled_df
    ##          sample1       sample2        sample3      
    ## color    "color 12"    "color 15"     "color 34"   
    ## length   "length 34"   "length 20"    NA           
    ## validity "validity 90" "validity 120" "validity 79"
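
The assembled result still holds strings such as "color 12", while the question asks for plain numbers. One possible follow-up step, sketched here under the assumption that every non-missing cell has the form "name value" with a single space before a numeric value, is to strip the label and convert. The matrix is rebuilt from the output shown above so the snippet runs on its own; `values` and `numeric_df` are illustrative names, not part of the original answer.

```r
# Rebuild the character matrix from the output shown above.
assembled_df <- rbind(
  color    = c("color 12", "color 15", "color 34"),
  length   = c("length 34", "length 20", NA),
  validity = c("validity 90", "validity 120", "validity 79")
)
colnames(assembled_df) <- c("sample1", "sample2", "sample3")

# Drop everything up to the last space, leaving only the number; NA stays NA.
values <- sub("^.* ", "", assembled_df)

# as.numeric() drops the dim attribute, so restore the shape and the names.
numeric_df <- as.data.frame(
  matrix(as.numeric(values),
         nrow = nrow(assembled_df),
         dimnames = dimnames(assembled_df))
)
numeric_df
```

The missing length for sample3 stays NA through the conversion, so the layout matches the table requested in the question.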
    

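    Note that this code is not fully generalizable: the manual NA only lands in the right place because the missing length record belongs to the last sample, so a gap anywhere else would need different handling. How far it carries depends on what the actual TXT file looks like. What's important is to learn to 1) know your data (which you do), 2) come up with a strategy, and 3) then implement a solution.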
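
If missing records can occur in any sample, one way to avoid the manual NA step is to tag each attribute line with the sample it belongs to and then pivot. The following is a minimal sketch, not the original answer's method. It assumes sample IDs contain no spaces, every attribute line is "name value" with a numeric value, and the first non-blank line is a sample ID. The input is inlined from the question so the snippet runs on its own; with a real file you would start from `readLines("file.txt")`. The names `is_header`, `long`, `wide`, and `result` are illustrative.

```r
# Sample input reproduced from the question; in practice: lines <- readLines("file.txt")
lines <- c("sample1", "", "color 12", "length 34", "validity 90", "", "", "",
           "sample2", "", "color 15", "length 20", "validity 120", "", "", "",
           "sample3", "", "color 34", "validity 79")

lines <- trimws(lines)
lines <- lines[nzchar(lines)]                      # drop blank separator lines

is_header <- !grepl(" ", lines)                    # assumption: sample IDs contain no space
sample_id <- lines[is_header][cumsum(is_header)]   # carry each ID down to its attribute lines

records <- lines[!is_header]
long <- data.frame(
  sample    = sample_id[!is_header],
  attribute = sub(" .*$", "", records),            # text before the first space
  value     = as.numeric(sub("^.* ", "", records)),# text after the last space
  stringsAsFactors = FALSE
)

# Factors with explicit levels keep rows and columns in order of first appearance.
attr_f   <- factor(long$attribute, levels = unique(long$attribute))
sample_f <- factor(long$sample,    levels = unique(long$sample))

# Pivot to attributes-by-samples; combinations absent from the file become NA.
wide <- tapply(long$value, list(attr_f, sample_f), function(v) v[1])
result <- as.data.frame(wide)
result
```

Because the pivot fills in any attribute/sample pair that never appears in the file, the missing length for sample3 becomes NA without special handling, wherever the gap falls.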