Tags: r, file, load, spaces, right-align

Read txt files with left-aligned data but an inconsistent number of spaces in R


I have a series of txt files that are all formatted the same way. The first few rows contain file information, and there are no variable names. As you can see, the number of spaces between fields is inconsistent, but the columns are either left-aligned or right-aligned. I know SAS can read data in this format directly, and I wonder whether R provides a similar function.

I tried the read.csv function to load the data, with the goal of storing it in a data.frame with 3 columns. However, the sep argument does not accept a regular expression, so sep = "\s" (intended to match multiple spaces) does not work.

So I first read the data into a single variable and then used the substr function to split it, as follows.

[image: step1]

    Factor <- data.frame(substr(Share$V1, 1, 9), substr(Share$V1, 9, 14), as.numeric(substr(Share$V1, 15, 30)))

[image: step2]

But this is clumsy, because I have to count the character positions of each column by hand. I wonder if there is a way to load the data directly as three columns.

    > Factor
           F  T      S
    1   +B2P       A     1005757219
    2   +BETA      A      826083789

Solution

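  • We can use read.table to read it as 3 columns. With sep = "" (the default), any run of whitespace is treated as a single separator, so the inconsistent spacing does not matter.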

    read.table(text=as.character(Share$V1), sep="", header=FALSE, 
                     stringsAsFactors=FALSE, col.names = c("FactorName", "Type", "Share"))
    #  FactorName Type      Share
    #1       +B2P    A 1005757219
    #2      +BETA    A  826083789
    #3       +E2P    A  499237181
    #4      +EF2P    A   38647147
    #5     +EFCHG    A  866171133
    #6    +IL1QNS    A  945726018
    #7    +INDMOM    A  862690708
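
To try the whitespace splitting without the Share object, here is a minimal self-contained sketch. The values are copied from the output above; the exact spacing inside each line is an assumption made only for illustration.

```r
# Sample lines with deliberately irregular spacing (the spacing is illustrative).
lines <- c("+B2P       A     1005757219",
           "+BETA      A      826083789",
           "+E2P       A      499237181")

# sep = "" (the default) treats any run of spaces or tabs as one delimiter,
# so no character positions have to be counted.
factors <- read.table(text = lines, sep = "", header = FALSE,
                      stringsAsFactors = FALSE,
                      col.names = c("FactorName", "Type", "Share"))

# Inspect the resulting column types.
str(factors)
```

The first two columns should come back as character and the third as a numeric type, which is what the substr approach had to force with as.numeric.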
    

    Another option is to read directly from the file, skipping the header line and setting the column names (adjust skip to the number of file-information lines at the top of the file)

    read.table("yourfile.txt", header=FALSE, skip=1, stringsAsFactors=FALSE,
                  col.names = c("FactorName", "Type", "Share"))
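
Since the question mentions a series of identically formatted files, the same call can probably be applied to each file and the results stacked. The sketch below is one way to do that, not the only one. The helper read_factor_file, its n_header argument, and the temporary file names are hypothetical, and it assumes one line of file information per file.

```r
# Hypothetical helper: read one file, skipping `n_header` lines of file information.
read_factor_file <- function(path, n_header = 1) {
  read.table(path, header = FALSE, skip = n_header, stringsAsFactors = FALSE,
             col.names = c("FactorName", "Type", "Share"))
}

# Two small temporary files stand in for the real ones (contents are illustrative).
demo_dir <- tempfile("factors_")
dir.create(demo_dir)
writeLines(c("file information",
             "+B2P   A   1005757219",
             "+BETA      A  826083789"),
           file.path(demo_dir, "a.txt"))
writeLines(c("file information",
             "+E2P  A      499237181"),
           file.path(demo_dir, "b.txt"))

# Read every .txt file in the folder and stack the rows into one data.frame.
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
all_factors <- do.call(rbind, lapply(files, read_factor_file))
print(all_factors)
```

If the real files have more than one line of file information, pass a larger n_header so that read.table starts at the first data row.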