
Reading in columns without clear separators with read.table


Hello, I'm loading a data file formatted as a table whose columns are separated by multiple spaces. Normally it loads easily with read.table(data_file, sep = "", header = T, fill = T), but some values are not separated by spaces when they are negative:

 523.2   -166.1      1.62 0.079         0.0      0.0      0.0   2260        0
 528.4   -168.6     -0.71-0.034         0.0      0.0      0.0   2284        0
 533.9   -169.7     -1.75-0.085         0.0      0.0      0.0   2308        0
 538.4   -169.5     -1.60-0.078         0.0      0.0      0.0   2333        0
 543.3   -170.8     -2.83-0.137         0.0      0.0      0.0   2357        0
 548.2   -171.8     -3.77-0.183         0.0      0.0      0.0   2381        0
 552.8   -172.1     -3.87-0.187         0.0      0.0      0.0   2406        0
 554.9   -172.5     -4.23-0.205         0.0      0.0      0.0   2430        0

Then the whole run, e.g. -3.77-0.183, is read as a single value. What is a convenient way to handle this without first converting the file with other scripts? Thanks in advance!


Solution

  • One way would be:

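     lines <- readLines("datN.txt")  # read the raw lines with `readLines`
     # put a space before any "-digits" or " digits" run that directly follows a digit
     lines1 <- gsub("(?<=[0-9])((-|\\s)[0-9]+)", " \\1", lines, perl=TRUE)
    
     # parse the repaired lines as whitespace-separated columns
     dat <- read.table(text=lines1, sep="", header=FALSE)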
     dat
     #     V1     V2    V3     V4 V5 V6 V7   V8 V9
     #1 523.2 -166.1  1.62  0.079  0  0  0 2260  0
     #2 528.4 -168.6 -0.71 -0.034  0  0  0 2284  0
     #3 533.9 -169.7 -1.75 -0.085  0  0  0 2308  0
     #4 538.4 -169.5 -1.60 -0.078  0  0  0 2333  0
     #5 543.3 -170.8 -2.83 -0.137  0  0  0 2357  0
     #6 548.2 -171.8 -3.77 -0.183  0  0  0 2381  0
     #7 552.8 -172.1 -3.87 -0.187  0  0  0 2406  0
     #8 554.9 -172.5 -4.23 -0.205  0  0  0 2430  0
    
     str(dat)
     #'data.frame': 8 obs. of  9 variables:
     #$ V1: num  523 528 534 538 543 ...
     #$ V2: num  -166 -169 -170 -170 -171 ...
     #$ V3: num  1.62 -0.71 -1.75 -1.6 -2.83 -3.77 -3.87 -4.23
     #$ V4: num  0.079 -0.034 -0.085 -0.078 -0.137 -0.183 -0.187 -0.205
     #$ V5: num  0 0 0 0 0 0 0 0
     #$ V6: num  0 0 0 0 0 0 0 0
     #$ V7: num  0 0 0 0 0 0 0 0
     #$ V8: int  2260 2284 2308 2333 2357 2381 2406 2430
     #$ V9: int  0 0 0 0 0 0 0 0
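
The same idea can be tried without a file. The sketch below is my own illustration, not the answerer's code: it uses two rows copied from the question and a simpler pattern that only targets a minus sign glued to a preceding digit. The names `txt` and `fixed` are made up for the example. It assumes a minus directly after a digit always starts a new value, which holds for this data but might not for every file.

```r
# Two rows from the question; in the second row the 3rd and 4th values are fused.
txt <- c(
  " 523.2   -166.1      1.62 0.079         0.0      0.0      0.0   2260        0",
  " 528.4   -168.6     -0.71-0.034         0.0      0.0      0.0   2284        0"
)

# Lookbehind (needs perl = TRUE): match a "-" only when the character
# immediately before it is a digit, then put a space in front of it.
fixed <- gsub("(?<=[0-9])-", " -", txt, perl = TRUE)

# sep = "" is the read.table default: any run of whitespace separates fields.
dat <- read.table(text = fixed, header = FALSE)

ncol(dat)   # number of columns recovered
dat$V4      # the column that was previously fused with V3
```

Minus signs that already have a space before them are left alone, since the lookbehind requires a digit right before the sign. An exponent sign as in `1e-05` is also untouched, because the character before that minus is `e`.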