
How do I use .prt files in R?


I am trying to analyze climate data in R. However, the data comes in a format for which I cannot find any documentation.

The .prt extension is used by many applications, but I believe mine is a printer-formatted file.

It has no proper delimiters and cannot be processed by any other application, although I can easily view it in a text editor. Because of the nature of the climate data, processing it in C or Python would be very cumbersome.

Kindly help me to read this file into R or to convert it to a file format readable in R.

EDIT:

The data in the .prt file looks like the sample below. As you can see, it is laid out as a map of India, with no proper format or delimiters. Each file contains certain climate values for each day of the year. I have 53 such files:

  Day= 1-Jan
       66.5E 67.5E 68.5E 69.5E 70.5E 71.5E 72.5E 73.5E 74.5E 75.5E 76.5E 77.5E 78.5E 79.5E 80.5E 81.5E 82.5E 83.5E 84.5E 85.5E 86.5E 87.5E 88.5E 89.5E 90.5E 91.5E 92.5E 93.5E 94.5E 95.5E 96.5E 97.5E
 37.5N                                                                                                                                                                                                
 36.5N                                       0.0   0.0   0.0   0.0               0.0   0.0                                                                                                            
 35.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                      
 34.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            
 33.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            
 32.5N                                                   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            
 31.5N                                                   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            
 30.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                      
 29.5N                                       0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                 0.0   0.0   0.0      
 28.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  12.0   8.8                                       0.0               0.0   0.0   0.0   0.0   0.0   0.0   0.0
 27.5N                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0
 26.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0            
 25.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0         0.0   0.0   0.0   0.0   0.0                  
 24.5N               0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0               0.0   0.0   0.0   0.0                  
 23.5N               0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0               0.0   0.0   0.0                        
 22.5N                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                     0.0                              
 21.5N                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                      
 20.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                  
 19.5N                                       0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                        
 18.5N                                       0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                              
 17.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                    
 16.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                          
 15.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                      
 14.5N                                                   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                      
 13.5N                                                   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                      
 12.5N                                                   0.0   0.0   0.0   0.0   0.0   0.0   1.4                                                                                                      
 11.5N                                                         0.0   0.0   0.0   0.0   0.5                                                                                                            
 10.5N                                                         0.0   0.0   0.0   0.0   0.0                                                                                                            
  9.5N                                                               0.0   0.0   0.0   2.4                                                                                                            
  8.5N                                                               0.0   0.3   2.5                                                                                                                  

  Day= 2-Jan

I've tried the method suggested in the comments, and this is the output I received. It is not what I need, though: each value under the latitude/longitude grid should be a separate entry, not part of a single array element holding the whole line.

> 
[1] "  Day= 1-Jan"                                                                                                                                                                                          
 [2] "       66.5E 67.5E 68.5E 69.5E 70.5E 71.5E 72.5E 73.5E 74.5E 75.5E 76.5E 77.5E 78.5E 79.5E 80.5E 81.5E 82.5E 83.5E 84.5E 85.5E 86.5E 87.5E 88.5E 89.5E 90.5E 91.5E 92.5E 93.5E 94.5E 95.5E 96.5E 97.5E"
 [3] " 37.5N                                                                                                                                                                                                "
 [4] " 36.5N                                       0.0   0.0   0.0   0.0               0.0   0.0                                                                                                            "
 [5] " 35.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                      "
 [6] " 34.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            "
 [7] " 33.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            "
 [8] " 32.5N                                                   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            "
 [9] " 31.5N                                                   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                            "
[10] " 30.5N                                             0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                                      "
[11] " 29.5N                                       0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                                 0.0   0.0   0.0      "
[12] " 28.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0  12.0   8.8                                       0.0               0.0   0.0   0.0   0.0   0.0   0.0   0.0"
[13] " 27.5N                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0"
[14] " 26.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0            "
[15] " 25.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0         0.0   0.0   0.0   0.0   0.0                  "
[16] " 24.5N               0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0               0.0   0.0   0.0   0.0                  "
[17] " 23.5N               0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0               0.0   0.0   0.0                        "
[18] " 22.5N                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                     0.0                              "
[19] " 21.5N                     0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                      "
[20] " 20.5N                           0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0                                                                  "

Solution

  • Well, your data format is quite irregular. I wasn't sure whether each file holds a single date or several (your example seems to have a second day starting at the bottom). Assuming the latter (which also covers the first case), here's one strategy: use readLines() to get the raw lines in, then extract the data of interest with read.fwf().

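    lines <- readLines("test.prt")
    # Line numbers where each day's block starts
    days <- grep("Day=", lines)

    outlist <- lapply(days, function(day) {
        # Longitude labels sit on the line right after "Day="
        headers <- strsplit(gsub("^\\s+", "", lines[day + 1]), " ")[[1]]
        date <- gsub(".*Day= ", "", lines[day], perl = TRUE)
        # Header line plus the 30 latitude rows; the header is skipped below
        con <- textConnection(lines[(day + 1):(day + 31)])
        # 33 fixed-width fields of 6 characters: latitude + 32 longitudes
        dd <- read.fwf(con, widths = rep(6, 33), header = FALSE, skip = 1)
        names(dd) <- c("lat", headers)
        close(con)
        # Wide to long: one row per lat/lon pair
        dd <- reshape(dd, idvar = "lat", ids = "lat",
            times = names(dd)[-1], timevar = "lon",
            varying = list(names(dd)[-1]), v.names = "obs",
            direction = "long")
        dd <- cbind(date = date, dd)
        # Blank cells (outside the map) come in as NA; drop them
        dd <- subset(dd, !is.na(obs))
        rownames(dd) <- NULL
        dd
    })
    do.call(rbind, outlist)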
    

    So we read all the lines in, then find all the "Day=" positions. For each one, we read the headers from the next line and create a textConnection() so that read.fwf() can read the rest of the data (read.fwf() apparently does not have a text= parameter). Next, I reshape the data so that you get one row for each lat/lon pair. I chose to also merge in the date from the section header and to remove the missing values. Finally, once I have a data.frame for each day, I rbind all of them together. The results look like this:

       date    lat   lon obs
    1 1-Jan  24.5N 68.5E   0
    2 1-Jan  23.5N 68.5E   0
    3 1-Jan  27.5N 69.5E   0
    4 1-Jan  24.5N 69.5E   0
    5 1-Jan  23.5N 69.5E   0
    6 1-Jan  22.5N 69.5E   0
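
If all 53 files share this layout, the same idea can be wrapped in a function and applied to every file. What follows is only a sketch under stated assumptions: read_prt() and to_degrees() are hypothetical helper names, the 6-character field width is taken from the sample in the question, and the fields are cut with substring() rather than read.fwf(). It also converts labels such as 36.5N to numeric degrees, which may or may not be wanted. The demo writes a tiny made-up grid to a temporary directory so that it can be run as is.

```r
# Convert labels such as "36.5N" or "67.5E" to signed decimal degrees.
to_degrees <- function(x) {
  value <- as.numeric(sub("[NSEW]$", "", x))
  ifelse(grepl("[SW]$", x), -value, value)
}

# Hypothetical helper: parse one .prt file into long format.
# Assumes every field is `width` characters wide, as in the question's sample.
read_prt <- function(path, width = 6) {
  lines  <- readLines(path)
  starts <- grep("Day=", lines)
  if (length(starts) == 0) stop("no 'Day=' headers found in ", path)
  ends   <- c(starts[-1] - 1, length(lines))     # last line of each day's block

  blocks <- Map(function(start, end) {
    body <- lines[start + seq_len(end - start)]  # lines after "Day=" up to block end
    if (length(body) < 2) return(NULL)           # truncated block, nothing to parse
    lons <- strsplit(trimws(body[1]), "\\s+")[[1]]
    rows <- body[-1]
    rows <- rows[grepl("^\\s*[0-9.]+[NS]", rows)]  # keep latitude rows only
    if (length(rows) == 0) return(NULL)
    first <- width * seq_along(lons) + 1         # field 1 (chars 1..width) is the latitude
    last  <- first + width - 1
    parsed <- lapply(rows, function(ln) {
      data.frame(
        date = trimws(sub(".*Day=", "", lines[start])),
        lat  = to_degrees(trimws(substring(ln, 1, width))),
        lon  = to_degrees(lons),
        obs  = as.numeric(substring(ln, first, last))  # blank cells become NA
      )
    })
    do.call(rbind, parsed)
  }, starts, ends)

  out <- do.call(rbind, Filter(Negate(is.null), blocks))
  out <- out[!is.na(out$obs), ]
  rownames(out) <- NULL
  out
}

# Made-up demo data; sprintf() guarantees exact 6-character fields.
fmt <- "%6s%6s%6s%6s"
sample_lines <- c(
  "  Day= 1-Jan",
  sprintf(fmt, "", "66.5E", "67.5E", "68.5E"),
  sprintf(fmt, "37.5N", "", "", ""),
  sprintf(fmt, "36.5N", "", "0.0", "12.0"),
  "",
  "  Day= 2-Jan",
  sprintf(fmt, "", "66.5E", "67.5E", "68.5E"),
  sprintf(fmt, "37.5N", "1.5", "", ""),
  sprintf(fmt, "36.5N", "", "", "8.8")
)
demo_dir <- file.path(tempdir(), "prt_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(sample_lines, file.path(demo_dir, "rain.prt"))

# Apply the parser to every .prt file in the directory and stack the results.
files  <- list.files(demo_dir, pattern = "\\.prt$", full.names = TRUE)
result <- do.call(rbind, lapply(files, function(f) {
  cbind(file = basename(f), read_prt(f))
}))
print(result)
```

For the real data, point list.files() at the directory holding the .prt files. The added file column records which file each row came from, which should help tell the climate variables apart once everything is stacked.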