
How can I create a DataFrame with separate columns from a fixed width character vector input in R?


I have a fixed width character vector input called "text" that looks something like this:

[1] "           Report"
[2] "Group        ID           Name"
[3] "Number"
[4] "AA          A134          abcd"
[5] "AB          A123          def"
[6] "AC          A345          ghikl"
[7] "BA          B134          jklmmm"
[8] "AD          A987          mn"

I need to create a standard data frame. My approach is to first write the vector to a text file and then use read.fwf to read that fixed-width file back in as a clean data frame. This works, but it forces me to create a text file in my working directory and then read it back in:

> cat(text, file = "mytextfile", sep = "\n", append = TRUE)
> read.fwf("mytextfile", skip = 3, widths = c(12, 14, 20))

Is it possible to achieve the same result without saving the intermediate output to my working directory? I tried using paste() and capture.output() without success. While

x = paste(text, collapse = "\n")

seemed to work at first, when I passed it to

read.fwf(x, skip = 3, widths = c(12, 14, 20))

I got

Error in file(file, "rt") : cannot open the connection
In addition: Warning Message:
In file(file, "rt") : cannot open file '

and capture.output() got me back to square one, a character vector. Any advice is greatly appreciated. Thank you.


Solution

  • You can wrap the character vector in textConnection() so that read.fwf reads it as if it were a file, then supply the widths as before. No intermediate file is needed.

    data <- read.fwf(textConnection(text), 
                     widths = c(12, 14, 20), strip.white = TRUE, skip = 3)
    data
    #  V1   V2     V3
    #1 AA A134   abcd
    #2 AB A123    def
    #3 AC A345  ghikl
    #4 BA B134 jklmmm
    #5 AD A987     mn
    

    Data:

    text <- c("           Report", "Group        ID           Name", "Number", 
    "AA          A134          abcd", "AB          A123          def", 
    "AC          A345          ghikl", "BA          B134          jklmmm", 
    "AD          A987          mn")
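
The paste() attempt in the question most likely failed because read.fwf treats a character argument as a file name, which matches the file(file, "rt") error shown there. A connection avoids that. Below is a minimal, self-contained sketch of the same approach that also names the columns and closes the connection explicitly. The column names "Group", "ID" and "Name" are an assumption taken from the report header line, and the variable name con is only illustrative.

```r
# Same input as in the question.
text <- c("           Report", "Group        ID           Name", "Number",
          "AA          A134          abcd", "AB          A123          def",
          "AC          A345          ghikl", "BA          B134          jklmmm",
          "AD          A987          mn")

# Open the connection ourselves so that we can close it afterwards.
con <- textConnection(text)

data <- read.fwf(con,
                 widths = c(12, 14, 20),  # field widths from the question
                 skip = 3,                # skip the three report header lines
                 strip.white = TRUE,      # passed on to read.table; trims padding
                 col.names = c("Group", "ID", "Name"))  # assumed column names

close(con)

# Inspect the structure of the result.
str(data)
```

Closing the connection by hand should avoid the "closing unused connection" warning that R can otherwise emit later when it cleans up the connection itself.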