
Parsing a multi space-separated data set and storing it in the right data structure

I have a large dataset with name, age and company.

file.txt :

name firstname1 lastname1
age 30
Company ABC Ltd

name firstname2 lastname2
age 28
Company XYZ Ltd

I need to write a function that parses this file and returns a data structure from which, given a record and a key, I can look up the corresponding value:


 content <- parseFile("file.txt")
 content[1]["name"]    # "firstname1 lastname1"
 content[1]["age"]     # 30
 content[1]["Company"] # "ABC Ltd"

 content[2]["name"]    # "firstname2 lastname2"
 content[2]["age"]     # 28
 content[2]["Company"] # "XYZ Ltd"

So far, I think either a list of named vectors or a list of objects could be used.

Or is there a better way to solve this?

An explanation with a code example would be helpful.


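  • We can read the file with readLines (into dat, shown at the end), use sub to turn the first space on each line into a comma delimiter, and read the result as a two-column data.frame

    # sub() replaces only the first space, so "age 30" becomes "age,30";
    # read.csv() skips the blank lines between records by default
    df1 <- read.csv(text = sub(" ", ",", dat), header = FALSE,
             stringsAsFactors = FALSE)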

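    If we need a list with one element per record, we can start a new group at every 'name' row

    lst1 <- split(setNames(as.list(df1$V2), df1$V1), cumsum(df1$V1 == 'name'))
    lst1[[1]][["name"]]
    #[1] "firstname1 lastname1"
    lst1[[1]][["age"]]
    #[1] "30"
    lst1[[2]][["age"]]
    #[1] "28"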


    where dat holds the lines of the file

    dat <- readLines("file.txt")
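
    The steps above could be wrapped into the parseFile function the question asks for. This is only a sketch under a few assumptions: every record starts with a "name" line, each line is a key followed by a space and the value, and type.convert is allowed to guess the type of each value (so "30" becomes the integer 30). The name parseFile comes from the question; the local variable names are made up for illustration.

```r
# Hypothetical helper: parse "key value" lines into a list of records.
parseFile <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]            # drop the blank separator lines
  keys   <- sub(" .*$", "", lines)         # text before the first space
  values <- sub("^[^ ]+ +", "", lines)     # text after the first run of spaces
  # Assumption: a new record starts at every "name" line.
  record_id <- cumsum(keys == "name")
  records <- split(setNames(as.list(values), keys), record_id)
  # Let type.convert guess each field's type, e.g. "30" -> 30L.
  records <- lapply(records, function(rec) lapply(rec, type.convert, as.is = TRUE))
  unname(records)
}

# Usage with the sample data from the question, written to a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("name firstname1 lastname1", "age 30", "Company ABC Ltd", "",
             "name firstname2 lastname2", "age 28", "Company XYZ Ltd"), tmp)
content <- parseFile(tmp)
content[[1]][["name"]]     # "firstname1 lastname1"
content[[1]][["age"]]      # 30
content[[2]][["Company"]]  # "XYZ Ltd"
```

    Note the double brackets: on a list, content[[1]] returns the record itself, while content[1] returns a list of length one that still wraps it, so the content[1]["name"] form from the question would not reach the value.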