r, data-structures, text-parsing

Parsing a multi space-separated data set and storing it in the right data structure


I have a large dataset with name, age and company.

file.txt:

name firstname1 lastname1
age 30
Company ABC Ltd

name firstname2 lastname2
age 28
Company XYZ Ltd

I need to write a function that returns a data structure from which, given a record and a key, I can look up the corresponding value.

For example:

 content <- parseFile("file.txt")
 content[[1]][["name"]]    # "firstname1 lastname1"
 content[[1]][["age"]]     # 30
 content[[1]][["Company"]] # "ABC Ltd"

 content[[2]][["name"]]    # "firstname2 lastname2"
 content[[2]][["age"]]     # 28
 content[[2]][["Company"]] # "XYZ Ltd"

So far, I think either a list of named vectors or a list of objects could be used.

Or is there a better way to solve this?

An explanation with a code example would be helpful.


Solution

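  • We can read the file with readLines (see the data section at the end), use sub to replace the first space on each line with a comma delimiter, and pass the result to read.csv to create a two-column data.frame, where V1 holds the key and V2 the value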

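    # sub() replaces only the first space, so each line becomes "key,value";
    # read.csv skips the blank separator lines by default
    df1 <- read.csv(text = sub(" ", ",", dat), header = FALSE,
                    stringsAsFactors = FALSE)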
    

    If we need a list with one element per record, we can start a new group at every 'name' row

    lst1 <-  split(setNames(as.list(df1$V2), df1$V1), cumsum(df1$V1 == 'name'))
    
    
    lst1[[1]][['name']]
    #[1] "firstname1 lastname1"
    lst1[[1]][['age']]
    #[1] "30"
    lst1[[2]][['age']]
    #[1] "28"
    

    data

    dat <- readLines("file.txt")
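
The steps above can be wrapped into the parseFile function the question asks for. This is only a minimal sketch under a few assumptions: every record starts with a name line, the key is the first word on each line, and values made up only of digits should become integers (the split-based approach above returns "30" as a string). The function body and the temporary test file are illustrative, not part of the original answer.

```r
# Hypothetical helper: the name parseFile comes from the question,
# the implementation is an assumption built on the steps above.
parseFile <- function(path) {
  lines <- trimws(readLines(path))
  lines <- lines[nzchar(lines)]            # drop blank separator lines

  # Split each line at the first space: key before it, value after it.
  keys   <- sub(" .*$", "", lines)
  values <- sub("^[^ ]+ +", "", lines)

  # Assumption: all-digit values (such as age) should be integers.
  typed <- lapply(values, function(v) {
    if (grepl("^[0-9]+$", v)) as.integer(v) else v
  })

  # A new record starts at every "name" line.
  record_id <- cumsum(keys == "name")
  unname(split(setNames(typed, keys), record_id))
}

# Usage with a temporary copy of the sample data from the question.
path <- tempfile(fileext = ".txt")
writeLines(c("name firstname1 lastname1", "age 30", "Company ABC Ltd", "",
             "name firstname2 lastname2", "age 28", "Company XYZ Ltd"), path)

content <- parseFile(path)
content[[1]][["name"]]     # "firstname1 lastname1"
content[[2]][["age"]]      # 28
```

A list of named lists keeps each value's own type, which a named character vector cannot do. If every record has the same keys, a data.frame with one row per record may be a more convenient structure.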